Self join, cross join, and grouping

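I have a table of temperature samples taken over time from several sources, and I want to find the minimum, maximum, and average temperature across all sources at set time intervals. At first glance this is easy to do, like so: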

SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time;

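However, things get much more complicated (to the point of stumping me!) if sources drop in and out and, rather than ignoring the missing sources during the intervals in question, I want to use each source's last known temperature for the missing samples. Using datetimes and constructing intervals (say, every minute) across samples that are unevenly distributed over time complicates things further.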
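The per-minute interval idea can be sketched as follows. This is only an illustration run on SQLite through Python: the `readings` table, its `ts` column, and the four rows are hypothetical, and averaging the readings that fall into the same minute is an assumption (taking the last reading per minute would be an equally reasonable choice). In MySQL, `DATE_FORMAT` would play the role of SQLite's `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with real timestamps instead of integer time steps.
conn.execute("CREATE TABLE readings (ts TEXT, source TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2009-11-11 20:00:05", "a", 20.0),
        ("2009-11-11 20:00:40", "a", 22.0),
        ("2009-11-11 20:00:59", "b", 18.0),
        ("2009-11-11 20:01:10", "a", 19.0),
    ],
)

# Truncate each timestamp to the minute and collapse the readings that
# share a (minute, source) bucket into one value.
per_minute = conn.execute(
    """
    SELECT strftime('%Y-%m-%d %H:%M', ts) AS minute, source, AVG(temp) AS temp
    FROM readings
    GROUP BY minute, source
    ORDER BY minute, source
    """
).fetchall()

for row in per_minute:
    print(row)
```

Once the samples are bucketed like this, the rest of the problem is the same as with the integer `time` column used in the test table.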

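I think it should be possible to create the results by doing a self join on the samples table, where the time of the first table is greater than or equal to the time of the second table, and then calculating the aggregate values over rows grouped by source. However, I'm stumped about how to actually do this.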

Here is my test table:

+------+------+------+ 
| time | source | temp | 
+------+------+------+ 
| 1 | a | 20 | 
| 1 | b | 18 | 
| 1 | c | 23 | 
| 2 | b | 21 | 
| 2 | c | 20 | 
| 2 | a | 18 | 
| 3 | a | 16 | 
| 3 | c | 13 | 
| 4 | c | 15 | 
| 4 | a | 4 | 
| 4 | b | 31 | 
| 5 | b | 10 | 
| 5 | c | 16 | 
| 5 | a | 22 | 
| 6 | a | 18 | 
| 6 | b | 17 | 
| 7 | a | 20 | 
| 7 | b | 19 | 
+------+------+------+ 
INSERT INTO samples (time, source, temp) VALUES (1, 'a', 20), (1, 'b', 18), (1, 'c', 23), (2, 'b', 21), (2, 'c', 20), (2, 'a', 18), (3, 'a', 16), (3, 'c', 13), (4, 'c', 15), (4, 'a', 4), (4, 'b', 31), (5, 'b', 10), (5, 'c', 16), (5, 'a', 22), (6, 'a', 18), (6, 'b', 17), (7, 'a', 20), (7, 'b', 19);
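To make the gap concrete, here is a small sketch that loads the test rows into an in-memory SQLite database from Python and runs the naive `GROUP BY time` aggregate. The column types are an assumption, since the question does not give the table definition.

```python
import sqlite3

# Test rows from the question: (time, source, temp).
ROWS = [
    (1, "a", 20), (1, "b", 18), (1, "c", 23),
    (2, "b", 21), (2, "c", 20), (2, "a", 18),
    (3, "a", 16), (3, "c", 13),
    (4, "c", 15), (4, "a", 4), (4, "b", 31),
    (5, "b", 10), (5, "c", 16), (5, "a", 22),
    (6, "a", 18), (6, "b", 17),
    (7, "a", 20), (7, "b", 19),
]

conn = sqlite3.connect(":memory:")
# Assumed schema; the original post only shows the data.
conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)

# Naive aggregate: only sources that reported in an interval are counted.
naive = conn.execute(
    """
    SELECT time, MIN(temp), MAX(temp), AVG(temp)
    FROM samples
    GROUP BY time
    ORDER BY time
    """
).fetchall()

# Number of sources that actually reported in each interval.
counts = dict(
    conn.execute("SELECT time, COUNT(*) FROM samples GROUP BY time").fetchall()
)

print(naive[2])   # interval 3, computed from sources a and c only
print(counts)
```

At time 3 the aggregate is built from only two rows, because source b has no sample there; times 6 and 7 are missing source c in the same way.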

To do my max, min, and average calculations, I want an intermediate table that looks like this:

+------+------+------+ 
| time | source | temp | 
+------+------+------+ 
| 1 | a | 20 | 
| 1 | b | 18 | 
| 1 | c | 23 | 
| 2 | b | 21 | 
| 2 | c | 20 | 
| 2 | a | 18 | 
| 3 | a | 16 | 
| 3 | b | 21 | 
| 3 | c | 13 | 
| 4 | c | 15 | 
| 4 | a | 4 | 
| 4 | b | 31 | 
| 5 | b | 10 | 
| 5 | c | 16 | 
| 5 | a | 22 | 
| 6 | a | 18 | 
| 6 | b | 17 | 
| 6 | c | 16 | 
| 7 | a | 20 | 
| 7 | b | 19 | 
| 7 | c | 16 | 
+------+------+------+

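The following query gets me close to what I want, but it takes the temperature value of the source's first result, rather than the most recent one at the given time interval: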

SELECT s.dt as sdt, s.mac, ss.temp, MAX(ss.dt) as maxdt FROM (SELECT DISTINCT dt FROM samples) AS s CROSS JOIN samples AS ss WHERE s.dt >= ss.dt GROUP BY sdt, mac HAVING maxdt <= s.dt ORDER BY sdt ASC, maxdt ASC; 

+------+------+------+-------+ 
| sdt | mac | temp | maxdt | 
+------+------+------+-------+ 
| 1 | a | 20 |  1 | 
| 1 | c | 23 |  1 | 
| 1 | b | 18 |  1 | 
| 2 | a | 20 |  2 | 
| 2 | c | 23 |  2 | 
| 2 | b | 18 |  2 | 
| 3 | b | 18 |  2 | 
| 3 | a | 20 |  3 | 
| 3 | c | 23 |  3 | 
| 4 | a | 20 |  4 | 
| 4 | c | 23 |  4 | 
| 4 | b | 18 |  4 | 
| 5 | a | 20 |  5 | 
| 5 | c | 23 |  5 | 
| 5 | b | 18 |  5 | 
| 6 | c | 23 |  5 | 
| 6 | a | 20 |  6 | 
| 6 | b | 18 |  6 | 
| 7 | c | 23 |  5 | 
| 7 | b | 18 |  7 | 
| 7 | a | 20 |  7 | 
+------+------+------+-------+
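A likely reason for the repeated first temperature: `ss.temp` is neither aggregated nor listed in the `GROUP BY`, so MySQL (with `ONLY_FULL_GROUP_BY` disabled) is free to return it from any row in the group, while `MAX(ss.dt)` is computed correctly. One possible workaround, sketched here on SQLite through Python, is to compute the latest time per (interval, source) first and then join back to `samples` to fetch the temperature recorded at exactly that time. The sketch uses the test table's column names (`time`, `source`) rather than `dt` and `mac`, assumes each (time, source) pair occurs at most once, and the aliases `g`, `x`, `sdt`, and `maxdt` are illustrative.

```python
import sqlite3

ROWS = [
    (1, "a", 20), (1, "b", 18), (1, "c", 23),
    (2, "b", 21), (2, "c", 20), (2, "a", 18),
    (3, "a", 16), (3, "c", 13),
    (4, "c", 15), (4, "a", 4), (4, "b", 31),
    (5, "b", 10), (5, "c", 16), (5, "a", 22),
    (6, "a", 18), (6, "b", 17),
    (7, "a", 20), (7, "b", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)

rows = conn.execute(
    """
    SELECT g.sdt, g.source, ss.temp, g.maxdt
    FROM (
        -- For every interval and source: the latest sample time
        -- at or before that interval.
        SELECT s.time AS sdt, x.source AS source, MAX(x.time) AS maxdt
        FROM (SELECT DISTINCT time FROM samples) AS s
        JOIN samples AS x ON x.time <= s.time
        GROUP BY s.time, x.source
    ) AS g
    -- Join back to pick up the temperature recorded at that latest time.
    JOIN samples AS ss
      ON ss.source = g.source AND ss.time = g.maxdt
    ORDER BY g.sdt, g.source
    """
).fetchall()

for row in rows:
    print(row)
```

Because the temperature now comes from the joined row rather than from an arbitrary member of the group, the filled-in rows carry the last known value, for example 21 for source b at time 3.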

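Update: chadhoc (great name, by the way!) gave a nice solution that unfortunately does not work in MySQL, since MySQL does not support the FULL JOIN he uses. Luckily, I believe a simple UNION is an effective replacement: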

-- Unify the original samples with the missing values that we've calculated 
(
    SELECT time, source, temp 
    FROM samples 
) 
UNION 
(-- Pull all the time/source combinations that we are missing from the sample set, along with the temp 
    -- from the last sampled interval for the same time/source combination if we do not have one 
    SELECT a.time, a.source, (SELECT t2.temp FROM samples AS t2 WHERE t2.time < a.time AND t2.source = a.source ORDER BY t2.time DESC LIMIT 1) AS temp 
    FROM  
    (-- All values we want to get should be a cross of time/temp 
    SELECT t1.time, s1.source 
    FROM 
    (SELECT DISTINCT time FROM samples) AS t1 
    CROSS JOIN 
    (SELECT DISTINCT source FROM samples) AS s1 
) AS a 
    LEFT JOIN samples s 
    ON a.time = s.time 
    AND a.source = s.source 
    WHERE s.source IS NULL 
) 
ORDER BY time, source;

Update 2: MySQL gives the following EXPLAIN output for chadhoc's code:

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+ 
| id | select_type  | table  | type | possible_keys | key | key_len | ref | rows | Extra      | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+ 
| 1 | PRIMARY   | temp  | ALL | NULL   | NULL | NULL | NULL | 18 |        | 
| 2 | UNION    | <derived4> | ALL | NULL   | NULL | NULL | NULL | 21 |        | 
| 2 | UNION    | s   | ALL | NULL   | NULL | NULL | NULL | 18 | Using where     | 
| 4 | DERIVED   | <derived6> | ALL | NULL   | NULL | NULL | NULL | 3 |        | 
| 4 | DERIVED   | <derived5> | ALL | NULL   | NULL | NULL | NULL | 7 |        | 
| 6 | DERIVED   | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using temporary    | 
| 5 | DERIVED   | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using temporary    | 
| 3 | DEPENDENT SUBQUERY | t2   | ALL | NULL   | NULL | NULL | NULL | 18 | Using where; Using filesort | 
| NULL | UNION RESULT  | <union1,2> | ALL | NULL   | NULL | NULL | NULL | NULL | Using filesort    | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+

I was able to get Charles's code working like so:

SELECT T.time, S.source,
    COALESCE(
        D.temp,
        (
            SELECT temp FROM samples
            WHERE source = S.source AND time = (
                SELECT MAX(time)
                FROM samples
                WHERE source = S.source
                  AND time < T.time
            )
        )
    ) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
LEFT JOIN samples AS D
    ON D.source = S.source AND D.time = T.time

Its EXPLAIN output is:

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+ 
| id | select_type  | table  | type | possible_keys | key | key_len | ref | rows | Extra   | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+ 
| 1 | PRIMARY   | <derived5> | ALL | NULL   | NULL | NULL | NULL | 3 |     | 
| 1 | PRIMARY   | <derived4> | ALL | NULL   | NULL | NULL | NULL | 7 |     | 
| 1 | PRIMARY   | D   | ALL | NULL   | NULL | NULL | NULL | 18 |     | 
| 5 | DERIVED   | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using temporary | 
| 4 | DERIVED   | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using temporary | 
| 2 | DEPENDENT SUBQUERY | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using where  | 
| 3 | DEPENDENT SUBQUERY | temp  | ALL | NULL   | NULL | NULL | NULL | 18 | Using where  | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+

2009-11-11 pr1001

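I think you would get better performance by making use of ranking/windowing functions in MySQL, but unfortunately I do not know those as well as the T-SQL implementation. Here is an ANSI-compliant solution that works, though: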

-- Full join across the sample set and anything missing from the sample set, pulling the missing temp first if we do not have one 
select coalesce(c1.[time], c2.[time]) as dt, coalesce(c1.source, c2.source) as source, coalesce(c2.temp, c1.temp) as temp 
from samples c1 
full join (-- Pull all the time/source combinations that we are missing from the sample set, along with the temp 
      -- from the last sampled interval for the same time/source combination if we do not have one 
      select a.time, a.source, 
        (select top 1 t2.temp from samples t2 where t2.time < a.time and t2.source = a.source order by t2.time desc) as temp 
      from  
       ( -- All values we want to get should be a cross of time/samples 
        select t1.[time], s1.source 
        from 
        (select distinct [time] from samples) as t1 
        cross join 
        (select distinct source from samples) as s1 
       ) a 
      left join samples s 
      on a.[time] = s.time 
      and a.source = s.source 
      where s.source is null 
     ) c2 
on c1.time = c2.time 
and c1.source = c2.source 
order by dt, source
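For comparison, the windowing approach mentioned above could look roughly like the sketch below. It is not chadhoc's code: it is an illustration run on SQLite (3.25 or later) through Python, and it should carry over to MySQL 8.0 or later, which added window functions and common table expressions. The names `grid`, `marked`, and `grp` are made up for the example. The trick is that a running `COUNT(temp)` only increases on rows that have a sample, so each gap shares a group number with the last real reading before it.

```python
import sqlite3

ROWS = [
    (1, "a", 20), (1, "b", 18), (1, "c", 23),
    (2, "b", 21), (2, "c", 20), (2, "a", 18),
    (3, "a", 16), (3, "c", 13),
    (4, "c", 15), (4, "a", 4), (4, "b", 31),
    (5, "b", 10), (5, "c", 16), (5, "a", 22),
    (6, "a", 18), (6, "b", 17),
    (7, "a", 20), (7, "b", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)

filled = conn.execute(
    """
    WITH grid AS (
        -- Every (time, source) combination; temp is NULL where no sample exists.
        SELECT t.time, s.source, d.temp
        FROM (SELECT DISTINCT time FROM samples) AS t
        CROSS JOIN (SELECT DISTINCT source FROM samples) AS s
        LEFT JOIN samples AS d
          ON d.time = t.time AND d.source = s.source
    ),
    marked AS (
        -- Running count of non-NULL temps per source: a gap row keeps the
        -- group number of the most recent real sample.
        SELECT time, source, temp,
               COUNT(temp) OVER (PARTITION BY source ORDER BY time) AS grp
        FROM grid
    )
    -- Each group holds exactly one non-NULL temp, so MAX simply copies it
    -- onto the gap rows that follow it.
    SELECT time, source,
           MAX(temp) OVER (PARTITION BY source, grp) AS temp
    FROM marked
    ORDER BY time, source
    """
).fetchall()

for row in filled:
    print(row)
```

A source with no sample at or before the first interval would get a NULL temperature here, which matches the behaviour of the subquery-based versions.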

2009-11-11 20:41:57 chadhoc

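I know this looks complex, but it's formatted to explain itself... It should work... I hope you only have three sources... If you have an arbitrary number of sources, then this won't work... In that case, see the second query... EDIT: Removed the first attempt.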

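EDIT: If you don't know the sources ahead of time, you have to do something where you create an intermediate result set that "fills in" the missing values. Something like this: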

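2nd EDIT: Removed the need for Coalesce by moving the logic that retrieves the most recent temp reading for each source from the Select clause into the Join condition.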

-- Z is the gap-filled set: one row per (time, source), carrying the latest
-- reading at or before that time. The outer query can only see the alias Z.
Select Z.Time, Max(Z.Temp) MaxTemp, 
    Min(Z.Temp) MinTemp, Avg(Z.Temp) AvgTemp 
From 
    (Select T.Time, S.Source, D.Temp 
    From (Select Distinct Time From Samples) T 
    Cross Join 
     (Select Distinct Source From Samples) S 
    Left Join Samples D 
     On D.Source = S.Source 
      And D.Time = 
       (Select Max(Time) 
       From Samples 
       Where Source = S.Source 
        And Time <= T.Time)) Z 
Group By Z.Time
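As a quick check, the fill-forward query can be run against the question's test rows. The sketch below does this on an in-memory SQLite database from Python; the column types and the lowercase aliases (`z`, `t`, `s`, `d`, `x`) are assumptions made for the example, and the outer query refers to the derived table by its alias.

```python
import sqlite3

ROWS = [
    (1, "a", 20), (1, "b", 18), (1, "c", 23),
    (2, "b", 21), (2, "c", 20), (2, "a", 18),
    (3, "a", 16), (3, "c", 13),
    (4, "c", 15), (4, "a", 4), (4, "b", 31),
    (5, "b", 10), (5, "c", 16), (5, "a", 22),
    (6, "a", 18), (6, "b", 17),
    (7, "a", 20), (7, "b", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (time INTEGER, source TEXT, temp INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", ROWS)

stats = conn.execute(
    """
    SELECT z.time, MAX(z.temp), MIN(z.temp), AVG(z.temp)
    FROM (
        -- One row per (time, source), joined to the most recent sample
        -- of that source at or before the interval.
        SELECT t.time, s.source, d.temp
        FROM (SELECT DISTINCT time FROM samples) AS t
        CROSS JOIN (SELECT DISTINCT source FROM samples) AS s
        LEFT JOIN samples AS d
          ON d.source = s.source
         AND d.time = (SELECT MAX(x.time)
                       FROM samples AS x
                       WHERE x.source = s.source
                         AND x.time <= t.time)
    ) AS z
    GROUP BY z.time
    ORDER BY z.time
    """
).fetchall()

for time, max_temp, min_temp, avg_temp in stats:
    print(time, max_temp, min_temp, round(avg_temp, 3))
```

With the test data, time 6 now aggregates over 18, 17, and the carried-over 16 from source c, giving a maximum of 18, a minimum of 16, and an average of 17.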

2009-11-11 21:23:56

Thanks, Charles, but your solution assumes that all sources are known ahead of time. Do you have any suggestions for when they aren't? – pr1001 2009-11-11 22:28:29

I added another SQL query for the case where you don't know the sources... – 2009-11-12 00:50:57

After changing IsNull to COALESCE, I was able to get the query working on my MySQL database. Thanks. – pr1001 2009-11-12 01:24:02
