2010-06-04

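Efficiently selecting the top row for each category in a set

I need to select the top row for each category from a known set (somewhat similar to this question). The problem is how to make this query efficient over a large number of rows.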

For example, let's create a table that stores temperature readings for several places.

CREATE TABLE #t (
    placeId int, 
    ts datetime, 
    temp int, 
    PRIMARY KEY (ts, placeId) 
) 

-- insert some sample data 

SET NOCOUNT ON 

DECLARE @n int, @ts datetime 
SELECT @n = 1000, @ts = '2000-01-01' 

WHILE (@n>0) BEGIN 
    INSERT INTO #t VALUES (@n % 10, @ts, @n % 37) 
    IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts) 
    SET @n = @n - 1 
END 

Now I need to get the latest reading for each of places 1, 2 and 3.

This approach is efficient, but it doesn't scale well (and it looks dirty).

SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

The following looks better but is much less efficient (30% vs 70% of the estimated cost, according to the optimizer).

SELECT placeId, ts, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

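The problem is that in the execution plan of the latter query, a clustered index scan is performed on #t, and 300 rows are retrieved, sorted, numbered and then filtered, leaving only 3 rows. The former query fetches a single row three times.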

Is there a way to run this query efficiently without a long chain of UNIONs?


+1 for a question that includes sample code – 2010-06-04 15:17:59

Answers

I loaded 100,000 rows (which still isn't enough to slow things down) and tried the old-fashioned way:

select t.* 
from #t t 
    inner join (select placeId, max(ts) ts 
       from #t 
       where placeId in (1,2,3) 
       group by placeId) xx 
    on xx.placeId = t.placeId 
    and xx.ts = t.ts 

and got much the same results.

Then I reversed the order of the columns in the primary key:

CREATE TABLE #t ( 
    placeId int, 
    ts datetime, 
    temp int, 
    PRIMARY KEY (placeId, ts) 
) 

This reduced the page reads for all of the queries, and the plans used index seeks instead of scans.

If optimization is your goal and you can change the indexes, I would modify the primary key as above, or add a covering index.
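A minimal sketch of the covering-index option, assuming you keep the question's original table (primary key on `(ts, placeId)`) and add a nonclustered index instead of changing the key. The index name `IX_t_placeId_ts` is hypothetical. Keying on `(placeId, ts DESC)` lets each `TOP 1 ... ORDER BY ts DESC` seek straight to the newest row for a place, and `INCLUDE (temp)` carries the remaining column so no lookup into the clustered index is needed.

```sql
-- Sketch: covering nonclustered index for the "latest row per placeId" queries.
-- Assumes #t was created as in the question, with PRIMARY KEY (ts, placeId).
-- IX_t_placeId_ts is a hypothetical index name.
CREATE NONCLUSTERED INDEX IX_t_placeId_ts
    ON #t (placeId, ts DESC)  -- seek by place, newest reading first
    INCLUDE (temp);           -- carry temp so the query needs no clustered-index lookup
```

After creating it, rerun the queries with `SET STATISTICS IO ON` to check whether the plans actually switch to seeks on this index; that choice is up to the optimizer and depends on data volume and statistics.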


Thanks, I somehow missed the "old-fashioned way". It also works better on my actual data structure. – VladV 2010-06-07 06:47:10

Don't look only at the execution plan; also look at SET STATISTICS IO and SET STATISTICS TIME:

set statistics io on 
go 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

SELECT placeId, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

set statistics io off 
go 

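Table '#t000000000B99'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t000000000B99'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.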

set statistics time on 
go 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 

SELECT placeId, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 

set statistics time off 
go 

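For me there is no real difference between the two approaches; load more data and compare again. Also, when you add an ORDER BY to both queries, the split drops to 40% and 60%: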

SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 1 
    ORDER BY ts DESC 
) t1 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 2 
    ORDER BY ts DESC 
) t2 
UNION ALL 
SELECT * FROM (
    SELECT TOP 1 placeId, temp 
    FROM #t 
    WHERE placeId = 3 
    ORDER BY ts DESC 
) t3 
ORDER BY placeId 

SELECT placeId, temp FROM (
    SELECT placeId, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum 
    FROM #t 
    WHERE placeId IN (1, 2, 3) 
) t 
WHERE rownum = 1 
ORDER BY placeId 

Just for the record, another option uses CROSS APPLY. On my configuration it performs better than the approaches mentioned above.

SELECT * 
FROM (VALUES (1),(2),(3)) t (placeId) 
CROSS APPLY (
    SELECT TOP 1 ts, temp 
    FROM #t 
    WHERE placeId = t.placeId 
    ORDER BY ts DESC 
) tt 

I guess the VALUES list could be changed to a temp table or a table variable without much difference.
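A sketch of the table-variable variant, assuming `#t` already exists and is populated by the script in the question, and SQL Server 2008 or later (for the multi-row `VALUES` insert). The table variable `@places` is a hypothetical name; its primary key keeps the driving list free of duplicates, so each place gets exactly one `TOP 1` lookup.

```sql
-- Sketch: drive the CROSS APPLY from a table variable instead of a VALUES list.
-- Assumes #t was created and filled by the script in the question.
-- @places is a hypothetical name for the set of requested places.
DECLARE @places TABLE (placeId int PRIMARY KEY);
INSERT INTO @places (placeId) VALUES (1), (2), (3);

SELECT p.placeId, tt.ts, tt.temp
FROM @places AS p
CROSS APPLY (
    -- one TOP 1 lookup per requested placeId, newest reading first
    SELECT TOP 1 r.ts, r.temp
    FROM #t AS r
    WHERE r.placeId = p.placeId
    ORDER BY r.ts DESC
) AS tt;
```

CROSS APPLY drops outer rows for which the inner query returns nothing, so a place with no readings simply does not appear; OUTER APPLY would return it with NULL `ts` and `temp` instead.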