2017-09-13 50 views
1

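Speed up SQL Server CROSS APPLY to get aggregated data

In SQL Server, I'm trying to put together a single query that grabs a row and includes aggregated data from a two-hour window before that row, as well as aggregated data from a one-hour window after it. How can I make this run faster?

The rows have timestamps with millisecond precision and are not evenly spaced. I have 50 million rows in this table, and the query never seems to finish. There are indexes in many places, but they don't seem to help. I was also considering window functions, but I'm not sure whether a sliding window is possible when the rows are unevenly spaced. I'm also not sure how the one-hour window into the future could be done with a SQL window.

Box is a string with 10 unique values. Process is a string with 30 unique values. The average duration_ms is 200 ms. Errors make up less than 0.1% of the data. The 50 million rows cover several years of data.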

select 
c1.start_time, 
c1.end_time, 
c1.box, 
c1.process, 
datediff(ms,c1.start_time,c1.end_time) as duration_ms, 
datepart(dw,c1.start_time) as day_of_week, 
datepart(hour,c1.start_time) as hour_of_day, 
c3.*, 
c5.* 
from metrics_table c1 
cross apply 
(select 
    avg(cast(datediff(ms,c2.start_time,c2.end_time) as numeric)) as avg_ms, 
    count(1) as num_process_total, 
    count(distinct process) as num_process_unique, 
    count(distinct box) as num_box_unique 
    from metrics_table c2 
    where datediff(minute,c2.start_time,c1.start_time) <= 120 
    and c1.start_time> c2.start_time 
    and c2.error_code = 0 
) c3 
cross apply 
(select 
    avg(case when datediff(ms,c4.start_time,c4.end_time)>1000 then 1.0 else 0.0 end) as percent_over_thresh 
    from metrics_table c4 
    where datediff(hour,c1.start_time,c4.start_time) <= 1 
    and c4.start_time> c1.start_time 
    and c4.error_code= 0 
) c5 
where 
c1.error_code= 0 

Edit

Version: SQL Azure 12.0

Added the execution plan (image).

+5

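I'd be surprised if the performance problem isn't caused by your WHERE predicates. You have functions in your WHERE clause, which means DATEDIFF has to be computed for every row. In this case you're doing it twice, so you're performing roughly 100 million calculations. –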

+1

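@Hogan I tried windowing, but I don't see a way to go back 2 hours from a given point in time when the data points aren't collected at even intervals. That is, the gap from one row to the next could be a few milliseconds, a few seconds, or a few minutes. – user4446237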

+0

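Right, that isn't possible in SQL Server (there is no `RANGE BETWEEN INTERVAL`). You'd have to do some pre-aggregation to guarantee one row per minute, etc. But `COUNT(DISTINCT ...)` isn't easily compatible with that. –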
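
The pre-aggregation idea from the comment above could look roughly like the sketch below. It is not from the original thread: the temp table name #minute_agg and the variable names are hypothetical, start_time is assumed to be a DATETIME column, and the windows are aligned to whole minutes, so the results only approximate the per-row windows in the question. As the comment notes, the COUNT(DISTINCT ...) columns cannot be derived this way.

```sql
-- Sketch only. Assumes dbo.metrics_table as described in the question and
-- a DATETIME start_time. #minute_agg is a hypothetical temp table name.
IF OBJECT_ID('tempdb..#minute_agg') IS NOT NULL DROP TABLE #minute_agg;

-- Step 1: collapse the raw rows into one row per minute that contains data.
SELECT
    DATEADD(MINUTE, DATEDIFF(MINUTE, 0, start_time), 0) AS minute_start,
    COUNT(*) AS n_rows,
    SUM(CAST(DATEDIFF(ms, start_time, end_time) AS BIGINT)) AS sum_ms
INTO #minute_agg
FROM dbo.metrics_table
WHERE error_code = 0
GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, start_time), 0);

-- Step 2: generate every minute in the covered range so that gaps become
-- zero rows; only then does "120 rows back" mean "120 minutes back".
DECLARE @first_minute DATETIME, @n_minutes INT;

SELECT
    @first_minute = MIN(minute_start),
    @n_minutes = DATEDIFF(MINUTE, MIN(minute_start), MAX(minute_start)) + 1
FROM #minute_agg;

WITH numbers AS (
    -- Same row-generator pattern as a tally table: 0, 1, 2, ...
    SELECT TOP (@n_minutes)
        CAST(ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1 AS INT) AS i
    FROM sys.all_columns a
        CROSS JOIN sys.all_columns b
),
dense AS (
    SELECT
        DATEADD(MINUTE, n.i, @first_minute) AS minute_start,
        ISNULL(m.n_rows, 0) AS n_rows,
        ISNULL(m.sum_ms, 0) AS sum_ms
    FROM numbers n
        LEFT JOIN #minute_agg m
            ON m.minute_start = DATEADD(MINUTE, n.i, @first_minute)
)
-- Step 3: sliding window over the 120 whole minutes before the current one.
SELECT
    minute_start,
    SUM(n_rows) OVER (ORDER BY minute_start
                      ROWS BETWEEN 120 PRECEDING AND 1 PRECEDING) AS num_process_total,
    SUM(sum_ms) OVER (ORDER BY minute_start
                      ROWS BETWEEN 120 PRECEDING AND 1 PRECEDING) * 1.0
        / NULLIF(SUM(n_rows) OVER (ORDER BY minute_start
                                   ROWS BETWEEN 120 PRECEDING AND 1 PRECEDING), 0) AS avg_ms
FROM dense
ORDER BY minute_start;
```

Each detail row could then be joined to its bucket on DATEADD(MINUTE, DATEDIFF(MINUTE, 0, c1.start_time), 0). A forward-looking window would use ROWS BETWEEN 1 FOLLOWING AND 60 FOLLOWING in the same way.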

Answers

3

The following should be a step in the right direction... Note: c2.start_time & c4.start_time are no longer wrapped in DATEDIFF functions, which makes the predicates SARGable...

SELECT 
    c1.start_time, 
    c1.end_time, 
    c1.box, 
    c1.process, 
    DATEDIFF(ms, c1.start_time, c1.end_time) AS duration_ms, 
    DATEPART(dw, c1.start_time) AS day_of_week, 
    DATEPART(HOUR, c1.start_time) AS hour_of_day, 
    c3.*, 
    c5.* 
FROM 
    dbo.metrics_table c1 
    CROSS APPLY (
       SELECT 
        AVG(CAST(DATEDIFF(ms, c2.start_time, c2.end_time) AS NUMERIC)) AS avg_ms, 
        COUNT(1) AS num_process_total, 
        COUNT(DISTINCT process) AS num_process_unique, 
        COUNT(DISTINCT box) AS num_box_unique 
       FROM 
        dbo.metrics_table c2 
       WHERE 
        -- SARGable replacement for DATEDIFF(minute, c2.start_time, c1.start_time) <= 120: 
        -- rows that started within the 120 minutes before c1.start_time 
        c2.start_time >= DATEADD(MINUTE, -120, c1.start_time) 
        AND c2.start_time < c1.start_time 
        AND c2.error_code = 0 
       ) c3 
    CROSS APPLY (
       SELECT 
        AVG(CASE WHEN DATEDIFF(ms, c4.start_time, c4.end_time) > 1000 THEN 1.0 ELSE 0.0 END 
        ) AS percent_over_thresh 
       FROM 
        dbo.metrics_table c4 
       WHERE 
        -- SARGable replacement for DATEDIFF(HOUR, c1.start_time, c4.start_time) <= 1: 
        -- rows that started within the hour after c1.start_time 
        c4.start_time > c1.start_time 
        AND c4.start_time <= DATEADD(HOUR, 1, c1.start_time) 
        AND c4.error_code = 0 
       ) c5 
WHERE 
    c1.error_code = 0; 
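
One thing to keep in mind when replacing DATEDIFF with DATEADD range predicates: DATEDIFF counts the datepart boundaries crossed, not elapsed time, so the two forms are not exactly equivalent. The small illustration below is not from the original answer, and the literal timestamps are made up for the example.

```sql
-- DATEDIFF counts hour boundaries crossed, not elapsed hours.
SELECT
    -- One second apart, but the 11:00 boundary is crossed.
    DATEDIFF(HOUR, CAST('2017-09-13T10:59:59' AS DATETIME),
                   CAST('2017-09-13T11:00:00' AS DATETIME)) AS one_second_apart,  -- 1
    -- Almost two hours apart, but only the 11:00 boundary is crossed.
    DATEDIFF(HOUR, CAST('2017-09-13T10:00:00' AS DATETIME),
                   CAST('2017-09-13T11:59:59' AS DATETIME)) AS almost_two_hours;  -- 1
```

So a predicate like DATEDIFF(HOUR, c1.start_time, c4.start_time) <= 1 admits rows up to the end of the next clock hour, while a DATEADD range covers exactly 60 minutes. The aggregates may therefore differ slightly between the two versions of the query.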

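Of course, making the query SARGable does no good unless a suitable index is available. The index below should suit all 3 metrics_table references... (Check which indexes you currently have; there's a chance you'll need to create a new one.)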

CREATE NONCLUSTERED INDEX ixf_metricstable_errorcode_starttime ON dbo.metrics_table (
    error_code, 
    start_time 
    ) 
INCLUDE (
    end_time, 
    box, 
    process 
    ) 
WHERE 
    error_code = 0; 
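
To check which indexes already exist on the table before creating a new one, a catalog query along these lines may help. It is a sketch that is not part of the original answer and uses only the standard sys.indexes, sys.index_columns and sys.columns catalog views.

```sql
-- List every index on dbo.metrics_table with its key and included columns.
SELECT
    i.name AS index_name,
    i.type_desc,
    i.has_filter,
    i.filter_definition,
    ic.key_ordinal,
    ic.is_included_column,
    c.name AS column_name
FROM sys.indexes i
    JOIN sys.index_columns ic
        ON ic.object_id = i.object_id
       AND ic.index_id = i.index_id
    JOIN sys.columns c
        ON c.object_id = ic.object_id
       AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.metrics_table')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

An existing index keyed on start_time that also carries end_time, box and process may already be good enough, in which case a new one is unnecessary.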

0

I used BETWEEN and got good performance on my simple test rig. I also used columnstore, since 50 million records is data-warehouse volume:

CREATE TABLE dbo.metrics_table (
    rowId  INT IDENTITY, 
    start_time DATETIME NOT NULL, 
    end_time DATETIME NOT NULL, 
    box   VARCHAR(10) NOT NULL, 
    process  VARCHAR(10) NOT NULL, 
    error_code INT NOT NULL 
); 


-- Add records 
;WITH cte AS (
SELECT TOP 3334 ROW_NUMBER() OVER (ORDER BY (SELECT 1)) rn 
FROM sys.columns c1 
    CROSS JOIN sys.columns c2 
    CROSS JOIN sys.columns c3 
) 
INSERT INTO dbo.metrics_table (start_time, end_time, box, process, error_code) 
SELECT 
    DATEADD(ms, rn, DATEADD(day, rn % 365, '1 Jan 2017')) AS start_time, 
    DATEADD(ms, rn % 409, DATEADD(ms, rn, DATEADD(day, rn % 365, '1 Jan 2017'))) AS end_time, 
    'box' + CAST(boxes.box AS VARCHAR(10)) box, 
    'process' + CAST(processes.process AS VARCHAR(10)) process, 
    ABS(CAST(rn % 3000 AS BIT) -1) error_code 
FROM cte c 
    CROSS JOIN (SELECT TOP 10 rn FROM cte) AS boxes(box) 
    CROSS JOIN (SELECT TOP 30 rn FROM cte) AS processes(process); 


-- Create normal clustered index to order the data 
CREATE CLUSTERED INDEX cci_metrics_table ON dbo.metrics_table (start_time, end_time, box, process); 
--CREATE CLUSTERED INDEX cci_metrics_table ON dbo.metrics_table (box, process, start_time, end_time); 

-- Convert to columnstore 
CREATE CLUSTERED COLUMNSTORE INDEX cci_metrics_table ON dbo.metrics_table WITH (MAXDOP = 1, DROP_EXISTING = ON); 



IF OBJECT_ID('tempdb..#tmp1') IS NOT NULL DROP TABLE #tmp1 

-- two hour window before, 1 hour window after 
SELECT 
    c1.start_time, 
    c1.end_time, 
    c1.box, 
    c1.process, 
    DATEDIFF(ms, c1.start_time, c1.end_time) AS duration_ms, 
    DATEPART(dw, c1.start_time) AS day_of_week, 
    DATEPART(hour, c1.start_time) AS hour_of_day, 
    c2.xavg, 
    c2.num_process_total, 
    c2.num_process_unique, 
    c2.num_box_unique, 
    c3.percent_over_thresh 

INTO #tmp1 

FROM dbo.metrics_table c1 
    CROSS APPLY 
     (
     SELECT 
      COUNT(1) AS num_process_total, 
      AVG(CAST(DATEDIFF(ms, start_time, end_time) AS NUMERIC)) xavg, 
      COUNT(DISTINCT process) num_process_unique, 
      COUNT(DISTINCT box) num_box_unique 
     FROM dbo.metrics_table c2 
     WHERE c2.error_code = 0 
      AND c2.start_time Between DATEADD(minute, -120, c1.start_time) And c1.start_time 
      AND c1.start_time > c2.start_time 
     ) c2 

    CROSS APPLY 
     (
     SELECT 
      AVG(CASE WHEN DATEDIFF(ms, c4.start_time, c4.end_time) > 1000 THEN 1.0 ELSE 0.0 END) percent_over_thresh 
     FROM dbo.metrics_table c4 
     WHERE c4.error_code = 0 
      AND c4.start_time Between c1.start_time And DATEADD(minute, 60, c1.start_time) 
      AND c4.start_time > c1.start_time 
     ) c3 

WHERE c1.error_code = 0;