2017-02-11 45 views
2

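Window function to count events that occurred in the last 10 minutes

I can count the events in the last 10 minutes with a traditional subquery. For example: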

drop table if exists [dbo].[readings] 
go 

create table [dbo].[readings](
    [server] [int] NOT NULL, 
    [sampled] [datetime] NOT NULL 
) 
go 

insert into readings 
values 
(1,'20170101 08:00'), 
(1,'20170101 08:02'), 
(1,'20170101 08:05'), 
(1,'20170101 08:30'), 
(1,'20170101 08:31'), 
(1,'20170101 08:37'), 
(1,'20170101 08:40'), 
(1,'20170101 08:41'), 
(1,'20170101 09:07'), 
(1,'20170101 09:08'), 
(1,'20170101 09:09'), 
(1,'20170101 09:11') 
go 

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21 
select server,sampled,(select count(*) from readings r2 where r2.server=r1.server and r2.sampled <= r1.sampled and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes 
from readings r1 
order by server,sampled 
go 

How can I get the same result using a window function? I tried this:

select server,sampled, 
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes 
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes 
from readings r1 
order by server,sampled 

But the result is just a running count. Is there a system variable that refers to the current row pointer, something like currentrow.sampled?
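A likely reason the attempt degenerates into a running count: inside the window aggregate, the unqualified sampled and r1.sampled both resolve to the same row, the one currently being aggregated, so the CASE condition is always true. A minimal sketch that shows this by evaluating the CASE on its own (case_value is just an illustrative alias):

```sql
-- Both references to "sampled" point at the same row of r1,
-- so the predicate is always true and the CASE always yields 1.
SELECT server,
       sampled,
       CASE WHEN sampled <= r1.sampled
             AND sampled > DATEADD(MINUTE, -10, r1.sampled)
            THEN 1 ELSE NULL END AS case_value
FROM   readings r1
ORDER  BY server, sampled;
```

Counting a value that is always 1 over "unbounded preceding to current row" is exactly a running count.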

+0

Try this: SELECT COUNT(1) FROM readings r1 WHERE DATEDIFF(minute, GETDATE(), sampled) <= 10

Answers

2

It's not a very pleasing answer, but one possibility is to first create a helper table containing every minute in the range:

CREATE TABLE #DateTimes(datetime datetime primary key); 

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1), 
          (1),(1),(1),(1),(1)) V(N) 
)          -- 1*10^1 or 10 rows 
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows 
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows 
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows 
,R(StartRange, EndRange) 
AS (SELECT MIN(sampled), 
      MAX(sampled) 
    FROM readings) 
,N(N) 
AS (SELECT ROW_NUMBER() 
       OVER (
       ORDER BY (SELECT NULL)) AS N 
    FROM E8) 
INSERT INTO #DateTimes 
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange) 
FROM N, 
     R; 

With that in place you can use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW:

WITH T1 AS 
(SELECT Server, 
        MIN(sampled) AS StartRange, 
        MAX(sampled) AS EndRange 
     FROM  readings 
     GROUP BY Server) 
SELECT  Server, 
      sampled, 
      Cnt 
FROM  T1 
CROSS APPLY 
      (SELECT r.sampled, 
           COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt 
         FROM  #DateTimes N 
         LEFT JOIN readings r 
         ON  r.sampled = N.datetime 
           AND r.server = T1.server 
         WHERE  N.datetime BETWEEN StartRange AND EndRange) CA 
WHERE  CA.sampled IS NOT NULL 
ORDER BY sampled 

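The above assumes that there is at most one sample per minute and that all times fall on exact minutes. If that is not true, it would need another table expression that pre-aggregates the samples by datetime rounded to the minute.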
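The pre-aggregation step could be sketched roughly as follows. This is only an illustration under assumptions: the names PerMinute, Servers, Windowed, sampled_minute, SamplesInMinute and CntLast10 are made up for the example, and it assumes #DateTimes holds whole-minute values (so its start value would need to be truncated to the minute as well).

```sql
WITH PerMinute AS
(
    -- Truncate each sample to the start of its minute and count per minute.
    SELECT server,
           DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS sampled_minute,
           COUNT(*) AS cnt
    FROM   readings
    GROUP  BY server,
              DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0)
),
Servers AS
(
    -- Range of minutes to scan for each server.
    SELECT server,
           MIN(sampled_minute) AS StartRange,
           MAX(sampled_minute) AS EndRange
    FROM   PerMinute
    GROUP  BY server
),
Windowed AS
(
    -- One row per server per minute; minutes without samples have cnt = NULL,
    -- which SUM ignores, so the 10-row frame covers exactly 10 minutes.
    SELECT s.server,
           N.datetime AS sampled_minute,
           p.cnt,
           SUM(p.cnt) OVER (PARTITION BY s.server
                            ORDER BY N.datetime
                            ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS CntLast10
    FROM   Servers s
           JOIN #DateTimes N
             ON N.datetime BETWEEN s.StartRange AND s.EndRange
           LEFT JOIN PerMinute p
             ON p.server = s.server
            AND p.sampled_minute = N.datetime
)
SELECT server,
       sampled_minute,
       cnt AS SamplesInMinute,
       CntLast10
FROM   Windowed
WHERE  cnt IS NOT NULL   -- filter only after the window has been computed
ORDER  BY server, sampled_minute;
```

Note that this returns one row per minute that has samples rather than one row per sample, and the count covers the current minute plus the nine before it.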

1

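As far as I know, there is no simple, exact replacement for your subquery using window functions.

Window functions operate on a set of rows and let you work with them based on partitions and order. What you are trying to do is not the type of partitioning that window functions can work with. Generating the partitions we would need in order to use window functions here would just result in overly complicated code.

I would suggest cross apply() as an alternative to your subquery.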

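I'm not sure whether you meant to limit your results to within 9 minutes, but with sampled > dateadd(...) a row from exactly 10 minutes earlier is excluded, so that is what is happening in your original subquery.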
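To see the boundary effect in isolation, here is a small sketch against the sample data. The variable @current and the two column aliases are only illustrative; it takes 08:40 as the "current" row and flags which earlier rows each comparison would count.

```sql
DECLARE @current datetime = '20170101 08:40';

SELECT sampled,
       -- original subquery: strictly later than 10 minutes ago
       CASE WHEN sampled >  DATEADD(MINUTE, -10, @current) THEN 1 ELSE 0 END AS InExclusiveWindow,
       -- inclusive variant: 10 minutes ago is counted as well
       CASE WHEN sampled >= DATEADD(MINUTE, -10, @current) THEN 1 ELSE 0 END AS InInclusiveWindow
FROM   readings
WHERE  server = 1
  AND  sampled <= @current
  AND  sampled >= DATEADD(MINUTE, -10, @current)
ORDER  BY sampled;
```

With the sample rows, 08:30 is the only row that is flagged by the inclusive test but not by the exclusive one, which is why the two counts differ by one for 08:40.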

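Here is what a window function version could look like, based on partitioning the samples into 10-minute slices, alongside the cross apply() version.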

select 
    r.server 
    , r.sampled 
    , CrossApply  = x.CountRecent 
    , OriginalSubquery = (
     select count(*) 
     from readings s 
     where s.server=r.server 
     and s.sampled <= r.sampled 
     /* doesn't include 10 minutes ago */ 
     and s.sampled > dateadd(minute,-10,r.sampled) 
     ) 
    , Slices   = count(*) over(
     /* partition by server, 10 minute slices, not the same thing*/ 
     partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0) 
     order by sampled 
    ) 
from readings r 
    cross apply (
    select CountRecent=count(*) 
    from readings i 
    where i.server=r.server 
     /* changed to >= */ 
     and i.sampled >= dateadd(minute,-10,r.sampled) 
     and i.sampled <= r.sampled 
    ) as x 
order by server,sampled 

Results: http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server | sampled             | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+
0

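Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that could be used in window aggregates. I think it would lead to much simpler and more natural SQL:

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

We can already use expressions like this:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

So I think it would be great if we could reference the columns of the current row.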

+0

The standard SQL way of doing what you want here is to use 'RANGE' instead of 'ROWS', but SQL Server doesn't fully support this. http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-window-clause.html
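For illustration only, this is roughly what the RANGE approach looks like in a database that accepts an interval offset in a RANGE frame. The sketch uses PostgreSQL syntax (version 11 or later) and is not valid T-SQL; it assumes a readings(server, sampled) table with sampled stored as a timestamp.

```sql
-- PostgreSQL 11+ sketch, NOT T-SQL.
SELECT server,
       sampled,
       COUNT(*) OVER (
           PARTITION BY server
           ORDER BY sampled
           -- frame = rows whose "sampled" lies within the 10 minutes
           -- up to and including the current row's value
           RANGE BETWEEN INTERVAL '10 minutes' PRECEDING AND CURRENT ROW
       ) AS countinlast10minutes
FROM   readings
ORDER  BY server, sampled;
```

The lower bound of a RANGE frame is inclusive, so this corresponds to the >= variant (the CrossApply column above) rather than to the original subquery with its strict > comparison.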