2012-04-08 70 views
5

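SQL Server: row_number partitioned by timeout

I have a table containing a series of (IP varchar(15), DateTime datetime2) values. Each row corresponds to an HTTP request made by a user. I want to assign session numbers to these rows. Different IP addresses get different session numbers. For the same IP, a new session number should be assigned if the previous request was more than 30 minutes ago. Here is a sample output: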

IP,  DateTime,   SessionNumber, RequestNumber 
1.1.1.1, 2012-01-01 00:01, 1,    1 
1.1.1.1, 2012-01-01 00:02, 1,    2 
1.1.1.1, 2012-01-01 00:03, 1,    3 
1.1.1.2, 2012-01-01 00:04, 2,    1 --different IP => new session number 
1.1.1.2, 2012-01-01 00:05, 2,    2 
1.1.1.2, 2012-01-01 00:40, 3,    1 --same IP, but last request 35min ago (> 30min) 

Columns 1 and 2 are the input; columns 3 and 4 are the desired output. The table shows two users.

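Because the underlying table is really large, how can this be solved efficiently? I would prefer a small number of passes over the data (one or two).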
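For anyone who wants to reproduce the sample, here is a minimal setup sketch. The table name `Sessions` is an assumption (the question does not name the table); the columns and rows are taken from the sample output above.

```sql
-- Hypothetical setup: the table name Sessions is an assumption,
-- the question only gives the column definitions.
CREATE TABLE Sessions
(
    IP       varchar(15) NOT NULL,
    DateTime datetime2   NOT NULL
);

-- The six requests from the sample output (input columns only).
INSERT INTO Sessions (IP, DateTime)
VALUES ('1.1.1.1', '2012-01-01 00:01'),
       ('1.1.1.1', '2012-01-01 00:02'),
       ('1.1.1.1', '2012-01-01 00:03'),
       ('1.1.1.2', '2012-01-01 00:04'),
       ('1.1.1.2', '2012-01-01 00:05'),
       ('1.1.1.2', '2012-01-01 00:40');
```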

+0

What version of SQL Server? If 2012, the new 'OVER' clause functionality will help. – 2012-04-08 18:37:19

+0

Yes, it is SQL Server 2012. – usr 2012-04-08 18:40:16

Answers

8

Here are a couple of attempts.

;WITH CTE1 AS
(
    -- Flag the first request of each session: 1 when the IP has no earlier
    -- request (LAG returns NULL) or DATEDIFF reports 30 or more minutes.
    SELECT *,
           IIF(DATEDIFF(MINUTE,
                        LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                        DateTime) < 30, 0, 1) AS SessionFlag
    FROM Sessions
), CTE2 AS
(
    -- A running total of the flags numbers the sessions within each IP.
    SELECT *,
           SUM(SessionFlag) OVER (PARTITION BY IP
                                  ORDER BY DateTime) AS IPSessionNumber
    FROM CTE1
)
SELECT IP,
       DateTime,
       DENSE_RANK() OVER (ORDER BY IP, IPSessionNumber) AS SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY IP, IPSessionNumber
                          ORDER BY DateTime) AS RequestNumber
FROM CTE2

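This has two sort operations (by IP, DateTime and then by IP, IPSessionNumber), but it assumes that SessionNumber can be assigned arbitrarily, as long as each new session (per IP address and the 30-minute rule) gets a different, unique session number.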
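One detail worth checking against the requirement: `DATEDIFF(MINUTE, ...)` counts minute boundaries crossed rather than elapsed time, so a gap from 00:00:59 to 00:30:00 (29 minutes and 1 second) is reported as 30. The following is a sketch of the same flag-and-running-total idea that compares elapsed seconds instead and treats only gaps strictly longer than 30 minutes as a new session. It assumes the `Sessions(IP, DateTime)` table used in the query above and assumes no gap is long enough to overflow the `int` returned by `DATEDIFF(SECOND, ...)` (roughly 68 years). The explicit `ROWS` frame is an optional variation, not part of the original query.

```sql
-- Sketch only: exposes the intermediate columns so the logic can be inspected.
;WITH Flagged AS
(
    SELECT IP,
           DateTime,
           CASE
               WHEN DATEDIFF(SECOND,
                             LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                             DateTime) <= 1800
               THEN 0   -- at most 30 minutes since the previous request: same session
               ELSE 1   -- first request of the IP (LAG is NULL) or gap > 30 minutes
           END AS SessionFlag
    FROM Sessions
)
SELECT IP,
       DateTime,
       SessionFlag,
       -- Running total per IP; ROWS UNBOUNDED PRECEDING replaces the default RANGE frame.
       SUM(SessionFlag) OVER (PARTITION BY IP
                              ORDER BY DateTime
                              ROWS UNBOUNDED PRECEDING) AS IPSessionNumber
FROM Flagged
ORDER BY IP, DateTime;
```

On the sample rows from the question, the flags for 1.1.1.2 come out as 1, 0, 1, so its requests fall into per-IP sessions 1, 1 and 2.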

To assign the SessionNumbers sequentially in chronological order, I used the following.

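;WITH CTE1 AS
(
    -- Flag the first request of each session, exactly as in the first query.
    SELECT *,
           IIF(DATEDIFF(MINUTE,
                        LAG(DateTime) OVER (PARTITION BY IP ORDER BY DateTime),
                        DateTime) < 30, 0, 1) AS SessionFlag
    FROM Sessions
), CTE2 AS
(
    -- Running total over all IPs: counts session starts in chronological order.
    SELECT *,
           SUM(SessionFlag) OVER (ORDER BY DateTime) AS GlobalSessionNo
    FROM CTE1
), CTE3 AS
(
    -- Carry the number taken at the most recent session start of this IP
    -- forward to the following requests of the same session.
    SELECT *,
           MAX(CASE WHEN SessionFlag = 1 THEN GlobalSessionNo END)
               OVER (PARTITION BY IP ORDER BY DateTime) AS SessionNumber
    FROM CTE2
)
SELECT IP,
       DateTime,
       SessionNumber,
       ROW_NUMBER() OVER (PARTITION BY SessionNumber
                          ORDER BY DateTime) AS RequestNumber
FROM CTE3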

However, this increases the number of sort operations to 4.
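Since the cost here is dominated by sorts, one thing that may be worth trying is an index whose key order matches the first window specification. This is only a sketch: the index name is made up, the table name `Sessions` is taken from the queries above, and whether the optimizer actually drops a sort should be confirmed in the execution plan.

```sql
-- Hypothetical index: key order matches PARTITION BY IP ORDER BY DateTime,
-- so rows can be read already ordered for the LAG window.
CREATE INDEX IX_Sessions_IP_DateTime
    ON Sessions (IP, DateTime);
```

Such an index can at best help the windows that are partitioned by IP and ordered by DateTime; the windows ordered by DateTime alone or partitioned by SessionNumber would presumably still need their own sorts.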

+0

If requests from two IPs are interleaved, won't their sessions get mixed up? – Andomar 2012-04-08 19:17:46

+0

@Andomar - Good point! Fixed. – 2012-04-08 19:37:16

+0

Using a windowed count is clever! I'll remember that trick. – usr 2012-04-08 22:30:08

2

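Here is a version that uses a table variable and row_number to create an ID that can be used in a recursive CTE. It might be worthwhile to compare its performance against a cursor and against the single query (provided by Martin).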
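The script that follows reads from a temp table `#sessionRequests` whose definition is not shown in the post. A minimal way to fill it for a trial run, assuming the requests live in a table called `Sessions` (the name used in the other answer, not something this answer states), could be:

```sql
-- Assumption: the request log is in a table named Sessions; adjust as needed.
-- SELECT ... INTO creates #sessionRequests with the source column types.
SELECT IP, DateTime
INTO #sessionRequests
FROM Sessions;
```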

CREATE TABLE #T
(
    IP varchar(15),
    DateTime datetime,
    ID int,
    primary key (IP, ID)
)

-- Number the requests of each IP in chronological order.
insert into #T(IP, DateTime, ID)
select IP, DateTime, row_number() over(partition by IP order by DateTime)
from #sessionRequests

;with C as
(
    -- Anchor: the first request of every IP starts session 1.
    select IP,
           ID,
           DateTime,
           1 as Session
    from #T
    where ID = 1
    union all
    -- Recursive step: move to the next request of the same IP and start
    -- a new session when the gap is 30 minutes or more.
    select T.IP,
           T.ID,
           T.DateTime,
           C.Session + case when datediff(minute, C.DateTime, T.DateTime) >= 30 then 1 else 0 end
    from #T as T
    inner join C
        on T.IP = C.IP and
           T.ID = C.ID + 1
)
select IP,
       DateTime,
       dense_rank() over(order by IP, Session) as SessionNumber,
       row_number() over(partition by IP, Session order by DateTime) as RequestNumber
from C
order by IP, DateTime, SessionNumber, RequestNumber
option (maxrecursion 0)

+1

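I like this version because it is easy to extend, almost like a cursor-based approach. I changed it to use a temp table, which fixed an optimizer problem (table variables have no statistics). I also verified that this code works. Thanks! – usr 2012-04-09 16:47:38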