2016-09-22 57 views
2

Identifying the start and end of sequences in SQL Server

I have a table whose data is arranged like this:

ID | BOUNDARY | TIMESTAMP 
1 | NULL  | 2016-01-01 00:20:00 
2 | A   | 2016-01-01 00:20:10 
3 | A   | 2016-01-01 00:20:14 
4 | A   | 2016-01-01 00:20:22 
5 | NULL  | 2016-01-01 00:20:38 
6 | A   | 2016-01-01 00:20:45 
7 | B   | 2016-01-01 00:21:02 
8 | B   | 2016-01-01 00:21:12 
9 | A   | 2016-01-01 00:21:16 
10 | A   | 2016-01-01 00:21:22 
11 | C   | 2016-01-01 00:21:30 
12 | A   | 2016-01-01 00:21:35 
13 | A   | 2016-01-01 00:21:40 
14 | A   | 2016-01-01 00:21:46 
15 | A   | 2016-01-01 00:21:50 

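What I'd like to do is find an efficient way to flag the IDs and timestamps of the start and end of each sequence in SQL Server 2014. A segment is a run of rows where BOUNDARY is not null and the same value repeats at least twice in a row. For example, the first segment would be IDs 2-4, the second would be IDs 7-8, and the third would be IDs 9-10.

The first approach I tried was to add two columns, a "startflag" column and an "endflag" column. I wrote update queries that flag the starts and ends correctly, but I would like to create a view in which each segment appears as a single record, as shown below: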

BOUNDARY | START ID | END ID 
A  | 2  | 4 
B  | 7  | 8 
A  | 9  | 10 
A  | 12  | 15 
+0

http://stackoverflow.com/a/31704558/3585278 – Danieboy

+0

Why is Boundary = "C" not included in the final answer? Because it doesn't have two records? –

+1

Because there need to be at least 2 consecutive instances. – user3150002

Answers

2

Well, I'm sure there are better ways to do this, but this works:

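WITH CTE AS 
(
    SELECT *, 
           -- RN1: position of the row in the overall time order
           RN1 = ROW_NUMBER() OVER(ORDER BY [TIMESTAMP]), 
           -- RN2: position of the row among rows with the same BOUNDARY
           RN2 = ROW_NUMBER() OVER(PARTITION BY BOUNDARY ORDER BY [TIMESTAMP]) 
    FROM #YourTable 
), CTE2 AS 
(
    SELECT *, 
           -- RN1 - RN2 is constant within a run of consecutive equal values
           RN1-RN2 RN3, 
           -- N: size of the run; BOUNDARY is part of the partition because
           -- different values can end up with the same RN1 - RN2
           COUNT(*) OVER(PARTITION BY BOUNDARY, RN1-RN2) N 
    FROM CTE 
) 
SELECT BOUNDARY, 
       MIN(ID) [START ID], 
       MAX(ID) [END ID] 
FROM CTE2 
WHERE N > 1 
AND BOUNDARY IS NOT NULL 
GROUP BY BOUNDARY, RN3 
ORDER BY [START ID]; 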

If we use this sample table:

CREATE TABLE #YourTable 
    ([ID] int, [BOUNDARY] varchar(4), [TIMESTAMP] datetime) 
; 

INSERT INTO #YourTable 
    ([ID], [BOUNDARY], [TIMESTAMP]) 
VALUES 
    (1, NULL, '2016-01-01 00:20:00'), 
    (2, 'A', '2016-01-01 00:20:10'), 
    (3, 'A', '2016-01-01 00:20:14'), 
    (4, 'A', '2016-01-01 00:20:22'), 
    (5, NULL, '2016-01-01 00:20:38'), 
    (6, 'A', '2016-01-01 00:20:45'), 
    (7, 'B', '2016-01-01 00:21:02'), 
    (8, 'B', '2016-01-01 00:21:12'), 
    (9, 'A', '2016-01-01 00:21:16'), 
    (10, 'A', '2016-01-01 00:21:22'), 
    (11, 'C', '2016-01-01 00:21:30'), 
    (12, 'A', '2016-01-01 00:21:35'), 
    (13, 'A', '2016-01-01 00:21:40'), 
    (14, 'A', '2016-01-01 00:21:46'), 
    (15, 'A', '2016-01-01 00:21:50') 
; 

The result is:

╔══════════╦══════════╦════════╗
║ BOUNDARY ║ START ID ║ END ID ║
╠══════════╬══════════╬════════╣
║ A        ║        2 ║      4 ║
║ B        ║        7 ║      8 ║
║ A        ║        9 ║     10 ║
║ A        ║       12 ║     15 ║
╚══════════╩══════════╩════════╝
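
To see why the difference of the two row numbers identifies each run, it can help to look at the intermediate values directly. The following is a diagnostic sketch against the same #YourTable sample; it only exposes the RN1, RN2 and RN3 values used by the query above and is not part of the original answer.

```sql
-- Diagnostic sketch: show the intermediate row numbers for #YourTable.
SELECT ID,
       BOUNDARY,
       -- position in the overall time order
       RN1 = ROW_NUMBER() OVER (ORDER BY [TIMESTAMP]),
       -- position among rows with the same BOUNDARY
       RN2 = ROW_NUMBER() OVER (PARTITION BY BOUNDARY ORDER BY [TIMESTAMP]),
       -- constant within a run of consecutive equal BOUNDARY values
       RN3 = ROW_NUMBER() OVER (ORDER BY [TIMESTAMP])
           - ROW_NUMBER() OVER (PARTITION BY BOUNDARY ORDER BY [TIMESTAMP])
FROM #YourTable
ORDER BY ID;
-- For the sample data, IDs 2-4 (A) share RN3 = 1, the lone A at ID 6
-- gets RN3 = 2, IDs 9-10 (A) share RN3 = 4 and IDs 12-15 (A) share RN3 = 5.
```

Each interruption by a different value advances RN1 without advancing RN2 for that BOUNDARY, which is why the difference steps up between runs and stays fixed inside one.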
+1

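+1, but I think the ORDER BY should really be on ID rather than TIMESTAMP, especially if TIMESTAMP can be out of order; e.g. change record 3 to one day later and you get a funky result. – Matt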

+0

@Lamak: I tested it on a small portion of the data and it seems to work! My table has thousands of rows, so let me run it against the whole set to verify. – user3150002

+0

@user3150002 No problem. Please take your time and test as needed. – Lamak

2

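The key is to create the island groupings by:

  1. Computing a row number based on time (this is your ID)
  2. Computing a row number within each distinct value
  3. Grouping by the difference = (1) - (2)

Take a look at the example below: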

declare @T table (ID int, BOUNDARY char(1), [TIMESTAMP] datetime2);
insert into @T values
    (1, null, '2016-01-01 00:20:00'), (2, 'A', '2016-01-01 00:20:10'), (3, 'A', '2016-01-01 00:20:14'),
    (4, 'A', '2016-01-01 00:20:22'), (5, null, '2016-01-01 00:20:38'), (6, 'A', '2016-01-01 00:20:45'),
    (7, 'B', '2016-01-01 00:21:02'), (8, 'B', '2016-01-01 00:21:12'), (9, 'A', '2016-01-01 00:21:16'),
    (10, 'A', '2016-01-01 00:21:22'), (11, 'C', '2016-01-01 00:21:30'), (12, 'A', '2016-01-01 00:21:35'),
    (13, 'A', '2016-01-01 00:21:40'), (14, 'A', '2016-01-01 00:21:46'), (15, 'A', '2016-01-01 00:21:50');

select 
    BOUNDARY, 
    min(ID) as [START ID], 
    max(ID) as [END ID] 
from 
(
    select 
     ID, 
     BOUNDARY, 
     -- ID stands in for the overall row number, so this assumes ID has
     -- no gaps and increases in TIMESTAMP order
     ID - 
     row_number() over (partition by BOUNDARY order by [TIMESTAMP]) as grp 
    from @T as t 
) as T 
where BOUNDARY is not null 
group by grp, BOUNDARY 
having count(*) >= 2 
order by min(ID);
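
Since the question asks for a view, the same grouping can be wrapped in one. The sketch below is an assumption-laden illustration rather than part of either answer: dbo.YourTable and dbo.BoundarySegments are hypothetical names (a view cannot reference a temp table or a table variable), ID is assumed to define the row order, and the timestamp columns assume TIMESTAMP increases with ID inside a segment. It uses two row numbers instead of ID itself, so gaps in ID do not split a segment.

```sql
-- Hypothetical names: dbo.YourTable is a permanent table with the same
-- columns as the sample data; dbo.BoundarySegments is the view.
CREATE VIEW dbo.BoundarySegments
AS
WITH Numbered AS
(
    SELECT ID,
           BOUNDARY,
           [TIMESTAMP],
           -- The difference of the two row numbers is constant within a
           -- run of consecutive rows that share the same BOUNDARY.
           ROW_NUMBER() OVER (ORDER BY ID)
         - ROW_NUMBER() OVER (PARTITION BY BOUNDARY ORDER BY ID) AS grp
    FROM dbo.YourTable
)
SELECT BOUNDARY,
       MIN(ID)          AS [START ID],
       MAX(ID)          AS [END ID],
       -- Assumes TIMESTAMP increases with ID inside a segment.
       MIN([TIMESTAMP]) AS [START TIMESTAMP],
       MAX([TIMESTAMP]) AS [END TIMESTAMP]
FROM Numbered
WHERE BOUNDARY IS NOT NULL
GROUP BY BOUNDARY, grp
HAVING COUNT(*) >= 2;   -- keep only runs of at least two rows
```

A view cannot carry its own ORDER BY without TOP, so the ordering goes on the query that reads it, for example: SELECT * FROM dbo.BoundarySegments ORDER BY [START ID];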
+1

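+1, but I think the ORDER BY should really be on ID rather than TIMESTAMP, especially if TIMESTAMP can be out of order; e.g. change record 3 to one day later and you get a funky result. – Matt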

+0

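@Aducci, thanks, but not quite. I tried the above and it grouped all the 'A' boundaries into a single record, starting at 2 and ending at 15. What I actually need is 3 separate 'A' records. Still, it's a start that I may be able to work with. – user3150002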

+0

@user3150002 - The script outputs the correct result for me. Not sure what you are running? – Aducci
