2017-06-03 57 views
7

Q: How can I rank records based on the changing value of one column?

I have data like the following (https://pastebin.com/vdTb1JRT):

EmployeeID Date  Onleave 
ABH12345 2016-01-01 0 
ABH12345 2016-01-02 0 
ABH12345 2016-01-03 0 
ABH12345 2016-01-04 0 
ABH12345 2016-01-05 0 
ABH12345 2016-01-06 0 
ABH12345 2016-01-07 0 
ABH12345 2016-01-08 0 
ABH12345 2016-01-09 0 
ABH12345 2016-01-10 1 
ABH12345 2016-01-11 1 
ABH12345 2016-01-12 1 
ABH12345 2016-01-13 1 
ABH12345 2016-01-14 0 
ABH12345 2016-01-15 0 
ABH12345 2016-01-16 0 
ABH12345 2016-01-17 0 

I want to produce the following result:

EmployeeID DateValidFrom DateValidTo  OnLeave 
ABH12345 2016-01-01  2016-01-09  0 
ABH12345 2016-01-10  2016-01-13  1 
ABH12345 2016-01-14  2016-01-17  0 

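My idea is to somehow create a ranking column (shown below) whose value increments each time the value in the Onleave column changes, partitioned by the EmployeeID column.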

EmployeeID Date  Onleave RankedCol 
ABH12345 2016-01-01 0   1 
ABH12345 2016-01-02 0   1 
ABH12345 2016-01-03 0   1 
ABH12345 2016-01-04 0   1 
ABH12345 2016-01-05 0   1 
ABH12345 2016-01-06 0   1 
ABH12345 2016-01-07 0   1 
ABH12345 2016-01-08 0   1 
ABH12345 2016-01-09 0   1 
ABH12345 2016-01-10 1   2 
ABH12345 2016-01-11 1   2 
ABH12345 2016-01-12 1   2 
ABH12345 2016-01-13 1   2 
ABH12345 2016-01-14 0   3 
ABH12345 2016-01-15 0   3 
ABH12345 2016-01-16 0   3 
ABH12345 2016-01-17 0   3 

Then I would be able to do the following:

SELECT 
[EmployeeID] = [EmployeeID] 
,[DateValidFrom] = MIN([Date]) 
,[DateValidTo] = MAX([Date]) 
,[OnLeave]  = [OnLeave] 
FROM table/view/cte/sub-query 
GROUP BY 
[EmployeeID] 
,[OnLeave] 
,[RankedCol] 

Other solutions are very welcome.

Here is the test data:

WITH CTE AS (SELECT EmployeeID = 'ABH12345', [Date] = CAST(N'2016-01-01' AS Date), [Onleave] = 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-02' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-03' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-04' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-05' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-06' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-07' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-08' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-09' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-10' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-11' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-12' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-13' AS Date), 1 
UNION SELECT 'ABH12345', CAST(N'2016-01-14' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-15' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-16' AS Date), 0 
UNION SELECT 'ABH12345', CAST(N'2016-01-17' AS Date), 0 
) 

SELECT * FROM CTE 
+4

+1 for the sample data – TheGameiswar

+1

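Tip: It is helpful to tag database questions with both the appropriate software (MySQL, Oracle, DB2, ...) and the version, e.g. `sql-server-2014`. Differences in syntax and features often affect the answers. In this case, LAG is a relatively new feature. – HABO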

+0

Added sql-server-2014, thanks @HABO –

Answers

2

Here is another, simpler way to get the desired output, accessing the table only once.

-- sample of data from your question 
with t1(EmployeeID, Date1, Onleave) as(
    select 'ABH12345', cast('2016-01-01' as date), 0 union all 
    select 'ABH12345', cast('2016-01-02' as date), 0 union all 
    select 'ABH12345', cast('2016-01-03' as date), 0 union all 
    select 'ABH12345', cast('2016-01-04' as date), 0 union all 
    select 'ABH12345', cast('2016-01-05' as date), 0 union all 
    select 'ABH12345', cast('2016-01-06' as date), 0 union all 
    select 'ABH12345', cast('2016-01-07' as date), 0 union all 
    select 'ABH12345', cast('2016-01-08' as date), 0 union all 
    select 'ABH12345', cast('2016-01-09' as date), 0 union all 
    select 'ABH12345', cast('2016-01-10' as date), 1 union all 
    select 'ABH12345', cast('2016-01-11' as date), 1 union all 
    select 'ABH12345', cast('2016-01-12' as date), 1 union all 
    select 'ABH12345', cast('2016-01-13' as date), 1 union all 
    select 'ABH12345', cast('2016-01-14' as date), 0 union all 
    select 'ABH12345', cast('2016-01-15' as date), 0 union all 
    select 'ABH12345', cast('2016-01-16' as date), 0 union all 
    select 'ABH12345', cast('2016-01-17' as date), 0 
) 
-- actual query 
select w.employeeid 
    , min(w.date1)  as datevalidfrom 
    , max(w.date1)  as datevalidto 
    , w.onleave 
    from (
     select row_number() over(partition by employeeid order by date1) - 
       row_number() over(partition by employeeid, onleave order by date1) as grp 
      , employeeid 
      , date1 
      , onleave 
      from t1 s 
     ) w 
-- grp is only unique within one (employeeid, onleave) pair, so group by all three 
group by w.employeeid, w.onleave, w.grp 
order by employeeid, datevalidfrom 

Result:

employeeid datevalidfrom datevalidto onleave 
---------- ------------- ----------- ----------- 
ABH12345 2016-01-01 2016-01-09 0 
ABH12345 2016-01-10 2016-01-13 1 
ABH12345 2016-01-14 2016-01-17 0 
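
To see why the difference of the two row numbers can serve as a group key, the sketch below prints the intermediate columns. The six-row sample and the column names rn_all and rn_status are made up for illustration; only the two row_number() expressions come from the query above.

```sql
-- Hypothetical 6-row sample: two days at work, two on leave, two at work.
with t1(EmployeeID, Date1, Onleave) as (
    select * from (values
        ('ABH12345', cast('2016-01-01' as date), 0),
        ('ABH12345', cast('2016-01-02' as date), 0),
        ('ABH12345', cast('2016-01-03' as date), 1),
        ('ABH12345', cast('2016-01-04' as date), 1),
        ('ABH12345', cast('2016-01-05' as date), 0),
        ('ABH12345', cast('2016-01-06' as date), 0)
    ) v(EmployeeID, Date1, Onleave)
)
select EmployeeID, Date1, Onleave
     -- position among all rows of the employee
     , row_number() over(partition by EmployeeID order by Date1) as rn_all
     -- position among rows of the employee with the same Onleave value
     , row_number() over(partition by EmployeeID, Onleave order by Date1) as rn_status
     -- constant while Onleave does not change, because both counters advance together
     , row_number() over(partition by EmployeeID order by Date1)
       - row_number() over(partition by EmployeeID, Onleave order by Date1) as grp
from t1
order by EmployeeID, Date1;
```

With this sample, grp is 0 for the first two rows and 2 for the remaining four, so the leave period and the second work period share the same grp value. The difference is only unique per Onleave value, which is why Onleave (and EmployeeID) should be part of the GROUP BY together with grp.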
2

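This is an example of a gaps-and-islands problem. In this case, you can use date arithmetic. The key observation is that subtracting a sequence of integers from the date column identifies islands of similar values.

As a query, this looks like: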

SELECT EmployeeId, MIN([Date]) as DateValidFrom, MAX([Date]) as DateValidTo, 
     OnLeave 
FROM (SELECT t.*, 
      ROW_NUMBER() OVER (PARTITION BY EmployeeId, OnLeave ORDER BY [Date]) as seqnum 
     FROM t 
    ) t 
GROUP BY EmployeeID, DATEADD(day, - seqnum, [Date]), OnLeave; 

You can run the subquery, stare at the results, and do the arithmetic to understand why this works.

Here is an example.
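
As a separate sketch of what running the subquery shows, the query below adds the date-minus-sequence value as its own column. The six-row sample and the column name island_anchor are hypothetical, and the sketch assumes exactly one row per employee per calendar day; the cast to int is a defensive choice, not something the answer requires.

```sql
-- Hypothetical sample; assumes one row per employee per calendar day.
with t(EmployeeID, [Date], OnLeave) as (
    select * from (values
        ('ABH12345', cast('2016-01-01' as date), 0),
        ('ABH12345', cast('2016-01-02' as date), 0),
        ('ABH12345', cast('2016-01-03' as date), 1),
        ('ABH12345', cast('2016-01-04' as date), 1),
        ('ABH12345', cast('2016-01-05' as date), 0),
        ('ABH12345', cast('2016-01-06' as date), 0)
    ) v(EmployeeID, [Date], OnLeave)
)
select s.EmployeeID, s.[Date], s.OnLeave, s.seqnum
     -- consecutive dates minus consecutive sequence numbers give the same date
     , dateadd(day, -cast(s.seqnum as int), s.[Date]) as island_anchor
from (select t.*,
             row_number() over (partition by EmployeeID, OnLeave order by [Date]) as seqnum
      from t
     ) s
order by s.EmployeeID, s.[Date];
```

Within each run of equal OnLeave values, island_anchor stays fixed (2015-12-31 for the first two rows of this sample, 2016-01-02 for the other four). Two different islands can share an anchor, as the last four rows do, so the GROUP BY needs OnLeave as well. If the dates had gaps, a gap would also start a new island under this approach.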

+0

Interesting.. the output is still more or less the same as where I started. How would I summarize the result in just 3 rows? –

3

Another way to do this is with lag: get the previous Onleave value for each employeeid and start a new group whenever it differs from the current value.

select employeeid,min(date) as date_from,max(date) as date_to,max(onleave) as onleave 
from (select t.*,sum(case when prev_ol=onleave then 0 else 1 end) over(partition by employeeid order by date) as grp 
     from (select c.*,lag(onleave,1,onleave) over(partition by employeeid order by date) as prev_ol 
      from cte c 
      ) t 
    ) t 
group by employeeid,grp 
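
To make the group assignment visible, the sketch below runs only the two inner steps and shifts the running sum by 1 so that it matches the RankedCol layout from the question. The six-row sample is hypothetical and the explicit ROWS frame is an addition for clarity; LAG and window frames need SQL Server 2012 or later.

```sql
-- Hypothetical 6-row sample: two days at work, two on leave, two at work.
with cte(EmployeeID, [Date], Onleave) as (
    select * from (values
        ('ABH12345', cast('2016-01-01' as date), 0),
        ('ABH12345', cast('2016-01-02' as date), 0),
        ('ABH12345', cast('2016-01-03' as date), 1),
        ('ABH12345', cast('2016-01-04' as date), 1),
        ('ABH12345', cast('2016-01-05' as date), 0),
        ('ABH12345', cast('2016-01-06' as date), 0)
    ) v(EmployeeID, [Date], Onleave)
)
select t.EmployeeID, t.[Date], t.Onleave, t.prev_ol
     -- running count of value changes, shifted so the first group is 1
     , 1 + sum(case when t.prev_ol = t.Onleave then 0 else 1 end)
           over (partition by t.EmployeeID order by t.[Date]
                 rows between unbounded preceding and current row) as RankedCol
from (select c.*
           -- the default (third argument) makes the first row compare equal to itself
           , lag(c.Onleave, 1, c.Onleave)
                 over (partition by c.EmployeeID order by c.[Date]) as prev_ol
      from cte c
     ) t
order by t.EmployeeID, t.[Date];
```

For this sample RankedCol comes out as 1, 1, 2, 2, 3, 3. Unlike the row-number difference, the value increases with every change, so it never repeats within an employee and grouping by EmployeeID and RankedCol is enough.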
+0

Works like a charm! Using lag to assign the groups. That's clever. Thanks a lot! –