2017-02-13 42 views
0

DistinctCount + LastNonEmpty within an OVER clause

I have the following code to get all '-' actions for all customers:

with 
T1 as 
(
select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer4', 
    [Date] = '2017-01-01', 
    [Action] = '+' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer6', 
    [Date] = '2017-01-02', 
    [Action] = '+' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer4', 
    [Date] = '2017-01-03', 
    [Action] = '-' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer4', 
    [Date] = '2017-01-04', 
    [Action] = '+' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer4', 
    [Date] = '2017-01-05', 
    [Action] = '-' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer6', 
    [Date] = '2017-01-06', 
    [Action] = '-' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer8', 
    [Date] = '2017-01-07', 
    [Action] = '+' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer8', 
    [Date] = '2017-01-08', 
    [Action] = '-' 

union all 

select 
    [Contract] = 'Contract1', 
    [Customer] = 'Customer4', 
    [Date] = '2017-01-09', 
    [Action] = '+' 
) 

select 
    [Customer], 
    [Date] 
from T1 
where [Action] = '-' 

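Now I need to do the same at the contract level. That means returning the contract and date values whenever the last action is '-' for every customer that had a '+' action before that date. The desired output is: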

Date       | Contract
---------- | ---------
2017-01-06 | Contract1
2017-01-08 | Contract1

The algorithm I had in mind looks like this:

[PlusDC] = count(distinct iif([Action] = '+',Customer,NULL)) over (partition by [Contract] order by [Date]) 
[MinusDC] = count(distinct iif([Action] = '-',Customer,NULL)) over (partition by [Contract] order by [Date]) 

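But:

  • It doesn't work at all: SQL Server does not allow DISTINCT in a windowed aggregate.
  • Even if it did work, it would also return 2017-01-09, where [PlusDC] = [MinusDC], which is incorrect.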
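For reference, a running distinct count can be emulated without `count(distinct ...) over (...)` by flagging the first row of each (contract, customer, action) combination and taking a running sum of the flags. This is only a sketch: the `Flagged` CTE and the `[IsFirst]` column are names made up for this illustration, the sample rows are the ones from the question written in compact `values` form, and `iif` and the window frame assume SQL Server 2012 or later. On the sample data the two counts come out equal on 2017-01-06, 2017-01-08 and 2017-01-09, so the 2017-01-09 row would indeed slip through a plain equality test.

```sql
with T1 as
(
select [Contract], [Customer], [Date], [Action]
from (values
    ('Contract1', 'Customer4', '2017-01-01', '+'),
    ('Contract1', 'Customer6', '2017-01-02', '+'),
    ('Contract1', 'Customer4', '2017-01-03', '-'),
    ('Contract1', 'Customer4', '2017-01-04', '+'),
    ('Contract1', 'Customer4', '2017-01-05', '-'),
    ('Contract1', 'Customer6', '2017-01-06', '-'),
    ('Contract1', 'Customer8', '2017-01-07', '+'),
    ('Contract1', 'Customer8', '2017-01-08', '-'),
    ('Contract1', 'Customer4', '2017-01-09', '+')
) as v ([Contract], [Customer], [Date], [Action])
),
Flagged as -- hypothetical helper CTE, not part of the question
(
select
    [Contract],
    [Customer],
    [Date],
    [Action],
    -- 1 on the first row of each (contract, customer, action) combination, else 0
    [IsFirst] = iif(row_number() over (partition by [Contract], [Customer], [Action] order by [Date]) = 1, 1, 0)
from T1
)
select
    [Contract],
    [Date],
    [Customer],
    [Action],
    -- running number of distinct customers that have had a '+' so far
    [PlusDC] = sum(iif([Action] = '+', [IsFirst], 0)) over (partition by [Contract] order by [Date] rows unbounded preceding),
    -- running number of distinct customers that have had a '-' so far
    [MinusDC] = sum(iif([Action] = '-', [IsFirst], 0)) over (partition by [Contract] order by [Date] rows unbounded preceding)
from Flagged
order by [Contract], [Date];
```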

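Roughly speaking, I need to check the following across all customers:

  1. [Action] = '-' on the current row.

  2. lag([Action],1) = '-' (or NULL, if the customer's records only appear at a later date) for every customer.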
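One way to phrase these two conditions as a query is sketched below, assuming each customer has at most one action per date within a contract: keep a '-' row only when no customer's latest action on or before that date is a '+'. The aliases `cur`, `p` and `later` are made up for this illustration, and the nested `not exists` is a swapped-in technique rather than anything proposed in the thread.

```sql
with T1 as
(
select [Contract], [Customer], [Date], [Action]
from (values
    ('Contract1', 'Customer4', '2017-01-01', '+'),
    ('Contract1', 'Customer6', '2017-01-02', '+'),
    ('Contract1', 'Customer4', '2017-01-03', '-'),
    ('Contract1', 'Customer4', '2017-01-04', '+'),
    ('Contract1', 'Customer4', '2017-01-05', '-'),
    ('Contract1', 'Customer6', '2017-01-06', '-'),
    ('Contract1', 'Customer8', '2017-01-07', '+'),
    ('Contract1', 'Customer8', '2017-01-08', '-'),
    ('Contract1', 'Customer4', '2017-01-09', '+')
) as v ([Contract], [Customer], [Date], [Action])
)
select
    cur.[Contract],
    cur.[Date]
from T1 as cur
where cur.[Action] = '-' -- condition 1: the current row is a '-'
    and not exists -- condition 2: no customer is still on a '+'
    (
        select 1
        from T1 as p
        where p.[Contract] = cur.[Contract]
            and p.[Date] <= cur.[Date] -- ISO-formatted date strings compare correctly
            and p.[Action] = '+'
            and not exists -- p is that customer's latest row as of cur.[Date]
            (
                select 1
                from T1 as later
                where later.[Contract] = p.[Contract]
                    and later.[Customer] = p.[Customer]
                    and later.[Date] > p.[Date]
                    and later.[Date] <= cur.[Date]
            )
    )
order by cur.[Contract], cur.[Date];
-- On the sample data this returns 2017-01-06 and 2017-01-08 for Contract1.
```

Customers whose first record comes after the current date never match the inner query, which covers the "or NULL" case.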

Update: to make things clearer, I made a pivoted view of my data:

--------------------------------------------------------------------
| Date       | Contract  | Customer4 | Customer6 | Customer8 | All |
| ---------- | --------- | --------- | --------- | --------- | --- |
| 2017-01-01 | Contract1 |     +     |           |           |     |
| 2017-01-02 | Contract1 |           |     +     |           |     |
| 2017-01-03 | Contract1 |           |           |           |     |
| 2017-01-04 | Contract1 |     -     |           |           |     | <-- Customer6 still has a '+'
| 2017-01-05 | Contract1 |     +     |           |           |     |
| 2017-01-06 | Contract1 |     -     |           |           |     | <-- Customer6 still has a '+'
| 2017-01-07 | Contract1 |           |     -     |           |  -  | <-- All customers have '-' or NULL as their last action
| 2017-01-08 | Contract1 |           |           |     +     |     |
| 2017-01-09 | Contract1 |           |           |     -     |  -  | <-- All customers have '-' or NULL as their last action
--------------------------------------------------------------------

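The All column represents the actual state across all customers (these are the rows I need). As you may notice, 2017-01-04 and 2017-01-06 are not really '-' at the contract level: Contract1 is not closed, because Customer6 is still open. This is easy when each contract has a fixed number of customers. But what about an arbitrary number?

Any practical suggestions?

Answers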

0

Thanks to the answers here, I came up with my own solution, which is as correct as Søren's and as fast as shA.t's. Thank you both!

,[SelfJoinSet] as (
select 
    T1.[Contract], -- Contract from the base 
    T1.[Date], -- Date from the base 
    T2.[Customer], -- Customers from the current contract 
    T2.[Action], -- ActionType of the customer 
    -- RowNumber in order to get the last record for each customer within the current contract: 
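    [RN] = row_number() over (partition by T1.[Contract],T2.[Customer],T1.[Date] order by T2.[Date] desc) -- Contract is in the partition so contracts sharing a date and a customer stay separate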
from T1 
    left join T1 as T2 -- SelfJoin 
    on T1.[Contract] = T2.[Contract] -- Within the current contract 
    and T1.[Date] >= T2.[Date] -- Get all records till the current date  
where T1.[Action] = '-' --Filter out all '+' actions, we'll get only '-' records anyway (for performance optimization reasons). 
) 

select 
    [Contract], 
    [Date] 
from [SelfJoinSet] 
where [RN] = 1 --Show only last record per each customer 
group by [Contract],[Date] -- Collapse all to the base table records 
having count(distinct iif([Action] = '-',[Customer],NULL)) = count(distinct [Customer]) -- Leave only records where the last action is '-' for all customers 
1

I think you can use ROW_NUMBER() like this:

;with tt as (
    select T1.[Contract], T1.[Date], T1.[Action], t.[Customer], t.[Action] lAction, t.[Date] lDate 
     -- this `rn` will give me the last action of each other customer older than each Date 
     , row_number() over (partition by T1.[Contract], T1.[Date], t.[Customer] order by t.[Date] desc) rn 
    from T1 
    -- I use this self left join to gather data with: 
    left join T1 t 
     on T1.[Contract] = t.[Contract] -- same Contract 
     and T1.[Date] > t.[Date]   -- older than current date 
     and T1.[Customer] != t.[Customer] -- for other customers 
    -- So I will have actions of other customer older than each date 
) 
select [Contract], [Date] 
from T1 
-- I just check if there is not any data in `tt` with: 
where not exists(
    select 1 
    from tt 
    where tt.[Contract] = T1.[Contract] -- same contract 
     and tt.[Date] = T1.[Date]   -- same date 
     and rn = 1       -- only last action 
     and (T1.[Action] = '+'    -- current customer's action is '+' 
     or isnull(lAction, '+') = '+')  -- or others last actions is '+' 
    ) 
group by [Contract], [Date]; 
+0

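It should check the last action of all customers (it must be '-'), grouped by contract. See http://pastebin.com/xZvNJ1D1. I changed the 2017-01-05 action to '+' and your code still returns the same result, but it shouldn't: Customer4's last action is '+', so it should not return anything.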

+0

You got me wrong. I have re-described the task visually; see the updated data.

+0

Finally, I get what you want. Check my edited answer. HTH ;).

1

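OK, I am going to solve this by first filling out a table. What I would do is repeat every combination of customer and contract for every date.

I append this CTE to your sample code: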

, 
FullTable as 
(
select 
a.[Contract] 
,a.[Customer] 
,b.[Date] 
,c.[Action] 
,count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c 
from 
(select distinct 
    [Contract], 
    [Customer] 
from T1) a 
inner join 
(select distinct 
    [Contract], 
    [Date] 
from T1) b 
on a.[Contract]=b.[Contract] 
left join t1 c 
on c.[Contract]=a.[Contract] and a.[Customer]=c.[Customer] and b.[Date]=c.[Date] 
) 

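Now FullTable does two things. First, it ensures that there is a row for every customer on every date; if the source data has no action for that customer on that date, Action is NULL. Second, it uses a windowed count: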

count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c 

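count does not count NULL values, so this counts the number of actions up to and including the current row. This effectively groups the data: for each customer, every date that has a value starts a new group, and any rows with a NULL action that come directly after it get the same group number.

This is the data for Customer4: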

Contract Customer Date  c Action 
Contract1 Customer4 2017-01-01 1 + 
Contract1 Customer4 2017-01-02 1 NULL 
Contract1 Customer4 2017-01-03 2 - 
Contract1 Customer4 2017-01-04 3 + 
Contract1 Customer4 2017-01-05 4 - 
Contract1 Customer4 2017-01-06 4 NULL 
Contract1 Customer4 2017-01-07 4 NULL 
Contract1 Customer4 2017-01-08 4 NULL 
Contract1 Customer4 2017-01-09 5 + 

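Now I make a new CTE called DailyStatus. This CTE fills in the NULLs, so that each day holds the latest status for that contract and customer instead of NULL. This means that for every day in the table, the status of every customer/contract combination can be found. To do this, I just take the MAX over each of the groups I just found: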

,DailyStatus as 
(
select 
[Contract] 
,[Customer] 
,[Date] 
,[Action] 
,c 
,max([Action]) over (partition by [Contract],[Customer],c) FilledAction 
from 
FullTable 
) 

Contract Customer Date  c FilledAction Action 
Contract1 Customer6 2017-01-01 0 NULL NULL 
Contract1 Customer6 2017-01-02 1 +  + 
Contract1 Customer6 2017-01-03 1 +  NULL 
Contract1 Customer6 2017-01-04 1 +  NULL 
Contract1 Customer6 2017-01-05 1 +  NULL 
Contract1 Customer6 2017-01-06 2 -  - 
Contract1 Customer6 2017-01-07 2 -  NULL 
Contract1 Customer6 2017-01-08 2 -  NULL 
Contract1 Customer6 2017-01-09 2 -  NULL 
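The count-then-max fill-forward can also be tried in isolation on a tiny made-up series. In this sketch the `S` and `Grouped` CTEs and the `[Day]` column are hypothetical and not part of the question's data; each group contains exactly one non-NULL value, so `max` simply picks it up.

```sql
with S as -- hypothetical series with gaps
(
select [Day], [Action]
from (values
    (1, '+'),
    (2, null),
    (3, '-'),
    (4, null),
    (5, null)
) as v ([Day], [Action])
),
Grouped as
(
select
    [Day],
    [Action],
    -- count ignores NULLs, so the number only increases on rows that have an action
    [c] = count([Action]) over (order by [Day] rows unbounded preceding)
from S
)
select
    [Day],
    [Action],
    [c],
    -- the single non-NULL value of each group is spread over the whole group
    [FilledAction] = max([Action]) over (partition by [c])
from Grouped
order by [Day];
-- Day  Action  c  FilledAction
-- 1    +       1  +
-- 2    NULL    1  +
-- 3    -       2  -
-- 4    NULL    2  -
-- 5    NULL    2  -
```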

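Using this table we can get the status of every customer on every date in the table. Since '+' > '-' > NULL (the relative order of '+' and '-' depends on the collation; a binary collation, for example, sorts '-' after '+'), we can find all dates where every customer either has '-' as the latest action or has taken no action (NULL) on or before that date: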

select 
[Contract] 
,[Date] 
,max(FilledAction) 
from DailyStatus 
group by [Contract],[Date] 
having max(FilledAction) ='-' 
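Because this filter relies on how '+' and '-' compare as strings, it may be worth checking the collation in use, or avoiding the comparison altogether. The sketch below first compares the two symbols under a binary collation, then repeats the query with a `having` clause that counts open and closed customers instead of taking `max`. The alias `s`, the compact `values` form of the sample data and the rewritten `having` clause are choices made for this illustration, not part of the answer above.

```sql
-- Under a binary collation the symbols compare by code point ('+' is 43, '-' is 45).
select [BinaryOrder] = iif('+' collate Latin1_General_BIN > '-', 'plus is greater', 'minus is greater');
-- returns: minus is greater

with T1 as
(
select [Contract], [Customer], [Date], [Action]
from (values
    ('Contract1', 'Customer4', '2017-01-01', '+'),
    ('Contract1', 'Customer6', '2017-01-02', '+'),
    ('Contract1', 'Customer4', '2017-01-03', '-'),
    ('Contract1', 'Customer4', '2017-01-04', '+'),
    ('Contract1', 'Customer4', '2017-01-05', '-'),
    ('Contract1', 'Customer6', '2017-01-06', '-'),
    ('Contract1', 'Customer8', '2017-01-07', '+'),
    ('Contract1', 'Customer8', '2017-01-08', '-'),
    ('Contract1', 'Customer4', '2017-01-09', '+')
) as v ([Contract], [Customer], [Date], [Action])
),
FullTable as
(
select
    a.[Contract],
    a.[Customer],
    b.[Date],
    s.[Action],
    [c] = count(s.[Action]) over (partition by a.[Contract], a.[Customer] order by b.[Date])
from (select distinct [Contract], [Customer] from T1) as a
    inner join (select distinct [Contract], [Date] from T1) as b
        on a.[Contract] = b.[Contract]
    left join T1 as s
        on s.[Contract] = a.[Contract] and s.[Customer] = a.[Customer] and s.[Date] = b.[Date]
),
DailyStatus as
(
select
    [Contract],
    [Customer],
    [Date],
    [FilledAction] = max([Action]) over (partition by [Contract], [Customer], [c])
from FullTable
)
select
    [Contract],
    [Date]
from DailyStatus
group by [Contract], [Date]
having sum(iif([FilledAction] = '+', 1, 0)) = 0 -- no customer is still open
    and sum(iif([FilledAction] = '-', 1, 0)) > 0 -- at least one customer has closed
order by [Contract], [Date];
-- On the sample data this returns 2017-01-06 and 2017-01-08 for Contract1.
```

The equality tests against '+' and '-' give the same answer under any collation, and rows with a NULL FilledAction fall into neither count.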

The full solution is here:

,FullTable as 
(
select 
a.[Contract] 
,a.[Customer] 
,b.[Date] 
,c.[Action] 
,count(c.[Action]) over (partition by a.[Contract],a.[Customer] order by b.[Date]) c 
from 
(select distinct 
    [Contract], 
    [Customer] 
from T1) a 
inner join 
(select distinct 
    [Contract], 
    [Date] 
from T1) b 
on a.[Contract]=b.[Contract] 
left join t1 c 
on c.[Contract]=a.[Contract] and a.[Customer]=c.[Customer] and b.[Date]=c.[Date] 
) 
,DailyStatus as 
(
select 
[Contract] 
,[Customer] 
,[Date] 
,[Action] 
,c 
,max([Action]) over (partition by [Contract],[Customer],c) FilledAction 
from 
FullTable 
) 
select 
[Contract] 
,[Date] 
,max(FilledAction) 
from DailyStatus 
group by [Contract],[Date] 
having max(FilledAction) ='-' 
+0

Thanks! Looks promising! I'll test it later.

+0

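I tested it on my data and it seems heavy. It also fails this case: http://pastebin.com/Jauig67u, though that is an edge case. Thanks, I get the idea! I'll post my own solution, which handles the edge case and is faster on large data sets.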