2017-06-07

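SQL Server ROW_NUMBER() over a partition, but ignoring duplicate category values

I'm trying to track the different paths customers take through multiple categories. A simplified view of my table looks like this: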

Table: customer_category 

CustomerID | Category | Date 
11111  | A   | 2016-01-01 
11111  | B   | 2016-02-01 
11111  | C   | 2016-03-01 
22222  | A   | 2016-01-01 
22222  | A   | 2016-02-01 
22222  | A   | 2016-03-01 
22222  | C   | 2016-04-01 
33333  | A   | 2016-01-01 
33333  | B   | 2016-02-01 
33333  | C   | 2016-03-01 
33333  | C   | 2016-04-01 

I can get the full (raw) path for each customer with this query:

with cat_order as (
    select CustomerID 
      ,Category 
      ,row_number() over (partition by CustomerID order by Date) as rnk 
    from customer_category 
),pivoted as (  -- not named "pivot": PIVOT is a reserved keyword in T-SQL
    select CustomerID 
     ,max(case when rnk = 1 then Category else null end) as category_1 
     ,max(case when rnk = 2 then Category else null end) as category_2 
     ,max(case when rnk = 3 then Category else null end) as category_3 
     ,max(case when rnk = 4 then Category else null end) as category_4 
    from cat_order 
    group by CustomerID 
) 
select category_1, category_2, category_3, category_4, count(*) as count 
from pivoted 
group by category_1, category_2, category_3, category_4;

This gives me the following:

category_1 | category_2 | category_3 | category_4 | count 
A   | B   | C   |    | 1 
A   | A   | A   | C   | 1 
A   | B   | C   | C   | 1 

What I want, though, is to ignore repeated categories, so that I see:

category_1 | category_2 | category_3 | category_4 | count 
A   | B   | C   |    | 2 
A   | C   |    |    | 1 

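In my head, I think I would need to:

  1. Omit any record where Category = LAG(Category)
  2. Rank within each partition...
  3. Pivot with CASE statements
  4. Aggregate the results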
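The four steps listed above can be sketched outside the database to check the expected output. This is a minimal Python sketch, not SQL, assuming the rows are already in memory and that dates are ISO strings (so they sort chronologically); `ROWS` and `collapsed_paths` are hypothetical names.

```python
from collections import Counter
from itertools import groupby

# Sample rows from the question: (CustomerID, Category, Date as an ISO string).
ROWS = [
    (11111, "A", "2016-01-01"),
    (11111, "B", "2016-02-01"),
    (11111, "C", "2016-03-01"),
    (22222, "A", "2016-01-01"),
    (22222, "A", "2016-02-01"),
    (22222, "A", "2016-03-01"),
    (22222, "C", "2016-04-01"),
    (33333, "A", "2016-01-01"),
    (33333, "B", "2016-02-01"),
    (33333, "C", "2016-03-01"),
    (33333, "C", "2016-04-01"),
]


def collapsed_paths(rows, width=4):
    """Count de-duplicated category paths, padded to `width` columns with None."""
    # Equivalent of PARTITION BY CustomerID ORDER BY Date.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    paths = Counter()
    for _, customer_rows in groupby(ordered, key=lambda r: r[0]):
        path = []
        for _, category, _ in customer_rows:
            # Step 1: skip a row whose category equals the previous row's (LAG).
            if path and path[-1] == category:
                continue
            # Step 2: the index in `path` is the rank among the kept rows.
            path.append(category)
        # Step 3: pivot into fixed columns category_1 .. category_<width>.
        padded = tuple((path + [None] * width)[:width])
        # Step 4: aggregate identical paths.
        paths[padded] += 1
    return paths


if __name__ == "__main__":
    for path, count in collapsed_paths(ROWS).items():
        print(path, count)
```

On the sample data this yields a count of 2 for `("A", "B", "C", None)` and 1 for `("A", "C", None, None)`, which matches the desired output. Comparing against the last kept category is equivalent to comparing against LAG(Category), because a skipped row always has the same category as the last kept one.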

That feels way too complicated. Is there a simpler way to do this?


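What do you mean by ignoring duplicate categories? Across all of 1, 2, 3 and 4? In your result you get a C in category_2, but the base data doesn't have one there.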


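When I say "duplicate categories", I'm looking at how customer 22222 went through the sequence A A A C. I don't care that they had three separate measurements in category A, only that they were in A and then C (without passing through category B), whereas the other two progressed A → B → C.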

Answer


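As far as I know (based on your data and your desired output), there isn't a simple way to do this. To get the result you want, you basically need to carry out the four steps you listed (or some variation of them). That said, you can "simplify" it into a form that doesn't need CTEs. For example: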

SELECT category_1 = P.[1], 
     category_2 = P.[2], 
     category_3 = P.[3], 
     category_4 = P.[4], 
     [Count] = COUNT(*) 
FROM 
(
    SELECT CustomerID, 
      Category, 
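      -- running total of category changes = position of this row in the de-duplicated path
      rnk = SUM(checkprev) OVER (PARTITION BY CustomerID ORDER BY [Date]) 
    FROM 
    (
     -- checkprev is 1 on a customer's first row (LAG returns NULL) or when the category changes, 0 when it repeats
     SELECT *, checkprev = CASE WHEN LAG(Category) OVER (PARTITION BY CustomerID ORDER BY [Date]) = Category THEN 0 ELSE 1 END 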
     FROM customer_category 
    ) T 
) AS T 
PIVOT 
(
    MAX(Category) FOR rnk IN ([1], [2], [3], [4]) 
) AS P 
GROUP BY P.[1], P.[2], P.[3], P.[4];
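The running-sum trick in this answer can be tried without SQL Server. The sketch below runs the same LAG / SUM ... OVER logic against an in-memory SQLite database from Python. It is a port under assumptions, not the answer's T-SQL: SQLite has no PIVOT operator, so the pivot step is rewritten as MAX(CASE ...) conditional aggregation (the same technique the question uses), and window functions require SQLite 3.25 or newer. `SAMPLE_ROWS`, `QUERY` and `run_demo` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (CustomerID, Category, Date as an ISO string).
SAMPLE_ROWS = [
    (11111, "A", "2016-01-01"), (11111, "B", "2016-02-01"), (11111, "C", "2016-03-01"),
    (22222, "A", "2016-01-01"), (22222, "A", "2016-02-01"), (22222, "A", "2016-03-01"),
    (22222, "C", "2016-04-01"),
    (33333, "A", "2016-01-01"), (33333, "B", "2016-02-01"), (33333, "C", "2016-03-01"),
    (33333, "C", "2016-04-01"),
]

# SQLite port of the answer's approach; MAX(CASE ...) stands in for PIVOT.
QUERY = """
SELECT category_1, category_2, category_3, category_4, COUNT(*) AS cnt
FROM (
    SELECT CustomerID,
           MAX(CASE WHEN rnk = 1 THEN Category END) AS category_1,
           MAX(CASE WHEN rnk = 2 THEN Category END) AS category_2,
           MAX(CASE WHEN rnk = 3 THEN Category END) AS category_3,
           MAX(CASE WHEN rnk = 4 THEN Category END) AS category_4
    FROM (
        SELECT CustomerID, Category,
               -- running count of category changes = position in the collapsed path
               SUM(checkprev) OVER (PARTITION BY CustomerID ORDER BY Date) AS rnk
        FROM (
            SELECT CustomerID, Category, Date,
                   -- 1 on the first row (LAG is NULL) or when the category changes
                   CASE WHEN LAG(Category) OVER (PARTITION BY CustomerID ORDER BY Date) = Category
                        THEN 0 ELSE 1 END AS checkprev
            FROM customer_category
        ) AS flagged
    ) AS ranked
    GROUP BY CustomerID
) AS pivoted
GROUP BY category_1, category_2, category_3, category_4
ORDER BY cnt DESC
"""


def run_demo():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_category (CustomerID INTEGER, Category TEXT, Date TEXT)"
        )
        conn.executemany("INSERT INTO customer_category VALUES (?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

For customer 22222 the rows A, A, A, C get rnk values 1, 1, 1, 2, so every repeated A lands in the same pivot column and MAX simply collapses them. This is why the answer can pivot on the running sum directly instead of filtering rows first.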