2010-09-18

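T-SQL: find orders placed in 3 consecutive months

Please help me write the following query. Say I have a Customers table and an Orders table.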

Customers table

CustID CustName 

1  AA  
2  BB 
3  CC 
4  DD 

Orders table

OrderID OrderDate   CustID 
100  01-JAN-2000  1 
101  05-FEB-2000  1  
102  10-MAR-2000  1 
103  01-NOV-2000  2  
104  05-APR-2001  2 
105  07-MAR-2002  2 
106  01-JUL-2003  1 
107  01-SEP-2004  4 
108  01-APR-2005  4 
109  01-MAY-2006  3 
110  05-MAY-2007  1 
111  07-JUN-2007  1 
112  06-JUL-2007  1 

I want to find the customers who placed orders in 3 consecutive months. (The query may use SQL Server 2005 and 2008 features.)

The desired output is:

CustName  Year  OrderDate

AA        2000  01-JAN-2000
AA        2000  05-FEB-2000
AA        2000  10-MAR-2000

AA        2007  05-MAY-2007
AA        2007  07-JUN-2007
AA        2007  06-JUL-2007

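What output would you want if the row '113, 13-AUG-2007, 1' were added to the Orders table? One output block for AA with 4 rows, or two output blocks with 3 rows each? Put another way, is it "strictly three months at a time" or "three or more months at a time"? – 2010-09-19 00:40:00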


Sorry, I'd prefer three months. – Gopi 2010-09-20 15:22:57


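Do you mean that a 4-month run would return 6 rows, one set for months 1, 2, 3 and another for months 2, 3, 4? Or should all orders that are not part of a run of exactly 3 months be excluded? – ErikE 2010-09-20 17:04:06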

Answers


Edit: got rid of the MAX() OVER (PARTITION BY ...), as it seemed to kill performance.

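;WITH cte AS (
    -- Month number: consecutive calendar months differ by exactly 1
    SELECT CustID,
           OrderDate,
           DATEPART(YEAR, OrderDate) * 12 + DATEPART(MONTH, OrderDate) AS YM
    FROM   Orders
),
cte1 AS (
    -- Island key: YM minus its dense rank is constant within a run of
    -- consecutive months; DENSE_RANK gives orders in the same month the same rank
    SELECT CustID,
           OrderDate,
           YM,
           YM - DENSE_RANK() OVER (PARTITION BY CustID ORDER BY YM) AS G
    FROM   cte
),
cte2 AS (
    -- Keep only islands that span at least 3 months
    SELECT CustID,
           MIN(OrderDate) AS Mn,
           MAX(OrderDate) AS Mx
    FROM   cte1
    GROUP BY CustID, G
    HAVING MAX(YM) - MIN(YM) >= 2
)
SELECT   c.CustName, YEAR(o.OrderDate) AS [Year], o.OrderDate
FROM     Customers AS c
         INNER JOIN Orders AS o ON c.CustID = o.CustID
         -- Return every order that falls inside a qualifying island
         INNER JOIN cte2 AS c2 ON c2.CustID = o.CustID
                              AND o.OrderDate BETWEEN c2.Mn AND c2.Mx
ORDER BY c.CustName, o.OrderDate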
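For anyone who wants to trace the islands trick outside SQL Server, here is a minimal Python sketch of the same logic. It is a re-expression for illustration, not Martin's code, and the function names are hypothetical. It assumes orders arrive as (CustID, OrderDate) pairs, uses year*12 + month as the month number, subtracts a dense rank of the distinct months to get the island key G, and keeps islands spanning at least three months, mirroring HAVING MAX(YM) - MIN(YM) >= 2.

```python
from datetime import date
from itertools import groupby


def month_index(d: date) -> int:
    # Same as DATEPART(YEAR, d)*12 + DATEPART(MONTH, d): consecutive
    # calendar months map to consecutive integers, even across years.
    return d.year * 12 + d.month


def consecutive_month_orders(orders, min_months=3):
    """orders: list of (cust_id, order_date) pairs (hypothetical layout).
    Returns the orders that fall in runs of at least `min_months`
    consecutive months, per customer, sorted by (cust_id, order_date)."""
    result = []
    for cust, rows in groupby(sorted(orders), key=lambda o: o[0]):
        rows = list(rows)
        # DENSE_RANK over the month number: distinct months ranked 1, 2, 3, ...
        months = sorted({month_index(d) for _, d in rows})
        rank = {m: i + 1 for i, m in enumerate(months)}
        # Island key G = month number - dense rank; constant within a run
        islands = {}
        for _, d in rows:
            ym = month_index(d)
            islands.setdefault(ym - rank[ym], []).append((ym, d))
        for members in islands.values():
            yms = [ym for ym, _ in members]
            # For min_months=3 this is HAVING MAX(YM) - MIN(YM) >= 2
            if max(yms) - min(yms) >= min_months - 1:
                result.extend((cust, d) for _, d in members)
    return sorted(result)


# Sample Orders data from the question, as (CustID, OrderDate)
ORDERS = [
    (1, date(2000, 1, 1)), (1, date(2000, 2, 5)), (1, date(2000, 3, 10)),
    (2, date(2000, 11, 1)), (2, date(2001, 4, 5)), (2, date(2002, 3, 7)),
    (1, date(2003, 7, 1)), (4, date(2004, 9, 1)), (4, date(2005, 4, 1)),
    (3, date(2006, 5, 1)), (1, date(2007, 5, 5)), (1, date(2007, 6, 7)),
    (1, date(2007, 7, 6)),
]

for cust, d in consecutive_month_orders(ORDERS):
    print(cust, d.year, d.isoformat())
```

On the sample data this yields the six orders for customer 1 (AA) from 2000 and 2007, matching the desired output.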

You need DENSE_RANK here; otherwise customers with four or more sales within the three months will be missed. – 2010-09-18 22:09:55
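To see why DENSE_RANK matters, here is a small hypothetical Python illustration (names assumed, not from the thread). It computes the island key once with dense ranking and once with plain row numbering, for a customer with two orders in February. With row numbering the key changes in the middle of the run, so neither piece spans three months and the run is lost.

```python
from datetime import date


def month_index(d: date) -> int:
    # Month number, like DATEPART(YEAR, d)*12 + DATEPART(MONTH, d)
    return d.year * 12 + d.month


def island_keys(dates, dense=True):
    """Island key (month number - rank) for each order, in date order.
    dense=True mimics DENSE_RANK() OVER (ORDER BY YM);
    dense=False mimics ROW_NUMBER() OVER (ORDER BY YM)."""
    months = [month_index(d) for d in sorted(dates)]
    if dense:
        distinct = sorted(set(months))
        ranks = [distinct.index(m) + 1 for m in months]
    else:
        ranks = list(range(1, len(months) + 1))
    return [m - r for m, r in zip(months, ranks)]


# Hypothetical customer: orders in January, February (twice) and March
dates = [date(2000, 1, 3), date(2000, 2, 1), date(2000, 2, 20), date(2000, 3, 9)]
print(island_keys(dates, dense=True))   # one island
print(island_keys(dates, dense=False))  # split into two islands
```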


A perfect islands solution... – ErikE 2010-09-20 16:20:01


Martin, I tested your query and it doesn't give the correct results... – ErikE 2010-09-20 20:05:11


Here you go:

select distinct
    CustName
    ,year(OrderDate) [Year]
    ,OrderDate
from
(
    -- o1 = current order, o2 = an order one month earlier, o3 = an order one month later
    select
        o2.OrderDate [prev]
        ,o1.OrderDate [curr]
        ,o3.OrderDate [next]
        ,c.CustName
    from [order] o1
    join [order] o2 on o1.CustId = o2.CustId and datediff(mm, o2.OrderDate, o1.OrderDate) = 1
    join [order] o3 on o1.CustId = o3.CustId and o2.OrderId <> o3.OrderId and datediff(mm, o3.OrderDate, o1.OrderDate) = -1
    join Customer c on c.CustId = o1.CustId
) t
-- turn the three date columns back into rows
unpivot
(
    OrderDate for [DateName] in ([prev], [curr], [next])
)
unpvt
order by CustName, OrderDate
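A rough Python model of what this query does may make its behaviour easier to follow; the function name and the (order_id, cust_id, order_date) tuple layout are assumptions for illustration. It pairs every order with an order one month earlier and one month later for the same customer, unpivots the three dates into rows, and applies DISTINCT on (customer, date). That DISTINCT step is also why two orders on the same day collapse into one output row, the issue raised in the comments.

```python
from datetime import date
from itertools import product


def month_diff(start: date, end: date) -> int:
    # Like DATEDIFF(mm, start, end): month boundaries crossed from start to end
    return (end.year - start.year) * 12 + (end.month - start.month)


def self_join_runs(orders):
    """orders: list of (order_id, cust_id, order_date) tuples.
    Models the prev/curr/next self-join, the UNPIVOT and the DISTINCT."""
    rows = set()  # DISTINCT over (customer, order date)
    for o1, o2, o3 in product(orders, repeat=3):
        if (o1[1] == o2[1] == o3[1]
                and month_diff(o2[2], o1[2]) == 1     # o2 is one month before o1
                and o2[0] != o3[0]
                and month_diff(o3[2], o1[2]) == -1):  # o3 is one month after o1
            # UNPIVOT: the prev, curr and next dates each become a row
            for d in (o2[2], o1[2], o3[2]):
                rows.add((o1[1], d))
    return sorted(rows)


# Hypothetical data: customer 7 places two orders on 2000-02-05
orders = [
    (1, 7, date(2000, 1, 5)),
    (2, 7, date(2000, 2, 5)),
    (3, 7, date(2000, 2, 5)),
    (4, 7, date(2000, 3, 5)),
]
print(len(orders), "orders in,", len(self_join_runs(orders)), "rows out")
```

The triple loop over all order combinations also mirrors why the SQL version is so expensive.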

Warning: this query is extremely inefficient. :) – 2010-09-18 22:58:40


Denis, I'm sorry to report that this query doesn't return the correct results when the same customer has two orders on the same day. – ErikE 2010-09-20 22:09:16


@Emtucifor, I know! But we don't know what @CSharpy needs! :) – 2010-09-21 06:37:28


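Here is my version. I really only posted it as a curiosity, to show another way of thinking about the problem. It turned out to be more useful than that, because it even outperformed Martin Smith's cool "islands" solution. However, once he got rid of some overly expensive windowed aggregate functions and used real aggregates instead, his query started kicking butt.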

Solution 1: runs of 3 months or more, found by checking the month before and the month after and using a semi-join.

WITH Months AS (
    SELECT DISTINCT 
     O.CustID, 
     Grp = DateDiff(Month, '20000101', O.OrderDate) 
    FROM 
     CustOrder O 
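), Anchors AS (
    -- Each distinct order month counts toward itself and its two neighbours;
    -- a month (Ind) counted 3 times has orders in Ind-1, Ind and Ind+1
    SELECT
     M.CustID,
     Ind = M.Grp + X.Offset
    FROM
     Months M
     CROSS JOIN (
     SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
    ) X (Offset)
    GROUP BY
     M.CustID,
     M.Grp + X.Offset
    HAVING
     Count(*) = 3
)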
SELECT 
    C.CustName, 
    [Year] = Year(OrderDate), 
    O.OrderDate 
FROM 
    Cust C 
    INNER JOIN CustOrder O ON C.CustID = O.CustID 
WHERE 
    EXISTS (
     SELECT 1 
     FROM 
     Anchors A 
     WHERE 
     O.CustID = A.CustID
     -- Ind counts months from 2000-01, so this window covers months Ind-1 to Ind+1
     AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201')
     AND O.OrderDate < DateAdd(Month, A.Ind, '20000301')
    ) 
ORDER BY 
    C.CustName, 
    OrderDate; 
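The Anchors CTE can be read as a voting scheme. Below is a minimal Python sketch of that reading for a single customer; the names are hypothetical and months are counted from January 2000, like DateDiff(Month, '20000101', OrderDate). Each distinct order month adds a vote to itself and its two neighbours, a month with three votes is the centre of a run, and an order qualifies when its month is within one month of some centre (the EXISTS test).

```python
from collections import Counter
from datetime import date


def months_since_2000(d: date) -> int:
    # Like DateDiff(Month, '20000101', d)
    return (d.year - 2000) * 12 + (d.month - 1)


def anchors(order_dates):
    """Centre months Ind having orders in Ind-1, Ind and Ind+1 (one customer)."""
    months = {months_since_2000(d) for d in order_dates}      # SELECT DISTINCT
    votes = Counter(m + off for m in months for off in (-1, 0, 1))
    return sorted(ind for ind, n in votes.items() if n == 3)  # HAVING Count(*) = 3


def run_orders(order_dates):
    """Orders whose month lies in Ind-1..Ind+1 for some anchor Ind."""
    centres = anchors(order_dates)
    return sorted(d for d in order_dates
                  if any(abs(months_since_2000(d) - a) <= 1 for a in centres))


# Customer AA's orders from the question, plus a hypothetical 13-AUG-2007 order
aa = [date(2000, 1, 1), date(2000, 2, 5), date(2000, 3, 10), date(2003, 7, 1),
      date(2007, 5, 5), date(2007, 6, 7), date(2007, 7, 6), date(2007, 8, 13)]
print(anchors(aa))
for d in run_orders(aa):
    print(d.isoformat())
```

Because a four-month run produces two overlapping centres, the August order is included, which is the "3 months or more" behaviour.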

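Solution 2: patterns of exactly 3 months. Runs of 4 or more months are excluded. This is done by checking the two months before and the two months after (essentially looking for the pattern N, Y, Y, Y, N).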

WITH Months AS (
    SELECT DISTINCT 
     O.CustID, 
     Grp = DateDiff(Month, '20000101', O.OrderDate) 
    FROM 
     CustOrder O 
), Anchors AS (
    SELECT 
     M.CustID, 
     Ind = M.Grp + X.Offset 
    FROM 
     Months M 
     CROSS JOIN (
     SELECT -2 UNION ALL SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 
    ) X (Offset) 
    GROUP BY 
     M.CustID, 
     M.Grp + X.Offset 
    HAVING
     Count(*) = 3
     -- Offsets exactly -1, 0, 1: orders in Ind-1, Ind, Ind+1 but not in Ind-2 or Ind+2
     AND Min(X.Offset) = -1
     AND Max(X.Offset) = 1
) 
SELECT 
    C.CustName, 
    [Year] = Year(OrderDate), 
    O.OrderDate 
FROM 
    Cust C 
    INNER JOIN CustOrder O ON C.CustID = O.CustID 
    INNER JOIN Anchors A 
     ON O.CustID = A.CustID 
     AND O.OrderDate >= DateAdd(Month, A.Ind, '19991201') 
     AND O.OrderDate < DateAdd(Month, A.Ind, '20000301') 
ORDER BY 
    C.CustName, 
    OrderDate; 
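Solution 2 widens the window to five offsets. This hypothetical Python sketch (names assumed; months counted from January 2000, like DateDiff(Month, '20000101', OrderDate)) collects the offsets that land on each candidate centre month and keeps the centre only when they are exactly -1, 0 and 1, i.e. the pattern N, Y, Y, Y, N.

```python
from collections import defaultdict
from datetime import date


def months_since_2000(d: date) -> int:
    # Like DateDiff(Month, '20000101', d)
    return (d.year - 2000) * 12 + (d.month - 1)


def exact_anchors(order_dates):
    """Centres of runs of exactly three months: pattern N, Y, Y, Y, N."""
    months = {months_since_2000(d) for d in order_dates}
    offsets_by_ind = defaultdict(list)
    for m in months:
        for off in (-2, -1, 0, 1, 2):
            offsets_by_ind[m + off].append(off)
    # HAVING Count(*) = 3 AND Min(X.Offset) = -1 AND Max(X.Offset) = 1
    return sorted(ind for ind, offs in offsets_by_ind.items()
                  if len(offs) == 3 and min(offs) == -1 and max(offs) == 1)


def exact_run_orders(order_dates):
    """Orders in months Ind-1..Ind+1 of an exact three-month run."""
    centres = exact_anchors(order_dates)
    return sorted(d for d in order_dates
                  if any(abs(months_since_2000(d) - a) <= 1 for a in centres))


# Customer AA's orders from the question, plus a hypothetical 13-AUG-2007 order
aa = [date(2000, 1, 1), date(2000, 2, 5), date(2000, 3, 10), date(2003, 7, 1),
      date(2007, 5, 5), date(2007, 6, 7), date(2007, 7, 6), date(2007, 8, 13)]
print(exact_anchors(aa))
for d in exact_run_orders(aa):
    print(d.isoformat())
```

With the extra August order, the May to August run is four months long, so only the 2000 run survives.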

Here is my table-loading script, if anyone else wants to play:

IF Object_ID('CustOrder', 'U') IS NOT NULL DROP TABLE CustOrder 
IF Object_ID('Cust', 'U') IS NOT NULL DROP TABLE Cust 
GO 
SET NOCOUNT ON 
CREATE TABLE Cust (
    CustID int identity(1,1) NOT NULL PRIMARY KEY CLUSTERED, 
    CustName varchar(100) UNIQUE 
) 

CREATE TABLE CustOrder (
    OrderID int identity(100, 1) NOT NULL PRIMARY KEY CLUSTERED, 
    CustID int NOT NULL FOREIGN KEY REFERENCES Cust (CustID), 
    OrderDate smalldatetime NOT NULL 
) 

-- Create 1,000 customers with random six-letter names, skipping duplicate names
DECLARE @i int
SET @i = 1000 
WHILE @i > 0 BEGIN 
    WITH N AS (
     SELECT 
     Nm = 
      Char(Abs(Checksum(NewID())) % 26 + 65) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
      + Char(Abs(Checksum(NewID())) % 26 + 97) 
    ) 
    INSERT Cust 
    SELECT N.Nm 
    FROM N 
    WHERE NOT EXISTS (
     SELECT 1 
     FROM Cust C 
     WHERE 
     N.Nm = C.CustName 
    ) 

    SET @i = @i - @@RowCount 
END 
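-- Create 50,000 orders with random customers (CustID 1 to 1000) and
-- random dates within 10,000 days of 1990-01-01
WHILE @i < 50000 BEGIN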
    INSERT CustOrder 
    SELECT TOP (50000 - @i) 
     Abs(Checksum(NewID())) % 1000 + 1, 
     DateAdd(Day, Abs(Checksum(NewID())) % 10000, '19900101') 
    FROM master.dbo.spt_values 
    SET @i = @i + @@RowCount 
END 

Performance

Here are some performance test results for the 3-months-or-more queries:

Query                  CPU   Reads   Duration
Martin 1 (original)   2297  299412       2348
Martin 2 (revised)     625     285        809
Denis                 3641     401       3855
Erik                  1855   94727       2077

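That's only one run of each, but the numbers are fairly representative. It turns out your query wasn't so bad after all, Denis. Martin's query beats the others, although at first he used some overly expensive windowing-function tactics, which he has since fixed.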

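Of course, as I pointed out, Denis's query doesn't return the correct rows when a customer has two orders on the same day, so it isn't in contention unless it gets fixed.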

Also, different indexes might change things. I don't know.


Don't make me add two more joins to my solution; it's already three-dimensional. :P – 2010-09-20 21:22:57


You need to update your performance chart! – 2010-09-20 23:57:25


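Done. To show that not all windowing-function operations are great, I left in the stats for the old version. Using them indiscriminately can hurt performance. – ErikE 2010-09-21 00:19:40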


Here's my take.

select 100 as OrderID,convert(datetime,'01-JAN-2000') OrderDate, 1 as CustID into #tmp union 
    select 101,convert(datetime,'05-FEB-2000'),  1 union 
    select 102,convert(datetime,'10-MAR-2000'),  1 union 
    select 103,convert(datetime,'01-NOV-2000'),  2 union 
    select 104,convert(datetime,'05-APR-2001'),  2 union 
    select 105,convert(datetime,'07-MAR-2002'),  2 union 
    select 106,convert(datetime,'01-JUL-2003'),  1 union 
    select 107,convert(datetime,'01-SEP-2004'),  4 union 
    select 108,convert(datetime,'01-APR-2005'),  4 union 
    select 109,convert(datetime,'01-MAY-2006'),  3 union 
    select 110,convert(datetime,'05-MAY-2007'),  1 union 
    select 111,convert(datetime,'07-JUN-2007'),  1 union 
    select 112,convert(datetime,'06-JUL-2007'),  1 


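    ;with cte as
    (
     select
      *
      -- year*12 + month keeps consecutive months consecutive across a year
      -- boundary (yyyymm jumps from 200012 to 200101); ranking by month rather
      -- than by date keeps several orders in one month in the same island
      ,year(orderdate)*12 + month(orderdate)
       - dense_rank() over(partition by custid order by year(orderdate)*12 + month(orderdate)) as g
     from #tmp
    ),
    cte2 as
    (
    select
     CustID
     ,g
    from cte a
    group by CustID, g
    -- count distinct months, not orders
    having count(distinct year(orderdate)*12 + month(orderdate))>=3
    )
    select
     a.CustID
     ,Yr=Year(OrderDate)
     ,OrderDate
    from cte2 a join cte b
     on a.CustID=b.CustID and a.g=b.g
    order by a.CustID, OrderDate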
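The choice of month key matters for this technique. In the hedged Python illustration below (function names are hypothetical), a yyyymm integer, as produced by CONVERT(char(6), OrderDate, 112), jumps from 200012 to 200101. Subtracting a rank therefore splits a November-to-January run into two islands, whereas year*12 + month keeps it as one.

```python
from datetime import date


def yyyymm(d: date) -> int:
    # Like CONVERT(int, CONVERT(char(6), d, 112)), e.g. 200012
    return d.year * 100 + d.month


def month_index(d: date) -> int:
    # year*12 + month: consecutive months always differ by exactly 1
    return d.year * 12 + d.month


def island_key_per_order(month_key, dates):
    """Month key minus a dense rank of the months, one value per order.
    Orders sharing a value belong to the same island."""
    keys = [month_key(d) for d in sorted(dates)]
    distinct = sorted(set(keys))
    return [k - (distinct.index(k) + 1) for k in keys]


# Hypothetical run of orders spanning a year boundary: Nov, Dec, Jan
dates = [date(2000, 11, 20), date(2000, 12, 4), date(2001, 1, 15)]
print(island_key_per_order(yyyymm, dates))       # January lands in a new island
print(island_key_per_order(month_index, dates))  # all three share one island
```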