2012-04-27 58 views
4

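Finding rows with consecutively increasing column values

I have a SQL table that stores daily stock prices. New records are inserted every day after the market closes. I want to find the stocks whose price has risen on consecutive days.

The table has many columns, but this is the relevant subset: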

quoteid  stockid  closeprice  createdate 
-------------------------------------------------- 
    1   1    1  01/01/2012 
    2   2    10  01/01/2012 
    3   3    15  01/01/2012 

    4   1    2  01/02/2012 
    5   2    11  01/02/2012 
    6   3    13  01/02/2012 

    7   1    5  01/03/2012 
    8   2    13  01/03/2012 
    9   3    17  01/03/2012 

    10   1    7  01/04/2012 
    11   2    14  01/04/2012 
    12   3    18  01/04/2012 

    13   1    9  01/05/2012 
    14   2    11  01/05/2012 
    15   3    10  01/05/2012 

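The quoteid column is the primary key.

In the table, the closing price of stock ID 1 rises every day. Stock ID 3 fluctuates a lot, and the price of stock ID 2 falls on the last day.

I'm looking for a result like this: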

stockid  Consecutive Count (CC) 
---------------------------------- 
    1    5 
    2    4 

It would be even better if the output also included the dates of each consecutive streak:

stockid  Consecutive Count (CC)  StartDate  EndDate 
--------------------------------------------------------------- 
    1    5     01/01/2012 01/05/2012 
    2    4     01/01/2012 01/04/2012 

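StartDate is when the price started rising, and EndDate is when the bull run actually ended.

I don't think this is a simple problem. I've looked at other posts that deal with this kind of consecutive-rows scenario, but they don't fit my needs. If you know of any post similar to mine, please let me know.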

+1

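What minimum length do you want for a consecutive increase: just more than one day? Or should it somehow offset decreases? I assume you want to see multiple runs, if the data has them. – 2012-04-27 16:49:27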

+1

Are there any gaps in the data, such as weekends, and what needs to be done about them? – 2012-04-27 16:56:01

+0

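I have no rule for the consecutive increase; the price just needs to be greater than the previous day's. Yes, I'm looking for multiple runs. I'll run this query against the last 3 or 6 months of data, or possibly more. There will be gaps in the data; we can use the primary key column to get the previous day's record. – 2012-04-27 17:33:55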

Answers

6

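Regardless, it helps to think of this in terms of increasing rows per stock (the actual quoteid value isn't really helpful here). Counting the days captured in this table is simplest. If you want something else (only business days, ignoring weekends and holidays, or whatever), it gets more involved and you would probably need a calendar file. You're going to want an index over [stockid, createdate], if you don't have one already.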

WITH StockRow AS (SELECT stockId, closePrice, createdDate, 
         ROW_NUMBER() OVER(PARTITION BY stockId 
              ORDER BY createdDate) rn 
        FROM Quote), 

    RunGroup AS (SELECT Base.stockId, Base.createdDate, 
         MAX(Restart.rn) OVER(PARTITION BY Base.stockId 
               ORDER BY Base.createdDate) groupingId 
        FROM StockRow Base 
        LEFT JOIN StockRow Restart 
         ON Restart.stockId = Base.stockId 
          AND Restart.rn = Base.rn - 1 
          AND Restart.closePrice > Base.closePrice) 

SELECT stockId, 
     COUNT(*) AS consecutiveCount, 
     MIN(createdDate) AS startDate, MAX(createdDate) AS endDate 
FROM RunGroup 
GROUP BY stockId, groupingId 
HAVING COUNT(*) >= 3 
ORDER BY stockId, startDate 

Which yields the following results from the provided data:

Increasing_Run 
stockId consecutiveCount startDate endDate 
=================================================== 
1   5     2012-01-01 2012-01-05 
2   4     2012-01-01 2012-01-04 
3   3     2012-01-02 2012-01-04 

SQL Fiddle Example
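(The fiddle also has an example with multiple runs.)

This analysis ignores all gaps in the dates and correctly matches all runs (a new run starts the next time the price begins rising again).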


So what's going on here?

StockRow AS (SELECT stockId, closePrice, createdDate, 
        ROW_NUMBER() OVER(PARTITION BY stockId 
             ORDER BY createdDate) rn 
      FROM Quote) 

This CTE is used for one purpose: we need a way to find the next/previous row, so first we number each row in date order...

RunGroup AS (SELECT Base.stockId, Base.createdDate, 
        MAX(Restart.rn) OVER(PARTITION BY Base.stockId 
             ORDER BY Base.createdDate) groupingId 
      FROM StockRow Base 
      LEFT JOIN StockRow Restart 
        ON Restart.stockId = Base.stockId 
         AND Restart.rn = Base.rn - 1 
          AND Restart.closePrice > Base.closePrice) 

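...and then join them based on that index. If you end up on something that has LAG()/LEAD(), using those instead will almost certainly be a better option. There's one critical thing here, though: a match occurs only if the row is out of sequence (its price is less than the previous row's). Otherwise, the value ends up being null (with LAG(), you'd need to use something like CASE afterwards to pull this off). You get a temporary set that looks like this (shown here for stockId 3):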

B.rn B.closePrice B.createdDate R.rn R.closePrice R.createdDate groupingId 
1  15    2012-01-01  -  -    -    - 
2  13    2012-01-02  1  15    2012-01-01  1 
3  17    2012-01-03  -  -    -    1 
4  18    2012-01-04  -  -    -    1 
5  10    2012-01-05  4  18    2012-01-04  4 

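...so there are values for Restart only when the previous row's price was greater than the "current" row's. MAX() in the window function returns the greatest value seen so far. Because MAX() ignores nulls (they effectively sort lowest), every following row keeps that row index until another mismatch occurs (which gives a new value). At this point, we essentially have the intermediate results of the query, ready for the final aggregation.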
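The intermediate set above can be reproduced with a small script. This is only a sketch, not part of the original answer: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), an in-memory Quote table, and only the stockId 3 rows from the question.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions). Table and column names follow the answer's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quote (stockId INTEGER, closePrice INTEGER, createdDate TEXT)")
conn.executemany(
    "INSERT INTO Quote VALUES (3, ?, ?)",
    [(15, "2012-01-01"), (13, "2012-01-02"), (17, "2012-01-03"),
     (18, "2012-01-04"), (10, "2012-01-05")],
)

GROUPING_SQL = """
WITH StockRow AS (
    SELECT stockId, closePrice, createdDate,
           ROW_NUMBER() OVER (PARTITION BY stockId ORDER BY createdDate) AS rn
    FROM Quote
)
SELECT Base.rn,
       Restart.rn AS restartRn,
       -- running maximum of the restart index; NULLs are ignored by MAX()
       MAX(Restart.rn) OVER (PARTITION BY Base.stockId
                             ORDER BY Base.createdDate) AS groupingId
FROM StockRow Base
LEFT JOIN StockRow Restart
       ON Restart.stockId = Base.stockId
      AND Restart.rn = Base.rn - 1
      AND Restart.closePrice > Base.closePrice
ORDER BY Base.rn
"""

grouping_rows = conn.execute(GROUPING_SQL).fetchall()
for row in grouping_rows:
    print(row)  # (rn, restartRn, groupingId); None stands for SQL NULL
```

Rows 2 to 4 share groupingId 1, which is what lets the final GROUP BY treat them as one run.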

SELECT stockId, 
     COUNT(*) AS consecutiveCount, 
     MIN(createdDate) AS startDate, MAX(createdDate) AS endDate 
FROM RunGroup 
GROUP BY stockId, groupingId 
HAVING COUNT(*) >= 3 
ORDER BY stockId, startDate 

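The final part of the query gets the start and end dates of each run and counts the number of entries between those dates. If the date calculation were anything more complicated, it would probably need to happen at this point. The GROUP BY shows one of the few legitimate cases of grouping by a column (groupingId) that is not included in the SELECT clause. The HAVING clause is used to eliminate runs that are "too short".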
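The answer notes that LAG() would likely be a better option where it is available (SQL Server has it from version 2012 on). The sketch below is one possible way to write that variant, not the answer author's code. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, and it swaps the self-join plus running MAX() for a CASE flag over LAG() and a running SUM() of that flag.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions). Data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quote (quoteId INTEGER PRIMARY KEY, stockId INTEGER, "
             "closePrice INTEGER, createdDate TEXT)")
prices = {1: [1, 2, 5, 7, 9], 2: [10, 11, 13, 14, 11], 3: [15, 13, 17, 18, 10]}
sample = [(stock, price, f"2012-01-{day:02d}")
          for stock, series in prices.items()
          for day, price in enumerate(series, start=1)]
conn.executemany(
    "INSERT INTO Quote (stockId, closePrice, createdDate) VALUES (?, ?, ?)", sample)

LAG_SQL = """
WITH Flagged AS (
    SELECT stockId, createdDate,
           -- 1 when the previous close was higher, i.e. a new run starts here
           CASE WHEN LAG(closePrice) OVER (PARTITION BY stockId
                                           ORDER BY createdDate) > closePrice
                THEN 1 ELSE 0 END AS restart
    FROM Quote
),
RunGroup AS (
    SELECT stockId, createdDate,
           -- running count of restarts identifies the run each row belongs to
           SUM(restart) OVER (PARTITION BY stockId
                              ORDER BY createdDate) AS groupingId
    FROM Flagged
)
SELECT stockId,
       COUNT(*) AS consecutiveCount,
       MIN(createdDate) AS startDate, MAX(createdDate) AS endDate
FROM RunGroup
GROUP BY stockId, groupingId
HAVING COUNT(*) >= 3
ORDER BY stockId, startDate
"""

runs = conn.execute(LAG_SQL).fetchall()
for run in runs:
    print(run)
```

As with the original join condition, equal prices on consecutive days do not break a run; changing `>` to `>=` in the CASE would make a run require a strict increase.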

+0

Thanks, very useful! – 2015-10-19 16:48:41

1

I would try a CTE, roughly:

with increase (stockid, startdate, enddate, cc) as 
(
    -- anchor: every pair of consecutive days on which the price rose
    select d2.stockid, d1.createdate as startdate, d2.createdate as enddate, 1 
    from quote d1, quote d2 
    where d1.stockid = d2.stockid 
    and d2.closeprice > d1.closeprice 
    and dateadd(day, 1, d1.createdate) = d2.createdate 

    union all 

    -- recursive step: extend an existing run one day further back
    select d2.stockid, d1.createdate as startdate, cend.enddate as enddate, cend.cc + 1 
    from quote d1, quote d2, increase cend 
    where d1.stockid = d2.stockid and d2.stockid = cend.stockid 
    and d2.closeprice > d1.closeprice 
    and d2.createdate = cend.startdate 
    and dateadd(day, 1, d1.createdate) = d2.createdate 
) 
-- keep the longest run per end date; cc counts day-to-day increases,
-- so a run spanning N days has cc = N - 1
select o.stockid, o.cc, o.startdate, o.enddate 
from increase o where cc = (select max(cc) from increase i where i.stockid = o.stockid and i.enddate = o.enddate) 

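This assumes there are no gaps. The criterion dateadd(day, 1, d1.createdate) = d2.createdate would have to be replaced by something else that indicates whether d2 is the "next" date after d1.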
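One possible replacement for that criterion is to number each stock's rows by date and treat row n + 1 as the "next" date. The sketch below only compares the two ways of pairing adjacent rows; it is not the answer author's code. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later, uses SQLite's date() in place of dateadd, and runs on three made-up rows with a weekend gap between 2012-01-06 and 2012-01-09.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions). The three rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quote (stockid INTEGER, closeprice INTEGER, createdate TEXT)")
conn.executemany(
    "INSERT INTO quote VALUES (1, ?, ?)",
    [(9, "2012-01-05"), (10, "2012-01-06"), (12, "2012-01-09")],
)

# Calendar-day criterion, as in the recursive CTE above.
BY_CALENDAR_DAY = """
SELECT d1.createdate, d2.createdate
FROM quote d1
JOIN quote d2 ON d1.stockid = d2.stockid
WHERE d2.closeprice > d1.closeprice
  AND date(d1.createdate, '+1 day') = d2.createdate
ORDER BY d1.createdate
"""

# Row-number criterion: "next" means the next row for that stock, whatever the date.
BY_ROW_NUMBER = """
WITH numbered AS (
    SELECT stockid, closeprice, createdate,
           ROW_NUMBER() OVER (PARTITION BY stockid ORDER BY createdate) AS rn
    FROM quote
)
SELECT d1.createdate, d2.createdate
FROM numbered d1
JOIN numbered d2 ON d1.stockid = d2.stockid AND d2.rn = d1.rn + 1
WHERE d2.closeprice > d1.closeprice
ORDER BY d1.createdate
"""

calendar_pairs = conn.execute(BY_CALENDAR_DAY).fetchall()
row_number_pairs = conn.execute(BY_ROW_NUMBER).fetchall()
print(calendar_pairs)    # the Friday-to-Monday rise is missed
print(row_number_pairs)  # the Friday-to-Monday rise is included
```

Joining on rn rather than on the date lets a run continue across weekends and holidays without a calendar table.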

0

This is the final working SQL for my needs. Testing shows it works correctly. I used the approach from @Oran for CC.

WITH StockRow (stockId, [close], createdDate, rowNum) 
as 
(
    SELECT stockId,   [close],     createdDate, 
      ROW_NUMBER() OVER(PARTITION BY stockId ORDER BY createdDate) 
    FROM dbo.Quote 
    where createddate >= '01/01/2012' --Beginning of this year 
    ), 

    RunStart (stockId, [close], createdDate, runId) as (
    SELECT  a.stockId,  a.[close], a.createdDate, 
      ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate) 
    FROM StockRow as a 
    LEFT JOIN StockRow as b 
    ON b.stockId = a.stockId 
    AND b.rowNum = a.rowNum - 1 
    AND b.[close] < a.[close] 
    WHERE b.stockId IS NULL) 
    , 
RunEnd (stockId, [close], createdDate, runId) as (
    SELECT a.stockId, a.[close], a.createdDate, 
      ROW_NUMBER() OVER(PARTITION BY a.stockId ORDER BY a.createdDate) 
    FROM StockRow as a 
    LEFT JOIN StockRow as b 
    ON b.stockId = a.stockId 
    AND b.rowNum = a.rowNum + 1 
    AND b.[close] > a.[close] 
    WHERE b.stockId IS NULL) 

SELECT a.stockId,  s.companyname,   s.Symbol, 
a.createdDate as startdate,  b.createdDate as enddate, 
(select count(r.createdDate)  from  dbo.quote r  where r.stockid = b.stockid and  r.createdDate   between a.createdDate  and  b.createdDate) as BullRunDuration 
FROM RunStart as a JOIN RunEnd as b 
ON b.stockId = a.stockId 
join dbo.stock as s 
on a.stockid = s.stockid 
AND b.runId = a.runId 
AND b.[close] > a.[close] 
and (select count(r.createdDate) from dbo.quote r where r.stockid = b.stockid and 
r.createdDate between a.createdDate and b.createdDate) > 2 -- trying to avoid clutter 
order by 6 desc, a.stockid