2012-04-06 314 views
6

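Creating a trend line from a data set in SQL

The code below returns the number of tickets resolved and the number of tickets opened per period (the period is YYYY,WW), going back a given number of days. For example, if @NoOfDays is 7: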

resolved | opened | week | year | period
56 | 30 | 13 | 2012 | 2012,13
237 | 222 | 14 | 2012 | 2012,14

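"resolved" and "opened" are plotted on a graph (y) against the period (x). I would like to add another column, "trend", that returns a number which, when plotted over the period, forms a trend line (simple linear regression). I do want to use both sets of values as a single data source for the trend.

This is my code: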

SELECT a.resolved, b.opened, a.weekClosed AS week, a.yearClosed AS year, 
    CAST(a.yearClosed as varchar(5)) + ', ' + CAST(a.weekClosed as varchar(5)) AS period 
FROM 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
    FROM v_rpt_Service 
    WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
LEFT OUTER JOIN 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) 
    } AS yearEntered 
    FROM v_rpt_Service AS v_rpt_Service_1 
    WHERE  (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered 
ORDER BY year, week 

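Edit:

According to serc.carleton.edu/files/mathyouneed/best_fit_line_dividing.pdf, it seems I want to split the data in half and then calculate the average of each half. I then need to find the line of best fit through those averages and use y = mx + b, with the slope and the y-intercept, to calculate the value to return as "trend".

I know this is very possible in SQL; however, the program I am plugging the SQL into limits what I can do.

The red and blue dots are the numbers I am returning now (opened and resolved). To create the purple line, I need to return a "trend" value for each period. (This image is hypothetical.)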

Hypothetical Chart

+0

Is this for MS SQL Server or for a different RDBMS? – 2012-04-11 13:46:43

+0

MS SQL Server is correct. – 2012-04-11 13:59:11

Answers

1

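I figured it out. I split the data into multiple derived tables and subqueries, basically dividing the data in half. These are the formulas I used to get each value: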

*(each row is a week)* 
y1 = average of data first half 
y2 = average of data second half 
x1 = 1/4 of number of weeks 
x2 = 3/4 of number of weeks 
m = (y1-y2)/(x1-x2) 
b = y2 - (m * x2) 
trend = (m * row_number) + b 
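
The formulas above can be sketched on a tiny data set. This is a minimal illustration only, assuming a hypothetical table variable @weekly with made-up weekly totals; it is not the asker's actual query, and the NULLIF guard is an addition that is not part of the original formulas.

```sql
-- Hypothetical data: one row per week, values invented for illustration.
DECLARE @weekly TABLE (row_num int, total float)
INSERT INTO @weekly (row_num, total)
SELECT 1, 10 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3, 18 UNION ALL
SELECT 4, 22

DECLARE @n float
SELECT @n = COUNT(*) FROM @weekly

;WITH pts AS (
    SELECT
         AVG(CASE WHEN row_num <= @n / 2 THEN total END) AS y1  -- average of first half
        ,AVG(CASE WHEN row_num >  @n / 2 THEN total END) AS y2  -- average of second half
        ,@n / 4     AS x1                                        -- 1/4 of number of weeks
        ,@n * 3 / 4 AS x2                                        -- 3/4 of number of weeks
    FROM @weekly
)
SELECT
     w.row_num
    ,w.total
    ,m.slope * w.row_num + (p.y2 - m.slope * p.x2) AS trend     -- trend = m * row + b
FROM @weekly w
CROSS JOIN pts p
-- NULLIF returns NULL instead of raising a divide-by-zero error when x1 = x2
CROSS APPLY (SELECT (p.y1 - p.y2) / NULLIF(p.x1 - p.x2, 0) AS slope) m
ORDER BY w.row_num
```

With these four rows, y1 = 12, y2 = 20, x1 = 1 and x2 = 3, so the slope m works out to 4 and the intercept b to 8.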

Here is my (very messy) SQL code:

SELECT resolved_half1,resolved_half2,opened_half1,opened_half2, c.period, 
((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as y1, 
((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as y2, 
((COUNT(c.period) OVER())/4) as x1, 
(((COUNT(c.period) OVER())/4) * 3) as x2, 
((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) as m, 
(CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float) - (((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (((COUNT(c.period) OVER())/4) * 3))) as b, 
((((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed))) + (CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float) - (((CAST(((SUM (resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((COUNT(resolved_half1) OVER() + COUNT(opened_half1) OVER())/2) as float) - CAST(((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(COUNT(resolved_half2) OVER() + COUNT (opened_half2) OVER())) as float))/(CAST(((COUNT(c.period) OVER())/4) as float) - CAST((((COUNT(c.period) OVER())/4) * 3) as float))) * (((COUNT(c.period) OVER())/4) * 3)))) as trend, 
ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed) as row 

FROM 
    (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period 
    FROM (SELECT  TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half1, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
          FROM   v_rpt_Service 
     WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) 

     GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
     LEFT OUTER JOIN 
     (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half1, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered 
     FROM v_rpt_Service AS v_rpt_Service_1 
     WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0)) 
     GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered) as c 
     LEFT OUTER JOIN 
     (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period 
     FROM (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half2, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed 
     FROM v_rpt_Service 
     WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180/2), 0)) 
     GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS d 
     LEFT OUTER JOIN 
     (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half2, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered)} AS yearEntered 
     FROM v_rpt_Service AS v_rpt_Service_1 
     WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180/2), 0)) 
     GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS e ON d.weekClosed = e.weekEntered AND d.yearClosed = e.yearEntered 
) as f ON c.yearClosed = f.yearClosed AND c.weekClosed = f.weekClosed AND c.weekEntered = f.weekEntered AND c.yearEntered = f.yearEntered AND c.period = f.period 
GROUP BY c.period, resolved_half1,resolved_half2,opened_half1,opened_half2,c.yearClosed,c.weekClosed 
ORDER BY row 

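This code uses a hard-coded value of 180 days. I still need to be able to use a variable to select the number of days (without getting divide-by-zero errors), and the code really needs cleaning up. If someone can do those two things for me (I am not the best at SQL), the bounty is theirs.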
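
One plausible source of the divide-by-zero error is integer division: with fewer than four weeks of data, COUNT(...)/4 truncates to 0, so x1 and x2 are both 0 and x1 - x2 is 0. The sketch below shows the effect and two possible guards; the variable @weeks is hypothetical and simply stands in for the number of weekly rows.

```sql
DECLARE @NoOfDays int
SET @NoOfDays = 7                      -- a short window, as in the question's example

DECLARE @weeks int                     -- hypothetical stand-in for COUNT(period) OVER()
SET @weeks = @NoOfDays / 7             -- 1

SELECT
     @weeks / 4                        AS x1_int       -- 0: integer division truncates
    ,(@weeks / 4) * 3                  AS x2_int       -- 0
    ,@weeks / 4.0                      AS x1_decimal   -- 0.25: a decimal divisor keeps the fraction
    ,(@weeks / 4.0) * 3                AS x2_decimal   -- 0.75
    -- NULLIF turns a zero denominator into NULL, so the result is NULL rather than an error
    ,1.0 / NULLIF((@weeks / 4) - ((@weeks / 4) * 3), 0) AS guarded_slope_factor
```

Dividing by 4.0 keeps x1 and x2 distinct for any non-empty result, while NULLIF makes any remaining zero denominator yield NULL instead of failing the whole query.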

Image:

Chart

0

I think this will do the trick. If not, post some actual sample data and I will see if I can tweak it to work:

DECLARE @noOfDays INT 
SET @noofdays = 180 

;WITH tickets AS 
(
SELECT DISTINCT 
DATENAME(YEAR,date_closed) + RIGHT('000' + CAST(DATEPART(WEEK,date_closed) AS VARCHAR(5)),3) as Period 
,TicketNbr 
,1 as ticket_type --resolved 
FROM v_rpt_Service 
WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
UNION ALL 
SELECT DISTINCT 
DATENAME(YEAR,date_entered) + RIGHT('000' + CAST(DATEPART(WEEK,date_entered) AS VARCHAR(5)),3) as Period 
,TicketNbr 
,0 as ticket_type --opened 
FROM v_rpt_Service 
WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
) 
,tickets2 AS 
(
SELECT 
Period 
,SUM(CASE WHEN ticket_type = 0 THEN 1 ELSE 0 END) as opened 
,SUM(CASE WHEN ticket_type = 1 THEN 1 ELSE 0 END) as closed 
FROM tickets 
GROUP BY 
Period 
) 
,tickets3 AS 
(
SELECT 
Period 
,row_number() OVER (ORDER BY period ASC) as row 
,opened 
,closed 
,COUNT(period) OVER() * 1.0 as base -- * 1.0 avoids integer division in the formulas below 
,SUM(opened) OVER() as [Sumopened] 
,SUM(opened * opened) OVER() as [Sumopened^2] 
,SUM(opened * closed) OVER() as [Sumopenedclosed] 
,SUM(closed) OVER() as [Sumclosed] 
,SUM(closed * closed) OVER() as [Sumclosed^2] 
,SUM(opened * closed) OVER() * COUNT(period) OVER() AS [nSumopenedclosed] 
,SUM(opened) OVER() * SUM(closed) OVER() AS [Sumopened*Sumclosed] 
,SUM(opened * opened) OVER() * COUNT(period) OVER() AS [nSumopened^2] 
,SUM(opened) OVER() * SUM(opened) OVER() as [Sumopened*Sumopened] 
FROM tickets2 
) 
--Formula for linear regression is Y = A + BX 
SELECT 
period 
,opened 
,closed 
,((1.0/base) * [Sumclosed]) - 
([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) *((1.0/base) * [Sumopened]) 
+ row * ([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) 
AS trend_point 
,((1.0/base) * [Sumclosed]) - 
([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) *((1.0/base) * [Sumopened]) AS A 
,([Sumopenedclosed] - ([Sumopened*Sumclosed]/base))/([Sumopened^2] - ([Sumopened*Sumopened]/base)) as B 
from tickets3 
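
Note that the query above fits closed as a function of opened and then evaluates that line at the row number. If the aim is a trend over time, one option is to regress the weekly value directly against the row number. The following is a minimal sketch of that idea using the standard least-squares formulas; the table variable @points and its values are hypothetical.

```sql
-- Hypothetical data: x is the week's row number, y the value to trend.
DECLARE @points TABLE (row_num int, total float)
INSERT INTO @points (row_num, total)
SELECT 1, 10 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3, 18 UNION ALL
SELECT 4, 22

;WITH sums AS (
    SELECT
         COUNT(*) * 1.0                AS n
        ,SUM(row_num * 1.0)            AS sx    -- sum of x
        ,SUM(total)                    AS sy    -- sum of y
        ,SUM(row_num * total)          AS sxy   -- sum of x*y
        ,SUM(row_num * row_num * 1.0)  AS sxx   -- sum of x^2
    FROM @points
)
,fit AS (
    SELECT
         -- slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); NULLIF guards a zero denominator
         (n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0) AS slope
        ,n, sx, sy
    FROM sums
)
SELECT
     p.row_num
    ,p.total
    -- intercept = (Sy - slope*Sx) / n
    ,f.slope * p.row_num + (f.sy - f.slope * f.sx) / f.n AS trend
FROM @points p
CROSS JOIN fit f
ORDER BY p.row_num
```

For this sample the slope is 4 and the intercept is 6, so the trend values coincide with the data, which lie exactly on a line.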
3

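This question interested me, and I find that the best way to discuss a complex query is to reformat it using my own style and conventions. I applied them to your solution, and the result is below. I do not know whether this will be of any value to you...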

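  • There were several bits of code that I do not believe are part of MS T-SQL syntax, such as the { fn xxx } constructs and the WEEK(xxx) function.
  • This code compiles, but I cannot run it, as I do not have properly configured data tables.
  • I made a lot of coding changes that would take a lot of explanation, and I will skip most of it. If you want anything explained, add a comment.
  • I threw in a lot of white space. The difference between readable and unreadable code is often just in the eye and sensibility of the beholder, and you may hate my conventions.
  • I do not know what the final result set is supposed to be (i.e., which columns get returned).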

Some further notes:

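  • The query will not pick up a week in which no items were closed, even if items were entered during that week.
  • Weeks may be partial, i.e. not all seven days may be present (adjust @Interval to always include full weeks, but what about odd intervals?).
  • I multiplied the count(*) values by 1.0 to convert them to a non-integer (decimal) type early on (this avoids casts and integer math truncation).
  • Making it a CTE allows the formulas above to be replaced by symbols in the later formulas (at which point things became a lot clearer).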
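
The truncation mentioned in the notes above can be seen with a one-line query. This is only an illustrative sketch with literal values:

```sql
SELECT
     7 / 2                AS int_division      -- 3: both operands are int, the fraction is dropped
    ,7 * 1.0 / 2          AS decimal_division  -- 3.5: multiplying by 1.0 makes the value a decimal first
    ,CAST(7 AS float) / 2 AS float_division    -- 3.5: an explicit cast, which the * 1.0 trick avoids
```

Multiplying by 1.0 produces a decimal (numeric) value rather than a float, which is usually sufficient for averages and slopes such as the ones in this query.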

So here is what I came up with:

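-- Assumed declarations: the query below references @FullInterval and @HalfInterval
-- but never defines them; the date arithmetic is taken from the original query.
DECLARE @Interval int, @FullInterval datetime, @HalfInterval datetime
SET @Interval = 180
SET @FullInterval = DateAdd(Day, DateDiff(Day, 0, GetDate()) - @Interval, 0)
SET @HalfInterval = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (@Interval / 2), 0)

;WITH cte as (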
select 
    c.period 
    ,resolved_half1 
    ,resolved_half2 
    ,opened_half1 
    ,opened_half2 
    ,row = row_number() over(order by c.yearClosed, c.weekClosed) 
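    -- OVER() makes each aggregate span all weeks, as in the original query;
    -- without it the GROUP BY below would reduce every SUM/COUNT to a single row
    ,y1 = ((SUM(resolved_half1) OVER() + SUM(opened_half1) OVER()) - (SUM(resolved_half2) OVER() + SUM(opened_half2) OVER()))/((count(resolved_half1) OVER() + count(opened_half1) OVER())/2) 
    ,y2 = ((SUM(resolved_half2) OVER() + SUM(opened_half2) OVER())/(count(resolved_half2) OVER() + COUNT(opened_half2) OVER())) 
    -- dividing by 4.0 instead of 4 avoids integer truncation, so x1 - x2 is never 0
    ,x1 = ((count(c.period) OVER())/4.0) 
    ,x2 = (((count(c.period) OVER())/4.0) * 3) 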
from (select 
      a.yearclosed 
     ,a.weekClosed 
     ,a.resolved_half1 
     ,b.yearEntered 
     ,b.weekEntered 
     ,b.opened_half1 
     ,cast(a.yearClosed as varchar(5)) + ', ' + cast(a.weekClosed as varchar(5)) period 
     from (-- Number of items per week that closed within @Interval 
       select 
       count(distinct TicketNbr) * 1.0 resolved_half1 
       ,datepart(wk, date_closed)  weekClosed 
       ,year(date_closed)    yearClosed 
       from v_rpt_Service 
       where date_closed >= @FullInterval 
       group by 
       datepart(wk, date_closed) 
       ,year(date_closed)) a 
     left outer join (-- Number of items per week that were entered within @Interval 
          select 
          count(distinct TicketNbr) * 1.0 opened_half1 
          ,datepart(wk, date_entered)  weekEntered 
          ,year(date_entered)    yearEntered 
          from v_rpt_Service 
          where date_entered >= @FullInterval 
          group by 
          datepart(wk, date_entered) 
          ,year(date_entered)) b 
      on a.weekClosed = b.weekEntered 
      and a.yearClosed = b.yearEntered) c 
    left outer join (select 
         d.yearclosed 
         ,d.weekClosed 
         ,d.resolved_half2 
         ,e.yearEntered 
         ,e.weekEntered 
         ,e.opened_half2 
         ,cast(yearClosed as varchar(5)) + ', ' + cast(weekClosed as varchar(5)) period 
        from (select 
          count(distinct TicketNbr) * 1.0 resolved_half2 
          ,datepart(wk, date_closed)  weekClosed 
          ,year(date_closed)    yearClosed 
          from v_rpt_Service 
          where date_closed >= @HalfInterval 
          group by 
          datepart(wk, date_closed) 
          ,year(date_closed)) d 
        left outer join (select 
             count(distinct TicketNbr) * 1.0 opened_half2 
             ,datepart(wk, date_entered)  weekEntered 
             ,year(date_entered)    yearEntered 
             from v_rpt_Service 
             where date_entered >= @HalfInterval 
             group by 
              datepart(wk, date_entered) 
              ,year(date_entered)) e 
         on d.weekClosed = e.weekEntered 
         and d.yearClosed = e.yearEntered) f 
    on c.period = f.period 
group by 
    c.period 
    ,resolved_half1 
    ,resolved_half2 
    ,opened_half1 
    ,opened_half2 
    ,c.yearClosed 
    ,c.weekClosed 
) 
SELECT 
    row 
    ,Period 
    ,x1 
    ,y1 
    ,x2 
    ,y2 
    ,m = ((y1 - y2)/(x1 - x2)) 
    ,b = (y2 - (((y1 - y2)/(x1 - x2)) * x2)) 
    ,trend = ((((y1 - y2)/(x1 - x2)) * (row)) + (y2 - (((y1 - y2)/(x1 - x2)) * x2))) 
from cte 
order by row 

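As an addendum, all of subquery "c" could be replaced with the following, and "f" with a slightly modified version of it. Whether performance would be better or worse depends on table sizes, indexing, and other imponderables.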

select 
    datepart(wk, date_closed) weekClosed 
    ,year(date_closed)   yearClosed 
    ,count (distinct case 
        when date_closed >= @FullInterval then TicketNbr 
        else null 
       end)   resolved_half1 
    ,count (distinct case 
        when date_entered >= @FullInterval then TicketNbr 
        else null 
       end)   opened_half1 
from v_rpt_Service 
where date_closed >= @FullInterval 
    or date_entered >= @FullInterval 
group by 
    datepart(wk, date_closed) 
    ,year(date_closed)
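
The conditional COUNT(DISTINCT CASE ...) pattern used in the addendum can be tried in isolation. The sketch below is a minimal example on hypothetical data; the table variable @tickets, its rows, and the @cutoff date are invented for illustration and only borrow the column names from the question.

```sql
-- Hypothetical tickets; only the column names come from the question.
DECLARE @tickets TABLE (TicketNbr int, date_entered datetime, date_closed datetime)
INSERT INTO @tickets (TicketNbr, date_entered, date_closed)
SELECT 1, '20120102', '20120103' UNION ALL
SELECT 2, '20120102', '20120110' UNION ALL
SELECT 3, '20120109', '20120110'

DECLARE @cutoff datetime
SET @cutoff = '20120105'

SELECT
     -- CASE yields NULL for rows outside the window, and COUNT ignores NULLs
     COUNT(DISTINCT CASE WHEN date_closed  >= @cutoff THEN TicketNbr END) AS resolved
    ,COUNT(DISTINCT CASE WHEN date_entered >= @cutoff THEN TicketNbr END) AS opened
FROM @tickets
WHERE date_closed >= @cutoff
   OR date_entered >= @cutoff
```

For this sample, tickets 2 and 3 were closed on or after the cutoff and only ticket 3 was entered on or after it, so the query returns resolved = 2 and opened = 1 from a single pass over the table.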