2010-06-24 132 views
4

Filtering out bad data in SQL Server 2005

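I have a table with the following columns:

id, name, date, value

I want to select all rows in the table except those that belong to a run of four consecutive zeros when ordered by date. How would I do that? Below is an example of what I mean.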

id  name  date       value
1   a     1/1/2010   5
2   a     1/2/2010   3
3   a     1/3/2010   5
4   a     1/4/2010   0
5   a     1/7/2010   0
6   a     1/8/2010   0
7   a     1/9/2010   2
8   a     1/10/2010  3
9   a     1/11/2010  0
10  a     1/15/2010  0
11  a     1/16/2010  0
12  a     1/17/2010  0
13  a     1/20/2010  4
14  a     1/21/2010  4

I want the query result to include every row except ids 9-12.
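To try the answers against the sample data, a minimal setup sketch follows. The table name dbo.My_Table and the column types are assumptions, since the question does not give them (the answers below use tablename, My_Table and IdsAndValues). DATETIME is used because SQL Server 2005 has no DATE type, and the rows are loaded with INSERT ... SELECT ... UNION ALL because multi-row VALUES lists only arrived in SQL Server 2008.

```sql
-- Hypothetical table name and column types; adjust to the real schema.
CREATE TABLE dbo.My_Table (
    id     INT         NOT NULL PRIMARY KEY,
    name   VARCHAR(50) NOT NULL,
    [date] DATETIME    NOT NULL,
    value  INT         NOT NULL
);

-- 'YYYYMMDD' literals are read the same way under every DATEFORMAT setting.
INSERT INTO dbo.My_Table (id, name, [date], value)
SELECT  1, 'a', '20100101', 5 UNION ALL
SELECT  2, 'a', '20100102', 3 UNION ALL
SELECT  3, 'a', '20100103', 5 UNION ALL
SELECT  4, 'a', '20100104', 0 UNION ALL
SELECT  5, 'a', '20100107', 0 UNION ALL
SELECT  6, 'a', '20100108', 0 UNION ALL
SELECT  7, 'a', '20100109', 2 UNION ALL
SELECT  8, 'a', '20100110', 3 UNION ALL
SELECT  9, 'a', '20100111', 0 UNION ALL
SELECT 10, 'a', '20100115', 0 UNION ALL
SELECT 11, 'a', '20100116', 0 UNION ALL
SELECT 12, 'a', '20100117', 0 UNION ALL
SELECT 13, 'a', '20100120', 4 UNION ALL
SELECT 14, 'a', '20100121', 4;
```

Ids 4-6 form a run of only three zeros, so they make a useful check that a query removes ids 9-12 and nothing else.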

+0

Interesting, just wondering what this is needed for... is it a business rule or just a learning exercise? – VoodooChild 2010-06-24 17:10:25

+0

It's a business requirement. We need to eliminate bad data points from the overall calculation. And all that jazz. – 2010-06-24 17:12:20

Answers

2

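This assumes you order the rows by id, but you can simply change ORDER BY id to something else and it should still work.

Using the T-SQL CTE found on the Kodyaz Development Resources site, I was able to create the code below. I have it working so that it removes rows that are part of two consecutive 0s rather than four, because I tested it on my own code and just changed the table/column names.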

WITH CTE AS (
    SELECT 
    RN = ROW_NUMBER() OVER (ORDER BY id), 
    * 
    FROM tablename 
) 
SELECT 
    [Current Row].* 
FROM CTE [Current Row] 
LEFT JOIN CTE [Previous Row] ON 
    [Previous Row].RN = [Current Row].RN - 1 
LEFT JOIN CTE [Next Row] ON 
    [Next Row].RN = [Current Row].RN + 1 
WHERE 
    -- ISNULL(..., 1) treats a missing neighbour (first or last row) as non-zero;
    -- without it, NOT (... AND NULL) is unknown and a zero in the first or last row is dropped 
    -- drop the row when its value is zero and the next row's value is zero 
    NOT ([Current Row].value = 0 AND ISNULL([Next Row].value, 1) = 0) AND 
    -- drop the row when its value is zero and the previous row's value is zero 
    NOT (ISNULL([Previous Row].value, 1) = 0 AND [Current Row].value = 0) 

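All you need to do to make it work for your case is to put every possible combination in the WHERE clause. For example, handle the case where this row and the next three rows equal 0, or where this row, the previous row and the next two rows equal 0, and so on. That also means joining the CTE to the rows up to three positions before and after the current row, not just the immediate neighbours.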
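One way to cover the four-zero case without writing out each combination is to ask, for every row, whether any window of four consecutive row numbers that contains it is made up entirely of zeros. The following is only a sketch of that idea, not the code tested in this answer; dbo.My_Table and its columns (id, name, date, value) are assumed names, and it uses only features available in SQL Server 2005.

```sql
WITH CTE AS (
    SELECT
        RN = ROW_NUMBER() OVER (ORDER BY id),
        *
    FROM dbo.My_Table            -- assumed table name
)
SELECT C.id, C.name, C.[date], C.value
FROM CTE C
WHERE NOT EXISTS (
    -- S is a candidate first row of a four-row window that contains C
    SELECT 1
    FROM CTE S
    WHERE S.RN BETWEEN C.RN - 3 AND C.RN
      -- the window S.RN .. S.RN + 3 is all zeros only if it holds four zero rows
      AND (SELECT COUNT(*)
           FROM CTE W
           WHERE W.RN BETWEEN S.RN AND S.RN + 3
             AND W.value = 0) = 4
)
ORDER BY C.id;
```

Because any run of five or more zeros contains an all-zero window of four around each of its rows, longer runs are removed as well. On the sample data in the question this should keep ids 4-6 (a run of three) and drop ids 9-12. The correlated subqueries are written for clarity rather than speed.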

+0

Great idea using ROW_NUMBER to guarantee a simple way of finding the next row, +1 – 2010-06-24 17:33:37

1

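You didn't mention how name is involved, so I'm assuming that you want this done per name. I'll further assume that when you talk about "consecutive" you mean in date order, not in id order. Finally, I'll also assume that you want to exclude 5 zeros in a row, 6 zeros in a row, etc.

There may be a simpler way, but this should work: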

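-- Rows where a run of zeros starts: a zero row with no other row between it
-- and the closest earlier non-zero row for the same name (or no earlier row at all)
;WITH Transitions_To_CTE AS 
(
    SELECT 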
     T1.id, 
     T1.name, 
     T1.date, 
     T1.value 
    FROM 
     My_Table T1 
    LEFT OUTER JOIN My_Table T2 ON 
     T2.name = T1.name AND 
     T2.date < T1.date AND 
     T2.value <> 0 
    LEFT OUTER JOIN My_Table T3 ON 
     T3.name = T1.name AND 
     T3.date > COALESCE(T2.date, '19000101') AND 
     T3.date < T1.date 
    WHERE 
     T1.value = 0 AND 
     T3.id IS NULL 
), 
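-- Rows where a run of zeros ends: a zero row with no other row between it
-- and the closest later non-zero row for the same name (or no later row at all)
Transitions_From_CTE AS 
(
    SELECT 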
     T1.id, 
     T1.name, 
     T1.date, 
     T1.value 
    FROM 
     My_Table T1 
    LEFT OUTER JOIN My_Table T2 ON 
     T2.name = T1.name AND 
     T2.date > T1.date AND 
     T2.value <> 0 
    LEFT OUTER JOIN My_Table T3 ON 
     T3.name = T1.name AND 
     -- '99991231' (YYYYMMDD) parses the same under any DATEFORMAT; '9999-12-31' fails under dmy 
     T3.date < COALESCE(T2.date, '99991231') AND 
     T3.date > T1.date 
    WHERE 
     T1.value = 0 AND 
     T3.id IS NULL 
), 
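Range_Exclusions AS 
(
    -- Pair each run start (S) with the nearest run end (E) on or after it. 
    -- >= rather than > matters for a run of a single zero, whose start and end 
    -- are the same row: with > it would be paired with the end of a later run 
    -- and the excluded range would span non-zero rows. 
    SELECT 
     S.name, 
     S.date AS start_date, 
     E.date AS end_date 
    FROM 
     Transitions_To_CTE S 
    INNER JOIN Transitions_From_CTE E ON 
     E.name = S.name AND 
     E.date >= S.date 
    LEFT OUTER JOIN Transitions_From_CTE E2 ON 
     E2.name = S.name AND 
     E2.date >= S.date AND 
     E2.date < E.date 
    WHERE 
     E2.id IS NULL AND 
     -- keep only runs that contain four or more rows 
     (SELECT COUNT(*) FROM dbo.My_Table T WHERE T.name = S.name AND T.date BETWEEN S.date AND E.date) >= 4 
) 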
SELECT 
    T.id, 
    T.name, 
    T.date, 
    T.value 
FROM 
    dbo.My_Table T 
WHERE 
    NOT EXISTS (SELECT * FROM Range_Exclusions RE WHERE RE.name = T.name AND T.date BETWEEN RE.start_date AND RE.end_date) 
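
A possibly simpler alternative on SQL Server 2005 is the row-number-difference ("gaps and islands") technique. This is a sketch under the same assumptions as the query above (per name, date order, runs of four or more zeros), not the answerer's query. It additionally assumes that dates are unique within a name and that the table is dbo.My_Table. Within one name, consecutive rows of the same class (zero or non-zero) share the same difference between their overall row number and their row number within the class, so that difference identifies each run.

```sql
WITH Numbered AS (
    SELECT
        id, name, [date], value,
        CASE WHEN value = 0 THEN 1 ELSE 0 END AS is_zero,
        -- position among all rows for the name
        ROW_NUMBER() OVER (PARTITION BY name ORDER BY [date]) AS rn_all,
        -- position among the rows for the name in the same zero / non-zero class
        ROW_NUMBER() OVER (PARTITION BY name,
                                        CASE WHEN value = 0 THEN 1 ELSE 0 END
                           ORDER BY [date]) AS rn_class
    FROM dbo.My_Table            -- assumed table name
),
Runs AS (
    SELECT
        id, name, [date], value, is_zero,
        -- rn_all - rn_class is constant within one unbroken run
        COUNT(*) OVER (PARTITION BY name, is_zero, rn_all - rn_class) AS run_length
    FROM Numbered
)
SELECT id, name, [date], value
FROM Runs
WHERE NOT (is_zero = 1 AND run_length >= 4)
ORDER BY name, [date];
```

Filtering on is_zero rather than on value keeps the final predicate free of NULL comparisons, so a NULL value is treated as non-zero and the row is kept. On the sample data in the question this should return every row except ids 9-12.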
+0

Thanks. +1 for your answer and for almost beating me to it. – Kyra 2010-06-24 17:36:54

0

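Here is my attempt: use a recursive CTE to work out the number of consecutive zeros, then use level >= 4 to build a sequence of ids, and then simply do a NOT IN clause on id.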

-- Assumes ids are contiguous (no gaps), because rows are chained on id + 1 
WITH trend AS -- work out the number of consecutive zeros using [level] 
(
    SELECT 1 AS [level], id, value, id AS startid 
    FROM IdsAndValues 
    UNION ALL 
    SELECT t.[level] + 1, p.id, p.value, t.startid 
    FROM IdsAndValues AS p 
    INNER JOIN trend AS t ON p.id = t.id + 1 
    WHERE t.value = 0 AND p.value = 0 
), 
IDs AS -- expand each run of 4 or more zeros (startid..id) into one row per id, so we can do the NOT IN 
(
    SELECT startid AS ExcludeID, id 
    FROM trend 
    WHERE [level] >= 4 
    UNION ALL 
    SELECT ExcludeID + 1, id 
    FROM IDs 
    WHERE ExcludeID < id 
) 
SELECT * 
FROM IdsAndValues 
WHERE id NOT IN 
    (SELECT ExcludeID FROM IDs)