2017-07-07 48 views
1

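How to check for overlapping time intervals in a type 2 SCD dimension

I have a type 2 SCD dimension in which I need to identify and fix some records with overlapping time intervals. What I have is: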

Bkey Uid startDate      endDate 
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp) 
'John' 2 2016-11-03 (some time stamp) 2016-11-14 (some time stamp) 
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp) 
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp) 
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp) 
...... 

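First, I want to find all the Johns that have overlapping time periods, in a table with many, many Johns, and then come up with a way to correct those overlapping periods. For the latter I know there are functions such as LAG and LEAD that can handle it, but how to find the overlapping rows in the first place eludes me. Any hints? Regards,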
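One possible way to approach the detection step is sketched below in Python (not T-SQL) so that it can be run standalone. Comparing each row only with its immediate predecessor, as a plain LAG would, misses rows 3 and 4, which sit inside the long row 1 but merely touch their direct neighbours. The sketch therefore tracks the latest endDate seen so far per Bkey. It assumes half-open intervals [startDate, endDate) and ignores the time portion; the function name is hypothetical. In SQL Server the same idea could probably be expressed with MAX(endDate) OVER (PARTITION BY Bkey ORDER BY startDate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), but that is an assumption about how you would port it, not tested here.

```python
from datetime import date

# Sample rows from the question: (Bkey, Uid, startDate, endDate), dates only.
rows = [
    ("John", 1, date(1990, 1, 1), date(2017, 1, 10)),
    ("John", 2, date(2016, 11, 3), date(2016, 11, 14)),
    ("John", 3, date(2016, 11, 14), date(2016, 12, 29)),
    ("John", 4, date(2016, 12, 29), date(2017, 1, 10)),
    ("John", 5, date(2017, 1, 10), date(2017, 4, 22)),
]


def rows_starting_inside_earlier(rows):
    """Return the Uids of rows that start before an earlier row of the same Bkey ends.

    Assumes half-open intervals: a row starting exactly on the previous
    endDate is NOT treated as overlapping.
    """
    flagged = []
    latest_end = {}  # Bkey -> latest endDate among rows already visited
    for bkey, uid, start, end in sorted(rows, key=lambda r: (r[0], r[2], r[3])):
        prev_end = latest_end.get(bkey)
        if prev_end is not None and start < prev_end:
            flagged.append(uid)
        latest_end[bkey] = end if prev_end is None else max(prev_end, end)
    return flagged


print(rows_starting_inside_earlier(rows))  # [2, 3, 4]
```

Note that this flags the later rows of each conflict (2, 3 and 4); the long-running row 1 is the other party in every case. Changing `start < prev_end` to `start <= prev_end` would treat touching ranges as overlapping as well.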

+0

To correct it: split each interval into atoms (single days), take DISTINCT, and find the islands. – Serg

+0

I have added the declaration of the @Groups table variable [Edit 1]. –

Answers

1

[1] The following query returns the overlapping time ranges:

SELECT *, 
     (
      SELECT * 
      FROM @Dimension1 y 
      WHERE x.Bkey = y.Bkey 
      AND  x.Uid <> y.Uid 
      AND  NOT(x.startDate > y.endDate OR x.endDate < y.startDate) 
      FOR XML RAW, ROOT, TYPE 
     ) OverlappingTimeRanges 
FROM @Dimension1 x 

Full script:

DECLARE @Dimension1 TABLE (
    Bkey  VARCHAR(50) NOT NULL, 
    Uid   INT NOT NULL, 
    startDate DATE NOT NULL, 
    endDate  DATE NOT NULL, 
     CHECK(startDate < endDate) 
); 
INSERT @Dimension1 
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL 
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL 
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL 
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL 
SELECT 'John', 5, '2017-01-11', '2017-04-22'; 

SELECT *, 
     (
      SELECT * 
      FROM @Dimension1 y 
      WHERE x.Bkey = y.Bkey 
      AND  x.Uid <> y.Uid 
      AND  NOT(x.startDate > y.endDate OR x.endDate < y.startDate) 
      FOR XML RAW, ROOT, TYPE 
     ) OverlappingTimeRanges 
FROM @Dimension1 x 
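
To make the result of the query above easier to check by hand, here is a minimal Python sketch of the same pairwise test. It is not part of the original script: the function name is hypothetical, Uid is assumed to be unique, and the list simply mirrors the rows inserted into @Dimension1. Like the SQL, it treats both endpoints as inclusive, so ranges that merely touch (for example Uid 2 and Uid 3) count as overlapping.

```python
from datetime import date

# Same rows as the INSERT into @Dimension1: (Bkey, Uid, startDate, endDate).
rows = [
    ("John", 1, date(1990, 1, 1), date(2017, 1, 10)),
    ("John", 2, date(2016, 11, 3), date(2016, 11, 14)),
    ("John", 3, date(2016, 11, 14), date(2016, 12, 29)),
    ("John", 4, date(2016, 12, 29), date(2017, 1, 10)),
    ("John", 5, date(2017, 1, 11), date(2017, 4, 22)),
]


def overlapping_uids(rows):
    """Map each Uid to the Uids of other rows with the same Bkey that overlap it."""
    result = {}
    for bkey_x, uid_x, start_x, end_x in rows:
        result[uid_x] = [
            uid_y
            for bkey_y, uid_y, start_y, end_y in rows
            if bkey_x == bkey_y
            and uid_x != uid_y
            # Same predicate as the correlated subquery (closed intervals).
            and not (start_x > end_y or end_x < start_y)
        ]
    return result


for uid, others in overlapping_uids(rows).items():
    print(uid, others)
```

With this data, Uid 5 gets an empty list, which corresponds to a NULL OverlappingTimeRanges column in the SQL result.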


[2] To find the distinct groups of time ranges that overlap with the original rows, I would use the following approach:

-- Edit 1 
DECLARE @Groups TABLE (
    Bkey   VARCHAR(50) NOT NULL, 
    Uid    INT NOT NULL, 
    startDateNew DATE NOT NULL, 
    endDateNew  DATE NOT NULL, 
     CHECK(startDateNew < endDateNew) 
); 
INSERT @Groups 
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew 
FROM @Dimension1 x 
OUTER APPLY (
    SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew 
    FROM @Dimension1 y 
    WHERE x.Bkey = y.Bkey 
    AND  NOT(x.startDate > y.endDate OR x.endDate < y.startDate) 
) z 
-- End of Edit 1 

-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group 
SELECT * 
FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.* 
    FROM (
     SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew 
     FROM @Groups a 
    ) b 
) c 
OUTER APPLY (
    SELECT d.Uid AS Overlapping_Uid 
    FROM @Groups d 
    WHERE c.Bkey = d.Bkey 
    AND  c.startDateNew = d.startDateNew 
    AND  c.endDateNew = d.endDateNew 
) e 

-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s) 
SELECT * 
FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.* 
    FROM (
     SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew 
     FROM @Groups a 
    ) b 
) c 
OUTER APPLY (
    SELECT (
    SELECT d.Uid AS Overlapping_Uid 
    FROM @Groups d 
    WHERE c.Bkey = d.Bkey 
    AND  c.startDateNew = d.startDateNew 
    AND  c.endDateNew = d.endDateNew 
    FOR XML RAW, TYPE 
    ) AS XmlCol 
) e 


Note: in my example the last range is 'John', 5, '2017-01-11', '2017-04-22'; rather than 'John', 5, '2017-01-10', '2017-04-22';. Also, the data type used is DATE rather than DATETIME[2][OFFSET].
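
As a cross-check of the grouping idea in [2], here is a minimal Python sketch that uses a different technique: sort by start date and merge. It is not a translation of the OUTER APPLY queries above. Those take MIN/MAX over the rows that directly overlap each row, whereas this sketch merges transitively, so the two can disagree when rows form a chain (A overlaps B, B overlaps C, but A does not overlap C). The function name is hypothetical; the dictionary keys reuse the column names from @Groups, and endpoints are treated as inclusive, as in the SQL.

```python
from datetime import date

# Same rows as the INSERT into @Dimension1: (Bkey, Uid, startDate, endDate).
rows = [
    ("John", 1, date(1990, 1, 1), date(2017, 1, 10)),
    ("John", 2, date(2016, 11, 3), date(2016, 11, 14)),
    ("John", 3, date(2016, 11, 14), date(2016, 12, 29)),
    ("John", 4, date(2016, 12, 29), date(2017, 1, 10)),
    ("John", 5, date(2017, 1, 11), date(2017, 4, 22)),
]


def merge_into_groups(rows):
    """Merge rows of the same Bkey whose closed ranges overlap, directly or via a chain."""
    groups = []
    for bkey, uid, start, end in sorted(rows, key=lambda r: (r[0], r[2], r[3])):
        last = groups[-1] if groups else None
        if last and last["Bkey"] == bkey and start <= last["endDateNew"]:
            # The row starts inside (or exactly at the end of) the current group.
            last["endDateNew"] = max(last["endDateNew"], end)
            last["Uids"].append(uid)
        else:
            groups.append(
                {"Bkey": bkey, "startDateNew": start, "endDateNew": end, "Uids": [uid]}
            )
    return groups


for group_id, g in enumerate(merge_into_groups(rows), start=1):
    print(group_id, g["Bkey"], g["startDateNew"], g["endDateNew"], g["Uids"])
```

For the sample data this yields two groups: Uids 1 to 4 spanning 1990-01-01 to 2017-01-10, and Uid 5 on its own.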

0

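I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self-join on the condition that a row on the left overlaps with any row on the right. All matching rows are overlapping rows.

There are four possible cases to consider: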

|---------| |---------| no overlap 

|---------| 
     |---------|   1st end and 2nd start overlap 

     |---------| 
|---------|     1st start and 2nd end overlap 

|---------| 
    |---|     2nd completely contained inside 1st 
          (could be 1st inside 2nd also) 
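
The four cases above can be checked with a small Python sketch. It is only an illustration: the integer endpoints are made-up stand-ins for dates, and both function names are hypothetical. It also shows that the positive form of the test (each range starts no later than the other ends) and the negated form (neither range lies entirely after the other) agree, as De Morgan's law says they must.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Closed-interval overlap test: each range starts no later than the other ends."""
    return start_a <= end_b and start_b <= end_a


def overlaps_negated(start_a, end_a, start_b, end_b):
    """Equivalent test written as 'not (one range lies entirely after the other)'."""
    return not (start_a > end_b or end_a < start_b)


# Made-up endpoints illustrating the four cases from the diagram.
cases = {
    "no overlap": ((1, 10), (12, 21)),
    "1st end and 2nd start overlap": ((1, 10), (6, 15)),
    "1st start and 2nd end overlap": ((6, 15), (1, 10)),
    "2nd completely contained inside 1st": ((1, 10), (4, 7)),
}

for label, (a, b) in cases.items():
    print(label, overlaps(*a, *b), overlaps_negated(*a, *b))
```

Only the first case evaluates to False; the other three are True under both forms.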

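SELECT DISTINCT 
    t1.Uid 
FROM yourTable t1 
INNER JOIN yourTable t2 
    ON t1.Bkey = t2.Bkey AND          -- compare rows of the same business key only
     t1.Uid <> t2.Uid AND             -- every row overlaps itself, so skip that match
     t1.startDate <= t2.endDate AND 
     t2.startDate <= t1.endDate 
WHERE 
    t1.Bkey = 'John' 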

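This will at least let you identify the overlapping records. Updating and separating them in a meaningful way may well turn into an ugly gaps-and-islands problem, which is probably worth a separate question.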
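
For what the gaps-and-islands step might look like, here is a rough Python sketch along the lines of Serg's comment on the question: split each range into single days, de-duplicate, then collapse consecutive days into islands. The sample ranges and the function name are hypothetical, endpoints are treated as inclusive, and the day-by-day expansion is only sensible for DATE granularity and moderately sized ranges. It tells you which periods are covered at all; deciding which row should own each period is a separate business decision.

```python
from datetime import date, timedelta


def islands_from_days(ranges):
    """Collapse closed (start, end) date ranges for one Bkey into covered islands."""
    one_day = timedelta(days=1)

    # Step 1 and 2: split every range into single days; the set de-duplicates them.
    days = set()
    for start, end in ranges:
        current = start
        while current <= end:
            days.add(current)
            current += one_day

    # Step 3: walk the days in order and start a new island at every gap.
    islands = []
    for day in sorted(days):
        if islands and day - islands[-1][1] == one_day:
            islands[-1][1] = day
        else:
            islands.append([day, day])
    return [tuple(island) for island in islands]


# Hypothetical ranges: the first two overlap, the third is separated by a gap.
sample = [
    (date(2017, 1, 1), date(2017, 1, 5)),
    (date(2017, 1, 4), date(2017, 1, 8)),
    (date(2017, 1, 12), date(2017, 1, 15)),
]

for start, end in islands_from_days(sample):
    print(start, end)
```

For the hypothetical ranges this produces two islands: 2017-01-01 to 2017-01-08 and 2017-01-12 to 2017-01-15.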

+2

Have a look at the [overlap tag wiki](https://stackoverflow.com/tags/overlap/info), which shows the simplest test for overlapping elements. –
