2017-02-21 85 views
0

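Remove duplicate values in a row (combination of 2 columns)

I have a requirement to remove duplicate values that occur within a single row. For example: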

C1 | C2 | C3 | C4 | C5 | C6 
---------------------------- 
1 | 2 | 1 | 2 | 1 | 3 
1 | 2 | 1 | 3 | 1 | 4 
1 |NULL| 1 |NULL| 1 |NULL 

The output of the query should be:

C1 | C2 | C3 | C4 | C5 | C6 
---------------------------- 
1 | 2 | 1 | 3 |NULL|NULL 
1 | 2 | 1 | 3 | 1 | 4 
1 |NULL|NULL|NULL|NULL|NULL 

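As you can see, each combination of 2 columns should be unique within a row.

In row 1: the 1/2 combination is duplicated, so the repeat is removed and the 1/3 pair in C5/C6 moves to C3/C4.

In row 2: none of the combinations 1/2, 1/3, and 1/4 is repeated, so the result is unchanged.

In row 3: all 3 combinations are the same, because 1/NULL appears in every pair, so C3 through C6 are set to NULL.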

Thanks in advance.
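The rule described in the question can be sketched outside SQL to pin down the expected behaviour. The following is a minimal Python sketch, not a solution from the thread: the function name `dedupe_pairs` is hypothetical, and it assumes that the first occurrence of each pair keeps its position and that the freed columns are padded with NULL (`None`).

```python
from typing import List, Optional, Sequence, Set, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def dedupe_pairs(row: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Keep the first occurrence of each column pair and pad the rest with None.

    Hypothetical helper: columns are read as consecutive pairs
    (C1, C2), (C3, C4), (C5, C6), ...
    """
    if len(row) % 2 != 0:
        raise ValueError("row must contain an even number of columns")

    seen: Set[Pair] = set()
    kept: List[Optional[int]] = []
    for i in range(0, len(row), 2):
        pair = (row[i], row[i + 1])
        # Python treats None == None as true, so (1, None) repeats count as
        # duplicates, matching how DISTINCT and UNION treat NULLs in SQL.
        if pair not in seen:
            seen.add(pair)
            kept.extend(pair)

    # Surviving pairs shift left; the remaining columns become None.
    kept.extend([None] * (len(row) - len(kept)))
    return kept


if __name__ == "__main__":
    for sample in ([1, 2, 1, 2, 1, 3], [1, 2, 1, 3, 1, 4], [1, None, 1, None, 1, None]):
        print(sample, "->", dedupe_pairs(sample))
```

Applied to the three sample rows, this reproduces the expected output table from the question.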

+2

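Your question is not very precise. What exactly do you need? Your example can be interpreted in many ways, and your description is somewhat self-contradictory. "Remove duplicate values that exist in a row" is not the same as "the combination of two columns should be unique within a row". Also, which combinations should be unique? – m00hk00h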

+0

Updated the question! – Biswabid

+0

Do all the columns come from the same table? What does your query look like? –

Answers

0

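Maybe there is a smarter way... but you can convert the columns into pairs, make them distinct (which UNION does in this case), and then pivot them back.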

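with pairs as (
    -- unpivot the three column pairs; UNION (not UNION ALL) drops duplicate pairs per id
    -- assumes mytable has a unique id column
    select id, c1 as x, c2 as y from mytable
    union
    select id, c3, c4 from mytable
    union
    select id, c5, c6 from mytable
)
select id,
     max(decode(rn,1,x)) c1,
     max(decode(rn,1,y)) c2,
     max(decode(rn,2,x)) c3,
     max(decode(rn,2,y)) c4,
     max(decode(rn,3,x)) c5,
     max(decode(rn,3,y)) c6
from (
    -- number the surviving pairs per id; Oracle requires an ORDER BY in this window,
    -- so the pairs come back sorted by value rather than in their source order
    select id, x, y, row_number() over (partition by id order by x, y) rn
    from pairs
) foo -- no AS keyword here: Oracle rejects AS before a table alias
group by id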
0

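This one works, and it includes test data, but it may take some time to understand.

A tip: uncomment the snippet under each "-- debug" line, copy the script up to and including that snippet, and paste that part into a SQL prompt to inspect the intermediate results.

The principle is to get a row identifier to "remember" each row; then pivot vertically, not 3 columns into 1, but 6 columns into 3 pairs of columns; then de-duplicate using DISTINCT; then obtain an index within each row identifier for the remaining intermediate rows; then use that index to pivot horizontally again.

Like this: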

WITH 
input(c1,c2,c3,c4,c5,c6) AS (
      SELECT 1,  2,1,  2,1,  3 
UNION ALL SELECT 1,  2,1,  3,1,  4 
UNION ALL SELECT 1,NULL::INT,1,NULL::INT,1,NULL::INT 
) 
, 
-- need rowid 
input_with_rowid AS (
SELECT ROW_NUMBER() OVER() AS rowid, * FROM input 
) 
, 
-- three groups of 2 columns, so pivot using 3 indexes 
idx3(idx) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3) 
, 
-- pivot vertically, two columns at a time and de-dupe 
pivot_pair AS (
SELECT DISTINCT 
    rowid 
, CASE idx 
    WHEN 1 THEN c1 
    WHEN 2 THEN c3 
    WHEN 3 THEN c5 
    END AS c1 
, 
    CASE idx 
    WHEN 1 THEN c2 
    WHEN 2 THEN c4 
    WHEN 3 THEN c6 
    END AS c2 
FROM input_with_rowid CROSS JOIN idx3 
) 
-- debug 
-- SELECT * FROM pivot_pair ORDER BY rowid; 
, 
-- add sequence per rowid 
pivot_pair_with_seq AS (
SELECT 
    rowid 
, ROW_NUMBER() OVER(PARTITION BY rowid) AS seq 
, c1 
, c2 
FROM pivot_pair 
) 
-- debug 
-- SELECT * FROM pivot_pair_with_seq; 

SELECT 
    rowid 
, MAX(CASE seq WHEN 1 THEN c1 END) AS c1 
, MAX(CASE seq WHEN 1 THEN c2 END) AS c2 
, MAX(CASE seq WHEN 2 THEN c1 END) AS c3 
, MAX(CASE seq WHEN 2 THEN c2 END) AS c4 
, MAX(CASE seq WHEN 3 THEN c1 END) AS c5 
, MAX(CASE seq WHEN 3 THEN c2 END) AS c6 
FROM pivot_pair_with_seq 
GROUP BY rowid 
ORDER BY rowid 
; 

rowid|c1|c2|c3|c4|c5|c6 
    1| 1| 2| 1| 3|- |- 
    2| 1| 2| 1| 3| 1| 4 
    3| 1|- |- |- |- |- 
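The steps of this query (row identifier, vertical pivot, de-duplication, sequence, horizontal pivot) can also be traced in a few lines of Python. This is only an illustrative sketch under stated assumptions: `dedupe_rows` and `sort_key` are hypothetical names, and because the SQL leaves the order inside each `PARTITION BY rowid` unspecified, the sketch sorts the pairs (None last) purely to get repeatable output.

```python
from collections import defaultdict

rows = [
    (1, 2, 1, 2, 1, 3),
    (1, 2, 1, 3, 1, 4),
    (1, None, 1, None, 1, None),
]


def sort_key(pair):
    # None cannot be compared with int, so sort None after any number.
    return tuple((v is None, 0 if v is None else v) for v in pair)


def dedupe_rows(rows):
    width = len(rows[0]) if rows else 0

    # input_with_rowid: attach a row identifier to "remember" each row.
    with_rowid = list(enumerate(rows, start=1))

    # pivot_pair: pivot vertically two columns at a time; the set plays
    # the role of SELECT DISTINCT.
    pivot_pair = {
        (rowid, r[i], r[i + 1])
        for rowid, r in with_rowid
        for i in range(0, width, 2)
    }

    # pivot_pair_with_seq: collect the surviving pairs per rowid.
    by_row = defaultdict(list)
    for rowid, a, b in pivot_pair:
        by_row[rowid].append((a, b))

    # Final SELECT: pivot horizontally again, padding with None.
    result = []
    for rowid in sorted(by_row):
        pairs = sorted(by_row[rowid], key=sort_key)  # assumption: fixed order
        flat = [v for pair in pairs for v in pair]
        flat += [None] * (width - len(flat))
        result.append((rowid, *flat))
    return result


if __name__ == "__main__":
    for line in dedupe_rows(rows):
        print(line)
```

With the sample data this matches the result table above. Note that a row such as `(1, 3, 1, 2, 1, 3)` comes back as `1, 2, 1, 3`, that is, the pairs are not guaranteed to keep their source order once they have passed through a set or DISTINCT.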
0

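This uses marcothesane's idea with the PIVOT/UNPIVOT operators. It is easier to maintain if more input columns need to be de-duplicated. It preserves the order of the source data (the column pairs), whereas marcothesane's solution may reorder the column pairs depending on the input data. It is also a bit slower than marcothesane's solution. It only runs on Oracle 11R1 and above.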

WITH 
input(c1,c2,c3,c4,c5,c6) AS (
      SELECT 1,  2,1,  2,1,  3 from dual 
UNION ALL SELECT 1,  2,1,  3,1,  4 from dual 
UNION ALL SELECT 1,NULL ,1,NULL ,1,NULL from dual 
) 
, 
-- need rowid 
input_with_rowid AS (
SELECT ROW_NUMBER() OVER (order by 1) AS row_id, input.* FROM input 
), 
unpivoted_pairs as 
(
    select row_id, tuple_idx, val1, val2, row_number() over (partition by row_id, val1, val2 order by tuple_idx) as keep_first 
    from input_with_rowid 
    UnPivot include nulls(
      (val1, val2) --measure 
       for tuple_idx in ((c1,c2) as 1, 
            (c3,c4) as 2, 
            (c5,c6) as 3) 
     ) 
) 
select row_id, 
     t1_val1 as c1, 
     t1_val2 as c2, 
     t2_val1 as c3, 
     t2_val2 as c4, 
     t3_val1 as c5, 
     t3_val2 as c6 
from (
     select row_id, 
      val1, val2, row_number() over (partition by row_id order by tuple_idx) as tuple_order 
     from unpivoted_pairs 
     where keep_first = 1 
    ) 
pivot (sum(val1) as val1, sum(val2) as val2 
     for tuple_order in ('1' as t1, '2' as t2, '3' as t3) 
     )