2016-11-18 35 views
0

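Remove duplicate records based on certain fields (email_id, role_id and dob) and on a condition (creation_date)

I have a table named Customer that contains records which are duplicates with respect to certain fields.
Customer table:
cust_id
email_id
role_id
dob
creation_date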

The data in the Customer table is as follows:

cust_id email_id   role_id  dob  creation_date 
1  [email protected]   5  4/2/1966  17/09/2016 
2  [email protected]   5  4/2/1966  20/09/2016 
3  [email protected]   5  15/2/1991  18/09/2016 
4  [email protected]   5  15/2/1991  21/09/2016 
5  [email protected]   5  16/2/1985  30/09/2016 
6  [email protected]   5  16/2/1985  05/11/2016 
7  [email protected]   5  16/2/1985  04/11/2016 

As the table above shows, email_id, role_id and dob are identical (duplicated) across two or more records.

I want separate queries that produce the following results:

cust_id email_id   role_id  dob  creation_date 
1  [email protected]   5  4/2/1966  17/09/2016  
3  [email protected]   5  15/2/1991  18/09/2016 
5  [email protected]   5  16/2/1985  30/09/2016  

That is, for each group of records duplicated on email_id, role_id and dob, keep only the record with the earliest creation_date.

cust_id email_id   role_id  dob  creation_date 
2  [email protected]   5  4/2/1966  20/09/2016  
4  [email protected]   5  15/2/1991  21/09/2016 
6  [email protected]   5  16/2/1985  05/11/2016 

That is, for each group of records duplicated on email_id, role_id and dob, keep only the record with the latest creation_date.

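Edit: a follow-up to the question above.

Now, when the data comes from a join between two tables named Customer and Individual, how do I get the same desired results as above?
Customer table:
cust_id
email_id
role_id
individaul_id (foreign key)
creation_date

Individual table:
individaul_id
dob

I am currently using the query below: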

SELECT c.email_id,c.role_id,i.dob FROM CUSTOMER c 
JOIN INDIVIDUAL i on c.individaul_id=i.individaul_id  
GROUP BY c.email_id,c.role_id,i.dob  
Having count(*) >=2 

I am using MS SQL Server 2012.
Thanks a lot in advance.

+1

You have too much reputation not to know that SO is not a code-writing service; you need to show what you have tried.

+0

@ZoharPeled, yes. But I am in a bit of a hurry right now because this is a production issue. Any help would be great.

Answers

0

I used the query from @navintb's answer and modified it as follows to remove the duplicate results and get the desired output.

SELECT max(cust_id),c.email_id,c.role_id,i.dob,max(creation_date) FROM 
CUSTOMER c 
JOIN INDIVIDUAL i on c.individual_id=i.individual_id 
GROUP BY c.email_id,c.role_id,i.dob 
Having count(*) >=2 

and:

SELECT min(cust_id),c.email_id,c.role_id,i.dob,min(creation_date) FROM 
CUSTOMER c 
JOIN INDIVIDUAL i on c.individual_id=i.individual_id 
GROUP BY c.email_id,c.role_id,i.dob 
Having count(*) >=2 
1

Use the MIN and MAX functions:

select min(cust_id),email_id,role_id,dob,min(creation_date) from customer group by email_id,role_id,dob; 

select max(cust_id),email_id,role_id,dob,max(creation_date) from customer group by email_id,role_id,dob; 

Hope it works.
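One caveat with this approach: MIN(cust_id) and MIN(creation_date) (and likewise the two MAX calls) are aggregated independently, so the returned cust_id need not belong to the row that holds the returned creation_date. In the question's sample data, the third group has MAX(cust_id) = 7, while the latest creation_date (05/11/2016) belongs to cust_id 6. A minimal sketch of one way around this, assuming the single-table customer schema from the question, is to aggregate only creation_date and join back to fetch the matching row (the alias max_creation_date is introduced here for illustration):

```sql
-- Sketch: return the row that actually holds the latest creation_date per group.
SELECT c.cust_id, c.email_id, c.role_id, c.dob, c.creation_date
FROM customer c
JOIN (
    -- One row per (email_id, role_id, dob) group with its latest creation_date.
    SELECT email_id, role_id, dob, MAX(creation_date) AS max_creation_date
    FROM customer
    GROUP BY email_id, role_id, dob
) g
    ON  g.email_id = c.email_id
    AND g.role_id  = c.role_id
    AND g.dob      = c.dob
    AND g.max_creation_date = c.creation_date;
```

Replacing MAX with MIN gives the earliest row instead. If two rows in a group share the same extreme creation_date, both are returned.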

+0

Thanks. The ordering should be based only on 'creation_date', not on 'cust_id'.

+0

Could you please take a look at my new question, stackoverflow.com/questions/40671048/...? It is the same problem, but the data comes from a join query.

2

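You can use ROW_NUMBER() ordered by creation date and filter out the duplicate records.

The first query returns the record with the minimum creation date in each group: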

;WITH cte AS (
SELECT cust_id, email_id, role_id, dob, creation_date , 
     ROW_NUMBER() OVER(PARTITION BY email_id, role_id, dob ORDER BY creation_date) seq FROM customer 
) 
SELECT cust_id, email_id, role_id, dob, creation_date 
FROM cte 
WHERE seq = 1 

For the maximum creation date, the same query works with ORDER BY in descending order:

;WITH cte AS (
    SELECT cust_id, email_id, role_id, dob, creation_date , 
      ROW_NUMBER() OVER(PARTITION BY email_id, role_id, dob ORDER BY creation_date DESC) seq FROM customer 
    ) 
    SELECT cust_id, email_id, role_id, dob, creation_date 
    FROM cte 
    WHERE seq = 1 

Edit: for the join query, you only need to add the join to the SELECT statement inside the CTE:

;WITH cte AS (
    SELECT c.cust_id, c.email_id, c.role_id, i.dob, c.creation_date , 
      ROW_NUMBER() OVER(PARTITION BY c.email_id, c.role_id, i.dob ORDER BY c.creation_date) seq -- dob lives in INDIVIDUAL, so partition on i.dob
FROM customer c 
JOIN INDIVIDUAL i on c.individaul_id=i.individaul_id 
) 
SELECT cust_id, email_id, role_id, dob, creation_date 
FROM cte 
WHERE seq = 1 

For the maximum creation date, the same query works with ORDER BY in descending order:

;WITH cte AS (
    SELECT c.cust_id, c.email_id, c.role_id, i.dob, c.creation_date , 
      ROW_NUMBER() OVER(PARTITION BY c.email_id, c.role_id, i.dob ORDER BY c.creation_date DESC) seq -- dob lives in INDIVIDUAL, so partition on i.dob
FROM customer c 
JOIN INDIVIDUAL i on c.individaul_id=i.individaul_id 
    ) 
    SELECT cust_id, email_id, role_id, dob, creation_date 
    FROM cte 
    WHERE seq = 1 
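
Note that the question's own query keeps only groups HAVING COUNT(*) >= 2, whereas a plain seq = 1 filter also returns records that have no duplicate at all. If only duplicated groups are wanted, one option is a windowed COUNT(*) alongside ROW_NUMBER(). The following is a sketch under the question's schema (with dob read from INDIVIDUAL); the column alias cnt is introduced here for illustration:

```sql
;WITH cte AS (
    SELECT c.cust_id, c.email_id, c.role_id, i.dob, c.creation_date,
           -- Position of each row within its duplicate group, earliest first.
           ROW_NUMBER() OVER (PARTITION BY c.email_id, c.role_id, i.dob
                              ORDER BY c.creation_date) AS seq,
           -- Size of the duplicate group the row belongs to.
           COUNT(*) OVER (PARTITION BY c.email_id, c.role_id, i.dob) AS cnt
    FROM customer c
    JOIN INDIVIDUAL i ON c.individaul_id = i.individaul_id
)
SELECT cust_id, email_id, role_id, dob, creation_date
FROM cte
WHERE seq = 1      -- earliest record in the group
  AND cnt >= 2;    -- only groups that really contain duplicates
```

Adding DESC to the ORDER BY picks the latest record in each duplicated group instead.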
+0

Could you please take a look at my new question, http://stackoverflow.com/questions/40671048/remove-duplicate-records-from-join-query-based-on-certain-fieldsemail-id-role? It is the same problem, but the data comes from a join query.

+0

Delete that question. I have posted it here; not much needed to change.

+0

OK, I have deleted that question.

0

You can use ROW_NUMBER with PARTITION BY. Just google it.

Check this query:

Declare @customer table(cust_id int, email_id varchar(200), role_id int, dob datetime, creation_date datetime) 

    Insert into @customer 
    values(1,'[email protected]',5,'04-feb-1966','17-sep-2016'), 
    (2,'[email protected]',5,'04-feb-1966','20-sep-2016'), 
    (3,'[email protected]',5,'15-feb-1991','18-sep-2016'), 
    (4,'[email protected]',5,'15-feb-1991','21-sep-2016'), 
    (5,'[email protected]',5,'16-feb-1985','30-sep-2016'), 
    (6,'[email protected]',5,'16-feb-1985','05-nov-2016'), 
    (7,'[email protected]',5,'16-feb-1985','04-nov-2016') 

--using row number and partition to group data and remove duplicate 
    ;with custCTE as(
    select cust_id, email_id, role_id,dob,creation_date,row_number() over(partition by email_id, role_id, dob order by creation_date) as rnk 
    from @customer 
    ) 

    delete from @customer where cust_id in (select cust_id from custCTE where rnk <> 1) 
    select * from @customer 
+0

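Thank you. Could you please check the updated question (the edit), where the data comes from a join? The data already exists in the database; I only need the two queries.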

0

Here is the solution.

DECLARE @MainTable TABLE 
(
    Cust_Id INT, 
    Email_Id NVARCHAR(250), 
    Role_Id INT, 
    DOB DATE, 
    Creation_Date DATE 
) 

DECLARE @Table1 TABLE 
(
    Cust_Id INT, 
    Email_Id NVARCHAR(250), 
    Role_Id INT, 
    DOB DATE, 
    Creation_Date DATE 
) 

DECLARE @Table2 TABLE 
(
    Cust_Id INT, 
    Email_Id NVARCHAR(250), 
    Role_Id INT, 
    DOB DATE, 
    Creation_Date DATE 
) 

INSERT INTO @MainTable 
     (Cust_Id , 
      Email_Id , 
      Role_Id , 
      DOB , 
      Creation_Date 
     ) 
VALUES (1 , N'[email protected]' , 5 , '2/4/1966' , '09/17/2016'), 
     (2 , N'[email protected]' , 5 , '2/4/1966' , '09/20/2016'), 
     (3 , N'[email protected]' , 5 , '2/15/1991' , '09/18/2016'), 
     (4 , N'[email protected]' , 5 , '2/15/1991' , '09/21/2016'), 
     (5 , N'[email protected]' , 5 , '2/16/1985' , '09/30/2016'), 
     (6 ,N'[email protected]' , 5 , '2/16/1985' , '11/05/2016'), 
     (7 , N'[email protected]' , 5 , '2/16/1985' , '11/04/2016') 

;WITH MainTable AS (
SELECT 
    Cust_Id , 
    Email_Id , 
    Role_Id , 
    DOB , 
    Creation_Date , 
    RANK() OVER (PARTITION BY Email_Id, Role_Id, DOB ORDER BY Creation_Date) AS [Rank] 
FROM @MainTable 
) 
INSERT INTO @Table1 
SELECT 
    MainTable.Cust_Id , 
    MainTable.Email_Id , 
    MainTable.Role_Id , 
    MainTable.DOB , 
    MainTable.Creation_Date 
FROM MainTable 
WHERE MainTable.[Rank] = 1 

;WITH MainTable AS (
SELECT 
    Cust_Id , 
    Email_Id , 
    Role_Id , 
    DOB , 
    Creation_Date , 
    RANK() OVER (PARTITION BY Email_Id, Role_Id, DOB ORDER BY Creation_Date DESC) AS [Rank] -- latest record in each group ranks first
FROM @MainTable 
) 
INSERT INTO @Table2 
SELECT 
    MainTable.Cust_Id , 
    MainTable.Email_Id , 
    MainTable.Role_Id , 
    MainTable.DOB , 
    MainTable.Creation_Date 
FROM MainTable 
WHERE MainTable.[Rank] = 1 


SELECT * FROM @MainTable ORDER BY Cust_Id 
SELECT * FROM @Table1 ORDER BY Cust_Id 
SELECT * FROM @Table2 ORDER BY Cust_Id 

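Basically, problems like this are better handled with SQL Server window functions. Window functions are more useful in SQL Server 2012, which added further functionality, so the code above will work fine on MSSQL 2012.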
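
One detail to keep in mind when choosing between the window functions used in these answers: RANK() gives tied rows the same rank, so a filter on rank 1 can return more than one row per group when creation dates coincide, while ROW_NUMBER() always numbers rows 1, 2, 3 and so on. A small sketch with hypothetical data (the table variable @t and its two rows are invented purely for illustration) shows the difference; adding cust_id as a tie-breaker is an assumption about which row should win:

```sql
-- Hypothetical data: two rows in the same group share a creation_date.
DECLARE @t TABLE (cust_id INT, email_id VARCHAR(200), creation_date DATE);

INSERT INTO @t (cust_id, email_id, creation_date)
VALUES (1, 'a@example.com', '2016-09-17'),
       (2, 'a@example.com', '2016-09-17');

SELECT cust_id,
       -- RANK(): both tied rows get rank 1.
       RANK() OVER (PARTITION BY email_id
                    ORDER BY creation_date) AS rnk,
       -- ROW_NUMBER() with cust_id as tie-breaker: cust_id 1 gets 1, cust_id 2 gets 2.
       ROW_NUMBER() OVER (PARTITION BY email_id
                          ORDER BY creation_date, cust_id) AS row_num
FROM @t;
```

With ROW_NUMBER() and a deterministic tie-breaker, a filter on row_num = 1 keeps exactly one record per group.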