2011-09-28 100 views
1

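SQL Server - Find duplicates in a table

We have a large table with close to 100 million rows. Can anyone help with how to find the duplicate data in the table and move it to an archive table?

Table name: CustomerData
Number of fields: 10

The latest record should stay (it is the one identified by END_DATE being NULL).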


+4

Define "duplicate". The same value in all columns? – Thilo

Answers

3

So you just need to move the rows where END_DATE is not null?

In a single transaction:

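BEGIN TRANSACTION

-- Copy the closed rows (END_DATE set) into the archive table
INSERT INTO archive (column1, column2, ... column10)
SELECT column1, column2, ..., column10
FROM CustomerData
WHERE END_DATE IS NOT NULL

-- Then remove the same rows from the source table
DELETE CustomerData
WHERE END_DATE IS NOT NULL

COMMIT TRANSACTION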
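
With close to 100 million rows, moving everything in one transaction can hold locks for a long time and grow the transaction log considerably. As a different technique from the one above, here is a minimal sketch that moves the rows in batches with DELETE ... OUTPUT INTO, so each batch is removed and archived in one atomic statement. The column names (cust_id, address_id), the batch size, and the shape of the archive table are assumptions for illustration, not taken from the question.

```sql
-- Assumptions (not from the original post): the archive table already exists
-- and has no enabled triggers, foreign keys, or CHECK constraints (OUTPUT INTO
-- requires this); cust_id and address_id stand in for the real ten columns.
DECLARE @batch INT = 10000;   -- hypothetical batch size, tune for your system
DECLARE @moved INT = 1;

WHILE @moved > 0
BEGIN
    -- Each DELETE statement is atomic: the deleted rows are written
    -- to archive in the same statement that removes them.
    DELETE TOP (@batch) FROM CustomerData
    OUTPUT deleted.cust_id, deleted.address_id, deleted.END_DATE
        INTO archive (cust_id, address_id, END_DATE)
    WHERE END_DATE IS NOT NULL;

    -- Stop once a pass finds nothing left to move
    SET @moved = @@ROWCOUNT;
END;
```

Note that this trades the single all-or-nothing transaction for many small ones: if the loop is interrupted, the rows already moved stay in the archive and the rest stay in CustomerData, so it can simply be run again.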
1

Have you tried this solution?

--INSERT Archive (columns) 
SELECT ... 10 columns ... 
FROM CustomerData 
WHERE END_DATE IS NULL 
0

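Assuming the CustomerData table structure is: CustomerData(cust_id, cust_name, address_id, start_time, end_date, ..., 7 other columns);

and assuming that two customers having the same address ID counts as a duplicate.

To insert into the archive table: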

INSERT INTO archive (column1, column2, ... column10)
SELECT cust_id, start_Date, ..., End_Date
FROM CustomerData
WHERE END_DATE IS NOT NULL
AND Address_ID IN (
     SELECT Address_ID FROM
      (
      -- Address_IDs that appear on more than one row
      SELECT Address_ID, COUNT(Address_ID) AS cnt
      FROM CustomerData
      GROUP BY Address_ID
      HAVING COUNT(Address_ID) > 1
      ) AS dup   -- SQL Server requires an alias on a derived table
     )

To delete from the CustomerData table:

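DELETE CustomerData
WHERE END_DATE IS NOT NULL
    AND
    Address_ID IN (
      SELECT Address_ID FROM
      (
      -- Address_IDs that appear on more than one row
      SELECT Address_ID, COUNT(Address_ID) AS cnt
      FROM CustomerData
      GROUP BY Address_ID
      HAVING COUNT(Address_ID) > 1
      ) AS dup   -- SQL Server requires an alias on a derived table
     )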

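The inner subquery extracts the duplicates based on rows sharing the same ADDRESS_ID, much as you would group on the DeptID column of the employees table that ships with Oracle Database.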
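
Before running the INSERT and DELETE, it may be worth running the inner subquery on its own to preview what would be treated as a duplicate. A minimal sketch, under the same assumption that rows sharing an Address_ID are duplicates; the open_rows column is an addition for illustration and shows how many rows in each group still have END_DATE set to NULL (ideally one, the record that should stay).

```sql
-- Assumption: rows sharing an Address_ID count as duplicates.
SELECT Address_ID,
       COUNT(*) AS row_count,                                  -- rows per address
       SUM(CASE WHEN END_DATE IS NULL THEN 1 ELSE 0 END) AS open_rows  -- rows that would stay
FROM CustomerData
GROUP BY Address_ID
HAVING COUNT(*) > 1
ORDER BY row_count DESC;
```

Groups where open_rows is 0 would be archived completely, and groups where it is greater than 1 would still contain duplicates afterwards, so both cases probably deserve a manual look first.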