How do I remove duplicates from a database?

I have a table with four fields: an auto-increment ID, a string, and two integers. I want to do something like:

 select count(*) from table group by string

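and then use the result to consolidate the groups whose count is greater than 1.

That is, take every string whose row count is greater than 1 and replace all of those rows (the ones sharing that string) with a single row. The ID of the new row doesn't matter, and each of its two integers is the sum of that integer over all the rows in the group.

Is this possible with a few simple queries?

Thanks.

Source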

2012-03-11 kloop

If you can prevent other users from updating the table while you do this, it's easy.

-- We're going to add records before deleting old ones, so keep track of which records are old. 
DECLARE @OldMaxID INT 
SELECT @OldMaxID = MAX(ID) FROM table 

-- Combine duplicate records into new records 
INSERT table (string, int1, int2) 
SELECT string, SUM(int1), SUM(int2) 
FROM table 
GROUP BY string 
HAVING COUNT(*) > 1 

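-- Delete records that were used to make combined records. 
-- DELETE cannot take GROUP BY/HAVING directly, so find the duplicated strings in a subquery. 
DELETE FROM table 
WHERE ID <= @OldMaxID 
  AND string IN (SELECT string FROM table 
                 WHERE ID <= @OldMaxID 
                 GROUP BY string 
                 HAVING COUNT(*) > 1)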
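
For anyone who wants to try the insert-then-delete idea without a SQL Server instance, here is a rough sketch ported to SQLite through Python's sqlite3 module. The table name items and the function name are made up for the example, and since SQLite has no DECLARE, the old maximum ID is kept in a Python variable instead. Treat it as an illustration under those assumptions, not as the answer's exact code.

```python
import sqlite3

def merge_by_insert_then_delete(conn):
    """Add one summed row per duplicated string, then delete the old rows."""
    with conn:  # single transaction: commits on success, rolls back on error
        # Highest ID that existed before any combined rows are added.
        old_max_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
        if old_max_id is None:
            return  # empty table, nothing to merge
        conn.execute(
            """INSERT INTO items (string, int1, int2)
               SELECT string, SUM(int1), SUM(int2)
               FROM items GROUP BY string HAVING COUNT(*) > 1"""
        )
        # Only old rows (id <= old_max) that belonged to a duplicated group go away.
        conn.execute(
            """DELETE FROM items
               WHERE id <= :old_max
                 AND string IN (SELECT string FROM items
                                WHERE id <= :old_max
                                GROUP BY string HAVING COUNT(*) > 1)""",
            {"old_max": old_max_id},
        )

# Hypothetical sample data: "a" appears three times, "b" once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "string TEXT, int1 INTEGER, int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items (string, int1, int2) VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("b", 5, 50), ("a", 3, 30)],
)
merge_by_insert_then_delete(conn)
rows = conn.execute(
    "SELECT string, int1, int2 FROM items ORDER BY string"
).fetchall()
print(rows)  # [('a', 6, 60), ('b', 5, 50)]
```

The untouched "b" row keeps its original ID, while the merged "a" row gets a fresh one, which matches the question's remark that the ID doesn't matter.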

Source

2012-03-12 00:21:58

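There is a simple way to do this. Just put something like

id NOT IN (select min(id) from table group by string)

in the WHERE clause.

This selects only the duplicates, that is, every row beyond the first one in its group. You can then select the sums you need.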
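
To see which rows a predicate of this kind matches, here is a small sketch using SQLite via Python's sqlite3. It assumes the subquery returns MIN(id) per group (so exactly one row per string is treated as the keeper); the table name items and the sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, string TEXT, "
    "int1 INTEGER, int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 10), (2, "a", 2, 20), (3, "b", 5, 50), (4, "a", 3, 30)],
)

# Rows that are NOT the lowest-id row of their string group: the surplus copies.
surplus_ids = [
    row[0]
    for row in conn.execute(
        """SELECT id FROM items
           WHERE id NOT IN (SELECT MIN(id) FROM items GROUP BY string)
           ORDER BY id"""
    )
]
print(surplus_ids)  # [2, 4]
```

Note that this only identifies the extra rows. The sums still have to be saved somewhere before those rows are deleted.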

Source

2012-03-11 23:13:10

Start with:

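select * from (
    -- Explicit aliases so the outer query can refer to the count in any dialect
    select count(*) as cnt, string_col, sum(int_col_1) as int_col_1, sum(int_col_2) as int_col_2 
    from my_table 
    group by string_col 
) as foo where cnt > 1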

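After that I would put this data into a temporary table, delete the rows that are no longer needed, and insert the data from the temporary table back into the original table.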
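
The three steps described here (stash the sums in a temporary table, delete the duplicated rows, re-insert the sums) might look roughly like this in SQLite via Python's sqlite3. The temporary table name merged and the function name are hypothetical, and the column names follow the query above.

```python
import sqlite3

def merge_via_temp_table(conn):
    """Stash per-string sums in a temp table, delete the duplicates, re-insert."""
    with conn:  # one transaction for all steps
        conn.execute(
            """CREATE TEMP TABLE merged AS
               SELECT string_col,
                      SUM(int_col_1) AS int_col_1,
                      SUM(int_col_2) AS int_col_2
               FROM my_table
               GROUP BY string_col
               HAVING COUNT(*) > 1"""
        )
        # Remove every row that belongs to a duplicated string.
        conn.execute(
            "DELETE FROM my_table "
            "WHERE string_col IN (SELECT string_col FROM merged)"
        )
        # Put back one combined row per duplicated string.
        conn.execute(
            """INSERT INTO my_table (string_col, int_col_1, int_col_2)
               SELECT string_col, int_col_1, int_col_2 FROM merged"""
        )
        conn.execute("DROP TABLE merged")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "string_col TEXT, int_col_1 INTEGER, int_col_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table (string_col, int_col_1, int_col_2) VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("b", 5, 50)],
)
merge_via_temp_table(conn)
result = conn.execute(
    "SELECT string_col, int_col_1, int_col_2 FROM my_table ORDER BY string_col"
).fetchall()
print(result)  # [('a', 3, 30), ('b', 5, 50)]
```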

Source

2012-03-11 23:13:51

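I'd suggest inserting into a temporary table the data grouped by string, together with min(id), for every string that has duplicates. Then update the original table where id = min(id), and delete the rows whose string matches but whose id does not.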

-- Assumes temp(string, id, int1, int2) already exists. 
insert into temp 
select string, min(id) id, sum(int1) int1, sum(int2) int2 
    from table 
    group by string 
having count(*) > 1;

-- Copy the sums onto the row that will be kept (the one with the lowest id). 
update table, temp 
    set table.int1 = temp.int1, 
     table.int2 = temp.int2 
where table.id = temp.id;

-- Works because there is only one record given a string in temp 
delete from table 
    where exists (select null from temp where temp.string = table.string and temp.id <> table.id);

A backup is mandatory :-) and so is a transaction.
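
As a rough illustration of "backup plus transaction", the sketch below takes a copy with sqlite3's Connection.backup and then runs the update and delete inside a single transaction. It is written for SQLite via Python, so the multi-table UPDATE is replaced by correlated subqueries. The names items and temp_sums are hypothetical.

```python
import sqlite3

def merge_keep_min_id(conn):
    """Keep the lowest-id row of each duplicated string and store the sums in it."""
    with conn:  # commits if every statement succeeds, rolls back otherwise
        conn.execute(
            """CREATE TEMP TABLE temp_sums AS
               SELECT string, MIN(id) AS id,
                      SUM(int1) AS int1, SUM(int2) AS int2
               FROM items GROUP BY string HAVING COUNT(*) > 1"""
        )
        conn.execute(
            """UPDATE items
               SET int1 = (SELECT t.int1 FROM temp_sums t WHERE t.id = items.id),
                   int2 = (SELECT t.int2 FROM temp_sums t WHERE t.id = items.id)
               WHERE id IN (SELECT id FROM temp_sums)"""
        )
        # Same string as a kept row but a different id: a surplus copy.
        conn.execute(
            """DELETE FROM items
               WHERE EXISTS (SELECT 1 FROM temp_sums t
                             WHERE t.string = items.string
                               AND t.id <> items.id)"""
        )
        conn.execute("DROP TABLE temp_sums")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, string TEXT, "
    "int1 INTEGER, int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 10), (2, "a", 2, 20), (3, "b", 5, 50), (4, "a", 3, 30)],
)
conn.commit()

# Take the backup first; here it goes to a second in-memory database.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)

merge_keep_min_id(conn)
merged_rows = conn.execute("SELECT * FROM items ORDER BY id").fetchall()
print(merged_rows)  # [(1, 'a', 6, 60), (3, 'b', 5, 50)]
```

In a real setup the backup target would be a file or a separate server, not another in-memory database.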

Source

2012-03-11 23:50:22

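You can do it all in two queries, with no temporary table. But you need to run the DELETE query repeatedly, because each run removes only one duplicate per group. So if a row has three copies, you need to run it twice. You can simply keep running it until it affects no more rows.

First, update the duplicate rows you want to keep so that they hold the sums.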

UPDATE tablename JOIN (
    -- count(*) is aliased as c so that HAVING can refer to it
    SELECT min(id) id,sum(int1) int1,sum(int2) int2,count(*) c 
    FROM tablename GROUP BY string HAVING c>1 
) AS dups ON tablename.id=dups.id 
SET tablename.int1=dups.int1, tablename.int2=dups.int2

Then you can use the same kind of SELECT in a DELETE query, using the multi-table syntax.

DELETE tablename FROM tablename 
JOIN (SELECT max(id) AS id,count(*) c FROM tablename GROUP BY string HAVING c>1) dups 
ON tablename.id=dups.id

Just keep running the DELETE until it reports 0 affected rows.
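
The "run it until nothing is affected" loop can be automated. Below is a sketch for SQLite via Python's sqlite3, which lacks MySQL's UPDATE ... JOIN and multi-table DELETE, so correlated subqueries and IN are used instead. That substitution is mine, not part of the answer, and the function name is hypothetical.

```python
import sqlite3

def merge_with_repeated_delete(conn):
    """Save sums in the lowest-id row, then delete the highest-id copy per pass."""
    passes = 0
    with conn:
        # Step 1: write the group sums into the row that will survive (lowest id).
        conn.execute(
            """UPDATE tablename
               SET int1 = (SELECT SUM(d.int1) FROM tablename d
                           WHERE d.string = tablename.string),
                   int2 = (SELECT SUM(d.int2) FROM tablename d
                           WHERE d.string = tablename.string)
               WHERE id IN (SELECT MIN(id) FROM tablename
                            GROUP BY string HAVING COUNT(*) > 1)"""
        )
        # Step 2: each pass removes one row (the highest id) per duplicated string.
        while True:
            cur = conn.execute(
                """DELETE FROM tablename
                   WHERE id IN (SELECT MAX(id) FROM tablename
                                GROUP BY string HAVING COUNT(*) > 1)"""
            )
            if cur.rowcount == 0:
                break  # no duplicated strings left
            passes += 1
    return passes

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, string TEXT, "
    "int1 INTEGER, int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 10), (2, "a", 2, 20), (3, "b", 5, 50), (4, "a", 3, 30)],
)
passes = merge_with_repeated_delete(conn)
final_rows = conn.execute("SELECT * FROM tablename ORDER BY id").fetchall()
print(passes)      # 2
print(final_rows)  # [(1, 'a', 6, 60), (3, 'b', 5, 50)]
```

With three copies of "a", two passes actually delete something, which agrees with the "three copies, run it twice" remark above.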

Source

2012-03-11 23:52:43

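This doesn't meet the requirement that the integers in the remaining row be updated to the sums of the integers over all the rows in the group (before the deletion). – 2012-03-12 00:06:07

Thanks for pointing that out, I missed that part of the question. Edited to add an UPDATE query that saves the sums first. – 2012-03-12 02:25:12

You can get this information from a view: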

CREATE VIEW SummarizedData (StringCol, IntCol1, IntCol2, OriginalRowCount) AS 
    SELECT StringCol, SUM(IntCol1), SUM(IntCol2), COUNT(*) 
    FROM TableName 
    GROUP BY StringCol

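This creates a virtual table with the information you want. It will include rows for StringCol values that occur only once; if you don't want those, add the clause HAVING COUNT(*) > 1 to the end of the query.

With this approach you can keep the original table and simply read the summarized data from the view, or you can create an empty table with columns matching SummarizedData and INSERT into it to get the data into a "real" table.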
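
Both options (read from the view, or copy it into a real table) can be sketched in SQLite via Python's sqlite3 as follows. The target table name SummaryTable and the sample data are hypothetical, and the view uses column aliases instead of a column list so it also works on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (ID INTEGER PRIMARY KEY, StringCol TEXT, "
    "IntCol1 INTEGER, IntCol2 INTEGER)"
)
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 10), (2, "a", 2, 20), (3, "b", 5, 50), (4, "a", 3, 30)],
)

# The view leaves TableName untouched and computes the sums on every read.
conn.execute(
    """CREATE VIEW SummarizedData AS
       SELECT StringCol,
              SUM(IntCol1) AS IntCol1,
              SUM(IntCol2) AS IntCol2,
              COUNT(*)     AS OriginalRowCount
       FROM TableName
       GROUP BY StringCol"""
)
view_rows = conn.execute(
    "SELECT * FROM SummarizedData ORDER BY StringCol"
).fetchall()
print(view_rows)  # [('a', 6, 60, 3), ('b', 5, 50, 1)]

# Optional: materialize the view into a "real" table with matching columns.
conn.execute(
    "CREATE TABLE SummaryTable (StringCol TEXT, IntCol1 INTEGER, "
    "IntCol2 INTEGER, OriginalRowCount INTEGER)"
)
conn.execute(
    """INSERT INTO SummaryTable
       SELECT StringCol, IntCol1, IntCol2, OriginalRowCount
       FROM SummarizedData"""
)
table_rows = conn.execute(
    "SELECT * FROM SummaryTable ORDER BY StringCol"
).fetchall()
```

Unlike the view, SummaryTable is a snapshot and will not follow later changes to TableName.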

Source

2012-03-12 02:33:44
