2016-03-08 61 views
1

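How to identify and delete duplicate rows, keeping only the most recent

I'm working in HeidiSQL and trying to figure out how to delete all duplicate rows except the most recent one. There are some slight differences between the "duplicates", but a row counts as a duplicate whenever the four specific values UserID, ContactID, SMSID, and EventID are identical. Within each such group I need to keep the most recent row (identified by CreatedDate) and delete the rest.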

The following query identifies these rows:

SELECT a.UserID, a.ContactID, a.SMSID, a.EventID, CreatedDate 
FROM WhenToText a 
JOIN (SELECT UserID, ContactID, SMSID, EventID 
     FROM WhenToText 
     GROUP BY UserID, ContactID, SMSID, EventID 
     HAVING COUNT(*) > 1) b 
ON a.UserID = b.UserID 
AND a.ContactID = b.ContactID 
AND a.SMSID = b.SMSID 
AND a.EventID = b.EventID 
ORDER BY UserID, ContactID, SMSID, EventID, CreatedDate DESC 

However, I don't know how to delete these duplicates once I've identified them.

Here is some sample data:

[screenshot of sample data]

+0

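When you say "based on the most recent row", do you mean that, in the case of duplicates, you want to keep the most recent record? Can you show us some sample data? –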

+0

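Yes, that's correct. I only want to keep the most recent of the duplicates, so I'm interested in the latest CreatedDate. I've added a screenshot of some sample data to the original post. Thanks again. – David

Answers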

1

Here's one way:

-- MySQL multi-table DELETE: the alias to delete from goes between DELETE and FROM
DELETE w1 FROM WhenToText w1 
INNER JOIN 
(
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate 
    FROM WhenToText 
    GROUP BY UserID, ContactID, SMSID, EventID 
) w2 
    ON w1.UserID = w2.UserID AND w1.ContactID = w2.ContactID AND w1.SMSID = w2.SMSID 
     AND w1.EventID = w2.EventID 
     AND w1.CreatedDate != w2.MaxDate 

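This deletes every record whose CreatedDate is not the latest within its (UserID, ContactID, SMSID, EventID) group. Keep in mind that if several records share the latest CreatedDate, this can leave more than one record per group.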
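The tie caveat can be checked on a toy table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the thread itself uses MySQL through HeidiSQL). SQLite has no DELETE ... JOIN, so the same rule is written here as a correlated subquery, and the tied row is made-up data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WhenToText ("
    "UserID INT, ContactID INT, SMSID INT, EventID INT, CreatedDate TEXT)"
)
conn.executemany(
    "INSERT INTO WhenToText VALUES (?, ?, ?, ?, ?)",
    [
        (4, 25, 7934, 7407, "2016-02-10 00:00:11"),
        (4, 25, 7934, 7407, "2016-02-10 00:00:11"),  # hypothetical tie on the latest date
        (4, 25, 7934, 7407, "2016-02-09 00:00:12"),
    ],
)

# Delete every row whose CreatedDate is not the latest in its group.
# (SQLite syntax; this is the same rule as the join-based DELETE, not the same statement.)
conn.execute(
    """
    DELETE FROM WhenToText
    WHERE CreatedDate <> (
        SELECT MAX(w2.CreatedDate)
        FROM WhenToText w2
        WHERE w2.UserID = WhenToText.UserID
          AND w2.ContactID = WhenToText.ContactID
          AND w2.SMSID = WhenToText.SMSID
          AND w2.EventID = WhenToText.EventID
    )
    """
)

remaining = conn.execute("SELECT COUNT(*) FROM WhenToText").fetchone()[0]
print(remaining)  # 2: both rows that share the latest CreatedDate survive
```

If exactly one row per group is required, some extra tie-breaker (for example a unique ID column, which the thread does not show) would be needed.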

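If you first want to test the query to see which records would be targeted for deletion, replace the DELETE clause at the top (everything before INNER JOIN) with SELECT w1.* FROM WhenToText w1.

Here is a link to a SQL Fiddle that demonstrates how the query identifies the records to delete: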
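As a rough illustration of that dry run, the sketch below runs the SELECT form of the query against a few rows taken from the thread's sample data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the join itself is plain SQL and is expected to behave the same way there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WhenToText ("
    "UserID INT, ContactID INT, SMSID INT, EventID INT, CreatedDate TEXT)"
)
conn.executemany(
    "INSERT INTO WhenToText VALUES (?, ?, ?, ?, ?)",
    [
        (4, 25, 7934, 7407, "2016-02-10 00:00:11"),
        (4, 25, 7934, 7407, "2016-02-09 00:00:12"),
        (15, 63, 12964, 7401, "2016-02-10 03:42:04"),
        (15, 63, 12964, 7401, "2016-02-10 03:41:04"),
    ],
)

# Same join as the DELETE, but selecting the rows instead of removing them.
preview_sql = """
SELECT w1.*
FROM WhenToText w1
INNER JOIN (
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) w2
    ON w1.UserID = w2.UserID AND w1.ContactID = w2.ContactID
   AND w1.SMSID = w2.SMSID AND w1.EventID = w2.EventID
   AND w1.CreatedDate != w2.MaxDate
ORDER BY w1.UserID, w1.ContactID
"""
to_delete = conn.execute(preview_sql).fetchall()
for row in to_delete:
    print(row)
```

Only the older row of each pair is listed, and the table itself is left unchanged.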

SQLFiddle

+1

Thank you very much, this is exactly what I was looking for. – David

1

Here is a solution using DELETE ... FROM ... JOIN, with a full demo on your data.

SQL:

-- Data preparation 
create table WhenToText(UserID int, ContactID int, SMSID int, EventID int, CreatedDate datetime); 
insert into WhenToText values 
    (4, 25, 7934, 7407, '2016-02-10 00:00:11'), 
    (4, 25, 7934, 7407, '2016-02-09 00:00:12'), 
    (4, 29, 5132, 7407, '2016-02-10 00:00:11'), 
    (4, 29, 5132, 7407, '2016-02-09 00:00:12'), 
    (4, 31, 12944, 7405, '2016-02-10 07:03:02'), 
    (4, 31, 12944, 7405, '2016-02-10 05:03:02'), 
    (4, 146, 12908, 7405, '2016-02-10 06:52:02'), 
    (4, 146, 12908, 7405, '2016-02-10 04:52:02'), 
    (15, 63, 12964, 7401, '2016-02-10 03:42:04'), 
    (15, 63, 12964, 7401, '2016-02-10 03:41:04'), 
    (15, 64, 12326, 7401, '2016-02-07 03:01:03'), 
    (15, 64, 12326, 7401, '2016-02-07 03:00:03'); 
SELECT * FROM WhenToText; 

-- SQL needed 
DELETE a FROM 
    WhenToText a INNER JOIN 
    (
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate 
    FROM WhenToText 
    GROUP BY UserID, ContactID, SMSID, EventID 
    ) b 
    USING(UserID, ContactID, SMSID, EventID) 
WHERE 
    a.CreatedDate != b.CreatedDate; 

SELECT * FROM WhenToText; 

Output:

mysql> SELECT * FROM WhenToText; 
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        25 |  7934 |    7407 | 2016-02-09 00:00:12 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-09 00:00:12 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 05:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 04:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:41:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:00:03 |
+--------+-----------+-------+---------+---------------------+
12 rows in set (0.00 sec) 

mysql> 
mysql> -- SQL needed 
mysql> DELETE a FROM 
    ->  WhenToText a INNER JOIN 
    ->  (
    ->  SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate 
    ->  FROM WhenToText 
    ->  GROUP BY UserID, ContactID, SMSID, EventID 
    ->  ) b 
    ->  USING(UserID, ContactID, SMSID, EventID) 
    -> WHERE 
    ->  a.CreatedDate != b.CreatedDate; 

Query OK, 6 rows affected (0.00 sec) 

mysql> 
mysql> SELECT * FROM WhenToText; 
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
+--------+-----------+-------+---------+---------------------+
6 rows in set (0.00 sec) 
+0

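This seems like a good approach. Is there a good way to test it before executing it? I tried running it as SELECT * FROM to get back all the rows that would be deleted, but couldn't get it to work. Any ideas? Thanks again! – David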

+0

@David Updated the post based on your new data. Please try again. –

+0

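Excellent, this works! Thanks for your help, much appreciated. Just curious: is there a way to automate the input for the CREATE TABLE step, or does it have to be done manually? Ideally I'd like to run the query directly and avoid entering the data by hand. – David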

0

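This should give you the solution you're looking for, given that CreatedDate is a date data type. It also assumes that the most recent row is the one with the latest CreatedDate.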

SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate 
FROM WhenToText 
GROUP BY 1, 2, 3, 4; 

With these values you can simply overwrite the WhenToText table. It would look something like this:

CREATE TABLE tmp_table LIKE WhenToText; 

INSERT INTO tmp_table (SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate 
          FROM WhenToText 
          GROUP BY 1, 2, 3, 4); 

TRUNCATE WhenToText; 

INSERT INTO WhenToText (SELECT * FROM tmp_table); 

DROP TABLE tmp_table; 
+0

I just tried this, and it keeps saying I can't "group on CreatedDate". – David

+0

Also, I tried a few tweaks, such as replacing GROUP BY with ORDER BY, and the query returned only a single row (i.e. MAX(CreatedDate)). – David

+0

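1st Q: @David I'm not sure how you'd get the GROUP BY error, because 1, 2, 3, 4 only refers to (UserID, ContactID, SMSID, EventID) when those are the first 4 items in the SELECT statement. 2nd Q: you need the GROUP BY clause for the aggregate MAX function to work correctly. Using an aggregate with an ORDER BY clause instead of GROUP BY returns only a single row. – TomDobbs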
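The difference between the two forms can be seen on a small table. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the exact behaviour of the ORDER BY variant in MySQL may depend on the server's SQL mode, so treat the second query as an illustration of SQLite only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WhenToText ("
    "UserID INT, ContactID INT, SMSID INT, EventID INT, CreatedDate TEXT)"
)
conn.executemany(
    "INSERT INTO WhenToText VALUES (?, ?, ?, ?, ?)",
    [
        (4, 25, 7934, 7407, "2016-02-10 00:00:11"),
        (4, 25, 7934, 7407, "2016-02-09 00:00:12"),
        (15, 63, 12964, 7401, "2016-02-10 03:42:04"),
        (15, 63, 12964, 7401, "2016-02-10 03:41:04"),
    ],
)

# GROUP BY 1, 2, 3, 4 groups on the first four SELECT items,
# so MAX() is evaluated once per (UserID, ContactID, SMSID, EventID) group.
grouped = conn.execute(
    "SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) "
    "FROM WhenToText GROUP BY 1, 2, 3, 4"
).fetchall()

# With ORDER BY in place of GROUP BY, the whole table is one group,
# so the aggregate collapses everything into a single row.
ungrouped = conn.execute(
    "SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) "
    "FROM WhenToText ORDER BY 1, 2, 3, 4"
).fetchall()

print(len(grouped), len(ungrouped))  # 2 1
```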