2015-02-06 88 views
3

Combining GROUP BY and ROW_NUMBER()

Sample data

userid  email_address      login_name  name             Title  org        phone_number_com
======  =================  ==========  ===============  =====  =========  ================
1192    [email protected]  sjobs       Steve Jobs       CEO    Apple      N/A
1274    [email protected]  sjobs       Steve Jobs       CFO    Apple      697-4686
1192    [email protected]  sjobs       Steven jobs      CEO    Apple      604-7126
1885    [email protected]  bgates      Bill Gates       CEO    Microsoft  604-7114
1920    [email protected]  bgates      William Gates    CTR    Microsoft  604-7247
1951    [email protected]  wbuffet     Warren Buffet    CEO    HP         614-9141
1954    [email protected]  wbuffet     W. Buffet        COO    HP         614-7589
1951    [email protected]  wbuffet     Warren S Buffet  CIO    Xerox      614-8874
1956    [email protected]  mzuck       Mark Zuckerberg  CEO    FB         614-8295

QUERY

SELECT * 
FROM 
    (
     SELECT userid, name, login_name, email_address, phone_number_com, 
     ROW_NUMBER() OVER(PARTITION BY [login_name] ORDER BY login_name) Num_Duplicates 
     FROM web_user 
    ) as Rows 
WHERE Num_Duplicates > 1 

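This is my first post, so I hope I have followed all the procedures. I am getting a result set that shows the 2nd and 3rd duplicate rows. I am trying to GROUP BY login_name and show only the row with the highest Num_Duplicates. If a login_name has Num_Duplicates of 2 and 3, only row 3 should be shown. I hope this makes sense! Thanks in advance for any guidance you can provide.

These are the results I would like the query to output: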

userid | email_address | login_name | name | Title | org | phone_number_com | Num_Duplicates
1192 | [email protected] | sjobs | Steve Jobs | CEO | Apple | N/A | 3
1885 | [email protected] | bgates | Bill Gates | CEO | Microsoft | 604-7114 | 2
1951 | [email protected] | wbuffet | Warren Buffet | CEO | HP | 614-9141 | 3
+1

Why do you need the row number? – serakfalcon 2015-02-06 18:08:37

+1

Could you add your desired result? – RezaRahmati 2015-02-06 18:11:55

+0

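Why show only the third one? You are partitioning and ordering by login_name, which means the order within each group is arbitrary and can differ on each execution. So 1, 2, 3... they are all the same. Why show only 3? Why not only 2, or only 1? – 2015-02-06 18:16:44

Answers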

0

If I understand correctly what you are trying to do, you would first group by login_name to get the number of duplicates:

SELECT login_name, COUNT(*) AS num_duplicates 
    FROM web_user 
GROUP BY login_name 

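From here, you can either use a subquery with ROW_NUMBER() (although I would recommend RANK() in case of ties), or you can simply use the aggregate inside a window function: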

SELECT login_name, COUNT(*) AS num_duplicates 
    , RANK() OVER (ORDER BY COUNT(*) DESC) AS rn 
    FROM web_user 
GROUP BY login_name; 

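Then put it in a subquery to get only the login_name with the most duplicates:

SELECT * FROM (
    SELECT login_name, COUNT(*) AS num_duplicates
     , RANK() OVER (ORDER BY COUNT(*) DESC) AS rn
     FROM web_user
    GROUP BY login_name
) ranked -- SQL Server requires an alias on a derived table
WHERE rn = 1;

UPDATE, per the OP's comments and the question edit: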

SELECT userid, name, login_name, email_address, phone_number_com, num_duplicates
    FROM (
    SELECT userid, name, login_name, email_address, phone_number_com
     , COUNT(*) OVER (PARTITION BY login_name) AS num_duplicates
     , ROW_NUMBER() OVER (PARTITION BY login_name ORDER BY userid) AS rn
     FROM web_user
) counted -- SQL Server requires an alias on a derived table
WHERE num_duplicates > 1 AND rn = 1;

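What I am doing above is using COUNT(*) as a window function; partitioning by login_name gives the count for each login name. I also partition by login_name for ROW_NUMBER() and order by userid, so that I can return the row with the lowest userid (which is what your desired output appears to do).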
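
To see what the two window columns hold before the outer filter is applied, here is a small self-contained sketch. It inlines a trimmed copy of the sample rows through a VALUES list (the CTE named web_user is only a stand-in for the real table), and it adds name as a tiebreaker in the ORDER BY. That tiebreaker is my assumption, not part of the query above: userid 1192 appears twice under sjobs, so ordering by userid alone leaves the choice of "first" row up to the engine.

```sql
-- Sketch only: a trimmed copy of the sample rows, inlined so the query runs on its own.
WITH web_user AS (
    SELECT *
    FROM (VALUES
        (1192, 'sjobs',   'Steve Jobs',      'N/A'),
        (1274, 'sjobs',   'Steve Jobs',      '697-4686'),
        (1192, 'sjobs',   'Steven jobs',     '604-7126'),
        (1885, 'bgates',  'Bill Gates',      '604-7114'),
        (1920, 'bgates',  'William Gates',   '604-7247'),
        (1951, 'wbuffet', 'Warren Buffet',   '614-9141'),
        (1954, 'wbuffet', 'W. Buffet',       '614-7589'),
        (1951, 'wbuffet', 'Warren S Buffet', '614-8874'),
        (1956, 'mzuck',   'Mark Zuckerberg', '614-8295')
    ) AS v (userid, login_name, name, phone_number_com)
)
SELECT userid, login_name, name, phone_number_com
     -- same value on every row of a login_name: the size of its partition
     , COUNT(*) OVER (PARTITION BY login_name) AS num_duplicates
     -- 1..n within each login_name; name is an assumed tiebreaker for equal userids
     , ROW_NUMBER() OVER (PARTITION BY login_name
                          ORDER BY userid, name) AS rn
FROM web_user
ORDER BY login_name, rn;
```

On the sample rows, num_duplicates comes out as 3 for sjobs and wbuffet, 2 for bgates, and 1 for mzuck, which is why the num_duplicates > 1 filter drops mzuck and rn = 1 then keeps a single row for each remaining login_name.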

+1

I would add a HAVING COUNT(*) > 2 condition in there, so you really know these are duplicates according to the OP's text – 2015-02-06 18:24:52

+0

I'm pretty sure this is not what Ariel wants. – 2015-02-06 18:27:18

+0

David, I started with the following query: – Ariel 2015-02-06 19:02:40

0

Hmm, from your description it sounds like you just want something like this (off the top of my head):

SELECT login_name, email_address 
FROM web_user 
GROUP BY login_name, email_address 
HAVING count(*) > 2 
+0

In my results, I need to return userid, name, login_name, email_address, phone_number_com. – Ariel 2015-02-06 18:36:59

+0

Just add them as needed, e.g. 'login_name, email_address, MAX(phone_number)', and so on. – 2015-02-06 18:38:28

+0

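ISE, if I GROUP BY all the columns I select, it gives me an inaccurate result. I need to GROUP BY login_name only and still display the other fields I select (e.g., userid, name, login_name, email_address, phone_number). – Ariel 2015-02-06 19:12:42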

0

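The following should give you what you need.

The ROW_NUMBER window function is used to identify the first row for each login_name. The COUNT window function is used to count the number of rows for each login_name.

The outer query then restricts the results to those login_names that have more than 1 row, and returns only the first row for each login_name.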

DECLARE @users TABLE 
(
    userid    int 
    , email_address  varchar(100) 
    , login_name  varchar(100) 
    , name    varchar(100) 
    , title    varchar(100) 
    , org    varchar(100) 
    , phone_number_com varchar(100) 
) 

INSERT INTO @users 
VALUES 
(1192, '[email protected]', 'sjobs', 'Steve Jobs', 'CEO', 'Apple', 'N/A') 
, (1274, '[email protected]', 'sjobs', 'Steve Jobs', 'CFO', 'Apple', '697-4686') 
, (1192, '[email protected]', 'sjobs', 'Steven jobs', 'CEO', 'Apple', '604-7126') 
, (1885, '[email protected]', 'bgates', 'Bill Gates', 'CEO', 'Microsoft', '604-7114') 
, (1920, '[email protected]', 'bgates', 'William Gates', 'CTR', 'Microsoft', '604-7247') 
, (1951, '[email protected]', 'wbuffet', 'Warren Buffet', 'CEO', 'HP', '614-9141') 
, (1954, '[email protected]', 'wbuffet', 'W. Buffet', 'COO', 'HP', '614-7589') 
, (1951, '[email protected]', 'wbuffet', 'Warren S Buffet', 'CIO', 'Xerox', '614-8874') 
, (1956, '[email protected]', 'mzuck', 'Mark Zuckerberg', 'CEO', 'FB', '614-8295') 
; 

WITH LoginWithWindowFunction AS 
(
    SELECT 
     * 
     , ROW_NUMBER() OVER(PARTITION BY login_name ORDER BY userid) AS LoginOrder 
     , COUNT(*) OVER(PARTITION BY login_name) AS Num_Duplicates 

    FROM 
     @users 
) 

SELECT 
    userid 
    , email_address 
    , login_name 
    , name 
    , title 
    , org 
    , phone_number_com 
    , Num_Duplicates 

FROM 
    LoginWithWindowFunction 

WHERE 
    LoginOrder = 1 
    AND Num_Duplicates > 1 

ORDER BY 
    userid
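
If the goal is instead the literal request in the question (the row carrying the highest row number for each login_name), the same two window columns can be reused: ROW_NUMBER() runs from 1 to the partition size, so the last row is the one where the row number equals the count. The following is a rough sketch under a few assumptions of mine: the sample rows are inlined and trimmed, the CTE names web_user and numbered are placeholders, and name is added to the ORDER BY as a tiebreaker because userid repeats within a login_name.

```sql
-- Sketch only: inlined, trimmed sample rows standing in for the real table.
WITH web_user AS (
    SELECT *
    FROM (VALUES
        (1192, 'sjobs',   'Steve Jobs',      'N/A'),
        (1274, 'sjobs',   'Steve Jobs',      '697-4686'),
        (1192, 'sjobs',   'Steven jobs',     '604-7126'),
        (1885, 'bgates',  'Bill Gates',      '604-7114'),
        (1920, 'bgates',  'William Gates',   '604-7247'),
        (1951, 'wbuffet', 'Warren Buffet',   '614-9141'),
        (1954, 'wbuffet', 'W. Buffet',       '614-7589'),
        (1951, 'wbuffet', 'Warren S Buffet', '614-8874'),
        (1956, 'mzuck',   'Mark Zuckerberg', '614-8295')
    ) AS v (userid, login_name, name, phone_number_com)
),
numbered AS (
    SELECT userid, login_name, name, phone_number_com
         , ROW_NUMBER() OVER (PARTITION BY login_name
                              ORDER BY userid, name) AS rn  -- name: assumed tiebreaker
         , COUNT(*) OVER (PARTITION BY login_name) AS num_duplicates
    FROM web_user
)
SELECT userid, login_name, name, phone_number_com, num_duplicates
FROM numbered
WHERE num_duplicates > 1      -- only login_names that occur more than once
  AND rn = num_duplicates     -- the highest row number in its partition
ORDER BY login_name;
```

Which row counts as "last" depends entirely on the ORDER BY inside the OVER clause, so it should list whatever columns actually define the order you care about.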