2012-05-14

Selecting common and unique values in a data frame

I have the following data:

df <- data.frame(
  email_one = c("[email protected]", "[email protected]", "[email protected]",
                "[email protected]", "[email protected]"),
  email_two = c("[email protected]", "[email protected]", "[email protected]",
                "[email protected]", "[email protected]"),
  stringsAsFactors = FALSE  # keep character columns (the default was TRUE before R 4.0.0)
)

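I'd like to know whether I can use R to select three things: the values that appear in both columns, the values that appear only in the first column, and the values that appear only in the second column.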

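I originally tried to work this out in Excel, but I assume R has a more elegant solution, possibly even one using the sqldf package. Ideally I'd use built-in functions rather than a user-defined function full of conditional statements (df$email_one == df$email_two).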

Can anyone point me in the right direction?

Answer


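You're right to suspect that R has built-in functions for these operations. Here you want intersect() and setdiff(). These and related set functions are documented on the ?intersect help page.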

# Elements present in both columns 
intersect(df[[1]], df[[2]]) 
[1] "[email protected]" "[email protected]" "[email protected]" 

# Elements of column 1 that are not in column 2 
setdiff(df[[1]], df[[2]]) 
[1] "[email protected]" "[email protected]" 

# Elements of column _2_ that are not in column _1_ 
setdiff(df[[2]], df[[1]]) 
[1] "[email protected]" "[email protected]"
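intersect() and setdiff() return distinct values. They do not tell you which rows matched. If you also want to filter or flag rows of the data frame, the `%in%` operator returns one TRUE/FALSE per element, and you can subset with it. Below is a minimal sketch that assumes character columns. The addresses are hypothetical placeholders and do not reproduce the original data.

```r
# Hypothetical sample data: these addresses are made up for illustration
df <- data.frame(
  email_one = c("a@example.com", "b@example.com", "c@example.com", "d@example.com"),
  email_two = c("c@example.com", "a@example.com", "e@example.com", "f@example.com"),
  stringsAsFactors = FALSE
)

# One logical per row: does this value occur anywhere in the other column?
in_both_1 <- df$email_one %in% df$email_two
in_both_2 <- df$email_two %in% df$email_one

# The same three sets that intersect()/setdiff() return, built from %in%
common   <- unique(df$email_one[in_both_1])
only_one <- unique(df$email_one[!in_both_1])
only_two <- unique(df$email_two[!in_both_2])

common    # "a@example.com" "c@example.com"
only_one  # "b@example.com" "d@example.com"
only_two  # "e@example.com" "f@example.com"

# Because %in% works row by row, it can also flag rows in place
df$one_also_in_two <- in_both_1
```

`%in%` is documented on the `?match` help page. The sketch wraps each subset in `unique()` because intersect() and setdiff() also drop duplicate values.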