2017-08-04
2

dplyr: how to select the join column by name?

I want to use dplyr's left_join to transfer a value (the column 'new') from one data frame to another.

How can I do this if I do not know the name of the key, but only know that it is the first variable in the dataset?

require("dplyr") 

testData1 <- data.frame(idvar=c(1,2,3), 
        b=c("a","b","c"), 
        c=c("i","ii","iii")) 

testData2 <- data.frame(identification=c(1,2), 
        b=c("a","b"), 
        c=c("i","NA"), 
        new=c("var1","var2")) 

# now do a left join to obtain values of the new variable in the old dataset 


(testResult1 <- left_join(testData1,testData2)) 
# var2 is not in the results because of the "NA" in testData2! 


(testResult2 <- left_join(testData1,testData2, 
         by=c("idvar"="identification"))) 
# works as expected! ... but we do not know the name of the idvar! 


(testResult3 <- left_join(testData1,testData2, 
         by=c(names(testData1)[1]=names(testData2)[1]))) 
# Error: unexpected '=' in: 
# "testResult3 <- left_join(testData1,testData2, 
#        by=c(names(testData1)[1]=" 
+0

Here is a related Q&A: https://stackoverflow.com/questions/28125816/r-standard-evalation-for-join-dplyr

Answers

2

You can create the named vector beforehand and then join as follows:

join_by = colnames(testData2)[1] 
names(join_by)=colnames(testData1)[1] 
left_join(testData1,testData2, by=join_by) 

Or in one line:

left_join(testData1,testData2, 
     by=structure(colnames(testData2)[1], names=colnames(testData1)[1])) 

Or, as suggested by Artem:

left_join(testData1,testData2, 
       by=setNames(colnames(testData2)[1], colnames(testData1)[1])) 
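
As a quick check, the three constructions above all build the same named character vector as the hand-written by=c("idvar"="identification"). The sketch below uses base R only and hard-codes the two key names from the question's example data (key1, key2 and the other variable names are illustrative, not from the original). It also shows that the form tried in the question fails at parse time, because the name to the left of = inside c() has to be a literal name or string, not an expression.

```r
# Key column names, as colnames(testData1)[1] and colnames(testData2)[1]
# would return them for the example data (hard-coded here as an assumption)
key1 <- "idvar"
key2 <- "identification"

# The vector we would write by hand if we knew the names
target <- c("idvar" = "identification")

# Variant 1: assign the name afterwards
assigned <- key2
names(assigned) <- key1

# Variant 2: structure() with a names attribute
via_structure <- structure(key2, names = key1)

# Variant 3: setNames()
via_setNames <- setNames(key2, key1)

all_same <- identical(assigned, target) &&
  identical(via_structure, target) &&
  identical(via_setNames, target)
print(all_same)

# The form from the question is a syntax error, so it never reaches left_join
parse_fails <- inherits(
  try(parse(text = 'c(names(x)[1] = "a")'), silent = TRUE),
  "try-error"
)
print(parse_fails)
```

Any of the three vectors can be passed as the by argument of left_join.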

Hope this helps!

+0

Consider using 'setNames(a,b)' as shorthand for 'structure(a,names = b)'.

+0

Thanks, added that as an option. Does setNames have any advantage over structure here? – Florian

+0

Besides requiring less typing, 'setNames' is also more efficient for long vectors.

3

Another approach is to give the two key columns the same name:

left_join(
    testData1, 
    rename_at(testData2, 1, ~ names(testData1)[1]), 
    by = names(testData1)[1] 
) 

# idvar b.x c.x b.y c.y new 
# 1  1 a i a i var1 
# 2  2 b ii b NA var2 
# 3  3 c iii <NA> <NA> <NA> 

# > (testResult2 <- left_join(testData1,testData2, by=c("idvar"="identification"))) 
# idvar b.x c.x b.y c.y new 
# 1  1 a i a i var1 
# 2  2 b ii b NA var2 
# 3  3 c iii <NA> <NA> <NA>
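
The scoped verb rename_at() has been superseded since dplyr 1.0.0. Assuming a dplyr version at or above that, the same rename-then-join idea can be sketched with rename_with(). The names key, testData2_renamed and testResult4 are illustrative and not part of the original answer.

```r
library(dplyr)

# Example data from the question
testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"))
testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"))

# Name of the key, taken by position from the first data frame
key <- names(testData1)[1]

# Rename the first column of testData2 to that key name, then join on it
testData2_renamed <- rename_with(testData2, ~ key, .cols = 1)
testResult4 <- left_join(testData1, testData2_renamed, by = key)
print(testResult4)
```

Passing by explicitly keeps b and c out of the join key, so the string "NA" in testData2 does not stop the second row from matching.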