R: Merge 2 data frames and apply reference data to all rows matching one level

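I have two data frames: one ("grny") that is mostly reference data but has some values in the "yield" column that I'm after, and another ("txie") that holds a small amount of data with some values missing. I want to merge them so that every cell is filled in for rows that share a common value in "site".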

The one that is mostly year-by-year data:

txie<-data.frame (site=c(rep("smithfield",2),rep("belleville",3)), 
yield=c((rnorm(4, mean=8)),NA), 
year=c(1999:2000,1992:1994), 
prim=c(rep("nt",2),rep(NA,3)))

The one that is mostly reference data, with some year-by-year yield data:

grny<-data.frame (site=c("smithfield","belleville",rep("nashua",3)), 
yield=c(rep(NA,2),rnorm(3,mean=9)), 
year=c(rep(NA,2),1990:1992), 
prim=c(NA,"nt",sample(c("nt","ct"),3,rep=TRUE)), 
lat=(c(rnorm(2,mean=45,sd=10),rep(49.1,3))))

What I want:

  site yield year prim lib  lat 
1 smithfield 7.009178 1999 nt 1109  43.61828 
2 smithfield 8.472677 2000 nt 1109  43.61828 
3 belleville 8.857462 1992 nt 122  74.08792 
4 belleville 7.368488 1993 nt 122  74.08792 
5 belleville  NA 1994 nt 122  74.08792 
6 nashua  7.494519 1990 nt 554  49.10000 
8 nashua  8.696066 1991 ct 554  49.10000 
9 nashua  8.051670 1992 nt 554  49.10000

What I've tried:

rbind.fill(txie,grny) #this appends rows to the correct columns but leaves NA's everywhere because it doesn't know I want data missing in grny filled in when it is available in txie 
Reduce(function(x,y) merge(txie,grny, by="site", all.y=TRUE), list(txie,grny)) #this merges by rows but creates new variables from x and y. 
merge(x = txie, y = grny, by = "site", all = TRUE) #this does the same as the above (new variables from each x and y ending in .x or .y) 
merge(x = txie, y = grny, by = "site", all.x = TRUE)#this does similar to above but merges based on the x df (new variables from each x and y ending in .x or .y) 
setkey(setDT(grny),site)[txie]# this gives a similar result to the all.x line

For example, merging with the outer join I end up with:

 site yield.x year.x prim.x yield.y year.y prim.y  lat 
1 belleville 6.766628 1992 <NA>  NA  NA  nt 34.92136 
2 belleville 6.845789 1993 <NA>  NA  NA  nt 34.92136 
3 belleville  NA 1994 <NA>  NA  NA  nt 34.92136 
4 smithfield 8.841339 1999  nt  NA  NA <NA> 49.81872 
5 smithfield 7.313310 2000  nt  NA  NA <NA> 49.81872 
6  nashua  NA  NA <NA> 9.173229 1990  ct 49.10000 
7  nashua  NA  NA <NA> 9.196018 1991  nt 49.10000 
8  nashua  NA  NA <NA> 7.336645 1992  ct 49.10000

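Stipulation: I want to keep the NAs that are already in the "yield" column (e.g. belleville in 1994). Any answers, or can someone point me to an example of this kind of merge, where the data already sits in one or more shared columns that you are not merging on, rather than each df bringing in new columns besides the "by" variable?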

Thanks!

Source

2017-09-01 Anomie

Am I wrong to say that you should merge not on site alone, but on the combination of site and year? –

The example may be confusing, but no, you can keep it simple and use only site, because I won't be adding multiple years for the same site – Anomie

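Using the dplyr package, you can do a full_join and then apply the coalesce function to each pair of columns (yield.x vs yield.y, prim.x vs prim.y, and so on) to pick up the non-NA value.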

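library(dplyr)

# Outer join on site, then collapse each .x/.y pair into a single column
full_join(txie, grny, by = "site") %>%
  mutate(year  = coalesce(year.x, year.y),     # first non-NA value of each pair
         yield = coalesce(yield.x, yield.y),
         prim  = coalesce(prim.x, prim.y)) %>%
  select(-c(year.x, year.y, yield.x, yield.y, prim.x, prim.y))  # drop the suffixed originals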

     site  lat year  yield prim 
1 smithfield 59.71994 1999 7.920844 nt 
2 smithfield 59.71994 2000 10.122713 nt 
3 belleville 34.93358 1992 8.622351 nt 
4 belleville 34.93358 1993 7.360470 nt 
5 belleville 34.93358 1994  NA nt 
6  nashua 49.10000 1990 9.083390 ct 
7  nashua 49.10000 1991 8.073866 nt 
8  nashua 49.10000 1992 8.725625 nt
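
To see why this keeps the NAs that are missing on both sides, here is a minimal sketch of coalesce on its own. The vectors `from_x` and `from_y` are made up for illustration and stand in for a pair such as yield.x and yield.y.

```r
library(dplyr)

# Hypothetical stand-ins for a .x/.y column pair after the join
from_x <- c(1, NA, NA)
from_y <- c(9, 2, NA)

# For each position, take the first value that is not NA.
# Position 1: from_x wins; position 2: falls back to from_y;
# position 3: NA in both, so the result stays NA.
filled <- coalesce(from_x, from_y)
print(filled)
```

The argument order matters: when both sides hold a value, the first argument takes precedence, so put the data frame you trust more first.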

Source

2017-09-01 16:09:49 Lamia

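Thanks! This works. Just a note for other newbies like me (it seems obvious and simple now): I had to make sure that all vectors with the same name were the same type in both dfs. – Anomie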
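
One way to check the type requirement from the comment above before joining is sketched below in base R. The data frames `a` and `b` are hypothetical toy data, not the poster's, and the sketch assumes the mismatch is factor versus character, which was common when data.frame() converted strings to factors by default (before R 4.0). Other mismatches would need their own conversion.

```r
# Hypothetical toy data: `prim` is a factor in one frame and character in the other
a <- data.frame(site = c("s1", "s2"), prim = factor(c("nt", NA)),
                stringsAsFactors = FALSE)
b <- data.frame(site = c("s1", "s2"), prim = c(NA, "ct"),
                stringsAsFactors = FALSE)

# Columns present in both frames, other than the join key
shared <- setdiff(intersect(names(a), names(b)), "site")

# Show the class of each shared column side by side
types <- sapply(shared, function(col) {
  c(a = class(a[[col]])[1], b = class(b[[col]])[1])
})
print(types)

# Convert any factor column to character so both sides agree
to_chr <- function(x) if (is.factor(x)) as.character(x) else x
a[] <- lapply(a, to_chr)
b[] <- lapply(b, to_chr)
```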
