2016-08-03 97 views
0

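Merge multiple rows by column

I have two separate data frames with a common column named col_id.

My problem is that a simple merge is not ideal for my case.

Here is a sample of the structure of df1: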

  col_id     stock    ch2
1 id_100     stock 2  yes
2 id_100002  stock 2  no
3 id_100003  stock 2  no

And for the second df:

  col_id     num    cat1
1 id_100     num 2  0
2 id_100     num 2  1
3 id_100     num 2  0
4 id_100002  num 2  1
5 id_100002  num 2  1
6 id_100002  num 2  1
7 id_100003  num 2  1
8 id_100003  num 2  1

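The output I want keeps every row of the second df and fills in the values from the first df that share the same col_id. Example of the output: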

  col_id     num    cat1  stock    ch2
1 id_100     num 2  0     stock 2  yes
2 id_100     num 2  1     stock 2  yes
3 id_100     num 2  0     stock 2  yes
4 id_100002  num 2  1     stock 2  no
5 id_100002  num 2  1     stock 2  no
6 id_100002  num 2  1     stock 2  no
7 id_100003  num 2  1     stock 2  no
8 id_100003  num 2  1     stock 2  no
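
Can you provide your tables as data frames instead of pasting them as tables? – user5249203

@user5249203 Yes, sure, but do I understand correctly that you need the df structure so you can load it directly? – Jake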

Answers

1

Try:

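# install.packages('dplyr')  # run once if dplyr is not installed yet
library(dplyr)

# First table: one row per col_id
mytext1 <- "col_id,stock,ch2
id_100,stock 2,yes
id_100002,stock 2,no
id_100003,stock 2,no"
mydf1 <- read.table(text = mytext1, header = TRUE, sep = ",",
                    strip.white = TRUE, stringsAsFactors = FALSE)

# Second table: several rows per col_id
mytext2 <- "col_id,num,cat1
id_100,num 2,0
id_100,num 2,1
id_100,num 2,0
id_100002,num 2,1
id_100002,num 2,1
id_100002,num 2,1
id_100003,num 2,1
id_100003,num 2,1"
mydf2 <- read.table(text = mytext2, header = TRUE, sep = ",",
                    strip.white = TRUE, stringsAsFactors = FALSE)

# Keep every row of mydf2 and attach the matching mydf1 columns
output_df <- left_join(mydf2, mydf1, by = "col_id")
output_df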

col_id     num    cat1  stock    ch2
id_100     num 2  0     stock 2  yes
id_100     num 2  1     stock 2  yes
id_100     num 2  0     stock 2  yes
id_100002  num 2  1     stock 2  no
id_100002  num 2  1     stock 2  no
id_100002  num 2  1     stock 2  no
id_100003  num 2  1     stock 2  no
id_100003  num 2  1     stock 2  no

1

It looks like you want the merge function with its all.x/all.y arguments. For example,

df1 <- data.frame(
    col_id = c("id_100", "id_100002", "id_100003"),
    stock = "stock 2",   # recycled to every row
    ch2 = c("yes", "no", "no")
)

df2 <- data.frame(
    col_id = c(rep("id_100", 3),
      rep("id_100002", 3),
      rep("id_100003", 2)),
    num = "num 2",       # recycled to every row
    cat1 = c(0, 1, 0, 1, 1, 1, 1, 1)
)

# all.y = TRUE keeps every row of df2, even one with no match in df1
mergedData <- merge(df1, df2, by = "col_id", all.y = TRUE)

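For the snippet you pasted, this produces the desired output, apart from the column order (merge puts the columns of df1 before those of df2). You can use any combination of all.(x|y) = (TRUE|FALSE) to get the appropriate join (inner, outer, left, right, whatever). W3 Schools has a good description of the different join types (they discuss them in the context of SQL, but R's merge function is analogous).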
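
As a rough illustration of how those argument combinations map onto join types, here is a minimal sketch. The data frames `left` and `right` and their values are made up for this example and are not the question's data.

```r
# Hypothetical toy data: "b" is in both tables, "a" only in left, "c" only in right
left  <- data.frame(col_id = c("a", "b"), x = c(1, 2))
right <- data.frame(col_id = c("b", "c"), y = c(10, 20))

inner   <- merge(left, right, by = "col_id")                # default, all = FALSE
left_j  <- merge(left, right, by = "col_id", all.x = TRUE)  # keep every row of left
right_j <- merge(left, right, by = "col_id", all.y = TRUE)  # keep every row of right
full    <- merge(left, right, by = "col_id", all = TRUE)    # keep rows from both sides

# Number of rows returned by each join type
sapply(list(inner = inner, left = left_j, right = right_j, full = full), nrow)
```

Rows that have no partner in the other table are kept only when the corresponding all.x or all.y is TRUE, and their missing columns are filled with NA.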

1

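You only need to add two lines of code, as follows (df is the second data frame):

# every row gets the same stock value
df$stock <- 'stock 2'
# look ch2 up by col_id with a named vector, so each of the 8 rows gets its own value
df$ch2 <- unname(c(id_100 = 'yes', id_100002 = 'no', id_100003 = 'no')[as.character(df$col_id)])

This should solve your problem.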
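
The same fill can also be done without hard-coding the values, by looking each col_id up in the first data frame. Below is a sketch using base R `match()`. It assumes the question's two tables are rebuilt as `df1` and `df2`; those names are chosen here for illustration only.

```r
# Data rebuilt from the tables in the question
df1 <- data.frame(col_id = c("id_100", "id_100002", "id_100003"),
                  stock  = "stock 2",
                  ch2    = c("yes", "no", "no"))
df2 <- data.frame(col_id = rep(c("id_100", "id_100002", "id_100003"),
                               times = c(3, 3, 2)),
                  num    = "num 2",
                  cat1   = c(0, 1, 0, 1, 1, 1, 1, 1))

# For each row of df2, the position of its col_id in df1 (NA if absent)
idx <- match(df2$col_id, df1$col_id)

# Copy the matching df1 values across; the row order of df2 is unchanged
df2$stock <- df1$stock[idx]
df2$ch2   <- df1$ch2[idx]
df2
```

Unlike merge, this never reorders or duplicates rows of df2, but it only works when col_id is unique in df1.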