Function to combine columns after a merge in R

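I want to combine columns after merging two data frames. Right now I am writing ifelse statements to get a single unified column for each variable. I would like a function that lets me choose which data frame (i.e. x) should override the other column.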

df$source<-ifelse(df$source.x=='',df$source.y,df$source.x) 
df$id<-ifelse(df$id.x=='',df$id.y,df$id.x) 
df$profile_url<-ifelse(df$profile_url.x=='',df$profile_url.y,df$profile_url.x)

Any help would be much appreciated.

2013-02-24 user1495088

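This is a little unclear... for the columns above, do you want df to take the value from y iff x is an empty string, and otherwise take the value from x? – 2013-02-24 06:26:07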

How are you doing your merge? Can you give some context? And please provide a reproducible example. – agstudy 2013-02-24 06:26:10

Something like this should do it. (Note: not tested, since there is no sample data.)

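fixedColumn <- function(colm, myDF, keepx=TRUE) { 
    # pull out the two suffixed versions of the column 
    x <- myDF[[paste0(colm, ".x")]] 
    y <- myDF[[paste0(colm, ".y")]] 

    # keepx = TRUE: x wins unless it is empty 
    if(keepx) 
        return(ifelse(x=='', y, x)) 
    # keepx = FALSE: y wins unless it is empty 
    ifelse(y=='', x, y) 
} 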

# columns that need fixing. Don't include the suffixes 
cols <- c("source", "id", "url") 

# fix the .x columns 
df[, paste0(cols, ".x")] <- sapply(cols, fixedColumn, df) 

# delete the .y columns 
for (cc in paste0(cols, ".y")) 
    df[[cc]] <- NULL

Using @agstudy's sample data:

> df 
  Row.names id.x source.x url.x 
1         1    2        2     3 
2         2    3        1     3 
3         3    3        1     2 
4         4    3        2     2 
5         5    3        2     2
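
The integers in this output look like factor codes rather than the original strings: when the merged columns are factors (the default in R versions before 4.0), `ifelse()` drops the factor attributes and returns the underlying integers. Below is a minimal sketch of a character-safe variant, assuming the columns can be treated as character. The name `coalesce_suffixed`, its `prefer` argument, the made-up `merged` data frame, and the choice to treat NA like an empty string are all assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: combine the ".x"/".y" pair left behind by merge().
# `prefer` picks which data frame's value wins when it is non-empty.
coalesce_suffixed <- function(colm, myDF, prefer = c("x", "y")) {
  prefer <- match.arg(prefer)
  other  <- if (prefer == "x") "y" else "x"

  # as.character() avoids getting integer codes back from factor columns
  keep <- as.character(myDF[[paste0(colm, ".", prefer)]])
  fill <- as.character(myDF[[paste0(colm, ".", other)]])

  # assumption: treat NA like an empty string so unmatched rows are filled too
  empty <- is.na(keep) | keep == ""
  keep[empty] <- fill[empty]
  keep
}

# Small made-up data frame standing in for the result of merge()
merged <- data.frame(
  id.x  = c("a", "", "d"),
  id.y  = c("c", "b", ""),
  url.x = c("", "u2", ""),
  url.y = c("u1", "", "u3"),
  stringsAsFactors = TRUE   # mimic the factor columns of older R versions
)

cols <- c("id", "url")
result <- as.data.frame(
  sapply(cols, coalesce_suffixed, myDF = merged, simplify = FALSE),
  stringsAsFactors = FALSE
)
print(result)
```

Calling `coalesce_suffixed("id", merged, prefer = "y")` instead lets the y side win, which is the "choose which data frame overrides" behaviour the question asks for.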

2013-02-24 06:32:20

+1; it might be slightly faster to do it this way: `x <- myDF[[paste0(colm, ".x")]]; x[x == ''] <- myDF[[paste0(colm, ".y")]][x == '']` – 2013-02-24 06:35:45

But I don't think this code works as written! You are missing "colm" in your sapply, aren't you? – agstudy 2013-02-24 06:42:23

@agstudy, good catch, thanks! I fixed the `sapply` statement and the function definition so that they match. – 2013-02-24 06:55:21

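To avoid this column-swapping step, you can do the swap in SQL using the sqldf package (if your real problem involves a merge, that can be done in the same query). With the CASE ... WHEN syntax you write the same if/else logic: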

library(sqldf) 
colnames(df) <- gsub('[.]','_',colnames(df)) 
sqldf(" SELECT 
      CASE url_x WHEN '' THEN url_y ELSE url_x END as url , 
      CASE source_x WHEN '' THEN source_y ELSE source_x END as source, 
      CASE id_x WHEN '' THEN id_y ELSE id_x END as id 
     FROM df")

Reproducible example

Here is a reproducible example to test with:

# create some data 
set.seed(1234) 
df1 <- matrix(sample(c('a','b','d',''),3*5,replace=TRUE),ncol=3) 
df2 <- matrix(sample(c('c','b','','a'),3*5,replace=TRUE),ncol=3) 
colnames(df1) <- c('id','source','url') 
colnames(df2) <- c('id','source','url') 
df <- merge(df1,df2,by=0) 

# run 
library(sqldf) 
colnames(df) <- gsub('[.]','_',colnames(df)) 
sqldf(" SELECT 
      CASE url_x WHEN '' THEN url_y ELSE url_x END as url , 
      CASE source_x WHEN '' THEN source_y ELSE source_x END as source, 
      CASE id_x WHEN '' THEN id_y ELSE id_x END as id 
     FROM df") 

  url source id 
1   d      d  a 
2   d      a  d 
3   b      a  d 
4   a      d  d 
5   b      d  c

where df is:

  Row_names id_x source_x url_x id_y source_y url_y 
1         1    a        d     d    a        b     a 
2         2    d        a     d    b        b       
3         3    d        a     b    b        c     a 
4         4    d        d          c        c     a 
5         5             d     b    c        c     c

Using a helper function

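(1) If we have many of these columns, we may want a helper function. This one uses fn$ from the gsubfn package, which implements quasi-Perl-style string interpolation: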

xy <- function(s) { 
    fn$identity("case $s_x when '' then $s_y else $s_x end as $s") 
} 

fn$sqldf("select `xy('url')`, `xy('source')`, `xy('id')` from df")

(2) Or do it this way, storing the SQL statement in s:

s <- fn$identity("select `xy('url')`, `xy('source')`, `xy('id')` from df") 
sqldf(s)
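
To see what the helper expands to, roughly the same SQL text can be built with base R's `sprintf()`, without relying on gsubfn. This is only an illustrative sketch: `case_expr` and `build_query` are hypothetical names, and the generated string should match the one passed to `sqldf()` above apart from whitespace.

```r
# Hypothetical base-R equivalent of xy(): one CASE expression per column stem
case_expr <- function(s) {
  sprintf("case %s_x when '' then %s_y else %s_x end as %s", s, s, s, s)
}

# Assemble the full SELECT statement for any set of column stems
build_query <- function(cols, table = "df") {
  paste("select", paste(case_expr(cols), collapse = ", "), "from", table)
}

query <- build_query(c("url", "source", "id"))
cat(query, "\n")
# The string can then be run with sqldf(query), assuming sqldf is installed
```

Printing the query before running it makes it easy to check that every column stem produced the intended CASE expression.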

More information

See the sqldf home page, and for fn$ see the gsubfn home page.

2013-02-24 07:36:46 agstudy
