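Copy values within a column based on another column's value

I am trying to fill every NA in a column with the other value that appears in that same column for the same month. Is there a simple way to do this? I have found functions that do almost this, but not quite.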

The data.frame looks like this:

id month price1 price2
 1     1     NA      2
 2     1      4     NA
 3     1     NA     NA
 1     2      6     NA
 2     2     NA     NA
 3     2     NA      4

The output should look like this:

id month price1 price2
 1     1      4      2
 2     1      4      2
 3     1      4      2
 1     2      6      4
 2     2      6      4
 3     2      6      4


2016-05-23 larry fisherman

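Since this question is about programming in R, it might be a better fit on StackExchange, but here is an answer:

I imagine there are better ways to do this, but this is the one that came to mind right away.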

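# Fill one column: aggregate var by id_var, then merge the group values back onto df.
# Note: merge() sorts by id_var, so df is assumed to be ordered by id_var already.
replace_nas <- function(df, var, id_var, func = function(x) x[!is.na(x)]) {
    merge(df[, -which(names(df) == var)], aggregate(as.formula(paste0(var, "~", id_var)), df, func))[, var]
}
replace_all_nas <- function(df, id_vars, select_var, agg_vars, func = function(x) x[!is.na(x)]) {
    cbind(df[, id_vars], sapply(agg_vars, function(x) replace_nas(df, x, select_var, func)))
}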

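Usage: call replace_all_nas with df as the data.frame to operate on. id_vars is a vector of the column names you want to keep fixed, select_var is the variable to group by, agg_vars are the variables whose NAs you want to replace, and func is the function used to collect the non-NA value that replaces the NAs. I set it to select the values that are not NA (assuming there is only one per group), but if a group contains more than one non-NA value in a column, you will need some other way to handle that.

Running it on your example: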
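The single-value assumption can be checked before filling anything. The following is a minimal sketch, not part of the original answer: it rebuilds the question's example data under the name blah and counts the distinct non-NA values per month in each price column. The helper name count_distinct is made up for illustration.

```r
# Example data from the question, stored under the name blah
blah <- data.frame(id = rep(1:3, 2),
                   month = rep(1:2, each = 3),
                   price1 = c(NA, 4, NA, 6, NA, NA),
                   price2 = c(2, NA, NA, NA, NA, 4))

# Hypothetical helper: number of distinct non-NA values in a vector
count_distinct <- function(x) length(unique(x[!is.na(x)]))

# Rows are months, columns are price variables
counts <- sapply(blah[c("price1", "price2")],
                 function(col) tapply(col, blah$month, count_distinct))

# TRUE when no month has more than one distinct value in any column
all(counts <= 1)
```

If this returns FALSE for your data, func needs an explicit rule for choosing among the candidates, for example taking the first one.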

replace_all_nas(blah,id_vars = c("id","month"),select_var = c("month"),agg_vars = c("price1","price2"),func = function(x) x[!is.na(x)]) 
# id month price1 price2 
# 1 1  1  4  2 
# 2 2  1  4  2 
# 3 3  1  4  2 
# 4 1  2  6  4 
# 5 2  2  6  4 
# 6 3  2  6  4


2016-05-23 02:24:59

Thanks, I'll give this a try!

One possible approach is to use the match function.

d <- data.frame(id = rep(1:3, 2),
                month = rep(1:2, each = 3),
                price1 = c(NA, 4, NA, 6, NA, NA),
                price2 = c(2, NA, NA, NA, NA, 4))

d[is.na(d$price1), "price1"] <-
    d[!is.na(d$price1), ][match(d[is.na(d$price1), "month"],
                                d[!is.na(d$price1), "month"]), "price1"]

d[is.na(d$price2), "price2"] <-
    d[!is.na(d$price2), ][match(d[is.na(d$price2), "month"],
                                d[!is.na(d$price2), "month"]), "price2"]

> d 
    id month price1 price2 
1 1  1  4  2 
2 2  1  4  2 
3 3  1  4  2 
4 1  2  6  4 
5 2  2  6  4 
6 3  2  6  4

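Note that if there is more than one non-missing value to choose from, this method uses the first one.

As Laterow suggested, you can loop over the variables: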
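The reason is that match returns the position of the first match only. A small sketch with made-up vectors (known_months and known_prices are illustrative names, not from the data above):

```r
# Month 1 has two candidate prices (4 and 5); month 2 has one (6)
known_months <- c(1, 1, 2)
known_prices <- c(4, 5, 6)

# match() gives the first position where each month occurs: 1 and 3
positions <- match(c(1, 2), known_months)

# So month 1 is filled with 4, never with 5
picked <- known_prices[positions]
picked
```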

for (j in 3:ncol(d)) {
    varname <- names(d)[j]
    d[is.na(d[, varname]), varname] <-
        d[!is.na(d[, varname]), ][match(d[is.na(d[, varname]), "month"],
                                        d[!is.na(d[, varname]), "month"]),
                                  varname]
}


2016-05-23 02:50:33 mark999

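Thanks. Do you have a suggestion for iterating this approach? I forgot to mention that I have about 400 columns. Could I start the command with "for i in ...." and then replace price1/price2 with i? Also, multiple values are not a concern: the price of each product is always the same within a month :)

@larryfisherman Just replace 'd[is.na(d$price1), "price1"]' with 'm <- names(d)[i]; d[is.na(d[, m]), m]' and loop with something like 'for (i in 3:ncol(d))'. – Laterow

A dplyr solution. It assumes that, apart from the NAs, each month has a single value.

Create a data frame with one row per month, holding new variables that contain that month's single value.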

library(dplyr)
d1 <- d %>%
    group_by(month) %>%
    summarise(price1a = mean(price1, na.rm = TRUE), price2a = mean(price2, na.rm = TRUE))

Append the new columns to the original data frame.

dplyr::left_join(d,d1,by="month") 
id month price1 price2 price1a price2a 
1 1  1  NA  2  4  2 
2 2  1  4  NA  4  2 
3 3  1  NA  NA  4  2 
4 1  2  6  NA  6  4 
5 2  2  NA  NA  6  4 
6 3  2  NA  4  6  4
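The join above adds price1a and price2a next to the original columns rather than filling them. If the goal is to overwrite the NAs in place, one possible variant is a grouped mutate. This is only a sketch, assuming dplyr 1.0.0 or later (for across) and that every column to fill has a name starting with "price"; filled is an illustrative name.

```r
library(dplyr)

d <- data.frame(id = rep(1:3, 2),
                month = rep(1:2, each = 3),
                price1 = c(NA, 4, NA, 6, NA, NA),
                price2 = c(2, NA, NA, NA, NA, 4))

filled <- d %>%
    group_by(month) %>%
    # Within each month, replace the whole column with its first non-NA value
    mutate(across(starts_with("price"), ~ .x[!is.na(.x)][1])) %>%
    ungroup()

filled
```

A month with no non-NA value at all stays NA, because indexing an empty vector with [1] yields NA.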


2016-05-23 10:40:50

One way is to use ave, which applies a function to groups of observations that share the same factor levels.

ave(df$price1, df$month, FUN=function(x)unique(x[!is.na(x)])) 

#[1] 4 4 4 6 6 6 

ave(df$price2, df$month, FUN=function(x)unique(x[!is.na(x)])) 
#[1] 2 2 2 4 4 4
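With many price columns (the asker mentions about 400), the same ave call can be applied to every column at once. A minimal base R sketch, assuming every column other than id and month should be filled; price_cols and first_non_na are illustrative names, and taking the first non-NA value is a choice made here to stay safe if a month ever has more than one.

```r
d <- data.frame(id = rep(1:3, 2),
                month = rep(1:2, each = 3),
                price1 = c(NA, 4, NA, 6, NA, NA),
                price2 = c(2, NA, NA, NA, NA, 4))

# Every column except the identifiers is treated as a price column
price_cols <- setdiff(names(d), c("id", "month"))

# First non-NA value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# ave() recycles the single value over all rows of each month
d[price_cols] <- lapply(d[price_cols],
                        function(col) ave(col, d$month, FUN = first_non_na))

d
```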


2016-05-23 10:43:32
