2015-02-07

Reduce (R) with similar variable names results in an error

I have 19 nested lists generated by lapply and split operations. The lists have the form:

#list1
Var col1 col2 col3
A   2    3    4
B   3    4    5

#list2
Var col1 col2 col3
A   5    6    7
B   5    4    4

......

#list19
Var col1 col2 col3
A   3    6    7
B   7    4    4

I have tried merging them with

merge.all <- function(x, y) merge(x, y, all=TRUE, by="Var") 
out <- Reduce(merge.all, DataList) 

but I get an error when merging the lists because the other columns have the same names.

How can I concatenate the list names with the variable names so that I get something like this:

Var list1.col1 list1.col2 list1.col3 .......... list19.col3
A   2          3          4          .......... 7
B   3          4          5          .......... 4

Answers


I'm sure someone will come up with a much better solution, but if you're after a quick and dirty one, this seems to work.

My plan was simply to change the column names before merging.

#Sample Data 
df1 <- data.frame(Var = c("A","B"), col1 = c(2,3), col2 = c(3,4), col3 = c(4,5)) 
df2 <- data.frame(Var = c("A","B"), col1 = c(5,5), col2 = c(6,4), col3 = c(7,5)) 
df19 <- data.frame(Var = c("A","B"), col1 = c(3,7), col2 = c(6,4), col3 = c(7,4)) 

mylist <- list(df1, df2, df19) 
names(mylist) <- c("df1", "df2", "df19") #just manually naming, presumably your list has names 


## Change the column names by pasting the name of each data frame in the list onto the standard column names (an ugly mix of `lapply` and a `for` loop):

mycolnames <- colnames(df1)
mycolnames1 <- lapply(names(mylist), function(x) paste0(x, mycolnames))

for(i in seq_along(mylist)){
    colnames(mylist[[i]]) <- mycolnames1[[i]]
    colnames(mylist[[i]])[1] <- "Var" # put Var back so you can merge on it
}

## Merge 
merge.all <- function(x, y) 
    merge(x, y, all=TRUE, by="Var") 

out <- Reduce(merge.all, mylist) 
out 


# Var df1col1 df1col2 df1col3 df2col1 df2col2 df2col3 df19col1 df19col2 df19col3 
#1 A  2  3  4  5  6  7  3  6  7 
#2 B  3  4  5  5  4  5  7  4  4 

There you go: it works, but it's very ugly.
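For comparison, the same rename-then-merge idea can be written without the loop using base R's Map(). This is a sketch, not the answer's code: prefix_cols() is a hypothetical helper, and it assumes the list is named and that every data frame has the key column Var. It uses "." as the separator to match the list1.col1 layout asked for in the question.

```r
mylist <- list(
  df1  = data.frame(Var = c("A", "B"), col1 = c(2, 3), col2 = c(3, 4), col3 = c(4, 5)),
  df2  = data.frame(Var = c("A", "B"), col1 = c(5, 5), col2 = c(6, 4), col3 = c(7, 5)),
  df19 = data.frame(Var = c("A", "B"), col1 = c(3, 7), col2 = c(6, 4), col3 = c(7, 4))
)

# Hypothetical helper: prefix every non-key column with the name of its list element
prefix_cols <- function(lst, byvar = "Var") {
  Map(function(df, nm) {
    keep <- names(df) != byvar  # leave the merge key untouched
    names(df)[keep] <- paste(nm, names(df)[keep], sep = ".")
    df
  }, lst, names(lst))
}

merge.all <- function(x, y) merge(x, y, all = TRUE, by = "Var")
out <- Reduce(merge.all, prefix_cols(mylist))
names(out)  # "Var", "df1.col1", ..., "df19.col3"
```

Because the renaming happens before Reduce() starts, no two inputs to merge() share a non-key column name, so the default suffixes never come into play.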


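To make the column names unique, you can use a function that renames every column other than the merge variable, prefixing it with the name of its list element.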

resetNames <- function(x, byvar = "Var") { 
    asrl <- as.relistable(lapply(x, names)) 
    allnm <- names(unlist(x, recursive = FALSE)) 
    rpl <- replace(allnm, unlist(asrl) %in% byvar, byvar) 
    Map(setNames, x, relist(rpl, asrl)) 
} 

Reduce(merge.all, resetNames(dlist)) 
# Var list1.col1 list1.col2 list1.col3 list2.col1 list2.col2 list2.col4 list3.col1 
#1 A   2   3   4   5   6   7   3 
#2 B   3   4   5   5   4   4   7 
# list3.col2 list3.col3 list4.col1 list4.col2 list4.col3 
#1   6   7   3   6   7 
#2   4   4   4   5   6 
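The renaming in resetNames() relies on how unlist() names things: flattening a named list of data frames by one level (recursive = FALSE) yields one element per column, named listname.colname. The function then puts "Var" back and uses relist() to split the flat name vector into one vector per data frame. A small illustration of those steps, where dl is made-up data rather than the question's lists:

```r
library(utils)  # as.relistable() and relist() come from utils (attached by default)

dl <- list(
  list1 = data.frame(Var = c("A", "B"), col1 = 1:2),
  list2 = data.frame(Var = c("A", "B"), col1 = 3:4)
)

# One level of flattening: each column becomes an element named "<list name>.<column>"
flat_names <- names(unlist(dl, recursive = FALSE))
flat_names  # c("list1.Var", "list1.col1", "list2.Var", "list2.col1")

# Restore the merge key, then split the flat vector back into per-data-frame name vectors
skeleton <- as.relistable(lapply(dl, names))
new_names <- replace(flat_names, unlist(skeleton) %in% "Var", "Var")
per_df <- relist(new_names, skeleton)
per_df$list2  # c("Var", "list2.col1")
```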

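Running this on the list with the extra data frames added produced no warnings. There is also always data.table: its merge method does not return warnings for duplicated column names.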

library(data.table) 
Reduce(merge.all, lapply(dlist, as.data.table)) 

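Another option is to check the names as the data enters the function and change them there, which avoids the warnings. It isn't perfect, but it works well here.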

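merge.all <- function(x, y) {
    # assumes Var is the first column; strip merge()'s .x/.y suffixes from x's names,
    # then flag the columns of y whose names already exist in x
    dup <- names(y)[-1] %in% gsub("[.](x|y)$", "", names(x)[-1])
    names(y)[-1][dup] <- paste0(names(y)[-1][dup], "DUPE")
    merge(x, y, all=TRUE, by="Var")
}

out <- Reduce(merge.all, dlist)
names(out)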
# [1] "Var"  "col1"  "col2"  "col3"  "col1DUPE.x" 
# [6] "col2DUPE.x" "col4"  "col1DUPE.y" "col2DUPE.y" "col3DUPE.x" 
# [11] "col1DUPE" "col2DUPE" "col3DUPE.y" 
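One detail to keep in mind when writing this kind of check: match() returns positions in its second argument (the table), not in the first, so using those positions to index y's names only picks the right columns when they happen to line up. A logical test with %in% returns one TRUE/FALSE per column of y instead. The column vectors below are made up for illustration:

```r
y_cols <- c("col4", "col1")  # non-key columns of the incoming data frame (illustrative)
x_cols <- c("col1", "col2")  # non-key columns already in the accumulated result (illustrative)

# match() gives positions in x_cols, so indexing y_cols with them can flag the wrong column
m <- match(y_cols, x_cols, 0L)
by_position <- y_cols
by_position[m] <- paste0(by_position[m], "DUPE")

# %in% gives one TRUE/FALSE per element of y_cols, so the clashing column is flagged
dup <- y_cols %in% x_cols
by_flag <- y_cols
by_flag[dup] <- paste0(by_flag[dup], "DUPE")

by_position  # c("col4DUPE", "col1")   wrong column renamed
by_flag      # c("col4", "col1DUPE")   intended result
```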

where dlist is:

structure(list(list1 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 2:3, col2 = 3:4, col3 = 4:5), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list2 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = c(5L, 5L), col2 = c(6L, 4L), 
    col4 = c(7L, 4L)), .Names = c("Var", "col1", "col2", "col4" 
), class = "data.frame", row.names = c(NA, -2L)), list3 = structure(list(
    Var = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    col1 = c(3L, 7L), col2 = c(6L, 4L), col3 = c(7L, 4L)), .Names = c("Var", 
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA, 
-2L)), list4 = structure(list(Var = structure(1:2, .Label = c("A", 
"B"), class = "factor"), col1 = 3:4, col2 = c(6L, 5L), col3 = c(7L, 
6L)), .Names = c("Var", "col1", "col2", "col3"), row.names = c(NA, 
-2L), class = "data.frame")), .Names = c("list1", "list2", "list3", 
"list4"))