Flatten list with complex nested structure

2015-01-26

I have a list with the following example structure:

> dput(test) 
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
    var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", 
"var3")), section2 = structure(list(row = structure(list(var1 = 1, 
    var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), 
    row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1", 
    "var2", "var3")), row = structure(list(var1 = 7, var2 = 8, 
     var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row", 
"row", "row"))), .Names = c("id", "var1", "var3", "section1", 
"section2")) 


> str(test) 
List of 5 
$ id  : num 1 
$ var1 : num 2 
$ var3 : num 4 
$ section1:List of 3 
    ..$ var1: num 1 
    ..$ var2: num 2 
    ..$ var3: num 3 
$ section2:List of 3 
    ..$ row:List of 3 
    .. ..$ var1: num 1 
    .. ..$ var2: num 2 
    .. ..$ var3: num 3 
    ..$ row:List of 3 
    .. ..$ var1: num 4 
    .. ..$ var2: num 5 
    .. ..$ var3: num 6 
    ..$ row:List of 3 
    .. ..$ var1: num 7 
    .. ..$ var2: num 8 
    .. ..$ var3: num 9 

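Note that the section2 list contains elements named row. These represent multiple records. What I have is a nested list in which some elements sit at the root level, while others are multiple nested records for the same observation. I would like the output in data.frame format, as follows: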

> desired
  id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1  1    2    4             1             2             3             1             4             7
2 NA   NA   NA            NA            NA            NA             2             5             8
3 NA   NA   NA            NA            NA            NA             3             6             9

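Root-level elements should populate the first row, while the row elements should each get their own row. As an added complication, the number of variables in the row entries can vary.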

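Why do you want this desired output? It seems like an inconvenient data format. – A5C1D2H2I1M1N2O1R2T1 2015-01-28 17:52:51

I am making a SOAP request that returns an HTML table as a nested structure inside a nested list. I don't know why you think the desired output is inconvenient. It recreates the HTML table in data.frame format and fills in NA values where an entry spans multiple rows. – Zelazny7 2015-01-28 18:46:04

Could you provide one or two more test cases, since you have added a bounty for this? You mention that you are looking for a "general" solution, so it would be good to know which scenarios should be considered. – A5C1D2H2I1M1N2O1R2T1 2015-02-01 04:22:33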

Answers

3

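Here is a general approach. It does not assume that you have only three rows; it works with however many rows you have. If a value is missing in the nested structure (e.g. var1 does not exist in some of the sub-lists of section2), the code correctly returns NA for that cell.

For example, suppose we use the following data: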

test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list(var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2")) 

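The general approach is to use melt to create a data frame that includes information about the nested structure, and then use dcast to reshape it into the format you want.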

library("reshape2") 

flat <- unlist(test, recursive=FALSE) 
names(flat)[grep("row", names(flat))] <- gsub("row", "var", paste0(names(flat)[grep("row", names(flat))], seq_len(length(names(flat)[grep("row", names(flat))])))) ## keeps track of rows by adding an ID 
ul <- melt(unlist(flat)) 
split <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts 
max <- max(unlist(lapply(split, FUN=length))) 
pad <- function(a) { 
    c(a, rep(NA, max-length(a))) 
} 
levels <- matrix(unlist(lapply(split, FUN=pad)), ncol=max, byrow=TRUE) 

## Get the nesting structure 
nested <- data.frame(levels, ul) 
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]] 
desired <- dcast(nested, X3~X1 + X2) 
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired))) 
desired <- desired[,names(flat)] 

> desired
##   id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1  1    2    4             1             2             3             1             4             7
## 2 NA   NA   NA            NA            NA            NA             2             5             8
## 3 NA   NA   NA            NA            NA            NA             3             6             9
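
The name-splitting step in this answer can be seen in isolation on a smaller list. The following is a minimal base-R sketch, assuming a hypothetical list `toy` (not the question's data) and made-up variable names; it shows how `unlist` encodes nesting in dotted names, and how padding the split names with NA gives one row per leaf and one column per nesting level.

```r
# Hypothetical toy list, used only for illustration
toy <- list(id = 1, section1 = list(var1 = 10, var2 = 20))

# unlist() flattens recursively and joins the names of each level with "."
flat_toy <- unlist(toy)
names(flat_toy)

# Split every dotted name into its nesting levels
parts <- strsplit(names(flat_toy), split = ".", fixed = TRUE)
depth <- max(lengths(parts))

# Pad shorter paths with NA so that every path has the same number of levels
pad_path <- function(p) c(p, rep(NA_character_, depth - length(p)))
level_matrix <- t(vapply(parts, pad_path, character(depth)))

# One row per leaf, one column per nesting level
level_matrix
```

Here the root-level `id` has nothing in the second column, which is why the answer's code has to fill in or strip the NA level before reshaping.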
1

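The core idea of this solution is to flatten every sub-list except those named 'row'. This can be done by creating a unique ID for each list element (stored in z), and then requiring that all elements within a single 'row' share the same ID (stored in z2; a recursive function is needed to traverse the nested list). z2 can then be used to group the elements that belong to the same row. The resulting list can be converted to matrix form with stri_list2matrix from the stringi package, and then to a data frame.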

utest <- unlist(test) 
z <- relist(seq_along(utest),test) 

recurse <- function(L) { 
    if (class(L)!='list') return(L) 
    b <- names(L)=='row' 
    L.b <- lapply(L[b],function(k) relist(rep(k[[1]],length(k)),k)) 
    L.nb <- lapply(L[!b],recurse) 
    c(L.b,L.nb) 
} 

z2 <- unlist(recurse(z)) 

library(stringi) 
desired <- as.data.frame(stri_list2matrix(split(utest,z2))) 
names(desired) <- names(z2)[unique(z2)] 

desired
#     id var1 var3 section1.var1 section1.var2 section1.var3 section2.row.var1
# 1    1    2    4             1             2             3                 1
# 2 <NA> <NA> <NA>          <NA>          <NA>          <NA>                 2
# 3 <NA> <NA> <NA>          <NA>          <NA>          <NA>                 3
#   section2.row.var1 section2.row.var1
# 1                 4                 7
# 2                 5                 8
# 3                 6                 9
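
The ID trick in this answer may be easier to follow on a very small list. This is a minimal base-R sketch, assuming a hypothetical list `toy` with a single 'row' and made-up variable names; it only illustrates how `relist` gives each leaf an ID and how shared IDs group a row's leaves, and it skips the recursion that the answer needs for arbitrary nesting.

```r
# Hypothetical toy list: one root value and one 'row' with two fields
toy <- list(id = 10, section = list(row = list(a = 20, b = 30)))

# Named vector of all leaves: id, section.row.a, section.row.b
values <- unlist(toy)

# Same shape as toy, with the leaves replaced by the IDs 1, 2, 3
ids <- relist(seq_along(values), toy)

# Give every leaf inside the row the ID of the row's first leaf
row_ids <- rep(ids$section$row[[1]], length(ids$section$row))
group <- c(ids$id, row_ids)

# Leaves that share an ID land in the same group
grouped <- split(values, group)
grouped
```

The root-level `id` keeps its own group, while both fields of the row end up together.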
0

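Your problem is not well defined when rows have a complex structure (i.e. if each row in `test` itself contains a list like `test`, how should the rows be bound together? Likewise, what if rows in the same table have different structures?), so the solution below relies on rows being lists of values.

That said, I am guessing that, in the general case, your list `test` will contain values, lists of values, or lists of rows (where rows are lists of values). This solution also still works if the rows are not always called "row".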

temp <- lapply(test, 
       function(x){ 
        if(!is.list(x)) 
         # x is a value 
         return(x) 
        # x is a list of rows or values
        out <- do.call(cbind,x) 
        if(nrow(out)>1){ 
         # x is a list of rows 
         colnames(out)<-paste0(colnames(out),'.',rownames(out)) 
         rownames(out)<-rep_len(NA,nrow(out)) 
        } 
        return(out) 
       }) 

# a function that extends a matrix to a fixed number of rows (N)
# by appending rows of NA's 
rowExtend <- function(x,N){ 
       if((!is.matrix(x))){ 
        out<-do.call(rbind,c(list(x),as.list(rep_len(NA,N - 1)))) 
        colnames(out) <- "" 
        out 
       }else if(nrow(x) < N) 
        do.call(rbind,c(list(x),as.list(rep_len(NA,N - nrow(x))))) 
       else 
        x 
      } 

# calculate the maximum number of rows 
.nrows <- sapply(temp,nrow) 
.nrows <- max(unlist(.nrows[!sapply(.nrows,is.null)])) 

# extend the shorter rows 
(temp2<-lapply(temp, rowExtend,.nrows)) 

# calculate new column names
newColNames <- mapply(function(x,y) { 
         if(nzchar(y)[1L]) 
          paste0(x,'.',y) 
         else x 
         }, 
         names(temp2), 
         lapply(temp2,colnames)) 


do.call(cbind,mapply(`colnames<-`,temp2,newColNames)) 

#> id var1 var3 section1.var1 section1.var2 section1.var3 section2.row.var1 section2.row.var2 section2.row.var3
#> 1  2    4    1             2             3             1                 4                 7
#> NA NA   NA   NA            NA            NA            2                 5                 8
#> NA NA   NA   NA            NA            NA            3                 6                 9
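
The three kinds of element this answer distinguishes (a value, a list of values, a list of rows) can be told apart with a small helper. This is a hedged base-R sketch; `classify_element` and `toy` are hypothetical names, not part of the answer's code, and the helper assumes, as the answer does, that a row is itself a list of plain values.

```r
# Hypothetical helper: label one top-level element of the nested list
classify_element <- function(x) {
  if (!is.list(x)) {
    return("value")
  }
  # If any child is itself a list, treat x as a list of rows
  if (any(vapply(x, is.list, logical(1)))) {
    return("list of rows")
  }
  "list of values"
}

# Hypothetical toy data with one element of each kind
toy <- list(
  id = 1,
  section1 = list(var1 = 1, var2 = 2),
  section2 = list(row = list(var1 = 1), row = list(var1 = 4))
)

kinds <- vapply(toy, classify_element, character(1))
kinds
```

Note that the check never looks at the name "row", in keeping with the answer's point that rows need not be called that.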
0

This starts out similar to Tiffany's answer, but then diverges a bit.

library(data.table) 

# flatten the first level 
flat = unlist(test, recursive = FALSE) 

# compute max length 
N = max(sapply(flat, length)) 

# pad NA's and convert to data.table (at this point it will *look* like the right answer) 
dt = as.data.table(lapply(flat, function(l) c(l, rep(NA, N - length(l))))) 

# but in reality some of the columns are lists - check by running sapply(dt, class) 
# so unlist them 
dt = dt[, lapply(.SD, unlist)] 
#    id var1 var3 section1.var1 section1.var2 section1.var3 section2.row section2.row section2.row
#1:  1    2    4             1             2             3            1            4            7
#2: NA   NA   NA            NA            NA            NA            2            5            8
#3: NA   NA   NA            NA            NA            NA            3            6            9