2017-07-17

Split and manipulate nested lists

I want to split a nested list by a grouping variable. Consider the following structure:

> str(L1) 
List of 2 
$ names:List of 2 
    ..$ first: chr [1:5] "john" "lisa" "anna" "mike" ... 
    ..$ last : chr [1:5] "johnsson" "larsson" "johnsson" "catell" ... 
$ stats:List of 2 
    ..$ physical:List of 2 
    .. ..$ age : num [1:5] 14 22 53 23 31 
    .. ..$ height: num [1:5] 165 176 179 182 191 
    ..$ mental :List of 1 
    .. ..$ iq: num [1:5] 102 104 99 87 121 

Now I need to produce two lists, both obtained by splitting on L1$names$last, resulting in L2 and L3 as shown below:

L2: the result grouped by L1$names$last
> str(L2) 
List of 3 
$ johnsson:List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr [1:2] "john" "anna" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 14 53 
    .. .. ..$ height: num [1:2] 165 179 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 102 99 
$ larsson :List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr [1:2] "lisa" "steven" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 22 31 
    .. .. ..$ height: num [1:2] 176 191 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 104 121 
$ catell :List of 2 
    ..$ names:List of 1 
    .. ..$ first: chr "mike" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num 23 
    .. .. ..$ height: num 182 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num 87 

L3: each value of L1$names$last may occur only once per group

List of 2 
$ 1:List of 2 
    ..$ names:List of 2 
    .. ..$ first: chr [1:3] "john" "lisa" "mike" 
    .. ..$ last : chr [1:3] "johnsson" "larsson" "catell" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:3] 14 22 23 
    .. .. ..$ height: num [1:3] 165 176 182 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:3] 102 104 87 
$ 2:List of 2 
    ..$ names:List of 2 
    .. ..$ first: chr [1:2] "anna" "steven" 
    .. ..$ last : chr [1:2] "johnsson" "larsson" 
    ..$ stats:List of 2 
    .. ..$ physical:List of 2 
    .. .. ..$ age : num [1:2] 53 31 
    .. .. ..$ height: num [1:2] 179 191 
    .. ..$ mental :List of 1 
    .. .. ..$ iq: num [1:2] 99 121 

I've tried to apply this solution, but it does not seem to work for nested lists.

Reproducible code:

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121)))) 

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87)))) 

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121))))) 

Edit: please note that the actual dataset is fairly large and more deeply nested than the example provided.

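Your data seems to be very structured, i.e. rectangular; why aren't you using a data frame? – rawr

I did not consider that when I created the sample data. The actual data I'm working with changes dynamically and is not necessarily rectangular. –

Could you provide an example in which the non-list vectors do not all have the same length, along with the ideal end result? –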

Answers

Generally, when modifying lists you will want to use recursion. For example, consider a function like this:

foo <- function(x, idx) { 

    if (is.list(x)) { 
     return(lapply(x, foo, idx = idx)) 
    } 
    return(x[idx]) 
} 

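It takes a list x and an index idx. It checks whether x is a list; if so, it applies itself to every child element of that list. Once x is no longer a list, it returns the elements selected by idx. Throughout the process, the structure of the original list is preserved.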
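A minimal demo of that recursion on a small, made-up list (`small` and `keep` are illustrative names, not from the question). It also shows base R's `rapply(..., how = "list")`, which walks a nested list the same way; treat it as an alternative worth checking on the real data rather than a guaranteed drop-in.

```r
# Recursive subsetting as described above: lists are descended into,
# leaf vectors are subset by idx, and all names are kept.
foo <- function(x, idx) {
  if (is.list(x)) {
    return(lapply(x, foo, idx = idx))  # descend into every child element
  }
  x[idx]  # leaf vector: keep only the selected positions
}

# Illustrative three-level list; every leaf has 3 elements.
small <- list(a = c(10, 20, 30),
              b = list(c = c("x", "y", "z"),
                       d = list(e = c(TRUE, FALSE, TRUE))))
keep <- c(TRUE, FALSE, TRUE)

sub <- foo(small, keep)
str(sub)

# Base R alternative: rapply() applies the function to every leaf and,
# with how = "list", returns a list with the same nesting and names.
sub2 <- rapply(small, function(v) v[keep], how = "list")
identical(sub, sub2)
```

Because only the leaves are touched, the depth of the list does not matter, which is what makes this approach suitable for the more deeply nested real data.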

Here is a complete example. Note that this code assumes that every vector in the list has 5 elements.

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121)))) 

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87)))) 

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121))))) 

# make L2 
foo <- function(x, idx) { 

    if (is.list(x)) { 
     return(lapply(x, foo, idx = idx)) 
    } 
    return(x[idx]) 
} 

levels <- unique(L1$names$last) 
L2_2 <- vector("list", length(levels)) 
names(L2_2) <- levels 
for (i in seq_along(L2_2)) { 

    idx <- L1$names$last == names(L2_2[i]) 
    L2_2[[i]] <- list(names = foo(L1$names[-2], idx), 
         stats = foo(L1$stats, idx)) 

} 
identical(L2, L2_2) 

str(L2) 
str(L2_2) 

# make L3 

dups <- duplicated(L1$names$last) 
L3_2 <- vector("list", 2) 
names(L3_2) <- 1:2 
for (i in 1:2) { 

    if (i == 1) 
     idx <- !dups 
    else 
     idx <- dups 

    L3_2[[i]] <- foo(L1, idx) 

} 
identical(L3, L3_2) 
str(L3) 
str(L3_2) 
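The `duplicated()` split above only separates the first occurrence of each surname from all later ones, so a surname that appeared three times would put two rows with that surname into group 2. Below is a sketch that numbers every occurrence with `ave()` and builds both L2 and L3 from `split()` index lists. It reuses the answer's `foo()` and keeps its assumption that every leaf vector has one element per person; `by_name`, `occurrence`, `L2_split` and `L3_split` are illustrative names.

```r
# The answer's recursive foo(): recurse into lists, subset leaf vectors.
foo <- function(x, idx) {
  if (is.list(x)) return(lapply(x, foo, idx = idx))
  x[idx]
}

L1 <- list(
  names = list(first = c("john", "lisa", "anna", "mike", "steven"),
               last  = c("johnsson", "larsson", "johnsson", "catell", "larsson")),
  stats = list(physical = list(age    = c(14, 22, 53, 23, 31),
                               height = c(165, 176, 179, 182, 191)),
               mental   = list(iq = c(102, 104, 99, 87, 121)))
)
last <- L1$names$last

# Make the assumption explicit: every leaf has one element per person.
stopifnot(all(rapply(L1, length, how = "unlist") == length(last)))

# L2: one element per surname, in order of first appearance (the factor
# levels fix that order; split() would otherwise sort the surnames).
by_name <- split(seq_along(last), factor(last, levels = unique(last)))
L2_split <- lapply(by_name, function(i) list(
  names = foo(L1$names[names(L1$names) != "last"], i),  # drop the grouping leaf
  stats = foo(L1$stats, i)
))

# L3: number each occurrence of a surname (1st, 2nd, 3rd, ...) and split on
# that number, so surnames repeated more than twice are handled as well.
occurrence <- ave(seq_along(last), last, FUN = seq_along)
L3_split <- lapply(split(seq_along(last), occurrence), function(i) foo(L1, i))

str(L3_split)
```

With the question's L2 and L3 in scope, `identical(L2, L2_split)` and `identical(L3, L3_split)` are the natural checks. The groups are computed once as integer positions, so each group costs one pass of `foo()` over the list.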
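Thank you very much. Your solution works fine on small lists, but for my dataset (about 50 variables with about 920 observations) it is not feasible. –

Why is it not feasible? Time? Memory? Errors? – CPak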

This is not a complete answer, but I hope it helps.

See whether this works for L3:

library(data.table)  # shift() comes from data.table

x = data.frame(L1, stringsAsFactors = FALSE) 
y = x[order(x$names.last),] 
y$seq = 1 
y$seq = ifelse(y$names.last == shift(y$names.last),shift(y$seq)+1,1) 
y$seq[1] = 1 

z = split(y, y$seq)  # one data frame per occurrence number
z = list(list(names=list(first=z[[1]]$names.first, last=z[[1]]$names.last), stats=list(physical = list(age =z[[1]]$stats.physical.age, height= z[[1]]$stats.physical.height), mental=list(iq= z[[1]]$stats.iq))), list(names=list(first=z[[2]]$names.first, last=z[[2]]$names.last), stats=list(physical = list(age =z[[2]]$stats.physical.age, height= z[[2]]$stats.physical.height), mental=list(iq= z[[2]]$stats.iq)))) 

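The last part (z), which converts the result back into a list, can be done with a loop. Assuming there are not too many identical names, the loop should not be too slow.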
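A sketch of that loop, written with `lapply()` over the per-group data frames instead of spelling out each group by hand. Two assumptions to flag: the column names (`names.first`, ..., `stats.iq`) are the ones `data.frame()` produces when flattening this particular L1, and `ave()` is swapped in for `data.table::shift()` to number occurrences, which needs no sorting and handles any number of repeats; `groups` and `L3_df` are illustrative names.

```r
L1 <- list(
  names = list(first = c("john", "lisa", "anna", "mike", "steven"),
               last  = c("johnsson", "larsson", "johnsson", "catell", "larsson")),
  stats = list(physical = list(age    = c(14, 22, 53, 23, 31),
                               height = c(165, 176, 179, 182, 191)),
               mental   = list(iq = c(102, 104, 99, 87, 121)))
)

# Flatten to a data frame; nested names become dotted column names.
x <- data.frame(L1, stringsAsFactors = FALSE)

# Occurrence number of each surname: 1 = first time seen, 2 = second, ...
x$seq <- ave(seq_len(nrow(x)), x$names.last, FUN = seq_along)

# One data frame per occurrence number, each rebuilt into the nested shape.
groups <- split(x, x$seq)
L3_df <- lapply(groups, function(g) list(
  names = list(first = g$names.first, last = g$names.last),
  stats = list(physical = list(age    = g$stats.physical.age,
                               height = g$stats.physical.height),
               mental   = list(iq = g$stats.iq))
))
str(L3_df)
```

The rebuild function still hard-codes the nesting, so for the real, deeper data it would have to list every leaf or be generated from the original structure.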

You said your data is more deeply nested; in that case you will need to add is.null and/or tryCatch to handle errors.
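A minimal sketch of that idea: read each leaf by its path inside `tryCatch()`, so a missing path yields `NULL` instead of stopping the loop, then use `is.null()` to pick a fallback. The helper `get_leaf` and the path `stats/social/score` are hypothetical, chosen only to show a path that does not exist in L1.

```r
# Hypothetical helper: read a leaf by its path (recursive [[ indexing) and
# return NULL instead of stopping when that path is missing.
get_leaf <- function(x, path) {
  tryCatch(x[[path]], error = function(e) NULL)
}

L1 <- list(
  names = list(first = c("john", "lisa", "anna", "mike", "steven"),
               last  = c("johnsson", "larsson", "johnsson", "catell", "larsson")),
  stats = list(physical = list(age    = c(14, 22, 53, 23, 31),
                               height = c(165, 176, 179, 182, 191)),
               mental   = list(iq = c(102, 104, 99, 87, 121)))
)

iq    <- get_leaf(L1, c("stats", "mental", "iq"))     # path exists
score <- get_leaf(L1, c("stats", "social", "score"))  # path not in this L1
if (is.null(score)) {
  score <- rep(NA_real_, length(L1$names$first))  # fall back to one NA per person
}
```

Filling missing leaves with `NA` of the right length keeps every group the same shape, so the subsetting steps above can still run on them.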