2016-04-23 60 views
2

tidyr: multiple unnesting with varying NA counts

I'm confused by some tidyr behavior. I can unnest a single response column like this:

library(tidyr) 

resp1 <- c("A", "B; A", "B", NA, "B") 
resp2 <- c("C; D; F", NA, "C; F", "D", "E") 
resp3 <- c(NA, NA, "G; H; I", "H; I", "I") 
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

tidy <- data %>% 
    transform(resp1 = strsplit(resp1, "; ")) %>% 
    unnest() 

# Source: local data frame [6 x 3] 
# 
#     resp2   resp3 resp1
#     (chr)   (chr) (chr)
# 1 C; D; F      NA     A
# 2      NA      NA     B
# 3      NA      NA     A
# 4    C; F G; H; I     B
# 5       D    H; I    NA
# 6       E       I     B

But in my dataset I need to unnest multiple columns, and the columns have differing numbers of NAs. I tried this and it threw an error:

data %>% 
    transform(resp1 = strsplit(resp1, "; "), 
      resp2 = strsplit(resp2, "; "), 
      resp3 = strsplit(resp3, "; ")) %>% 
    unnest() 
# Error: All nested columns must have the same number of elements. 
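
The error message suggests that unnesting several list columns in one call expects every row to hold the same number of elements in each column. A minimal base-R sketch to check that on this data (the names `splits`, `counts` and `same_count` are introduced here for illustration only):

```r
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")

# Split every column on "; "; an NA stays a single NA element.
splits <- lapply(list(resp1 = resp1, resp2 = resp2, resp3 = resp3),
                 strsplit, split = "; ")

# One row per observation, one column per response variable.
counts <- sapply(splits, lengths)

# TRUE only where all three columns have the same number of elements.
same_count <- apply(counts, 1, function(r) length(unique(r)) == 1)
same_count
```

Only the last row has matching counts, which is consistent with the single-call `unnest()` being rejected.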

I expected the code above to give me the same output as the following:

# unnesting multiple response (desired output/is there a better way?) 
data %>% 
    transform(resp1 = strsplit(resp1, "; ")) %>% 
    unnest() %>% 
    transform(resp2 = strsplit(resp2, "; ")) %>% 
    unnest() %>% 
    transform(resp3 = strsplit(resp3, "; ")) %>% 
    unnest() 

#    resp1 resp2 resp3
#    (chr) (chr) (chr)
# 1      A     C    NA
# 2      A     D    NA
# 3      A     F    NA
# 4      B    NA    NA
# 5      A    NA    NA
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12    NA     D     H
# 13    NA     D     I
# 14     B     E     I

I'm new to R, but this feels clunky and makes me wonder whether I'm abusing something I shouldn't be. Why does unnesting several columns at once fail?

Answers

1

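Check this link, which shows multiple columns being unnested in a situation different from yours. Judging from the documentation and the link, unless there is some clever way to do this, the function is probably only defined for a single column at a time, to avoid ambiguity.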

So you will probably have to unnest your columns one by one. The code below may still be cumbersome, but it simplifies things a little.

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
data
#   resp1   resp2   resp3
# 1     A C; D; F    <NA>
# 2  B; A    <NA>    <NA>
# 3     B    C; F G; H; I
# 4  <NA>       D    H; I
# 5     B       E       I

library(tidyr)
library(dplyr)
# Split all three columns first, then unnest them one at a time.
data %>%
    transform(resp1 = strsplit(resp1, "; "),
      resp2 = strsplit(resp2, "; "),
      resp3 = strsplit(resp3, "; ")) %>%
    unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
#    resp1 resp2 resp3
# 1      A     C  <NA>
# 2      A     D  <NA>
# 3      A     F  <NA>
# 4      B  <NA>  <NA>
# 5      A  <NA>  <NA>
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12  <NA>     D     H
# 13  <NA>     D     I
# 14     B     E     I
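
If the installed tidyr provides `separate_rows()` (an assumption; it is not part of the answer above), the split and the unnest can be folded into one call per column. The columns are still expanded one at a time here, since splitting several columns in a single call is expected to need matching element counts. A minimal sketch:

```r
library(tidyr)

data <- data.frame(
  resp1 = c("A", "B; A", "B", NA, "B"),
  resp2 = c("C; D; F", NA, "C; F", "D", "E"),
  resp3 = c(NA, NA, "G; H; I", "H; I", "I"),
  stringsAsFactors = FALSE
)

# Each call splits one column on "; " and repeats the other
# columns for every piece; NA entries are kept as a single NA row.
tidy <- data %>%
  separate_rows(resp1, sep = "; ") %>%
  separate_rows(resp2, sep = "; ") %>%
  separate_rows(resp3, sep = "; ")

nrow(tidy)
```

This should reproduce the 14-row result shown above without the intermediate list columns.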

The last line gives _Error: wrong result size (5), expected 6 or 1_. The same happens when I replace it with 'unnest(resp1, resp2, resp3)'. – alexpghayes


Hmm, interesting. The code seems to work for me. I have pasted the whole code block that reproduces your result. – Psidom


I have a similar problem: running unnest sequentially does not work, because the first call seems to drop the other nested columns. –

0

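In addition to Psidom's answer: by default, unnest drops the additional list columns (if row duplication is required). Use the .drop = FALSE argument to keep the other columns.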

unnest(resp1) %>% unnest(resp2) %>% unnest(resp3) becomes:

unnest(resp1, .drop = FALSE) %>% unnest(resp2, .drop = FALSE) %>% unnest(resp3)
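
In tidyr 1.0.0 and later, `.drop` is deprecated and `unnest()` keeps the remaining list columns, so the explicit argument should no longer be needed. A sketch under that version assumption, using `mutate(across(...))` from dplyr 1.0.0 or later (also an assumption) in place of `transform()`:

```r
library(tidyr)
library(dplyr)

data <- data.frame(
  resp1 = c("A", "B; A", "B", NA, "B"),
  resp2 = c("C; D; F", NA, "C; F", "D", "E"),
  resp3 = c(NA, NA, "G; H; I", "H; I", "I"),
  stringsAsFactors = FALSE
)

# Turn every column into a list column of split pieces.
nested <- data %>%
  mutate(across(everything(), ~ strsplit(.x, "; ")))

# Unnest one column per call; the other list columns are carried along.
result <- nested %>%
  unnest(cols = resp1) %>%
  unnest(cols = resp2) %>%
  unnest(cols = resp3)

nrow(result)
```

On older tidyr versions, the `.drop = FALSE` form above remains the way to keep the other list columns.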