2017-08-13 61 views
1

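Split a long list into shorter lists in R

I have a long list of objects that I need to split into smaller lists, each with 20 entries. The catch is that each object may appear only once in any given list.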

# Create some example data... 
# Make a list of objects. 
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance') 

# Generate a longer list, with a random sequence and number of repetitions for each entry 
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE))) 

print(LONG.LIST) 

Name 
1   Cup 
2 Distance 
3  Roof 
4  Pencil 
5  Lunch 
6  Toast 
7  Watch 
8  Bottle 
9   Car 
10  Roof 
11  Lunch 
12 Forever 
13  Cheese 
14 Oranges 
15  Ocean 
16 Chocolate 
17  Socks 
18  Leaves 
19 Oranges 
20 Distance 
21  Green 
22  Paper 
23  Red 
24  Paper 
25  Trees 
26 Chocolate 
27  Bottle 
28  Dog 
29  Wind 
30  Parrot 
etc.... 

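Using the example generated above, 'Distance' occurs at both position 2 and position 20, 'Lunch' at both 5 and 11, and 'Oranges' at both 14 and 19, so the first duplicate-free list would need to extend to include 'Green', 'Paper' and 'Red'. The second list would then start with 'Paper' at position 24.

The final list will probably be incomplete, so it would be good to pad it with NAs.

It would be simplest if the output were a data frame with one list per column.

I don't know where to start, so any suggestions are much appreciated. Thanks!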

+1

Do you mean this: 'library(tidyverse); LONG.LIST %>% group_by(Name) %>% mutate(grp = row_number()) %>% group_by(grp) %>% mutate(ind = row_number()) %>% spread(grp, Name)' – akrun

+0

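@akrun - Great, thanks! That seems to work well. If you want to write it up as an answer, I will accept it. I am not familiar with the tidyverse; could you explain in more detail what is happening? The only thing I would change is to sort each list alphabetically. – EcologyTom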

+0

@EcologyTom Do you mean that each list should start from the (24n + 1)th index of LONG.LIST? –

Answer

3

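We can do this with the tidyverse. Grouping by 'Name', we create a sequence-number column 'grp' that counts the occurrences of each name. We then group by 'grp' to create a second sequence column 'ind', convert to 'wide' format with spread so that each value of 'grp' becomes its own column, and finally sort each column alphabetically.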

library(tidyverse) 
LONG.LIST %>% 
    group_by(Name) %>% 
    mutate(grp = row_number()) %>% 
    group_by(grp) %>% 
    mutate(ind = row_number()) %>% 
    spread(grp, Name) %>% 
    mutate_at(vars(-one_of("ind")), funs(.[order(as.character(.))])) 
# A tibble: 40 x 12 
#  ind  `1`  `2`  `3`  `4`  `5`  `6`  `7`  `8`  `9`  `10`  `11` 
# <int> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> 
# 1  1  Bag  Bag  Bag  Bag  Bag  Bag  Bag  Bag  Cup Distance Distance 
# 2  2  Blue  Blue  Book  Book  Book Cloud  Cup  Cup Distance Train  NA 
# 3  3  Book  Book Bottle Cloud Cloud  Cup Distance Distance Train  NA  NA 
# 4  4 Bottle Bottle Cheese  Cup  Cup Distance  Dog Hammock  NA  NA  NA 
# 5  5  Car  Car Cloud Distance Distance  Dog Hammock  Moon  NA  NA  NA 
# 6  6 Cheese Cheese  Cup  Dog  Dog Hammock  Moon Parrot  NA  NA  NA 
# 7  7 Chocolate Chocolate Distance  Fork Hammock Horse Paper Train  NA  NA  NA 
# 8  8  Cloud  Cloud  Dog Hammock Horse  Moon Parrot  NA  NA  NA  NA 
# 9  9  Cup  Cup  Fork Hippo  Mint Paper Train  NA  NA  NA  NA 
#10 10 Distance Distance Green Horse  Moon Parrot  NA  NA  NA  NA  NA 
# ... with 30 more rows 
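
For readers unfamiliar with the tidyverse, the same idea can be sketched in base R on a small input. This is only an illustration under assumptions: the vector 'x' and the column names 'list1', 'list2', ... are hypothetical and not from the original post, and like the pipeline above it does not cap each list at 20 entries. The key step is numbering each occurrence of a value (first, second, third, ...), which plays the role of 'grp'; since a value's k-th occurrence is unique, no list can contain a repeat.

```r
# Toy input (hypothetical, much smaller than LONG.LIST)
x <- c("Cup", "Roof", "Lunch", "Roof", "Cup", "Toast", "Cup")

# Occurrence number of each value: 1 for its first appearance, 2 for its
# second, and so on. This corresponds to `grp` in the pipeline above.
grp <- ave(seq_along(x), x, FUN = seq_along)

# One vector per occurrence number; a value cannot repeat within a vector.
lists <- split(x, grp)

# Sort each vector alphabetically, then pad with NA to a common length.
n <- max(lengths(lists))
padded <- lapply(lists, function(v) {
  v <- sort(v)
  length(v) <- n  # extending a vector fills the new slots with NA
  v
})

# Illustrative column names, then one column per list.
names(padded) <- paste0("list", names(padded))
result <- as.data.frame(padded, stringsAsFactors = FALSE)
print(result)
```

Here 'list1' holds the first occurrence of every name, 'list2' the second occurrences, and so on, with the shorter lists padded by NA at the bottom.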
+0

Thanks @akrun. However, I get an error: Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('col_list', 'lazy_dots')" – EcologyTom

+0

@EcologyTom I am using 'tidyr_0.6.3' and 'dplyr_0.7.2' on 'R 3.4.1'. Could you please check your versions? – akrun

+0

@EcologyTom It is not clear whether this is due to a version difference. Could you try '%>% spread(grp, Name) %>% as.data.frame() %>% mutate_at(vars(-one_of("ind")), funs(.[order(as.character(.))]))' – akrun
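
One way to compare versions with those quoted above is to print them from the R session. The following is a minimal sketch, not part of the original thread; it assumes only that the packages of interest are 'dplyr' and 'tidyr', and it skips any that are not installed rather than raising an error.

```r
# Packages named in the thread; adjust as needed.
pkgs <- c("dplyr", "tidyr")

# Keep only the packages that are actually installed, so the check does not fail.
installed <- Filter(function(p) requireNamespace(p, quietly = TRUE), pkgs)

# Collect version strings for comparison with tidyr 0.6.3 / dplyr 0.7.2.
versions <- vapply(installed,
                   function(p) as.character(packageVersion(p)),
                   character(1))

print(R.version.string)
print(versions)
```

Calling 'sessionInfo()' gives the same information in a single report, along with every attached package.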