2017-02-10 54 views
-3

I want to count the unique list elements within each group defined by a data frame column. My input data frame:

NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"), c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack")) 

df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'), NameList = sapply(NameList, paste0, collapse = ',')) 

I want to count the unique elements of the lists, grouped by df$Name, like this:

Name unique_Num_Name 
Alison 6 
Harry 5 
Jack  2 

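I know how to get the number of unique elements of a list with length(unique(unlist(df$NameList))). However, I have not managed to get this per group for my data frame. I would appreciate any guidance or help.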

+1

Please 'dput' and 'str' your data ('df') and share the output here. It is not clear what type the 'NameList' column is. – Abdou

+0

@Abdou I added the 'dput' of 'NameList'. It is a list. – Santosh

+0

'tapply' would do it. 'with(df, tapply(NameList, Name, FUN = function(x) length(unique(unlist(x)))))' –

Answers

1

You can use the dplyr and tidyr packages from the tidyverse:

library(tidyverse) 
separate_rows(df, NameList, sep = ",") %>% 
    group_by(Name) %>% 
    summarise(uniq_names = n_distinct(NameList)) 

The result is:

# A tibble: 3 × 2 
    Name uniq_names 
    <fctr>  <int> 
1 Alison   6 
2 Harry   5 
3 Jack   2 

Input data:

NameList = list(c("Sam", "Gemma", "Alison", "Tom"),c("Oliver", "Alison"),c("Tom", "Alison", "Harry"),c("Vin", "Harry"), 
       c("Jason", "Sam", "Harry"),c("Anton", "Harry"),c("Harry"),c("Vin", "Jack")) 

df <- data.frame(Name = c('Alison','Alison','Alison','Harry','Harry','Harry','Harry','Jack'), 
       NameList = sapply(NameList, paste0, collapse = ',')) 
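
For readers who prefer not to load the tidyverse, the same split-then-count idea can be sketched in base R. This is only an illustration, assuming NameList is the comma-joined character column built in the input data above; the names name_list, parts and counts are introduced here and are not part of the original answer.

```r
# Rebuild the input data from the answer (comma-joined strings per row).
name_list <- list(c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
                  c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
                  c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
                  c("Harry"), c("Vin", "Jack"))

df <- data.frame(
  Name = c("Alison", "Alison", "Alison", "Harry", "Harry", "Harry", "Harry", "Jack"),
  NameList = sapply(name_list, paste0, collapse = ","),
  stringsAsFactors = FALSE
)

# Split every comma-joined string back into a character vector.
# as.character() guards against NameList being stored as a factor.
parts <- strsplit(as.character(df$NameList), ",", fixed = TRUE)

# Group the per-row vectors by Name, then count distinct names in each group.
counts <- vapply(
  split(parts, df$Name),
  function(x) length(unique(unlist(x))),
  integer(1)
)

print(counts)
```

counts is a named integer vector with one entry per value of Name, which should agree with the uniq_names column shown in the result above.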
+0

Why was this downvoted? – h3rm4n

+0

I had to modify some of my code. Now your script works fine. Thanks for your help. I am upvoting and accepting your answer. Thanks again. – Santosh

1

Split the data into the groups defined by Name and apply the length-unique-unlist combination to each group:

lapply(split(dat, dat$Name), function(x) { 
    length(unique(unlist(x$NameList))) 
}) 

Update: As Rich Scriven suggested in the comments, tapply is a better choice here:

with(dat, 
    tapply(NameList, Name, FUN = function(x) length(unique(unlist(x)))) 
) 

Sample data:

dat <- structure(
    list(
    Name = structure(
     c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 
     3L), 
     .Label = c("Alison", "Harry", "Jack"), 
     class = "factor" 
    ), 
    NameList = structure(list(
     c("Sam", "Gemma", "Alison", "Tom"), 
     c("Oliver", "Alison"), 
     c("Tom", "Alison", "Harry"), 
     c("Vin", 
     "Harry"), 
     c("Jason", "Sam", "Harry"), 
     c("Anton", "Harry"), 
     "Harry", 
     c("Vin", "Jack") 
    ), class = "AsIs") 
), 
    .Names = c("Name", 
      "NameList"), 
    row.names = c(NA,-8L), 
    class = "data.frame" 
) 
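
The tapply call above returns a named array rather than the two-column table the question asks for. A minimal sketch of one way to reshape it follows, assuming NameList is a true list column as in the sample data; the names res and out are introduced here for illustration only.

```r
# Build a data frame with a list column, mirroring the sample data.
dat <- data.frame(
  Name = c("Alison", "Alison", "Alison", "Harry", "Harry", "Harry", "Harry", "Jack"),
  stringsAsFactors = FALSE
)
dat$NameList <- list(c("Sam", "Gemma", "Alison", "Tom"), c("Oliver", "Alison"),
                     c("Tom", "Alison", "Harry"), c("Vin", "Harry"),
                     c("Jason", "Sam", "Harry"), c("Anton", "Harry"),
                     "Harry", c("Vin", "Jack"))

# Count distinct list elements per Name, as in the answer.
res <- with(dat, tapply(NameList, Name, FUN = function(x) length(unique(unlist(x)))))

# Turn the named array into a data frame with the column names from the question.
out <- data.frame(
  Name = names(res),
  unique_Num_Name = as.vector(res),
  stringsAsFactors = FALSE
)

print(out)
```

Keeping the result as a data frame makes it easy to merge back onto other per-name tables; if only the counts are needed, the named array from tapply is enough.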
+0

'tapply()' might be a better choice. 'with(df, tapply(NameList, Name, FUN = function(x) length(unique(unlist(x)))))' –

+0

@RichScriven Yes. Thanks. – bergant