Frequency of strings in R

I want to count how often each factor appears at each position in the following dataset:

df <- data.frame(fact = c("a,b", "c,b"))

So my ideal output would be something like:

Factor  Position1  Position2
a       1          0
b       0          2
c       1          0

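For instance, b occurs twice in the second position.

What I have tried is very tedious: I split the string into columns and then compute the frequency of each column one at a time: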

library(splitstackshape)

df <- cSplit(df, "fact", ",")

table(df$fact_2)

I was wondering if there are any tips that would make this easier?

Source

2016-12-05 MFR

Since cSplit returns a data.table, you can melt and then dcast to get the result more simply:

library(data.table)  # melt() and dcast() methods for data.table objects

dfspl <- cSplit(df, "fact", ",")

dcast(melt(dfspl, measure.vars = names(dfspl)), value ~ variable, fun.aggregate = length)

# value fact_1 fact_2 
#1:  a  1  0 
#2:  b  0  2 
#3:  c  1  0
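
The reshape-then-count idea can also be sketched in base R, without any packages. This is only an illustration, not part of the answer above: the names `parts`, `long`, and `counts` are made up here, and the sketch assumes the items are separated by plain commas with no surrounding spaces.

```r
# Same toy data as in the question
df <- data.frame(fact = c("a,b", "c,b"), stringsAsFactors = FALSE)

# Split each string on commas; each list element holds one row's items
parts <- strsplit(df$fact, ",", fixed = TRUE)

# Long format: one row per (value, position) pair
long <- data.frame(
  value    = unlist(parts),
  position = paste0("position", sequence(lengths(parts)))
)

# Cross-tabulate value against position
counts <- table(long$value, long$position)
print(counts)
```

Because `sequence(lengths(parts))` numbers the items within each row, this also handles rows with differing numbers of items. Columns are ordered alphabetically, so with ten or more positions you would probably want explicit factor levels.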

Source

2016-12-06 00:07:44 thelatemail

Here is another option using dplyr/tidyr:

library(dplyr) 
library(tidyr)  
separate(df, fact, into = c("position1", "position2")) %>%  # split the column into two
  gather() %>%                # convert to long format (key, value)
  group_by(key, value) %>%    # group by position and value
  count() %>%                 # count occurrences per group
  spread(key, n, fill = 0)    # spread back to wide format
# A tibble: 3 × 3 
# value position1 position2 
#* <chr>  <dbl>  <dbl> 
#1  a   1   0 
#2  b   0   2 
#3  c   1   0
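
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline might look like the sketch below. The explicit `sep = ","`, the `res` name, and the `values_fill` argument are my additions rather than part of the answer, and the row order of the result may differ from the output shown above.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
df <- data.frame(fact = c("a,b", "c,b"))

res <- df %>%
  # split the column into one column per position
  separate(fact, into = c("position1", "position2"), sep = ",") %>%
  # long format: one row per (position, value) pair
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  # count occurrences of each value at each position
  count(key, value) %>%
  # wide format again, filling missing combinations with 0
  pivot_wider(names_from = key, values_from = n, values_fill = list(n = 0))

print(res)
```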


Source

2016-12-06 02:30:24 akrun
