2017-02-13

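Using the nested list-column approach and purrr with tidytext::unnest_tokens

I have a data frame containing survey responses, with each row representing a different person. One column, "Text", holds the answers to an open-ended text question. I would like to use tidytext::unnest_tokens so that I can do text analysis row by row, including sentiment scores, word counts, and so on.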

Here is a simple data frame for this example:

Satisfaction <- c("Satisfied", "Satisfied", "Dissatisfied", "Satisfied", "Dissatisfied")
Text <- c("I'm very satisfied with the services", "Your service providers are always late which causes me a lot of frustration", "You should improve your staff training, service providers have bad customer service", "Everything is great!", "Service is bad")
Gender <- c("M", "M", "F", "M", "F")
df <- data.frame(Satisfaction, Text, Gender)

Then I convert the Text column to character:

df$Text <- as.character(df$Text)

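Next I add an id column, group by it, tokenize, and nest the data frame.

library(dplyr); library(tidyr); library(tidytext)
df <- df %>% mutate(id = row_number()) %>% group_by(id) %>% unnest_tokens(word, Text) %>% nest(-id)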

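Getting to this point seems to work OK, but how do I now use the purrr::map functions to work on the nested list column "word"? For example, what if I want to use dplyr::mutate to create a new field holding the word count for each row?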

Also, is there a better way to nest the data frame so that only the "Text" column becomes a nested list?

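It's not very clear what you want. You can do the text analysis without using `purrr::nest`; just stop after `unnest_tokens`. If you want to nest only the word column, you can do `nest(word)`, but for that to work you must first `ungroup` the data frame (or not group by id in the first place). – FlorianGD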

Answer

I like using purrr::map for modeling different groups, but for what you are doing, I think you can stick with plain dplyr.

You can set up your data frame like this:

library(dplyr) 
library(tidytext) 

Satisfaction <- c("Satisfied", 
        "Satisfied", 
        "Dissatisfied", 
        "Satisfied", 
        "Dissatisfied") 

Text <- c("I'm very satisfied with the services", 
      "Your service providers are always late which causes me a lot of frustration", 
      "You should improve your staff training, service providers have bad customer service", 
      "Everything is great!", 
      "Service is bad") 

Gender <- c("M","M","F","M","F") 

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(Satisfaction, Text, Gender)

tidy_df <- df %>% 
    mutate(id = row_number()) %>% 
    unnest_tokens(word, Text) 

Then, to find for example the number of words in each row, you can use group_by and mutate:

tidy_df %>%
    group_by(id) %>%
    mutate(num_words = n()) %>%
    ungroup()
#> # A tibble: 37 × 5
#>    Satisfaction Gender    id      word num_words
#>           <chr>  <chr> <int>     <chr>     <int>
#> 1     Satisfied      M     1       i'm         6
#> 2     Satisfied      M     1      very         6
#> 3     Satisfied      M     1 satisfied         6
#> 4     Satisfied      M     1      with         6
#> 5     Satisfied      M     1       the         6
#> 6     Satisfied      M     1  services         6
#> 7     Satisfied      M     2      your        13
#> 8     Satisfied      M     2   service        13
#> 9     Satisfied      M     2 providers        13
#> 10    Satisfied      M     2       are        13
#> # ... with 27 more rows
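
If you do want the list-column approach from the question, a minimal sketch might look like the one below. It assumes tidyr 1.0.0 or later (for the `nest(words = word)` syntax) plus purrr. The names `nested_df` and `words` are illustrative and do not come from the original post. Note that `nest` is exported by tidyr, not purrr.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tidytext)

# Same example data as in the question
df <- tibble(
  Satisfaction = c("Satisfied", "Satisfied", "Dissatisfied",
                   "Satisfied", "Dissatisfied"),
  Text = c("I'm very satisfied with the services",
           "Your service providers are always late which causes me a lot of frustration",
           "You should improve your staff training, service providers have bad customer service",
           "Everything is great!",
           "Service is bad"),
  Gender = c("M", "M", "F", "M", "F")
)

nested_df <- df %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, Text) %>%   # one row per token
  nest(words = word) %>%          # assumption: tidyr >= 1.0.0; nests only the word column
  mutate(num_words = map_int(words, nrow))  # one count per respondent

nested_df
```

This keeps one row per respondent, with the tokens tucked into the `words` list column, so no `group_by` is needed before nesting. Whether that is more convenient than the flat `tidy_df` depends on what you do next.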

You can do sentiment analysis by implementing an inner join; check out some examples here.
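
As a rough sketch of the inner-join idea, the code below scores each response against a small lexicon. The `toy_lexicon` table and the `score` column are made up for illustration and are not part of the original answer. In practice a real lexicon, for example the one returned by `get_sentiments("bing")` in tidytext, would take the toy table's place.

```r
library(dplyr)
library(tidytext)

df <- tibble(
  Text = c("I'm very satisfied with the services",
           "Your service providers are always late which causes me a lot of frustration",
           "You should improve your staff training, service providers have bad customer service",
           "Everything is great!",
           "Service is bad")
)

tidy_df <- df %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, Text)

# Hypothetical toy lexicon, for illustration only
toy_lexicon <- tibble(
  word      = c("satisfied", "great", "late", "frustration", "bad"),
  sentiment = c("positive", "positive", "negative", "negative", "negative")
)

sentiment_by_id <- tidy_df %>%
  inner_join(toy_lexicon, by = "word") %>%  # keeps only words found in the lexicon
  group_by(id) %>%
  summarise(score = sum(sentiment == "positive") - sum(sentiment == "negative"))

sentiment_by_id
```

Because `inner_join` drops words that are not in the lexicon, a response with no matching words would disappear from the result entirely. A `left_join` back onto the ids would be one way to keep such rows.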

Thanks for the help and the examples! – Mike