2017-03-15

R: RSentiment::calculate_score() returns "Error: arguments imply differing number of rows"

I am trying to apply RSentiment::calculate_score() to a set of sentences stored in a data.frame. Here is how I get my data:

install.packages("pacman") 
pacman::p_load(XML, dplyr, tidyr, stringr, rvest, audio, xml2, purrr, tidytext, ggplot2, RSentiment)

sapiens_code = "1846558239" 
deus_ex_code = "1910701874" 

function_product <- function(prod_code){
    url <- paste0("https://www.amazon.co.uk/dp/", prod_code)
    doc <- xml2::read_html(url)
    prod <- html_nodes(doc, "#productTitle") %>%
        html_text() %>%
        gsub("\n", "", .) %>%
        gsub("^\\s+|\\s+$", "", .) # trim leading and trailing whitespace
    prod
}

sapiens <- function_product(sapiens_code) 
deus_ex <- function_product(deus_ex_code) 

# Source a helper function that parses Amazon HTML pages for review data
source("https://raw.githubusercontent.com/rjsaito/Just-R-Things/master/Text%20Mining/amazonscraper.R") 

# extracting reviews 
pages <- 13 

function_page <- function(page_num, prod_code){ 
    url2 <- paste0("http://www.amazon.co.uk/product-reviews/",prod_code,"/?pageNumber=", page_num) 
    doc2 <- read_html(url2) 

    reviews <- amazon_scraper(doc2, reviewer = F, delay = 2) 
    reviews 
} 

sapiens_reviews <- map2(1:pages, sapiens_code, function_page) %>% bind_rows() 

deusex_reviews <- map2(1:pages, deus_ex_code, function_page) %>% bind_rows() 

sapiens_reviews$comments <- gsub("\\.", "\\. ", sapiens_reviews$comments) 
deusex_reviews$comments <- gsub("\\.", "\\. ", deusex_reviews$comments) 

sentence_function <- function(df){
    df_sentence <- df %>%
        select(comments, format, stars, helpful) %>%
        unnest_tokens(sentence, comments, token = "sentences")
    df_sentence
}

sapiens_sentence <- sentence_function(sapiens_reviews) 
deusex_sentence <- sentence_function(deusex_reviews) 

But when I try to assign scores to them, I get an error:

deusex_sentence <- deusex_sentence %>%
    mutate(sentence_score = unname(calculate_score(sentence)))

Error: arguments imply differing number of rows: 34, 33
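For context, this message is raised by base R's `data.frame()` when it is asked to combine columns whose lengths differ and cannot be recycled. The counts 34 and 33 suggest that, somewhere inside the scoring step, one vector ended up one element longer than another; the error alone does not reveal which step produced the extra element. A minimal base R reproduction of the same message (the column names are illustrative only):

```r
# data.frame() refuses to combine a length-3 and a length-2 column,
# because 3 is not a multiple of 2 and recycling is impossible.
msg <- tryCatch(
  data.frame(sentence = c("a", "b", "c"), score = c(1, -1)),
  error = function(e) conditionMessage(e)
)
print(msg)  # "arguments imply differing number of rows: 3, 2"
```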

I can't see anything fundamentally wrong with my input format, and the output for randomly picked sentences looks fine, e.g.

unname(calculate_score(sapiens_sentence[1, 4])) 
[1] -1 
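Since scoring a single sentence works, one way to narrow the problem down (a debugging sketch, not part of the original question) is to score each sentence on its own and flag any input for which the scorer returns something other than exactly one value. In real use `scorer` would be `calculate_score`; the `toy_scorer` below is a hypothetical stand-in that is assumed to misbehave on `!`, so the sketch runs without RSentiment installed:

```r
# Return the indices of sentences for which `scorer` does not return
# exactly one value. `scorer` is any function taking a character string.
find_problem_sentences <- function(sentences, scorer) {
  n_out <- vapply(sentences, function(s) length(scorer(s)), integer(1),
                  USE.NAMES = FALSE)
  which(n_out != 1L)
}

# Hypothetical scorer: returns one value per "!"-separated fragment instead
# of one value per input sentence, mimicking a length mismatch.
toy_scorer <- function(x) rep(0, length(strsplit(x, "!", fixed = TRUE)[[1]]))

sentences <- c("a great book", "wow! really good", "dull")
problems <- find_problem_sentences(sentences, toy_scorer)
problems  # returns 2
```

A flagged index points at a sentence worth inspecting for unusual characters before scoring the whole column at once.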

Any ideas how to fix this? Thanks a lot for your help!

Answer


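It turns out the problem was caused by special characters in the sentences. After removing them, I was able to run the sentiment analysis successfully (I added the data-cleaning step to the function):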

sentence_function <- function(df){
    df_sentence <- df %>%
        select(comments, format, stars, helpful) %>%
        unnest_tokens(sentence, comments, token = "sentences") %>%
        # replace every special (non-alphanumeric) character with a space
        mutate(sentence2 = str_replace_all(sentence, "[^[:alnum:]]", " "))

    df_sentence <- df_sentence %>%
        mutate(sentence_score = unname(calculate_score(sentence2)))

    df_sentence
}

# go and get a hot drink while this is running 
sapiens_sentence <- sentence_function(sapiens_reviews) 
deusex_sentence <- sentence_function(deusex_reviews) 
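The pattern `[^[:alnum:]]` matches any character that is not a letter or digit, so punctuation, apostrophes, and symbols all become spaces. A base R sketch of the same substitution, with `gsub()` standing in for `stringr::str_replace_all()`, plus an optional extra step (an assumption, not in the answer above) that squeezes repeated spaces and trims the ends:

```r
# Replace every non-alphanumeric character with a space, as
# str_replace_all(sentence, "[^[:alnum:]]", " ") does in the answer.
clean_sentence <- function(x) {
  cleaned <- gsub("[^[:alnum:]]", " ", x)
  # Optional, not in the original answer: collapse runs of spaces and trim.
  trimws(gsub(" +", " ", cleaned))
}

cleaned <- clean_sentence(c("It's great!", "5 stars :-)"))
cleaned  # returns c("It s great", "5 stars")
```

Note that this also splits contractions ("It's" becomes "It s"), which may or may not matter for the words the sentiment lexicon recognises.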

Thanks for updating your question, it helped me a lot! – janfreyberg
