2017-03-07 97 views

Scraping Amazon customer reviews

I am scraping Amazon customer reviews with R and have run into an error that I hope someone can shed some light on.

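I have noticed that R fails to scrape the specified node (found with SelectorGadget) from all of the reviews. Each time I run the script I retrieve a different number of reviews, but never the full set. This is very frustrating, because the goal is to scrape the reviews and compile them into a CSV file that can be processed in R later. Basically, if a product has 200 reviews, running the script sometimes gives me 150 reviews, sometimes 75, and so on, but never all 200. The problem seems to appear after I have scraped repeatedly.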

I also get some timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached".

How can I get around this and keep scraping? I am a beginner, but I would really appreciate any help or insight!

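library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

url <- "http://rads.stackoverflow.com/amzn/click/B009HLOZ9U"

N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
    # the page number j is appended to the end of the URL
    pant <- read_html(paste0(url, j))
    # one-column matrix holding the review texts found on this page
    B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())
    A <- rbind(A, B)
}
tail(A)

print(j)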

Answer

Does this not work for you?

Set url to "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="

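library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# url is the string given above, ending in "pageNumber="
N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
    # paste0(url, j) yields the URL of review page j
    pant <- read_html(paste0(url, j))
    # one-column matrix holding the review texts found on this page
    B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())
    A <- rbind(A, B)
}
tail(A)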
     [,1]                                                                                                                                                      
[1938,] "This is really a good item to get. Trendy, probably you can choose a different color, it fits good but I wouldn't say perfect."                                                                                                                       
[1939,] "I don't write reviews for most products, but I felt the need to do so for these pants for a couple reasons. First, they are great pants! Solid material, well-made, and they fit great. Second, I want to echo those who say you need to go up in size when you order. I wear anywhere from 32-34, depending on the brand. I ordered these in a 36 and they fit like a 33 or 34. I really love the look and feel of these, and will be ordering more!"                                        
[1940,] "I bought the green one before, it is good quality and looks nice, than I purchased the similar one, but the khaki color, but received absolutely different product, different material. really disappointed."                                                                                                   
[1941,] "These pants are great! I have been looking to update my wardrobe with a more edgy style; these cargo pants deliver on that. Paired with some casual sneakers or a decent nubuck leather boot completes the look from the waist down. The lazy-casual look is great when traveling, as are the many pockets. I wore these pants on a recent day trip to NYC and traveled comfortably with essential items contained in the 8 pockets. I placed a second order shortly after my first pair arrived because I like them so much. Shipping and delivery is also fairly fast, considering these pants ship from China!" 
[1942,] "Pants are awesome, just like the picture. The size runs small, so if you order them I would order them bigger than normal. I usually wear a 34inch waist because i dont like my pants snug, these pants fit more like a 32 inch waist.Other than that i love them!"                                                                                      
[1943,] "the good:Pants are made from the durable cotton that has a nice feel; have a lot of useful features and roomy well placed pockets; durable stitching.the bad:Pants will shrink and drier/hot water is not recommended. Would have been better if the cotton was pretreated to prevent shrinking. I would gladly gave up the belt if I wouldn't have to wary about how to wash these pants.the ugly:faux pocket with a zipper. useless feature. on my pair came with a bright gold zipper, unlike a silver in a picture." 
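
The loop above does not deal with the "Timeout was reached" error mentioned in the question. One common way to handle it is to retry a failed request a few times with a pause in between. The following is only a minimal sketch: with_retry is a hypothetical helper (it is not part of rvest), and the values for max_tries and wait are assumptions to tune.

```r
# Hypothetical helper (not part of rvest): call fetch() up to max_tries
# times, pausing wait seconds after each failed attempt except the last.
with_retry <- function(fetch, max_tries = 3, wait = 2) {
  for (attempt in seq_len(max_tries)) {
    # turn an error into a value so the loop can inspect it
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Possible use inside the loop (needs network access):
# pant <- with_retry(function() read_html(paste0(url, j)))
```

Pausing between requests may also help with the incomplete results seen after repeated scraping, although that is an assumption and not something confirmed in this thread.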

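Thank you so much for your input, I seriously appreciate it! However, this product has 2,038 reviews in total, and your code produced 1,943 (even if it only started from the second page, there still seems to be a shortfall of about 100 reviews)? – PugFanatic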

Ah, I only scraped the verified reviews! If you want all reviews, you need to change the type in the URL, e.g. "reviewerType=all_reviews" instead of "reviewerType=avp_only_reviews". – ZLevine
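
To make the reviewer type easy to switch, the URL can be assembled from its parts. This is a minimal sketch: review_url is a hypothetical helper, and the base path and query parameter names are copied from the URL in the answer, so whether Amazon still accepts them is an assumption.

```r
# Hypothetical helper: build the review-page URL for one page number.
# The base path and parameter names come from the URL in the answer;
# only reviewerType and pageNumber vary.
review_url <- function(page, reviewer_type = "all_reviews") {
  base <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_paging_btm_2"
  paste0(base, "?ie=UTF8&reviewerType=", reviewer_type,
         "&sortBy=recent&pageNumber=", page)
}

# All reviews, page 1:
# review_url(1)
# Verified purchases only, as in the answer:
# review_url(1, "avp_only_reviews")
```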

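Oh, okay! It's amazing that the mistake was so simple! I have sometimes been starting the scrape from the second page and still got this incomplete-scrape error; I remember learning to try the second page (but not why). Also, I experimented with the number of pages to scrape, and it looks like it works if I increase that number beyond the number of pages that actually contain reviews? Thanks again for taking the time to help me! I have been struggling with this!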
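
Instead of guessing the page count, the loop can stop at the first page that returns no reviews. This is a minimal sketch under stated assumptions: scrape_until_empty and its get_page argument are hypothetical names, and max_pages is an arbitrary safety cap. An empty page may also mean the request was blocked and not that the reviews ran out, so treat this as a heuristic.

```r
# Hypothetical helper: get_page(j) must return a character vector with the
# review texts found on page j (length 0 when the page has no reviews).
scrape_until_empty <- function(get_page, max_pages = 500) {
  pages <- list()
  for (j in seq_len(max_pages)) {
    reviews <- get_page(j)
    if (length(reviews) == 0) break  # no reviews on this page: stop here
    pages[[j]] <- reviews
  }
  # flatten to one character vector; character(0) if nothing was found
  as.character(unlist(pages))
}

# Possible use with rvest (needs network access; url as in the answer):
# library(rvest)
# A <- scrape_until_empty(function(j) {
#   read_html(paste0(url, j)) %>% html_nodes(".review-text") %>% html_text()
# })
```

Collecting the pages in a list and flattening once at the end also avoids growing a matrix with rbind on every iteration.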