Scraping Amazon customer reviews

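I am scraping Amazon customer reviews with R and have run into an error that I hope someone can shed some light on.

I noticed that R fails to scrape the specified node (found with SelectorGadget) from all of the reviews. Each time I run the script I retrieve a different number of reviews, but never the complete set. This is very frustrating, because the goal is to scrape the reviews and compile them into a CSV file that can later be processed in R. Basically, if a product has 200 reviews, running the script sometimes gives me 150 reviews, sometimes 75, and so on, but never the full 200. The problem seems to appear after I have scraped repeatedly.

I am also getting some timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached".

How can I get around this and keep scraping? I am a beginner, but I would greatly appreciate any help or insight!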

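library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe

url <- "http://rads.stackoverflow.com/amzn/click/B009HLOZ9U" 

N_pages <- 204 
A <- NULL 
for (j in 1:N_pages){ 
    # fetch page j and pull the text of every review node
    pant <- read_html(paste0(url, j)) 
    B <- cbind(pant %>% html_nodes(".review-text") %>% html_text()) 
    # append this page's reviews to the running result
    A <- rbind(A, B) 
} 
tail(A) 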


print(j)


2017-03-07 PugFanatic

Doesn't this work for you?

Set url to "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber=" and run:

N_pages <- 204 
A <- NULL 
for (j in 1: N_pages){ 
    pant <- read_html(paste0(url, j)) 
    B <- cbind(pant %>% html_nodes(".review-text") %>%  html_text() ) 
    A <- rbind(A,B) 
} 
tail(A) 
     [,1]                                                                                                                                                      
[1938,] "This is really a good item to get. Trendy, probably you can choose a different color, it fits good but I wouldn't say perfect."                                                                                                                       
[1939,] "I don't write reviews for most products, but I felt the need to do so for these pants for a couple reasons. First, they are great pants! Solid material, well-made, and they fit great. Second, I want to echo those who say you need to go up in size when you order. I wear anywhere from 32-34, depending on the brand. I ordered these in a 36 and they fit like a 33 or 34. I really love the look and feel of these, and will be ordering more!"                                        
[1940,] "I bought the green one before, it is good quality and looks nice, than I purchased the similar one, but the khaki color, but received absolutely different product, different material. really disappointed."                                                                                                   
[1941,] "These pants are great! I have been looking to update my wardrobe with a more edgy style; these cargo pants deliver on that. Paired with some casual sneakers or a decent nubuck leather boot completes the look from the waist down. The lazy-casual look is great when traveling, as are the many pockets. I wore these pants on a recent day trip to NYC and traveled comfortably with essential items contained in the 8 pockets. I placed a second order shortly after my first pair arrived because I like them so much. Shipping and delivery is also fairly fast, considering these pants ship from China!" 
[1942,] "Pants are awesome, just like the picture. The size runs small, so if you order them I would order them bigger than normal. I usually wear a 34inch waist because i dont like my pants snug, these pants fit more like a 32 inch waist.Other than that i love them!"                                                                                      
[1943,] "the good:Pants are made from the durable cotton that has a nice feel; have a lot of useful features and roomy well placed pockets; durable stitching.the bad:Pants will shrink and drier/hot water is not recommended. Would have been better if the cotton was pretreated to prevent shrinking. I would gladly gave up the belt if I wouldn't have to wary about how to wash these pants.the ugly:faux pocket with a zipper. useless feature. on my pair came with a bright gold zipper, unlike a silver in a picture."
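The question also mentions "Timeout was reached" errors, which abort the loop above at whichever page fails. The following is a minimal sketch, assuming those failures are transient, of retrying a fetch a few times with a growing pause. The helper name `with_retry` and its parameters `max_tries` and `wait` are made up for illustration and are not part of rvest.

```r
# Retry a zero-argument function a few times before giving up.
# `with_retry`, `max_tries` and `wait` are illustrative names, not rvest API.
with_retry <- function(f, max_tries = 5, wait = 2) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of letting it abort the loop
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause a little longer after each failed attempt
    if (attempt < max_tries) Sys.sleep(wait * attempt)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Possible use inside the loop (needs library(rvest) and network access):
# pant <- with_retry(function() read_html(paste0(url, j)))
```

Pausing between requests may also reduce the chance of the site throttling repeated scrapes, although that is an assumption about the cause and not something confirmed in this thread.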


2017-03-07 19:02:46 ZLevine

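Thank you so much for your input, I seriously appreciate it! However, this product has 2038 reviews in total, and your code produced 1943 (even if it only started from page two, that still seems to be about 100 reviews short)? – PugFanatic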

Ah, I only pulled verified reviews! If you want all reviews, you need to change the type in the URL, e.g. "reviewerType=all_reviews" instead of "reviewerType=avp_only_reviews". – ZLevine
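One way to make that switch harder to miss is to build the URL from its parts, so the reviewerType query parameter from the answer's URL becomes an explicit argument. This is only a sketch: `review_url` is a hypothetical helper, and the path and other query parameters are copied unchanged from the URL in the answer.

```r
# Build the review-page URL for a given page number.
# `review_url` is an illustrative helper; the URL pieces come from the answer.
review_url <- function(page, reviewer_type = "all_reviews") {
  paste0(
    "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/",
    "ref=cm_cr_arp_d_paging_btm_2?ie=UTF8",
    "&reviewerType=", reviewer_type,   # "avp_only_reviews" = verified purchases only
    "&sortBy=recent",
    "&pageNumber=", page
  )
}

review_url(1)                       # all reviews, page 1
review_url(1, "avp_only_reviews")   # verified-purchase reviews only, page 1
```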

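Oh, OK! It's surprising that the mistake was that simple! At times I had been scraping starting from page two and still got this incomplete-scrape error, because I remember learning to try with page two (though I don't know why). Also, I experimented with the number of pages to scrape, and it looks like it works if I increase that number beyond the number of pages that contain reviews? Thanks again for taking the time to help me! I had been struggling with this! – PugFanatic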
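Rather than guessing the page count, a loop can keep going until a page yields no review text. The sketch below assumes that a page past the last one returns zero `.review-text` nodes, which matches the observation above but is not guaranteed; a blocked or failed request could also come back empty. `collect_reviews` and `get_page` are hypothetical names, with `get_page` standing in for whatever fetches and parses one page.

```r
# Collect reviews page by page until a page comes back empty.
# `get_page` is any function that takes a page number and returns a
# character vector of review texts (illustrative; not part of rvest).
collect_reviews <- function(get_page, max_pages = 500) {
  pages <- list()
  for (j in seq_len(max_pages)) {
    reviews <- get_page(j)
    # No review text found: assume we have run past the last page
    if (length(reviews) == 0) break
    pages[[j]] <- reviews
  }
  unlist(pages)
}

# Possible use with rvest (needs library(rvest) and network access):
# reviews <- collect_reviews(function(j) {
#   read_html(paste0(url, j)) %>% html_nodes(".review-text") %>% html_text()
# })
# write.csv(data.frame(review = reviews), "reviews.csv", row.names = FALSE)
```

Storing each page in a list and flattening once at the end also avoids growing a matrix with rbind on every iteration.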
