Paging through web pages when some pages have no href

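I'm scraping reviews from the web. Some products have several pages of reviews; others have only one. With help from a few people here, I wrote code that makes the scraper click the "Next page" link whenever there is one.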

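My problem is that when there is only one page of reviews, there is no link to click and the scraper just keeps waiting. I want the program to check whether the next-page link exists: if it does, click it; if it doesn't, go back to the top of the loop.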
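The "does a next page exist?" check can be written as an explicit lookup that never assumes the link is there. This is a minimal sketch, assuming (as in the code below) that pagination links sit in `<div id="company_reviews_pagination">` and that the last `<a>` is the "Next" link when more pages exist. The helper name `find_next_url`, the use of `urljoin` to build the next URL, and the sample HTML and `example.com` URL are hypothetical, not the asker's actual code or site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(soup, current_url):
    """Return the absolute URL of the next review page, or None if there is none."""
    # Assumption: pagination links live in <div id="company_reviews_pagination">
    # and the last <a> in that div is "Next" when another page exists.
    pagination = soup.find('div', id='company_reviews_pagination')
    if pagination is None:
        return None  # single-page product: no pagination block at all

    links = pagination.find_all('a')
    if not links:
        return None  # pagination block exists but holds no links

    last_link = links[-1]
    href = last_link.get('href')  # .get() returns None instead of raising KeyError
    if not last_link.get_text(strip=True).startswith('Next') or not href:
        return None  # last page: no "Next" link, or a "Next" link without an href

    # Hypothetical: resolve a relative href against the current page's URL.
    return urljoin(current_url, href)


base = 'https://example.com/reviews'  # hypothetical URL
middle_page = (
    '<div id="company_reviews_pagination">'
    '<a href="?page=1">1</a><a href="?page=3">Next</a></div>'
)
last_page = '<div id="company_reviews_pagination"><a href="?page=1">1</a><a>Next</a></div>'
single_page = '<div class="reviews">Only one page of reviews</div>'

for html in (middle_page, last_page, single_page):
    print(find_next_url(BeautifulSoup(html, 'html.parser'), base))
# Prints:
# https://example.com/reviews?page=3
# None
# None
```

With a helper like this, the inner loop could become `while url is not None:` with `url = find_next_url(soup, url)` as its last statement, so a single-page product simply ends its inner loop and the `for` loop moves on to the next product.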

Here is my code:

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

for url in list_urls:
    while True:
        raw_html = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(raw_html, 'html.parser')

        # See if the "next page" link exists: if it does not, go back to the top of the loop
        href_test = soup.find('div', id='company_reviews_pagination')
        if href_test is None:
            break

        # If the next-page link exists, click on it
        else:
            last_link = href_test.find_all('a')[-1]
            if last_link.text.startswith('Next'):
                next_url_parts = urllib.parse.urlparse(last_link['href'])
                url = urllib.parse.urlunparse(...)  # code to define the "next-page" url (elided; that part works!)
            else:
                break

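So far it gives me no errors, but the program doesn't run; it just keeps waiting. What am I doing wrong? Should I use a "try" statement to handle this exception specifically?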

Many thanks in advance. Any guidance is much appreciated.


2015-03-19 anne_t

Could you share the actual URL so we can reproduce the problem and help you? Thanks. – alecxe 2015-03-19 05:32:13

Got it. Nothing that good old "try/except" can't fix. ;) Thanks @alecxe for being willing to help. – 2015-03-19 13:39:15

So here is how I fixed it. Instead of playing with an "if the link exists" condition, I used try/except:

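# Replaces the pagination check inside the `while True:` loop
try:
    last_link = soup.find('div', id='company_reviews_pagination').find_all('a')[-1]
    if last_link.text.startswith('Next'):
        next_url_parts = urllib.parse.urlparse(last_link['href'])
        url = urllib.parse.urlunparse(...)  # code to find the next-page link (elided)
    else:
        break
except (AttributeError, IndexError, KeyError):
    # No pagination div, no <a> inside it, or an <a> without an href: stop paging
    break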
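The try/except works because each missing piece raises its own exception: `soup.find()` returns `None` when there is no pagination div (so `.find_all` raises `AttributeError`), an empty div makes `[-1]` raise `IndexError`, and a "Next" `<a>` without an `href` makes `last_link['href']` raise `KeyError`. The sketch below wraps that same lookup in a function so each case can be checked in isolation; the function name `next_href` and the sample HTML are hypothetical. Catching only these three exceptions, rather than a bare `except:`, keeps Ctrl+C (`KeyboardInterrupt`) and typo-induced errors such as `NameError` from being silently swallowed.

```python
from bs4 import BeautifulSoup


def next_href(html):
    """Return the "Next" link's href, or None when there is no next page."""
    soup = BeautifulSoup(html, 'html.parser')
    try:
        last_link = soup.find('div', id='company_reviews_pagination').find_all('a')[-1]
        if last_link.text.startswith('Next'):
            return last_link['href']
        return None  # last link is a page number, not "Next"
    except (AttributeError, IndexError, KeyError):
        # AttributeError: soup.find() returned None (no pagination div)
        # IndexError:     the pagination div contains no <a> tags
        # KeyError:       the "Next" <a> has no href attribute
        return None


cases = {
    'has next': '<div id="company_reviews_pagination"><a href="/r?p=2">Next</a></div>',
    'no div': '<p>Only one page</p>',
    'empty div': '<div id="company_reviews_pagination"></div>',
    'no href': '<div id="company_reviews_pagination"><a>Next</a></div>',
}
for name, html in cases.items():
    print(name, '->', next_href(html))
# Prints:
# has next -> /r?p=2
# no div -> None
# empty div -> None
# no href -> None
```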


2015-03-19 13:42:16
