Parsing web pages with BeautifulSoup - skipping 404 error pages

I am using the code below to get the titles of websites.

from bs4 import BeautifulSoup 
import urllib2 

line_in_list = ['www.dailynews.lk','www.elpais.com','www.dailynews.co.zw'] 

for websites in line_in_list: 
    url = "http://" + websites 
    page = urllib2.urlopen(url) 
    soup = BeautifulSoup(page.read()) 
    site_title = soup.find_all("title") 
    print site_title

If the list contains a "bad" (non-existent) website or page, or a site returns some kind of error such as "404 Page Not Found", the script breaks and stops.

How can I make the script ignore/skip "bad" (non-existent) and problematic websites or pages?


2014-06-20 Mark K

from bs4 import BeautifulSoup
import urllib2

line_in_list = ['www.dailynews.lk','www.elpais.com',"www.no.dede",'www.dailynews.co.zw']

for websites in line_in_list:
    url = "http://" + websites
    try:
        page = urllib2.urlopen(url)
    except Exception as e:
        # Print the error and move on to the next site
        print e
        continue

    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title

[<title>Popular News Items | Daily News Online : Sri Lanka's National News</title>] 
[<title>EL PAÍS: el periódico global</title>] 
<urlopen error [Errno -2] Name or service not known> 
[<title> 
DailyNews - Telling it like it is 
</title>]
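
The code above catches every exception the same way. If it helps to tell a 404 apart from a site that cannot be reached at all, the following is a rough sketch, assuming Python 3 (where `urllib2` was split into `urllib.request` and `urllib.error`). The names `fetch_html` and `print_titles`, the `opener` parameter, and the 10-second timeout are hypothetical choices for illustration, not part of the original answer.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10, opener=urlopen):
    """Return the raw page bytes, or None if the site should be skipped.

    `opener` is a hypothetical hook so the function can be tried out
    without network access; by default it is urllib's own urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        print("skipping %s: HTTP %s" % (url, e.code))
    except URLError as e:
        # No usable answer at all: unknown host, refused connection, ...
        print("skipping %s: %s" % (url, e.reason))
    except OSError as e:
        # Anything else at the socket level, e.g. a timeout while reading.
        print("skipping %s: %s" % (url, e))
    return None


def print_titles(sites):
    """Print the <title> tags of each reachable site, skipping bad ones."""
    # Imported here so fetch_html stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    for site in sites:
        html = fetch_html("http://" + site)
        if html is None:
            continue
        soup = BeautifulSoup(html, "html.parser")
        print(soup.find_all("title"))
```

`HTTPError` has to be listed before `URLError` because it is a subclass of it; calling `print_titles` with the same list as above should then skip the unreachable host and carry on with the rest.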


2014-06-20 08:05:17

Thanks Padraic Cunningham, that was fast and furious!

No worries, you're welcome.
