Web scraping using BeautifulSoup error: [Errno 10061]

I am trying to get this piece of code to work (a web scraping sample using BeautifulSoup):

import urllib2  
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India" 
page = urllib2.urlopen(wiki) 
from bs4 import BeautifulSoup 
soup = BeautifulSoup(page)

I get this error:

URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

I am guessing this is a firewall/security-related issue. Can someone help with what should be done?


2016-12-29 Indi

I think you need to set up a proxy.

Check out http://stackoverflow.com/questions/1450132/proxy-with-urllib2

Better to use requests.
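
The proxy suggestion above can be sketched roughly as follows. This is a minimal sketch in Python 3, where `urllib.request` takes the place of `urllib2`. The proxy address `proxy.example.com:8080` and the helper names `build_proxy_opener` and `fetch` are placeholders that do not come from the original post, and whether a proxy is really the cause here is an assumption.

```python
import urllib.request

# Hypothetical proxy address: replace it with the proxy your network actually uses.
PROXY_URL = "http://proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url=PROXY_URL, timeout=10):
    """Fetch url through the proxy and return the raw response body as bytes."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Proxy settings that Python already picks up from the environment or the OS, if any.
detected_proxies = urllib.request.getproxies()
```

Calling `fetch(wiki)` would then return the page bytes, which can be passed to `BeautifulSoup`. If `detected_proxies` is empty on a network that requires a proxy, that may explain the refused connection.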

You can try something like this with requests:

import requests 
from bs4 import BeautifulSoup 

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India" 
page = requests.get(wiki).content 
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning

If you want to get the table, you can use pandas like this:

import pandas as pd 

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India" 
df = pd.read_html(wiki)[1] 
df2 = df.copy() 
df2.columns = df.iloc[0] 
df2.drop(0, inplace=True) 
df2.drop('No.', axis=1, inplace=True) 
df2.head()

Output:


2016-12-29 10:31:32 MYGz

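I end up with the same error when I try the first snippet: ConnectionError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_state_and_union_territory_capitals_in_India (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 10061] No connection could be made because the target machine actively refused it')) – Indi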

@Indi It must have something to do with a proxy. Read this: https://github.com/kennethreitz/requests/issues/2875 – MYGz
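
If a proxy does turn out to be required, passing it to requests might look like the sketch below. The proxy address and the helper name `fetch_html` are hypothetical and not part of the thread. The `proxies` argument is passed on each call rather than set once on a session, since the requests documentation notes that environment proxy settings can override `session.proxies`.

```python
import requests

# Hypothetical proxy address: replace it with the one your network actually uses.
PROXIES = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}


def fetch_html(url, proxies=PROXIES, timeout=10):
    """Download url through the given proxies and return the body as bytes."""
    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.content
```

The returned bytes can be handed to `BeautifulSoup` in the same way as `page` in the answer above.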

Same error, whatever I try. – Indi
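
Since the same error shows up with both urllib2 and requests, one way to narrow it down might be to test the raw TCP connection without any HTTP library involved. The sketch below is only a diagnostic idea, not something proposed in the thread, and the helper name `can_connect` is made up for illustration. Errno 10061 is the Windows socket code for a refused connection.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection; return (True, None) or (False, the error)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers ConnectionRefusedError (Errno 10061 on Windows), timeouts and DNS failures.
        return False, exc
```

For example, `can_connect("en.wikipedia.org", 443)` returning `(False, ConnectionRefusedError(...))` would suggest that the connection is being blocked outside Python (for instance by a firewall, security software, or a network that only allows traffic through a proxy) rather than by the scraping code itself.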
