What does read() do in urlopen('http...').read()? [urllib]

Hi, I am reading "Web Scraping with Python" (2015). I saw the following two ways of opening a URL, with and without .read(). See bs1 and bs2:

from urllib.request import urlopen 
from bs4 import BeautifulSoup 

html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html') 
bs1 = BeautifulSoup(html.read(), 'html.parser') 

html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html') 
bs2 = BeautifulSoup(html, 'html.parser') 

bs1 == bs2  # True


print(bs1.prettify()[0:100]) 
print(bs2.prettify()[0:100]) # prints same thing

So is .read() redundant?

Code from p. 7 of Web Scraping with Python (using .read()):

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())

Code from p. 15 (without .read()):

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html") 
bsObj = BeautifulSoup(html)

Source

2016-03-08 Y Zhang

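In addition to the answers to this question, I suggest you try the requests library for HTTP requests: http://docs.python-requests.org/en/latest/ It gives you more control over the HTTP response.

Thanks @A.Romeu. Could you point me to some posts with more information? In the next step I need to fill in a form and fetch the response page, and I plan to use 'mechanize'.

The link I sent you has a lot of information on how to use it; see the "User Guide". You can start directly from http://docs.python-requests.org/en/latest/user/quickstart/#make-a-request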

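Quoting the BS docs:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

When you call .read(), you are using the "string" interface. When you don't, you are using the "file handle" interface.

In practice both work the same way (although BS4 may read file-like objects lazily). In your case, the entire content is read into memory as a single object (bytes in Python 3) before it is handed to the parser, which may consume more memory.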
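As a rough illustration of how one constructor can offer both a "string" and a "file handle" interface, here is a minimal sketch of the general duck-typing pattern. The helper `load_markup` is hypothetical and is not part of BeautifulSoup; `io.BytesIO` stands in for the response object returned by urlopen(), and UTF-8 is assumed for decoding.

```python
import io


def load_markup(markup):
    """Hypothetical helper: accept markup as text/bytes or as a file-like object."""
    if hasattr(markup, "read"):
        # "File handle" interface: pull the content out of the stream.
        markup = markup.read()
    if isinstance(markup, bytes):
        # Assumption for this sketch: the content is UTF-8 encoded.
        markup = markup.decode("utf-8")
    return markup


raw = b"<p>hello</p>"

via_string = load_markup(raw)              # analogous to BeautifulSoup(html.read(), ...)
via_handle = load_markup(io.BytesIO(raw))  # analogous to BeautifulSoup(html, ...)

print(via_string == via_handle)  # True
```

Either call ends up with the same markup, which is consistent with bs1 == bs2 in the question.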

Source

2016-03-08 09:38:29

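urllib.request.urlopen returns a file-like object; its read method returns the response body for that URL.

The BeautifulSoup constructor accepts either a string or an open file handle, so yes, read() is redundant here.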

Source

2016-03-08 09:38:17 wong2

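Without the BeautifulSoup module

When you are not using the BeautifulSoup module, .read() is useful and not redundant. Only by calling .read() do you get the HTML content; without it you only have the object returned by urlopen().

With the BeautifulSoup module

The BeautifulSoup constructor handles both cases: it accepts a string, and it also accepts the file-like object returned by urlopen().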
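A small sketch of that difference, using only the standard library. A data: URL built from a made-up page stands in for a real web address so the example runs offline; that URL and the variable names are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a data: URL replaces a real http:// address so no network is needed.
page = "<html><body><p>hello</p></body></html>"
url = "data:text/html;charset=utf-8," + quote(page)

with urlopen(url) as response:
    # `response` is a file-like object, not the HTML itself.
    body = response.read()       # bytes: the response body
    leftover = response.read()   # the stream is now exhausted

text = body.decode("utf-8")      # str: the decoded HTML

print(type(body).__name__)  # bytes
print(text == page)         # True
print(leftover)             # b''
```

Note that a second read() returns an empty bytes object, so the response can only be consumed once.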

Source

2016-03-08 09:45:01
