2011-10-01

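Python: read a page from a URL? Better documentation?

I'm having a lot of trouble with Python's documentation. Is there anything like the Mozilla Developer Network for it?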

I'm working through a Python puzzle website, and I need to be able to read the contents of a page. I found the following code on a website:

import urllib2 

urlStr = 'http://www.python.org/' 
try: 
    fileHandle = urllib2.urlopen(urlStr) 
    str1 = fileHandle.read() 
    fileHandle.close() 
    print ('-'*50) 
    print ('HTML code of URL =', urlStr) 
    print ('-'*50) 
except IOError: 
    print ('Cannot open URL %s for reading' % urlStr) 
    str1 = 'error!' 

print (str1) 

It keeps saying there is no module named urllib2.

The Python documentation says:

The urllib module has been split into parts and renamed in Python 3.0 to urllib.request, urllib.parse, and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to 3.0. Also note that the urllib.urlopen() function has been removed in Python 3.0 in favor of urllib2.urlopen().
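To make the rename concrete, here is a minimal sketch of where the Python 2 names ended up after the split. The try/except fallback is a common compatibility idiom, not something the quoted docs prescribe. It shows urlopen and URLError (from urllib2) and urlencode (from the old urllib), each imported from its Python 3 home.

```python
try:
    # Python 3: urllib2's functionality moved into urllib.request and urllib.error
    from urllib.request import urlopen
    from urllib.error import URLError
    # ...and helpers from the old urllib module moved into urllib.parse
    from urllib.parse import urlencode
except ImportError:
    # Python 2 fallback: the pre-split module names
    from urllib2 import urlopen, URLError
    from urllib import urlencode

print(urlencode({"q": "python", "page": 2}))  # q=python&page=2
```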

I tried importing urllib.request instead, but then it says urllib2 is not defined... WTF is going on?

Version: 3.2.2
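The "not defined" error is expected. import urllib.request binds the name urllib, not urllib2, so any leftover urllib2.urlopen(...) call raises NameError before a request is ever made. The sketch below reproduces this offline; old_call_site is a hypothetical function standing in for the unchanged line from the snippet above.

```python
import urllib.request  # binds the name "urllib"; no "urllib2" name is created


def old_call_site():
    # Left over from the Python 2 snippet: the module name was never renamed.
    return urllib2.urlopen("http://www.python.org/")  # noqa: F821


try:
    old_call_site()
except NameError as exc:
    message = str(exc)

print(message)  # name 'urllib2' is not defined
```

So the fix is to rename the call sites as well (urllib.request.urlopen), not only the import.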


Your Python version would be useful here. – Johnsyweb


Updated. Is there any Python documentation other than the official docs? – mowwwalker


@Walkerneo:http://docs.python.org/py3k/library/urllib.request.html – icktoofay

Answer


Use urllib.request.urlopen(), as recommended in Dive Into Python 3:

Python 3.2.1 (default, Jul 24 2011, 22:21:06) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import urllib.request 
>>> urlStr = 'http://www.python.org/' 
>>> fileHandle = urllib.request.urlopen(urlStr) 
>>> print(fileHandle.read()[:100]) 
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtm' 
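Putting the pieces together, a minimal Python 3 port of the question's script might look like the sketch below. Note that read() returns bytes (hence the b'...' prefix above), so the sketch decodes them. The helper name fetch_page and the UTF-8 fallback when the server declares no charset are assumptions, not part of the original answer.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Return the page at url decoded to str, or None if it cannot be read.

    fetch_page is a hypothetical helper; the UTF-8 fallback is an assumption.
    """
    try:
        # The with-block closes the response, replacing fileHandle.close().
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()  # bytes, hence the b'...' prefix in the session above
            # Prefer the charset from the Content-Type header; assume UTF-8 otherwise.
            charset = response.headers.get_content_charset() or "utf-8"
    except OSError as exc:
        # In Python 3.3+ IOError is an alias of OSError; urllib.error.URLError
        # (and its subclass HTTPError) derive from it, as do socket timeouts.
        print("Cannot open URL %s for reading: %s" % (url, exc))
        return None
    return raw.decode(charset, errors="replace")
```

Usage: html = fetch_page('http://www.python.org/') returns the page as text, or None on failure. urlopen also accepts file: URLs, which makes it easy to try the helper offline.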

Thanks, this works. – mowwwalker


Now fileHandle contains the complete source. How can I run an XPath query on the fileHandle data to get a specific value? –


@DixitSingla: You can use 'lxml', as in this answer: http://stackoverflow.com/a/11466033/78845 – Johnsyweb
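For the XPath question, a standard-library alternative to the lxml route is xml.etree.ElementTree, which supports a limited XPath subset (paths, .//, and [@attr] predicates). This is a swapped-in technique. It only works on well-formed XML such as XHTML, so for typical real-world HTML the lxml suggestion above is more forgiving. The markup below is a hypothetical stand-in for a downloaded page. In practice you would pass the decoded text from read().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML standing in for a downloaded page.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body>
    <a href="/about/">About</a>
    <a href="/download/">Download</a>
    <a name="top">anchor without href</a>
  </body>
</html>"""

# XHTML puts every element in this namespace, so queries need a prefix mapping.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def extract(xhtml):
    root = ET.fromstring(xhtml)  # root is the <html> element
    title = root.find("h:head/h:title", NS).text
    # [@href] keeps only <a> elements that actually have an href attribute.
    hrefs = [a.get("href") for a in root.findall(".//h:a[@href]", NS)]
    return title, hrefs


title, hrefs = extract(XHTML)
print(title)  # Example page
print(hrefs)  # ['/about/', '/download/']
```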