2012-04-02

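Python cannot get links from a web page

I am writing a Python script that collects links from a website. However, when I try it on this web page, I cannot get the links. My script is: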

soup = BeautifulSoup(urllib2.urlopen(url)) 

datas = soup.findAll('div', attrs={'class':'tsrImg'}) 
for data in datas: 
    link = data.find('a') 
    print str(link.href) 

It only prints None. Can anyone explain why?

Answer

Change:

str(link.href) 

to:

link.get('href') 
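
As for why the original prints None: in BeautifulSoup, attribute access on a tag (link.href) is shorthand for link.find('href'), so it looks for a child tag named href rather than reading the href attribute. The sketch below imitates that behaviour with a toy class so it runs without BeautifulSoup installed. ToyTag and everything in it are hypothetical stand-ins, not the library's real code, and its find only checks direct children to keep the example short.

```python
class ToyTag:
    """Hypothetical stand-in for a parsed tag; NOT BeautifulSoup's real class."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children or [])

    def find(self, name):
        # Return the first direct child tag with this name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def get(self, key, default=None):
        # Look up an HTML attribute such as href.
        return self.attrs.get(key, default)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # Unknown attributes are treated as a child-tag search.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


# An <a href="/de/~adko"> element that contains a single <img> child.
link = ToyTag("a", attrs={"href": "/de/~adko"}, children=[ToyTag("img")])

by_attribute_access = link.href   # searches for a child <href> tag
by_get = link.get("href")         # reads the href attribute

print(by_attribute_access, by_get)
```

The first value is None because the anchor has no child tag called href, while get returns the attribute value. With the real library, link['href'] also reads the attribute, but it raises KeyError when the attribute is missing, whereas get returns None.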

The full script then looks like this:

from BeautifulSoup import BeautifulSoup 
import urllib2 

url = 'http://www.meinpaket.de/de/shopsList.html?page=1' 
soup = BeautifulSoup(urllib2.urlopen(url)) 
datas = soup.findAll('div', {'class':'tsrImg'}) 
for data in datas: 
    link = data.find('a') 
    print link.get('href') 

Output:

/de/~-office-partner-gmbh-;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~-24selling-de;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~abalisi-kuenstlerbedarf-shop;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~abcmeineverpackung-de-kg;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~ability;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~ac-foto-handels-gmbh;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~ac-sat-corner-inh-dirk-hahn;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~adamo-fashion-gmbh-shop;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~adapter-markt;jsessionid=11957F27FC2D888A34532D9848C922FB.as03 
/de/~adko;jsessionid=11957F27FC2D888A34532D9848C922FB.as03
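
The hrefs printed above are site-relative and carry a ;jsessionid=... suffix. If absolute URLs are needed, something like the following might work. It is a minimal sketch in Python 3 using urllib.parse (in Python 2, urljoin lives in the urlparse module). The helper name to_absolute is made up for illustration, and dropping the session suffix is an assumption about the site rather than something the thread confirms.

```python
from urllib.parse import urljoin

# Page URL taken from the script above; used as the base for resolution.
BASE_URL = 'http://www.meinpaket.de/de/shopsList.html?page=1'


def to_absolute(href, base=BASE_URL):
    """Hypothetical helper: drop the ;jsessionid=... suffix, then resolve against base."""
    # Assumption: the session id is not needed to reach the shop page.
    path = href.split(';jsessionid=', 1)[0]
    return urljoin(base, path)


# Two of the hrefs from the output above.
hrefs = [
    '/de/~adko;jsessionid=11957F27FC2D888A34532D9848C922FB.as03',
    '/de/~ability;jsessionid=11957F27FC2D888A34532D9848C922FB.as03',
]

absolute = [to_absolute(h) for h in hrefs]
for url in absolute:
    print(url)
```

Because each href starts with a slash, urljoin keeps only the scheme and host of the base URL and replaces its path and query.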