2015-03-25 123 views

Retrieving the first href from a div tag

I need to retrieve the href that contains /questions/20702626/javac1-8-class-not-found. Instead, the code below gives me //stackoverflow.com as output.

from bs4 import BeautifulSoup
import urllib2

url = "http://stackoverflow.com/search?q=incorrect+operator"
content = urllib2.urlopen(url).read()

soup = BeautifulSoup(content)

for tag in soup.find_all('div'):
    if tag.get("class") == ['summary']:
        for tag in soup.find_all('div'):
            if tag.get("class") == ['result-link']:
                for link in soup.find_all('a'):
                    print link.get('href')
                    break

Answer

Instead of writing nested loops, use a CSS selector:

for link in soup.select('div.summary div.result-link a'): 
    print link.get('href') 

This is not only more readable, it also solves your problem. It prints:

/questions/11977228/incorrect-answer-in-operator-overloading 
/questions/8347592/sizeof-operator-returns-incorrect-size 
/questions/23984762/c-incorrect-signature-for-assignment-operator 
... 
/questions/24896659/incorrect-count-when-using-comparison-operator 
/questions/7035598/patter-checking-check-of-incorrect-number-of-operators-and-brackets 
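
To see why the selector helps, note that every loop in the question calls soup.find_all(...), which searches the whole document again rather than the tag that just matched, so the first <a> found is whatever link comes first on the page. The sketch below reproduces that on a small hand-written HTML fragment. The markup is made up for illustration and is not the real search page; it is written for Python 3 and a reasonably recent bs4 (for select_one), whereas the question uses Python 2.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on a search results page.
HTML = """
<a href="//stackoverflow.com">logo</a>
<div class="summary">
  <div class="result-link"><a href="/questions/1/first">first</a></div>
</div>
<div class="summary">
  <div class="result-link"><a href="/questions/2/second">second</a></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Searching from the soup object looks at the whole document,
# so the first <a> is the logo link, not a result link.
first_anywhere = soup.find_all("a")[0].get("href")

# The descendant selector only matches <a> tags nested inside
# div.result-link, itself nested inside div.summary.
hrefs = [a.get("href") for a in soup.select("div.summary div.result-link a")]

# If only the first match is wanted, select_one returns it (or None).
first_href = soup.select_one("div.summary div.result-link a").get("href")

print(first_anywhere)
print(hrefs)
print(first_href)
```

If only the first result link is needed, as the question title suggests, select_one avoids the loop entirely; it returns None when nothing matches, so a real script should probably check for that before calling get.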

Additional note: you may want to consider using the Stack Exchange API instead of the current web-scraping / HTML-parsing approach.
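
As a rough sketch of what the API route could look like, the Python 3 code below builds a request for the /search method and extracts the link field of each returned item. The helper names build_search_url and search_links are made up for this example, the API version (2.3) is an assumption, and intitle only matches question titles, so the results will not necessarily be the same as those of the site's search page.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root and version; check the API documentation before relying on it.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_search_url(title_query, site="stackoverflow"):
    """Build a /search URL that matches questions by title (hypothetical helper)."""
    params = {"intitle": title_query, "site": site}
    return API_ROOT + "/search?" + urlencode(params)


def search_links(title_query):
    """Return the question links from one page of results (hypothetical helper)."""
    with urlopen(build_search_url(title_query)) as resp:
        raw = resp.read()
        # The API sends compressed responses; urllib does not decompress them.
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    data = json.loads(raw.decode("utf-8"))
    return [item["link"] for item in data.get("items", [])]


# Building the URL needs no network access.
print(build_search_url("incorrect operator"))
```

Calling search_links("incorrect operator") performs the actual request. Unauthenticated requests are subject to a daily quota, so this is best treated as a starting point rather than a finished client.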