2016-04-22 88 views

Where is the error? I want to extract the text from my page without the tags. AttributeError: 'ResultSet' object has no attribute 'find_all'

from bs4 import BeautifulSoup
import re
import urllib.request
f = urllib.request.urlopen("http://www.championat.com/football/news-2442480-orlov-zenit-obespokoen---pole-na-novom-stadione-mozhet-byt-nekachestvennym.html")
soup = BeautifulSoup(f, 'html.parser')
soup = soup.find_all('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.replaceWithChildren()

soup = ''.join(map(str, soup.contents))
print(soup)

Error:

Traceback (most recent call last): 
    File "1.py", line 9, in <module> 
    for match in soup.find_all(tag): 
AttributeError: 'ResultSet' object has no attribute 'find_all' 

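You replaced `soup` with a result set: `soup = soup.find_all('div', class_="text-decor article__contain")`. A ResultSet is just a list with an extra reference back to the original soup object. It is not clear to me why you replace the `BeautifulSoup` object with a result set; if you want to do a nested search, use [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors).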


You really want to look at [output formatting](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output) instead of mapping the objects to strings.

Answer


soup=soup.find_all('div', class_="text-decor article__contain")

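On this line, soup becomes a ResultSet instance, which is basically a list of Tag instances. You are getting 'ResultSet' object has no attribute 'find_all' because this ResultSet instance has no find_all() method. FYI, this problem is actually described in the troubleshooting section of the documentation: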

AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings–a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all() .
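To make the difference concrete, here is a minimal sketch of both options from the quoted passage. It assumes a small hand-written HTML string in place of the real page (the markup and the variable names `html`, `results` and `bold_tags` are made up for illustration), so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page is not fetched here.
html = '<div class="text-decor article__contain"><b>Zenit</b> news</div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tag objects), not a Tag.
results = soup.find_all("div", class_="text-decor article__contain")
print(type(results).__name__)

# Option 1: iterate over the ResultSet and search inside each Tag.
for div in results:
    bold_tags = div.find_all("b")

# Option 2: ask for a single Tag with find().
article = soup.find("div", class_="text-decor article__contain")
print(type(article).__name__)
```

Calling `results.find_all(...)` directly would raise the AttributeError from the question, because only the individual Tag objects inside the list have that method.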

You really want a single result, since there is only one article on the page:

soup = soup.find('div', class_="text-decor article__contain") 

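Note, though, that there is no need to look up the tags one by one; you can pass the list of tag names directly to find_all(). BeautifulSoup is quite flexible in locating elements: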

article = soup.find('div', class_="text-decor article__contain") 

invalid_tags = ['b', 'i', 'u', 'br', 'a'] 
for match in article.find_all(invalid_tags): 
    match.unwrap() # bs4 alternative for replaceWithChildren 
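
For completeness, here is a self-contained sketch of the whole flow. It assumes a made-up HTML snippet instead of the downloaded page, and it uses `decode_contents()` and `get_text()` to produce the output rather than joining `str()` of the children; whether you want the remaining markup or only the text depends on your goal.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = """
<div class="text-decor article__contain">
<p><b>Orlov</b>: the <a href="/x">pitch</a> may be <i>poor</i>.<br> More text.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
article = soup.find("div", class_="text-decor article__contain")

invalid_tags = ["b", "i", "u", "br", "a"]
for match in article.find_all(invalid_tags):
    match.unwrap()  # remove the tag itself, keep its children in place

# Markup that remains inside the div (here the <p> wrapper), as a string.
inner_html = article.decode_contents()
# Text only, with every remaining tag dropped.
plain_text = article.get_text().strip()

print(inner_html)
print(plain_text)
```

If the goal is only the article text, `get_text()` alone is enough and the unwrap loop can be skipped.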