How to use BeautifulSoup

How do I use BeautifulSoup's findAll on nested HTML tags, given the following markup

<a href="www.example.com/"></a> 

<table class="theclass"> 
<tr><td> 
<a href="www.example.com/two">two</a> 
</td></tr> 
<tr><td> 
<a href ="www.example.com/three">three</a> 
<span>blabla<span> 
</td></td> 
</table>

How can I scrape just what is inside the table with class="theclass"? I tried using

soup = util.mysoupopen(theexample) 
infoText = soup.findAll("table", {"class": "theclass"})

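but I don't know how to narrow the find statement any further. Another approach I tried was converting the result of findAll() to an array and then looking for the occurrences of the needle, but I could not find a consistent pattern. Thanks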

2011-02-07 Julio Diaz

What do you want to scrape? You said: "How can I scrape just what is inside the table class = "the class"?" Do you mean the links? – karlcow 2011-02-07 20:37:59

If I understand your question, this Python code should work. It iterates over every table with class="theclass" and then finds the links inside each one.

>>> foo = """<a href="www.example.com/"></a> 
... <table class="theclass"> 
... <tr><td> 
... <a href="www.example.com/two">two</a> 
... </td></tr> 
... <tr><td> 
... <a href ="www.example.com/three">three</a> 
... <span>blabla<span> 
... </td></td> 
... </table> 
... """ 
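>>> import BeautifulSoup as bs  # BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup
>>> soup = bs.BeautifulSoup(foo)
>>> links = []
>>> for table in soup.findAll('table', {'class': 'theclass'}):
...     links.extend(table.findAll('a'))  # collect the links from every matching table
...
>>> print(links)
[<a href="www.example.com/two">two</a>, <a href="www.example.com/three">three</a>]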
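
The session above targets BeautifulSoup 3 on Python 2. As a rough sketch of the same idea with the newer bs4 package on Python 3 (this assumes bs4 is installed and that the standard-library "html.parser" backend is acceptable; the helper name links_in_tables is made up for illustration and is not part of the library):

```python
from bs4 import BeautifulSoup

# The markup from the question, kept as-is (including its unbalanced tags).
HTML = """<a href="www.example.com/"></a>
<table class="theclass">
<tr><td>
<a href="www.example.com/two">two</a>
</td></tr>
<tr><td>
<a href ="www.example.com/three">three</a>
<span>blabla<span>
</td></td>
</table>
"""


def links_in_tables(html, css_class):
    """Return the href of every <a> nested in a <table> with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    # First narrow down to the matching tables, then search inside each one.
    for table in soup.find_all("table", class_=css_class):
        for anchor in table.find_all("a"):
            hrefs.append(anchor.get("href"))
    return hrefs


print(links_in_tables(HTML, "theclass"))
```

The link that sits outside the table is skipped because the inner find_all() is called on each table rather than on the whole document.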

2011-02-07 20:56:10 karlcow

infoText is a list. You should iterate over it.

>>> for info in infoText:
...     print(info.tr.td.a)
...
<a href="www.example.com/two">two</a>

Then you can access the elements inside each <table>. If you expect only one table with class "theclass" in the document, soup.find("table", {"class": "theclass"}) will give you that table directly.
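
To make the single-table case concrete, here is a small sketch using bs4 on Python 3 (an assumption, since the thread itself uses BeautifulSoup 3). The markup is a tidied version of the question's example, and select() with a CSS selector is shown only as an alternative that bs4 offers, not as something the answers above proposed:

```python
from bs4 import BeautifulSoup

# Tidied copy of the question's markup, used only for illustration.
HTML = """<a href="www.example.com/"></a>
<table class="theclass">
<tr><td><a href="www.example.com/two">two</a></td></tr>
<tr><td><a href="www.example.com/three">three</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
table = soup.find("table", {"class": "theclass"})
first_link = table.tr.td.a if table is not None else None

# A CSS selector expresses "every <a> inside table.theclass" in one call.
selected = [a["href"] for a in soup.select("table.theclass a")]

# The class string must match exactly: "the class" (with a space) finds nothing.
missing = soup.find("table", {"class": "the class"})
```

Checking the result of find() against None before navigating avoids an AttributeError when the table is absent.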

2011-02-07 19:50:35 zovision

I get this error, and I don't know why. `Traceback (most recent call last): File "test.py", line 10, in print info.tr.td.a File "/nfs/home/j/d/jdiaz/cs171/BeautifulSoup.py", line 402, in __getattr__ raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr) AttributeError: 'NavigableString' object has no attribute 'tr'` – 2011-02-07 20:04:59
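
One plausible cause of this traceback, offered as an assumption because test.py is not shown: if infoText is a single Tag (for example the result of find() rather than findAll()), iterating over it walks the tag's children, and the first child of the table is a whitespace string, which has no .tr attribute. A minimal sketch with bs4 on Python 3 that reproduces the situation and one way around it:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Tidied fragment of the question's markup, used only for illustration.
HTML = """<table class="theclass">
<tr><td><a href="www.example.com/two">two</a></td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", {"class": "theclass"})  # a single Tag, not a list

# Iterating over a Tag yields its direct children, text nodes included.
children = list(table)
first_child_is_text = isinstance(children[0], NavigableString)

# Attribute-style navigation on a text node fails with AttributeError.
raised = False
try:
    children[0].tr
except AttributeError:
    raised = True

# Keeping only Tag children makes .td.a navigation safe again.
rows = [child for child in table if isinstance(child, Tag)]
first_href = rows[0].td.a["href"]
```

If infoText really is the list returned by findAll(), each item is a Tag and the loop from the answer should work unchanged.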
