Extracting specific text from a div with beautifulsoup4

I am parsing a web page with BS4 and Python 3.5. I am trying to extract only the username (the link text) from a div that looks like this:

<div class="about"><a href="es_viewprofile.aspx?profile_id=110181766">claudiakenzo</a>&nbsp;33&nbsp;&nbsp;&nbsp;&nbsp;Pasar el rato&nbsp;&nbsp;&nbsp;<font color="green">En línea</font></div>

My goal is to get only the first part of the div, in this case the string "claudiakenzo".

This is the code I am trying to use:

  for link in soup.find_all("div", {'class': 'about'}):
      username = link.text
      print(username)

In theory I should get what I want, but I don't. This is the output I get:

claudiakenzo 33 Pasar el rato En línea

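I don't want the "33", "Pasar el rato" or "En línea" parts. What am I doing wrong, and what is the correct code to extract what I need? Unfortunately, some usernames also contain digits, so using re gets complicated... but I feel there must be a simpler way to do this than using re.

PS: If this is easier to solve with Selenium, I am willing to try that too. Thanks!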

Source

2017-02-14 skeitel

Take some time to read the BS4 documentation. In the meantime, this should solve your problem:

for anchor in soup.select('div.about a'): 
    print(anchor.text)
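
To see why this works, here is a minimal self-contained sketch that runs the same selector against the div from the question. It assumes the built-in "html.parser" backend and uses the variable names html and usernames purely for illustration. The selector 'div.about a' matches only the a element inside the div, so the text that follows it (age, status, online flag) is never touched.

```python
from bs4 import BeautifulSoup

# The sample div from the question, reproduced as a string for illustration.
html = (
    '<div class="about">'
    '<a href="es_viewprofile.aspx?profile_id=110181766">claudiakenzo</a>'
    '&nbsp;33&nbsp;&nbsp;&nbsp;&nbsp;Pasar el rato&nbsp;&nbsp;&nbsp;'
    '<font color="green">En línea</font></div>'
)

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <a> that sits inside a <div class="about">.
usernames = [anchor.get_text(strip=True) for anchor in soup.select("div.about a")]
print(usernames)
```

Because the digits in a username live inside the a element, this approach should not need re at all.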

Source

2017-02-14 00:47:07

Thanks. After posting, I found a solution:

username = link.text.split()[0]

This seems to give me what I need.
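
One possible caveat: split()[0] keeps only the text up to the first whitespace, so it would truncate a username that contains a space. Nothing in the question says such usernames exist, so the markup below is a made-up example, and the names by_split and by_anchor are hypothetical. The sketch compares the split approach with reading the a element directly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a username with spaces and digits (not from the real site).
html = (
    '<div class="about"><a href="#">claudia kenzo 99</a>'
    '&nbsp;33&nbsp;Pasar el rato</div>'
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="about")

# Splitting the whole div text on whitespace keeps only the first token.
by_split = div.text.split()[0]

# Reading the <a> element keeps the full link text, whatever it contains.
anchor = div.find("a")
by_anchor = anchor.get_text(strip=True) if anchor is not None else None

print(by_split)
print(by_anchor)
```

If usernames never contain whitespace, both approaches give the same result.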

Source

2017-02-14 17:41:20 skeitel
