2016-02-29 79 views
0

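How do I get the text after a span using BeautifulSoup and Python?

I only want the text outside the span, with nothing from inside the span. My current code gives me everything: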

birthday = bsObj.find("div", {"class":"age"}) 
# <div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div> 
birthday.get_text() 
birthplace = bsObj.find("div", {"class":"hometown"}) 
# <div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div> 
birthplace.get_text() 

Result:

"Age: 24 (04/21/1991)","Birthplace: Barranquilla, Colombia" 

Expected result:

"24 (04/21/1991)","Barranquilla, Colombia" 

Answers

3

Just call clear() on the span before get_text():

from bs4 import BeautifulSoup 

html_doc ='<html><body><div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div><div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div></body></html>' 

bsObj = BeautifulSoup(html_doc, 'html.parser') 

# <div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div> 
birthday = bsObj.find("div", {"class":"age"}) 
birthday.span.clear() 
print(birthday.get_text()) # 23 (10/21/1992) 

# <div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div> 
birthplace = bsObj.find("div", {"class":"hometown"}) 
birthplace.span.clear() 
print(birthplace.get_text()) # Barranquilla, Colombia 

That did it! Thanks! – macloo
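Note that clear() modifies the parsed tree: the span's contents are gone afterwards. If the label is still needed later, one possible alternative (not part of the answer above, just a sketch) is to read the text node that follows the span through next_sibling. This assumes the wanted text is the node immediately after the span, as in the sample HTML.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>'
soup = BeautifulSoup(html_doc, 'html.parser')

birthday = soup.find("div", {"class": "age"})

# The node right after the <span> is the text we want (assumption:
# the markup always has the shape shown in the question).
age_text = birthday.span.next_sibling.strip()
print(age_text)  # 23 (10/21/1992)

# The tree is untouched, so the label inside the span is still there.
print(birthday.span.get_text())  # Age:
```

If the div might not contain a span, or the span might be the last child, next_sibling can be None, so a real scraper would want to check for that before calling strip().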

1
  • clear() empties the span
  • strip() removes the leading and trailing whitespace

from bs4 import BeautifulSoup 

soup = BeautifulSoup('<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>', 'html.parser') 
soup.span.clear() 
print(soup.get_text().strip()) 

Output:

    23 (10/21/1992)
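To apply the same clear() plus strip() steps to both fields from the question, they could be wrapped in a small function. This is only a sketch: text_outside_span is a hypothetical helper name, not something BeautifulSoup provides, and it assumes each div holds at most one span to drop.

```python
from bs4 import BeautifulSoup

def text_outside_span(soup, class_name):
    """Hypothetical helper: text of the first div with class_name, minus its span."""
    div = soup.find("div", {"class": class_name})
    if div is None:
        return None  # no such div on the page
    if div.span is not None:
        div.span.clear()  # empty the span so its label is not returned
    return div.get_text().strip()  # drop the leading space left behind

html_doc = (
    '<html><body>'
    '<div class="age"><span class="category">Age:</span> 23 (10/21/1992)</div>'
    '<div class="hometown"><span class="category">Birthplace:</span> Barranquilla, Colombia</div>'
    '</body></html>'
)
soup = BeautifulSoup(html_doc, 'html.parser')

print(text_outside_span(soup, "age"))       # 23 (10/21/1992)
print(text_outside_span(soup, "hometown"))  # Barranquilla, Colombia
```

Returning None for a missing div is a design choice made here for illustration; raising an error would be equally reasonable.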