BeautifulSoup4 soup.find('tag', text=re.compile('my text')) only works sometimes

I'm trying to create a specific way of pulling text out of legacy HTML, such as this fragment:

</table> 
<table border="0" cellpadding="0" cellspacing="0"> 
<tr> 
<td>Close Date:</td> 
<td> June 19, 2008</td>

My question is why this:

soup.find('td', text=re.compile('Close'))

returns:

<td>Close Date:</td>

However, when I try something more specific, it returns nothing.

3210

I want to make the script as specific as possible so that I can run it across multiple pages without picking up the wrong text.

Source

2016-12-31 Chace Mcguyer

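Does `soup.find('td', text=re.compile(r'Close\s+Date:'))` work? That matches one or more whitespace characters between `Close` and `Date`, in case the space is actually a [non-breaking space](http://stackoverflow.com/q/1357078/190597) (i.e. `&nbsp;`). – unutbu

@unutbu I don't believe `\s` matches `&nbsp;`. – DyZ

Yes! That did work! Very informative. –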
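On the point raised in the comments about whether `\s` covers a non-breaking space: a quick standard-library check, assuming Python 3 and a `str` pattern (the sample string `text` is made up for illustration), suggests that it does unless `re.ASCII` is set. Printing the Unicode name of each non-ASCII character is also a simple way to spot a hidden `&nbsp;` in scraped text.

```python
import re
import unicodedata

# Illustrative sample: "Close" and "Date:" joined by U+00A0,
# the character that &nbsp; is decoded to.
text = "Close\xa0Date:"

# Reveal hidden characters by printing the Unicode name of anything non-ASCII.
for ch in text:
    if ord(ch) > 127:
        print(repr(ch), unicodedata.name(ch))

# With a str pattern, \s is Unicode-aware by default in Python 3.
unicode_match = re.search(r"Close\s+Date:", text)

# A literal space in the pattern does not match U+00A0.
literal_match = re.search(r"Close Date:", text)

# With re.ASCII, \s is limited to [ \t\n\r\f\v] and no longer matches U+00A0.
ascii_match = re.search(r"Close\s+Date:", text, flags=re.ASCII)

print(bool(unicode_match), bool(literal_match), bool(ascii_match))
# → True False False
```

This would explain why a pattern containing a literal space finds nothing while the `\s+` version succeeds.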

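There may be a non-breaking space between Close and Date. In that case, you can use `\s+` to match one or more whitespace characters (written as a raw string so the backslash reaches the regex engine unchanged):

print(soup.find('td', text=re.compile(r'Close\s+Date:')))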

For example,

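import re
import bs4 as bs

# &nbsp; is decoded to the non-breaking space character U+00A0
content = '''\
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>Close&nbsp;Date:</td>
<td> June 19, 2008</td>
'''

soup = bs.BeautifulSoup(content, 'lxml')
# Raw string so that \s is passed to the regex engine unchanged
print(soup.find('td', text=re.compile(r'Close\s+Date:')))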

yields

<td>Close Date:</td>
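To keep the match specific across many pages, one option is to build the whitespace-tolerant pattern from the label itself instead of hand-writing each regex. The sketch below is only an illustration: `label_pattern` is a hypothetical helper, not part of BeautifulSoup, and it assumes the label should fill the whole cell apart from surrounding whitespace.

```python
import re


def label_pattern(label):
    """Compile a regex matching `label` with any whitespace run between its words.

    Hypothetical helper for illustration; not a BeautifulSoup function.
    """
    # str.split() with no argument splits on any Unicode whitespace.
    words = label.split()
    # Escape each word so punctuation is matched literally, then allow one or
    # more whitespace characters (including U+00A0) between the words.
    body = r"\s+".join(re.escape(word) for word in words)
    # Anchor both ends so that longer cell text does not match by accident.
    return re.compile(r"^\s*" + body + r"\s*$")


pattern = label_pattern("Close Date:")
print(bool(pattern.search("Close\xa0Date:")))     # non-breaking space between words
print(bool(pattern.search("Close Date: June")))   # extra trailing text
```

With the `soup` from the example above, this could be used as `soup.find('td', text=label_pattern('Close Date:'))`, and calling `find_next_sibling('td')` on the result should then reach the cell that holds the date.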

Source

2016-12-31 20:30:20 unutbu
