How to find XML tags with special characters in Python BeautifulSoup

I am using Python BeautifulSoup version 3. My XML looks like this (it comes from a DOCX file):

<w:r w:rsidRPr="00541D75"> 
<w:rPr> 
<w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman"/> 
<w:b/> 
<w:color w:val="1F497D" w:themeColor="text2"/> 
<w:sz w:val="24"/> 
<w:szCs w:val="24"/> 
</w:rPr> 
<w:t>Mandatory/Optional</w:t> 
</w:r> 
</w:p> 
</w:tc> 
</w:tr>

I want to extract the content of the 'w:t' tags, so this is what I did:

print soup.findAll('w:t')

This is the error message I get:

print soup.findAll('w:t') 
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 43: ordinal not in range(128)

Source

2014-10-29 lionel319

The BeautifulSoup object must be created with an XML parser, as follows:

BeautifulSoup(markup, "lxml-xml")

or

BeautifulSoup(markup, "xml")

as specified in the documentation.

Source

2018-02-15 15:08:44
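
To make the answer above concrete, here is a minimal sketch of extracting the 'w:t' text with an XML parser. It rests on several assumptions: the "xml" and "lxml-xml" parser names belong to BeautifulSoup 4 (the bs4 package) with lxml installed, not to version 3 as used in the question, so this presumes an upgrade and Python 3. SAMPLE_XML and extract_run_texts are hypothetical names made up for illustration, and the sample is a trimmed fragment that declares the w: namespace on its root element, which the snippet in the question does not show.

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4; the "xml" parser requires lxml

# Hypothetical sample: a small WordprocessingML fragment with the w: prefix
# declared on the root element. The second run contains curly quotes
# (U+201C / U+201D), the kind of character named in the error message.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p>'
    '<w:r><w:t>Mandatory/Optional</w:t></w:r>'
    '<w:r><w:t>\u201cquoted\u201d</w:t></w:r>'
    '</w:p></w:body></w:document>'
).encode("utf-8")  # pass bytes so the declared encoding is honoured


def extract_run_texts(markup):
    """Return the text of every <w:t> element found in the XML markup."""
    soup = BeautifulSoup(markup, "xml")
    # With the XML parser, prefixed tag names such as "w:t" are kept intact,
    # so they can be searched for by their prefixed name.
    return [tag.get_text() for tag in soup.find_all("w:t")]


texts = extract_run_texts(SAMPLE_XML)
```

For a real .docx file, the markup would typically come from word/document.xml inside the ZIP archive (readable with the standard zipfile module). The results are ordinary Unicode strings, so whether they can be printed depends on the encoding of the output stream.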
