Handling `&#xD;&#xA;` in Python

Question background:

I have an XML file that I import into BeautifulSoup and parse. One of its nodes looks like this:

<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/>

注意，此數值和文本中
。我知道這些是回車和換行符的XML表示。
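A minimal sketch shows what any conforming XML parser does with these character references. It uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, purely for illustration. The references are decoded into the actual characters `\r\n`, so the entity spelling is already gone once the document has been parsed:

```python
import xml.etree.ElementTree as ET

# The node from the question, as raw XML text.
raw = '<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/>'

elem = ET.fromstring(raw)
value = elem.get("value")

# The character references have been decoded into real CR and LF characters.
print(repr(value))  # 'Line1\r\nLine2\r\nLine3'
```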

When I load the file into BeautifulSoup, the value is converted to the following:

<DIAttribute name="ObjectDesc" value="Line1 
Line2 
Line3"/>

You'll notice that each `&#xD;&#xA;` has been converted into a line break.
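One likely reason the original form matters is XML attribute-value normalization. When the document is read again, literal line breaks inside an attribute are replaced with spaces, while character references survive. The sketch below uses `xml.etree.ElementTree` as a stand-in parser. It writes the decoded value back once with raw CR/LF and once with the references, then re-parses both:

```python
import xml.etree.ElementTree as ET

decoded = "Line1\r\nLine2\r\nLine3"

# Serialized with raw line breaks, as in the BeautifulSoup output above.
raw_breaks = '<DIAttribute value="%s"/>' % decoded
# Serialized with character references, as in the original file
# (the plain replace chain is safe here only because the value has no "&", "<" or quotes).
with_refs = '<DIAttribute value="%s"/>' % decoded.replace("\r", "&#xD;").replace("\n", "&#xA;")

after_raw = ET.fromstring(raw_breaks).get("value")
after_refs = ET.fromstring(with_refs).get("value")

print(repr(after_raw))   # line breaks have been normalized to spaces
print(repr(after_refs))  # 'Line1\r\nLine2\r\nLine3'
```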

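My use case requires the value to stay exactly as it was in the original. Any ideas on how to keep it that way, or how to convert it back?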
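If converting back is acceptable, `xml.sax.saxutils.escape` is a somewhat safer route than chaining `str.replace` calls. It escapes `&`, `<` and `>` first and then applies an optional extra mapping. In this sketch the mapping to the hex forms `&#xD;`/`&#xA;` is an assumption, chosen to match the original file. The helper name `to_attr_text` is hypothetical:

```python
from xml.sax.saxutils import escape

# Extra replacements applied after &, < and > have been escaped.
# The hex spellings are chosen to match the original file; '"' is escaped
# because the result is meant for a double-quoted attribute.
CRLF_ENTITIES = {"\r": "&#xD;", "\n": "&#xA;", '"': "&quot;"}

def to_attr_text(value):
    """Hypothetical helper: re-escape a decoded attribute value."""
    return escape(value, CRLF_ENTITIES)

print(to_attr_text("Line1\r\nLine2\r\nLine3"))  # Line1&#xD;&#xA;Line2&#xD;&#xA;Line3
```

The result is meant for writing the attribute text yourself. If it were passed to a serializer that escapes `&` on its own, the entities would end up double-escaped.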

Source code:

Python (2.7.11):

from bs4 import BeautifulSoup #version 4.4.0 
s = BeautifulSoup(open('test.xml'),'lxml-xml',from_encoding="ansi") 
print s.DIAttribute 

#XML file looks like 
''' 
<?xml version="1.0" encoding="UTF-8" ?> 
<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/> 
'''

Notepad++ reports the encoding of the source XML file as ANSI.
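The encoding is unlikely to be the cause. The entity text is plain ASCII, so every ASCII-compatible encoding decodes it to the same characters. That includes the Windows code page Notepad++ labels "ANSI", which is assumed here to be cp1252. A quick check:

```python
raw = b'value="Line1&#xD;&#xA;Line2"'

# Decode the same bytes with several ASCII-compatible codecs.
decoded = {codec: raw.decode(codec) for codec in ("ascii", "cp1252", "utf-8", "latin-1")}

# All decodings are identical; the entities are still literal text at this stage.
print(len(set(decoded.values())))  # 1
```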

Things I've tried:

I've scoured the documentation without any success.

Variations on line 3 of the code (`print s.DIAttribute`):

print s.DIAttribute.prettify('ascii') 
print s.DIAttribute.prettify('windows-1252') 
print s.DIAttribute.prettify('ansi') 
print s.DIAttribute.prettify('utf-8') 
print s.DIAttribute['value'].replace('\r','&#xD;').replace('\n','&#xA;') # This works, but it feels like a band-aid, and other problems will likely remain.
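The replace chain does handle CR/LF. One concrete gap is that it leaves `&`, `<` and quotes unescaped, so a value containing an ampersand would produce a fragment that no longer parses. Below is a minimal Python 3 sketch with a hypothetical value. `xml.etree.ElementTree` is used only to check well-formedness:

```python
import xml.etree.ElementTree as ET

def bandaid(value):
    # The chain from the question: only CR and LF are re-escaped.
    return value.replace("\r", "&#xD;").replace("\n", "&#xA;")

value = "Fish & Chips\r\nLine2"  # hypothetical value containing an ampersand
fragment = '<DIAttribute value="%s"/>' % bandaid(value)

try:
    ET.fromstring(fragment)
    well_formed = True
except ET.ParseError:
    well_formed = False

print(well_formed)  # False: the bare "&" is not a valid entity reference
```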

Any ideas? I appreciate any comments/suggestions.


2016-03-08 all about data

You could replace them with some raw placeholder strings before parsing, and at the end turn them back into what they should be. – josifoski
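The comment's suggestion can be sketched as follows. It assumes the entities always appear with exactly the spelling `&#xD;`/`&#xA;`; variants such as `&#13;` or `&#xd;` would be missed. `ElementTree` stands in for BeautifulSoup. The placeholder tokens are hypothetical and must not occur anywhere else in the data:

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder tokens; they must not appear elsewhere in the data.
PLACEHOLDERS = {"&#xD;": "@@CR@@", "&#xA;": "@@LF@@"}

def protect(raw):
    """Swap the entity text for placeholders before parsing."""
    for entity, token in PLACEHOLDERS.items():
        raw = raw.replace(entity, token)
    return raw

def restore(text):
    """Swap the placeholders back to the original entity text."""
    for entity, token in PLACEHOLDERS.items():
        text = text.replace(token, entity)
    return text

raw = '<DIAttribute name="ObjectDesc" value="Line1&#xD;&#xA;Line2&#xD;&#xA;Line3"/>'
value = ET.fromstring(protect(raw)).get("value")

print(value)           # Line1@@CR@@@@LF@@Line2@@CR@@@@LF@@Line3
print(restore(value))  # Line1&#xD;&#xA;Line2&#xD;&#xA;Line3
```

The restored string contains the entity text itself. If it is written out through a serializer that escapes `&`, it would come out as `&amp;#xD;`, so the placeholders are best restored on the final output string.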

Just for the record, here first are the libraries that never handled the `&#xD;&#xA;` entities properly: `BeautifulSoup(data, convertEntities=BeautifulSoup.HTML_ENTITIES)`, `lxml.html.soupparser.unescape`, `xml.sax.saxutils.unescape`.
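For `xml.sax.saxutils.unescape`, one plausible explanation is its documented behavior. It only replaces `&amp;`, `&lt;` and `&gt;`, plus any entities passed in its optional dictionary, so numeric character references pass through untouched. A quick Python 3 check:

```python
from xml.sax.saxutils import unescape

text = "Line1&#xD;&#xA;Line2 &lt;tag&gt;"

# Only &lt; and &gt; are decoded; the numeric references stay as literal text.
print(unescape(text))  # Line1&#xD;&#xA;Line2 <tag>
```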

This is what worked (in Python 2.x):

import sys 
import HTMLParser 

## accept file name as argument, or read stdin if nothing passed 
data = len(sys.argv) > 1 and open(sys.argv[1]).read() or sys.stdin.read() 

parser = HTMLParser.HTMLParser() 
print parser.unescape(data)
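`HTMLParser.unescape` is a Python 2 API. In Python 3 the module became `html.parser`, and its `unescape` method was deprecated and later removed. The sketch below is a Python 3 version of the same script that uses `html.unescape` instead. The helper name `unescape_text` and the UTF-8 file encoding are assumptions. Like the original, it decodes the entities into real CR/LF characters rather than preserving them:

```python
import html
import sys

def unescape_text(data):
    """Decode character references, e.g. '&#xD;&#xA;' becomes '\r\n'."""
    return html.unescape(data)

def main(argv):
    # Accept a file name as an argument, or read stdin if nothing is passed.
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as f:
            data = f.read()
    else:
        data = sys.stdin.read()
    sys.stdout.write(unescape_text(data))

if __name__ == "__main__":
    main(sys.argv)
```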


2016-07-04 15:09:44 ccpizza
