Python lxml parsing error on Evernote XML

2013-02-27

I want to parse Evernote Markup Language (ENML) with lxml in Python 2.7. ENML is a superset of XHTML.

from StringIO import StringIO 
import lxml.etree as etree 

if __name__ == '__main__': 
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>') 
    tree = etree.parse(xml_str) 

The code above raises this error:

XMLSyntaxError: Entity 'nbsp' not defined, line 5, column 32 

How can I parse ENML successfully?

One option is to parse the same input with lxml.html instead of lxml.etree, since the HTML parser recognizes named entities such as &nbsp;:

from StringIO import StringIO 
import lxml.html as LH 
if __name__ == '__main__': 
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>') 
    tree = LH.parse(xml_str) 
    print(LH.tostring(tree)) 

Answers

You can try replacing the named entities with their numeric character references.
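The answer's own code did not survive on this page, so what follows is only a minimal sketch of the idea, not the original author's snippet. It rewrites HTML named entities such as &nbsp; into numeric references like &#160;, which an XML parser accepts without a DTD. The helper name `replace_named_entities` and the regular expression are assumptions made for illustration; the entity table comes from the standard library.

```python
import re

try:
    from html.entities import name2codepoint  # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2.7

# Entities that XML itself predefines; these must stay as they are.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}

# Matches a named entity reference such as &nbsp; or &copy;
ENTITY_RE = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def replace_named_entities(text):
    """Hypothetical helper: turn HTML named entities into numeric references."""
    def to_numeric(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return '&#%d;' % name2codepoint[name]
    return ENTITY_RE.sub(to_numeric, text)


enml = 'A really simple example. &nbsp;Another sentence.'
fixed = replace_named_entities(enml)
print(fixed)
```

Applied to the full ENML document from the question, the rewritten text should no longer trigger the "Entity 'nbsp' not defined" error. Because the document carries an encoding declaration, encode the result to UTF-8 bytes before handing it to lxml.etree.fromstring.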

Or, better, with properly encoded Unicode characters. – simon 2015-03-08 23:30:44