Python lxml parsing error on Evernote XML

I want to parse Evernote Markup Language (ENML) with lxml in Python 2.7. ENML is a superset of XHTML.
from StringIO import StringIO
import lxml.etree as etree

if __name__ == '__main__':
    # The note body contains &nbsp;, which is not one of XML's predefined entities.
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = etree.parse(xml_str)
The code above fails with this error:
XMLSyntaxError: Entity 'nbsp' not defined, line 5, column 32
How can I parse ENML successfully?
from StringIO import StringIO
import lxml.html as LH

if __name__ == '__main__':
    # lxml.html uses the lenient HTML parser, which recognizes HTML entities such as &nbsp;.
    xml_str = StringIO('<?xml version="1.0" encoding="UTF-8"?>\r\n<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\r\n\r\n<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">\nA really simple example. &nbsp;Another sentence.\n</en-note>')
    tree = LH.parse(xml_str)
    print(LH.tostring(tree))
Or, better, with properly encoded Unicode characters – simon 2015-03-08 23:30:44
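
One way to read the comment's suggestion is to swap each HTML named entity for the Unicode character it stands for before handing the note to the strict XML parser. The sketch below is only an illustration under a few assumptions: it is written for Python 3 (on Python 2.7 the entity table lives in `htmlentitydefs` and `unichr` replaces `chr`), the helper name `replace_html_entities` is made up for this example, and it makes no attempt to skip CDATA sections. It falls back to the standard library's ElementTree if lxml is not installed, since `fromstring` accepts the same input there.

```python
import re
from html.entities import name2codepoint  # Python 3 location of the HTML entity table

try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a compatible fromstring()

# Entities that XML itself predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Hypothetical helper: replace HTML named entities (e.g. &nbsp;) with real characters."""
    def _sub(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own entities and unknown names untouched
        return chr(name2codepoint[name])
    return _ENTITY_RE.sub(_sub, text)


enml = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">\n'
    '<en-note>A really simple example. &nbsp;Another sentence.</en-note>'
)

cleaned = replace_html_entities(enml)
# Encode to bytes so the parser honours the encoding named in the XML declaration.
root = etree.fromstring(cleaned.encode("utf-8"))
print(root.tag)
print(repr(root.text))
```

Unlike the `lxml.html` route, this keeps the document in the XML parser, so the root element stays `en-note` rather than being wrapped in the `html`/`body` structure that an HTML parser adds.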