2011-08-12 57 views
2

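Random text from /dev/random raises an error in lxml: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

To test my web application, I am pasting random text from /dev/random into my web front end. This line throws an error: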

print repr(comment) 
import html5lib 
print html5lib.parse(comment, treebuilder="lxml") 

'a\xef\xbf\xbd\xef\xbf\xbd\xc9\xb6E\xef\xbf\xbd\xef\xbf\xbd`\xef\xbf\xbd]\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd2 \x14\xef\xbf\xbd\xc7\xbe\xef\xbf\xbdy\xcb\x9c\xef\xbf\xbdi1O\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdZ\xef\xbf\xbd.\xef\xbf\xbd\x17^C' 

Unhandled Error 
    Traceback (most recent call last): 
     File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks 
     result = g.send(result) 
     File "/home/work/random/social/social/item.py", line 389, in _new 
     convId, conv = yield plugin.create(request) 
     File "/home/work/random/social/social/logging.py", line 47, in wrapper 
     ret = func(*args, **kwargs) 
     File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 1014, in unwindGenerator 
     return _inlineCallbacks(None, f(*args, **kwargs), Deferred()) 
    --- <exception caught here> --- 
     File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 893, in _inlineCallbacks 
     result = g.send(result) 
     File "/home/work/random/social/twisted/plugins/status.py", line 63, in create 
     print html5lib.parse(comment, treebuilder="lxml") 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 38, in parse 
     return p.parse(doc, encoding=encoding) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 211, in parse 
     parseMeta=parseMeta, useChardet=useChardet) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 111, in _parse 
     self.mainLoop() 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 174, in mainLoop 
     self.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 572, in processCharacters 
     self.parser.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 611, in processCharacters 
     self.parser.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 652, in processCharacters 
     self.parser.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 711, in processCharacters 
     self.parser.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 804, in processCharacters 
     self.parser.phase.processCharacters(token) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/html5parser.py", line 948, in processCharacters 
     self.tree.insertText(token["data"]) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/_base.py", line 288, in insertText 
     parent.insertText(data) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/etree_lxml.py", line 225, in insertText 
     builder.Element.insertText(self, data, insertBefore) 
     File "/usr/local/lib/python2.6/dist-packages/html5lib-0.90-py2.6.egg/html5lib/treebuilders/etree.py", line 114, in insertText 
     self._element.text += data 
     File "lxml.etree.pyx", line 821, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:33308) 

     File "apihelpers.pxi", line 646, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:15287) 

     File "apihelpers.pxi", line 1295, in lxml.etree._utf8 (src/lxml/lxml.etree.c:20212) 

    exceptions.ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes 

Before I commit the string entered by the user, I do this:

comment.decode('UTF-8').encode('UTF-8', 'replace')

But that does not seem to help in this case.

- ABHI

+0

I had the same error; [this][1] solution fixed it for me. [1]: http://stackoverflow.com/a/13045920/2938374

Answers

4

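The problem is that XML text cannot contain certain characters, mainly the control characters with byte values below 32 (other than tab, line feed, and carriage return). The XML 1.0 Recommendation defines a character as: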

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

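/dev/random can supply bytes that do not match this definition, for example control characters and some multi-byte sequences.

So you have to filter out these bytes before attempting any encoding.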
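One way to do that filtering is sketched below. It is only an illustration, not the asker's code: the helper name strip_invalid_xml_chars is hypothetical, and the sketch assumes Python 3 (the question uses Python 2.6, where the pattern and the bytes/unicode handling would need adjusting). It decodes the input as UTF-8 and then removes every character that falls outside the Char production quoted above.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(raw_bytes):
    """Hypothetical helper: decode bytes as UTF-8 and drop non-XML characters."""
    # errors="replace" maps undecodable byte sequences to U+FFFD,
    # which is itself a legal XML character.
    text = raw_bytes.decode("utf-8", errors="replace")
    # Remove control characters such as NUL, \x14 and \x17.
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = b"a\x14\x17b\x00c\xff\td"
    print(repr(strip_invalid_xml_chars(sample)))
```

In the comment printed in the question, the bytes \x14 and \x17 are control characters of exactly this kind, and a UTF-8 decode/encode round trip leaves them in place, which is probably why that step did not help. The cleaned text could then be handed to html5lib.parse(..., treebuilder="lxml") as before.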