Google App Engine Python 2.7 + lxml = Unicode ParserError
I am trying to parse a document with BeautifulSoup v4. I call BeautifulSoup on note.content, which is a string returned by Evernote's API:
soup = BeautifulSoup(note.content)
I enabled lxml in my app.yaml file:
libraries:
- name: lxml
  version: "2.3"
Note that this works on my local development server. However, when I deploy to Google's cloud, I get the following error:
Error trace:
Unicode parsing is not supported on this platform
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/controller/blog.py", line 101, in get
soup = BeautifulSoup(note.content)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 168, in __init__
self._feed()
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 181, in _feed
self.builder.feed(self.markup)
File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/builder/_lxml.py", line 62, in feed
self.parser.feed(markup)
File "parser.pxi", line 1077, in lxml.etree._FeedParser.feed (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:76196)
ParserError: Unicode parsing is not supported on this platform
UPDATE:
I looked at parser.pxi and found these lines, which raise the error:
elif python.PyUnicode_Check(data):
    if _UNICODE_ENCODING is NULL:
        raise ParserError, \
            u"Unicode parsing is not supported on this platform"
I think something about GAE's deployment environment must be causing this error, but I don't know what.
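The snippet above raises the error only on the branch that handles unicode input, so one experiment is to hand the parser UTF-8 encoded bytes instead of a unicode string. The sketch below is a hypothetical helper (to_utf8_bytes is not part of bs4, lxml, or the Evernote SDK). Whether BeautifulSoup forwards the bytes to lxml unchanged depends on the bs4 version, so treat this as something to try rather than a confirmed fix.

```python
def to_utf8_bytes(markup):
    """Return markup as UTF-8 encoded bytes.

    Hypothetical helper: byte strings are returned untouched,
    unicode strings are encoded so the parser sees bytes.
    """
    if isinstance(markup, bytes):
        # Already a byte string (this is `str` on Python 2.7).
        return markup
    return markup.encode("utf-8")


# Possible usage in the handler (untested on App Engine):
#   soup = BeautifulSoup(to_utf8_bytes(note.content))
```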
UPDATE 2:
Because BeautifulSoup automatically falls back to other parsers, I ended up removing lxml from my application entirely. Doing so resolved the problem.
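Instead of removing lxml outright, the fallback could also be made explicit in code. The following is a minimal sketch, assuming bs4 is importable and that the parser names "lxml", "html5lib" and "html.parser" are the ones to try. parse_with_fallback and its factory parameter are hypothetical names introduced here for illustration (factory only exists so the function can be exercised without bs4 installed).

```python
def parse_with_fallback(markup,
                        parsers=("lxml", "html5lib", "html.parser"),
                        factory=None):
    """Try each parser name in order and return the first successful parse.

    factory is any callable taking (markup, parser_name); it defaults to
    bs4.BeautifulSoup, imported lazily so bs4 is only needed when used.
    """
    if factory is None:
        from bs4 import BeautifulSoup as factory
    if not parsers:
        raise ValueError("no parsers given")

    last_error = None
    for name in parsers:
        try:
            return factory(markup, name)
        except Exception as exc:
            # Deliberately broad: a missing parser and a parser that
            # fails at runtime (such as the lxml ParserError above)
            # raise unrelated exception types.
            last_error = exc
    raise last_error


# Possible usage in the handler:
#   soup = parse_with_fallback(note.content)
```

Listing html5lib ahead of html.parser is a design choice, not a requirement: html5lib is pure Python and tends to be more forgiving of malformed markup, but it has to be bundled with the application.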
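Do you get this error with the SDK or in production? (Or both.) – proppy 2012-02-20 14:49:23
Production only; it works fine with the SDK on localhost. – zzz 2012-02-20 15:06:36
I ran into the same problem and got around it by removing lxml. However, Python's html.parser is so unforgiving of malformed markup that most of my pages fail to parse – 2012-03-25 23:10:21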