2012-02-20 65 views
6

Google App Engine Python 2.7 + lxml = Unicode ParserError

I am trying to parse a document with BeautifulSoup v4. I call BeautifulSoup on note.content, which is a string returned by the Evernote API:

soup = BeautifulSoup(note.content)

I enabled lxml in my app.yaml file:

libraries:
- name: lxml
  version: "2.3"

Note that this works on my local development server. However, when deployed to Google's cloud, I get the following error:

Error trace:

Unicode parsing is not supported on this platform
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/controller/blog.py", line 101, in get
    soup = BeautifulSoup(note.content)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 168, in __init__
    self._feed()
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/builder/_lxml.py", line 62, in feed
    self.parser.feed(markup)
  File "parser.pxi", line 1077, in lxml.etree._FeedParser.feed (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:76196)
ParserError: Unicode parsing is not supported on this platform

UPDATE:

I checked parser.pxi and found these lines, which raise the error:

elif python.PyUnicode_Check(data):
    if _UNICODE_ENCODING is NULL:
        raise ParserError, \
            u"Unicode parsing is not supported on this platform"

I think something about GAE's deployment environment must be causing this error, but I don't know what.

UPDATE 2:

Because BeautifulSoup automatically falls back to other parsers, I ended up removing lxml from my application entirely. Doing so solved the problem.
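For anyone who would rather keep lxml where it works, here is a minimal sketch of an explicit fallback. The names `parse_with_fallback` and `make_soup` are made up for illustration; in an app, `make_soup` would be `BeautifulSoup` itself, whose second positional argument names the parser. It assumes the failing parser raises an ordinary exception, as in the traceback above.

```python
def parse_with_fallback(markup, make_soup, parsers=("lxml", "html.parser")):
    """Try each parser name in order and return the first successful result.

    make_soup is any callable taking (markup, parser_name); this is a
    hypothetical helper, not part of bs4.
    """
    last_error = ValueError("no parsers given")
    for name in parsers:
        try:
            return make_soup(markup, name)
        except Exception as exc:  # e.g. lxml's ParserError in production
            last_error = exc
    # Every parser failed: surface the most recent error.
    raise last_error


# Usage in the handler (assumes bs4 is importable):
# from bs4 import BeautifulSoup
# soup = parse_with_fallback(note.content, BeautifulSoup)
```

This way the local server can still use lxml, while production quietly drops to html.parser if lxml refuses the input.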

+0

Do you get this error with the SDK or in production? (Or both.) – proppy 2012-02-20 14:49:23

+0

Production only; it works fine with the SDK on localhost. – zzz 2012-02-20 15:06:36

+0

I ran into the same problem and worked around it by removing lxml. However, Python's html.parser is very unforgiving, and most of my pages fail to parse. – 2012-03-25 23:10:21

Answers

1

Try parsing a UTF-8 encoded byte string instead of a unicode object.
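A minimal sketch of that suggestion, assuming note.content arrives as a unicode object: encode it to UTF-8 bytes before handing it to BeautifulSoup, so lxml never takes the unicode branch shown in parser.pxi. The helper name `to_utf8_bytes` is made up for illustration.

```python
def to_utf8_bytes(markup):
    """Return markup as a UTF-8 byte string; bytes pass through unchanged.

    Works on Python 2.7 (where bytes is str) and on Python 3.
    """
    if isinstance(markup, bytes):
        return markup
    return markup.encode("utf-8")


# Usage in the handler (assumes bs4 is importable):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(to_utf8_bytes(note.content), from_encoding="utf-8")
```

Passing `from_encoding` tells BeautifulSoup which encoding the bytes use, so it does not have to guess.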