2016-08-15

python lxml.html.soupparser.fromstring raising an annoying warning

My code:

foo = fromstring(my_html) 

It raises this warning:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. 

To get rid of this warning, change this: 

BeautifulSoup([your markup]) 

to this: 

BeautifulSoup([your markup], "html.parser") 

    markup_type=markup_type)) 

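I tried passing it the string 'html.parser' as the beautifulsoup argument, but that doesn't work: it gives me an error saying the string is not callable. So I tried html.parser, then I looked through the lxml module to see whether I could find another parser, and couldn't. I looked in the Python stdlib and found that 2.7 has one called HTMLParser, so I imported it and passed beautifulsoup=HTMLParser. That didn't work either.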

What callable am I supposed to pass to fromstring?

Edit, adding the solutions I tried:

from lxml.html.soupparser import fromstring 
wiktionary_page = fromstring(wiktionary_page.read(), features="html.parser") 

from lxml.html.soupparser import BeautifulSoup 
wiktionary_page = fromstring(wiktionary_page.read(), beautifulsoup=lambda s: BeautifulSoup(s, "html.parser")) 

Answer


You can set the parser through the features keyword:

tree = lxml.html.soupparser.fromstring("<p>foo</p>", features="html.parser") 

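What happens inside fromstring is that _parse gets called, and I think there is a bug on the line bsargs['features'] = ['html.parser']; it should be bsargs['features'] = 'html.parser'. The list still selects Python's html.parser, so parsing works, but BeautifulSoup only treats the parser as explicitly specified when the value it receives equals the parser's name, so the list form still triggers the warning.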

# excerpt from lxml.html.soupparser
def _parse(source, beautifulsoup, makeelement, **bsargs):
    if beautifulsoup is None:
        beautifulsoup = BeautifulSoup
    if hasattr(beautifulsoup, "HTML_ENTITIES"):  # bs3
        if 'convertEntities' not in bsargs:
            bsargs['convertEntities'] = 'html'
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # bs4
        if 'features' not in bsargs:
            bsargs['features'] = ['html.parser']  # use Python html parser (suspected bug: a list, not a string)
    tree = beautifulsoup(source, **bsargs)
    root = _convert_tree(tree, makeelement)
    # from ET: wrap the document in a html root element, if necessary
    if len(root) == 1 and root[0].tag == "html":
        return root[0]
    root.tag = "html"
    return root
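To see why the list matters, here is a stdlib-only sketch of the two pieces involved. inject_default mirrors the bs4 branch of _parse above; would_warn is a simplified model, assuming that BeautifulSoup (as of 2016) skips the warning only when the features value it receives equals the parser's name. Both function names are hypothetical, and BUILDER_NAME is an assumed value, not imported from bs4.

```python
# Assumed value: the name bs4 registers for its html.parser builder.
BUILDER_NAME = "html.parser"


def inject_default(bsargs):
    """Mirror of the bs4 branch of lxml's _parse: add a default
    `features` entry only when the caller did not supply one."""
    if "features" not in bsargs:
        bsargs["features"] = ["html.parser"]  # the list form quoted above
    return bsargs


def would_warn(features):
    """Simplified model (assumption) of bs4's check: the warning is
    skipped only when the value passed in equals the builder's name."""
    return features != BUILDER_NAME


# No features passed: the list default is injected and the check fails.
print(would_warn(inject_default({})["features"]))  # True

# features="html.parser" passed explicitly: no default is injected.
print(would_warn(inject_default({"features": "html.parser"})["features"]))  # False
```

In this model the list never compares equal to the string, which is why passing features="html.parser" yourself makes the warning go away.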

You can also use a lambda:

from lxml.html.soupparser import BeautifulSoup 
import lxml.html.soupparser 

tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser")) 
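The lambda works for a related reason: in _parse above, the list default is only injected when the callable has a DEFAULT_BUILDER_FEATURES attribute, which the BeautifulSoup class has and a lambda does not. The sketch below models that dispatch with a hypothetical StubSoup class standing in for BeautifulSoup (it is not the real class, and its attribute value is only illustrative). It also shows that functools.partial behaves like the lambda, and why passing the string 'html.parser' as beautifulsoup raises the "not callable" error from the question.

```python
import functools


class StubSoup:
    """Hypothetical stand-in for bs4.BeautifulSoup; it carries a
    DEFAULT_BUILDER_FEATURES attribute so _parse treats it as bs4."""
    DEFAULT_BUILDER_FEATURES = ["html", "fast"]

    def __init__(self, markup, features=None):
        self.markup = markup
        self.features = features


def model_dispatch(source, beautifulsoup, **bsargs):
    """Mirror of the bs4 branch of lxml's _parse quoted above."""
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # looks like bs4
        if "features" not in bsargs:
            bsargs["features"] = ["html.parser"]
    return beautifulsoup(source, **bsargs)


# The class itself: the list default is injected.
print(model_dispatch("<p>foo</p>", StubSoup).features)  # ['html.parser']

# A lambda has no DEFAULT_BUILDER_FEATURES, so nothing is injected
# and the lambda passes the plain string itself.
print(model_dispatch("<p>foo</p>", lambda s: StubSoup(s, "html.parser")).features)  # html.parser

# A partial object does not expose the wrapped class's attributes either.
part = functools.partial(StubSoup, features="html.parser")
print(model_dispatch("<p>foo</p>", part).features)  # html.parser

# A string is not callable: this is the TypeError from the question.
try:
    model_dispatch("<p>foo</p>", "html.parser")
except TypeError as exc:
    print(exc)  # 'str' object is not callable
```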

Neither of these raises a warning:

In [13]: from lxml.html import soupparser

In [14]: tree = soupparser.fromstring("<p>foo</p>", features="html.parser")

In [15]: from lxml.html.soupparser import BeautifulSoup

In [16]: import lxml.html.soupparser

In [17]: tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
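When comparing results between machines, watching the screen for the warning can mislead: Python's default filter typically shows a given warning only once per source location, so a repeated call in the same IPython session may look silent. A minimal sketch of a recorder using only the standard warnings module; emitted_warnings is a hypothetical helper name, and noisy and quiet are hypothetical stand-ins for the real calls.

```python
import warnings


def emitted_warnings(fn, *args, **kwargs):
    """Call fn and return the messages of any warnings it emitted.

    simplefilter("always") makes sure a warning that was already shown
    once in this session is recorded again instead of being suppressed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn(*args, **kwargs)
    return [str(w.message) for w in caught]


def noisy():
    # hypothetical stand-in for a call that still warns
    warnings.warn("No parser was explicitly specified", UserWarning)


def quiet():
    # hypothetical stand-in for a call that does not warn
    return "<p>foo</p>"


print(emitted_warnings(noisy))  # ['No parser was explicitly specified']
print(emitted_warnings(quiet))  # []
```

With lxml and bs4 installed, the same helper could wrap the real call, e.g. emitted_warnings(soupparser.fromstring, "<p>foo</p>", features="html.parser"); an empty list means nothing was emitted.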

Nice idea, but neither of these works for me – deltaskelta


Both work for me. Are you using them exactly as posted? –


I added what I tried to the question; as far as I can tell, it is the same as yours – deltaskelta