Python lxml.html.soupparser.fromstring raises an annoying warning

My code:

from lxml.html.soupparser import fromstring
foo = fromstring(my_html)

It raises this warning:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. 

To get rid of this warning, change this: 

BeautifulSoup([your markup]) 

to this: 

BeautifulSoup([your markup], "html.parser") 

    markup_type=markup_type))

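I tried passing it the string 'html.parser', but that doesn't work: it raises an error saying the string is not callable. So I tried html.parser (unquoted), and then looked through the lxml module for another parser to use, but couldn't find one. I looked in the Python stdlib and saw that 2.7 has one called HTMLParser, so I imported it and passed beautifulsoup=HTMLParser, which didn't work either.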

Where is the callable that I should pass to fromstring?

Edit: adding the solutions I tried:

from lxml.html.soupparser import fromstring 
wiktionary_page = fromstring(wiktionary_page.read(), features="html.parser")

and

from lxml.html.soupparser import BeautifulSoup 
wiktionary_page = fromstring(wiktionary_page.read(), beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))


2016-08-15 deltaskelta

You can set the parser with the features keyword.

import lxml.html.soupparser
tree = lxml.html.soupparser.fromstring("<p>foo</p>", features="html.parser")

What happens inside fromstring is that _parse gets called, and I think there is a bug on the line bsargs['features'] = ['html.parser']; it should be bsargs['features'] = 'html.parser':

def _parse(source, beautifulsoup, makeelement, **bsargs):
    if beautifulsoup is None:
        beautifulsoup = BeautifulSoup
    if hasattr(beautifulsoup, "HTML_ENTITIES"):  # bs3
        if 'convertEntities' not in bsargs:
            bsargs['convertEntities'] = 'html'
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # bs4
        if 'features' not in bsargs:
            bsargs['features'] = ['html.parser']  # use Python html parser
    tree = beautifulsoup(source, **bsargs)
    root = _convert_tree(tree, makeelement)
    # from ET: wrap the document in a html root element, if necessary
    if len(root) == 1 and root[0].tag == "html":
        return root[0]
    root.tag = "html"
    return root
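
To illustrate why a list instead of a string could matter here, below is a small stand-alone sketch. It does not import bs4; it only models the kind of check I assume BeautifulSoup performs, namely comparing the features value you passed against the name of the builder it ended up choosing. NAME, ALTERNATE_NAMES and warns_about_parser are illustrative names for this sketch, not the library's API, and the real check may differ between bs4 versions.

```python
import warnings

# Assumed values for the builder that gets selected (illustrative only).
NAME = "html.parser"
ALTERNATE_NAMES = []


def warns_about_parser(original_features):
    """Simplified model: warn unless the caller named the chosen parser exactly."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        named_explicitly = (
            original_features == NAME or original_features in ALTERNATE_NAMES
        )
        if not named_explicitly:
            warnings.warn("No parser was explicitly specified", UserWarning)
    return len(caught) > 0


list_warns = warns_about_parser(["html.parser"])  # what _parse passes by default
string_warns = warns_about_parser("html.parser")  # what the caller can pass instead
```

Under that assumption, the one-element list never compares equal to the plain string, so the default set by _parse would still trigger the warning, while an explicit features="html.parser" would not.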

You can also use a lambda:

from lxml.html.soupparser import BeautifulSoup 
import lxml.html.soupparser 

tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
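
The reason the beautifulsoup argument has to be a callable, rather than a parser name, is the single call beautifulsoup(source, **bsargs) in _parse. Here is a minimal sketch of that call using stand-ins, so it runs without lxml or bs4 installed. fake_soup and call_parser are hypothetical helpers written for this illustration only.

```python
from functools import partial


def fake_soup(markup, features=None):
    """Hypothetical stand-in for BeautifulSoup: records the requested parser."""
    return {"markup": markup, "features": features}


def call_parser(source, beautifulsoup, **bsargs):
    """Mirrors the one call that _parse makes: beautifulsoup(source, **bsargs)."""
    return beautifulsoup(source, **bsargs)


# A lambda and functools.partial both produce a callable with the parser baked in.
with_lambda = call_parser("<p>foo</p>", lambda s: fake_soup(s, "html.parser"))
with_partial = call_parser("<p>foo</p>", partial(fake_soup, features="html.parser"))

# Passing the parser name itself fails, because a str cannot be called.
try:
    call_parser("<p>foo</p>", "html.parser")
    error = None
except TypeError as exc:
    error = str(exc)
```

With the real libraries, partial(BeautifulSoup, features="html.parser") should behave like the lambda. Note also that a lambda has no DEFAULT_BUILDER_FEATURES attribute, so in the _parse code quoted above the bs4 branch is skipped and no features entry is added to bsargs.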

Both the features keyword and the lambda suppress the warning:

In [13]: from lxml.html import soupparser

In [14]: tree = soupparser.fromstring("<p>foo</p>", features="html.parser")

In [15]: from lxml.html.soupparser import BeautifulSoup

In [16]: import lxml.html.soupparser

In [17]: tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))


2016-08-15 07:20:00

Good ideas, but neither of these works for me – deltaskelta

Both work for me. Are you using them exactly as posted? –

I added what I tried, which is the same as yours as far as I can tell – deltaskelta
