Namespace argument in lxml parsing

I have an HTML page that I'm trying to parse. Here is what I'm doing with lxml:

>>> node = etree.fromstring(html)
>>> node 
<Element {http://www.w3.org/1999/xhtml}html at 0x110676a70> 
>>> node.xpath('//body') 
[] 
>>> node.xpath('body') 
[]

Unfortunately, all of my xpath calls now return an empty list. Why is this happening, and how do I fix it?


2015-02-08 David542

As you've guessed, it's probably because all of the tags are namespaced. The easiest fix is probably to use the HTML parsing module: http://lxml.de/lxmlhtml.html#parsing-html – Anentropic 2015-02-08 20:57:49

Otherwise, with namespaces, you can do this: `node.xpath('//html:body', namespaces={'html': 'http://www.w3.org/1999/xhtml'})` – Anentropic 2015-02-08 20:58:39
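To see why the unprefixed queries come back empty, here is a minimal sketch. The XHTML string is a made-up stand-in for the real page, and the prefix name html is an arbitrary choice; only the namespace URI has to match.

```python
from lxml import etree

# Hypothetical minimal XHTML document; the xmlns attribute puts every
# element into the XHTML namespace.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>hello</p></body>'
    '</html>'
)

node = etree.fromstring(XHTML)

# lxml reports the tag in Clark notation: {namespace-uri}localname.
root_tag = node.tag

# An unprefixed name in XPath only matches elements in no namespace.
plain = node.xpath('//body')

# Binding a prefix of our own choosing to the URI makes the query match.
ns = {'html': 'http://www.w3.org/1999/xhtml'}
prefixed = node.xpath('//html:body', namespaces=ns)

print(root_tag)
print(len(plain), len(prefixed))
```

The prefix only exists inside the query, so it does not need to appear anywhere in the document.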

You need to use the namespace prefix in the query as well, for example:

node.xpath('//html:body', namespaces={'html': 'http://...'})

Or you can use .nsmap:

node.xpath('//html:body', namespaces=node.nsmap)

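This assumes that every namespace is declared on the element that node points to, which is usually the case for most XML documents. It also assumes that the document itself declares the html prefix. If XHTML is declared as the default namespace (a bare xmlns attribute), nsmap lists it under the key None, and because XPath has no notion of a default namespace, that entry cannot stand in for the html prefix.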
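For a page that declares XHTML as its default namespace, one possible workaround is to copy nsmap and give the None entry a real prefix before passing it to xpath. This is only a sketch: xpath_namespaces is a hypothetical helper, not part of lxml, and the sample document is made up.

```python
from lxml import etree

# Hypothetical sample page with XHTML as the default namespace.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>hi</p></body>'
    '</html>'
)
node = etree.fromstring(XHTML)

# The default namespace is stored under the key None.
raw_map = node.nsmap


def xpath_namespaces(element, default_prefix='html'):
    """Hypothetical helper: copy element.nsmap, renaming the default
    namespace (key None) to default_prefix so XPath can use it.

    Assumes the document does not already bind default_prefix to
    another URI.
    """
    mapping = {}
    for prefix, uri in element.nsmap.items():
        key = default_prefix if prefix is None else prefix
        mapping[key] = uri
    return mapping


ns = xpath_namespaces(node)
bodies = node.xpath('//html:body', namespaces=ns)
print(ns, len(bodies))
```

Explicitly declared prefixes are passed through unchanged, so queries against them keep working.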


2015-02-08 21:16:01

You can pass the namespace in like this:

>>> node.xpath('//xmlns:tr', namespaces={'xmlns':'http://www.w3.org/1999/xhtml'}) 
[<Element {http://www.w3.org/1999/xhtml}tr at 0x11067b6c8>, <Element {http://www.w3.org/1999/xhtml}tr at 0x11067b710>]

A better way to do this would be to use lxml's HTML parser:

>>> import lxml.html
>>> node = lxml.html.fromstring(html)
>>> node.findall('body') 
[<Element body at 0x1106b8f18>]
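As a self-contained sketch of the same idea (the page string is a made-up example), parsing with lxml.html gives plain tag names, so neither findall nor xpath needs a namespaces argument:

```python
import lxml.html

# Hypothetical sample page; it carries the same xmlns attribute as before.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><table>'
    '<tr><td>a</td></tr>'
    '<tr><td>b</td></tr>'
    '</table></body>'
    '</html>'
)

node = lxml.html.fromstring(PAGE)

# The HTML parser does not put elements into the XHTML namespace,
# so unprefixed names match directly.
bodies = node.findall('body')
rows = node.xpath('//tr')

print(node.tag, len(bodies), len(rows))
```

This is usually the less fragile route for real-world pages, since the HTML parser also tolerates markup that is not well-formed XML.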


2015-02-08 20:58:59 David542

lxml.html.fromstring – syzygy 2016-06-27 06:16:52
