Selecting the default namespace in XPath using HtmlUnit
I want to parse a Feedburner feed with HtmlUnit. The feed is this one: http://feeds.feedburner.com/alcoanewsreleases
From this feed I want to read all item nodes, so normally the XPath //item should do the trick. Unfortunately, it does not work in this case.
Groovy code snippet:
def page = webClient.getPage("http://feeds.feedburner.com/alcoanewsreleases")
def elements = page.getByXPath("//item")
Sample XML from the feed:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss1full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
[...SNIP...]
<item rdf:about="http://www.alcoa.com/global/en/news/news_detail.asp?newsYear=2011&pageID=20110518006002en">
<title>Chris L. Ayers Named President, Alcoa Global Primary Products</title>
<dc:date>2011-05-18</dc:date>
<link>http://feedproxy.google.com/~r/alcoanewsreleases/~3/PawvdhpJrkc/news_detail.asp</link>
<description>NEW YORK--(BUSINESS WIRE)--Alcoa (NYSE:AA) announced today that Chris L. Ayers has been named President of Alcoa’s Global Primary Products (GPP) business, effective May 18, 2011. Ayers, previously Chief Operating Officer of GPP, succeeds John Thuestad, who will be handling special projects for the Company. Ayers joined Alcoa in February 2010 as Chief Operating Officer of Alcoa Cast, Forged and Extruded Products, a new position. He was elected a Vice President of Alcoa in April 2010 and Executive</description>
<feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.alcoa.com/global/en/news/news_detail.asp?newsYear=2010&pageID=20100104006194en</feedburner:origLink>
</item>
[...SNIP...]
</rdf:RDF>
I suspect this is a namespace problem, because this document declares four namespaces. The namespaces are:
- xmlns="http://purl.org/rss/1.0/" (this is the default)
- xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
- xmlns:dc="http://purl.org/dc/elements/1.1/"
- xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0"
I have also tried Nokogiri (another XML parser, used from Ruby scripts). With Nokogiri I can simply use the XPath //xmlns:item, which works and returns all item nodes from the feed.
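I have tried the same XPath with HtmlUnit, but it does not work there.
So I guess I can rephrase my question as: how do I select a node from the default namespace with HtmlUnit?
Any ideas?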
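To make the suspected cause concrete, here is a minimal sketch of the same situation using only the JDK's own javax.xml.xpath API, not HtmlUnit. It shows why a bare //item finds nothing when the elements live in a default namespace, and two standard XPath 1.0 ways around it: matching on local-name(), or binding a prefix to the namespace URI. The class name DefaultNamespaceXPath, the cut-down SAMPLE document, the example.invalid URLs and the prefix "rss" are all made up for illustration. Whether HtmlUnit's getByXPath accepts the local-name() form in the same way is an assumption that would need to be checked against the real feed.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespaceXPath {

    // Hypothetical, cut-down stand-in for the feed: two items that inherit
    // the RSS 1.0 default namespace from the root element.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns=\"http://purl.org/rss/1.0/\">"
        + "<item rdf:about=\"http://example.invalid/1\"><title>First</title></item>"
        + "<item rdf:about=\"http://example.invalid/2\"><title>Second</title></item>"
        + "</rdf:RDF>";

    // Counts the nodes that the given XPath expression selects in the given XML.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace information in the DOM
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the illustrative prefix "rss" to the feed's default namespace URI.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "rss".equals(prefix)
                            ? "http://purl.org/rss/1.0/"
                            : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });

            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // An unprefixed name test only matches elements in no namespace.
        System.out.println(count(SAMPLE, "//item"));                   // 0
        // Ignore the namespace and compare the local name only.
        System.out.println(count(SAMPLE, "//*[local-name()='item']")); // 2
        // Use a prefix that the NamespaceContext maps to the namespace URI.
        System.out.println(count(SAMPLE, "//rss:item"));               // 2
    }
}
```

The local-name() form needs no prefix binding, which is convenient when the API offers no obvious way to register namespaces, but it also matches item elements from any other namespace. The prefixed form is stricter and is usually the cleaner choice when a namespace context can be supplied.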
Thank you for the **very** detailed answer! The XPath '//:item' does indeed work with HtmlUnit, although, as you describe, it is not recommended practice. – spier 2011-05-25 19:07:50