2016-12-26 35 views

XPath 2.0 functions do not work in Java with Saxon

I am writing a simple program that uses Selenium and XPath 2.0 functions to scrape data from a web page.

Because Selenium supports only XPath 1.0 functions, I want to use Saxon for this.

  1. I downloaded Saxon9he.jar and extracted it to "C:\Program Files\Java\jre1.8.0_111\lib\ext".
  2. I created a file named "jaxp.properties" containing the following lines: javax.xml.transform.TransformerFactory = net.sf.saxon.TransformerFactoryImpl javax.xml.xpath.XPathFactory","net.sf.saxon.xpath.XPathFactoryImpl
  3. I also added the JAR file to the project's libraries in Eclipse.

However, I cannot get any values using XPath 2.0 functions.

In my code, if I use

XPathFactory factory = XPathFactory.newInstance(); 

instead of

XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON); 

I can use XPath 1.0 functions. But I need XPath 2.0 functions. Please point me in the right direction.

My code is:

import java.io.IOException; 
import java.io.StringReader; 

import javax.xml.parsers.DocumentBuilder; 
import javax.xml.parsers.DocumentBuilderFactory; 
import javax.xml.parsers.ParserConfigurationException; 
import javax.xml.xpath.XPath; 
import javax.xml.xpath.XPathConstants; 
import javax.xml.xpath.XPathExpression; 
import javax.xml.xpath.XPathExpressionException; 
import javax.xml.xpath.XPathFactory; 
import javax.xml.xpath.XPathFactoryConfigurationException; 
import javax.xml.xpath.XPathFunctionResolver; 
import javax.xml.xpath.XPathVariableResolver; 

import org.openqa.selenium.WebDriver; 
import org.openqa.selenium.firefox.FirefoxDriver; 
import org.w3c.dom.Document; 
import org.w3c.dom.NodeList; 
import org.xml.sax.InputSource; 
import org.xml.sax.SAXException; 

import net.sf.saxon.lib.NamespaceConstant; 
import net.sf.saxon.xpath.XPathFactoryImpl; 


public class XpathCheckClass {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathFactoryConfigurationException, XPathExpressionException {

        WebDriver dr = new FirefoxDriver();

        dr.get("http://s15.a2zinc.net/clients/hartenergy/midstream17/Public/eBooth.aspx?Nav=False&BoothID=137384");
        try {
            Thread.sleep(3000);
        } catch (Exception e) {
        }

        String source = dr.getPageSource();

        Document doc = null;

        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = db.parse(new InputSource(new StringReader(source)));
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON, "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory factory = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);

        // XPathFactory factory = XPathFactory.newInstance(); ---> default xpath factory

        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("if(//h2) then //h2 else //h1");

        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

        System.out.println(nodes.getLength());

        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getTextContent());
        }

        dr.close();
    }

}

Answer

Recent versions of Saxon no longer advertise themselves as a JAXP XPath service, so you need to instantiate the XPath factory explicitly:

XPathFactory xf = new net.sf.saxon.xpath.XPathFactoryImpl();
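
As a fuller illustration of the explicit instantiation above, here is a minimal sketch. It loads the factory by name through reflection so that it still compiles and runs when Saxon is absent. That choice, the class name `SaxonFactoryLoader`, and the helper `tryLoad` are assumptions made for this example, not part of the answer. The factory class name is the one imported in the question's code.

```java
import java.util.Optional;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class SaxonFactoryLoader {

    // Class name as imported in the question's code (Saxon 9 HE).
    static final String SAXON_FACTORY = "net.sf.saxon.xpath.XPathFactoryImpl";

    /**
     * Hypothetical helper: instantiates the named class directly, bypassing
     * the JAXP lookup. Returns empty if the class is missing, cannot be
     * constructed, or is not an XPathFactory.
     */
    static Optional<XPathFactory> tryLoad(String className) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return instance instanceof XPathFactory
                    ? Optional.of((XPathFactory) instance)
                    : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws XPathExpressionException {
        Optional<XPathFactory> factory = tryLoad(SAXON_FACTORY);
        if (!factory.isPresent()) {
            System.out.println("Saxon was not found on the classpath.");
            return;
        }
        // Compiling the conditional from the question only succeeds with an
        // XPath 2.0 (or later) processor.
        XPath xpath = factory.get().newXPath();
        xpath.compile("if (//h2) then //h2 else //h1");
        System.out.println("Compiled with " + factory.get().getClass().getName());
    }
}
```

When Saxon is a compile-time dependency anyway, writing `new net.sf.saxon.xpath.XPathFactoryImpl()` directly, as in the answer, is simpler than the reflective lookup.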

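I'd like to add an explanation: the reason the Saxon JAR no longer advertises itself as an XPath processor is that too many applications written and tested against XPath 1.0 were picking it up by accident. Unfortunately, the JAXP interface provides no way to say "please find me an XPath 2.0 processor".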
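
The behaviour of the default factory can be checked with the JDK alone. The sketch below assumes a standard JDK with no other XPath provider registered. The class name `XPathVersionProbe` and the helper `compiles` are hypothetical. It asks the default factory to compile the conditional expression from the question and, for comparison, a plain XPath 1.0 union.

```java
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class XPathVersionProbe {

    /** Returns true if the factory's XPath processor accepts the expression. */
    static boolean compiles(XPathFactory factory, String expression) {
        try {
            factory.newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            // Syntax the processor does not understand is reported at compile time.
            return false;
        }
    }

    public static void main(String[] args) {
        // JAXP lookup for the default object model; there is no parameter
        // for requesting a particular XPath version.
        XPathFactory defaultFactory = XPathFactory.newInstance();
        System.out.println("Factory: " + defaultFactory.getClass().getName());

        System.out.println("XPath 1.0 union compiles: "
                + compiles(defaultFactory, "//h2 | //h1"));
        System.out.println("XPath 2.0 conditional compiles: "
                + compiles(defaultFactory, "if (//h2) then //h2 else //h1"));
    }
}
```

With the JDK's built-in XPath 1.0 processor, the union should be accepted and the conditional rejected, which matches the behaviour described in the question.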