How do I get the HTML DOM path from text content?

Given an HTML file:

<html> 
    <body> 
     <div class="main"> 
      <p id="tID">content</p> 
     </div> 
    </body> 
</html>

I have the string "content".

Using "content", I want to get the HTML DOM path:

html body div.main p#tID

Chrome Developer Tools has this feature (Elements tab, bottom bar). How can I do this in Java?

Thanks for your help :)

Source

2010-09-04 Koerr

Do you mean Java or JavaScript? – aularon 2010-09-04 01:14:33

Java, not JavaScript – Koerr 2010-09-04 01:36:18

Have fun :)

Java code

import java.io.File; 

import javax.xml.xpath.XPath; 
import javax.xml.xpath.XPathConstants; 
import javax.xml.xpath.XPathFactory; 

import org.htmlcleaner.CleanerProperties; 
import org.htmlcleaner.DomSerializer; 
import org.htmlcleaner.HtmlCleaner; 
import org.htmlcleaner.TagNode; 
import org.w3c.dom.Document; 
import org.w3c.dom.NamedNodeMap; 
import org.w3c.dom.Node; 



public class Teste { 

    public static void main(String[] args) { 
     try { 
      // read and clean document 
      TagNode tagNode = new HtmlCleaner().clean(new File("test.xml")); 
      Document document = new DomSerializer(new CleanerProperties()).createDOM(tagNode); 

      // use XPath to find target node 
      XPath xpath = XPathFactory.newInstance().newXPath(); 
      Node node = (Node) xpath.evaluate("//*[text()='content']", document, XPathConstants.NODE); 

      // assembles jquery/css selector 
      String result = ""; 
      while (node != null && node.getParentNode() != null) { 
       result = readPath(node) + " " + result; 
       node = node.getParentNode(); 
      } 
      // trim() drops the trailing space left by the loop above 
      System.out.println(result.trim()); 
      // prints: html body div#myDiv.foo.bar p#tID 

     } catch (Exception e) { 
      e.printStackTrace(); 
     } 
    } 

    // Gets id and class attributes of this node 
    private static String readPath(Node node) { 
     NamedNodeMap attributes = node.getAttributes(); 
     String id = readAttribute(attributes.getNamedItem("id"), "#"); 
     String clazz = readAttribute(attributes.getNamedItem("class"), "."); 
     return node.getNodeName() + id + clazz; 
    } 

    // Read attribute 
    private static String readAttribute(Node node, String token) { 
     String result = ""; 
     if(node != null) { 
      result = token + node.getTextContent().replace(" ", token); 
     } 
     return result; 
    } 

}

XML example

<html> 
    <body> 
     <br> 
     <div id="myDiv" class="foo bar"> 
      <p id="tID">content</p> 
     </div> 
    </body> 
</html>


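Explanation

The document object holds the parsed XML.
The XPath //*[text()='content'] matches every element that has a text node equal to 'content'; the first match becomes the target node.
The while loop walks from that node up to the root element, reading the id and class of each element along the way.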
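
The three steps above can also be tried without HtmlCleaner, as long as the input is well-formed. The following is only a minimal sketch under that assumption, using the JDK's built-in parser; the class name DomPathSketch and the method pathToText are hypothetical names chosen for illustration, not part of the answer's code.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class DomPathSketch {

    // Returns a CSS-like path (tag#id.class ...) to the first element that has
    // a text node equal to `text`, or an empty string when nothing matches.
    // Assumptions: `xhtml` is well-formed and `text` contains no single quote.
    static String pathToText(String xhtml, String text) {
        try {
            // Step 1: parse the markup into a DOM document.
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Step 2: find the target node with XPath.
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("//*[text()='" + text + "']", document, XPathConstants.NODE);

            // Step 3: walk up the ancestors, prepending one segment per element.
            StringBuilder path = new StringBuilder();
            while (node instanceof Element) {
                Element element = (Element) node;
                String segment = element.getNodeName();
                String id = element.getAttribute("id"); // "" when absent
                if (!id.isEmpty()) {
                    segment += "#" + id;
                }
                String classes = element.getAttribute("class").trim();
                if (!classes.isEmpty()) {
                    segment += "." + classes.replaceAll("\\s+", ".");
                }
                path.insert(0, path.length() == 0 ? segment : segment + " ");
                node = node.getParentNode();
            }
            return path.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div id=\"myDiv\" class=\"foo bar\">"
                + "<p id=\"tID\">content</p></div></body></html>";
        System.out.println(pathToText(xhtml, "content"));
        // prints: html body div#myDiv.foo.bar p#tID
    }
}
```

Stopping the loop at the first non-Element parent means the Document node is never included, and using getAttribute avoids a null check because it returns an empty string for a missing attribute.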

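More explanation

This new solution uses HtmlCleaner, so the input may contain, for example, an unclosed <br>, and the cleaner will replace it with <br/>.
To use HtmlCleaner, just download the latest jar here.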
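
To see why a cleaning step is needed, the sketch below feeds markup to the JDK's strict XML parser and reports whether it is rejected. It is an illustration only: the class name StrictParseCheck and the method parseError are hypothetical, and the exact wording of the error message depends on the parser implementation and locale.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class StrictParseCheck {

    // Returns null when `markup` is well-formed XML, otherwise the parser's error message.
    static String parseError(String markup) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXException e) {
            // Thrown for malformed input such as an unclosed <br>.
            // The default error handler may also log the problem to stderr.
            return String.valueOf(e.getMessage());
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be set up or read the input", e);
        }
    }

    public static void main(String[] args) {
        // Unclosed <br>: valid HTML, but not well-formed XML.
        System.out.println(parseError("<html><body><br><p>content</p></body></html>"));
        // Self-closed <br/>: accepted, so this prints null.
        System.out.println(parseError("<html><body><br/><p>content</p></body></html>"));
    }
}
```

Running the input through HtmlCleaner first, as in the code above, sidesteps this failure because the DOM is built from the cleaned tag tree rather than from the raw markup.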

Source

2010-09-04 04:21:24 Topera

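But it is not an XML document. If there is a <br> or another tag without a closing tag, it cannot be parsed: 'org.xml.sax.SAXParseException: The element type "br" must be terminated by the matching end-tag "</br>".' – Koerr 2010-09-04 12:08:17

For getting the node and then walking up its parents, this solution is good. Thanks :) – Koerr 2010-09-04 12:13:28

I edited my answer to work with malformed XML. Take a look. – Topera 2010-09-04 14:12:30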
