Android: Extracting the text between two HTML tags

I need to extract the text between two HTML tags and store it in a string. An example of the HTML I want to parse is as follows:

<div id=\"swiki.2.1\"> THE TEXT I NEED </div>

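I have done this in Java using the pattern (swiki\.2\.1\\\")(.*)(\/div) and taking the string I want from group $2. However, this does not work on Android. When I print the contents of $2, nothing appears, because the match fails.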
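For reference, the regex approach described above might look like the following minimal sketch in plain Java. The class name DivTextExtractor and the method extract are hypothetical and used only for illustration. The pattern assumes the target div contains no nested div, treats the backslash before each quote as optional, and uses a non-greedy group with DOTALL so the text may span several lines. It is not a diagnosis of why the match fails on the device.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DivTextExtractor {

    // Regex: <div\s+id=\\?"swiki\.2\.1\\?"\s*>(.*?)</div>
    // "\\?" in the regex makes the literal backslash before each quote optional.
    private static final Pattern DIV_TEXT = Pattern.compile(
            "<div\\s+id=\\\\?\"swiki\\.2\\.1\\\\?\"\\s*>(.*?)</div>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the trimmed text inside the div, or null if the div is not found.
    public static String extract(String html) {
        Matcher m = DIV_TEXT.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Calling DivTextExtractor.extract(html) on the example line above would return the text with the surrounding spaces removed. Logging the raw HTML that the device actually receives before matching is one way to check whether the input differs from what the desktop test program sees.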

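Has anyone had a similar problem using regular expressions on Android, or is there a better (non-regex) way to parse the HTML page? Again, this works fine in a standard Java test program. Any help would be greatly appreciated!

Source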

2012-01-03 B. Bowles

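http://jsoup.org/ should have a version for Android... and as for your error/failed match... maybe the device is loading the mobile version of the site... – Selvin 2012-01-03 09:55:31

That is a very good point. However, I just checked the HTML, and the content I am looking for is the same in the mobile version of the site. I will look at that link now and report back later. Thanks – 2012-01-03 10:20:29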

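For HTML parsing I always use HtmlCleaner: http://htmlcleaner.sourceforge.net/

Awesome lib with XPath, and of course it works great on Android. :-)

This shows how to download an XML from a URL and parse it to get a certain value from an XML attribute (also shown in the docs):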

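// Imports needed by the method below (place the method inside a class of your app).
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

import android.content.Context;

public static String snapFromHtmlWithCookies(Context context, String xPath, String attrToSnap,
        String urlString, String cookies) throws IOException, XPatherException {
    String snap = "";

    // Create an instance of HtmlCleaner.
    HtmlCleaner cleaner = new HtmlCleaner();

    // Start from the default cleaner properties and adjust them.
    CleanerProperties props = cleaner.getProperties();
    props.setAllowHtmlInsideAttributes(true);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);

    URL url = new URL(urlString);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    // setDoOutput(true) is left out: no request body is written, and on Android
    // setting it can turn the GET request into a POST.

    // Optional cookies; R.string.cookie_prefix is an app string resource
    // (presumably holding the header name).
    connection.setRequestProperty(context.getString(R.string.cookie_prefix), cookies);
    connection.connect();

    try {
        // Use the cleaner to "clean" the HTML and return it as a TagNode object.
        TagNode root = cleaner.clean(new InputStreamReader(connection.getInputStream()));

        // Evaluate the XPath expression and read the attribute of the first match.
        Object[] foundNodes = root.evaluateXPath(xPath);
        if (foundNodes.length > 0) {
            TagNode foundNode = (TagNode) foundNodes[0];
            snap = foundNode.getAttributeByName(attrToSnap);
        }
    } finally {
        // Release the connection even if parsing fails.
        connection.disconnect();
    }

    return snap;
}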

Just modify it to suit your needs. :-)

Source

2012-01-03 12:19:48 einschnaehkeee

If you want to get the text value of a tag, as in your example:

THE TEXT I NEED

you need to check for a ContentNode and get the text value via content.getContent().toString(); – einschnaehkeee 2012-01-03 12:31:16
