2011-02-14 110 views
0

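Remove all HTML tags except <br> from text?

Hi all, I have a Java string and I want to:

1- Remove all HTML tags from it except the line-break tag <br></br>, keeping the text inside the tags, if there is any.
2- Separate the parsed text: the results currently run together, like text1andtext2, with no space between them, and I want to fix that too.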

Here is what I am doing:

String html = "<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM, My Friend <span dir=\"ltr\">&lt;<a href=\"mailto:[email protected]\">[email protected]</a>&gt;</span> wrote:<br> " 
      + "<blockquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> "; 
    String parsedText = html.replaceAll("\\<.*?\\>", ""); 
    System.out.println(parsedText); 

Current output:

hello my friendECHOhow are you ?On Mon, Feb 14, 2011 at 10:45 AM, My Friend &lt;[email protected]&gt; wrote: 

Desired output:

hello my friend ECHO <br> how are you ? <br> <br> On Mon, Feb 14, 2011 at 10:45 AM, My Friend &lt;[email protected]&gt; wrote:
+0

Possible duplicate: http://stackoverflow.com/questions/240546/removing-html-from-a-java-string – Simon 2011-02-14 09:07:17

+0

No, I don't want to remove all the HTML tags, since that is what my code already does. I want to remove every HTML tag except the line-break tag. – 2011-02-14 09:13:09

Answers

4

You can do it like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String html = 
    "<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?" + 
    "<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM," + 
    " My Friend <span dir=\"ltr\">&lt;<a href=\"mailto:[email protected]" + 
    "main.com\">[email protected]</a>&gt;</span> wrote:<br><bloc" + 
    "kquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; bord" + 
    "er-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> "; 
// Group 1 captures the tag name; the optional "/" lets closing tags match too.
final Pattern tagPattern = Pattern.compile("</?([^\\s>/]+).*?>"); 
final Matcher matcher = tagPattern.matcher(html); 
final StringBuffer sb = new StringBuffer(html.length()); 
while (matcher.find()) { 
    // Keep <br> tags verbatim (quoted, so "$" and "\" are taken literally);
    // replace every other tag with a single space.
    matcher.appendReplacement(sb, matcher.group(1).equalsIgnoreCase("br") 
        ? Matcher.quoteReplacement(matcher.group()) 
        : " "); 
} 
matcher.appendTail(sb); 

final String parsedText = sb.toString(); 
System.out.println(parsedText); 

Output (each removed tag leaves one space behind, so some gaps are doubled):

 hello my friend ECHO <br>how are you ?<br><br> On Mon, Feb 14, 2011 at 10:45 AM, My Friend  &lt; [email protected] &gt;  wrote:<br>
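The desired output in the question has single spaces between text fragments and around each <br>. The following is a minimal sketch of one way to get there, not something from the original answer. The class name `StripExceptBr` and the method `stripExceptBr` are hypothetical. It assumes no attribute value contains a literal `>`, and it rewrites every <br> variant as a plain `<br>`.

```java
import java.util.regex.Pattern;

public class StripExceptBr {
    // Any tag whose name is not exactly "br" (opening, closing, or self-closing).
    private static final Pattern NON_BR_TAG =
            Pattern.compile("<(?!/?br\\b)[^>]*>", Pattern.CASE_INSENSITIVE);
    // <br>, <br/>, <br /> or </br>, together with any whitespace around it.
    private static final Pattern BR_TAG =
            Pattern.compile("\\s*</?br\\b[^>]*>\\s*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: strips all tags except <br> and normalizes spacing.
    static String stripExceptBr(String html) {
        // Replace every non-<br> tag with a space so adjacent text stays separated.
        String text = NON_BR_TAG.matcher(html).replaceAll(" ");
        // Normalize each <br> variant to "<br>" padded with one space on each side.
        text = BR_TAG.matcher(text).replaceAll(" <br> ");
        // Collapse runs of whitespace and drop leading/trailing spaces.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<div>hello my friend<span>ECHO</span><br>how are you ?"
                + "<br><br><p>bye</p></div>";
        // prints: hello my friend ECHO <br> how are you ? <br> <br> bye
        System.out.println(stripExceptBr(html));
    }
}
```

The negative lookahead `(?!/?br\b)` is what spares the line breaks: `\b` makes sure a tag such as <brx> is still removed, while <br>, <BR/> and </br> are left for the second pass.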

But I hope you know that Cthulhu is calling if you do. Don't parse HTML/XML with regular expressions!

2

I would:

  • Replace every <br /> with a newline or some other special character.
  • Remove all tags.
  • Replace the special character with <br />.
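The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the class `BrPlaceholder`, the method `keepOnlyBr`, and the choice of the private-use character U+E000 as the marker are all hypothetical, and the marker is assumed never to occur in the input.

```java
public class BrPlaceholder {
    // Hypothetical marker; assumed not to appear anywhere in the input text.
    private static final String MARKER = "\uE000";

    static String keepOnlyBr(String html) {
        // Step 1: swap every <br> variant (<br>, <BR/>, <br />, </br>) for the marker.
        String text = html.replaceAll("(?i)</?br\\b[^>]*>", MARKER);
        // Step 2: remove all remaining tags.
        text = text.replaceAll("<[^>]*>", "");
        // Step 3: turn each marker back into <br />.
        return text.replace(MARKER, "<br />");
    }

    public static void main(String[] args) {
        // prints: one<br />two<br />three
        System.out.println(keepOnlyBr("<p>one<br>two<BR/>three</p>"));
    }
}
```

Replacing the removed tags in step 2 with " " instead of "" would keep neighbouring text fragments apart, which is the second thing the question asks for.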