2013-02-08 56 views
0

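Parsing numbers from an HTML string in Java

While parsing HTML, every time I hit a '>' character I need to check whether a number follows it. The number can be 1, 2, or 3 digits long.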

The code looks fine to me, but I always get a StringIndexOutOfBoundsException.

Code:

while (matches < 19) 
    { 
     more = dataInHtml.indexOf(">",index); 
     nextOne = dataInHtml.charAt(more + 1); 
     nextTwo = dataInHtml.charAt(more + 2); 
     nextThree = dataInHtml.charAt(more + 3); 

     if (Character.isDigit(nextOne)) digitOne = true; 
     if (Character.isDigit(nextTwo)) digitTwo = true;  
     if (Character.isDigit(nextThree)) digitThree = true; 

     if (digitThree) 
     { 
      data[matches] = dataInHtml.substring(more + 1, 3); 
      matches++; 
      digitThree = false; 
      digitTwo = false; 
      digitOne = false; 
      index = more + 3; 
      itWasADigit = true; 
     } 

     if (digitTwo) 
     { 
      data[matches] = dataInHtml.substring(more + 1, 2); 
      matches++; 
      digitTwo = false; 
      digitOne = false; 
      index = more + 2; 
      itWasADigit = true; 
     }   

     if (digitOne) 
     { 
      data[matches] = dataInHtml.substring(more + 1, 1); 
      matches++; 
      digitOne = false; 
      index = more + 1; 
      itWasADigit = true; 
     }   

     if (!(itWasADigit))  
     { 
      index = more + 1; 
      itWasADigit = false; 
     } 
    } 
+0

Convert the character to its ASCII value and compare the values – orangegoat 2013-02-08 15:45:12

+0

Which line is throwing the StringIndexOutOfBoundsException? – 2013-02-08 15:46:26

+0

data[matches] = dataInHtml.substring(more + 1, 2); – Alpan67 2013-02-08 15:48:31

Answers

2

If you pass in the string "string>12", this is what the code does:

more = dataInHtml.indexOf(">", index);
nextOne = dataInHtml.charAt(more + 1);   // gets the '1'
nextTwo = dataInHtml.charAt(more + 2);   // gets the '2'
nextThree = dataInHtml.charAt(more + 3); // throws: more + 3 is past the last valid index of the string

That is why you see the StringIndexOutOfBoundsException.

Use something like this

if (dataInHtml.length() > more + 3)

to check that the string is long enough before you try to access a character.
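As a rough sketch of how that length check could be folded into the whole scan (not the asker's exact code): the class name `DigitScanner`, the method `numbersAfterGt`, and the use of a `List` instead of the fixed-size `data` array are all assumptions made for illustration. It also passes an end index to `substring`, because `String.substring(beginIndex, endIndex)` takes two positions rather than a start and a length, which is probably why the `substring(more + 1, 2)` line mentioned in the comments throws as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question's code.
final class DigitScanner {

    /** Collects every run of 1 to 3 digits that directly follows a '>' character. */
    static List<String> numbersAfterGt(String html) {
        List<String> found = new ArrayList<>();
        int index = 0;
        while (true) {
            int gt = html.indexOf('>', index);
            if (gt < 0) {
                break; // no more '>' characters in the string
            }
            int start = gt + 1;
            int end = start;
            // Check the length BEFORE each charAt call, and stop after 3 digits.
            while (end < html.length()
                    && end - start < 3
                    && Character.isDigit(html.charAt(end))) {
                end++;
            }
            if (end > start) {
                // substring takes (beginIndex, endIndex), not (beginIndex, length).
                found.add(html.substring(start, end));
            }
            // Resume just past the '>' or past the digits that were consumed.
            index = end;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterGt("<td>7</td><td>12</td><td>345</td>"));
    }
}
```

Because the bounds test sits in the same condition as `charAt`, a '>' at the very end of the string simply yields no match instead of an exception.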

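If you are trying to read numbers out of an HTML document, this is probably not the ideal approach. If possible, you should consider parsing the document with a proper parser.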

http://jsoup.org/ looks promising.
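If pulling in a parser is not an option, a regular expression from the standard library is one lighter alternative to the manual index arithmetic. This is only a sketch and not jsoup: the class name `RegexDigitScanner` and the method `numbersAfterGt` are made up for illustration, and it assumes the digits always directly follow the '>' with no whitespace in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown as an alternative to manual charAt checks.
final class RegexDigitScanner {

    // A '>' followed by 1 to 3 digits; the digits are captured in group 1.
    private static final Pattern GT_NUMBER = Pattern.compile(">(\\d{1,3})");

    /** Returns the 1 to 3 digit numbers that directly follow a '>' character. */
    static List<String> numbersAfterGt(String html) {
        List<String> found = new ArrayList<>();
        Matcher matcher = GT_NUMBER.matcher(html);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbersAfterGt("<td>7</td><td>12</td><td>345</td>"));
    }
}
```

The matcher never reads past the end of the input, so there is no index bookkeeping to get wrong. For anything beyond simple, predictable markup, a real HTML parser is still the safer choice.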

+0

> 12 I have an HTML file like this – Alpan67 2013-02-08 15:50:52

+1

It will break on the last '>'. After finding it, the code tries to access a string index that is too large – cowls 2013-02-08 15:52:05

+0

How do I fix it? – Alpan67 2013-02-08 15:53:52