How to extract data from a string with a regular expression

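Why do I have to repeat the regular expression three times in this code to find the three separate numbers? I would like to find all the numbers in the string word using only ".*(\\d{10}+).*", but I have to repeat it three times. Why is that, and what am I doing wrong?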

public static void main(String[] args) {

    String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
    word = word.replaceAll("\\s+", "");

    Pattern pat = Pattern.compile(".*(\\d{10}+).*" + ".*(\\d{10}+).*" + ".*(\\d{10}+).*");
    Matcher mat = pat.matcher(word);

    while (mat.find()) {
        for (int i = 1; i <= mat.groupCount(); i++) {
            System.out.println(mat.group(i));
        }
    }

}

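Source

2017-03-06 HelloWorld

What exactly are you trying to match? –

The three 10-digit numbers in the string word – HelloWorld

@SrikanthA The code works, but I would like to know why I have to write the regex .*(\\d{10}+).* three times. When I iterate over the groups, shouldn't it just print them all? – HelloWorld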

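This is because .* is a greedy pattern (see Regex Quantifiers), which means it tries to consume as much of the string as possible while still producing a match. So in your case it consumes all the numbers except the last one.

To fix this, drop the match-everything pattern .*, because find() already steps through every match in the string for you.

So using just (\\d{10}) should work.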

public static void main(String[] args) {
    String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
    word = word.replaceAll("\\s+", "");

    Pattern pat = Pattern.compile("(\\d{10})");
    Matcher mat = pat.matcher(word);

    while (mat.find()) {
        for (int i = 1; i <= mat.groupCount(); i++) {
            System.out.println(mat.group(i));
        }
    }
}
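
To make the point about find() concrete, here is a minimal sketch that records where each match starts and ends. The class name FindPositions and the helper locate are made up for illustration; only Pattern, Matcher, find(), start(), end() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositions {
    // Hypothetical helper: one "start-end:match" entry per successful find().
    static List<String> locate(String input) {
        Matcher mat = Pattern.compile("\\d{10}").matcher(input);
        List<String> hits = new ArrayList<>();
        while (mat.find()) {
            // Each find() resumes searching where the previous match ended.
            hits.add(mat.start() + "-" + mat.end() + ":" + mat.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        // Prints three entries, one per 10-digit number, in order of appearance.
        locate(word.replaceAll("\\s+", "")).forEach(System.out::println);
    }
}
```

Because the pattern has no .* around it, each match covers only the ten digits, which leaves the rest of the string available for the next call to find().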

Source

2017-03-06 01:59:13

Thank you very much – HelloWorld

Greediness has nothing to do with the problem – Bohemian

Would you mind explaining why? –

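@Hesham Attia's answer is a simple solution to your problem; here is a little more explanation of how it works differently from your original approach.

Let's add the index i to the code that prints the matched groups: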

public static void main(String[] args) {
    String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
    word = word.replaceAll("\\s+", "");

    Pattern pat = Pattern.compile("(\\d{10})");
    Matcher mat = pat.matcher(word);

    while (mat.find()) {
        for (int i = 1; i <= mat.groupCount(); i++) {
            System.out.println("Group-" + i + ": " + mat.group(i));
        }
    }
}

The result you get is:

Group-1: 0546105610

Group-1: 4515189675

Group-1: 5467892541

And the result of your pattern is:

Group-1: 0546105610

Group-2: 4515189675

Group-3: 5467892541

In fact, the code above with the new pattern "(\\d{10})" is equivalent to the following:

public static void main(String[] args) {
    String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
    word = word.replaceAll("\\s+", "");

    Pattern pat = Pattern.compile("\\d{10}");
    Matcher mat = pat.matcher(word);

    while (mat.find()) {
        System.out.println(mat.group());
    }
}

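If you refer to the Javadoc of Matcher.find(), Matcher.group() and Matcher.groupCount(), you will see that Matcher.find() attempts to find the next subsequence that matches the pattern, Matcher.group() returns the subsequence matched by the previous match, and Matcher.groupCount() does not count the entire match (which is group 0), only the capturing groups specified in your pattern.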
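
As a small illustration of those three methods, the following sketch compares groupCount() for the patterns discussed here and checks that group() is the same as group(0). The class name GroupCountDemo, the helper groups and the sample input "x0546105610" are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: number of capturing groups declared in a regex.
    // groupCount() depends only on the pattern, so an empty input is enough.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groups("\\d{10}"));    // no parentheses: only the implicit group 0
        System.out.println(groups("(\\d{10})"));  // one capturing group
        System.out.println(groups(".*(\\d{10}+).*" + ".*(\\d{10}+).*" + ".*(\\d{10}+).*"));

        Matcher mat = Pattern.compile("(\\d{10})").matcher("x0546105610");
        if (mat.find()) {
            // group() is shorthand for group(0), the whole match.
            System.out.println(mat.group().equals(mat.group(0)));
            System.out.println(mat.group(1));
        }
    }
}
```

This is why the loop in the earlier code starts at i = 1: group 0 always exists and is never counted by groupCount().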

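Simply put, the regex engine walks through the subject sequence with the pattern and tries to match as much as possible (greedy mode). Now let's look at the differences between these patterns: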

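Your original pattern ".*(\\d{10}+).*"+".*(\\d{10}+).*"+".*(\\d{10}+).*", and why you need to repeat it three times:

If only ".*(\\d{10}+).*" is given, the pattern matches the whole string in a single attempt. Because the leading .* is greedy, it first consumes everything and then backtracks only as far as it must, so the matched parts are:
- "Somerandommobilenumbers0546105610,4515189675," matches the leading .*
- "5467892541" matches \\d{10}+ and goes into group 1
- the empty string matches the trailing .*
The whole string has been consumed by the first attempt and nothing is left for the pattern to match again, so there is no way to extract the other two numbers. That is why you need to repeat the pattern and put the numbers into separate groups.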
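
A quick way to check this behaviour is to run the single-group pattern and see how often find() succeeds. This is only a sketch; the class name GreedySingleMatch and the helper describe are invented for the example. Note that with a greedy leading .*, the number that ends up in group 1 is the last one in the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedySingleMatch {
    // Hypothetical helper: summarizes what the single-group pattern does on the input.
    static String describe(String input) {
        Matcher mat = Pattern.compile(".*(\\d{10}+).*").matcher(input);
        boolean first = mat.find();
        // Read group 1 before calling find() again; a failed find() discards the match state.
        String captured = first ? mat.group(1) : null;
        boolean wholeString = first && mat.start() == 0 && mat.end() == input.length();
        boolean second = mat.find(); // nothing is left to match
        return captured + " whole=" + wholeString + " again=" + second;
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        // Prints: 5467892541 whole=true again=false
        System.out.println(describe(word.replaceAll("\\s+", "")));
    }
}
```

The first find() spans the entire input, so the loop body runs exactly once and only one number can be read from it.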
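Pattern "(\\d{10})":

It matches one number each time mat.find() is called, puts it into group 1 and returns. You can then read the result from group 1, which is why the group index is always 1.

Pattern "\\d{10}":

Same as the pattern "(\\d{10})", except that it does not put the match into group 1, so you get the result directly from mat.group(), which is actually group 0.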

Source

2017-03-06 03:27:03 shizhz

Your real problem is that you are using Pattern, which is error-prone because it requires a lot of code; here is how you can do it in a single line:

String[] numbers = word.replaceAll("[^\\d,]", "").split(",");
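
For completeness, here is that one-liner wrapped in a runnable sketch. The class name SplitDemo and the helper numbers are made up for the example. It assumes the numbers are separated by commas and that the text contains no other digits or commas; unlike the \\d{10} pattern, it does not check that each piece has ten digits.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper wrapping the one-liner from the answer.
    static String[] numbers(String word) {
        // Remove everything except digits and commas, then split on the commas.
        return word.replaceAll("[^\\d,]", "").split(",");
    }

    public static void main(String[] args) {
        String word = " Some random mobile numbers 0546 105 610, 451 518 9675, 54 67892 541";
        // Prints: [0546105610, 4515189675, 5467892541]
        System.out.println(Arrays.toString(numbers(word)));
    }
}
```

The whitespace does not need to be stripped first, since the character class [^\\d,] already removes it.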

Source

2017-03-06 09:12:34 Bohemian
