Using a regular expression

I am using Amazon Web Services for a MapReduce project, and when splitting a string I get this error:

FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.StackOverflowError at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)

I read some other questions to understand why this happens, and it seems my regular expression has repeated alternative paths. This is the regex:

\\s+(?=(?:(?<=[a-zA-Z])\"(?=[A-Za-z])|\"[^\"]*\"|[^\"])*$)

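What it is meant to do is split on whitespace, except when the whitespace is inside these symbols < > or these " ". So basically it keeps the strings inside those two kinds of delimiters intact. I have tried many other versions, but none of them worked, so I am far from the best version. I am a bit lost; this is my first time using regular expressions this complex. Can someone suggest a better regex?

I really appreciate any feedback!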
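
For what it's worth, here is a minimal sketch of why a pattern like the one above can exhaust the stack. As far as I understand, java.util.regex matches a repeated group that contains alternatives, such as (?:"[^"]*"|[^"])*, by recursing once per repetition, so the call depth grows with the length of the input. The class name RegexDepthDemo, the method overflows and the one-million-character input are made up for illustration. Whether the long input really overflows depends on the JVM and its stack size (-Xss), so no output is shown.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class RegexDepthDemo {
    // The repeated group from the lookahead in the question, without the
    // look-arounds. Each alternative consumes at least one character.
    static final Pattern REPEATED_GROUP = Pattern.compile("(?:\"[^\"]*\"|[^\"])*");

    /** Returns true if matching the whole input overflows the current thread's stack. */
    static boolean overflows(String input) {
        try {
            REPEATED_GROUP.matcher(input).matches();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String shortInput = "<a/> \"HEY\" <b/>";

        // Hypothetical long record: one million 'x' characters.
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        String longInput = new String(chars);

        System.out.println("short input overflows: " + overflows(shortInput));
        System.out.println("long input overflows:  " + overflows(longInput));
    }
}
```

If the long input does overflow on your setup, that would point at the length of the records rather than at anything specific to MapReduce.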

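Edit:
This string, with <URLs>, text inside " ", and spaces:
<\janhaeussler.com/?sioc_type=user&sioc_id=1/> "HEY" <.org/1999/02/22-rdf-syntax-ns#type/>

should produce these 3 strings:
1. <\janhaeussler.com/?sioc_type=user&sioc_id=1/> (with or without the <>)
2. "HEY"
3. <.org/1999/02/22-rdf-syntax-ns#type/>

Edit 2:
I think the <> symbols are confusing matters. I am trying to find a regex that splits on one or more spaces without considering the spaces inside " ", since the URLs contain no spaces.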


2017-07-02 Angie94

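You need to provide some formatted input and the expected output separately. That makes it easier to understand the problem and to offer alternative solutions. –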

Try this:

\s+(?=(?:(?:[^"]*"){2})*[^"]*$)

Demo

String string = "abc d<\\janhaeussler.com/?sioc_type=user &sioc_id=1/> \"HEY 1\" 2 3 <.org/1999/02/22-rdf-syntax-ns#type/> \"tra la\" <asdfadsf sadfasdf/> 4 \"sdf sdf\" 5 6";
String[] res = string.split("\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");
System.out.println(Arrays.toString(res));

This will output:

[abc, d<\janhaeussler.com/?sioc_type=user, &sioc_id=1/>, "HEY 1", 2, 3, <.org/1999/02/22-rdf-syntax-ns#type/>, "tra la", <asdfadsf, sadfasdf/>, 4, "sdf sdf", 5, 6]
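
To spell out what the lookahead is doing: a run of whitespace is accepted as a split point only if the rest of the string contains an even number of double quotes, which means the run is not inside a "..." section. Below is a minimal sketch on the sample from the question, with a space added inside the quotes. The class name QuoteAwareSplit and its split helper are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Whitespace followed by an even number of double quotes up to the end
    // of the input, i.e. whitespace that is not inside a "..." section.
    static final Pattern OUTSIDE_QUOTES =
            Pattern.compile("\\s+(?=(?:(?:[^\"]*\"){2})*[^\"]*$)");

    static String[] split(String line) {
        return OUTSIDE_QUOTES.split(line);
    }

    public static void main(String[] args) {
        String line = "<\\janhaeussler.com/?sioc_type=user&sioc_id=1/>   \"HEY YOU\" "
                + "<.org/1999/02/22-rdf-syntax-ns#type/>";
        System.out.println(Arrays.toString(split(line)));
    }
}
```

This prints [<\janhaeussler.com/?sioc_type=user&sioc_id=1/>, "HEY YOU", <.org/1999/02/22-rdf-syntax-ns#type/>]. The space inside "HEY YOU" is followed by a single quote, so the lookahead fails there and no split happens. Keep in mind that the lookahead rescans the rest of the line for every whitespace run, so very long lines get expensive.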


2017-07-02 09:02:17

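This does not work on my example, although this approach is what I am looking for. In my output I still have whitespace, and the "strings" are not split. Put more simply, I need a regex that splits on spaces except for the spaces inside " ". – Angie94

The updated regex does not ignore the spaces inside " ". The string is being split word by word. @RizwanM.Tuman – Angie94

@Angie94 I wish your requirements had been clearer. Try it now and let me know whether it is what you want. –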

Don't use split(). Use a find() loop instead, with this regex:

(?:<[^<]*> 
    | 
    "[^"]*" 
    | 
    \S 
    )+

Example:

String input = "<\\janhaeussler.com/?sioc_type=user&sioc_id=1/> \"HEY\" <.org/1999/02/22-rdf-syntax-ns#type/>"; 

Pattern p = Pattern.compile("(?:<[^<]*>|\"[^\"]*\"|\\S)+"); 
for (Matcher m = p.matcher(input); m.find();) { 
    System.out.println(m.group()); 
}

Output

<\janhaeussler.com/?sioc_type=user&sioc_id=1/> 
"HEY" 
<.org/1999/02/22-rdf-syntax-ns#type/>
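
One note on the multi-line layout of the regex above: it only compiles to the same pattern if the layout whitespace is ignored, which in Java means passing Pattern.COMMENTS (or prefixing the pattern with (?x)). A minimal sketch, assuming you want to keep the readable layout in the source. The class name TokenFinder and the tokens helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Same pattern as above, kept in its multi-line layout. With
    // Pattern.COMMENTS the engine ignores the layout whitespace and
    // everything from '#' to the end of the line.
    static final Pattern TOKEN = Pattern.compile(
            "(?:<[^<]*>     # a <...> section\n"
          + "  |\n"
          + "  \"[^\"]*\"   # a quoted section\n"
          + "  |\n"
          + "  \\S          # any other non-whitespace character\n"
          + ")+",
            Pattern.COMMENTS);

    /** Collects every match of TOKEN in the line, in order. */
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        for (Matcher m = TOKEN.matcher(line); m.find();) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<\\janhaeussler.com/?sioc_type=user&sioc_id=1/> \"HEY YOU\" "
                + "<.org/1999/02/22-rdf-syntax-ns#type/>";
        for (String token : tokens(input)) {
            System.out.println(token);
        }
    }
}
```

Because the + sits outside the group, adjacent pieces are glued together: key="a b" comes back as a single token, since the quoted part directly follows the non-whitespace characters.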


2017-07-02 09:07:39 Andreas

This works well, but I want to use split() for certain reasons. – Angie94

You could try matching: a tag, OR whatever is between double quotes, OR the remaining non-whitespace.

<[^>]+>|"[^"]+"|\S+

For example:

String str = "<\\janhaeussler.com/?sioc_type=user&sioc_id=1/> \"HEY\" YOU! \"How Are You?\" <.org/1999/02/22-rdf-syntax-ns#type/>"; 

final java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("<[^>]+>|\"[^\"]+\"|\\S+"); 
java.util.regex.Matcher matcher = pattern.matcher(str); 

while (matcher.find()) { 
    System.out.println("match: " + matcher.group(0)); 
}

Prints:

match: <\janhaeussler.com/?sioc_type=user&sioc_id=1/> 
match: "HEY" 
match: YOU! 
match: "How Are You?" 
match: <.org/1999/02/22-rdf-syntax-ns#type/>
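
If the calling code was written around split(), the find() loop can be wrapped so it hands back a String[] in the same shape. A minimal sketch using the pattern from this answer. The class name SplitLike and the method splitTokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitLike {
    // A <...> tag, a "..." string, or a run of other non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|\"[^\"]+\"|\\S+");

    /** Returns the tokens of the line as an array, the shape String.split() would give. */
    static String[] splitTokens(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "<\\janhaeussler.com/?sioc_type=user&sioc_id=1/> \"How Are You?\" YOU!";
        for (String token : splitTokens(line)) {
            System.out.println(token);
        }
    }
}
```

One difference from split("\\s+"): leading whitespace does not produce an empty first element here, because only the tokens themselves are matched.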


2017-07-02 09:08:36 LukStorms
