Return a specified number of words before and after a given position in a text

I'm having big trouble with the following code. I expect it to return n words before and after a found keyword (the needle), but it never does.

If I have a text, say

"There is a lot of interesting stuff going on, when someone tries to find the needle in the haystack. Especially if there is anything to see blah blah blah".

and I have a regular expression like this:

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\b)needle(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

shouldn't this match the needle in the given string and return the text

someone tries to find the needle in the haystack. Especially if

It never does :-( When I run it, my method always returns an empty string, although I know for certain that the keyword is in the given text.

private String trimStringAtWordBoundary(String haystack, int wordsBefore, int wordsAfter, String needle) {
    if (haystack == null || haystack.trim().isEmpty()) {
        return haystack;
    }

    String textsegments = "";

    String patternString = "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0," + wordsBefore + "}\b)" + needle + "(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0," + wordsAfter + "})";

    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(haystack);

    logger.trace(">>> using regular expression: " + matcher.toString());

    while (matcher.find()) {
        logger.trace(">>> found you between " + matcher.regionStart() + " and " + matcher.regionEnd());
        String segText = matcher.group(0); // as well tried it with group(1)
        textsegments += segText + "...";
    }

    return textsegments;
}

Clearly the problem lies in my regular expression, but I can't figure out what is wrong with it.

2014-09-30 siliconchris

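It doesn't look like your expression accounts for whitespace characters. Usually you would use '\s' where you have '\b', and it would also appear in the character classes before and after it... something like '"((?:[\w'\.-]+\s){0," + wordsBefore + "})"' and similar for the part after... – abiessu 2014-09-30 20:28:44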

Your regex is basically fine, but in Java you need to escape the \b:

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\\b)needle(\\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"
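The reason the escaping matters: inside a Java string literal, "\b" is the backspace character (U+0008), so the unescaped pattern looks for literal backspaces that the text does not contain, while "\\b" hands the two characters \b to the regex engine as a word boundary. Below is a minimal sketch of the question's method with that one fix applied. The class name WordWindow and method name extract are made up for illustration, and wrapping the needle in Pattern.quote is an added assumption (that the needle should be matched literally), not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
class WordWindow {

    // Returns every match of the needle together with up to wordsBefore
    // words in front of it and up to wordsAfter words behind it.
    static String extract(String haystack, int wordsBefore, int wordsAfter, String needle) {
        if (haystack == null || haystack.trim().isEmpty()) {
            return haystack;
        }

        // "\\b" in Java source becomes \b (word boundary) in the regex.
        // Pattern.quote is an assumption: it treats the needle as literal text.
        String patternString =
                "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0," + wordsBefore + "}\\b)"
                + Pattern.quote(needle)
                + "(\\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0," + wordsAfter + "})";

        Matcher matcher = Pattern.compile(patternString).matcher(haystack);
        StringBuilder segments = new StringBuilder();
        while (matcher.find()) {
            // group(0) is the whole match: words before + needle + words after.
            segments.append(matcher.group(0)).append("...");
        }
        return segments.toString();
    }

    public static void main(String[] args) {
        String text = "There is a lot of interesting stuff going on, when someone tries "
                + "to find the needle in the haystack. Especially if there is anything "
                + "to see blah blah blah";
        // prints: someone tries to find the needle in the haystack. Especially if...
        System.out.println(extract(text, 5, 5, "needle"));
    }
}
```

A side note on the trace logging in the question: matcher.regionStart() and matcher.regionEnd() describe the region being searched (by default the whole input), whereas matcher.start() and matcher.end() give the position of the current match.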

2014-09-30 20:28:59 wvdz

Maybe I'm missing something, but does '\\b' actually account for whitespace? I think there also needs to be a '\\s' present... – abiessu 2014-09-30 20:34:54

\b is the word-boundary metacharacter, so it covers a bit more than just whitespace. – wvdz 2014-09-30 20:36:33

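OK, but won't there be two boundaries at every separation between words? And doesn't '\\b' fail to match all the possible whitespace between two words, since it is specified as a "zero-width match"? – abiessu 2014-09-30 20:41:07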
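The zero-width point raised in these comments can be checked directly. The sketch below (class name BoundaryDemo and helper found are hypothetical) suggests that \b on its own consumes no characters, so it cannot stand in for the space between two words, and that in the question's pattern the separator class [^a-zA-Z'-]+ is what actually consumes the whitespace and punctuation.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are for illustration only.
class BoundaryDemo {

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "find the needle in the haystack";

        // \b is zero-width: the space after "needle" is left unmatched.
        System.out.println(found("needle\\bin", input));               // false

        // The boundary holds and the space is matched explicitly.
        System.out.println(found("needle\\b in", input));              // true

        // As in the question's pattern: the negated class consumes the separator.
        System.out.println(found("needle\\b[^a-zA-Z'-]+in", input));   // true

        // Repeating \b changes nothing, because each one consumes no input.
        System.out.println(found("the \\b\\bneedle", input));          // true
    }
}
```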
