2013-05-05 106 views
7

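Getting the words around a position in a string

I want to get the words surrounding a position in a string, for example the two words before it and the two words after it.

For example, consider the string: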

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother."; 
String find = "I"; 

for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1)) 
{ 
    System.out.println(index); 
} 

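This prints the indices at which "I" occurs. But I would like to get the substring made up of the words around those positions.

I want to be able to print "John and I like to" and "and hiking I have two".

It should not be limited to single words either. Searching for "John and" should return "name is John and I like".

Is there a neat, clever way of doing this?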

+0

How do you determine the surrounding words? – 2013-05-05 18:59:44

+0

Yes, that is the question: how do I get startPos so that there are exactly 2 words before and 2 words after the substring? – user1506145 2013-05-05 19:10:20

Answers

10

Single word:

This can be achieved with String's split() method. The solution is O(n).

public static void main(String[] args) { 
    String str = "Hello my name is John and I like to go fishing and "+ 
         "hiking I have two sisters and one brother."; 
    String find = "I"; 

    String[] sp = str.split(" +"); // "+" for multiple spaces 
    for (int i = 0; i < sp.length; i++) { // start at 0 so matches in the first two words are not skipped
     if (sp[i].equals(find)) { 
      // bounds checks avoid an ArrayIndexOutOfBoundsException near either end
      String surr = (i-2 >= 0 ? sp[i-2]+" " : "") + 
          (i-1 >= 0 ? sp[i-1]+" " : "") + 
          sp[i] + 
          (i+1 < sp.length ? " "+sp[i+1] : "") + 
          (i+2 < sp.length ? " "+sp[i+2] : ""); 
      System.out.println(surr); 
     } 
    } 
} 

Output:

John and I like to 
and hiking I have two 

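Multiple words:

A regex is a great, clean solution when find consists of several words. By its nature, though, it misses matches whose surrounding words also match find (see the example below).

The algorithm below handles all cases (the whole solution space). Keep in mind that, because of the nature of the problem, this solution is O(n*m) in the worst case (with n the length of str and m the length of find).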

public static void main(String[] args) { 
    String str = "Hello my name is John and John and I like to go..."; 
    String find = "John and"; 

    String[] sp = str.split(" +"); // "+" for multiple spaces 

    String[] spMulti = find.split(" +"); // "+" for multiple spaces 
    for (int i = 0; i < sp.length; i++) { // start at 0 so matches in the first two words are not skipped
     int j = 0; 
     while (j < spMulti.length && i+j < sp.length 
            && sp[i+j].equals(spMulti[j])) { 
      j++; 
     }   
     if (j == spMulti.length) { // found spMulti entirely 
      StringBuilder surr = new StringBuilder(); 
      if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); } 
      if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); } 
      for (int k = 0; k < spMulti.length; k++) { 
       if (k > 0){ surr.append(" "); } 
       surr.append(sp[i+k]); 
      } 
      if (i+spMulti.length < sp.length) { 
       surr.append(" "); 
       surr.append(sp[i+spMulti.length]); 
      } 
      if (i+spMulti.length+1 < sp.length) { 
       surr.append(" "); 
       surr.append(sp[i+spMulti.length+1]); 
      } 
      System.out.println(surr.toString()); 
     } 
    } 
} 

Output:

name is John and John and 
John and John and I like 
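
To illustrate the limitation mentioned above, here is a minimal sketch (the class and the helper regexContexts are hypothetical names, not part of the answer) that runs a plain regex of the form "two words, find, two words" over the same input. Because Matcher.find() resumes after the end of the previous match, the second occurrence of "John and" is consumed as trailing context and is not reported on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexMissesOverlap {
    // Hypothetical helper: collects every non-overlapping regex match of
    // "two words, find, two words".
    static List<String> regexContexts(String str, String find) {
        Pattern p = Pattern.compile(
            "\\S+\\s+\\S+\\s+" + Pattern.quote(find) + "\\s+\\S+\\s+\\S+");
        Matcher m = p.matcher(str);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        // Only one context is printed, although "John and" occurs twice.
        for (String s : regexContexts(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

The word-by-word scan in the answer reports both occurrences because it tests every word index independently.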
+3

+1 ..nice answer :) – Maroun 2013-05-05 19:11:11

+1

Thanks, but then I can't search for multi-word strings if indexOf isn't used... – user1506145 2013-05-05 19:15:57

+0

+1 for the surrounding logic – exexzian 2013-05-05 19:16:43

1

Use String.split() to split the text into words. Then search for "I" and concatenate the words around it:

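String[] parts = str.split(" "); 

for (int i = 0; i < parts.length; i++) { 
    // only handle matches that have two words on each side 
    if (parts[i].equals("I") && i >= 2 && i + 2 < parts.length) { 
        String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] 
                + " " + parts[i+1] + " " + parts[i+2]; 
        System.out.println(out); 
    } 
} 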

Of course you need to check whether i-2 is a valid index, and using a StringBuffer will help performance-wise if you have a lot of data...

1
import java.util.Arrays; 
import java.util.List; 

// Split the sentence into a (fixed-size) list of words 
String[] stringArray = sentence.split(" "); 
List<String> stringList = Arrays.asList(stringArray); 

// Which word should be matched? 
String toMatch = "I"; 

// How many words before and after do you want? 
int before = 2; 
int after = 2; 

for (int i = 0; i < stringList.size(); ++i) { 
    if (toMatch.equals(stringList.get(i))) { 
     int index = i; 
     // only proceed when the full window lies inside the list 
     if (0 <= index - before && index + after < stringList.size()) { 
      StringBuilder sb = new StringBuilder(); 

      // j, not i: the outer loop variable cannot be redeclared 
      for (int j = index - before; j <= index + after; ++j) { 
       sb.append(stringList.get(j)); 
       sb.append(" "); 
      } 
      String result = sb.toString().trim(); 
      //Do something with result 
     } 
    } 
} 

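This extracts exactly two words before and after the match. It could be extended to print at most two words before and after instead of exactly two.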
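
One possible way to do the "at most" variant is to clamp the window to the array bounds instead of skipping matches near the edges. The sketch below is only an illustration under that assumption; the class ClampedContext and the method contexts are hypothetical names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClampedContext {
    // Hypothetical helper: returns up to `before` words before and up to
    // `after` words after every occurrence of toMatch.
    static List<String> contexts(String sentence, String toMatch, int before, int after) {
        String[] words = sentence.trim().split("\\s+");
        List<String> results = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(toMatch)) {
                int from = Math.max(0, i - before);             // clamp at the start
                int to = Math.min(words.length, i + after + 1); // exclusive, clamp at the end
                results.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // The first "I" has no words before it, the second has only one after it.
        for (String s : contexts("I like to go fishing and hiking I have", "I", 2, 2)) {
            System.out.println(s);
        }
    }
}
```

For this input the sketch prints "I like to" and "and hiking I have".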

Edit: Damn.. way too slow, and no fancy ternary operators :/

2

Here is another way I found, using a regex:

 String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother."; 

     String find = "I"; 

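     // Pattern.quote keeps any regex metacharacters in find literal; \\s+ tolerates repeated spaces 
     Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+)"); 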
     Matcher matcher = pattern.matcher(str); 

     while (matcher.find()) 
     { 
      System.out.println(matcher.group(1)); 
      System.out.println(matcher.group(2)); 
     } 

Output:

John and 
like to 
and hiking 
have two 
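
If overlapping contexts also need to be reported, one option is to wrap the whole pattern in a lookahead so that each match consumes no input. This is a sketch under that assumption rather than part of the answer; the class OverlappingContextRegex and the method contexts are hypothetical names, and the phrase in find must appear with exactly the same spacing in str.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingContextRegex {
    // Hypothetical helper: returns "two words, find, two words" for every
    // occurrence, including occurrences whose contexts overlap.
    static List<String> contexts(String str, String find) {
        Pattern p = Pattern.compile(
            "(?<!\\S)"                 // only start at the beginning of a word
            + "(?=("                   // zero-width lookahead, context captured in group 1
            + "\\S+\\s+\\S+\\s+"       // two words before
            + Pattern.quote(find)      // the phrase, taken literally
            + "\\s+\\S+\\s+\\S+"       // two words after
            + "))");
        Matcher m = p.matcher(str);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        for (String s : contexts(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

Because the lookahead matches an empty string, the matcher moves forward one character at a time and can find a context that begins inside the previous one.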
+0

Perfect! Now I can search for multi-word strings as well. – user1506145 2013-05-05 19:32:08

+0

Great :) Updated the regex with '\\s+', which should handle multiple spaces. – Vishy 2013-05-05 19:43:57

0
public static void main(String[] args) { 
    String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother."; 
    String find = "I"; 
    int countWords = 3; 
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords); 
    strings.stream().forEach(System.out::println); 
} 

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter){ 
    List<String> searchList = new ArrayList<>(); 
    String str = paragraph; 
    String find = search; 
    int countWords = countWordsBeforeAndAfter; 
    String[] sp = str.split(" +"); // "+" for multiple spaces 
    for (int i = 0; i < sp.length; i++) { 
     if (sp[i].equals(find)) { 

      String before = ""; 
      for (int j = countWords; j > 0; j--) { 
       if(i-j >= 0) before += sp[i-j]+" "; 
      } 

      String after = ""; 
      for (int j = 1; j <= countWords; j++) { 
       if(i+j < sp.length) after += " " + sp[i+j]; 
      } 
      String searchResult = before + find + after; 
      searchList.add(searchResult); 
     } 
    } 
    return searchList; 
} 