Find all occurrences of the string "the" in a .txt file

Here is my code:

// Import io so we can use file objects 
import java.io.*; 

public class SearchThe { 
    public static void main(String args[]) { 
     try { 
      String stringSearch = "the"; 
      // Open the file c:\test.txt as a buffered reader 
      BufferedReader bf = new BufferedReader(new FileReader("test.txt")); 

      // Start a line count and declare a string to hold our current line. 
      int linecount = 0; 
       String line; 

      // Let the user know what we are searching for 
      System.out.println("Searching for " + stringSearch + " in file..."); 

      // Loop through each line, stashing the line into our line variable. 
      while ((line = bf.readLine()) != null){ 
       // Increment the count and find the index of the word 
       linecount++; 
       int indexfound = line.indexOf(stringSearch); 

       // If greater than -1, means we found the word 
       if (indexfound > -1) { 
        System.out.println("Word was found at position " + indexfound + " on line " + linecount); 
       } 
      } 

      // Close the file after done searching 
      bf.close(); 
     } 
     catch (IOException e) { 
      System.out.println("IO Error Occurred: " + e.toString()); 
     } 
    } 
}

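I want to find every occurrence of the word "the" in the file test.txt. The problem is that once my program finds the first "the" on a line, it stops looking for more.

Also, when the file contains a word such as "then", my program treats it as the word "the".

Source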

2010-09-13 Giffary

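Have you considered using Java's regular expression package (java.util.regex)? – GobiasKoffi 2010-09-13 04:39:33

You can find some useful examples here. http://java.sun.com/developer/technicalArticles/releases/1.4regex/ – Emil 2010-09-13 05:01:44

Use a case-insensitive regular expression with word boundaries to find every instance and variation of "the".

indexOf("the") cannot distinguish between "the" and "then", because both begin with "the". Likewise, "the" sits in the middle of "anathema".

To avoid this, use a regular expression and search for "the" with a word boundary (\b) on either side. Prefer word boundaries to splitting on " " or to indexOf(" the ") (a space on either side), because neither of those finds "the." or other instances next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.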

Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE); 

while ((line = bf.readLine()) != null) { 
    linecount++; 

    Matcher m = p.matcher(line); 

    // indicate all matches on the line 
    while (m.find()) { 
     System.out.println("Word was found at position " + 
         m.start() + " on line " + linecount); 
    } 
}
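
For anyone who wants to try the word-boundary pattern in isolation, here is a minimal self-contained sketch. The class name WholeWordRegexDemo, the method positions, and the sample sentence are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class WholeWordRegexDemo {
    // Compiled once and reused; \b requires a word boundary on each side of "the".
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Returns the start index of every whole-word match in the line.
    static List<Integer> positions(String line) {
        List<Integer> found = new ArrayList<>();
        Matcher m = THE.matcher(line);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        // "then" and "anathema" contain "the" but are skipped.
        String sample = "The cat, then the dog; anathema. the";
        System.out.println(positions(sample)); // prints [0, 14, 33]
    }
}
```

Note that \b treats letters, digits, and the underscore as word characters, so "the_end" would not count as a match.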

Source

2010-09-13 04:59:15 Chadwick

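+1 for using a regular expression; it is much better than the other 'split' options (including mine). – 2010-09-13 05:06:24

You should not use indexOf, because it matches any substring of your string. Since "then" contains the string "the", it counts as a match too.

More about indexOf

indexOf

public int indexOf(String str, int fromIndex) Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which: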
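
The fromIndex parameter in that signature is what lets a loop move past the first hit. The sketch below is only an illustration (the class IndexOfAllDemo and the method allIndexes are hypothetical names); it collects every occurrence on a line, and it still reports substring hits such as the one inside "then".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the original answer.
class IndexOfAllDemo {
    // Collects the index of every occurrence of needle in line, substrings included.
    static List<Integer> allIndexes(String line, String needle) {
        List<Integer> found = new ArrayList<>();
        if (needle.isEmpty()) {
            return found; // an empty needle matches everywhere and would never terminate
        }
        int idx = line.indexOf(needle);
        while (idx != -1) {
            found.add(idx);
            // Resume the search one character past the start of the previous hit.
            idx = line.indexOf(needle, idx + 1);
        }
        return found;
    }

    public static void main(String[] args) {
        // The hit at index 4 is the "the" inside "then".
        System.out.println(allIndexes("the then the", "the")); // prints [0, 4, 9]
    }
}
```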

You should split each line into words, loop over them, and compare each word with "the".

String [] words = line.split(" "); 
for (String word : words) { 
    if (word.equals("the")) { 
        System.out.println("Found the word"); 
    } 
}

The snippet above also visits every "the" on the line, whereas indexOf on its own only ever returns the first occurrence.
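
One possible variation on the split approach, offered as a sketch rather than as what this answer proposes: splitting on the regex \W+ instead of a single space, and comparing with equalsIgnoreCase, also catches "the." and "The". The class SplitWordCount and the method count are hypothetical names, and \W is ASCII-based by default, so this assumes plain English text.

```java
// Hypothetical demo class; not part of the original answer.
class SplitWordCount {
    // Counts whole-word, case-insensitive occurrences of word in line.
    static int count(String line, String word) {
        int n = 0;
        // \W+ splits on runs of non-word characters, so "the." and "the," both yield "the".
        for (String token : line.split("\\W+")) {
            if (token.equalsIgnoreCase(word)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("The cat, then the dog; anathema. the", "the")); // prints 3
    }
}
```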

Source

2010-09-13 04:38:04 vodkhang

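This is not an answer. It is a critique. – Asaph 2010-09-13 04:41:02

First, I tried to find the problem he was running into, and the indexOf method is that problem. Then I found another good way to do what he wants. What is wrong with that? – vodkhang 2010-09-13 04:42:04

Yes, you are fishing for points. Write a complete answer before posting. – 2010-09-13 04:47:18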

-1

Ideally you should use Regular Expressions for this kind of search. As a quick and dirty workaround, you could change your stringSearch from

String stringSearch = "the";

to

String stringSearch = " the ";
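
A rough way to see where the padded search string falls short is to run it against a few sample lines. The class PaddedSearchDemo and the method paddedHit are hypothetical names, and the sample lines are invented for illustration.

```java
// Hypothetical demo class; not part of the original answer.
class PaddedSearchDemo {
    // True when the line contains "the" with a single space on each side.
    static boolean paddedHit(String line) {
        return line.indexOf(" the ") > -1;
    }

    public static void main(String[] args) {
        String[] samples = {
            "over the hill",  // found: a space on both sides
            "the end",        // missed: start of line
            "that is the.",   // missed: punctuation follows
            "see The end"     // missed: capital letter
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + paddedHit(s));
        }
    }
}
```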

Source

2010-09-13 04:44:51 flash

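This does not work at the end or the start of a line. – 2010-09-13 04:45:53

What if "the" is at the start of a line, at the end of a line, next to a special character, or capitalized? – 2010-09-13 04:48:31

Your current implementation only finds the first instance of "the" on each line.

Consider splitting each line into words, iterating over the list of words, and comparing each word with "the" instead: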

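while ((line = bf.readLine()) != null) 
{ 
    linecount++; 
    String[] words = line.split(" "); 
    int position = 0; // index in the line where the current word starts 

    for (String word : words) 
    { 
        if (word.equals(stringSearch)) 
            System.out.println("Word was found at position " + position + " on line " + linecount); 
        position += word.length() + 1; // skip the word and the single space after it 
    } 
}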

Source

2010-09-13 04:45:32

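It does not sound as though the point of this exercise is to skill you up on regular expressions (I don't know, it might be, but it seems a bit basic for that), even though a regex really is the real-world solution for this sort of thing.

My advice is to focus on the basics: use index and substring operations to test the string. Think about how to account for the fact that string comparison is naturally case-sensitive. Also, is your reader always closed (that is, is there any case in which bf.close() would not be executed)?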
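
As a sketch of what that advice could look like in practice (not the answerer's own code): the version below checks the characters on either side of each indexOf hit by hand, lower-cases both strings before comparing, and uses try-with-resources so the reader is closed even if readLine throws. try-with-resources needs Java 7 or later, which postdates this thread. The class and method names are hypothetical, a boundary is assumed to be any character that is not a letter or digit, and the text is assumed to keep its length when lower-cased.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; not part of the original answer.
class ManualWordSearch {
    // True if index i lies outside the string or holds a non-letter, non-digit character.
    private static boolean isBoundary(String s, int i) {
        return i < 0 || i >= s.length() || !Character.isLetterOrDigit(s.charAt(i));
    }

    // Start indexes of whole-word, case-insensitive occurrences of word in line.
    static List<Integer> wordPositions(String line, String word) {
        List<Integer> found = new ArrayList<>();
        if (word.isEmpty()) {
            return found;
        }
        // Assumes lower-casing does not change the string length (true for ASCII text).
        String haystack = line.toLowerCase(Locale.ROOT);
        String needle = word.toLowerCase(Locale.ROOT);
        int idx = haystack.indexOf(needle);
        while (idx != -1) {
            if (isBoundary(haystack, idx - 1) && isBoundary(haystack, idx + needle.length())) {
                found.add(idx);
            }
            idx = haystack.indexOf(needle, idx + 1);
        }
        return found;
    }

    // Reads lines from any Reader; try-with-resources closes it even when an exception is thrown.
    static List<String> search(Reader source, String word) throws IOException {
        List<String> hits = new ArrayList<>();
        try (BufferedReader bf = new BufferedReader(source)) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                for (int pos : wordPositions(line, word)) {
                    hits.add("line " + linecount + ", position " + pos);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new FileReader("test.txt") so the sketch runs anywhere.
        for (String hit : search(new StringReader("The end\nthen the rest"), "the")) {
            System.out.println(hit);
        }
    }
}
```

Unlike the regex \b, this boundary test treats an underscore as a separator, so the two approaches can disagree on input such as "the_end".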

Source

2010-09-13 05:33:21 CurtainDog
