Counting occurrences of a specific string in a file

Here is the code I am working on for counting occurrences of a specific string in a file:

while ((lineContents = tempFileReader.readLine()) != null)
{
    String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line
    changer.write(lineByLine);
    Pattern pattern = Pattern.compile("\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(lineByLine);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        Matcher tagMatcher = tagFinder.matcher(lineByLine);
        while(tagMatcher.find())
        {
            score++;
        }
        scoreTracker.add(score);
        score = 0;
    }
}

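My sample input contains 6 lines, and the occurrences of word per line are [0,1,0,3,0,0]. So when I print scoreTracker (which is an ArrayList), I want the output above. Instead, I get [4,4,4,4,4,4], which is the total number of occurrences of word rather than the count for each line. Please help.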

2012-03-13 Kazekage Gaara

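lineByLine refers to the entire contents of the file. That is why you get [4,4,4,4,4,4]. You need to store each line in a separate variable, line, and then match tagFinder against that line alone (tagFinder.matcher(line)). The final code would look something like this: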

while ((lineContents = tempFileReader.readLine()) != null) 
{ 
    String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line 
    changer.write(lineByLine); 
    Pattern pattern = Pattern.compile(".*\\r?\\n"); //Find each line together with its line separator
    Matcher matcher = pattern.matcher(lineByLine); 
    while(matcher.find()) 
    { 
     Pattern tagFinder = Pattern.compile("word"); //Finding the word required 
     //matcher.group() returns the input subsequence matched by the previous match. 
     Matcher tagMatcher = tagFinder.matcher(matcher.group()); 
     while(tagMatcher.find()) 
     { 
      score++; 
     } 
     scoreTracker.add(score); 
      score = 0; 
    } 
}

2012-03-13 18:35:33

But isn't that exactly why I first find a new line in my String and then apply the result to my score inside the matcher's while loop? Am I wrong? – 2012-03-13 18:42:21

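@KazekageGaara There are two problems with your code. First, the first regex, 'pattern', is used to find a newline separator; it does not capture the line itself, so you need to change the regex to '(.*)\\r?\\n'. Second, you call 'matcher.find()' but never call 'matcher.group()' anywhere to extract the match. Make these two changes and it should work. For more on 'Matcher' objects, see http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html – 2012-03-13 18:46:10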
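To see the difference the comment describes, here is a small self-contained sketch (the class GroupDemo and its matches helper are hypothetical names, not from the thread) that collects what each successful find() matched for both regexes. It uses group(1) so the printed list holds only the line text; group(), as in the comment, would also include the separator, which makes no difference when counting word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns the given capture group from every successful find() of regex in input.
    static List<String> matches(String regex, int group, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(group)); // group 0 is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "word one\nword two\n";
        // "\\r?\\n" matches only the separators: two matches, but no line text to search.
        System.out.println(matches("\\r?\\n", 0, text).size()); // 2
        // "(.*)\\r?\\n" captures the text of each line in group 1.
        System.out.println(matches("(.*)\\r?\\n", 1, text)); // [word one, word two]
    }
}
```

Because "." does not match line terminators by default, "(.*)" stops before a "\r", so the same regex also works for Windows "\r\n" line endings.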

Thanks! :-) – 2012-03-13 19:05:42

Maybe this code will help you:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "word word\n \n word word\n \n word\n";
        Pattern pattern = Pattern.compile("(.*)\\r?\\n"); //Find each line with its line separator
        Matcher matcher = pattern.matcher(str);
        while(matcher.find())
        {
            Pattern tagFinder = Pattern.compile("word"); //Finding the word required
            Matcher tagMatcher = tagFinder.matcher(matcher.group()); //search only the current line
            int score = 0;
            while(tagMatcher.find())
            {
                score++;
            }
            System.out.print(score + " ");
        }
    }
}

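The output is 2 0 2 0 1. It is not highly optimized, but your problem is that you never restrict the inner match, so it always scans the whole string.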
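The point about restricting the inner match can be checked side by side. The sketch below is illustrative only: RestrictDemo, its methods, and the six-line input are hypothetical, with the input chosen so that it reproduces the numbers from the question. The unrestricted version mimics the symptom (one entry per line, but each count scans the whole text); the restricted version counts only inside the line the outer matcher just found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestrictDemo {
    static final Pattern LINE = Pattern.compile("(.*)\\r?\\n");

    // Counts non-overlapping matches of word in text.
    static int count(Pattern word, String text) {
        Matcher m = word.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Mimics the bug: one entry per line, but each count scans the whole text.
    static List<Integer> unrestricted(String text, Pattern word) {
        List<Integer> scores = new ArrayList<>();
        Matcher line = LINE.matcher(text);
        while (line.find()) {
            scores.add(count(word, text));
        }
        return scores;
    }

    // The fix: each count scans only the line the outer matcher just found.
    static List<Integer> restricted(String text, Pattern word) {
        List<Integer> scores = new ArrayList<>();
        Matcher line = LINE.matcher(text);
        while (line.find()) {
            scores.add(count(word, line.group(1)));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical six-line input chosen to reproduce the counts from the question.
        String text = "a\nword\nb\nword word word\nc\nd\n";
        Pattern word = Pattern.compile("word");
        System.out.println(unrestricted(text, word)); // [4, 4, 4, 4, 4, 4]
        System.out.println(restricted(text, word));   // [0, 1, 0, 3, 0, 0]
    }
}
```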

2012-03-13 18:36:03

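You can use the Scanner class. Initialize the scanner with the string you want to count as its delimiter, then simply count the number of tokens the scanner finds.

You can also initialize the Scanner directly with a FileInputStream.

The resulting code is only 9 lines: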

File file = new File(fileName); 
Scanner scanner = new Scanner(file); 
scanner.useDelimiter("your text here"); 
int occurences = 0; // must be initialized before occurences++ will compile
while(scanner.hasNext()){ 
    scanner.next(); 
    occurences++; 
} 
scanner.close();
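Counting tokens with useDelimiter counts the pieces of text between matches, which is not always the number of matches (with the delimiter word, the input "a word b word c" splits into three tokens around two occurrences). A possible alternative, swapped in here rather than taken from the answer, is to count the matches themselves with Scanner.findWithinHorizon. The class and method names are hypothetical, and Pattern.quote is an assumption that the target should be matched literally rather than as a regex.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerCount {
    // Counts every occurrence of target in the scanner's input.
    // findWithinHorizon(pattern, 0) searches without a horizon limit, ignores the
    // delimiter, advances past each match it returns, and returns null when no
    // further match exists.
    static int countOccurrences(Scanner scanner, String target) {
        Pattern p = Pattern.compile(Pattern.quote(target));
        int occurrences = 0;
        while (scanner.findWithinHorizon(p, 0) != null) {
            occurrences++;
        }
        return occurrences;
    }

    public static void main(String[] args) {
        // A String source keeps the sketch self-contained; new Scanner(file) is read the same way.
        try (Scanner scanner = new Scanner("a word b word c")) {
            System.out.println(countOccurrences(scanner, "word")); // 2
        }
    }
}
```

Like the token-counting version, this gives a total for the whole file, not the per-line counts the question asks for.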

2012-03-13 18:38:36

This happens because you search the same string (lineByLine) every time. You probably want to search each line separately. I suggest:

Pattern tagFinder = Pattern.compile("word"); //Finding the word required
    for(String line : lineByLine.split("\\n")) //search each line on its own
    {
     Matcher tagMatcher = tagFinder.matcher(line);
     while(tagMatcher.find())
      score++;
     scoreTracker.add(score);
     score = 0;
    }
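If, as the question's replaceAll call suggests, "/." is meant to act as a line break as well, the replace-then-split steps could be folded into one split. This is a sketch under that assumption about intent; SplitCount, countPerSegment, and the sample text are hypothetical. Compiling tagFinder once, outside the loop, avoids recompiling the same pattern for every segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitCount {
    // Assumption: "/." marks a line break, just like a real line separator.
    private static final Pattern BREAK = Pattern.compile("/\\.|\\r?\\n");

    static List<Integer> countPerSegment(String text, String target) {
        Pattern tagFinder = Pattern.compile(Pattern.quote(target)); // compiled once, reused per segment
        List<Integer> scoreTracker = new ArrayList<>();
        for (String segment : BREAK.split(text)) {
            Matcher tagMatcher = tagFinder.matcher(segment);
            int score = 0;
            while (tagMatcher.find()) {
                score++;
            }
            scoreTracker.add(score);
        }
        return scoreTracker;
    }

    public static void main(String[] args) {
        System.out.println(countPerSegment("word/.word word\nno match here", "word")); // [1, 2, 0]
    }
}
```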

2012-03-13 18:43:52 Untitled

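The original code reads the input one line at a time with tempFileReader.readLine(), and then uses matcher to look for the end of the line within each line. Since lineContents contains only one line, matcher never finds a new line, so the rest of the code is skipped. Why do you need two different bits of code to split the input into lines? You can remove one of the bits of code that deals with finding new lines, e.g.: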

while ((lineContents = tempFileReader.readLine()) != null) 
{ 
     Pattern tagFinder = Pattern.compile("word"); //Finding the word required 
     Matcher tagMatcher = tagFinder.matcher(lineContents); 
     while(tagMatcher.find()) 
     { 
      score++; 
     } 
     scoreTracker.add(score); 
     score = 0; 

}

I tried the code above on Windows, reading a file test.txt with a BufferedReader, e.g.:

BufferedReader tempFileReader = new BufferedReader(new FileReader("c:\\test\\test.txt"));

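scoreTracker contains [0, 1, 0, 3, 0, 0] for a file with the content you describe. I don't see how you could get [4,4,4,4,4,4] from the original code if the sample input is an actual file as described and tempFileReader is a BufferedReader. It would help to see the code used to set up tempFileReader.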
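A self-contained version of this readLine-only approach might look like the sketch below. The class and method names are hypothetical, a StringReader stands in for the FileReader so it runs without a file, and the sample text is made up to match the counts described in the question. Pattern.quote assumes the target should be matched literally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineCount {
    // readLine() already splits the input into lines, so no newline regex is needed.
    static List<Integer> countPerLine(Reader source, String target) throws IOException {
        Pattern tagFinder = Pattern.compile(Pattern.quote(target));
        List<Integer> scoreTracker = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String lineContents;
            while ((lineContents = reader.readLine()) != null) {
                Matcher tagMatcher = tagFinder.matcher(lineContents);
                int score = 0;
                while (tagMatcher.find()) {
                    score++;
                }
                scoreTracker.add(score);
            }
        }
        return scoreTracker;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample; a FileReader for c:\test\test.txt would be read the same way.
        String sample = "a\nword\nb\nword word word\nc\nd\n";
        System.out.println(countPerLine(new StringReader(sample), "word")); // [0, 1, 0, 3, 0, 0]
    }
}
```

Note that matching is by substring, so a line containing "password" would also be counted once for "word".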

2012-03-13 18:51:44 AlanS