Ignoring certain words when reading a file

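My program reads a text file and lists the frequency of each word in it. The next thing I need to do is ignore certain words, such as 'the' and 'an', while reading the file. I have already created a list of these words, but I don't know how to apply it inside the while loop. Thanks.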

public static String[] ConnectingWords = {"and", "it", "you"};

public static void readWordFile(LinkedHashMap<String, Integer> wordcount) {
    // FileReader fileReader = null;
    Scanner wordFile;
    String word;   // A word read from the file
    Integer count; // The number of occurrences of the word

    // LinkedHashMap<String, Integer> wordcount = new LinkedHashMap<String, Integer>();

    try {
        wordFile = new Scanner(new FileReader("/Applications/text.txt"));
        wordFile.useDelimiter(" ");
    } catch (FileNotFoundException e) {
        System.err.println(e);
        return;
    }
    while (wordFile.hasNext()) {
        word = wordFile.next();
        word = word.toLowerCase();

        if (word.contains("the")) {
            count = getCount(word, wordcount) + 0;
            wordcount.put(word, count);
        }
        // Get the current count of this word, add one, and then store the
        // new count:
        count = getCount(word, wordcount) + 1;
        wordcount.put(word, count);
    }
}

Source

2015-05-09 sgolay

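What is inside the getCount() method? Is it just 'wordcount.get(word)'? Which version of Java are you using? Also consider closing the 'Scanner', otherwise you will have a resource leak.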

Are you using Java 8? – fge
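To make the two comments above concrete, here is a minimal sketch, assuming Java 8 or later, of a counting loop that closes the 'Scanner' through try-with-resources and uses Map.merge in place of a getCount helper. The names WordCountSketch and countWords are made up for illustration, and the sketch reads from any Readable (a StringReader here) rather than the asker's file, so it can be run on its own.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class WordCountSketch {
    // Hypothetical helper: counts whitespace-separated words from any Readable.
    static Map<String, Integer> countWords(Readable source) {
        Map<String, Integer> wordcount = new LinkedHashMap<>();
        // try-with-resources closes the Scanner (and a Closeable source) on exit
        try (Scanner wordFile = new Scanner(source)) {
            while (wordFile.hasNext()) {
                String word = wordFile.next().toLowerCase();
                // Map.merge (Java 8+): store 1 for a new word, otherwise add 1
                wordcount.merge(word, 1, Integer::sum);
            }
        }
        return wordcount;
    }

    public static void main(String[] args) {
        // prints {the=2, cat=1, and=1, hat=1}
        System.out.println(countWords(new StringReader("The cat and the hat")));
    }
}
```

On older Java versions the merge call could be replaced by a get followed by a null check, which is presumably what getCount does.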

Create a list holding the words that need to be ignored:

List<String> ignoreAll = Arrays.asList("and", "it", "you");

Then, inside the while loop, add a condition that skips any word found in that list:

if (ignoreAll.contains(word)) {
    continue;
}
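Put together, the two snippets above might be used as follows. This is only a sketch under a few assumptions: the names IgnoreListSketch and countIgnoring are hypothetical, the text comes from a String instead of the asker's file, and the count is kept with a plain null check because the getCount helper is not shown in the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class IgnoreListSketch {
    static final List<String> ignoreAll = Arrays.asList("and", "it", "you");

    // Hypothetical helper: counts space-separated words, skipping ignored ones.
    static Map<String, Integer> countIgnoring(String text) {
        Map<String, Integer> wordcount = new LinkedHashMap<>();
        try (Scanner wordFile = new Scanner(text)) {
            wordFile.useDelimiter(" ");
            while (wordFile.hasNext()) {
                String word = wordFile.next().toLowerCase();
                // Skip the word entirely, so it never reaches the map
                if (ignoreAll.contains(word)) {
                    continue;
                }
                Integer count = wordcount.get(word);
                wordcount.put(word, count == null ? 1 : count + 1);
            }
        }
        return wordcount;
    }

    public static void main(String[] args) {
        // prints {i=1, like=1, youth=1}
        System.out.println(countIgnoring("you and I like it and youth"));
    }
}
```

Note that List.contains compares whole elements, so "youth" is still counted even though it starts with "you". That differs from word.contains("the") in the question, which is a substring test on the String itself.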

Source

2015-05-09 10:30:41 Prashant

Thanks! This worked for me! – sgolay

Keep an exclusion list of words, and check it before updating the count.

// Requires: java.io.FileNotFoundException, java.io.FileReader,
// java.util.ArrayList, java.util.LinkedHashMap, java.util.List, java.util.Scanner
public static void readWordFile(LinkedHashMap<String, Integer> wordcount) {

    List<String> excludeList = new ArrayList<>();
    excludeList.add("the"); // and so on

    // try-with-resources closes the Scanner when the block exits
    try (Scanner wordFile = new Scanner(new FileReader("/Applications/text.txt"))) {
        wordFile.useDelimiter(" ");
        while (wordFile.hasNext()) {
            String word = wordFile.next().toLowerCase();

            // Only count words that are not on the exclusion list
            if (!excludeList.contains(word)) {
                // get() returns null for a word not seen yet, so start at 1
                Integer count = wordcount.get(word);
                wordcount.put(word, count == null ? 1 : count + 1);
            }
        }
    } catch (FileNotFoundException e) {
        System.err.println(e);
    }
}

Source

2015-05-09 10:27:03 Manikandan

You can try the code below.

// Requires: java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.HashMap, java.util.HashSet, java.util.Map

public static HashSet<String> connectingWords;
public static Map<String, Integer> frequencyMap;

static {
    connectingWords = new HashSet<>();
    connectingWords.add("and");
    connectingWords.add("it");
    connectingWords.add("you");
    frequencyMap = new HashMap<>();
}

public static void main(String[] args) {
    String line;
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader("src/files/temp2.txt"))) {
        while ((line = reader.readLine()) != null) {
            String[] words = line.split("-");
            for (String word : words) {
                // Skip the connecting words
                if (connectingWords.contains(word)) {
                    continue;
                }
                Integer value = frequencyMap.get(word);
                if (value != null) {
                    frequencyMap.put(word, value + 1);
                } else {
                    // The first occurrence counts as 1, not 0
                    frequencyMap.put(word, 1);
                }
            }
        }
    } catch (IOException e) {
        // Also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }
    System.out.println(frequencyMap.values());
}

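It is better to store the connecting words in a HashSet, because it gives fast lookups for the contains call that is made for every word in the file. The words and their frequencies can be kept in a Map. I have assumed that the word delimiter is -; if it is something else, you can modify the code. Likewise, if you have any special requirements regarding case, you can change the code. I tried it with an input file containing What-the-hell-is-going-on-and-it-is-good and it works fine.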
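As an illustration of the two adjustments mentioned above (a different delimiter and case handling), here is a rough sketch. The names HashSetFilterSketch and countLine are hypothetical, lower-casing every word is just one possible case policy, and Map.merge assumes Java 8 or later. The delimiter is passed to String.split, so it is interpreted as a regular expression.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HashSetFilterSketch {
    // HashSet.contains is a hash lookup, so its cost does not grow with the list size
    static final Set<String> CONNECTING_WORDS =
            new HashSet<>(Arrays.asList("and", "it", "you"));

    // Hypothetical helper: delimiterRegex is handed to String.split as a regex.
    static Map<String, Integer> countLine(String line, String delimiterRegex) {
        Map<String, Integer> frequencyMap = new HashMap<>();
        for (String raw : line.split(delimiterRegex)) {
            // Assumed case policy: treat "And" and "and" as the same word
            String word = raw.toLowerCase();
            if (word.isEmpty() || CONNECTING_WORDS.contains(word)) {
                continue;
            }
            frequencyMap.merge(word, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countLine("What-the-hell-is-going-on-and-it-is-good", "-");
        System.out.println(counts.get("is")); // prints 2
    }
}
```

For space-separated text, the same method could be called with "\\s+" as the delimiter.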

Source

2015-05-09 10:31:45

You forgot to close the reader :)

@Sasha: Thanks :)

IMO, if we close the 'BufferedReader', it will also close the 'FileReader'.
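The point in the last comment can be checked with a small experiment. The sketch below is an illustration only: TrackingReader is a made-up stand-in for 'FileReader' that records whether close() reached it, and a StringReader supplies the text so that no file is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CloseChainSketch {
    // Hypothetical stand-in for FileReader that records calls to close()
    static class TrackingReader extends Reader {
        private final Reader delegate;
        boolean closed = false;

        TrackingReader(Reader delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return delegate.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            delegate.close();
        }
    }

    // Reads all lines through a BufferedReader, then reports whether the
    // wrapped reader was closed once the BufferedReader was closed.
    static boolean innerClosedAfterReading(String text) {
        TrackingReader inner = new TrackingReader(new StringReader(text));
        try (BufferedReader reader = new BufferedReader(inner)) {
            while (reader.readLine() != null) {
                // contents are not needed for this check
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inner.closed;
    }

    public static void main(String[] args) {
        System.out.println(innerClosedAfterReading("a-b\nc-d")); // prints true
    }
}
```

So closing only the outer 'BufferedReader', whether explicitly or through try-with-resources, is enough to release the wrapped reader as well.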
