2015-05-09 128 views
0

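Ignoring certain words when reading a file

My program reads a text file and lists the frequency of each word in the file. What I need to do next is ignore certain words, such as 'the' and 'an', while reading the file. I have already created a list of these words, but I don't know how to use it inside the while loop. Thanks.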

public static String [] ConnectingWords = {"and", "it", "you"}; 

public static void readWordFile(LinkedHashMap<String, Integer> wordcount) { 
    // FileReader fileReader = null; 
    Scanner wordFile; 
    String word; // A word read from the file 
    Integer count; // The number of occurrences of the word 

    // LinkedHashMap <String, Integer> wordcount = new LinkedHashMap<String, Integer>(); 

    try { 
     wordFile = new Scanner(new FileReader("/Applications/text.txt")); 
     wordFile.useDelimiter(" "); 
    } catch (FileNotFoundException e) { 
     System.err.println(e); 
     return; 
    } 
    while (wordFile.hasNext()) { 
     word = wordFile.next(); 
     word = word.toLowerCase(); 

     if (word.contains("the")) { 
      count = getCount(word, wordcount) + 0; 
      wordcount.put(word, count); 

     } 
     // Get the current count of this word, add one, and then store the 
     // new count: 
     count = getCount(word, wordcount) + 1; 
     wordcount.put(word, count); 
    } 
} 
+0

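What is inside the getCount() method? Is it just 'wordcount.get(word)'? Which version of Java are you using? Also consider closing the 'Scanner', otherwise you will have a resource leak. –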

+0

Are you using Java 8? – fge

Answers

2

Create a list that holds the words to be ignored:

List<String> ignoreAll= Arrays.asList("and","it", "you"); 

Then add a condition inside the while loop that skips any word found in that list:

if(ignoreAll.contains(word)){ 
       continue; 

      } 
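Put together, the check sits at the top of the loop body, before the count is updated. The following is a minimal, self-contained sketch rather than the asker's actual program: the class name `IgnoreWordsDemo` and the method `countWords` are made up for illustration, the text comes from a `String` instead of `/Applications/text.txt` so it runs without a file, and tokens are split on whitespace rather than a single space.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative, not from the question.
class IgnoreWordsDemo {

    static final List<String> IGNORE_ALL = Arrays.asList("and", "it", "you");

    // Counts the words in the text, skipping anything listed in IGNORE_ALL.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordcount = new LinkedHashMap<>();
        // try-with-resources closes the Scanner once the loop is done
        try (Scanner words = new Scanner(text)) {
            while (words.hasNext()) {
                String word = words.next().toLowerCase();
                if (IGNORE_ALL.contains(word)) {
                    continue; // ignored word: move on to the next token
                }
                // get() returns null the first time a word is seen
                Integer count = wordcount.get(word);
                wordcount.put(word, count == null ? 1 : count + 1);
            }
        }
        return wordcount;
    }

    public static void main(String[] args) {
        System.out.println(countWords("You and the cat saw it and the dog"));
    }
}
```

Because `continue` jumps to the next iteration, the counting lines never run for an ignored word, so it never gets an entry in the map.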
+0

Thanks! This worked for me! – sgolay

0

Keep a list of the words to exclude, and check that list before updating the count.

public static void readWordFile (LinkedHashMap<String, Integer> wordcount) { 

    List<String> excludeList = new ArrayList<>(); 
    excludeList.add("the"); // and so on 
    // FileReader fileReader = null; 
    Scanner wordFile; 
    String word;  // A word read from the file 
    Integer count; // The number of occurrences of the word 

    // LinkedHashMap <String, Integer> wordcount = new LinkedHashMap <String, Integer>(); 

    try 
    { 
     wordFile = new Scanner(new FileReader("/Applications/text.txt")); 
     wordFile.useDelimiter(" "); 
    } 
    catch (FileNotFoundException e) 
    { 
     System.err.println(e); 
     return; 
    } 
    while (wordFile.hasNext()) 
    { 
     word = wordFile.next(); 
     word = word.toLowerCase(); 

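     if(!excludeList.contains(word)) { 
      // get() returns null for a word not seen before, so start it at 1 
      count = wordcount.get(word); 
      count = (count == null) ? 1 : count + 1; 
      wordcount.put(word, count); 
     } 

    } 
    wordFile.close(); // release the file handle 
} 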
2

You can try the code below.

public static HashSet<String> connectingWords; 
    public static Map<String,Integer> frequencyMap; 

    static { 
     connectingWords = new HashSet<>(); 
     connectingWords.add("and"); 
     connectingWords.add("it"); 
     connectingWords.add("you"); 
     frequencyMap = new HashMap<>(); 
    } 

    public static void main(String[] args) { 
     BufferedReader reader = null; 
     String line; 
     try { 
      reader = new BufferedReader(new FileReader("src/files/temp2.txt")); 
      while ((line = reader.readLine()) != null) { 
       String[] words = line.split("-"); 
       for (String word : words) { 
        if(connectingWords.contains(word)) { 
         continue; 
        } 
        Integer value = frequencyMap.get(word); 
        if(value != null) { 
         frequencyMap.put(word,value+1); 
        } else { 
         frequencyMap.put(word,1); // first occurrence counts as 1, not 0 
        } 
       } 
      } 
     } catch (FileNotFoundException e) { 
      e.printStackTrace(); 
     } catch (IOException e) { 
      e.printStackTrace(); 
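     } finally { 
      // reader is still null if the file could not be opened, 
      // and close() can itself throw an IOException 
      if (reader != null) { 
       try { 
        reader.close(); 
       } catch (IOException e) { 
        e.printStackTrace(); 
       } 
      } 
     } 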
     System.out.println(frequencyMap.values()); 

    } 

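It is better to store the connecting words in a HashSet, because it gives fast lookups each time you call contains, which happens once for every word in the file. The words and their frequencies can be kept in a Map. I have assumed that the word delimiter is -; if it is something else, you can modify the code. Likewise, if you have any case-related requirements, you can change the code. I tried it with an input file containing What-the-hell-is-going-on-and-it-is-good and it works fine.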
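The points above about lookup cost, the delimiter, and case can be made concrete with a short sketch. It is an illustration under stated assumptions, not the answerer's code: `StopWordFrequency` and `count` are hypothetical names, the input is passed in as a list of lines instead of being read from `src/files/temp2.txt`, words are lower-cased before the lookup, and the delimiter is a regex parameter. `Map.merge` requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; the names are illustrative, not from the answer.
class StopWordFrequency {

    // HashSet.contains is a hash lookup; List.contains scans the whole list.
    static final Set<String> CONNECTING_WORDS =
            new HashSet<>(Arrays.asList("and", "it", "you"));

    // Counts words per line, split on delimiterRegex, ignoring case.
    static Map<String, Integer> count(List<String> lines, String delimiterRegex) {
        Map<String, Integer> frequencyMap = new LinkedHashMap<>();
        for (String line : lines) {
            for (String raw : line.split(delimiterRegex)) {
                String word = raw.toLowerCase();
                // split can yield empty strings, e.g. for a leading delimiter
                if (word.isEmpty() || CONNECTING_WORDS.contains(word)) {
                    continue;
                }
                // insert 1 for a new word, otherwise add 1 to the old count
                frequencyMap.merge(word, 1, Integer::sum);
            }
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        List<String> lines =
                Arrays.asList("What-the-hell-is-going-on-and-it-is-good");
        System.out.println(count(lines, "-"));
    }
}
```

For whitespace-separated text, passing `"\\s+"` as the delimiter would split on any run of spaces, tabs, or line breaks.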

+0

You forgot to close the reader :) –

+0

@Sasha: Thanks :) –

+0

IMO if we close the 'BufferedReader', it will also close the 'FileReader'. –
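That claim can be checked with a small experiment. The sketch below is only a demonstration under assumptions: `CloseDemo` and `TrackingReader` are hypothetical names, and a `StringReader` subclass stands in for the `FileReader` so that no file is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative, not from the thread.
class CloseDemo {

    // A StringReader that records whether close() was called on it.
    static class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String s) {
            super(s);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    // Reads one line through a BufferedReader, then reports whether
    // the wrapped reader was closed along with it.
    static boolean innerClosedAfterUse(String text) {
        TrackingReader inner = new TrackingReader(text);
        // try-with-resources closes only the BufferedReader explicitly
        try (BufferedReader reader = new BufferedReader(inner)) {
            reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inner.closed;
    }

    public static void main(String[] args) {
        System.out.println(innerClosedAfterUse("one line"));
    }
}
```

Only the `BufferedReader` is closed explicitly here, yet the wrapped reader ends up closed as well, so a single `close()` on the outermost reader is enough.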