Why is the HashMap size different from the number of lines in the file?

Suppose I have a sample data file of the following form:

info1 word1 
info2 word2 
info3 word3 
info2 word4

etc.

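Now I want to map the info to the word on each line. So I first read the line, split it on whitespace, and get the word.

This word will be the key. I know that all the words are distinct and unique. The info, however, may not be unique.

As for the value, since I mainly care about the info associated with each word, I simply store a string for it that is basically the whole line.

I know the info may not be distinct, but the lines certainly are, because the words are all different.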

The file has 40000 lines, but the size of the HashMap is 38490.

I don't understand what is happening here. Is my logic wrong?

Here is the code:

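import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

private static void loadInfo(HashMap<String, String> info, File file) {
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line = br.readLine();
        int counter = 0;
        while (line != null) {
            // "info word": the word (second column) is used as the key
            String[] lineData = line.split("\\s+");
            info.put(lineData[1], line);
            line = br.readLine();
            counter++;
        }
        System.out.println(counter);     // counter shows the correct amount of lines
        System.out.println(info.size()); // this shows less than the amount of lines
    } catch (IOException io) {
        io.printStackTrace(); // don't swallow I/O errors silently
    }
}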

Thanks in advance.

2012-04-16 jan1

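Are you sure the keys are unique? – SLaks 2012-04-16 00:59:20

Keys in a HashMap are unique, so if it tries to add another line with the same key, it simply overwrites the value stored under that key. That means you have 40000 - 38490 = 1510 duplicates. You should consider using some data structure as the value, which you can then add to. – 2012-04-16 01:01:01

Indeed, the keys are not unique. I should have checked that more carefully. Thanks everyone for the comments and answers. – jan1 2012-04-16 01:08:40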

This happens if you have duplicate keys; put will overwrite the previous value.

2012-04-16 00:59:35 SLaks
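A minimal, self-contained sketch of that overwrite behaviour, assuming the question's format (key = second whitespace-separated column, value = whole line). The class name `OverwriteDemo` and the extra line `info5 word3` are hypothetical, added only to create one duplicate key:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverwriteDemo {
    // Mirrors the question's loop: the word (second column) is the key, the whole line the value.
    static Map<String, String> load(List<String> lines) {
        Map<String, String> info = new HashMap<>();
        for (String line : lines) {
            String[] lineData = line.split("\\s+");
            info.put(lineData[1], line); // a repeated key silently replaces the earlier line
        }
        return info;
    }

    public static void main(String[] args) {
        // Hypothetical data: the question's sample plus one line that reuses "word3".
        List<String> lines = Arrays.asList(
                "info1 word1", "info2 word2", "info3 word3", "info2 word4", "info5 word3");
        Map<String, String> info = load(lines);
        System.out.println(lines.size());      // 5
        System.out.println(info.size());       // 4
        System.out.println(info.get("word3")); // info5 word3
    }
}
```

Five lines go in, but only four keys come out, and the first `word3` line is gone.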

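Most likely your word list isn't actually unique. You can add a check before your call to .put() to determine whether a word already exists, and report the duplicates: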

while (line != null) {
    String[] lineData = line.split("\\s+");
    final String word = lineData[1];
    final String previous = info.get(word);
    if (previous != null) {
        // report the duplicate key together with both lines that share it
        System.err.println("Duplicate at count " + counter + " of word " + word);
        System.err.println(" original line: " + previous);
        System.err.println("      new line: " + line);
    }
    info.put(word, line);
    line = br.readLine();
    counter++;
}

2012-04-16 01:01:03 andersoj
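As an alternative to reporting duplicates one at a time, the sketch below counts how often each key occurs and keeps only the keys seen more than once. This is a swapped-in approach, not the answer's code; the class `DuplicateCounter` and its methods are hypothetical, and the line format is assumed to be the question's. Summing (count - 1) over the duplicated keys gives the number of lines that were overwritten, i.e. the 40000 - 38490 gap:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateCounter {
    // Returns every key (second column) that occurs on more than one line, with its count.
    static Map<String, Integer> duplicateKeys(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String word = line.split("\\s+")[1];
            counts.merge(word, 1, Integer::sum); // start at 1 for a new key, otherwise increment
        }
        Map<String, Integer> duplicates = new TreeMap<>(); // sorted for readable output
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.put(e.getKey(), e.getValue());
            }
        }
        return duplicates;
    }

    // Number of put() calls that replaced an existing entry: the sum of (count - 1).
    static int overwrittenLines(Map<String, Integer> duplicates) {
        int total = 0;
        for (int c : duplicates.values()) {
            total += c - 1;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input with "word3" on three lines.
        List<String> lines = Arrays.asList(
                "info1 word1", "info2 word3", "info3 word3", "info4 word3", "info5 word2");
        Map<String, Integer> duplicates = duplicateKeys(lines);
        System.out.println(duplicates);                   // {word3=3}
        System.out.println(overwrittenLines(duplicates)); // 2
    }
}
```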

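You probably do have some duplicate keys.

A simple way to check whether you are replacing a previous value is to look at the return value of put: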

String last = info.put(lineData[1], line); 
if(last != null) 
    System.err.println("Warning: replaced value for key "+lineData[1]+", last value was: "+last);

2012-04-16 01:03:36 trutheality
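Turning that check into a count gives a quick consistency test: every non-null return from put marks one line whose key was already present, so the number of lines read should equal the map size plus the number of replacements. A minimal sketch, assuming (as in the question) that values are whole lines and therefore never null, since put also returns null when a key was previously mapped to null. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplacementCounter {
    // Loads lines keyed on the second column; returns how many puts replaced an earlier value.
    static int countReplacements(List<String> lines, Map<String, String> info) {
        int replaced = 0;
        for (String line : lines) {
            String[] lineData = line.split("\\s+");
            String last = info.put(lineData[1], line); // previous value, or null if the key was new
            if (last != null) {
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("info1 word1", "info2 word2", "info3 word1"); // hypothetical data
        Map<String, String> info = new HashMap<>();
        int replaced = countReplacements(lines, info);
        // lines read == distinct keys + replaced values
        System.out.println(lines.size() + " = " + info.size() + " + " + replaced); // 3 = 2 + 1
    }
}
```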

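You probably have duplicate keys; for example, "info2" in your sample maps to both "word2" and "word4".

If you need keys that map to multiple values, you need a "multimap". You can build one yourself with the type HashMap<String, Set<String>> (each key maps to a set of values), or use an existing one, such as the one in Apache Commons Collections.

If you roll your own, then every time you want to add a mapping you need to check whether the key already exists; if it doesn't, add a mapping from that key to an empty set. Then add the mapping by putting the value into that key's set.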

HashMap<String, Set<String>> info = new HashMap<String, Set<String>>();
...
// first time we see this key: give it an empty set
if (!info.containsKey(lineData[1])) {
    info.put(lineData[1], new HashSet<String>());
}
info.get(lineData[1]).add(line);

2012-04-16 01:07:48 Edmund
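On Java 8 or later, the check-then-put steps of that multimap can be collapsed with `Map.computeIfAbsent`, which creates the empty set only when the key is missing. This is a swapped-in idiom rather than the answer's exact code; a minimal sketch, again assuming key = second column and value = whole line, with a hypothetical class name. It uses `LinkedHashSet` instead of `HashSet` only so each key's lines keep their file order:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MultiValueLoader {
    // Groups every line under its key so that no line is lost when a key repeats.
    static Map<String, Set<String>> load(List<String> lines) {
        Map<String, Set<String>> info = new HashMap<>();
        for (String line : lines) {
            String key = line.split("\\s+")[1];
            // create an empty set for a new key, then add this line to the key's set
            info.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(line);
        }
        return info;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("info1 word1", "info2 word2", "info3 word1"); // hypothetical data
        Map<String, Set<String>> info = load(lines);
        System.out.println(info.size());       // 2
        System.out.println(info.get("word1")); // [info1 word1, info3 word1]
    }
}
```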
