JaroWinklerDistance in Lucene returns strange results

I have a file that contains some phrases. Using Lucene's JaroWinklerDistance, I want to get the phrase from the file that is most similar to my input.

Here is an example of my problem.

Say we have a file containing:

//phrases.txt 
this is goodd 
this is good 
this is god

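If my input is "this is good", it should return "this is good" from the file first, because that phrase has the maximum similarity score (1). But for some reason, it returns only "this is goodd" and "this is god"!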
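For reference, here is a minimal sketch of the textbook Jaro-Winkler similarity (prefix bonus capped at 4 characters, scaling factor 0.1) in plain Java, so no Lucene dependency is assumed. It shows that the phrase identical to the input scores exactly 1.0 while both near-misses score just under it. The class and method names are hypothetical, and Lucene's own JaroWinklerDistance may differ in details such as how the prefix bonus is scaled, so its exact numbers can differ (note that its getDistance returns a similarity in [0, 1], with 1 meaning identical, despite the name).

```java
/**
 * Textbook Jaro-Winkler similarity (prefix capped at 4, scaling factor 0.1).
 * Hypothetical illustrative sketch; not Lucene's JaroWinklerDistance.
 */
final class JaroWinklerSketch {

    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) return 1.0;
        if (s1.isEmpty() || s2.isEmpty()) return 0.0;
        // Two characters only "match" if they are at most this far apart.
        int window = Math.max(0, Math.max(s1.length(), s2.length()) / 2 - 1);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Matched characters that appear in a different order are half-transpositions.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        double t = halfTranspositions / 2.0;
        return (m / s1.length() + m / s2.length() + (m - t) / m) / 3.0;
    }

    static double similarity(String s1, String s2) {
        double j = jaro(s1, s2);
        // Winkler boost: reward a common prefix of up to 4 characters.
        int prefix = 0;
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        String input = "this is good";
        String[] phrases = {"this is goodd", "this is good", "this is god"};
        for (String phrase : phrases) {
            System.out.printf("%-14s %.4f%n", phrase, similarity(input, phrase));
        }
    }
}
```

With the three phrases above, this prints 1.0000 for the exact phrase and roughly 0.985 and 0.983 for the two near-misses. Since the exact phrase would rank first if it were scored at all, its absence from the results points to a filter rather than to the distance metric.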

Here is my code:

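import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.Dictionary;
import org.apache.lucene.search.spell.JaroWinklerDistance;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.RAMDirectory;

try {
    // In-memory spell-check index that scores candidates with Jaro-Winkler
    SpellChecker spellChecker = new SpellChecker(new RAMDirectory(), new JaroWinklerDistance());
    // PlainTextDictionary reads one entry per line of the file
    Dictionary dictionary = new PlainTextDictionary(new File("src/main/resources/words.txt").toPath());
    IndexWriterConfig iwc = new IndexWriterConfig(new ShingleAnalyzerWrapper());
    spellChecker.indexDictionary(dictionary, iwc, false);

    String wordForSuggestions = "this is good";

    int suggestionsNumber = 5;

    // Up to 5 suggestions, each with a score of at least 0.8
    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber, 0.8f);
    if (suggestions != null && suggestions.length > 0) {
        for (String word : suggestions) {
            System.out.println("Did you mean:" + word);
        }
    } else {
        System.out.println("No suggestions found for word:" + wordForSuggestions);
    }
} catch (IOException e) {
    e.printStackTrace();
}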

2017-06-12 Remis07

suggestSimilar will not suggest a word that is identical to the input. To quote the source code:

// don't suggest a word for itself, that would be silly

If you want to know whether wordForSuggestions is in the dictionary, use the exist method:

if (spellChecker.exist(wordForSuggestions)) { 
    //do what you want for an, apparently, correctly spelled word 
}
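To illustrate the behavior without needing Lucene on the classpath, here is a hypothetical in-memory stand-in: exist is a set lookup, and suggestSimilar ranks the other entries by a similarity score, drops anything below accuracy, and skips the input itself, as the quoted comment describes. The class, the lookup helper, and the metric are all assumptions: a normalized Levenshtein similarity stands in for Jaro-Winkler to keep the block short, and Lucene's real SpellChecker queries an n-gram index instead of scanning every entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical stand-in for the exist + suggestSimilar workflow; not Lucene code. */
final class SuggestSketch {

    private final Set<String> dictionary;
    private final ToDoubleBiFunction<String, String> similarity;

    SuggestSketch(List<String> phrases, ToDoubleBiFunction<String, String> similarity) {
        this.dictionary = new LinkedHashSet<>(phrases);
        this.similarity = similarity;
    }

    boolean exist(String word) {
        return dictionary.contains(word);
    }

    /** Entries scoring at least accuracy, best first, never the word itself. */
    List<String> suggestSimilar(String word, int numSug, float accuracy) {
        List<String> hits = new ArrayList<>();
        for (String candidate : dictionary) {
            if (candidate.equals(word)) continue; // skip the input itself
            if (similarity.applyAsDouble(word, candidate) >= accuracy) hits.add(candidate);
        }
        hits.sort(Comparator.comparingDouble((String c) -> similarity.applyAsDouble(word, c)).reversed());
        return hits.subList(0, Math.min(numSug, hits.size()));
    }

    /** Hypothetical helper: the exact match (if present) first, then the near matches. */
    List<String> lookup(String word, int numSug, float accuracy) {
        List<String> result = new ArrayList<>();
        if (exist(word)) result.add(word);
        result.addAll(suggestSimilar(word, numSug, accuracy));
        return result;
    }

    /** Stand-in metric: 1 - (Levenshtein distance / length of the longer string). */
    static double levenshteinSimilarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / maxLen;
    }

    public static void main(String[] args) {
        SuggestSketch checker = new SuggestSketch(
                Arrays.asList("this is goodd", "this is good", "this is god"),
                SuggestSketch::levenshteinSimilarity);
        System.out.println(checker.suggestSimilar("this is good", 5, 0.8f)); // [this is goodd, this is god]
        System.out.println(checker.lookup("this is good", 5, 0.8f));         // [this is good, this is goodd, this is god]
    }
}
```

Checking for the exact phrase first and then appending the near matches gives the ordering the question expected, without relying on the suggester to return the input itself.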

2017-06-12 16:38:33 femtoRgon