I'm having a hard time wrapping my head around the Lucene library. How do I use Lucene to extract n-grams? This is what I have so far:
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public void shingleMe()
{
    try
    {
        // Wrap a StandardAnalyzer so the ShingleAnalyzerWrapper emits 2-grams (shingles)
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        FileReader reader = new FileReader("test.txt");
        ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(analyzer, 2);
        shingleAnalyzer.setOutputUnigrams(false); // bigrams only, no single tokens

        TokenStream stream = shingleAnalyzer.tokenStream("contents", reader);
        CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);

        while (stream.incrementToken())
        {
            System.out.println(charTermAttribute.toString());
        }
    }
    catch (FileNotFoundException e)
    {
        e.printStackTrace();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
It fails at stream.incrementToken(). My understanding is that ShingleAnalyzerWrapper wraps another analyzer to create a shingle-producing analyzer. From there, I get a token stream from it and then iterate over it using an attribute. However, it always results in this exception:
Exception in thread "main" java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z
Any thoughts? Thanks in advance!
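For reference, this is the kind of output I'm after — word bigrams ("shingles") over the text. Sketched here with plain string handling rather than Lucene, just to show the expected result; the helper name wordBigrams is illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramDemo {
    // Illustrative helper (not Lucene): split on whitespace, join adjacent word pairs.
    static List<String> wordBigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> bigrams = new ArrayList<String>();
        for (int i = 0; i + 1 < words.length; i++) {
            bigrams.add(words[i] + " " + words[i + 1]);
        }
        return bigrams;
    }

    public static void main(String[] args) {
        // "please divide this sentence" -> [please divide, divide this, this sentence]
        System.out.println(wordBigrams("please divide this sentence"));
    }
}
```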
Word or character ngrams? – Reactormonk 2012-04-01 12:35:08