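What is the best practice for combining analyzers in Lucene?

I have a situation where I am using the StandardAnalyzer in Lucene to index text strings as follows, which works fine:
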
    public void indexText(String suffix, boolean includeStopWords) {
        StandardAnalyzer analyzer = null;

        if (includeStopWords) {
            // Note: this constructor still applies Lucene's default English stop set.
            analyzer = new StandardAnalyzer(Version.LUCENE_30);
        }
        else {
            // Get Stop_Words to exclude them.
            Set<String> stopWords = (Set<String>) Stop_Word_Listener.getStopWords();
            analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
        }

        try {
            // Index text into an in-memory directory.
            Directory index = new RAMDirectory();
            IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            this.addTextToIndex(w, this.getTextToIndex());
            w.close();

            // Read index: map the term vector of the only document (id 0).
            IndexReader ir = IndexReader.open(index);
            Text_TermVectorMapper ttvm = new Text_TermVectorMapper();
            int docId = 0;
            ir.getTermFreqVector(docId, PropertiesFile.getProperty(text), ttvm);

            // Set output.
            this.setWordFrequencies(ttvm.getWordFrequencies());
            // Close the reader; the writer was already closed above.
            ir.close();
        }
        catch(Exception ex) {
            logger.error("Error message\n", ex);
        }
    }

    private void addTextToIndex(IndexWriter w, String value) throws IOException {
        Document doc = new Document();
        // Use the same field name that indexText() reads the term vector from.
        doc.add(new Field(PropertiesFile.getProperty(text), value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
    }

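However, I would also like to combine this with the SnowballAnalyzer.

This class also has the following constructor, which initializes its two instance variables:
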
    public Text_Indexer(String textToIndex) {
        this.textToIndex = textToIndex;
        this.wordFrequencies = new HashMap<String, Integer>();
    }
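
To make the intended result concrete, here is a minimal, library-free sketch of the kind of output a combined analysis would put into a `HashMap<String, Integer>` like `wordFrequencies`: tokens are lowercased, stop words are dropped, and the survivors are stemmed before counting. Nothing in it is Lucene API. The class `CombinedAnalysisSketch`, the method names, and the toy suffix-stripping stemmer are all hypothetical stand-ins (the stemmer is not the Snowball algorithm), and the stage order shown is an assumption about what "combining" should mean here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Library-free illustration only: none of these names come from Lucene.
public class CombinedAnalysisSketch {

    // Toy stemmer standing in for Snowball; it only strips a few English suffixes.
    public static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters of the word after stripping.
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Tokenize, lowercase, drop stop words, stem, then count what is left.
    public static Map<String, Integer> wordFrequencies(String text, Set<String> stopWords) {
        Map<String, Integer> frequencies = new HashMap<String, Integer>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || stopWords.contains(token)) {
                continue; // stop words are removed before stemming
            }
            String stem = toyStem(token);
            Integer count = frequencies.get(stem);
            frequencies.put(stem, count == null ? 1 : count + 1);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "and"));
        System.out.println(wordFrequencies("The dog jumped and the dogs are jumping", stopWords));
    }
}
```

With this input, "dog" and "dogs" are counted under one stem, as are "jumped" and "jumping", which is the effect wanted from adding stemming on top of the existing stop-word handling. Filtering stop words before stemming matters because a stop list normally contains unstemmed forms.
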

Can anyone tell me how best to achieve this with the code above?

Thanks,

Mr Morgan.