2010-09-24

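Lucene SpanNearQuery

I am trying to learn about Lucene's SpanNearQuery and wrote a dummy example. I am looking for "not" followed by "fox" within 5 positions of each other. I expect document 3 to be returned as the only hit. However, I end up with no hits at all. Any thoughts on what I might be doing wrong would be appreciated.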

Here is the code:

// Indexing

public void doSpanIndexing() throws IOException {

    IndexWriter writer = new IndexWriter(directory, AnalyzerUtil.getPorterStemmerAnalyzer(new StandardAnalyzer(Version.LUCENE_30)), IndexWriter.MaxFieldLength.LIMITED);

    Document doc1 = new Document();
    doc1.add(new Field("content", " brown fox jumped ", Field.Store.YES, Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.addDocument(doc1);

    Document doc2 = new Document();
    doc2.add(new Field("content", "foxes not jumped over the huge fence", Field.Store.YES, Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.addDocument(doc2);

    Document doc3 = new Document();
    doc3.add(new Field("content", " brown not fox", Field.Store.YES, Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.addDocument(doc3);

}

// Searching
public void doSpanSearching(String text) throws CorruptIndexException, IOException, ParseException {

    IndexSearcher searcher = new IndexSearcher(directory);

    SpanTermQuery term1 = new SpanTermQuery(new Term("content", "not"));
    SpanTermQuery term2 = new SpanTermQuery(new Term("content", text));
    SpanNearQuery query = new SpanNearQuery(new SpanQuery[] {term1, term2}, 5, true);
    TopDocs topDocs = searcher.search(query, 5);

    for (int i = 0; i < topDocs.totalHits; i++) {
        System.out.println("Hit Document number: " + topDocs.scoreDocs[i].doc);
        System.out.println("Hit Document score: " + topDocs.scoreDocs[i].score);
        Document result = searcher.doc(topDocs.scoreDocs[i].doc);
        System.out.println("Search result " + (i + 1) + " is " + result.get("content"));
    }

}

Answers

"not" is a stop word in StandardAnalyzer (i.e. it is removed from your text at index time). Can you try with another word that is not a stop word?
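
To make the stop-word effect concrete, here is a minimal plain-Java sketch. It does not call Lucene. The class `StopWordSpanSketch`, its methods, and the short stop-word list are all made up for illustration (the analyzer's real list is longer), and the slop check is a simplified approximation of an ordered span-near match. Stemming and everything else the real analyzer and index do are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene code.
final class StopWordSpanSketch {

    // Assumed subset of English stop words; the real list is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("not", "the", "a", "an", "and", "of", "to"));

    // Positions at which `term` survives "analysis" of `text`.
    // Stop words are dropped from the result but still consume a position.
    static List<Integer> positions(String text, String term) {
        List<Integer> result = new ArrayList<>();
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (STOP_WORDS.contains(tokens[pos])) {
                continue; // removed at index time, so never searchable
            }
            if (tokens[pos].equals(term)) {
                result.add(pos);
            }
        }
        return result;
    }

    // True if `first` occurs before `second` with at most `slop`
    // positions in between (simplified ordered near match).
    static boolean nearOrdered(String text, String first, String second, int slop) {
        for (int p1 : positions(text, first)) {
            for (int p2 : positions(text, second)) {
                if (p2 > p1 && p2 - p1 - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc3 = " brown not fox";
        // "not" is filtered out, so there is nothing for the first clause to match.
        System.out.println(nearOrdered(doc3, "not", "fox", 5));
        // "brown" is not a stop word, so the ordered match can succeed.
        System.out.println(nearOrdered(doc3, "brown", "fox", 5));
    }
}
```

In this sketch the query term "not" can never match because it was never indexed, regardless of the slop value, which is consistent with the empty result in the question.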

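I used "brown" in place of "not" and still get no results. Any ideas? Thanks for the pointer about "not" being dropped during indexing. I had completely overlooked that. – 2010-09-27 15:12:30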