2010-02-12 56 views
6

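In a Lucene/Lucene.net search, how do I count the number of hits per document?

When I search a set of documents, I can easily find the number of documents that match my search criteria: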

Hits hits = Searcher.Search(query); 
int DocumentCount = hits.Length(); 

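How do I determine the number of hits within the matching documents? For example, suppose I search for "congress" and get 2 documents back. How can I get the number of times "congress" occurs in each document? Say "congress" occurs 2 times in document #1 and 3 times in document #2. The result I am looking for is a count of those occurrences.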

Answers

6

This is for Java Lucene, but it should work with Lucene.NET as well:

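List<Integer> docIds = // doc ids for documents that matched the query,
                       // sorted in ascending order

int totalFreq = 0;
TermDocs termDocs = reader.termDocs();
termDocs.seek(new Term("my_field", "congress"));
for (int id : docIds) {
    // skipTo() moves forward to the first posting with doc >= id, so this
    // assumes every id in docIds belongs to a document containing the term
    termDocs.skipTo(id);
    totalFreq += termDocs.freq(); // occurrences of the term in this document
}
termDocs.close();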
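Since the question asks for a count per document and not only a grand total, here is a minimal plain-Java sketch of the same idea. It does not use the Lucene API: the class `PerDocFrequencySketch`, the helpers `postingsFor` and `freqPerDoc`, and the whitespace tokenization are all hypothetical stand-ins, with a sorted map from doc id to frequency playing the role of one term's postings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerDocFrequencySketch {

    // Hypothetical stand-in for one term's postings: doc id -> occurrences in that doc.
    static TreeMap<Integer, Integer> postingsFor(List<String[]> docs, String term) {
        TreeMap<Integer, Integer> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            int freq = 0;
            for (String token : docs.get(docId)) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                postings.put(docId, freq);
            }
        }
        return postings;
    }

    // Keeps one count per matched doc id instead of folding everything into a sum.
    static Map<Integer, Integer> freqPerDoc(TreeMap<Integer, Integer> postings,
                                            List<Integer> docIds) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int id : docIds) {
            Integer freq = postings.get(id);
            if (freq != null) { // skip ids whose document does not contain the term
                result.put(id, freq);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                "congress met and congress voted".split(" "),
                "the senate adjourned".split(" "),
                "congress congress congress".split(" "));
        TreeMap<Integer, Integer> postings = postingsFor(docs, "congress");
        Map<Integer, Integer> perDoc = freqPerDoc(postings, Arrays.asList(0, 2));
        int total = perDoc.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(perDoc + " total=" + total); // prints "{0=2, 2=3} total=5"
    }
}
```

In the Lucene loop above, the equivalent change would be to store `termDocs.freq()` under the current doc id instead of adding it straight into `totalFreq`; the total can then be derived from the per-document values.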
+0

@bajafresh4life: What if the phrase is two words, such as "apple tree"? – Keltex 2010-02-12 17:15:00

+0

Do you want the number of times the phrase occurs in each document, or the number of times each word occurs? – bajafresh4life 2010-02-12 19:02:09

0

This is also for Java Lucene. If your query/search criteria can be written as a SpanQuery, then you can do something like this:

IndexReader indexReader = // define your index reader here 
SpanQuery spanQuery = // define your span query here 
Spans spans = spanQuery.getSpans(indexReader); 
int occurrenceCount = 0; 
while (spans.next()) { 
    occurrenceCount++; 
} 
// now occurrenceCount contains the total number of occurrences of the word/phrase/etc across all documents in the index
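
The loop above produces a single index-wide total. For the two-word phrase case raised in the comments ("apple tree"), the sketch below models what a per-document phrase count would look like. It is plain Java and not Lucene code: `PhraseCountSketch`, `countPhrase`, `countPerDoc`, and the whitespace tokenization are hypothetical and chosen only for illustration, and a real analyzer would tokenize differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PhraseCountSketch {

    // Counts start positions where the phrase tokens appear consecutively and in order.
    static int countPhrase(String[] tokens, String[] phrase) {
        if (phrase.length == 0) {
            return 0; // an empty phrase has no occurrences by convention here
        }
        int count = 0;
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int i = 0; i < phrase.length; i++) {
                if (!tokens[start + i].equals(phrase[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    // Maps doc id -> phrase occurrences, omitting documents without a match.
    static Map<Integer, Integer> countPerDoc(List<String[]> docs, String[] phrase) {
        Map<Integer, Integer> perDoc = new LinkedHashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            int count = countPhrase(docs.get(docId), phrase);
            if (count > 0) {
                perDoc.put(docId, count);
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                "the apple tree beside the other apple tree".split(" "),
                "an apple fell from the tree".split(" "),
                "apple tree".split(" "));
        Map<Integer, Integer> perDoc = countPerDoc(docs, new String[] {"apple", "tree"});
        int total = perDoc.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(perDoc + " total=" + total); // prints "{0=2, 2=1} total=3"
    }
}
```

With the span loop above, a similar per-document breakdown should be possible by grouping on the document id of each span (the older `Spans` API exposes it through `doc()`) instead of incrementing one shared counter.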