2009-08-04 73 views
4

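Question about Lucene scoring

I have a question about Lucene scoring. I have two documents in my index: one contains "my name", and the other contains the same words but not as that exact phrase. When I search for the keywords "my name", the second document is listed above the first. What I want is for a document that contains the exact keywords I typed to be listed first, followed by the others. Can anyone help me do this? Thanks.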

Answers

3

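Second attempt at an answer: Lucene's default behavior should already be what you are asking for. The key factor here is the lengthNorm() part of the score, which scores shorter documents higher than longer ones. For context, see Lucene's Similarity API. If, say, lengthNorm is the same for both hits, they are ordered arbitrarily.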
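To make the lengthNorm() point concrete, here is a minimal Python sketch. It assumes the classic default similarity formula, lengthNorm = 1/sqrt(number of terms in the field), and ignores the lossy one-byte encoding Lucene applies when it stores norms. The function name and the sample field contents are illustrative only; they are not taken from the poster's index.

```python
import math

def length_norm(num_terms: int) -> float:
    """Assumed classic default: shorter fields get a larger norm."""
    return 1.0 / math.sqrt(num_terms)

# Hypothetical field contents, already tokenized.
short_doc = ["my", "name"]
long_doc = ["my", "full", "name"]

short_norm = length_norm(len(short_doc))
long_norm = length_norm(len(long_doc))

# Two fields of the same length get identical norms, so this factor
# alone cannot decide which of them ranks first.
same_length_doc = ["name", "my"]
tie = length_norm(len(same_length_doc)) == short_norm

print(short_norm, long_norm, tie)
```

With every other factor equal, the two-term field gets the larger norm and therefore the higher score. When the norms tie, something else (or nothing) decides the order.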

The explain() function will help you see why the documents are scored the way they are, rather than the way the default behavior would suggest.

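I assume you are using a Boolean query. If you post the exact way you are querying, I may be able to say more. See also Query Parser Syntax. I hope this is closer to the mark.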

+0

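This would make the second document the *only* document that matches. The poster asked only that it get a higher score, *not* that the other documents be excluded. – Avi 2009-08-04 13:33:48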

0

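If you use lucli from the command line (download the latest Lucene source; it is in the contrib directory), you can use its "explain" command to have Lucene explain why it scored a document so highly.

It will come up with something like this:

---------------- 2 score:0.6089077 ---------------------

(document contents, blah blah blah)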

Explanation:4.260467 = (MATCH) sum of:
  0.59024054 = (MATCH) weight(description:warwick in 276780), product of:
    0.05595057 = queryWeight(description:warwick), product of:
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      0.010607426 = queryNorm
    10.549321 = (MATCH) fieldWeight(description:warwick in 276780), product of:
      1.0 = tf(termFreq(description:warwick)=1)
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      2.0 = fieldNorm(field=description, doc=276780)
  0.832554 = (MATCH) weight(keywords:warwick in 276780), product of:
    0.066450186 = queryWeight(keywords:warwick), product of:
      6.264497 = idf(docFreq=5028, numDocs=843621)
      0.010607426 = queryNorm
    12.528994 = (MATCH) fieldWeight(keywords:warwick in 276780), product of:
      1.0 = tf(termFreq(keywords:warwick)=1)
      6.264497 = idf(docFreq=5028, numDocs=843621)
      2.0 = fieldNorm(field=keywords, doc=276780)
  0.19180772 = (MATCH) weight(url:warwick in 276780), product of:
    0.048220757 = queryWeight(url:warwick), product of:
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.010607426 = queryNorm
    3.9777002 = (MATCH) fieldWeight(url:warwick in 276780), product of:
      1.0 = tf(termFreq(url:warwick)=1)
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.875 = fieldNorm(field=url, doc=276780)
  0.023709858 = (MATCH) weight(content:warwick in 276780), product of:
    0.03373665 = queryWeight(content:warwick), product of:
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.010607426 = queryNorm
    0.7027923 = (MATCH) fieldWeight(content:warwick in 276780), product of:
      1.4142135 = tf(termFreq(content:warwick)=2)
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.15625 = fieldNorm(field=content, doc=276780)
  0.46163678 = (MATCH) weight(siteDescription:warwick in 276780), product of:
    0.0494812 = queryWeight(siteDescription:warwick), product of:
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      0.010607426 = queryNorm
    9.329539 = (MATCH) fieldWeight(siteDescription:warwick in 276780), product of:
      1.0 = tf(termFreq(siteDescription:warwick)=1)
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      2.0 = fieldNorm(field=siteDescription, doc=276780)
  0.96127754 = (MATCH) weight(siteUrl:warwick in 276780), product of:
    0.10097861 = queryWeight(siteUrl:warwick), product of:
      9.519615 = idf(docFreq=193, numDocs=843621)
      0.010607426 = queryNorm
    9.519615 = (MATCH) fieldWeight(siteUrl:warwick in 276780), product of:
      1.0 = tf(termFreq(siteUrl:warwick)=1)
      9.519615 = idf(docFreq=193, numDocs=843621)
      1.0 = fieldNorm(field=siteUrl, doc=276780)
  0.62917286 = (MATCH) weight(title:warwick in 276780), product of:
    0.05776636 = queryWeight(title:warwick), product of:
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      0.010607426 = queryNorm
    10.891683 = (MATCH) fieldWeight(title:warwick in 276780), product of:
      1.0 = tf(termFreq(title:warwick)=1)
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      2.0 = fieldNorm(field=title, doc=276780)
  0.57006776 = (MATCH) weight(second_title:warwick in 276780), product of:
    0.05498614 = queryWeight(second_title:warwick), product of:
      5.18374 = idf(docFreq=14819, numDocs=843621)
      0.010607426 = queryNorm
    10.36748 = (MATCH) fieldWeight(second_title:warwick in 276780), product of:
      1.0 = tf(termFreq(second_title:warwick)=1)
      5.18374 = idf(docFreq=14819, numDocs=843621)
      2.0 = fieldNorm(field=second_title, doc=276780)

(Sorry, I only had a large index to hand to pull an example from, not a simple one!)
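As a reading aid, the arithmetic in the explain output above can be checked by hand. The Python sketch below copies the numbers for the content:warwick clause from that output and recomputes each "product of" line, then adds up the eight per-field clause weights. The variable names are made up for illustration. The only formula assumed beyond what the output states is tf = sqrt(termFreq), which matches the 1.4142135 shown for termFreq=2.

```python
import math

# Values copied from the content:warwick clause of the explain output.
term_freq = 2
idf = 3.1804748
field_norm = 0.15625
query_norm = 0.010607426

tf = math.sqrt(term_freq)                    # shown as 1.4142135
query_weight = idf * query_norm              # shown as 0.03373665
field_weight = tf * idf * field_norm         # shown as 0.7027923
clause_score = query_weight * field_weight   # shown as 0.023709858

# The top-level value is the sum of the eight per-field clause weights.
clause_weights = [
    0.59024054,   # description
    0.832554,     # keywords
    0.19180772,   # url
    0.023709858,  # content
    0.46163678,   # siteDescription
    0.96127754,   # siteUrl
    0.62917286,   # title
    0.57006776,   # second_title
]
total = sum(clause_weights)                  # shown as 4.260467

print(clause_score, total)
```

Reading the tree this way shows which factor dominates a clause. Here the small fieldNorm (0.15625) on the long content field is what keeps that clause's contribution low, even though the term occurs twice.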

0

I would change the query as follows.

(my AND name) OR "my name" 

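Here, the additional phrase query adds to the score whenever there is a phrase match. If a document contains the words but not as the exact phrase "my name", the phrase query contributes no extra score. A document whose content contains the phrase "my name" gets the extra score and appears at the top.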

Here, I am assuming that length normalization is ignored.
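The effect of adding the phrase clause can be sketched with a toy model in Python. This is not Lucene's scoring formula: it assumes a flat 1.0 for each matching clause and, like the answer, ignores length normalization. The function names, the phrase_bonus parameter, and the sample documents are all hypothetical.

```python
def contains_phrase(tokens, phrase):
    """True if the phrase tokens occur adjacent and in order."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def toy_score(tokens, terms, phrase_bonus=1.0):
    """Toy stand-in for (my AND name) OR "my name"; flat per-clause scores."""
    score = 0.0
    if all(t in tokens for t in terms):   # the (my AND name) clause
        score += 1.0
    if contains_phrase(tokens, terms):    # the "my name" phrase clause
        score += phrase_bonus
    return score

terms = ["my", "name"]
docs = {
    "split": ["my", "full", "name"],   # hypothetical: both words, no phrase
    "exact": ["my", "name"],           # contains the exact phrase
}

# Rank by descending toy score.
ranked = sorted(docs, key=lambda d: toy_score(docs[d], terms), reverse=True)
print(ranked)
```

Both documents still match through the AND clause, so nothing drops out of the results. The phrase clause only adds to the score of the document that contains the exact phrase.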