Finding and ranking multiple phrase matches in Lucene-indexed documents

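Given a collection of documents containing text, I want to search for a phrase, return every match, and rank the matches. I know how to get Lucene/Solr to tell me which documents match and to highlight the matches within a document, but how do I get a ranking that includes multiple matches from the same document?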

First document. It has a single line of text.

Second document. This text line is quite short. 
This is another line containing more text and is a bit longer.

If I search for "text line", I would like it to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short. 
1st document -> ...It has a single "line of text". 
2nd document -> ...another "line containing more text" and is...
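To make the desired ranking concrete, here is a minimal sketch in plain Python. It does not use Lucene or Solr and is not their scoring. It assumes a match is the smallest run of words containing every query term, and that tighter runs rank higher. The names `minimal_windows` and `rank_matches` are hypothetical.

```python
import re

def minimal_windows(words, terms):
    """Yield (start, end) word spans that contain every term and no smaller such span."""
    last_seen = {}       # term -> most recent position of that term
    used_starts = set()  # a start position is only reported with its earliest end
    for pos, word in enumerate(words):
        if word not in terms:
            continue
        last_seen[word] = pos
        if len(last_seen) == len(terms):
            start = min(last_seen.values())
            if start not in used_starts:
                used_starts.add(start)
                yield start, pos

def rank_matches(docs, query):
    """Return (doc_id, snippet) pairs, tightest match first (assumed ranking rule)."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in docs.items():
        original = re.findall(r"\w+", text)
        lowered = [w.lower() for w in original]
        for start, end in minimal_windows(lowered, terms):
            snippet = " ".join(original[start:end + 1])
            hits.append((end - start + 1, doc_id, snippet))
    hits.sort(key=lambda hit: hit[0])  # stable sort by span width only
    return [(doc_id, snippet) for _, doc_id, snippet in hits]

docs = {
    "1st document": "First document. It has a single line of text.",
    "2nd document": "Second document. This text line is quite short. "
                    "This is another line containing more text and is a bit longer.",
}
results = rank_matches(docs, "text line")
for doc_id, snippet in results:
    print(doc_id, "->", snippet)
```

On the two example documents this gives the three matches in the order listed above, with the second document appearing twice.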

Is this possible? If so, how?


2012-01-17 Chris Leishman

I originally asked a more complex question that includes this one, here: http://stackoverflow.com/questions/8883390/obtain-metadata-associated-with-matched-content-in-solr-lucene – 2012-01-17 13:40:02

Why does document 2 appear twice in the results? Maybe you should index each line as a document... – naresh 2012-01-18 09:44:02

That's what I'm saying: if you want matches per line, index each line as a document. – milan 2012-01-18 10:24:19

-1

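If you want one match per line, make each line its own document. Don't let the term "document" confuse you into thinking it has to correspond to a single file.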

If you want to keep a link back to the file, just index its id as well, in a separate (stored) field.

{ id: "myfile.txt", 
    text: "first line" } 

{ id: "myfile.txt", 
    text: "second line" }
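As a rough illustration of the split described above, the following plain-Python sketch turns one file's text into one record per line, each carrying the file id. It only builds the records and does not call Lucene or Solr. The function name `line_documents` and the extra `line_no` field are assumptions for illustration and are not part of the answer.

```python
def line_documents(file_id, text):
    """Build one record per non-empty line, each keeping a link back to the file."""
    return [
        # line_no is a hypothetical extra field; the answer only shows id and text
        {"id": file_id, "line_no": number, "text": line.strip()}
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip()
    ]

records = line_documents("myfile.txt", "first line\nsecond line\n")
for record in records:
    print(record)
```

Each record would then be indexed as its own document, with `id` kept as a stored field so a hit can be traced back to the file.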


2012-01-17 19:14:09 Xodarap

I'm not talking about files - I'm talking about Lucene documents. – 2012-02-22 04:54:14

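Making each line its own document doesn't work because I actually want to be able to search for phrases that may span multiple lines. That isn't possible if each line is a separate Lucene document. – 2012-02-22 04:55:21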
