2012-07-11 104 views
1

How do I use the Highlighter in PyLucene?

I've read a few tutorials on highlighting search terms in Lucene and came up with a piece of code like this:

(...) 
query = parser.parse(query_string) 

for scoreDoc in searcher.search(query, 50).scoreDocs: 
    doc = searcher.doc(scoreDoc.doc) 
    filename = doc.get("filename") 
    print filename 
    found_paragraph = fetch_from_my_text_library(filename) 

    stream = lucene.TokenSources.getTokenStream("contents", found_paragraph, analyzer) 
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream)) 
    highlighter = lucene.Highlighter(scorer) 
    fragment = highlighter.getBestFragment(analyzer, "contents", found_paragraph) 
    print '>>>' + fragment 

But it all ends with an error:

Traceback (most recent call last): 
    File "./search.py", line 76, in <module> 
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream)) 
NotImplementedError: ('instantiating java class', <type 'Scorer'>) 

So I guess this part of Lucene isn't implemented in PyLucene. Is there another way to do this?

Answers

4

I ran into a similar error. I think the wrapper for this class is not yet implemented in PyLucene v3.6.

You might want to try the following:

analyzer = StandardAnalyzer(Version.LUCENE_CURRENT) 

# Constructs a query parser. 
queryParser = QueryParser(Version.LUCENE_CURRENT, FIELD_CONTENTS, analyzer) 

# Create a query 
query = queryParser.parse(QUERY_STRING) 

topDocs = searcher.search(query, 50) 

# Get top hits 
scoreDocs = topDocs.scoreDocs 
print "%s total matching documents." % topDocs.totalHits 

formatter = SimpleHTMLFormatter() 
highlighter = Highlighter(formatter, QueryScorer(query)) 

for scoreDoc in scoreDocs: 
    doc = searcher.doc(scoreDoc.doc) 
    text = doc.get(FIELD_CONTENTS) 
    ts = analyzer.tokenStream(FIELD_CONTENTS, StringReader(text)) 
    print doc.get(FIELD_PATH) 
    print highlighter.getBestFragments(ts, text, 3, "...") 
    print "" 

Note that we create a new token stream for each item in the search results.
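To show what that per-result step amounts to, here is a toy pure-Python sketch. It is not Lucene's algorithm and uses no PyLucene calls: `highlight_terms` and its parameters are hypothetical names made up for illustration, and the `<B>` tags are chosen to resemble the default output of `SimpleHTMLFormatter`. Each hit's text is tokenized from scratch, which loosely mirrors why a fresh token stream is built for every document.

```python
import re


def highlight_terms(text, terms, pre="<B>", post="</B>"):
    """Wrap whole-word, case-insensitive matches of the query terms in tags.

    Hypothetical helper for illustration only; not part of Lucene or PyLucene.
    """
    wanted = set(t.lower() for t in terms)

    def mark(match):
        word = match.group(0)
        # Only tokens that equal a query term are wrapped.
        return pre + word + post if word.lower() in wanted else word

    # re.sub visits the text token by token, starting over for every call,
    # much like consuming a newly created token stream for each hit.
    return re.sub(r"\w+", mark, text)


hits = ["Lucene is a search library.", "PyLucene wraps Lucene for Python."]
for text in hits:
    print(highlight_terms(text, ["lucene"]))
```

In this sketch "PyLucene" is left alone because it is a single token that does not equal the query term. A real analyzer may tokenize differently.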

+1

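Thanks! It seems the most important part here is creating a 'QueryScorer' instead of a 'Scorer'. Now that I've looked it up in the Lucene documentation, I see that 'Scorer' is an abstract class, which is why the error occurred. And the name 'NotImplementedError' is rather misleading here... – mik01aj 2012-09-26 18:10:21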
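The behaviour described in this comment can be imitated in plain Python. The sketch below is only an analogy, not how the PyLucene bindings are actually implemented: `AbstractScorer`, `ConcreteQueryScorer`, and `try_create` are hypothetical stand-ins. The base class refuses direct instantiation with a `NotImplementedError`, while a concrete subclass can be created normally.

```python
class AbstractScorer(object):
    """Hypothetical stand-in for a wrapped abstract Java class."""

    def __init__(self):
        if type(self) is AbstractScorer:
            # Imitates the message from the traceback in the question.
            raise NotImplementedError(
                ("instantiating java class", type(self).__name__))


class ConcreteQueryScorer(AbstractScorer):
    """Hypothetical stand-in for a concrete class such as QueryScorer."""

    def __init__(self, query):
        super(ConcreteQueryScorer, self).__init__()
        self.query = query


def try_create(cls, *args):
    """Return an instance, or None if the class cannot be instantiated."""
    try:
        return cls(*args)
    except NotImplementedError:
        return None


print(try_create(AbstractScorer))
print(try_create(ConcreteQueryScorer, "contents:lucene").query)
```

Seen this way, the error means "this class cannot be constructed directly" rather than "this feature is missing from the bindings".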

+0

The code works well. One thing to mention: StringReader is imported from java.io, not from lucene. – vancexu 2014-04-22 05:24:43