SolR: Faceting on a TextField

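I'm working with a SolR cloud 6.5.0 installation. My goal is to retrieve all terms that co-occur with my search terms, rank them by count, and take the top N terms. To do so, I have defined a field of type text_en_facets, a text field with a PatternTokenizer and a few other components (full definition at the end of the post).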

My instance now holds quite a large amount of data: the field contains 1.3M unique terms and, as a result, I get the following error:

o.a.s.s.FastLRUCache Error during auto-warming of key:payload_en_facets:org.apache.solr.common.SolrException: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field…

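I've noticed that other people had the same issue, and I would like to know whether there is any news on best practices and/or ways to get around this limitation. It would be great if I didn't need to reindex my data or analyze my documents manually in order to use StrFields.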

I've already tried different settings for facet.method, facet.limit and facet.mincount, but that didn't solve the problem. Any other ideas?

<fieldType name="text_en_facets" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer> 
     <!-- recognises e-mail addresses, urls, #-tags and @-mentions, alphanumeric words (possibly containing inner periods) --> 
     <tokenizer class="solr.PatternTokenizerFactory" 
        pattern="(?U)([\w-\.]+@[\w-\.]+)|(https?:\S+)|((\s|^)[@#]\w+)|(\w+(\.\w+)?)" group="0"/>
     <!-- there might be tokens containing trailing/leading white spaces --> 
     <filter class="solr.TrimFilterFactory"/> 
     <filter class="solr.LowerCaseFilterFactory"/> 
     <filter class="solr.StopFilterFactory" format="snowball" 
       words="stopwords/stopwords_en.txt,stopwords/stopwords_en_nltk.txt,stopwords/stopwords_en_twitter.txt" 
       ignoreCase="true"/> 
     <!-- kills urls --> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="(?U)https?:\S+" replacement=""/> 
     <!-- kills numbers --> 
     <filter class="solr.PatternReplaceFilterFactory" pattern="(?U)^[0-9.,']+$" replacement=""/> 
     <!-- kills meaningless tokens --> 
     <filter class="solr.LengthFilterFactory" min="2" max="1024"/> 
    </analyzer> 
</fieldType>


2017-05-09 Alberto

Did you try the patch from the answer? – MatsLindh

Hi @MatsLindh, not yet – Alberto

This is a limitation of the internal structure used when faceting on text fields.

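It should be possible to circumvent it with facet.method=enum, although that will be very slow in this case.
You could try splitting the index into many shards, but whether that works depends on the term distribution in your index. It would also probably lower performance.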
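
For illustration, the facet.method=enum workaround could be configured as request handler defaults in solrconfig.xml. This is only a sketch under assumptions: the handler name /cooccurrence is hypothetical, the field name payload_en_facets is taken from the error message in the question, and the values for facet.limit and facet.enum.cache.minDf are illustrative rather than tuned.

```xml
<!-- Hypothetical handler name; any SearchHandler could carry these defaults -->
<requestHandler name="/cooccurrence" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <!-- Field name assumed from the error message in the question -->
    <str name="facet.field">payload_en_facets</str>
    <!-- Use term enumeration for this field instead of the UnInvertedField structure -->
    <str name="f.payload_en_facets.facet.method">enum</str>
    <!-- Illustrative: return the top 20 terms that occur at least once -->
    <int name="facet.limit">20</int>
    <int name="facet.mincount">1</int>
    <!-- Illustrative: only use the filterCache for terms found in at least 100 documents -->
    <int name="facet.enum.cache.minDf">100</int>
  </lst>
</requestHandler>
```

The same parameters can be passed on the query string instead. With 1.3M unique terms, enum has to step through every term, so raising facet.enum.cache.minDf is one way to keep rare terms from flooding the filterCache, but the request is still likely to be slow.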

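I have located the problem and written a patch (code at https://github.com/tokee/lucene-solr/tree/uninvert-optimize), but that does not help you right now. I am working on getting it into Solr, so watch https://issues.apache.org/jira/browse/SOLR-11240 for updates.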

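Update 20170824: @Alberto I have added the patch to Solr, but because of timing it will not be part of the upcoming 6.6.1 and 7.0 releases. If you need it now, I am fairly confident that the SOLR-11240 patch applies cleanly to the Solr 6.5+ source code.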

Update 20171017: @Alberto The fix is part of Solr 7.1, which was released earlier today. If you are willing to upgrade, this should solve your problem.


2017-08-15 11:33:24

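Sorry, @Toke Eskildsen, I have only just seen your update. For now, my team has decided to create a new collection every month. I'll take a look at the patch ;) – Alberto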
