How to index large content in Solr 5.2.1?

Some of our content is larger than 32 KB, and we cannot index it. How can we index large content in Solr 5.2.1?

Please see the logs below.

Rails log:

RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request Error: {'responseHeader'=>{'status'=>400,'QTime'=>13},'error'=>{'msg'=>'Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.','code'=>400}}

Solr log:

INFO - 2015-11-04 15:00:30.772; [ collection] org.apache.solr.update.processor.LogUpdateProcessor; [collection] webapp=/solr path=/update params={wt=ruby} {} 0 27 

ERROR - 2015-11-04 15:00:30.779; [ collection] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.

...

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content_textv" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32, 77, 101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32, 69, 110, 116, 101]...'
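The "prefix of the first immense term" in the exception is printed as a list of byte values, which makes it hard to see what Solr actually tried to index. As a minimal sketch (the variable names are only illustrative), the bytes can be decoded in Ruby to show that the term begins with the raw HTML of the article body, which suggests the whole field value was treated as one term:

```ruby
# Byte values copied from the "prefix of the first immense term" in the Solr log.
prefix_bytes = [60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32, 77,
                101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32, 69, 110,
                116, 101]

# Pack the integers as unsigned bytes and label the result as UTF-8.
decoded_prefix = prefix_bytes.pack('C*').force_encoding('UTF-8')

puts decoded_prefix
# prints: <p>Small and Medium Scale Ente
```

The log truncates the prefix after 30 bytes, so the decoded text stops mid-word.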

Content field type:

<field name="content_textv" type="strings"/>

....

<fieldType name="strings" class="solr.StrField" multiValued="true" sortMissingLast="true"/>
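Because a solr.StrField value is indexed as a single term, any value whose UTF-8 encoding is longer than the 32766 bytes quoted in the error will be rejected. The following is a rough sketch of how such values could be spotted on the Rails side before they are sent to Solr; the helper name `oversized_values` and the sample strings are hypothetical, not part of RSolr or the original application:

```ruby
# Per-term limit in bytes, as quoted in the Solr error message.
MAX_TERM_BYTES = 32_766

# Returns the values that would exceed the limit if indexed as one
# untokenized term. Accepts a single string or an array, since the
# field is multiValued.
def oversized_values(values)
  Array(values).select { |v| v.to_s.bytesize > MAX_TERM_BYTES }
end

# Hypothetical sample data: one short value and one far above the limit.
short_body = '<p>short</p>'
long_body  = '<p>' + ('a' * 40_000) + '</p>'

too_big = oversized_values([short_body, long_body])
puts too_big.length
```

Note that the check uses `bytesize` rather than `length`, since the limit applies to encoded bytes and multi-byte characters count more than once.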

How can we index content this large?

Source

2015-11-04 VtrKanna

Can you provide the field type definition for 'content_textv' and some sample data? – YoungHobbit

content_textv is a string field, @YoungHobbit – VtrKanna

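Use solr.TextField instead of solr.StrField. A StrField indexes each value as a single untokenized term, which is why a value longer than 32766 bytes is rejected; a TextField is split into tokens by its analyzer. Create a new field type like this: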

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Then you can use that field type for the field:

<field name="content_textv" type="text_general" indexed="true" stored="false" multiValued="true"/>
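To illustrate why a tokenized field avoids the limit, here is a small Ruby sketch that compares the byte size of a whole value with the byte size of its longest token. The `rough_tokens` helper is a crude stand-in written for this example; it is not how solr.StandardTokenizerFactory works internally, and the sample text is hypothetical:

```ruby
# Per-term limit in bytes, as quoted in the Solr error message.
TERM_LIMIT_BYTES = 32_766

# Crude stand-in for an analyzer: lowercase, then split on runs of
# non-alphanumeric characters. NOT the real StandardTokenizer; it only
# shows the effect of breaking a value into many small terms.
def rough_tokens(text)
  text.downcase.split(/[^[:alnum:]]+/).reject(&:empty?)
end

# Hypothetical large HTML value, well over the limit as a single term.
content = '<p>' + ('lorem ipsum dolor sit amet ' * 3_000) + '</p>'

whole_value_bytes   = content.bytesize
longest_token_bytes = rough_tokens(content).map(&:bytesize).max

puts whole_value_bytes > TERM_LIMIT_BYTES    # true for this sample
puts longest_token_bytes > TERM_LIMIT_BYTES  # false for this sample
```

With a string field the whole value is the term, so the first comparison is the one that matters; with a text field each token is its own term, so only a single unbroken run of more than 32766 bytes would still hit the limit.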

Source

2015-11-05 10:19:38

Thanks @Bhagwat Mane, we just changed the field type from string to text_general and it works for us. Thank you very much. – VtrKanna
