2017-06-21

How to resolve a MAX_ARRAY_LENGTH error when indexing data in Solr?

I am following the Cloudera Search indexing documentation below:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_data_index_prepare.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_batch_index_use_mapreduce.html

I have prepared the collection, the schema file and the morphline file according to my dataset, which is in CSV format:

id,jobtitle,jobdescription,city,state,classification,salary
1,Senior Android Developer,complex problem solving,New Hope,PA,it,94036
2,Mobile Solutions Developer,complex problem solving,Glen Allen,VA,it,60726

The MRIT (MapReduceIndexerTool) command I am using is:

sudo -u hdfs hadoop \
--config /etc/hadoop/conf.cloudera.yarn \
jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D 'mapred.child.java.opts=-Xmx500m' \
--log4j /opt/cloudera/parcels/CDH/share/doc/search-1.0.0+cdh5.8.3+0/examples/solr-nrt/log4j.properties \
--morphline-file $HOME/jobs.conf \
--output-dir NN:8020/user/$USER/outdir \
--zk-host localhost/solr --collection jobs \
--go-live \
NN:8020/user/$USER/indir

Below is my schema file:

<?xml version="1.0" encoding="UTF-8" ?> 
<schema name="example" version="1.5"> 
<fields> 

    <!-- Posts --> 
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="jobtitle" type="text_general" indexed="true" stored="true"/>
    <field name="jobdescription" type="text_general" indexed="true" stored="true" termVectors="true"/>
    <field name="classification" type="splitOnPeriod" indexed="true" stored="true"/>
    <field name="city" type="text_general" indexed="true" stored="true"/>
    <field name="state" type="text_general" indexed="true" stored="true"/>
    <field name="salary" type="int" indexed="true" stored="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <field name="text" type="text_general" indexed="false" stored="true" multiValued="true"/>

    <copyField source="jobtitle" dest="content" /> 
    <copyField source="jobdescription" dest="content" /> 
</fields> 

<types> 
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" /> 
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="splitOnPeriod" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\." />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
</types> 

<uniqueKey>id</uniqueKey> 

</schema> 

I did a dry run and it worked, but with the live run (--go-live) I always get the MAX_ARRAY_LENGTH error:

1554 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 2 reducers
Error: MAX_ARRAY_LENGTH

The error seems to occur in the map phase:

1686 [main] ERROR org.apache.hadoop.mapred.YarnChild - Error running child : java.lang.NoSuchFieldError: MAX_ARRAY_LENGTH
    at org.apache.lucene.codecs.memory.DirectDocValuesFormat.<clinit>(DirectDocValuesFormat.java:58)

Please help me solve this problem.

Answer


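This error usually occurs when something in your environment is not installed correctly. java.lang.NoSuchFieldError: MAX_ARRAY_LENGTH means that the org/apache/lucene/util/ArrayUtil.class loaded at runtime does not contain the MAX_ARRAY_LENGTH field that DirectDocValuesFormat expects, i.e. the Lucene classes on the task's classpath do not match each other. You can upgrade your CDH to get rid of the error. I tried the same MRIT command in a different environment and it worked fine.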

Reference: https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/util/ArrayUtil.html
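A java.lang.NoSuchFieldError is thrown when code compiled against one version of a class runs against a different version of that class that lacks the field, which typically happens when two versions of the same library end up on the classpath. A minimal bash sketch for checking that, assuming you feed it a colon-separated classpath such as the output of `hadoop classpath`. The function names are hypothetical and not part of CDH, Hadoop or Solr, and the check only looks at jar file names starting with `lucene-core-`, so a Lucene copy bundled inside another jar will not be detected.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic sketch (not part of CDH, Hadoop or Solr): list the
# lucene-core jars reachable from a colon-separated classpath and warn when
# more than one version is present.

# Print every existing lucene-core jar named in the classpath given as $1.
# Entries of the form /some/dir/* are expanded to the matching jars in /some/dir.
find_lucene_core_jars() {
  local entry f
  local -a entries=()
  # Split on ':' without letting the shell glob-expand the entries.
  IFS=':' read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    case "$entry" in
      */\*)
        # Wildcard directory entry: look for lucene-core jars inside it.
        for f in "${entry%\*}"lucene-core-*.jar; do
          if [ -e "$f" ]; then printf '%s\n' "$f"; fi
        done
        ;;
      *lucene-core-*.jar)
        # Explicit jar entry: report it only if the file exists.
        if [ -e "$entry" ]; then printf '%s\n' "$entry"; fi
        ;;
    esac
  done
}

# Print the distinct lucene-core versions (taken from the jar file names)
# and a warning when there is more than one.
report_lucene_versions() {
  local versions count
  versions=$(find_lucene_core_jars "$1" \
    | sed 's#.*/##' \
    | sed -n 's#^lucene-core-\(.*\)\.jar$#\1#p' \
    | sort -u)
  if [ -z "$versions" ]; then
    echo "no lucene-core jar found on the classpath"
    return 0
  fi
  echo "lucene-core versions on the classpath:"
  printf '%s\n' "$versions" | sed 's/^/  /'
  count=$(printf '%s\n' "$versions" | awk 'END { print NR }')
  if [ "$count" -gt 1 ]; then
    echo "WARNING: $count different lucene-core versions found; classes such as org.apache.lucene.util.ArrayUtil may be loaded from the wrong one"
  fi
}
```

For example, source the functions and run `report_lucene_versions "$(hadoop classpath)"` on a node where the task failed. If it lists more than one version, that mismatch is a likely candidate for the NoSuchFieldError; if it lists only one, the conflicting copy may be bundled inside another jar, which this name-based check cannot see.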
