2017-02-18 69 views

How does Solr collation work?

I followed the spellcheck example from the Solr documentation.

I used the following configuration:

<!-- a spellchecker built from a field of the main index --> 
<lst name="spellchecker"> 
    <str name="name">default</str> 
    <str name="field">name_spell</str> 
    <str name="classname">solr.DirectSolrSpellChecker</str> 
    <!-- the spellcheck distance measure used, the default is the internal levenshtein --> 
    <str name="distanceMeasure">internal</str> 
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion --> 
    <float name="accuracy">0.5</float> 
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 --> 
    <int name="maxEdits">2</int> 
    <!-- the minimum shared prefix when enumerating terms --> 
    <int name="minPrefix">1</int> 
    <!-- maximum number of inspections per result. --> 
    <int name="maxInspections">5</int> 
    <!-- minimum length of a query term to be considered for correction --> 
    <int name="minQueryLength">4</int> 
    <!-- maximum threshold of documents a query term can appear to be considered for correction --> 
    <float name="maxQueryFrequency">0.01</float> 
    <!-- uncomment this to require suggestions to occur in 1% of the documents --> 
    <!-- <float name="thresholdTokenFrequency">.01</float> --> 

</lst> 
<lst name="spellchecker"> 
    <str name="name">wordbreak</str> 
    <str name="classname">solr.WordBreakSolrSpellChecker</str>  
    <str name="field">name_spell</str> 
    <str name="combineWords">true</str> 
    <str name="breakWords">true</str> 
    <int name="maxChanges">10</int> 
</lst> 
</searchComponent> 

Request handler:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy"> 
    <lst name="defaults"> 
     <str name="spellcheck.dictionary">default</str> 
     <str name="spellcheck.dictionary">wordbreak</str> 
     <str name="spellcheck">on</str> 
     <str name="spellcheck.extendedResults">true</str>  
     <str name="spellcheck.count">10</str> 
     <str name="spellcheck.alternativeTermCount">5</str> 
     <str name="spellcheck.maxResultsForSuggest">5</str>  
     <str name="spellcheck.collate">true</str> 
     <str name="spellcheck.collateExtendedResults">true</str> 
     <str name="spellcheck.maxCollationTries">10</str> 
     <str name="spellcheck.maxCollations">5</str>   
    </lst> 
    <arr name="last-components"> 
     <str>spellcheck_new</str> 
    </arr> 
    </requestHandler> 

Schema fields:

<field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" /> 
    <field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/> 
    <copyField source="attribute_key" dest="spell_check_field" /> 
    <field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/> 
    <copyField source="attribute_key" dest="name_spell" /> 
    <field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/> 
    <copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/> 
    <field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" /> 
    <defaultSearchField>attribute_key</defaultSearchField> 

The suggestions work perfectly, but the collations array is empty for every query.

Example query:

http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true 

Result:

{ 
"responseHeader": { 
"zkConnected": true, 
"status": 0, 
"QTime": 60 
}, 
"response": { 
"numFound": 0, 
"start": 0, 
"docs": [] 
}, 
"spellcheck": { 
"suggestions": [ 
"nike", 
{ 
"numFound": 6, 
"startOffset": 0, 
"endOffset": 4, 
"origFreq": 2, 
"suggestion": [ 
{ 
"word": "n i k e", 
"freq": 19 
}, 
{ 
"word": "nine", 
"freq": 1 
}, 
{ 
"word": "none", 
"freq": 29 
}, 
{ 
"word": "note", 
"freq": 5 
}, 
{ 
"word": "nicka", 
"freq": 2 
}, 
{ 
"word": "nino", 
"freq": 2 
} 
] 
}, 
"shoes", 
{ 
"numFound": 10, 
"startOffset": 5, 
"endOffset": 10, 
"origFreq": 0, 
"suggestion": [ 
{ 
"word": "shoe", 
"freq": 30 
}, 
{ 
"word": "shoe s", 
"freq": 30 
}, 
{ 
"word": "short", 
"freq": 30 
}, 
{ 
"word": "s h o e s", 
"freq": 4 
}, 
{ 
"word": "sheer", 
"freq": 15 
}, 
{ 
"word": "sheen", 
"freq": 4 
}, 
{ 
"word": "sheet", 
"freq": 3 
}, 
{ 
"word": "shower", 
"freq": 2 
}, 
{ 
"word": "shock", 
"freq": 1 
}, 
{ 
"word": "shred", 
"freq": 1 
} 
] 
} 
], 
"correctlySpelled": false, 
"collations": [] 
} 
} 

How do I get collations to be returned?

Have you solved this? I am facing the same problem: collations is always empty and correctlySpelled is always false. – userab

Answers

0

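Let's first look at how the documentation defines SpellCheck Collate:

Causes Solr to build a new query based on the best suggestion for each term in the submitted query.

Long story short, when you specify spellcheck.collate=true, you are asking Solr to suggest a new query that you could re-execute and that should work better than piecing together the individual suggestions you received. Let me show you a couple of examples.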

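  • Let's say you want to search for

initial audit

  • but for whatever reason it was typed as

initila audti

  • With collation set to false, you get the following spellcheck suggestions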

<lst name="suggestions"> 
     <lst name="initila"> 
      <int name="numFound">5</int> 
      <int name="startOffset">1</int> 
      <int name="endOffset">8</int> 
      <arr name="suggestion"> 
       <str>initial</str> 
       <str>initi la</str> 
       <str>initiala</str> 
       <str>ini tila</str> 
       <str>initilal</str> 
      </arr> 
     </lst> 
     <lst name="audt"> 
      <int name="numFound">4</int> 
      <int name="startOffset">9</int> 
      <int name="endOffset">13</int> 
      <arr name="suggestion"> 
       <str>aud t</str> 
       <str>audit</str> 
       <str>au dt</str> 
       <str>audi</str> 
      </arr> 
     </lst> 
    </lst> 

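This means you get several suggestions for each word.

  • But if you turn collation on, you also get the most likely suggestion, if there is one, for the query that should be executed. It is not guaranteed to be the best one, though; think of it as a good guess that may help you.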

    <lst name="suggestions"> 
        <lst name="initila"> 
         <int name="numFound">5</int> 
         <int name="startOffset">1</int> 
         <int name="endOffset">8</int> 
         <arr name="suggestion"> 
          <str>initial</str> 
          <str>initi la</str> 
          <str>initiala</str> 
          <str>ini tila</str> 
          <str>initilal</str> 
         </arr> 
        </lst> 
        <lst name="audti"> 
         <int name="numFound">5</int> 
         <int name="startOffset">9</int> 
         <int name="endOffset">14</int> 
         <arr name="suggestion"> 
          <str>audit</str> 
          <str>audt i</str> 
          <str>auditi</str> 
          <str>au dti</str> 
          <str>audtis</str> 
         </arr> 
        </lst> 
        <lst name="collation"> 
         <str name="collationQuery">initial audit</str> 
         <int name="hits">1983</int> 
         <lst name="misspellingsAndCorrections"> 
          <str name="initila">initial</str> 
          <str name="audti">audit</str> 
         </lst> 
        </lst> 
    </lst> 
    

In this case, the recommended query would be

initial audit

which is taken from here:

<str name="collationQuery">initial audit</str> 

A collation is returned only if the recommended query would actually find what you are looking for in the index.
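As a rough sketch of the request parameters involved, the fragment below repeats the collation-related defaults from the handler in the question and annotates each one. The values are the ones from the question, not recommendations, and the comments reflect my reading of the Solr spellcheck documentation rather than anything verified against this particular index.

```xml
<!-- Sketch only: collation-related parameters inside a handler's defaults list -->
<lst name="defaults">
  <str name="spellcheck">on</str>
  <!-- ask Solr to build a corrected query from the best suggestion for each term -->
  <str name="spellcheck.collate">true</str>
  <!-- include hits and misspellingsAndCorrections with each collation -->
  <str name="spellcheck.collateExtendedResults">true</str>
  <!-- test up to 10 candidate collations against the index before giving up -->
  <str name="spellcheck.maxCollationTries">10</str>
  <!-- return at most 5 collations -->
  <str name="spellcheck.maxCollations">5</str>
</lst>
```

With spellcheck.maxCollationTries greater than zero, candidate collations are run against the index, so a candidate that matches nothing is dropped rather than returned.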

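You have explained how collation works, but could you also look at the actual question, i.e. "the collations array is empty for every query"? Why is the collations array always empty? – userab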

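One possibility is that the dictionary has not been built, but more likely the terms being searched have not reached the threshold required for suggestions to be returned. Take a look at this other post: https://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results – xmorera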

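I have built the dictionary, and the threshold is low as well. You can check my other answer. When no default field is specified, collation works with q but not with spellcheck.q. I am not sure why it behaves that way. – userab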

0

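Either of the following approaches solved the problem for me:

  1. Add a default field to the requestHandler as a child of the defaults list, i.e. <str name="df">name_spell</str>. Executing your query will now return collations results. Either q or spellcheck.q can be used here.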

OR

  2. Use q instead of spellcheck.q and name the field in q, i.e. instead of spellcheck.q=nike%20shoes use q=name_spell:(nike%20shoes), and it will return collations results.
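For the first option, the handler from the question might look roughly like this once the default field is added. This is a trimmed sketch: only the df line is new, the other entries are copied from the question, and the spellcheck.* defaults not shown are assumed to stay as they were.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- new: default field used when the query does not name one -->
    <str name="df">name_spell</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <!-- remaining spellcheck.* defaults unchanged -->
  </lst>
  <arr name="last-components">
    <str>spellcheck_new</str>
  </arr>
</requestHandler>
```

After reloading the core, the original request with spellcheck.q=nike%20shoes can be sent unchanged; in my case that was enough for collations to be populated.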