In Solr, why is 'built' not stemmed to 'build', while 'building' is?

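I am trying to figure out two things in this post:

Why is 'built' not stemmed to 'build', even though the field type definition has a stemmer configured? 'building', however, is stemmed to 'build'.
How can I use Luke to inspect the index and see which words were stemmed, and to what? I cannot see 'building' stemmed to 'build' in Luke. I know Lucene is stemming it, because searching for 'build' successfully retrieves the 'building' row.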

This link was quite helpful, but it did not answer my question.

For reference, here is the relevant part of schema.xml:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"> 
    <analyzer type="index"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <!-- in this example, we will only use synonyms at query time 
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/> 
    --> 
    <!-- Case insensitive stop word removal. 
     add enablePositionIncrements=true in both the index and query 
     analyzers to leave a 'gap' for more accurate phrase queries. 
    --> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory: 
    <filter class="solr.EnglishMinimalStemFilterFactory"/> 
    --> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
    <analyzer type="query"> 
    <tokenizer class="solr.StandardTokenizerFactory"/> 
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" 
      words="stopwords_en.txt" 
      enablePositionIncrements="true" 
      /> 
    <filter class="solr.LowerCaseFilterFactory"/> 
    <filter class="solr.EnglishPossessiveFilterFactory"/> 
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/> 
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory: 
    <filter class="solr.EnglishMinimalStemFilterFactory"/> 
    --> 
    <filter class="solr.PorterStemFilterFactory"/> 
    </analyzer> 
</fieldType>

And the field definition is:

<field name="features" type="text_en" indexed="true" stored="true" multiValued="true"/>

The data set consists of several documents: one has 'building' in the features field, one has 'Built', and one has 'built':

File hd.xml:

<field name="features">building NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>

File ipod_video.xml:

<field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>

File sd500.xml:

<field name="features">built in flash, red-eye reduction</field>

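Using Lukeall-3.3.0, this is the result I get from searching for 'features:build'. Note that I get back 1 document (not the 3 I expected), and even in that one document I do not see the stemmed term; I only see the original word, 'building':

[screenshot]

Searching again in Luke for 'features:built' returns two documents:

[screenshot]

Selecting one of them shows the original 'Built', but not 'build'.

[screenshot]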

2011-08-18 jabawaba

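Just a comment for anyone who finds this later through Google: what you are asking for is usually called lemmatisation, not stemming. A stemming algorithm usually just strips the tail of the word; it neither derives any context or meaning from the word itself nor uses a dictionary to look up other forms of the same word. – MatsLindh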

For special cases like this, you can tune the stemming with StemmerOverrideFilter.
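As a rough illustration of that suggestion, the fragment below sketches how an override could be slotted into the `text_en` analyzer chain from the question. It assumes a dictionary file named `stemdict.txt`, which is a hypothetical name and not part of the original schema. The override filter has to come before the Porter stemmer, since tokens it rewrites are marked as keywords and later stemmers leave them alone.

```xml
<!-- Sketch only: stemdict.txt is a hypothetical file name.
     Each line of that file maps a word to its stem, separated by a tab:
       built[TAB]build
     The override sits before the stemmer, so PorterStemFilterFactory
     leaves the overridden tokens unchanged. -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
<filter class="solr.PorterStemFilterFactory"/>
```

The same filter would presumably need to appear in both the index and query analyzers, and existing documents would have to be reindexed before 'features:build' could match 'built'.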

2011-08-18 02:55:34

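Thanks Robert. I would not have thought of this as special. I was under the impression that *all* English words could be reduced to their roots. built/build, chose/choose, led/lead, ... are all examples I would have assumed PorterFilter could handle. If it cannot, how would I know which words are not handled, short of trying the whole dictionary or waiting for users to complain? – jabawaba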
