How to make Elasticsearch prefer exact string matches when sorting/scoring

I'm using the default analyzer and index settings. Let's say I have this simple mapping (it's just an example, so sorry if there are typos):

"question": {
  "properties": {
    "title": {
      "type": "string"
    },
    "answer": {
      "properties": {
        "text": {
          "type": "string"
        }
      }
    }
  }
}

Now I run the following search.

GET _search
{
  "query": {
    "query_string": {
      "query": "yes correct",
      "fields": ["answer.text"]
    }
  }
}

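The results rank a text value like "yes correct." (doc ID 1) above a plain "yes correct" (no period, doc ID 181). Both hits have the same score, but the hits array lists the smaller doc ID first. I understand the default index options include sorting by doc ID, so how do I exclude that property and still use the rest of the defaults?

I haven't set up any custom analyzers, so everything uses the Elasticsearch 2.0 defaults.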

2015-11-03 user5243421

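Note that 'fields' should be 'default_field', otherwise the query won't work. Both documents get exactly the same score on my end. Can you show the sample documents you're basing this on? – Val

Sorry, I think there was a typo in my code. Using 'fields' works for me, and changing it to 'default_field' doesn't change the match scores. I also hadn't realized the scores were exactly the same. *oops* – user5243421

My bad, sorry: 'fields', of course. I need some coffee :) – Val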

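This may be a use case for the Dis Max Query:

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.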
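To make the quoted definition concrete, here is a minimal Python sketch of how a dis_max score is combined for a single document. It is only an illustration: the function name and the sub-query scores are made up, and the query-level boost is left out.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine the scores of the sub-queries that matched one document.

    Illustrative only: the best sub-query score wins, and every other
    matching sub-query adds tie_breaker times its own score.
    """
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


# Hypothetical sub-query scores, not taken from a real index.
both_match = dis_max_score([2.0, 1.0], tie_breaker=0.7)  # exact + loose sub-query
loose_only = dis_max_score([1.0], tie_breaker=0.7)       # loose sub-query only
print(both_match, loose_only)
```

A document that matches both the exact and the loose sub-query ends up above one that matches only the loose sub-query, which is the ordering the question asks for.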

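So you need to score your answer as an exact match and give that match the highest boost. You have to use a custom analyzer for this. This would be your mapping: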

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "question": {
      "properties": {
        "title": {
          "type": "string"
        },
        "answer": {
          "type": "object",
          "properties": {
            "text": {
              "type": "string",
              "analyzer": "my_keyword",
              "fields": {
                "stemmed": {
                  "type": "string",
                  "analyzer": "standard"
                }
              }
            }
          }
        }
      }
    }
  }
}
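To see why this mapping separates the two answers, here is a rough Python approximation of what the two analyzers emit. These functions are stand-ins written for illustration, not Elasticsearch code: the regex tokenizer only mimics the standard analyzer for simple ASCII text, and the accent stripping is a simplified take on asciifolding.

```python
import re
import unicodedata


def ascii_fold(text):
    """Simplified stand-in for the asciifolding filter: drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_keyword_analyzer(text):
    """keyword tokenizer + asciifolding + lowercase: the whole value is one token."""
    return [ascii_fold(text).lower()]


def standard_like_analyzer(text):
    """Crude approximation of the standard analyzer: lowercase words, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


for value in ("yes correct.", "yes correct"):
    print(repr(value), my_keyword_analyzer(value), standard_like_analyzer(value))
```

With the keyword-based analyzer the trailing period stays part of the single token, so "yes correct." and "yes correct" are different terms. With the standard-like analyzer both values reduce to the same two tokens, which is why they tie on the answer.text.stemmed sub-field.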

Your test data:

PUT /test/question/1
{
  "title": "title nr1",
  "answer": [
    {
      "text": "yes correct."
    }
  ]
}

PUT /test/question/2
{
  "title": "title nr2",
  "answer": [
    {
      "text": "yes correct"
    }
  ]
}

Now, when you search for "yes correct." with a query like this:

POST /test/_search
{
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "boost": 1.2,
      "queries": [
        {
          "match": {
            "answer.text": {
              "query": "yes correct.",
              "type": "phrase"
            }
          }
        },
        {
          "match": {
            "answer.text.stemmed": {
              "query": "yes correct.",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}

You get output like this:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.37919715, 
     "hits": [ 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "1", 
      "_score": 0.37919715, 
      "_source": { 
       "title": "title nr1", 
       "answer": [ 
        { 
        "text": "yes correct." 
        } 
       ] 
      } 
     }, 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "2", 
      "_score": 0.11261705, 
      "_source": { 
       "title": "title nr2", 
       "answer": [ 
        { 
        "text": "yes correct" 
        } 
       ] 
      } 
     } 
     ] 
    } 
}

If you run the same query without the trailing dot, so that the query text becomes "yes correct", you get this result:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 5, 
     "successful": 5, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.37919715, 
     "hits": [ 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "2", 
      "_score": 0.37919715, 
      "_source": { 
       "title": "title nr2", 
       "answer": [ 
        { 
        "text": "yes correct" 
        } 
       ] 
      } 
     }, 
     { 
      "_index": "test", 
      "_type": "question", 
      "_id": "1", 
      "_score": 0.11261705, 
      "_source": { 
       "title": "title nr1", 
       "answer": [ 
        { 
        "text": "yes correct." 
        } 
       ] 
      } 
     } 
     ] 
    } 
}

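Hope this is what you're looking for.

By the way, I'd recommend always using the Match query when doing text search. Quoting the documentation:

Comparison to query_string / field

The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.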

2015-11-03 07:30:41

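Thanks for recommending 'match'. I've only just started using Elasticsearch and it takes a lot of effort. I thought it would be as simple as setting up an index and starting to query! Haha – user5243421

I'm not sure I follow what I would have to use as the "something else". I just want an exact match on "yes correct" to sort and/or score higher than a similar value like "yes correct." – user5243421

I misunderstood you a bit. I'll update my answer soon. –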

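Elasticsearch, or more precisely Lucene, scoring does not take the relative positioning of tokens into account. It uses three different criteria instead:

Term frequency - how often the search term appears in the document.
Inverse document frequency - how often the search term appears across the whole index. The more documents it occurs in, the more common the term is and the less it counts.
Field-length normalization - the number of tokens present in the target field. The shorter the field, the more a match in it counts.

You can read more about it here.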
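As a rough illustration of those three factors, here is a small Python sketch using the formulas of Lucene's classic TF/IDF similarity. The function names and the sample numbers are made up, and the full scoring formula has further parts (query normalization, coordination, boosts, lossy encoding of the stored norm) that are left out here.

```python
import math


def tf(term_freq):
    """Term frequency factor: square root of the occurrences in the field."""
    return math.sqrt(term_freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: rarer terms get a larger factor."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length normalization: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)


def field_weight(term_freq, num_docs, doc_freq, num_terms):
    """Per-term weight of one field, built only from counts (no positions)."""
    return tf(term_freq) * idf(num_docs, doc_freq) * field_norm(num_terms)


# Hypothetical counts: one occurrence, 1000 docs, term present in 10 of them.
short_field = field_weight(1, 1000, 10, num_terms=2)
long_field = field_weight(1, 1000, 10, num_terms=20)
print(short_field, long_field)
```

None of the inputs is a token position, which is the point made above. It also fits the tie in the question: with the default analyzer, "yes correct." and "yes correct" both produce the same two tokens, so all three factors come out the same for both documents.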

2015-11-03 04:23:31

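That's confusing, because the docs say 'string' fields are analyzed with positions by default: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/index-options.html – user5243421

The positions are stored as well, but they aren't used to compute relevance. –

So can we tell Elasticsearch to use positions when computing relevance? I feel like my case should be common enough that I should be able to find an answer somewhere, but I'm having a lot of trouble finding the right terminology... – user5243421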
