2015-08-24

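Elasticsearch: disable term frequency scoring

I want to change the scoring in Elasticsearch so that multiple occurrences of a term are not counted. For example, I want: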

"texas texas texas"

and

"texas"

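to come out with the same score. I found this mapping, which Elasticsearch says will disable term frequencies, but my searches do not come out with the same score: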

{
  "mappings": {
    "business": {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": { "enabled": false }
        }
      }
    }
  }
}
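To see why both switches in this mapping matter, here is a small sketch of the per-document part of Lucene 4's classic TF-IDF score (tf = sqrt(freq), length norm = 1/sqrt(number of terms)) applied to the two example values. It models, as an assumption, `index_options: docs` as "the frequency reads as 1" and `norms: { enabled: false }` as "the norm is 1"; idf is held fixed because both documents share it. The class and method names are hypothetical, and the sketch leaves out Lucene's lossy one-byte norm encoding, so it does not try to reproduce the default-settings scores.

```java
import java.util.Arrays;

/** Hypothetical model of the per-document part of Lucene 4's classic TF-IDF score. */
class ScoreSettingsDemo {

    /** Counts how often the term occurs in a lowercased, whitespace-tokenized value. */
    static int termFreq(String value, String term) {
        return (int) Arrays.stream(value.toLowerCase().split("\\s+"))
                .filter(term::equals)
                .count();
    }

    /** Number of tokens in the field, which drives the length norm. */
    static int fieldLength(String value) {
        return value.trim().split("\\s+").length;
    }

    /**
     * tf * idf * norm for one term, with the two mapping switches modelled:
     * storeFreqs=false mimics index_options "docs" (freq reads as 1),
     * useNorms=false mimics norms.enabled=false (norm is 1).
     */
    static double fieldWeight(String value, String term, double idf,
                              boolean storeFreqs, boolean useNorms) {
        int freq = termFreq(value, term);
        if (freq == 0) {
            return 0.0;                         // document does not match at all
        }
        double tf = Math.sqrt(storeFreqs ? freq : 1);
        double norm = useNorms ? 1.0 / Math.sqrt(fieldLength(value)) : 1.0;
        return tf * idf * norm;
    }

    public static void main(String[] args) {
        double idf = 1.0;                       // same for both documents, so held fixed
        for (String value : new String[] {"texas", "texas texas texas"}) {
            System.out.printf("%-18s docs only=%.4f  no norms=%.4f  both=%.4f%n", value,
                    fieldWeight(value, "texas", idf, false, true),
                    fieldWeight(value, "texas", idf, true, false),
                    fieldWeight(value, "texas", idf, false, false));
        }
    }
}
```

Under this model, turning off only one switch still leaves a gap between the two documents (the norm alone penalizes the longer field; the frequency alone rewards it); only the combination gives them the same score.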

Any help would be greatly appreciated; I haven't been able to find much information about this.

Edit:

I'm adding my search code and what I get back when I use explain.

My search code:

    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "escluster").build();
    Client client = new TransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

    // Match "Texas" against the name field; DFS gathers global term statistics first.
    SearchRequest request = Requests.searchRequest("businesses")
            .source(SearchSourceBuilder.searchSource()
                    .query(QueryBuilders.boolQuery()
                            .should(QueryBuilders.matchQuery("name", "Texas")
                                    .minimumShouldMatch("1")))
                    .explain(true))   // return an _explanation for every hit
            .searchType(SearchType.DFS_QUERY_THEN_FETCH);

    SearchResponse response = client.search(request).actionGet();

And when I run my search with explain, I get:

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5KBks4zEorv9YI4n",
      "_score" : 1.0,
      "_source" : { "name" : "texas" },
      "_explanation" : {
        "value" : 1.0,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5K6Ks4zEorv9YI4o",
      "_score" : 0.8660254,
      "_source" : { "name" : "texas texas texas" },
      "_explanation" : {
        "value" : 0.8660254,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 0.8660254,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
}

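It looks like it is still taking term frequency and document frequency into account. Any ideas? Sorry for the bad formatting; I don't know why it looks so strange.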
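The numbers in this explain output can be reproduced by hand. The sketch below assumes Lucene 4's classic similarity: tf = sqrt(freq), and a length norm of 1/sqrt(number of terms) that is stored in a single byte. The byte encoding is re-implemented here from Lucene's `SmallFloat.floatToByte315`/`byte315ToFloat` and should be treated as an assumption; it shows why 1/sqrt(3) ≈ 0.577 appears as `fieldNorm` 0.5, and therefore why "texas texas texas" scores 0.8660254 (1.7320508 × 1.0 × 0.5) instead of tying with "texas". The class name is hypothetical.

```java
/** Reproduces the fieldWeight values from the first explain output (hypothetical helper). */
class ExplainFieldNorm {

    /** One-byte float encoding with 3 mantissa bits and zero exponent 15 (after Lucene's SmallFloat). */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - 3);
        int fZero = (63 - 15) << 3;
        if (smallFloat <= fZero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // underflow: zero or smallest value
        }
        if (smallFloat >= fZero + 0x100) {
            return (byte) -1;                            // overflow: largest value
        }
        return (byte) (smallFloat - fZero);
    }

    /** Decodes a byte produced by floatToByte315. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Length norm as read back from the index: 1/sqrt(numTerms), rounded through one byte. */
    static float fieldNorm(int numTerms) {
        return byte315ToFloat(floatToByte315((float) (1.0 / Math.sqrt(numTerms))));
    }

    /** tf * idf * fieldNorm, the "fieldWeight" line of the explain output. */
    static float fieldWeight(int freq, float idf, int numTerms) {
        float tf = (float) Math.sqrt(freq);
        return tf * idf * fieldNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println("fieldNorm(1 term)   = " + fieldNorm(1));
        System.out.println("fieldNorm(3 terms)  = " + fieldNorm(3));
        System.out.println("'texas'             -> " + fieldWeight(1, 1.0f, 1));
        System.out.println("'texas texas texas' -> " + fieldWeight(3, 1.0f, 3));
    }
}
```

Without the byte rounding, sqrt(3) × 1/sqrt(3) would cancel to exactly 1.0; the rounding of the norm down to 0.5 is what lets the longer field lose here.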

Edit 2:

I searched from the browser with http://localhost:9200/businesses/business/_search?pretty=true&qname=texas and got:

{ 
    "took" : 2, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 4, 
    "max_score" : 1.0, 
    "hits" : [ { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YcCKjKvtg8NgyozGK", 
     "_score" : 1.0, 
     "_source":{"business" : { 
"name" : "texas texas texas texas" } 
} 
    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YateBKvtg8Ngyoy-p", 
     "_score" : 1.0, 
     "_source":{ 
"name" : "texas" } 

    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YavVnKvtg8Ngyoy-4", 
     "_score" : 1.0, 
     "_source":{ 
"name" : "texas texas texas" } 

    }, { 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9Yb7NgKvtg8NgyozFf", 
     "_score" : 1.0, 
     "_source":{"business" : { 
"name" : "texas texas texas" } 
} 
    } ] 
    } 
} 

It found all 4 objects I have in there, and they all have the same score. When I run my Java API search with explain, I get:

{ 
    "took" : 2, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 2, 
    "max_score" : 1.287682, 
    "hits" : [ { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YateBKvtg8Ngyoy-p", 
     "_score" : 1.287682, 
     "_source":{ 
"name" : "texas" } 
, 
     "_explanation" : { 
     "value" : 1.287682, 
     "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 1.287682, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.0, 
      "description" : "tf(freq=1.0), with freq of:", 
      "details" : [ { 
       "value" : 1.0, 
       "description" : "termFreq=1.0" 
      } ] 
      }, { 
      "value" : 1.287682, 
      "description" : "idf(docFreq=2, maxDocs=4)" 
      }, { 
      "value" : 1.0, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    }, { 
     "_shard" : 1, 
     "_node" : "BTqBPVDET5Kr83r-CYPqfA", 
     "_index" : "businesses", 
     "_type" : "business", 
     "_id" : "AU9YavVnKvtg8Ngyoy-4", 
     "_score" : 1.1151654, 
     "_source":{ 
"name" : "texas texas texas" } 
, 
     "_explanation" : { 
     "value" : 1.1151654, 
     "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:", 
     "details" : [ { 
      "value" : 1.1151654, 
      "description" : "fieldWeight in 0, product of:", 
      "details" : [ { 
      "value" : 1.7320508, 
      "description" : "tf(freq=3.0), with freq of:", 
      "details" : [ { 
       "value" : 3.0, 
       "description" : "termFreq=3.0" 
      } ] 
      }, { 
      "value" : 1.287682, 
      "description" : "idf(docFreq=2, maxDocs=4)" 
      }, { 
      "value" : 0.5, 
      "description" : "fieldNorm(doc=0)" 
      } ] 
     } ] 
     } 
    } ] 
    } 
} 
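In this second run the new ingredient is idf, which grew from 1.0 to 1.287682 because the index now holds a fourth document (maxDocs=4). A sketch assuming Lucene 4's classic formula idf = 1 + ln(maxDocs / (docFreq + 1)) reproduces both idf values and both scores. Since idf is the same for every matching document, it cannot be what separates "texas" from "texas texas texas"; the tf (1.7320508) and fieldNorm (0.5) factors on the `name` field do that, which is consistent with the `docs`/`norms` settings not being in effect. The class name is hypothetical.

```java
/** Reproduces the idf and score values from the Java API explain output (hypothetical helper). */
class ExplainIdf {

    /** Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1)). */
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** tf * idf * fieldNorm, using the tf and fieldNorm values printed by explain. */
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf3 = idf(2, 3);   // first run: 3 documents in the index
        float idf4 = idf(2, 4);   // second run: 4 documents in the index
        System.out.println("idf(docFreq=2, maxDocs=3) = " + idf3);
        System.out.println("idf(docFreq=2, maxDocs=4) = " + idf4);
        System.out.println("'texas'             -> " + fieldWeight(1.0f, idf4, 1.0f));
        System.out.println("'texas texas texas' -> " + fieldWeight((float) Math.sqrt(3), idf4, 0.5f));
    }
}
```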
The mismatch probably has more to do with 'doc frequency' than 'term frequency'. Are you using [search_type = dfs_query_then_fetch](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#query-then-fetch?q=query_then_fech)? If that doesn't help, try setting 'explain=true' in the query to see the scoring breakdown. – keety

I switched it to dfs_query_then_fetch, but that didn't work. I'll post my code and the explain results in a second. – Chadvador

Could you also post the query? – keety
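keety's first suggestion matters because, with the default query_then_fetch, each shard computes idf from its own docFreq and maxDocs, whereas dfs_query_then_fetch first gathers the term statistics from all shards so every shard scores with the same idf. A sketch with made-up per-shard statistics (hypothetical numbers, not taken from this index), reusing the classic idf formula as an assumption about the similarity in use:

```java
/** Hypothetical illustration of per-shard vs. DFS (global) idf. */
class DfsIdfDemo {

    /** Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1)). */
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    /** query_then_fetch: a shard scores with its own {docFreq, maxDocs}. */
    static double localIdf(long[] shardStats) {
        return idf(shardStats[0], shardStats[1]);
    }

    /** dfs_query_then_fetch: statistics from all shards are summed before scoring. */
    static double globalIdf(long[][] allShardStats) {
        long docFreq = 0;
        long maxDocs = 0;
        for (long[] stats : allShardStats) {
            docFreq += stats[0];
            maxDocs += stats[1];
        }
        return idf(docFreq, maxDocs);
    }

    public static void main(String[] args) {
        // Made-up statistics for the term "texas": {docFreq, maxDocs} per shard.
        long[][] shards = { {1, 1}, {1, 3} };
        for (int i = 0; i < shards.length; i++) {
            System.out.printf("shard %d local idf = %.4f%n", i, localIdf(shards[i]));
        }
        System.out.printf("global (DFS) idf  = %.4f%n", globalIdf(shards));
    }
}
```

DFS only evens out idf across shards; it does not touch tf or the length norm, which fits the observation that switching to it did not make the two documents tie.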

Answer

It looks like you cannot override the index_options of a field once the mapping for that field has been set initially.

Example:

put test 
put test/business/_mapping 
{ 

     "properties": { 
     "name": { 
      "type": "string", 
      "index_options": "freqs", 
      "norms": { 
       "enabled": false 
      } 
     } 
     } 

} 
put test/business/_mapping 
{ 

     "properties": { 
     "name": { 
      "type": "string", 
      "index_options": "docs", 
      "norms": { 
       "enabled": false 
      } 
     } 
     } 

} 
get test/business/_mapping 

    { 
    "test": { 
     "mappings": { 
     "business": { 
      "properties": { 
       "name": { 
        "type": "string", 
        "norms": { 
        "enabled": false 
        }, 
        "index_options": "freqs" 
       } 
      } 
     } 
     } 
    } 
} 

You will have to recreate the index to get the new mapping.
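A minimal sketch of that recreate step over the REST API, using the JDK's built-in HTTP client (Java 11+). Assumptions: Elasticsearch 1.x listening on localhost:9200, and the index name `businesses` and type `business` from the question. Deleting the index discards its documents, so they have to be indexed again afterwards (not shown). The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: drop and recreate the index so the new mapping applies. */
class RecreateIndex {

    static final String BASE = "http://localhost:9200/businesses";   // assumed host and index

    /** Index-creation body; the mapping must exist before any document is indexed. */
    static String createBody() {
        return "{ \"mappings\": { \"business\": { \"properties\": {"
             + " \"name\": { \"type\": \"string\", \"index_options\": \"docs\","
             + " \"norms\": { \"enabled\": false } } } } } }";
    }

    /** Sends a request and logs the method, URI and HTTP status. */
    static HttpResponse<String> send(HttpClient http, HttpRequest req) throws Exception {
        HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(req.method() + " " + req.uri() + " -> " + resp.statusCode());
        return resp;
    }

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        // 1. Drop the old index (a 404 just means it did not exist).
        send(http, HttpRequest.newBuilder(URI.create(BASE)).DELETE().build());
        // 2. Recreate it with the new mapping in the creation request.
        send(http, HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(createBody()))
                .build());
        // 3. Print the stored mapping to confirm index_options is now "docs".
        HttpResponse<String> mapping = send(http,
                HttpRequest.newBuilder(URI.create(BASE + "/_mapping/business")).GET().build());
        System.out.println(mapping.body());
    }
}
```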

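Well, this is embarrassing; it was my own mistake. I had been testing from my browser with http://localhost:9200/business/_search?pretty=true&explain=true&q=texas, and after I changed it to "qname=texas" the scores are the same. So why doesn't it work with my Java API search? Is it because I'm searching the name field? – Chadvador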

Could you paste the whole snippet, or better, set explain in the Java client and paste the response? – keety

Sorry, I don't know how to set that in the Java API; it doesn't seem to be an option on SearchRequest. I'll update my OP with the code. – Chadvador