2017-02-10

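Elasticsearch - a document matching more conditions scores lower than one matching fewer

I have a query that should return profiles with similar interests. The problem is that documents matching more of the conditions get a lower score.

In a bool query, I have a should clause with interests = ['games', 'music', 'sport'].

The document with interests = ['games'] gets a score of 0.14981213.

The document with interests = ['games', 'music'] gets a score of 0.11516824.

Why? I am using AWS Elasticsearch, v2.3.2.

The query looks like this: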

{ 
    "explain": true, 
    "from": 0, 
    "query": { 
     "bool": { 
      "filter": [ 
       { 
        "bool": { 
         "must_not": [ 
          { 
           "term": { 
            "id": 3918 
           } 
          } 
         ] 
        } 
       } 
      ], 
      "should": [ 
       { 
        "terms": { 
         "interests": [ 
          "games", 
          "music", 
          "sport" 
         ] 
        } 
       } 
      ] 
     } 
    }, 
    "size": 10 
} 

And this is the result I get:

{ 
    "_shards": { 
     "failed": 0, 
     "successful": 5, 
     "total": 5 
    }, 
    "hits": { 
     "hits": [ 
      { 
       "_explanation": { 
        "description": "sum of:", 
        "details": [ 
         { 
          "description": "match on required clause, product of:", 
          "details": [ 
           { 
            "description": "# clause", 
            "details": [], 
            "value": 0.0 
           }, 
           { 
            "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:", 
            "details": [ 
             { 
              "description": "boost", 
              "details": [], 
              "value": 1.0 
             }, 
             { 
              "description": "queryNorm", 
              "details": [], 
              "value": 0.4494364 
             } 
            ], 
            "value": 0.4494364 
           } 
          ], 
          "value": 0.0 
         }, 
         { 
          "description": "product of:", 
          "details": [ 
           { 
            "description": "sum of:", 
            "details": [ 
             { 
              "description": "weight(interests:games in 1) [PerFieldSimilarity], result of:", 
              "details": [ 
               { 
                "description": "score(doc=1,freq=1.0), product of:", 
                "details": [ 
                 { 
                  "description": "queryWeight, product of:", 
                  "details": [ 
                   { 
                    "description": "idf(docFreq=2, maxDocs=3)", 
                    "details": [], 
                    "value": 1.0 
                   }, 
                   { 
                    "description": "queryNorm", 
                    "details": [], 
                    "value": 0.4494364 
                   } 
                  ], 
                  "value": 0.4494364 
                 }, 
                 { 
                  "description": "fieldWeight in 1, product of:", 
                  "details": [ 
                   { 
                    "description": "tf(freq=1.0), with freq of:", 
                    "details": [ 
                     { 
                      "description": "termFreq=1.0", 
                      "details": [], 
                      "value": 1.0 
                     } 
                    ], 
                    "value": 1.0 
                   }, 
                   { 
                    "description": "idf(docFreq=2, maxDocs=3)", 
                    "details": [], 
                    "value": 1.0 
                   }, 
                   { 
                    "description": "fieldNorm(doc=1)", 
                    "details": [], 
                    "value": 1.0 
                   } 
                  ], 
                  "value": 1.0 
                 } 
                ], 
                "value": 0.4494364 
               } 
              ], 
              "value": 0.4494364 
             } 
            ], 
            "value": 0.4494364 
           }, 
           { 
            "description": "coord(1/3)", 
            "details": [], 
            "value": 0.33333334 
           } 
          ], 
          "value": 0.14981213 
         } 
        ], 
        "value": 0.14981213 
       }, 
       "_id": "3917", 
       "_index": "test_44024988_profiles", 
       "_node": "urWXg5KhREyffYielaa6Rw", 
       "_score": 0.14981213, 
       "_shard": 2, 
       "_source": { 
        "full_name": "Bob Doe", 
        "id": 3916, 
        "interests": [ 
         "games" 
        ], 
        "user_id": 3917 
       }, 
       "_type": "profile_document" 
      }, 
      { 
       "_explanation": { 
        "description": "sum of:", 
        "details": [ 
         { 
          "description": "match on required clause, product of:", 
          "details": [ 
           { 
            "description": "# clause", 
            "details": [], 
            "value": 0.0 
           }, 
           { 
            "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:", 
            "details": [ 
             { 
              "description": "boost", 
              "details": [], 
              "value": 1.0 
             }, 
             { 
              "description": "queryNorm", 
              "details": [], 
              "value": 0.9173473 
             } 
            ], 
            "value": 0.9173473 
           } 
          ], 
          "value": 0.0 
         }, 
         { 
          "description": "product of:", 
          "details": [ 
           { 
            "description": "sum of:", 
            "details": [ 
             { 
              "description": "weight(interests:games in 0) [PerFieldSimilarity], result of:", 
              "details": [ 
               { 
                "description": "score(doc=0,freq=1.0), product of:", 
                "details": [ 
                 { 
                  "description": "queryWeight, product of:", 
                  "details": [ 
                   { 
                    "description": "idf(docFreq=1, maxDocs=1)", 
                    "details": [], 
                    "value": 0.30685282 
                   }, 
                   { 
                    "description": "queryNorm", 
                    "details": [], 
                    "value": 0.9173473 
                   } 
                  ], 
                  "value": 0.2814906 
                 }, 
                 { 
                  "description": "fieldWeight in 0, product of:", 
                  "details": [ 
                   { 
                    "description": "tf(freq=1.0), with freq of:", 
                    "details": [ 
                     { 
                      "description": "termFreq=1.0", 
                      "details": [], 
                      "value": 1.0 
                     } 
                    ], 
                    "value": 1.0 
                   }, 
                   { 
                    "description": "idf(docFreq=1, maxDocs=1)", 
                    "details": [], 
                    "value": 0.30685282 
                   }, 
                   { 
                    "description": "fieldNorm(doc=0)", 
                    "details": [], 
                    "value": 1.0 
                   } 
                  ], 
                  "value": 0.30685282 
                 } 
                ], 
                "value": 0.08637618 
               } 
              ], 
              "value": 0.08637618 
             }, 
             { 
              "description": "weight(interests:music in 0) [PerFieldSimilarity], result of:", 
              "details": [ 
               { 
                "description": "score(doc=0,freq=1.0), product of:", 
                "details": [ 
                 { 
                  "description": "queryWeight, product of:", 
                  "details": [ 
                   { 
                    "description": "idf(docFreq=1, maxDocs=1)", 
                    "details": [], 
                    "value": 0.30685282 
                   }, 
                   { 
                    "description": "queryNorm", 
                    "details": [], 
                    "value": 0.9173473 
                   } 
                  ], 
                  "value": 0.2814906 
                 }, 
                 { 
                  "description": "fieldWeight in 0, product of:", 
                  "details": [ 
                   { 
                    "description": "tf(freq=1.0), with freq of:", 
                    "details": [ 
                     { 
                      "description": "termFreq=1.0", 
                      "details": [], 
                      "value": 1.0 
                     } 
                    ], 
                    "value": 1.0 
                   }, 
                   { 
                    "description": "idf(docFreq=1, maxDocs=1)", 
                    "details": [], 
                    "value": 0.30685282 
                   }, 
                   { 
                    "description": "fieldNorm(doc=0)", 
                    "details": [], 
                    "value": 1.0 
                   } 
                  ], 
                  "value": 0.30685282 
                 } 
                ], 
                "value": 0.08637618 
               } 
              ], 
              "value": 0.08637618 
             } 
            ], 
            "value": 0.17275237 
           }, 
           { 
            "description": "coord(2/3)", 
            "details": [], 
            "value": 0.6666667 
           } 
          ], 
          "value": 0.11516824 
         } 
        ], 
        "value": 0.11516824 
       }, 
       "_id": "3918", 
       "_index": "test_44024988_profiles", 
       "_node": "urWXg5KhREyffYielaa6Rw", 
       "_score": 0.11516824, 
       "_shard": 4, 
       "_source": { 
        "full_name": "Alex Test", 
        "id": 3917, 
        "interests": [ 
         "games", 
         "music" 
        ], 
        "user_id": 3918 
       }, 
       "_type": "profile_document" 
      }, 
      ... # not interesting doc 
     ], 
     "max_score": 0.14981213, 
     "total": 3 
    }, 
    "timed_out": false, 
    "took": 3 
} 

My input data:

[{ 
    "full_name": "Bob Doe", 
    "id": 3916, 
    "interests": [ 
     "games" 
    ], 
    "user_id": 3917 
}, { 
    "full_name": "Alex Test", 
    "id": 3917, 
    "interests": [ 
     "games", 
     "music" 
    ], 
    "user_id": 3918 
}, { 
    "full_name": "Joe Test", 
    "id": 3918, 
    "user_id": 3919 
}] 

Answer


Let's take a look at the scoring formula in Elasticsearch.

score(q,d) = 
      queryNorm(q) 
      · coord(q,d)  
      · ∑ (   
       tf(t in d) 
       · idf(t)²  
       · t.getBoost() 
       · norm(t,d)  
      ) (t in q)  

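The reference for this is the practical scoring formula; its documentation describes each of the factors, in case you are not familiar with them. The explanation for your case is quite simple: it is just what the formula does, a combination of all those factors (tf, idf, queryNorm, and so on). Also, if your index is a dummy one that contains only a few documents, the values can come out looking very odd.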
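To make this concrete, the two scores can be recomputed by hand from the numbers in the explain output of the question. The sketch below assumes Lucene's classic TF/IDF similarity, whose idf is 1 + ln(maxDocs / (docFreq + 1)); that assumption reproduces the idf values shown in the output. The queryNorm values are copied from the output rather than derived, and the function names are illustrative only. Note that the two hits come from different shards (2 and 4), and each shard reports its own docFreq and maxDocs.

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene TF/IDF similarity (assumed): 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(idf_value, query_norm, tf=1.0, field_norm=1.0):
    # queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm
    query_weight = idf_value * query_norm
    field_weight = tf * idf_value * field_norm
    return query_weight * field_weight

# Hit "3917" (shard 2): only "games" matches, idf(docFreq=2, maxDocs=3), coord(1/3)
idf_shard_2 = idf(doc_freq=2, max_docs=3)
score_one_match = term_score(idf_shard_2, query_norm=0.4494364) * (1 / 3)

# Hit "3918" (shard 4): "games" and "music" match, idf(docFreq=1, maxDocs=1), coord(2/3)
idf_shard_4 = idf(doc_freq=1, max_docs=1)
score_two_matches = 2 * term_score(idf_shard_4, query_norm=0.9173473) * (2 / 3)

print(idf_shard_2, idf_shard_4)      # about 1.0 and 0.3069
print(score_one_match)               # about 0.1498
print(score_two_matches)             # about 0.1152
```

The coord factor does favour the document with two matches (2/3 against 1/3), but on its shard idf is about 0.31 instead of 1.0 and enters each term's score squared, which more than cancels that advantage.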

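I could go into more detail, but in essence this is just the scoring formula at work. If you want to fix it, that is a different question; you can do it by writing a different query.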
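As one possible "different query", here is a minimal sketch that wraps each interest in its own constant_score clause, so that every matched interest contributes the same fixed amount instead of a tf/idf-based value. This is an assumption about what would suit the use case, not something tested against 2.3.2; the helper name build_interest_query and its parameters are hypothetical. The filter part is kept exactly as in the question.

```python
import json

def build_interest_query(interests, exclude_id, size=10):
    """Build a bool query in which each matched interest adds a fixed score."""
    # One constant_score clause per interest: a match scores by its boost,
    # independent of how rare the term is in the index.
    should = [
        {"constant_score": {"filter": {"term": {"interests": interest}}, "boost": 1.0}}
        for interest in interests
    ]
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                # Same exclusion filter as in the original query
                "filter": [{"bool": {"must_not": [{"term": {"id": exclude_id}}]}}],
                "should": should,
            }
        },
    }

query = build_interest_query(["games", "music", "sport"], exclude_id=3918)
print(json.dumps(query, indent=2))
```

With this shape the ranking should follow the number of matched interests. Another option worth checking, when the differences come from term statistics on a very small index, is the dfs_query_then_fetch search type, which gathers term frequencies across shards before scoring.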


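Hey! Thanks for the reply. I understand the formula, but now the question is: is the formula wrong, or are my expectations? I think 'filter' should not affect the score, and 'should' used as a query ought to work in a straightforward way. – marxin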


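Yes, you are correct: the filter does not affect scoring, and that is exactly what happens in your case; the score comes only from the terms query. The thing is, we could compute tf and idf by hand and check whether the formula gives exactly the same result, and trust me, it will, because it takes the rarity of the term into account. – Mysterion

I am not saying the score differs from what the formula gives. Let's agree that it works correctly with respect to the formula; I am just wondering whether it works correctly with respect to what an ordinary user would expect. But maybe that's just me. The other thing is that it does not seem to be stable. For more context: this is the result I get on the CI server in unit tests, whereas on my local machine the score is "correct" (it follows my expectations), even though the same Elasticsearch is running, just with a different index name. – marxin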