Elasticsearch copy_to field does not behave as expected in aggregations

I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing at another field called all_fields. all_fields is indexed as "not_analyzed".

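When I run a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated. Instead, I get separate buckets whose keys are the individual, unconcatenated values of field1 and field2.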

Example mapping:

{
    "mappings": {
        "myobject": {
            "properties": {
                "field1": {
                    "type": "string",
                    "index": "analyzed",
                    "copy_to": "all_fields"
                },
                "field2": {
                    "type": "string",
                    "index": "analyzed",
                    "copy_to": "all_fields"
                },
                "all_fields": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Data:

{
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
}

and

{
    "field1": "fish chicken something",
    "field2": "dinner"
}

Aggregation:

{
    "aggs": {
        "t": {
            "terms": {
                "field": "all_fields"
            }
        }
    }
}

Result:

...
"aggregations": {
    "t": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "dinner",
                "doc_count": 1
            },
            {
                "key": "dinner carrot potato broccoli",
                "doc_count": 1
            },
            {
                "key": "fish chicken something",
                "doc_count": 1
            },
            {
                "key": "something here",
                "doc_count": 1
            }
        ]
    }
}

What I expected was only 2 buckets: fish chicken somethingdinner and dinner carrot potato broccolisomething here

What am I doing wrong?

Source

2015-07-22 adapt-dev

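What you are looking for is a concatenation of the two strings. copy_to looks like it does that, but it does not. With copy_to, you conceptually create a set of values taken from field1 and field2; the values are not joined together. all_fields therefore ends up as a multi-valued field, and the terms aggregation creates one bucket per distinct value.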
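The difference can be illustrated with a small Python model of the two sample documents. This is only a sketch of the behaviour described above, not how Elasticsearch is implemented: `copy_to_values` and `terms_buckets` are hypothetical helper names, and the single-space separator in the concatenated variant is an assumption.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

def copy_to_values(doc):
    """Model copy_to: all_fields holds each source value separately (multi-valued)."""
    return [doc["field1"], doc["field2"]]

def terms_buckets(values_per_doc):
    """Count how many documents contain each distinct value, like doc_count."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per distinct value
    return dict(counts)

# copy_to semantics: four distinct values, so four buckets.
copy_to_buckets = terms_buckets(copy_to_values(d) for d in docs)

# Concatenation (assumed separator: one space): one value per document, so two buckets.
concat_buckets = terms_buckets([d["field1"] + " " + d["field2"]] for d in docs)

print(copy_to_buckets)
print(concat_buckets)
```

In this model the first call produces the four keys shown in the question's result, while the second produces one key per document.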

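For your use case, you have two options:

use a _source transformation
perform a script aggregation

I would recommend the _source transformation, because I think it is more efficient than scripting. That is, you pay a small price once at indexing time instead of running a heavy script aggregation on every query.

For the _source transformation: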

PUT /lastseen
{
    "mappings": {
        "test": {
            "transform": {
                "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
            },
            "properties": {
                "field1": {
                    "type": "string"
                },
                "field2": {
                    "type": "string"
                },
                "lastseen": {
                    "type": "long"
                },
                "all_fields": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
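As a rough illustration of what the transform script computes for each document, here is a Python sketch applied to the two sample documents from the question. The function name `transform` is a hypothetical stand-in; the sketch mirrors only the string expression in the script and does not model how Elasticsearch applies or stores the transformed document.

```python
def transform(source):
    """Mimic the script: all_fields = field1 + ' ' + field2, computed once per document."""
    out = dict(source)  # copy, so the original document is left untouched
    out["all_fields"] = source["field1"] + " " + source["field2"]
    return out

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

indexed = [transform(d) for d in docs]
keys = sorted(d["all_fields"] for d in indexed)
print(keys)
```

Each document ends up with exactly one all_fields value, so a terms aggregation on that not_analyzed field would see one key per document.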

And the query:

GET /lastseen/test/_search
{
    "aggs": {
        "NAME": {
            "terms": {
                "field": "all_fields",
                "size": 10
            }
        }
    }
}

For the script aggregation, the easier approach (meaning one that uses doc['field'].value rather than the more expensive _source.field) is to add .raw subfields to field1 and field2:

PUT /lastseen
{
    "mappings": {
        "test": {
            "properties": {
                "field1": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                },
                "field2": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                },
                "lastseen": {
                    "type": "long"
                }
            }
        }
    }
}

And the script will use these .raw subfields:

{
    "aggs": {
        "NAME": {
            "terms": {
                "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
                "size": 10,
                "lang": "groovy"
            }
        }
    }
}
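Conceptually, a script-based terms aggregation evaluates the script once for every matching document at query time and buckets on the result. The following Python sketch models that idea under stated assumptions: `doc_values` is a hypothetical stand-in for the per-document values of the .raw subfields, and `script_terms` is an illustrative helper, not an Elasticsearch API.

```python
from collections import Counter

# Hypothetical stand-in for the untokenized values of the .raw subfields.
doc_values = [
    {"field1.raw": "dinner carrot potato broccoli", "field2.raw": "something here"},
    {"field1.raw": "fish chicken something", "field2.raw": "dinner"},
]

def script_terms(docs, script):
    """Evaluate `script` once per document and count documents per resulting key."""
    return Counter(script(doc) for doc in docs)

# Same expression as the Groovy script above, written as a Python lambda.
buckets = script_terms(
    doc_values,
    lambda doc: doc["field1.raw"] + " " + doc["field2.raw"],
)
print(dict(buckets))
```

Because the key is recomputed for every document on every query, this work is repeated each time the aggregation runs, which is the cost the index-time transform avoids.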

Without the .raw subfields (which are deliberately made not_analyzed), you would need to do something like this, which is more expensive:

{
    "aggs": {
        "NAME": {
            "terms": {
                "script": "_source.field1 + ' ' + _source.field2",
                "size": 10,
                "lang": "groovy"
            }
        }
    }
}

Source

2015-07-22 07:07:03
