2015-07-22

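Elasticsearch copy_to field does not behave as expected in aggregations

I have an index mapping with two string fields, field1 and field2, both declared with copy_to into another field called all_fields. all_fields is indexed as "not_analyzed".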

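When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are field1 and field2 concatenated together. Instead, I get separate buckets whose keys are the field1 and field2 values on their own, not concatenated.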

Example

Mapping:

{ 
    "mappings": { 
     "myobject": { 
     "properties": { 
      "field1": { 
      "type": "string", 
      "index": "analyzed", 
      "copy_to": "all_fields" 
      }, 
      "field2": { 
      "type": "string", 
      "index": "analyzed", 
      "copy_to": "all_fields" 
      }, 
      "all_fields": { 
      "type": "string", 
      "index": "not_analyzed" 
      } 
     } 
     } 
    } 
    } 

Data:

{
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
}

{
    "field1": "fish chicken something",
    "field2": "dinner"
}

Aggregation:

{ 
    "aggs": { 
    "t": { 
     "terms": { 
     "field": "all_fields" 
     } 
    } 
    } 
} 

Result:

... 
"aggregations": { 
    "t": { 
     "doc_count_error_upper_bound": 0, 
     "sum_other_doc_count": 0, 
     "buckets": [ 
      { 
       "key": "dinner", 
       "doc_count": 1 
      }, 
      { 
       "key": "dinner carrot potato broccoli", 
       "doc_count": 1 
      }, 
      { 
       "key": "fish chicken something", 
       "doc_count": 1 
      }, 
      { 
       "key": "something here", 
       "doc_count": 1 
      } 
     ] 
    } 
} 

I was expecting only 2 buckets: one for "fish chicken something" joined with "dinner", and one for "dinner carrot potato broccoli" joined with "something here".

What am I doing wrong?

Answer


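What you are looking for is a concatenation of the two strings. Even though copy_to may look like it does this, it does not: with copy_to you conceptually build a set of values from field1 and field2 (all_fields becomes a multi-valued field), not a concatenation of them.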
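
To make the difference concrete, here is a minimal Python sketch (not Elasticsearch code) that models copy_to as appending each source value to all_fields as its own value, and models a terms aggregation on a not_analyzed field as a count of whole values per document. Both models, and the helper names copy_to_values, concatenated_values and terms_agg, are simplifying assumptions for illustration only.

```python
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]
SOURCES = ("field1", "field2")


def copy_to_values(doc):
    """Assumed model of copy_to: each source value is added to all_fields as
    its own value, so all_fields becomes multi-valued, not one joined string."""
    return [doc[f] for f in SOURCES if f in doc]


def concatenated_values(doc):
    """What the question expected: a single string joined with a space."""
    return [" ".join(doc[f] for f in SOURCES if f in doc)]


def terms_agg(docs, extract):
    """Rough model of a terms aggregation on a not_analyzed field: each
    distinct whole value is a bucket key; doc_count counts documents."""
    counts = Counter()
    for doc in docs:
        for value in set(extract(doc)):
            counts[value] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(terms_agg(DOCS, copy_to_values))       # four buckets, like the question's result
    print(terms_agg(DOCS, concatenated_values))  # the two buckets that were expected
```

With copy_to each document contributes two separate keys, which is exactly the four-bucket result shown in the question.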

For your use case you have two options:

  1. Use a _source transformation
  2. Use a scripted aggregation

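I would recommend the _source transformation, because I think it is more efficient than scripting: you pay some cost at indexing time instead of running a heavy scripted aggregation on every search.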

For the _source transformation:

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "transform": { 
     "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']" 
     }, 
     "properties": { 
     "field1": { 
      "type": "string" 
     }, 
     "field2": { 
      "type": "string" 
     }, 
     "lastseen": { 
      "type": "long" 
     }, 
     "all_fields": { 
      "type": "string", 
      "index": "not_analyzed" 
     } 
     } 
    } 
    } 
} 

And the query:

GET /lastseen/test/_search 
{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "field": "all_fields", 
     "size": 10 
     } 
    } 
    } 
} 
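
As a rough illustration of why this option is cheap at search time, the following Python sketch (again a simulation, not Elasticsearch code) runs a Python mirror of the transform script once per document during a simulated indexing step, after which the terms aggregation only counts precomputed all_fields values. The helpers transform, index_all and terms_agg are hypothetical names; in this sketch the input dicts are left unchanged.

```python
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def transform(source):
    """Python mirror of the mapping's transform script
    ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2'].
    Returns a new dict for the indexed copy; the input dict is not modified."""
    doc = dict(source)
    doc["all_fields"] = doc["field1"] + " " + doc["field2"]
    return doc


def index_all(sources):
    """Simulated indexing: the string concatenation is paid once per document."""
    return [transform(s) for s in sources]


def terms_agg(indexed, field):
    """Terms aggregation on a single-valued not_analyzed field: a plain count
    of whole values, with no string building at search time."""
    return dict(sorted(Counter(d[field] for d in indexed).items()))


if __name__ == "__main__":
    indexed = index_all(DOCS)
    print(terms_agg(indexed, "all_fields"))
```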

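For the scripted aggregation, it is cheaper to add a not_analyzed .raw subfield to field1 and field2, so that the script can use doc['field'].value instead of the more expensive _source.field: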

PUT /lastseen 
{ 
    "mappings": { 
    "test": { 
     "properties": { 
     "field1": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "field2": { 
      "type": "string", 
      "fields": { 
      "raw": { 
       "type": "string", 
       "index": "not_analyzed" 
      } 
      } 
     }, 
     "lastseen": { 
      "type": "long" 
     } 
     } 
    } 
    } 
} 

The script then uses these .raw subfields:

{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value", 
     "size": 10, 
     "lang": "groovy" 
     } 
    } 
    } 
} 
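
The following Python sketch models why the .raw subfields matter. It assumes, as a simplification, that the analyzed field exposes separate lowercase terms to doc[...] while the not_analyzed .raw subfield exposes the whole original string as one term; rough_analyze is a hypothetical stand-in, not a real Elasticsearch analyzer. The scripted key is rebuilt for every document on every search, which is the query-time cost the transform option avoids.

```python
import re
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def rough_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase and split on
    non-word characters. Real Elasticsearch analyzers do more than this."""
    return sorted({t for t in re.split(r"\W+", text.lower()) if t})


def build_doc_values(source):
    """Values a script can read through doc[...], built once at index time:
    the analyzed field holds separate terms, while the not_analyzed .raw
    subfield holds the whole original string as a single term."""
    values = {}
    for field in ("field1", "field2"):
        values[field] = rough_analyze(source[field])
        values[field + ".raw"] = [source[field]]
    return values


# Simulated index: these values exist before any search runs.
INDEXED = [build_doc_values(d) for d in DOCS]


def scripted_terms(indexed):
    """Model of the terms script doc['field1.raw'].value + ' ' +
    doc['field2.raw'].value: the key is rebuilt for every document on every
    search, then counted into buckets."""
    counts = Counter()
    for doc in indexed:
        key = doc["field1.raw"][0] + " " + doc["field2.raw"][0]
        counts[key] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(INDEXED[0]["field1"])  # separate terms from the analyzed field
    print(scripted_terms(INDEXED))
```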

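Without the .raw subfields (which are deliberately not_analyzed), doc['field1'].value would give you a single analyzed term rather than the whole string, so you would need something like the following, which is more expensive because it reads _source: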

{ 
    "aggs": { 
    "NAME": { 
     "terms": { 
     "script": "_source.field1 + ' ' + _source.field2", 
     "size": 10, 
     "lang": "groovy" 
     } 
    } 
    } 
}
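
A last Python sketch models where the extra cost of the _source variant comes from: under the simplifying assumption that _source is stored as JSON text, a script reading _source.field1 has to parse each document's source on every search. STORED_SOURCES and source_script_terms are hypothetical names used only for this illustration; the function returns the buckets together with the number of parses it performed.

```python
import json
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

# Simulated stored _source: the original JSON text of each document.
STORED_SOURCES = [json.dumps(d) for d in DOCS]


def source_script_terms(stored_sources):
    """Model of the terms script _source.field1 + ' ' + _source.field2:
    each document's stored JSON is parsed on every search before the script
    can read the fields. Returns the buckets and the number of parses."""
    counts = Counter()
    parses = 0
    for raw in stored_sources:
        source = json.loads(raw)  # per-document, per-search cost
        parses += 1
        counts[source["field1"] + " " + source["field2"]] += 1
    return dict(sorted(counts.items())), parses


if __name__ == "__main__":
    buckets, parses = source_script_terms(STORED_SOURCES)
    print(buckets)
    print(parses)  # one parse per document for this single search
```

The buckets match the .raw variant; the difference is that every search repeats the parsing step for every matching document.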