Elasticsearch: average over date histogram buckets

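I have a bunch of documents indexed in Elasticsearch, and I need to get the following data:

For each month, the average number of documents per working day of that month (or, if that is not possible, using 20 days as a default).

I already aggregate my data into monthly buckets with the date histogram aggregation. I tried nesting a stats aggregation inside it, but that aggregation works on values extracted from a document field, not on data from the parent bucket.

Here is my query so far: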

{ 
    "query": { 
     "match_all": {} 
    }, 
    "aggs": { 
     "docs_per_month": { 
      "date_histogram": { 
       "field": "created_date", 
       "interval": "month", 
       "min_doc_count": 0 
      }, 
      "aggs": { 
       "???": "???" 
      } 
     } 
    } 
}

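Edit

To make my question clearer, what I need is:

Get the total number of documents created in the month (already done, thanks to the date_histogram aggregation)
Get the number of working days in that month
Divide the first by the second.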

Source

2015-06-11 Thibault J

I definitely need to update my profile...

What you basically need is something like this (which does not work, because it is not an available feature):

{ 
    "query": { 
    "match_all": {} 
    }, 
    "aggs": { 
    "docs_per_month": { 
     "date_histogram": { 
     "field": "date", 
     "interval": "month", 
     "min_doc_count": 0 
     }, 
     "aggs": { 
     "average": { 
      "avg": { 
      "script": "doc_count/20" 
      } 
     } 
     } 
    } 
    } 
}

It does not work because there is no way to access doc_count from the "parent" aggregation.

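However, this will become possible in the 2.x branch of Elasticsearch, and it is being actively developed at the moment: https://github.com/elastic/elasticsearch/issues/8110 This new feature will add a second layer of manipulation over the results (buckets) of an aggregation, which covers not only your use case but many others as well.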
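For readers on a later version: as far as I know, this work shipped in Elasticsearch 2.0 as pipeline aggregations. The following is only a sketch, assuming a 2.x-style bucket_script aggregation with the special `_count` buckets path; the aggregation name `avg_docs_per_day` and the helper `docs_per_month_body` are made up for illustration. It covers the fixed 20-day fallback from the question, not the real number of working days.

```python
import json

def docs_per_month_body(days_per_month=20):
    """Build a search body whose bucket_script divides each bucket's doc count."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "docs_per_month": {
                "date_histogram": {
                    "field": "created_date",
                    "interval": "month",
                    "min_doc_count": 0,
                },
                "aggs": {
                    # Hypothetical name; "_count" refers to the parent bucket's doc_count
                    "avg_docs_per_day": {
                        "bucket_script": {
                            "buckets_path": {"count": "_count"},
                            "script": "count / %d" % days_per_month,
                        }
                    }
                },
            }
        },
    }

body = docs_per_month_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request; whether the script syntax is accepted as-is depends on the Elasticsearch version and its scripting settings.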

Unless you want to try some ideas out there, or perform your own calculations in your application, you will need to wait for this feature.
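Doing the calculation in the application could look roughly like the sketch below. It assumes the monthly buckets come back from the date_histogram with `key` as epoch milliseconds (UTC) and a `doc_count`; the function names and the sample buckets are invented for illustration, and public holidays are ignored.

```python
import calendar
from datetime import datetime, timezone

def business_days(year, month):
    """Count Monday-Friday days in a month (public holidays are ignored)."""
    _, n_days = calendar.monthrange(year, month)
    return sum(1 for d in range(1, n_days + 1)
               if calendar.weekday(year, month, d) < 5)

def avg_docs_per_business_day(buckets):
    """Map each monthly bucket to doc_count / business days of that month."""
    result = {}
    for b in buckets:
        start = datetime.fromtimestamp(b["key"] / 1000, tz=timezone.utc)
        result[start.strftime("%Y-%m")] = (
            b["doc_count"] / business_days(start.year, start.month)
        )
    return result

# Hypothetical buckets shaped like a date_histogram response
sample = [
    {"key": 1430438400000, "doc_count": 210},  # 2015-05-01T00:00:00Z
    {"key": 1433116800000, "doc_count": 0},    # 2015-06-01T00:00:00Z
]
print(avg_docs_per_business_day(sample))
```

In a real client the list would be taken from `response["aggregations"]["docs_per_month"]["buckets"]`.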

Source

2015-06-15 08:50:28

To exclude documents whose timestamp falls on a Saturday or Sunday, you can use a script like this:

{ 
    "query": { 
    "filtered": { 
     "filter": { 
     "script": { 
      "script": "doc['@timestamp'].date.dayOfWeek != 7 && doc['@timestamp'].date.dayOfWeek != 6" 
     } 
     } 
    } 
    }, 
    "aggs": { 
    "docs_per_month": { 
     "date_histogram": { 
     "field": "created_date", 
     "interval": "month", 
     "min_doc_count": 0 
     }, 
     "aggs": { 
     "docs_per_day": { 
      "date_histogram": { 
      "field": "created_date", 
      "interval": "day", 
      "min_doc_count": 0 
      } 
     }, 
     "aggs": { 
      "docs_count": { 
      "avg": { 
       "field": "" 
      } 
      } 
     } 
     } 
    } 
    } 
}

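With those documents excluded in the query, you may not need the first aggregation by month, since the one-day interval already gives you this information.

BTW, you need to make sure dynamic scripting is enabled by adding the following to the elasticsearch.yml configuration: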

script.disable_dynamic: false

Alternatively, add a Groovy script under config/scripts and reference that script from the filter of the filtered query.

Source

2015-06-11 15:39:40

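Thanks for your answer. However, I don't want to count only the documents created on working days; I need to count all the documents of the month (which I have already done) and then divide by the number of working days. What I don't know is how to compute that number (the working days of the month).

I'll edit my question, since I realize it can be misleading.

There is a rather convoluted solution, and not a really performant one, using the following scripted_metric aggregation.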

{ 
    "size": 0, 
    "query": { 
    "match_all": {} 
    }, 
    "aggs": { 
    "docs_per_month": { 
     "date_histogram": { 
     "field": "created_date", 
     "interval": "month", 
     "min_doc_count": 0 
     }, 
     "aggs": { 
     "avg_doc_per_biz_day": { 
      "scripted_metric": { 
      "init_script": "_agg.bizdays = []; _agg.allbizdays = [:]; start = new DateTime(1970, 1, 1, 0, 0); now = new DateTime(); while (start < now) { def end = start.plusMonths(1); _agg.allbizdays[start.year + '_' + start.monthOfYear] = (start.toDate()..<end.toDate()).sum {(it.day != 6 && it.day != 0) ? 1 : 0 }; start = end; }", 
      "map_script": "_agg.bizdays << _agg.allbizdays[doc.created_date.date.year+'_'+doc.created_date.date.monthOfYear]", 
      "combine_script": "_agg.allbizdays = null; doc_count = 0; for (d in _agg.bizdays){ doc_count++ }; return doc_count/_agg.bizdays[0]", 
      "reduce_script": "res = 0; for (a in _aggs) { res += a }; return res" 
      } 
     } 
     } 
    } 
    } 
}

Let's go through each script in detail.

In the init_script, I create a map holding the number of business days for every month since 1970 and store it in the _agg.allbizdays map.

_agg.bizdays = []; 
_agg.allbizdays = [:]; 
start = new DateTime(1970, 1, 1, 0, 0); 
now = new DateTime(); 
while (start < now) { 
    def end = start.plusMonths(1);  
    _agg.allbizdays[start.year + '_' + start.monthOfYear] = (start.toDate()..<end.toDate()).sum {(it.day != 6 && it.day != 0) ? 1 : 0 }; 
    start = end; 
}
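To make the lookup table easier to picture, here is a rough Python equivalent of what the init_script builds. It is only an illustration, not something Elasticsearch runs; `build_bizdays_table` and its `until` parameter are made-up names, and `since` is assumed to be the first day of a month. The keys follow the same `year_month` format as the Groovy code.

```python
from datetime import date, timedelta

def build_bizdays_table(until, since=date(1970, 1, 1)):
    """Return {'<year>_<month>': weekday count} for each month from since up to until."""
    table = {}
    start = since
    while start < until:
        # First day of the following month (December rolls over to January)
        end = date(start.year + start.month // 12, start.month % 12 + 1, 1)
        days = (end - start).days
        table["%d_%d" % (start.year, start.month)] = sum(
            1 for i in range(days)
            if (start + timedelta(days=i)).weekday() < 5  # Mon=0 .. Fri=4
        )
        start = end
    return table

table = build_bizdays_table(until=date(2015, 7, 1))
print(table["2015_6"])
```

Note the different day numbering: Python's `weekday()` uses 0 for Monday, whereas the `it.day` call in the Groovy script uses 0 for Sunday and 6 for Saturday.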

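In the map_script, I simply retrieve, for each document, the number of weekdays in that document's month:

_agg.bizdays << _agg.allbizdays[doc.created_date.date.year + '_' + doc.created_date.date.monthOfYear];

In the combine_script, I compute the average doc count for each shard: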

_agg.allbizdays = null; 
doc_count = 0; 
for (d in _agg.bizdays){ doc_count++ }; 
return doc_count/_agg.bizdays[0];

And finally, in the reduce_script, I sum up the per-shard averages across the nodes:

res = 0; 
for (a in _aggs) { res += a }; 
return res
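The combine and reduce steps can be simulated outside Elasticsearch to check the arithmetic. This is a sketch with invented shard data: every document in a monthly bucket maps to the same weekday count, so each shard returns its doc count divided by that number, and the partial results add up to the bucket's total doc count divided by the weekday count.

```python
def combine(bizdays_per_doc):
    """Per-shard step: doc count divided by the month's weekday count."""
    if not bizdays_per_doc:
        return 0.0  # guard for an empty shard; the Groovy original has no such guard
    return len(bizdays_per_doc) / bizdays_per_doc[0]

def reduce_shards(shard_results):
    """Final step: sum the per-shard partial averages."""
    return sum(shard_results)

# Hypothetical June 2015 bucket (22 weekdays) with 44 docs spread over 3 shards
shards = [[22] * 20, [22] * 24, []]
print(reduce_shards(combine(s) for s in shards))
```

Here 20/22 + 24/22 is roughly 2 documents per business day, the same as 44/22 computed in one go.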

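Again, I think it is pretty convoluted and, as Andrei rightly said, it is probably better to wait for 2.0 to have this work the way it should; in the meantime, though, you have this solution if you need it.

Source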

2015-06-15 09:39:56 Val
