2013-02-18

Elasticsearch filter using AND on a nested resource

My problem: the Elasticsearch count does not match the count from my database.

I indexed the "users" table. Each user can have one or more apps_events:

curl localhost:9200/users/_count 
{"count":190291,"_shards":{"total":5,"successful":5,"failed":0}} 

SELECT COUNT(*) FROM users 
count : 190291 

=> Same count, everything is fine!

But when I search with two filters on the nested resource, one terms filter and one term filter:

curl -X GET 'http://localhost:9200/users/user/_search?load=&size=10&pretty' -d ' 
{ 
"query": { 
    "match_all": { 
    } 
}, 
"filter": { 
    "and": [ 
    { 
     "terms": { 
     "apps_events.type": [ 
      "sale" 
     ] 
     } 
    }, 
    { 
     "term": { 
     "apps_events.status": "active" 
     } 
    } 
    ] 
}, 
"size": 10 
}'

total : 63756 

And in my database:

SELECT 
    COUNT(DISTINCT(users_id)) 
FROM 
    apps_event 
WHERE 
    apps_event_state_id = 1 AND apps_event_project_id = 2; 

count : 63340 

The difference is because the Elasticsearch query is actually equivalent to this SQL:

SELECT 
    COUNT(DISTINCT(users_id)) 
FROM apps_event 
WHERE apps_event_state_id = 1 
AND users_id IN 
    (SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2) 

count : 63756 
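To see why the two SQL queries can disagree, here is a small sqlite3 reproduction on made-up rows. The three-column table is a simplified, hypothetical version of apps_event. The first query needs a single row that satisfies both conditions; the second only needs each condition to be met by some row belonging to the same user, so it always counts the same users or more.

```python
import sqlite3

# Hypothetical rows: (users_id, apps_event_state_id, apps_event_project_id).
# User 1 has one event that is both state 1 and project 2.
# User 2 has a state-1 event in project 3 and a state-0 event in project 2,
# so no single event of user 2 satisfies both conditions.
ROWS = [
    (1, 1, 2),
    (2, 1, 3),
    (2, 0, 2),
]


def counts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE apps_event ("
        "users_id INTEGER, apps_event_state_id INTEGER, apps_event_project_id INTEGER)"
    )
    conn.executemany("INSERT INTO apps_event VALUES (?, ?, ?)", rows)
    # AND per event: both conditions must hold on the same row.
    same_row = conn.execute(
        "SELECT COUNT(DISTINCT users_id) FROM apps_event "
        "WHERE apps_event_state_id = 1 AND apps_event_project_id = 2"
    ).fetchone()[0]
    # AND per user: each condition may be met by a different row of that user.
    any_row = conn.execute(
        "SELECT COUNT(DISTINCT users_id) FROM apps_event "
        "WHERE apps_event_state_id = 1 AND users_id IN "
        "(SELECT DISTINCT users_id FROM apps_event WHERE apps_event_project_id = 2)"
    ).fetchone()[0]
    conn.close()
    return same_row, any_row


print(counts(ROWS))  # (1, 2)
```

User 2 is the kind of user that makes 63756 larger than 63340.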

=> How can I make the AND apply within each resource (each event) instead of across the whole user?

Thanks

Answer


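You may have checked this already, but is apps_event_project_id really the counterpart of apps_events.type? They don't look like the same thing on the surface, though you would know for sure. Also, does users_id map directly to the ES _id? The inflated count could come from duplicate documents in your index.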


There are no duplicates, but I finally found the cause: apps_events is a nested resource, and for that Elasticsearch search the real equivalent query is: SELECT COUNT(DISTINCT(users_id)) FROM apps_event WHERE apps_event_state_id = 1 AND users_id IN (SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2); total: 63756 – zywx 2013-02-18 16:30:54
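The behaviour zywx describes can be mimicked in plain Python. Unless a field is mapped as nested, Elasticsearch indexes an array of objects by flattening it: each field path such as apps_events.type just holds the values from all of the user's events, and the link between one event's type and its status is lost. The sketch below uses hypothetical users to compare that flattened matching with per-event (nested) matching. It models the matching semantics only, not Elasticsearch's indexing code.

```python
from collections import defaultdict

# Hypothetical users, each with a list of apps_events (field names from the question).
USERS = {
    1: [{"type": "sale", "status": "active"}],
    2: [{"type": "sale", "status": "inactive"}, {"type": "lead", "status": "active"}],
}

WANTED = {"apps_events.type": "sale", "apps_events.status": "active"}


def flatten(events):
    """Mimic a plain object array: one set of values per field path."""
    fields = defaultdict(set)
    for event in events:
        for key, value in event.items():
            fields["apps_events." + key].add(value)
    return fields


def match_flattened(events, wanted):
    # Each condition only has to find its value somewhere among the user's events.
    fields = flatten(events)
    return all(value in fields[path] for path, value in wanted.items())


def match_nested(events, wanted):
    # Nested semantics: all conditions must hold on the same event.
    return any(
        all(event.get(path.split(".", 1)[1]) == value for path, value in wanted.items())
        for event in events
    )


print(sum(match_flattened(e, WANTED) for e in USERS.values()))  # 2
print(sum(match_nested(e, WANTED) for e in USERS.values()))     # 1
```

User 2 matches the flattened check even though none of its events is an active sale, which mirrors the IN-subquery form of the SQL.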


Thanks for following up. I hadn't even thought of nested! – drewr 2013-02-18 19:05:04
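Following the diagnosis in the comments, the usual fix is to map apps_events with "type": "nested" and wrap the conditions in a nested query so that both must hold on the same event. The helper below is a hypothetical sketch that only builds such a request body. It assumes the nested mapping is already in place (changing an existing field from object to nested requires reindexing) and uses the bool/filter syntax of current Elasticsearch versions rather than the 2013-era "and" filter.

```python
import json


def nested_and_query(path, conditions):
    """Hypothetical helper: every term filter must match one and the same nested object.

    Assumes `path` is mapped with "type": "nested".
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {field: value}} for field, value in conditions.items()
                        ]
                    }
                },
            }
        }
    }


body = nested_and_query(
    "apps_events",
    {"apps_events.type": "sale", "apps_events.status": "active"},
)
print(json.dumps(body, indent=2))
```

The printed JSON could be saved to a file and sent to the count endpoint, for example curl -H 'Content-Type: application/json' -d @body.json 'http://localhost:9200/users/_count', and the result compared with COUNT(DISTINCT users_id) from the database.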