Performance of an aggregate in a WHERE clause

I'm trying to find the number of users created in the three months up to each state's most recently created user, all grouped by state.

Here is a query that works:

select count(u.id) as numberOfUsers, 
s.state 
from users u 
join states s on u.state_id = s.id 
where u.creationdate > (
select max(u2.creationdate) 
from users u2 
where u2.state_id = s.id 
) - interval '3 months' 
group by s.state

However, it takes 100 seconds. Can anyone give me a faster one?

I was hoping this would work:

select count(u.id) as numberOfUsers, 
s.state, max(u2.creationdate) as lastCreated 
from users u 
join states s on u.state_id = s.id 
where u.creationdate > lastCreated - interval '3 months' 
group by s.state
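The hoped-for version fails because a select-list alias such as lastCreated can't be referenced in WHERE, and aggregates are only computed after WHERE has filtered the rows. A minimal sketch of the same idea that does run in PostgreSQL: compute each state's latest creationdate once in a derived table (the alias m is hypothetical) and join it back. The temp tables and sample rows are hypothetical, added only so the block runs on its own.

```sql
-- Hypothetical sample data in session-local temp tables, which shadow any
-- real users/states tables for this session only.
drop table if exists pg_temp.users, pg_temp.states;
create temp table states (id int primary key, state text not null);
create temp table users (
    id           serial primary key,
    state_id     int not null references states (id),
    creationdate timestamp not null
);
insert into states values (1, 'Texas'), (2, 'Ohio');
insert into users (state_id, creationdate) values
    (1, '2011-03-01'), (1, '2011-01-15'), (1, '2010-11-15'),
    (2, '2011-02-01'), (2, '2010-10-01');

-- Compute each state's latest creationdate once, in the derived table m,
-- so that the WHERE clause can compare against it.
select s.state,
       m.lastcreated,
       count(u.id) as numberofusers
from users u
join states s on s.id = u.state_id
join (select state_id, max(creationdate) as lastcreated
      from users
      group by state_id) m on m.state_id = u.state_id
where u.creationdate > m.lastcreated - interval '3 months'
group by s.state, m.lastcreated
order by s.state;
```

With the sample rows, Texas counts 2 users (cutoff 2010-12-01) and Ohio counts 1 (cutoff 2010-11-01).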

2011-03-08 Jacob Eggers

This might perform better, since it only does one scan:

select count(*) as numberofusers, 
     state 
from (select id, state_id, creationdate, 
       max(creationdate) over (partition by state_id) - '3 months'::interval as cutoff 
     from users 
    ) x 
    join states on states.id = x.state_id 
where creationdate > cutoff 
group by state

However, the initial window aggregate will chew through a lot of work_mem.

Hmm, maybe something more like:

with cutoffs as (
    select id, state, 
     (select max(creationdate) 
      from users 
      where users.state_id = states.id) - '3 months'::interval as cutoff 
    from states) 
select count(*) as numberofusers, state 
from users 
    join cutoffs on users.state_id = cutoffs.id 
where users.creationdate > cutoff 
group by state

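This is an attempt to coax PostgreSQL into doing a sensible partitioned scan, but it's not really ideal. It still does a full table scan, but at least only one. A set-returning function that iterates over the CTE's output and runs the outer query inside the loop would probably work best, since it could use a creationdate index for each state.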
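A minimal PL/pgSQL sketch of that set-returning function, assuming the table layout from the question. The function name recent_users_per_state, its p_window parameter and the sample rows are hypothetical. The loop runs over the same per-state cutoff expression as the cutoffs CTE above, inlined, and issues one count per state; a composite index on users (state_id, creationdate) could serve each of those counts.

```sql
-- Hypothetical sample data in session-local temp tables.
drop table if exists pg_temp.users, pg_temp.states;
create temp table states (id int primary key, state text not null);
create temp table users (
    id           serial primary key,
    state_id     int not null references states (id),
    creationdate timestamp not null
);
insert into states values (1, 'Texas'), (2, 'Ohio');
insert into users (state_id, creationdate) values
    (1, '2011-03-01'), (1, '2011-01-15'), (1, '2010-11-15'),
    (2, '2011-02-01'), (2, '2010-10-01');

-- Hypothetical set-returning function: one small, index-friendly
-- count per state instead of one pass over the whole users table.
create or replace function recent_users_per_state(
    p_window interval default interval '3 months')
returns table (state text, numberofusers bigint)
language plpgsql stable
as $$
declare
    r record;
begin
    -- Per-state cutoffs: latest creationdate minus the window
    -- (NULL for a state that has no users).
    for r in
        select s.id, s.state,
               (select max(u2.creationdate)
                from users u2
                where u2.state_id = s.id) - p_window as cutoff
        from states s
        order by s.state
    loop
        state := r.state;
        -- A comparison with a NULL cutoff matches nothing, so such states count 0.
        select count(*) into numberofusers
        from users u
        where u.state_id = r.id
          and u.creationdate > r.cutoff;
        return next;
    end loop;
end;
$$;

select * from recent_users_per_state();
```

With the sample rows this returns Ohio with 1 and Texas with 2. Unlike the join-based queries, it also returns a row with 0 for a state that has no users.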

2011-03-08 22:41:03 araqnid

Awesome! I modified your query a bit and got it down to 82 ms (vs. 100,000 ms, only a few orders of magnitude). – 2011-03-08 22:53:59

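Have you determined which part of the query is slow? Can you add indexes? I'm no Postgres guru, but I suspect that if users isn't indexed on creationdate, the MAX() function will have to do a full table scan. Well, it may have to do one anyway...

That said, none of this is tested!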

SELECT u.numUsers, s.state FROM 
(SELECT count(id) as numUsers, state_id 
FROM users 
WHERE creationdate > (MAX(creationdate) - interval '3 Months') 
GROUP BY state_id) u 
left join states s on u.state_id = s.id

2011-03-08 22:57:24 jgrim

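The problem is that, for every user in a state, it runs an aggregate over all users in that state. Also, this query doesn't actually work, because you can't use an aggregate in a WHERE clause. – 2011-03-08 23:17:15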

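Out of interest, how does the following query perform? I'm particularly interested in how PostgreSQL handles the innermost query (the states table plus a scalar subquery).

It needs a composite index on users (state_id, creationdate) to work well.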
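A minimal sketch of creating that index, assuming the users table from the question. The index name users_state_id_creationdate_idx is taken from the plan in this answer; the temp table and sample rows are hypothetical. state_id comes first because it is compared with =, so the latest creationdate for a state sits at the end of that state's range in the index, which is what the Index Scan Backward in the plan reads. On a table this small the planner may still pick a sequential scan, so the EXPLAIN output will vary.

```sql
-- Hypothetical sample table so the block runs on its own; on a real
-- database only the CREATE INDEX and ANALYZE statements are needed.
drop table if exists pg_temp.users;
create temp table users (
    id           serial primary key,
    state_id     int not null,
    creationdate timestamp not null
);
insert into users (state_id, creationdate) values
    (1, '2011-03-01'), (1, '2011-01-15'), (2, '2011-02-01');

-- Equality column first, range/ordering column second.
create index users_state_id_creationdate_idx on users (state_id, creationdate);
analyze users;

-- Reports actual row counts, timings and buffer hits for one state's lookup.
explain (analyze, buffers)
select max(creationdate) from users where state_id = 1;
```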

select s2.id 
     ,s2.state 
     ,(select count(*) 
      from users u 
     where u.state_id  = s2.id 
      and u.creationdate > s2.max_date) as numberOfUsers 
    from (select s.id 
       ,s.state 
       ,(select max(u.creationdate) - interval '3 months' 
        from users u 
       where u.state_id = s.id) as max_date 
     from states s 
     ) s2;

Edit: here is the plan this query produced with 100,000 user rows across 3 states:

Seq Scan on states s (actual time=4.033..13.949 rows=3 loops=1) 
    Buffers: shared hit=1743 
    SubPlan 3 
    -> Aggregate (actual time=4.636..4.636 rows=1 loops=3) 
      Buffers: shared hit=1742 
      InitPlan 2 (returns $2) 
      -> Result (actual time=0.028..0.028 rows=1 loops=3) 
        Buffers: shared hit=12 
        InitPlan 1 (returns $1) 
        -> Limit (actual time=0.022..0.022 rows=1 loops=3) 
          Buffers: shared hit=12 
          -> Index Scan Backward using users_state_id_creationdate_idx on users u (actual time=0.019..0.019 rows=1 loops=3) 
           Index Cond: ((state_id = $0) AND (creationdate IS NOT NULL)) 
           Buffers: shared hit=12 
      -> Bitmap Heap Scan on users u (actual time=1.095..3.693 rows=8425 loops=3) 
       Recheck Cond: ((state_id = $0) AND (creationdate > $2)) 
       Buffers: shared hit=1730 
       -> Bitmap Index Scan on users_state_id_creationdate_idx (actual time=1.017..1.017 rows=8425 loops=3) 
         Index Cond: ((state_id = $0) AND (creationdate > $2)) 
         Buffers: shared hit=107 
Total runtime: 14.017 ms

2011-03-08 23:01:05 Ronnis

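Performs really well (of course, my data is just random white noise). With 100,000 users it averages about 17 ms, while my solution averages about 180 ms (as for the OP's original, I didn't have the patience to wait for it). I'll add the plan to your answer, since it would be unreadable otherwise. – araqnid 2011-03-09 13:16:04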

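@araqnid, awesome! Thanks a lot for taking the time! I have to say PostgreSQL is slowly becoming the best database I've never used ;) I need to find a real project to use it on soon. – Ronnis 2011-03-09 13:57:41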

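FYI: on our database the query above took about 3 seconds, the one I posted took about 80 ms, and the original took 100 seconds. (Our table structure is a bit more complex and lacks the indexes needed to optimize this query; I simplified the query a bit for this post.) – 2011-03-10 04:29:03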

Here is the query I used to get the time down to 82 ms:

with cutoffs as (
    -- latest creationdate per state, minus the 3-month window
    select s.id, s.state,
           max(u.creationdate) - interval '3 months' as cutoff
    from users u
    join states s on u.state_id = s.id
    group by s.id, s.state)
select count(*) as numberofusers, state
from users
    join cutoffs on users.state_id = cutoffs.id
where users.creationdate > cutoff
group by state

Thanks, araqnid.

2011-03-08 23:09:51