2013-02-06 36 views

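PostgreSQL: GROUP BY date_trunc aggregate with greater-than does not use the index

I have a table with a few million rows. I have an expression index on this table (I created it in both directions to see whether that made a difference):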

CREATE INDEX ON statuses (date_trunc('hour', created_at) ASC);
CREATE INDEX ON statuses (date_trunc('hour', created_at) DESC);

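I am trying to write a query that collects a count of statuses grouped by hour, but only for statuses created today (or within the last 7 days). However, when I try to exclude all entries before a certain date, the index is not used and the query filters every row instead. If I replace the greater-than with an equals, the index is used. I have put the EXPLAIN output below in the hope that someone can help me make this query use the index, or at least improve performance so that it runs in milliseconds rather than seconds.

With an equals comparison the index is used correctly: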

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) = '2013-02-06 00:00:00'; 
                     QUERY PLAN                   
--------------------------------------------------------------------------------------------------------------------------------------------------------- 
GroupAggregate (cost=132.48..29443.34 rows=1653 width=8) (actual time=4.362..4.363 rows=1 loops=1) 
    -> Bitmap Heap Scan on statuses (cost=132.48..29419.22 rows=18337 width=8) (actual time=0.209..2.159 rows=1319 loops=1) 
     Recheck Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone) 
     -> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..131.57 rows=18337 width=0) (actual time=0.178..0.178 rows=1319 loops=1) 
       Index Cond: (date_trunc('hour'::text, created_at) = '2013-02-06 00:00:00'::timestamp without time zone) 
Total runtime: 4.416 ms 
(6 rows) 

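However, as soon as I use greater than (or less than), the query falls back to a filter over the whole table and does not use the index.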

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00'; 
                   QUERY PLAN                
-------------------------------------------------------------------------------------------------------------------------------------- 
HashAggregate (cost=185386.54..185772.10 rows=110160 width=8) (actual time=2915.495..2915.774 rows=21 loops=1) 
    -> Seq Scan on statuses (cost=0.00..184164.06 rows=1222485 width=8) (actual time=1676.827..2869.748 rows=47070 loops=1) 
     Filter: (date_trunc('hour'::text, created_at) > '2013-02-06 00:00:00'::timestamp without time zone) 
     Rows Removed by Filter: 3620426 
Total runtime: 2916.049 ms 
(5 rows) 

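I can work around this by using IN and listing every hour in the range I want to select, but I would really like to understand why the index is not used for the greater-than query.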

=> EXPLAIN ANALYSE SELECT date_trunc('hour', created_at) as hour, COUNT(*) FROM statuses GROUP BY hour HAVING date_trunc('hour', created_at) IN ('2013-02-06 00:00:00', '2013-02-06 01:00:00'); 
                     QUERY PLAN                   
--------------------------------------------------------------------------------------------------------------------------------------------------------- 
HashAggregate (cost=51988.38..51999.94 rows=3305 width=8) (actual time=7.218..7.223 rows=2 loops=1) 
    -> Bitmap Heap Scan on statuses (cost=262.96..51951.70 rows=36675 width=8) (actual time=0.376..4.576 rows=2507 loops=1) 
     Recheck Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[])) 
     -> Bitmap Index Scan on statuses_date_trunc_idx1 (cost=0.00..261.13 rows=36675 width=0) (actual time=0.341..0.341 rows=2507 loops=1) 
       Index Cond: (date_trunc('hour'::text, created_at) = ANY ('{"2013-02-06 00:00:00","2013-02-06 01:00:00"}'::timestamp without time zone[])) 
Total runtime: 7.305 ms 
(6 rows) 

Answers


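The estimated row count for statuses is 26 times higher than the actual number of rows returned in the "bad" query (1222485 estimated versus 47070 actual on the Seq Scan node). With an estimate that large, the planner most likely judges a sequential scan to be cheaper than going through the index.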
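A rough way to check this yourself, as a sketch only: the first statement just redoes the arithmetic from the plan, and the second looks at the statistics stored for the expression index. The index name statuses_date_trunc_idx1 is taken from the EXPLAIN output in the question; yours may differ.

```sql
-- Estimated vs. actual rows on the Seq Scan node of the slow plan.
SELECT round(1222485::numeric / 47070, 1) AS overestimate_factor;  -- about 26

-- Statistics gathered for an expression index are listed in pg_stats
-- under the index name (assumed here to match the EXPLAIN output).
SELECT attname, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename = 'statuses_date_trunc_idx1';
```

If the second query returns nothing, or a histogram that stops well before today, the planner has no current data to estimate the range condition with.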

  1. Try running VACUUM ANALYZE statuses;
  2. If that does not help, increase the statistics target for statuses.created_at with ALTER TABLE statuses ALTER created_at SET STATISTICS 500; and analyze the table again.

This should help.
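Put together, the two steps look roughly like this. It is a sketch, not a guaranteed fix: the value 500 is the one suggested above, and the final EXPLAIN simply re-runs the slow query from the question to see whether the estimate and the plan have changed.

```sql
-- Step 1: refresh planner statistics (and clean up dead rows).
-- VACUUM cannot run inside a transaction block, so run it on its own.
VACUUM ANALYZE statuses;

-- Step 2, only if the estimate is still far off: collect a larger sample
-- for created_at, then re-analyze so the new target takes effect.
ALTER TABLE statuses ALTER COLUMN created_at SET STATISTICS 500;
ANALYZE statuses;

-- Re-check the plan for the slow query.
EXPLAIN ANALYZE
SELECT date_trunc('hour', created_at) AS hour, COUNT(*)
FROM statuses
GROUP BY hour
HAVING date_trunc('hour', created_at) > '2013-02-06 00:00:00';
```

Compare the rows= estimate on the scan node with the actual rows afterwards; if the two are close, the planner should be able to choose the bitmap index scan by itself.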


EDIT: You need to check your autovacuum settings.

Read the autovacuum section of the manual, and check your configuration like this:

SELECT name,setting,source FROM pg_settings WHERE name ~ 'autovacuum'; 

If your table is very large, you may want to tune autovacuum_analyze_threshold and/or autovacuum_analyze_scale_factor for it using the ALTER TABLE tab SET (storage_parameter = ...) syntax.
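For example, something along these lines. The numbers are illustrative assumptions, not recommendations; pick values that match how fast statuses grows.

```sql
-- When was the table last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'statuses';

-- Illustrative values only: trigger an automatic ANALYZE after about 1%
-- of the rows (plus 5000) have changed, instead of the default 10% (plus 50).
ALTER TABLE statuses SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold = 5000
);
```

On a table with millions of rows the default scale factor means hundreds of thousands of new rows can arrive before statistics are refreshed, which matters for a query that only looks at the most recent hours.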


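Awesome, thank you, VACUUM ANALYZE worked. I thought this ran automatically at predefined intervals? I suppose it was a temporary thing, because some of the queries I ran earlier used incorrect, larger date ranges. –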


@SteveSmith, that depends on your 'autovacuum' settings. I have updated the answer. – vyegorov