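PostgreSQL index not used

I have a pretty strange dataset. Several records in a big table have no related data at all, but the ones that do have data have hundreds of thousands of records each. I want to select only the records that have data, but I'm having trouble with index usage. I know you usually can't "force" PostgreSQL to use a particular index, but in this case it does work.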
SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) > 0 ORDER BY count(frames.id) DESC;
id | count
----+--------
31 | 123363
28 | 121475
24 | 110155
21 | 108258
22 | 106837
25 | 89182
26 | 87104
27 | 86152
(8 rows)
SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) = 0 ORDER BY count(frames.id) DESC;
....
(568 rows)
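The skewed distribution above (a few matches with many frames, most with none) can be reproduced on a small scale to check the counting queries. This is a minimal sketch, not the author's setup. It uses SQLite in memory as a stand-in for PostgreSQL, and the toy data (10 matches, frames only for matches 2, 5 and 7) is hypothetical. It reproduces only the query results, not PostgreSQL's planner choices.

```python
import sqlite3

# Hypothetical toy schema mirroring the question; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (id INTEGER PRIMARY KEY);
    CREATE TABLE frames (id INTEGER PRIMARY KEY, match_id INTEGER REFERENCES matches(id));
    CREATE INDEX index_frames_on_match_id ON frames (match_id);
""")
conn.executemany("INSERT INTO matches (id) VALUES (?)", [(i,) for i in range(1, 11)])

# Skewed data: only matches 2, 5 and 7 get frames (300, 100 and 200 respectively).
frame_rows = [(m,) for m, n in [(2, 300), (5, 100), (7, 200)] for _ in range(n)]
conn.executemany("INSERT INTO frames (match_id) VALUES (?)", frame_rows)

# Same shape as the question's query; {op} switches between "> 0" and "= 0".
COUNT_SQL = """
    SELECT matches.id, count(frames.id)
    FROM matches LEFT JOIN frames ON frames.match_id = matches.id
    GROUP BY matches.id
    HAVING count(frames.id) {op} 0
    ORDER BY count(frames.id) DESC
"""
with_frames = conn.execute(COUNT_SQL.format(op=">")).fetchall()
without_frames = conn.execute(COUNT_SQL.format(op="=")).fetchall()

print(with_frames)          # [(2, 300), (7, 200), (5, 100)]
print(len(without_frames))  # 7
```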
The two solutions I found are:
SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 11697,645 ms
or
SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"
Time: 879,325 ms
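Both solutions return the same set of matches, but they get there differently. EXISTS only asks whether a matching frame exists. The INNER JOIN produces one row per frame, and DISTINCT then collapses those rows back to one per match. That is why the plan for the DISTINCT version shown further down hashes all ~785k frames. A minimal sketch of the equivalence, again using SQLite in memory as a stand-in and hypothetical toy data (the `name` column is added only so `matches.*` has more than one column):

```python
import sqlite3

# Hypothetical toy schema and data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE frames (id INTEGER PRIMARY KEY, match_id INTEGER);
    CREATE INDEX index_frames_on_match_id ON frames (match_id);
""")
conn.executemany("INSERT INTO matches VALUES (?, ?)",
                 [(i, f"match {i}") for i in range(1, 11)])
# Only matches 2, 5 and 7 have frames (50 each).
conn.executemany("INSERT INTO frames (match_id) VALUES (?)",
                 [(m,) for m in (2, 5, 7) for _ in range(50)])

# Solution 1: EXISTS, so each match is tested once and appears at most once.
exists_rows = conn.execute("""
    SELECT matches.* FROM matches
    WHERE EXISTS (SELECT 1 FROM frames WHERE frames.match_id = matches.id)
    ORDER BY matches.id
""").fetchall()

# Solution 2: the join yields one row per frame before DISTINCT collapses them.
joined_count = conn.execute("""
    SELECT count(*) FROM matches
    INNER JOIN frames ON frames.match_id = matches.id
""").fetchone()[0]
distinct_rows = conn.execute("""
    SELECT DISTINCT matches.* FROM matches
    INNER JOIN frames ON frames.match_id = matches.id
    ORDER BY matches.id
""").fetchall()

print(joined_count)                  # 150 intermediate rows
print(exists_rows == distinct_rows)  # True
```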
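Neither query seems to use the index on frames.match_id. Since that index is usually not very selective, this is understandable, but unfortunately here it would help a lot. For example: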
SET enable_seqscan = OFF;
SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 1,239 ms
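The fast query above uses a scalar subquery directly as the WHERE condition instead of EXISTS. The subquery returns a value when a frame exists and NULL otherwise, and rows whose condition is NULL are filtered out. Unlike with EXISTS, LIMIT 1 is required in this form: in PostgreSQL, a scalar subquery that returns more than one row raises an error. SQLite silently takes the first row instead, so the sketch below cannot show that error. This is a minimal sketch with SQLite in memory and hypothetical toy data. It uses `SELECT 1` rather than `SELECT true` so that older SQLite builds accept it.

```python
import sqlite3

# Hypothetical toy data: matches 1..5, frames only for matches 2 and 4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (id INTEGER PRIMARY KEY);
    CREATE TABLE frames (id INTEGER PRIMARY KEY, match_id INTEGER);
    CREATE INDEX index_frames_on_match_id ON frames (match_id);
""")
conn.executemany("INSERT INTO matches (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO frames (match_id) VALUES (?)", [(2,), (2,), (4,)])

# The scalar subquery yields 1 when a frame exists and NULL otherwise.
probe = conn.execute("""
    SELECT matches.id,
           (SELECT 1 FROM frames WHERE frames.match_id = matches.id LIMIT 1)
    FROM matches ORDER BY matches.id
""").fetchall()

# Used as the WHERE condition, NULL is not true, so those matches drop out.
filtered = conn.execute("""
    SELECT matches.id FROM matches
    WHERE (SELECT 1 FROM frames WHERE frames.match_id = matches.id LIMIT 1)
    ORDER BY matches.id
""").fetchall()

print(probe)     # [(1, None), (2, 1), (3, None), (4, 1), (5, None)]
print(filtered)  # [(2,), (4,)]
```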
EXPLAIN output for the queries:
EXPLAIN for: SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"
QUERY PLAN
-----------------------------------------------------------------------------
HashAggregate (cost=59253.47..59256.38 rows=290 width=155)
-> Hash Join (cost=6.26..33716.73 rows=785746 width=155)
Hash Cond: (frames.match_id = matches.id)
-> Seq Scan on frames (cost=0.00..22906.46 rows=785746 width=4)
-> Hash (cost=4.45..4.45 rows=145 width=155)
-> Seq Scan on matches (cost=0.00..4.45 rows=145 width=155)
(6 rows)
EXPLAIN for: SELECT "matches".* FROM "matches" WHERE (EXISTS (SELECT id FROM frames WHERE frames.match_id = matches.id LIMIT 1))
QUERY PLAN
Seq Scan on matches (cost=0.00..41.17 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.25 rows=1 width=4)
-> Seq Scan on frames (cost=0.00..24870.83 rows=98218 width=4)
Filter: (match_id = matches.id)
(6 rows)
SET enable_seqscan = OFF;
EXPLAIN SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
QUERY PLAN
Seq Scan on matches (cost=10000000000.00..10000000118.37 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.79 rows=1 width=0)
-> Index Scan using index_frames_on_match_id on frames (cost=0.00..81762.68 rows=104066 width=0)
Index Cond: (match_id = matches.id)
(6 rows)
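Any suggestions on how to tweak the query so it uses the index here? Or is there another way to check for the existence of records that would run closer to the 1 ms I get with the index than to 11 s?
PS. I did run ANALYZE, VACUUM ANALYZE, and all the other steps usually recommended to improve index usage.
EDIT: Thanks to David Aldridge for pointing out that LIMIT 1 may be hindering the query planner. Now I've got: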
SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
Time: 163,803 ms
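Dropping LIMIT 1 does not change the result, because EXISTS only checks whether the subquery returns at least one row. The LIMIT is semantically redundant there, and as the edit above notes, it may only get in the planner's way. A minimal sketch confirming the identical results, using SQLite in memory with hypothetical toy data (it does not reproduce PostgreSQL's plans):

```python
import sqlite3

# Hypothetical toy data: matches 1..8, frames only for matches 3 and 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (id INTEGER PRIMARY KEY);
    CREATE TABLE frames (id INTEGER PRIMARY KEY, match_id INTEGER);
    CREATE INDEX index_frames_on_match_id ON frames (match_id);
""")
conn.executemany("INSERT INTO matches (id) VALUES (?)", [(i,) for i in range(1, 9)])
conn.executemany("INSERT INTO frames (match_id) VALUES (?)",
                 [(m,) for m in (3, 6) for _ in range(20)])


def matches_with_frames(limit_clause: str) -> list:
    """Run the EXISTS query with an optional LIMIT inside the subquery."""
    sql = f"""
        SELECT matches.id FROM matches
        WHERE EXISTS (SELECT 1 FROM frames
                      WHERE frames.match_id = matches.id {limit_clause})
        ORDER BY matches.id
    """
    return conn.execute(sql).fetchall()


with_limit = matches_with_frames("LIMIT 1")
without_limit = matches_with_frames("")

print(with_limit == without_limit)  # True
print(without_limit)                # [(3,), (6,)]
```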
The plan for this query:
EXPLAIN SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=25455.58..25457.90 rows=8 width=155)
-> HashAggregate (cost=25455.58..25455.66 rows=8 width=4)
-> Seq Scan on frames (cost=0.00..23374.26 rows=832526 width=4)
-> Index Scan using matches_pkey on matches (cost=0.00..0.27 rows=1 width=155)
Index Cond: (id = frames.match_id)
(5 rows)
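It is still roughly 100x slower than the index-only version, probably because the Seq Scan on frames and the HashAggregate are still performed.
What version of PostgreSQL are you using?
9.1. I just tested on 9.3, and there the query without LIMIT 1 uses the index correctly. It looks like LIMIT 1 messes everything up, and the query optimizer also changed a lot between 9.1 and 9.3.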