PostgreSQL index not used

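I have a rather odd data set: many rows in the matches table have no frames at all, but the ones that do have a very large number of them (around 100,000 each). I want to select only the matches that have data, but I am running into problems with index usage. I know you normally cannot "force" PostgreSQL to use a particular index, but in this case doing so works.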

SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) > 0 ORDER BY count(frames.id) DESC; 
id | count 
----+-------- 
31 | 123363 
28 | 121475 
24 | 110155 
21 | 108258 
22 | 106837 
25 | 89182 
26 | 87104 
27 | 86152 
(8 rows) 

SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) = 0 ORDER BY count(frames.id) DESC; 
.... 
(568 rows)

The two solutions I found are:

SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1); 
Time: 11697,645 ms 


or 

SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id" 
Time: 879,325 ms

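Neither query seems to use the index on match_id in the frames table. That is understandable, since the index is normally not very selective, but unfortunately here it would be very helpful, as this shows: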

SET enable_seqscan = OFF; 
SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1); 
Time: 1,239 ms
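A plain SET changes the setting for the rest of the session. As a minimal sketch, assuming the goal is only to test one query, the change can be scoped to a single transaction with SET LOCAL, so that other queries on the same connection keep the default planner behaviour. Note that enable_seqscan = off does not forbid sequential scans; it only makes the planner treat them as extremely expensive, which is why the cost of the outer scan in the plan further down starts at 10000000000.

```sql
BEGIN;

-- SET LOCAL reverts automatically at COMMIT or ROLLBACK,
-- so the setting cannot leak into later queries on this connection.
SET LOCAL enable_seqscan = off;

-- Same query as in the question, run with sequential scans discouraged.
SELECT "matches".*
FROM "matches"
WHERE (SELECT true
       FROM frames
       WHERE frames.match_id = matches.id
       LIMIT 1);

COMMIT;
```

This is meant as a diagnostic aid rather than something to leave enabled for production queries.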

The query plans:

EXPLAIN for: SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id" 

           QUERY PLAN 
----------------------------------------------------------------------------- 
HashAggregate (cost=59253.47..59256.38 rows=290 width=155) 
    -> Hash Join (cost=6.26..33716.73 rows=785746 width=155) 
     Hash Cond: (frames.match_id = matches.id) 
     -> Seq Scan on frames (cost=0.00..22906.46 rows=785746 width=4) 
     -> Hash (cost=4.45..4.45 rows=145 width=155) 
       -> Seq Scan on matches (cost=0.00..4.45 rows=145 width=155) 
(6 rows)

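EXPLAIN for: SELECT "matches".* FROM "matches" WHERE (EXISTS (SELECT id FROM frames WHERE frames.match_id = matches.id LIMIT 1))

           QUERY PLAN
-----------------------------------------------------------------------------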
Seq Scan on matches (cost=0.00..41.17 rows=72 width=155) 
    Filter: (SubPlan 1) 
    SubPlan 1 
    -> Limit (cost=0.00..0.25 rows=1 width=4)                              
     -> Seq Scan on frames (cost=0.00..24870.83 rows=98218 width=4)                       
       Filter: (match_id = matches.id)

(6 rows)

SET enable_seqscan = OFF;

EXPLAIN SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);

           QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on matches (cost=10000000000.00..10000000118.37 rows=72 width=155) 
    Filter: (SubPlan 1) 
    SubPlan 1 
    -> Limit (cost=0.00..0.79 rows=1 width=0) 
      -> Index Scan using index_frames_on_match_id on frames (cost=0.00..81762.68 rows=104066 width=0) 
       Index Cond: (match_id = matches.id)

(6 rows)

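Any suggestions on how to tweak the query so that the index is used here? Or is there perhaps another way to check for the existence of records that would run closer to the 1 ms I get with the index than to the 11 s I get without it?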

PS. I did run ANALYZE, VACUUM ANALYZE, and all the other steps usually suggested to improve index usage.
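One way to see what ANALYZE actually recorded is to look at the planner statistics directly. The sketch below reads the standard pg_stats view and the pg_class catalog for the tables named in the question. In the plans above, the planner expects rows=98218 for a single match_id out of 785746 frames, which would be consistent with an n_distinct of about 8, but that is an inference from the plans, not something the question states.

```sql
-- Column-level statistics the planner uses for frames.match_id.
-- A positive n_distinct is an estimated count of distinct values;
-- a negative value is a fraction of the row count.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'frames'
  AND attname = 'match_id';

-- Table-level row and page estimates for both tables.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('frames', 'matches');
```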

EDIT: Thanks to David Aldridge for pointing out that LIMIT 1 might actually be hindering the query planner. I now have:

SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id); 
Time: 163,803 ms

With the plan:

EXPLAIN SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id); 
            QUERY PLAN          
------------------------------------------------------------------------------------ 
Nested Loop (cost=25455.58..25457.90 rows=8 width=155) 
    -> HashAggregate (cost=25455.58..25455.66 rows=8 width=4) 
     -> Seq Scan on frames (cost=0.00..23374.26 rows=832526 width=4) 
    -> Index Scan using matches_pkey on matches (cost=0.00..0.27 rows=1 width=155) 
     Index Cond: (id = frames.match_id) 
(5 rows)

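That is still about 100 times slower than the index-only version (probably because a Seq Scan on frames plus a HashAggregate is still being performed).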
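The plans shown so far contain only cost estimates. To check where the time really goes, the same statement can be run under EXPLAIN with the ANALYZE and BUFFERS options, which executes the query and reports actual row counts, per-node timings and buffer usage. This is a minimal sketch using the query from the edit; the output is not reproduced here because it depends on the data and the server.

```sql
-- ANALYZE executes the statement and adds actual times and row counts;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "matches".*
FROM "matches"
WHERE EXISTS (SELECT true
              FROM frames
              WHERE frames.match_id = matches.id);
```

Comparing the actual time reported for the Seq Scan on frames with the total runtime would confirm or refute the guess that the scan and the aggregate dominate.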

Source

2014-02-23 Marcin Raczkowski

What version of PostgreSQL are you using?

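9.1. I just tested on 9.3, and there the query without LIMIT 1 appears to use the index correctly. It looks like LIMIT 1 was messing everything up, and the query optimizer seems to have improved a lot between 9.1 and 9.3.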

In the EXISTS-based alternative, the LIMIT clause is redundant, and it may not be helping the optimizer.

Try:

SELECT "matches".* 
FROM "matches" 
WHERE EXISTS (SELECT 1 
       FROM frames 
       WHERE frames.match_id = matches.id);
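If the planner still prefers to scan all of frames, one further option, not part of this answer and offered only as a sketch, is to emulate a "loose index scan" with a recursive CTE. It assumes the index on frames.match_id that appears in the question's plans (index_frames_on_match_id). The CTE name distinct_match_ids is made up for the illustration. Each step fetches the next larger match_id through the index, so the work scales with the number of distinct match_id values rather than with the number of frames.

```sql
WITH RECURSIVE distinct_match_ids AS (
    -- Anchor: the smallest match_id present in frames.
    (SELECT match_id
     FROM frames
     WHERE match_id IS NOT NULL
     ORDER BY match_id
     LIMIT 1)
    UNION ALL
    -- Step: the next match_id greater than the previous one.
    -- The scalar subquery yields NULL when none is left,
    -- and the WHERE clause below then ends the recursion.
    SELECT (SELECT f.match_id
            FROM frames f
            WHERE f.match_id > d.match_id
            ORDER BY f.match_id
            LIMIT 1)
    FROM distinct_match_ids d
    WHERE d.match_id IS NOT NULL
)
SELECT m.*
FROM matches m
JOIN distinct_match_ids d ON d.match_id = m.id;
```

The trailing NULL row produced by the last step never matches a matches.id, so the join drops it. Whether this beats the plain EXISTS form depends on the PostgreSQL version and the data, so it is worth checking with EXPLAIN ANALYZE.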

Source

2014-02-23 00:35:54

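You're right. LIMIT 1 was definitely hindering the query. It still takes about 100 ms (100 times slower than with seqscan turned off), but it is much faster than the previous options.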

What execution plan do you get?

I have updated the question with the new data.
