2014-10-09

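I have a query that is slow for particular rows. For some rows Postgres chooses a Seq Scan instead of an Index Scan, presumably because it estimates that will be faster than using the index. How can I optimize querying this data in PostgreSQL?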

Here is the query plan for a normal workload, where the indexes are used: http://explain.depesz.com/s/1A2o

EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 1; 
                   QUERY PLAN                 
-------------------------------------------------------------------------------------------------------------------------------------------- 
Nested Loop (cost=0.71..166.27 rows=19 width=130) (actual time=0.092..4.247 rows=2421 loops=1) 
    Buffers: shared hit=350 
    -> Index Scan using index_jobs_on_project_id on jobs (cost=0.29..18.81 rows=4 width=4) (actual time=0.044..0.099 rows=15 loops=1) 
     Index Cond: (project_id = 1) 
     Buffers: shared hit=17 
    -> Index Scan using index_blocks_on_job_id on blocks (cost=0.42..36.67 rows=19 width=130) (actual time=0.021..0.133 rows=161 loops=15) 
     Index Cond: (job_id = jobs.id) 
     Buffers: shared hit=333 
Total runtime: 4.737 ms 
(9 rows) 

And here is the query plan for a less typical workload, where the planner chooses a sequential scan: http://explain.depesz.com/s/cJOd

EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 2; 
                   QUERY PLAN                  
---------------------------------------------------------------------------------------------------------------------------------------------------- 
Hash Join (cost=1138.64..11236.94 rows=10421 width=130) (actual time=5.212..72.604 rows=2516 loops=1)
  Hash Cond: (blocks.job_id = jobs.id)
  Buffers: shared hit=5671
  -> Seq Scan on blocks (cost=0.00..8478.06 rows=303206 width=130) (actual time=0.008..24.573 rows=298084 loops=1)
       Buffers: shared hit=5446
  -> Hash (cost=1111.79..1111.79 rows=2148 width=4) (actual time=3.346..3.346 rows=2164 loops=1)
       Buckets: 1024 Batches: 1 Memory Usage: 77kB
       Buffers: shared hit=225
       -> Bitmap Heap Scan on jobs (cost=40.94..1111.79 rows=2148 width=4) (actual time=0.595..2.158 rows=2164 loops=1)
            Recheck Cond: (project_id = 2)
            Buffers: shared hit=225
            -> Bitmap Index Scan on index_jobs_on_project_id (cost=0.00..40.40 rows=2148 width=0) (actual time=0.516..0.516 rows=2164 loops=1)
                 Index Cond: (project_id = 2)
                 Buffers: shared hit=8
Total runtime: 72.767 ms 
(15 rows) 

In the first case, the project has 15 jobs and 2421 blocks. In the second case, the project has 2164 jobs and 2516 blocks.

Is there a way to query this data so that the second workload isn't so slow? Or am I hitting some kind of worst-case workload?

EDIT

After setting random_page_cost to 1.1, I re-ran EXPLAIN on the slow query: http://explain.depesz.com/s/xKdd

EXPLAIN (ANALYZE, BUFFERS) SELECT "blocks".* FROM "blocks" INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id" WHERE "jobs"."project_id" = 2; 

                   QUERY PLAN                 
---------------------------------------------------------------------------------------------------------------------------------------------- 
Nested Loop (cost=0.71..7634.08 rows=10421 width=130) (actual time=0.025..10.597 rows=2516 loops=1) 
    Buffers: shared hit=9206 
    -> Index Scan using index_jobs_on_project_id on jobs (cost=0.29..1048.99 rows=2148 width=4) (actual time=0.015..1.239 rows=2164 loops=1) 
     Index Cond: (project_id = 32357) 
     Buffers: shared hit=225 
    -> Index Scan using index_blocks_on_job_id on blocks (cost=0.42..2.88 rows=19 width=130) (actual time=0.003..0.003 rows=1 loops=2164) 
     Index Cond: (job_id = jobs.id) 
     Buffers: shared hit=8981 
Total runtime: 10.925 ms 
(9 rows) 

Much better! Looks like I need to invest some time in tuning the server configuration.

What indexes do you have? – 2014-10-09 04:45:16

There are indexes on the 'jobs.project_id' and 'blocks.job_id' columns. – nfm 2014-10-09 05:06:22

Please show the full 'EXPLAIN (ANALYZE, BUFFERS) ...' output, not just 'EXPLAIN'. – 2014-10-09 06:01:49

Answer

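Since the nested loop over two index scans is much faster than the hash join fed by a sequential scan and a bitmap index scan, I'd say your random_page_cost doesn't accurately reflect your real performance, at least when the data is cached in RAM or in shared_buffers.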
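For context, the planner's estimate for the sequential scan on blocks in the second plan can be reproduced by hand. This worked check assumes the default seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, and takes relpages = 5446 from that node's "Buffers: shared hit=5446" line and reltuples = 303206 from its estimated row count:

```latex
\begin{aligned}
\text{cost}_{\text{seq}} &= \texttt{seq\_page\_cost} \times \text{relpages} + \texttt{cpu\_tuple\_cost} \times \text{reltuples} \\
&= 1.0 \times 5446 + 0.01 \times 303206 \\
&= 5446 + 3032.06 = 8478.06
\end{aligned}
```

This matches the "cost=0.00..8478.06" on the Seq Scan node. Note that random_page_cost does not appear in this formula, so lowering it leaves the sequential scan's estimate unchanged and only makes index-based paths look cheaper.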

Try SET random_page_cost = 1.1 and re-run the query in that session. You may also want to throw more work_mem at the problem.
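A minimal sketch of that session-level experiment. SET only affects the current connection, so nothing is persisted; the work_mem value below is an illustrative assumption, not a number from the answer:

```sql
-- Applies to this session only; other connections keep the server defaults.
SET random_page_cost = 1.1;
SET work_mem = '64MB';  -- illustrative value (assumption); size it to your server's RAM

EXPLAIN (ANALYZE, BUFFERS)
SELECT "blocks".* FROM "blocks"
INNER JOIN "jobs" ON "blocks"."job_id" = "jobs"."id"
WHERE "jobs"."project_id" = 2;

-- Restore the configured values for the rest of the session.
RESET random_page_cost;
RESET work_mem;
```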

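If the random_page_cost change helps, you may want to update postgresql.conf to reflect it. Note that 1.1 is a pretty extreme setting: the default is 4 and seq_page_cost is 1, so in the config file I'd start with something more like 2 or 1.5 to avoid making other plans worse.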
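Changing random_page_cost in postgresql.conf only needs a configuration reload, not a restart. A sketch, assuming the file has already been edited (1.5 is just one of the suggested starting points) and that you are connected as a superuser:

```sql
-- postgresql.conf (edited beforehand, not SQL):  random_page_cost = 1.5
-- Ask the server to re-read its configuration files.
SELECT pg_reload_conf();

-- Confirm the new value is in effect.
SHOW random_page_cost;
```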