2016-08-02 108 views

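Optimizing a Postgres multi-join query with criteria on multiple tables

SUMMARY

When I add certain criteria to a multi-table join over many rows, the query becomes orders of magnitude slower. I've tried many things to speed it up, including every type of table join, reordering the joins, reordering the WHERE clauses, using subqueries, using CASE statements in the WHERE clause, etc.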

SQL details below.

QUESTIONS

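  1. Why does adding this simple condition cause the planner to completely change its execution plan?
  2. Is it possible to tell the planner to evaluate a specific condition first, without drastically changing the query or using a subquery (for example with WITH)?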

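Note: I'm trying to write a generic SQL generator API that lets callers specify arbitrary conditions at any point in the graph. The problem is that some of these calls are fast and others are not, because of how Postgres plans their execution. An optimization designed specifically for this query won't help me meet the larger goal of a generic SQL builder.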

DETAILS

I have a schema in Postgres that stores vertices and edges (a simple graph database):

CREATE TABLE IF NOT EXISTS vertex (type text, id serial, name text, data jsonb, UNIQUE (id));
CREATE INDEX vertex_data_idx ON vertex USING gin (data jsonb_path_ops);
CREATE INDEX vertex_type_idx ON vertex (type);
CREATE INDEX vertex_name_idx ON vertex (name);
CREATE TABLE IF NOT EXISTS edge (src integer REFERENCES vertex (id), dst integer REFERENCES vertex (id));
CREATE INDEX edge_src_idx ON edge (src);
CREATE INDEX edge_dst_idx ON edge (dst);

The schema stores graphs, one of which looks like this: PLANET -> CONTINENT -> COUNTRY -> REGION

My sample database has 447,554 vertices and 3,155,047 edges in total, but the relevant data is:

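  • 5 planets (each related to 5 continents)
  • 25 continents (each related to 2,500 countries)
  • 62,500 countries (25% of which are each related to 100 regions; the rest have no REGION relationships)
  • 250,000 regions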
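To see why the direction of traversal matters, one could measure the average fan-out per vertex type. A minimal sketch against the schema above (the column aliases are illustrative, not from the original):

```sql
-- Average number of outgoing edges per vertex type.
-- Only vertices with at least one outgoing edge are counted.
-- Walking down from a high-fan-out type multiplies the rows at each hop;
-- starting from a rare, selective vertex does not.
SELECT v.type,
       count(DISTINCT v.id)                                AS vertices_with_edges,
       count(*)                                            AS outgoing_edges,
       round(count(*)::numeric / count(DISTINCT v.id), 1)  AS avg_fanout
FROM vertex v
JOIN edge e ON e.src = v.id
GROUP BY v.type
ORDER BY v.type;
```

With the counts listed above, starting from the 5 planets reaches 25 continents and then 62,500 countries before the REGION condition can filter anything, which is the row growth visible in the loops of the slow plan.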

This query, which finds planets that have any region where Spanish is spoken, is fast:

SELECT DISTINCT 
    v1.name as name, v1.id as id 
FROM vertex v1 
    LEFT JOIN edge e1 ON (v1.id = e1.src) 
    LEFT JOIN vertex v2 ON (v2.id = e1.dst) 
    LEFT JOIN edge e2 ON (v2.id = e2.src) 
    LEFT JOIN vertex v3 ON (v3.id = e2.dst) 
    LEFT JOIN edge e3 ON (v3.id = e3.src) 
    LEFT JOIN vertex v4 ON (v4.id = e3.dst) 
WHERE 
    v4.type = 'REGION' AND 
    v4.data @> '{"languages":["spanish"]}'::jsonb 

Planning time: 6.289 ms
Execution time: 0.744 ms

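When I add a condition on an indexed column of the first table in the graph (v1), a condition that has no effect on the result, the query becomes 12,657 times slower (comparing planning plus execution time):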

SELECT DISTINCT 
    v1.name as name, v1.id as id 
FROM vertex v1 
    LEFT JOIN edge e1 ON (v1.id = e1.src) 
    LEFT JOIN vertex v2 ON (v2.id = e1.dst) 
    LEFT JOIN edge e2 ON (v2.id = e2.src) 
    LEFT JOIN vertex v3 ON (v3.id = e2.dst) 
    LEFT JOIN edge e3 ON (v3.id = e3.src) 
    LEFT JOIN vertex v4 ON (v4.id = e3.dst) 
WHERE 
    v1.type = 'PLANET' AND 
    v4.type = 'REGION' AND 
    v4.data @> '{"languages":["spanish"]}'::jsonb 

Planning time: 7.664 ms
Execution time: 89010.096 ms

Here is EXPLAIN (ANALYZE, BUFFERS) for the first, fast query:

Unique (cost=154592.03..155453.96 rows=114925 width=28) (actual time=0.585..0.616 rows=4 loops=1) 
    Buffers: shared hit=92 
    -> Sort (cost=154592.03..154879.34 rows=114925 width=28) (actual time=0.579..0.588 rows=4 loops=1) 
     Sort Key: v1.name, v1.id 
     Sort Method: quicksort Memory: 17kB 
     Buffers: shared hit=92 
     -> Nested Loop (cost=37.96..142377.39 rows=114925 width=28) (actual time=0.155..0.549 rows=4 loops=1) 
       Buffers: shared hit=92 
       -> Nested Loop (cost=37.53..80131.76 rows=114925 width=4) (actual time=0.141..0.468 rows=4 loops=1) 
        Join Filter: (v2.id = e1.dst) 
        Buffers: shared hit=76 
        -> Nested Loop (cost=37.10..49179.08 rows=14270 width=8) (actual time=0.126..0.386 rows=4 loops=1) 
          Buffers: shared hit=60 
          -> Nested Loop (cost=36.68..41450.17 rows=14270 width=4) (actual time=0.112..0.304 rows=4 loops=1) 
           Join Filter: (v3.id = e2.dst) 
           Buffers: shared hit=44 
           -> Nested Loop (cost=36.25..37606.57 rows=1772 width=8) (actual time=0.092..0.209 rows=4 loops=1) 
             Buffers: shared hit=28 
             -> Nested Loop (cost=35.83..36646.82 rows=1772 width=4) (actual time=0.074..0.116 rows=4 loops=1) 
              Buffers: shared hit=12 
              -> Bitmap Heap Scan on vertex v4 (cost=30.99..1514.00 rows=220 width=4) (actual time=0.039..0.042 rows=1 loops=1) 
                Recheck Cond: (data @> '{"languages":["spanish"]}'::jsonb) 
                Filter: (type = 'REGION'::text) 
                Heap Blocks: exact=1 
                Buffers: shared hit=5 
                -> Bitmap Index Scan on vertex_data_idx (cost=0.00..30.94 rows=392 width=0) (actual time=0.020..0.020 rows=1 loops=1) 
                 Index Cond: (data @> '{"languages":["spanish"]}'::jsonb) 
                 Buffers: shared hit=4 
              -> Bitmap Heap Scan on edge e3 (cost=4.84..159.12 rows=57 width=8) (actual time=0.023..0.037 rows=4 loops=1) 
                Recheck Cond: (dst = v4.id) 
                Heap Blocks: exact=4 
                Buffers: shared hit=7 
                -> Bitmap Index Scan on edge_dst_idx (cost=0.00..4.82 rows=57 width=0) (actual time=0.013..0.013 rows=4 loops=1) 
                 Index Cond: (dst = v4.id) 
                 Buffers: shared hit=3 
             -> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.008..0.011 rows=1 loops=4) 
              Index Cond: (id = e3.src) 
              Heap Fetches: 4 
              Buffers: shared hit=16 
           -> Index Scan using edge_dst_idx on edge e2 (cost=0.43..1.46 rows=57 width=8) (actual time=0.008..0.011 rows=1 loops=4) 
             Index Cond: (dst = e3.src) 
             Buffers: shared hit=16 
          -> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.006..0.009 rows=1 loops=4) 
           Index Cond: (id = e2.src) 
           Heap Fetches: 4 
           Buffers: shared hit=16 
        -> Index Scan using edge_dst_idx on edge e1 (cost=0.43..1.46 rows=57 width=8) (actual time=0.005..0.008 rows=1 loops=4) 
          Index Cond: (dst = e2.src) 
          Buffers: shared hit=16 
       -> Index Scan using vertex_id_key on vertex v1 (cost=0.42..0.53 rows=1 width=28) (actual time=0.006..0.009 rows=1 loops=4) 
        Index Cond: (id = e1.src) 
        Buffers: shared hit=16 
Planning time: 6.940 ms 
Execution time: 0.714 ms 

And for the second, slow query:

HashAggregate (cost=592.23..592.24 rows=1 width=28) (actual time=89009.873..89009.885 rows=4 loops=1) 
    Group Key: v1.name, v1.id 
    Buffers: shared hit=11644657 read=1240045 
    -> Nested Loop (cost=2.98..592.22 rows=1 width=28) (actual time=9098.961..89009.833 rows=4 loops=1) 
     Buffers: shared hit=11644657 read=1240045 
     -> Nested Loop (cost=2.56..306.89 rows=522 width=32) (actual time=0.424..30066.007 rows=3092522 loops=1) 
       Buffers: shared hit=454795 read=46267 
       -> Nested Loop (cost=2.13..86.31 rows=65 width=36) (actual time=0.306..2120.293 rows=62500 loops=1) 
        Buffers: shared hit=239162 read=12162 
        -> Nested Loop (cost=1.70..51.10 rows=65 width=32) (actual time=0.261..574.490 rows=62500 loops=1) 
          Buffers: shared hit=488 read=562 
          -> Nested Loop (cost=1.27..23.95 rows=8 width=36) (actual time=0.205..1.206 rows=25 loops=1)
           Buffers: shared hit=109 read=17 
           -> Nested Loop (cost=0.85..19.62 rows=8 width=32) (actual time=0.173..0.547 rows=25 loops=1) 
             Buffers: shared hit=12 read=14 
             -> Index Scan using vertex_type_idx on vertex v1 (cost=0.42..8.44 rows=1 width=28) (actual time=0.123..0.153 rows=5 loops=1) 
              Index Cond: (type = 'PLANET'::text) 
              Buffers: shared hit=2 read=4 
             -> Index Scan using edge_src_idx on edge e1 (cost=0.43..10.18 rows=100 width=8) (actual time=0.021..0.039 rows=5 loops=5) 
              Index Cond: (src = v1.id) 
              Buffers: shared hit=10 read=10 
           -> Index Only Scan using vertex_id_key on vertex v2 (cost=0.42..0.53 rows=1 width=4) (actual time=0.009..0.013 rows=1 loops=25) 
             Index Cond: (id = e1.dst) 
             Heap Fetches: 25 
             Buffers: shared hit=97 read=3 
          -> Index Scan using edge_src_idx on edge e2 (cost=0.43..2.39 rows=100 width=8) (actual time=0.031..8.504 rows=2500 loops=25)
           Index Cond: (src = v2.id) 
           Buffers: shared hit=379 read=545 
        -> Index Only Scan using vertex_id_key on vertex v3 (cost=0.42..0.53 rows=1 width=4) (actual time=0.010..0.013 rows=1 loops=62500) 
          Index Cond: (id = e2.dst) 
          Heap Fetches: 62500 
          Buffers: shared hit=238674 read=11600 
       -> Index Scan using edge_src_idx on edge e3 (cost=0.43..2.39 rows=100 width=8) (actual time=0.013..0.163 rows=49 loops=62500) 
        Index Cond: (src = v3.id) 
        Buffers: shared hit=215633 read=34105 
     -> Index Scan using vertex_id_key on vertex v4 (cost=0.42..0.54 rows=1 width=4) (actual time=0.013..0.013 rows=0 loops=3092522) 
       Index Cond: (id = e3.dst) 
       Filter: ((data @> '{"languages":["spanish"]}'::jsonb) AND (type = 'REGION'::text)) 
       Rows Removed by Filter: 1 
       Buffers: shared hit=11189862 read=1193778 
Planning time: 7.664 ms 
Execution time: 89010.096 ms 

Remove the `LEFT JOIN`s. They aren't needed and only confuse the optimizer. –


The outer join on `v4` is useless, because the `where` condition effectively turns it into an inner join. –
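Following the two comments above, a sketch of the slow query written with plain inner joins. Because the WHERE clause already requires a matching v4 row, the result set should be the same. Note that both plans in the question show plain Nested Loop nodes rather than Nested Loop Left Join, which suggests Postgres had already reduced these outer joins to inner joins, so this rewrite mainly makes the intent explicit and may not change the plan by itself:

```sql
-- Same as the slow query, with INNER JOINs instead of LEFT JOINs.
-- The WHERE conditions on v4 reject NULL-extended rows anyway,
-- so the results are unchanged.
SELECT DISTINCT
    v1.name AS name, v1.id AS id
FROM vertex v1
    JOIN edge   e1 ON e1.src = v1.id
    JOIN vertex v2 ON v2.id  = e1.dst
    JOIN edge   e2 ON e2.src = v2.id
    JOIN vertex v3 ON v3.id  = e2.dst
    JOIN edge   e3 ON e3.src = v3.id
    JOIN vertex v4 ON v4.id  = e3.dst
WHERE
    v1.type = 'PLANET' AND
    v4.type = 'REGION' AND
    v4.data @> '{"languages":["spanish"]}'::jsonb;
```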


How did you get on with the answer below, Voluntari? – halfer

ANSWERS


[Reposted as an answer because I need the formatting]

The edge table desperately needs a primary key (which implies NOT NULL for {src,dst}, which is fine):

CREATE TABLE IF NOT EXISTS edge 
    (src integer NOT NULL REFERENCES vertex (id) 
    , dst integer NOT NULL REFERENCES vertex (id) 
    , PRIMARY KEY (src,dst) 
    ); 
CREATE UNIQUE INDEX edge_dst_src_idx ON edge (dst, src); 

-- the estimates in the question seem to be off, statistics may be absent. 
VACUUM ANALYZE edge; -- refresh the statistics 
VACUUM ANALYZE vertex; 

I would also combine the {type,name} indexes into one (type seems to have very low cardinality). Maybe even make it UNIQUE and NOT NULL, but I don't know your data.

CREATE INDEX vertex_type_name_idx ON vertex (type, name); 
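The answer suggests the estimates are off: for example, the slow plan estimated 1 PLANET row (actual 5) and 100 edges per source where the continent level actually has 2,500. One way to check, after the VACUUM ANALYZE, is to look at what the planner knows in the standard pg_stats view. A sketch; the statistics target of 1000 is an illustrative value, not a recommendation:

```sql
-- What ANALYZE recorded for the columns the planner misestimated.
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'vertex' AND attname = 'type')
   OR (tablename = 'edge'   AND attname = 'src');

-- If the sample is too coarse, raise the per-column statistics target
-- and re-analyze the table.
ALTER TABLE edge ALTER COLUMN src SET STATISTICS 1000;
ANALYZE edge;
```

For a parameterized lookup such as `src = v1.id`, the per-row estimate is based mainly on n_distinct, so a few high-fan-out vertices (planets, continents) can be hidden in an average over the whole table.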

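I think using a subquery will prevent PostgreSQL from using the index. So try the following query to test whether not using the index improves performance: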

select * from (
SELECT DISTINCT 
    v1.name as name, v1.id as id, v1.type as v1_type 
FROM vertex v1 
    LEFT JOIN edge e1 ON (v1.id = e1.src) 
    LEFT JOIN vertex v2 ON (v2.id = e1.dst) 
    LEFT JOIN edge e2 ON (v2.id = e2.src) 
    LEFT JOIN vertex v3 ON (v3.id = e2.dst) 
    LEFT JOIN edge e3 ON (v3.id = e3.src) 
    LEFT JOIN vertex v4 ON (v4.id = e3.dst) 
WHERE 
    v4.type = 'REGION' AND 
    v4.data @> '{"languages":["spanish"]}'::jsonb 
) t1 
where v1_type = 'PLANET';

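Thanks for the comment. I had already tried a subquery, and it does do what I expected, but unfortunately I'm trying to build a generic query builder. These kinds of query-specific optimizations are useful for testing, but I'm starting to think there is no generic way to force the planner to use one particular index before another without reorganizing the query into subqueries (which goes against the "generic query builder" requirement). – Voluntari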
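For question 2, one option that stays fairly mechanical for a builder is to put the most selective condition into a WITH query and join the rest of the chain to it. In PostgreSQL versions before 12, every WITH query acts as an optimization fence; from 12 on, the planner may inline it unless it is written AS MATERIALIZED. A sketch, assuming the builder knows (or guesses) which vertex carries the selective condition; `selective_v4` is a hypothetical name:

```sql
-- Evaluate the selective REGION condition first, then walk up the chain.
-- AS MATERIALIZED needs PostgreSQL 12+; on older versions drop the keyword,
-- since CTEs were always materialized there.
WITH selective_v4 AS MATERIALIZED (
    SELECT id
    FROM vertex
    WHERE type = 'REGION'
      AND data @> '{"languages":["spanish"]}'::jsonb
)
SELECT DISTINCT
    v1.name AS name, v1.id AS id
FROM selective_v4 v4
    JOIN edge   e3 ON e3.dst = v4.id
    JOIN vertex v3 ON v3.id  = e3.src
    JOIN edge   e2 ON e2.dst = v3.id
    JOIN vertex v2 ON v2.id  = e2.src
    JOIN edge   e1 ON e1.dst = v2.id
    JOIN vertex v1 ON v1.id  = e1.src
WHERE v1.type = 'PLANET';
```

The joins outside the CTE can still be reordered, so this only pins where the selective filter is evaluated, not the whole plan.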


@Voluntari I'm not familiar enough with PostgreSQL, but in MySQL and Oracle you can tell a query not to use an index. –
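PostgreSQL has no built-in per-query index hints, but the planner setting join_collapse_limit offers something close for a query builder: when it is set to 1, explicit JOINs are joined in the order they are written. A sketch, assuming the builder emits the chain starting from the vertex with the most selective condition; SET LOCAL limits the change to the current transaction:

```sql
BEGIN;
-- With join_collapse_limit = 1 the planner does not reorder explicit JOINs,
-- so this chain is joined starting from v4 (the selective REGION filter).
SET LOCAL join_collapse_limit = 1;

SELECT DISTINCT
    v1.name AS name, v1.id AS id
FROM vertex v4
    JOIN edge   e3 ON e3.dst = v4.id
    JOIN vertex v3 ON v3.id  = e3.src
    JOIN edge   e2 ON e2.dst = v3.id
    JOIN vertex v2 ON v2.id  = e2.src
    JOIN edge   e1 ON e1.dst = v2.id
    JOIN vertex v1 ON v1.id  = e1.src
WHERE
    v4.type = 'REGION' AND
    v4.data @> '{"languages":["spanish"]}'::jsonb AND
    v1.type = 'PLANET';

COMMIT;
```

This fixes the shape of the join tree, while the planner still chooses the scan and join method for each step. The trade-off is that the builder now owns the join order: if the selective condition were on v1 instead, it would have to emit the chain starting from v1.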


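@Msfvtp "I think using a subquery will prevent PostgreSQL from using the index": that would be a major failure of any query optimizer. It certainly isn't true of Oracle, and I doubt it's true of any mainstream RDBMS. –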