Postgres: optimizing/replacing DISTINCT

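I am trying to select the users with the highest "followed_by", joined with posts and filtered by "tag". Both tables have millions of records. I use DISTINCT to select only unique users.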

select distinct u.* 
from users u join posts p 
on u.id=p.user_id 
where p.tags @> ARRAY['love'] 
order by u.followed_by desc nulls last limit 21

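It runs for more than 16 seconds, apparently because the DISTINCT causes a Seq Scan over 6+ million users. Here is the EXPLAIN ANALYZE output: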

Limit (cost=15509958.30..15509959.09 rows=21 width=292) (actual time=16882.861..16883.753 rows=21 loops=1) 
    -> Unique (cost=15509958.30..15595560.30 rows=2282720 width=292) (actual time=16882.859..16883.749 rows=21 loops=1) 
     -> Sort (cost=15509958.30..15515665.10 rows=2282720 width=292) (actual time=16882.857..16883.424 rows=525 loops=1) 
       Sort Key: u.followed_by DESC NULLS LAST, u.id, u.username, u.fullname, u.follows, u.media, u.profile_pic_url_hd, u.is_private, u.is_verified, u.biography, u.external_url, u.updated, u.location_id, u.final_post
       Sort Method: external merge Disk: 583064kB
       -> Gather (cost=1000.57..14956785.06 rows=2282720 width=292) (actual time=0.377..11506.001 rows=1680890 loops=1)
        Workers Planned: 9 
        Workers Launched: 9 
        -> Nested Loop (cost=0.57..14727513.06 rows=253636 width=292) (actual time=1.013..12031.634 rows=168089 loops=10) 
          -> Parallel Seq Scan on posts p (cost=0.00..13187797.79 rows=253636 width=8) (actual time=0.940..10872.630 rows=168089 loops=10) 
           Filter: (tags @> '{love}'::text[]) 
           Rows Removed by Filter: 6251355 
          -> Index Scan using user_pk on users u (cost=0.57..6.06 rows=1 width=292) (actual time=0.006..0.006 rows=1 loops=1680890) 
           Index Cond: (id = p.user_id) 
Planning time: 1.276 ms 
Execution time: 16964.271 ms

Any hints on how to make this fast would be appreciated.

Update

Thanks to @a_horse_with_no_name, the query for the "love" tag became really fast:

Limit (cost=1.14..4293986.91 rows=21 width=292) (actual time=1.735..31.613 rows=21 loops=1) 
    -> Nested Loop Semi Join (cost=1.14..10959887484.70 rows=53600 width=292) (actual time=1.733..31.607 rows=21 loops=1) 
     -> Index Scan using idx_followed_by on users u (cost=0.57..322693786.19 rows=232404560 width=292) (actual time=0.011..0.103 rows=32 loops=1) 
     -> Index Scan using fki_user_fk1 on posts p (cost=0.57..1943.85 rows=43 width=8) (actual time=0.983..0.983 rows=1 loops=32) 
       Index Cond: (user_id = u.id) 
       Filter: (tags @> '{love}'::text[]) 
       Rows Removed by Filter: 1699 
Planning time: 1.322 ms 
Execution time: 31.656 ms

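However, for some other tags, such as "beautiful", it is better than before but still somewhat slow. It also takes a different execution path: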

Limit (cost=3893365.84..3893365.89 rows=21 width=292) (actual time=2813.876..2813.892 rows=21 loops=1) 
    -> Sort (cost=3893365.84..3893499.84 rows=53600 width=292) (actual time=2813.874..2813.887 rows=21 loops=1) 
     Sort Key: u.followed_by DESC NULLS LAST 
     Sort Method: top-N heapsort Memory: 34kB 
     -> Nested Loop (cost=3437011.27..3891920.70 rows=53600 width=292) (actual time=1130.847..2779.928 rows=35230 loops=1) 
       -> HashAggregate (cost=3437010.70..3437546.70 rows=53600 width=8) (actual time=1130.809..1148.209 rows=35230 loops=1) 
        Group Key: p.user_id 
        -> Bitmap Heap Scan on posts p (cost=10484.20..3434173.21 rows=1134993 width=8) (actual time=268.602..972.390 rows=814919 loops=1) 
          Recheck Cond: (tags @> '{beautiful}'::text[]) 
          Heap Blocks: exact=347002 
          -> Bitmap Index Scan on idx_tags (cost=0.00..10200.45 rows=1134993 width=0) (actual time=168.453..168.453 rows=814919 loops=1) 
           Index Cond: (tags @> '{beautiful}'::text[]) 
       -> Index Scan using user_pk on users u (cost=0.57..8.47 rows=1 width=292) (actual time=0.045..0.046 rows=1 loops=35230) 
        Index Cond: (id = p.user_id) 
Planning time: 1.388 ms 
Execution time: 2814.132 ms

I also already have a GIN index on "tags" in place.

Source

2017-07-17 Serge

Just a few hints to test: have you tried an index on "followed_by"? Also try replacing the join on posts with an appropriate subquery. – clemens

This should be faster:

select * 
from users u 
where exists (select * 
       from posts p 
       where u.id=p.user_id 
       and p.tags @> ARRAY['love']) 
order by u.followed_by desc nulls last 
limit 21;
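
The fast plan in the question's update reads users through an index named idx_followed_by, already in the order the query asks for, and stops once 21 matching users are found. The thread does not show how that index was defined, so the statement below is only a sketch of what a matching definition might look like. One detail worth noting: a default b-tree on followed_by scanned backward yields DESC NULLS FIRST, so the sort options have to be declared explicitly to serve DESC NULLS LAST.

```sql
-- Assumed definition: the thread shows only the index name idx_followed_by,
-- not its DDL. The sort options mirror the query's ORDER BY so the planner
-- can read rows already ordered and stop as soon as the LIMIT is satisfied.
create index idx_followed_by
    on users (followed_by desc nulls last);
```

With such an index, an ordered scan pays off when the tag is common among highly followed users, because few users have to be probed before 21 matches are collected.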

If only a small share (< 10%) of the posts have that tag, an index on posts.tags should help as well:

create index on posts using gin (tags);
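
To check whether a particular tag actually goes through the GIN index, and how many pages the query touches, the statement can be run under EXPLAIN with the ANALYZE and BUFFERS options. This is a generic sketch rather than part of the original answer; the tag value is simply the slower one mentioned in the question's update.

```sql
-- ANALYZE executes the query and reports actual timings and row counts;
-- BUFFERS adds how many pages were found in cache or read from disk.
explain (analyze, buffers)
select *
from users u
where exists (select *
              from posts p
              where u.id = p.user_id
                and p.tags @> ARRAY['beautiful'])
order by u.followed_by desc nulls last
limit 21;
```

Comparing the output for different tags shows where the planner switches between walking users in followed_by order and collecting matching posts first.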

Source

2017-07-17 06:57:17

Thanks! Please see my update and let me know whether this can be improved further. – Serge
