I have a table, parameters_products, with about 300k records. Is it possible to optimize this query? How can a PostgreSQL COUNT with GROUP BY query be optimized?
SELECT parameter_id AS id,
COUNT(product_id) AS COUNT
FROM "parameters_products"
WHERE product_id IN
(SELECT product_id
FROM parameters_products
WHERE parameter_id IN ('2'))
GROUP BY parameter_id
Query output:
2;274669
EXPLAIN ANALYZE VERBOSE ... output:
HashAggregate (cost=23628.54..23628.56 rows=2 width=8) (actual time=2231.367..2231.368 rows=1 loops=1)
Output: parameters_products.parameter_id, count(parameters_products.product_id)
Group Key: parameters_products.parameter_id
-> Hash Semi Join (cost=9607.86..22256.43 rows=274421 width=8) (actual time=692.586..1893.261 rows=274669 loops=1)
Output: parameters_products.parameter_id, parameters_products.product_id
Hash Cond: (parameters_products.product_id = parameters_products_1.product_id)
-> Seq Scan on public.parameters_products (cost=0.00..4356.28 rows=299728 width=8) (actual time=0.025..353.358 rows=299728 loops=1)
Output: parameters_products.parameter_id, parameters_products.product_id
-> Hash (cost=5105.60..5105.60 rows=274421 width=4) (actual time=692.331..692.331 rows=274669 loops=1)
Output: parameters_products_1.product_id
Buckets: 16384 Batches: 4 Memory Usage: 2425kB
-> Seq Scan on public.parameters_products parameters_products_1 (cost=0.00..5105.60 rows=274421 width=4) (actual time=0.013..344.656 rows=274669 loops=1)
Output: parameters_products_1.product_id
Filter: (parameters_products_1.parameter_id = 2)
Rows Removed by Filter: 25059
Planning time: 0.279 ms
Execution time: 2231.499 ms
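One detail in the plan above may be worth checking: the Hash node reports Batches: 4, which means the hash table did not fit in work_mem and was split into several batches. Note also that parameter_id = 2 matches 274669 of 299728 rows, so sequential scans are the expected choice here and the indexes are unlikely to help this particular query. A minimal sketch, assuming a session-level work_mem of 32MB is acceptable on this server (the value is an illustrative guess, not a tuned setting):

```sql
-- Sketch only: raise work_mem for the current session.
-- 32MB is an assumed, illustrative value; pick one that fits the server's RAM.
SET work_mem = '32MB';

-- Re-run the original query and compare the "Batches:" line of the Hash node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT parameter_id AS id,
       COUNT(product_id) AS count
FROM parameters_products
WHERE product_id IN (SELECT product_id
                     FROM parameters_products
                     WHERE parameter_id = 2)
GROUP BY parameter_id;

-- Return to the server default afterwards.
RESET work_mem;
```

If the new plan shows Batches: 1, the join no longer spills to temporary files; whether that shortens the total run time noticeably would have to be measured. EXPLAIN ANALYZE itself adds per-row timing overhead, so the plain query may run faster than the figures reported here.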
This is PostgreSQL 9.4.1, with vacuum enabled.
I also just tried this query, but it is far too slow as well:
SELECT pp1.parameter_id,
count(pp1.product_id)
FROM parameters_products pp1
LEFT JOIN parameters_products pp2 ON pp1.product_id = pp2.product_id
WHERE pp2.parameter_id IN (2)
GROUP BY pp1.parameter_id
-
HashAggregate (cost=23742.42..23742.44 rows=2 width=8) (actual time=2361.654..2361.654 rows=1 loops=1)
Output: pp1.parameter_id, count(pp1.product_id)
Group Key: pp1.parameter_id
-> Hash Join (cost=9607.86..22370.31 rows=274421 width=8) (actual time=715.409..2012.345 rows=274669 loops=1)
Output: pp1.parameter_id, pp1.product_id
Hash Cond: (pp1.product_id = pp2.product_id)
-> Seq Scan on public.parameters_products pp1 (cost=0.00..4356.28 rows=299728 width=8) (actual time=0.012..360.789 rows=299728 loops=1)
Output: pp1.parameter_id, pp1.product_id
-> Hash (cost=5105.60..5105.60 rows=274421 width=4) (actual time=715.176..715.176 rows=274669 loops=1)
Output: pp2.product_id
Buckets: 16384 Batches: 4 Memory Usage: 2425kB
-> Seq Scan on public.parameters_products pp2 (cost=0.00..5105.60 rows=274421 width=4) (actual time=0.009..353.386 rows=274669 loops=1)
Output: pp2.product_id
Filter: (pp2.parameter_id = 2)
Rows Removed by Filter: 25059
Planning time: 0.135 ms
Execution time: 2361.735 ms
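The plan above shows a plain Hash Join rather than a Hash Left Join: the WHERE condition on pp2 discards the null-extended rows, so the LEFT JOIN behaves like an inner join. An inner join can also count a pp1 row more than once if (product_id, parameter_id) is not unique in the table. A sketch that keeps the semi-join semantics of the IN version, assuming that each matching row should be counted exactly once:

```sql
-- Sketch: same result as the IN (...) query, written with EXISTS.
-- Each pp1 row is counted at most once, even if pp2 holds duplicate matches.
SELECT pp1.parameter_id AS id,
       count(pp1.product_id) AS count
FROM parameters_products pp1
WHERE EXISTS (SELECT 1
              FROM parameters_products pp2
              WHERE pp2.product_id = pp1.product_id
                AND pp2.parameter_id = 2)
GROUP BY pp1.parameter_id;
```

The planner will most likely pick the same Hash Semi Join as for the IN form, so this is about the correctness of the count rather than about speed.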
Indexes:
CREATE INDEX parameters_products_parameter_id_idx
ON parameters_products
USING btree
(parameter_id);
CREATE INDEX parameters_products_product_id_idx
ON parameters_products
USING btree
(product_id);
CREATE INDEX parameters_products_product_id_parameter_id_idx
ON parameters_products
USING btree
(product_id, parameter_id);
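Of these three indexes, the one on (product_id) alone is a prefix of the composite (product_id, parameter_id) index, so the composite index can usually serve the same lookups. A sketch for checking which indexes are actually used, relying on the standard statistics view pg_stat_user_indexes (its counters accumulate since the last statistics reset, so the numbers reflect only the workload seen so far):

```sql
-- Sketch: scan counts and on-disk size for every index on the table.
-- An index with scans = 0 over a representative period is a candidate for review.
SELECT indexrelname                                 AS index_name,
       idx_scan                                     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'parameters_products'
ORDER BY idx_scan;
```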
EXPLAIN ANALYZE VERBOSE
SELECT pp1.parameter_id
FROM parameters_products pp1
LEFT JOIN parameters_products pp2 ON pp1.product_id = pp2.product_id
-
Hash Left Join (cost=9241.88..22699.06 rows=299728 width=4) (actual time=727.683..2080.798 rows=299728 loops=1)
Output: pp1.parameter_id
Hash Cond: (pp1.product_id = pp2.product_id)
-> Seq Scan on public.parameters_products pp1 (cost=0.00..4324.28 rows=299728 width=8) (actual time=0.031..355.656 rows=299728 loops=1)
Output: pp1.parameter_id, pp1.product_id
-> Hash (cost=4324.28..4324.28 rows=299728 width=4) (actual time=727.579..727.579 rows=299728 loops=1)
Output: pp2.product_id
Buckets: 16384 Batches: 4 Memory Usage: 2644kB
-> Seq Scan on public.parameters_products pp2 (cost=0.00..4324.28 rows=299728 width=4) (actual time=0.008..350.797 rows=299728 loops=1)
Output: pp2.product_id
Planning time: 0.472 ms
Execution time: 2392.582 ms
SET enable_seqscan = OFF;
This reduces the execution time, but not significantly.
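enable_seqscan is mainly a planner debugging switch: turning it off does not forbid sequential scans, it only makes the planner treat them as very expensive. A sketch, assuming the goal is just to compare plans, that confines the setting to a single transaction so it cannot leak into other queries of the session:

```sql
BEGIN;

-- SET LOCAL is reverted automatically at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT parameter_id AS id,
       COUNT(product_id) AS count
FROM parameters_products
WHERE product_id IN (SELECT product_id
                     FROM parameters_products
                     WHERE parameter_id = 2)
GROUP BY parameter_id;

ROLLBACK;
```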
Replace 'WHERE IN' with 'JOIN'. – lad2025
@lad2025 Execution time: 2361.735 ms – nanolab
BTW: 'WHERE parameter_id IN ('2'))' – wildplasser