Why are PostgreSQL 9.5's CUBE, ROLLUP and GROUPING SETS slower than the equivalent UNION?

I have been looking forward to the new PostgreSQL 9.5 features and will be upgrading our database very soon. However, I was quite surprised by what I found when I ran the following on our data:

SELECT col1, col2, count(*), grouping(col1,col2) 
FROM table1 
GROUP BY CUBE(col1, col2)

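The query actually runs much slower (about 3 seconds) than the equivalent separate queries combined (about 1 second in total for all 4 queries, 100-300 ms each). Both col1 and col2 are indexed.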
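For reference, the four separate queries being compared against would look roughly like the sketch below. It assumes the same placeholder names (table1, col1, col2) as the CUBE query; the literal in the last column is an illustration that mimics what grouping(col1, col2) returns for each grouping set, and is not taken from the original post.

```sql
-- Hand-written equivalent of GROUP BY CUBE(col1, col2):
-- one aggregate query per grouping set, glued together with UNION ALL.
SELECT col1, col2, count(*), 0 AS grouping   -- grouped by (col1, col2)
FROM table1
GROUP BY col1, col2
UNION ALL
SELECT col1, NULL, count(*), 1               -- grouped by (col1); col2 rolled up
FROM table1
GROUP BY col1
UNION ALL
SELECT NULL, col2, count(*), 2               -- grouped by (col2); col1 rolled up
FROM table1
GROUP BY col2
UNION ALL
SELECT NULL, NULL, count(*), 3               -- grand total, grouping set ()
FROM table1;
```

Each branch scans table1 on its own, so the planner is free to pick a different aggregation strategy for every grouping set.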

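Is this expected (meaning the feature is more about compatibility than about performance)? Or can it be tuned somehow?

Here is an example on a vacuumed production table: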

> explain analyze select service_name, state, res_id, count(*) from bookings group by rollup(service_name, state, res_id); 
                  QUERY PLAN 
------------------------------------------------------------------------------------------------------------------------------- 
GroupAggregate (cost=43069.12..45216.05 rows=4161 width=24) (actual time=1027.341..1120.675 rows=428 loops=1) 
    Group Key: service_name, state, res_id 
    Group Key: service_name, state 
    Group Key: service_name 
    Group Key: () 
    -> Sort (cost=43069.12..43490.18 rows=168426 width=24) (actual time=1027.301..1070.321 rows=168426 loops=1) 
     Sort Key: service_name, state, res_id 
     Sort Method: external merge Disk: 5728kB 
     -> Seq Scan on bookings (cost=0.00..28448.26 rows=168426 width=24) (actual time=0.079..147.619 rows=168426 loops=1) 
Planning time: 0.118 ms 
Execution time: 1122.557 ms 
(11 rows) 

> explain analyze select service_name, state, res_id, count(*) from bookings group by service_name, state, res_id 
UNION ALL select service_name, state, NULL, count(*) from bookings group by service_name, state 
UNION ALL select service_name, NULL, NULL, count(*) from bookings group by service_name 
UNION ALL select NULL, NULL, NULL, count(*) from bookings; 
                   QUERY PLAN 
----------------------------------------------------------------------------------------------------------------------------------------- 
Append (cost=30132.52..118086.91 rows=4161 width=32) (actual time=208.986..706.347 rows=428 loops=1) 
    -> HashAggregate (cost=30132.52..30172.12 rows=3960 width=24) (actual time=208.986..209.078 rows=305 loops=1) 
     Group Key: bookings.service_name, bookings.state, bookings.res_id 
     -> Seq Scan on bookings (cost=0.00..28448.26 rows=168426 width=24) (actual time=0.022..97.637 rows=168426 loops=1) 
    -> HashAggregate (cost=29711.45..29713.25 rows=180 width=20) (actual time=195.851..195.879 rows=96 loops=1) 
     Group Key: bookings_1.service_name, bookings_1.state 
     -> Seq Scan on bookings bookings_1 (cost=0.00..28448.26 rows=168426 width=20) (actual time=0.029..95.588 rows=168426 loops=1) 
    -> HashAggregate (cost=29290.39..29290.59 rows=20 width=11) (actual time=181.955..181.960 rows=26 loops=1) 
     Group Key: bookings_2.service_name 
     -> Seq Scan on bookings bookings_2 (cost=0.00..28448.26 rows=168426 width=11) (actual time=0.030..97.047 rows=168426 loops=1) 
    -> Aggregate (cost=28869.32..28869.33 rows=1 width=0) (actual time=119.332..119.332 rows=1 loops=1) 
     -> Seq Scan on bookings bookings_3 (cost=0.00..28448.26 rows=168426 width=0) (actual time=0.039..93.508 rows=168426 loops=1) 
Planning time: 0.373 ms 
Execution time: 706.558 ms 
(14 rows)

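The total times are comparable, but the latter uses four scans, so shouldn't it be the slower one? The "external merge Disk" sort when using rollup() is also strange, since I have work_mem set to 16MB.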


2016-02-04 codesnik

Please show us the execution plans using 'explain (analyze, verbose)'.

I have added the execution plans for the example. CUBE() on the same columns gives an even bigger difference. – codesnik

The sort (external merge sort) takes most of the time, right? 1027+ ms, or am I misreading the plan?

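Interesting: in this particular example, SET work_mem='32MB' gets rid of the disk merge, and ROLLUP is now 2x faster than the corresponding UNION.

EXPLAIN ANALYZE now contains: "Sort Method: quicksort Memory: 19301kB"

I still wonder why so much memory is needed for a mere 400 rows of output, and why the disk merge needed 7MB compared to 19MB in memory (quicksort overhead?), but my problem is solved.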
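A minimal sketch of how the setting could be tried without changing it for the whole server, assuming the bookings table from the question. The 32MB figure is simply the value that happened to be enough here, not a general recommendation, and note that PostgreSQL memory units are case-sensitive (MB, not mb).

```sql
-- Raise work_mem only for this transaction, then check the sort method.
BEGIN;
SET LOCAL work_mem = '32MB';   -- reverts automatically at COMMIT/ROLLBACK

EXPLAIN (ANALYZE, BUFFERS)
SELECT service_name, state, res_id, count(*)
FROM bookings
GROUP BY ROLLUP (service_name, state, res_id);
-- In the output, look at the "Sort Method" line: an in-memory quicksort
-- instead of "external merge  Disk: ..." means the sort now fits in work_mem.

ROLLBACK;
```

Using SET LOCAL inside a transaction keeps the experiment scoped; a plain SET would last for the rest of the session instead.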


2016-02-22 02:38:17 codesnik

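The sort is working on 168k rows, isn't it?

Yes, you're right. That's the whole table! Does that mean ROLLUP/CUBE/GROUPING SETS only work in this (more or less) naive way, or are there corner cases where it makes sense? – codesnik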

It seems that grouping sets always get a query plan with GroupAggregate and Sort, whereas a standard GROUP BY often uses HashAggregate.
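One way to check this on your own data is to spell the ROLLUP out as explicit GROUPING SETS and compare its plan with a plain GROUP BY. The sketch below assumes the bookings table from the question; the plan you get depends on your PostgreSQL version and data, so treat it as something to run and inspect, not as a statement of what the planner will choose.

```sql
-- ROLLUP(service_name, state, res_id) written as explicit grouping sets;
-- these correspond to the four "Group Key" lines in the ROLLUP plan.
EXPLAIN
SELECT service_name, state, res_id, count(*)
FROM bookings
GROUP BY GROUPING SETS (
    (service_name, state, res_id),
    (service_name, state),
    (service_name),
    ()
);

-- Plain GROUP BY over the finest grouping set, for comparison.
EXPLAIN
SELECT service_name, state, res_id, count(*)
FROM bookings
GROUP BY service_name, state, res_id;
```

In the plans posted in the question, the first form sorts the whole table once and feeds a GroupAggregate, while the plain GROUP BY gets a HashAggregate with no sort.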


2016-05-25 13:03:18 oscavi
