2016-11-14 34 views
0

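How can I reuse an aggregate expression in PostgreSQL without a slowdown?

I often write queries that use the same combination of aggregate functions. For example: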

SELECT 
    my_id, 
    sum(a * weight)/nullif(sum(CASE WHEN a IS NOT NULL THEN weight END), 0) AS a, 
    sum(b * weight)/nullif(sum(CASE WHEN b IS NOT NULL THEN weight END), 0) AS b 
FROM my_table 
GROUP BY my_id 

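I would like to avoid repeating the same expression over and over. It would be great to have a new function, weighted_avg, that gives the same result: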

SELECT 
    my_id, 
    weighted_avg(a, weight) AS a, 
    weighted_avg(b, weight) AS b 
FROM my_table 
GROUP BY my_id 

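The only way I know to do this is CREATE AGGREGATE with an intermediate state and an SFUNC that is called for every row. Unfortunately, this is much slower than the original query, which makes it unusable in my case.

I imagine my ideal solution would look something like this: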

CREATE AGGREGATE FUNCTION weighted_avg(x float, weight float) 
RETURNS float AS $$ 
    SELECT sum(x * weight)/nullif(sum(CASE WHEN x IS NOT NULL THEN weight END), 0) 
$$ language SQL IMMUTABLE; 

and it would be inlined when the query is executed. However, I could not find anything like this that Postgres supports.

+1

Using a function will probably always be a bit slower than writing the expression directly in the query.

+0

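I am fine with some overhead, but the plpgsql implementation using 'CREATE AGGREGATE' takes 4 times as long to execute. So I will keep the original expression, which is acceptable, but I was hoping for a better solution.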

+0

Use a subquery in 'FROM' to compute the input expressions once.
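One possible reading of this suggestion, sketched against the question's my_table: the per-row inputs are computed once in a derived table, and the outer query only sums them. The aliases a_num, a_den, b_num, b_den and s are hypothetical names chosen for illustration. Whether this runs any faster than the original query is not something the thread establishes.

```sql
-- Sketch only: my_table, my_id, a, b and weight come from the question;
-- the column aliases and the subquery alias s are made up for this example.
SELECT
    my_id,
    sum(a_num) / nullif(sum(a_den), 0) AS a,
    sum(b_num) / nullif(sum(b_den), 0) AS b
FROM (
    SELECT
        my_id,
        a * weight                            AS a_num,  -- NULL when a is NULL
        CASE WHEN a IS NOT NULL THEN weight END AS a_den,
        b * weight                            AS b_num,  -- NULL when b is NULL
        CASE WHEN b IS NOT NULL THEN weight END AS b_den
    FROM my_table
) AS s
GROUP BY my_id;
```

The outer expressions are shorter, but the numerator and denominator pair still has to be written once per column, so this tidies the query rather than providing a reusable weighted_avg.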

Answer

0

You did not show the aggregate function you tested. This is how I would create it:

create function weighted_avg_acumm (fa float[], x float, weight float) 
returns float[] as $$ 
    select array[ 
     fa[1] + x * weight, 
     fa[2] + weight 
    ]::float[] 
$$ language sql immutable strict; 

create function weighted_avg_acumm_final (fa float[]) 
returns float as $$ 
    select fa[1]/fa[2] 
$$ language sql immutable strict; 

create aggregate weighted_avg (x float, weight float)(
    sfunc = weighted_avg_acumm, 
    finalfunc = weighted_avg_acumm_final, 
    stype = float[], 
    initcond = '{0,0}' 
); 
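As far as I can tell, the final function above divides by fa[2] directly, so a group with no non-NULL inputs would raise a division-by-zero error instead of returning NULL as the original expression does. The sketch below is a hypothetical variant, not part of the answer as posted, that borrows the nullif guard from the question. It is followed by a small check query over made-up values that compares the aggregate with the hand-written expression.

```sql
-- Hypothetical variant of the final function: return NULL when the
-- accumulated weight is zero, mirroring the nullif() in the question.
create or replace function weighted_avg_acumm_final (fa float[])
returns float as $$
    select fa[1] / nullif(fa[2], 0)
$$ language sql immutable strict;

-- Check query on made-up data; assumes the weighted_avg aggregate
-- defined above already exists. The strict transition function
-- skips the row whose x is NULL.
select
    weighted_avg(x, w) as via_aggregate,
    sum(x * w) / nullif(sum(case when x is not null then w end), 0) as via_expression
from (values
    (1.0::float, 2.0::float),
    (null::float, 5.0::float),
    (4.0::float, 1.0::float)
) as v(x, w);
```

Both columns should agree here: the NULL row is ignored in each case, leaving (1*2 + 4*1) / (2 + 1).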

Update

I tested it, and it is much slower for me too:

create table t (a int, weight int); 
insert into t (a, weight) 
select 
    nullif(round(random() * 10), 0), 
    trunc(random() * 10) + 1 
from generate_series(1,1000000) 
; 

explain analyze 
select weighted_avg(a, weight) 
from t; 
                QUERY PLAN              
------------------------------------------------------------------------------------------------------------------- 
Aggregate (cost=269425.25..269425.26 rows=1 width=8) (actual time=7933.440..7933.440 rows=1 loops=1) 
    -> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.018..241.571 rows=1000000 loops=1) 
Planning time: 0.189 ms 
Execution time: 7933.508 ms 

explain analyze 
select 
    sum(a::numeric * weight)/
    nullif(sum(case when a is not null then weight end), 0) 
from t; 
                QUERY PLAN              
------------------------------------------------------------------------------------------------------------------- 
Aggregate (cost=26925.00..26925.02 rows=1 width=8) (actual time=904.852..904.852 rows=1 loops=1) 
    -> Seq Scan on t (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.010..127.264 rows=1000000 loops=1) 
Planning time: 0.048 ms 
Execution time: 904.891 ms 
+0

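It is mostly the same as mine (with somewhat different zero and NULL handling). Unfortunately, it is about 4 times slower than the native expression.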