2017-10-16 92 views
2

Running sum with gaps

I have a table with all of the columns shown below, except the yellow one.

[Image of the sample table; not reproduced here]

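Basically, the table holds the customer's ID, the date on which a sale occurred, and the total amount the customer spent on that day (sales). I now have to compute, for each customer and each day, the cumulative sales over a time frame that includes the current day. For example, with a time frame of 3 days, customer 2233 bought twice (nothing on the 14th), so their cumulative sales on the 15th are 26, while on the 13th they are 25.

I cannot create new tables, so I tried the following approach, but it is quite slow: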

SELECT t.dt,
       COUNT(CASE WHEN t.running_sale < 1.99 THEN 1 ELSE NULL END) AS "Low spender",
       COUNT(CASE WHEN t.running_sale BETWEEN 1.99 AND 4.99 THEN 1 ELSE NULL END) AS "Medium spender",
       COUNT(CASE WHEN t.running_sale > 4.99 THEN 1 ELSE NULL END) AS "High spender"
FROM (SELECT dt, channel, id, (
          SELECT SUM(revenue)
          FROM myTable rd
          -- compare the inner row's date (rd) against the outer row's date (r);
          -- comparing rd.dt with itself would match every row of the customer
          WHERE CAST(rd.dt AS DATE)
                BETWEEN (CAST(r.dt AS DATE) - INTERVAL '3' DAY) AND CAST(r.dt AS DATE)
            AND rd.id = r.id
      ) running_sale
      FROM myTable r) t
WHERE channel = 'retail'
  AND dt BETWEEN '2017-06-01' AND '2017-06-15'
GROUP BY dt
LIMIT 100

Use analytic functions? `SUM(sales) OVER (PARTITION BY id ORDER BY date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS RunningSales` – xQbert


That doesn't work: for ID 2233 on the 12th it would pick up the rows for the 11th and the 06th, which is a gap of more than 3 days.


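I sort of get it, but I don't understand why 2233 has 26 on the 15th. If the range is the last 3 days including the current one (the 15th, 14th and 13th), that gives 22, not 26. Or should the 12th be included as well, so that the range is 15, 14, 13, 12? – xQbert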
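The two points raised in the comments above can be checked by hand. The sketch below is only an illustration: it assumes the rows for customer 2233 are the ones listed in the sample data posted in one of the answers, and the helper names `rows_window` and `date_window` are made up for this example.

```python
from datetime import date, timedelta

# Rows for customer 2233, copied from the sample data posted in one of the answers.
SALES_2233 = [
    (date(2017, 6, 3), 20),
    (date(2017, 6, 5), 4),
    (date(2017, 6, 6), 1),
    (date(2017, 6, 11), 8),
    (date(2017, 6, 12), 4),
    (date(2017, 6, 13), 21),
    (date(2017, 6, 15), 1),
]


def rows_window(day, n_rows=3):
    """Sum the last n_rows rows dated on or before day (like ROWS 2 PRECEDING)."""
    past = [amount for d, amount in SALES_2233 if d <= day]
    return sum(past[-n_rows:])


def date_window(day, days_back):
    """Sum every sale dated between day - days_back and day, inclusive."""
    start = day - timedelta(days=days_back)
    return sum(amount for d, amount in SALES_2233 if start <= d <= day)


print(rows_window(date(2017, 6, 12)))     # 13: the row-based frame reaches back to the 6th
print(date_window(date(2017, 6, 12), 2))  # 12: only the 11th and the 12th
print(date_window(date(2017, 6, 15), 2))  # 22: the 13th and the 15th
print(date_window(date(2017, 6, 15), 3))  # 26: the 12th, the 13th and the 15th
```

So a row-based frame over-counts when dates are missing, and the 26 quoted in the question matches a window that reaches back 3 days before the current day (4 calendar days in total).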

Answers

2

I would use a correlated subquery for this:

select *, 
    (
    select sum(sales) 
    from your_table dd 
    where cast(dd.dates as date) 
      between cast(your_table.dates as date) - interval '3' day and 
        cast(your_table.dates as date) and 
      dd.id = your_table.id 
) running_sales 
from your_table 

demo

The query above can also be rewritten as a simpler and more efficient equivalent that uses a self-join and GROUP BY:

select dd.id, dd.dates, dd.sales, sum(d.sales) running_sales 
from your_table dd 
join your_table d on cast(d.dates as date) 
     between (cast(dd.dates as date) - interval '3' day) and cast(dd.dates as date) and 
     dd.id = d.id 
group by dd.id, dd.dates, dd.sales 

group by demo
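To try the self-join version without a database server, here is a minimal sketch that runs it on an in-memory SQLite database through Python's `sqlite3` module. The date arithmetic is rewritten with SQLite's `date(x, '-3 day')` because SQLite has no `INTERVAL` type; that rewrite, the `running_sales` helper, and the use of the sample rows from another answer in this thread are assumptions of this example, not part of the answer.

```python
import sqlite3

# Sample rows (id, dates, sales) taken from the test data posted in another answer.
ROWS = [
    (1234, "2017-06-15", 9), (2233, "2017-06-03", 20), (2233, "2017-06-05", 4),
    (2233, "2017-06-06", 1), (2233, "2017-06-11", 8), (2233, "2017-06-12", 4),
    (2233, "2017-06-13", 21), (2233, "2017-06-15", 1), (2544, "2017-06-13", 9),
    (2443, "2017-06-05", 3.5),
]

# Self-join + GROUP BY; ISO date strings compare correctly as text in SQLite.
SELF_JOIN = """
SELECT dd.id, dd.dates, dd.sales, SUM(d.sales) AS running_sales
FROM your_table dd
JOIN your_table d
  ON d.id = dd.id
 AND d.dates BETWEEN date(dd.dates, '-3 day') AND dd.dates
GROUP BY dd.id, dd.dates, dd.sales
ORDER BY dd.id, dd.dates
"""


def running_sales():
    """Return {(id, dates): running_sales} computed by the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, dates TEXT, sales REAL)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
    result = {(i, d): total for i, d, _, total in conn.execute(SELF_JOIN)}
    conn.close()
    return result


for key, total in running_sales().items():
    print(key, total)
```

With these rows, customer 2233 gets a running total of 26 on 2017-06-15, because the window reaches back to the 12th.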

You may want to create the following index to support the queries above:

create index ix_your_table on your_table(id, dates, sales) 

I have updated my question with the full query I have to run. This approach appears to be slow; the server times out.


@PasqualeSada OK, I have rewritten it as a `GROUP BY` version. Please test it now and let me know.

0
-- Sample data
WITH cte AS (
    SELECT 1234 id, '2017-06-15' idate, 9 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-03' idate, 20 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-05' idate, 4 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-06' idate, 1 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-11' idate, 8 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-12' idate, 4 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-13' idate, 21 sales FROM dual UNION ALL
    SELECT 2233 id, '2017-06-15' idate, 1 sales FROM dual UNION ALL
    SELECT 2544 id, '2017-06-13' idate, 9 sales FROM dual UNION ALL
    SELECT 2443 id, '2017-06-05' idate, 3.5 sales FROM dual
),
-- Turn the date string into a numeric key such as 20170615
cte2 AS (
    SELECT cte.*, TO_NUMBER(REPLACE(idate, '-')) datekey
    FROM cte
),
-- Generate the day keys 20170601 through 20170631
pp AS (
    SELECT TO_NUMBER(dd + 20170600) dkey
    FROM (SELECT ROWNUM dd
          FROM dual
          CONNECT BY LEVEL <= 31)
),
-- Left-join the generated days to the sales so that every day key appears
cc AS (
    SELECT cte2.*, pp.dkey
    FROM pp
    LEFT JOIN cte2 ON (cte2.datekey = pp.dkey)
)
SELECT cc.*,
       SUM(cc.sales) OVER (PARTITION BY cc.id ORDER BY cc.dkey ASC ROWS 2 PRECEDING) AS RunningSales
FROM cc
ORDER BY dkey ASC, id ASC

I tested it on Oracle 12c. It could also borrow from the way dimensions are built for a data lake. – Mookayama
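A row-based frame such as `ROWS 2 PRECEDING` only equals a 3-day window if every customer has exactly one row for every calendar day. The sketch below is a variant of the calendar idea, not the answer's exact query: it builds an id × day grid, fills missing days with 0, and then applies the row-based frame. It runs on SQLite through Python's `sqlite3` and assumes SQLite 3.25 or later (for window functions); the table name `sales_rows` and the helper `dense_running_sales` are hypothetical.

```python
import sqlite3

# Sample rows (id, idate, sales) taken from the answer above.
ROWS = [
    (1234, "2017-06-15", 9), (2233, "2017-06-03", 20), (2233, "2017-06-05", 4),
    (2233, "2017-06-06", 1), (2233, "2017-06-11", 8), (2233, "2017-06-12", 4),
    (2233, "2017-06-13", 21), (2233, "2017-06-15", 1), (2544, "2017-06-13", 9),
    (2443, "2017-06-05", 3.5),
]

DENSE_WINDOW = """
WITH RECURSIVE days(d) AS (            -- one row per day of June 2017
    SELECT '2017-06-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < '2017-06-30'
),
grid AS (                              -- every customer paired with every day
    SELECT ids.id, days.d
    FROM (SELECT DISTINCT id FROM sales_rows) ids
    CROSS JOIN days
),
dense AS (                             -- days without a sale count as 0
    SELECT g.id, g.d, COALESCE(SUM(s.sales), 0) AS sales
    FROM grid g
    LEFT JOIN sales_rows s ON s.id = g.id AND s.idate = g.d
    GROUP BY g.id, g.d
)
SELECT id, d, sales,
       SUM(sales) OVER (PARTITION BY id ORDER BY d
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_sales
FROM dense
ORDER BY id, d
"""


def dense_running_sales():
    """Return {(id, day): 3-day running sales} for every customer and June day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_rows (id INTEGER, idate TEXT, sales REAL)")
    conn.executemany("INSERT INTO sales_rows VALUES (?, ?, ?)", ROWS)
    result = {(i, d): total for i, d, _, total in conn.execute(DENSE_WINDOW)}
    conn.close()
    return result


running = dense_running_sales()
print(running[(2233, "2017-06-14")], running[(2233, "2017-06-15")])
```

The trade-off is the size of the grid: it has one row per customer per day, which may be too large for a big customer base, so this is best restricted to the date range actually being reported.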

0

If there is at most one sale per day, the most efficient approach is probably a repeated lag():

select rd.*,
       (sales +
        (case when prev_date >= date - interval '2 day' then prev_sales else 0 end) +
        (case when prev_date2 >= date - interval '2 day' then prev_sales2 else 0 end)
       ) as sales_3day
from (select rd.*,
             -- date and sales of the previous row and of the row before that
             lag(date, 1) over (partition by id order by date) as prev_date,
             lag(date, 2) over (partition by id order by date) as prev_date2,
             lag(sales, 1) over (partition by id order by date) as prev_sales,
             lag(sales, 2) over (partition by id order by date) as prev_sales2
      from mytable rd
     ) rd;

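Once you have this value, the rest is just conditional logic on the result.

If you have multiple sales on a single date, you can handle that easily by aggregating in the innermost query.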
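As a rough illustration of both points, the sketch below aggregates to one row per customer and day in the innermost query and then applies the two lags. It runs on SQLite through Python's `sqlite3` and assumes SQLite 3.25 or later (for `LAG`). The column name `dt`, the `date(dt, '-2 day')` arithmetic, the helper `lag_running_sales`, and the split of the sale on the 13th into two rows are assumptions made for this example.

```python
import sqlite3

# Rows (id, dt, sales) based on the sample data posted in another answer. The sale
# of 21 on 2017-06-13 is split into 20 + 1 here (a made-up split) so that one day
# has two rows and the inner aggregation has something to do.
ROWS = [
    (2233, "2017-06-03", 20), (2233, "2017-06-05", 4), (2233, "2017-06-06", 1),
    (2233, "2017-06-11", 8), (2233, "2017-06-12", 4), (2233, "2017-06-13", 20),
    (2233, "2017-06-13", 1), (2233, "2017-06-15", 1), (1234, "2017-06-15", 9),
]

LAG_QUERY = """
SELECT id, dt, sales,
       sales
       + CASE WHEN prev_dt  >= date(dt, '-2 day') THEN prev_sales  ELSE 0 END
       + CASE WHEN prev_dt2 >= date(dt, '-2 day') THEN prev_sales2 ELSE 0 END
         AS sales_3day
FROM (
    SELECT id, dt, sales,
           LAG(dt, 1)    OVER (PARTITION BY id ORDER BY dt) AS prev_dt,
           LAG(dt, 2)    OVER (PARTITION BY id ORDER BY dt) AS prev_dt2,
           LAG(sales, 1) OVER (PARTITION BY id ORDER BY dt) AS prev_sales,
           LAG(sales, 2) OVER (PARTITION BY id ORDER BY dt) AS prev_sales2
    FROM (
        -- innermost query: collapse multiple sales on one date into one row
        SELECT id, dt, SUM(sales) AS sales
        FROM mytable
        GROUP BY id, dt
    ) AS daily
) AS lagged
ORDER BY id, dt
"""


def lag_running_sales():
    """Return {(id, dt): sales over dt and the two calendar days before it}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, dt TEXT, sales REAL)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = {(i, d): total for i, d, _, total in conn.execute(LAG_QUERY)}
    conn.close()
    return result


for key, total in lag_running_sales().items():
    print(key, total)
```

Note that this window covers the current day and the two days before it, so customer 2233 gets 22 on the 15th. To reach back a further day, as the 26 in the question suggests, a third pair of lags and an interval of 3 days would be needed.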