2016-11-28

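Pairing consecutive events in PostgreSQL

We are tracking the main flows of actions that users perform in our iPad app. Each flow has a start (marked as Started) and an end (marked as either Cancelled or Finished), and there should be no overlapping events.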

A set of started, cancelled, or finished user flows looks like this:

    user_id            timestamp                    event_text       event_num
    [email protected]  2016-10-30 00:08:00.966+00   Flow Started     0
    [email protected]  2016-10-30 00:08:15.58+00    Flow Cancelled   2
    [email protected]  2016-10-30 00:08:15.581+00   Flow Started     0
    [email protected]  2016-10-30 00:34:44.134+00   Flow Finished    1
    [email protected]  2016-10-30 00:42:26.102+00   Flow Started     0
    [email protected]  2016-10-30 00:42:49.276+00   Flow Cancelled   2
    [email protected]  2016-10-30 00:42:49.277+00   Flow Started     0
    [email protected]  2016-10-30 00:59:47.337+00   Flow Cancelled   2
    [email protected]  2016-10-30 00:59:47.337+00   Flow Started     0
    [email protected]  2016-10-30 00:59:47.928+00   Flow Cancelled   2

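We want to calculate how long cancelled and finished flows last on average. To do this, we need to pair each Started event with the Cancelled or Finished event that ends it. The code below does that, but it does not handle the following data-quality issue:

  • When a customer starts a new flow (call it the secondary flow) before ending the ongoing one (the primary flow), we record a Cancelled event for the primary flow at the same moment we record the Started event for the new flow. So Flow1 Cancelled = Flow2 Started: in the sample, the two events share the same timestamp or are only a millisecond apart. As a result, when we use window functions to order the events and match them with LEAD/LAG, events that actually belong to different flows get paired.

This is the code we use: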

    WITH track_scf AS (
        SELECT user_id,
               timestamp,
               event_text,
               CASE WHEN event_text LIKE '%Started%'   THEN 0
                    WHEN event_text LIKE '%Cancelled%' THEN 2
                    ELSE 1
               END AS event_num
        FROM tracks
        ORDER BY 2, 4 DESC
    )
    SELECT user_id,
           CASE WHEN event_num = 0 THEN timestamp END AS start,
           CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
                THEN LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
           END AS end,
           CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num) <> 0
                THEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp, event_num)
           END AS action
    FROM track_scf

We get this result:

    user_id            start                        end                          action
    [email protected]  2016-10-30 00:08:00.966+00   2016-10-30 00:08:15.58+00    2
    [email protected]  2016-10-30 00:08:15.581+00   2016-10-30 00:34:44.134+00   1
    [email protected]  2016-10-30 00:42:26.102+00   2016-10-30 00:42:49.276+00   2
    [email protected]  2016-10-30 00:42:49.277+00   NULL                         NULL
    [email protected]  2016-10-30 00:59:47.337+00   2016-10-30 00:59:47.337+00   2
    [email protected]  NULL                         2016-10-30 00:59:47.928+00   2
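The NULLs above can be traced by replaying the question's window logic in plain Python. This is a sketch, not the query itself. It assumes a single hypothetical user id (`user@example.com`, standing in for the obfuscated address) and reads every timestamp as UTC. It emulates `LEAD()` by looking at the next row of the same user after sorting on `(timestamp, event_num)` ascending. Both events at `00:59:47.337` share a timestamp, so `Flow Started` (0) sorts before `Flow Cancelled` (2). As a result, the Started row at `00:42:49.277` is followed by another Started row, and the Cancelled row at `00:59:47.337` is followed by another Cancelled row.

```python
from datetime import datetime, timezone

# Hypothetical placeholder for the obfuscated user id in the question.
USER = "user@example.com"

# (user_id, timestamp, event_num) from the sample data:
# 0 = Started, 1 = Finished, 2 = Cancelled.
ROWS = [
    (USER, "2016-10-30 00:08:00.966+00", 0),
    (USER, "2016-10-30 00:08:15.58+00", 2),
    (USER, "2016-10-30 00:08:15.581+00", 0),
    (USER, "2016-10-30 00:34:44.134+00", 1),
    (USER, "2016-10-30 00:42:26.102+00", 0),
    (USER, "2016-10-30 00:42:49.276+00", 2),
    (USER, "2016-10-30 00:42:49.277+00", 0),
    (USER, "2016-10-30 00:59:47.337+00", 2),
    (USER, "2016-10-30 00:59:47.337+00", 0),
    (USER, "2016-10-30 00:59:47.928+00", 2),
]


def parse_ts(text):
    """Parse a sample timestamp; the '+00' suffix is assumed to mean UTC."""
    naive = datetime.strptime(text.removesuffix("+00"), "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def question_pairs(rows):
    """Mimic the question's query: window ordered by (timestamp, event_num ASC)."""
    ordered = sorted(rows, key=lambda r: (r[0], parse_ts(r[1]), r[2]))
    result = []
    for i, (user, ts, num) in enumerate(ordered):
        # LEAD(..., 1) OVER (PARTITION BY user_id ...): next row of the same user.
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] != user:
            nxt = None
        start = ts if num == 0 else None
        closes = nxt is not None and nxt[2] != 0
        end = nxt[1] if closes else None
        action = nxt[2] if closes else None
        # Skip all-NULL rows so the output lines up with the result shown above.
        if (start, end, action) != (None, None, None):
            result.append((user, start, end, action))
    return result


for row in question_pairs(ROWS):
    print(row)
```

The sketch drops rows where start, end, and action are all NULL, which the result above appears to omit. The full query returns one row per event.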

But we should get this:

    user_id            start                        end                          action
    [email protected]  2016-10-30 00:08:00.966+00   2016-10-30 00:08:15.58+00    2
    [email protected]  2016-10-30 00:08:15.581+00   2016-10-30 00:34:44.134+00   1
    [email protected]  2016-10-30 00:42:26.102+00   2016-10-30 00:42:49.276+00   2
    [email protected]  2016-10-30 00:42:49.277+00   2016-10-30 00:59:47.337+00   2
    [email protected]  2016-10-30 00:59:47.337+00   2016-10-30 00:59:47.928+00   2
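Once the events are paired like this, the stated goal (average duration of cancelled and finished flows) is a grouped average of `end - start`. The sketch below does that step in Python over the five expected pairs. The function name `average_duration_by_action` is hypothetical, and the timestamps are assumed to be UTC, since every sample carries `+00`. In PostgreSQL itself, `avg()` also accepts `interval` values. The same step could therefore be written as `avg("end" - "start")` grouped by `action` over the paired rows.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Expected (start, end, action) pairs from the question: 1 = Finished, 2 = Cancelled.
PAIRS = [
    ("2016-10-30 00:08:00.966+00", "2016-10-30 00:08:15.58+00", 2),
    ("2016-10-30 00:08:15.581+00", "2016-10-30 00:34:44.134+00", 1),
    ("2016-10-30 00:42:26.102+00", "2016-10-30 00:42:49.276+00", 2),
    ("2016-10-30 00:42:49.277+00", "2016-10-30 00:59:47.337+00", 2),
    ("2016-10-30 00:59:47.337+00", "2016-10-30 00:59:47.928+00", 2),
]


def parse_ts(text):
    """Parse a sample timestamp; the '+00' suffix is assumed to mean UTC."""
    naive = datetime.strptime(text.removesuffix("+00"), "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def average_duration_by_action(pairs):
    """Average (end - start) per action, skipping pairs with a missing side."""
    durations = defaultdict(list)
    for start, end, action in pairs:
        if start is None or end is None:
            continue  # a flow without a matching end has no duration
        durations[action].append(parse_ts(end) - parse_ts(start))
    return {
        action: sum(values, timedelta()) / len(values)
        for action, values in durations.items()
    }


averages = average_duration_by_action(PAIRS)
print(averages[1])  # Finished: 0:26:28.553000
print(averages[2])  # Cancelled: 0:04:24.109750
```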

How do I need to change the code so that the events are paired correctly?

Answer

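    select  user_id
           ,"start"
           ,"end"
           ,"action"

    from   (select  user_id
                   ,timestamp                 as "start"
                   ,lead (event_num) over w   as "action"
                   ,lead ("timestamp") over w as "end"
                   ,event_num

            -- assumes tracks already has the event_num column shown in the
            -- sample data (0 = Started, 1 = Finished, 2 = Cancelled)
            from    tracks t

            -- on a timestamp tie, event_num desc sorts the closing event
            -- (Cancelled or Finished) before the next Started event
            window  w as (partition by user_id order by "timestamp", event_num desc)
           ) t

    -- keep one row per Started event; lead() supplies its end and action
    where   t.event_num = 0
    ;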

It works! A nice, elegant solution. Thank you!
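The answer's key change is the `event_num desc` tie-breaker. When a Cancelled or Finished event shares a timestamp with the next Started event, the closing event now sorts first, so `lead()` from each Started row lands on the event that ends that flow. Filtering to `event_num = 0` afterwards leaves exactly one row per flow. The sketch below mirrors that logic in plain Python as an illustration, not a replacement for the SQL. It assumes a single hypothetical user id (`user@example.com`, in place of the obfuscated address) and reads all timestamps as UTC.

```python
from datetime import datetime, timezone

# Hypothetical placeholder for the obfuscated user id in the question.
USER = "user@example.com"

# (user_id, timestamp, event_num): 0 = Started, 1 = Finished, 2 = Cancelled.
ROWS = [
    (USER, "2016-10-30 00:08:00.966+00", 0),
    (USER, "2016-10-30 00:08:15.58+00", 2),
    (USER, "2016-10-30 00:08:15.581+00", 0),
    (USER, "2016-10-30 00:34:44.134+00", 1),
    (USER, "2016-10-30 00:42:26.102+00", 0),
    (USER, "2016-10-30 00:42:49.276+00", 2),
    (USER, "2016-10-30 00:42:49.277+00", 0),
    (USER, "2016-10-30 00:59:47.337+00", 2),
    (USER, "2016-10-30 00:59:47.337+00", 0),
    (USER, "2016-10-30 00:59:47.928+00", 2),
]


def parse_ts(text):
    """Parse a sample timestamp; the '+00' suffix is assumed to mean UTC."""
    naive = datetime.strptime(text.removesuffix("+00"), "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def answer_pairs(rows):
    """Mirror the answer: order by (timestamp, event_num DESC), take lead(),
    then keep only the Started rows."""
    # Negating event_num sorts 2 (Cancelled) and 1 (Finished) before
    # 0 (Started) when two events of the same user share a timestamp.
    ordered = sorted(rows, key=lambda r: (r[0], parse_ts(r[1]), -r[2]))
    result = []
    for i, (user, ts, num) in enumerate(ordered):
        if num != 0:
            continue  # the outer WHERE t.event_num = 0
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt is not None and nxt[0] != user:
            nxt = None  # lead() never crosses a user_id partition
        end = nxt[1] if nxt is not None else None
        action = nxt[2] if nxt is not None else None
        result.append((user, ts, end, action))
    return result


for row in answer_pairs(ROWS):
    print(row)
```

Like the answer, this does not check that the next event is really an end. If a Started row were followed directly by another Started row, `action` would come back as 0, and such rows could be filtered out before averaging.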