2017-04-25 37 views
1

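Using SQL to copy transaction_id to all previous timestamped rows with the same user_id?

I am working with Google Analytics attribution data in BigQuery. To hard-code different attribution models, for each transaction I first want to attribute that transaction to every previous visit to the site by the same visitor (same visitor_id). To do this, I want to copy the transaction ID to all earlier rows of that visitor's data (rows are ordered by visitor_id and visit_number).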

For example, I might have a table like this:

| Visitor_ID | Visit_Number | Transaction_ID | 
---------------------------------------------- 
|  A  |  1  |  null  | 
|  A  |  2  |  null  | 
|  A  |  3  |  F1245 | 

I would like to end up with a table like the following:

| Visitor_ID | Visit_Number | Transaction_ID | 
---------------------------------------------- 
|  A  |  1  |  F1245 | 
|  A  |  2  |  F1245 | 
|  A  |  3  |  F1245 | 

However, if I have a table like the following:

| Visitor_ID | Visit_Number | Transaction_ID | 
---------------------------------------------- 
|  B  |  1  |  null  | 
|  B  |  2  |  null  | 
|  B  |  3  |  G1245 | 
|  B  |  4  |  null  | 

I would like to end up with a table in which only the visits leading up to the transaction are attributed to it:

| Visitor_ID | Visit_Number | Transaction_ID | 
---------------------------------------------- 
|  B  |  1  |  G1245 | 
|  B  |  2  |  G1245 | 
|  B  |  3  |  G1245 | 
|  B  |  4  |  null  | 

Is there a way to do this with a SQL query?

Answers

0

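Try the following for BigQuery Standard SQL.

This version covers the case where the same visitor has several transactions, assigning each one to the visit numbers that lead up to it.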

#standardSQL 
WITH Input AS (
    SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 5 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 6 AS Visit_Number, 'F1246' AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 7 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID 
) 
SELECT 
    Visitor_ID, 
    Visit_Number, 
    Transaction_ID AS originalTransaction_ID, 
    -- The numeric prefix (1000000 + Visit_Number) is 7 characters long, so the ID starts at position 8 
    SUBSTR(MIN(CONCAT(CAST(1000000 + Visit_Number AS STRING), Transaction_ID)) OVER(win), 8) AS Transaction_ID 
FROM Input 
WINDOW win AS (PARTITION BY Visitor_ID ORDER BY Visit_Number DESC ROWS UNBOUNDED PRECEDING) 
ORDER BY Visitor_ID, Visit_Number 

The result is as follows:

Visitor_ID Visit_Number originalTransaction_ID Transaction_ID 
A   1    null     F1245 
A   2    null     F1245 
A   3    F1245     F1245 
A   4    null     F1246 
A   5    null     F1246 
A   6    F1246     F1246 
A   7    null     null  
B   1    null     G1245 
B   2    null     G1245 
B   3    G1245     G1245 
B   4    null     null  
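
To make the key trick in the query above easier to follow, here is a minimal Python sketch of the same idea, not part of the original answer. The function name `attribute_transactions` and the tuple layout are hypothetical. It assumes visit numbers stay below 9,000,000, so that `1000000 + Visit_Number` is always 7 characters long, and it mirrors the SQL behaviour that `CONCAT` with a NULL yields NULL and `MIN` skips NULLs.

```python
def attribute_transactions(rows):
    """rows: list of (visitor_id, visit_number, transaction_id or None).

    Hypothetical helper that emulates the MIN(CONCAT(...)) window trick.
    """
    attributed = []
    for visitor_id, visit_number, _ in rows:
        # Build one key per visit at or after the current one (same visitor)
        # that carries a transaction; visits without one are skipped.
        keys = [
            str(1000000 + later_visit) + transaction_id
            for other_visitor, later_visit, transaction_id in rows
            if other_visitor == visitor_id
            and later_visit >= visit_number
            and transaction_id is not None
        ]
        # The smallest key belongs to the nearest following transaction.
        # Drop the 7-character numeric prefix to recover the ID.
        nearest = min(keys)[7:] if keys else None
        attributed.append((visitor_id, visit_number, nearest))
    return attributed


rows = [
    ("A", 1, None), ("A", 2, None), ("A", 3, "F1245"),
    ("A", 4, None), ("A", 5, None), ("A", 6, "F1246"), ("A", 7, None),
    ("B", 1, None), ("B", 2, None), ("B", 3, "G1245"), ("B", 4, None),
]
for row in attribute_transactions(rows):
    print(row)
```

Because the prefix sorts by visit number before the ID is compared, the minimum always picks the closest later transaction, whatever the IDs look like.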
2

Try MAX with a window clause. Here is an example:

#standardSQL 
WITH Input AS (
    SELECT 'A' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'A' AS Visitor_ID, 3 AS Visit_Number, 'F1245' AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 1 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 2 AS Visit_Number, NULL AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 3 AS Visit_Number, 'G1245' AS Transaction_ID UNION ALL 
    SELECT 'B' AS Visitor_ID, 4 AS Visit_Number, NULL AS Transaction_ID 
) 
SELECT 
    * EXCEPT (Transaction_ID), 
    MAX(Transaction_ID) OVER (
        -- Order by visit so the frame spans the current visit and all later ones 
        PARTITION BY Visitor_ID ORDER BY Visit_Number 
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 
    ) AS Transaction_ID 
FROM Input 
ORDER BY Visitor_ID, Visit_Number ASC; 
+0

I think this only works if the 'TransactionID' values are ordered? If you put 'H1245' in place of the last 'NULL', all of B's rows would get 'H1245'? – Maximilian

+1

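I made the assumption that there is only one distinct 'Transaction_ID' value for a particular 'Visitor_ID'. If you need different semantics for deciding which 'Transaction_ID' to use, you will need to change the approach. –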
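
The caveat raised in these comments can be reproduced locally. The following is a rough sketch that runs a MAX window query through Python's built-in `sqlite3` module rather than BigQuery. It assumes the bundled SQLite is version 3.25 or later (needed for window functions) and that the window is ordered by Visit_Number. The 'H1245' row is the hypothetical second transaction from the comment.

```python
import sqlite3

# Visitor B with a hypothetical second transaction on visit 4.
rows = [
    ("B", 1, None),
    ("B", 2, None),
    ("B", 3, "G1245"),
    ("B", 4, "H1245"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Input (Visitor_ID TEXT, Visit_Number INTEGER, Transaction_ID TEXT)"
)
conn.executemany("INSERT INTO Input VALUES (?, ?, ?)", rows)

query = """
SELECT
    Visit_Number,
    MAX(Transaction_ID) OVER (
        PARTITION BY Visitor_ID ORDER BY Visit_Number
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) AS Transaction_ID
FROM Input
ORDER BY Visitor_ID, Visit_Number
"""
result = conn.execute(query).fetchall()
print(result)
# [(1, 'H1245'), (2, 'H1245'), (3, 'H1245'), (4, 'H1245')]
```

Visits 1 to 3 pick up 'H1245' instead of 'G1245' because MAX compares the ID strings themselves, not how close each transaction is to the visit. This is why the approach relies on a single transaction per visitor.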
