2016-08-01

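Writing a string comparison function in BigQuery

I want to use a BigQuery UDF to write a function that compares a list of strings with other lists of strings. Basically, I want to know how many new users we get each week, and how many of those new users visit our website in the following weeks. To do this, I wrote a query that produces, for each week, one string containing all the e-mail addresses (using GROUP_CONCAT), and saved the result as a table. Now I need to know how to compare each week's e-mail collection with those of the other weeks. In the end, I'd like to have a table like this: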

+--------+--------+--------+--------+--------+-----+
|        | week 1 | week 2 | week 3 | week 4 | ... |
+--------+--------+--------+--------+--------+-----+
| week 1 |     17 |      7 |      5 |      9 | ... |
+--------+--------+--------+--------+--------+-----+
| week 2 |        |     19 |     13 |      8 | ... |
+--------+--------+--------+--------+--------+-----+
| week 3 |        |        |     24 |     15 | ... |
+--------+--------+--------+--------+--------+-----+

Answer


Just to give you an idea, here is an example using public Reddit posts data:

SELECT
  CONCAT('week', STRING(prev)) AS WEEK,
  -- outer SELECT: one column per "next" week (19-23 are the WEEK() values in May 2016)
  SUM(IF(next=19, authors, 0)) AS week19,
  SUM(IF(next=20, authors, 0)) AS week20,
  SUM(IF(next=21, authors, 0)) AS week21,
  SUM(IF(next=22, authors, 0)) AS week22,
  SUM(IF(next=23, authors, 0)) AS week23
FROM (
  -- inner SELECT: for every (prev, next) pair of weeks, count the authors
  -- who posted in week prev and also posted in week next
  SELECT prev, next, COUNT(author) AS authors
  FROM (
    SELECT
      prev_week.week_created AS prev,
      next_week.week_created AS next,
      prev_week.author AS author
    FROM (
      -- distinct (week, author) pairs
      SELECT
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05]
      GROUP BY 1, 2
    ) next_week
    LEFT JOIN (
      -- the same distinct (week, author) pairs, joined to themselves on author
      SELECT
        WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
        author
      FROM [fh-bigquery:reddit_posts.2016_05]
      GROUP BY 1, 2
    ) AS prev_week
    ON prev_week.author = next_week.author
    -- keep only pairs where prev is not later than next
    HAVING prev <= next
  )
  GROUP BY 1, 2
)
GROUP BY 1
ORDER BY 1
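To check the logic of the inner SELECT outside BigQuery (deduplicate (week, author), self-join on author, keep prev <= next, count per pair), here is an in-memory Python sketch. It assumes the rows fit in memory. `week_pair_counts` is a hypothetical helper name, and the variable is `nxt` because `next` is a Python built-in.

```python
from collections import Counter, defaultdict


def week_pair_counts(rows):
    """rows: iterable of (week, author). Returns Counter {(prev, next): authors}."""
    # Same effect as the GROUP BY 1, 2 subqueries: distinct weeks per author
    weeks_by_author = defaultdict(set)
    for week, author in rows:
        weeks_by_author[author].add(week)

    counts = Counter()
    # Self-join on author, keeping only pairs with prev <= next
    for weeks in weeks_by_author.values():
        for prev in weeks:
            for nxt in weeks:
                if prev <= nxt:
                    counts[(prev, nxt)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample rows: (week, author)
    rows = [(19, "a"), (19, "a"), (20, "a"), (19, "b"), (21, "b"), (20, "c")]
    for (prev, nxt), n in sorted(week_pair_counts(rows).items()):
        print(prev, nxt, n)
```

Like the SQL, this counts every author active in both weeks, not only users who were new in week prev.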

The result looks like this:
[Image: query result]

This is the closest I could come to what you asked for.

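Also note that BigQuery is designed for data processing rather than for reporting. So I don't think building the matrix/pivot in BigQuery (the outer SELECT) is the best fit; that step can be done in your reporting tool. Computing all the prev|next|count rows (the inner SELECT), however, is definitely a good fit for BigQuery.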
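As an illustration of moving the pivot out of BigQuery, here is a minimal sketch of what a reporting layer might do with the long (prev, next, count) rows. It assumes the inner query's result has already been fetched into a Python dict keyed by (prev, next). `pivot` is a hypothetical helper, not part of any BigQuery client library.

```python
def pivot(pair_counts):
    """pair_counts: {(prev, next): count}. Returns (header, rows) for a prev x next grid."""
    prevs = sorted({p for p, _ in pair_counts})
    nexts = sorted({n for _, n in pair_counts})
    header = ["week"] + [f"week{n}" for n in nexts]
    rows = []
    for p in prevs:
        # Missing pairs become 0, mirroring SUM(IF(next=..., authors, 0))
        rows.append([f"week{p}"] + [pair_counts.get((p, n), 0) for n in nexts])
    return header, rows


if __name__ == "__main__":
    # Hypothetical inner-query output
    counts = {(19, 19): 2, (19, 20): 1, (20, 20): 2, (19, 21): 1, (21, 21): 1}
    header, rows = pivot(counts)
    print(header)
    for row in rows:
        print(row)
```

Unlike the hard-coded week19 to week23 columns in the SQL, this derives the columns from the data, so it keeps working when the date range changes.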


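This is a very good answer! I was thinking about it in a completely different way. I had already written the code in Java, which is why I wanted to create one unique collection per week holding all the e-mails and compare it with the other weeks... As far as I know, that isn't possible with a UDF. – AnaHid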