2017-08-24

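Selecting by a specific customDimension value (BQ <> GA)

I am new to BigQuery (a few weeks of experience) and am trying to improve my skills. I have a practical question about the very interesting query below, posted by user Willian Fuks in "Recreate GA Funnel on BigQuery". It reproduces a funnel efficiently from GA data in BigQuery.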

#standardSQL 
SELECT 
SUM((SELECT COUNTIF(eventInfo.eventAction = 'landing_page') FROM UNNEST(hits))) Landing_Page, 
SUM((SELECT COUNTIF(eventInfo.eventAction = 'model_selection_page') FROM UNNEST(hits) WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventAction = 'landing_page'))) Model_Selection 
FROM `64269470.ga_sessions_20170720` 

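That example uses eventInfo.eventAction. I tried several things to make it work with a customDimension instead, but failed. Does anyone know how I can reproduce the query segmented by a customDimension instead of eventInfo.eventAction?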

I have been working with something like this:

(SELECT MAX(IF(index=1,page1, NULL))FROM UNNEST(hits.customDimensions)) 

Answer

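Working with customDimensions is a bit more challenging, because this field is itself an array (a REPEATED field) nested inside each hit. In practice, the main difference is that you need one more UNNEST operation. Beyond that, the logic is the same.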
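To make the extra UNNEST concrete, here is a minimal sketch that flattens one made-up session into one row per (hit, custom dimension) pair. The CTE name one_session and all of its values are invented for illustration. The nested field is called customDimension to match the mock data in this answer, whereas the real GA export names it customDimensions.

```sql
#standardSQL
-- Hypothetical one-session sample; the CTE name and all values are made up.
WITH one_session AS (
  SELECT
    '1' AS fullvisitorid,
    1 AS visitid,
    ARRAY<STRUCT<hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING>>>>[
      STRUCT(1 AS hitNumber,
             [STRUCT(1 AS index, 'landing_page' AS value),
              STRUCT(2 AS index, 'value2' AS value)] AS customDimension),
      STRUCT(2 AS hitNumber,
             [STRUCT(3 AS index, 'model_selection_page' AS value)] AS customDimension)
    ] AS hits
)
SELECT
  fullvisitorid,
  visitid,
  h.hitNumber,
  cd.index,
  cd.value
FROM one_session,
  UNNEST(hits) AS h,               -- first UNNEST: one row per hit
  UNNEST(h.customDimension) AS cd  -- second UNNEST: one row per custom dimension of that hit
ORDER BY h.hitNumber, cd.index
```

With this sample the result has three rows: two for hit 1 and one for hit 2. Once the data is in this flat shape, filtering on index and value is an ordinary WHERE clause.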

Here is some mock data to demonstrate:

#standardSQL 
WITH data AS(
    SELECT '1' AS fullvisitorid, 1 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(1 AS index, 'value1' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits UNION ALL 

    SELECT '1' AS fullvisitorid, 2 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits UNION ALL 

    SELECT '2' AS fullvisitorid, 1 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(1 AS index, 'value1' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits UNION ALL 

    SELECT '3' AS fullvisitorid, 1 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits UNION ALL 

    SELECT '3' AS fullvisitorid, 2 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(3 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits UNION ALL 
    SELECT '4' AS fullvisitorid, 1 AS visitid, ARRAY<STRUCT< hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING> > >> [STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(2 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension), 
                                      STRUCT(3 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value), STRUCT(2 AS index, 'value2' AS value)] AS customDimension)] AS hits                                                                                                                                
) 

Each user (fullvisitorid) and each session (visitid) has its own hits array. Note that I labelled each hit with a hitNumber, which I find makes things easier to follow.

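The query below (run it together with the WITH clause above) counts the sessions in which a customDimension with index 1 and value 'landing_page' occurred, as well as the sessions in which one with index 3 and value 'model_selection_page' occurred: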

#standardSQL 
SELECT 
    SUM((SELECT 1 FROM UNNEST(hits), UNNEST(customDimension) WHERE index = 1 AND value = 'landing_page' LIMIT 1)) Landing_Page, 
    SUM((SELECT 1 FROM UNNEST(hits), UNNEST(customDimension) WHERE EXISTS(SELECT 1 FROM UNNEST(hits), UNNEST(customDimension) WHERE index = 1 AND value = 'landing_page') AND index = 3 AND value = 'model_selection_page' LIMIT 1)) Model_Selection 
FROM 
    data 

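You can play with the mock data to better understand what is happening here. In short, note that two UNNEST operations take place: the first retrieves the values inside hits, and the second retrieves the values inside customDimension.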

The Model_Selection field is a bit more complex, because it first has to evaluate whether the 'landing_page' dimension was fired, as you can see in this expression:

EXISTS(SELECT 1 FROM UNNEST(hits), UNNEST(customDimension) WHERE index = 1 AND value = 'landing_page') 

If hits contains the 'landing_page' dimension anywhere, this expression evaluates to TRUE in the WHERE clause.
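One way to see what the EXISTS check does is to output it as a per-session flag instead of aggregating it. The sketch below is only an illustration: the CTE two_sessions, the visitor ids 'A' and 'B', and the column names saw_landing_page and saw_model_selection are all made up.

```sql
#standardSQL
-- Hypothetical two-session sample; names and values are made up.
WITH two_sessions AS (
  -- Session A fires landing_page first and model_selection_page afterwards.
  SELECT 'A' AS fullvisitorid, 1 AS visitid,
    ARRAY<STRUCT<hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING>>>>[
      STRUCT(1 AS hitNumber, [STRUCT(1 AS index, 'landing_page' AS value)] AS customDimension),
      STRUCT(2 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value)] AS customDimension)
    ] AS hits
  UNION ALL
  -- Session B never fires landing_page.
  SELECT 'B' AS fullvisitorid, 1 AS visitid,
    ARRAY<STRUCT<hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING>>>>[
      STRUCT(1 AS hitNumber, [STRUCT(3 AS index, 'model_selection_page' AS value)] AS customDimension)
    ] AS hits
)
SELECT
  fullvisitorid,
  visitid,
  -- TRUE when any hit of the session carries index 1 = 'landing_page'
  EXISTS(
    SELECT 1 FROM UNNEST(hits) AS h, UNNEST(h.customDimension) AS cd
    WHERE cd.index = 1 AND cd.value = 'landing_page'
  ) AS saw_landing_page,
  -- TRUE when any hit of the session carries index 3 = 'model_selection_page'
  EXISTS(
    SELECT 1 FROM UNNEST(hits) AS h, UNNEST(h.customDimension) AS cd
    WHERE cd.index = 3 AND cd.value = 'model_selection_page'
  ) AS saw_model_selection
FROM two_sessions
ORDER BY fullvisitorid
```

Session A comes back with both flags TRUE, while session B has only saw_model_selection set. Because the funnel query requires the landing_page check to pass as well, only session A would count towards Model_Selection there.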

You can also bring the results to the user level, like so:

#standardSQL 
SELECT 
    COUNT(DISTINCT (SELECT fullvisitorid FROM UNNEST(hits), UNNEST(customDimension) WHERE index = 1 AND value = 'landing_page' LIMIT 1)) Landing_Page, 
    COUNT(DISTINCT (SELECT fullvisitorid FROM UNNEST(hits), UNNEST(customDimension) WHERE EXISTS(SELECT 1 FROM UNNEST(hits), UNNEST(customDimension) WHERE index = 1 AND value = 'landing_page') AND index = 3 AND value = 'model_selection_page' LIMIT 1)) Model_Selection 
FROM 
    data 

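Since you are learning BigQuery, I recommend fiddling with mock data and testing each step to observe its output. You can play with UNNEST, run a few queries to see what they return, and so on, to get a better and deeper understanding of how to use these techniques.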
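As one small experiment of that kind, the sketch below keeps one row per hit and looks up a single custom dimension with a scalar subquery. The CTE one_session and the column names dimension_count and dimension_1 are invented for illustration, and the subquery assumes each index appears at most once per hit (a scalar subquery that returns more than one row fails).

```sql
#standardSQL
-- Hypothetical one-session sample; the CTE name and all values are made up.
WITH one_session AS (
  SELECT
    '1' AS fullvisitorid,
    1 AS visitid,
    ARRAY<STRUCT<hitNumber INT64, customDimension ARRAY<STRUCT<index INT64, value STRING>>>>[
      STRUCT(1 AS hitNumber,
             [STRUCT(1 AS index, 'landing_page' AS value),
              STRUCT(2 AS index, 'value2' AS value)] AS customDimension),
      STRUCT(2 AS hitNumber,
             [STRUCT(3 AS index, 'model_selection_page' AS value)] AS customDimension)
    ] AS hits
)
SELECT
  h.hitNumber,
  -- how many custom dimensions this hit carries
  ARRAY_LENGTH(h.customDimension) AS dimension_count,
  -- value of custom dimension 1 for this hit, or NULL if the hit has none
  (SELECT cd.value FROM UNNEST(h.customDimension) AS cd WHERE cd.index = 1) AS dimension_1
FROM one_session, UNNEST(hits) AS h
ORDER BY h.hitNumber
```

Hit 1 returns 'landing_page' for dimension_1 and hit 2 returns NULL, which shows that the lookup has to happen per hit, after hits itself has been unnested.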

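Thanks @willianfuks for the thorough response! This will definitely help me understand it further. I will also follow your advice and play around with the mock data. – Jesper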

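Just back from a (very long) holiday, I have only now started working with the query and the mock data. While validating the results, I noticed a difference of about 4% in sessions compared with the GA user interface. For users (fullvisitorid), the data matches exactly. At first I thought the difference was caused by the missing hit.type, but I cannot access that field on the array. Any idea what might cause this difference in sessions? – Jesper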