2017-06-04

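VisitIds between (different) date ranges for each user

For each fullVisitorId, I am trying to get all visitIds between date_1 and date_2. These dates are, of course, different for each user.

Can anyone offer any pointers on how I could do this?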

For example:

  • user_1: I want all visitIds between June 1 and June 20
  • user_2: I want all visitIds between June 12 and June 27 ... and so on

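date_1 and date_2 correspond to important actions the user took on the website (matched via Events): downloading a trial and making a purchase.

Thanks in advance for any leads.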

1 Answer

One possible way to solve this is to use analytic functions. For example:

#standardSQL 
WITH data AS(
    select '1' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL 
    select '1' as user, '4' as visitid, '20170523' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 

    select '2' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '2' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL 
    select '2' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 

    select '3' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL 
    select '3' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL 
    select '3' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits 
) 

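-- Keep only the sessions that fall inside each user's own date window.
SELECT
    user,
    visitid,
    date
FROM (
    SELECT
        user,
        visitid,
        date,
        -- First date on which the user triggered 'event1' (start of the window).
        MIN(CASE WHEN hits.eventInfo.eventCategory = 'event1' THEN date END) OVER(PARTITION BY user) min_date,
        -- Last date on which the user triggered 'event2' (end of the window).
        MAX(CASE WHEN hits.eventInfo.eventCategory = 'event2' THEN date END) OVER(PARTITION BY user) max_date
    FROM data,
    UNNEST(hits) hits
)
-- A user missing either event gets a NULL bound, so BETWEEN filters that user out.
WHERE date BETWEEN min_date AND max_date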

Here `data` is a mock of your ga_sessions data (I renamed 'fullvisitorid' to 'user').

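This assumes that a given user can have several different events for date 1 and date 2 (which is why the query takes the MIN and the MAX, respectively), and that you store the events in the eventCategory field. If your "download" and "purchase" events are defined at the session level, I'd suggest using the customDimensions field instead of hits.eventInfo.eventCategory.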
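
To make the windowing logic easier to follow outside BigQuery, here is a minimal plain-Python sketch of what the query above computes on the same mock data. The names `sessions` and `visits_between_events` are made up for illustration and are not part of any Google Analytics or BigQuery API. One difference to keep in mind: the SQL unnests `hits`, so a session with several hits would appear several times, whereas this sketch returns each session once.

```python
from collections import defaultdict

# Mock sessions mirroring the `data` CTE:
# (user, visitid, date as YYYYMMDD string, event categories of the session's hits)
sessions = [
    ("1", "1", "20170520", ["event1"]),
    ("1", "2", "20170521", [""]),
    ("1", "3", "20170522", ["event2"]),
    ("1", "4", "20170523", [""]),
    ("2", "1", "20170520", ["event1"]),
    ("2", "2", "20170521", ["event2"]),
    ("2", "3", "20170522", [""]),
    ("3", "1", "20170520", ["event1"]),
    ("3", "2", "20170521", [""]),
    ("3", "3", "20170522", [""]),
]


def visits_between_events(sessions, start_event="event1", end_event="event2"):
    """Return (user, visitid, date) for sessions inside each user's window."""
    start_dates = defaultdict(list)
    end_dates = defaultdict(list)

    # First pass: collect the dates of the two marker events per user.
    for user, _, date, categories in sessions:
        if start_event in categories:
            start_dates[user].append(date)
        if end_event in categories:
            end_dates[user].append(date)

    # Second pass: keep sessions between MIN(start) and MAX(end).
    # YYYYMMDD strings compare correctly as plain strings.
    result = []
    for user, visitid, date, _ in sessions:
        if not start_dates[user] or not end_dates[user]:
            continue  # like a NULL bound in SQL: the user is filtered out
        if min(start_dates[user]) <= date <= max(end_dates[user]):
            result.append((user, visitid, date))
    return result


rows = visits_between_events(sessions)
for row in rows:
    print(row)
```

User 3 never triggers 'event2', so none of that user's sessions are returned, which matches the NULL behaviour of BETWEEN in the SQL version.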

Besides analytic functions, you can also work with ARRAYs and STRUCTs in Standard SQL:

SELECT
    user,
    -- Keep only the visits that fall inside the user's date window.
    ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date) user_data
FROM (
    SELECT
        user,
        -- Collect every session of the user as a (visitid, date) STRUCT.
        ARRAY_AGG((SELECT AS STRUCT visitid, date)) user_data,
        MIN(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event1') THEN date END) min_date,
        MAX(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event2') THEN date END) max_date
    FROM data
    GROUP BY user
)
-- Drop users that have no visits inside their window.
WHERE ARRAY_LENGTH(ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date)) > 0
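
The shape of the result differs from the first query: one row per user, with the matching visits nested inside it. A rough Python sketch of that grouping, assuming the same mock data and using the hypothetical helper name `visits_per_user`, could look like this:

```python
# Mock sessions: (user, visitid, date as YYYYMMDD string, event categories)
sessions = [
    ("1", "1", "20170520", ["event1"]),
    ("1", "2", "20170521", [""]),
    ("1", "3", "20170522", ["event2"]),
    ("1", "4", "20170523", [""]),
    ("2", "1", "20170520", ["event1"]),
    ("2", "2", "20170521", ["event2"]),
    ("2", "3", "20170522", [""]),
    ("3", "1", "20170520", ["event1"]),
    ("3", "2", "20170521", [""]),
    ("3", "3", "20170522", [""]),
]


def visits_per_user(sessions, start_event="event1", end_event="event2"):
    """Return {user: [(visitid, date), ...]} limited to each user's window."""
    grouped = {}
    for user, visitid, date, categories in sessions:
        entry = grouped.setdefault(
            user, {"visits": [], "min_date": None, "max_date": None}
        )
        # Rough equivalent of ARRAY_AGG over (visitid, date)
        entry["visits"].append((visitid, date))
        # Rough equivalent of MIN(CASE WHEN EXISTS(... start_event) THEN date END)
        if start_event in categories and (
            entry["min_date"] is None or date < entry["min_date"]
        ):
            entry["min_date"] = date
        # Rough equivalent of MAX(CASE WHEN EXISTS(... end_event) THEN date END)
        if end_event in categories and (
            entry["max_date"] is None or date > entry["max_date"]
        ):
            entry["max_date"] = date

    result = {}
    for user, entry in grouped.items():
        low, high = entry["min_date"], entry["max_date"]
        if low is None or high is None:
            continue  # a missing bound means no window for this user
        kept = [(v, d) for v, d in entry["visits"] if low <= d <= high]
        if kept:  # mirrors the ARRAY_LENGTH(...) > 0 filter
            result[user] = kept
    return result


nested = visits_per_user(sessions)
print(nested)
```

Keeping the visits nested per user can be convenient when the next step works user by user, while the flat output of the analytic-function query is easier to join against other tables.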

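If the assumptions I made don't match your data, you can still adapt these techniques to query what you need. You can also use the mock data for testing (and adjust it to better fit your dataset).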


Thanks @Will, this helps! :)
