BigQuery: select distinct records
Consider the following table (short version):
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
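The above schema is used to store POS receipts for restaurants. The table sometimes contains several receipts for the same date with the same transaction_no at the same location_id.
In that case, I want only the last receipt for each location_id & transaction_no, ordered by created_at descending.
In MySQL, I use the query below, which gets me the last (max(created_at)) receipt for a location_id & transaction_no: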
SELECT id, amount, transaction_no, location_id, created_at
FROM receipts r JOIN
(SELECT transaction_no, max(created_at) AS maxca
FROM receipts r
GROUP BY transaction_no
) t
ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
group by location_id;
But when I run the same query in BigQuery, I get the following error:
Query Failed Error: Shuffle reached broadcast limit for table __I0 (broadcasted at least 150393576 bytes). Consider using partitioned joins instead of broadcast joins . Job ID: circular-gist-812:job_A_CfsSKJICuRs07j7LHVbkqcpSg
Any idea how to make the above query work in BigQuery?
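One possible direction, as a sketch only: rank the rows inside each (location_id, transaction_no) group by created_at descending and keep the first one, which avoids the self-join that triggers the broadcast limit. The example below runs that window-function query against an in-memory SQLite table so the logic can be checked locally. The sample rows, the TEXT type for transaction_no, and the helper name `latest_receipts` are assumptions made for illustration, and the SQL has not been run against BigQuery itself, so the syntax may need adjusting there.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
ROWS = [
    (1, 10.0, "T1", 1, "2015-01-01 10:00:00"),
    (2, 12.0, "T1", 1, "2015-01-01 10:05:00"),  # later duplicate of T1 at location 1
    (3, 7.5, "T1", 2, "2015-01-01 10:01:00"),   # same transaction_no, other location
    (4, 3.0, "T2", 1, "2015-01-01 11:00:00"),
]

# Rank receipts within each (location_id, transaction_no) group, newest first,
# and keep only the top-ranked row. No self-join is involved.
LATEST_SQL = """
SELECT id, amount, transaction_no, location_id, created_at
FROM (
  SELECT id, amount, transaction_no, location_id, created_at,
         ROW_NUMBER() OVER (
           PARTITION BY location_id, transaction_no
           ORDER BY created_at DESC
         ) AS rn
  FROM receipts
) AS ranked
WHERE rn = 1
ORDER BY id
"""


def latest_receipts(rows):
    """Load rows into an in-memory table and return the latest receipt per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE receipts ("
        "id INTEGER, amount REAL, transaction_no TEXT, "  # TEXT is an assumption
        "location_id INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO receipts VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(LATEST_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_receipts(ROWS):
        print(row)
```

Partitioning by both location_id and transaction_no matters here: the same transaction_no at two different locations is treated as two separate receipts, which matches the description above.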
This gives me flattened output. Is there a way to keep the nested structure? I have already unchecked the Flatten Results option in the [bigquery web ui](https://www.evernote.com/l/ACRfvDG5ERpJPLBZ9ooZpOOIb_WKFvhEs6wB/image.png). – CuriousMind