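Caching results from the dplyr database backend

I'm using dplyr's database backend against an AWS Redshift database. Some of the queries take forever to return, so I'd like to cache them. I know the underlying data won't change, so if the query hasn't changed, the result set won't change either.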
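The approach I've taken elsewhere to achieve this is:
- hash the query string
- save the result of the query to a file {hash}.rds
- on the next run of the script, if the hash hasn't changed, read the result from disk; otherwise re-run the query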
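
The steps above can be sketched roughly as follows. This is only a minimal illustration using base R: `cached_query`, `run_query` and `cache_dir` are hypothetical names that don't come from any package, and hashing via `tools::md5sum` on a temporary file is just one way to avoid an extra dependency.

```r
# Hypothetical helper: cache the result of run_query(sql) under a hash of sql.
# run_query is assumed to be any function that takes the SQL text and
# returns the result set (for example a wrapper around DBI::dbGetQuery).
cached_query <- function(sql, run_query, cache_dir = "cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

  # Hash the query text: write it to a temp file and take the file's MD5.
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(as.character(sql), tmp)
  hash <- unname(tools::md5sum(tmp))

  path <- file.path(cache_dir, paste0(hash, ".rds"))
  if (file.exists(path)) {
    # Same query text as an earlier run: read the result from disk.
    return(readRDS(path))
  }

  # New or changed query text: run it and store the result for next time.
  result <- run_query(sql)
  saveRDS(result, path)
  result
}
```

This only helps if the query text is byte-for-byte identical between runs, which is exactly what breaks down with dplyr below.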
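I've been trying the same approach with dplyr. Unfortunately, the SQL query string that dplyr generates changes even when the operations stay the same: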
df %>%
select(week, person_id) %>%
group_by(person_id) %>%
mutate(weeks_active = n()) %>%
arrange(weeks_active) %>%
dplyr::sql_render()
generates
<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "zznunjjdwe") "ltyyfmiahu"
ORDER BY "weeks_active"
on the first run, and
<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "stxupavckd") "oaknuxjexc"
ORDER BY "weeks_active"
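on the second run. Is there a way to keep the table aliases fixed? Would the rest of the query stay consistent across multiple runs? Or should I look into other caching approaches?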
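
One possible workaround, sketched here under assumptions rather than as a confirmed dplyr feature, is to normalize the random aliases before hashing. The hypothetical helper `normalize_aliases` assumes the aliases always look like the ones in the output above: ten lowercase letters in double quotes, directly after a closing parenthesis. The normalized string is meant only as a cache key, not as the query to send to the database.

```r
# Hypothetical helper: replace random subquery aliases such as
# ) "zznunjjdwe" with stable names ) "alias_1", ) "alias_2", ...
# Assumes sql is a single string and that aliases are exactly ten
# lowercase letters, as in the rendered queries shown above.
normalize_aliases <- function(sql) {
  sql <- as.character(sql)
  pattern <- '\\) "[a-z]{10}"'
  m <- gregexpr(pattern, sql)
  n <- sum(m[[1]] > 0)
  if (n == 0) {
    return(sql)  # nothing that looks like a generated alias
  }
  # Number the aliases in order of appearance so the text is reproducible.
  regmatches(sql, m) <- list(sprintf(') "alias_%d"', seq_len(n)))
  sql
}

run1 <- 'SELECT * FROM (SELECT "week" FROM "fct_person_week") "zznunjjdwe"'
run2 <- 'SELECT * FROM (SELECT "week" FROM "fct_person_week") "stxupavckd"'
identical(normalize_aliases(run1), normalize_aliases(run2))
```

A real table or alias that happens to match the same pattern would also be rewritten, so the pattern may need tightening for other schemas.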
Could you set some kind of seed for the hash key? – sconfluentus