2017-03-02

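Caching results from the dplyr database backend

I am using dplyr's database backend against an AWS Redshift database. Because some queries take forever to return, I would like to cache them. I know the underlying data will not change, so if the query does not change, the result set will not change either.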

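The approach I have used elsewhere to achieve this is:

  • Hash the query string
  • Save the query result to a file {hash}.rds
  • On the next run of the script, if the hash has not changed, read the result from disk; otherwise rerun the query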
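
For reference, the steps above could be sketched roughly as follows. This is a minimal illustration using base R only, not the exact code used elsewhere: `hash_string()`, `cached_query()`, the `run_query` argument and the `query_cache` directory are all hypothetical names introduced here.

```r
# Hash a string with base R only: tools::md5sum() works on files,
# so the string is written to a temporary file first.
hash_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(x, tmp)
  unname(tools::md5sum(tmp))
}

# `run_query` is a hypothetical function(sql) that executes the query and
# returns a data frame, e.g. a thin wrapper around DBI::dbGetQuery().
cached_query <- function(sql, run_query, cache_dir = "query_cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  cache_file <- file.path(cache_dir, paste0(hash_string(sql), ".rds"))

  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # same query text: reuse the saved result
  }

  result <- run_query(sql)       # new or changed query text: run it
  saveRDS(result, cache_file)
  result
}
```

The cache key is the query text alone, so this only helps when the same operations always produce the same SQL string.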

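I have been trying the same approach with dplyr. Unfortunately, the SQL query string that dplyr generates changes even when the operations stay the same: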

df %>% 
    select(week, person_id) %>% 
    group_by(person_id) %>% 
    mutate(weeks_active = n()) %>% 
    arrange(weeks_active) %>% 
    dplyr::sql_render() 

This generates

<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "zznunjjdwe") "ltyyfmiahu"
ORDER BY "weeks_active"

on the first run, and

<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "stxupavckd") "oaknuxjexc"
ORDER BY "weeks_active"

on the second run. Is there a way to keep the table aliases fixed? Will the rest of the query stay consistent across multiple runs? Or should I look into other caching approaches?
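
One possible workaround, sketched here purely as an illustration, is to replace the random aliases with a fixed placeholder before hashing. The helper `normalize_aliases()` is a hypothetical name, and the pattern assumes that every generated alias is a quoted run of exactly ten lowercase letters directly after a closing parenthesis, as in the two outputs shown above. That is an observation from this output, not documented dplyr behaviour.

```r
# Replace subquery aliases such as ) "zznunjjdwe" with a fixed placeholder.
# Assumption: aliases are exactly ten lowercase letters in double quotes,
# directly following a closing parenthesis.
normalize_aliases <- function(sql) {
  gsub('\\)\\s+"[a-z]{10}"', ') "alias"', as.character(sql), perl = TRUE)
}

run1 <- 'FROM "fct_person_week") "zznunjjdwe") "ltyyfmiahu"'
run2 <- 'FROM "fct_person_week") "stxupavckd") "oaknuxjexc"'

# Both runs reduce to the same string, so they would hash identically.
identical(normalize_aliases(run1), normalize_aliases(run2))
```

The normalized string is meant only as a cache key; the original SQL is what would still be sent to the database.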

Can you set some kind of seed for the hash key? – sconfluentus

Answer

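You may be able to use compute() to create a temporary table. Another option is to take the generated SQL and turn it into a view, so that R developers can simply refer to it by table name.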
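
A rough sketch of both options follows. It assumes the dplyr, dbplyr, DBI and RSQLite packages are installed, uses an in-memory SQLite database as a stand-in for Redshift, and invents the names `weeks_active_tmp` and `weeks_active_view`; the exact SQL accepted by Redshift may differ.

```r
library(dplyr)

# In-memory SQLite stands in for Redshift here (assumption for illustration).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  con, "fct_person_week",
  data.frame(week = c(1L, 2L, 1L), person_id = c(10L, 10L, 20L))
)

query <- tbl(con, "fct_person_week") %>%
  select(week, person_id) %>%
  group_by(person_id) %>%
  mutate(weeks_active = n()) %>%
  ungroup()

# Option 1: materialise the result as a temporary table under a fixed,
# hypothetical name; it exists only for the lifetime of this connection.
cached <- compute(query, name = "weeks_active_tmp", temporary = TRUE)

# Option 2: wrap the generated SQL in a view (again a hypothetical name).
DBI::dbExecute(
  con,
  paste("CREATE VIEW weeks_active_view AS", dbplyr::sql_render(query))
)

# Sorting is applied when reading back, not inside the stored query.
result_tmp  <- cached %>% arrange(weeks_active) %>% collect()
result_view <- tbl(con, "weeks_active_view") %>%
  arrange(weeks_active) %>%
  collect()

DBI::dbDisconnect(con)
```

Note that a plain view stores the query rather than its results, so it gives a stable name but is re-evaluated on each read, whereas the temporary table holds the computed rows until the connection closes.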