2017-11-10 329 views
0

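tl;dr: I want to generate a date table in Redshift to make a report easier to build, preferably without needing a large table that already exists in Redshift and without having to upload a CSV file. How do I create a date table in Redshift?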

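Long version: I am working on a report where I have to average the number of new items created per week. The date range can span months or longer, so it may contain 5 Mondays but only 4 Sundays, which makes the math a little tricky. I also have no guarantee that every day has at least one instance of an item, especially once users start slicing the data. All of this is tripping up the BI tool.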
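To make the uneven-weekday problem concrete, here is a minimal Python sketch (not part of the original question; the function name `weekday_counts` is hypothetical) that counts how often each weekday occurs in an inclusive date range. The range 2017-10-02 through 2017-10-30 is just an illustrative choice.

```python
from collections import Counter
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_counts(start: date, end: date) -> Counter:
    """Count how often each weekday occurs from start to end, inclusive."""
    total_days = (end - start).days + 1
    return Counter(
        DAY_NAMES[(start + timedelta(days=offset)).weekday()]
        for offset in range(total_days)
    )


if __name__ == "__main__":
    counts = weekday_counts(date(2017, 10, 2), date(2017, 10, 30))
    for name in DAY_NAMES:
        print(name, counts[name])
```

For this range, Monday occurs 5 times and every other weekday 4 times, so dividing a per-weekday total by a fixed number of weeks would give the wrong average. A date table lets the report count the actual days instead.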

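The best way to solve this is most likely a date table. However, most date table tutorials rely on SQL commands that Redshift either does not offer or does not fully support (I'm looking at you, generate_series).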

Is there a simple way to generate a date table in Redshift?

The code I tried to use (based on this suggestion, which does not work either: http://elliot.land/post/building-a-date-dimension-table-in-redshift):

CREATE TABLE facts.dates (
    "date_id"    INTEGER      NOT NULL PRIMARY KEY, 

    -- DATE 
    "full_date"   DATE      NOT NULL, 

    -- YEAR 
    "year_number"   SMALLINT     NOT NULL, 
    "year_week_number"  SMALLINT     NOT NULL, 
    "year_day_number"  SMALLINT     NOT NULL, 

    -- QUARTER 
    "qtr_number"   SMALLINT     NOT NULL, 

    -- MONTH 
    "month_number"   SMALLINT     NOT NULL, 
    "month_name"   CHAR(9)      NOT NULL, 
    "month_day_number"  SMALLINT     NOT NULL, 

    -- WEEK 
    "week_day_number"  SMALLINT     NOT NULL, 

    -- DAY 
    "day_name"    CHAR(9)      NOT NULL, 
    "day_is_weekday"  SMALLINT     NOT NULL, 
    "day_is_last_of_month" SMALLINT     NOT NULL 
) DISTSTYLE ALL SORTKEY (date_id) 
; 


INSERT INTO facts.dates 
(
    "date_id" 
    ,"full_date" 
    ,"year_number" 
    ,"year_week_number" 
    ,"year_day_number" 

    -- QUARTER 
    ,"qtr_number" 

    -- MONTH 
    ,"month_number" 
    ,"month_name" 
    ,"month_day_number" 

    -- WEEK 
    ,"week_day_number" 

    -- DAY 
    ,"day_name" 
    ,"day_is_weekday" 
    ,"day_is_last_of_month" 
) 
    SELECT 
    cast(seq + 1 AS INTEGER)          AS date_id, 

    -- DATE 
    datum               AS full_date, 

    -- YEAR 
    cast(extract(YEAR FROM datum) AS SMALLINT)     AS year_number, 
    cast(extract(WEEK FROM datum) AS SMALLINT)     AS year_week_number, 
    cast(extract(DOY FROM datum) AS SMALLINT)      AS year_day_number, 

    -- QUARTER 
    cast(to_char(datum, 'Q') AS SMALLINT)       AS qtr_number, 

    -- MONTH 
    cast(extract(MONTH FROM datum) AS SMALLINT)     AS month_number, 
    to_char(datum, 'Month')          AS month_name, 
    cast(extract(DAY FROM datum) AS SMALLINT)      AS month_day_number, 

    -- WEEK 
    cast(to_char(datum, 'D') AS SMALLINT)       AS week_day_number, 

    -- DAY 
    to_char(datum, 'Day')           AS day_name, 
    CASE WHEN to_char(datum, 'D') IN ('1', '7') 
     THEN 0 
    ELSE 1 END             AS day_is_weekday, 
    CASE WHEN 
     extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER + 
         INTERVAL '1' MONTH) :: DATE - 
         INTERVAL '1' DAY) = extract(DAY FROM datum) 
     THEN 1 
    ELSE 0 END             AS day_is_last_of_month 
    FROM 
    -- Generate days for 81 years starting from 2000. 
    (
     SELECT 
     '2000-01-01' :: DATE + generate_series AS datum, 
     generate_series      AS seq 
     FROM generate_series(0,81 * 365 + 20,1) 
    ) DQ 
    ORDER BY 1; 

This throws the following error:

[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.; 
1 statement failed. 

...because, I assume, INSERT and generate_series are not allowed in the same command in Redshift.

+0

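As you have discovered, `generate_series()` cannot be used with actual table data because it runs only on the leader node. Your approach of generating a numbers table and then joining to it works well. Alternatively, create the source file in Excel and import just the results. A date table like this is great for reporting. Other things you might want to add: a public holiday flag, a last-day-of-quarter flag, and a last-day-of-year flag (useful for reports that group by the last date of a period). –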

+0

I like those extra columns. Thanks, John! – Phillip

Answers

1

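As a workaround, you can run a Postgres instance on your local machine, run the code there, export the result to CSV, and then run only the CREATE TABLE part in Redshift and load the data from the CSV. Since this is a one-time operation, that is acceptable, and it is what I actually do for new Redshift deployments.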
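If a local Postgres instance is not available, the CSV could also be produced by a short script. The following is a rough Python sketch, not the answerer's method: the names `date_rows` and `write_csv` are hypothetical, the column order follows the `facts.dates` definition in the question, and the week number is the ISO 8601 week, which is assumed to match what `extract(WEEK ...)` returns. Unlike `to_char(datum, 'Month')`, the names here are not blank-padded.

```python
import csv
import sys
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
# Index 0 is Sunday, mirroring the 1 = Sunday ... 7 = Saturday numbering.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
COLUMNS = ["date_id", "full_date", "year_number", "year_week_number",
           "year_day_number", "qtr_number", "month_number", "month_name",
           "month_day_number", "week_day_number", "day_name",
           "day_is_weekday", "day_is_last_of_month"]


def date_rows(start: date, days: int):
    """Yield one tuple per day, in the column order of facts.dates."""
    for seq in range(days):
        d = start + timedelta(days=seq)
        # Convert Python's Monday = 0 numbering to 1 = Sunday ... 7 = Saturday.
        dow = (d.weekday() + 1) % 7 + 1
        # The day is the last of its month if tomorrow falls in another month.
        is_last = int((d + timedelta(days=1)).month != d.month)
        yield (
            seq + 1,                    # date_id
            d.isoformat(),              # full_date
            d.year,                     # year_number
            d.isocalendar()[1],         # year_week_number (ISO 8601 week)
            d.timetuple().tm_yday,      # year_day_number
            (d.month - 1) // 3 + 1,     # qtr_number
            d.month,                    # month_number
            MONTH_NAMES[d.month - 1],   # month_name
            d.day,                      # month_day_number
            dow,                        # week_day_number
            DAY_NAMES[dow - 1],         # day_name
            0 if dow in (1, 7) else 1,  # day_is_weekday
            is_last,                    # day_is_last_of_month
        )


def write_csv(out, start: date, days: int) -> None:
    """Write a header row plus one row per day to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(date_rows(start, days))


if __name__ == "__main__":
    # Preview the first three days on standard output.
    write_csv(sys.stdout, date(2000, 1, 1), 3)
```

To produce the full file, pass a file opened with `open("dates.csv", "w", newline="")` and a day count of `81 * 365 + 21`, which covers the same range as the query in the question.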

+0

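Great idea, but I figured out a way to do this without uploading a CSV. Unfortunately, it takes some copy-paste magic. I have posted my solution below, in case you have any improvements. – Phillip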

0

I figured it out while asking this question. Whoops.

I started with a "facts" schema.

CREATE SCHEMA facts; 

Run the following to create the numbers table:

create table facts.numbers 
(
    number int PRIMARY KEY 
) 
; 

Use this to generate your list of numbers. I used one million to get started:

SELECT ',(' || generate_series(0,1000000,1) || ')' 
; 

Then copy the numbers from the results and paste them into the query below, after VALUES:

INSERT INTO facts.numbers 
VALUES 
(0) 
,(1) 
,(2) 
,(3) 
,(4) 
,(5) 
,(6) 
,(7) 
,(8) 
,(9) 
-- etc 

^ Make sure to remove the leading comma from the copy-pasted list of numbers.
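As an alternative to copying and pasting the query results, the statement text could be generated by a small script. This is only a sketch: `build_numbers_insert` is a hypothetical helper, not something from the original answer. It emits the same layout as the statement above, with no leading comma on the first value.

```python
def build_numbers_insert(first: int, last: int,
                         table: str = "facts.numbers") -> str:
    """Return one INSERT statement covering first..last, inclusive."""
    # Joining with a newline plus comma puts the comma at the start of every
    # line except the first, so there is no leading comma to delete by hand.
    values = "\n,".join(f"({n})" for n in range(first, last + 1))
    return f"INSERT INTO {table}\nVALUES\n{values}\n;"


if __name__ == "__main__":
    print(build_numbers_insert(0, 9))
```

Redshift limits the size of a single SQL statement, so a very large range may need to be split into several smaller statements.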

Once you have a numbers table, you can generate a date table (again, code stolen from elliot land: http://elliot.land/post/building-a-date-dimension-table-in-redshift):

CREATE TABLE facts.dates (
    "date_id"    INTEGER      NOT NULL PRIMARY KEY, 

    -- DATE 
    "full_date"   DATE      NOT NULL, 

    -- YEAR 
    "year_number"   SMALLINT     NOT NULL, 
    "year_week_number"  SMALLINT     NOT NULL, 
    "year_day_number"  SMALLINT     NOT NULL, 

    -- QUARTER 
    "qtr_number"   SMALLINT     NOT NULL, 

    -- MONTH 
    "month_number"   SMALLINT     NOT NULL, 
    "month_name"   CHAR(9)      NOT NULL, 
    "month_day_number"  SMALLINT     NOT NULL, 

    -- WEEK 
    "week_day_number"  SMALLINT     NOT NULL, 

    -- DAY 
    "day_name"    CHAR(9)      NOT NULL, 
    "day_is_weekday"  SMALLINT     NOT NULL, 
    "day_is_last_of_month" SMALLINT     NOT NULL 
) DISTSTYLE ALL SORTKEY (date_id) 
; 


INSERT INTO facts.dates 
(
    "date_id" 
    ,"full_date" 
    ,"year_number" 
    ,"year_week_number" 
    ,"year_day_number" 

    -- QUARTER 
    ,"qtr_number" 

    -- MONTH 
    ,"month_number" 
    ,"month_name" 
    ,"month_day_number" 

    -- WEEK 
    ,"week_day_number" 

    -- DAY 
    ,"day_name" 
    ,"day_is_weekday" 
    ,"day_is_last_of_month" 
) 
    SELECT 
    cast(seq + 1 AS INTEGER)          AS date_id, 

    -- DATE 
    datum               AS full_date, 

    -- YEAR 
    cast(extract(YEAR FROM datum) AS SMALLINT)     AS year_number, 
    cast(extract(WEEK FROM datum) AS SMALLINT)     AS year_week_number, 
    cast(extract(DOY FROM datum) AS SMALLINT)      AS year_day_number, 

    -- QUARTER 
    cast(to_char(datum, 'Q') AS SMALLINT)       AS qtr_number, 

    -- MONTH 
    cast(extract(MONTH FROM datum) AS SMALLINT)     AS month_number, 
    to_char(datum, 'Month')          AS month_name, 
    cast(extract(DAY FROM datum) AS SMALLINT)      AS month_day_number, 

    -- WEEK 
    cast(to_char(datum, 'D') AS SMALLINT)       AS week_day_number, 

    -- DAY 
    to_char(datum, 'Day')           AS day_name, 
    CASE WHEN to_char(datum, 'D') IN ('1', '7') 
     THEN 0 
    ELSE 1 END             AS day_is_weekday, 
    CASE WHEN 
     extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER + 
         INTERVAL '1' MONTH) :: DATE - 
         INTERVAL '1' DAY) = extract(DAY FROM datum) 
     THEN 1 
    ELSE 0 END             AS day_is_last_of_month 
    FROM 
    -- Generate days for 81 years starting from 2000. 
    (
     SELECT 
     '2000-01-01' :: DATE + number AS datum, 
     number      AS seq 
     FROM facts.numbers 
     WHERE number between 0 and 81 * 365 + 20 
    ) DQ 
    ORDER BY 1; 

^ Be sure to set the number at the end to cover the date range you need.
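The upper bound in the WHERE clause is simply the number of days between the start date and the last date you want. A quick way to work it out, sketched in Python (the helper name `last_offset` is hypothetical):

```python
from datetime import date


def last_offset(start: date, end: date) -> int:
    """Largest value of `number` needed so that start + number reaches end."""
    return (end - start).days


if __name__ == "__main__":
    # Offset of 2080-12-31 from the 2000-01-01 start date used in the query.
    print(last_offset(date(2000, 1, 1), date(2080, 12, 31)))
```

For 2000-01-01 through 2080-12-31 this gives 29585, which equals `81 * 365 + 20`, so the query as written covers exactly those 81 calendar years.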
