2016-11-19

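PostgreSQL query: get the latest forecast before a deadline and compare it with actuals

I want to view historical actual and forecast wind, broken down by day and hour.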

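I have multiple forecasts for any given hour of a day. My trading deadline is 10 AM US Eastern time, so I want the latest forecast issued before that deadline, shown next to the actual wind for the same hour.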

The complication is that the timestamps are in GMT, which is 5 hours ahead of US Eastern time.
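As a side note, the fixed 5-hour offset only holds while Eastern time is on standard time; during daylight saving time the gap is 4 hours. A minimal sketch of a conversion that follows the calendar, assuming the stored values are UTC and that the zone name `America/New_York` (my assumption, not something in the tables) is the right one for the trading deadline:

```sql
-- Convert naive UTC timestamps to Eastern wall-clock time.
-- The first AT TIME ZONE tags the value as UTC, the second renders it in Eastern.
SELECT
    ts_utc,
    (ts_utc AT TIME ZONE 'UTC') AT TIME ZONE 'America/New_York' AS ts_eastern
FROM (VALUES
    (TIMESTAMP '2016-02-01 08:00'),  -- winter: becomes 2016-02-01 03:00 (UTC-5)
    (TIMESTAMP '2016-07-01 08:00')   -- summer: becomes 2016-07-01 04:00 (UTC-4)
) AS t (ts_utc);
```

The query below keeps the fixed `interval '5 hours'`, which gives the same result for the February sample data.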

WITH 
    forecast_prep AS (
     SELECT 
      date_trunc('day', (foretime - interval '5 hours')) :: DATE AS Foredate, 
      extract(HOUR FROM (foretime - interval '5 hours')) + 1  AS foreHE, 
      lat, 
      lon, 
      max(windspeed) as forecast, 
      max(as_of) - interval '5 hours'  AS as_of 
     FROM weather.forecast 
     WHERE date_trunc('day', foretime) :: DATE - as_of >= INTERVAL '9 hours' 
     GROUP BY Foredate, foreHE, lat, lon 
), 
    tmp AS (
    SELECT 
     meso.station, 
     meso.lat, 
     meso.lon, 
     (meso.timestmp - interval '5 hours') as timestmp, 
     date_trunc('day', (meso.timestmp - interval '5 hours')) :: DATE AS Date, 
     extract(HOUR FROM (meso.timestmp - interval '5 hours')) + 1  AS HE, 
     CAST(AVG(meso.windspd) AS NUMERIC(19, 2)) AS Actual 
    FROM weather.meso 
    GROUP BY station, lat, lon, timestmp, Date, HE 
) 
SELECT 
    tmp.station, tmp.Date, tmp.HE, tmp.Actual, forecast_prep.forecast, forecast_prep.as_of 
FROM tmp 
INNER JOIN forecast_prep ON (
    tmp.lat = forecast_prep.lat 
    AND tmp.lon = forecast_prep.lon 
    AND tmp.Date = forecast_prep.Foredate 
    AND tmp.HE = forecast_prep.foreHE 
) 
WHERE 
    (tmp.timestmp BETWEEN '2016-02-01' AND '2016-02-02') 
    AND (tmp.station = 'KSBN') 
GROUP BY 
    tmp.station, tmp.Date, tmp.HE, forecast_prep.forecast, forecast_prep.as_of, tmp.Actual 
ORDER BY tmp.Date, tmp.HE ASC; 

Below is the full table structure with relevant sample data.

CREATE SCHEMA weather 
CREATE TABLE weather.forecast 
    (
    foretime timestamp without time zone NOT NULL, 
    as_of timestamp without time zone NOT NULL, -- in UTC 
    summary text, 
    precipintensity numeric(8,4), 
    precipprob numeric(2,2), 
    temperature numeric(5,2), 
    apptemp numeric(5,2), 
    dewpoint numeric(5,2), 
    humidity numeric(2,2), 
    windspeed numeric(5,2), 
    windbearing numeric(4,1), 
    visibility numeric(5,2), 
    cloudcover numeric(4,2), 
    pressure numeric(6,2), 
    ozone numeric(5,2), 
    preciptype text, 
    lat numeric(8,6) NOT NULL, 
    lon numeric(9,6) NOT NULL, 
    CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon) 
); 

INSERT INTO weather.forecast 
    (windspeed, foretime, as_of, lat, lon) 
VALUES 
    (11.19, '2/1/2016 8:00', '1/30/2016 23:00', 34.556, 28.345), 
    (10.98, '2/1/2016 8:00', '1/31/2016 5:00', 34.556, 28.345), 
    (10.64, '2/1/2016 8:00', '1/31/2016 11:00', 34.556, 28.345), 
    (10.95, '2/1/2016 8:00', '1/31/2016 17:00', 34.556, 28.345), 
    (10.39, '2/1/2016 8:00', '1/31/2016 23:00', 34.556, 28.345), 
    (9.22, '2/1/2016 8:00', '2/1/2016 5:00', 34.556, 28.345), 
    (10, '2/1/2016 9:00', '1/30/2016 11:00', 34.556, 28.345), 
    (9.88, '2/1/2016 9:00', '1/30/2016 17:00', 34.556, 28.345), 
    (10.79, '2/1/2016 9:00', '1/30/2016 23:00', 34.556, 28.345), 
    (10.8, '2/1/2016 9:00', '1/31/2016 5:00', 34.556, 28.345), 
    (10.35, '2/1/2016 9:00', '1/31/2016 11:00', 34.556, 28.345), 
    (10.07, '2/1/2016 9:00', '1/31/2016 17:00', 34.556, 28.345), 
    (9.57, '2/1/2016 9:00', '1/31/2016 23:00', 34.556, 28.345), 
    (7.93, '2/1/2016 9:00', '2/1/2016 5:00', 34.556, 28.345) 
; 

CREATE TABLE weather.meso 
(
    timestmp timestamp without time zone NOT NULL, 
    station text NOT NULL, 
    lat numeric NOT NULL, 
    lon numeric NOT NULL, 
    tmp numeric, 
    hum numeric, 
    windspd numeric, 
    winddir integer, 
    dew numeric, 
    CONSTRAINT meso_pkey PRIMARY KEY (timestmp, station, lat, lon) 
); 
INSERT INTO weather.meso 
    (station, timestmp, lat, lon, windspd) 
VALUES 
    ('KSBN', '2/1/2016 8:02', 34.556, 28.345, 16.1), 
    ('KSBN', '2/1/2016 8:12', 34.556, 28.345, 12.6), 
    ('KSBN', '2/1/2016 8:54', 34.556, 28.345, 11.5), 
    ('KSBN', '2/1/2016 9:02', 34.556, 28.345, 18.1), 
    ('KSBN', '2/1/2016 9:17', 34.556, 28.345, 12.2), 
    ('KSBN', '2/1/2016 9:48', 34.556, 28.345, 11.5) 
; 

This is the output format I am hoping for:

station  date      he  actual  forecast  as_of
KSBN     2/1/2016  4   10.4    15.1      1/31/2016 6:00
KSBN     2/1/2016  5   12.7    11.32     1/31/2016 6:00

+0

Providing some source data, in a reusable format, together with the expected result is the fastest way to get a workable solution. –

+0

@Used_By_Already Sorry for the newbie question, but what is the best way to provide some source data? – otterdog2000

+1

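Best is the DDL for each table plus a set of INSERTs. Something like a simple text table (rendered like your SQL code) is also fine, as is an attached text file or spreadsheet. (I don't like spreadsheets because they create a mess to clean up.) Keep in mind that we don't want to deal with large tables; only a sample is needed. Add it to your question so everyone can find it. See https://stackoverflow.com/help/mcve –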

Answers

0

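The DDL and sample data do help with understanding. What I can offer is a slightly more detailed example of how to use ROW_NUMBER, which is also available online here: http://rextester.com/FIEUPI83002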

select 
    row_number() OVER(PARTITION BY date_trunc('day', (foretime - interval '5 hours')) :: DATE 
        ORDER BY case when extract(HOUR FROM (foretime - interval '5 hours')) < 10 then 1 else 2 end, AS_OF desc) AS rn 
, extract(HOUR FROM (foretime - interval '5 hours')) HR 
, foretime 
, as_of 
from forecast 
order by RN, as_of DESC 

The result of this, from the existing sample data, is as follows:

+----+----+-----------+---------------------+---------------------+ 
| | rn | date_part |  foretime  |  as_of  | 
+----+----+-----------+---------------------+---------------------+ 
| 1 | 1 |   4 | 01.02.2016 09:00:00 | 01.02.2016 05:00:00 | 
| 2 | 2 |   3 | 01.02.2016 08:00:00 | 01.02.2016 05:00:00 | 
| 3 | 3 |   4 | 01.02.2016 09:00:00 | 31.01.2016 23:00:00 | 
| 4 | 4 |   3 | 01.02.2016 08:00:00 | 31.01.2016 23:00:00 | 
| 5 | 5 |   4 | 01.02.2016 09:00:00 | 31.01.2016 17:00:00 | 
| 6 | 6 |   3 | 01.02.2016 08:00:00 | 31.01.2016 17:00:00 | 
| 7 | 7 |   4 | 01.02.2016 09:00:00 | 31.01.2016 11:00:00 | 
| 8 | 8 |   3 | 01.02.2016 08:00:00 | 31.01.2016 11:00:00 | 
| 9 | 9 |   3 | 01.02.2016 08:00:00 | 31.01.2016 05:00:00 | 
| 10 | 10 |   4 | 01.02.2016 09:00:00 | 31.01.2016 05:00:00 | 
| 11 | 11 |   3 | 01.02.2016 08:00:00 | 30.01.2016 23:00:00 | 
| 12 | 12 |   4 | 01.02.2016 09:00:00 | 30.01.2016 23:00:00 | 
| 13 | 13 |   4 | 01.02.2016 09:00:00 | 30.01.2016 17:00:00 | 
| 14 | 14 |   4 | 01.02.2016 09:00:00 | 30.01.2016 11:00:00 | 
+----+----+-----------+---------------------+---------------------+ 

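So if you filter with WHERE RN = 1, you should get the "most recent" row for each day, taken from the rows whose hour falls before 10. I believe something like this will suit your requirements. Note that the sequence comes from the ORDER BY of row_number() (inside the OVER() clause), which combines a case expression with other columns; adjust that combination of columns to suit your needs.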
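To make that adjustment concrete, here is a minimal sketch against the question's `weather.forecast` table. It rests on an assumption of mine, inferred from the question's WHERE clause and expected `as_of` column: the deadline is 10:00 Eastern on the day before the forecast date, which is 15:00 UTC while the 5-hour offset applies. Partitioning by location and `foretime` yields one "latest" row per forecast hour instead of one per day.

```sql
SELECT foretime, as_of, windspeed
FROM (
    SELECT f.*,
           ROW_NUMBER() OVER (
               PARTITION BY lat, lon, foretime   -- one sequence per forecast hour
               ORDER BY as_of DESC               -- newest forecast first
           ) AS rn
    FROM weather.forecast AS f
    -- Assumed cutoff: Eastern forecast date, minus one day, at 10:00 Eastern,
    -- expressed in UTC (10:00 + 5 hours = 15:00) to match the UTC as_of column.
    WHERE as_of <= date_trunc('day', foretime - INTERVAL '5 hours')
                   - INTERVAL '1 day'
                   + INTERVAL '15 hours'
) AS ranked
WHERE rn = 1
ORDER BY foretime;
```

With the sample rows from the question, both forecast hours keep the forecast issued at 2016-01-31 11:00 UTC (6:00 Eastern), with windspeed 10.64 for the 08:00 hour and 10.35 for the 09:00 hour. If the deadline is meant to fall on the same day rather than the day before, drop the `- INTERVAL '1 day'` term.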


My original comment, written before the sample data was available, follows: I suggest using ROW_NUMBER() OVER(ORDER BY date_time_column DESC)

select
    *
from (
    select *
    , ROW_NUMBER() OVER(ORDER BY timestmp DESC) AS RN
    from forecast_table
    -- where timestmp < 10 am (include required logic here)
) AS ranked -- derived tables need an alias in PostgreSQL before version 16
WHERE RN = 1
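In the computed column RN, the row with the value 1 is the most recent row, because of the DESCending order. This can be combined with PARTITION BY, so the row_number approach is useful for finding the "most recent" or "earliest" row, or the max/min row, either per partition or overall.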
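A minimal, self-contained sketch of the PARTITION BY variant, using four rows copied from the question's sample data in an inline VALUES list (the CTE name `sample` and the aliases `rn_latest` and `rn_earliest` are just illustrative):

```sql
WITH sample (foretime, as_of, windspeed) AS (
    VALUES
        (TIMESTAMP '2016-02-01 08:00', TIMESTAMP '2016-01-31 23:00', 10.39),
        (TIMESTAMP '2016-02-01 08:00', TIMESTAMP '2016-02-01 05:00', 9.22),
        (TIMESTAMP '2016-02-01 09:00', TIMESTAMP '2016-01-31 23:00', 9.57),
        (TIMESTAMP '2016-02-01 09:00', TIMESTAMP '2016-02-01 05:00', 7.93)
)
SELECT foretime, as_of, windspeed
FROM (
    SELECT s.*,
           -- numbering restarts for every foretime; 1 = newest as_of
           ROW_NUMBER() OVER (PARTITION BY foretime ORDER BY as_of DESC) AS rn_latest,
           -- same partitions, opposite direction; 1 = oldest as_of
           ROW_NUMBER() OVER (PARTITION BY foretime ORDER BY as_of ASC)  AS rn_earliest
    FROM sample AS s
) AS ranked
WHERE rn_latest = 1   -- use rn_earliest = 1 for the earliest row instead
ORDER BY foretime;
```

This returns one row per `foretime`: the 2016-02-01 05:00 forecast in both cases, with windspeed 9.22 and 7.93.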

+0

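Hi. I actually need the result ordered by the date and hour columns, ascending. The tricky part for me is also the correct logic for the deadline I am talking about. I tried to solve it with the interval code on line 11, subtracting a day and going back by a 16-hour interval. – otterdog2000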

+0

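That is not the point of my comment... To find the "most recent row", use the technique shown; what you do after finding that row is up to you. If you really want a fully worked answer, provide: A **sample data** and B **expected results** (no more words or images) –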

+1

Sorry, I just realized the order by should be by the extracted date, then RN; I can't fix that any further right now –
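A sketch of what that reordering might look like, applied to the first query in the answer. The table name `weather.forecast` comes from the question's DDL; the `fore_date` alias is mine. Only the final ORDER BY and the extra output column differ from the query as posted:

```sql
SELECT
    (foretime - INTERVAL '5 hours')::date AS fore_date,
    ROW_NUMBER() OVER (
        PARTITION BY (foretime - INTERVAL '5 hours')::date
        ORDER BY CASE
                     WHEN extract(HOUR FROM (foretime - INTERVAL '5 hours')) < 10 THEN 1
                     ELSE 2
                 END,
                 as_of DESC
    ) AS rn,
    extract(HOUR FROM (foretime - INTERVAL '5 hours')) AS hr,
    foretime,
    as_of
FROM weather.forecast
ORDER BY fore_date, rn;   -- extracted date first, then the row number
```

With a single day of sample data the row order matches the result table shown in the answer; the difference only appears once several dates are present.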