2

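Postgres: getting the maximum and minimum values, and the timestamps at which they occur

I'm running Postgres 9.2 and have a table of temperatures and timestamps, with one timestamp per minute in millisecond epoch time: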

weather=# \d weather_data 
     Table "public.weather_data" 
    Column |  Type  | Modifiers 
-------------+--------------+----------- 
timestamp | bigint  | not null 
sensor_id | integer  | not null 
temperature | numeric(4,1) | 
humidity | integer  | 
date  | date   | not null 
Indexes: 
    "weather_data_pkey" PRIMARY KEY, btree ("timestamp", sensor_id) 
    "weather_data_date_idx" btree (date) 
    "weather_data_humidity_idx" btree (humidity) 
    "weather_data_sensor_id_idx" btree (sensor_id) 
    "weather_data_temperature_idx" btree (temperature) 
    "weather_data_time_idx" btree ("timestamp") 
Foreign-key constraints: 
    "weather_data_sensor_id_fkey" FOREIGN KEY (sensor_id) REFERENCES weather_sensors(sensor_id) 

weather=# select * from weather_data order by timestamp desc; 
    timestamp | sensor_id | temperature | humidity | date  
---------------+-----------+-------------+----------+------------ 
1483272420000 |   2 |  22.3 |  57 | 2017-01-01 
1483272420000 |   1 |  24.9 |  53 | 2017-01-01 
1483272360000 |   2 |  22.3 |  57 | 2017-01-01 
1483272360000 |   1 |  24.9 |  58 | 2017-01-01 
1483272300000 |   2 |  22.4 |  57 | 2017-01-01 
1483272300000 |   1 |  24.9 |  57 | 2017-01-01 
[...] 

I have this existing query that gets the high and low for each day, but not the specific time at which the high or low occurred:

WITH t AS (
    SELECT date, highest, lowest 
    FROM (
     SELECT date, max(temperature) AS highest 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
     GROUP BY date 
     ORDER BY date ASC 
    ) h 
    INNER JOIN (
     SELECT date, min(temperature) AS lowest 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
     GROUP BY date 
     ORDER BY date ASC 
    ) l 
    USING (date) 
    ORDER BY date DESC 
) 
SELECT * from t ORDER BY date ASC; 

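With a bit over two million rows in the database, it takes ~1.2 seconds to run, which isn't too bad. I now want to get the specific time at which the high or low occurred, and I came up with this using window functions. It does work, but takes ~5.6 seconds: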

SELECT h.date, high_time, high_temp, low_time, low_temp FROM (
    SELECT date, high_temp, high_time FROM (
     SELECT date, temperature AS high_temp, timestamp AS high_time, row_number() 
     OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC) 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
    ) highs 
    WHERE row_number = 1 
) h 
INNER JOIN (
    SELECT * FROM (
     SELECT date, temperature AS low_temp, timestamp AS low_time, row_number() 
     OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC) 
     FROM weather_data 
     WHERE sensor_id = (SELECT sensor_id FROM weather_sensors WHERE sensor_name = 'outdoor') 
    ) lows 
    WHERE row_number = 1 
) l 
ON h.date = l.date 
ORDER BY h.date ASC; 

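Is there some relatively simple addition I can make to the first query that won't add a massive amount of execution time? I assume there is, but I think I've reached the point of having stared at this problem for too long!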

+1

Possible duplicate of [PostgreSQL - fetch the row which has the Max value for a column](http://stackoverflow.com/questions/586781/postgresql-fetch-the-row-which-has-the-max-value-for-a-column) – Joe

+1

Unrelated, but: the "order by" in the derived tables of the first query is useless –

+0

@a_horse_with_no_name Noted, thanks! – VirtualWolf

Answers

2
SELECT 
     DISTINCT ON (zdate) zdate 
     , first_value(ztimestamp) OVER www AS stamp_at_min 
     , first_value(temperature) OVER www AS tmin 
     , last_value(ztimestamp) OVER www AS stamp_at_max 
     , last_value(temperature) OVER www AS tmax 
FROM weather_data 
WHERE sensor_id = 2 
WINDOW www AS (PARTITION BY zdate ORDER BY temperature, ztimestamp 
       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
       ) 
     ; 

  • I prefixed date and timestamp with a z
  • I added ztimestamp to the ordering as a tie-breaker
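
To see what the query above computes, here is a minimal sketch that runs the same window definition on a handful of made-up rows. It assumes SQLite (3.28 or later, via Python's `sqlite3`) as a stand-in for Postgres so the example is self-contained. The `readings` table and its sample values are hypothetical, and plain `SELECT DISTINCT` replaces `DISTINCT ON`, which SQLite does not support. The explicit frame matters: with the default frame, which ends at the current row, `last_value` would return the current row's value rather than the day's maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for weather_data, using the z-prefixed column names.
conn.execute("CREATE TABLE readings (zdate TEXT, ztimestamp INTEGER, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2017-01-01", 1000, 20.0),
        ("2017-01-01", 2000, 25.0),
        ("2017-01-01", 3000, 18.5),
        ("2017-01-02", 4000, 10.0),
        ("2017-01-02", 5000, 12.5),
    ],
)

# Each partition is sorted by temperature, so the first row holds the day's
# minimum and the last row holds the maximum. The frame spans the whole
# partition, so every row of a day yields the same values and DISTINCT
# collapses them to one row per day.
query = """
SELECT DISTINCT
       zdate,
       first_value(ztimestamp)  OVER www AS stamp_at_min,
       first_value(temperature) OVER www AS tmin,
       last_value(ztimestamp)   OVER www AS stamp_at_max,
       last_value(temperature)  OVER www AS tmax
FROM readings
WINDOW www AS (PARTITION BY zdate ORDER BY temperature, ztimestamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY zdate
"""
whole_frame = conn.execute(query).fetchall()
for row in whole_frame:
    print(row)
```

Each printed row is (date, time of minimum, minimum, time of maximum, maximum) for one day.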
+0

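Works great, thanks! Are there any additional index-related tricks that could speed it up (it takes about 3.7 seconds to run), or is this the kind of thing where there isn't much left to optimize? – VirtualWolf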

+0

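Your table basically has two candidate keys: your PK and possibly {zdate, sensor_id, temperature, ...}, which is not quite unique. Anyway, I think you should get rid of the single-column indexes. zdate *could* be functionally dependent on ztimestamp (which *could* be a timestamp instead of an int) – wildplasser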

+0

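Get _rid_ of the single-column indexes? Interesting. I have some other (simpler) unrelated queries that I run against this table, and I'm guessing they would end up slow without the indexes, no? – VirtualWolf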

2

This does the same as your second query, but needs only a single scan over the weather_data table:

select date, 
     max(case when high_rn = 1 then timestamp end) as high_time, 
     max(case when high_rn = 1 then temperature end) as high_temp, 
     max(case when low_rn = 1 then timestamp end) as low_time, 
     max(case when low_rn = 1 then temperature end) as low_temp 
from (
    select timestamp, temperature, date, 
     row_number() OVER (PARTITION BY date ORDER BY temperature DESC, timestamp DESC) as high_rn, 
     row_number() OVER (PARTITION BY date ORDER BY temperature ASC, timestamp DESC) as low_rn 
    from weather_data 
    where sensor_id = ... 
) t 
where (high_rn = 1 or low_rn = 1) 
group by date; 

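It uses conditional aggregation to do a crosstab (a.k.a. "pivot") of a query result that contains only the rows with the lowest and highest temperatures.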
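
The pivot step can be tried out on a tiny data set. The following is only a sketch under stated assumptions: it uses SQLite (3.25 or later, through Python's `sqlite3`) instead of Postgres so that it runs on its own, and the `readings` table, its column names `ts` and `day`, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for weather_data with non-keyword column names.
conn.execute("CREATE TABLE readings (ts INTEGER, temperature REAL, day TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1000, 20.0, "2017-01-01"),
        (2000, 25.0, "2017-01-01"),
        (3000, 18.5, "2017-01-01"),
        (4000, 10.0, "2017-01-02"),
        (5000, 12.5, "2017-01-02"),
    ],
)

# The inner query ranks every reading twice per day (hottest first and
# coldest first). The outer query keeps only the rank-1 rows and folds the
# two surviving rows per day into one row with CASE inside an aggregate.
query = """
SELECT day,
       max(CASE WHEN high_rn = 1 THEN ts END)          AS high_time,
       max(CASE WHEN high_rn = 1 THEN temperature END) AS high_temp,
       max(CASE WHEN low_rn = 1 THEN ts END)           AS low_time,
       max(CASE WHEN low_rn = 1 THEN temperature END)  AS low_temp
FROM (
    SELECT ts, temperature, day,
           row_number() OVER (PARTITION BY day ORDER BY temperature DESC, ts DESC) AS high_rn,
           row_number() OVER (PARTITION BY day ORDER BY temperature ASC,  ts DESC) AS low_rn
    FROM readings
) t
WHERE high_rn = 1 OR low_rn = 1
GROUP BY day
ORDER BY day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because each `CASE` expression is non-NULL for exactly one row per day, the choice of `max` as the aggregate is arbitrary here; `min` would return the same values.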


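Unrelated, but: `date` and `timestamp` are horrible names for columns. First because they are keywords, but more importantly because they don't document what the column actually contains. Is it a "due date"? A "reading date"? A "processed date"?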

+0

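Thanks! This one takes 5.2 seconds to run, versus 3.7 seconds for the one above. The columns hold the date and time at which a particular temperature reading was taken, so I suppose "reading date" and "reading time". This is a personal project that only I work on (it just keeps track of the current temperature inside and outside my house). :) – VirtualWolf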

+0

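Oh, I just remembered that I need to add a `temperature != 21.8`, because the temperature sensor occasionally goes weird and sends a value of 21.8 to my app. After adding a subquery for the window functions to run over in @wildplasser's query, and adding a simple `where temperature != 21.8` to yours, they now both run within about 100 ms of each other! – VirtualWolf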
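
One way the filtering described in that last comment might look is sketched below; it is an illustration, not the asker's actual query. It again assumes SQLite (3.28 or later, via Python's `sqlite3`) in place of Postgres, a hypothetical `readings` table with made-up rows, and `SELECT DISTINCT` in place of `DISTINCT ON`. The glitch value is removed in a subquery, so the window functions only ever see clean readings.

```python
import sqlite3

BOGUS = 21.8  # value the sensor occasionally reports by mistake, per the comment above

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for weather_data.
conn.execute("CREATE TABLE readings (zdate TEXT, ztimestamp INTEGER, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2017-01-01", 1000, 15.0),
        ("2017-01-01", 2000, 21.8),  # glitch reading; would otherwise be the day's high
        ("2017-01-01", 3000, 17.5),
        ("2017-01-01", 4000, 12.0),
    ],
)

# The subquery drops the glitch rows before the window is evaluated.
query = """
SELECT DISTINCT
       zdate,
       first_value(ztimestamp)  OVER w AS stamp_at_min,
       first_value(temperature) OVER w AS tmin,
       last_value(ztimestamp)   OVER w AS stamp_at_max,
       last_value(temperature)  OVER w AS tmax
FROM (
    SELECT zdate, ztimestamp, temperature
    FROM readings
    WHERE temperature != ?
) clean
WINDOW w AS (PARTITION BY zdate ORDER BY temperature, ztimestamp
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY zdate
"""
filtered = conn.execute(query, (BOGUS,)).fetchall()
for row in filtered:
    print(row)
```

Since `WHERE` is applied before window functions are evaluated, placing the same condition directly in the outer query's `WHERE` clause should filter the rows just as well; the subquery form is shown only because the comment mentions one.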
