Select the latest record by timestamptz and groupid

I'm trying to return the latest records for a store based on the TIMESTAMPTZ set at import time. I'm on Postgres 9.5, and this is the query I pieced together from a few Stack Overflow threads:

select p.* 
from store_products p 
inner join(
    select storeid, sku, max(lastupdated) AS lastupdated 
    from store_products 
    group by storeid, sku 
) sp on p.storeid = sp.storeid and p.lastupdated = sp.lastupdated

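This gives me the latest products per store (and SKU), which is great (we have about 30 stores), but I noticed that the query takes about 4 minutes to gather the data (about 6M records).

So, if we have this as my data: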

PID | StoreID | SKU | lastupdated 
1 | 1  | 1a1 | 2017-02-02 18:22:30 
2 | 1  | 1b1 | 2017-02-02 18:21:30 
3 | 1  | 1a1 | 2017-01-16 11:22:30 
4 | 2  | 1a1 | 2017-02-02 18:21:30 
5 | 2  | 1a1 | 2017-02-01 18:21:00 
6 | 3  | 1a1 | 2017-02-02 18:21:30 
7 | 3  | 1g1 | 2017-02-01 18:21:30

I get this:

PID | StoreID | SKU | lastupdated 
1 | 1  | 1a1 | 2017-02-02 18:22:30 
2 | 1  | 1b1 | 2017-02-02 18:21:30 
4 | 2  | 1a1 | 2017-02-02 18:21:30 
6 | 3  | 1a1 | 2017-02-02 18:21:30

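Is there a better way for us to import these store snapshots so that the query above is easier for Postgres to digest, that is, faster? Which indexes should we add? Here is the explain output: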

Hash Join (cost=2358424.92..2715814.08 rows=311 width=371) 
    Hash Cond: ((lp.storeid = p.storeid) AND (lp.lastupdated = p.lastupdated)) 
    -> Subquery Scan on lp (cost=1676046.30..1737513.85 rows=62125 width=12) 
     -> GroupAggregate (cost=1676046.30..1736892.60 rows=62125 width=108) 
       Group Key: store_products.storeid, store_products.sku 
       -> Sort (cost=1676046.30..1691102.56 rows=6022505 width=108) 
        Sort Key: store_products.storeid, store_products.sku 
        -> Seq Scan on store_products (cost=0.00..297973.05 rows=6022505 width=108) 
    -> Hash (cost=297973.05..297973.05 rows=6022505 width=371) 
     -> Seq Scan on store_products p (cost=0.00..297973.05 rows=6022505 width=371)

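Our Postgres DBA is on vacation, and most of us don't know what to do here.

Backstory...

We get a daily dump of in-store products from multiple stores as JSON. Each store is identified by its storeid, and all stores and their products are imported together as one clunky JSON file. Each entry has its own lastupdated (TIMESTAMPTZ) field. That field is also updated automatically by a trigger if someone makes a change later (for auditing purposes). Every day about 2-3K worth of store_products get inserted into this table, and we currently don't de-duplicate the data (so a price may or may not have changed; we don't seem to care yet, we just insert). I guess we will start culling soon.

Let me give you a basic schema: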

CREATE TABLE store_products 
(
    id BIGINT DEFAULT PRIMARY KEY NOT NULL, 
    storeid INTEGER, 
    ...etc etc... 
    lastupdated TIMESTAMP WITH TIME ZONE DEFAULT now() 
);

There is an FK from storeid to the stores table, etc.

Source

2017-02-10 Lisa Anna

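"*This gives me the latest product per store, ... but I really need all the latest products per store*" - I don't understand this sentence. There can only be one "latest" product per store. Please **[edit]** your question and add some sample data and the expected output based on that data. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557). Did you check the many answers in [tag:greatest-n-per-group]?

Will do, thanks @a_horse_with_no_name. Yes, that sentence was a bit off... I'll fix it with some samples. I came up with the query above after looking through greatest-n-per-group. I think my situation is a bit different because of the timestamp.

DISTINCT ON makes this simpler: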

select distinct on (storeid, sku) * 
from store_products 
order by storeid, sku, lastupdated desc

Note that the ORDER BY clause is mandatory: it determines which row is returned for each (storeid, sku) group.
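To make the behaviour concrete, here is a small self-contained sketch that loads the sample rows from the question into a temporary table and runs the same DISTINCT ON query against it. The table name store_products_demo and its reduced column list are assumptions made for illustration; they are not part of the real schema.

```sql
-- Hypothetical demo table holding only the columns shown in the question's sample data.
CREATE TEMP TABLE store_products_demo (
    pid         integer PRIMARY KEY,
    storeid     integer,
    sku         text,
    lastupdated timestamptz
);

-- Sample rows copied from the question.
INSERT INTO store_products_demo (pid, storeid, sku, lastupdated) VALUES
    (1, 1, '1a1', '2017-02-02 18:22:30'),
    (2, 1, '1b1', '2017-02-02 18:21:30'),
    (3, 1, '1a1', '2017-01-16 11:22:30'),
    (4, 2, '1a1', '2017-02-02 18:21:30'),
    (5, 2, '1a1', '2017-02-01 18:21:00'),
    (6, 3, '1a1', '2017-02-02 18:21:30'),
    (7, 3, '1g1', '2017-02-01 18:21:30');

-- DISTINCT ON keeps the first row of each (storeid, sku) group;
-- ORDER BY ... lastupdated DESC makes that first row the newest one.
SELECT DISTINCT ON (storeid, sku) *
FROM store_products_demo
ORDER BY storeid, sku, lastupdated DESC;
-- Returns pid 1, 2, 4, 6 and 7: one row per (storeid, sku) pair.
```

The leading ORDER BY expressions have to match the DISTINCT ON expressions; the extra lastupdated DESC is what picks the winner inside each group. Pid 7 is returned because it is the only row for store 3 with SKU 1g1.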

Create an index on (storeid, sku, lastupdated), or just on (storeid, sku) if there are not enough distinct timestamp values per group to make the larger index worthwhile.

Source
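A minimal sketch of the index suggested above, assuming the table and column names from the question. The index name is hypothetical, and whether the planner actually chooses the index over a sequential scan plus sort depends on the data and configuration, so it is worth comparing EXPLAIN output before and after.

```sql
-- Hypothetical index name; the column order mirrors the ORDER BY of the DISTINCT ON query.
CREATE INDEX store_products_store_sku_updated_idx
    ON store_products (storeid, sku, lastupdated DESC);

-- Refresh planner statistics after building the index.
ANALYZE store_products;

-- Inspect the plan to see whether the index is used.
EXPLAIN
SELECT DISTINCT ON (storeid, sku) *
FROM store_products
ORDER BY storeid, sku, lastupdated DESC;
```

On a table that receives daily imports, CREATE INDEX CONCURRENTLY is an option for building the index without blocking writes, at the cost of a slower build.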

2017-02-10 12:14:55

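Thank you, the cleaner syntax and the index really helped.

Try using ROW_NUMBER() with an OVER (PARTITION BY ...) clause inside a derived table (aliased temp), like below: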

select * 
from (
    select 
     p.*, 
     -- number each store's rows from newest to oldest
     -- (add sku to PARTITION BY for the latest row per store and SKU)
     ROW_NUMBER() OVER (PARTITION BY p.storeid ORDER BY p.lastupdated DESC) AS RowNo 
    from store_products p 
) temp 
where temp.RowNo = 1   -- keep only the newest row in each partition
order by temp.storeid

Source

2017-02-10 11:09:26

Thanks for the reply! I updated my question to reflect the SKU that I missed in my first post.
