Given the following schema, how can I efficiently find the running latest update for multiple records in SQL?
-- items which have periodic updates
CREATE TABLE items (
[id] int identity(1, 1) primary key,
[name] varchar(100) not null
);
-- item updates. updating an item generally means it has a new status, at a certain time.
CREATE TABLE updates (
[id] int identity(1, 1) primary key,
[item_id] int foreign key references items([id]),
[new_status] varchar(100) not null,
[update_date] datetime not null
);
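This is used to track the status of items as they pass through a number of states over time.
I have been trying to find an efficient query that answers the following question:
For many items, each of which can be in one of several states, and where we log an update on every status change, how many items are in each state at the end of each day?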
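To make the question concrete, here is a minimal sketch of the lookup for a single item and a single cut-off date. The variables `@item_id` and `@as_of` are hypothetical and exist only for illustration; the real problem is doing this for every item and every date at once.

```sql
-- Hypothetical inputs: one item and one cut-off date (not part of the schema).
DECLARE @item_id int = 1;
DECLARE @as_of datetime = '2017-10-11T00:00:00';

-- The item's state as of @as_of is the status of its most recent update
-- on or before that date.
SELECT TOP (1) [new_status]
FROM updates
WHERE [item_id] = @item_id
  AND [update_date] <= @as_of
ORDER BY [update_date] DESC;
```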
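I have a SQLFiddle here, which has some sample data as well as my current attempt at this query. It works well on a handful of items, but my database has many thousands of them, so my query currently takes about 5 minutes to run.
Is there a more efficient query to answer this question?
Test data: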
-- items which have periodic updates
CREATE TABLE items (
[id] int identity(1, 1) primary key,
[name] varchar(100) not null
);
-- item updates. updating an item generally means it has a new status, at a certain time.
CREATE TABLE updates (
[id] int identity(1, 1) primary key,
[item_id] int foreign key references items([id]),
[new_status] varchar(100) not null,
[update_date] datetime not null
);
-- let's just say that we just created 3 new items
INSERT INTO items (name)
VALUES ('item1'), ('item2'), ('item3');
-- and they all start in the new state
INSERT INTO updates (item_id, new_status, update_date)
SELECT
[id],
[new_status] = 'new',
[update_date] = '2017-10-9 00:00:00.000'
FROM items;
-- then we have them update over the course of a couple days
-- item 1
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000'
FROM items WHERE [name] = 'item1'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-12 00:00:00.000'
FROM items WHERE [name] = 'item1'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-14 00:00:00.000'
FROM items WHERE [name] = 'item1';
-- item 2
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000'
FROM items WHERE [name] = 'item2'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-11 00:00:00.000'
FROM items WHERE [name] = 'item2'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-12 00:00:00.000'
FROM items WHERE [name] = 'item2';
-- item 3
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-11 00:00:00.000'
FROM items WHERE [name] = 'item3'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-13 00:00:00.000'
FROM items WHERE [name] = 'item3'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-15 00:00:00.000'
FROM items WHERE [name] = 'item3';
Current query:
-- =======================
-- Running latest record
-- =======================
-- Goal: For a period of time, with multiple items, which have multiple updates,
-- find the number of items which are in each state at the end of a day.
--
-- Issue: how can i improve this query for a large database?
--
SELECT
dates.[update_date],
state = latest_update.[new_status],
volume = COUNT(*)
FROM items i -- start with the items that we want to count per day
CROSS JOIN (
SELECT DISTINCT [update_date] FROM updates
) dates -- the days to count for
CROSS APPLY (
-- this cross apply gets all updates for an item, that occurred on or before each date
SELECT
updates.*,
RN = ROW_NUMBER() OVER (PARTITION BY [item_id] ORDER BY [update_date] DESC)
FROM updates
WHERE [update_date] <= dates.[update_date] AND [item_id] = i.[id]
) latest_update
WHERE latest_update.RN = 1 -- only count the latest update
GROUP BY dates.[update_date], latest_update.[new_status]
ORDER BY dates.[update_date], latest_update.[new_status]
Results:
| update_date | state | volume |
|----------------------|-------------|--------|
| 2017-10-09T00:00:00Z | new | 3 |
| 2017-10-10T00:00:00Z | in progress | 2 |
| 2017-10-10T00:00:00Z | new | 1 |
| 2017-10-11T00:00:00Z | in progress | 2 |
| 2017-10-11T00:00:00Z | ready | 1 |
| 2017-10-12T00:00:00Z | complete | 1 |
| 2017-10-12T00:00:00Z | in progress | 1 |
| 2017-10-12T00:00:00Z | ready | 1 |
| 2017-10-13T00:00:00Z | complete | 1 |
| 2017-10-13T00:00:00Z | ready | 2 |
| 2017-10-14T00:00:00Z | complete | 2 |
| 2017-10-14T00:00:00Z | ready | 1 |
| 2017-10-15T00:00:00Z | complete | 3 |
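One direction that might be worth testing, as a sketch rather than a verified fix: instead of re-ranking an item's updates for every date, turn each update into a validity interval once and join the dates against those intervals. This assumes SQL Server 2012 or later, since it relies on `LEAD`. The CTE names `intervals` and `dates` and the columns `valid_from` and `valid_to` are made up for this example, and I have not measured it against the full data set.

```sql
-- Sketch only: each update is valid from its own date until the item's next update.
WITH intervals AS (
    SELECT
        [item_id],
        [new_status],
        [valid_from] = [update_date],
        -- NULL means this is the item's most recent update (still current)
        [valid_to] = LEAD([update_date]) OVER (PARTITION BY [item_id] ORDER BY [update_date])
    FROM updates
),
dates AS (
    -- same set of days as in the current query
    SELECT DISTINCT [update_date] FROM updates
)
SELECT
    d.[update_date],
    state = iv.[new_status],
    volume = COUNT(*)
FROM dates d
JOIN intervals iv
    ON iv.[valid_from] <= d.[update_date]
   AND (iv.[valid_to] > d.[update_date] OR iv.[valid_to] IS NULL)
GROUP BY d.[update_date], iv.[new_status]
ORDER BY d.[update_date], iv.[new_status];
```

On the sample data above this should produce the same rows as the current query, because exactly one interval per item covers any given date. Whether it is actually faster at scale is an open question. An index on updates ([item_id], [update_date]) that includes [new_status] may help either version.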
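Edit your question and show the results you want. –
+1 for the sample data; going forward, please include the expected results as text. – TheGameiswar
The fiddle has the expected output on the query side. The question is more about how to efficiently get the right answer for a large number of items. – mcnnowak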