How can I efficiently find the running latest update for multiple records in SQL?

Given the following schema, how can I efficiently find the running latest update for multiple records in SQL?

-- items which have periodic updates 
CREATE TABLE items (
    [id] int identity(1, 1) primary key, 
    [name] varchar(100) not null 
); 

-- item updates. updating an item generally means it has a new status, at a certain time. 
CREATE TABLE updates (
    [id] int identity(1, 1) primary key, 
    [item_id] int foreign key references items([id]), 
    [new_status] varchar(100) not null, 
    [update_date] datetime not null 
);

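This is used to track the status of items as they move through a number of states over time.

I have been trying to find an efficient query that answers the following question:

For many items, each of which can be in one of several states and for which we log status updates, how many items are in each state at the end of each day?

I have a SQLFiddle with some sample data and my current attempt at this query. It works fine on a handful of items, but my database has thousands of them, so my query currently takes about 5 minutes to run.

Is there a more efficient query that answers this question?

Test data: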

-- items which have periodic updates 
CREATE TABLE items (
    [id] int identity(1, 1) primary key, 
    [name] varchar(100) not null 
); 

-- item updates. updating an item generally means it has a new status, at a certain time. 
CREATE TABLE updates (
    [id] int identity(1, 1) primary key, 
    [item_id] int foreign key references items([id]), 
    [new_status] varchar(100) not null, 
    [update_date] datetime not null 
); 

-- lets just say that we just created 3 new items 
INSERT INTO items (name) 
    VALUES ('item1'), ('item2'), ('item3'); 

-- and they all start in the new state 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT 
    [id], 
    [new_status] = 'new', 
    [update_date] = '2017-10-9 00:00:00.000' 
FROM items 

-- then we have them update over the course of a couple days 
-- item 1 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000' 
FROM items WHERE [name] = 'item1' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-12 00:00:00.000' 
FROM items WHERE [name] = 'item1' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-14 00:00:00.000' 
FROM items WHERE [name] = 'item1'; 

-- item 2 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000' 
FROM items WHERE [name] = 'item2' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-11 00:00:00.000' 
FROM items WHERE [name] = 'item2' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-12 00:00:00.000' 
FROM items WHERE [name] = 'item2'; 

-- item 3 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-11 00:00:00.000' 
FROM items WHERE [name] = 'item3' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-13 00:00:00.000' 
FROM items WHERE [name] = 'item3' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-15 00:00:00.000' 
FROM items WHERE [name] = 'item3';

Current query:

-- ======================= 
-- Running latest record 
-- ======================= 
-- Goal: For a period of time, with multiple items, which have multiple updates, 
--  find the number of items which are in each state at the end of a day. 
-- 
-- Issue: how can i improve this query for a large database? 
-- 

SELECT 
    dates.[update_date], 
    state = latest_update.[new_status], 
    volume = COUNT(*) 
FROM items i -- start with the items that we want to count per day 
CROSS JOIN (
    SELECT DISTINCT [update_date] FROM updates 
) dates -- the days to count for 
CROSS APPLY (
    -- this cross apply gets all updates for an item, that occurred on or before each date 
    SELECT 
    updates.*, 
    RN = ROW_NUMBER() OVER (PARTITION BY [item_id] ORDER BY [update_date] DESC) 
    FROM updates 
    WHERE [update_date] <= dates.[update_date] AND [item_id] = i.[id] 
) latest_update 
WHERE latest_update.RN = 1 -- only count the latest update 
GROUP BY dates.[update_date], latest_update.[new_status] 
ORDER BY dates.[update_date], latest_update.[new_status]

Results:

|          update_date |       state | volume |
|----------------------|-------------|--------|
| 2017-10-09T00:00:00Z |         new |      3 |
| 2017-10-10T00:00:00Z | in progress |      2 |
| 2017-10-10T00:00:00Z |         new |      1 |
| 2017-10-11T00:00:00Z | in progress |      2 |
| 2017-10-11T00:00:00Z |       ready |      1 |
| 2017-10-12T00:00:00Z |    complete |      1 |
| 2017-10-12T00:00:00Z | in progress |      1 |
| 2017-10-12T00:00:00Z |       ready |      1 |
| 2017-10-13T00:00:00Z |    complete |      1 |
| 2017-10-13T00:00:00Z |       ready |      2 |
| 2017-10-14T00:00:00Z |    complete |      2 |
| 2017-10-14T00:00:00Z |       ready |      1 |
| 2017-10-15T00:00:00Z |    complete |      3 |

Source

2017-10-10 mcnnowak

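Edit your question and show the results that you want. –

+1 for the sample data. Going forward, please include the expected results as text. – TheGameiswar

The fiddle has the expected output on the query side. The question is more about how to get the correct answer efficiently for a large number of items. – mcnnowak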
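Because the CROSS APPLY in the question's query looks up rows in updates by [item_id] and [update_date] once per item and date, a supporting index may be a cheap first thing to try on a large table. The statement below is only a sketch: the index name is hypothetical, and it assumes no equivalent index exists yet. Whether it helps depends on the real data, so compare execution plans before and after.

```sql
-- Hypothetical index name; assumes the schema from the question.
-- Keyed on the lookup columns, with new_status included so the
-- per-item "latest update on or before a date" lookup can be
-- answered from the index alone.
CREATE NONCLUSTERED INDEX IX_updates_item_id_update_date
    ON updates ([item_id], [update_date])
    INCLUDE ([new_status]);
```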

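The GROUP BY clause at the end of the statement below groups the rows by their value in the new_status column. The database then returns one row for each distinct value of new_status.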

select new_status,count(new_status) from updates group by new_status

In other words, if we ran the query without the count(new_status) part, it would be exactly the same as saying:

select distinct new_status from updates

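Because we asked for a count, the database counts the number of occurrences of each distinct value it has grouped together and displays that number in the count(new_status) column. As written, the database does not give a name to the column holding the count for each group, but you can name it like this: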

select new_status,count(new_status) as nmbr_items from updates group by new_status

Source

2017-10-10 00:14:08 russ

Could you add some explanation to your answer? – kvorobiev

Yes, I have seen your request for some explanation, and I will add it as soon as I can. – russ

One approach is to use conditional aggregation:

select cast(update_date as date) as update_day, new_status, count(*) as volume
from (select u.*,
             -- seqnum = 1 marks the latest update within each calendar day
             row_number() over (partition by cast(update_date as date) order by update_date desc) as seqnum
      from updates u
     ) u
where seqnum = 1
group by cast(update_date as date), new_status
order by cast(update_date as date), count(*) desc;

Source

2017-10-10 00:39:26

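Your query counts the updates made on each day, rather than the latest update as of each day. That is, if an item stays 'new' for 4 days in a row, your solution reports 'new' on the first day and then does not report it for the following 3 days. – mcnnowak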
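To carry each status forward until the next update, as the comment above asks for, one option is to turn every update into a validity interval with LEAD and then match each date to the interval that covers it. The query below is a sketch under stated assumptions, not a measured fix: it needs SQL Server 2012 or later for LEAD, the names status_ranges, valid_from and valid_to are made up for illustration, and when one item has two updates with the same timestamp it treats the row with the higher [id] as the later one.

```sql
WITH status_ranges AS (
    -- One row per update: the status applies from its own update_date
    -- until the same item's next update (NULL = still current).
    SELECT
        [item_id],
        [new_status],
        valid_from = [update_date],
        valid_to   = LEAD([update_date]) OVER (
                         PARTITION BY [item_id]
                         ORDER BY [update_date], [id])
    FROM updates
    WHERE [item_id] IS NOT NULL -- mirrors starting from items in the original query
),
dates AS (
    -- Same set of reporting dates as the original query.
    SELECT DISTINCT [update_date] FROM updates
)
SELECT
    d.[update_date],
    state  = r.[new_status],
    volume = COUNT(*)
FROM dates d
JOIN status_ranges r
    ON  r.valid_from <= d.[update_date]
    AND (r.valid_to IS NULL OR r.valid_to > d.[update_date])
GROUP BY d.[update_date], r.[new_status]
ORDER BY d.[update_date], r.[new_status];
```

Each item has at most one interval covering a given date, so no ROW_NUMBER filter is needed. Worked through by hand on the sample data from the question, this yields the same rows as the result table there; how it performs on a large table still has to be measured.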
