2017-10-10 50 views
1

Given the following schema, how can I efficiently find the running latest update for multiple records in SQL?

-- items which have periodic updates 
CREATE TABLE items (
    [id] int identity(1, 1) primary key, 
    [name] varchar(100) not null 
); 

-- item updates. updating an item generally means it has a new status, at a certain time. 
CREATE TABLE updates (
    [id] int identity(1, 1) primary key, 
    [item_id] int foreign key references items([id]), 
    [new_status] varchar(100) not null, 
    [update_date] datetime not null 
); 

This is used to track the status of items as they pass through many states over time.

I have been trying to find an efficient query that answers the following question:

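For many items, each of which can be in one of several states and for which we log status updates, how many items are in each state at the end of each day?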

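I have a SQLFiddle here, which has some sample data along with my current attempt at this query. It works fine on a handful of items, but my database has thousands, so the query currently takes about 5 minutes to run.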

Is there a more efficient query to answer this question?

Test data:

-- items which have periodic updates 
CREATE TABLE items (
    [id] int identity(1, 1) primary key, 
    [name] varchar(100) not null 
); 

-- item updates. updating an item generally means it has a new status, at a certain time. 
CREATE TABLE updates (
    [id] int identity(1, 1) primary key, 
    [item_id] int foreign key references items([id]), 
    [new_status] varchar(100) not null, 
    [update_date] datetime not null 
); 

-- lets just say that we just created 3 new items 
INSERT INTO items (name) 
    VALUES ('item1'), ('item2'), ('item3'); 

-- and they all start in the new state 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT 
    [id], 
    [new_status] = 'new', 
    [update_date] = '2017-10-9 00:00:00.000' 
FROM items 

-- then we have them update over the course of a couple days 
-- item 1 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000' 
FROM items WHERE [name] = 'item1' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-12 00:00:00.000' 
FROM items WHERE [name] = 'item1' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-14 00:00:00.000' 
FROM items WHERE [name] = 'item1'; 

-- item 2 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000' 
FROM items WHERE [name] = 'item2' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-11 00:00:00.000' 
FROM items WHERE [name] = 'item2' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-12 00:00:00.000' 
FROM items WHERE [name] = 'item2'; 

-- item 3 
INSERT INTO updates (item_id, new_status, update_date) 
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-11 00:00:00.000' 
FROM items WHERE [name] = 'item3' 
UNION 
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-13 00:00:00.000' 
FROM items WHERE [name] = 'item3' 
UNION 
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-15 00:00:00.000' 
FROM items WHERE [name] = 'item3'; 

Current query:

-- ======================= 
-- Running latest record 
-- ======================= 
-- Goal: For a period of time, with multiple items, which have multiple updates, 
--  find the number of items which are in each state at the end of a day. 
-- 
-- Issue: how can i improve this query for a large database? 
-- 

SELECT 
    dates.[update_date], 
    state = latest_update.[new_status], 
    volume = COUNT(*) 
FROM items i -- start with the items that we want to count per day 
CROSS JOIN (
    SELECT DISTINCT [update_date] FROM updates 
) dates -- the days to count for 
CROSS APPLY (
    -- this cross apply gets all updates for an item, that occurred on or before each date 
    SELECT 
    updates.*, 
    RN = ROW_NUMBER() OVER (PARTITION BY [item_id] ORDER BY [update_date] DESC) 
    FROM updates 
    WHERE [update_date] <= dates.[update_date] AND [item_id] = i.[id] 
) latest_update 
WHERE latest_update.RN = 1 -- only count the latest update 
GROUP BY dates.[update_date], latest_update.[new_status] 
ORDER BY dates.[update_date], latest_update.[new_status] 

[Results]

| update_date          | state       | volume |
|----------------------|-------------|--------|
| 2017-10-09T00:00:00Z | new         |      3 |
| 2017-10-10T00:00:00Z | in progress |      2 |
| 2017-10-10T00:00:00Z | new         |      1 |
| 2017-10-11T00:00:00Z | in progress |      2 |
| 2017-10-11T00:00:00Z | ready       |      1 |
| 2017-10-12T00:00:00Z | complete    |      1 |
| 2017-10-12T00:00:00Z | in progress |      1 |
| 2017-10-12T00:00:00Z | ready       |      1 |
| 2017-10-13T00:00:00Z | complete    |      1 |
| 2017-10-13T00:00:00Z | ready       |      2 |
| 2017-10-14T00:00:00Z | complete    |      2 |
| 2017-10-14T00:00:00Z | ready       |      1 |
| 2017-10-15T00:00:00Z | complete    |      3 |
+0

Edit your question and show the results you want. –

+0

+1 for the sample data; going forward, please include the expected results as text – TheGameiswar

+0

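The fiddle has the expected output at the end of the query. The question is more about how to get the correct answer efficiently for a large number of items. – mcnnowak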

Answers

0

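The GROUP BY clause at the end of the statement below groups the rows by the value in the new_status column. The database then presents the user with a list of the "distinct" values from the new_status column.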

select new_status,count(new_status) from updates group by new_status 

In other words, if we ran the query without the count(new_status) part, it would be exactly the same as saying:

select distinct new_status from updates 

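Because we asked for a count, the database counts how many rows fall into each distinct value it grouped together and shows that number in the count(new_status) column. As written, the database does not give that count column a name, but you can name it like this: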

select new_status,count(new_status) as nmbr_items from updates group by new_status 
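
As a hedged aside on the grouping described above: over the sample data in the question, that statement counts every row in updates, so each of the four statuses comes back with a count of 3. If the aim is instead to count items by their current status, one possible sketch (an assumption about intent, not part of this answer; the names rn and latest are illustrative) keeps only the newest update per item before grouping:

```sql
-- Sketch: count items by their most recent status.
-- rn and latest are illustrative names, not from the original post.
SELECT
    latest.[new_status],
    nmbr_items = COUNT(*)
FROM (
    SELECT
        [item_id],
        [new_status],
        -- rank each item's updates from newest to oldest; [id] breaks ties on equal dates
        rn = ROW_NUMBER() OVER (PARTITION BY [item_id]
                                ORDER BY [update_date] DESC, [id] DESC)
    FROM updates
) latest
WHERE latest.rn = 1 -- keep only the newest update of each item
GROUP BY latest.[new_status];
```

On the sample data this returns a single row, complete with a count of 3, because the last update of every item is complete.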
+0

Can you add some explanation to your answer? – kvorobiev

+0

Yes, I have seen your request for some explanation, and I will add it as soon as I can. – russ

3

One approach is to use row_number() and then aggregate:

-- per calendar day, count the last update each item received on that day
select cast(update_date as date) as update_day, new_status, count(*) as volume
from (select u.*,
             row_number() over (partition by item_id, cast(update_date as date)
                                order by update_date desc) as seqnum
      from updates u
     ) u
where seqnum = 1
group by cast(update_date as date), new_status
order by cast(update_date as date), count(*) desc;
+0

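Your query counts the updates made on each day, not the latest update as of each day. That is, if an item stays new for 4 consecutive days, your solution reports it as new on the first day and then does not report it at all for the following 3 days. – mcnnowak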
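
The distinction raised in the comment above could be sketched as follows. This is a minimal sketch, not a measured fix: it assumes SQL Server 2012 or later (for LEAD), and the index name ix_updates_item_date, the CTE names intervals and dates, and the valid_from and valid_to columns are all hypothetical. Each update is treated as being in effect from its own update_date until the same item's next update, so every reporting date matches exactly one row per item without running ROW_NUMBER once per item and date.

```sql
-- Assumed supporting index (hypothetical name); not part of the original schema.
CREATE INDEX ix_updates_item_date
    ON updates ([item_id], [update_date]) INCLUDE ([new_status]);

WITH intervals AS (
    -- each update is in effect from its own date until the item's next update
    SELECT
        [item_id],
        [new_status],
        valid_from = [update_date],
        valid_to   = LEAD([update_date]) OVER (PARTITION BY [item_id]
                                               ORDER BY [update_date], [id])
    FROM updates
),
dates AS (
    -- same reporting dates as the query in the question
    SELECT DISTINCT [update_date] FROM updates
)
SELECT
    d.[update_date],
    state  = iv.[new_status],
    volume = COUNT(*)
FROM dates d
JOIN intervals iv
    ON iv.valid_from <= d.[update_date]
   -- a NULL valid_to marks the item's latest update, which is still in effect
   AND (iv.valid_to > d.[update_date] OR iv.valid_to IS NULL)
GROUP BY d.[update_date], iv.[new_status]
ORDER BY d.[update_date], iv.[new_status];
```

On the question's sample data this should produce the same rows as the results table in the question. Whether it is actually faster on a large table depends on the data distribution and indexing, so it is worth comparing execution plans before adopting it.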