2011-08-30 65 views
0

Grouping data by related values in SQL Server

The data consists of values recorded at 15-minute intervals:

 
Time    Value 
2010-01-01 00:15 3 
2010-01-01 00:30 2 
2010-01-01 00:45 4 
2010-01-01 01:00 5 
2010-01-01 01:15 1 
2010-01-01 01:30 3 
2010-01-01 01:45 4 
2010-01-01 02:00 12 
2010-01-01 02:15 13 
2010-01-01 02:30 12 
2010-01-01 02:45 14 
2010-01-01 03:00 15 
2010-01-01 03:15 3 
2010-01-01 03:30 2 
2010-01-01 03:45 3 
2010-01-01 04:00 5 
.......... 
.......... 
.......... 
2010-01-02 00:00 

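Normally there will be 96 points.

Looking at these values, we can see that the values from 00:15 to 01:45 are close to each other, those from 02:00 to 03:00 are close to each other, and those from 03:15 to 04:00 are close to each other.

Based on this "close to each other" rule, I want to "group" the data into 3 parts:

  • 00:15 to 01:45
  • 02:00 to 03:00
  • 03:15 to 04:00

Please consider that the data can be random and may be split into more than 3 parts under the rule defined above, but never more than 10 parts. The grouping must also respect time order: for example, you cannot put just 00:15, 02:30 and 04:45 into one group, because those 3 points are not consecutive.

Please advise how to implement this in T-SQL.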

Update: the values could be:

 
Time    Value 
2010-01-01 00:15 3 
2010-01-01 00:30 2 
2010-01-01 00:45 4 
2010-01-01 01:00 5 
2010-01-01 01:15 1 
2010-01-01 01:30 3 
2010-01-01 01:45 4 
2010-01-01 02:00 12 
2010-01-01 02:15 13 
2010-01-01 02:30 4 --suddenly decreased 
2010-01-01 02:45 14 
2010-01-01 03:00 15 
2010-01-01 03:15 3 
2010-01-01 03:30 2 
2010-01-01 03:45 3 
2010-01-01 04:00 5 
.......... 
.......... 
.......... 
2010-01-02 00:00 

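In a case like this, we should not put 02:30 into a group of its own, because every group must contain at least 3 points; we will put this point (02:30) into the previous group (02:00 to 03:00).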

+3

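It might help if you were clearer about the definition of "close to each other". What is the largest numeric difference you would still consider "close"?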

+2

Also define "grouping". Do you mean grouping as in a report, like a bulleted list? Is there a minimum number of groups? – Paparazzi

+4

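What if I have a sequence such as 1,2,3,4,5,6,7,8,9? Each value is "close" to the previous one, but 9 would probably not be considered close to 1. The hardest part of programming is usually figuring out what problem you are trying to solve.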
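
To make the chaining concern in the last comment concrete, here is a minimal sketch. It assumes "close" means "within @variation of the previous row", which is only one possible reading of the question. The table @s, its columns, and the tolerance of 1 are made up for illustration.

```sql
-- hypothetical data: the sequence 1..9, one value per row
declare @s table(rn int, value int)
declare @i int
set @i = 1
while @i <= 9
begin
    insert @s values(@i, @i)
    set @i = @i + 1
end

-- assumed rule: a row stays in the current group if it is within
-- @variation of the row immediately before it
declare @variation int
set @variation = 1

;with a as
(
select rn, value, 0 grp from @s where rn = 1
union all
select s.rn, s.value,
       case when s.value between a.value - @variation and a.value + @variation
            then a.grp else a.grp + 1 end
from @s s join a on s.rn = a.rn + 1
)
select grp, min(value) first_value, max(value) last_value, count(*) points
from a group by grp
```

Under this rule all nine rows land in one group (grp 0, values 1 to 9), even though 1 and 9 differ by 8. Whether that is acceptable depends on the definition of "close" the question intends.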

Answers

0

Since your question has changed so much, here is a new answer for the new question. I have included only the code.

declare @t table(time datetime, value int) 
declare @variation float 
set @variation = 2 
set nocount on 

insert @t values('2010-01-01 00:15',3) 
insert @t values('2010-01-01 00:30',2) 
insert @t values('2010-01-01 00:45',4) 
insert @t values('2010-01-01 01:00',5) 
insert @t values('2010-01-01 01:15',1) 
insert @t values('2010-01-01 01:30',3) 
insert @t values('2010-01-01 01:45',4) 
insert @t values('2010-01-01 02:00',52) 
insert @t values('2010-01-01 02:15',5) 
insert @t values('2010-01-01 02:30',52) 
insert @t values('2010-01-01 02:45',54) 
insert @t values('2010-01-01 03:00',55) 
insert @t values('2010-01-01 03:15',3) 
insert @t values('2010-01-01 03:30',2) 
insert @t values('2010-01-01 03:45',3) 
insert @t values('2010-01-01 04:00',5) 


declare @result table(mintime datetime, maxtime datetime) 
a: 
delete @result 

;with t as 
(
select *, rn = row_number() over(order by time), log(value) lv from @t where datediff(day, time, '2010-01-01') = 0 
), a as 
(
select time, lv, rn, 0 grp from t where rn = 1 
union all 
select t1.time, a.lv, t1.rn, 
case when exists (select 1 from t t2 where t1.rn between rn + 1 and rn + 3 and 
lv between t1.lv - @variation and t1.lv + @variation) then grp else grp + 1 end 
from t t1 join a on 
t1.rn = a.rn +1 
) 
insert @result 
select min(time), max(time) from a group by grp 

if @@rowcount > 10 
begin 
    set @variation = @variation + .5 
    goto a 
end 

select * from @result 

Result:

mintime      maxtime 
2010-01-01 00:15:00.000  2010-01-01 01:45:00.000 
2010-01-01 02:00:00.000  2010-01-01 03:00:00.000 
2010-01-01 03:15:00.000  2010-01-01 04:00:00.000 
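
The query above compares log(value) rather than value, so @variation bounds the ratio between two values instead of their absolute difference (with @variation = 2, values count as close when they are within a factor of roughly e^2 ≈ 7.39 of each other). The answer does not say why, but this plausibly helps with data at very different scales. The sketch below illustrates the effect with made-up numbers; the table @v and its contents are hypothetical.

```sql
-- hypothetical sample values at two very different scales
declare @v table(id int, value float)
insert @v values(1, 0.004)
insert @v values(2, 0.006)
insert @v values(3, 4000)
insert @v values(4, 6000)

-- compare each pair (1,2) and (3,4): the absolute gaps differ hugely,
-- while the log gaps are the same because both ratios are 1.5
select a.value low_value, b.value high_value,
       b.value - a.value absolute_gap,
       log(b.value) - log(a.value) log_gap
from @v a join @v b on b.id = a.id + 1
where a.id in (1, 3)
```

Both pairs have a log gap of about 0.405 (the natural log of 1.5), while the absolute gaps are 0.002 and 2000. Note that log requires values greater than zero, so data containing zeros or negatives would need separate handling.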
7

Declare and populate test data:

set nocount on 
declare @result table(mintime datetime, maxtime datetime) 
declare @t table(time datetime, value int) 

-- variation is how much difference will be allowed from one row to the next 
declare @variation int 
set @variation = 5  

insert @t values('2010-01-01 00:15',3) 
insert @t values('2010-01-01 00:30',2) 
insert @t values('2010-01-01 00:45',4) 
insert @t values('2010-01-01 01:00',5) 
insert @t values('2010-01-01 01:15',1) 
insert @t values('2010-01-01 01:30',3) 
insert @t values('2010-01-01 01:45',4) 
insert @t values('2010-01-01 02:00',12) 
insert @t values('2010-01-01 02:15',13) 
insert @t values('2010-01-01 02:30',12) 
insert @t values('2010-01-01 02:45',14) 
insert @t values('2010-01-01 03:00',15) 
insert @t values('2010-01-01 03:15',3) 
insert @t values('2010-01-01 03:30',2) 
insert @t values('2010-01-01 03:45',3) 
insert @t values('2010-01-01 04:00',5) 

Code:

a: 

;with t as 
(-- add a rownumber 
select *, rn = row_number() over(order by time) from @t 
), a as 
(-- increase group if current row's value varies more than @variation from last row's value 
select time, value, rn, 0 grp from t where rn = 1 
union all 
select t.time, t.value, t.rn, case when t.value between 
     a.value - @variation and a.value + @variation 
     then grp else grp+1 end 
from t join a on 
t.rn = a.rn +1 
) 
insert @result 
select min(time), max(time) from a group by grp 


if @@rowcount > 10 
begin 
    -- this will activate if more than 10 groups of numbers are found 
    -- start over with higher tolerance for variation 
    set @variation = @variation + 1 
    delete @result 
    goto a 
end 

select convert(char(5), mintime,114) + ' to ' + convert(char(5), maxtime,114) 
from @result 

Result here: http://data.stackexchange.com/stackoverflow/q/110891/declare-and-populate-testdata
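
One practical caveat, not raised in the answer: the recursive CTE recurses once per row, and SQL Server stops a recursive CTE after 100 levels by default. The 96 points of a single day fit within that limit, but a longer range would need a MAXRECURSION query hint. The sketch below uses 200 hypothetical rows to show where the hint goes; the generated values and the limit of 1000 are arbitrary choices for illustration.

```sql
-- hypothetical test data: 200 rows at 15-minute intervals (a little over two days)
declare @t table(time datetime, value int)
declare @i int
set @i = 0
while @i < 200
begin
    insert @t values(dateadd(minute, 15 * @i, '2010-01-01 00:15'), @i % 7)
    set @i = @i + 1
end

;with t as
(-- add a rownumber
select *, rn = row_number() over(order by time) from @t
), a as
(-- walk the rows one at a time, as the grouping query does
select time, value, rn from t where rn = 1
union all
select t.time, t.value, t.rn from t join a on t.rn = a.rn + 1
)
select count(*) from a
option (maxrecursion 1000) -- without this hint the statement fails once it passes 100 levels
```

The hint belongs at the very end of the statement, so in the grouping query it would follow "group by grp".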

+0

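This is exactly what I want! You are absolutely a champion! I will need to modify it to handle different "variation", though, because the real data comes at different scales; for example, the values might look like 0.005, 0.004, 0.006, 0.003, 0.007, etc., or 5222, 3122, 4522, 4221, 5521, 1100, 998, 4221, etc. – unruledboy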

+0

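Sorry, one last thing: I have updated the main post, please refer to it. The main idea is that the group size must be at least 3, meaning every group must contain at least 3 points. – unruledboy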

+0

Wow, fantastic result! Thank you very much! – unruledboy