2014-09-24 70 views
1

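How to calculate the duration of consecutive runs of the same value in R

I want to calculate the duration of the green, amber and red phases in each cycle of a traffic light (column sg.0 in my sample data), for example the length of time from the first green state to the last green state in each cycle. How can I do this? The data.frame looks like this: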

time sg.0
1 2014-09-01 00:00:12.0 green 
2 2014-09-01 00:00:13.5 green 
3 2014-09-01 00:00:30.0 amber 
4 2014-09-01 00:00:30.0 amber 
5 2014-09-01 00:00:31.5 amber 
6 2014-09-01 00:00:32.0 amber 
7 2014-09-01 00:00:32.2 amber 
8 2014-09-01 00:00:33.5 amber 
9 2014-09-01 00:00:33.0 red 
10 2014-09-01 00:00:35.0 red 
11 2014-09-01 00:00:35.2 red 
12 2014-09-01 00:00:37.0 red 
13 2014-09-01 00:00:41.0 red 
14 2014-09-01 00:00:42.0 red 
15 2014-09-01 00:00:42.2 red 
16 2014-09-01 00:00:43.0 red 
17 2014-09-01 00:00:44.7 red 
18 2014-09-01 00:00:44.2 red 
19 2014-09-01 00:00:45.5 red 
20 2014-09-01 00:00:47.0 red 
21 2014-09-01 00:00:48.7 red 
22 2014-09-01 00:00:49.7 red 
23 2014-09-01 00:00:49.7 red 
24 2014-09-01 00:00:49.9 red 
25 2014-09-01 00:00:50.9 green 
26 2014-09-01 00:00:50.0 green 
27 2014-09-01 00:00:52.0 green 
28 2014-09-01 00:00:53.0 green 
29 2014-09-01 00:00:54.0 green 
30 2014-09-01 00:00:55.0 green 
31 2014-09-01 00:00:55.0 green 
32 2014-09-01 00:01:02.0 green 
33 2014-09-01 00:01:03.7 green 
34 2014-09-01 00:01:05.7 green 
35 2014-09-01 00:01:07.0 green 

Raw data:

structure(list(time = structure(c(1409518812, 1409518813.6, 1409518830, 
1409518830.1, 1409518831.6, 1409518832, 1409518832.2, 1409518833.6, 
1409518833, 1409518835, 1409518835.3, 1409518837, 1409518841, 
1409518842, 1409518842.3, 1409518843, 1409518844.8, 1409518844.2, 
1409518845.6, 1409518847, 1409518848.7, 1409518849.7, 1409518849.8, 
1409518849.9, 1409518850.9, 1409518850, 1409518852, 1409518853, 
1409518854, 1409518855, 1409518855.1, 1409518862, 1409518863.8, 
1409518865.8, 1409518867, 1409518868, 1409518870.7, 1409518870.3, 
1409518884, 1409518884.2, 1409518884.3, 1409518884.5, 1409518890, 
1409518942, 1409518942.1, 1409518943.7, 1409518943.3, 1409518944.9, 
1409518944, 1409518945, 1409518947, 1409518949.5, 1409518949.6, 
1409518953, 1409518954, 1409518957.8, 1409518957.2, 1409518961, 
1409518961.1, 1409518961.2, 1409518962.2, 1409518962.3, 1409518964, 
1409518965, 1409518966, 1409518967, 1409518967.1, 1409518974, 
1409518975.8, 1409518977.8, 1409518979, 1409518980, 1409519068, 
1409519068.1, 1409519068.7, 1409519070, 1409519071, 1409519073, 
1409519073.8, 1409519081, 1409519082, 1409519083.3, 1409519083.8, 
1409519084.7, 1409519086, 1409519087.6, 1409519089.2, 1409519089.3, 
1409519091, 1409519091.1, 1409519091.6, 1409519092, 1409519092.1, 
1409519093, 1409519094, 1409519094.5, 1409519095, 1409519095.1, 
1409519103, 1409519104), class = c("POSIXct", "POSIXt")), sg.0 = structure(c(2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 
2L, 2L, 2L), .Label = c("amber", "green", "red"), class = "factor")), .Names = c("time", 
"sg.0"), row.names = c(NA, 100L), class = "data.frame")

Answers

1

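Similar to MrFlick's approach, you can use rle to first generate an indicator for each colour period, and then use that indicator to calculate the durations.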

# If you want to calculate the time within each colour 
r <- rle(as.numeric(dat$sg.0)) 
r$values <- seq_along(r$values) 
dat$id <- inverse.rle(r) 

(a <- aggregate(time ~ sg.0 + id, dat, function(i) diff(as.numeric(range(i))))) 
# sg.0 id time 
#1 green 1 1.6 
#2 amber 2 3.6 
#3 red 3 16.9 
# ... 

# Use a similar approach if a cycle means one full green/amber/red sequence:
# every three consecutive runs share one cycle number
r <- rle(as.numeric(dat$sg.0))
r$values <- rep(seq_along(r$values), each=3, length.out=length(r$values))
dat$cycle <- inverse.rle(r) 

(b <- aggregate(time ~ cycle, dat, function(i) diff(as.numeric(range(i))))) 
# cycle time 
#1  1 37.9 
#2  2 111.2 
#3  3 132.3 
#4  4 9.0 
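
To make the rle / inverse.rle step easier to follow, here is a minimal sketch on a made-up vector (the `state` values are an assumption for illustration, not the asker's data). It shows how replacing each run's value with its position yields a row-level run identifier.

```r
# Toy signal states (made-up values, not the asker's data)
state <- c("green", "green", "amber", "red", "red", "green")

r <- rle(state)
r$lengths            # run lengths: 2 1 2 1
r$values             # run values: "green" "amber" "red" "green"

# Replace each run's value with its position, then expand back to row level
r$values <- seq_along(r$values)
id <- inverse.rle(r)
id                   # 1 1 2 3 3 4
```

Rows that share an id belong to the same uninterrupted colour period, which is what the aggregate call groups on.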

Edit: added as.numeric to the aggregate function calls so that durations are always reported in seconds.
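
The reason this matters can be shown with a small sketch on two made-up time ranges (the values and the UTC time zone are assumptions for illustration). Subtracting POSIXct values gives a difftime whose unit is picked automatically, so durations above one minute would otherwise be reported in minutes.

```r
# Two made-up time ranges: one shorter and one longer than a minute
t0 <- as.POSIXct("2014-09-01 00:00:00", tz = "UTC")
short <- c(t0, t0 + 30)
long  <- c(t0, t0 + 90)

diff(range(short))   # Time difference of 30 secs
diff(range(long))    # Time difference of 1.5 mins

# Converting to numeric seconds before differencing keeps one unit throughout
short_secs <- diff(as.numeric(range(short)))   # 30
long_secs  <- diff(as.numeric(range(long)))    # 90
```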

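Thanks for your thoughtful answer. The only problem is with the full green/amber/red sequence: the unit is not consistent. Values under one minute are in seconds, while values over one minute are converted to minutes. – chenchenmomo 2014-10-16 12:00:00

So I converted the time variable to Unix time. – chenchenmomo 2014-10-16 12:48:48

Thanks, you're quite right, I missed that. I've made a small edit. – user20650 2014-10-16 13:35:01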

2

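You probably want to first identify each colour period uniquely; then you can collect statistics for each group. You can find the periods with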

cycle<-cumsum(c(FALSE, dd[-1,2] != dd[-nrow(dd),2])) 

(assuming your data.frame is named dd). Then you can find the time from start to end of each period with

tapply(dd[,1], interaction(dd[,2], cycle, drop=TRUE), function(x) as.numeric(diff(range(x)), units="secs"))

which gives (in seconds)

green.0 amber.1 red.2 green.3 amber.4 red.5 green.6 amber.7 red.8 green.9 
    1.6  3.6 16.9 40.0  2.9 16.2 17.8  2.0 23.5  9.0 
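
As a minimal sketch of how the cumsum trick numbers the periods, here it is on a made-up vector (`state` is an assumption for illustration, not the asker's data). Each element is compared with its predecessor, and the running count of changes becomes the period number.

```r
# Toy signal states (made-up values, not the asker's data)
state <- c("green", "green", "amber", "red", "red", "green")

# TRUE wherever the state differs from the previous row; the first row
# has no predecessor, so it is marked FALSE
changed <- c(FALSE, state[-1] != state[-length(state)])

# The running count of changes is the period number, starting at 0
run_id <- cumsum(changed)
run_id   # 0 0 1 2 2 3
```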

Or, if by a cycle you mean one full green/amber/red sequence, you can do
cycle<-cumsum(c(dd[1,2]!="green", dd[-1,2] == "green" & dd[-nrow(dd),2] !="green")) 
tapply(dd[,1], cycle, function(x) as.double(diff(range(x)), units="mins")) 

which gives (in minutes)

 0   1   2   3 
0.6316667 1.8533333 2.2050000 0.1500000
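
The same idea can be checked on a small made-up example (`state` and `secs` are assumptions for illustration, with times given as plain seconds). A new cycle starts whenever the state turns green after not being green.

```r
# Toy data: two full green/amber/red cycles with made-up timestamps in seconds
state <- c("green", "amber", "red", "red", "green", "green", "amber", "red")
secs  <- c(0, 10, 13, 20, 30, 35, 50, 54)
n <- length(state)

# A cycle starts on each row that is green while the previous row is not
cycle <- cumsum(c(state[1] != "green",
                  state[-1] == "green" & state[-n] != "green"))
cycle   # 0 0 0 0 1 1 1 1

# Duration from the first to the last observation within each cycle
dur <- tapply(secs, cycle, function(x) diff(range(x)))
dur     # cycle 0: 20, cycle 1: 24
```

Note that this measures from the first to the last observation inside each cycle, so the gap between the last red row and the next green row is not counted.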