2013-03-09 115 views
4

Aggregate dates and calculate the mean

I have a long data frame with dates in one column and values in another, which looks like this:

set.seed(1234) 
df <- data.frame(date= as.Date(c('2010-09-05', '2011-09-06', '2010-09-13', 
           '2011-09-14', '2010-09-23', '2011-09-24', 
           '2010-10-05', '2011-10-06', '2010-10-13', 
           '2011-10-14', '2010-10-23', '2011-10-24')), 
       value= rnorm(12)) 

I need to calculate the mean for each 10-day period of each month, regardless of the year, like this:

dfNeeded <- data.frame(datePeriod=c('period.Sept0.10', 'period.Sept11.20', 'period.Sept21.30', 
            'period.Oct0.10', 'period.Oct11.20', 'period.Oct21.31'), 
         meanValue=c(mean(df$value[c(1,2)]), 
            mean(df$value[c(3,4)]), 
            mean(df$value[c(5,6)]), 
            mean(df$value[c(7,8)]), 
            mean(df$value[c(9,10)]), 
            mean(df$value[c(11,12)]))) 

Is there a quick way to do this?

Answers

5

Here is one way to do it. It uses the lubridate package to extract the month and the day, but you could also do this with base R date functions:

library(lubridate) 
df$period <- paste(month(df$date),cut(day(df$date),breaks=c(0,10,20,31)),sep="-") 
aggregate(df$value, list(period=df$period), mean) 

which gives:

      period          x
1  10-(0,10] -0.5606859
2 10-(10,20] -0.7272449
3 10-(20,31] -0.7377896
4   9-(0,10] -0.4648183
5  9-(10,20] -0.6306283
6  9-(20,31]  0.4675903
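
Since the answer mentions that base R date functions would also work, here is a minimal sketch of that variant, not the answerer's own code. It assumes `format()` with `"%m"` and `"%d"` as the replacement for `month()` and `day()`; the helper names `mon`, `dom`, `bin` and `res` are made up for illustration.

```r
# Same example data as in the question
set.seed(1234)
df <- data.frame(date = as.Date(c('2010-09-05', '2011-09-06', '2010-09-13',
                                  '2011-09-14', '2010-09-23', '2011-09-24',
                                  '2010-10-05', '2011-10-06', '2010-10-13',
                                  '2011-10-14', '2010-10-23', '2011-10-24')),
                 value = rnorm(12))

# Month number and day of month extracted with base R only (assumed stand-ins
# for lubridate's month() and day())
mon <- as.integer(format(df$date, "%m"))
dom <- as.integer(format(df$date, "%d"))

# Same 10-day bins as in the answer: (0,10], (10,20], (20,31]
bin <- cut(dom, breaks = c(0, 10, 20, 31))

# Build the period label and average the values per period, ignoring the year
df$period <- paste(mon, bin, sep = "-")
res <- aggregate(value ~ period, data = df, FUN = mean)
print(res)
```

The formula interface of `aggregate` is used here only so that the result columns are named `period` and `value`; the grouping itself is the same as in the lubridate version.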

(+1) Perhaps a tweak to `month(.)` would get closer to the OP's "exact" answer. `cut` is such a handy function! – Arun 2013-03-10 00:27:06


Brilliant, thanks very much. I'm trying to limit my learning of aggregation functions to the `plyr` functions, so this is the code I went with: `df$period <- paste(month(df$date, label = T), cut(day(df$date), breaks = c(0, 10, 20, 31)), sep = "-")` `library(plyr)` `ddply(df, .(period), summarize, meanValue = mean(value))` – luciano 2013-03-10 10:03:00

2

This approach, which formats the dates and uses integer division on the day of the month, should be fairly fast:

tapply(df$value, list(format(df$date, "%b"), as.POSIXlt(df$date)$mday %/% 10), mean) 
            0         1        2
Oct -0.560686 -0.727245 -0.73779
Sep -0.464818 -0.630628  0.46759

I'm not sure how it compares with the aggregate approach:

aggregate(df$value, list(format(df$date, "%b"), as.POSIXlt(df$date)$mday %/% 10), mean) 
  Group.1 Group.2         x
1     Oct       0 -0.560686
2     Sep       0 -0.464818
3     Oct       1 -0.727245
4     Sep       1 -0.630628
5     Oct       2 -0.737790
6     Sep       2  0.467590
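
One detail worth checking with this approach is where the bin edges fall. `mday %/% 10` puts day 10 in bin 1, day 20 in bin 2, and days 30 and 31 in a fourth bin 3. That differs from the 1-10, 11-20, 21-31 periods in the question, although the example data happens to avoid those days. The sketch below compares the two groupings over all days of a month. `bin_adjusted` is a hypothetical adjustment, not part of the original answer.

```r
# Every possible day of the month
mday <- 1:31

# Bin index as used in the answer: 1-9 -> 0, 10-19 -> 1, 20-29 -> 2, 30-31 -> 3
bin_simple <- mday %/% 10

# Hypothetical adjustment: 1-10 -> 0, 11-20 -> 1, 21-31 -> 2
bin_adjusted <- pmin((mday - 1) %/% 10, 2)

# Reference bins from cut(), shifted so that they start at 0
bin_cut <- as.integer(cut(mday, breaks = c(0, 10, 20, 31))) - 1

# Count how many days land in each bin under both schemes
print(table(bin_simple))
print(table(bin_adjusted))

# TRUE if the adjusted index agrees with the cut() bins for every day
print(all(bin_adjusted == bin_cut))
```

If the adjusted index is wanted, it could replace `as.POSIXlt(df$date)$mday %/% 10` in either the `tapply` or the `aggregate` call.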