2014-11-24 76 views
1

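R: match and compare values from different vectors

I am calculating hourly average prices from a single price vector. I want to compare each hourly average with the daily average, and remove all values that exceed 2 times the daily average. I have no problem calculating the individual values, but I don't know how to compare the hourly values with the daily values.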

Quick example data:

df <- data.frame(dates = rep(seq(from = as.POSIXct("2013-01-01 00:00:00", tz = "UTC"), 
    to = as.POSIXct("2013-01-30 23:00:00", tz = "UTC"), by = "hour"), 12), 
    price = runif(8640, min = -25, max = 225)) 

library(dplyr) 

# Hourly average: one row per timestamp 
results <- group_by(df, dates) 
results <- summarise(results, 
          average = mean(price)) 

# Daily average: one row per calendar day 
day_results <- mutate(df, days = format(dates, "%Y-%m-%d")) 
day_results <- group_by(day_results, days) 
day_results <- summarise(day_results, 
          average_d = mean(price)) 

I am lost on how to compare the 24 values of average with the single daily value of average_d.

Is it clear what I'm trying to do?

Answer

2

This is simple:

> df %>% group_by(dates) %>% filter(price>2*mean(price)) 
Source: local data frame [811 x 2] 
Groups: dates 

       dates price 
1 2013-01-01 02:00:00 182.4726 
2 2013-01-01 07:00:00 155.5009 
3 2013-01-01 20:00:00 139.6948 
4 2013-01-01 22:00:00 132.3332 
5 2013-01-02 06:00:00 222.0633 
6 2013-01-03 01:00:00 217.6383 
7 2013-01-03 15:00:00 224.7268 
8 2013-01-03 18:00:00 215.8826 

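That is, group the data by dates, then keep only the rows where the price is more than twice the mean within that group. Or, if you want to keep the average price in the output too, do this: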

> df %>% group_by(dates) %>% mutate(average=mean(price)) %>% filter(price > 2*average) %>% arrange(dates) 
Source: local data frame [811 x 3] 
Groups: dates 

       dates price average 
1 2013-01-01 00:00:00 140.5748 70.12211 
2 2013-01-01 00:00:00 201.6484 70.12211 
3 2013-01-01 01:00:00 223.9240 89.91996 
4 2013-01-01 01:00:00 196.5975 89.91996 
5 2013-01-01 01:00:00 203.6165 89.91996 
6 2013-01-01 02:00:00 182.4726 70.85858 
7 2013-01-01 02:00:00 193.0930 70.85858 
8 2013-01-01 02:00:00 177.7848 70.85858 
9 2013-01-01 03:00:00 202.9842 92.84580 
10 2013-01-01 03:00:00 217.1840 92.84580 

This also uses arrange to order the output by date.
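
Note that both pipelines above group by dates, which is the hourly timestamp, so they compare each individual price with its hourly mean and keep the rows that exceed it. If the goal is the one stated in the question (compare each hourly average with the daily average and drop the hourly averages above twice that), a minimal sketch could join the two summaries on the day. The names hourly, daily, compared and kept, and the set.seed call, are illustrative assumptions and not part of the original code:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the toy data is reproducible

# Same toy data as in the question: 720 hourly timestamps, 12 prices each
df <- data.frame(
  dates = rep(seq(from = as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
                  to = as.POSIXct("2013-01-30 23:00:00", tz = "UTC"),
                  by = "hour"), 12),
  price = runif(8640, min = -25, max = 225))

# Hourly average per timestamp, tagged with its calendar day
hourly <- df %>%
  group_by(dates) %>%
  summarise(average = mean(price)) %>%
  mutate(days = format(dates, "%Y-%m-%d", tz = "UTC"))

# Daily average over all prices of that day
daily <- df %>%
  mutate(days = format(dates, "%Y-%m-%d", tz = "UTC")) %>%
  group_by(days) %>%
  summarise(average_d = mean(price))

# Put each hourly average next to the average of its own day
compared <- left_join(hourly, daily, by = "days")

# Drop hourly averages that exceed twice the daily average
kept <- filter(compared, average <= 2 * average_d)
```

With uniformly distributed toy prices like these, few or no hourly averages are likely to exceed twice the daily average, so kept will probably be almost identical to compared; real price data with spikes should behave differently.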


Thanks a lot! This is really neat, I thought this would be a hairy apply function! – NoThanks 2014-11-25 06:42:42
