2017-08-03
3

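Applying a rolling mean to a data frame by index

I want to compute rolling means of the data in a single data frame, separately for each of several IDs. See my sample dataset below.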

date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
      "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08", 
      "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
      "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
      "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10")) 
index <- c("a","a","a","a","a","a","a","a","a","a", 
      "b","b","b","b","b","b","b","b","b","b") 
x <- runif(20,1,100) 
y <- runif(20,50,150) 
z <- runif(20,100,200) 

df <- data.frame(date, index, x, y, z) 

I want to compute the rolling mean of x, y and z for a and then for b.

I tried the following, but it gives an error.

test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA)) 

Error:

Error in xu[k:n] - xu[c(1, seq_len(n - k))] : 
    non-numeric argument to binary operator 

The problem seems to be that index is a character column, but I need it in order to compute the means by group...

Answers

2

This should do the trick, using the dplyr and zoo libraries:

library(dplyr) 
library(zoo) 

df %>% 
    group_by(index) %>% 
    mutate(x_mean = rollmean(x, 5, fill = NA), 
     y_mean = rollmean(y, 5, fill = NA), 
     z_mean = rollmean(z, 5, fill = NA)) 

You could probably use mutate_each, or some other form of mutate, to write this more concisely.

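You can also change the arguments to rollmean to suit your needs, such as align = "right" or na.pad = TRUE (recent zoo versions deprecate na.pad in favor of fill = NA).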
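As a minimal sketch of the more concise form mentioned above, assuming dplyr 1.0.0 or later (where across() supersedes mutate_each), the three mutate lines can be collapsed into one. The df construction and the name res are only there to make the example self-contained, and align = "right" is used purely to illustrate that argument; it changes which rows are NA.

```r
library(dplyr)
library(zoo)

# Self-contained copy of the question's data (assumption: same seed as the
# reproducible input given elsewhere in this thread).
set.seed(123)
df <- data.frame(
  date  = rep(seq(as.Date("2015-02-01"), by = "day", length.out = 10), 2),
  index = rep(c("a", "b"), each = 10),
  x = runif(20, 1, 100),
  y = runif(20, 50, 150),
  z = runif(20, 100, 200)
)

# One rolling mean per column and per group; across() needs dplyr >= 1.0.0.
res <- df %>%
  group_by(index) %>%
  mutate(across(c(x, y, z),
                ~ rollmean(.x, 5, fill = NA, align = "right"),
                .names = "{.col}_mean")) %>%
  ungroup()
```

With a right-aligned window of 5, the first four rows of each group are NA and row 5 holds the mean of rows 1 to 5 of that group.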

3

1) ave. Try ave instead of tapply, and make sure it is applied only to the columns of interest, i.e. columns 3, 4 and 5:

roll <- function(x) rollmean(x, 5, fill = NA) 
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll))) 

giving:

         date index        x         y        z
1  2015-02-01     a       NA        NA       NA
2  2015-02-02     a       NA        NA       NA
3  2015-02-03     a 66.50522 127.45650 129.8472
4  2015-02-04     a 61.71320 123.83633 129.7673
5  2015-02-05     a 56.56125 120.86158 126.1371
6  2015-02-06     a 66.13340 119.93428 127.1819
7  2015-02-07     a 59.56807 105.83208 125.1244
8  2015-02-08     a 49.98779  95.66024 139.2321
9  2015-02-09     a       NA        NA       NA
10 2015-02-10     a       NA        NA       NA
11 2015-02-01     b       NA        NA       NA
12 2015-02-02     b       NA        NA       NA
13 2015-02-03     b 55.71327 117.52219 139.3961
14 2015-02-04     b 54.58450 107.81763 142.6101
15 2015-02-05     b 50.48102 104.94084 136.3167
16 2015-02-06     b 37.89790  95.45489 135.4044
17 2015-02-07     b 33.05259  85.90916 150.8673
18 2015-02-08     b 49.91385  90.04940 147.1376
19 2015-02-09     b       NA        NA       NA
20 2015-02-10     b       NA        NA       NA
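To see why ave is a better fit than tapply here, the following small sketch compares the two on a toy vector. The names v, g, roll3, by_tapply and by_ave are made up for illustration. Note also that in the question's call, FUN = rollmean(df, 5, fill=NA) is a call rather than a function, so it appears to be evaluated on the whole data frame (including the non-numeric columns) before tapply can group anything, which is consistent with the reported error.

```r
library(zoo)

# Toy data (hypothetical): two groups of five values each.
v <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
g <- rep(c("a", "b"), each = 5)

# Pass a function, not the result of calling one.
roll3 <- function(x) rollmean(x, 3, fill = NA)

# tapply returns one element per group, so the result no longer lines up
# row by row with the original data.
by_tapply <- tapply(v, g, roll3)

# ave returns a vector of the same length and order as v, which is what is
# needed to put the result back beside the original columns.
by_ave <- ave(v, g, FUN = roll3)
by_ave
# NA  2  3  4 NA NA 20 30 40 NA
```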

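2) by. Another approach is to use by. roll2 handles a single group, by applies it to each group producing a by list, and do.call("rbind", ...) puts the pieces back together.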

roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA)) 
do.call("rbind", by(df, df$index, roll2)) 

giving:

           date index        x         y        z
a.1  2015-02-01     a       NA        NA       NA
a.2  2015-02-02     a       NA        NA       NA
a.3  2015-02-03     a 66.50522 127.45650 129.8472
a.4  2015-02-04     a 61.71320 123.83633 129.7673
a.5  2015-02-05     a 56.56125 120.86158 126.1371
a.6  2015-02-06     a 66.13340 119.93428 127.1819
a.7  2015-02-07     a 59.56807 105.83208 125.1244
a.8  2015-02-08     a 49.98779  95.66024 139.2321
a.9  2015-02-09     a       NA        NA       NA
a.10 2015-02-10     a       NA        NA       NA
b.11 2015-02-01     b       NA        NA       NA
b.12 2015-02-02     b       NA        NA       NA
b.13 2015-02-03     b 55.71327 117.52219 139.3961
b.14 2015-02-04     b 54.58450 107.81763 142.6101
b.15 2015-02-05     b 50.48102 104.94084 136.3167
b.16 2015-02-06     b 37.89790  95.45489 135.4044
b.17 2015-02-07     b 33.05259  85.90916 150.8673
b.18 2015-02-08     b 49.91385  90.04940 147.1376
b.19 2015-02-09     b       NA        NA       NA
b.20 2015-02-10     b       NA        NA       NA

3) wide form. Another approach is to convert df from long form to wide form, in which case a plain rollmean will do the job.

rollmean(read.zoo(df, split = 2), 5, fill = NA) 

giving:

                x.a       y.a      z.a      x.b       y.b      z.b
2015-02-01       NA        NA       NA       NA        NA       NA
2015-02-02       NA        NA       NA       NA        NA       NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790  95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259  85.90916 150.8673
2015-02-08 49.98779  95.66024 139.2321 49.91385  90.04940 147.1376
2015-02-09       NA        NA       NA       NA        NA       NA
2015-02-10       NA        NA       NA       NA        NA       NA

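This works because the dates are the same in both groups. If the dates differed, converting to wide form could introduce NAs, and rollmean does not handle NAs. In that case use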

rollapply(read.zoo(df, split = 2), 5, mean, fill = NA) 
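The mismatched-dates case can be sketched as follows. This is an illustration under assumptions, not part of the original answer: the series a and b are made up, merge is used instead of read.zoo to build the wide object directly, and passing na.rm = TRUE through to mean is an optional variant for when a window with a gap should still yield a value.

```r
library(zoo)

# Hypothetical data: group b has no observation on 2015-02-03.
d <- as.Date("2015-02-01") + 0:5
a <- zoo(c(1, 2, 3, 4, 5, 6), d)
b <- zoo(c(10, 20, 30, 40, 50), d[-3])

# Merging on the union of the dates introduces an NA in column b.
wide <- merge(a, b)

# As in the answer: a window that contains an NA yields NA.
roll_strict <- rollapply(wide, 3, mean, fill = NA)

# Variant (assumption): extra arguments are passed on to mean, so na.rm = TRUE
# averages whatever values are present in each window.
roll_lenient <- rollapply(wide, 3, mean, na.rm = TRUE, fill = NA)
```

Whether a gap should propagate as NA or be skipped depends on what the rolling mean is meant to represent, so the na.rm variant is a choice rather than a default.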

Note: Because the input is defined using random numbers, we must first call set.seed to make it reproducible. We used this:

set.seed(123) 
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
      "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08", 
      "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
      "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
      "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10")) 
index <- c("a","a","a","a","a","a","a","a","a","a", 
      "b","b","b","b","b","b","b","b","b","b") 
x <- runif(20,1,100) 
y <- runif(20,50,150) 
z <- runif(20,100,200)

df <- data.frame(date, index, x, y, z)