Cumulative mean of specific values

I want to calculate a cumulative mean that only counts values greater than 0. Suppose I have a vector:

v <- c(1, 3, 0, 3, 2, 0)

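The mean would be 9/6 = 1.5, but I only want to average the values that are > 0, so in this case it would be 9/4 = 2.25. That mean, however, is taken over the whole set. I want to compute it as the dataset builds up and accumulates. So, initially, it would be: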

(1+3)/2, (1+3+0)/2, (1+3+0+3)/3, (1+3+0+3+2)/4, (1+3+0+3+2+0)/4
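To pin down the expected values, the sequence above can be written out naively in R. This is only a sketch for checking results, not a suggested solution: it rescans every prefix of the vector, so it gets slow as the data grows. The name `target` is just an illustrative choice, and the first element (1/1) is included here even though the list above starts at the second.

```r
v <- c(1, 3, 0, 3, 2, 0)

# For each prefix v[1:i], average only the entries that are > 0.
target <- sapply(seq_along(v), function(i) {
  prefix <- v[seq_len(i)]
  mean(prefix[prefix > 0])
})
target
#> [1] 1.000000 2.000000 2.000000 2.333333 2.250000 2.250000
```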

My dataset has 9,000 rows and it is growing. I can get cumsum to work and calculate the cumulative sum, but not the cumulative mean of the "successes".

2017-10-07 Kerry

The dplyr package has a cummean function. If you only want the values > 0, select the elements of v where v > 0:

v <- c(1, 3, 0, 3, 2, 0) 

dplyr::cummean(v[v>0]) 
#> [1] 1.000000 2.000000 2.333333 2.250000

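If you want the result repeated at the positions that were skipped, you can combine indexing with a helper function from the zoo package.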

# Create a result vector of the same length as v, filled with NA 
v_res <- v[NA] 
# Fill in the cumulative mean where you want it calculated (here v>0) 
v_res[v>0] <- dplyr::cummean(v[v>0]) 
# Fill the gaps with the previous value 
# (by default na.locf drops leading NAs; pass na.rm = FALSE to keep them) 
zoo::na.locf(v_res) 
#> [1] 1.000000 2.000000 2.000000 2.333333 2.250000 2.250000

It also works with negative values in v:

v <- c(1, 3, 0, 3, -5, 2, 0, -6) 
v_res <- v[NA] 
v_res[v>0] <- dplyr::cummean(v[v>0]) 
zoo::na.locf(v_res) 
#> [1] 1.000000 2.000000 2.000000 2.333333 2.333333 2.250000 2.250000 2.250000
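If you would rather avoid the two package dependencies, the same idea can be sketched in base R. The helper name `cummean_pos` is hypothetical, not a function from any package. It assumes v has no NA values, and it returns NA for any leading positions that come before the first positive value, instead of dropping them.

```r
# Hypothetical helper: cumulative mean of the positive values only,
# carried forward over the non-positive positions.
cummean_pos <- function(v) {
  pos <- v > 0
  # cumulative mean over the positive values alone
  cm <- cumsum(v[pos]) / seq_len(sum(pos))
  # for each position, index of the most recent positive value seen so far
  idx <- cumsum(pos)
  idx[idx == 0] <- NA  # nothing positive seen yet
  cm[idx]
}

cummean_pos(c(1, 3, 0, 3, -5, 2, 0, -6))
cummean_pos(c(0, 1, 3, 0))  # starts with NA, length is preserved
```

On the vector with negative values above, this should reproduce the same values as the dplyr/zoo version.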

You can use the tidyverse too. This solution may be useful if your data is in a data.frame.

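library(dplyr, warn.conflicts = FALSE) 
library(tidyr) 

# tibble() replaces the deprecated data_frame() 
data <- tibble::tibble(v = c(1, 3, 0, 3, 2, 0)) %>% 
    tibble::rowid_to_column() 
res <- data %>% 
    filter(v > 0) %>% 
    mutate(cummean = cummean(v)) %>% 
    right_join(data, by = c("rowid", "v")) %>% 
    # restore the original row order before filling; the join may not preserve it 
    arrange(rowid) %>% 
    fill(cummean) 
res 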
#> # A tibble: 6 x 3 
#> rowid  v cummean 
#> <int> <dbl> <dbl> 
#> 1  1  1 1.000000 
#> 2  2  3 2.000000 
#> 3  3  0 2.000000 
#> 4  4  3 2.333333 
#> 5  5  2 2.250000 
#> 6  6  0 2.250000 
pull(res, cummean)[-1] 
#> [1] 2.000000 2.000000 2.333333 2.250000 2.250000

2017-10-07 06:25:56 cderv

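OK, I see, but that is not a mean as such. '1 + 3 + 0/2' is the sum of three values, so a mean would divide by three. I will update the answer to match the expected result. – cderv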

You can solve this by dividing the cumulative sum of v by the cumulative sum of the logical vector v > 0:

v1 <- cumsum(v)/cumsum(v>0)

which gives:

> v1 
[1] 1.000000 2.000000 2.000000 2.333333 2.250000 2.250000

If you want to omit the first value:

v2 <- (cumsum(v)/cumsum(v>0))[-1]

which gives:

> v2 
[1] 2.000000 2.000000 2.333333 2.250000 2.250000

The latter is equal to the desired result as specified in the question:

> ref <- c((1+3)/2, (1+3+0)/2, (1+3+0+3)/3, (1+3+0+3+2)/4, (1+3+0+3+2+0)/4) 
> identical(v2, ref) 
[1] TRUE
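One caveat worth noting: cumsum(v) adds up every value, so this only matches "mean of the values > 0" when v has no negative entries. A minimal sketch of a variant that masks the non-positive values first is shown below; the names `v_neg`, `masked` and `v3` are illustrative only. If v starts with a non-positive value, the leading results are 0/0, which R reports as NaN.

```r
v_neg <- c(1, 3, 0, 3, -5, 2, 0, -6)

# Zero out everything that is not > 0 before summing,
# so negative values cannot pull the running total down.
masked <- ifelse(v_neg > 0, v_neg, 0)
v3 <- cumsum(masked) / cumsum(v_neg > 0)
v3
```

For this input the result should agree with the dplyr/zoo answer above (1, 2, 2, 2.33, 2.33, 2.25, 2.25, 2.25), whereas the unmasked cumsum(v_neg)/cumsum(v_neg > 0) would not.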

Implemented on a dataset:

# create an example dataset 
df <- data.frame(rn = letters[seq_along(v)], v) 

# calculate the cumulative mean of the 'successes' (values > 0) 
library(dplyr) 
df %>% 
    mutate(succes_cum_mean = cumsum(v)/cumsum(v>0))

this gives:

  rn v succes_cum_mean
1  a 1        1.000000
2  b 3        2.000000
3  c 0        2.000000
4  d 3        2.333333
5  e 2        2.250000
6  f 0        2.250000

2017-10-07 07:06:12 Jaap
