2013-02-11 45 views
0

我想優化下面的代碼。自定義累積和與衰減因子

dim <- c(10000,100) 

m <- matrix(sample(0:10, prod(dim), replace = TRUE), nrow = dim[1], ncol = dim[2]) 

system.time({ 

    output <- matrix(0, nrow = dim[1], ncol = dim[2]) 

    for (i in 1:dim[1]){   
    output[i,1] <- m[i,1]  
    for (j in 2:dim[2]){   
     output[i,j] <- output[i, j-1] * 0.5 + m[i,j]   
    }   
    } 
}) 

概念上講,它是非常類似於一個簡單的累計總和:

system.time({ 

    output <- matrix(0, nrow = dim[1], ncol = dim[2]) 

    for (i in 1:dim[1]){  
     output[i,] <- cumsum(m[i,])   
    } 
}) 

的問題是,代碼的第一部分爲約100倍慢。有沒有辦法建立一個定製版本的cumsum()來實現這個功能?

+0

忽略矩陣的時刻,只集中在第i行,我將其命名爲矢量'rowi',我認爲'mtail < - rowi * 2 ^( - 1 * (0:length(rowi)))'會產生這樣的值,使得'outputi <-cumsum(mtail)'是你想要的。如果沒有人打敗我,我會盡力試驗一個測試案例。 – 2013-02-11 15:15:23

回答

1

您的情況與生成係數爲0.5的AR(1)模型完全相同。您可以使用filter函數來生成數據。 filter也支持更高階的遞歸,卷積或它們的混合(想想ARMA模型)。對於其他卷積,您可能會看到convolve。此外,你可以編譯你的代碼來加速循環。在我的代碼中,編譯循環和未編譯循環代碼分別比濾波器慢111和162倍。

library(compiler) 
library(rbenchmark) 

CustomCumsum<-function(x,alpha){ 
out<-x[1] 
for(i in 2:length(x)) 
    out[i] <- out[i-1]*alpha+x[i] 
out 
} 

compiledCustomCumsum<-cmpfun(CustomCumsum) 

FilterCustomCumsum<-function(x,alpha) as.numeric(filter(x,alpha, method = "recursive")) 

x<-rnorm(1000) 
# Test whether they are the same 
identical(compiledCustomCumsum(x,0.5) , FilterCustomCumsum(x,0.5)) 

benchmark(
CustomCumsum=CustomCumsum(x,0.5),compiledCustomCumsum=compiledCustomCumsum(x,0.5),   FilterCustomCumsum=FilterCustomCumsum(x,0.5) 
) 

輸出:

    test replications elapsed relative user.self sys.self user.child sys.child 
2 compiledCustomCumsum   100 8.89 111.125  8.78  0.01   NA  NA 
1   CustomCumsum   100 13.02 162.750  11.84  0.50   NA  NA 
3 FilterCustomCumsum   100 0.08 1.000  0.08  0.00   NA  NA 
1

C編寫我的自定義累計功能真的比一切快得多:

sign <- signature(x="numeric", n="integer", d="numeric") 

code <- " 
    for (int i=1; i < *n; i++) { 
    x[i] = x[i-1]*d[0] + x[i]; 
    }" 

c_fn <- cfunction(sign, 
        code, 
        convention=".C" 
) 

CCustomCumsum <- function(vector, decay){ 
    c_fn(x=vector, n=length(vector), d=decay)$x 
} 

運行朱利安的基準測試:

x<-rnorm(1000) 
# Test whether they are the same 
identical(compiledCustomCumsum(x,0.5) , FilterCustomCumsum(x,0.5)) 

benchmark(
    CustomCumsum=CustomCumsum(x,0.5), 
    compiledCustomCumsum=compiledCustomCumsum(x,0.5), 
    FilterCustomCumsum=FilterCustomCumsum(x,0.5), 
    CCustomCumsum = CCustomCumsum(x, 0.5) 
) 

, 一世 得到:

    test replications elapsed relative user.self sys.self user.child sys.child 
4  CCustomCumsum   100 0.002  1.0  0.002 0.000   0   0 
2 compiledCustomCumsum   100 0.631 315.5  0.536 0.095   0   0 
1   CustomCumsum   100 0.931 465.5  0.882 0.046   0   0 
3 FilterCustomCumsum   100 0.036  18.0  0.033 0.003   0   0