將t.test應用於大矩陣的每列的最快方法是什麼？

假設我有一個大的矩陣：將t.test應用於大矩陣的每列的最快方法是什麼？

M <- matrix(rnorm(1e7),nrow=20)

進一步假設每列表示的樣本。假設我想對每列應用t.test()，有沒有辦法做到這一點比使用apply()快得多？

apply(M, 2, t.test)

稍微花了不到2分鐘，在我的電腦上運行分析：

> system.time(invisible(apply(M, 2, t.test))) 
user system elapsed 
113.513 0.663 113.519

來源

2012-07-12 Alex

'apply'是非常靈活的功能，因此包含了很多你不需要的東西。用'for'循環手動編寫相同的邏輯可能會提高性能。 – ffriend 2012-07-12 21:49:50

如果你有一個多核機器有使用mclapply使用所有的核心，例如部分漲幅。

> library(multicore) 
> M <- matrix(rnorm(40),nrow=20) 
> x1 <- apply(M, 2, t.test) 
> x2 <- mclapply(1:dim(M)[2], function(i) t.test(M[,i])) 
> all.equal(x1, x2) 
[1] "Component 1: Component 9: 1 string mismatch" "Component 2: Component 9: 1 string mismatch" 
# str(x1) and str(x2) show that the difference is immaterial

這個迷你型的例子表明，事情在我們的計劃。現在擴展：

> M <- matrix(rnorm(1e7), nrow=20) 
> system.time(invisible(apply(M, 2, t.test))) 
    user system elapsed 
101.346 0.626 101.859 
> system.time(invisible(mclapply(1:dim(M)[2], function(i) t.test(M[,i])))) 
    user system elapsed 
55.049 2.527 43.668

這是使用8個虛擬內核。你的旅費可能會改變。這不是一個巨大的收益，但它來自很少的努力。

編輯

如果你只關心t統計本身，提取相應的字段（$statistic）使事情更快一點，特別是在多核情況：

> system.time(invisible(apply(M, 2, function(c) t.test(c)$statistic))) 
    user system elapsed 
80.920 0.437 82.109 
> system.time(invisible(mclapply(1:dim(M)[2], function(i) t.test(M[,i])$statistic))) 
    user system elapsed 
21.246 1.367 24.107

或者甚至更快，直接計算t值

my.t.test <- function(c){ 
    n <- sqrt(length(c)) 
    mean(c)*n/sd(c) 
}

然後

> system.time(invisible(apply(M, 2, function(c) my.t.test(c)))) 
    user system elapsed 
21.371 0.247 21.532 
> system.time(invisible(mclapply(1:dim(M)[2], function(i) my.t.test(M[,i])))) 
    user system elapsed 
144.161 8.658 6.313

來源

2012-07-12 21:31:57 Ryogi

我想我會直接計算t統計量，正如你所顯示的那樣，速度要快得多。 – Alex 2012-07-12 22:28:40

您可以使用genefilter軟件包（在Bioconductor上）的colttests函數做得比這更好。

> library(genefilter) 
> M <- matrix(rnorm(40),nrow=20) 
> my.t.test <- function(c){ 
+ n <- sqrt(length(c)) 
+ mean(c)*n/sd(c) 
+ } 
> x1 <- apply(M, 2, function(c) my.t.test(c)) 
> x2 <- colttests(M, gl(1, nrow(M)))[,"statistic"] 
> all.equal(x1, x2) 
[1] TRUE 
> M <- matrix(rnorm(1e7), nrow=20) 
> system.time(invisible(apply(M, 2, function(c) my.t.test(c)))) 
    user system elapsed 
27.386 0.004 27.445 
> system.time(invisible(colttests(M, gl(1, nrow(M)))[,"statistic"])) 
    user system elapsed 
    0.412 0.000 0.414

價：「計算數千R中同時測試統計的」，SCGN，第18卷（1），2007，http://stat-computing.org/newsletter/issues/scgn-18-1.pdf。

來源

2012-07-13 07:55:55

（+1）很高興知道，謝謝參考。 – chl 2012-07-13 09:48:50

很好的知道。謝謝！！ – Alex 2012-07-17 11:43:40

將t.test應用於大矩陣的每列的最快方法是什麼？

回答

相關問題