parLapply from inside a function copies data to nodes unexpectedly
I have a large list (~30GB) and functions as follows:
cl <- makeCluster(24, outfile = "")

# Case 1: the worker function is passed to parLapply directly
Foo1 <- function(cl, largeList) {
  return(parLapply(cl, largeList, Bar1))
}

Bar1 <- function(listElement) {
  return(nrow(listElement))
}

# Case 2: the worker function is wrapped in an anonymous function created inside Foo2
Foo2 <- function(cl, largeList, arg) {
  clusterExport(cl, list("arg"), envir = environment())
  return(parLapply(cl, largeList, function(x) Bar2(x, arg)))
}

Bar2 <- function(listElement, arg) {
  return(nrow(listElement))
}
This call causes no problems:
Foo1(cl, largeList)
Looking at the memory usage of each process, I can see that only one list element is copied to each node.
However, when calling:
Foo2(cl, largeList, 0)
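a copy of largeList is copied to every node. Stepping through Foo2, the copy of largeList is not made at clusterExport but at parLapply. Also, when I execute the body of Foo2 from the global environment (rather than inside a function), there is no problem. What causes this?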
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 21 (Twenty One)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel splines stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] xts_0.9-7 zoo_1.7-12 snow_0.3-13
[4] Rcpp_0.12.2 randomForest_4.6-12 gbm_2.1.1
[7] lattice_0.20-33 survival_2.38-3 e1071_1.6-7
loaded via a namespace (and not attached):
[1] class_7.3-13 tools_3.2.2 grid_3.2.2
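One possible explanation for the symptoms described above, offered as a sketch rather than a confirmed diagnosis: R serializes a closure together with its enclosing environment, unless that environment is the global environment, which is only referenced. An anonymous function created inside Foo2 would therefore carry Foo2's local variables, including largeList, to each worker. The snippet below measures this without starting a cluster. The helper make_worker_fun and the small stand-in list big are hypothetical names introduced only for illustration.

```r
# Hypothetical helper mirroring Foo2: it returns the kind of anonymous
# function that Foo2 hands to parLapply.
make_worker_fun <- function(largeList, arg) {
  # parLapply(cl, largeList, ...) would evaluate largeList inside Foo2 too,
  # so force the argument here to reproduce that state.
  force(largeList)
  function(x) nrow(x) + arg
}

# Small stand-in for the 30GB list: 50 matrices of 100 x 100 doubles (~4 MB).
big <- replicate(50, matrix(0, 100, 100), simplify = FALSE)

# Closure whose enclosing environment is the function frame holding largeList.
f_inside <- make_worker_fun(big, 0)
size_inside <- length(serialize(f_inside, NULL))

# Same function body, but enclosed by the global environment, which
# serialize() records as a reference instead of copying its contents.
# (f_global is only measured here, never called.)
f_global <- f_inside
environment(f_global) <- globalenv()
size_global <- length(serialize(f_global, NULL))

cat("serialized inside a function:", size_inside, "bytes\n")
cat("serialized with global env:  ", size_global, "bytes\n")
```

If this is indeed the cause, it would also be consistent with the observation that running the body of Foo2 from the global environment shows no extra copying, and that Foo1, which passes a top-level function, does not show it either.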
What operating system are you on, and what does your makeCluster call look like? –
The operating system is Fedora 21. I edited the question to include the makeCluster call and the sessionInfo output. – tmakino
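I believe that, regardless of the operating system, the default cluster type is PSOCK rather than FORK. This is the cluster setup I use in a package: 'if (grepl("Windows", sessionInfo()$running)) { cl <- makeCluster(nnodes, type = "PSOCK") } else { cl <- makeCluster(nnodes, type = "FORK") }' ... Can you confirm whether your cluster type uses forking? –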