2

Memory size limit: I have an R program that combines 10 files, each of which is 296 MB in size, and I have increased the memory limit in R to 8 GB (the size of my RAM) with

--max-mem-size=8192M 

When I run this program, I get an error saying

In type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
    Reached total allocation of 7646Mb: see help(memory.size) 

Here is my R program:

S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt"); 
S2 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_401_800.txt"); 
S3 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_801_1200.txt"); 
S4 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1201_1600.txt"); 
S5 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1601_2000.txt"); 
S6 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2001_2400.txt"); 
S7 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2401_2800.txt"); 
S8 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_2801_3200.txt"); 
S9 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3201_3600.txt"); 
S10 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_3601_4000.txt"); 
options(max.print=154.8E10); 
combine_result <- rbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10) 
write.table(combine_result,file="C:/sim_omega3_1_4000.txt",sep=";", 
      row.names=FALSE,col.names=TRUE, quote = FALSE); 

Can anyone help me with this?

Thanks,

Shruti。


Where, specifically, does the error occur? – 2011-04-21 19:56:41


You do know that the semicolons at the end of each line are not necessary, right? – Benjamin 2011-04-21 20:27:51


If all you are doing is aggregating the files, you might want to try bash or DOS directly. It is easy to search on Google, and this SO question may help: http://stackoverflow.com/questions/4827453/merge-all-files-in-a-directory-into-one-using-bash – Chase 2011-04-21 20:27:53
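A minimal sketch of that OS-level approach, driven from R on Windows via shell() (paths taken from the question; note that copy repeats each file's header line and concatenates wildcard matches in directory order, so treat this only as a rough equivalent):

# Let Windows concatenate the files without ever loading them into R
shell('copy /b "C:\\Sim_Omega3_results\\sim_omega3_*.txt" "C:\\sim_omega3_1_4000.txt"')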

Answers

6

I suggest incorporating the suggestions in ?read.csv2:

Memory usage:

These functions can use a surprising amount of memory when reading 
large files. There is extensive discussion in the ‘R Data 
Import/Export’ manual, supplementing the notes here. 

Less memory will be used if ‘colClasses’ is specified as one of 
the six atomic vector classes. This can be particularly so when 
reading a column that takes many distinct numeric values, as 
storing each distinct value as a character string can take up to 
14 times as much memory as storing it as an integer. 

Using ‘nrows’, even as a mild over-estimate, will help memory 
usage. 

Using ‘comment.char = ""’ will be appreciably faster than the 
‘read.table’ default. 

‘read.table’ is not the right tool for reading large matrices, 
especially those with many columns: it is designed to read _data 
frames_ which may have columns of very different classes. Use 
‘scan’ instead for matrices. 
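Applied to the question's files, that advice might look like the following sketch (the nrows value is a hypothetical over-estimate; the probe read is only there to discover the real column classes):

# Read a few rows first to learn the column classes (hypothetical probe)
probe <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt", nrows = 5)
classes <- sapply(probe, class)

# Reuse those classes and give a mild over-estimate of the row count
S1 <- read.csv2("C:/Sim_Omega3_results/sim_omega3_1_400.txt",
                colClasses = classes, nrows = 500000)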
3

Memory allocation requires contiguous blocks. The size a file takes on disk may not be a good indicator of how large the object will be once loaded into R. You can look at one of these files with the function:

?object.size 

Here is a function I use to see what is taking up the most space in R:

getsizes <- function() {
    z <- sapply(ls(envir = globalenv()),
                function(x) object.size(get(x)))
    as.matrix(rev(sort(z))[1:10])
}
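For example, after the ten read.csv2 calls in the question you could run it to decide what to drop first (a sketch; the output depends on your workspace):

getsizes()   # ten largest objects in the global environment, in bytes
rm(S1)       # remove the biggest once it is no longer needed
gc()         # let R reclaim the freed memory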
1

If you remove(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10) and then call gc() after computing combine_result, you might free enough memory. I have also found that, if you are on Windows, running the script through Rscript seems to give access to more memory than running it through the GUI.
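In the question's program, that suggestion would amount to something like:

combine_result <- rbind(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)
rm(S1, S2, S3, S4, S5, S6, S7, S8, S9, S10)   # drop the ten pieces
gc()                                          # reclaim their memory before writing
write.table(combine_result, file = "C:/sim_omega3_1_4000.txt", sep = ";",
            row.names = FALSE, col.names = TRUE, quote = FALSE)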


Looking at the OP's error message, I guess he did not reach the step where 'combine_result' is computed... – Marek 2011-04-21 21:07:54

1

If these files are in a standard format and you want to do this in R, why read/write CSV at all? Use readLines/writeLines:

files_in <- file.path("C:/Sim_Omega3_results",c(
    "sim_omega3_1_400.txt", 
    "sim_omega3_401_800.txt", 
    "sim_omega3_801_1200.txt", 
    "sim_omega3_1201_1600.txt", 
    "sim_omega3_1601_2000.txt", 
    "sim_omega3_2001_2400.txt", 
    "sim_omega3_2401_2800.txt", 
    "sim_omega3_2801_3200.txt", 
    "sim_omega3_3201_3600.txt", 
    "sim_omega3_3601_4000.txt")) 


# Copy the first file wholesale; it supplies the header line
file.copy(files_in[1], out_file_name <- "C:/sim_omega3_1_4000.txt")
# Append every remaining file, minus its header, to the output
file_out <- file(out_file_name, "at")
for (file_in in files_in[-1]) {
    x <- readLines(file_in)
    writeLines(x[-1], file_out)   # x[-1] drops the repeated header line
}
close(file_out)

Hi Marek, I got an error when I ran the program. The error was: file_out <- file(out_file_name, "at") Error in file(out_file_name, "at"): cannot open the connection In addition: Warning message: In file(out_file_name, "at"): ... Thanks – Shruti 2011-04-21 22:13:46

+0

@Shruti Do you have write permission on 'C:'? You can check with 'file.exists(out_file_name)', which should be 'TRUE', or check whether 'file.copy' actually created 'C:/sim_omega3_1_4000.txt' on disk. Also check that 'file.exists(files_in)' is all 'TRUE'. – Marek 2011-04-22 07:56:20

+0

Thanks. I checked that the files exist and then ran the program; it did not work. I even created a file named out_file_name specifically, but no luck.... – Shruti 2011-04-22 16:35:06