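Read a file into memory, loop through the data, then write to a file

I tried asking a similar question in this post: C: read binary file to memory, alter buffer, write buffer to file, but the answers didn't help me (I'm new to C++, so I couldn't understand all of it).

How do I loop over the data in memory and go through it line by line, so that I can write it to a file in a different format?

This is what I have: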

#include <fstream> 
#include <iostream> 
#include <string> 
#include <sstream> 
#include <vector> 
#include <stdio.h> 
#include <sys/types.h> 
#include <sys/stat.h> 
#include <unistd.h> 
#include <stdlib.h> 

using namespace std; 

int main() 
{ 
    char* buffer; 
    char linearray[250]; 
    int lineposition; 
    double filesize; 
    string linedata; 
    string a; 

    //obtain the file 
    FILE *inputfile; 
    inputfile = fopen("S050508-v3.txt", "r"); 

    //find the filesize 
    fseek(inputfile, 0, SEEK_END); 
    filesize = ftell(inputfile); 
    rewind(inputfile); 

    //load the file into memory 
    buffer = (char*) malloc (sizeof(char)*filesize);  //allocate mem 
    fread (buffer,filesize,1,inputfile);   //read the file to the memory 
    fclose(inputfile); 

    //Check to see if file is correct in Memory 
    cout.write(buffer,filesize); 

    free(buffer); 
}

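I appreciate any help!

Edit (more information on the data):

My data consists of different files that vary between 5 and 10 GB. There are about 300 million lines of data. Each line looks like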

M359

T359 3520 359

M400

A3592名義零增長392

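where the first element is a character and the remaining items can be numbers or characters. I am trying to read it into memory because looping over the lines in memory should be much faster than reading a line, processing it, and then writing it. I am compiling on 64-bit Linux. Let me know if I need to clarify further. Thanks again.

Edit 2: I am using a switch statement to process each line; the first character of each line determines how the rest of the line is formatted. For example, 'M' means milliseconds, and I put the next three numbers into a struct. Each line has a different first character that requires me to do something different.

Source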

2013-02-16 BrianR

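This is a big mix of C++ and C. – 2013-02-16 18:53:59

You can access a pointer just like an array. If you want to go line by line, you should look at ['std::istringstream'](http://en.cppreference.com/w/cpp/io/basic_istringstream). – 2013-02-16 18:56:53

First time I've seen 'cout' and 'malloc' in the same function. – us2012 2013-02-16 19:02:12

So forgive the potentially blatantly obvious, but if you want to process this line by line, then...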

#include <iostream> 
#include <fstream> 
#include <string> 
using namespace std; 

int main(int argc, char *argv[]) 
{ 
    // read lines one at a time 
    ifstream inf("S050508-v3.txt"); 
    string line; 
    while (getline(inf, line)) 
    { 
     // ... process line ... 
    } 
    inf.close(); 

    return 0; 
}

And then just fill in the body of the while loop? Maybe I'm not seeing the real problem (can't see the forest for the trees, sort of).
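As a rough illustration of what the loop body might look like, here is a minimal sketch that dispatches on the first character of each line, as the question's second edit describes. The `Millis` struct, the function names `format_line` and `process_lines`, and both output layouts are invented for this example; only the 'M' and 'T' line shapes come from the sample data in the question.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record for an 'M' line such as "M359" (the number is milliseconds).
struct Millis {
    int value;
};

// Reformat one input line based on its first character.
// The output layouts are assumptions made up for illustration.
std::string format_line(const std::string& line)
{
    if (line.empty())
        return "";

    std::ostringstream out;
    switch (line[0]) {
    case 'M': {
        Millis m{0};
        std::istringstream fields(line.substr(1));
        fields >> m.value;
        out << "ms=" << m.value;
        break;
    }
    case 'T': {
        int a = 0, b = 0, c = 0;
        std::istringstream fields(line.substr(1));
        fields >> a >> b >> c;
        out << "T," << a << ',' << b << ',' << c;
        break;
    }
    default:
        // Unknown record type: pass the line through unchanged.
        out << line;
        break;
    }
    return out.str();
}

// Read lines from `in`, skip empty ones, and write the reformatted lines to `out`.
void process_lines(std::istream& in, std::ostream& out)
{
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty())
            continue;
        out << format_line(line) << '\n';
    }
}
```

Calling `process_lines(inf, some_ofstream)` in place of the while loop would be one way to wire it in. Further record types would each get their own `case`.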

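Edit

The OP is in line for using a custom stream buffer, which may not necessarily be the most portable thing in the world, but he is more interested in avoiding flipping back and forth between the input and output files. With enough RAM, this should do the trick.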

#include <iostream> 
#include <fstream> 
#include <iterator> 
#include <memory>
#include <string>
#include <cstdlib>
using namespace std; 

struct membuf : public std::streambuf 
{ 
    membuf(size_t len) 
     : streambuf() 
     , len(len) 
     , src(new char[ len ]) 
    { 
     setg(src.get(), src.get(), src.get() + len); 
    } 

    // direct buffer access for file load. 
    char * get() { return src.get(); }; 
    size_t size() const { return len; }; 

private: 
    std::unique_ptr<char[]> src; // array form, so delete[] matches new char[]
    size_t len; 
}; 

int main(int argc, char *argv[]) 
{ 
    // open file in binary, retrieve length-by-end-seek
    if (argc < 2)
    {
     cerr << "usage: " << argv[0] << " <file>" << endl;
     return 1;
    }
    ifstream inf(argv[1], ios::in|ios::binary);
    inf.seekg(0,inf.end); 
    size_t len = inf.tellg(); 
    inf.seekg(0, inf.beg); 

    // allocate a stream buffer with an internal block
    // large enough to hold the entire file. 
    membuf mb(len+1); 

    // use our membuf buffer for our file read-op. 
    inf.read(mb.get(), len); 
    mb.get()[len] = 0; 

    // use iss for your nefarious purposes 
    std::istream iss(&mb); 
    std::string s; 
    while (iss >> s) 
     cout << s << endl; 

    return EXIT_SUCCESS; 
}

Source

2013-02-16 19:07:30 WhozCraig

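I've already done it that way, but my files can vary between 5 and 10 GB. The program took about an hour to finish. – BrianR 2013-02-16 19:13:10

@BrianR OK, I think I understand now. You want to avoid thrashing on your likely single-spindle disk system. Since you have enough memory, you want a single contiguous read of the entire source file, then to use that read buffer as a formatted stream source for processing, writing the resulting data to the output in long output operations. Is that accurate? – WhozCraig 2013-02-16 19:43:26

Yes, that is what I'm trying to do! I'm sorry I couldn't word it better. – BrianR 2013-02-16 19:46:12

If I had to do this, I'd probably use code something like this: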

std::ifstream in("S050508-v3.txt"); 

std::ostringstream buffer; // an output string stream, so operator<< is available

buffer << in.rdbuf(); 

std::string data = buffer.str(); 

if (check_for_good_data(data)) 
    std::cout << data;

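This assumes you really need the entire contents of the input file in memory at once to determine whether it should be copied to the output or not. If (for example) you can look at the data one byte at a time and decide whether that byte should be copied without looking at the others, you could do something more like: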

std::ifstream in(...); 

std::copy_if(std::istreambuf_iterator<char>(in), 
      std::istreambuf_iterator<char>(), 
      std::ostream_iterator<char>(std::cout, ""), 
      is_good_char);

...where is_good_char is a function returning bool that says whether a char should be included in the output.
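For concreteness, here is one possible shape for such a predicate. The rule used (keep printable characters and newlines, drop everything else) is purely an assumption for illustration, as are the helper names `filter_stream` and `filter_text`; the real criterion depends on the data.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical predicate: keep printable characters and newlines, drop the rest.
bool is_good_char(char c)
{
    unsigned char uc = static_cast<unsigned char>(c);
    return c == '\n' || std::isprint(uc) != 0;
}

// Copy every "good" character from `in` to `out`, one character at a time.
void filter_stream(std::istream& in, std::ostream& out)
{
    std::copy_if(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>(),
                 std::ostreambuf_iterator<char>(out),
                 is_good_char);
}

// Convenience wrapper for small in-memory inputs.
std::string filter_text(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    filter_stream(in, out);
    return out.str();
}
```

The cast to `unsigned char` matters because passing a negative `char` value to `std::isprint` is undefined behavior.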

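Edit: the size of the files you're dealing with mostly rules out the first possibility I gave above. You're also correct that reading and writing large chunks of data will almost certainly be faster than working one line at a time.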
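One way to get large reads and writes without holding a 5 to 10 GB file in memory at once is to work in big blocks and carry any unfinished line over to the next block. The following is only a sketch of that idea: `transform_line` is a placeholder for the real per-line formatting, and the names `process_in_chunks` and `process_text` are invented here.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Placeholder per-line transformation; replace with the real formatting.
std::string transform_line(const std::string& line)
{
    return "[" + line + "]";
}

// Read `in` in blocks of `chunk_size` bytes, process the complete lines found,
// and write the output once per block rather than once per line.
void process_in_chunks(std::istream& in, std::ostream& out, std::size_t chunk_size)
{
    std::vector<char> block(chunk_size);
    std::string pending;  // partial line carried over from the previous block
    std::string result;   // output accumulated for the current block

    while (in) {
        in.read(block.data(), static_cast<std::streamsize>(block.size()));
        std::streamsize got = in.gcount();
        if (got <= 0)
            break;
        pending.append(block.data(), static_cast<std::size_t>(got));

        std::size_t start = 0;
        std::size_t nl;
        while ((nl = pending.find('\n', start)) != std::string::npos) {
            result += transform_line(pending.substr(start, nl - start));
            result += '\n';
            start = nl + 1;
        }
        pending.erase(0, start);  // keep only the unfinished tail

        out.write(result.data(), static_cast<std::streamsize>(result.size()));
        result.clear();
    }

    // The last line may lack a trailing newline.
    if (!pending.empty()) {
        std::string last = transform_line(pending) + '\n';
        out.write(last.data(), static_cast<std::streamsize>(last.size()));
    }
}

// Convenience wrapper for small in-memory inputs.
std::string process_text(const std::string& text, std::size_t chunk_size)
{
    std::istringstream in(text);
    std::ostringstream out;
    process_in_chunks(in, out, chunk_size);
    return out.str();
}
```

With real files, `in` and `out` would be `std::ifstream` and `std::ofstream` objects opened in binary mode, and `chunk_size` would be something large, such as tens of megabytes. The best value would have to be measured.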

Source

2013-02-16 19:07:56

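You should take a look at fgets and scanf, with which you can pull out matching pieces of data so they're easier to manipulate, assuming that's what you want to do. Something like this might look like: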

FILE *input = fopen("file.txt", "r"); 
FILE *output = fopen("out.txt","w"); 

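const int bufferSize = 64;
char buffer[bufferSize];

// fgets returns NULL (not EOF) at end of file or on error
while(fgets(buffer,bufferSize,input) != NULL){
    char data[16];
    // "%15s" is a placeholder format; adapt it to the actual line layout
    if(sscanf(buffer,"%15s",data) != 1)
        continue; // nothing matched on this line
    //manipulate data
    fprintf(output,"%s",data);
}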
fclose(output); 
fclose(input);

This would be the more C way to do it; C++ handles things a bit more eloquently by using an istream: http://www.cplusplus.com/reference/istream/istream/
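A stream-based counterpart of the loop above might look roughly like this. It is a sketch only: the function name `copy_first_tokens` is made up, and taking the first whitespace-delimited token of each line stands in for whatever the real extraction would be.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Read each line, extract its first whitespace-delimited token, and write
// that token followed by a newline. Lines with no token are skipped.
void copy_first_tokens(std::istream& input, std::ostream& output)
{
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream fields(line);
        std::string data;
        if (fields >> data) {
            // manipulate data here
            output << data << '\n';
        }
    }
}
```

Unlike the fixed-size `char` arrays in the C version, `std::string` grows as needed, so long lines or long tokens are not truncated.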

Source

2013-02-16 19:18:27
