How do I read lines from an input file in parallel when each line is processed independently?

I have just started using OpenMP with C++. My serial C++ code looks like this:

#include <iostream> 
#include <string> 
#include <sstream> 
#include <vector> 
#include <fstream> 
#include <stdlib.h> 

int main(int argc, char* argv[]) { 
    std::string line;  // std:: qualifier is required; there is no using-directive
    std::ifstream inputfile(argv[1]); 

    if(inputfile.is_open()) { 
     while(getline(inputfile, line)) { 
      // Line gets processed and written into an output file 
     } 
    } 
}

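Because each line is processed completely independently, and the input file is on the order of gigabytes, I am trying to parallelize this with OpenMP. My guess is that I first need to get the number of lines in the input file and then parallelize the code as shown below. Can someone help me here?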

#include <iostream> 
#include <string> 
#include <sstream> 
#include <vector> 
#include <fstream> 
#include <stdlib.h> 

#ifdef _OPENMP 
#include <omp.h> 
#endif 

int main(int argc, char* argv[]) { 
    std::string line;  // std:: qualifier is required; there is no using-directive
    std::ifstream inputfile(argv[1]); 

    if(inputfile.is_open()) { 
     //Calculate number of lines in file? 
     //Set an output filename and open an ofstream 
     #pragma omp parallel num_threads(8) 
     { 
      #pragma omp for schedule(dynamic, 1000) 
      for(int i = 0; i < lines_in_file; i++) { 
       //What do I do here? I cannot just read any line because it requires random access 
      } 
     } 
    } 
}

Edit:

Important points

Each line is processed independently
The order of the results does not matter

Source

2010-10-05 Legend

You say each line is independent, but what about the order of the results? – aneccodeal 2010-10-05 01:37:49

@aneccodeal: The order does not matter either, because I will eventually insert this data into a database. – Legend 2010-10-05 01:38:20

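Assuming all lines are (roughly) the same length, you do not need to count the lines (that is expensive; you would have to read the whole file!). You can get the size of the file (seek to the end and see where the pointer is), divide it into eight chunks by byte count, and then move each chunk pointer (except the first one) forward until it reaches a newline. – 2010-10-05 01:38:28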
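The steps in the comment above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the names `split_at_newlines` and `count_lines_in_chunks` are made up for illustration, lines end in `'\n'`, the file is opened in binary mode so byte offsets are reliable, and counting lines stands in for the real per-line work. Each thread opens its own `std::ifstream`, so no stream is shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <utility>
#include <vector>

// A byte range [first, second) that starts at the beginning of a line.
using Range = std::pair<std::streamoff, std::streamoff>;

// Hypothetical helper: cut the file into up to `parts` ranges of roughly
// equal byte size, moving every cut forward to just past the next '\n'.
std::vector<Range> split_at_newlines(const std::string& path, int parts) {
    assert(parts > 0);
    std::vector<Range> ranges;
    std::ifstream in(path, std::ios::binary);
    if (!in) return ranges;

    in.seekg(0, std::ios::end);              // seek to the end ...
    const std::streamoff size = in.tellg();  // ... to learn the file size
    if (size <= 0) return ranges;

    std::vector<std::streamoff> starts{0};
    std::string skipped;
    for (int p = 1; p < parts; ++p) {
        in.clear();                          // reset any EOF flag from the last pass
        in.seekg(size * p / parts);          // rough cut by byte count
        std::getline(in, skipped);           // advance past the next newline
        const std::streamoff start =
            in.eof() ? size : std::streamoff(in.tellg());
        if (start > starts.back() && start < size) starts.push_back(start);
    }
    for (std::size_t i = 0; i < starts.size(); ++i) {
        ranges.emplace_back(starts[i],
                            i + 1 < starts.size() ? starts[i + 1] : size);
    }
    return ranges;
}

// Hypothetical driver: every chunk is read by one thread through its own
// stream. Counting lines is a placeholder for the real processing.
long long count_lines_in_chunks(const std::string& path, int parts) {
    const std::vector<Range> ranges = split_at_newlines(path, parts);
    const int n = static_cast<int>(ranges.size());
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        std::ifstream in(path, std::ios::binary);
        in.seekg(ranges[i].first);
        std::streamoff pos = ranges[i].first;
        std::string line;
        while (pos < ranges[i].second && std::getline(in, line)) {
            pos += static_cast<std::streamoff>(line.size()) + 1;  // +1 for '\n'
            ++total;                         // process(line) would go here
        }
    }
    return total;
}
```

Calling `count_lines_in_chunks(argv[1], 8)` would mirror the eight chunks mentioned in the comment. Without OpenMP enabled at compile time the pragma is ignored and the chunks are simply processed one after another.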

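Not a direct OpenMP answer, but what you are probably looking for is the MapReduce approach. Take a look at Hadoop: it is written in Java, but there is at least some C++ API.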

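In general, you want to process this amount of data on different machines rather than in multiple threads within the same process (virtual address space limits, lack of physical memory, swapping, etc.). Also, the kernel will have to bring in the disk file sequentially anyway (which is what you want; otherwise the hard drive would have to do extra seeks for each of your threads).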
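If you stay on one machine, one way to keep the disk access sequential is to let a single thread do all the reading and only parallelize the processing. The following is a rough sketch of that idea, not something taken from this answer: `process_line` and `process_in_batches` are hypothetical names, summing line lengths stands in for the real work, and the batch size is an arbitrary tuning knob.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real per-line work.
inline std::size_t process_line(const std::string& line) {
    return line.size();
}

// Read the file front to back on one thread, `batch_size` lines at a time,
// and hand each batch to an OpenMP loop. Memory use is bounded by the batch.
std::size_t process_in_batches(const std::string& path, std::size_t batch_size) {
    assert(batch_size > 0);
    std::ifstream in(path, std::ios::binary);
    std::vector<std::string> batch;
    batch.reserve(batch_size);
    std::string line;
    std::size_t total = 0;

    for (;;) {
        // Sequential part: fill the next batch.
        batch.clear();
        while (batch.size() < batch_size && std::getline(in, line)) {
            batch.push_back(std::move(line));
        }
        if (batch.empty()) break;  // nothing left to read

        // Parallel part: lines are independent and order does not matter.
        const long n = static_cast<long>(batch.size());
        #pragma omp parallel for schedule(dynamic, 1000) reduction(+:total)
        for (long i = 0; i < n; ++i) {
            total += process_line(batch[static_cast<std::size_t>(i)]);
        }
    }
    return total;
}
```

This only pays off when processing a line costs noticeably more than reading it; if the job is I/O-bound, the threads will mostly wait for the reader.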

Source

2010-10-05 01:39:55

Thanks for the explanation. What you said makes a lot of sense now. – Legend 2010-10-05 01:52:12
