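Reading a text file

I'd like to know the best way to read a large text file (at least 5 MB) in C++, considering speed and efficiency. Is there a preferred class or function to use, and why?

By the way, I'm running specifically on a UNIX environment.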

2010-01-18 jasonline

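I think you should specify the OS, since how to read quickly is OS-specific. For example, Windows allows memory-mapped files. – 2010-01-18 02:41:17

The answer also depends on what you plan to do with the text. Unix has memory-mapped files too. – Omnifarious 2010-01-18 02:54:19

Unless this is homework or a project that requires C++, don't reinvent the wheel on Linux. There are plenty of tools (written in C/C++) that read files, such as grep and awk. If you still want to do it in C/C++, you can check their source to see how it's done. – ghostdog74 2010-01-18 02:56:44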

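The stream classes (ifstream) actually do a good job of this. Assuming you aren't otherwise restricted, make sure to turn off sync_with_stdio (in ios_base::). You can use getline() to read directly into std::strings, though from a performance perspective a fixed buffer as a char* (a vector of chars or an old-school char[]) may be faster, at the cost of higher risk and complexity.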
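To make the stream approach concrete, here is a minimal sketch that reads a file line by line with std::getline. The function name count_lines is only an illustration, and the per-line work is a placeholder.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Counts the lines in a text file using std::ifstream and std::getline.
// Returns 0 if the file cannot be opened.
std::size_t count_lines(const std::string& path) {
    std::ifstream in(path);
    if (!in) return 0;

    std::size_t lines = 0;
    std::string line;  // reused on every iteration, so reallocations are rare
    while (std::getline(in, line)) {
        ++lines;       // placeholder: replace with the real per-line processing
    }
    return lines;
}
```

Note that std::ios_base::sync_with_stdio(false) affects the standard streams (std::cin, std::cout and friends), not the ifstream itself. It matters if the program also reads or writes through those, and it is normally called once at the top of main before any other I/O.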

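If you're willing to play games with page size calculations and the like, you can go the mmap route. I'd probably build it out first using the stream classes and see if that's good enough.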

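Depending on what you're doing with each line of data, you may find that your processing routines are the optimization point rather than the I/O.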

2010-01-18 02:43:10 Joe

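What advantage does ifstream have over fread()? – jasonline 2010-01-18 02:55:28

Performance-wise, I'd expect them to be about the same. In terms of code maintenance, I'd rather deal with the stream classes. – Joe 2010-01-18 03:37:22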

Use old-style file I/O.

fopen the file for binary read 
fseek to the end of the file 
ftell to find out how many bytes are in the file. 
malloc a chunk of memory to hold all of the bytes + 1 
set the extra byte at the end of the buffer to NUL. 
fread the entire file into memory. 
create a vector of const char * 
push_back the address of the first byte into the vector. 
repeatedly 
    strstr - search the memory block for the carriage control character(s). 
    put a NUL at the found position 
    move past the carriage control characters 
    push_back that address into the vector 
until all of the text in the buffer has been processed. 

---------------- 
use the vector to find the strings, 
and process as needed. 
when done, free the memory block 
and the vector should self-destruct.

2010-01-18 03:19:42 EvilTeach

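Is it better than the stream classes? – jasonline 2010-01-18 03:22:24

Old-style file I/O is isomorphic with streams. You can do it either way. What matters is slurping the file in one go and then parsing out the strings. – EvilTeach 2010-01-18 04:08:14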

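If the text file stores integers, floats, and small strings, my experience is that FILE, fopen, and fscanf are fast enough, and you get the numbers directly. I think memory mapping is the fastest, but it requires you to write code to parse the file, which is extra work.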

2010-01-18 03:34:50
