2009-06-30 185 views
54

我必須在文本[csv]文件中寫入大量數據。我使用BufferedWriter寫入數據,寫入174 MB數據需要大約40秒。這是java可以提供的最快速度嗎?在文本文件中寫入大量數據的最快方法Java

bufferedWriter = new BufferedWriter (new FileWriter ("fileName.csv")); 

注:這40秒包括迭代和取出由結果集的記錄,以及時間。 :)。結果集中174 MB用於400000行。

+3

你不會碰巧有抗病毒的計算機上的活動,你運行該代碼? – 2011-09-18 19:24:42

回答

87

你可能會嘗試刪除BufferedWriter並直接使用FileWriter。在現代系統中,無論如何,您只需寫入驅動器的高速緩存即可。

我需要4-5秒的時間來編寫175MB(400萬字符串) - 這是一個雙核2.4GHz戴爾,運行帶有80GB,7200轉日立磁盤的Windows XP。

你能隔離多少時間是記錄檢索和文件寫入多少?

import java.io.BufferedWriter; 
import java.io.File; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.io.Writer; 
import java.util.ArrayList; 
import java.util.List; 

public class FileWritingPerfTest { 


private static final int ITERATIONS = 5; 
private static final double MEG = (Math.pow(1024, 2)); 
private static final int RECORD_COUNT = 4000000; 
private static final String RECORD = "Help I am trapped in a fortune cookie factory\n"; 
private static final int RECSIZE = RECORD.getBytes().length; 

public static void main(String[] args) throws Exception { 
    List<String> records = new ArrayList<String>(RECORD_COUNT); 
    int size = 0; 
    for (int i = 0; i < RECORD_COUNT; i++) { 
     records.add(RECORD); 
     size += RECSIZE; 
    } 
    System.out.println(records.size() + " 'records'"); 
    System.out.println(size/MEG + " MB"); 

    for (int i = 0; i < ITERATIONS; i++) { 
     System.out.println("\nIteration " + i); 

     writeRaw(records); 
     writeBuffered(records, 8192); 
     writeBuffered(records, (int) MEG); 
     writeBuffered(records, 4 * (int) MEG); 
    } 
} 

private static void writeRaw(List<String> records) throws IOException { 
    File file = File.createTempFile("foo", ".txt"); 
    try { 
     FileWriter writer = new FileWriter(file); 
     System.out.print("Writing raw... "); 
     write(records, writer); 
    } finally { 
     // comment this out if you want to inspect the files afterward 
     file.delete(); 
    } 
} 

private static void writeBuffered(List<String> records, int bufSize) throws IOException { 
    File file = File.createTempFile("foo", ".txt"); 
    try { 
     FileWriter writer = new FileWriter(file); 
     BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize); 

     System.out.print("Writing buffered (buffer size: " + bufSize + ")... "); 
     write(records, bufferedWriter); 
    } finally { 
     // comment this out if you want to inspect the files afterward 
     file.delete(); 
    } 
} 

private static void write(List<String> records, Writer writer) throws IOException { 
    long start = System.currentTimeMillis(); 
    for (String record: records) { 
     writer.write(record); 
    } 
    writer.flush(); 
    writer.close(); 
    long end = System.currentTimeMillis(); 
    System.out.println((end - start)/1000f + " seconds"); 
} 
} 
+2

@rozario每個寫入調用應該只產生約175MB,然後刪除它自己。如果不是這樣,你最終會得到175MB×4個不同的寫入調用×5次迭代= 3.5GB的數據。你可以檢查file.delete()的返回值,如果它是false,則拋出一個異常。 – 2011-04-14 16:42:14

+0

請注意``writer.flush()`在這種情況下不是必需的,因爲`writer.close()`[刷新內存](http://docs.oracle.com/javase/7/docs/api/java/io /BufferedWriter.html)隱含性。順便說一下:最佳做法建議使用[嘗試資源關閉](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html)而不是明確調用close()。 – 2015-04-25 16:53:35

+2

FWIW,這是爲Java 5編寫的,至少沒有記錄在最後,但沒有嘗試使用資源。它可能會使用更新。 – 2015-04-27 20:48:37

4

您的傳輸速度可能不會受到Java的限制。相反,我會懷疑(排名不分先後)

  1. 轉移從數據庫
  2. 轉移到磁盤

的速度的速度。如果你讀了完整的數據集,然後將它寫出來到磁盤,那麼這將花費更長的時間,因爲JVM將不得不分配內存,並且db rea/disk write將會按順序發生。相反,我會寫出緩衝的作家爲你從數據庫中做的每一個讀取,因此操作將更接近併發的(我不知道你是否在做這個或不)

28

嘗試內存映射文件(需要300米/ s到174MB寫在我的M/C,Core 2 Duo處理器,2.5GB RAM):

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes(); 
int number_of_lines = 400000; 

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel(); 
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines); 
for (int i = 0; i < number_of_lines; i++) 
{ 
    wrBuf.put(buffer); 
} 
rwChannel.close(); 
+0

什麼是aMessage.length()意味着什麼時候你實例化ByteBuffer? – Hotel 2012-09-27 19:39:30

14

僅用於統計的緣故:

的機器是舊的戴爾新的SSD

CPU:英特爾奔騰d 2,8千兆赫

SSD:愛國者地獄120GB SSD

4000000 'records' 
175.47607421875 MB 

Iteration 0 
Writing raw... 3.547 seconds 
Writing buffered (buffer size: 8192)... 2.625 seconds 
Writing buffered (buffer size: 1048576)... 2.203 seconds 
Writing buffered (buffer size: 4194304)... 2.312 seconds 

Iteration 1 
Writing raw... 2.922 seconds 
Writing buffered (buffer size: 8192)... 2.406 seconds 
Writing buffered (buffer size: 1048576)... 2.015 seconds 
Writing buffered (buffer size: 4194304)... 2.282 seconds 

Iteration 2 
Writing raw... 2.828 seconds 
Writing buffered (buffer size: 8192)... 2.109 seconds 
Writing buffered (buffer size: 1048576)... 2.078 seconds 
Writing buffered (buffer size: 4194304)... 2.015 seconds 

Iteration 3 
Writing raw... 3.187 seconds 
Writing buffered (buffer size: 8192)... 2.109 seconds 
Writing buffered (buffer size: 1048576)... 2.094 seconds 
Writing buffered (buffer size: 4194304)... 2.031 seconds 

Iteration 4 
Writing raw... 3.093 seconds 
Writing buffered (buffer size: 8192)... 2.141 seconds 
Writing buffered (buffer size: 1048576)... 2.063 seconds 
Writing buffered (buffer size: 4194304)... 2.016 seconds 

正如我們可以看到的原始方法是慢的緩衝。

1

package all.is.well; 
 
import java.io.IOException; 
 
import java.io.RandomAccessFile; 
 
import java.util.concurrent.ExecutorService; 
 
import java.util.concurrent.Executors; 
 
import junit.framework.TestCase; 
 

 
/** 
 
* @author Naresh Bhabat 
 
* 
 
Following implementation helps to deal with extra large files in java. 
 
This program is tested for dealing with 2GB input file. 
 
There are some points where extra logic can be added in future. 
 

 

 
Pleasenote: if we want to deal with binary input file, then instead of reading line,we need to read bytes from read file object. 
 

 

 

 
It uses random access file,which is almost like streaming API. 
 

 

 
* **************************************** 
 
Notes regarding executor framework and its readings. 
 
Please note :ExecutorService executor = Executors.newFixedThreadPool(10); 
 

 
* \t for 10 threads:Total time required for reading and writing the text in 
 
*   :seconds 349.317 
 
* 
 
*   For 100:Total time required for reading the text and writing : seconds 464.042 
 
* 
 
*   For 1000 : Total time required for reading and writing text :466.538 
 
*   For 10000 Total time required for reading and writing in seconds 479.701 
 
* 
 
* 
 
*/ 
 
public class DealWithHugeRecordsinFile extends TestCase { 
 

 
\t static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt"; 
 
\t static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt"; 
 
\t static volatile RandomAccessFile fileToWrite; 
 
\t static volatile RandomAccessFile file; 
 
\t static volatile String fileContentsIter; 
 
\t static volatile int position = 0; 
 

 
\t public static void main(String[] args) throws IOException, InterruptedException { 
 
\t \t long currentTimeMillis = System.currentTimeMillis(); 
 

 
\t \t try { 
 
\t \t \t fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw");//for random write,independent of thread obstacles 
 
\t \t \t file = new RandomAccessFile(FILEPATH, "r");//for random read,independent of thread obstacles 
 
\t \t \t seriouslyReadProcessAndWriteAsynch(); 
 

 
\t \t } catch (IOException e) { 
 
\t \t \t // TODO Auto-generated catch block 
 
\t \t \t e.printStackTrace(); 
 
\t \t } 
 
\t \t Thread currentThread = Thread.currentThread(); 
 
\t \t System.out.println(currentThread.getName()); 
 
\t \t long currentTimeMillis2 = System.currentTimeMillis(); 
 
\t \t double time_seconds = (currentTimeMillis2 - currentTimeMillis)/1000.0; 
 
\t \t System.out.println("Total time required for reading the text in seconds " + time_seconds); 
 

 
\t } 
 

 
\t /** 
 
\t * @throws IOException 
 
\t * Something asynchronously serious 
 
\t */ 
 
\t public static void seriouslyReadProcessAndWriteAsynch() throws IOException { 
 
\t \t ExecutorService executor = Executors.newFixedThreadPool(10);//pls see for explanation in comments section of the class 
 
\t \t while (true) { 
 
\t \t \t String readLine = file.readLine(); 
 
\t \t \t if (readLine == null) { 
 
\t \t \t \t break; 
 
\t \t \t } 
 
\t \t \t Runnable genuineWorker = new Runnable() { 
 
\t \t \t \t @Override 
 
\t \t \t \t public void run() { 
 
\t \t \t \t \t // do hard processing here in this thread,i have consumed 
 
\t \t \t \t \t // some time and eat some exception in write method. 
 
\t \t \t \t \t writeToFile(FILEPATH_WRITE, readLine); 
 
\t \t \t \t \t // System.out.println(" :" + 
 
\t \t \t \t \t // Thread.currentThread().getName()); 
 

 
\t \t \t \t } 
 
\t \t \t }; 
 
\t \t \t executor.execute(genuineWorker); 
 
\t \t } 
 
\t \t executor.shutdown(); 
 
\t \t while (!executor.isTerminated()) { 
 
\t \t } 
 
\t \t System.out.println("Finished all threads"); 
 
\t \t file.close(); 
 
\t \t fileToWrite.close(); 
 
\t } 
 

 
\t /** 
 
\t * @param filePath 
 
\t * @param data 
 
\t * @param position 
 
\t */ 
 
\t private static void writeToFile(String filePath, String data) { 
 
\t \t try { 
 
\t \t \t // fileToWrite.seek(position); 
 
\t \t \t data = "\n" + data; 
 
\t \t \t if (!data.contains("Randomization")) { 
 
\t \t \t \t return; 
 
\t \t \t } 
 
\t \t \t System.out.println("Let us do something time consuming to make this thread busy"+(position++) + " :" + data); 
 
\t \t \t System.out.println("Lets consume through this loop"); 
 
\t \t \t int i=1000; 
 
\t \t \t while(i>0){ 
 
\t \t \t 
 
\t \t \t \t i--; 
 
\t \t \t } 
 
\t \t \t fileToWrite.write(data.getBytes()); 
 
\t \t \t throw new Exception(); 
 
\t \t } catch (Exception exception) { 
 
\t \t \t System.out.println("exception was thrown but still we are able to proceeed further" 
 
\t \t \t \t \t + " \n This can be used for marking failure of the records"); 
 
\t \t \t //exception.printStackTrace(); 
 

 
\t \t } 
 

 
\t } 
 
}

相關問題