Reading a large file in Java is too slow and fails with "GC overhead limit exceeded"

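I have a large file (about 3 GB) that I read into an ArrayList. When I run the code below, it becomes very slow after a few minutes and CPU usage is high. A few minutes later the Eclipse console shows the error java.lang.OutOfMemoryError: GC overhead limit exceeded.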

OS: Windows 2008 R2,
4 CPUs,
32 GB memory
Java version "1.7.0_60"

eclipse.ini:

-startup 
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar 
--launcher.library 
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212 
-product 
org.eclipse.epp.package.standard.product 
--launcher.defaultAction 
openFile 
#--launcher.XXMaxPermSize 
#256M 
-showsplash 
org.eclipse.platform 
#--launcher.XXMaxPermSize 
#256m 
--launcher.defaultAction 
openFile 
--launcher.appendVmargs 
-vmargs 
-Dosgi.requiredJavaVersion=1.6 
-Xms10G 
-Xmx10G 
-XX:+UseParallelGC 
-XX:ParallelGCThreads=24 
-XX:MaxGCPauseMillis=1000 
-XX:+UseAdaptiveSizePolicy

Java code:

// `matcher` and `al` (the ArrayList) are declared elsewhere and not shown in the post.
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("/words/wordlist.dat")));
InputStreamReader isr = new InputStreamReader(bis, "utf-8");
BufferedReader in = new BufferedReader(isr, 1024 * 1024 * 512); // buffer size is given in chars

String strTemp = null;
long ind = 0;

while ((strTemp = in.readLine()) != null) {
    matcher.reset(strTemp);

    // keep every line that contains "$"
    if (strTemp.contains("$")) {
        al.add(strTemp);
        strTemp = null;
    }
    ind = ind + 1;
    // progress output every 100,000 lines
    if (ind % 100000 == 0) {
        System.out.println(ind + " 100,000 +");
    }
}
in.close();

My use case:

neural network 
java 
oracle 
solaris 
quick sort 
apple 
green fluorescent protein 
acm 
trs

Source

2016-02-27 pangjiale

Could you elaborate on your use case? Why do you need a 3 GB file in memory? – Mahendra

Does the whole file really need to be loaded into memory? – Devavrata

You can temporarily work around this problem by setting '-XX:-UseGCOverheadLimit' in the Eclipse configuration: [disable-the-usegcoverheadlimit-in-centos](http://stackoverflow.com/questions/18934146/disable-the-usegcoverheadlimit-in-centos) – Mahendra
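
Note that, as far as I know, the -vmargs section of eclipse.ini configures the JVM that runs the Eclipse IDE itself, while a program started from Eclipse usually runs in its own JVM that takes its options from the VM arguments of the run configuration. A minimal sketch for checking which options and how much heap the program really gets; the class name JvmSettings is made up for illustration:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of the question's code.
class JvmSettings {

    // Maximum heap the current JVM will try to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Options passed to this JVM, e.g. -Xmx10G or -XX:-UseGCOverheadLimit.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

If the printed arguments do not contain the expected -Xmx or -XX options, they were probably set for the IDE rather than for the program.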

"writing a program in java to get statistics on how many times the keyword were found in the search word log list"

I suggest you do just that. Create a map that counts the occurrences of the keywords, or of all words.
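
A minimal Java 7 compatible sketch of that idea (the question uses Java 1.7.0_60). It assumes one search phrase per line and counts identical lines; the class name KeywordCounter, the method countLines and the in-memory sample are hypothetical, not from the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; names are illustrative only.
class KeywordCounter {

    // Reads line by line and keeps one counter per distinct (trimmed) line.
    // Assumption: each line of the log holds exactly one search phrase.
    static Map<String, Integer> countLines(BufferedReader in) throws IOException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        String line;
        while ((line = in.readLine()) != null) {
            String key = line.trim();
            if (key.isEmpty()) {
                continue; // skip blank lines
            }
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With an argument, read that file as UTF-8; without one, use a tiny made-up sample.
        try (BufferedReader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new BufferedReader(new StringReader("java\noracle\njava\n"))) {
            Map<String, Integer> counts = countLines(in);
            System.out.println(counts.size() + " distinct entries");
        }
    }
}
```

Only one counter per distinct phrase stays in memory, so this helps when phrases repeat often; if nearly every line is unique, the map would still grow close to the size of the file.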

With Java 8 streams, you can do this in a line or two without loading the whole file into memory at once.

try (Stream<String> s = Files.lines(Paths.get("filename"))) { 
    Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +"))) 
      .collect(Collectors.groupingBy(w -> w, Collectors.counting())); 
}
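
For reference, a self-contained version of the snippet above, as a sketch: the class name WordCount, the method countWords and the fallback sample lines are assumptions added for illustration. It needs Java 8 or later, whereas the question mentions Java 1.7.0_60.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class; names are illustrative only.
class WordCount {

    // Splits every line on runs of spaces and counts each word.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
            .flatMap(line -> Stream.of(line.trim().split(" +")))
            .filter(word -> !word.isEmpty()) // blank lines would otherwise yield an empty "word"
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No file given: run on a few made-up sample lines.
            System.out.println(countWords(Stream.of("neural network", "java", "java oracle")));
            return;
        }
        // Files.lines reads lazily (UTF-8 by default), so the file is never fully in memory.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            Map<String, Long> counts = countWords(lines);
            System.out.println(counts.size() + " distinct words");
        }
    }
}
```

The result map is returned from the method, so it can still be used after the stream has been closed.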

Source

2016-02-27 12:07:23
