Lucene: creating documents in a while loop gets slower and slower

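I have an efficiency problem. I am developing an enterprise application that is deployed as an EAR archive on a JBoss EAP 6.1 server. In a while loop I create new objects based on entities and write them to a file. I fetch these entities in limited amounts (for example, 2000 per step) with the help of an EJB DAO. The problem is that I need to process millions of objects: the first million rows go smoothly, but the further the loop progresses, the slower it gets. Can anyone tell me why this gets slower and slower as the loop progresses? How can I make it run smoothly? Here are some key parts of the code: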

public void createFullIndex(int stepSize) {
    int logsNumber = systemLogDao.getSystemLogsNumber();
    int counter = 0;
    // Fetch and index the logs in batches of stepSize.
    while (counter < logsNumber) {
        for (SystemLogEntity systemLogEntity : systemLogDao.getLimitedSystemLogs(counter, stepSize)) {
            addDocument(systemLogEntity);
        }
        counter = counter + stepSize;
    }
    // The index is committed once, after every batch has been added.
    commitIndex();
}

public void addDocument(SystemLogEntity systemLogEntity) {
    try {
        Document document = new Document();
        document.add(new NumericField("id", Field.Store.YES, true).setIntValue(systemLogEntity.getId()));
        document.add(new Field("resource", (systemLogEntity.getResource() == null ? "" : systemLogEntity
                .getResource().getResourceCode()), Field.Store.YES, Field.Index.ANALYZED));
        // NOTE: the posted code is cut off in the middle of the next statement.
        // The toString() call and the field options are an assumed completion,
        // modelled on the neighbouring fields; the original accessor is unknown.
        document.add(new Field("operationType", (systemLogEntity.getOperationType() == null ? "" : systemLogEntity
                .getOperationType().toString()), Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("comment",
                (systemLogEntity.getComment() == null ? "" : systemLogEntity.getComment()), Field.Store.YES,
                Field.Index.ANALYZED));
        indexWriter.addDocument(document);
    } catch (CorruptIndexException e) {
        LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
    } catch (IOException e) {
        LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
    }
}

I am hoping for your help!

Source

2014-09-02 AjMeen

Have you looked at your heap statistics? – 2014-09-02 12:18:31

@HotLicks I thought about it, but to be honest I do not really know how to do that. – AjMeen 2014-09-02 12:22:27
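One way to look at heap statistics without extra tooling is to log the JVM's own numbers once per batch. The following is only a minimal sketch using the standard `java.lang.management` API; the class name `HeapStats`, the method names, and the simulated batch loop are made up for illustration and are not part of the question's code. If the used heap keeps climbing from batch to batch and never drops back, something is holding on to the processed objects.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper; none of these names come from the question.
class HeapStats {

    // Used heap in MiB, as reported by the JVM's memory MXBean.
    static long usedHeapMiB() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    // One log line per batch; getMax() returns -1 when no maximum is defined.
    static String describe(int processed) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long maxBytes = heap.getMax();
        long maxMiB = maxBytes < 0 ? -1 : maxBytes / (1024 * 1024);
        return String.format("processed=%d usedHeap=%d MiB maxHeap=%d MiB",
                processed, usedHeapMiB(), maxMiB);
    }

    public static void main(String[] args) {
        int stepSize = 2000;
        // Simulated batch loop; the real loop would index one batch per iteration.
        for (int counter = 0; counter < 10000; counter += stepSize) {
            System.out.println(describe(counter + stepSize));
        }
    }
}
```

In the loop from the question, a call like `describe(counter)` could be logged right after `counter = counter + stepSize;`. External tools such as jstat or VisualVM show the same figures from outside the process.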

What is `indexWriter`? It looks like you are adding all the documents to it, and that it keeps references to them and holds them in memory. – 2014-09-02 12:41:05

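As far as I can see, you do not write your data to the file as soon as you get it. Instead, you try to build the complete DOM object and only then flush it to the file. That strategy is fine for a limited number of objects. In your case, where you have to deal with millions (as you said), you should not use DOM. Instead, you should create XML fragments and write them to the file as you receive the data. This will reduce your memory consumption and hopefully improve performance.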
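The idea of writing fragments as the data arrives could be sketched roughly as follows, assuming the output really is XML. This uses the JDK's StAX `XMLStreamWriter`; the class `StreamingLogExport`, the `fetchBatch` stand-in for the DAO call, and the element names are all hypothetical. Only one batch is in memory at a time, and each batch is flushed before the next one is fetched.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch; none of these names come from the question.
class StreamingLogExport {

    // Stand-in for the DAO call: at most `limit` fake rows starting at `offset`.
    static List<String> fetchBatch(int offset, int limit, int total) {
        List<String> batch = new ArrayList<>();
        for (int i = offset; i < Math.min(offset + limit, total); i++) {
            batch.add("comment " + i);
        }
        return batch;
    }

    // Writes each row as soon as its batch arrives instead of building a full DOM.
    static void export(Writer target, int total, int stepSize) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(target);
        xml.writeStartDocument();
        xml.writeStartElement("logs");
        for (int offset = 0; offset < total; offset += stepSize) {
            int id = offset;
            for (String comment : fetchBatch(offset, stepSize, total)) {
                xml.writeStartElement("log");
                xml.writeAttribute("id", Integer.toString(id++));
                xml.writeCharacters(comment);
                xml.writeEndElement();
            }
            xml.flush(); // push the finished batch out rather than accumulating it
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        // A StringWriter keeps the demo self-contained; a buffered file writer
        // would take its place when writing to disk.
        StringWriter out = new StringWriter();
        export(out, 5, 2);
        System.out.println(out);
    }
}
```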

Source

2014-09-02 12:24:51 AlexR

I think this is the suggestion with the greatest impact. Thanks! – AjMeen 2014-09-02 14:24:54

You are welcome. Good luck. – AlexR 2014-09-02 15:03:24

Logging should be easy. Appending to a text file with Guava looks like this:

import com.google.common.base.Charsets;
import com.google.common.io.Files;
import java.io.File;
File to = new File("C:/Logs/log.txt");
CharSequence from = "Your data as string\n";
Files.append(from, to, Charsets.UTF_8);

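I have a few remarks:

I do not know whether your log entities are being garbage collected.
It is not clear whether the contents of the file are kept in memory.
If the log is in XML format, the whole XML DOM may need to be parsed every time a new element is added.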

Source

2014-09-02 12:34:39 Margus

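I would try re-using the Document object. I once had a loop problem that was related to garbage collection: the loop ran too fast for the GC to reasonably keep up, and re-using objects solved all my problems. I have not tried re-using the Document object myself, but if it is possible, it may work for you.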
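The general reuse pattern might look something like the sketch below. It deliberately uses plain Java rather than Lucene types: `MutableRecord` stands in for a document whose field values can be overwritten, and `Sink` stands in for the writer. Both are hypothetical, and whether the same approach carries over depends on the Lucene version letting field values be changed in place. The one requirement is that the consumer finishes with the record before the next iteration overwrites it.

```java
// Hypothetical sketch of object reuse; these types are not Lucene classes.
class ReuseSketch {

    // Mutable holder that is refilled for every row instead of being re-created.
    static final class MutableRecord {
        int id;
        final StringBuilder comment = new StringBuilder();

        void set(int id, String text) {
            this.id = id;
            comment.setLength(0); // clear the previous row's text, keep the buffer
            comment.append(text);
        }
    }

    // Stand-in for the writer; it must consume the record before returning,
    // because the same instance is overwritten on the next iteration.
    interface Sink {
        void accept(MutableRecord record);
    }

    // Processes `total` rows with a single record instance; returns the row count.
    static int indexAll(int total, Sink sink) {
        MutableRecord record = new MutableRecord(); // allocated once, outside the loop
        for (int i = 0; i < total; i++) {
            record.set(i, "comment " + i);
            sink.accept(record);
        }
        return total;
    }

    public static void main(String[] args) {
        indexAll(3, r -> System.out.println(r.id + ": " + r.comment));
    }
}
```

Reuse like this only reduces allocation pressure. It does not help if something downstream keeps a reference to every processed row.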

Source

2014-09-02 12:35:20 Kieveli

Thanks, that is a reasonable tip! +1 – AjMeen 2014-09-02 14:26:26
