2016-11-30 338 views

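Apache POI SAX parsing - how to get the actual value of a cell

I need to parse very large Excel files with Apache POI under tight memory limits. After searching Google, I learned that POI offers a SAX-based parser that handles large files efficiently without consuming much memory.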

Apache POI SAX Parser example

// Inner class: `output` and `minColumns` are fields of the enclosing class, which is not shown here.
private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow = false;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
        for (int i = 0; i < number; i++) {
            for (int j = 0; j < minColumns; j++) {
                output.append(',');
            }
            output.append('\n');
        }
    }

    @Override
    public void startRow(int rowNum) {
        // If there were gaps, output the missing rows
        outputMissingRows(rowNum - currentRow - 1);
        // Prepare for this row
        firstCellOfRow = true;
        currentRow = rowNum;
        currentCol = -1;
    }

    @Override
    public void endRow(int rowNum) {
        // Ensure the minimum number of columns
        for (int i = currentCol; i < minColumns; i++) {
            output.append(',');
        }
        output.append('\n');
    }

    @Override
    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
        if (firstCellOfRow) {
            firstCellOfRow = false;
        } else {
            output.append(',');
        }

        // Gracefully handle a missing cell reference, in a similar way to XSSFCell
        if (cellReference == null) {
            cellReference = new CellAddress(currentRow, currentCol).formatAsString();
        }

        // Did we miss any cells?
        int thisCol = (new CellReference(cellReference)).getCol();
        int missedCols = thisCol - currentCol - 1;
        for (int i = 0; i < missedCols; i++) {
            output.append(',');
        }
        currentCol = thisCol;

        // Number or string? Note that only the formatted value is available here.
        try {
            Double.parseDouble(formattedValue);
            output.append(formattedValue);
        } catch (NumberFormatException e) {
            output.append('"');
            output.append(formattedValue);
            output.append('"');
        }
    }

    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
        // Skip, no headers or footers in CSV
    }
}

In the example from the link above, the `cell` method only has access to the formatted value, but I need access to the cell's actual value.


Write your own SAX handler and pass that in? – Gagravarr

Answer


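The current implementation of the streaming interface does not provide this. To get the raw value, you would need to copy the code of the underlying XSSFSheetXMLHandler and adapt it so that it does not format the cell contents.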
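
As a rough illustration of the "write your own handler" idea, here is a minimal sketch of a plain SAX handler that records the unformatted text of each cell's `<v>` element. It is not POI's XSSFSheetXMLHandler and uses only the JDK; the class name `RawCellValueHandler` and its fields are made up for this example. It assumes you feed it the XML of one worksheet (with POI, the streams returned by `XSSFReader.getSheetsData()` would be the natural input). It does not resolve shared strings: for cells with `t="s"` the raw value is an index into the shared strings table, which you would still have to look up yourself. Inline strings (`<is>`) and formulas are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of POI: collects raw <v> contents per cell.
class RawCellValueHandler extends DefaultHandler {
    // Cell reference (e.g. "A1") -> unformatted text of its <v> element
    final Map<String, String> rawValues = new LinkedHashMap<>();
    // Cell reference -> the cell's "t" attribute ("s", "str", "b", ...); "n" when absent
    final Map<String, String> cellTypes = new LinkedHashMap<>();

    private final StringBuilder text = new StringBuilder();
    private String currentRef;
    private boolean inValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("c".equals(qName)) {
            // Start of a cell: remember its reference and declared type
            currentRef = attrs.getValue("r");
            String type = attrs.getValue("t");
            cellTypes.put(currentRef, type == null ? "n" : type);
        } else if ("v".equals(qName)) {
            // Start of the value element: begin collecting characters
            inValue = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so append rather than assign
        if (inValue) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            rawValues.put(currentRef, text.toString());
        }
    }

    // Parses one worksheet XML stream and returns the populated handler
    static RawCellValueHandler parse(InputStream sheetXml) throws Exception {
        RawCellValueHandler handler = new RawCellValueHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written worksheet fragment, used only for demonstration
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<sheetData><row r=\"1\">"
                + "<c r=\"A1\"><v>1234.5678</v></c>"
                + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
                + "</row></sheetData></worksheet>";
        RawCellValueHandler h = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (Map.Entry<String, String> e : h.rawValues.entrySet()) {
            System.out.println(e.getKey() + " (" + h.cellTypes.get(e.getKey()) + ") = " + e.getValue());
        }
    }
}
```

Because the handler never touches a number format, a numeric cell comes back as the exact string stored in the file, whatever format is applied to it in Excel. Keeping the `t` attribute alongside the value lets the caller decide how to interpret it.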


Thank you very much, @centic – Arul
