2011-01-12 103 views
24

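I have large Excel worksheets that I want to read into MySQL using PHPExcel. How can I read large worksheets from a large Excel file (27MB+) with PHPExcel?

I am using the recent patch that lets you read a worksheet without opening the whole file. That way I can read one worksheet at a time.

However, one Excel file is 27MB. I can read the first worksheet successfully because it is small, but the second worksheet is so large that the cron job that started the process at 22:00 had not finished by 8:00 the next morning. The worksheet is simply too big.

Is there any way to read a worksheet in batches of rows, for example something like this: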

$inputFileType = 'Excel2007'; 
$inputFileName = 'big_file.xlsx'; 
$objReader = PHPExcel_IOFactory::createReader($inputFileType); 
$worksheetNames = $objReader->listWorksheetNames($inputFileName); 

foreach ($worksheetNames as $sheetName) { 
    //BELOW IS "WISH CODE": 
    for ($row = 1; $row <= $max_rows; $row += 100) { 
     $dataset = $objReader->getWorksheetWithRows($row, $row+100); 
     save_dataset_to_database($dataset); 
    } 
} 

Addendum

@Mark, I used the code you posted to create the following example:

function readRowsFromWorksheet() { 

    $file_name = htmlentities($_POST['file_name']); 
    $file_type = htmlentities($_POST['file_type']); 

    echo 'Read rows from worksheet:<br />'; 
    debug_log('----------start'); 
    $objReader = PHPExcel_IOFactory::createReader($file_type); 
    $chunkSize = 20; 
    $chunkFilter = new ChunkReadFilter(); 
    $objReader->setReadFilter($chunkFilter); 

    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) { 
     $chunkFilter->setRows($startRow, $chunkSize); 
     $objPHPExcel = $objReader->load('data/' . $file_name); 
     debug_log('reading chunk starting at row '.$startRow); 
     $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true); 
     var_dump($sheetData); 
     echo '<hr />'; 
    } 
    debug_log('end'); 
} 

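As the log below shows, it runs fine on a small 8K Excel file, but when I run it on a 3 MB Excel file it never gets past the first chunk. Is there any way to optimize this code for performance? Otherwise it does not look fast enough to pull chunks out of a large Excel file: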

2011-01-12 11:07:15: ----------start 
2011-01-12 11:07:15: reading chunk starting at row 2 
2011-01-12 11:07:15: reading chunk starting at row 22 
2011-01-12 11:07:15: reading chunk starting at row 42 
2011-01-12 11:07:15: reading chunk starting at row 62 
2011-01-12 11:07:15: reading chunk starting at row 82 
2011-01-12 11:07:15: reading chunk starting at row 102 
2011-01-12 11:07:15: reading chunk starting at row 122 
2011-01-12 11:07:15: reading chunk starting at row 142 
2011-01-12 11:07:15: reading chunk starting at row 162 
2011-01-12 11:07:15: reading chunk starting at row 182 
2011-01-12 11:07:15: reading chunk starting at row 202 
2011-01-12 11:07:15: reading chunk starting at row 222 
2011-01-12 11:07:15: end 
2011-01-12 11:07:52: ----------start 
2011-01-12 11:08:01: reading chunk starting at row 2 
(...at 11:18, CPU usage at 93% still running...) 

Addendum 2

When I comment out:

//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true); 
//var_dump($sheetData); 

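then it parses at an acceptable speed (a few rows per second, going by the log below). Is there any way to improve the performance of toArray()?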

2011-01-12 11:40:51: ----------start 
2011-01-12 11:40:59: reading chunk starting at row 2 
2011-01-12 11:41:07: reading chunk starting at row 22 
2011-01-12 11:41:14: reading chunk starting at row 42 
2011-01-12 11:41:22: reading chunk starting at row 62 
2011-01-12 11:41:29: reading chunk starting at row 82 
2011-01-12 11:41:37: reading chunk starting at row 102 
2011-01-12 11:41:45: reading chunk starting at row 122 
2011-01-12 11:41:52: reading chunk starting at row 142 
2011-01-12 11:42:00: reading chunk starting at row 162 
2011-01-12 11:42:07: reading chunk starting at row 182 
2011-01-12 11:42:15: reading chunk starting at row 202 
2011-01-12 11:42:22: reading chunk starting at row 222 
2011-01-12 11:42:22: end 

Addendum 3

This seems to work adequately, at least on the 3 MB file:

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) { 
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />'; 
    $chunkFilter->setRows($startRow, $chunkSize); 
    $objPHPExcel = $objReader->load('data/' . $file_name); 
    debug_log('reading chunk starting at row ' . $startRow); 
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) { 
     $cellIterator = $row->getCellIterator(); 
     $cellIterator->setIterateOnlyExistingCells(false); 
     echo '<tr>'; 
     foreach ($cellIterator as $cell) { 
      if (!is_null($cell)) { 
       //$value = $cell->getCalculatedValue(); 
       $rawValue = $cell->getValue(); 
       debug_log($rawValue); 
      } 
     } 
    } 
} 
+0

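The var_dump of $sheetData was only in my code snippet to demonstrate how the chunking works, and is probably not something you would need in "real world" use. If you do need to dump worksheet data, the rangeToArray() method that I am now adding to the Worksheet class will also be more efficient than toArray(). – 2011-01-12 11:53:43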

+0

@Edward Tanguay Hi, did you find any solution or alternative? I am running into the same problem – 2013-10-11 09:07:51

Answers

9

It is possible to read a worksheet in "chunks" using a read filter, although I can make no guarantees about efficiency.

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example2.xls'; 


/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */ 
class chunkReadFilter implements PHPExcel_Reader_IReadFilter 
{ 
    private $_startRow = 0; 

    private $_endRow = 0; 

    /** Set the list of rows that we want to read */ 
    public function setRows($startRow, $chunkSize) { 
     $this->_startRow = $startRow; 
     $this->_endRow  = $startRow + $chunkSize; 
    } 

    public function readCell($column, $row, $worksheetName = '') { 
     // Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
     if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
      return true; 
     } 
     return false; 
    } 
} 


echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />'; 
/** Create a new Reader of the type defined in $inputFileType **/ 

$objReader = PHPExcel_IOFactory::createReader($inputFileType); 



echo '<hr />'; 


/** Define how many rows we want to read for each "chunk" **/ 
$chunkSize = 20; 
/** Create a new Instance of our Read Filter **/ 
$chunkFilter = new chunkReadFilter(); 

/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/ 
$objReader->setReadFilter($chunkFilter); 

/** Loop to read our worksheet in "chunk size" blocks **/ 
/** $startRow is set to 2 initially because we always read the headings in row #1 **/ 

for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) { 
    echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />'; 
    /** Tell the Read Filter, the limits on which rows we want to read this iteration **/ 
    $chunkFilter->setRows($startRow,$chunkSize); 
    /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/ 
    $objPHPExcel = $objReader->load($inputFileName); 

    // Do some processing here 

    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true); 
    var_dump($sheetData); 
    echo '<br /><br />'; 
} 

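Note that this read filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.

When a read filter is used, PHPExcel still parses the entire file, but it only loads the cells that match the filter, so it only uses the memory needed for that number of cells. However, it parses the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time; to read row by row, simply set $chunkSize to 1.

This can also cause problems if you have formulae that reference cells in different "chunks", because the data simply isn't available for cells outside the current "chunk".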
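The loop in the example stops at a hard-coded row 240. As a rough sketch of the chunk arithmetic only, the helpers below list the start rows needed to cover a sheet of a given size and mirror the filter's row test. Both function names are made up for illustration and are not part of PHPExcel; the total row count is assumed to be known already (another answer on this page takes it from listWorksheetInfo()).

```php
<?php
/**
 * Hypothetical helper (not part of PHPExcel): start rows of the chunks needed
 * to cover data rows $firstDataRow..$totalRows. Row 1 is assumed to be the heading.
 */
function chunkStarts(int $totalRows, int $chunkSize, int $firstDataRow = 2): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    $starts = [];
    for ($startRow = $firstDataRow; $startRow <= $totalRows; $startRow += $chunkSize) {
        $starts[] = $startRow;
    }
    return $starts;
}

/**
 * Same test as readCell() in the filter above: the heading row always passes,
 * other rows pass when startRow <= row < startRow + chunkSize.
 */
function rowIsInChunk(int $row, int $startRow, int $chunkSize): bool
{
    return $row === 1 || ($row >= $startRow && $row < $startRow + $chunkSize);
}

// Example: 45 rows in total (1 heading + 44 data rows), 20 rows per chunk.
foreach (chunkStarts(45, 20) as $startRow) {
    $lastRow = min($startRow + 20 - 1, 45);
    echo "chunk: rows $startRow to $lastRow\n";
}
```

Because the upper bound is exclusive, consecutive chunks do not overlap, so every data row is loaded in exactly one pass while the heading row is loaded in every pass.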

3

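Currently the best option for reading .xlsx, .csv and .ods files is spreadsheet-reader (https://github.com/nuovo/spreadsheet-reader), because it can read the files without loading them entirely into memory. For the .xls extension it has limitations, because it uses PHPExcel for reading.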
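To illustrate what "reading without loading it all into memory" looks like, here is a minimal sketch for the CSV case using only PHP's built-in fgetcsv(). It is not spreadsheet-reader's API; the function name readCsvInBatches and the sample data are invented for this example.

```php
<?php
/**
 * Hypothetical helper: yield CSV rows in batches so that only one batch
 * is held in memory at a time. Uses PHP built-ins only.
 */
function readCsvInBatches(string $path, int $batchSize): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $batch = [];
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            $batch[] = $row;
            if (count($batch) === $batchSize) {
                yield $batch;
                $batch = [];
            }
        }
        if ($batch !== []) {
            yield $batch; // final, possibly shorter, batch
        }
    } finally {
        fclose($handle);
    }
}

// Usage with a small temporary file standing in for a large export.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,address\nAnn,1 Main St\nBob,2 High St\n");
foreach (readCsvInBatches($path, 2) as $batch) {
    echo count($batch), " rows\n";
}
unlink($path);
```

Each batch could be handed to a database insert before the next one is read, which keeps memory use roughly constant regardless of file size.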

1

/* This is ChunkReadFilter.php */

<?php 
Class ChunkReadFilter implements PHPExcel_Reader_IReadFilter { 

    private $_startRow = 0; 
    private $_endRow = 0; 

    /** Set the list of rows that we want to read */ 
    public function setRows($startRow, $chunkSize) { 
     $this->_startRow = $startRow; 
     $this->_endRow = $startRow + $chunkSize; 
    } 

    public function readCell($column, $row, $worksheetName = '') { 

     // Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
     if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 

      return true; 
     } 
     return false; 
    } 

} 
?> 

/* This is index.php, with a basic (not perfect) implementation at the end of the file. */

<?php 

require_once './Classes/PHPExcel/IOFactory.php'; 
require_once 'ChunkReadFilter.php'; 

class Excelreader { 

    /** 
    * This function is used to read data from excel file in chunks and insert into database 
    * @param string $filePath 
    * @param integer $chunkSize 
    */ 
    public function readFileAndDumpInDB($filePath, $chunkSize) { 
     echo("Loading file " . $filePath . " ....." . PHP_EOL); 
     /** Create a new Reader of the type that has been identified * */ 
     $objReader = PHPExcel_IOFactory::createReader(PHPExcel_IOFactory::identify($filePath)); 

     $spreadsheetInfo = $objReader->listWorksheetInfo($filePath); 

     /** Create a new Instance of our Read Filter * */ 
     $chunkFilter = new ChunkReadFilter(); 

     /** Tell the Reader that we want to use the Read Filter that we've Instantiated * */ 
     $objReader->setReadFilter($chunkFilter); 
     $objReader->setReadDataOnly(true); 
     //$objReader->setLoadSheetsOnly("Sheet1"); 
     //get header column name 
     $chunkFilter->setRows(0, 1); 
     echo("Reading file " . $filePath . PHP_EOL . "<br>"); 
     $totalRows = $spreadsheetInfo[0]['totalRows']; 
     echo("Total rows in file " . $totalRows . " " . PHP_EOL . "<br>"); 

     /** Loop to read our worksheet in "chunk size" blocks * */ 
     /** $startRow is set to 1 initially because we always read the headings in row #1 * */ 
     for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) { 
      echo("Loading WorkSheet for rows " . $startRow . " to " . ($startRow + $chunkSize - 1) . PHP_EOL . "<br>"); 
      $i = 0; 
      /** Tell the Read Filter, the limits on which rows we want to read this iteration * */ 
      $chunkFilter->setRows($startRow, $chunkSize); 
      /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object * */ 
      $objPHPExcel = $objReader->load($filePath); 
      $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false); 

      $startIndex = ($startRow == 1) ? $startRow : $startRow - 1; 
      //dumping in database 
      if (!empty($sheetData) && $startRow < $totalRows) { 
       /** 
       * $this->dumpInDb(array_slice($sheetData, $startIndex, $chunkSize)); 
       */ 

       echo "<table border='1'>"; 
       foreach ($sheetData as $key => $value) { 
        $i++; 
        if ($value[0] != null) { 
         echo "<tr><td>id:$i</td><td>{$value[0]} </td><td>{$value[1]} </td><td>{$value[2]} </td><td>{$value[3]} </td></tr>"; 
        } 
       } 
       echo "</table><br/><br/>"; 
      } 
      $objPHPExcel->disconnectWorksheets(); 
      unset($objPHPExcel, $sheetData); 
     } 
     echo("File " . $filePath . " has been uploaded successfully in database" . PHP_EOL . "<br>"); 
    } 

    /** 
    * Insert data into database table 
    * @param Array $sheetData 
    * @return boolean 
    * @throws Exception 
    * NOTE: untested; assumes DbAdapter::getDBConnection() returns a mysqli connection. 
    */ 
    protected function dumpInDb($sheetData) { 

     $con = DbAdapter::getDBConnection(); 
     $values = array(); 

     // Index 0 is skipped, as in the original loop. 
     for ($i = 1; $i < count($sheetData); $i++) { 
      $values[] = "('" . mysqli_real_escape_string($con, (string) $sheetData[$i][0]) . "'," 
        . "'" . mysqli_real_escape_string($con, (string) $sheetData[$i][1]) . "')"; 
     } 
     if (empty($values)) { 
      mysqli_close($con); 
      return true; // nothing to insert 
     } 

     // One multi-row INSERT: the value tuples are joined with commas. 
     $query = "INSERT INTO employe(name,address) VALUES " . implode(",", $values) 
       . " ON DUPLICATE KEY UPDATE name=VALUES(name), address=VALUES(address)"; 
     if (mysqli_query($con, $query)) { 
      mysqli_close($con); 
      return true; 
     } 
     // Read the error message before closing the connection. 
     $error = mysqli_error($con); 
     mysqli_close($con); 
     throw new Exception($error); 
    } 

    /** 
    * This function returns list of files corresponding to given directory path 
    * @param String $dataFolderPath 
    * @return Array list of file 
    */ 
    protected function getFileList($dataFolderPath) { 
     if (!is_dir($dataFolderPath)) { 
      throw new Exception("Directory " . $dataFolderPath . " does not exist"); 
     } 
     $root = scandir($dataFolderPath); 
     $fileList = array(); 
     foreach ($root as $value) { 
      if ($value === '.' || $value === '..') { 
       continue; 
      } 
      if (is_file("$dataFolderPath/$value")) { 
       $fileList[] = "$dataFolderPath/$value"; 
       continue; 
      } 
     } 
     return $fileList; 
    } 

} 

$inputFileName = './prueba_para_batch.xls'; 
$excelReader = new Excelreader(); 
$excelReader->readFileAndDumpInDB($inputFileName, 500);
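
For the database step, one alternative to escaping each value by hand is a multi-row INSERT with placeholders. The sketch below only builds the SQL string and the flat parameter list; the table and column names (employe, name, address) are taken from the code above, while the function name buildEmployeInsert and the sample rows are invented for illustration. Running it against a database is shown only in comments and assumes a PDO connection in $pdo.

```php
<?php
/**
 * Hypothetical helper: build one multi-row INSERT with positional placeholders.
 *
 * @param array $rows list of [name, address] pairs
 * @return array [string $sql, array $params]
 */
function buildEmployeInsert(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }
    $placeholders = [];
    $params = [];
    foreach ($rows as $row) {
        $placeholders[] = '(?, ?)';
        $params[] = $row[0]; // name
        $params[] = $row[1]; // address
    }
    $sql = 'INSERT INTO employe (name, address) VALUES '
        . implode(', ', $placeholders)
        . ' ON DUPLICATE KEY UPDATE name = VALUES(name), address = VALUES(address)';
    return [$sql, $params];
}

list($sql, $params) = buildEmployeInsert([['Ann', '1 Main St'], ['Bob', '2 High St']]);
echo $sql, "\n";

// With a PDO connection (assumed to exist as $pdo), the statement would run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```

Building one statement per chunk keeps the number of round trips to MySQL equal to the number of chunks rather than the number of rows.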