How do I remove empty paragraphs from an HTML file using simple_html_dom.php?

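I want to remove empty paragraphs from an HTML document using simple_html_dom.php. I know how to do it with the DOMDocument class, but because the HTML files I work with were prepared in MS Word, DOMDocument's loadHTMLFile() function gives this exception: "Namespace is not defined". How do I remove empty paragraphs from an HTML file using simple_html_dom.php?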

Here is the code I use with the DOMDocument object for HTML files that were not prepared in MS Word:

<?php 
/* Using the DOMDocument class */ 

/* Create a new DOMDocument object. */ 
$html = new DOMDocument("1.0", "UTF-8"); 

/* Load HTML code from an HTML file into the DOMDocument. */ 
$html->loadHTMLFile("HTML File With Empty Paragraphs.html"); 

/* Assign all the <p> elements into the $pars DOMNodeList object. */ 
$pars = $html->getElementsByTagName("p"); 

echo "The initial number of paragraphs is " . $pars->length . ".<br />"; 

/* The trim() function is used to remove leading and trailing spaces as well as 
* newline characters. */ 
for ($i = 0; $i < $pars->length; $i++){ 
    if (trim($pars->item($i)->textContent) == ""){ 
        $pars->item($i)->parentNode->removeChild($pars->item($i));
        $i--;
    } 
} 

echo "The final number of paragraphs is " . $pars->length . ".<br />"; 

// Write the HTML code back into an HTML file. 
$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html"); 
?>
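
A possible way around the namespace message, as a minimal sketch: libxml usually reports unknown Word prefixes such as o: as warnings, and libxml_use_internal_errors(true) collects them internally instead of raising them. The inline $source string is a made-up stand-in for a Word-exported file, and the assumption that the message is a libxml warning rather than a fatal error has not been checked against the original file.

```php
<?php
// Hypothetical stand-in for a Word-exported document (not the asker's file).
$source = '<html><body><p class="MsoNormal"><o:p></o:p></p>'
        . '<p>Text</p><p>  </p></body></html>';

// Collect libxml warnings internally instead of letting them surface.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

// Copy the live DOMNodeList into a plain array so that removing nodes
// does not shift the list while it is being iterated.
$paragraphs = iterator_to_array($doc->getElementsByTagName('p'));
foreach ($paragraphs as $p) {
    if (trim($p->textContent) === '') {
        $p->parentNode->removeChild($p);
    }
}

// Count the paragraphs that survived the clean-up.
$remaining = $doc->getElementsByTagName('p')->length;
echo $remaining, "\n";
```

With loadHTMLFile() in place of loadHTML(), the same pattern should apply to a file on disk; the collected messages can be inspected with libxml_get_errors() before they are cleared.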

Here is the code I use with the simple_html_dom.php module for HTML files prepared in MS Word:

<?php 
/* Using simple_html_dom.php */ 

include("simple_html_dom.php"); 

$html = file_get_html("HTML File With Empty Paragraphs.html"); 

$pars = $html->find("p"); 

for ($i = 0; $i < count($pars); $i++) { 
    if (trim($pars[$i]->plaintext) == "") { 
        unset($pars[$i]);
        $i--;
    } 
} 

$html->save("HTML File without Empty Paragraphs.html"); 
?>

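It is almost the same, except that the $pars variable is a DOMNodeList when using DOMDocument and an array when using simple_html_dom.php. But this code does not work. First it runs for two minutes, and then it reports these errors: "Undefined offset: 1" and "Trying to get property of non-object", both for the line 'if (trim($pars[$i]->plaintext) == "") {'.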

Does anyone know how I can fix this?

Thanks. I have also asked on PHP DevNetwork.

2010-09-18 systemovich

I guess the line 'if (trim($pars->item($i)->textContent == "")) {' in the first code block you posted should be 'if (trim($pars->item($i)->textContent) == "") {' – Strae 2010-09-18 09:16:33

PS: the same applies in the second code block: 'if (trim($pars[$i]->plaintext == "")) {' => 'if (trim($pars[$i]->plaintext) == "") {' ;) – Strae 2010-09-18 09:17:37

@DaNiel, thanks for pointing that out. I fixed it, but I get the same result. – systemovich 2010-09-19 20:25:16

Looking at the documentation for Simple HTML DOM Parser, I think this should do the trick:

include('simple_html_dom.php'); 

$html = file_get_html('HTML File With Empty Paragraphs.html'); 
$pars = $html->find('p'); 

foreach($pars as $par) 
{ 
    if(trim($par->plaintext) == '') 
    { 
        // To remove an element, set its outertext to an empty string
        $par->outertext = '';
    } 
} 

$html->save('HTML File without Empty Paragraphs.html');

I did a quick test, and this works for me:

include('simple_html_dom.php'); 

$html = str_get_html('<html><body><h1>Test</h1><p></p><p>Test</p></body></html>'); 
$pars = $html->find("p"); 

foreach($pars as $par) 
{ 
    if(trim($par->plaintext) == '') 
    { 
        $par->outertext = '';
    } 
} 

echo $html; 
// Output: <html><body><h1>Test</h1><p>Test</p></body></html>
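
As for why the loop in the question fails: unset() on a PHP array removes a key without renumbering the others, so after unset($pars[1]) and $i-- the loop keeps reading the missing key 1 until the time limit is hit. It also only drops the entry from the array returned by find(); the element itself stays in the document. A minimal sketch of the array behaviour, using plain strings as a stand-in for the node objects (the sample values are made up):

```php
<?php
// Stand-in for the array returned by $html->find('p').
$pars = ['First', '   ', 'Third'];

$copy = $pars;
unset($copy[1]);             // drops key 1; keys 0 and 2 keep their numbers
$keys = array_keys($copy);   // the remaining keys are 0 and 2
$gap  = isset($copy[1]);     // false: reading $copy[1] now triggers the
                             // "Undefined offset" notice from the question
                             // (worded "Undefined array key" in PHP 8)

// Iterating by value, as the foreach above does, needs no index bookkeeping.
$nonEmpty = [];
foreach ($pars as $text) {
    if (trim($text) !== '') {
        $nonEmpty[] = $text;
    }
}

print_r($keys);
var_dump($gap);
print_r($nonEmpty);
```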

2010-10-14 08:24:45 Mischa

Thank you very much! I can only award the bounty in 14 hours. – systemovich 2010-10-14 16:24:10

-1

Empty paragraphs look like <p>[spaces or newlines]</p> (case-insensitive). You can use preg_replace (or str_replace) to remove empty paragraphs.

This only works if an empty paragraph is exactly <p></p>:

$oldHtml = file_get_contents('File With Empty Paragraphs.html'); 
$newHtml = str_replace('<p></p>', '', $oldHtml); 
// and write the new HTML to the file 
$fh = fopen('File Without Empty Paragraphs.html', 'w'); 
fwrite($fh, $newHtml); 
fclose($fh);

This also works on paragraphs that have attributes or contain only whitespace:

$oldHtml = file_get_contents('File With Empty Paragraphs.html'); 
$newHtml = preg_replace('#<p[^>]*>\s*</p>#i', '', $oldHtml); 
// and write the new HTML to the file 
$fh = fopen('File Without Empty Paragraphs.html', 'w'); 
fwrite($fh, $newHtml); 
fclose($fh);
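
Word-exported paragraphs that look empty often contain a &nbsp; entity or an <o:p> wrapper rather than plain whitespace, which the pattern above would skip. A rough sketch of a wider pattern follows; the sample string is hypothetical, and the assumption that the asker's files use these forms has not been verified. Regex-based cleanup stays fragile for anything beyond such simple cases.

```php
<?php
// Hypothetical sample in the style of Word output (not taken from the asker's file).
$oldHtml = '<p class=MsoNormal>&nbsp;</p><P> </P>'
         . '<p>Keep&nbsp;me</p><p><o:p>&nbsp;</o:p></p>';

// Treat whitespace, &nbsp; entities, and <o:p> wrappers holding only those
// as "empty". The \b keeps the pattern from matching tags such as <pre>.
$pattern = '#<p\b[^>]*>(?:\s|&nbsp;|<o:p>(?:\s|&nbsp;)*</o:p>)*</p>#i';
$newHtml = preg_replace($pattern, '', $oldHtml);

echo $newHtml, "\n";
```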

2010-09-18 08:01:59 Lekensteyn
