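Regular expression to selectively strip HTML

I'm trying to parse some HTML with PHP as an exercise, outputting it as plain text, and I've hit a snag. I'd like to remove every tag that is hidden with style="display: none;" - bearing in mind that the tag may contain other attributes and other style properties.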

The code I have so far is this:

$page = preg_replace("#<([a-z]+).*?style=\".*?display:\s*none[^>]*>.*?</\1>#s","",$page);

It returns NULL with an error code of PREG_BACKTRACK_LIMIT_ERROR.
I tried this instead:

$page = preg_replace("#<([a-z]+)[^>]*?style=\"[^\"]*?display:\s*none[^>]*>.*?</\1>#s","",$page);

But now it simply doesn't replace any tags.

Any help would be much appreciated. Thanks!

2010-12-08 Niet the Dark Absol

Just. Don't. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – 2010-12-08 22:47:23

Possible duplicate of [How to parse and process HTML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – PeeHaa 2012-01-16 20:01:18

Using DOMDocument, you could try something like this:

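$doc = new DOMDocument; 
$doc->loadHTMLFile("foo.html"); 
// Collect matches first: removing nodes while iterating the live list skips elements.
$hidden = array(); 
foreach ($doc->getElementsByTagName('*') as $node) { 
    // Allow optional whitespace around the colon, case-insensitively.
    if (preg_match('/display\s*:\s*none/i', $node->getAttribute('style'))) { 
        $hidden[] = $node; 
    } 
} 
foreach ($hidden as $node) { 
    // removeChild() must be called on the node's actual parent, not on the document.
    $node->parentNode->removeChild($node); 
} 
$doc->saveHTMLFile("foo.html");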
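
Since the question is about ending up with plain text, here is a minimal self-contained sketch of the same idea that works on a string rather than a file. The sample markup and the variable names ($html, $hidden, $text) are made up for illustration, and DOMXPath is used only as a convenient way to restrict the loop to elements that have a style attribute; treat it as one possible approach under those assumptions, not a complete solution (it ignores hiding done through CSS classes or stylesheets, for instance).

```php
<?php
// Hypothetical sample input, for illustration only.
$html = '<div><p>Visible</p>'
      . '<p class="x" style="color: red; display: none;">Hidden <b>text</b></p>'
      . '<span style="display:none">Also hidden</span></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings about sloppy markup quiet
$doc->loadHTML($html);
libxml_clear_errors();

// Look only at elements that carry a style attribute.
$xpath = new DOMXPath($doc);
$hidden = [];
foreach ($xpath->query('//*[@style]') as $node) {
    // Match "display: none" with optional whitespace, case-insensitively.
    if (preg_match('/display\s*:\s*none/i', $node->getAttribute('style'))) {
        $hidden[] = $node;
    }
}

// Remove each hidden element (and everything inside it) from its parent.
foreach ($hidden as $node) {
    if ($node->parentNode !== null) {
        $node->parentNode->removeChild($node);
    }
}

// textContent concatenates the remaining text nodes, with all tags dropped.
$text = trim($doc->documentElement->textContent);
echo $text, PHP_EOL;
```

With the sample input above this should print just "Visible", because both hidden elements and their children are gone before the text is extracted.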

2010-12-08 22:57:31 karim79