Split all HTML tags into an array


<!DOCTYPE html> 
<html> 
<head> 
<meta charset="UTF-8"> 
<title>Title of the document</title> 
</head>  
<body> 
<div id="x">Hello</div> 
<p>world</p> 
<h1>my name</h1> 
</body> 
</html>

Given the HTML document above, I need to extract all of the HTML tags and put them into an array, like this:

'0' => '<!DOCTYPE html>', 
'1' => '<html>', 
'2' => '<head>', 
'3' => '<meta charset="UTF-8">', 
'4' => '<title>Title of the document</title>', 
'5' => '</head>', 
'6' => '<body>', 
'7' => '<div id="x">Hello</div>', 
'8' => '<p>world</p>', 
'9' => '<h1>my name</h1>', 
....

In my case I don't need the content inside each tag; capturing just the opening of each tag is already good enough.

How can I do this?

Source

2016-07-23 Lacrifilm

Here is a solution using the preg_match_all function:

$html_content = '<!DOCTYPE html> 
<html> 
<head> 
<meta charset="UTF-8"> 
<title>Title of the document</title> 
</head>  
<body> 
<div id="x">Hello</div> 
<p>world</p> 
<h1>my name</h1> 
</body> 
</html>'; 

preg_match_all("/\<\w[^<>]*?\>([^<>]+?\<\/\w+?\>)?|\<\/\w+?\>/i", $html_content, $matches); 
// <!DOCTYPE html> is a document type declaration, not a tag, so the pattern does not match it

print_r($matches[0]);

Output:

Array 
(
    [0] => <html> 
    [1] => <head> 
    [2] => <meta charset="UTF-8"> 
    [3] => <title>Title of the document</title> 
    [4] => </head> 
    [5] => <body> 
    [6] => <div id="x">Hello</div> 
    [7] => <p>world</p> 
    [8] => <h1>my name</h1> 
    [9] => </body> 
    [10] => </html> 
)
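
Since the question says that only the opening of each tag is needed, a simpler pattern may be enough. The following is a minimal sketch under that assumption, not part of the original answer: the shortened sample input and the pattern are illustrative, and the pattern assumes that no attribute value contains a literal `<` or `>`.

```php
<?php
// Hypothetical sample input, shortened from the question's document.
$html_content = '<body>
<div id="x">Hello</div>
<p>world</p>
</body>';

// Match each opening or closing tag on its own: "<", an optional "/",
// a tag name starting with a letter, then everything up to the next ">".
preg_match_all('/<\/?[a-z][^<>]*>/i', $html_content, $matches);

print_r($matches[0]);
```

Here each tag becomes its own array element, so `<div id="x">` and `</div>` are listed separately and the text between them is dropped. To keep only opening tags, the optional `\/?` can be removed from the pattern.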

Source

2016-07-23 15:34:02 RomanPerekhrest

The best approach is to load the HTML into the DOMDocument class and traverse the nodes.

See the related question here: https://stackoverflow.com/a/20025973/2870598
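
As a rough illustration of that approach, the sketch below loads the question's sample document into DOMDocument and collects the name of every element. It is only one way to traverse the nodes and is not taken from the linked answer; the variable names are assumptions, and it collects tag names rather than the original tag text.

```php
<?php
// Sample input based on the question's document.
$html_content = '<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>
<body>
<div id="x">Hello</div>
<p>world</p>
<h1>my name</h1>
</body>
</html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html_content);
libxml_clear_errors();

$tags = [];
// getElementsByTagName('*') yields every element in document order.
foreach ($doc->getElementsByTagName('*') as $element) {
    $tags[] = $element->nodeName;
}

print_r($tags);
```

Because the parser builds a tree, this handles nested tags and attribute values that a regular expression can trip over. If the markup of each element is needed rather than its name, `$doc->saveHTML($element)` can be called inside the loop.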

Source

2016-07-23 14:34:41 Christian
