2016-07-23

Split all HTML tags into an array

Suppose I have the following code:

<!DOCTYPE html> 
<html> 
<head> 
<meta charset="UTF-8"> 
<title>Title of the document</title> 
</head>  
<body> 
<div id="x">Hello</div> 
<p>world</p> 
<h1>my name</h1> 
</body> 
</html> 

I need to extract all of the HTML tags and put them into an array, like this:

'0' => '<!DOCTYPE html>', 
'1' => '<html>', 
'2' => '<head>', 
'3' => '<meta charset="UTF-8">', 
'4' => '<title>Title of the document</title>', 
'5' => '</head>', 
'6' => '<body>', 
'7' => '<div id="x">Hello</div>', 
'8' => '<p>world</p>', 
'9' => '<h1>my name</h1>', 
.... 

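In my case, I don't need to capture all of the content inside each tag; grabbing just the opening of each tag would already be good enough.

How can I do this?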

Answer

Use the following solution based on the preg_match_all function:

$html_content = '<!DOCTYPE html> 
<html> 
<head> 
<meta charset="UTF-8"> 
<title>Title of the document</title> 
</head>  
<body> 
<div id="x">Hello</div> 
<p>world</p> 
<h1>my name</h1> 
</body> 
</html>'; 

preg_match_all("/\<\w[^<>]*?\>([^<>]+?\<\/\w+?\>)?|\<\/\w+?\>/i", $html_content, $matches); 
// Note: <!DOCTYPE html> is a document type declaration, not a tag, so the pattern does not match it 

print_r($matches[0]); 

Output:

Array 
(
    [0] => <html> 
    [1] => <head> 
    [2] => <meta charset="UTF-8"> 
    [3] => <title>Title of the document</title> 
    [4] => </head> 
    [5] => <body> 
    [6] => <div id="x">Hello</div> 
    [7] => <p>world</p> 
    [8] => <h1>my name</h1> 
    [9] => </body> 
    [10] => </html> 
)
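
The question also says that grabbing only the opening of each tag would be enough. A minimal sketch of that variant follows, assuming simple markup like the sample above. The helper name extract_opening_tags and its $includeDoctype flag are hypothetical and not part of the original answer. The pattern stops at the first ">", so it will misbehave on attribute values that contain ">", on comments, and on script bodies; for arbitrary HTML a real parser such as PHP's DOMDocument is generally the safer choice.

```php
<?php
// Hypothetical helper (not from the original answer): returns only the
// opening portion of each tag, e.g. '<div id="x">' instead of
// '<div id="x">Hello</div>'. Closing tags such as '</div>' are skipped.
function extract_opening_tags(string $html, bool $includeDoctype = false): array
{
    // '<' followed by a letter starts an opening tag; '</...' and '<!...'
    // do not match that branch. The doctype branch is opt-in because a
    // doctype is a declaration rather than a tag.
    $pattern = $includeDoctype
        ? '/<!DOCTYPE[^<>]*>|<[a-z][^<>]*>/i'
        : '/<[a-z][^<>]*>/i';

    preg_match_all($pattern, $html, $matches);

    return $matches[0];
}

$html_content = '<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>
<body>
<div id="x">Hello</div>
<p>world</p>
<h1>my name</h1>
</body>
</html>';

// Opening tags plus the doctype line, in document order.
$tags = extract_opening_tags($html_content, true);
print_r($tags);
```

Calling extract_opening_tags($html_content) without the second argument leaves the doctype out, which mirrors the behaviour of the pattern in the answer above.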