Struggling to extract content from a string (PHP)


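I am struggling to extract content from a string (stored in a database). Each div is a chapter, and the h2 content is its title. I want to extract the title and the content of each chapter (div) separately. I have already tried preg_match_all in PHP.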

<p> 
<div> 
    <h2>Title 1</h2> 
    Chapter Content 1 with standard html tags (ex: the following tags) 
    <strong>aaaaaaaa</strong><br /> 
    <em>aaaaaaaaa</em><br /> 
    <u>aaaaaaaa</u><br /> 
    <span style="color:#00ffff"></span><br /> 
</div> 
<div> 
    <h2>Title 2</h2> 
    Chapter Content 2 
</div> 
... 
</p>

However, my preg_match_all attempt fails when the chapter contains standard HTML tags:

function splitDescription($pDescr) 
{ 
    $regex = "#<div.*?><h2.*?>(.*?)</h2>(.*?)</div>#"; 
    preg_match_all($regex, $pDescr, $result); 

    return $result; 
}

This does not work.

Source

2012-07-19 Gaël Destrem

Parsing HTML with regular expressions is a bad idea in itself; use an instance of DOMDocument to parse your HTML. – 2012-07-19 17:17:10

There are a bunch of HTML parsers you can use: [DOMDocument](http://php.net/manual/en/class.domdocument.php), [SimpleXml](http://php.net/manual/en/book.simplexml.php). Also see this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Musa 2012-07-19 17:20:45

Thanks for your advice :) – 2012-07-19 17:34:03

Don't use a regular expression for this; it is not the right tool. Use an HTML parser, such as PHP's DOMDocument:

libxml_use_internal_errors(true); 
$doc = new DOMDocument; 
$doc->loadHTML($html); 
$xpath = new DOMXPath($doc); 

// For each <div> chapter 
foreach($xpath->query('//div') as $chapter) { 

    // Get the <h2> and save its inner value into $title 
    $title_node = $xpath->query('h2', $chapter)->item(0);
    if ($title_node === null) {
        continue; // skip any <div> that has no <h2> child
    }
    $title = $title_node->textContent;

    // Remove the <h2> 
    $chapter->removeChild($title_node); 

    // Save the rest of the <div> children in $content 
    $content = ''; 
    foreach($chapter->childNodes as $child) { 
        $content .= $doc->saveHTML($child);
    } 
    echo "$title - " . htmlentities($content) . "\n"; 
}

Demo
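If you would rather keep the shape of the function from the question, the same idea can be wrapped up roughly as below. This is only a sketch: the array keys `title` and `content`, the sample `$html` string, and the choice to skip any `<div>` without an `<h2>` are my assumptions, not something stated in the question. It also skips the title node while collecting children instead of removing it from the tree.

```php
<?php
// Hypothetical DOM-based version of the question's splitDescription().
// Returns a list of ['title' => string, 'content' => string] arrays.
function splitDescription(string $pDescr): array
{
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($pDescr);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $chapters = [];

    foreach ($xpath->query('//div') as $chapter) {
        // The first <h2> that is a direct child of this <div> is the title.
        $titleNode = $xpath->query('h2', $chapter)->item(0);
        if ($titleNode === null) {
            continue; // assumption: a <div> without an <h2> is not a chapter
        }

        // Serialize every child except the title back to HTML.
        $content = '';
        foreach ($chapter->childNodes as $child) {
            if (!$child->isSameNode($titleNode)) {
                $content .= $doc->saveHTML($child);
            }
        }

        $chapters[] = [
            'title'   => trim($titleNode->textContent),
            'content' => trim($content),
        ];
    }

    return $chapters;
}

// Made-up sample input, shaped like the markup in the question.
$html = '<div><h2>Title 1</h2>Chapter <strong>one</strong></div>'
      . '<div><h2>Title 2</h2>Chapter Content 2</div>';

foreach (splitDescription($html) as $chapter) {
    echo $chapter['title'], ' => ', $chapter['content'], "\n";
}
```

Note that `//div` also matches nested `<div>` elements, so if your chapters can contain inner divs you would probably want a narrower XPath expression.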

Source

2012-07-19 17:25:19 nickb

Before you try to parse HTML with regular expressions, I suggest you read this post.

There are plenty of good XML/HTML parsers you can use.
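To make the fragility concrete, here is a small sketch that runs the exact pattern from the question against three inputs. The sample strings are made up for illustration. As far as I can tell, the main reason the pattern fails on the real data is that `.` does not match a newline unless the `s` modifier is set, so any line break between or inside the tags breaks the match. Even on single-line markup, a nested `<div>` cuts the captured content short.

```php
<?php
// The pattern from the question, unchanged.
$regex = "#<div.*?><h2.*?>(.*?)</h2>(.*?)</div>#";

// 1) Everything on one line, no whitespace between tags: the pattern matches.
$compact = '<div><h2>Title 1</h2>Chapter Content 1</div>';
$compactCount = preg_match_all($regex, $compact, $m1);

// 2) The same chapter spread over several lines: "." stops at each newline,
//    so the pattern finds nothing.
$formatted = "<div>\n    <h2>Title 1</h2>\n    Chapter Content 1\n</div>";
$formattedCount = preg_match_all($regex, $formatted, $m2);

// 3) A nested <div>: the lazy (.*?) stops at the first </div>,
//    so the captured content is truncated.
$nested = '<div><h2>T</h2>a<div>b</div>c</div>';
$nestedCount = preg_match_all($regex, $nested, $m3);

var_dump($compactCount);   // int(1)
var_dump($formattedCount); // int(0)
var_dump($m3[2][0]);       // string(8) "a<div>b"
```

Adding the `s` modifier and `\s*` between the tags would fix the second case but not the third, which is why a parser is the safer choice here.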

Source

2012-07-19 17:17:35 Will
