How to get the content of a node from an XML file when that content contains HTML tags

Part of my XML file is structured like this:

<chapter id="1"> 
    <text line="1"> <p>HTML content 1</p> </text> 
    <text line="2"> <q>HTML<q> content 2 </text> 
    <text line="3"> HTML <b>content 3<b> </text> 
</chapter>

Using DOMDocument, what query can I use to get all the content belonging to <chapter id="1">...</chapter>, including the HTML tags? The output should look like this:

<p>HTML content 1</p> 
<q>HTML<q> content 2 
HTML <b>content 3<b>

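PS: Regarding the note in the comments, I think that question asks something different. I am only asking whether it is possible, and how, to read the content inside a node while ignoring any HTML tags it may contain, given that the original XML cannot be modified.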

Source

2016-11-11 Marcello Impastato

Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – RST

Your XML string is invalid. You must first convert the content of each text node to HTML entities, for example:

$textContent = htmlentities($text);

After that, we have:

$xmlText = '<chapter id="1"> 
    <text line="1"> &lt;p&gt;HTML content 1&lt;/p&gt; </text> 
    <text line="2"> &lt;q&gt;HTML&lt;q&gt; content 2 </text> 
    <text line="3"> HTML &lt;b&gt;content 3&lt;b&gt; </text> 
</chapter>';

Now we only need to parse it with SimpleXMLElement:

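$xmlObject = new SimpleXMLElement($xmlText);
// Select every <text> child of the root <chapter> element.
$items = $xmlObject->xpath("text");
foreach ($items as $item) {
    // Cast the node to a string, then undo the entity encoding.
    echo html_entity_decode((string) $item), PHP_EOL;
}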
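
The escaping step above does not have to be done by hand. As a minimal sketch, assuming the `<text>` elements are never nested and the file fits in memory as a string, the raw content could be escaped in memory just before parsing, leaving the original file untouched. The helper name `escape_text_bodies` and the variable names are hypothetical, and `htmlspecialchars` is used here in place of `htmlentities` so that only the characters XML cares about are encoded.

```php
<?php
// Hypothetical helper: escape whatever sits inside each <text> element so
// the string becomes well-formed XML. Assumes <text> elements are not nested.
function escape_text_bodies(string $xml): string
{
    $escaped = preg_replace_callback(
        '#(<text\b[^>]*>)(.*?)(</text>)#s',
        function (array $m): string {
            // Encode only &, < and > in the element body; keep the tags as-is.
            return $m[1] . htmlspecialchars($m[2], ENT_NOQUOTES, 'UTF-8') . $m[3];
        },
        $xml
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $escaped ?? $xml;
}

$rawXml = '<chapter id="1">
    <text line="1"> <p>HTML content 1</p> </text>
    <text line="2"> <q>HTML<q> content 2 </text>
    <text line="3"> HTML <b>content 3<b> </text>
</chapter>';

$chapter = new SimpleXMLElement(escape_text_bodies($rawXml));

$lines = [];
foreach ($chapter->xpath('/chapter[@id="1"]/text') as $text) {
    // The XML parser already turns &lt; and &gt; back into < and >.
    $lines[] = trim((string) $text);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

Because the XML parser decodes the entities itself, no extra `html_entity_decode` call is needed in this variant.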

Update 1

If you cannot change your XML string, you need to use a regular expression rather than a DOM parser:

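function get_tag_contents($tag, $xml) {
    // Escape the tag name, accept tags with or without attributes, and use
    // the "s" flag so "." also matches newlines inside the element.
    $name = preg_quote($tag, '#');
    preg_match_all('#<' . $name . '\b[^>]*>(.*?)</' . $name . '>#s', $xml, $matches);

    return $matches[1];
}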

$invalidXml = '<chapter id="1"> 
    <text line="1"> <p>HTML content 1</p> </text> 
    <text line="2"> <q>HTML<q> content 2 </text> 
    <text line="3"> HTML <b>content 3<b> </text> 
</chapter>'; 

$textContents = get_tag_contents('text', $invalidXml); 

foreach ($textContents as $content) { 
    echo $content; 
}
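
The question asks for the content of one specific chapter, so the same regular-expression idea can be narrowed to a chapter id. The following is only a sketch under stated assumptions: the helper `get_chapter_texts`, the `<book>` wrapper, and the second chapter are hypothetical and exist only to show the filtering; it also assumes chapters are not nested and that `id` is written with double quotes.

```php
<?php
// Hypothetical helper: return the trimmed inner content of every <text>
// element inside the <chapter> whose id attribute equals $chapterId.
function get_chapter_texts(string $xml, string $chapterId): array
{
    $id = preg_quote($chapterId, '#');
    // Isolate the requested chapter; the "s" flag lets "." span newlines.
    $pattern = '#<chapter\b[^>]*\bid="' . $id . '"[^>]*>(.*?)</chapter>#s';
    if (!preg_match($pattern, $xml, $chapter)) {
        return [];
    }
    // Collect the raw body of each <text> element, HTML tags included.
    preg_match_all('#<text\b[^>]*>(.*?)</text>#s', $chapter[1], $matches);

    return array_map('trim', $matches[1]);
}

// Sample input: the <book> wrapper and chapter 2 are made up for illustration.
$source = '<book>
<chapter id="1">
    <text line="1"> <p>HTML content 1</p> </text>
    <text line="2"> <q>HTML<q> content 2 </text>
    <text line="3"> HTML <b>content 3<b> </text>
</chapter>
<chapter id="2">
    <text line="1"> other chapter </text>
</chapter>
</book>';

foreach (get_chapter_texts($source, '1') as $content) {
    echo $content, PHP_EOL;
}
```

A regular expression like this is fragile compared with a real parser (for example, it would also accept an attribute such as `data-id`), so it is best kept for input that cannot be made well-formed.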

Source

2016-11-11 10:38:43

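There is one problem: I cannot modify the original file. The example above reproduces a real case, so I need to work with the file exactly as it is supplied to me. –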

I have updated my answer. Please check it; it should now meet your requirement. –
