Extract text from HTML?

I have a string like the following:

<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>

I want to extract the text from the HTML above: I want to remove the &nbsp; and end up with just Hello World, this is StackOverflow's question details page.

How can I do this in PHP? I have tried several functions, such as strip_tags and html_entity_decode, but they all fail in some cases.

Please help, thanks!

EDIT: My code is below, but it doesn't work :( It leaves characters like &nbsp; and &#39; behind.

$TMP_DESCR = trim(strip_tags($rs['description']));
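To see why this line falls short, it helps to look at what strip_tags() actually returns for the sample string. It removes the tags, but entity references such as &nbsp; and &#39; are ordinary text as far as it is concerned, and trim() only strips ASCII whitespace, not the literal characters "&nbsp;". A minimal sketch, where $html is an assumed stand-in for $rs['description']:

```php
<?php
// Sample input from the question; $html stands in for $rs['description'].
$html = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

// strip_tags() removes <p> and </p>, but entity references are plain text
// to it, so &nbsp; and &#39; survive. trim() sees a leading "&", not
// whitespace, so it removes nothing.
$stripped = trim(strip_tags($html));

echo $stripped, PHP_EOL;
// prints: &nbsp;Hello World, this is StackOverflow&#39;s question details page
```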

2011-02-02 Prashant

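What conditions? Don't leave us guessing! – 2011-02-02 11:45:08

As @jakenoble said, it would help if you posted your sample code, its output, and any errors. – diagonalbatman 2011-02-02 11:46:21

If the string shown is a complete HTML page, or part of a larger fragment with additional markup, see [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon 2011-02-02 11:47:51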

The following worked for me... I did have to str_replace the non-breaking space, though.

$string = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"; 
echo htmlspecialchars_decode(trim(strip_tags(str_replace('&nbsp;', '', $string))), ENT_QUOTES);
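One caveat with replacing &nbsp; by an empty string: when the entity sits between two words, they get glued together. Replacing it with an ordinary space and letting trim() clean up the ends avoids that. A small sketch of the same pipeline; the function name html_to_plain and the "foo&nbsp;bar" sample are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper: the pipeline from the answer above, except that
// &nbsp; becomes a regular space instead of disappearing.
function html_to_plain(string $html): string
{
    $text = str_replace('&nbsp;', ' ', $html);          // keep word boundaries
    $text = strip_tags($text);                          // drop the markup
    $text = trim($text);                                // drop edge spaces
    return htmlspecialchars_decode($text, ENT_QUOTES);  // &#39; -> '
}

echo html_to_plain("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"), PHP_EOL;
// Hello World, this is StackOverflow's question details page

echo html_to_plain("<p>foo&nbsp;bar</p>"), PHP_EOL;   // foo bar
echo str_replace('&nbsp;', '', "foo&nbsp;bar"), PHP_EOL; // foobar (words glued)
```

Note that htmlspecialchars_decode() only decodes &amp;, &quot;, &#39;, &lt; and &gt;; other named entities (for example &eacute;) would need html_entity_decode() instead.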

2011-02-02 12:06:41

strip_tags() will get rid of the tags, and trim() should get rid of the whitespace. I'm not sure whether that works with non-breaking spaces, though.
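For the record, trim() does not treat a non-breaking space as whitespace: by default it only strips space, tab, newline, carriage return, NUL and vertical tab. That holds both for the literal "&nbsp;" text and for the U+00A0 character you get after decoding it. A minimal sketch of a trim that also handles U+00A0, using a PCRE pattern with the /u (UTF-8) modifier; the helper name trim_with_nbsp is hypothetical:

```php
<?php
// trim()'s default character list is " \t\n\r\0\x0B", so U+00A0 survives.
$decoded = "\u{00A0}Hello World";        // what "&nbsp;Hello World" decodes to
var_dump(trim($decoded) === $decoded);   // bool(true): nothing was removed

// Hypothetical helper: strip whitespace and U+00A0 from both ends.
// preg_replace() returns null on invalid UTF-8, so fall back to the input.
function trim_with_nbsp(string $s): string
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s) ?? $s;
}

var_dump(trim_with_nbsp($decoded));      // string(11) "Hello World"
```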

2011-02-02 11:45:54 sevenseacat

First, call trim() on the HTML to remove the whitespace: http://php.net/manual/en/function.trim.php

Then strip_tags(), then html_entity_decode().

So: html_entity_decode(strip_tags(trim($html)));
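The order of the calls matters in that one-liner. trim() runs first, while &nbsp; is still six literal characters; html_entity_decode() then turns it into a non-breaking space (U+00A0) at the start of the result, which trim() would not remove anyway. Also, before PHP 8.1 the default flags of html_entity_decode() did not decode single quotes, so &#39; survived unless ENT_QUOTES was passed. A sketch comparing that order with one that decodes first, maps U+00A0 to a plain space, and trims last (UTF-8 output is an assumption):

```php
<?php
$html = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>";

// Order from the answer: trim -> strip_tags -> decode.
// The decoded &nbsp; (U+00A0) ends up as the first character.
$naive = html_entity_decode(strip_tags(trim($html)), ENT_QUOTES, 'UTF-8');
var_dump(strpos($naive, "\u{00A0}") === 0);   // bool(true)

// Decode first, replace U+00A0 with an ordinary space, then trim.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
$text = trim(str_replace("\u{00A0}", ' ', $text));
echo $text, PHP_EOL;
// Hello World, this is StackOverflow's question details page
```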

2011-02-02 11:46:24

Probably the best and most reliable way to do this is with a real (X|HT)ML parser, such as the DOMDocument class:

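<?php 

$str = "<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"; 

$dom = new DOMDocument; 
// &nbsp; is not one of XML's predefined entities, so swap it for a plain space first
$dom->loadXML(str_replace('&nbsp;', ' ', $str)); 

// firstChild is the <p> element; its nodeValue is the decoded text content
echo trim($dom->firstChild->nodeValue); 
// "Hello World, this is StackOverflow's question details page"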

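This is probably slight overkill for this problem, but using a proper parser is a good habit to get into.

Edit: You can reuse the DOMDocument object, so you only need two lines inside the loop: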

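$dom = new DOMDocument; 
while ($rs = mysqli_fetch_assoc($result)) { // or whatever your fetch loop is 
    $dom->loadHTML(str_replace('&nbsp;', ' ', $rs['description'])); 
    // loadHTML() wraps the fragment in <html><body> and may add a doctype as
    // the first child, so read the text from documentElement, not firstChild
    $TMP_DESCR = trim($dom->documentElement->textContent); 

    // do something with $TMP_DESCR 
}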
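One limitation of the loadXML() version is that the input must be well-formed XML with a single root element, so a description such as <p>One</p><p>Two</p> would fail to parse, and HTML named entities other than XML's five predefined ones are undefined. A sketch of a reusable helper built on loadHTML() instead, which tolerates HTML fragments and decodes HTML entities. The function name html_fragment_text is hypothetical, it needs the dom extension, and it assumes ASCII or Latin-1 input, because without an encoding hint libxml's HTML parser does not treat raw bytes as UTF-8:

```php
<?php
// Hypothetical helper: plain text of an HTML fragment via DOMDocument::loadHTML().
function html_fragment_text(string $html): string
{
    // loadHTML() refuses empty input, so short-circuit blank strings.
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The fragment is wrapped in <html><body>; read from the root element
    // rather than firstChild, which can be an implied doctype node.
    if (!$loaded || $dom->documentElement === null) {
        return '';
    }

    // Entities are already decoded here: &#39; is "'" and &nbsp; is U+00A0,
    // which trim() would not remove, so map it to a plain space first.
    $text = $dom->documentElement->textContent;
    return trim(str_replace("\u{00A0}", ' ', $text));
}

echo html_fragment_text("<p>&nbsp;Hello World, this is StackOverflow&#39;s question details page</p>"), PHP_EOL;
echo html_fragment_text("<p>Hello <b>World</b></p>"), PHP_EOL;
```

textContent concatenates the text of adjacent elements without any separator, so two paragraphs come back run together; the helper also creates a fresh DOMDocument per call for simplicity, whereas in a tight loop you could reuse one object as the edit above suggests.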

2011-02-02 11:52:07 lonesomeday
