Get content and replace it after processing

I have an HTML file (sample.html) that looks like this:

<html> 
<head> 
</head> 
<body> 
<div id="content"> 
<!--content--> 

<p>some content</p> 

<!--content--> 
</div> 
</body> 
</html>

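How can I get the part of the content that sits between the two '<!--content-->' HTML comments using PHP? I want to extract it, do some processing, and put it back, so I need to both get it and write it back. Is that possible?

Source

2010-08-04 esafwan

By "content" do you mean 'some content' or '<p>some content</p>'? And will the comment nodes always be written as '<!--content-->'? – Gordon 2010-08-04 10:05:03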

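esafwan - you can use a regular expression to extract the content between the tags of a div with a specific id.

I did this for image tags before, so the same rules apply. I'll look up the code and update this message shortly.

[Update] Try this: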

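<?php
    // Return the inner HTML of the first <div> whose $attr equals $value,
    // or null when nothing matches. Nested <div>s are not handled.
    function get_tag($attr, $value, $xml) {

     // Escape regex metacharacters, including the "/" delimiter.
     $attr = preg_quote($attr, '/');
     $value = preg_quote($value, '/');

     $tag_regex = '/<div[^>]*'.$attr.'="'.$value.'"[^>]*>(.*?)<\\/div>/si';

     if (preg_match($tag_regex, $xml, $matches)) {
      return $matches[1];
     }
     return null;
    }

    $yourentirehtml = file_get_contents("sample.html");
    $extract = get_tag('id', 'content', $yourentirehtml);
    echo $extract;
?>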

Or, more simply:

preg_match("/<div[^>]*id=\"content\">(.*?)<\\/div>/si", $text, $match); 
$content = $match[1];

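Jim

Source

2010-08-04 10:06:53

So where is the 'id' attribute in '<!--content-->'? – Gordon 2010-08-04 10:24:15

gordon - the part being pulled out is whatever is contained inside the content (id) div, along the same lines as the jQuery $('#content').html() function. – 2010-08-04 10:32:20

But how do I actually load the HTML into $yourentirehtml? – esafwan 2010-08-04 10:36:57

Take a look here; it shows how you can load an HTML file into SimpleXML: http://blog.charlvn.com/2009/03/html-in-php-simplexml.html

You can then treat it as a normal SimpleXML object.

Edit: this will only work if you want the content inside a tag (for example, between <div> and </div>).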
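A minimal sketch of that idea, assuming the linked post describes something along the lines of parsing with DOMDocument and converting with simplexml_import_dom (I have not reproduced its code). The inline $html string is a stand-in for sample.html.

```php
<?php
// Hypothetical inline sample standing in for sample.html.
$html = '<html><head></head><body><div id="content"><!--content-->'
      . '<p>some content</p><!--content--></div></body></html>';

// DOMDocument parses HTML; SimpleXML on its own expects well-formed XML.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Convert the DOM tree into a SimpleXML object.
$xml = simplexml_import_dom($dom);

// Look up the div by id with XPath and read the text of its <p> child.
$divs = $xml->xpath('//div[@id="content"]');
$text = (string) $divs[0]->p;
echo $text, "\n";
```

This reaches elements inside the div, but it does not by itself distinguish what lies between the two comment markers.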

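Source

2010-08-04 10:22:28 Jake

If this is a simple replacement that does not involve actually parsing the HTML document, you can use a regex or even just str_replace. In general, though, it is not advisable to use regex for HTML, because HTML is not regular and coming up with reliable patterns can quickly become a nightmare.

The right way is to use a parsing library that actually knows how to make sense of HTML documents. Your best native bet is DOM, but PHP has some other native XML extensions you can use, and there are also a number of third-party libraries such as phpQuery, Zend_Dom, QueryPath and FluentDom.

If you use the search function, you will see that this topic has been covered extensively, and you should have no problem finding examples that show how to solve your problem.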
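As a rough illustration of the DOM route for this particular question, the sketch below collects the nodes between the two comment markers, processes them, and serializes the div again. The inline $html string and the upper-casing step are assumptions chosen only to make the example self-contained; substitute your own processing.

```php
<?php
// Hypothetical inline sample standing in for sample.html.
$html = '<html><head></head><body><div id="content"><!--content-->'
      . '<p>some content</p><!--content--></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@id="content"]')->item(0);

// Collect the nodes that sit between the two <!--content--> markers.
$between = [];
$inside = false;
foreach ($div->childNodes as $node) {
    if ($node instanceof DOMComment && trim($node->data) === 'content') {
        if ($inside) {
            break;          // second marker: stop collecting
        }
        $inside = true;     // first marker: start collecting
        continue;
    }
    if ($inside) {
        $between[] = $node;
    }
}

// Example processing step (an assumption): upper-case every text node found.
foreach ($between as $node) {
    foreach ($xpath->query('descendant-or-self::text()', $node) as $textNode) {
        $textNode->nodeValue = strtoupper($textNode->nodeValue);
    }
}

// Serialize the modified div; the markers stay in place.
$result = $dom->saveHTML($div);
echo $result, "\n";
```

Because the nodes are modified in place, there is no separate "put back" step: saving the document (or the div) writes out the processed content.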

Source

2010-08-04 10:22:43 Gordon

Good point raised re: reliable patterns – 2010-08-04 10:33:51

+1. If you are looking for a proper XPath to match the nodes, it is '(//*|//text())[preceding-sibling::comment()='content' and following-sibling::comment()='content']' – Wrikken 2010-08-04 10:50:09
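A small sketch of how that XPath expression could be run with DOMXPath. The inline $html string is an assumed stand-in for sample.html; in a file with whitespace around the paragraph, the whitespace text nodes between the markers would be matched as well.

```php
<?php
// Hypothetical inline sample standing in for sample.html.
$html = '<html><head></head><body><div id="content"><!--content-->'
      . '<p>some content</p><!--content--></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Elements and text nodes that have a "content" comment both before
// and after them among their siblings.
$query = "(//*|//text())[preceding-sibling::comment()='content'"
       . " and following-sibling::comment()='content']";

$fragments = [];
foreach ($xpath->query($query) as $node) {
    $fragments[] = $dom->saveHTML($node);
}
$found = implode('', $fragments);
echo $found, "\n";
```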

Thanks... all the links helped me a lot, but they didn't directly answer my question. The links are worth reading and helped me go deeper into PHP! – esafwan 2010-08-04 11:32:28

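<?php

    $content=file_get_contents("sample.html");
    // Split on the marker: [before, between, after] when both markers exist.
    $parts=explode("<!--content-->",$content,3);
    if (count($parts) === 3) {
     // Plain text between the two markers, with tags stripped.
     var_dump(strip_tags($parts[1]));
    }
?>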

Check this; it will work for your problem.
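The question also asks to put the processed content back. A sketch of that with plain string functions follows; replace_between_markers is a hypothetical helper name, not something from this thread, and the str_replace callback is only a placeholder for real processing.

```php
<?php
// Hypothetical helper: run $callback on the text between the first two
// $marker occurrences and splice the result back into the document.
function replace_between_markers($html, $marker, $callback) {
    $parts = explode($marker, $html, 3);
    if (count($parts) < 3) {
        return $html; // fewer than two markers: leave the document unchanged
    }
    $parts[1] = call_user_func($callback, $parts[1]);
    return implode($marker, $parts);
}

$html = '<div id="content"><!--content--><p>some content</p><!--content--></div>';

// Placeholder processing step: swap one word in the extracted fragment.
$updated = replace_between_markers($html, '<!--content-->', function ($inner) {
    return str_replace('some', 'processed', $inner);
});
echo $updated, "\n";
```

To update the file itself, the result could be written out with file_put_contents. This treats the HTML as a plain string, so it only suits cases where the markers appear exactly as written.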

Source

2010-08-04 11:07:53

The problem is with nested divs. I found a solution here:

<?php // File: MatchAllDivMain.php 
// Read html file to be processed into $data variable 
$data = file_get_contents('test.html'); 
// Commented regex to extract contents from <div class="main">contents</div> 
// where "contents" may contain nested <div>s. 
// The regex uses PCRE's recursive (?1) subpattern syntax to recurse into group 1
$pattern_long = '{   # recursive regex to capture contents of "main" DIV 
<div\s+class="main"\s*>    # match the "main" class DIV opening tag 
    (         # capture "main" DIV contents into $1 
    (?:        # non-cap group for nesting * quantifier 
     (?: (?!<div[^>]*>|</div>).)++ # possessively match all non-DIV tag chars 
    |         # or 
     <div[^>]*>(?1)</div>   # recursively match nested <div>xyz</div> 
    )*        # loop however deep as necessary 
)         # end group 1 capture 
</div>        # match the "main" class DIV closing tag 
}six'; // single-line (dot matches all), ignore case and free spacing modes ON 

// short version of same regex 
$pattern_short = '{<div\s+class="main"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si'; 

$matchcount = preg_match_all($pattern_long, $data, $matches); 
// $matchcount = preg_match_all($pattern_short, $data, $matches); 
echo("<pre>\n"); 
if ($matchcount > 0) { 
    echo("$matchcount matches found.\n"); 
// print_r($matches); 
    for($i = 0; $i < $matchcount; $i++) { 
     echo("\nMatch #" . ($i + 1) . ":\n"); 
     echo($matches[1][$i]); // print 1st capture group for match number i 
    } 
} else { 
    echo('No matches'); 
} 
echo("\n</pre>"); 
?>

Source

2012-02-06 09:07:00 piernik
