PHP: Parse string from HTML

I have opened an HTML file in PHP using

file_get_contents('http://www.example.com/file.html')

and want to parse out the line that contains "ParseThis":

<h1 class=\"header\">ParseThis<\/h1>

As you can see, it sits inside an h1 tag (the first h1 tag in the file). How do I get the text "ParseThis"?


2010-08-28 John Paneth

You can use DOM for this.

// Load remote file, suppress parse errors
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com/file.html');
libxml_clear_errors();

// use XPath to find all h1 nodes with a class attribute of "header"
$xp = new DOMXPath($dom);
$nodes = $xp->query('//h1[@class="header"]');

// output the first match's content, if there is one
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue;
}
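To try the same DOM + XPath approach without fetching a URL, here is a minimal sketch that parses an HTML string instead. The helper name get_header_text() is hypothetical, and the XPath expression is a slightly broader variant of the one above: it also matches when "header" is one of several classes (e.g. class="big header"), not only when it is the entire attribute value.

```php
<?php
// Sketch only: get_header_text() is a hypothetical helper, not a library function.
// Returns the trimmed text of the first <h1> whose class list contains "header",
// or null if there is no such element.
function get_header_text(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $xp = new DOMXPath($dom);
    // Token match on the class attribute, so class="big header" also qualifies
    $nodes = $xp->query('//h1[contains(concat(" ", normalize-space(@class), " "), " header ")]');

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$html = '<html><body><h1 class="header">ParseThis</h1><h1>Other</h1></body></html>';
echo get_header_text($html), PHP_EOL;
```

In real use you would pass in the result of file_get_contents() for the page.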

See also

Marking this as community wiki (CW) because I have answered this before, but I'm too lazy to find the duplicate.


2010-08-28 17:24:59 Gordon

Use this function.

<?php
// Return the text between the first occurrence of $start and the next
// occurrence of $end, or "" if either delimiter is missing.
function get_string_between($string, $start, $end)
{
    $ini = strpos($string, $start);
    if ($ini === false)
        return "";
    $ini += strlen($start);
    $stop = strpos($string, $end, $ini);
    if ($stop === false)
        return "";
    return substr($string, $ini, $stop - $ini);
}

$data = file_get_contents('http://www.example.com/file.html');

echo get_string_between($data, '<h1 class=\"header\">', '<\/h1>');
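A minimal sketch of the same "text between two markers" idea using preg_match() instead of strpos()/substr(). It assumes the page contains ordinary, unescaped HTML (<h1 ...>...</h1>) rather than the backslash-escaped form quoted in the question, and first_h1_text() is a hypothetical name. Like get_string_between(), it is brittle against nested or unusual markup, which is the usual argument for DOM.

```php
<?php
// Sketch only: first_h1_text() is a hypothetical helper.
// Assumes plain HTML such as <h1 class="header">ParseThis</h1>.
function first_h1_text(string $html): string
{
    // Non-greedy capture of everything between the first <h1 ...> and </h1>.
    // /s lets "." span newlines; /i makes the tag-name match case-insensitive.
    if (preg_match('~<h1\b[^>]*>(.*?)</h1>~si', $html, $m)) {
        return trim($m[1]);
    }
    return ""; // mirror get_string_between(): empty string when nothing matches
}

$data = "<body>\n<h1 class=\"header\">ParseThis</h1>\n<h1>Second</h1>\n</body>";
echo first_h1_text($data), PHP_EOL;
```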


2010-08-28 17:19:38 shamittomar

It may work in this case, but you should use a DOM selector or XML navigation. – Incognito 2010-08-28 17:21:41

I prefer this because it's faster than DOM; when I have a very simple need like this one, I use my 'get_string_between' :) – shamittomar 2010-08-28 17:27:40

+1, used it to get the follower count. – 2013-03-19 23:29:51

Since it's the first h1 tag, getting it should be fairly trivial:

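$html = file_get_contents('http://www.example.com/file.html');
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$h1 = $doc->getElementsByTagName('h1');
echo $h1->item(0)->nodeValue; // text of the first <h1> in the document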
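One caveat with item(0): if the page has no <h1> at all, it returns null, and reading ->nodeValue on null triggers a warning (a notice on older PHP versions). A minimal sketch that guards against this, working on an in-memory string; first_h1() is a hypothetical helper name.

```php
<?php
// Sketch only: first_h1() is a hypothetical helper.
// Returns the trimmed text of the first <h1>, or null if the document has none.
function first_h1(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $h1 = $doc->getElementsByTagName('h1')->item(0); // null when there is no <h1>
    return $h1 === null ? null : trim($h1->textContent);
}

var_dump(first_h1('<h1 class="header">ParseThis</h1><h1>Second</h1>'));
var_dump(first_h1('<p>No heading here</p>'));
```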

http://php.net/manual/en/class.domdocument.php


2010-08-28 17:23:17 karim79
