Get div data from the DOM where the id starts with a specific name

I want to get the data of HTML divs whose id starts with a specific name or string.

For example, suppose a website has this data:

<html>
    <div id="post_message_1">
        somecontent1
    </div>
    <div id="post_message_2">
        somecontent2
    </div>
    <div id="post_message_3">
        somecontent3
    </div>
</html>

For this, I tried cURL:

<?php
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$html = file_get_contents_curl("myUrl");
$fh = fopen("test.html", 'w'); // 'w' creates the file (or truncates an existing one) for writing
// write the response into the newly created file
fwrite($fh, $html);
fclose($fh);
?>

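If I use

$select = $doc->getElementById("post_message_");

it doesn't return any data, because it searches the DOM for that exact id, but in the HTML the div ids only start with that string. They could be post_message_1, post_message_2, and so on.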
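To make the problem concrete: DOMDocument::getElementById() only does an exact comparison, so a prefix never matches. A minimal sketch, using a hypothetical one-div document in place of the real page:

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="post_message_1">somecontent1</div></body></html>');

// getElementById() needs the complete id; it does no prefix matching.
$exact  = $doc->getElementById('post_message_1');
$prefix = $doc->getElementById('post_message_');

var_dump($exact !== null); // bool(true)
var_dump($prefix);         // NULL
```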


2015-05-09 neo

I would convert the output of file_get_contents_curl into a SimpleXMLElement object and use its xpath method.

For example, you can do this:

$html = <<<HTML 
<html> 
    <div id="post_message_1"> 
     somecontent1 
    </div> 
<div id="post_message_2"> 
     somecontent2 
    </div> 
    <div id="post_message_3"> 
     somecontent3 
    </div> 
</html> 
HTML; 

$dom = new SimpleXMLElement($html); 

var_dump($dom->xpath('//div[starts-with(@id, "post_message_")]'));
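var_dump only dumps the matched objects. A small sketch of reading each match's id and text out of the xpath() result, using the same demo markup (which happens to be well-formed XML, as SimpleXMLElement requires); the $posts array is just an illustrative name:

```php
<?php
$html = <<<HTML
<html>
    <div id="post_message_1">
     somecontent1
    </div>
    <div id="post_message_2">
     somecontent2
    </div>
    <div id="post_message_3">
     somecontent3
    </div>
</html>
HTML;

$dom = new SimpleXMLElement($html);

$posts = [];
foreach ($dom->xpath('//div[starts-with(@id, "post_message_")]') as $div) {
    // (string) on the attribute gives its value; (string) on the element gives its text
    $posts[(string) $div['id']] = trim((string) $div);
}

print_r($posts);
```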

UPDATE

In your case, you should do this:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings for malformed real-world HTML
$doc->loadHTML(file_get_contents_curl($url));

$sxml = simplexml_import_dom($doc);

var_dump($sxml->xpath('//div[starts-with(@id, "post_message_")]'));


2015-05-09 09:20:53 smarber

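Hi, thanks for the suggestion, but when I use xpath I get this error: "PHP Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML'". As you can tell, the HTML I gave is just for demonstration purposes. – neo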

Do you mean you can't convert the HTML output of 'file_get_contents_curl' into a PHP SimpleXMLElement object? If you can use 'SimpleXMLElement', you can do whatever you need very easily. – smarber

Yes, and lots of warnings are shown too, like "PHP Warning: SimpleXMLElement::__construct(): English<option value="78" class="fjdpth2"> in /var/www/index.php on line 1". – neo
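The error happens because new SimpleXMLElement() needs well-formed XML, and real pages are HTML (unclosed tags, entities like &nbsp;). A sketch of the tolerant route, as in the UPDATE above: parse with DOMDocument::loadHTML(), collect the libxml warnings internally instead of printing them, then import into SimpleXML. The messy $page string is a made-up stand-in for a scraped page:

```php
<?php
// Hypothetical messy page: unclosed <option> and <br>, an HTML entity.
// new SimpleXMLElement($page) would throw "String could not be parsed as XML".
$page = '<html><body>
<select><option value="78" class="fjdpth2">English</select><br>
<div id="post_message_1">first&nbsp;post</div>
<div id="post_message_2">second post<br></div>
<div id="other">not a post</div>
</body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page); // the HTML parser repairs the markup

libxml_clear_errors();                 // discard the collected errors
libxml_use_internal_errors($previous); // restore the previous setting

// The repaired tree can now be handed to SimpleXML.
$sxml = simplexml_import_dom($doc);
$ids = [];
foreach ($sxml->xpath('//div[starts-with(@id, "post_message_")]') as $div) {
    $ids[] = (string) $div['id'];
}
print_r($ids);
```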

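I would probably iterate over all the divs and run a regular expression on their ids to get what I need.

I don't think there is a cleaner way to do this, unless you can edit the page's HTML and add a class to the divs that contain the messages.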
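A minimal sketch of that "iterate all divs, test the id" approach with DOMDocument and preg_match(). It assumes the suffix is always numeric (post_message_<digits>), which is stricter than a plain prefix check; the sample markup is hypothetical:

```php
<?php
$html = '<html><body>
<div id="post_message_1">somecontent1</div>
<div id="post_message_22">somecontent2</div>
<div id="post_message_x">not numbered</div>
<div>no id at all</div>
</body></html>';

libxml_use_internal_errors(true); // silence warnings from imperfect real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);

$posts = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $id = $div->getAttribute('id'); // '' when the attribute is missing
    // keep only ids of the form post_message_<digits>
    if (preg_match('/^post_message_\d+$/', $id)) {
        $posts[$id] = trim($div->textContent);
    }
}
print_r($posts);
```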


2015-05-09 09:17:30

I'm scraping another website's data, so I can't add a class to the HTML. – neo

This is a sledgehammer to crack a nut, and I've never managed to get that kind of pattern working in PHP, but a regular expression will work:

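$subject = $html;
// PHP's PCRE has no "g" modifier ("Unknown modifier" warning); preg_match() returns only the first match
$pattern = '/id\=\"post_message_\d+\"\>(?<match>.*)<\/div\>/isU';
preg_match($pattern, $subject, $matches);

var_dump(trim($matches['match']));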

Explanation of the regex:

/id\=\"post_message_\d+\"\>(?<match>.*)<\/div\>/isU
id matches the characters id literally (case insensitive)
\= matches the character = literally
\" matches the character " literally
post_message_ matches the characters post_message_ literally (case insensitive)
\d+ matches a digit [0-9]
Quantifier: + Between one and unlimited times, as few times as possible, expanding as needed [lazy]
\" matches the character " literally
\> matches the character > literally
(?<match>.*) Named capturing group "match"
.* matches any character
Quantifier: * Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
< matches the character < literally
\/ matches the character / literally
div matches the characters div literally (case insensitive)
\> matches the character > literally
i modifier: insensitive. Case-insensitive match (ignores case of [a-zA-Z])
s modifier: single line. Dot matches newline characters
U modifier: ungreedy. Quantifiers become lazy by default; a ? following a quantifier makes it greedy
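The U modifier is what keeps each capture inside a single div. A small comparison on a hypothetical one-line input; the pattern is a simplified equivalent of the one above (the \=, \" and \> escapes are not needed):

```php
<?php
$html = '<div id="post_message_1">a</div><div id="post_message_2">b</div>';

// With U, ".*" is lazy and stops at the first </div>.
preg_match('/id="post_message_\d+">(?<match>.*)<\/div>/isU', $html, $lazy);

// Without U, ".*" is greedy and runs to the last </div> in the string.
preg_match('/id="post_message_\d+">(?<match>.*)<\/div>/is', $html, $greedy);

echo $lazy['match'], "\n";   // a
echo $greedy['match'], "\n"; // a</div><div id="post_message_2">b
```

Note that even the lazy version stops at the first </div>, so a post that contains a nested div would be cut short; a DOM/XPath parser does not have that problem.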


2015-05-09 09:26:40

This only gives me the first div's data. It probably needs some modification. – neo
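preg_match() stops after the first match by design; preg_match_all() is the PHP function that collects every match. A sketch of that change, keeping the pattern above and the demo markup from the question:

```php
<?php
$html = <<<HTML
<html>
    <div id="post_message_1">
     somecontent1
    </div>
    <div id="post_message_2">
     somecontent2
    </div>
    <div id="post_message_3">
     somecontent3
    </div>
</html>
HTML;

$pattern = '/id\=\"post_message_\d+\"\>(?<match>.*)<\/div\>/isU';
preg_match_all($pattern, $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches['match'] holds one capture per div, in document order
$contents = array_map('trim', $matches['match']);
print_r($contents);
```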

I found a solution and it works fine. Maybe this code will help someone. Thanks to @smarber, whose pattern helped me solve this.

<?php
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$html = file_get_contents_curl("myUrl");

libxml_use_internal_errors(true); // suppress parser warnings for malformed HTML
$dom = new DOMDocument();
$dom->loadHTML($html);

// Select every div whose id starts with "post_message_"
$finder = new DOMXPath($dom);
$nodes = $finder->query('//div[starts-with(@id, "post_message_")]');

// Copy the matched divs into a fresh document so they can be serialized together
$tmp_dom = new DOMDocument();
foreach ($nodes as $node) {
    $tmp_dom->appendChild($tmp_dom->importNode($node, true));
}

$innerHTML = trim($tmp_dom->saveHTML());
$fh = fopen("test.html", 'w'); // 'w' creates the file (or truncates an existing one) for writing
fwrite($fh, $innerHTML);       // write the extracted divs to the file
fclose($fh);
?>


2015-05-09 09:50:50 neo
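If each post is needed separately rather than as one combined HTML file, a possible variation on the same DOMXPath query keeps an id => inner-HTML map, serializing only each div's children with DOMDocument::saveHTML($node). The $html string is a hypothetical stand-in for file_get_contents_curl("myUrl"):

```php
<?php
$html = '<html><body>
<div id="post_message_1"> somecontent1 </div>
<div id="post_message_2"> somecontent2 <b>bold</b></div>
<div id="sidebar">ignore me</div>
</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$finder = new DOMXPath($dom);
$nodes = $finder->query('//div[starts-with(@id, "post_message_")]');

$posts = [];
foreach ($nodes as $node) {
    $inner = '';
    // serialize only the children, i.e. the div's inner HTML
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    $posts[$node->getAttribute('id')] = trim($inner);
}
print_r($posts);
```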
