Problem with SimpleXML->xpath

http://www.alliedelec.com/search/searchresults.aspx?N=0&Ntt=PIC16F648&Ntk=Primary&i=0&sw=n

I am querying the page above with SimpleXML->xpath. I have determined that the XPath of the table is:

'//*[@id="tblParts"]'

Now I take the string returned by cURL, $string, and do the following:

$tidy->parseString($string);
$output = (string) $tidy;
$xml = new SimpleXMLElement($output);
$result = $xml->xpath('//*[@id="tblParts"]');
// foreach replaces while(list(, $node) = each($result)); each() no longer exists in PHP 8
foreach ($result as $node)
{
    echo 'NODE:' . $node . "\n";
}

I get back errors like these, by the hundreds:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 60: parser error : Opening and ending tag mismatch: meta line 22 and head in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119 

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: </head> in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119 

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]:^in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119 

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 108: parser error : Opening and ending tag mismatch: img line 106 and td in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Plus this one at the bottom:

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php:119 Stack trace: #0 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(119): SimpleXMLElement->__construct('<!DOCTYPE html ...') #1 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(95): get_Alliedelectronics->extractData('<!DOCTYPE html ...') #2 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(138): get_Alliedelectronics->query('PIC16F648') #3 {main} thrown in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php on line 119

2011-05-08 Jack Murphy

It looks like the HTML of the page you are fetching and trying to parse is not well-formed (mismatched tags, etc.).

You can try using simplexml_import_dom to work around the errors, as I explain in this SO post.
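
A minimal sketch of that approach, assuming the page is first loaded with DOMDocument::loadHTML (which tolerates mismatched tags) and then handed to simplexml_import_dom so the SimpleXML xpath call can still be used. The markup in $string is a made-up stand-in for the fetched page, not the real HTML:

```php
<?php
// Hypothetical sample standing in for the cURL result; note the unclosed <meta> and <img>.
$string = '<html><head><meta name="x" content="y"><title>t</title></head>'
        . '<body><table id="tblParts"><tr><td><img src="a.png">PIC16F648</td></tr></table>'
        . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($string);            // lenient HTML parser, not the strict XML one
libxml_clear_errors();

$xml = simplexml_import_dom($doc);  // SimpleXMLElement wrapping the same parsed tree
$result = $xml->xpath('//*[@id="tblParts"]');
foreach ($result as $node) {
    // asXML() prints the matched element's markup; casting to string gives only its direct text
    echo 'NODE: ' . $node->asXML() . "\n";
}
```

With this, tidy is not needed just to get the document to parse.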

2011-05-08 14:17:53

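Also, you need to use tools that fit the data you are handling. If you are going to use XML methods, writing good code requires that you can *guarantee* the input is well-formed, not just hope so or guess from experiments. You can only trust an XML library to produce XML for you, so if there is a "dirty" stage early in your processing, you must use HTML methods to do the conversion and make the code safe. – 2011-05-08 14:50:05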

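I don't know what other tool I could use to extract data from this HTML file, and I don't know how to clean up the dirty code other than running it through tidy. – 2011-05-08 14:53:27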

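I would suggest not using SimpleXML (@Nev Stokes and @Nicholas Wilson are right: this is HTML, not XML, and you cannot guarantee that it will validate as XML) and using something like DOM instead (see http://www.php.net/manual/en/book.dom.php). You could do something like this: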

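$doc = new DOMDocument();
// loadHTML() uses libxml's lenient HTML parser, so mismatched tags do not abort parsing
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$entries = $xpath->query('//*[@id="tblParts"]');
foreach ($entries as $entry) {
    // do something with each matched DOMElement, e.g. read $entry->nodeValue
}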

See if that helps.
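
To fill in the "do something" step, here is a rough sketch that collects the cell text of each row in the table. The sample markup and part numbers in $string are invented for illustration, and the real table may be laid out differently (header rows, nested tables), so treat the row/cell XPath as an assumption:

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$string = '<html><body><table id="tblParts">'
        . '<tr><td><img src="x.gif">PIC16F648A-I/P</td><td>Microchip</td></tr>'
        . '<tr><td><img src="x.gif">PIC16F648A-I/SO</td><td>Microchip</td></tr>'
        . '</table></body></html>';

libxml_use_internal_errors(true);   // keep HTML parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
// Every <tr> inside the element with id="tblParts"
foreach ($xpath->query('//*[@id="tblParts"]//tr') as $tr) {
    $cells = [];
    // The second argument makes the query relative to the current row
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Passing the row as the context node to query() keeps the inner lookup scoped to that row instead of the whole document.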

2011-05-08 16:23:41 Femi
