2015-04-04 172 views
3

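Extracting information via XPath in PHP

I'm trying to extract some information from the AEC website (e.g. http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode). The XPath query I'm running is "//x:tbody/x:tr/x:td[4]/x:a". I have tested it in XPath Checker (a Firefox extension), where it extracts the relevant locality data.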

I then use PHP to load the page, run the query, and iterate over the results.

$ch = curl_init(); 
$timeout = 5; 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); 
$html = curl_exec($ch); 
curl_close($ch); 

# Create a DOM parser object 
$dom = new DOMDocument(); 
libxml_use_internal_errors(true); 


$dom->loadHTML($html); 

$xpath = new DOMXpath($dom); 

$elements = $xpath->query('//tbody/tr/td[4]/a'); 


foreach ($elements as $element) { 
    echo $element; 
} 

I then get:

Warning: Invalid argument supplied for foreach() in /home/givesh5/public_html/dig/electoratesearch.php on line 41 

It seems the query returns some kind of boolean rather than a list of matches?

The relevant markup is as follows:

<table cellspacing="0" rules="all" border="1" id="ContentPlaceHolderBody_gridViewLocalities" style="border-collapse:collapse;"> 
     <tr class="headingLink"> 
      <th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$StateAb&#39;)">State</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$LocalityNm&#39;)">Locality/Suburb</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$Postcode&#39;)">Postcode</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNm&#39;)">Electorate</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNmRedistributed&#39;)">Redistributed Electorate</a></th><th scope="col">Other Locality(s)</th> 
     </tr><tr> 
      <td>VIC</td><td>BOTANIC RIDGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CANNONS CREEK</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE NORTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE SOUTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>CRANBOURNE WEST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>DEVON MEADOWS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>FIVEWAYS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td><a href="LocalitySearchResults.aspx?filter=DEVON+MEADOWS&amp;filterby=LocalityorSuburb&amp;state=VIC">DEVON MEADOWS</a></td> 
     </tr><tr> 
      <td>VIC</td><td>JUNCTION VILLAGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>SANDHURST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Dunkley&amp;filterby=Electorate&amp;divid=210">Dunkley</a></td><td></td><td>&nbsp;</td> 
     </tr><tr> 
      <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td> 
     </tr> 
    </table> 
+0

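'DOMXpath::query()' returns false *if the expression is malformed or the contextnode is invalid* – adeneo 2015-04-04 09:19:33

+0

Could you please provide the relevant part of the markup you are parsing? XPaths derived from Firefox come from the live DOM, which can contain implied markup, so obtaining them that way is unreliable. Also, what exactly are you trying to fetch? – Gordon 2015-04-04 09:30:59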

+1

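Updated the OP with the markup, thanks. In this case I'm trying to get the link text for each locality (i.e. the text inside the link). For example, in the first two rows this would be "Flinders". – Edward 2015-04-04 09:35:16

Answers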

0

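There is no tbody in the HTML.
Browsers insert tbody elements where needed, but we are not using a browser. We are using DOMDocument, which does not insert tbody elements.

Instead, the tr elements are direct children of the table: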

$elements = $xpath->query('//table/tr/td[4]/a'); 

foreach ($elements as $element) { 
    echo $dom->saveHTML($element); 
} 
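
To see the difference without hitting the live site, here is a minimal sketch that parses a cut-down copy of the table and counts the matches for both expressions. The $html string is a shortened sample made up for illustration (two data rows, placeholder href values), not the full page. Note also that a DOMElement cannot be echoed directly, so the sketch prints textContent.

```php
<?php
// Shortened sample of the table from the question (assumption: two rows suffice).
$html = <<<'HTML'
<table id="ContentPlaceHolderBody_gridViewLocalities">
  <tr><th>State</th><th>Locality/Suburb</th><th>Postcode</th><th>Electorate</th></tr>
  <tr><td>VIC</td><td>BOTANIC RIDGE</td><td><a href="#">3977</a></td><td><a href="#">Flinders</a></td></tr>
  <tr><td>VIC</td><td>CRANBOURNE</td><td><a href="#">3977</a></td><td><a href="#">Holt</a></td></tr>
</table>
HTML;

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser notices out of the output
$dom->loadHTML($html);
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// The expression copied from the browser assumes a tbody element.
$withTbody = $xpath->query('//tbody/tr/td[4]/a');
// The expression matching the markup as DOMDocument parses it.
$withoutTbody = $xpath->query('//table/tr/td[4]/a');

printf("with tbody: %d match(es)\n", $withTbody->length);
printf("without tbody: %d match(es)\n", $withoutTbody->length);

foreach ($withoutTbody as $element) {
    echo trim($element->textContent), "\n";
}
```

With DOMDocument the first expression should find nothing, while the second finds one link per data row.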
+0

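Shouldn't // match a selection partway through the document? In that sense, if table/tr/td is a unique selector, we can omit the preceding part of the path and still reach the same information via //table/tr/td[4]. Is that not correct? – Edward 2015-04-04 09:54:19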

+0

@Edward - Yes, that's correct. I had just copied the path from the console, but on testing, '//table/tr/td[4]/a' works as well. What you had, '//tbody/tr/td[4]/a', does not work. – adeneo 2015-04-04 10:07:33

+0

Probably because there is no tbody, duh. – adeneo 2015-04-04 10:08:55

1

"It seems the query returns some kind of boolean rather than a list of matches?"

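Yes, it can return a boolean, and that boolean will then be FALSE. It indicates that there was an error running the xpath query. This can be caused by either of the two parameters passed to DOMXpath::query() (PHP Manual): the xpath expression or the context node.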

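In your case you are using only one parameter, so this suggests the xpath expression is wrong. However, the one you use is not wrong and does not result in a boolean FALSE. Since you do get this kind of error, I assume there might be other errors, so perhaps the xpath object was not fully initialized. But even when I simulated no download or a partial download, I could not reproduce the error. This might differ between PHP versions? I don't know.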
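
The distinction between the two outcomes can be sketched as follows. This is a minimal illustration, not the asker's script: the one-row table and the deliberately malformed expression are made up, and the @ operator is used only to keep the warning out of the output.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td><a href="#">Flinders</a></td></tr></table>');
$xpath = new DOMXPath($dom);

// Well-formed expression that matches nothing: an empty DOMNodeList, not false.
$noMatch = $xpath->query('//tbody/tr');
var_dump($noMatch instanceof DOMNodeList, $noMatch->length);

// Malformed expression (hypothetical typo, unterminated predicate):
// query() emits a warning and returns false.
$broken = @$xpath->query('//table/tr[');
var_dump($broken);
```

An empty DOMNodeList can be passed to foreach without any warning, so a well-formed expression that simply matches nothing would not by itself produce "Invalid argument supplied for foreach()".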

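As for the actual xpath expression, what adeneo and Gordon have already written applies: the <tbody> element is inserted into the DOM by the Firefox browser, and DOMDocument in PHP behaves differently here. You can either emulate Firefox here (more work) or just search for the actual table elements, and then it works. Here is a working example: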

$url = 'http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode'; 

# Create a DOMDocument to parse HTML 
$doc = new DOMDocument(); 
$saved = libxml_use_internal_errors(true); 
$result = $doc->loadHTMLFile($url); 
libxml_use_internal_errors($saved); 
if (false === $result) { 
    throw new UnexpectedValueException(sprintf('Failed to create DOMDocument from url %s', var_export($url, true))); 
} 

# Create a DOMXPath to get data from HTML document 
$xpath = new DOMXpath($doc); 

$expression = '//table/tr/td[4]/a'; 
$elements = $xpath->query($expression); 
if (false === $elements) { 
    throw new UnexpectedValueException(sprintf('The xpath expression %s failed', var_export($expression, true))); 
} 

foreach ($elements as $index => $element) { 
    printf("#%02d: %s\n", $index + 1, trim($element->textContent)); 
} 

With the concrete output:

#01: Flinders 
#02: Flinders 
#03: Holt 
#04: Flinders 
#05: Holt 
#06: Holt 
#07: Flinders 
#08: Holt 
#09: Flinders 
#10: Flinders 
#11: Flinders 
#12: Isaacs 
#13: Dunkley 
#14: Isaacs
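
If the same expression has to work whether or not the rows are wrapped in tbody (for example when the markup changes later), one option is an XPath union of both paths. The sketch below is only an illustration under that assumption. The helper name fourthColumnLinks and the sample row are hypothetical and not part of the answers above.

```php
<?php
// Hypothetical helper: returns the trimmed link text of the fourth column,
// whether or not the rows are wrapped in a tbody element.
function fourthColumnLinks(string $html): array
{
    $doc = new DOMDocument();
    $saved = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($saved);

    $xpath = new DOMXPath($doc);
    // Union of both paths: rows directly under table, or under table/tbody.
    $expression = '//table/tr/td[4]/a | //table/tbody/tr/td[4]/a';
    $nodes = $xpath->query($expression);
    if (false === $nodes) {
        throw new UnexpectedValueException('The xpath expression failed');
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample row modelled on the question's markup (placeholder href values).
$row = '<tr><td>VIC</td><td>SKYE</td><td><a href="#">3977</a></td>'
     . '<td><a href="#">Dunkley</a></td></tr>';

print_r(fourthColumnLinks("<table>$row</table>"));
print_r(fourthColumnLinks("<table><tbody>$row</tbody></table>"));
```

Both calls should yield the same single entry, so the expression does not depend on how the table rows are nested.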