2011-12-01 51 views
-2

我想訪問此list的多個頁面。我的嘗試是下面的代碼,我得到一個XML文件與第一頁的酒店數據,但我想訪問其他酒店的頁面.. ..如何做到這一點?使用PHP/cURL的Web自動化

正如你可以想象的所有網頁的網址是相同的。

<?php 

//extract data from the post 
extract($_POST); 

//set POST variables 
$url = 'http://www.turismovenezia.it/index.php'; 

$fields1 = array(
      'ajax'=>'searchEngineTopdata', 
      'next_pair'=>'Dove Allogiare|*', 
      'lang'=>'it'); 


$fields2 = array(

'ajax'=>'xmlSearchEngineResponder', 
'xml' => "%3C%3Fxml%20version%3D%221.0%22%3F%3E%3CSearchRequest%20xmlns%3D%22http%3A%2F%2Fwww.liberologico.com%2Fdbsite%2Fjolly-search%22%20xmlns%3Axsi%3D%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema-instance%22%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FScope%3E%3CFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Eaptve_territorio%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3ETHESAURUS%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Efull_text_search%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Elang%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5Bit%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3EEQUAL%3C%2FOperation%3E%3C%2FFilters%3E%3C%2FFilters%3E%3CSubSearches%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BEventi%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BArte%20%26%20Cultura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BMare%20%26%20Natura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BPiatti%20%26%20Prodotti%20tipici%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BRelax%20%26%20Divertimento%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Mangiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BInformazioni%20Utili%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3C%2FSubSearches%3E%3C%2FSearch%3E%3CActiveResultSet%3E%3CTab%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FTab%3E%3CFirstItem%3E0%3C%2FFirstItem%3E%3CPagerSize%3E10%3C%2FPagerSize%3E%3C%2FActiveResultSet%3E%3C%2FSearchRequest%3E", 
'force' => 'false'); 

//open connection 
$ch = curl_init(); 

//set the url, number of POST vars, POST data 
curl_setopt($ch,CURLOPT_URL,$url); 
curl_setopt($ch,CURLOPT_POST, true); 
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields1); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); 

//set the url, number of POST vars, POST data 
curl_setopt($ch,CURLOPT_URL,$url); 
curl_setopt($ch,CURLOPT_POST, true); 
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields2); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); 

//execute post 
$result = curl_exec($ch); 

echo $result; 

//close connection 
curl_close($ch); 

哈維

+2

的可能重複[替代PHP /捲曲腳本:網絡自動化(http://stackoverflow.com/questions/8344136/alternative-to-php-curl-scripts-web-automation) – mario

+2

請不要重複你自己的問題。 – hakre

回答

0

首先,你爲什麼要配置兩次捲曲?

//set the url, number of POST vars, POST data 
curl_setopt($ch,CURLOPT_URL,$url); 
curl_setopt($ch,CURLOPT_POST, true); 
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields1); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); 

//set the url, number of POST vars, POST data 
curl_setopt($ch,CURLOPT_URL,$url); 
curl_setopt($ch,CURLOPT_POST, true); 
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields2); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1); 

然後,當你用$ fields2(長XML字符串)第一頁第一個呼叫,該呼叫應返回的XML(像你說的),並在該XML響應也包含一個字段ItemCount中酒店的總數。

如果您查看使用$ fields2發送的XML長字符串,則會有一個字段調用「FirstItem」,其中第一個調用的值爲0。這個字段是你抵消的,你可以增加它來跳過酒店。

實施例:

$fields2 = array(

'ajax'=>'xmlSearchEngineResponder', 
'xml' => "%3C%3Fxml%20version%3D%221.0%22%3F%3E%3CSearchRequest%20xmlns%3D%22http%3A%2F%2Fwww.liberologico.com%2Fdbsite%2Fjolly-search%22%20xmlns%3Axsi%3D%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema-instance%22%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FScope%3E%3CFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Eaptve_territorio%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3ETHESAURUS%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Efull_text_search%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Elang%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5Bit%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3EEQUAL%3C%2FOperation%3E%3C%2FFilters%3E%3C%2FFilters%3E%3CSubSearches%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BEventi%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BArte%20%26%20Cultura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BMare%20%26%20Natura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BPiatti%20%26%20Prodotti%20tipici%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BRelax%20%26%20Divertimento%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Mangiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BInformazioni%20Utili%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3C%2FSubSearches%3E%3C%2FSearch%3E%3CActiveResultSet%3E%3CTab%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FTab%3E%3CFirstItem%3E0%3C%2FFirstItem%3E%3CPagerSize%3E10%3C%2FPagerSize%3E%3C%2FActiveResultSet%3E%3C%2FSearchRequest%3E", 
'force' => 'false'); 

返回你的前10個結果;

$fields2 = array(
'ajax'=>'xmlSearchEngineResponder', 
'xml' => "%3C%3Fxml%20version%3D%221.0%22%3F%3E%3CSearchRequest%20xmlns%3D%22http%3A%2F%2Fwww.liberologico.com%2Fdbsite%2Fjolly-search%22%20xmlns%3Axsi%3D%22http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema-instance%22%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FScope%3E%3CFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Eaptve_territorio%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3ETHESAURUS%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Efull_text_search%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5B%2A%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3ELIKE%3C%2FOperation%3E%3C%2FFilters%3E%3CFilters%20xsi%3Atype%3D%22FilterSpecType%22%3E%3CField%3Elang%3C%2FField%3E%3CValue%3E%3CSingleValue%3E%3C%21%5BCDATA%5Bit%5D%5D%3E%3C%2FSingleValue%3E%3C%2FValue%3E%3CMode%3EFREE_TEXT%3C%2FMode%3E%3COperation%3EEQUAL%3C%2FOperation%3E%3C%2FFilters%3E%3C%2FFilters%3E%3CSubSearches%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BEventi%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BArte%20%26%20Cultura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BMare%20%26%20Natura%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BPiatti%20%26%20Prodotti%20tipici%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BRelax%20%26%20Divertimento%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BDove%20Mangiare%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3CSearch%3E%3CScope%3E%3C%21%5BCDATA%5BInformazioni%20Utili%5D%5D%3E%3C%2FScope%3E%3C%2FSearch%3E%3C%2FSubSearches%3E%3C%2FSearch%3E%3CActiveResultSet%3E%3CTab%3E%3C%21%5BCDATA%5BDove%20Alloggiare%5D%5D%3E%3C%2FTab%3E%3CFirstItem%3E10%3C%2FFirstItem%3E%3CPagerSize%3E10%3C%2FPagerSize%3E%3C%2FActiveResultSet%3E%3C%2FSearchRequest%3E", 
'force' => 'false'); 

會返回下一個10個結果。還有一個字段調用PagerSize可以讓你一次檢索更多的結果。

所以我會做第一個電話來獲得酒店的總數,並得到所有其他網頁的循環。

//do first curl call 

//then 
$totalHotel = 5214; // To retrieve in the first call 
$increment = 10; // the number of hotel to treat at once 
$nbOfHotelimported = $increment; 
while($totalHotel-$increment){ 
// do another curl call 
// with FirstItem set to $nbOfHotelimported 
// and pageSizer set to $increment 

$nbOfHotelimported += $increment; 
} 
0

相反刮初始HTML頁面本身,你應該獲取的URL,該網站本身訪問使用AJAX。您可以使用瀏覽器的開發人員工具來窺探請求其他結果頁面並「複製」它們時發出的AJAX請求,從而準確找出應該如何構建請求。

順便說一句,取決於爲什麼和如何這可能不是最道德或合法的事情。