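curl does not show the same source as the browser's View Page Source
I want to learn web scraping, and I chose https://www.betfair.com as an example. I have successfully retrieved data from many pages, but when I request https://www.betfair.com/sport/horse-racing I do not get the content. It does appear when I view the page source in a browser, so the problem is not that the content is generated by JavaScript or something similar. Here is my code: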
$url ='https://www.betfair.com/sport/horse-racing';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$page = curl_exec($ch);
curl_close($ch);
echo $page;
If you view the page source in a browser, you can find something like this:
<a href="/sport/horse-racing?action=loadRacingSpecials&tab=SPECIALS&modules=multipick-horse-racing" class="ui-nav link ui-clickselect ui-ga-click" data-dimension3="sports-header" data-dimension4="Specials" data-dimension5="Horse Racing" data-gacategory="Interface" data-gaaction="Clicked Horse Racing Header" data-galabel="Specials"
data-loader=".multipick-content-container > div, .antepost-content-container > div, .future-racing-content-container > div, .bet-finder-content-container > div, .racing-specials-content-container > div, .future-racing-market-content-container > div"
>
Specials</a>
But cURL does not return these elements.
It is there. Save the $page result to a file and you will see it: http://prntscr.com/edcdny – Faxsy
@Faxsy When I echo it on my local page and view the source, it is not there. Can you tell me why it behaves like that? – Codester
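The check suggested in the first comment (write the response to a file and look for the element there, rather than echoing it into a page) could be sketched roughly as follows. This is only an illustration: `saveAndCheck`, the file name `page.html`, and the marker string are assumptions made for the example, and a short sample string stands in for the real cURL response so the snippet runs without network access.

```php
<?php
// Hypothetical helper (not from the original post): writes the raw response
// body to a file and reports whether a marker string occurs in it.
function saveAndCheck(string $body, string $path, string $marker): bool
{
    // file_put_contents() returns false when the file cannot be written.
    if (file_put_contents($path, $body) === false) {
        throw new RuntimeException("Could not write to $path");
    }
    // strpos() can return 0 for a match at the start, so compare with !== false.
    return strpos($body, $marker) !== false;
}

// Stand-in for the cURL response; an assumption for illustration only.
$sample = '<a href="/sport/horse-racing" data-galabel="Specials">Specials</a>';
$path = sys_get_temp_dir() . '/page.html';

var_dump(saveAndCheck($sample, $path, 'data-galabel="Specials"')); // bool(true)
var_dump(saveAndCheck($sample, $path, 'data-galabel="Missing"'));  // bool(false)
```

With the code from the question, `$page` would be passed in place of `$sample`. The saved file can then be opened in a text editor, which shows exactly what cURL received without a browser rendering it.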