Parse a single link with cURL and save it in a txt file


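My task is to parse just one link from a given URL.

The problem is that every time the page is refreshed, I download the target site with cURL again and use a regular expression to find the link. How can I avoid downloading the target site again when the link it would give me is the same?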

$url = 'http://ruh.kz';

// Download the page; CURLOPT_RETURNTRANSFER makes curl_exec() return the body.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);

// Non-greedy groups stop at the first closing quote and the first </a>.
$link = $title = '';
if ($content !== false
    && preg_match('/<h3 class="entry"><a href="(.*?)">(.*?)<\/a><\/h3>/', $content, $matches)) {
    $link = $matches[1];
    $title = $matches[2];
}

Output:

<a href="http://ruh.kz<?php print $link; ?>" target="_blank"><?php print $title; ?></a>

Source

2012-03-03 Heihachi

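The simplest solution is to remember every URL you have already loaded and parsed in a cache. That is, whenever processing succeeds, store the URL in the session, a cookie, or a database (whichever serves you best).

On a page refresh, check this cache first. Only if the URL is not stored there do you need to load and parse it.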
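The check-the-cache-first idea described above could be sketched roughly as follows. This is only a minimal illustration: `cached_parse`, `$store`, and `$parse` are hypothetical names that do not appear in the thread, and the array stands in for whichever storage (session, cookie, database) you pick.

```php
<?php
// Hypothetical helper: $store stands in for the cache of your choice
// (for example $_SESSION after session_start()); $parse is any callable
// that downloads and parses the page, such as the cURL + regex code
// from the question, and returns null when it fails.
function cached_parse(array &$store, string $url, callable $parse): ?array
{
    // Cache hit: this URL was already processed, so skip the download.
    if (isset($store[$url])) {
        return $store[$url];
    }

    // Cache miss: load and parse, then remember the result on success only.
    $result = $parse($url);
    if ($result !== null) {
        $store[$url] = $result;
    }
    return $result;
}
```

With sessions, one would presumably call `session_start()` and pass `$_SESSION` as `$store`, so a second request for the same URL in the same session returns the stored link and title without touching the network.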

Source

2012-03-03 08:54:47 mschloesser

You can use simple html dom to loop over the entries with a foreach first, then parse the links as needed.

require 'simple_html_dom.php';

$url = 'http://ruh.kz';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);

// Parse the downloaded HTML and walk every element with class "entry".
$html = str_get_html($content);
foreach ($html->find('.entry') as $element) {
    // Non-greedy groups keep each match inside a single <a> tag.
    if (preg_match('/<a href="(.*?)">(.*?)<\/a>/', $element->outertext, $matches)) {
        echo '<a href="http://ruh.kz' . $matches[1] . '" target="_blank">' . $matches[2] . '</a><br />';
    }
}

Source

2012-03-03 09:21:01 Giberno

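But will that function run every time the page is refreshed? – Heihachi 2012-03-03 09:22:28

Yes, every time you refresh the page it fetches the links again while identifying itself as a Mozilla browser. If you don't want that, you can save the result as 'txt, html' or 'sql data'. – Giberno 2012-03-03 09:25:57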
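Saving the result as txt, as suggested in the comment above, might look something like this sketch. The function name `load_link_from_txt`, the tab-separated one-line file format, and the `$maxAge` freshness window are all assumptions made for illustration; none of them come from the thread.

```php
<?php
// Hypothetical helper: the file name, the "link<TAB>title" line format, and
// the $maxAge freshness window (in seconds) are illustrative choices.
function load_link_from_txt(string $file, int $maxAge, callable $fetch): ?array
{
    // Drop PHP's cached stat data so filemtime() reflects the file on disk.
    clearstatcache(true, $file);

    // Reuse the saved line while the file exists and is recent enough.
    if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
        $line = rtrim((string) file_get_contents($file), "\r\n");
        $parts = explode("\t", $line, 2);
        if (count($parts) === 2) {
            return ['link' => $parts[0], 'title' => $parts[1]];
        }
    }

    // Otherwise run the download/parse step and save one "link<TAB>title" line.
    $data = $fetch();
    if ($data !== null) {
        file_put_contents($file, $data['link'] . "\t" . $data['title'] . "\n", LOCK_EX);
    }
    return $data;
}
```

Here `$fetch` would wrap the cURL download and regex from the question and return an array with `link` and `title` keys, or null on failure. A page refresh within `$maxAge` seconds then reads the txt file instead of requesting the target site again.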
