2010-08-19 90 views
121

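I'm trying to make curl follow a redirect, but I can't quite get it to work. I have a string that I want to send as a GET parameter to a server, and I want to get the resulting URL back. How can I find out where curl gets redirected to?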

Example:

String = Kobold Worker
URL = www.wowhead.com/search?q=Kobold+Worker

If you go to that URL, it redirects you to "www.wowhead.com/npc=257". I want curl to return this URL to my PHP code so that I can extract the "npc=257" and use it.

Current code:

function npcID($name) { 
    $urltopost = "http://www.wowhead.com/search?q=" . $name; 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"); 
    curl_setopt($ch, CURLOPT_URL, $urltopost); 
    curl_setopt($ch, CURLOPT_REFERER, "http://www.wowhead.com"); 
    curl_setopt($ch, CURLOPT_HTTPHEADER, Array("Content-Type:application/x-www-form-urlencoded")); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); 
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); 
} 

However, this returns www.wowhead.com/search?q=Kobold+Worker and not www.wowhead.com/npc=257.

I suspect PHP is returning before the external redirect happens. How can I fix this?

+6

This is one of the top questions for "curl follow redirects". To follow redirects automatically with the 'curl' command, pass the '-L' or '--location' flag, e.g. 'curl -L http://example.com/' – 2013-09-09 19:09:15

Answers

214

To make cURL follow a redirect, use:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

Erm... I don't think you're actually executing the curl request... Try:

curl_exec($ch);

...after setting the options, and before the curl_getinfo() call.
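Putting that together, here is a minimal sketch of the question's function with the missing curl_exec() call added. The helper names effectiveUrl() and extractNpcId(), the redirect cap of 10 and the URL pattern "/npc=<digits>" are my own assumptions for illustration, not something the site guarantees.

```php
<?php
// Sketch: follow redirects and report where the request finally landed.
function effectiveUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);         // assumed cap, guards against redirect loops
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    $ok = curl_exec($ch);                            // the request only happens here
    $final = ($ok === false) ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}

// Hypothetical helper: pull the numeric id out of a URL such as ".../npc=257".
function extractNpcId(string $url): ?string
{
    return preg_match('~/npc=(\d+)~', $url, $m) ? $m[1] : null;
}

// Same role as the question's npcID(); urlencode() turns "Kobold Worker" into "Kobold+Worker".
function npcID(string $name): ?string
{
    $final = effectiveUrl('http://www.wowhead.com/search?q=' . urlencode($name));
    return $final === null ? null : extractNpcId($final);
}
```

Calling npcID('Kobold Worker') would then return the id only if the search really redirects to an npc page; otherwise it returns null.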

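EDIT: If you just want to find out where a page redirects to, I'd use the advice here, and just use curl to grab the headers and extract the Location: header from them: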

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_HEADER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
$result = curl_exec($ch); 
if (preg_match('~Location: (.*)~i', $result, $match)) { 
    $location = trim($match[1]); 
} 
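
A slightly stricter variant of the same idea, as a sketch only: it sends a HEAD request via CURLOPT_NOBODY, cuts the response at CURLINFO_HEADER_SIZE, and matches Location: only at the start of a header line, ignoring case. The function names findLocationHeader() and redirectTarget() are made up for this example, and some servers may answer HEAD differently from GET.

```php
<?php
// Return the value of the first Location header in a raw header block, or null.
function findLocationHeader(string $rawHeaders): ?string
{
    // "m" anchors ^ and $ at line boundaries; "i" makes the header name case-insensitive.
    if (preg_match('~^Location:[ \t]*(.*)$~mi', $rawHeaders, $m)) {
        return trim($m[1]); // trim() also drops the trailing "\r" of a CRLF line
    }
    return null;
}

// Fetch only the headers of $url and report the redirect target, if any.
function redirectTarget(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the result
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow, just look
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    if ($response === false) {
        return null;
    }
    return findLocationHeader(substr($response, 0, $headerSize));
}
```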
+1

This makes PHP follow the redirect. I don't want to follow the redirect, I just want to know the URL of the page it redirects to. – 2010-08-19 08:50:33

+8

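Oh, so you don't actually want to fetch the page? Just find out the location? In that case, I'd suggest the strategy used here: http://zzz.rezo.net/HowTo-Expand-Short-URLs.html - basically just grab the headers from the page that redirects, and take the Location: header from them. Either way, though, you still need to call curl_exec() for curl to actually do anything... – 2010-08-19 09:03:28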

+4

Thanks, this worked like a charm :) – 2010-08-19 10:00:32

8

The answer above didn't work for me on one of my servers, something to do with basedir, so I reworked it a bit. The code below works on all my servers.

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_HEADER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
$a = curl_exec($ch); 
curl_close($ch); 
// the returned headers 
$headers = explode("\n",$a); 
// if there is no redirection this will be the final url 
$redir = $url; 
// loop through the headers and check for a Location: str 
$j = count($headers); 
for ($i = 0; $i < $j; $i++) {
    // if we find the Location header, strip it and fill the redir var
    if (strpos($headers[$i], "Location:") !== false) {
        $redir = trim(str_replace("Location:", "", $headers[$i]));
        break;
    }
}
// do whatever you want with the result
echo $redir;
+0

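A 'Location:' header is not always there to be followed as a redirect. Also see a question that is explicitly about that: [curl follow location error](http://stackoverflow.com/questions/2511410/curl-follow-location-error) – hakre 2013-03-13 09:19:50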

4

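The chosen answer here is decent, but it is case sensitive and doesn't protect against relative location: headers (which some sites send), or against pages that happen to contain the phrase Location: in their content (which Zillow currently does).

It's a little sloppy, but a couple of quick edits to make this a bit smarter are: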

function getOriginalURL($url) { 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_HEADER, true); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    $result = curl_exec($ch); 
    $httpStatus = curl_getinfo($ch, CURLINFO_HTTP_CODE); 
    curl_close($ch); 

    // if it's not a redirection (3XX), move along 
    if ($httpStatus < 300 || $httpStatus >= 400) 
     return $url; 

    // look for a location: header to find the target URL 
    if(preg_match('/location: (.*)/i', $result, $r)) { 
     $location = trim($r[1]); 

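     // if the location is a relative URL, attempt to make it absolute
     if (preg_match('/^\/(.*)/', $location)) {
      $urlParts = parse_url($url);
      $baseURL = ''; // start empty so the appends below never hit an undefined variable
      if (!empty($urlParts['scheme']))
       $baseURL = $urlParts['scheme'].'://';

      if (!empty($urlParts['host']))
       $baseURL .= $urlParts['host'];

      // parse_url() omits 'port' when the URL has none, so test with empty()
      if (!empty($urlParts['port']))
       $baseURL .= ':'.$urlParts['port'];

      return $baseURL.$location;
     }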

     return $location; 
    } 
    return $url; 
} 

Note that this still only goes one redirect deep. To go deeper, you actually need to fetch the content and follow the redirects.
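
For going more than one hop deep without CURLOPT_FOLLOWLOCATION, a loop along these lines might do. This is a sketch under assumptions: resolveLocation(), followRedirects() and curlFetch() are names invented here, the hop limit of 10 is arbitrary, and the relative-URL handling is simplified (no "../" normalisation). The fetcher is passed in as a callable so the loop can be tried without touching the network.

```php
<?php
// Resolve a Location value against the URL that produced it (simplified).
function resolveLocation(string $base, string $location): string
{
    if (preg_match('~^[a-z][a-z0-9+.-]*://~i', $location)) {
        return $location;                                 // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;                 // protocol-relative
    }
    $origin = $scheme . '://' . ($p['host'] ?? '')
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($location, '/') === 0) {
        return $origin . $location;                       // root-relative
    }
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);     // directory part of the base path
    return $origin . $dir . $location;                    // path-relative
}

// $fetch takes a URL and returns [HTTP status, Location value or null].
function followRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    for ($hop = 0; $hop < $maxHops; $hop++) {
        [$status, $location] = $fetch($url);
        if ($status < 300 || $status >= 400 || $location === null) {
            return $url;                                  // not a redirect: we have arrived
        }
        $url = resolveLocation($url, $location);
    }
    throw new RuntimeException("Gave up after $maxHops redirects");
}

// One possible cURL-backed fetcher: HEAD request, redirects not followed.
function curlFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $headers = (string) curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    $found = preg_match('~^Location:[ \t]*(.*)$~mi', $headers, $m);
    return [$status, $found ? trim($m[1]) : null];
}
```

Usage would be something like followRedirects($url, 'curlFetch'). Injecting the fetcher is only a convenience for testing; a plain loop around the cURL calls works just as well.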

4

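Sometimes you need to read the HTTP headers, but at the same time you don't want to return those headers.

This skeleton takes care of cookies and follows HTTP redirects using recursion. The main idea here is to avoid returning HTTP headers to the client code.

You can build a very powerful curl class on top of it: add POST functionality, etc.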

<?php 

class curl { 

    static private $cookie_file   = ''; 
    static private $user_agent    = ''; 
    static private $max_redirects   = 10; 
    static private $followlocation_allowed = true; 

    function __construct() 
    { 
    // set a file to store cookies 
    self::$cookie_file = 'cookies.txt'; 

    // set some general User Agent 
    self::$user_agent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)'; 

    if (! file_exists(self::$cookie_file) || ! is_writable(self::$cookie_file)) 
    { 
     throw new Exception('Cookie file missing or not writable.'); 
    } 

    // check for PHP settings that prevent
    // CURLOPT_FOLLOWLOCATION from working correctly
    if (ini_get('open_basedir') != '' || ini_get('safe_mode') == 'On') 
    { 
     self::$followlocation_allowed = false; 
    }  
    } 

    /** 
    * Main method for GET requests 
    * @param string $url URI to get 
    * @return string  request's body 
    */ 
    static public function get($url) 
    { 
    $process = curl_init($url);  

    self::_set_basic_options($process); 

    // this function is in charge of output request's body 
    // so DO NOT include HTTP headers 
    curl_setopt($process, CURLOPT_HEADER, 0); 

    if (self::$followlocation_allowed) 
    { 
     // if PHP settings allow it use AUTOMATIC REDIRECTION 
     curl_setopt($process, CURLOPT_FOLLOWLOCATION, true); 
     curl_setopt($process, CURLOPT_MAXREDIRS, self::$max_redirects); 
    } 
    else 
    { 
     curl_setopt($process, CURLOPT_FOLLOWLOCATION, false); 
    } 

    $return = curl_exec($process); 

    if ($return === false) 
    { 
     throw new Exception('Curl error: ' . curl_error($process)); 
    } 

    // test for redirection HTTP codes 
    $code = curl_getinfo($process, CURLINFO_HTTP_CODE); 
    if ($code == 301 || $code == 302) 
    { 
     curl_close($process); 

     try 
     { 
     // go to extract new Location URI 
     $location = self::_parse_redirection_header($url); 
     } 
     catch (Exception $e) 
     { 
     throw $e; 
     } 

     // IMPORTANT return 
     return self::get($location); 
    } 

    curl_close($process); 

    return $return; 
    } 

    static function _set_basic_options($process) 
    { 

    curl_setopt($process, CURLOPT_USERAGENT, self::$user_agent); 
    curl_setopt($process, CURLOPT_COOKIEFILE, self::$cookie_file); 
    curl_setopt($process, CURLOPT_COOKIEJAR, self::$cookie_file); 
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
    // curl_setopt($process, CURLOPT_VERBOSE, 1); 
    // curl_setopt($process, CURLOPT_SSL_VERIFYHOST, false); 
    // curl_setopt($process, CURLOPT_SSL_VERIFYPEER, false); 
    } 

    static function _parse_redirection_header($url) 
    { 
    $process = curl_init($url);  

    self::_set_basic_options($process); 

    // NOW we need to parse HTTP headers 
    curl_setopt($process, CURLOPT_HEADER, 1); 

    $return = curl_exec($process); 

    if ($return === false) 
    { 
     throw new Exception('Curl error: ' . curl_error($process)); 
    } 

    curl_close($process); 

    if (! preg_match('#Location: (.*)#', $return, $location)) 
    { 
     throw new Exception('No Location found'); 
    } 

    if (self::$max_redirects-- <= 0) 
    { 
     throw new Exception('Max redirections reached trying to get: ' . $url); 
    } 

    return trim($location[1]); 
    } 

} 
-3

You can use:

$redirectURL = curl_getinfo($ch,CURLINFO_REDIRECT_URL); 
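
For context, a minimal sketch of how that call might be used. As far as I know, CURLINFO_REDIRECT_URL is only filled in when CURLOPT_FOLLOWLOCATION is disabled, and it requires PHP 5.3.7 or later. The function name nextRedirectUrl() is made up for this example.

```php
<?php
// Sketch: ask cURL for the next hop without following it.
function nextRedirectUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stay on the first response
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers are enough here
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $ok = curl_exec($ch);                            // must run before curl_getinfo()
    $next = ($ok === false) ? false : curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    return $next ?: null;                            // empty or false: no redirect reported
}
```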
14

Add this line to the curl initialization:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 

and call curl_getinfo() before curl_close():

$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL); 

Example:

$ch = curl_init($url); 
curl_setopt($ch, CURLOPT_HEADER, false); 
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0); 
curl_setopt($ch, CURLOPT_TIMEOUT, 60); 
$html = curl_exec($ch); 
$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL); 
curl_close($ch); 
+2

I think this is a better solution, because it also resolves multiple redirects. – 2015-04-12 20:24:33

+0

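Remember: (ok, duh) POST data is not resubmitted after a redirect. That is what happened in my case, and I felt stupid afterwards, because simply using the proper URL fixed it. – twicejr 2017-05-22 17:57:25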