preg_replace - Replace all URLs except a domain and its subdomains

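I have a Glype proxy and I don't want it to parse external URLs. All URLs on a page are automatically converted to: http://proxy.com/browse.php?u=[URL HERE]. For example, if I visit The Pirate Bay through my proxy, I don't want the following URLs to be parsed: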

ByteLove.com (Not to: http://proxy.com/browse.php?u=http://bytelove.com&b=0) 
BayFiles.com (Not to: http://proxy.com/browse.php?u=http://bayfiles.com&b=0) 
BayIMG.com (Not to: http://proxy.com/browse.php?u=http://bayimg.com&b=0) 
PasteBay.com (Not to: http://proxy.com/browse.php?u=http://pastebay.com&b=0) 
Ipredator.com (Not to: http://proxy.com/browse.php?u=https://ipredator.se&b=0) 
etc.

Of course I want to keep proxying the internal URLs, so:

thepiratebay.se/browse (To: http://proxy.com/browse.php?u=http://thepiratebay.se/browse&b=0) 
thepiratebay.se/top (To: http://proxy.com/browse.php?u=http://thepiratebay.se/top&b=0) 
thepiratebay.se/recent (To: http://proxy.com/browse.php?u=http://thepiratebay.se/recent&b=0) 
etc.

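Is there a preg_replace that replaces all URLs except thepiratebay.se and its subdomains (as in the examples)? Another function is welcome too, such as DOMDocument, QueryPath, substr or strpos. Not str_replace, because then I would have to define every URL.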

I found something, but I'm not familiar with preg_replace:

$exclude = '.thepiratebay.se'; 
$pattern = '(https?\:\/\/.*?\..*?)(?=\s|$)'; 
$message= preg_replace("~(($exclude)?($pattern))~i", '$2<a href="$4" target="_blank">$5</a>$6', $message);

Source

2012-03-03 Ton Hoekstra

I guess you will need to provide a whitelist that tells which domains should be proxied:

$whitelist = array(); 
$whitelist[] = "internal1.se"; 
$whitelist[] = "internal2.no"; 
$whitelist[] = "internal3.com"; 
// and so on... 

$string = '<a href="http://proxy.org/browse.php?u=http%3A%2F%2Fexternal1.com&b=0">External link 1</a><br>'; 
$string .= '<a href="http://proxy.org/browse.php?u=http%3A%2F%2Finternal1.se&b=0">Internal link 1</a><br>'; 
$string .= '<a href="http://proxy.org/browse.php?u=http%3A%2F%2Finternal3.com&b=0">Internal link 2</a><br>'; 
$string .= '<a href="http://proxy.org/browse.php?u=http%3A%2F%2Fexternal2.no&b=0">External link 2</a><br>'; 

//Assuming the URL always is inside '' or "" you can use this pattern: 
// Inside a character class "|" is a literal pipe, not alternation, so it is left out here.
$pattern = '#(https?://proxy\.org/browse\.php\?u=(https?[^&"\']*)(&?[^&"\']*))#i'; 

$string = preg_replace_callback($pattern, "my_callback", $string); 

//I had only PHP 5.2 on my server, so I decided to use a callback function. 
function my_callback($match) { 
    global $whitelist; 
    // set return bypass proxy URL 
    $returnstring = urldecode($match[2]); 

    foreach ($whitelist as $white) { 
        // check if URL matches whitelist (stripos() returns false when nothing is found) 
        if (stripos($match[2], $white) !== false) { 
            $returnstring = $match[0]; 
            break; 
        } 
    } 
    return $returnstring; 
} 

echo "NEW STRING[:\n" . $string . "\n]\n";
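The whitelist check above is a plain substring test, so a host such as notinternal1.se would also pass. A minimal sketch of a stricter check that compares the parsed host against a domain and its subdomains is shown below. The function names host_matches_domain() and is_whitelisted() are hypothetical helpers, not part of Glype or of the answer above, and the URL is assumed to be already decoded.

```php
<?php
// Hypothetical helper: true when the host of $url is $domain or one of its subdomains.
function host_matches_domain($url, $domain)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative or malformed URL: no host to compare
    }
    $host   = strtolower($host);
    $domain = strtolower(ltrim($domain, '.'));
    if ($host === $domain) {
        return true;
    }
    // A subdomain must end in ".domain", so "notexample.org" does not match "example.org".
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

// Hypothetical helper: true when $url belongs to any whitelisted domain.
function is_whitelisted($url, array $whitelist)
{
    foreach ($whitelist as $domain) {
        if (host_matches_domain($url, $domain)) {
            return true;
        }
    }
    return false;
}

$whitelist = array('thepiratebay.se');
var_dump(is_whitelisted('http://thepiratebay.se/browse', $whitelist));         // bool(true)
var_dump(is_whitelisted('http://static.thepiratebay.se/img.png', $whitelist)); // bool(true)
var_dump(is_whitelisted('http://bayfiles.com/', $whitelist));                  // bool(false)
var_dump(is_whitelisted('http://notthepiratebay.se/', $whitelist));            // bool(false)
```

Inside the callback this could replace the stripos() test, for example by calling is_whitelisted(urldecode($match[2]), $whitelist).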

Source

2012-03-03 17:08:01

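It doesn't work. This is my code: http://pastebin.com/6ML8q7JN The URLs are located in: $document – 2012-03-03 18:03:09

I would need to see the contents of the $document variable to judge whether the code can work. – 2012-03-03 18:11:42

It is working now, but _&b=0_ is left behind the URL. How can I fix this? – 2012-03-04 15:55:41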
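Without the contents of $document the cause of the leftover &b=0 cannot be pinned down. One approach that does not depend on how the pattern captures the trailing parameters is to parse the query string of the proxied link and keep only the u value. The sketch below assumes the link has the browse.php?u=...&b=0 shape shown in the question; extract_proxied_target() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: recover the target URL from a Glype-style proxied link.
// Returns null when the link has no usable "u" parameter.
function extract_proxied_target($proxiedUrl)
{
    // Links taken from HTML source may contain &amp; instead of &.
    $proxiedUrl = html_entity_decode($proxiedUrl, ENT_QUOTES);

    $query = parse_url($proxiedUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // parse_str() splits the query into parameters and URL-decodes the values,
    // so "b=0" ends up in its own key and never sticks to the target URL.
    parse_str($query, $params);

    return (isset($params['u']) && is_string($params['u'])) ? $params['u'] : null;
}

echo extract_proxied_target('http://proxy.com/browse.php?u=http%3A%2F%2Fbayfiles.com&b=0'), "\n";
// prints http://bayfiles.com
echo extract_proxied_target('http://proxy.com/browse.php?u=http%3A%2F%2Fbayimg.com%2F&amp;b=0'), "\n";
// prints http://bayimg.com/
```

If the u value is not URL-encoded and itself contains a query string with &, the split becomes ambiguous, so this only works reliably when the proxy encodes the target.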

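You can use preg_replace_callback() to execute a callback function for each match. Inside that function you can decide whether the matched string should be converted or not.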

<?php 
$string = 'http://foobar.com/baz and http://example.org/bumm'; 
$pattern = '#(https?\:\/\/.*?\..*?)(?=\s|$)#i'; 
$string = preg_replace_callback($pattern, function($match) { 
    if (stripos($match[0], 'example.org/') !== false) { 
     // exclude all URLs containing example.org 
     return $match[0]; 
    } else { 
     return 'http://proxy.com/?u=' . urlencode($match[0]); 
    } 
}, $string); 

echo $string, "\n";

(The example uses the closure syntax introduced in PHP 5.3.)

Source

2012-03-03 12:26:58 rodneyrehm
