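Compare hostnames in an array of URLs and get unique values

I need to compare URLs and remove duplicates from an array, but I only want to compare the host part of each URL. When comparing, I need to ignore http and https, the www prefix, and any trailing slash. So when I have this array: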

$urls = array(
'http://www.google.com/test', 
'https://www.google.com/test', 
'https://www.google.com/example', 
'https://www.facebook.com/example', 
'http://www.facebook.com/example');

the result should be only:

http://www.google.com/test 
http://www.google.com/example 
http://www.facebook.com/example

I tried comparing them like this:

$urls = array_udiff($urls, $urls, function ($a, $b) { 
       return strcmp(preg_replace('|^https?://(www\\.)?|', '', rtrim($a,'/')), preg_replace('|^https?://(www\\.)?|', '', rtrim($b,'/'))); 
      });

But it returns an empty array.


2017-09-27 LukeKov

Maybe add the regex tag. – charlesreid1

Take a look at [this](http://php.net/manual/en/function.parse-url.php) – gmc

But can you show me a working example, or any ideas? – LukeKov
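On the empty result in the question: `array_udiff($urls, $urls, ...)` returns the values of the first array that are not present in the second, so diffing an array against itself is always empty, whatever the callback does. A minimal sketch of one alternative follows, reusing the question's own regex as a normalization key. The function name `uniqueByNormalizedUrl` is hypothetical, and keeping the first URL seen for each key is an assumption about the desired behavior.

```php
<?php
// Hypothetical helper (not from the original thread): keep the first URL
// seen for each normalized key.
function uniqueByNormalizedUrl(array $urls): array {
    $seen = [];
    foreach ($urls as $url) {
        // Strip the scheme, an optional "www." and any trailing slash
        $key = preg_replace('|^https?://(www\.)?|i', '', rtrim($url, '/'));
        if (!isset($seen[$key])) {
            $seen[$key] = $url; // remember the original URL, not the key
        }
    }
    return array_values($seen);
}

$urls = array(
    'http://www.google.com/test',
    'https://www.google.com/test',
    'https://www.google.com/example',
    'https://www.facebook.com/example',
    'http://www.facebook.com/example');

print_r(uniqueByNormalizedUrl($urls));
```

Because the first occurrence wins, the scheme of each surviving URL depends on the input order; it is not rewritten to http.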

<?php 
    $urls = array(
    'http://www.google.com/test', 
    'https://www.google.com/test', 
    'https://www.google.com/example', 
    'https://www.facebook.com/example', 
    'http://www.facebook.com/example'); 


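$MyArray = [];
foreach ($urls as $url) {
    // Capture everything after "www." (the dot is escaped so it matches literally)
    preg_match_all('/www\.(.*)/', $url, $matches);

    // $matches[1] is an array of captures; store it only if not seen before
    if (!in_array($matches[1], $MyArray)) {
        $MyArray[] = $matches[1];
    }
}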

echo "<pre>"; 
print_r($MyArray); 
echo "</pre>";

The output is:

Array 
(
    [0] => Array 
     (
      [0] => google.com/test 
     ) 

    [1] => Array 
     (
      [0] => google.com/example 
     ) 

    [2] => Array 
     (
      [0] => facebook.com/example 
     ) 

)

Trimmed to keep only the hostname.


2017-09-27 09:43:57 pr1nc3

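I've updated my question. I need to compare the hostname together with everything after it. For example, for http://www.google.com/test I need to check whether google.com/test is already in the array, and remove it if it is a duplicate. Your code works well, but I need the comparison to include the landing page after the hostname – LukeKov

I've updated my answer with a new regex. If it works for you, please accept it. – pr1nc3

It's still not the same; I need to remove everything before the domain name. I tried some ideas like |^https?://(www\\.)?| – LukeKov

Try this approach: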

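<?php
// Collect the unique hostnames found in an array of URLs.
function parseURLs(array $urls) {
    $rs = [];
    foreach ($urls as $url) {
        $segments = parse_url($url);
        // Skip malformed URLs and URLs without a host component
        if ($segments === false || !isset($segments['host'])) {
            continue;
        }
        if (!in_array($segments['host'], $rs)) {
            $rs[] = $segments['host'];
        }
    }
    return $rs;
}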

Then:

<?php 
$urls = array(
    'http://www.google.com', 
    'https://www.google.com', 
    'https://www.google.com/', 
    'https://www.facebook.com', 
    'http://www.facebook.com' 
); 
$uniqueURLs = parseURLs($urls); 
print_r($uniqueURLs); 

/* result : 
Array 
(
    [0] => www.google.com 
    [1] => www.facebook.com 
) 
*/


2017-09-27 09:26:04 mrJ0ul3

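One more question: what if I want to compare the hostname together with the path? For example, for http://www.google.com/test I want to compare only google.com/test. – LukeKov

Basically we use [`parse_url`](http://php.net/manual/en/function.parse-url.php) to split the URL, and that function also returns the path. Just modify the parseURLs function slightly so it checks the path value as well. [Here](https://gist.github.com/tajhulfaijin/a623772931919886d9ea2cc9b84e90cd) – mrJ0ul3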
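The modification described in that comment could look roughly like the sketch below. It is not a copy of the linked gist; the function name `parseURLsWithPath` is hypothetical, and it assumes the host and path should simply be concatenated into one comparison key.

```php
<?php
// Hypothetical variation of parseURLs(): compare host and path together.
function parseURLsWithPath(array $urls): array {
    $rs = [];
    foreach ($urls as $url) {
        $segments = parse_url($url);
        if ($segments === false || !isset($segments['host'])) {
            continue; // skip anything parse_url cannot split into a host
        }
        // The path is optional (e.g. "http://example.com"), so default to ''
        $key = $segments['host'] . ($segments['path'] ?? '');
        if (!in_array($key, $rs, true)) {
            $rs[] = $key;
        }
    }
    return $rs;
}

print_r(parseURLsWithPath(array(
    'http://www.google.com/test',
    'https://www.google.com/test',
    'https://www.google.com/example')));
```

Note that the keys still contain the www. prefix, because `parse_url` returns the host exactly as written in the URL.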

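You need to loop over the URLs, parse each one with PHP's parse_url() function, and use array_unique to remove the duplicates from the array. That way we are checking both the host and the path.

I wrote a class for you: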

<?php
/** Get unique host+path values from an array of URLs **/
class Parser {
    // URL parser function
    public function arrayValuesUrlParser($urls) {
        // Create container
        $parsed = [];
        // Loop through the URLs
        foreach ($urls as $url) {
            $parse = parse_url($url);
            // Host or path may be missing (e.g. "http://example.com"), so default to ''
            $parsed[] = ($parse["host"] ?? '') . ($parse["path"] ?? '');
        }
        // Delete duplicates once, after the loop
        $result = array_unique($parsed);
        // Dump result
        print_r($result);
    }
}

?>

You can use the class from another file like this:

<?php 
//Include the Parser
include_once "Parser.php"; 

    $urls = array(
    'http://www.google.com/test', 
    'https://www.google.com/test', 
    'https://www.google.com/example', 
    'https://www.facebook.com/example', 
    'http://www.facebook.com/example'); 
    //Instantiate 
    $parse = new Parser(); 
    $parse->arrayValuesUrlParser($urls); 

?>

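You don't need a separate file, but if you put everything in a single PHP file you will have to remove the include_once. This class is also on PHP Classes, just for fun!

Good luck!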


2017-09-27 10:08:31

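What if I want to compare landing pages? My question has been updated – LukeKov

You just need to concatenate this: .$parse["path"]; I've updated the class. –

It looks good, but what if there is a query string after the path? Second thing: this needs the www., and sometimes my URLs don't have www – LukeKov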
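The two open points in that last comment (an optional www. prefix and a possible query string) could be handled roughly as sketched below. This is not from the thread: `urlKey`, `uniqueUrls`, and the `$withQuery` flag are hypothetical names, and lowercasing the host and ignoring the query by default are assumptions.

```php
<?php
// Hypothetical helper: build a comparison key from host + path,
// ignoring the scheme, a leading "www." and a trailing slash.
function urlKey(string $url, bool $withQuery = false): ?string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null; // not something we can compare by host
    }
    // Hostnames are case-insensitive; drop "www." only when it is present
    $host = preg_replace('/^www\./i', '', strtolower($parts['host']));
    $path = rtrim($parts['path'] ?? '', '/');
    $key = $host . $path;
    // Assumption: the query string only matters when the caller asks for it
    if ($withQuery && isset($parts['query'])) {
        $key .= '?' . $parts['query'];
    }
    return $key;
}

// Hypothetical helper: keep the first original URL for each key.
function uniqueUrls(array $urls, bool $withQuery = false): array {
    $seen = [];
    foreach ($urls as $url) {
        $key = urlKey($url, $withQuery);
        if ($key !== null && !isset($seen[$key])) {
            $seen[$key] = $url;
        }
    }
    return array_values($seen);
}
```

Calling `uniqueUrls($urls, true)` would treat URLs that differ only in their query string as distinct, while the default treats them as duplicates.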
