Get all URLs from a string?

I have a string that contains URLs and other text. I want to put all of the URLs into the $matches array. However, the code below does not get all of the URLs into the $matches array:

$matches = array(); 
$text = "words cotry.lk and newe.com joemiller.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL"; 

preg_match_all('$\b[-A-Z0-9+&@#/%?=~_|!:,.;][.]*[-A-Z0-9+&@#/%=~_|(https?|ftp|file)://-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%?=~_|!:,.;]{8,50}$i', $text, $matches); 
print_r($matches);

The code above does not give me the following URLs:

cotry.lk 
newe.com

Could you show me an example of how I can modify the code above to get all of the URLs?

Please note that not all of the URLs contain href, and this string does not come from an HTML file.


2013-04-27 Learner_51

In your case, your regex matches the URLs only because of their length; it also matches any other word longer than 8 characters. – 2013-04-27 12:29:50
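To see the length effect the comment describes, here is a rough sketch that ports the question's pattern to Python's re module. It assumes Python treats these character classes the same way PCRE does here; the $...$i delimiters are dropped and re.IGNORECASE stands in for the trailing i. The helper name question_matches is hypothetical.

```python
import re

# The question's pattern without its $...$i delimiters (assumed to behave as in PCRE).
QUESTION_PATTERN = r"\b[-A-Z0-9+&@#/%?=~_|!:,.;][.]*[-A-Z0-9+&@#/%=~_|(https?|ftp|file)://-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%?=~_|!:,.;]{8,50}"


def question_matches(text):
    # Inside [...] the "(https?|ftp|file)" part is only a list of literal
    # characters, so the pattern has no groups and findall returns whole matches.
    return re.findall(QUESTION_PATTERN, text, re.IGNORECASE)


# One leading character plus at least 8 more: a token needs 9+ characters,
# so the 8-character cotry.lk and newe.com fail while a long plain word matches.
print(question_matches("see cotry.lk or newe.com or extraordinary"))
```

Since a space is in none of the character classes, every match is confined to a single whitespace-free token, which is why nothing about the token being a URL actually matters.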

import re

def getall_urls(value):
    # Raw string, so the backslash escapes reach the regex engine unchanged
    pattern = r'((?:[\w\d]+\:\/\/)?(?:[\w\-\d]+\.)+[\w\-\d]+(?:\/[\w\-\d]+)*(?:\/|\.[\w\-\d]+)?(?:\?[\w\-\d]+\=[\w\-\d]+\&?)?(?:\#[\w\-\d]*)?)'
    # Collect every match into a list (re.findall() is Python's counterpart to PHP's preg_match_all())
    getall = re.findall(pattern, value)
    # Remove duplicates; an empty set is returned when nothing matches
    return set(getall)

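Here is Python code that does exactly what you need. The expression was originally found on the internet and then modified. Although this code is written in Python, you can easily use the expression in PHP as well.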
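As a quick check, the sketch below runs the answer's getall_urls on the string from the question. The function is repeated here (with the pattern as a raw string) so the snippet runs on its own; sorting the result is only there to make the printed order stable.

```python
import re


def getall_urls(value):
    # Same pattern as the answer above, written as a raw string
    pattern = r'((?:[\w\d]+\:\/\/)?(?:[\w\-\d]+\.)+[\w\-\d]+(?:\/[\w\-\d]+)*(?:\/|\.[\w\-\d]+)?(?:\?[\w\-\d]+\=[\w\-\d]+\&?)?(?:\#[\w\-\d]*)?)'
    return set(re.findall(pattern, value))


text = ("words cotry.lk and newe.com joemiller.us schoollife.edu hello.net "
        "some random news.yahoo.com text http://tinyurl.com/9uxdwc some "
        "http://google.com random text http://tinyurl.com/787988 and others "
        "will en.wikipedia.org/wiki/Country_music URL")

# A set has no order, so sort it before printing
for url in sorted(getall_urls(text)):
    print(url)
```

Because the pattern requires at least one dot between name parts, plain words such as "random" are skipped, while scheme-less hosts like cotry.lk and newe.com are still found.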


2013-04-27 12:59:29 vaultah

Thank you very much for explaining the details. The regex works well. Thanks for your help. – 2013-04-27 13:05:39

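If I were you, I wouldn't use preg_match_all if you want to check the string for valid addresses. Instead, I would split the string into words and check each one the hard way: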

filter_var($url, FILTER_VALIDATE_URL)

If it does not return false (on success it returns the URL itself), you know it is a valid URL.
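The split-then-validate idea can be sketched in Python as follows. Python has no exact equivalent of PHP's FILTER_VALIDATE_URL, so the check here is an assumption: a word counts as a URL when urllib.parse.urlparse finds both a scheme and a network location. The helpers looks_like_url and urls_by_splitting are hypothetical names.

```python
from urllib.parse import urlparse


def looks_like_url(word):
    # Rough stand-in (assumption) for PHP's FILTER_VALIDATE_URL:
    # require both a scheme ("http") and a network location ("google.com")
    parts = urlparse(word)
    return bool(parts.scheme) and bool(parts.netloc)


def urls_by_splitting(text):
    # Split on whitespace and keep only the words that pass the check
    return [word for word in text.split() if looks_like_url(word)]


text = ("words cotry.lk and newe.com joemiller.us schoollife.edu hello.net "
        "some random news.yahoo.com text http://tinyurl.com/9uxdwc some "
        "http://google.com random text http://tinyurl.com/787988 and others "
        "will en.wikipedia.org/wiki/Country_music URL")

print(urls_by_splitting(text))
```

With a check that demands a scheme, only the three http:// links survive; scheme-less hosts such as schoollife.edu have an empty scheme and netloc and are dropped.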


2013-04-27 12:32:06 Funonly

Thanks for your reply. Using your suggestion, I only get the URLs that start with http://. Other URLs, such as schoollife.edu, are ignored. – 2013-04-27 13:01:22
