Regular expression to find URLs within a string

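Does anyone know of a regular expression I could use to find URLs within a string? I have found a lot of regular expressions on Google for determining whether an entire string is a URL, but I need to be able to search an entire string for URLs. For example, I would like to be able to find www.google.com and http://yahoo.com in the following string: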

Hello www.google.com World http://yahoo.com

I am not looking for specific URLs in the string. I am looking for all of the URLs in the string, which is why I need a regular expression.

2011-05-17 user758263

If you have an expression that matches a whole string, just take the ^ and $ out so that it matches parts of the string. – entonio 2011-05-17 22:55:13

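If you have a regex that matches a URL, you should be able to search for it within your string. Just make sure the pattern does not have ^ and $ marking the beginning and end of the URL string. So if P is the pattern for a URL, look for matches of P.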
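As a minimal Python sketch of the point about `^` and `$`: the pattern `P` below is a deliberately loose, hypothetical stand-in (it is not taken from this thread and is not a URL validator). Only the effect of the anchors is being illustrated.

```python
import re

# Hypothetical stand-in for "P": a loose URL pattern, not a validator.
P = r"(?:https?://|www\.)\S+"

text = "Hello www.google.com World http://yahoo.com"

# Anchored with ^ and $, P only matches when the whole string is a URL.
anchored = re.compile("^" + P + "$")
whole_string_is_url = anchored.search(text) is not None

# Without the anchors, P can be searched for anywhere in the string.
found = re.findall(P, text)

print(whole_string_is_url)  # False
print(found)                # ['www.google.com', 'http://yahoo.com']
```

`re.findall` returns whole matches here because `P` contains only a non-capturing group.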

2011-05-17 22:54:06 manojlds

This is the regex I found for validating whether an entire string is a URL. I did as you said and took out the ^ at the beginning and the $ at the end, but it still does not work. What am I doing wrong? '^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$' – user758263 2011-05-17 23:19:58

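It might help if you showed which language you are using. Either way, be sure to check out 'http://regexpal.com/'; you can test different expressions against your string there until you get it right. – entonio 2011-05-17 23:37:12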

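@user758263 - do you really need such a complex regex for URLs? It depends on the kinds of URLs you might find. Also see http://gskinner.com/RegExr/ for trying out regexes. They also have hundreds of samples under the 'Community' tab on the right, including ones for URLs. – manojlds 2011-05-18 00:06:47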

140

This is the one I use:

(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

It works for me, and it should work for you too.
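For readers who want to try it, here is a small Python sketch of how this pattern might be applied to a whole string. The final character class is taken to be `[\w@?^=%&/~+#-]`, and the sample text and the example.org URL are made up for illustration.

```python
import re

# The pattern quoted above, split over two lines for readability.
URL_RE = re.compile(
    r"(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))"
    r"([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?"
)

text = "Hello www.google.com World http://yahoo.com, see https://example.org/a?b=1."

# finditer + group(0) yields whole matches; re.findall would instead
# return tuples of the three capture groups.
urls = [m.group(0) for m in URL_RE.finditer(text)]
print(urls)  # ['http://yahoo.com', 'https://example.org/a?b=1']
```

Note that www.google.com is not reported, because the pattern requires a scheme, and that trailing punctuation such as the comma and the final period is left out of the matches.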

2011-05-18 08:37:53 CodeWrite

I tried, but it says "Vote up requires 15 reputation". Sorry regexhacks :( – user758263 2011-05-20 20:56:53

This is the greatest thing I have ever seen. You have no idea how much time you saved me. – 2014-07-02 12:41:58

The '&' in the expression looks fishy. Where is this supposed to be used? – nhahtdh 2015-07-30 17:34:04

-1

I used the regular expression below to find the text between two dots or periods. The logic works fine with Python:

(?<=\.)[^}]*(?=\.)
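A quick Python check of what this lookaround pattern actually returns may be useful. The two input strings are the question's examples; nothing else is assumed.

```python
import re

# The pattern quoted above: text preceded by a dot and followed by a dot.
BETWEEN_DOTS = re.compile(r"(?<=\.)[^}]*(?=\.)")

single = BETWEEN_DOTS.findall("www.google.com")
sentence = BETWEEN_DOTS.findall("Hello www.google.com World http://yahoo.com")

print(single)    # ['google']
print(sentence)  # ['google.com World http://yahoo']
```

Because `[^}]*` is greedy and also matches spaces, the match runs from the first dot to the last one in the string, so this finds text between dots rather than URLs.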

2014-08-26 18:37:13 faisal00813

This is a slight improvement on, or modification of (depending on what you need), Rajeev's answer:

([\w\-_]+(?:(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&amp;:/~\+#]*[A-Z\-\@?^=%&amp;/~\+#]){2,6}?

See here for examples of what it does and does not match.

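I got rid of the "http" etc. check because I wanted to catch URLs without it. I added slightly to the regex to catch some obfuscated URLs (i.e. where the user writes [dot] instead of "."). Finally, I replaced "\w" with "A-Z" and "{2,3}" to reduce false positives such as v2.0 and "moo.0dd".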

Any improvements on this are welcome.

2015-01-19 10:43:56 avjaarsveld

'[a-zA-Z]{2,3}' is really poor for matching TLDs, see the official list: https://data.iana.org/TLD/tlds-alpha-by-domain.txt. Your regex matches '_.........&&&&&&'; I am not sure that is a valid URL. – Toto 2015-01-19 11:06:04

Thanks for that JE SUIS CHAELIE, any suggestions for improvement (especially for the false positives)? – avjaarsveld 2015-01-19 16:31:55

I use the following regex to find URLs in a string:

/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

2015-01-19 10:47:28 aditya

'[a-zA-Z]{2,3}' is really poor for matching TLDs, see the official list: https://data.iana.org/TLD/tlds-alpha-by-domain.txt – Toto 2015-01-19 11:04:15
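To illustrate the TLD objection, here is a short Python sketch using the pattern from this answer (without its `/.../` delimiters). The hosts example.com and example.info are illustrative and do not come from the thread.

```python
import re

# The pattern from the answer above, minus the surrounding slashes.
URL_RE = re.compile(r"(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?")

short_tld = URL_RE.search("go to http://example.com/path now").group(0)
long_tld = URL_RE.search("go to http://example.info/path now").group(0)

print(short_tld)  # http://example.com/path
print(long_tld)   # http://example.inf
```

With a four-letter TLD, `{2,3}` consumes only "inf", the optional path group then fails on the remaining "o", and the match is silently truncated.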

I guess no regex is perfect for this use. I found a pretty solid one here

/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm

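Some differences/advantages compared to the others posted here:

It does not match email addresses
It does match localhost:12345
It won't detect something like moo.com without http or www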

See here for examples
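The three points listed above can be checked with a short Python sketch. It assumes the JavaScript flags `i`, `g` and `m` map to `re.IGNORECASE`, `re.findall` and `re.MULTILINE`; the sample sentence is made up for illustration.

```python
import re

# The JavaScript pattern above, transcribed for Python's re module.
URL_RE = re.compile(
    r"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])",
    re.IGNORECASE | re.MULTILINE,
)

text = ("Hello www.google.com World http://yahoo.com, mail me@moo.com, "
        "try http://localhost:12345 or moo.com")

found = URL_RE.findall(text)
print(found)  # ['www.google.com', 'http://yahoo.com', 'http://localhost:12345']
```

The email address and the bare moo.com are skipped because every match must start with a scheme, `www.` or `ftp.`; the localhost URL is found only because it carries the `http://` prefix.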

2015-03-26 21:08:40

The examples provided are very convincing – 2015-07-12 14:59:43

It matches www.e, which is not a valid URL – 2016-12-20 22:46:41

-1

This is the best one.

NSString *urlRegex = @"(http|ftp|https|www|gopher|telnet|file)(://|.)([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?";

2015-08-28 07:16:44 Dhinakar

None of the answers above match URLs that contain Unicode characters, for example: http://google.com?query=đức+filan+đã+search

As a solution, this one should work:

(ftp:\/\/|www\.|https?:\/\/){1}[a-zA-Z0-9u00a1-\uffff0-]{2,}\.[a-zA-Z0-9u00a1-\uffff0-]{2,}(\S*)
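A cautious Python sketch of this idea follows. As quoted, the character class has no backslash before the first `u`, so it contains the range `1-\uffff` and accepts almost any character. The sketch assumes (this is not confirmed by the answer) that the intent was letters, digits, hyphen and the range `\u00a1-\uffff`.

```python
import re

# The class exactly as quoted: "1-\uffff" is a range, so ":" is accepted too.
as_written = r"[a-zA-Z0-9u00a1-\uffff0-]"
colon_accepted = re.fullmatch(as_written, ":") is not None

# Assumed intent: ASCII letters, digits, hyphen, plus \u00a1-\uffff.
HOST = r"[a-zA-Z0-9\u00a1-\uffff-]{2,}"
URL_RE = re.compile(r"(?:ftp:\/\/|www\.|https?:\/\/)" + HOST + r"\." + HOST + r"\S*")

url = "http://google.com?query=đức+filan+đã+search"
match = URL_RE.search("see " + url + " now")

print(colon_accepted)         # True
print(match.group(0) == url)  # True
```

The trailing `\S*` is what picks up the non-ASCII query string; the host classes only decide where the domain part may start and stop.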

2016-06-22 06:33:00

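Unicode characters are forbidden in URLs according to RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html). They must be percent-encoded to be standards-compliant, although I think that may have changed more recently. It is worth reading https://www.w3.org/International/articles/idn-and-iri/ – mrswadge 2016-09-07 09:41:49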
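For reference, percent-encoding of the kind mentioned in this comment can be sketched in Python with the standard `urllib.parse` module. The sample value is the first word of the query string in the example URL above.

```python
from urllib.parse import quote, unquote

query_value = "đức"  # from the example URL in the answer above

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(query_value, safe="")
round_trip = unquote(encoded)

print(encoded)
print(round_trip == query_value)
```

After encoding, the value consists only of ASCII characters, so an ASCII-only URL pattern can match it.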

@mrswadge I am just covering the cases. We cannot be sure that everyone cares about the standard. Thanks for the information. – 2016-09-12 02:54:53

-1

Matching a URL in text should not be this complex

(?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)

https://regex101.com/r/wewpP1/2

2016-11-03 15:11:10

Try finding "google.com" with your regex. – Squazz 2016-12-20 12:22:41
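The objection in this comment is easy to reproduce. The following Python sketch runs the pattern from this answer on the question's sample string and on a bare domain; the second input sentence is made up for illustration.

```python
import re

# The pattern from the answer above, unchanged.
URL_RE = re.compile(r"(?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)")

with_prefix = URL_RE.findall("Hello www.google.com World http://yahoo.com")
bare_domain = URL_RE.findall("visit google.com today")

print(with_prefix)  # ['www.google.com', 'http://yahoo.com']
print(bare_domain)  # []
```

A URL is only found when it starts with a scheme or `www.`, so a bare google.com goes undetected.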

-1

String regex = "[a-zA-Z0-9]+[.]([.a-zA-Z0-9])+";

This works fine in your case too.

2016-11-08 13:38:50 ARP

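None of the solutions provided here solved the problems/use cases I had.

What I provide here is the best I have found/made so far. I will update it when I find new edge cases that it does not handle.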

\b 
    #Word cannot begin with special characters 
    (?<![@.,%&#-]) 
    #Protocols are optional, but take them with us if they are present 
    (?<protocol>\w{2,10}:\/\/)? 
    #Domains have to be of a length of 1 chars or greater 
    ((?:\w|\&\#\d{1,5};)[.-]?)+ 
    #The domain ending has to be between 2 to 15 characters 
    (\.([a-z]{2,15}) 
     #If no domain ending we want a port, only if a protocol is specified 
     |(?(protocol)(?:\:\d{1,6})|(?!))) 
\b 
#Word cannot end with @ (made to catch emails) 
(?![@]) 
#We accept any number of slugs, given we have a char after the slash 
(\/)? 
#If we have endings like ?=fds include the ending 
(?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)? 
#The last char cannot be one of these symbols .,?!,- exclude these 
(?<![.,?!-])

2016-12-20 12:21:22 Squazz

Short and simple. I have not tested it in JavaScript code yet, but it looks like it works:

((http|ftp|https):\/\/)?(([\w.-]*)\.([\w]*))

Code on regex101.com

2017-11-12 12:18:10 bafsar

If you have to be strict about selecting links, I would go for:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»「」‘’]))

2017-11-12 12:29:29

Don't do this. http://www.regular-expressions.info/catastrophic.html It will kill your app... – Auric 2017-11-28 19:22:04

I think this regex handles exactly what you want:

/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/

And here is an example snippet that extracts the URLs:

// The Regular Expression filter 
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/"; 

// The Text you want to filter for urls 
$text = "The text you want https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string to filter goes here."; 

// Collect every URL found in the text ($matches[0] holds the full matches)
preg_match_all($reg_exUrl, $text, $matches);
var_dump($matches[0]);

2017-12-10 05:43:46 zhilevan

I use this:

^(https?:\\/\\/([a-zA-z0-9]+)(\\.[a-zA-z0-9]+)(\\.[a-zA-z0-9\\/\\=\\-\\_\\?]+)?)$

2018-01-10 16:36:39

A possibly too simplistic, but working, approach might be:

[localhost|http|https|ftp|file]+://[\w\S(\.|:|/)]+

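I tested it in Python, and it should be fine as long as the string being parsed has whitespace before and after the URL and none inside the URL (which I have never seen).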

Here is an online ide demonstrating it

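But here are some benefits of using it:

It recognizes file: and localhost as well as IP addresses
It will never match without them
It does not mind unusual characters such as # or - (see the URL of this post)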
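A short Python sketch of how this pattern behaves is given below. The sample strings, including the file path and the `cats://` example, are made up for illustration. One thing it shows is that the square brackets form a character class, so the first part matches any run of the listed letters rather than the whole words.

```python
import re

# The pattern quoted above. [...] is a character class, so the first part
# matches any run of the letters l, o, c, a, h, s, t, p, f, i, e (and "|").
URL_RE = re.compile(r"[localhost|http|https|ftp|file]+://[\w\S(\.|:|/)]+")

question = URL_RE.findall("Hello www.google.com World http://yahoo.com")
file_url = URL_RE.findall("open file://localhost/tmp/x#y-z please")
not_a_scheme = URL_RE.findall("cats://nope")

print(question)      # ['http://yahoo.com']
print(file_url)      # ['file://localhost/tmp/x#y-z']
print(not_a_scheme)  # ['cats://nope']
```

If only the listed schemes should be accepted, an alternation group such as `(?:http|https|ftp|file)` would be the usual way to express that.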

2018-02-06 19:52:00 Simon

import re

text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd
The code below catches all urls in text and returns urls in list."""

# Raw string so the backslashes reach the regex engine unchanged
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)

Output:

[ 
    'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
    'www.google.com', 
    'facebook.com', 
    'http://test.com/method?param=wasd' 
]

2018-02-13 14:56:34 GooDeeJaY

This is the simplest one. It works fine for me.

%(http|ftp|https|www)(://|\.)[A-Za-z0-9-_\.]*(\.)[a-z]*%

2018-02-20 16:04:20
