2014-09-21 75 views
1

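I'm working on a project that needs to remove URLs from tweets, so I need to find every URL in a tweet.

Is there a good regular expression for matching any URL in Ruby 1.8.7?

P.S. I have this regex: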

/(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»「」‘’])\/*)/x 

The regex is at http://www.rubular.com/r/UFDveGLNGt along with a few example strings. How can I also match sites like fb.me, goo.gl, or google.com?

+0

There is an unescaped forward slash at the end. The ending should be: ')\/))/' – 2014-09-21 09:30:05

+0

@enrico.bacis Check this [Rubular link](http://www.rubular.com/r/dYTJUJTETH). I tried that before and got some very complicated errors. – 2014-09-21 09:32:50

+0

@enrico.bacis I fixed it: [Rubular link](http://www.rubular.com/r/OGt43uxWfw). But it still doesn't match all the kinds of URLs I want. – 2014-09-21 09:38:29

Answers

0

How about:

require 'uri' 
URI.regexp 

This evaluates to:

/ 
    ([a-zA-Z][\-+.a-zA-Z\d]*):       (?# 1: scheme) 
    (?: 
     ((?:[\-_.!~*'()a-zA-Z\d;?:@&=+$,]|%[a-fA-F\d]{2})(?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*)     (?# 2: opaque) 
    | 
     (?:(?: 
     \/\/(?: 
      (?:(?:((?:[\-_.!~*'()a-zA-Z\d;:&=+$,]|%[a-fA-F\d]{2})*)@)?  (?# 3: userinfo) 
       (?:((?:(?:[a-zA-Z0-9\-.]|%\h\h)+|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|\[(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(?:(?:[a-fA-F\d]{1,4}:)*[a-fA-F\d]{1,4})?::(?:(?:[a-fA-F\d]{1,4}:)*(?:[a-fA-F\d]{1,4}|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}))?)\]))(?::(\d*))?))? (?# 4: host, 5: port) 
      | 
      ((?:[\-_.!~*'()a-zA-Z\d$,;:@&=+]|%[a-fA-F\d]{2})+)     (?# 6: registry) 
      ) 
     | 
     (?!\/\/))       (?# XXX: '\/\/' is the mark for hostport) 
     (\/(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*(?:\/(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*(?:;(?:[\-_.!~*'()a-zA-Z\d:@&=+$,]|%[a-fA-F\d]{2})*)*)*)?     (?# 7: path) 
     )(?:\?((?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?     (?# 8: query) 
    ) 
    (?:\#((?:[\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]|%[a-fA-F\d]{2})*))?     (?# 9: fragment) 
    /x 

This is probably better than anything we can come up with here (otherwise, consider submitting it to Ruby
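As a rough sketch of how this could be applied to the original problem, the snippet below strips scheme-prefixed URLs from a tweet with `gsub`. The helper name `strip_urls` and the sample tweet are made up for illustration, and limiting the match to `http`/`https` is an assumption about what the tweets contain. Note that `URI.regexp` is marked obsolete in later Ruby versions, and that it only matches text that starts with a scheme.

```ruby
require 'uri'

# Hypothetical helper (not from the thread): remove http/https URLs from a tweet.
# Passing a scheme list restricts URI.regexp to URLs that start with those schemes.
def strip_urls(text)
  pattern = URI.regexp(%w[http https])
  # Drop each match, then collapse the whitespace left behind.
  text.gsub(pattern, '').squeeze(' ').strip
end

puts strip_urls('Check this out http://t.co/abc123 now')
```

Characters such as `.` and `,` are legal inside a URI, so punctuation that directly follows a URL in a tweet may be removed along with it.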

+0

It doesn't match things like pic.twitter.com/*, and those are really common on Twitter – 2014-09-21 09:21:10

+0

I got a good regex - [link](http://www.rubular.com/r/UFDveGLNGt) - but it doesn't match site links like google.com, fb.me, or 9gag.tv – 2014-09-21 10:26:57
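The scheme-less cases raised in the comments (fb.me, goo.gl, google.com, 9gag.tv, pic.twitter.com/*) could be handled by making the scheme optional and requiring a dotted host name instead. The following is only a minimal sketch under that assumption: the constant name `BARE_URL`, the pattern itself, and the sample tweet are hypothetical and were not proposed in the thread.

```ruby
# Hypothetical pattern (not from the thread): optional http/https scheme,
# one or more dot-separated labels, an alphabetic TLD, and an optional path.
BARE_URL = %r{
  \b
  (?:https?://)?                            # optional scheme
  (?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+    # labels such as "pic." or "twitter."
  [a-z]{2,}                                 # TLD such as "me", "gl", "com"
  (?:/\S*)?                                 # optional path up to the next whitespace
}ix

tweet = 'try fb.me/abc or goo.gl/xyz or google.com and pic.twitter.com/p1'
puts tweet.scan(BARE_URL).inspect                  # list the matched URLs
puts tweet.gsub(BARE_URL, '').squeeze(' ').strip   # tweet with URLs removed
```

A pattern this loose also matches non-URLs such as file.txt, and a path swallows any punctuation that follows it, so checking the TLD against a known list or tightening the path ending may be worth considering.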