2017-02-25 88 views
2

Regex in R: removing multiple URLs from a string

I want to remove multiple URLs from a string. Given a string like this:

this is a URL http://test.com and another one http://test.com/hi and this one http://www.test.com/

it should return:

this is a URL and another one and this one

I tried the following code:

gsub(" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", "", string)

But it returns:

this is a URL

Answers

2

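This will also work. Instead of (.*) we can use [^\\.]* (which matches up to the dot in the domain) and \\S* (which matches to the end of the URL, that is, until whitespace is found):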

gsub("\\s?(f|ht)(tp)(s?)(://)([^\\.]*)[\\.|/](\\S*)", "", string) 
# [1] "this is a URL and another one and this one" 
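Because gsub is vectorized, the same pattern can be applied to a whole character vector at once. The following is a minimal sketch; the helper name remove_urls and the sample vector texts are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper wrapping the pattern from this answer
remove_urls <- function(x) {
  gsub("\\s?(f|ht)(tp)(s?)(://)([^\\.]*)[\\.|/](\\S*)", "", x)
}

# Made-up sample inputs: several URLs, no URL, and an ftp URL
texts <- c(
  "this is a URL http://test.com and another one http://test.com/hi and this one http://www.test.com/",
  "no links here",
  "ftp site: ftp://files.test.com/pub"
)

remove_urls(texts)
# [1] "this is a URL and another one and this one"
# [2] "no links here"
# [3] "ftp site:"
```

Note that the pattern expects a dot or slash somewhere after the scheme, so it is best suited to text where every URL has that shape.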
1

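.* matches without constraint up to the end of the string, so everything after the first URL is removed as well. Since URLs normally contain no whitespace, you can use \\S (matches any non-whitespace character) instead of . (matches any character) to avoid the problem: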

gsub(" ?(f|ht)(tp)s?(://)(\\S*)[./](\\S*)", "", string) 
# [1] "this is a URL and another one and this one" 
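One way to see the difference is to list what each pattern actually matches. This is a small sketch using base R's gregexpr and regmatches; the variable names greedy and bounded are illustrative only.

```r
string <- "this is a URL http://test.com and another one http://test.com/hi and this one http://www.test.com/"

# Pattern from the question: .* runs on to the end of the string,
# so everything from the first URL onwards forms a single match
greedy <- regmatches(string, gregexpr(" ?(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", string))[[1]]
length(greedy)
# [1] 1

# Pattern from this answer: \\S* stops at whitespace, giving one match per URL
bounded <- regmatches(string, gregexpr(" ?(f|ht)(tp)s?(://)(\\S*)[./](\\S*)", string))[[1]]
length(bounded)
# [1] 3
```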
1

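You can try the regex/code below. The lookahead (?=...) is a Perl-style construct, so perl = TRUE is needed; this version removes the space after each URL rather than the one before it, which leaves a trailing space when the string ends with a URL:

gsub("https?:\\/\\/(.*?|\\/)(?=\\s|$)\\s?", "", string, perl = TRUE)
# [1] "this is a URL and another one and this one "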
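A minimal sketch of the lookahead approach, assuming the PCRE engine is selected with perl = TRUE (R's default regex engine does not support lookaheads). The call to trimws is an addition for illustration; it removes the space left behind when the string ends with a URL.

```r
string <- "this is a URL http://test.com and another one http://test.com/hi and this one http://www.test.com/"

# perl = TRUE enables the lazy quantifier and the (?=...) lookahead
cleaned <- gsub("https?:\\/\\/(.*?|\\/)(?=\\s|$)\\s?", "", string, perl = TRUE)

# Each URL is removed together with the space that follows it,
# so a URL at the very end leaves one trailing space; trimws() strips it
trimws(cleaned)
# [1] "this is a URL and another one and this one"
```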
