URL regex that correctly excludes a specific domain

I'm trying to match some expressions with a regex, but it doesn't work. I want to match a string that does not start with http://www.domain.com. Here is my regex:

^https?:\/\/(www\.)?(?!domain\.com)

Is there a problem with my regex?

I want to match strings that start with http:// but are not http://site.com. For example:

/page.html => false 
http://www.google.fr => true 
http://site.com => false 
http://site.com/page.html => false

2013-03-27 guillaume

Outside a character class, the ^ character means "start of line", not "not". – geoffspear 2013-03-27 15:47:33
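
A small illustration of the two meanings of ^ using Python's re module (the sample strings are made up for this sketch). Neither form expresses "does not start with"; for that you need a negative lookahead, as the answers below use.

```python
import re

# Outside a character class, ^ anchors the pattern to the start of the string.
starts_with_http = re.compile(r'^http')
print(bool(starts_with_http.search('http://www.google.fr')))      # True
print(bool(starts_with_http.search('see http://www.google.fr')))  # False

# Inside a character class, a leading ^ negates the class:
# [^/]+ matches runs of characters that are not '/'.
print(re.findall(r'[^/]+', 'http://site.com/page.html'))  # ['http:', 'site.com', 'page.html']
```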

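Can you post an example of a string you expect to match (or not match) that doesn't behave that way? The regex looks reasonable. Also, there is no need to escape '/'. – FatalError 2013-03-27 15:49:23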
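
A quick sketch checking the point about '/' with Python's re (the sample strings are arbitrary): '/' has no special meaning in a regex, so the escaped and unescaped patterns behave identically. The escape is typically only needed where / delimits the regex itself, for example in JavaScript regex literals.

```python
import re

escaped = re.compile(r'^https?:\/\/')  # '/' escaped, as in the question
plain = re.compile(r'^https?://')      # the same pattern without the escapes

samples = ['http://site.com', 'https://www.google.fr', '/page.html', 'ftp://site.com']
for s in samples:
    # Both patterns accept and reject exactly the same strings.
    print(s, bool(escaped.match(s)), bool(plain.match(s)))
```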

Use this to match URLs that do not have the domain you mentioned:

https?://(?!(www\.domain\.com\/?)).*

See it in action: http://regexr.com?34a7p
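
A quick Python check of this pattern against a few URLs chosen for illustration. re.match anchors at the start of the string, which stands in for the missing ^. Note that the lookahead only names www.domain.com, so a bare http://domain.com is still accepted.

```python
import re

# Daedalus's pattern, unchanged; re.match anchors it at the start of the string.
pattern = re.compile(r'https?://(?!(www\.domain\.com\/?)).*')

for url in ['/page.html', 'http://www.google.fr',
            'http://www.domain.com', 'http://www.domain.com/page.html',
            'http://domain.com']:
    print(url, bool(pattern.match(url)))
```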

2013-03-27 16:02:58 Daedalus

Thanks, this solution works. – guillaume 2013-03-27 16:06:36

@guillaume - No problem. – Daedalus 2013-03-27 16:07:07

You want a negative lookahead assertion:

^https?://(?!(?:www\.)?site\.com).+

which gives:

>>> import re
>>> testdata = '''\
... /page.html => false
... http://www.google.fr => true
... http://site.com => false
... http://site.com/page.html => false
... '''.splitlines()
>>> not_site_com = re.compile(r'^https?://(?!(?:www\.)?site\.com).+')
>>> for line in testdata:
...     match = not_site_com.search(line)
...     if match:
...         print(match.group())
...
http://www.google.fr => true

Note that the pattern excludes both www.site.com and site.com:

>>> not_site_com.search('https://www.site.com') 
>>> not_site_com.search('https://site.com') 
>>> not_site_com.search('https://site-different.com') 
<re.Match object; span=(0, 26), match='https://site-different.com'>

2013-03-27 15:55:12

Oops, I forgot some details; I've edited my original post. – guillaume 2013-03-27 15:57:38

@guillaume: Right, then you still need a negative lookahead assertion. – 2013-03-27 16:08:12

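The problem here is that when the regex engine finds a successful match inside the negative lookahead, it treats that attempt as a failure (as intended), but then backtracks into the preceding group (www\.), which is quantified as optional, and checks whether the expression succeeds without it. For http://www.domain.com, (www\.)? first consumes "www." and the lookahead sees "domain.com" and fails; the engine then retries with (www\.)? matching nothing, the lookahead now sees "www.domain.com", which does not start with "domain.com", and the whole pattern matches. That is what you were seeing.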
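
The backtracking can be observed directly in Python: the question's pattern still matches http://www.domain.com, and the matched text shows that the optional www. group ended up consuming nothing.

```python
import re

# The pattern from the question: optional www. group, then the negative lookahead.
original = re.compile(r'^https?:\/\/(www\.)?(?!domain\.com)')

m = original.match('http://www.domain.com')
# The match stops after the scheme: the engine backtracked so that
# (www\.)? matched nothing and the lookahead saw "www.domain.com".
print(m.group())  # http://

# Without the www. prefix there is nothing to give back, so the match fails.
print(original.match('http://domain.com'))  # None
```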

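This could be fixed by applying atomic grouping or a possessive quantifier, which make the engine "forget" the backtracking positions. Unfortunately, Python's re module does not support either natively (support was only added in Python 3.11), so instead you have to use a less efficient approach: a larger lookahead.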
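
On Python 3.11 or later, where re accepts atomic groups (?>...) and possessive quantifiers such as ?+, the fix described above can be written directly. A minimal sketch using the domain.com pattern from the question; the URL list is illustrative, and the version check simply skips the demo on older interpreters.

```python
import re
import sys

URLS = ['http://www.domain.com', 'http://domain.com',
        'http://www.google.fr', '/page.html']

if sys.version_info >= (3, 11):
    # Atomic group: once (?:www\.)? has matched, the engine cannot backtrack into it.
    atomic = re.compile(r'^https?://(?>(?:www\.)?)(?!domain\.com)')
    # Possessive quantifier: ?+ makes the optional group behave atomically.
    possessive = re.compile(r'^https?://(?:www\.)?+(?!domain\.com)')
    for url in URLS:
        print(url, bool(atomic.match(url)), bool(possessive.match(url)))
else:
    print('Atomic groups and possessive quantifiers need Python 3.11+')
```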

^https?:\/\/(?!(www\.)?(domain\.com))
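
A quick check of this pattern in Python against URLs modelled on the question's examples, using domain.com to match the pattern (the list is illustrative). The pattern only asserts what follows the scheme, so a successful match covers just "http://"; append .* if you need the whole URL as the match text.

```python
import re

# JonM's pattern: the optional www. sits inside the lookahead, so there is
# no optional group outside it for the engine to backtrack into.
pattern = re.compile(r'^https?:\/\/(?!(www\.)?(domain\.com))')

cases = {
    '/page.html': False,
    'http://www.google.fr': True,
    'http://domain.com': False,
    'http://www.domain.com': False,
    'http://domain.com/page.html': False,
}
for url, expected in cases.items():
    print(url, bool(pattern.match(url)), 'expected', expected)
```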

2013-03-27 16:06:56 JonM

+1, but why include "https?://" inside the lookahead? – FatalError 2013-03-27 16:08:48

The OP still needs to match lines that start with "http://" or "https://", just *not* ones with that domain. – 2013-03-27 16:09:57

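Good point. Moving it out shouldn't affect the overall result of the expression, but it would probably make it slightly more efficient. I've changed the answer to reflect this. – JonM 2013-03-27 16:16:50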
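
To illustrate the claim that moving https?:// out of the lookahead does not change the result, a small comparison in Python. The first pattern is a hypothetical reconstruction of the earlier form (the pre-edit version of the answer is not shown here), and the URL list is illustrative.

```python
import re

# Hypothetical earlier form: the scheme is also checked inside the lookahead.
inside = re.compile(r'^(?!https?:\/\/(www\.)?domain\.com)https?:\/\/')
# Form from the answer: the scheme is matched once, outside the lookahead.
outside = re.compile(r'^https?:\/\/(?!(www\.)?(domain\.com))')

urls = ['/page.html', 'http://www.google.fr', 'https://domain.com',
        'http://www.domain.com', 'http://domain.com/page.html']
for url in urls:
    # Both patterns accept and reject the same inputs; the first one
    # simply matches the scheme twice.
    print(url, bool(inside.match(url)), bool(outside.match(url)))
```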
