2011-10-08 76 views
1

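Regex to linkify unlinked URLs (BBCode)

I need a regular expression that finds any URL that is not inside [url(=...)]...[/url] tags. In other words, I want to linkify every URL that is not already linked by wrapping it as [url]link[/url], so that the parser I use can process it as usual.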

I have been trying to understand negative lookaheads (which is apparently what I should be using), but I can't figure them out.
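As a point of reference, a lookahead inspects the text that starts at the current position, while a lookbehind inspects the text that ends there. A minimal sketch of the difference follows; the sample string and the simplified URL pattern are assumptions made for illustration only.

```php
<?php
// Made-up sample: a URL that is already wrapped in [url] tags.
$text = '[url]http://example.com[/url]';

// Negative lookahead: asserts that the text starting AT the match position
// is not "[url]". A URL never starts with "[url]", so this check always passes.
$withLookahead = preg_match('/(?!\[url\])https?:\/\/[^\s\[]+/i', $text, $m1);

// Negative lookbehind: asserts that the text ending just BEFORE the match
// position is not "[url]", which is what "not already tagged" requires.
$withLookbehind = preg_match('/(?<!\[url\])https?:\/\/[^\s\[]+/i', $text, $m2);

var_dump($withLookahead);   // int(1): the tagged URL is still matched
var_dump($withLookbehind);  // int(0): the tagged URL is skipped
```

This suggests that a lookbehind placed before the URL is probably the tool needed to detect an opening tag.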

This is what I have so far:

preg_replace('/(?!\[url(=.*?)?\])(https?|ftps?|irc):\/\/(www\.)?(\w+(:\w+)?@)?[a-z0-9-]+(\.[a-z0-9-])*.*(?!\[\/url\])/i',"[url]$0[/url]",$Str);

Thanks

+0

You may also want to verify that the URL is not inside an '[IMG]' tag, if your BBCode parser allows those. – ridgerunner

+0

Actually, I don't parse img tags on my site, so that's fine. – user966939
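For a parser that does allow [img], the extra check could be sketched roughly as below. This is an illustration, not code from the thread: the helper name `linkifySkippingTags` and the simplified URL pattern are assumptions.

```php
<?php
// Hypothetical helper: wraps bare URLs, skipping any URL that directly follows
// "[url]", "[url=" or "[img]". All three prefixes are five characters long,
// so they fit in one fixed-length lookbehind.
function linkifySkippingTags(string $str): string
{
    $pattern = '/(?<!\[url\]|\[url=|\[img\])\bhttps?:\/\/[^\s\[\]]+/i';
    return preg_replace($pattern, '[url]$0[/url]', $str);
}

echo linkifySkippingTags('[img]http://example.com/a.png[/img] and http://example.com'), "\n";
// [img]http://example.com/a.png[/img] and [url]http://example.com[/url]
```

The sketch only looks at the five characters directly before the URL. A URL used as the link text after `[url=...]` is preceded by `]` and would therefore still be wrapped.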

Answers

3

My solution:

<?php
$URLRegex = '/(?:(?<!(\[url\]|\[url=))(\s|^))';             // No opening [url] or [url= tag in front, and is start of string or has whitespace in front
$URLRegex.= '(';                                            // Start capturing URL
$URLRegex.= '(https?|ftps?|ircs?):\/\/';                    // Protocol
$URLRegex.= '\S+';                                          // Any non-space character
$URLRegex.= ')';                                            // Stop capturing URL
$URLRegex.= '(?:(?<![[:punct:]])|(?<=\/))(\s|\.?$)/i';      // Doesn't end with punctuation (excluding /) and is end of string (with a possible dot at the end), or has whitespace after

// $2 = leading whitespace, $3 = the URL, $5 = trailing whitespace or final dot
$Str = preg_replace($URLRegex,"$2[url]$3[/url]$5",$Str);
?>
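A short usage sketch of this approach, wrapped in a function so it can be tried on sample input. The function name `linkifyBbcode` and the sample strings are assumptions. The lookbehind here tests for an opening `[url]` or `[url=` tag, which is what the first code comment describes; that reading of the intent is also an assumption.

```php
<?php
// Hypothetical wrapper around the pattern structure used in this answer.
function linkifyBbcode(string $str): string
{
    $urlRegex = '/(?:(?<!(\[url\]|\[url=))(\s|^))'  // $2: whitespace or start, not right after [url] or [url=
              . '((https?|ftps?|ircs?):\/\/\S+)'     // $3: the URL ($4: its protocol)
              . '(?:(?<![[:punct:]])|(?<=\/))'       // last URL char is not punctuation, unless it is "/"
              . '(\s|\.?$)/i';                       // $5: whitespace, or an optional dot at end of string
    return preg_replace($urlRegex, '$2[url]$3[/url]$5', $str);
}

echo linkifyBbcode('See http://example.com/ for details'), "\n";
// See [url]http://example.com/[/url] for details
echo linkifyBbcode('Visit http://example.com.'), "\n";
// Visit [url]http://example.com[/url].
echo linkifyBbcode('[url]http://example.com[/url]'), "\n";
// [url]http://example.com[/url]  (unchanged)
echo linkifyBbcode('http://a.example http://b.example'), "\n";
// [url]http://a.example[/url] http://b.example
```

The last call shows one behavior worth knowing: the trailing whitespace is consumed as part of a match, so a second URL separated from the first by a single space has no leading whitespace left to match and stays unwrapped.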
+0

It also allows a dot after the URL if the URL is at the end of the string (that is, the dot does not become part of the link). – user966939

+0

Love it! Thanks for the regex. – Tanoro

+0

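Could you edit your answer so that it matches URLs ending in '/'? Not matching URLs that end with punctuation is good, except that a trailing '/' is almost always part of the URL. I tried to modify '[:punct:]' using character class subtraction, but unfortunately that is not supported in PCRE. –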
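Without class subtraction, "punctuation except /" can be approximated with lookarounds. The sketch below shows two equivalent ways to express it; the variable names and sample strings are assumptions made for illustration.

```php
<?php
// "[[:punct:]] minus /" written as a lookahead guard in front of the class.
$punctExceptSlash = '/^(?!\/)[[:punct:]]$/';

var_dump(preg_match($punctExceptSlash, '.'));  // int(1)
var_dump(preg_match($punctExceptSlash, '/'));  // int(0)

// The same idea as a check on the final character of a token:
// the last character is either not punctuation, or it is "/".
$cleanEnding = '/^\S+(?:(?<![[:punct:]])|(?<=\/))$/';

var_dump(preg_match($cleanEnding, 'http://example.com/'));  // int(1)
var_dump(preg_match($cleanEnding, 'http://example.com.'));  // int(0)
```

The second form is a pair of lookbehinds joined by alternation, which fits naturally at the end of a URL pattern.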

1

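Linkifying unlinked URLs is not trivial. There are a lot of pitfalls (see: The Problem with URLs, as well as the comment thread following that blog entry). The problem is compounded if you want to skip URLs that are already linked. I have looked into this problem and have been working on a solution, an open-source project: LinkifyURL. Here is a recent incarnation of a function that does what you are asking. Note that the regex is not trivial (but then, neither is the problem).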

function linkify($text) { 
    $url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL 
    # Match http & ftp URL that is not already linkified. 
     # Alternative 1: URL delimited by (parentheses). 
     (\()      # $1 "(" start delimiter. 
     ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $2: URL. 
     (\))      # $3: ")" end delimiter. 
    | # Alternative 2: URL delimited by [square brackets]. 
     (\[)      # $4: "[" start delimiter. 
     ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $5: URL. 
     (\])      # $6: "]" end delimiter. 
    | # Alternative 3: URL delimited by {curly braces}. 
     (\{)      # $7: "{" start delimiter. 
     ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $8: URL. 
     (\})      # $9: "}" end delimiter. 
    | # Alternative 4: URL delimited by <angle brackets>. 
     (<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity). 
     ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+) # $11: URL. 
     (>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity). 
    | # Alternative 5: URL not delimited by (), [], {} or <>.
     (      # $13: Prefix proving URL not already linked. 
     (?:^    # Can be a beginning of line or string, or 
     | [^=\s\'"\]]   # a non-"=", non-quote, non-"]", followed by 
     ) \s*[\'"]?   # optional whitespace and optional quote; 
     | [^=\s]\s+    # or... a non-equals sign followed by whitespace. 
    )      # End $13. Non-prelinkified-proof prefix. 
     (\b      # $14: Other non-delimited URL. 
     (?:ht|f)tps?:\/\/  # Required literal http, https, ftp or ftps prefix. 
     [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*). 
     (?:     # Either on a "&" or at the end of URI. 
      (?!     # Allow a "&" char only if not start of an... 
      &(?:gt|\#0*62|\#x0*3e);     # HTML ">" entity, or 
      | &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if 
      [.!&\',:?;]?  # followed by optional punctuation then 
      (?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$) # a non-URI char or EOS. 
     ) &     # If neg-assertion true, match "&" (special). 
      [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*). 
     )*      # Unroll-the-loop (special normal*)*. 
     [a-z0-9\-_~$()*+=\/#[\]@%] # Last char can\'t be [.!&\',;:?] 
    )      # End $14. Other non-delimited URL. 
    /imx'; 
    $url_replace = '$1$4$7$10$13<a href="$2$5$8$11$14">$2$5$8$11$14</a>$3$6$9$12'; 
    return preg_replace($url_pattern, $url_replace, $text); 
} 
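The replacement string above concatenates the references from every alternative. That works because groups belonging to an alternative that did not take part in the match expand to empty strings. A heavily simplified sketch of that mechanism follows. It is not the LinkifyURL pattern: the name `linkifySimplified`, the reduction to two alternatives and the crude character classes are all assumptions.

```php
<?php
// Hypothetical, heavily simplified stand-in: two alternatives instead of five,
// and a much cruder URL character class than the full pattern uses.
function linkifySimplified(string $text): string
{
    $pattern = '/
        # Alternative 1: URL delimited by (parentheses).
        (\()                                   # $1: "(" start delimiter.
        ((?:ht|f)tps?:\/\/[^\s()<>]+)          # $2: URL.
        (\))                                   # $3: ")" end delimiter.
      | # Alternative 2: bare URL at line start or after whitespace.
        (^|\s)                                 # $4: prefix.
        ((?:ht|f)tps?:\/\/[^\s()<>]*           # $5: URL body, then
          [^\s()<>.,;:!?])                     # a last char that is not punctuation.
        /imx';
    // Groups from the alternative that did not match expand to empty strings,
    // so a single replacement string serves both alternatives.
    $replace = '$1$4<a href="$2$5">$2$5</a>$3';
    return preg_replace($pattern, $replace, $text);
}

echo linkifySimplified('(http://example.com)'), "\n";
// (<a href="http://example.com">http://example.com</a>)
echo linkifySimplified('Go to http://example.com/a?b=1, then stop.'), "\n";
// Go to <a href="http://example.com/a?b=1">http://example.com/a?b=1</a>, then stop.
```

Keeping the delimiters in their own groups lets them be written back on either side of the anchor, so they never end up inside the link.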

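This solution also has some limitations, and I have recently been working on an improved version (which is simpler and works better), but it is not yet ready for prime time.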

Be sure to check out the linkify test page, where I have listed a collection of really hard-to-match URLs found in the wild.

+0

Curious whether you ever got your improved version working? –

+0

@Jeff Widman - Sorry, I never finished the updated version; this one here is still the best I have for now. – ridgerunner