How can I detect what is an Imgur image link and what isn't?

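I'm trying to programmatically figure out whether or not a link points to an Imgur image. An example of an Imgur image link would be http://imgur.com/0AKSCQ4 or http://i.imgur.com/0AKSCQ4.jpg (the first is an indirect link and the second is direct, but the ID stays the same).

I want http://imgur.com/0AKSCQ4 to evaluate to true when asked whether it is an Imgur image link, but http://imgur.com/gallery to evaluate to false. I'm confused about how to distinguish between the two when they are both imgur.com/*letters*.

I ask because I know Reddit Enhancement Suite has this functionality. If I post http://imgur.com/gallery it doesn't offer an image button to preview it, but it does for http://imgur.com/0AKSCQ4.

So how would I be able to identify this? Finding every word that doesn't qualify, like gallery, jobs, or about in imgur.com/*whatever*, seems really hacky and would break whenever a new page is added. And there aren't always numbers in the second part, so I can't rely on that to identify it.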

Source

2014-10-20 Doug Smith

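Surely you have a preferred framework for doing this. You should first parse the URL with a proper URL parser, then apply your tests to the hostname and relative path components (and possibly check the protocol, port, etc.). There is a highly developed science of URL obfuscation designed to defeat tests based on string patterns. – 2014-10-20 01:58:28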

What framework? Specifically for Imgur links? Unfortunately, I don't have one. – 2014-10-20 02:31:17

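The framework you use for most of your application development. Are you building this as a web service? Then something like ASP.NET, PHP, or Rails. Even if you're open to other implementations, say which one you're most familiar with. – 2014-10-20 02:47:57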
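The suggestion in the first comment (parse the URL first, then test the hostname and path separately) could be sketched roughly as below. The function name `looksLikeImgurImage`, the list of accepted hosts, and the rule that an image ID must contain at least one digit are all assumptions made for illustration. None of them is a documented Imgur rule, and the digit rule will reject image IDs that happen to consist only of letters.

```javascript
// Hypothetical helper; the name and every rule below are illustrative assumptions.
function looksLikeImgurImage(link) {
  let url;
  try {
    url = new URL(link); // the standard URL parser throws on malformed input
  } catch (e) {
    return false;
  }

  // Only plain web links are considered.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  // Assumed set of hostnames; compared exactly, so look-alike hosts such as
  // "imgur.com.example.org" do not pass.
  const hosts = ["imgur.com", "www.imgur.com", "i.imgur.com"];
  if (!hosts.includes(url.hostname)) return false;

  // Assumed path shape: one segment, an alphanumeric ID with at least one
  // digit, and an optional 3-4 letter file extension.
  return /^\/\w*\d\w*(\.[a-zA-Z]{3,4})?$/.test(url.pathname);
}

console.log(looksLikeImgurImage("http://imgur.com/0AKSCQ4"));
console.log(looksLikeImgurImage("http://imgur.com/gallery"));
```

Because the hostname is compared after parsing, obfuscated links that merely contain the text "imgur.com" somewhere in the string are rejected without any extra pattern work.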

For example, run this snippet as JavaScript:

$(function () {
    /* pattern for URL-like substrings */
    var url_re = /https?[^<"]+/g;

    /* an Imgur image URL: optional subdomain, an ID containing at least
       one digit, and an optional three-letter extension */
    var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/\w*\d\w*(\.[a-zA-Z]{3})?$/;

    /* take this question's text as input */
    var txt = $(".post-text").html();
    var m;

    /* match all URL-like substrings in the input */
    while ((m = url_re.exec(txt)) !== null) {
        /* show whether each one is an Imgur image URL */
        $("#results").append("<li>" + m[0] + ": " + imgur_re.test(m[0]) + "</li>");
    }
});

<ul id="results"></ul>

<div class="post-text" itemprop="text">
<p>I'm trying to programmatically figure out whether or not an link is a link to an Imgur image or not. An example of an Imgur image link would be: <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> or <a href="http://i.imgur.com/0AKSCQ4.jpg" rel="nofollow">http://i.imgur.com/0AKSCQ4.jpg</a> (the first is an indirect link and the latter is direct, but the ID stays the same)</p>

<p>I want <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> to evaluate to <code>true</code> when asked if an Imgur link, but <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> to be <code>false</code>. I'm confused how to distinguish between those two when they're both <code>imgur.com/*letters*</code>.</p>

<p>I ask because I know <a href="http://redditenhancementsuite.com" rel="nofollow">Reddit Enhancement Suite</a> has this functionality. If I post <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> it doesn't offer an image button to preview it, but it would for <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a></p>

<p>So how would I be able to identify this? Finding every word that doesn't qualify, like <code>gallery</code>, <code>jobs</code>, or <code>about</code> in <code>imgur.com/*whatever*</code> would seem really hacky, and would break upon any new page being added. And there's not <em>always</em> numbers in the second part so I can't rely on that to identify it.</p>
</div>

<script type="text/javascript" src="//code.jquery.com/jquery-2.1.1.min.js"></script>

Source

2014-10-20 11:43:44 kums

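Just an alternative regex for parsing out the ID. It matches with or without "http(s)://" and extracts the ID from i.imgur.com links (including ones with thumbnail suffixes), web pages, gallery images (which can be retrieved as normal images through the Imgur API, which I'm using), and of course regular images. Note that "www." is not matched, because Imgur should automatically redirect to the address without "www", so people shouldn't be supplying such URLs. '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?=\s))' – cyanic 2016-03-01 15:50:34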

Edited to fix anchoring at the end (my use case needs links in the middle): '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))' – cyanic 2016-03-01 15:57:44
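The ID-extraction idea from these comments might look something like the sketch below. This is an adaptation, not cyanic's exact expression: the names `IMGUR_ID_RE` and `extractImgurId` are made up here, the ID is assumed to be purely alphanumeric, and the letters `s`, `b`, `t`, `m`, `l`, `h` are treated as thumbnail suffixes only when a file extension follows.

```javascript
// Illustrative adaptation; the ID alphabet and suffix handling are assumptions.
const IMGUR_ID_RE =
  /(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?([A-Za-z0-9]+?)(?:[sbtmlh]?\.[A-Za-z0-9]{3,4})?(?=\s|$)/;

// Returns the captured ID, or null when no Imgur-style link is found.
function extractImgurId(text) {
  const m = IMGUR_ID_RE.exec(text);
  return m ? m[1] : null;
}

console.log(extractImgurId("http://i.imgur.com/0AKSCQ4.jpg"));
console.log(extractImgurId("see http://imgur.com/gallery/0AKSCQ4 for details"));
```

This only extracts; it does not validate. A link such as http://imgur.com/gallery would yield "gallery" as the "ID", and a direct link whose real ID ends in one of the suffix letters would lose that last character, so the result still needs a separate check.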
