2014-10-20 22 views
1

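I'm trying to determine programmatically whether a link points to an Imgur image. An example of an Imgur image link would be http://imgur.com/0AKSCQ4 or http://i.imgur.com/0AKSCQ4.jpg (the first is an indirect link and the latter is direct, but the ID stays the same). How can I detect which links are Imgur image links and which are not?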

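I want http://imgur.com/0AKSCQ4 to evaluate to true when checked as an Imgur image link, but http://imgur.com/gallery to evaluate to false. I'm confused about how to distinguish the two when both have the form imgur.com/*letters*.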

I ask because I know Reddit Enhancement Suite has this functionality. If I post http://imgur.com/gallery it doesn't offer an image button to preview it, but it does for http://imgur.com/0AKSCQ4.

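So how would I be able to identify this? Finding every word that doesn't qualify, like gallery, jobs, or about in imgur.com/*whatever*, seems really hacky and would break whenever a new page is added. And there aren't always numbers in the second part, so I can't rely on that to identify it.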

+0

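Surely you have a preferred framework for doing this. In any case, you should first parse the URL with a proper URL parser, then apply your tests to the hostname and relative-path components (possibly also checking the protocol, port, etc.). There is a highly developed science of URL obfuscation aimed at defeating tests based on string patterns. – 2014-10-20 01:58:28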

+0

What framework? Specifically for Imgur links? Unfortunately, I don't have one. – 2014-10-20 02:31:17

+0

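The framework you use for most of your application development. Are you building this as a web service? Then something like ASP.NET, PHP, or Rails. Even if you're open to other implementations, say what you're most familiar with. – 2014-10-20 02:47:57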
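The parse-first approach suggested in the comments above could look roughly like the following sketch. It uses the standard `URL` class; the function name `isImgurImageLink`, the accepted host list, and the "ID contains a digit" heuristic are illustrative assumptions, not Imgur's documented rules, and IDs made only of letters would be missed.

```javascript
// Hypothetical helper: the name, host list, and ID heuristic are assumptions.
function isImgurImageLink(link) {
    let url;
    try {
        url = new URL(link); // throws a TypeError on malformed input
    } catch (e) {
        return false;
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
        return false;
    }
    // The parser lowercases the hostname, so exact comparison is enough.
    const host = url.hostname;
    if (host !== "imgur.com" && host !== "i.imgur.com") {
        return false;
    }
    // Expect exactly one path segment, e.g. "/0AKSCQ4" or "/0AKSCQ4.jpg".
    const segments = url.pathname.split("/").filter(Boolean);
    if (segments.length !== 1) {
        return false;
    }
    const match = /^([A-Za-z0-9]+)(\.[A-Za-z]{3,4})?$/.exec(segments[0]);
    if (match === null) {
        return false;
    }
    // Assumption: direct links on i.imgur.com carry a file extension.
    if (host === "i.imgur.com") {
        return match[2] !== undefined;
    }
    // Heuristic (assumption): an image ID contains at least one digit.
    return /\d/.test(match[1]);
}

console.log(isImgurImageLink("http://imgur.com/0AKSCQ4"));       // true
console.log(isImgurImageLink("http://i.imgur.com/0AKSCQ4.jpg")); // true
console.log(isImgurImageLink("http://imgur.com/gallery"));       // false
```

Because the hostname is compared exactly, a lookalike such as http://imgur.com.evil.example/0AKSCQ4 is rejected, which a loose substring test on the raw string would not guarantee.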

Answers

2

Run this code snippet as a JavaScript example:

$(function () {
    var url_re = /https?[^<"]+/g; /* pattern for url-like substrings */

    /* Imgur image URL: optional subdomain, an ID containing at least one digit, optional 3-letter extension */
    var imgur_re = /^https?:\/\/(\w+\.)?imgur\.com\/(\w*\d\w*)+(\.[a-zA-Z]{3})?$/;

    var txt = $(".post-text").html(); /* taking this question text as input */

    var m;
    while ((m = url_re.exec(txt)) !== null) { /* match all url-like substrings in input */
        /* check whether the matched substring is an Imgur image URL and show the result */
        $("#results").append("<li>" + m[0] + ": " + imgur_re.test(m[0]) + "</li>");
    }
});
<ul id="results"></ul>

<div class="post-text" itemprop="text">
<p>I'm trying to programmatically figure out whether or not an link is a link to an Imgur image or not. An example of an Imgur image link would be: <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> or <a href="http://i.imgur.com/0AKSCQ4.jpg" rel="nofollow">http://i.imgur.com/0AKSCQ4.jpg</a> (the first is an indirect link and the latter is direct, but the ID stays the same)</p>

<p>I want <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a> to evaluate to <code>true</code> when asked if an Imgur link, but <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> to be <code>false</code>. I'm confused how to distinguish between those two when they're both <code>imgur.com/*letters*</code>.</p>

<p>I ask because I know <a href="http://redditenhancementsuite.com" rel="nofollow">Reddit Enhancement Suite</a> has this functionality. If I post <a href="http://imgur.com/gallery" rel="nofollow">http://imgur.com/gallery</a> it doesn't offer an image button to preview it, but it would for <a href="http://imgur.com/0AKSCQ4" rel="nofollow">http://imgur.com/0AKSCQ4</a></p>

<p>So how would I be able to identify this? Finding every word that doesn't qualify, like <code>gallery</code>, <code>jobs</code>, or <code>about</code> in <code>imgur.com/*whatever*</code> would seem really hacky, and would break upon any new page being added. And there's not <em>always</em> numbers in the second part so I can't rely on that to identify it.</p>
</div>

<script type="text/javascript" src="//code.jquery.com/jquery-2.1.1.min.js"></script>

+0

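Just an alternative regex for parsing out the ID. This will match with or without "http(s)://", and extracts the ID from i.imgur.com links (including thumbnail suffixes and web pages), gallery images (which can be retrieved as normal images from the imgur API, which I'm using), and of course regular images. Note that "www." isn't matched, because imgur should automatically redirect to the URL without "www", so people shouldn't be supplying such URLs. '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?=\s))' – cyanic 2016-03-01 15:50:34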

+0

Edited to fix anchoring at the end (my use case needs links in the middle): '(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))' – cyanic 2016-03-01 15:57:44
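A minimal sketch of applying an ID-extracting pattern like the one described in the comments above. The exact regex here is a best-effort reading of that comment, so treat it as an assumption rather than the commenter's verbatim pattern; the names `imgurIdRe` and `extractImgurId` are hypothetical.

```javascript
// Assumed reading of the commented pattern: optional scheme, optional "i."
// subdomain, optional "gallery/" prefix, then the ID in capture group 1.
const imgurIdRe = /(?:https?:\/\/)?(?:i\.)?imgur\.com\/(?:gallery\/)?(.+(?=[sbtmlh]\..{3,4})|.+(?=\..{3,4})|.+?(?:(?=\s)|$))/;

// Hypothetical helper: returns the captured ID, or null when nothing matches.
function extractImgurId(text) {
    const m = imgurIdRe.exec(text);
    return m === null ? null : m[1];
}

console.log(extractImgurId("http://i.imgur.com/0AKSCQ4.jpg"));  // "0AKSCQ4"
console.log(extractImgurId("http://i.imgur.com/0AKSCQ4m.jpg")); // "0AKSCQ4" (thumbnail suffix dropped)
console.log(extractImgurId("imgur.com/gallery/0AKSCQ4"));       // "0AKSCQ4"
console.log(extractImgurId("http://example.com/0AKSCQ4"));      // null
```

Two limitations follow from the pattern itself: the first two alternatives use a greedy `.+`, so a link followed by more text that contains a period can capture too much, and an ID whose last character happens to be one of `s`, `b`, `t`, `m`, `l`, `h` would lose that character when a file extension follows.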
