2016-09-29
3

Extract the video ID from a YouTube URL in .NET

I am struggling to extract the video ID from a YouTube URL using a regular expression.

"(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";

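It works in that it matches the video ID, but I want to restrict it to the YouTube domains: it should not match an ID if the domain is anything other than youtube.com or youtu.be. Unfortunately, I don't understand this regex well enough to apply that restriction.

I want to match the ID only when the domain is one of: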

  • www.youtube.com
  • youtube.com
  • youtu.be
  • www.youtu.be

with or without http:// or https:// in front.

The regex above successfully matches the YouTube ID in the following examples:

"http://youtu.be/AAAAAAAAA01" 
"http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02" 
"http://www.youtube.com/embed/watch?v=AAAAAAAAA03" 
"http://www.youtube.com/embed/v=AAAAAAAAA04" 
"http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05" 
"http://www.youtube.com/watch?v=AAAAAAAAA06" 
"http://www.youtube.com/v/AAAAAAAAA07" 
"www.youtu.be/AAAAAAAAA08" 
"youtu.be/AAAAAAAAA09" 
"http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related" 
"http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA" 
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail" 
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17" 
"http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0" 
"http://www.youtube.com/watch/AAAAAAAAA11" 

This is the current code that checks the URL:

private const string YoutubeLinkRegex = "(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";
private static Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);

public string ExtractVideoIdFromUrl(string url)
{
    // extract the id
    var regRes = regexExtractId.Match(url);
    if (regRes.Success)
    {
        return regRes.Groups[1].Value;
    }
    return null;
}
+0

Check this regex: http://stackoverflow.com/a/27796139/6290553

Answers

2

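The problem is that a single regex cannot easily require a string before the part being extracted and, at the same time, use that same string as the extraction anchor itself.

For example, take "http://www.youtu.be/v/AAAAAAAAA07": "youtu.be" is mandatory at the beginning of the URL, but the extraction anchor is "/v/(11 chars)".

In "http://www.youtu.be/AAAAAAAAA07", the extraction anchor is "youtu.be/(11 chars)".

These two roles do not fit into the regex as written, which is why I do not check the domain and extract the ID in the same regex.

I decided to check the URL's authority against a list of valid domains first, and then extract the ID from the URL.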

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
private const string YoutubeLinkRegex = "(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";
private static Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);
private static string[] validAuthorities = { "youtube.com", "www.youtube.com", "youtu.be", "www.youtu.be" };

public string ExtractVideoIdFromUri(Uri uri)
{
    try
    {
        // Authority is the host, plus the port when it is not the scheme's default
        string authority = new UriBuilder(uri).Uri.Authority.ToLowerInvariant();

        // check if the url is a youtube url
        if (validAuthorities.Contains(authority))
        {
            // and extract the id
            var regRes = regexExtractId.Match(uri.ToString());
            if (regRes.Success)
            {
                return regRes.Groups[1].Value;
            }
        }
    }
    catch
    {
        // ignore malformed input and fall through to return null
    }

    return null;
}

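UriBuilder is preferred because it understands a wider range of URLs than the Uri class. It can create a Uri from a URL that contains no scheme (such as "youtube.com").

The function returns null (correctly) for the following test URLs: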
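As a minimal sketch of that point (the class and helper names are hypothetical, not part of the answer): the string constructor of UriBuilder assumes http when the input has no scheme, whereas Uri.TryCreate with UriKind.Absolute rejects the same input.

```csharp
using System;

public static class UriBuilderDemo
{
    // Hypothetical helper: host of a URL that may or may not carry a scheme.
    public static string HostOf(string url)
    {
        // UriBuilder(string) defaults the scheme to http when none is given.
        return new UriBuilder(url).Uri.Host;
    }

    public static void Main()
    {
        Console.WriteLine(HostOf("youtu.be/AAAAAAAAA09"));                // youtu.be
        Console.WriteLine(HostOf("www.youtube.com/watch?v=AAAAAAAAA06")); // www.youtube.com
        Console.WriteLine(HostOf("https://youtube.com/v/AAAAAAAAA07"));   // youtube.com

        // The same scheme-less string is not an absolute URI for the Uri class.
        Uri ignored;
        Console.WriteLine(Uri.TryCreate("youtu.be/AAAAAAAAA09", UriKind.Absolute, out ignored)); // False
    }
}
```

This is why a caller holding a plain string would probably build the Uri with new UriBuilder(url).Uri before passing it to ExtractVideoIdFromUri.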

"ww.youtube.com/v/AAAAAAAAA13" 
"http:/www.youtube.com/v/AAAAAAAAA13" 
"http://www.youtub1e.com/v/AAAAAAAAA13" 
"http://www.vimeo.com/v/AAAAAAAAA13" 
"www.youtube.com/b/AAAAAAAAA13" 
"www.youtube.com/v/AAAAAAAAA1" 
"www.youtube.com/v/AAAAAAAAA1&" 
"www.youtube.com/v/AAAAAAAAA1/" 
".youtube.com/v/AAAAAAAAA13" 
1

septihhere wrote:

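I had a play with the examples and came up with this:

YouTube: youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)

It should match all of the examples given. (?:...) means that whatever is inside the parentheses is not captured, so only the ID should be captured.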
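A small sketch of how this pattern might be used from C# (the class and method names are hypothetical). It also shows two limits of the pattern that matter for the question: it is not anchored to the start of the host, and it does not require exactly 11 characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YouTubeRegexDemo
{
    // Pattern copied verbatim from the answer above.
    private static readonly Regex IdRegex =
        new Regex(@"youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)");

    // Hypothetical helper: returns the captured ID, or null when nothing matches.
    public static string TryGetId(string url)
    {
        Match m = IdRegex.Match(url);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(TryGetId("http://youtu.be/AAAAAAAAA01"));                // AAAAAAAAA01
        Console.WriteLine(TryGetId("http://www.youtube.com/watch?v=AAAAAAAAA06")); // AAAAAAAAA06
        Console.WriteLine(TryGetId("http://www.vimeo.com/v/AAAAAAAAA13") ?? "(no match)");

        // Limits: any host that merely ends in "youtube.com" still matches,
        // and IDs shorter than 11 characters are accepted.
        Console.WriteLine(TryGetId("http://notyoutube.com/v/AAAAAAAAA13")); // AAAAAAAAA13
        Console.WriteLine(TryGetId("www.youtube.com/v/AAAAAAAAA1"));        // AAAAAAAAA1
    }
}
```

If those two cases must be rejected, a separate host check plus a length check on the captured group would be needed on top of this pattern.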

8

A regex is not required here.

var url = @"https://www.youtube.com/watch?v=6QlW4m9xVZY"; 
var uri = new Uri(url); 

// you can check host here => uri.Host <= "www.youtube.com" 

var query = HttpUtility.ParseQueryString(uri.Query); 
var videoId = query["v"]; 

// videoId = 6QlW4m9xVZY 

OK, the example above works when the video ID is passed as the v query parameter. If the video ID is a path segment instead, you can use this:

var url = "http://youtu.be/AAAAAAAAA09"; 
var uri = new Uri(url); 

var videoid = uri.Segments.Last(); // AAAAAAAAA09 
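To make the behaviour of Uri.Segments concrete, here is a short sketch (the class and helper names are hypothetical). Each segment except the last keeps its trailing slash, so a URL that ends in "/" yields a last segment with the slash still attached; the helper trims it as an assumption about what a caller would want.

```csharp
using System;
using System.Linq;

public static class SegmentsDemo
{
    // Hypothetical helper: last path segment with any trailing slash trimmed.
    public static string LastSegment(string url)
    {
        var uri = new Uri(url);
        return uri.Segments.Last().TrimEnd('/');
    }

    public static void Main()
    {
        var uri = new Uri("http://www.youtube.com/v/AAAAAAAAA07");
        Console.WriteLine(string.Join(" | ", uri.Segments)); // / | v/ | AAAAAAAAA07

        Console.WriteLine(LastSegment("http://youtu.be/AAAAAAAAA09/"));               // AAAAAAAAA09
        // The query string is not part of the path, so this is not a video ID.
        Console.WriteLine(LastSegment("http://www.youtube.com/watch?v=AAAAAAAAA06")); // watch
    }
}
```

The last call shows why the query parameter has to be checked before falling back to the last segment.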

Putting it all together, we get:

var url = @"https://www.youtube.com/watch?v=Lvcyj1GfpGY&list=PLolZLFndMkSIYef2O64OLgT-njaPYDXqy"; 
var uri = new Uri(url); 

// you can check host here => uri.Host <= "www.youtube.com" 

var query = HttpUtility.ParseQueryString(uri.Query); 

var videoId = string.Empty; 

if (query.AllKeys.Contains("v")) 
{ 
    videoId = query["v"]; 
} 
else 
{ 
    videoId = uri.Segments.Last(); 
} 

Of course, I don't know exactly what you need, but I hope this helps.

+1

Personally, I prefer not to use regex when other, more readable options exist. I like this answer more than my own :) – confusedandamused

+0

Oh! I like this answer! Note that, if you haven't done so already, you need to add a reference to System.Web for 'HttpUtility'.

+0

Unfortunately, it doesn't work for: youtu.be/AAAAAAAAA09, www.youtube.com/watch/aaaaaaaaa, www.youtube.com/v/aaaaaaaa

0

tym32167's answer throws an exception at var uri = new Uri(url); when the url has no scheme, as in "www.youtu.be/AAAAAAAAA08".

It also returns a wrong videoId for some URLs.

So here is my code, based on tym32167's.

static private string GetYouTubeVideoIdFromUrl(string url) 
    { 
     Uri uri = null; 
     if (!Uri.TryCreate(url, UriKind.Absolute, out uri)) 
     { 
      try 
      { 
       uri = new UriBuilder("http", url).Uri; 
      } 
      catch 
      { 
       // invalid url 
       return ""; 
      } 
     } 

     string host = uri.Host; 
     string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" }; 
     if (!youTubeHosts.Contains(host)) 
      return ""; 

     var query = HttpUtility.ParseQueryString(uri.Query); 

     if (query.AllKeys.Contains("v")) 
     { 
      return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value; 
     } 
     else if (query.AllKeys.Contains("u")) 
     { 
      // some urls have something like "u=/watch?v=AAAAAAAAA16" 
      return Regex.Match(query["u"], @"/watch\?v=([a-zA-Z0-9_-]{11})").Groups[1].Value; 
     } 
     else 
     { 
      // remove a trailing forward slash 
      var last = uri.Segments.Last().Replace("/", ""); 
      if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$")) 
       return last.Replace("v=", ""); 

      string[] segments = uri.Segments; 
      if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/") 
       return ""; 

      return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value; 
     } 
    } 

Let's test it.

 string[] urls = {"http://youtu.be/AAAAAAAAA01", 
      "http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02", 
      "http://www.youtube.com/embed/watch?v=AAAAAAAAA03", 
      "http://www.youtube.com/embed/v=AAAAAAAAA04", 
      "http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05", 
      "http://www.youtube.com/watch?v=AAAAAAAAA06", 
      "http://www.youtube.com/v/AAAAAAAAA07", 
      "www.youtu.be/AAAAAAAAA08", 
      "youtu.be/AAAAAAAAA09", 
      "http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related", 
      "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA", 
      "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail", 
      "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17", 
      "http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0", 
      "http://www.youtube.com/watch/AAAAAAAAA11",}; 

     Console.WriteLine("***Youtube urls***"); 
     foreach (string url in urls) 
     { 
      Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url)); 
     } 

     string[] invalidUrls = { 
      "ww.youtube.com/v/AAAAAAAAA13", 
      "http:/www.youtube.com/v/AAAAAAAAA13", 
      "http://www.youtub1e.com/v/AAAAAAAAA13", 
      "http://www.vimeo.com/v/AAAAAAAAA13", 
      "www.youtube.com/b/AAAAAAAAA13", 
      "www.youtube.com/v/AAAAAAAAA1", 
      "www.youtube.com/v/AAAAAAAAA1&", 
      "www.youtube.com/v/AAAAAAAAA1/", 
      ".youtube.com/v/AAAAAAAAA13"}; 

     Console.WriteLine("***Invalid youtube urls***"); 
     foreach (string url in invalidUrls) 
     { 
      Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url)); 
     } 

Results (everything is as expected):

***Youtube urls*** 
http://youtu.be/AAAAAAAAA01 
-> AAAAAAAAA01 
http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02 
-> AAAAAAAAA02 
http://www.youtube.com/embed/watch?v=AAAAAAAAA03 
-> AAAAAAAAA03 
http://www.youtube.com/embed/v=AAAAAAAAA04 
-> AAAAAAAAA04 
http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05 
-> AAAAAAAAA05 
http://www.youtube.com/watch?v=AAAAAAAAA06 
-> AAAAAAAAA06 
http://www.youtube.com/v/AAAAAAAAA07 
-> AAAAAAAAA07 
www.youtu.be/AAAAAAAAA08 
-> AAAAAAAAA08 
youtu.be/AAAAAAAAA09 
-> AAAAAAAAA09 
http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related 
-> i-AAAAAAA14 
http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA 
-> AAAAAAAAA15 
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail 
-> AAAAAAAAA16 
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17 
-> AAAAAAAAA17 
http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0 
-> A-AAAAAAA18 
http://www.youtube.com/watch/AAAAAAAAA11 
-> AAAAAAAAA11 



***Invalid youtube urls*** 
ww.youtube.com/v/AAAAAAAAA13 
-> 
http:/www.youtube.com/v/AAAAAAAAA13 
-> 
http://www.youtub1e.com/v/AAAAAAAAA13 
-> 
http://www.vimeo.com/v/AAAAAAAAA13 
-> 
www.youtube.com/b/AAAAAAAAA13 
-> 
www.youtube.com/v/AAAAAAAAA1 
-> 
www.youtube.com/v/AAAAAAAAA1& 
-> 
www.youtube.com/v/AAAAAAAAA1/ 
-> 
.youtube.com/v/AAAAAAAAA13 
-> 
0

This should do it:

public static string GetYouTubeId(string url)
{
    var regex = @"(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?|watch)\/|.*[?&]v=)|youtu\.be\/)([^""&?\/ ]{11})";

    var match = Regex.Match(url, regex);

    if (match.Success)
    {
        return match.Groups[1].Value;
    }

    // no match: the input is returned unchanged
    return url;
}