2013-05-11 63 views
0

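Extracting information from a web page using urllib2

I'm writing part of a program that searches YouTube for a phrase, and I then want it to get the URL of the first video. However, I can't figure out how to get the first video's URL.

Here is my code: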

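import urllib2, urllib

# Read the search phrase and URL-encode it for the query string.
raw_i = raw_input("Search: ")
x = urllib.quote_plus(raw_i)

# Fetch the search results page and read its HTML into a string.
site1 = urllib2.urlopen('http://www.youtube.com/results?search_query=%s' % x)
y = site1.read()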

This reads the search page, but I want it to return just the URL of the video.

For example, let's use the phrase "Coconut by Harry Nilsson".

Here is the HTML for the first video:

<li class="yt-lockup2 clearfix yt-uix-tile result-item-padding has-hover-effects yt-lockup2-video yt-lockup2-tile context-data-item" data-context-item-title="Harry Nilsson - Coconut (1971)" data-context-item-views="2,930,881 views" data-context-item-type="video" data-context-item-id="Tbgv8PkO9eo" data-context-item-time="4:32" data-context-item-user="Zoltán Makk">
    <div class="yt-lockup2-thumbnail">
     <a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto " data-sessionlink="ved=CDIQwBs&amp;ei=prWOUZT9KIK8igLtyICAAQ">   <span class="video-thumb yt-thumb yt-thumb-185" >
     <span class="yt-thumb-default">
     <span class="yt-thumb-clip">
      <span class="yt-thumb-clip-inner">
      <img alt="Thumbnail" src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" width="185" >
      <span class="vertical-align"></span>
     </span>
    </span>
    </span>
</span>
<span class="video-time">4:32</span>

All I want is for "/watch?v=Tbgv8PkO9eo" to be returned.

Thanks!

+2

Have you tried BeautifulSoup? – 2013-05-11 21:23:10

+0

I'll check it out! – Serial 2013-05-11 21:23:27

Answers

1

You can use HTMLParser. Create your own parser by deriving from that Python class.

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined, print it.
                if name == "href":
                    print name, "=", value

You create a parser and feed it your HTML string.

# Adjacent string literals inside parentheses are concatenated, so no
# attribute name or value is split by a line continuation.
# In a Python 2 source file, the non-ASCII character below needs a
# "# -*- coding: utf-8 -*-" line at the top of the file.
your_html_string = (
    '<li class="yt-lockup2 clearfix yt-uix-tile result-item-padding '
    'has-hover-effects yt-lockup2-video yt-lockup2-tile '
    'context-data-item" data-context-item-title="Harry Nilsson - '
    'Coconut (1971)" data-context-item-views="2,930,881 views" '
    'data-context-item-type="video" '
    'data-context-item-id="Tbgv8PkO9eo" data-context-item-time="4:32" '
    'data-context-item-user="Zoltán Makk">'
    '<div class="yt-lockup2-thumbnail">'
    '<a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap '
    'yt-uix-sessionlink yt-uix-contextlink contains-addto" '
    'data-sessionlink="ved=CDIQwBs&amp;ei=prWOUZT9KIK8igLtyICAAQ">'
    '<span class="video-thumb yt-thumb yt-thumb-185" >'
    '<span class="yt-thumb-default"> '
    '<span class="yt-thumb-clip"> '
    '<span class="yt-thumb-clip-inner"> '
    '<img alt="Thumbnail" '
    'src="//i1.ytimg.com/vi/Tbgv8PkO9eo/mqdefault.jpg" '
    'width="185" > <span class="vertical-align"></span> '
    '</span> </span></span></span> '
    '<span class="video-time">4:32</span>'
)

parser = MyHTMLParser() 
parser.feed(your_html_string) 

The result is:

>>> 
href = /watch?v=Tbgv8PkO9eo 
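
The parser above prints each href. To hand the value back to the caller, as the question asks, the links can be collected on the parser instance instead. A minimal sketch, assuming Python 3 (where the class lives in html.parser rather than in the HTMLParser module); LinkCollector and first_watch_link are hypothetical names introduced for illustration, and the "/watch" prefix check assumes the markup shown in the question:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every anchor tag it is fed (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name == "href" and value is not None:
                self.links.append(value)


def first_watch_link(html):
    """Return the first href starting with '/watch', or None if there is none."""
    parser = LinkCollector()
    parser.feed(html)
    for link in parser.links:
        if link.startswith("/watch"):
            return link
    return None


sample = '<div><a href="/watch?v=Tbgv8PkO9eo" class="ux-thumb-wrap">x</a></div>'
print(first_watch_link(sample))
```

Returning None when nothing matches lets the caller tell "no video found" apart from an empty string.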
+0

But I'd have to get the HTML fragment first – Serial 2013-05-11 21:42:24
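
The fragment does not have to be isolated first: an HTMLParser subclass can be fed the entire page and simply ignore everything except the anchors of interest. A rough sketch under several assumptions: Python 3 (urllib.request and urllib.parse in place of urllib2 and urllib), the search URL from the question, and result links that still appear in the served HTML as "/watch?v=..." anchors, which may no longer hold for the live site. FirstWatchLinkParser, first_video_url, and search_first_video are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus, urljoin
from urllib.request import urlopen

BASE_URL = "http://www.youtube.com/"  # host taken from the question's code


class FirstWatchLinkParser(HTMLParser):
    """Remembers only the first anchor whose href starts with '/watch'."""

    def __init__(self):
        super().__init__()
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.first_link is None:
            href = dict(attrs).get("href")
            if href and href.startswith("/watch"):
                self.first_link = href


def first_video_url(page_html):
    """Return the absolute URL of the first '/watch' link, or None."""
    parser = FirstWatchLinkParser()
    parser.feed(page_html)  # the whole page, not a hand-cut fragment
    if parser.first_link is None:
        return None
    return urljoin(BASE_URL, parser.first_link)


def search_first_video(phrase):
    """Fetch the search page for phrase and return the first video URL found."""
    url = urljoin(BASE_URL, "results?search_query=" + quote_plus(phrase))
    with urlopen(url) as response:
        page_html = response.read().decode("utf-8", errors="replace")
    return first_video_url(page_html)
```

Only search_first_video touches the network; first_video_url can be tried offline on any saved copy of the results page.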