How do I get all YouTube comments with Python's gdata module?

I'm trying to grab all the comments on a given video, rather than going through them one page at a time.

from gdata import youtube as yt 
from gdata.youtube import service as yts 

client = yts.YouTubeService() 
client.ClientLogin(username, pwd) #the pwd might need to be application specific fyi 

comments = client.GetYouTubeVideoCommentFeed(video_id='the_id') 
a_comment = comments.entry[0]

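The code above lets you grab a single comment, probably the most recent one, but I'm looking for a way to grab all the comments at once. Is this possible with Python's gdata module?

The YouTube API docs for comments, the comment feed docs, and the Python API docs.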

Source

2012-10-10 TankorSmash

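This was answered [here](http://stackoverflow.com/questions/10941803/using-youtube-api-to-get-all-comments-from-a-video-with-the-json-feed) with a PHP solution, because the YouTube PHP API has a call that allows it. I don't think a pure-Python answer exists.

@KenB I saw that too. That's a shame. The video in question has 9k comments, and I don't think making 360 'GetNextLink' calls is the best way to go. – TankorSmash

The 'www.youtube.com/all_comments?v=video_id' URL has a parseable list of comments, but it takes a long time to load. I suppose I could try that. – TankorSmash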

The following does what you asked for using the Python YouTube API:

from gdata.youtube import service 

USERNAME = '[email protected]' 
PASSWORD = 'a_very_long_password' 
VIDEO_ID = 'wf_IIbT8HGk' 

def comments_generator(client, video_id):
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService() 
client.ClientLogin(USERNAME, PASSWORD) 

for comment in comments_generator(client, VIDEO_ID): 
    author_name = comment.author[0].name.text 
    text = comment.content.text 
    print("{}: {}".format(author_name, text))

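Unfortunately, the API limits the number of entries that can be retrieved. This is the error I got when I tried a tweaked version with a hand-crafted GetYouTubeVideoCommentFeed URL parameter: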

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle should apply to retrieving entries from the API's other feeds.
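As a rough illustration of that principle, here is a minimal sketch of the same "follow the next link" loop written against any feed object that exposes `.entry` and `.GetNextLink()`. The names `iter_feed_entries`, `fetch_feed`, `FakeFeed` and `FakeLink` are hypothetical and not part of gdata; the stubs only stand in for real feeds so the loop can be run without network access.

```python
def iter_feed_entries(first_feed, fetch_feed):
    """Yield every entry of a paged feed by following its 'next' links.

    first_feed -- any object with an .entry list and a .GetNextLink() method
    fetch_feed -- hypothetical callable that takes a URL and returns the next feed
    """
    feed = first_feed
    while feed is not None:
        for entry in feed.entry:
            yield entry
        next_link = feed.GetNextLink()
        # No next link means this was the last page.
        feed = fetch_feed(next_link.href) if next_link is not None else None


# Stand-ins for gdata feed objects, used only for this demonstration.
class FakeLink(object):
    def __init__(self, href):
        self.href = href


class FakeFeed(object):
    def __init__(self, entry, next_href=None):
        self.entry = entry
        self._next_href = next_href

    def GetNextLink(self):
        return FakeLink(self._next_href) if self._next_href else None


pages = {
    "page1": FakeFeed(["a", "b"], next_href="page2"),
    "page2": FakeFeed(["c"]),
}
all_entries = list(iter_feed_entries(pages["page1"], pages.get))
print(all_entries)
```

With a real client, `fetch_feed` would presumably be the feed getter that matches the feed type, for example `client.GetYouTubeVideoCommentFeed` for comments.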

If you want to hand-craft the GetYouTubeVideoCommentFeed URL parameter, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following restrictions apply: start-index <= 1000 and max-results <= 50.
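To make the URL format and its limits concrete, here is a minimal sketch that generates one URL per page. The helper name `comment_feed_urls` and the constants are made up for illustration. It assumes that start-index is 1-based and that no page may reach past item 1000, which is how the error message above reads; neither assumption has been checked against the live API.

```python
FEED_URL = ("https://gdata.youtube.com/feeds/api/videos/{video_id}/comments"
            "?start-index={start_index}&max-results={max_results}")

MAX_RESULTS_LIMIT = 50   # max-results <= 50
LAST_ITEM_LIMIT = 1000   # "You cannot request beyond item 1000."


def comment_feed_urls(video_id, max_results=MAX_RESULTS_LIMIT):
    """Yield one comment feed URL per page, staying within the limits above."""
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError("max_results must be between 1 and 50")
    # Assumption: start-index is 1-based, so pages start at 1, 1 + max_results, ...
    for start_index in range(1, LAST_ITEM_LIMIT + 1, max_results):
        # Shrink the last page so it never asks for items past the limit.
        page_size = min(max_results, LAST_ITEM_LIMIT - start_index + 1)
        yield FEED_URL.format(video_id=video_id,
                              start_index=start_index,
                              max_results=page_size)


urls = list(comment_feed_urls("wf_IIbT8HGk"))
print(len(urls))
print(urls[0])
print(urls[-1])
```

Each URL could then be passed to `GetYouTubeVideoCommentFeed` in turn. Under these limits at most the first 1000 comments are reachable this way.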

Source

2012-10-10 20:38:16

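Awesome. Do you know if there's a way to manually set 'start_index' or 'items_per_page'? Setting it on the first set of comments doesn't seem to do anything. – TankorSmash

You just need to pass a URL of the following format to 'GetYouTubeVideoCommentFeed': 'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'. The following restrictions apply: 'start-index <= 1000' and 'max-results <= 50'.

Awesome, didn't even think of changing the URI, cheers! – TankorSmash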

This is the only solution I have right now, but it doesn't use the API, and it gets slow when there are several thousand comments.

import bs4, re, urllib2
# grab the page source for the video (replace video_id, e.g. XhFtHW4YB7M)
data = urllib2.urlopen(r'http://www.youtube.com/all_comments?v=video_id')
# pull out the comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
# do something with them, e.g. count them
print len(cmnts)

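Note that because 'class' is a reserved word in Python, you can't do the regular 'startswith' searches via regex or lambdas as seen here, since you're passing a dict rather than regular keyword parameters. It also gets pretty slow because of BeautifulSoup, but it has to be used because etree and minidom don't find the matching tags for some reason, even after prettifying the markup with bs4.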
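For comparison, here is a minimal sketch of the "class starts with" idea using only the standard library's `html.parser` in Python 3. This is a swapped-in technique rather than the BeautifulSoup approach above. The names `CommentCounter` and `count_comments` and the sample markup are invented for illustration, and whether this is any faster or copes with the real page has not been tested.

```python
from html.parser import HTMLParser


class CommentCounter(HTMLParser):
    """Count start tags that carry a CSS class beginning with a given prefix."""

    def __init__(self, class_prefix):
        super().__init__()
        self.class_prefix = class_prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        class_value = dict(attrs).get("class") or ""
        # A class attribute can hold several space-separated tokens.
        if any(token.startswith(self.class_prefix)
               for token in class_value.split()):
            self.count += 1


def count_comments(html, class_prefix="comment"):
    parser = CommentCounter(class_prefix)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up markup that only imitates the class names mentioned above.
sample = (
    '<ul>'
    '<li class="comment yt-tile-default">first</li>'
    '<li class="comment yt-tile-other">second</li>'
    '<li class="sidebar">not a comment</li>'
    '</ul>'
)
print(count_comments(sample))
```

Matching on a prefix rather than the exact string 'comment yt-tile-default' might also be a little more tolerant of small changes to the page's class names.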

Source

2012-10-10 20:24:50 TankorSmash

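Hi, interesting answer, but I think the HTML structure has changed. Are you using a different tag instead of 'comment yt-tile-default'? Thanks! – Thoth

@Thoth I haven't used this in a while, but open the dev tools and edit my answer if you find out – TankorSmash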
