2016-02-28 336 views

Answers

4

This is not yet available through the public API.

3

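The only way I have found is to systematically scrape the posts' permalinks with browser automation such as Selenium and pick out the appropriate element, with some logic to handle the different count formats (for example, 5.6k views versus 1,046 views). A simple GET request does not yield the required DOM, because the page builds it with JavaScript and a plain request does not run any.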

In Python:

from bs4 import BeautifulSoup
from selenium import webdriver

def insertViews(posts):
    # Launch a headless browser so the page's JavaScript runs before scraping.
    driver = webdriver.PhantomJS('<path-to-phantomjs-driver-ignoring-escapes>')
    # CSS selector for the view-count span; it may break if the site's markup changes.
    views_span_dom_path = '._9jphp > span'

    for post in posts:
        post_type = post.get('Type')
        link = post.get('Link')
        views = post.get('Views')

        if post_type == 'video':
            driver.get(link)
            html = driver.page_source

            soup = BeautifulSoup(html, "lxml")
            views_string_results = soup.select(views_span_dom_path)
            # Parse only when the element was found; otherwise keep the existing value.
            if len(views_string_results) > 0:
                views_string = views_string_results[0].get_text()
                if 'k' in views_string:
                    # Abbreviated form: '5.6k' is read as 5.6 thousand.
                    views = float(views_string.replace('k', '')) * 1000
                elif ',' in views_string:
                    # Thousands separator: '1,046' is read as 1046.
                    views = float(views_string.replace(',', ''))
                else:
                    views = float(views_string)
        else:
            # Non-video posts have no view count.
            views = None

        post['Views'] = views
    driver.quit()
    return posts

The PhantomJS driver can be downloaded here.
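The count-format handling can also be pulled out into a small standalone function, which makes it testable without a browser. The following is only a sketch: `parse_view_count` is a hypothetical helper name, not part of the answer's code, and the `m` (millions) suffix is an assumption, since only the `5.6k` and `1,046` formats are mentioned here.

```python
import re

# Hypothetical helper, not part of the original answer.
# The 'm' suffix is an assumption; only 'k' and comma formats appear above.
_SUFFIXES = {"k": 1_000, "m": 1_000_000}

# A number with optional thousands separators and decimals,
# optionally followed directly by a single-letter suffix.
_COUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)([km])?\b")


def parse_view_count(text):
    """Convert a display string such as '5.6k views' or '1,046 views' to an int.

    Returns None when no number can be found in the text.
    """
    match = _COUNT_RE.search(text.strip().lower())
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = _SUFFIXES.get(match.group(2), 1)
    return int(round(number * multiplier))
```

Inside the scraping loop, the `if`/`elif` chain could then be replaced with a single call such as `views = parse_view_count(views_string)`. Unlike the original chain, this returns an integer rather than a float.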
