I'm having problems running dryscrape on an Ubuntu 16.04 server (a clean install on DigitalOcean); the aim is to scrape JavaScript-populated websites.
I followed the dryscrape installation instructions from here:
apt-get update
apt-get install qt5-default libqt5webkit5-dev build-essential \
python-lxml python-pip xvfb
pip install dryscrape
I then ran the Python script below, which I found here, along with a test HTML page at the same link (the page reports whether HTML or JS was rendered).
Python
import dryscrape
from bs4 import BeautifulSoup
session = dryscrape.Session()
my_url = 'http://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response)
soup.find(id="intro-text")
HTML - scrape.php
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id='intro-text'>No javascript support</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body>
</html>
When I do this, I can't seem to get the expected data back; instead, it just errors.
I'm wondering if there's anything obvious I'm missing?
Note: I've been through a lot of installation guides/threads and can't seem to get it working. I've also tried using Selenium, but couldn't get that working either. Many thanks.
Output
Traceback (most recent call last):
File "js.py", line 3, in <module>
session = dryscrape.Session()
File "/usr/local/lib/python2.7/dist-packages/dryscrape/session.py", line 22, in __init__
self.driver = driver or DefaultDriver()
File "/usr/local/lib/python2.7/dist-packages/dryscrape/driver/webkit.py", line 30, in __init__
super(Driver, self).__init__(**kw)
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 230, in __init__
self.conn = connection or ServerConnection()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 507, in __init__
self._sock = (server or get_default_server()).connect()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 450, in get_default_server
_default_server = Server()
File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 424, in __init__
raise NoX11Error("Could not connect to X server. "
webkit_server.NoX11Error: Could not connect to X server. Try calling dryscrape.start_xvfb() before creating a session.
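The traceback points at the root cause: on a headless server there is no X display for webkit_server to attach to. As a stdlib-only illustration (the helper function below is hypothetical, though the DISPLAY environment variable it checks is the standard X11 mechanism), this is roughly the condition that fails:

```python
import os

def x_display_available():
    """Report whether an X display appears to be available.

    On a headless server (such as a fresh DigitalOcean droplet), DISPLAY
    is normally unset, so webkit_server raises NoX11Error. Calling
    dryscrape.start_xvfb() launches a virtual framebuffer (Xvfb) and
    sets DISPLAY so the WebKit process can connect to it.
    """
    return bool(os.environ.get("DISPLAY"))
```

If this returns False, call dryscrape.start_xvfb() before creating the Session, which is exactly what the working script below does.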
Working script
import dryscrape
from bs4 import BeautifulSoup
dryscrape.start_xvfb()
session = dryscrape.Session()
my_url = 'https://www.example.com/scrape.php'
session.visit(my_url)
response = session.body()
soup = BeautifulSoup(response, "html.parser")
print soup.find(id="intro-text").text
Thanks for this — I've added the updated/working python script to the bottom of my answer. The only extra thing I needed was to specify the html parser in 'soup = BeautifulSoup(response, "html.parser")'. I spent 4 hours yesterday reading and trying to fix this, so I really appreciate the help. – denski