所以我試圖在我的WebDriver裏面的新選項卡上打開網站。我想這樣做,因爲爲每個網站打開一個新的WebDriver大約需要3.5秒使用PhantomJS,我想要更多的速度...在新標籤中打開網頁Selenium + Python
我使用多進程python腳本,我想從每個頁面,所以工作流程是這樣的:
Open Browser
Loop throught my array
For element in array -> Open website in new tab -> do my business -> close it
但我找不到任何方法來實現這一目標。
這是我正在使用的代碼。它需要永久在網站之間,我需要它是快速的...其他工具是允許的,但我不知道有太多的工具來報廢JavaScript加載的網站內容(在加載時觸發某些事件時創建的div)這就是爲什麼我需要Selenium ... BeautifulSoup不能用於我的某些頁面。
#!/usr/bin/env python
import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from PIL import Image
from os import listdir
from os.path import isfile, join
from bs4 import BeautifulSoup
from pprint import pprint
def getPhantomData(parameters):
try:
# We create WebDriver
browser = webdriver.Firefox()
# Navigate to URL
browser.get(parameters['target_url'])
# Find all links by Selector
links = browser.find_elements_by_css_selector(parameters['selector'])
result = []
for link in links:
# Extract link attribute and append to our list
result.append(link.get_attribute(parameters['attribute']))
browser.close()
browser.quit()
return json.dumps({'data': result})
except Exception, err:
browser.close()
browser.quit()
print err
def callback(ch, method, properties, body):
parameters = json.loads(body)
message = getPhantomData(parameters)
if message['data']:
ch.basic_ack(delivery_tag=method.delivery_tag)
else:
ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)
def consume():
credentials = pika.PlainCredentials('invitado', 'invitado')
rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials)
connection = pika.BlockingConnection(rabbit)
channel = connection.channel()
# Conectamos al canal
channel.queue_declare(queue='com.stuff.images', durable=True)
channel.basic_consume(callback,queue='com.stuff.images')
print ' [*] Waiting for messages. To exit press CTRL^C'
try:
channel.start_consuming()
except KeyboardInterrupt:
pass
workers = 5
pool = multiprocessing.Pool(processes=workers)
for i in xrange(0, workers):
pool.apply_async(consume)
try:
while True:
continue
except KeyboardInterrupt:
print ' [*] Exiting...'
pool.terminate()
pool.join()
如何在開始時創建所有WebDriver? – Raito 2015-02-10 13:48:41