2017-04-25 78 views

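Dryscrape: using XPath to scrape child node data from a list of parent nodes

I am trying to scrape http://quotes.toscrape.com/ with dryscrape and Python for learning purposes. I can get all the divs with class="quote". I want to loop over that list of divs and use XPath to pull several pieces of data out of each parent element.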

import dryscrape
from bs4 import BeautifulSoup
session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: ' + str(session.status_code()))
for div in session.xpath("//div[@class='quote']"):
    # Please help me scrape the author and quote for each div element
    pass  # placeholder so the loop body is valid Python

Answers


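We can iterate over the elements matched by the XPath. Each item is a node object holding the content of one matched element, and each node has its own methods for fetching data: at_xpath() runs a further query relative to that node, and text() returns its text.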

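import dryscrape
session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: ' + str(session.status_code()))
# Each div is a node; the leading "." keeps the XPath relative to that div
for div in session.xpath("//div[@class='quote']"):
    # On this page the first span holds the quote and the first small holds the author
    print('Quote: ' + div.at_xpath(".//span").text())
    print('Author: ' + div.at_xpath(".//small").text())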
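
The same parent-relative lookup can be tried offline without dryscrape. The following is only a sketch: it swaps in the standard library's xml.etree.ElementTree (which supports a limited XPath subset), and both the SAMPLE markup and the extract_quotes helper are made up for illustration, a simplified stand-in for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the markup on quotes.toscrape.com
SAMPLE = """
<html><body>
  <div class="quote">
    <span class="text">First quote</span>
    <span>by <small class="author">Author One</small></span>
  </div>
  <div class="quote">
    <span class="text">Second quote</span>
    <span>by <small class="author">Author Two</small></span>
  </div>
</body></html>
"""


def extract_quotes(markup):
    """Return one dict per quote div, using queries relative to each div."""
    root = ET.fromstring(markup.strip())
    rows = []
    for div in root.findall(".//div[@class='quote']"):
        # The leading "." keeps each query inside the current div
        text = div.find(".//span[@class='text']").text
        author = div.find(".//small[@class='author']").text
        rows.append({"quote": text, "author": author})
    return rows


quotes = extract_quotes(SAMPLE)
for row in quotes:
    print(row["author"] + ": " + row["quote"])
```

Collecting the fields into one dict per div keeps each quote paired with its author, which is the point of querying from the parent node rather than running two page-wide queries and zipping the results.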
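
# Alternative without dryscrape: fetch the page with requests, parse it with BeautifulSoup
import requests
from bs4 import BeautifulSoup
url = 'http://quotes.toscrape.com/'
r = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
for div in soup.find_all("div", {"class": "quote"}):
    # find() returns the first matching descendant of this div
    print('Quote : ' + div.find('span').get_text())
    print('Author : ' + div.find('small').get_text())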