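Scrapy can't scrape items, XPath not working

I have spent a lot of time trying to scrape information with Scrapy, without success. My goal is to crawl through the categories and, for each item, scrape the title, the price, and the href link of the title.
The problem seems to come from the parse_items function. I have checked the XPaths with FirePath and I am able to select the items I want, so maybe I just don't understand how XPaths are processed by Scrapy...
Here is my code: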
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from ..items import electronic_Item


class robot_makerSpider(CrawlSpider):
    name = "robot_makerSpider"
    allowed_domains = ["robot-maker.com"]
    start_urls = [
        "http://www.robot-maker.com/shop/",
    ]
    rules = (
        Rule(LinkExtractor(
            allow=(
                "http://www.robot-maker.com/shop/12-kits-robots",
                "http://www.robot-maker.com/shop/36-kits-debutants-arduino",
                "http://www.robot-maker.com/shop/13-cartes-programmables",
                "http://www.robot-maker.com/shop/14-shields",
                "http://www.robot-maker.com/shop/15-capteurs",
                "http://www.robot-maker.com/shop/16-moteurs-et-actionneurs",
                "http://www.robot-maker.com/shop/17-drivers-d-actionneurs",
                "http://www.robot-maker.com/shop/18-composants",
                "http://www.robot-maker.com/shop/20-alimentation",
                "http://www.robot-maker.com/shop/21-impression-3d",
                "http://www.robot-maker.com/shop/27-outillage",
            ),
        ),
            callback='parse_items',
        ),
    )

    def parse_items(self, response):
        hxs = Selector(response)
        products = hxs.xpath("//div[@id='center_column']/ul/li")
        items = []
        for product in products:
            item = electronic_Item()
            item['title'] = product.xpath(
                "li[1]/div/div/div[2]/h2/a/text()").extract()
            item['price'] = product.xpath(
                "div/div/div[3]/div/div[1]/span[1]/text()").extract()
            item['url'] = product.xpath(
                "li[1]/div/div/div[2]/h2/a/@href").extract()
            # check that all fields exist
            if item['title'] and item['price'] and item['url']:
                items.append(item)
        return items
Thanks for your help.
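One thing that stands out in the code above: `products` already holds the `li` nodes, yet the title and url paths start with `li[1]/`, so they look for an `li` nested inside each `li`, while the price path starts directly with `div/`. That may be why the `if` check never passes. The sketch below illustrates the effect with the standard library's ElementTree as a stand-in for Scrapy selectors (so `.text` and `.get("href")` replace `text()` and `@href`); the markup and the shortened paths are assumptions made up for the example, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup; the real page structure is an assumption.
HTML = """
<div id="center_column">
  <ul>
    <li><div><h2><a href="/shop/robot-kit">Robot kit</a></h2><span>49.90</span></div></li>
    <li><div><h2><a href="/shop/servo">Servo motor</a></h2><span>12.50</span></div></li>
  </ul>
</div>
"""

# The root element is the <div id="center_column"> itself.
root = ET.fromstring(HTML)
products = root.findall("./ul/li")


def scrape(products, title_path):
    """Collect title, price and url for each product <li>."""
    rows = []
    for product in products:
        link = product.find(title_path)   # evaluated relative to this <li>
        price = product.find("div/span")  # also relative to this <li>
        # keep the row only if every field was found
        if link is not None and price is not None:
            rows.append({
                "title": link.text,
                "price": price.text,
                "url": link.get("href"),
            })
    return rows


# Starts with "li[1]/": searches for an <li> child of each <li>, finds none.
broken = scrape(products, "li[1]/div/h2/a")
# Starts at the current <li>'s own children.
fixed = scrape(products, "div/h2/a")

print(broken)
print(fixed)
```

If the same thing is happening in the spider, dropping the leading `li[1]/` from the title and url paths would be the first thing to try.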
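Thank you! I'll be careful from here on. Could you explain the implications of running the XPath directly on the response instead of using the Selector(response) approach? –
@ArtFilPortraitArtistetisseu It's essentially the same thing. The Response object creates a `Selector` from itself, so you get a convenient `response.selector` shortcut and don't have to create a Selector every time. `response.xpath` is a shortcut for `response.selector.xpath`. The [Response source](https://github.com/scrapy/scrapy/blob/master/scrapy/http/response/text.py#L112) is pretty simple, so you can take a peek at it yourself :) – Granitosaurus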