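Scrapy Media Pipeline, files not downloading
I am new to Scrapy. I am trying to download files using the media pipeline. However, when I run the spider, no files are stored in the folder.
Spider: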
import scrapy
from scrapy import Request
from pagalworld.items import PagalworldItem

class JobsSpider(scrapy.Spider):
    name = "songs"
    allowed_domains = ["pagalworld.me"]
    start_urls = ['https://pagalworld.me/category/11598/Latest%20Bollywood%20Hindi%20Mp3%20Songs%20-%202017.html']

    def parse(self, response):
        urls = response.xpath('//div[@class="pageLinkList"]/ul/li/a/@href').extract()
        for link in urls:
            yield Request(link, callback=self.parse_page)

    def parse_page(self, response):
        songName = response.xpath('//li/b/a/@href').extract()
        for song in songName:
            yield Request(song, callback=self.parsing_link)

    def parsing_link(self, response):
        item = PagalworldItem()
        item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
        yield {"download_link": item['file_urls']}
Items file:
import scrapy

class PagalworldItem(scrapy.Item):
    file_urls = scrapy.Field()
Settings file:
BOT_NAME = 'pagalworld'
SPIDER_MODULES = ['pagalworld.spiders']
NEWSPIDER_MODULE = 'pagalworld.spiders'
ROBOTSTXT_OBEY = True
CONCURRENT_REQUESTS = 5
DOWNLOAD_DELAY = 3
ITEM_PIPELINES = {
    'scrapy.pipelines.files.FilesPipeline': 1
}
FILES_STORE = '/tmp/media/'
You haven't written any code to download/save the files. Go here to get some ideas: https://stackoverflow.com/questions/36135809/using-scrapy-to-to-find-and-download-pdf-files-from-a-website Hope this helps – Nabin
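One thing that may be worth checking, going by the spider above: FilesPipeline reads download URLs from the item's `file_urls` field, but `parsing_link` yields a new dict whose only key is `download_link`, so the pipeline probably never sees any URLs. Below is a minimal sketch of a helper that builds an item the pipeline can read. The name `build_file_item` is hypothetical, and the sketch assumes the extracted hrefs may be relative, so it makes them absolute with `urljoin`.

```python
from urllib.parse import urljoin


def build_file_item(page_url, hrefs):
    """Return a dict item with a 'file_urls' key, the field FilesPipeline reads.

    page_url: URL of the page the hrefs were extracted from.
    hrefs: list of href strings, absolute or relative (assumption).
    """
    # Resolve each href against the page URL; absolute URLs pass through unchanged.
    absolute_urls = [urljoin(page_url, href) for href in hrefs]
    return {"file_urls": absolute_urls}


# Hypothetical use inside the spider's parsing_link callback:
#
#     def parsing_link(self, response):
#         hrefs = response.xpath(
#             '//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
#         yield build_file_item(response.url, hrefs)

if __name__ == "__main__":
    # Made-up URLs, only to show the shape of the resulting item.
    print(build_file_item("https://pagalworld.me/a/b.html", ["/x.mp3"]))
```

Yielding the populated `PagalworldItem` itself (`yield item`) should have the same effect, since it already carries `file_urls`. The point of the sketch is only that the yielded object needs that key.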