
Scrapy Media Pipeline, files not downloading

I am new to Scrapy. I am trying to download files using the media pipeline, but when I run the spider, no files are stored in the folder.

The spider:

import scrapy
from scrapy import Request
from pagalworld.items import PagalworldItem


class JobsSpider(scrapy.Spider):
    name = "songs"
    allowed_domains = ["pagalworld.me"]
    start_urls = ['https://pagalworld.me/category/11598/Latest%20Bollywood%20Hindi%20Mp3%20Songs%20-%202017.html']

    def parse(self, response):
        # Collect links to the individual listing pages
        urls = response.xpath('//div[@class="pageLinkList"]/ul/li/a/@href').extract()
        for link in urls:
            yield Request(link, callback=self.parse_page)

    def parse_page(self, response):
        # Each listing page links to one page per song
        songName = response.xpath('//li/b/a/@href').extract()
        for song in songName:
            yield Request(song, callback=self.parsing_link)

    def parsing_link(self, response):
        item = PagalworldItem()
        item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
        yield {"download_link": item['file_urls']}

The items file:

import scrapy


class PagalworldItem(scrapy.Item):
    # Populated by the spider; read by FilesPipeline
    file_urls = scrapy.Field()

The settings file:

BOT_NAME = 'pagalworld' 

SPIDER_MODULES = ['pagalworld.spiders'] 
NEWSPIDER_MODULE = 'pagalworld.spiders' 
ROBOTSTXT_OBEY = True 
CONCURRENT_REQUESTS = 5 
DOWNLOAD_DELAY = 3 
ITEM_PIPELINES = {
    'scrapy.pipelines.files.FilesPipeline': 1
}
FILES_STORE = '/tmp/media/' 

The output looks like this: (screenshot not reproduced)


You haven't written any code to download/save the files. Have a look here for some ideas: https://stackoverflow.com/questions/36135809/using-scrapy-to-to-find-and-download-pdf-files-from-a-website Hope this helps – Nabin

Answer

def parsing_link(self, response):
    item = PagalworldItem()
    item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
    yield {"download_link": item['file_urls']}

You are yielding:

yield {"download_link": ['http://someurl.com']} 

For Scrapy's media/files pipeline to work, you need to yield an item that contains a file_urls field. So try this:

def parsing_link(self, response):
    item = PagalworldItem()
    item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
    yield item
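
If the hrefs extracted here turn out to be relative, the pipeline cannot request them (Scrapy raises "Missing scheme in request url"), so it can also be worth resolving them against the page URL first. A sketch of that variant, assuming relative links are possible (the urljoin call is an addition for illustration, not part of the original answer):

def parsing_link(self, response):
    item = PagalworldItem()
    # Resolve possibly-relative hrefs: FilesPipeline only downloads absolute URLs
    item['file_urls'] = [
        response.urljoin(href)
        for href in response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
    ]
    yield item

As a further check, if PagalworldItem also declares a files field (files = scrapy.Field()), FilesPipeline will store the download results (url, path, checksum) there, which makes it easy to see whether anything was actually fetched.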

Earlier I tried parsing with a CrawlSpider, but it didn't work. You can see it here: https://stackoverflow.com/questions/45447451/scrapy-results-are-repeating – emon