Data crawling with the scrapy package in Python

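I am trying to get some data from a website (IMDB) using the 'scrapy' package.
If the div class contains an image URL, I can scrape the data together with the movie poster. If it does not, my code does not work properly: it skips the data for entries that have no image.
I want to fix it so that when there is no image URL, it forgets about the image and just scrapes the data.
How can I fix the except part?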

def parse(self, response):

    # some other lines

    try:
        poster_image_url = response.xpath('//div[@class="poster"]/a/img/@src').extract()[0]
        poster_image_url = [poster_image_url.split("_V1_")[0] + "_V1_.jpg"]
    except:
        poster_image_url = None

    item['image_urls'] = poster_image_url

This is the pipeline code:

class ImdbPipeline(object):

    def process_item(self, item, spider):
        return item

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)


2017-04-25 KevinShim

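You can use extract_first() together with an if check. extract_first() returns None instead of raising when nothing matches, so no try/except is needed: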

poster_image_url = response.xpath('//div[@class="poster"]/a/img/@src').extract_first()
if poster_image_url:
    # image_urls has to be a list, because the pipeline iterates over it
    item['image_urls'] = [poster_image_url.split('_V1')[0] + '_V1_.jpg']
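
If you would rather have image_urls always set, so that the loop in the pipeline never sees None, the same check could be wrapped in a small helper. This is only a minimal sketch: poster_urls is a hypothetical name (not part of Scrapy), and the example URL below is made up for illustration.

```python
def poster_urls(src):
    """Return a one-element list with the full-size poster URL,
    or an empty list when no poster src was found."""
    if not src:
        # extract_first() gives None when the XPath matches nothing
        return []
    # Drop the size suffix after "_V1" and request the plain .jpg
    return [src.split("_V1")[0] + "_V1_.jpg"]


# Inside the spider callback this would be used roughly as:
#   src = response.xpath('//div[@class="poster"]/a/img/@src').extract_first()
#   item['image_urls'] = poster_urls(src)

# Made-up URL, only to show the transformation
print(poster_urls("https://example.com/images/M/abc._V1_UX182_AL_.jpg"))
print(poster_urls(None))
```

With an empty list, a for loop over item['image_urls'] simply does nothing, so the rest of the item is still scraped.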

Alternatively, you can use Scrapy's ItemLoader.


2017-04-25 11:03:45 Granitosaurus
