Scrapy - Following links while saving them

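I am new to Python and Scrapy. I think the answer should be simple, but I am having trouble finding it myself. The code below takes all the links, follows them, and records each article's title. How can I pass the URL I extracted through to my item? I want to save the short link together with the article title. Thanks.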

def parse(self, response):
    for url in response.xpath("//li[@id]/@data-shortlink").extract():
        yield scrapy.Request(url, callback=self.get_details)

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    yield article

2017-02-24 yurashark

The URL is stored on the Response object, so you can get it with response.url:

def get_details(self, response):
    article = ArticleItem()
    article['title'] = response.xpath("//h1/text()").extract()
    article['url'] = response.url
    yield article

2017-02-24 04:01:20 Roundel

Works great. Is there a way to make it save the short link I extracted in parse, rather than the full link it actually followed? – yurashark

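I'm not sure, to be honest. I'd suggest experimenting with print() on the different [Response subclass methods](https://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-response-subclasses). Alternatively, if it is a value you have in parse(), then it seems you should certainly be able to pass it along to get_details as well ... – Roundel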
