2017-09-16 90 views
I'm trying to pass a value from a function. Scrapy: getting values from multiple sites.

I checked the documentation but just don't understand it. For reference:

def parse_page1(self, response): 
    item = MyItem() 
    item['main_url'] = response.url 
    request = scrapy.Request("http://www.example.com/some_page.html", 
          callback=self.parse_page2) 
    request.meta['item'] = item 
    yield request 

def parse_page2(self, response): 
    item = response.meta['item'] 
    item['other_url'] = response.url 
    yield item 

Here is pseudocode for what I want to achieve:

import scrapy


class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    # Pseudocode: what I would like to be able to write.
    def parse(self, response):
        name = response.xpath(...)
        price = scrapy.Request(second.com, callback=self.parse_check)
        yield (name, price)

    def parse_check(self, response):
        price = response.xpath(...)
        return price

Do you want one item containing the information from both sites, or one item per site? – eLRuLL

No, I don't want one object containing all the variables; I want separate variables. If that isn't possible and I have to, then one object. – daniel

Answer

This is how you can pass any value, link, etc. to another method:

import scrapy


class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...)
        link = response.xpath(...)  # link to second.com, where you may find the price
        request = scrapy.Request(url=link, callback=self.parse_check)
        request.meta['name'] = name  # attach the value to the request
        yield request

    def parse_check(self, response):
        name = response.meta['name']  # read the value back in the callback
        price = response.xpath(...)
        # Assuming that in your "items.py" the fields are declared as name, price
        yield {"name": name, "price": price}
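
To see why the value has to travel forward with the request, here is a minimal sketch in plain Python. It is a toy model and not Scrapy's actual engine: the Request and Response classes, the FAKE_PAGES dict, and the crawl() function are all hypothetical stand-ins invented for illustration. It only shows the general idea that a callback runs later, when the framework calls it, so whatever it yields goes to the framework and not back into parse().

```python
from collections import deque


class Request:
    """Stand-in for scrapy.Request: a URL, a callback, and a meta dict."""

    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class Response:
    """Stand-in for a downloaded page; it exposes the request's meta."""

    def __init__(self, url, body, meta):
        self.url = url
        self.body = body
        self.meta = meta


# Hypothetical page contents standing in for real downloads.
FAKE_PAGES = {
    "http://first.com/": "Widget",
    "http://second.com/widget": "9.99",
}


def parse(response):
    name = response.body  # in the real spider: response.xpath(...)
    request = Request("http://second.com/widget", callback=parse_check)
    request.meta["name"] = name  # the value rides along with the request
    yield request


def parse_check(response):
    name = response.meta["name"]  # the value arrives with the response
    price = response.body
    yield {"name": name, "price": price}


def crawl(start_url):
    """Toy engine: call each callback later and collect what it yields."""
    queue = deque([Request(start_url, callback=parse)])
    items = []
    while queue:
        request = queue.popleft()
        response = Response(request.url, FAKE_PAGES[request.url], request.meta)
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)  # follow-up request, handled later
            else:
                items.append(result)  # finished item
    return items


items = crawl("http://first.com/")
print(items)
```

In this model parse() has already finished by the time parse_check() runs, so there is nothing for the price to be returned to. The practical consequence is that the last callback in the chain is the place to combine the values and yield the result. Newer Scrapy releases also accept a cb_kwargs argument on Request for the same purpose, if your version supports it.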

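Thank you very much. Finally a nice, simple answer! I was going through other people's Stack Overflow questions and just couldn't manage to understand them. But now it's crystal clear. Thanks! – daniel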

By the way, your solution passes a value into a function; how would I go the other way around? Instead of sending the name, receiving the price. – daniel