
I have a scrapy script, shown below, that updates a list and a set while iterating over them:

1) the navigation paths are collected into a list, and a new parse callback is issued for each one

g_next_page_list = []
g_next_page_set = set()

def parse(self, response):

    # code to extract nav_links

    for nav_link in nav_links:
        if nav_link not in g_next_page_set:
            g_next_page_list.append(nav_link)
            g_next_page_set.add(nav_link)

    for next_page in g_next_page_list:
        next_page = response.urljoin(next_page)
        yield scrapy.Request(next_page, callback=self.parse_start_url, dont_filter=True)

and I have defined parse_start_url as:

def parse_start_url(self, response):

    # code to extract nav_links

    for nav_link in nav_links:
        if nav_link not in g_next_page_set:
            g_next_page_list.append(nav_link)
            g_next_page_set.add(nav_link)

However, the global list and set from the main parse (g_next_page_set, g_next_page_list) are not being appended to. What am I doing wrong?

Thanks in advance!


Is 'v_next_page_list' the same thing as 'g_next_page_list', or is one of them a typo? If they are different, please provide some example data for 'v_next_page_list'. – supersam654


@supersam654 Sorry about that. They are the same thing, and I have updated my original post. – user6055239

Answer


Don't use globals here; use self.variable_name instead:

# define these as attributes of the spider class, so every callback
# can reach them through self
g_next_page_list = []
g_next_page_set = set()

def parse(self, response):

    # code to extract nav_links

    for nav_link in nav_links:
        if nav_link not in self.g_next_page_set:
            self.g_next_page_list.append(nav_link)
            self.g_next_page_set.add(nav_link)

    for next_page in self.g_next_page_list:
        next_page = response.urljoin(next_page)
        yield scrapy.Request(next_page, callback=self.parse_start_url, dont_filter=True)


def parse_start_url(self, response):

    # code to extract nav_links

    for nav_link in nav_links:
        if nav_link not in self.g_next_page_set:
            self.g_next_page_list.append(nav_link)
            self.g_next_page_set.add(nav_link)

This should make it work.
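
For completeness, here is a minimal, self-contained sketch of the pattern described above, with the shared state kept on the spider instance so that both callbacks see the same list and set. The spider name, start URL, and CSS selector are placeholders, not the asker's actual values.

import scrapy


class NavSpider(scrapy.Spider):
    # hypothetical spider: name and start URL are placeholders
    name = "nav_example"
    start_urls = ["https://example.com/"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # shared state lives on the instance, reachable from every callback
        self.g_next_page_list = []
        self.g_next_page_set = set()

    def parse(self, response):
        # placeholder selector; replace with the real link extraction
        nav_links = response.css("a::attr(href)").extract()

        for nav_link in nav_links:
            if nav_link not in self.g_next_page_set:
                self.g_next_page_list.append(nav_link)
                self.g_next_page_set.add(nav_link)

        for next_page in self.g_next_page_list:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse_start_url,
                dont_filter=True,
            )

    def parse_start_url(self, response):
        nav_links = response.css("a::attr(href)").extract()
        for nav_link in nav_links:
            if nav_link not in self.g_next_page_set:
                self.g_next_page_list.append(nav_link)
                self.g_next_page_set.add(nav_link)

Note that if the only goal of the set is to avoid revisiting the same link, Scrapy's built-in duplicate filter already handles that when dont_filter is left at its default of False.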