Scrapy: another way to avoid lots of try/except

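I'd like to ask a question.
When I use a CSS selector, extract() returns the result as a list.
So if the CSS selector matches nothing,
an error like the one below shows up in the terminal, and the spider does not write any items to my JSON file: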

item['intro'] = intro[0] 
exceptions.IndexError: list index out of range

So I use try/except to check whether the list has an element:

sel = Selector(response)
sites = sel.css("div.con ul > li")
for site in sites:
    item = Shopping_appleItem()
    links = site.css(" a::attr(href)").extract()
    title = site.css(" a::text").extract()
    date = site.css(" time::text").extract()

    # Each extract() call returns a list, which is empty when nothing matched.
    try:
        item['link'] = urlparse.urljoin(response.url, links[0])
    except IndexError:
        print("link not found")
    try:
        item['title'] = title[0]
    except IndexError:
        print("title not found")
    try:
        item['date'] = date[0]
    except IndexError:
        print("date not found")

I feel like I'm using a lot of try/except blocks, and I don't know whether this is a good approach.
Please guide me. Thanks.

Source

2014-08-27 user2492364

Use item['link'] = 'no link' instead of only printing a message. But you can also print the message. – Nabin 2014-08-27 07:35:53
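The default-value idea from the comment above can be sketched with a small helper. This is only an illustration: first_or_default is a made-up name, and the links and title lists are invented stand-ins for what extract() would return. Newer Scrapy releases also offer extract_first(default=...) on selector lists, which does much the same thing if your version has it.

```python
def first_or_default(values, default=None):
    """Return the first element of a list, or default when the list is empty."""
    return values[0] if values else default


# Made-up stand-ins for the lists that extract() returns.
links = ['/news/1']   # the selector matched one element
title = []            # the selector matched nothing

item = {}
item['link'] = first_or_default(links, 'no link')
item['title'] = first_or_default(title, 'no title')
```

With this, every field is always set, so no try/except is needed around the index lookup.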

You can use a separate function to extract the data. For example, for text nodes, here is some sample code:

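def extract_text(self, node):
    # An empty selector list is falsy, meaning nothing matched.
    if not node:
        return ''
    _text = './/text()'
    # Keep only the non-blank text fragments, with whitespace stripped.
    extracted_list = [x.strip() for x in node.xpath(_text).extract() if len(x.strip()) > 0]
    if not extracted_list:
        return ''
    return ' '.join(extracted_list)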

You can then call this method like this:

self.extract_text(sel.css("your_path"))
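To see how a helper like this behaves without running a spider, here is a rough sketch. It assumes a hypothetical stand-in class, FakeNode, which is not part of Scrapy and only mimics the two calls the helper relies on (xpath() and extract()). The helper is written as a plain function here rather than a spider method.

```python
class FakeNode(object):
    """Hypothetical stand-in for a Scrapy selector list (not part of Scrapy)."""

    def __init__(self, texts):
        self.texts = list(texts)

    def __len__(self):
        # An empty stand-in is falsy, like an empty selector list.
        return len(self.texts)

    def xpath(self, query):
        # The query is ignored; the stand-in simply returns itself.
        return self

    def extract(self):
        return list(self.texts)


def extract_text(node):
    """Join the non-blank text fragments under node, or return ''."""
    if not node:
        return ''
    fragments = [t.strip() for t in node.xpath('.//text()').extract()]
    return ' '.join(t for t in fragments if t)


item = {}
item['title'] = extract_text(FakeNode(['  Hello ', '\n', 'world']))
item['date'] = extract_text(FakeNode([]))   # nothing matched
```

A missing field ends up as an empty string instead of raising IndexError, so the loop body needs no try/except at all.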


Source

2014-08-27 09:41:56
