2013-04-07 159 views

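Scrapy spider: calling a function after the parser finishes

How do I call writeXML after my parser has finished scraping the data? At the moment I can see the data being scraped, but no output file is created. I tried adding a print statement inside writeXML, and nothing is printed.

Here is my code: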

import xml.etree.ElementTree as ET

# Scrapy 0.x-era import paths, matching the BaseSpider/HtmlXPathSelector API used below
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

# Item is assumed to be the project's own Item subclass declaring the
# id, name, address, review and birthdate fields.


class FriendSpider(BaseSpider):
    # identifies the spider
    name = "friend"
    count = 0
    allowed_domains = ["example.com.us"]
    start_urls = [
        "http://example.com.us/biz/friendlist/"
    ]

    def start_requests(self):
        # one request per page of 40 entries
        for i in range(0, 1722, 40):
            yield self.make_requests_from_url("http://example.com.us/biz/friendlist/?start=%d" % i)

    def parse(self, response):
        response = response.replace(body=response.body.replace('<br />', '\n'))
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []

        for site in sites:
            item = Item()
            self.count += 1
            item['id'] = str(self.count)
            item['name'] = site.select('.//div/div/h4/text()').extract()
            item['address'] = site.select('h4/span/text()').extract()
            item['review'] = ''.join(site.select('.//div[@class="review"]/p/text()').extract())
            item['birthdate'] = site.select('.//div/div/h5/text()').extract()

            items.append(item)
        return items

    def writeXML(self, items):
        root = ET.Element("Test")
        for item in items:
            # use a separate name for the XML node so the scraped item is not shadowed
            node = ET.SubElement(root, 'item')
            node.set('id', item['id'])
            # extract() returns a list, so join it into a single string
            address = ET.SubElement(node, 'address')
            address.text = ''.join(item['address'])
            # parse() stores the user's name under 'name', not 'user'
            user = ET.SubElement(node, 'user')
            user.text = ''.join(item['name'])
            birthdate = ET.SubElement(node, 'birthdate')
            birthdate.text = ''.join(item['birthdate'])
            review = ET.SubElement(node, 'review')
            review.text = item['review']

        # wrap it in an ElementTree instance, and save as XML
        tree = ET.ElementTree(root)
        tree.write("out.xml", xml_declaration=True, encoding='utf-8', method="xml")

Answer

To use the built-in XML exporter, try the following command to write the output:

scrapy crawl friend -o items.xml -t xml 

If the output is not to your liking, you can use the XMLExporter class as a basis for creating your own exporter.
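
As an alternative to a custom exporter, the "run writeXML once scraping is finished" idea from the question can be sketched as a Scrapy item pipeline, whose close_spider method is called when the spider finishes. This is only a sketch under assumptions: the class name XmlWriterPipeline, the as_text helper, and the default out.xml path are made up for illustration, and the field names are taken from parse() in the question. It uses only the standard library, so it has not been checked against any particular Scrapy version.

```python
import xml.etree.ElementTree as ET


def as_text(value):
    """Join the lists returned by extract() into one string (illustrative helper)."""
    if isinstance(value, (list, tuple)):
        return ''.join(value)
    return value


class XmlWriterPipeline(object):
    """Hypothetical pipeline: builds the XML tree item by item, writes it at the end."""

    # Fields written as child elements; names assumed from parse() in the question
    fields = ('name', 'address', 'birthdate', 'review')

    def __init__(self, path='out.xml'):
        self.path = path

    def open_spider(self, spider):
        # Called once when the spider starts: create the root element
        self.root = ET.Element('Test')

    def process_item(self, item, spider):
        # Called for every item the spider returns
        node = ET.SubElement(self.root, 'item')
        node.set('id', as_text(item['id']))
        for field in self.fields:
            child = ET.SubElement(node, field)
            child.text = as_text(item[field])
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: write the whole tree to disk
        tree = ET.ElementTree(self.root)
        tree.write(self.path, xml_declaration=True, encoding='utf-8', method='xml')
```

To try this, the class would have to be listed in the project's ITEM_PIPELINES setting. The spider's parse method can then simply return its items, and no explicit call to writeXML is needed.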


I tried using XMLExporter with item_element = "id" -> it gives me but I want  user245398 2013-04-08 04:14:05