Scrapy Running Results

I'm just getting started with Scrapy and am hoping for a nudge in the right direction.

I want to scrape data from here:

https://www.sportstats.ca/display-results.xhtml?raceid=29360

This is what I have so far:

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'sportstats'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def parse(self, response):
        # The results are in the first table on the page.
        table = response.xpath('//table')[0]
        headings = table.xpath('thead/tr/th/span/span/text()').extract()
        rows = table.xpath('tbody/tr[contains(@class, "ui-widget-content ui-datatable")]')
        results = []
        for row in rows:
            result = []
            # Pair each cell with its column heading.
            for heading, td in zip(headings, row.xpath('td')):
                column = heading.lower()
                if column in ('comp.', 'view'):
                    # These columns are skipped.
                    content = None
                elif column == 'name':
                    # The name is wrapped in a link inside the span.
                    content = td.xpath('span/a/text()').extract_first()
                else:
                    # extract_first() returns None when the cell has no text.
                    content = td.xpath('span/text()').extract_first()
                result.append(content)
            results.append(result)
        for result in results:
            print(result)

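Now I need to move on to the next page, which I can do in the browser by clicking the "right arrow" at the bottom. I believe it is the following li: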

<li><a id="mainForm:j_idt369" href="#" class="ui-commandlink ui-widget fa fa-angle-right" onclick="PrimeFaces.ab({s:&quot;mainForm:j_idt369&quot;,p:&quot;mainForm&quot;,u:&quot;mainForm:result_table mainForm:pageNav mainForm:eventAthleteDetailsDialog&quot;,onco:function(xhr,status,args){hideDetails('athlete-popup');showDetails('event-popup');scrollToTopOfElement('mainForm\\:result_table');;}});return false;"></a>

How do I get Scrapy to follow that?

2016-05-12 user3449833

Added my current progress to the main post. – user3449833

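This is a JavaScript rendering problem. If you use Firefox, I'd recommend Firebug for inspecting the requests involved; failing that, use a JavaScript rendering service such as [Splash](https://github.com/scrapinghub/splash), or Selenium. – eLRuLL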

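If you open the URL in a browser with JavaScript disabled, you won't be able to move to the next page. As you can see in the li tag, some JavaScript is executed to fetch the next page.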

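To get around this, the first option is usually to try to identify the request that the JavaScript generates. In your case it should be easy: just analyze the JavaScript code and replicate it in Python in your spider. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually a package that provides JavaScript or browser emulation, such as ScrapyJS or Scrapy + Selenium.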
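As a rough sketch of the "replicate it in Python" step: the onclick attribute in the question already names the source (s), process (p) and update (u) components, so the form fields for the POST can be assembled from it. The javax.faces.* names below are the standard JSF partial-request parameters, but the exact set this site expects is an assumption and should be checked against the real request in the browser's network tab. parse_primefaces_onclick and build_ajax_formdata are hypothetical helper names, and the ViewState value has to be read from the hidden javax.faces.ViewState input of the page you just fetched.

```python
import re


def parse_primefaces_onclick(onclick):
    """Extract the s, p and u values from a PrimeFaces.ab({...}) call.

    Expects the attribute text with entities already decoded, i.e. with
    real double quotes rather than &quot;.
    """
    fields = {}
    for key in ("s", "p", "u"):
        match = re.search(r'[{,]%s:"([^"]*)"' % key, onclick)
        if match:
            fields[key] = match.group(1)
    return fields


def build_ajax_formdata(onclick, view_state, form_id="mainForm"):
    """Build the POST fields for the partial request (assumed field set).

    Raises KeyError if the onclick text lacks one of s, p or u.
    """
    fields = parse_primefaces_onclick(onclick)
    source = fields["s"]
    return {
        "javax.faces.partial.ajax": "true",
        "javax.faces.source": source,
        "javax.faces.partial.execute": fields["p"],
        "javax.faces.partial.render": fields["u"],
        # Assumption: the clicked link and the form each send id=id.
        source: source,
        form_id: form_id,
        "javax.faces.ViewState": view_state,
    }
```

In a spider, a dict like this could be passed as formdata to scrapy.FormRequest.from_response(response, formid='mainForm', formdata=..., callback=self.parse), which also carries over the form's other hidden inputs. The reply to such a request is normally a partial XML update rather than a full page, so the table markup would probably have to be pulled out of it before the same XPath expressions apply.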

2016-05-13 00:21:41 Djunzu

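You will need to use a callback. Generate the URL from the XPath of the "next page" button: url = response.xpath(xpath to next_page_button). Then, when you are done with the current page, do yield scrapy.Request(url, callback=self.parse_next_page). Finally, create a new method, def parse_next_page(self, response):.
One last note: if the content happens to be generated by JavaScript (so you can't scrape it even though you're sure your XPath is right), take a look at my repo, which uses Scrapy with Splash: https://github.com/Liamhanninen/Scrape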
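Before yielding that request, it may help to check whether the button's href is a real URL at all. In the markup from the question it is "#", so there is nothing to follow directly. A minimal sketch using only the standard library; resolve_next_page is a hypothetical helper name, and the href with a page parameter mentioned afterwards is made up for illustration:

```python
from urllib.parse import urljoin


def resolve_next_page(page_url, href):
    """Return an absolute URL for a 'next page' link.

    Returns None when the link is missing or only triggers JavaScript
    (empty, '#...' or 'javascript:...').
    """
    if href is None:
        return None
    href = href.strip()
    if not href or href.startswith("#") or href.lower().startswith("javascript:"):
        return None
    # Relative links are resolved against the page they came from.
    return urljoin(page_url, href)
```

With the page from the question, resolve_next_page(response.url, "#") gives None, which is the cue to fall back to replicating the AJAX request or to a rendering service. For an ordinary link such as "display-results.xhtml?raceid=29360&page=2" it would return the absolute URL, which can go straight into scrapy.Request(url, callback=self.parse_next_page).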

2016-05-15 23:05:57
