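Recursively crawling the same page using JavaScript with Scrapy and Splash

I am crawling a website that uses JavaScript to go to the next page. I am using Splash to run my JavaScript code on the first page, and that gets me to the second page. However, I cannot reach pages 3, 4, 5 and so on: the crawl stops after one page.
The link I am crawling: http://59.180.234.21:8788/user/viewallrecord.aspx
Code: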
import scrapy
from scrapy_splash import SplashRequest
from time import sleep


class MSEDCLSpider(scrapy.Spider):
    name = "msedcl_spider"
    # CSS selectors for the table rows and the cells of interest
    scope_path = 'body > table:nth-child(11) tr > td.content_area > table:nth-child(4) tr:not(:first-child)'
    ref_no_path = "td:nth-child(1) ::text"
    title_path = "td:nth-child(2) ::text"
    end_date_path = "td:nth-child(5) ::text"
    fee_path = "td:nth-child(6) ::text"
    start_urls = ["http://59.180.234.21:8788/user/viewallrecord.aspx"]
    # Lua script run by Splash: load the page, click "next" once, return the HTML
    lua_src = """function main(splash)
        local url = splash.args.url
        splash:go(url)
        splash:wait(2.0)
        splash:runjs("document.querySelectorAll('#lnkNext')[0].click()")
        splash:wait(4.0)
        return {
            splash:html(),
        }
    end
    """

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                endpoint='execute',
                method='POST',
                dont_filter=True,
                args={
                    'wait': 1.0,
                    'lua_source': self.lua_src,
                },
            )

    def parse(self, response):
        print(response.status)
        scopes = response.css('#page-info').extract()[0]
        print(response.url)
        print(scopes)
I am new to both Scrapy and Splash. Please be gentle. Thank you.
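There is no indentation problem in the actual code. It got changed when I pasted it.
I think you are mixing spaces and tabs (at least in the pasted code). Try formatting the code in the question with spaces only (4 spaces per tab).
The problem is not the indentation. Anyway, I have edited the post and fixed it.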
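One possible direction, offered only as a sketch: the Lua script in the question clicks `#lnkNext` exactly once, so Splash can only ever return the second page. Clicking inside a loop and collecting the HTML after each click might get further. The snippet below builds such a script and the `args` dictionary for it. The names `PAGING_LUA`, `build_splash_args`, `split_pages` and the `max_pages` argument are made up for illustration, and it assumes the `#lnkNext` element is simply absent on the last page, which I have not verified against the site.

```python
# Hypothetical Lua script: click "next" repeatedly and keep every page's HTML.
# Pages are stored under explicit string keys ("page_1", "page_2", ...) so the
# result does not depend on how Splash serialises Lua arrays to JSON.
PAGING_LUA = """
function main(splash)
    assert(splash:go(splash.args.url))
    splash:wait(2.0)
    local pages = {}
    pages["page_1"] = splash:html()
    for i = 2, splash.args.max_pages do
        -- Click the link only if it exists; report back whether we clicked.
        local clicked = splash:evaljs([[
            (function () {
                var next = document.querySelector('#lnkNext');
                if (!next) { return false; }
                next.click();
                return true;
            })()
        ]])
        if not clicked then break end
        splash:wait(4.0)
        pages["page_" .. i] = splash:html()
    end
    return {pages = pages}
end
"""


def build_splash_args(max_pages):
    """Return the args dict for a SplashRequest using the 'execute' endpoint."""
    return {"lua_source": PAGING_LUA, "max_pages": max_pages}


def split_pages(data):
    """Return the HTML strings from the script result, ordered by page number."""
    pages = data.get("pages", {})
    ordered_keys = sorted(pages, key=lambda key: int(key.split("_")[1]))
    return [pages[key] for key in ordered_keys]
```

In the spider this would presumably be used as `SplashRequest(url, self.parse, endpoint='execute', args=build_splash_args(5))`, and in `parse` each string from `split_pages(response.data)` could be wrapped in `scrapy.Selector(text=html)` before applying the CSS paths from the question. The fixed `splash:wait(4.0)` is kept from the original script; if the site is slow, that wait may need to be longer.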