Scrapy: no errors, but the spider closes after crawling

scrapy

2016-06-07 106 views 0 likes

for restaurant in response.xpath('//div[@class="listing"]'):
    restaurantItem = RestaurantItem()
    restaurantItem['name'] = response.css(".title::text").extract()
    yield restaurantItem

# Follow the pagination link, if there is one.
next_page = response.css(".next > a::attr('href')")
if next_page:
    # The method is response.urljoin (all lowercase); urlJoin does not exist.
    url = response.urljoin(next_page[0].extract())
    yield scrapy.Request(url, self.parse)

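After fixing all the errors this was giving me, I no longer get any errors at all. The spider simply closes after crawling the start_url, and the for loop is never executed.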

2016-06-07 panther1

Maybe it is because it did not find 'next_page' in the DOM?

It never reaches that point. I tried adding a print statement, and it never enters the for loop... and it gives me no errors. – panther1

OK, so it never finds the 'div' you are trying to reach. Maybe give a link or a sample of the HTML.

Answer

When you try to find an element like this:

response.xpath('//div[@class="listing"]')

you are saying that you want a div whose class attribute is literally just "listing":

<div class="listing"></div>

But that does not exist anywhere in the DOM. What is there looks like this:

<div class="listing someOtherClass"></div>

To select the element above, you have to say that the attribute contains a certain piece of text but may contain more, like this:

response.xpath('//div[contains(@class,"listing")]')
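The difference between the two predicates can be modelled in plain Python without running a spider. This is only a sketch of the matching semantics: the helper names (`exact_match`, `substring_match`, `token_match`) and the sample class strings are made up for illustration and are not part of Scrapy. It also shows one caveat of `contains()`: it is a substring test, so it can match classes that merely include the text "listing".

```python
# Plain-Python model of three ways to test a class attribute.
# These helpers are illustrative only; they are not Scrapy or parsel APIs.

def exact_match(class_attr, name):
    # Models XPath [@class="listing"]: the whole attribute must equal the name.
    return class_attr == name

def substring_match(class_attr, name):
    # Models XPath [contains(@class, "listing")]: any substring hit counts.
    return name in class_attr

def token_match(class_attr, name):
    # Models a CSS class selector (.listing): the name must be one
    # whitespace-separated token of the attribute.
    return name in class_attr.split()

# Hypothetical class attribute values.
samples = ["listing", "listing someOtherClass", "unlisting-banner"]

results = {
    s: (
        exact_match(s, "listing"),
        substring_match(s, "listing"),
        token_match(s, "listing"),
    )
    for s in samples
}

for class_attr, (exact, substring, token) in results.items():
    print(f"{class_attr!r}: exact={exact} contains={substring} token={token}")
```

For "listing someOtherClass" the exact test fails while the other two succeed, which is why the original loop never ran. For "unlisting-banner" only the substring test succeeds, so `contains()` may select more than intended on some pages.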

2016-06-07 12:03:47

Another way is to use a CSS selector for this kind of class test, i.e. response.css('div.listing')
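A CSS class selector matches whole class names rather than substrings. If you prefer to stay with XPath, the same whole-token test is commonly written with `concat` and `normalize-space`. The sketch below builds such an expression as a string; `class_xpath` is a hypothetical helper name, not something Scrapy provides.

```python
def class_xpath(tag, class_name):
    """Build an XPath matching `tag` elements that have `class_name`
    as one whole, whitespace-separated class token."""
    # Padding both sides with spaces turns the token test into a
    # substring test that cannot match partial class names.
    return (
        f'//{tag}[contains(concat(" ", normalize-space(@class), " "), '
        f'" {class_name} ")]'
    )

listing_xpath = class_xpath("div", "listing")
print(listing_xpath)
```

Inside a spider callback, the resulting string would be passed to response.xpath() in place of the exact-match expression.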
