Scrapy error: Couldn't bind: 24: Too many open files

I'm running Scrapy on a list of domains, and a lot of pages are getting this error: Couldn't bind: 24: Too many open files.

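I didn't get this error on my Linux machine, but I'm getting it now on my Mac. I'm not sure whether it has to do with running on Sierra or whether I've overlooked some Scrapy setting. I checked ulimit and it returns unlimited, so I don't think that's the cause.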
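One thing worth ruling out: in bash, `ulimit` with no option reports the file-size limit (`-f`), not the number of open files; the per-process open-file limit is what `ulimit -n` shows. The sketch below assumes a Unix-like system with Python's standard `resource` module. It reads that limit for the current process and raises the soft limit toward a target without going above the hard limit. The helper name `raise_nofile_limit` and the default target of 4096 are illustrative assumptions, not something from the question.

```python
import resource


def raise_nofile_limit(target=4096):
    """Raise this process's soft open-file limit (RLIMIT_NOFILE) toward `target`.

    Never lowers the current soft limit and never goes above the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # RLIM_INFINITY as the hard limit means "no cap"; otherwise clamp to it.
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if cap > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
        except (ValueError, OSError):
            # Some systems may reject values above a kernel-specific maximum;
            # keep the existing limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit before:', soft, 'hard:', hard)
print('open-file limit after:', raise_nofile_limit())
```

This only affects the process it runs in, so it would need to run before the crawl starts making requests (for example at the top of a script that starts the crawler). Running `ulimit -n 4096` in the shell before `scrapy crawl jake` is the shell-side equivalent, where the hard limit allows it.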

In case it's something my spider is doing, here it is:

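import csv
from urllib.parse import urlparse  # Python 3

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# allowedDomains, startUrls, fieldnames and getHotelUrlsForDomain are
# defined elsewhere in the project (not shown in the question).


class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains
    start_urls = startUrls
    rules = (
        # Follow every link and pass each downloaded page to parse_page.
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.', '')
        # print(domain, 'is domain and page is', page)
        linksToGet = getHotelUrlsForDomain(domain)
        # if len(linksToGet) == 0:
        #     print('\n ... links to get was zero \n')
        # print('linksToGet = ', linksToGet)
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('\n\n\n found one! ', link, 'is on', domain, ' and the page is', page, '\n\n\n')
                # Reopens (and closes) the CSV file for every matching link.
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL': link, 'targetDomain': domain})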

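Edit: here is one of the full error lines. It doesn't make the scrape crash, but there are a lot of lines like this, so I think I'm missing many of the pages I should be getting. Error line:

2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.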

Thanks in advance for any tips.


2017-09-24 Jake 1986

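You're making us guess _where_ the error is happening. Edit your question to include the full error traceback, including the line of code that causes the error.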

Also, it's better to open the csv file once at the top of the function instead of closing and reopening it for every link.
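A minimal sketch of that suggestion, written as a standalone function so it runs without Scrapy: matches are collected first, and the file is opened once per call (and only if something matched). The function name `write_matches`, its parameters and the `FIELDNAMES` constant are hypothetical; the column names `hotelURL` and `targetDomain` come from the spider above, and it assumes `getHotelUrlsForDomain` returns an iterable of URL strings.

```python
import csv

# Hypothetical stand-in for the question's `fieldnames` global.
FIELDNAMES = ['hotelURL', 'targetDomain']


def write_matches(links, links_to_get, domain, path='hotelBacklinks.csv'):
    """Append one CSV row per link in `links` that also appears in `links_to_get`.

    The file is opened at most once per call instead of once per link.
    Returns the number of rows written.
    """
    wanted = set(links_to_get)  # set lookup instead of scanning a list each time
    matches = [link for link in links if link in wanted]
    if not matches:
        return 0  # don't touch the file when nothing matched
    with open(path, 'a', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        for link in matches:
            writer.writerow({'hotelURL': link, 'targetDomain': domain})
    return len(matches)
```

Inside `parse_page`, the loop over `links` would then reduce to a single call such as `write_matches(links, linksToGet, domain)`.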

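@JohnGordon, thanks, I've added one. It's an error logged by Scrapy and it isn't fatal, so I don't get a traceback pointing at the specific line of my code that causes it. Also, thanks for the csv tip, I've fixed that.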

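You should use a pipeline to save all the scraped data.
You're getting this error because parse_page is called many times, and each call tries to open and write to the same file. Writing to a file is a blocking operation. Here is the Scrapy documentation: https://doc.scrapy.org/en/latest/topics/item-pipeline.html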
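A minimal sketch of such a pipeline, using the `open_spider` / `process_item` / `close_spider` hooks described on that documentation page: the CSV file is opened once when the spider starts and closed when it finishes, so the spider no longer touches the file itself. The class name `HotelBacklinksCsvPipeline`, the `myproject.pipelines` module path and the priority 300 are assumptions for illustration.

```python
import csv


class HotelBacklinksCsvPipeline:
    """Write every scraped item to one CSV file that stays open for the whole crawl.

    Hypothetical example; it could be enabled in settings.py with something like
    ITEM_PIPELINES = {'myproject.pipelines.HotelBacklinksCsvPipeline': 300}
    """

    fieldnames = ['hotelURL', 'targetDomain']

    def __init__(self, path='hotelBacklinks.csv'):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider opens: one file handle for the crawl.
        self.file = open(self.path, 'a', newline='')
        self.writer = csv.DictWriter(self.file, fieldnames=self.fieldnames)

    def close_spider(self, spider):
        # Called once when the spider closes: release the handle.
        self.file.close()

    def process_item(self, item, spider):
        # dict() works for plain dict items and for scrapy.Item instances.
        self.writer.writerow(dict(item))
        return item  # pass the item on to any later pipelines
```

With this in place, the `with open(...)` block in `parse_page` would be replaced by `yield {'hotelURL': link, 'targetDomain': domain}`, and Scrapy would hand each yielded dict to `process_item`.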


2017-09-24 18:16:30 AndMar
