2016-03-03

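Node.JS - How to limit the number of concurrent promise requests to prevent running out of memory (web scraping)

When I run my script, which has a large number of requests to complete, I get FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory.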

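I assume I need to reduce the number of simultaneous requests? For example: send 5 requests (maybe with a delay), wait until they are done, buffer the output to the browser, then send the next batch of 5, and so on.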
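The "5 at a time, optionally with a pause" idea can be sketched without any library. This is a minimal sketch under stated assumptions: `fetchOne` stands in for a per-ID request function such as steamappRequestConfig, and `processInBatches`, `chunk` and `delay` are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
function delay(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Split `items` into consecutive chunks of at most `size` elements.
function chunk(items, size) {
  var chunks = [];
  for (var i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run `fetchOne` over `ids` one batch at a time: at most `batchSize` requests are
// in flight, and `pauseMs` milliseconds pass between batches. Each finished batch
// is handed to `onBatch` (e.g. to write it out) instead of being kept in memory.
function processInBatches(ids, batchSize, pauseMs, fetchOne, onBatch) {
  var batches = chunk(ids, batchSize);
  return batches.reduce(function (previous, batch, index) {
    return previous
      .then(function () {
        return Promise.all(batch.map(fetchOne));
      })
      .then(function (results) {
        onBatch(results, index);
        // Pause before the next batch, but not after the last one.
        return index < batches.length - 1 ? delay(pauseMs) : undefined;
      });
  }, Promise.resolve());
}
```

With fixed batches, one slow page holds up its whole batch, and a single failed request rejects the whole chain; a sliding window (start a new request whenever one finishes) avoids the first problem.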

Any help appreciated!

Here's the source:

var http = require('http');
var cheerio = require('cheerio');
var rp = require('request-promise');
var _ = require('underscore');

http.createServer(function(req, response) {
  // ** Something like this (about 1,000 IDs) causes the overflow **
  // var appIds = _.range(450131, 451131);
  var appIds = [253250, 445170, 327510, 346110, 421900, 385070];

  // Fetch one app's Avatar page and parse it with cheerio
  function steamappRequestConfig(appId) {
    var options = {
      uri: 'http://steamcommunity.com/games/' + appId + '/Avatar',
      transform: function(body) {
        return cheerio.load(body);
      }
    };
    return rp(options).then(function($) {
      return {
        appId: appId,
        appDom: $
      };
    });
  }

  // Every request starts at once here, and every parsed page is kept until all finish
  var appInfoRequests = appIds.map(steamappRequestConfig);
  var listPromise = Promise.all(appInfoRequests);
  listPromise.then(function(appResults) {
      // A sort comparator takes two arguments; sort numerically by app ID
      appResults.sort(function(a, b) {
        return a.appId - b.appId;
      });

      var items = [];
      appResults.forEach(function(rpResult) {
        var $ = rpResult.appDom;
        var appId = rpResult.appId;

        // Check if page contains Avatars; otherwise there is nothing important here
        if ($('h2').text() === 'Avatars') {
          items.push('<li><a href="http://steamcommunity.com/games/' + appId + '/Avatar">' + appId + '</a></li>');
        }
      });

      // Output: build the page as a string and finish the response
      response.writeHead(200, { 'Content-Type': 'text/html' });
      response.end('<html><body><ul>\n' + items.join('\n') + '\n</ul></body></html>');
    })
    .catch(function(err) {
      // Crawling failed or Cheerio choked; still end the response so the browser isn't left waiting
      response.writeHead(500, { 'Content-Type': 'text/plain' });
      response.end('Crawling failed: ' + err.message);
    });
}).listen(80);

Update #1: I fiddled with it for a while and started getting this error in the browser:

ERR_EMPTY_RESPONSE

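So instead of using the HTTP server, I've decided for now to use the file system (fs) to write and append to an HTML file. It starts out fine, but after a while these two errors show up in the terminal (on a separate PC, same script):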

<--- Last few GCs ---> 

1847363 ms: Scavenge 704.7 (738.6) -> 704.7 (738.6) MB, 25.6/0 ms (+ 16.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep]. 
1848957 ms: Mark-sweep 704.7 (738.6) -> 703.2 (738.6) MB, 1597.4/0 ms (+ 155.0 ms in 250 steps since start of marking, biggest step 16.0 ms) [last resort gc]. 
1850473 ms: Mark-sweep 703.2 (738.6) -> 703.2 (738.6) MB, 1526.1/0 ms [last resort gc]. 


<--- JS stacktrace ---> 

==== JS stack trace ========================================= 

Security context: 39E85C51 <JS Object> 
    1: ontext [C:\njs\node_modules\domhandler\index.js:~102] [pc=49426425] (this=1534D645 <a DomHandler with map 170773ED>,data=1534D631 <String[4]: Home>) 
    2: _parse [C:\njs\node_modules\htmlparser2\lib\Tokenizer.js:~635][pc=49412CAA] (this=1534D6A9 <a Tokenizer with map 17077839>) 
    3: parseDOM [C:\njs\node_modules\htmlparser2\lib\index.js:~39] [pc=49443D56] (thi... 

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory 

And this one:

<--- Last few GCs ---> 

2195352 ms: Scavenge 1397.9 (1457.9) -> 1397.9 (1457.9) MB, 0.4/0 ms (+ 4.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep]. 
2196159 ms: Mark-sweep 1397.9 (1457.9) -> 1391.7 (1457.9) MB, 807.0/0 ms (+ 6.0 ms in 3 steps since start of marking, biggest step 4.0 ms) [last resort gc]. 
2197192 ms: Mark-sweep 1391.7 (1457.9) -> 1391.7 (1457.9) MB, 1033.0/0 ms [last resort gc]. 


<--- JS stacktrace ---> 

==== JS stack trace ========================================= 

Security context: 000000E4D0FE3AD1 <JS Object> 
    2: _parse [F:\nodejs\njs\node_modules\htmlparser2\lib\Tokenizer.js:~635] [pc=000000821C37FA11] (this=0000011BB742F669 <a Tokenizer with map 00000071B3686C09>) 
    3: write [F:\nodejs\njs\node_modules\htmlparser2\lib\Tokenizer.js:632] [pc=000000821C37BBE6] (this=0000011BB742F669 <a Tokenizer with map 00000071B3686C09>,chunk=000001EEC32D0809 <Very long string[18191]>) 
    4:... 

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory 

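If anyone out there can find a remedy for this, I'd really appreciate it. These tests were run with Bluebird's concurrency option, as suggested in clay's answer.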
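One plausible reason a concurrency limit alone doesn't stop the crash (an assumption, based on both stack traces ending inside htmlparser2/domhandler): Promise.map still collects every resolved value until the very end, and in the question's code each value holds a full cheerio DOM (appDom). A minimal sketch of the alternative is to reduce each page to the one string you need and append matches to the file as you go. `writeMatchesToFile` and `fetchTitle` are hypothetical names; `fetchTitle(id)` is assumed to resolve to just the page's `<h2>` text (for example `cheerio.load(body)('h2').text()` inside the request's transform).

```javascript
const fs = require('fs');

// Hypothetical helper: append one <li> per app whose page title is "Avatars"
// to `outPath` as soon as that page is processed, so only small strings,
// not parsed DOMs, stay in memory. Pages are fetched one at a time here.
async function writeMatchesToFile(outPath, ids, fetchTitle) {
  const out = fs.createWriteStream(outPath);
  const finished = new Promise((resolve, reject) => {
    out.on('finish', resolve);
    out.on('error', reject);
  });
  finished.catch(() => {}); // the caller still sees a rejection via `return finished`

  out.write('<html><body><ul>\n');
  for (const id of ids) {
    let title;
    try {
      title = await fetchTitle(id); // just the <h2> text, not the whole DOM
    } catch (err) {
      continue; // skip IDs whose page failed to load
    }
    if (title === 'Avatars') {
      const url = 'http://steamcommunity.com/games/' + id + '/Avatar';
      out.write('<li><a href="' + url + '">' + id + '</a></li>\n');
    }
  }
  out.end('</ul></body></html>\n');
  return finished; // resolves once everything has been flushed to the file
}
```

The loop is sequential for clarity; the same per-item body could run inside a concurrency-limited map, as long as the mapper returns something small instead of the parsed page.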

Answer

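If the error comes from preparing too much data for the requests (or from requests ending too early, or something similar), you'll have to investigate elsewhere.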

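If the error happens during processing, i.e. your Node process itself is struggling, then you need to run things in batches, or really, only allow so many promises at the same time. I'd suggest a promise library such as bluebird or q to limit the number of concurrent promises.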
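If you'd rather not pull in a library, the same "only so many promises at once" idea can be sketched as a small worker pool. This is a minimal sketch assuming Node.js 7.6 or later (for async/await); `mapWithConcurrency` is a hypothetical name, not Bluebird's API. Unlike fixed batches, a new call starts as soon as any running one finishes.

```javascript
// Hypothetical helper: map `items` through the async `mapper`, keeping at most
// `limit` calls pending at once. Results keep input order, like Promise.all.
function mapWithConcurrency(items, limit, mapper) {
  const results = new Array(items.length);
  let next = 0; // index of the next item no worker has claimed yet

  // Each worker keeps claiming the next unclaimed index until none are left.
  // Claiming (next++) happens synchronously, so two workers never share an item.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await mapper(items[index], index);
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length);
  const workers = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  // Rejects on the first failure; workers that are still running are not cancelled.
  return Promise.all(workers).then(() => results);
}

// Usage with the question's function (not run here):
// mapWithConcurrency(appIds, 5, steamappRequestConfig).then(function (appResults) { ... });
```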

In Bluebird, the .map and .filter methods take a concurrency option that limits exactly what you're describing. Check out the API docs, but it's basically:

var Promise = require('bluebird');

var appInfoRequests = Promise.map(appIds, steamappRequestConfig, {
    concurrency: 5 // or whatever limit suits you
});
appInfoRequests.then(.....

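Alternatively, if you don't need any concurrency at all, you can use .each or .mapSeries (or similar) to make sure the items are processed one after another rather than all at once.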
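For the one-at-a-time case, a plain async loop gives the same ordering guarantee without a library. A minimal sketch assuming Node.js 7.6 or later; `mapInSeries` is a hypothetical name, deliberately different from Bluebird's mapSeries.

```javascript
// Hypothetical helper: call the async `mapper` on each item strictly in order;
// the next call only starts after the previous promise has settled.
async function mapInSeries(items, mapper) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await mapper(items[i], i)); // waits before moving on
  }
  return results;
}

// Usage with the question's function (not run here):
// mapInSeries(appIds, steamappRequestConfig).then(function (appResults) { ... });
```

Sequential processing is the slowest option, but it keeps at most one request and one parsed page in flight at any moment.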

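How can I prevent ERR_EMPTY_RESPONSE in the browser? I need to be able to buffer the output to the browser. I don't think the requests are failing; I think the browser just gave up this time. – Kai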

Can you try calling response.end() after writing? – clay

Still doesn't work; after a while it gives me the errors above. – Kai
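Building on clay's suggestion to call response.end(), a minimal sketch of streaming the list to the browser piece by piece, and always ending the response, might look like this. `streamAvatarList` and `fetchTitle` are hypothetical names; `fetchTitle(id)` is assumed to return a promise for the page's `<h2>` text. This addresses the empty response, not the out-of-memory crash, and browsers may still buffer small chunks before rendering them.

```javascript
// Hypothetical helper: stream the list to `out` (e.g. an http.ServerResponse)
// one <li> at a time, and always end the response, even if something throws.
async function streamAvatarList(out, ids, fetchTitle) {
  out.write('<html><body><ul>\n');
  try {
    for (const id of ids) {
      // Treat a failed page as "no avatars" instead of aborting the whole list.
      const title = await fetchTitle(id).catch(() => null);
      if (title === 'Avatars') {
        const url = 'http://steamcommunity.com/games/' + id + '/Avatar';
        out.write('<li><a href="' + url + '">' + id + '</a></li>\n');
      }
    }
  } finally {
    out.end('</ul></body></html>\n'); // without end() the browser keeps waiting
  }
}

// Usage inside the question's server (not run here):
// http.createServer(function (req, response) {
//   response.writeHead(200, { 'Content-Type': 'text/html' });
//   streamAvatarList(response, appIds, fetchTitle).catch(console.error);
// }).listen(80);
```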