2016-08-16

NodeJS stream exceeds heap

I'm trying to massage some data out of a ~400 MB CSV file and save it into a database I can query locally. The data is the freely available IP2Location LITE database, and the database I'm importing it into is the embedded nedb.

require('dotenv').load()

const fs = require('fs')
const csv = require('csv-parse')
const es = require('event-stream')
const Datastore = require('nedb')
const BatchStream = require('batch-stream')

const db = new Datastore({ filename: process.env.DB_PATH, autoload: true })
const debug = require('debug')('setup')

function massage ([ipLo, ipHi, cc, country, area, city, lat, lng]) {
  return { ipLo, ipHi, cc, country, area, city, lat, lng }
}

function setup () {
  let qty = 0

  return new Promise((resolve, reject) => {
    fs.createReadStream(process.env.IP2LOCATION_PATH)
      // read and parse csv
      .pipe(csv())
      // batch it up
      .pipe(new BatchStream({ size: 100 }))
      // write it into the database
      .pipe(es.map((batch, cb) => {
        // massage and persist it
        db.insert(batch.map(massage), _ => {
          qty += batch.length
          if (qty % 100 === 0) debug(`Inserted ${qty} documents…`)
          cb.apply(this, arguments)
        })
      }))
      .on('end', resolve)
      .on('error', reject)
  })
}

module.exports = setup

if (!module.parent) {
  debug('Setting up geo database…')
  setup()
    .then(_ => debug('done!'))
    .catch(err => debug('there was an error :/', err))
}

After about 75,000 entries I get the following error:

<--- Last few GCs ---> 

    80091 ms: Mark-sweep 1372.0 (1435.0) -> 1371.7 (1435.0) MB, 1174.6/0 ms (+ 1.4 ms in 1 steps since start of marking, biggest step 1.4 ms) [allocation failure] [GC in old space requested]. 
    81108 ms: Mark-sweep 1371.7 (1435.0) -> 1371.6 (1435.0) MB, 1017.2/0 ms [last resort gc]. 
    82158 ms: Mark-sweep 1371.6 (1435.0) -> 1371.6 (1435.0) MB, 1049.9/0 ms [last resort gc]. 


<--- JS stacktrace ---> 

==== JS stack trace ========================================= 

Security context: 0x4e36fec9e31 <JS Object> 
    1: substr [native string.js:~320] [pc=0xdab4e7f1185] (this=0x35500e175a29 <Very long string[65537]>,Q=50,am=65487) 
    2: __write [/Users/arnold/Develop/mount-meru/node_modules/csv-parse/lib/index.js:304] [pc=0xdab4e7b8f98] (this=0x350ff4f97991 <JS Object>,chars=0x35500e175a29 <Very long string[65537]>,end=0x4e36fe04299 <false>,callback=0x4e36fe04189 <undefined>) 
    3: arguments adaptor fra... 

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory 
1: node::Abort() [/usr/local/Cellar/node/6.3.1/bin/node] 
2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/usr/local/Cellar/node/6.3.1/bin/node] 
3: v8::Utils::ReportApiFailure(char const*, char const*) [/usr/local/Cellar/node/6.3.1/bin/node] 
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/Cellar/node/6.3.1/bin/node] 
5: v8::internal::Factory::NewByteArray(int, v8::internal::PretenureFlag) [/usr/local/Cellar/node/6.3.1/bin/node] 
6: v8::internal::TranslationBuffer::CreateByteArray(v8::internal::Factory*) [/usr/local/Cellar/node/6.3.1/bin/node] 
7: v8::internal::LCodeGenBase::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [/usr/local/Cellar/node/6.3.1/bin/node] 
8: v8::internal::LChunk::Codegen() [/usr/local/Cellar/node/6.3.1/bin/node] 
9: v8::internal::OptimizedCompileJob::GenerateCode() [/usr/local/Cellar/node/6.3.1/bin/node] 
10: v8::internal::Compiler::GetConcurrentlyOptimizedCode(v8::internal::OptimizedCompileJob*) [/usr/local/Cellar/node/6.3.1/bin/node] 
11: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [/usr/local/Cellar/node/6.3.1/bin/node] 
12: v8::internal::StackGuard::HandleInterrupts() [/usr/local/Cellar/node/6.3.1/bin/node] 
13: v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/Cellar/node/6.3.1/bin/node] 
14: 0xdab4e60961b 
15: 0xdab4e7f1185 
16: 0xdab4e7b8f98 
[1] 18102 abort  npm run setup 

What exactly is going on here? Isn't the whole point of the Stream API that you never have to hold a large amount of data in memory at once, but can process it piece by piece instead? And it looks like the error comes straight out of the csv-parse library, right?

Could you convert your promise to a callback and see if the problem goes away? I ask because Node has had this issue — a never-resolving promise chain creates a memory leak: https://github.com/promises-aplus/promises-spec/issues/179 — and I don't know whether it has been fixed in the latest version. – David

It may or may not be csv-parse causing it; I think the csv library just happens to be the one trying to allocate memory at the moment the heap runs out. – David

Answer