Uploading data from a Node.js stream to an ElasticSearch database

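My current Node.js code creates a stream from a very large USPTO patent XML file (about 100 MB) and builds a patentGrant object while parsing the XML stream. The patentGrant object includes the publication number, publication country, publication date, and kind of patent. I am trying to use ElasticSearch to create a database containing all of the patentGrant objects. I have successfully added code to connect to a local ElasticSearch database, but I am having trouble understanding the ElasticSearch-js API, and I don't know how I should upload the patentGrant objects to the database. From the following tutorial and a previous Stack Overflow question I asked here, it seems like I should use the bulk API.
Here's my ParseXml.js code: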

var CreateParsableXml = require('./CreateParsableXml.js'); 
var XmlParserStream = require('xml-stream'); 
// var Upload2ES = require('./Upload2ES.js'); 
var parseXml; 


var es = require('elasticsearch'); 
var client = new es.Client({ 
    host: 'localhost:9200' 
}); 


// create xml parser using xml-stream node.js module 
parseXml = new XmlParserStream(CreateParsableXml.concatXmlStream('ipg140107.xml')); 

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) { 
    var patentGrant; 
    patentGrant = { 
     pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'], 
     pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'], 
     kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'], 
     pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date'] 
    }; 
    console.log(patentGrant); 
}); 

parseXml.on('end', function() { 
    console.log('all done'); 
});


2015-07-12 Daniel Kobe

The bulk API, as it says in the docs you linked, is for "index" and "delete" operations.

Use create instead: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-create

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    // all four fields live under the same document-id element
    var docId = patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id'];
    var patentGrant = {
        pubNo: docId['doc-number'],
        pubCountry: docId['country'],
        kind: docId['kind'],
        pubDate: docId['date']
    };
    // no id is passed; Elasticsearch is expected to generate one
    client.create({
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    }, function(error) {
        // report failures instead of silently dropping them
        if (error) {
            console.error(error);
        }
    });
    console.log(patentGrant);
});

Without an id, it should generate one, as per https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-index_.html#_automatic_id_generation


2015-07-12 22:17:53 jperelli

This is great, thanks. Follow-up question: when I go to localhost:9200/mytype/myindex/ it gives me the following error message: {"error":"ElasticsearchIllegalArgumentException[No feature for name [patentGrants]]","status":400}

Have the index and mapping been created? https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-create-index.html#mappings – jperelli
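For reference, creating the index with an explicit mapping might look roughly like the sketch below. It assumes the Elasticsearch 1.x mapping syntax from the linked page and reuses the 'myindex' and 'mytype' names from the answer. The helper name buildCreateIndexRequest and every field type are assumptions for illustration; in particular, the basic_date format assumes pubDate arrives as a yyyyMMdd string, which the question does not confirm.

```javascript
// Hypothetical helper: builds the request object for client.indices.create().
// Field types are assumptions, not something stated in the question.
function buildCreateIndexRequest(indexName, typeName) {
  var mappings = {};
  mappings[typeName] = {
    properties: {
      // identifiers are kept as exact, unanalyzed strings (1.x syntax)
      pubNo: { type: 'string', index: 'not_analyzed' },
      pubCountry: { type: 'string', index: 'not_analyzed' },
      kind: { type: 'string', index: 'not_analyzed' },
      // assumes dates look like 20140107
      pubDate: { type: 'date', format: 'basic_date' }
    }
  };
  return { index: indexName, body: { mappings: mappings } };
}

// Possible usage with the client from the question (needs a running cluster):
// client.indices.create(buildCreateIndexRequest('myindex', 'mytype'), function (error) {
//   if (error) { console.error(error); }
// });
```

Building the request separately from sending it keeps the mapping easy to inspect before anything is written to the cluster.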

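No, I did not create a mapping; isn't there a default mapping that will take care of this for me? Also, I have been doing more research, and from this video https://www.youtube.com/watch?v=7FLXjgB0PQI I heard that you can save a lot of network overhead by using the bulk API. Would it be better for me to use create, since otherwise I would have to store all the data in one JavaScript object and then run it through the bulk process, which would have a high memory cost?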
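On the memory concern: the bulk API does not require holding the whole file in memory, because documents can be sent in fixed-size batches as the parser emits them. The sketch below is one way to do that, not the only one. The helper name createBulkBatcher, the batch size, and the send callback are all hypothetical; the bulk body is an array alternating an action line with the document it applies to.

```javascript
// Hypothetical helper: collects documents and hands them to `send` in
// fixed-size bulk bodies, so at most `batchSize` documents are buffered.
function createBulkBatcher(indexName, typeName, batchSize, send) {
  var body = [];  // alternating action line / document
  var count = 0;  // number of documents currently buffered

  function flush() {
    if (count === 0) {
      return;  // nothing buffered, nothing to send
    }
    var toSend = body;
    body = [];
    count = 0;
    send(toSend);
  }

  return {
    add: function (doc) {
      body.push({ index: { _index: indexName, _type: typeName } });
      body.push(doc);
      count += 1;
      if (count >= batchSize) {
        flush();
      }
    },
    flush: flush
  };
}

// Possible wiring with the code from the question (needs a running cluster):
// var batcher = createBulkBatcher('myindex', 'mytype', 500, function (body) {
//   client.bulk({ body: body }, function (error) {
//     if (error) { console.error(error); }
//   });
// });
// Call batcher.add(patentGrant) in the 'endElement: us-patent-grant' handler
// and batcher.flush() in the 'end' handler to send the final partial batch.
```

This sketch applies no backpressure: if the parser produces batches faster than the cluster accepts them, requests will still queue up, so pausing the parser while a batch is in flight would be a reasonable next step.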
