2015-07-12 109 views

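Uploading data from a Node.js stream to an ElasticSearch database

My current Node.js code creates a stream from a very large USPTO patent XML file (about 100 MB) and builds a patentGrant object as it parses the XML stream. Each patentGrant object holds the publication number, publication country, publication date, and patent kind. I am trying to use ElasticSearch to build a database containing all of the patentGrant objects. I have successfully added code that connects to my local ElasticSearch database, but I am having trouble understanding the ElasticSearch-js API, and I don't know how I should upload the patentGrant objects to the database. Judging from the following tutorial and a previous Stack Overflow question I asked here, it seems I should use the bulk API.

Here is my ParseXml.js code: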

var CreateParsableXml = require('./CreateParsableXml.js'); 
var XmlParserStream = require('xml-stream'); 
// var Upload2ES = require('./Upload2ES.js'); 
var parseXml; 


var es = require('elasticsearch'); 
var client = new es.Client({ 
    host: 'localhost:9200' 
}); 


// create xml parser using xml-stream node.js module 
parseXml = new XmlParserStream(CreateParsableXml.concatXmlStream('ipg140107.xml')); 

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) { 
    var patentGrant = {
        pubNo: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['doc-number'],
        pubCountry: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['country'],
        kind: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['kind'],
        pubDate: patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id']['date']
    };
    console.log(patentGrant); 
}); 

parseXml.on('end', function() { 
    console.log('all done'); 
}); 

Answer


The bulk API, as the documentation you linked says, is meant for "index" and "delete" operations.

Use create instead: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#api-create

parseXml.on('endElement: us-patent-grant', function(patentGrantElement) {
    // All four fields live under the same document-id element
    var docId = patentGrantElement['us-bibliographic-data-grant']['publication-reference']['document-id'];
    var patentGrant = {
        pubNo: docId['doc-number'],
        pubCountry: docId['country'],
        kind: docId['kind'],
        pubDate: docId['date']
    };
    // No id is passed here, so Elasticsearch is left to generate one
    client.create({
        index: 'myindex',
        type: 'mytype',
        body: patentGrant
    }, function(err) {
        // Report failed writes instead of silently ignoring them
        if (err) {
            console.error(err);
        }
    });
    console.log(patentGrant);
});

With no id given, it should generate one automatically, as described at https://www.elastic.co/guide/en/elasticsearch/reference/1.6/docs-index_.html#_automatic_id_generation


This is great, thank you. Follow-up question: when I go to localhost:9200/mytype/myindex/ it gives me the following error message: {"error":"ElasticsearchIllegalArgumentException[No feature for name [patentGrants]]","status":400}


Have the index and the mapping been created? https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-create-index.html#mappings – jperelli
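For reference, here is a minimal sketch of what an explicit mapping for these documents might look like on the Elasticsearch 1.x line this thread targets. Elasticsearch normally creates the index and a dynamic mapping on the first write, so this step is optional. The helper name buildPatentGrantIndexBody is hypothetical, and the field types are assumptions: the not_analyzed strings assume exact-match lookups, and the basic_date format assumes the dates arrive as yyyyMMdd.

```javascript
// Hypothetical helper: builds the request body for creating an index
// with an explicit mapping for the patentGrant documents (ES 1.x style).
function buildPatentGrantIndexBody(typeName) {
    var mappings = {};
    mappings[typeName] = {
        properties: {
            // Assumption: identifiers are matched exactly, so skip analysis
            pubNo: { type: 'string', index: 'not_analyzed' },
            pubCountry: { type: 'string', index: 'not_analyzed' },
            kind: { type: 'string', index: 'not_analyzed' },
            // Assumption: dates look like 20140107 (yyyyMMdd)
            pubDate: { type: 'date', format: 'basic_date' }
        }
    };
    return { mappings: mappings };
}

var indexBody = buildPatentGrantIndexBody('mytype');
console.log(JSON.stringify(indexBody, null, 2));

// With the client from the question, the body could be sent like this:
// client.indices.create({ index: 'myindex', body: indexBody }, function (err) {
//     if (err) { console.error(err); }
// });
```

Later Elasticsearch versions dropped mapping types and the string field type, so this shape only fits the 1.x line.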


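No, I did not create a mapping. Isn't there a default mapping that would take care of this for me? Also, I have been doing more research, and in this video https://www.youtube.com/watch?v=7FLXjgB0PQI I heard that you can save a lot of network overhead by using the bulk API. Would it still be better for me to use create, since otherwise I would have to store all of the data in one JavaScript object and then run it through the bulk process, which would carry a high memory cost? –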
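On the memory concern: the bulk API does not need the whole data set in memory, because documents can be sent in fixed-size batches as they are parsed. The following is only a sketch under stated assumptions. createBulkBatcher, the batch size of 2, and fakeClient are hypothetical names made up for illustration; fakeClient stands in for the real client so the example runs without a cluster. The body layout (an action entry followed by the document) follows the legacy elasticsearch client's bulk call.

```javascript
// Hypothetical helper: buffers documents and sends them in fixed-size
// bulk requests, so at most batchSize documents are held in memory.
function createBulkBatcher(client, index, type, batchSize) {
    var buffer = [];

    function flush(callback) {
        if (buffer.length === 0) {
            return callback(null);
        }
        // Bulk bodies alternate an action entry with the document itself
        var body = [];
        buffer.forEach(function (doc) {
            body.push({ index: { _index: index, _type: type } });
            body.push(doc);
        });
        buffer = [];
        client.bulk({ body: body }, callback);
    }

    function add(doc, callback) {
        buffer.push(doc);
        if (buffer.length >= batchSize) {
            return flush(callback);
        }
        callback(null);
    }

    return { add: add, flush: flush };
}

// Stand-in for the real client: records how many documents each request carried
var sentBatchSizes = [];
var fakeClient = {
    bulk: function (params, callback) {
        sentBatchSizes.push(params.body.length / 2);
        callback(null, { errors: false });
    }
};

function logError(err) {
    if (err) {
        console.error(err);
    }
}

var batcher = createBulkBatcher(fakeClient, 'myindex', 'mytype', 2);
[{ pubNo: '1' }, { pubNo: '2' }, { pubNo: '3' }].forEach(function (doc) {
    batcher.add(doc, logError);
});
batcher.flush(logError); // send whatever is left over
console.log(sentBatchSizes);
```

In the parser from the question, add would be called from the 'endElement: us-patent-grant' handler and flush from the 'end' handler. If requests pile up faster than Elasticsearch accepts them, the pause() and resume() methods that xml-stream documents could be used to hold the parser while a batch is in flight; that part is not shown here.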
