2015-10-05

Pig: LOAD using AvroStorage throws "Cannot get schema from loadFunc" exception

In Hadoop, we have the following list of Avro files stored under '/datasets/xyz/storm/information/':

-rw-r----- 3 storm XYZ 5570959 2015-10-01 01:46 /datasets/xyz/storm/information/storm_1443681972122.avro 
-rw-r----- 3 storm XYZ 5571687 2015-10-01 01:46 /datasets/xyz/storm/information/storm_1443681973303.avro 
-rw-r----- 3 storm XYZ 5632194 2015-10-01 01:46 /datasets/xyz/storm/information/storm_1443681975019.avro 

What works:

a= LOAD '/datasets/xyz/storm/information/storm_1443681975019.avro' USING AvroStorage(); 

The Avro schema, defined in each Avro file as its first record, has the following format:

{header: (metadata_uuid: chararray,publishDate: chararray,eventDate: chararray),raw_data: chararray} 

I want to load the data from all of the Avro files into alias 'a' at once, so I tried the following code:

a = LOAD '/datasets/xyz/storm/information/' USING AvroStorage(); 

I get the following exception:

ERROR 2245: Cannot get schema from loadFunc org.apache.pig.builtin.AvroStorage 

I also tried specifying the schema explicitly, as follows:

a= LOAD '/datasets/xyz/storm/information/' USING AvroStorage ('schema','{"header": ("metadata_uuid": "chararray","publishDate": "chararray","eventDate": "chararray"),"raw_data": "chararray"}'); 
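One quick way to see a problem with this attempt is to check whether the schema string is valid JSON at all, which an Avro schema must be. The minimal Python sketch below (Python is used only for illustration, and `is_valid_json` is a hypothetical helper name) shows that it is not: the parentheses come from Pig's schema notation, and `chararray` is a Pig type rather than an Avro type, so even a syntactically valid version of this string would not describe an Avro record.

```python
import json

# The schema string from the failed attempt above: Pig-style tuple
# syntax wrapped in braces, not an Avro JSON schema.
pig_style = ('{"header": ("metadata_uuid": "chararray","publishDate": "chararray",'
             '"eventDate": "chararray"),"raw_data": "chararray"}')


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(pig_style))  # False: "(" is not a JSON value
```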

Could you please tell me the correct way to do this?

Thanks!

I couldn't find a useful answer here: http://stackoverflow.com/questions/21588911/cant-load-avro-schema-in-pig

You could try: A = LOAD '/datasets/xyz/storm/information/*.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

@Murali Rao: Thanks for the response. I get the same error.

Answer

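The schema I provided was incorrect, and so was its format. I removed 'schema' from the AvroStorage parameters and changed the script to the following: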

a= LOAD '/datasets/xyz/storm/information/' USING AvroStorage('{"type" : "record","name" : "DataRecord","namespace" : "com.bestbuy.sim.appTalkProjects.adobe.adobeClickStreamBDPSA.util","doc" : "Schema for com.bestbuy.sim.appTalkProjects.adobe.adobeClickStreamBDPSA.util.DataRecord","fields" : [ {"name" : "header","type" : [ "null", {"type" : "record","name" : "Header","doc" : "Schema for com.bestbuy.sim.appTalkProjects.adobe.adobeClickStreamBDPSA.util.Header","fields" : [ {"name" : "metadata_uuid","type" : [ "null", "string" ]}, {"name" : "publishDate","type" : [ "null", "string" ]}, {"name" : "eventDate","type" : [ "null", "string" ]} ]} ]}, {"name" : "raw_data","type" : [ "null", "string" ]} ]}'); 

With this change, the load succeeds.
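Writing a schema this long by hand on a single line is error-prone. As a sketch, assuming Python is available on the machine where the Pig script is written, the same record can be built from plain dictionaries and serialized with `json.dumps`. The helper `nullable` is a hypothetical name; the record names and namespace are taken from the answer, and the `doc` strings are omitted because they are optional documentation in Avro.

```python
import json


def nullable(avro_type):
    """Wrap a type in a union with null, as the answer does for every field."""
    return ["null", avro_type]


# Nested record for the header fields (all nullable strings).
header = {
    "type": "record",
    "name": "Header",
    "fields": [
        {"name": "metadata_uuid", "type": nullable("string")},
        {"name": "publishDate", "type": nullable("string")},
        {"name": "eventDate", "type": nullable("string")},
    ],
}

# Top-level record: a nullable header plus a nullable raw_data string.
data_record = {
    "type": "record",
    "name": "DataRecord",
    "namespace": "com.bestbuy.sim.appTalkProjects.adobe.adobeClickStreamBDPSA.util",
    "fields": [
        {"name": "header", "type": nullable(header)},
        {"name": "raw_data", "type": nullable("string")},
    ],
}

# Compact JSON string to paste between the quotes of AvroStorage('...').
schema_json = json.dumps(data_record, separators=(",", ":"))
print(schema_json)
```

Because the serialized schema contains only double quotes, it can sit inside Pig's single-quoted string literal without escaping.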