HCatLoader with Pig: only standard Pig types are supported when you output from UDF/LoadFunc

I am trying to perform some transformations on a partitioned Hive table using Pig with HCatLoader, on Hive 1.2 and Pig 0.15. The data type of the partition column in the Hive table is smallint. The same steps work fine on a non-partitioned Hive table. The steps I am executing are below. I did some investigation and found that the classes in the exception stack trace are related to storing intermediate data, so the job fails while writing intermediate data. Can anyone suggest what the issue is and how to resolve it?

**pig -useHCatalog** 

A = LOAD 'testdb.yearly_report' USING org.apache.hive.hcatalog.pig.HCatLoader() as (name:chararray,date_of_joining:int); 
B = foreach A generate name,date_of_joining; 
B = limit B 5; 
STORE B INTO '/my_hdfs_dir' USING PigStorage(',');

I am getting the following error:

Error: java.lang.RuntimeException: Unexpected data type java.lang.Short found in stream. Note only standard Pig type is supported when you output from UDF/LoadFunc
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:596)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
    at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
    at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
    at org.apache.hadoop.mapred.MapRFsOutputBuffer.collect(MapRFsOutputBuffer.java:1493)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:724)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:276)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
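The trace fails inside org.apache.pig.data.BinInterSedes.writeDatum, which serializes intermediate map output and accepts only standard Pig types; Pig has no smallint type, so the java.lang.Short produced for the partition column cannot be written. As an untested sketch (the explicit cast is an assumption, not a confirmed fix), converting the column to int before the STORE might keep the Short out of the intermediate stream:

-- hypothetical workaround: force the partition column to a standard Pig int
A = LOAD 'testdb.yearly_report' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = FOREACH A GENERATE name, (int)date_of_joining;
C = LIMIT B 5;
STORE C INTO '/my_hdfs_dir' USING PigStorage(',');

If the cast is optimized away because the reported schema is already int, changing the partition column type in Hive from smallint to int would remove the non-Pig type at the source.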

Answer

I cannot comment due to lack of reputation, but I do see errors in your code:

A = LOAD 'testdb.yearly_report' USING org.apache.hive.hcatalog.pig.HCatLoader() as (name:chararray,date_of_joining:int); 
B = foreach A generate name,date_of_joining; 
B = limit B 5; 
STORE B INTO '/my_hdfs_dir' USING PigStorage(','); 

1. You have used the alias 'B' twice, which overwrites the previous relation.
2. Use DUMP after every statement to confirm that each statement executes as expected.
3. Does your table 'yearly_report' really have only two columns? If not, it is better not to provide a schema when loading into 'A'.
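A minimal sketch of the script rewritten along these lines (the DUMP statements are only for step-by-step verification and would be removed once the flow is confirmed):

A = LOAD 'testdb.yearly_report' USING org.apache.hive.hcatalog.pig.HCatLoader() as (name:chararray, date_of_joining:int);
B = FOREACH A GENERATE name, date_of_joining;
DUMP B;        -- confirm the projection works
C = LIMIT B 5; -- new alias instead of reusing B
DUMP C;        -- confirm the limited relation
STORE C INTO '/my_hdfs_dir' USING PigStorage(',');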