2017-05-07 93 views

I created a partitioned Parquet table in Hive with a column of type

array<struct<id:string>>  

When I change this column's type to add a field inside the struct, exploding the array in certain SELECT statements throws an error. Here are the details:

The CREATE TABLE statement:

CREATE EXTERNAL TABLE `test_table`(
    `my_array` array<struct<id:string>>) 
PARTITIONED BY ( 
    `ymd` int) 
ROW FORMAT SERDE 
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 

Create a partition with some data:

insert overwrite table test_table 
PARTITION (ymd = 20170101) 
select 
    array(named_struct('id', 'id1')) as my_array 

Then I alter the array column to add a new field inside the struct:

ALTER TABLE test_table 
CHANGE COLUMN my_array my_array array<struct<id:string, amount:int>> 

This changes the table's metadata. My expectation was that the old data would still be readable, with `amount` coming back as null. Unfortunately, I ran into an error that I don't understand. To illustrate it better, let me first create a new partition:
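As an aside (not part of the original post), one way to confirm that the ALTER only changed the table-level metadata is to describe both the table and an existing partition, since Hive keeps a separate copy of the column types per partition. This is a hedged sketch using standard Hive DESCRIBE syntax:

    -- Table-level schema should now show the new struct field
    DESCRIBE test_table;

    -- An existing partition may still carry the old column type,
    -- because partition descriptors keep their own schema copy
    DESCRIBE test_table PARTITION (ymd = 20170101);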

insert overwrite table test_table 
PARTITION (ymd = 20170102) 
select 
array(named_struct('id', 'id2', 'amount',2)) as my_array 

Now, running

select * from test_table 

gives me the result I expected (output from the Hue UI):

[screenshot: output from `select * from test_table`]

However, the error occurs when I try to explode the array with a lateral view like this:

select 
    my_array 
from 
    test_table t 
    lateral view explode (my_array) arry as a 

This query throws a Hive runtime error; the relevant log is at the end of this post. Selecting `arry.a` instead of `my_array` produces a very similar error. Surprisingly, the following query runs fine and gives the result I expected:

select 
    ymd, 
    a.id, 
    a.amount 
from 
    test_table t 
    lateral view explode (my_array) arry as a 
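Building on the query above that does work, one untested workaround sketch (not from the original post) is to rebuild the struct by hand from the individual fields, so the serializer never has to inspect the Parquet-backed struct object. `named_struct` is a standard Hive UDF; the alias `element` is my own choice:

    select 
        ymd, 
        named_struct('id', a.id, 'amount', a.amount) as element 
    from 
        test_table t 
        lateral view explode (my_array) arry as a 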


To me it looks like this might be a bug. Below is a snippet of the log produced when running the failing select above. The Hive version is 1.1.0-cdh5.8.0:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"my_array":[{"id":"id1","amount":null}],"ymd":20170101} 
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"my_array":[{"id":"id1","amount":null}],"ymd":20170101} 
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) 
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) 
    ... 8 more 
Caused by: java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList 
    at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldsDataAsList(ArrayWritableObjectInspector.java:172) 
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:355) 
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:319) 
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:258) 
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:242) 
    at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55) 
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:668) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:125) 
    at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45) 
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:107) 
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:94) 
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:108) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) 
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) 
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) 
    ... 9 more 

Answer


We solved this problem by writing a custom SerDe that returns null for any column it cannot find in the data file.
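The answer does not include the SerDe code itself, but for context, a custom SerDe is plugged in through the table definition. In the sketch below, `com.example.NullSafeParquetHiveSerDe` is a hypothetical class name standing in for the answerer's implementation; everything else mirrors the question's original DDL:

    CREATE EXTERNAL TABLE `test_table`(
        `my_array` array<struct<id:string, amount:int>>) 
    PARTITIONED BY ( 
        `ymd` int) 
    ROW FORMAT SERDE 
        'com.example.NullSafeParquetHiveSerDe'  -- hypothetical custom SerDe class
    STORED AS INPUTFORMAT 
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
    OUTPUTFORMAT 
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 

The custom SerDe's JAR also has to be on Hive's classpath (for example via `ADD JAR`) before queries against the table will run.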