2017-01-03

Error when parsing XML in Hive

I am using Hive to parse XML files, and for that I am using hivexmlserde. When I execute my code, I get this error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: The number of XPath expressions does not much the number of columns 

However, the number of columns and the number of XPath expressions are the same.

Here is my code:

add jar /home/cloudera/hivexmlserde-1.0.5.3.jar; 
CREATE EXTERNAL TABLE INFO(
statusCode string, 
title string, 
startTime string, 
endTime string, 
frequencyValue string, 
frequencyUnits string, 
strengthValue string, 
strengthUnits string, 
routecode string, 
routecodeSystem string, 
routedisplayName string, 
routecodesystemName string, 
ugcode string, 
uname string, 
ucodeSystem string, 
codeSystemName string, 
ageForm string, 
tr_code string, 
tr_description string, 
tr_codesystem string, 
tr_codesystemname string 
) 
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES (
"column.xpath.statusCode"="Document/xxx/statusCode/text()", 
"column.xpath.title"="Document/xxx/code/code/text()", 
"column.xpath.startTime"="Document/xxx/startTime/text()", 
"column.xpath.endTime"="Document/xxx/endTime/text()", 
"column.xpath.frequencyValue"="Document/xxx/frequencyValue/text()", 
"column.xpath.frequencyUnits"="Document/xxx/frequencyUnits/text()", 
"column.xpath.strengthValue"="Document/xxx/strengthValue/text()", 
"column.xpath.strengthUnits"="Document/xxx/strengthUnits/text()", 
"column.xpath.routecode"="Document/xxx/entryInfo/routeCode/code/text()", 
"column.xpath.routecodeSystem"="Document/xxx/entryInfo/routeCode/codeSystem/text()", 
"column.xpath.routedisplayName"="Document/xxx/entryInfo/routeCode/displayName/text()", 
"column.xpath.routecodesystemName"="Document/xxx/entryInfo/routeCode/codeSystemName/text()", 
"column.xpath.ugcode"="Document/xxx/entryInfo/productCode/code/text()", 
"column.xpath.ugname"="Document/xxx/entryInfo/productCode/displayName/text()", 
"column.xpath.ugcodeSystem"="Document/xxx/entryInfo/productCode/codeSystem/text()", 
"column.xpath.ugcodeSystemName"="Document/xxx/entryInfo/productCode/codeSystemName/text()", 
"column.xpath.dosageForm"="Document/xxx/entryInfo/ageForm/displayName/text()", 
"column.xpath.tr_code"="Document/xxx/entryInfo/productCode/translation/code/text()", 
"column.xpath.tr_description"="Document/xxx/entryInfo/productCode/translation/displayName/text()", 
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()", 
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()" 
) 
STORED AS 
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
TBLPROPERTIES (
"xmlinput.start"="<Document", 
"xmlinput.end"="</Document>"); 

Answer

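I found the problem after digging through the code a bit. I ran into this error because I had used the same XPath column name twice. The key

column.xpath.tr_codesystem

appears twice in SERDEPROPERTIES. I changed the second occurrence to codesystemname (so that the key reads column.xpath.tr_codesystemname, matching the last column), and after that it started working for me.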
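
For reference, the corrected tail of the WITH SERDEPROPERTIES list might look like the fragment below. This is only a sketch built from the statement in the question: it assumes the second entry was meant for the tr_codesystemname column, since its XPath ends in codeSystemName. A likely explanation for the error is that SERDEPROPERTIES is a key/value map, so a repeated key presumably overwrites the earlier entry and leaves 20 distinct XPath expressions for 21 columns.

```sql
-- Last two entries of WITH SERDEPROPERTIES from the question.
-- Assumption: the duplicated key is renamed to match the tr_codesystemname
-- column, so the property map keeps 21 distinct column.xpath.* keys.
"column.xpath.tr_codesystem"="Document/xxx/entryInfo/productCode/translation/codeSystem/text()",
"column.xpath.tr_codesystemname"="Document/xxx/entryInfo/productCode/translation/codeSystemName/text()"
```

The rest of the CREATE EXTERNAL TABLE statement stays as posted. The fragment is not a standalone statement and has to replace the last two property lines inside the original one.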