I have an external table with a column named data, where each value is a JSON object, and Hive computes the wrong sum over a field extracted from it.
When I run the following Hive query:
hive> select get_json_object(data, "$.ev") from data_table limit 3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201212171824_0218, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201212171824_0218
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=master:8021 -kill job_201212171824_0218
2013-01-24 10:41:37,271 Stage-1 map = 0%, reduce = 0%
....
2013-01-24 10:41:55,549 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201212171824_0218
OK
2
2
2
Time taken: 21.449 seconds
But when I run a sum aggregation, the result is strange:
hive> select sum(get_json_object(data, "$.ev")) from data_table limit 3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201212171824_0217, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201212171824_0217
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=master:8021 -kill job_201212171824_0217
2013-01-24 10:39:24,485 Stage-1 map = 0%, reduce = 0%
.....
2013-01-24 10:41:00,760 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201212171824_0217
OK
9.4031522E7
Time taken: 100.416 seconds
Can anyone explain why this happens, and what should I do to get it to work correctly?
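One likely explanation, as a sketch: `get_json_object` returns a STRING, so `sum()` implicitly converts it to DOUBLE, and Hive prints doubles in scientific notation (9.4031522E7 is just 94031522.0). Assuming `$.ev` always holds an integer, casting before summing should keep integer formatting:

```sql
-- Cast the extracted string field to BIGINT so sum() stays integral
-- and the result is not rendered in scientific notation.
-- (data_table and $.ev are from the question; the cast is the suggested fix.)
select sum(cast(get_json_object(data, "$.ev") as bigint)) from data_table;
```

Note that `limit 3` on an aggregate without `group by` has no effect, since the query produces a single row either way.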
Great, for sure! Thanks a lot – Julias