2013-04-24

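Combining many tables in Hive using UNION ALL?

I'm trying to append one variable from several tables (a.k.a. row-binding or concatenating) to build one longer, single-column table in Hive. Based on this question (HiveQL UNION ALL), I think this is possible with UNION ALL, but I don't know an efficient way to do it.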

The pseudocode would look something like this:

CREATE TABLE tmp_combined AS 
SELECT b.var1 FROM tmp_table1 b 
UNION ALL 
SELECT c.var1 FROM tmp_table2 c 
UNION ALL 
SELECT d.var1 FROM tmp_table3 d 
UNION ALL 
SELECT e.var1 FROM tmp_table4 e 
UNION ALL 
SELECT f.var1 FROM tmp_table5 f 
UNION ALL 
SELECT g.var1 FROM tmp_table6 g 
UNION ALL 
SELECT h.var1 FROM tmp_table7 h; 

Any help is appreciated!

Answers

Try the following code. Hive does not support SELECT ... INTO, so the combined result is written with CREATE TABLE ... AS:

CREATE TABLE tmp_combined AS
SELECT * FROM
(
    SELECT b.var1 FROM tmp_table1 b
    UNION ALL
    SELECT c.var1 FROM tmp_table2 c
    UNION ALL
    SELECT d.var1 FROM tmp_table3 d
    UNION ALL
    SELECT e.var1 FROM tmp_table4 e
    UNION ALL
    SELECT f.var1 FROM tmp_table5 f
    UNION ALL
    SELECT g.var1 FROM tmp_table6 g
    UNION ALL
    SELECT h.var1 FROM tmp_table7 h
) CombinedTable;

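Use it together with the setting: set hive.exec.parallel=true;

With this setting Hive can run independent SELECT branches at the same time; otherwise they run one after another.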
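For reference, here is a minimal sketch of the session settings involved, assuming the classic MapReduce execution engine. hive.exec.parallel.thread.number is not mentioned in the answer; it is a standard Hive property that caps how many jobs run at once (its default is 8), and the value below is only illustrative. How much this helps depends on how many independent jobs Hive actually generates for the query.

```sql
-- Session-level settings; run them before the CREATE TABLE ... AS statement.
-- Assumption: Hive on the MapReduce engine.
-- Allow independent stages of a single query to run concurrently.
SET hive.exec.parallel=true;
-- Upper bound on jobs executed in parallel; 8 is Hive's default, shown for illustration.
SET hive.exec.parallel.thread.number=8;
```

Running EXPLAIN on the CREATE TABLE ... AS statement shows the planned stages and their dependencies, which indicates whether parallel execution can kick in.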

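I'd say this is a direct and efficient way to row-bind; at least, it's what I use in my own code. By the way, running your pseudocode as-is may give you syntax errors (older Hive versions only accept UNION ALL inside a subquery), so you can try: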

create table join_table as select * from (select ... union all select ... union all select ...) tmp;

I applied the same idea to two different tables, employee and location. I believe it can help you:

Data: table employee (alias e)
empid empname 
13 Josan 
8 Alex 
3 Ram 
17 Babu 
25 John 

Table location (alias l)
empid emplocation 
13 San Jose 
8 Los Angeles 
3 Pune,IN 
17 Chennai,IN 
39 Banglore,IN 

hive> SELECT e.empid AS a ,e.empname AS b FROM employee e 
UNION ALL 
SELECT l.empid AS a,l.emplocation AS b FROM location l; 

Output (the columns are aliased a and b; UNION ALL does not guarantee the order of the rows):

13 San Jose 
8 Los Angeles 
3 Pune,IN 
17 Chennai,IN 
39 Banglore,IN 
13 Josan 
8 Alex 
3 Ram 
17 Babu 
25 John