2016-11-17 214 views

Split a delimited column into unique rows in Hive

I have a dataset. See the sample row below:

94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600;

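Columns are separated by a single space (3 columns in total). The column names are id (int), unid (string), and time_stamp (string).

I want to split the dataset so that each unique element ends up in its own row, like this: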

  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481
  • 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440600

Each bullet above is one output row. I used the following query, but it does not work and gives me the output shown after it:

SELECT id, unid, time_date FROM table LATERAL VIEW explode(split(time_date, '\;')) time_date AS time_date;

Output: 94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507; 1460777656:440515; 1460778054:440488; 1460778157:440481,440600; (this row is repeated 5 times)

Any help would be greatly appreciated! Thanks in advance :)

Answers


First, I had to replace the semicolons with pipes. So:

CREATE temporary TABLE tbl 
(id int, 
unid string, 
time_stamp string); 

INSERT INTO tbl 
VALUES (
94654, '6802D326-9F9B-4FC8-B2DD-F878EADE31F2' , '1460695483:440507|1460777656:440515|1460778054:440488|1460778157:440481,440600'); 

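-- Split in the derived table, then explode in the outer query.
SELECT
  id,
  unid,
  time_stamp                      -- exploded element from the lateral view `bar`
FROM
(
  SELECT
    id,
    unid,
    split(time_stamp, '\\|') ts   -- array of pipe-delimited entries
  FROM
    tbl
) t
LATERAL VIEW explode(t.ts) bar AS time_stamp;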

This gives us:

94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460695483:440507 
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460777656:440515 
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778054:440488 
94654 6802D326-9F9B-4FC8-B2DD-F878EADE31F2 1460778157:440481,440600 

You have to do the split and the explode in separate steps. So we do the split in a derived table and the explode/lateral view in the outer query.
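The result above still keeps `440481,440600` together in one row, while the question asks for one row per value. The same split-then-explode pattern can be applied a second time to the comma-separated part. The following is only a sketch, not a tested solution: it assumes `tbl` holds the original semicolon-delimited string (with optional spaces after each `;` and a trailing `;`), and the aliases `s`, `e`, `p`, `v` and the column names `entries`, `entry`, `ts_key`, `vals`, `val` are made up for illustration.

```sql
-- Sketch: split on ';' first, then on ',' so every value gets its own row.
SELECT
  id,
  unid,
  concat(ts_key, ':', val) AS time_stamp      -- rebuild "timestamp:value"
FROM (
  SELECT
    id,
    unid,
    split(entry, ':')[0]             AS ts_key,  -- part before the colon
    split(split(entry, ':')[1], ',') AS vals     -- array of values after the colon
  FROM (
    SELECT
      id,
      unid,
      -- Drop the trailing ';' so no empty element is produced,
      -- then split on ';' together with any surrounding whitespace.
      split(regexp_replace(time_stamp, ';\\s*$', ''), '\\s*;\\s*') AS entries
    FROM tbl
  ) s
  LATERAL VIEW explode(s.entries) e AS entry
) p
LATERAL VIEW explode(p.vals) v AS val;
```

For the sample row this is intended to produce the five rows listed in the question. Some Hive clients treat a `;` inside a quoted string as the end of the statement; if that happens, the semicolon in the two patterns needs escaping (the octal form `\073` is a commonly cited workaround), or the data can be converted to pipes first as in the answer above.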


Thank you so much, Andrew! :) – zerxes