2012-07-31

Combine tuples based on a field?

Say I have a structure like

{1001, {{id=1001, count=20, key=a}, {id=1001, count=30, key=b}}} 
{1002, {{id=1002, count=40, key=a}, {id=1001, count=50, key=b}}} 

and I want it to become

{id=1001, a=20, b=30} 
{id=1002, a=40, b=50} 

Which Pig commands can I use to do this?

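Can you give the schema of the structure you want to transform? I don't think you can put a bag directly inside another bag unless the inner bag is wrapped in a tuple. – cyang 2012-07-31 21:59:17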

Answers

1

I'm not sure what the format of the starting relation is, but to me it looks like (int, bag:{tuple:(int,int,chararray)})? If so, this should work:

flattened = FOREACH x GENERATE $0 AS id, flatten($1) AS (idx:int, count:int, key:chararray); 
a = FILTER flattened BY key == 'a'; 
b = FILTER flattened BY key == 'b'; 
joined = JOIN a BY id, b BY id; 
result = FOREACH joined GENERATE a::id AS id, a::count AS a, b::count AS b; 
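
Since the schema above is a guess, it may help to check it before running the join. The following is only a sketch: the path 'input.txt' and the field names in the LOAD are assumptions for illustration, not taken from the question. DESCRIBE prints the schema Pig has recorded for a relation, and DUMP shows the flattened rows that the two FILTER statements will see.

```pig
-- 'input.txt' is a hypothetical path; the schema is the one guessed above.
x = LOAD 'input.txt' AS (id:int, b:bag{t:(id:int, count:int, key:chararray)});

-- Print the schema Pig has recorded for x.
DESCRIBE x;

-- Same flatten step as in the answer: one output row per inner tuple.
flattened = FOREACH x GENERATE $0 AS id, FLATTEN($1) AS (idx:int, count:int, key:chararray);

-- Inspect the rows before filtering and joining.
DUMP flattened;
```

If DESCRIBE reports a different schema, the positional references $0 and $1 and the AS clause would need to be adjusted to match.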
1

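It looks like you are pivoting, similar to Pivoting in Pig. But you already have a bag of tuples. An inner join will be costly, because it causes extra MapReduce jobs. To do this, you need to filter inside a nested foreach. The modified code looks like this: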

inpt = load '..../pig/bag_pivot.txt' as (id : int, b:bag{tuple:(id : int, count : int, key : chararray)}); 

result = foreach inpt { 
    col1 = filter b by key == 'a'; 
    col2 = filter b by key == 'b'; 
    generate id, flatten(col1.count) as a, flatten(col2.count) as b; 
}; 

Sample input data:

1001 {(1001,20,a),(1001,30,b)} 
1002 {(1002,40,a),(1001,50,b)} 

Output:

(1001,20,30) 
(1002,40,50)