2016-04-27

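Pig - How to group by a field that has multiple entries

I want to be able to group by hour here. I know there will be multiple entries for the same hour; for example, hour 11 below appears several times. How do I do this?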

hour,windSpeed 
11, 3.6 
2 , 6.8 
11, 2.5 
13, 5.0 
14, 8.9 
11, 3.2 

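So I have this, and I just want to group it by hour.

So, for example, we want {11: 3.6, 2.5, 3.2 }

and each of the remaining hours, since it has only one value, would form a group of its own: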

{14: 8.9}

{2: 6.8}

answer = FOREACH weather_data GENERATE $0 AS hour, $1 AS speed;

Answers

Group by hour:

A = FOREACH weather_data GENERATE $0 AS hour, $1 as speed; 
B = GROUP A by hour; 
DUMP B; 

If you want to aggregate, then use SUM:

C = FOREACH B generate group as hour,SUM(A.speed) as Total; 
DUMP C; 
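
For the output shape asked for in the question (each hour with its list of speeds rather than a total), here is a minimal sketch. It assumes the data sits in a comma-delimited file with no header row; the file name `weather.csv` and the aliases `by_hour` and `speeds_by_hour` are hypothetical and not from the original post.

```pig
-- Hypothetical file name; PigStorage(',') splits each line on commas.
weather_data = LOAD 'weather.csv' USING PigStorage(',') AS (hour:int, windSpeed:double);

-- One row per distinct hour; the matching input rows are collected
-- in a bag named after the grouped relation (weather_data).
by_hour = GROUP weather_data BY hour;

-- Keep the hour and project only the windSpeed column out of each bag.
speeds_by_hour = FOREACH by_hour GENERATE group AS hour, weather_data.windSpeed AS speeds;

DUMP speeds_by_hour;
```

An hour with a single entry should simply come out with a one-element bag. The order of values inside a bag is not guaranteed.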

Excellent answer! @inquisitive_mind for prez! – dedpo

Try this.

A = LOAD 'data' AS (Hour:chararray, windSpeed:chararray); 
B = GROUP A BY (Hour); 
C = FOREACH B GENERATE 
FLATTEN(group) AS (Hour), A.windSpeed 
; 

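Note: this code is untested.

Hmm, it says there is a schema mismatch, although this is closer to what I was looking for. However, @inquisitive_mind provided a solution that gets around my original problem. – dedpo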