2015-04-06
EMP_ID PRD_NO PRD_DATE             PRD_TOTAL PRD_NORM
IND235 00020  28/Mar/2015 02:00:50 11        60.00
IND235 00018  27/Mar/2015 03:10:40 7         60.00
IND235 00019  28/Mar/2015 04:00:54 3         60.00
IND235 00020  27/Mar/2015 05:00:51 11        60.00
PUR266 00044  28/Mar/2015 01:20:50 85        100.00
PUR266 00024  28/Mar/2015 06:30:60 33        100.00
PUR266 00017  27/Mar/2015 05:30:05 11        100.00
PUR266 00038  27/Mar/2015 02:30:15 60        100.00

I would expect to get the output: 

IND235,27/Mar/2015,60,18,42
IND235,28/Mar/2015,60,14,46
PUR266,27/Mar/2015,100,71,29
PUR266,28/Mar/2015,100,118,-18

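The columns are EMP_ID, the date, PRD_NORM, the sum of PRD_TOTAL, and PRD_NORM minus that sum. PRD_TOTAL is summed per date within each EMP_ID; for example, IND235 on 27/Mar/2015 has 7 + 11 = 18, so the last column is 60 - 18 = 42.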

I'm just starting to learn the ins and outs of Pig Latin. Is there a built-in way to do this in Pig or in some library, or should I look at writing a UDF?

Answer

Try this:

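-- Fields: employee id, product number, date, time, total, norm.
A = load 'pigdeduct' using PigStorage(' ') as (a1:chararray,b1:int,c1:chararray,d1:chararray,e1:int,f1:int);
-- Keep only the columns the report needs.
B = foreach A generate a1,c1,e1,f1;
-- One group per employee per date.
C = group B by (a1,c1);
-- MAX(B.f1) returns the norm itself; SUM(B.f1)/2 is correct only when every group has exactly two rows.
D = foreach C generate FLATTEN(group),MAX(B.f1),SUM(B.e1),MAX(B.f1) - SUM(B.e1);
dump D;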

Input file:

IND235 00020 28/Mar/2015 02:00:50 11 60.00 
IND235 00018 27/Mar/2015 03:10:40 7 60.00 
IND235 00019 28/Mar/2015 04:00:54 3 60.00 
IND235 00020 27/Mar/2015 05:00:51 11 60.00 
PUR266 00044 28/Mar/2015 01:20:50 85 100.00 
PUR266 00024 28/Mar/2015 06:30:60 33 100.00 
PUR266 00017 27/Mar/2015 05:30:05 11 100.00 
PUR266 00038 27/Mar/2015 02:30:15 60 100.00 

Output:

(IND235,27/Mar/2015,60,18,42) 
(IND235,28/Mar/2015,60,14,46) 
(PUR266,27/Mar/2015,100,71,29) 
(PUR266,28/Mar/2015,100,118,-18)
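
The figures above can be cross-checked outside Pig. The following is a minimal Python sketch, not part of the original answer, that repeats the same group-and-sum over the sample rows. The names `SAMPLE` and `summarize` are made up for illustration, and the sketch assumes the norm is the same on every row for a given employee.

```python
from collections import defaultdict

# Sample rows copied from the question.
# Fields: EMP_ID, PRD_NO, date, time, PRD_TOTAL, PRD_NORM
SAMPLE = """\
IND235 00020 28/Mar/2015 02:00:50 11 60.00
IND235 00018 27/Mar/2015 03:10:40 7 60.00
IND235 00019 28/Mar/2015 04:00:54 3 60.00
IND235 00020 27/Mar/2015 05:00:51 11 60.00
PUR266 00044 28/Mar/2015 01:20:50 85 100.00
PUR266 00024 28/Mar/2015 06:30:60 33 100.00
PUR266 00017 27/Mar/2015 05:30:05 11 100.00
PUR266 00038 27/Mar/2015 02:30:15 60 100.00
"""


def summarize(text):
    """Return sorted (emp_id, date, norm, total, norm - total) tuples."""
    totals = defaultdict(int)  # (emp_id, date) -> summed PRD_TOTAL
    norms = {}                 # (emp_id, date) -> PRD_NORM
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        emp_id, _prd_no, date, _time, total, norm = fields
        key = (emp_id, date)
        totals[key] += int(total)
        # Assumption: the norm is constant within a group, so the last value seen is kept.
        norms[key] = int(float(norm))
    return [
        (emp_id, date, norms[(emp_id, date)], total, norms[(emp_id, date)] - total)
        for (emp_id, date), total in sorted(totals.items())
    ]


for row in summarize(SAMPLE):
    print(",".join(str(value) for value in row))
```

For the sample data this prints the four comma-separated lines the question asks for, with no surrounding parentheses.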