2014-10-01

Top 3 records within a group using a Pig query

Sample data for my problem:

1  12  1234
2  12  1233
1  13  5555
1  15  4444
2  34  2222
7  89  1111

Field description:
col1 cust_id, col2 zip_code, col3 transaction_id.

Using Pig scripting, I need to answer the following question:

For each cust_id, find the zip code used most often in the last 3 transactions.
Approach I have used so far:



1) Group the records by cust_id:

(1,{(1,12,1234),(1,13,5555),(1,15,4444),(1,12,3333),(1,13,2323),(1,13,3434),(1,13,5755),(1,18,4424),(1,12,3383),(1,13,2823)}) 
(2,{(2,34,2222),(2,12,1233),(2,34,6666),(2,34,6666),(2,34,2422)}) 
(6,{(6,14,2312),(6,15,8888),(6,14,4634),(6,14,2712),(6,15,8288)}) 
(7,{(7,45,4244),(7,89,1111),(7,45,4544),(7,89,1121)}) 

2) Sort them and limit them to the 3 most recent transactions.

Using a nested foreach, I sorted by transaction id and limited the result to 3:
nested = foreach group_by { sor = order zip by $2 desc ; limi = limit sor 3 ; generate limi; };

After this step, the data is:

({(1,12,1234),(1,13,2323),(1,13,2823)}) 
({(2,12,1233),(2,34,2222),(2,34,2422)}) 
({(6,14,2312),(6,14,2712),(6,14,4634)}) 
({(7,89,1111),(7,89,1121),(7,45,4244)}) 
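
For comparison, here is a small Python sketch (not Pig) of what a descending sort on transaction id followed by a limit of 3 would be expected to return for the cust_id 1 bag from step 1. The helper name `last_n` is made up for illustration, and "most recent" is assumed to mean "highest transaction id".

```python
# Bag for cust_id 1, copied from the grouped output in step 1:
# (cust_id, zip_code, transaction_id)
bag = [
    (1, 12, 1234), (1, 13, 5555), (1, 15, 4444), (1, 12, 3333), (1, 13, 2323),
    (1, 13, 3434), (1, 13, 5755), (1, 18, 4424), (1, 12, 3383), (1, 13, 2823),
]


def last_n(rows, n=3):
    """Hypothetical helper: the n rows with the highest transaction id, highest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


if __name__ == "__main__":
    print(last_n(bag))
```

The three rows it keeps are the ones with transaction ids 5755, 5555 and 4444, which differ from the rows shown above for cust_id 1.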

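Why is the data above not sorted in descending order?

Even with the ascending order, how do I now find the zip code used most in the last 3 transactions?

The result should be: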
1) 13 
2) 34 
3) 14 
4) 89 

Answer

Can you try this?

PigScript: 

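-- Load (cust_id, zip_code, transaction_id) rows from a comma-separated file.
A = LOAD 'input.txt' USING PigStorage(',') AS(CustomerId:int,ZipCode:int,TransactionId:int); 
B = GROUP A BY CustomerId; 
-- Per customer, keep the 3 rows with the highest TransactionId ($2).
C = FOREACH B { 
       SortTxnId = ORDER A BY $2 DESC; 
       TxnIdLimit = LIMIT SortTxnId 3; 
       GENERATE group,TxnIdLimit; 
       } 
-- Count how often each (CustomerId, ZipCode) pair occurs in those rows.
D = FOREACH C GENERATE FLATTEN($1); 
E = GROUP D BY ($0,$1); 
F = FOREACH E GENERATE group,COUNT(D); 
-- Per customer, keep the (CustomerId, ZipCode) pair with the highest count.
G = GROUP F BY group.$0; 
I = FOREACH G { 
       SortZipCode = ORDER F BY $1 DESC; 
       ZipCodeLimit = LIMIT SortZipCode 1; 
       GENERATE FLATTEN(ZipCodeLimit.group); 
       } 
-- Project only the ZipCode of the winning pair.
J = FOREACH I GENERATE FLATTEN($0.TxnIdLimit::ZipCode); 
DUMP J; 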

Output: 
(13) 
(34) 
(14) 
(89) 

input.txt 
1,12,1234 
1,13,5555 
1,15,4444 
1,12,3333 
1,13,5755 
1,18,4424 
2,34,2222 
2,12,1233 
2,33,6666 
2,34,6666 
2,34,2422 
6,14,2312 
6,15,8888 
6,14,4634 
6,14,2712 
7,45,4244 
7,89,1111 
7,89,3111 
7,89,1121
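
As a cross-check outside Pig, the same steps (group by customer, keep the 3 highest transaction ids, count zip codes, take the most frequent) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `parse_rows` and `most_used_zip` are made up here, "last" is taken to mean "highest transaction id" as in the script's ORDER ... BY $2 DESC, and ties between zip codes are not handled specially.

```python
from collections import Counter, defaultdict

# Rows copied from input.txt above: cust_id, zip_code, transaction_id.
INPUT_TXT = """\
1,12,1234
1,13,5555
1,15,4444
1,12,3333
1,13,5755
1,18,4424
2,34,2222
2,12,1233
2,33,6666
2,34,6666
2,34,2422
6,14,2312
6,15,8888
6,14,4634
6,14,2712
7,45,4244
7,89,1111
7,89,3111
7,89,1121
"""


def parse_rows(text):
    """Turn comma-separated lines into (cust_id, zip_code, transaction_id) int tuples."""
    return [tuple(int(v) for v in line.split(",")) for line in text.split()]


def most_used_zip(rows, n=3):
    """Map each cust_id to the zip code seen most often among its n highest transaction ids."""
    by_cust = defaultdict(list)
    for cust_id, zip_code, txn_id in rows:
        by_cust[cust_id].append((txn_id, zip_code))

    result = {}
    for cust_id, txns in by_cust.items():
        # Assumption: "last" means highest transaction id.
        last_n = sorted(txns, key=lambda t: t[0], reverse=True)[:n]
        counts = Counter(zip_code for _, zip_code in last_n)
        # No special tie handling: with equal counts, one of the tied zip codes is returned.
        result[cust_id] = counts.most_common(1)[0][0]
    return result


if __name__ == "__main__":
    for cust_id, zip_code in most_used_zip(parse_rows(INPUT_TXT)).items():
        print(cust_id, zip_code)
```

On the rows above this gives 13, 34, 14 and 89 for customers 1, 2, 6 and 7, which agrees with the Output listed for the Pig script.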