2013-04-03 36 views
1

Can I keep the unmatched items in Pig?

I have two sets:

personCounts 
(personName:chararray, count:int) 

whitelist 
(empID:int, empName:chararray) 

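I want to find who is in personCounts but not in whitelist. I know that JOIN returns the elements that appear in both. Is there a way to return the ones that would be discarded instead? I was thinking I could do this with CROSS, but then I would end up with extra rows. Any thoughts?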

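-- CROSS takes no BY clause; it pairs every personCounts row with every whitelist row.
crossed = CROSS personCounts, whitelist;
-- Keeps every non-matching pair, so a person still shows up once per whitelist entry with a different name.
filcrs = FILTER crossed BY personCounts::personName != whitelist::empName;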

Answers

2

You can do this with a FULL JOIN.

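joined = JOIN personCounts BY personName FULL, whitelist BY empName;
-- Drop rows whose personName is null or empty, i.e. rows that came only from whitelist.
joined = FILTER joined BY NOT ($0 MATCHES '');
-- Keep rows with no whitelist match (empName is null).
joined = FILTER joined BY $3 IS null;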

After that, joined holds tuples of the form (personName, count, , ), with the two whitelist fields empty (null).
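
To make the effect of the two filters concrete, here is a small model of a full outer join written in plain Python rather than Pig. It is only a sketch: the sample rows and the helper name `full_outer_join` are made up for illustration, and `None` stands in for Pig's null.

```python
# Hypothetical sample data; None plays the role of Pig's null.
person_counts = [("alice", 3), ("bob", 5), ("carol", 2)]
whitelist = [(1, "alice"), (2, "dave")]


def full_outer_join(left, right):
    """Join (name, count) rows with (emp_id, emp_name) rows on the name."""
    rows = []
    matched_right = set()
    for name, count in left:
        hits = [(i, r) for i, r in enumerate(right) if r[1] == name]
        if hits:
            for i, (emp_id, emp_name) in hits:
                matched_right.add(i)
                rows.append((name, count, emp_id, emp_name))
        else:
            # Left row with no partner: the right-hand fields are null.
            rows.append((name, count, None, None))
    for i, (emp_id, emp_name) in enumerate(right):
        if i not in matched_right:
            # Right row with no partner: the left-hand fields are null.
            rows.append((None, None, emp_id, emp_name))
    return rows


joined = full_outer_join(person_counts, whitelist)
# Keep rows that have a person (field 0 not null) but no whitelist entry (field 3 null).
unmatched = [row for row in joined if row[0] is not None and row[3] is None]
print(unmatched)
```

In this model the first condition only removes the whitelist-only row for "dave". A left outer join never produces such rows, so it would presumably give the same result with a single filter on the null empName.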

2

I think what you want to achieve is the difference between personCounts and whitelist, is that correct?

If so, try the following (untested!):

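-- Group both relations by name; each group carries one bag per relation.
CGRP = COGROUP personCounts BY personName, whitelist BY empName;
-- Keep only the names that have no whitelist rows.
PC_MINUS_WL = FILTER CGRP BY IsEmpty(whitelist);
PC_MINUS_WL = FOREACH PC_MINUS_WL GENERATE group AS name;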
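
As a rough illustration of what the COGROUP approach computes, the following plain-Python sketch (not Pig) groups both relations by name and keeps the names whose whitelist bag is empty. The sample rows and the helper `cogroup` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sample data for illustration only.
person_counts = [("alice", 3), ("bob", 5), ("carol", 2)]
whitelist = [(1, "alice"), (2, "dave")]


def cogroup(left, left_key, right, right_key):
    """Group both relations by key; each key maps to (left_bag, right_bag)."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    return dict(groups)


cgrp = cogroup(person_counts, lambda r: r[0], whitelist, lambda r: r[1])
# Counterpart of FILTER ... BY IsEmpty(whitelist) followed by GENERATE group AS name.
pc_minus_wl = sorted(name for name, (pc_bag, wl_bag) in cgrp.items() if not wl_bag)
print(pc_minus_wl)
```

Note that, like the Pig snippet, this returns only the names. If the count is needed too, the rows are still available in the personCounts bag of each surviving group.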

I found the following two resources helpful:

http://agiletesting.blogspot.de/2012/02/set-operations-in-apache-pig.html

http://www.cs.tufts.edu/comp/150CPA/notes/Advanced_Pig.pdf

+0

Great resources, thanks!
