2013-08-29

How can I filter (a,b) tuples by their (b,a) counterparts?

I have a generic relation like this:

DUMP A; 
(a, b) 
(a, c) 
(a, d) 
(b, a) 
(d, a) 
(d, b) 

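As you can see, (a,b) has a counterpart (b,a), and (a,d) has (d,a). However, (a,c) and (d,b) have no counterpart. I want to filter out these "unpaired" tuples.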

The final result should look like this:

DUMP R; 
(a, b) 
(a, d) 
(b, a) 
(d, a) 

How can I write this in Pig?

I can solve it with the following code, but the CROSS operation is too expensive:

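-- Duplicate A so it can be crossed with itself (fields are u1, u2)
A_cp = FOREACH A GENERATE u1, u2; 
-- Pair every tuple with every tuple: n * n rows
X = CROSS A, A_cp; 
-- Keep (x, y) only when the partner tuple is (y, x)
F = FILTER X BY ($0 == $3 AND $1 == $2); 
R = FOREACH F GENERATE $0, $1; 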
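To see why CROSS is costly, here is a minimal Python sketch (an in-memory illustration, not Pig) of what the script above computes: every row of A is paired with every row of A_cp, so n input tuples produce n × n candidate pairs before the FILTER runs. The function name `cross_filter` is hypothetical.

```python
from itertools import product


def cross_filter(rows):
    """Mimic CROSS + FILTER: keep (x, y) when some row (y, x) exists."""
    result = []
    comparisons = 0
    # product() yields every (left, right) combination, like Pig's CROSS
    for left, right in product(rows, rows):
        comparisons += 1
        # $0 == $3 AND $1 == $2  ->  left[0] == right[1] and left[1] == right[0]
        if left[0] == right[1] and left[1] == right[0]:
            # GENERATE $0, $1 keeps only the left-hand tuple
            result.append(left)
    return result, comparisons


A = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "a"), ("d", "b")]
R, n_pairs = cross_filter(A)
print(R)        # [('a', 'b'), ('a', 'd'), ('b', 'a'), ('d', 'a')]
print(n_pairs)  # 36
```

With only 6 tuples the CROSS already materializes 36 rows; the count grows quadratically with the input size.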

Answer

Here is the output of DESCRIBE A; DUMP A; on my data:

A: {first: chararray,second: chararray} 
(a,b) 
(a,c) 
(a,d) 
(b,a) 
(d,a) 
(d,b) 

Here is one way you can solve this:

A = LOAD 'foo.in' AS (first:chararray, second:chararray) ; 
-- Pig can't join a relation with itself, so we have to duplicate A 
A2 = FOREACH A GENERATE * ; 

-- Join the two copies so that rows pair up as (b,a,a,c), etc. 
B = JOIN A BY second, A2 BY first ; 

-- We only want pairs where the first field equals the last field. 
C = FOREACH (FILTER B BY A::first == A2::second) 
    -- Now we project out just one side of the pair. 
    GENERATE A::first AS first, A::second AS second ; 

Output:

C: {first: chararray,second: chararray} 
(b,a) 
(d,a) 
(a,b) 
(a,d) 
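The JOIN avoids the n × n blow-up because Pig only pairs rows whose join keys match. As a rough in-memory analogue of the JOIN + FILTER script above, the sketch below builds a hash index on `first`, probes it with each row's `second`, and then applies the same filter as C. It is an assumption-laden illustration: Pig's default join shuffles both inputs by key across reducers, and this sketch only mimics the result and the number of intermediate rows, not that execution. The function name `join_then_filter` is hypothetical.

```python
from collections import defaultdict


def join_then_filter(rows):
    """Mimic JOIN A BY second, A2 BY first, then FILTER on A::first == A2::second."""
    # Hash index on the duplicated relation: first field -> rows with that first field
    by_first = defaultdict(list)
    for row in rows:
        by_first[row[0]].append(row)

    joined = 0  # number of rows the JOIN produces before the FILTER
    result = []
    for left in rows:
        # Only rows whose first field equals left's second field are paired
        for right in by_first.get(left[1], []):
            joined += 1
            # FILTER B BY A::first == A2::second
            if left[0] == right[1]:
                # GENERATE A::first AS first, A::second AS second
                result.append(left)
    return result, joined


A = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "a"), ("d", "b")]
C, joined_rows = join_then_filter(A)
print(sorted(C))    # [('a', 'b'), ('a', 'd'), ('b', 'a'), ('d', 'a')]
print(joined_rows)  # 10
```

On the sample data the join produces 10 intermediate rows instead of the 36 produced by CROSS, and the final tuples are the same (Pig does not guarantee output order, hence the `sorted` call).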

Update: As WinnieNicklaus pointed out, this can be shortened to:

B = FOREACH (JOIN A BY (first, second), A2 BY (second, first)) 
    GENERATE A::first AS first, A::second AS second ; 
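Joining on the composite key (first, second) against (second, first) amounts to a membership test: a tuple survives exactly when its reverse is present. A minimal Python sketch of that idea, using a set as the lookup table (the function name `symmetric_pairs` is hypothetical):

```python
def symmetric_pairs(rows):
    """Keep (x, y) only if its reverse (y, x) also appears in rows."""
    present = set(rows)  # average O(1) membership checks
    return [(x, y) for (x, y) in rows if (y, x) in present]


A = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("d", "a"), ("d", "b")]
print(symmetric_pairs(A))  # [('a', 'b'), ('a', 'd'), ('b', 'a'), ('d', 'a')]
```

One difference worth noting: this version emits each input row at most once, whereas the Pig JOIN emits (a,b) once per matching (b,a) row, so if A can contain repeated tuples you may want to apply DISTINCT to A first. A self-pair such as (a,a) is its own reverse and is kept by both versions.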

Thanks, I'll try your code. I was able to get the job done with the following code, but the CROSS operation is too expensive: A_cp = FOREACH A GENERATE u1, u2; X = CROSS A, A_cp; F = FILTER X BY ($0 == $3 AND $1 == $2); R = FOREACH F GENERATE $0, $1; – user2730009

@user2730009 An inner join should be significantly cheaper than a CROSS. – mr2ert

It works fine! Thanks – user2730009