How to count occurrences based on input in the same file

event1   foo_id1 
event1   foo_id2 
event1   foo_id4 
event1   foo_id6 
event1   foo_id7 
event1   foo_id8 
event1   foo_id8 
event1   foo_id1 
event1   foo_id4 

event2   foo_id1 
event2   foo_id2 
event2   foo_id3 
event2   foo_id4 
event2   foo_id5 
event2   foo_id6 
event2   foo_id8 
event2   foo_id9 
event2   foo_id11

The data above is available as a file in S3 under a bucket (e.g. s3://hadoop.mycompany.com/bucket1/foo1.txt).

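Every event has foo_ids. For every foo_id that appears under "event2", I want to know how many times that foo_id occurred under event1.

For example, in the case above, the expected result is: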

foo_id1=2 
foo_id2=1 
foo_id3=0 
foo_id4=2 
foo_id5=0 
foo_id6=1 
foo_id8=2 
foo_id9=0 
foo_id11=0

How do I write a Hive script that returns the data in the expected format?


2013-05-04 brisk

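Hi, this can be done with the following Hive script.

First, create a Hive external table over the bucket with this command:

CREATE EXTERNAL TABLE events (event STRING, foo STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3n://hadoop.mycompany.com/bucket1/';

Then run the following query. The LEFT OUTER JOIN keeps every event2 row even when it has no match in event1, and count(e1.foo) skips the resulting NULLs, so unmatched foo_ids come back as 0:

SELECT e2.foo, count(e1.foo) FROM events e2 LEFT OUTER JOIN events e1 ON e1.foo = e2.foo AND e1.event = 'event1' WHERE e2.event = 'event2' GROUP BY e2.foo;

You should get the result you need, like this: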

foo_id1 2 
foo_id11 0 
foo_id2 1 
foo_id3 0 
foo_id4 2 
foo_id5 0 
foo_id6 1 
foo_id8 2 
foo_id9 0
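
If you want to sanity-check the join logic without a Hive cluster, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database as a stand-in for Hive (SQLite is not Hive, but this particular query uses only standard SQL), and the names ROWS, QUERY and count_event1_occurrences are made up for the illustration. The ORDER BY is added only so the output order is deterministic.

```python
import sqlite3

# Sample rows copied from the question, as (event, foo) pairs.
ROWS = [
    ("event1", "foo_id1"), ("event1", "foo_id2"), ("event1", "foo_id4"),
    ("event1", "foo_id6"), ("event1", "foo_id7"), ("event1", "foo_id8"),
    ("event1", "foo_id8"), ("event1", "foo_id1"), ("event1", "foo_id4"),
    ("event2", "foo_id1"), ("event2", "foo_id2"), ("event2", "foo_id3"),
    ("event2", "foo_id4"), ("event2", "foo_id5"), ("event2", "foo_id6"),
    ("event2", "foo_id8"), ("event2", "foo_id9"), ("event2", "foo_id11"),
]

# Same self-join as the Hive query; ORDER BY is an addition for stable output.
QUERY = """
SELECT e2.foo, COUNT(e1.foo)
FROM events e2
LEFT OUTER JOIN events e1
  ON e1.foo = e2.foo AND e1.event = 'event1'
WHERE e2.event = 'event2'
GROUP BY e2.foo
ORDER BY e2.foo
"""


def count_event1_occurrences(rows):
    """Load rows into an in-memory table (SQLite stand-in, an assumption)
    and return (foo, count) pairs for every foo listed under event2."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (event TEXT, foo TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for foo, n in count_event1_occurrences(ROWS):
        print(foo, n)
```

One thing to keep in mind: the sample data lists each foo_id only once under event2. If a foo_id could repeat under event2, the join would multiply the counts, so you would probably want to de-duplicate the event2 side first.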

Hope this solves your problem.


2013-05-07 19:34:53
