2015-12-19 31 views

Pig conditional count join

Let's say I have data as follows:

Hour status 
12  pass 
12  fail 
13  fail 
13  fail 
13  pass 

I need to compute the following result:

Hour passcount failcount TotalCount 
12 1   1   2 
13 1   2   3 

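I know I can achieve this by splitting the records with two separate filters, one for 'pass' and one for 'fail', counting each separately and joining them back together (as below):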

-- Count only the 'pass' records per hour.
pass_data = FILTER data BY (status matches 'pass');
pass_group = GROUP pass_data BY hour;
pass_count = FOREACH pass_group GENERATE group AS hour, COUNT(pass_data) AS pass_count;

-- Count all records per hour, then join the two counts on hour.
original_count = FOREACH (GROUP data BY hour) GENERATE group AS hour, COUNT(data) AS total_count;
joined = JOIN original_count BY hour, pass_count BY hour;

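But I don't like the solution above, mainly because it takes many lines of code, and in practice there are several statuses besides 'pass' and 'fail'. What I'm looking for is something like the following: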

awesome_count= foreach (group data by hour) generate flatten(group),count($1) as total_count , count($1.status=='pass'?0:1) as pass_count ; 

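The above does not work, mainly because status is inside my bag. I also tested the same expression on a simple field, and Pig doesn't like that either; it throws all sorts of errors. Is there a better approach or syntax I can use?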

Answers


For your input, you can try a nested FOREACH statement. The logic below should help:

records = LOAD '/home/user/localinputfiles/pass_fail.txt' USING PigStorage('\t') AS (hour:int, result:chararray);

records_grp = GROUP records BY hour;

records_each = FOREACH records_grp
        {
         -- String comparison is case-sensitive; match the values as they appear in the input.
         passed_bag = FILTER records BY result == 'pass';
         failed_bag = FILTER records BY result == 'fail';

        GENERATE group, COUNT(passed_bag) AS pass_cnt, COUNT(failed_bag) AS fail_cnt, COUNT(records) AS total_cnt;
        };

DUMP records_each;
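
If the number of statuses grows, an alternative to one nested FILTER per status is to turn each status into a 0/1 flag before grouping and then SUM the flags. This is only a sketch of that idea, close to the one-liner attempted in the question. It assumes the field names `hour` and `status` from the question, tab-separated input, and a hypothetical input path `pass_fail.txt`.

```pig
-- Hypothetical input path; columns assumed to be hour <TAB> status.
data = LOAD 'pass_fail.txt' USING PigStorage('\t') AS (hour:int, status:chararray);

-- Convert each status into a 0/1 flag with the bincond operator (cond ? a : b).
flagged = FOREACH data GENERATE
    hour,
    (status == 'pass' ? 1 : 0) AS is_pass,
    (status == 'fail' ? 1 : 0) AS is_fail;

-- Summing the flags per hour gives the conditional counts in a single pass.
counts = FOREACH (GROUP flagged BY hour) GENERATE
    group AS hour,
    SUM(flagged.is_pass) AS pass_count,
    SUM(flagged.is_fail) AS fail_count,
    COUNT(flagged) AS total_count;

DUMP counts;
```

For the sample data in the question, this should give the same per-hour counts as the table there (1, 1, 2 for hour 12 and 1, 2, 3 for hour 13). Adding another status means adding one flag line and one SUM line. Note that COUNT skips tuples whose first field is null, so rows with a null hour would need COUNT_STAR instead.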

Works perfectly. Thanks. – user1581220