2014-10-16 83 views

I am trying to filter out NULL values with Pig. I have a table that I load as follows:

A = load 'customer' using PigStorage('|'); 

Here are some of the rows in customer:

7|Ron|[email protected] 
8|Rina 
9|Don|[email protected] 
9|Don|[email protected] 
10|Maya|[email protected] 

11|marry|[email protected] 

When I use the following ...

B = DISTINCT A; 
A_CLEAN = FILTER B by ($0 is not null) AND ($1 is not null) AND ($2 is not null); 

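it also eliminates the 8|Rina row.

How do I remove blank lines with Pig?

Is there a way I can try something like A_CLEAN = filter B by not IsNULL() ???

I am new to Pig, so I am not sure what I should put inside IsNULL ...

Thanks

A_CLEAN = filter B by not IsEmpty(B);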

Answers

0
Tarun, instead of the AND condition, why not use an OR condition?
     A_CLEAN = FILTER B by ($0 is not null) OR ($1 is not null) OR ($2 is not null); 
This removes the rows in which every column is null and keeps any row in which at least one column is not null.
Can you try it and let me know whether it works for all your cases?

UPDATE:
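I am not sure why IsEmpty() did not work for you; it works for me. IsEmpty can only be used with bags, so I convert all the fields to a bag and test whether that bag is empty. See the working code below.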

input.txt 
7|Ron|[email protected] 
8|Rina 
9|Don|[email protected] 
9|Don|[email protected] 
10|Maya|[email protected] 

11|marry|[email protected] 

Pig script: 
A = LOAD 'input.txt' USING PigStorage('|'); 
B = DISTINCT A; 
A_CLEAN = FILTER B BY NOT IsEmpty(TOBAG($0..)); 
DUMP A_CLEAN; 

Output: 
(8,Rina ) 
(7,Ron,[email protected]) 
(9,Don,[email protected]) 
(10,Maya,[email protected]) 
(11,marry,[email protected]) 

As for your other question, it comes down to a simple Boolean evaluation:

In case of AND, 
8|Rina 
will be treated as 
($0 is not null) AND ($1 is not null) AND ($2 is not null) 
(true) AND (true) AND (false) 
(false) -->so this record will be skipped by Filter command 

In case of OR, 
8|Rina 
will be treated as 
($0 is not null) OR ($1 is not null) OR ($2 is not null) 
(true) OR (true) OR (false) 
(true) -->so this record will be included into the relation by Filter command 

In case of empty record, 
<empty record> 
will be treated as 
($0 is not null) OR ($1 is not null) OR ($2 is not null) 
(false) OR (false) OR (false) 
(false) -->so this record will be skipped by Filter command 
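
The three cases above can be checked mechanically. What follows is a minimal Python sketch, not Pig code. It assumes that a missing or empty field is read as null (modelled here as `None`). The helper names `parse_row`, `keep_with_and` and `keep_with_or` are made up for illustration, and `ron@example.com` is a placeholder address.

```python
from typing import List, Optional

NUM_FIELDS = 3  # assumption: rows have at most three '|'-separated fields


def parse_row(line: str) -> List[Optional[str]]:
    """Split a line on '|' and model missing or empty fields as None (null)."""
    parts = line.strip().split("|")
    fields = [p if p != "" else None for p in parts]
    # Pad short rows so that $0, $1 and $2 can always be looked up.
    fields += [None] * (NUM_FIELDS - len(fields))
    return fields[:NUM_FIELDS]


def keep_with_and(row: List[Optional[str]]) -> bool:
    """Model of: ($0 is not null) AND ($1 is not null) AND ($2 is not null)."""
    return all(field is not None for field in row)


def keep_with_or(row: List[Optional[str]]) -> bool:
    """Model of: ($0 is not null) OR ($1 is not null) OR ($2 is not null)."""
    return any(field is not None for field in row)


# A full row, a row with a missing third field, and a blank line.
for line in ["7|Ron|ron@example.com", "8|Rina", ""]:
    row = parse_row(line)
    print(repr(line), "AND keeps:", keep_with_and(row), "OR keeps:", keep_with_or(row))
```

Under these assumptions the AND version keeps only the fully populated row, while the OR version drops only the blank line.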

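Yes ... the first one worked ... NOT IsEmpty did not work. I do not understand the logic here ... AND should work, since we are looking for rows where all three columns are empty ... Thanks – Tarun 2014-10-20 21:25:02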


I have updated my answer, please check it. – 2014-10-21 09:34:24

2

Try the following:

A = LOAD 'customer' USING PigStorage('|'); 
B = DISTINCT A; 
A_CLEAN = FILTER B BY NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL)); 
DUMP A_CLEAN; 

This produces the output:

(8,Rina)
(7,Ron,ron@abc.com)
(9,Don,DMES@xyz.com)
(10,Maya,maya@cnn.com)
(11,marry,mary@abc.com)

In Pig, you cannot test a tuple for emptiness.
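
The filter in this answer and the OR-based filter from the other answer are the same condition written two ways (De Morgan's law). That is also why the AND of the three IS NOT NULL tests is stricter than what the question asks for. Here is a small Python sketch, not Pig code, that checks this over every combination of null and non-null fields. `None` stands in for null, and the function names are illustrative only.

```python
from itertools import product


def not_all_null(row):
    """Model of: NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL))."""
    return not all(field is None for field in row)


def any_not_null(row):
    """Model of: ($0 is not null) OR ($1 is not null) OR ($2 is not null)."""
    return any(field is not None for field in row)


def all_not_null(row):
    """Model of: ($0 is not null) AND ($1 is not null) AND ($2 is not null)."""
    return all(field is not None for field in row)


# Every combination of null (None) and non-null ("x") across three fields.
rows = list(product([None, "x"], repeat=3))

# The first two filters agree on all 8 combinations.
assert all(not_all_null(r) == any_not_null(r) for r in rows)

# The AND filter keeps only rows in which every field is populated.
print([r for r in rows if all_not_null(r)])
```

In this model the AND filter rejects seven of the eight combinations, including a row like 8|Rina whose third field is missing, whereas the other two filters reject only the all-null row.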


This worked for me: FILTER B BY (($0 IS NOT NULL) AND ($1 IS NOT NULL) AND ($2 IS NOT NULL)); – PanwarS87 2017-02-15 21:28:10