2017-09-02 59 views
1
A = LOAD 'Batting.csv' USING PigStorage(','); 
B = foreach A generate $0 as id:int,$1 as year:int,$8 as run:int; 
C = FILTER B by year==1956; 

但是DUMP C返回0條記錄。但是檔案中有1956年的記錄。在Pig中返回0條記錄的過濾器命令

的樣本數據:

playerID,yearID,stint,teamID,lgID,G,G_batting,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP,G_old 
aardsda01,2004,1,SFN,NL,11,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11 
aardsda01,2006,1,CHN,NL,45,43,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,45 
aardsda01,2007,1,CHA,AL,25,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2 
aardsda01,2008,1,BOS,AL,47,5,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,5 
aardsda01,2009,1,SEA,AL,73,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
aardsda01,2010,1,SEA,AL,53,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
aaronha01,1954,1,ML1,NL,122,122,468,58,131,27,6,13,69,2,2,28,39,,3,6,4,13,122 
aaronha01,1955,1,ML1,NL,153,153,602,105,189,37,9,27,106,3,1,49,61,5,3,7,4,20,153 
aaronha01,1956,1,ML1,NL,153,153,609,106,200,34,14,26,92,2,4,37,54,6,2,5,7,21,153 
aaronha01,1957,1,ML1,NL,151,151,615,118,198,27,6,44,132,1,1,57,58,15,0,0,3,13,151 
aaronha01,1958,1,ML1,NL,153,153,601,109,196,34,4,30,95,4,1,59,49,16,1,0,3,21,153 
aaronha01,1959,1,ML1,NL,154,154,629,116,223,46,7,39,123,8,0,51,54,17,4,0,9,19,154 
aaronha01,1960,1,ML1,NL,153,153,590,102,172,20,11,40,126,16,7,60,63,13,2,0,12,8,153 
aaronha01,1961,1,ML1,NL,155,155,603,115,197,39,10,34,120,21,9,56,64,20,2,1,9,16,155 

自卸乙

(zuvelpa01,1984,2) 
(zuvelpa01,1985,16) 
(zuvelpa01,1986,2) 
(zuvelpa01,1987,2) 
(zuvelpa01,1988,9) 
(zuvelpa01,1989,10) 
(zuvelpa01,1991,0) 
(zuverge01,1951,0) 
(zuverge01,1952,1) 
(zuverge01,1954,1) 
(zuverge01,1954,1) 
(zuverge01,1955,0) 
(zuverge01,1955,1) 
(zuverge01,1956,0) 
(zuverge01,1957,1) 
(zuverge01,1958,0) 
(zuverge01,1959,0) 
(zwilldu01,1910,7) 
(zwilldu01,1914,91) 
(zwilldu01,1915,65) 
(zwilldu01,1916,4) 
+0

後的樣本數據,最有可能你是不是引用右側立柱。 –

+0

'DUMP B'來查看你在那個關係中得到了什麼。 – philantrovert

+0

編輯了兩個問題的細節 – Harsh

回答

1

B不是完全必要的測試篩選功能的工作......

$ cat batting.pig 
A = LOAD 'Batting.csv' USING PigStorage(','); 
C = FILTER A by (int)$1==1956; 
\d C 

你確實需要從文件中剝離標題。然後你可以實際將數據轉換爲整數。

Hadoop Pig - Removing csv header

或者,只是使用CLI工具,從您的文件

$ sed -i '' 1d Batting.csv 
$ pig -f batting.pig 
... 
(aaronha01,1956,1,ML1,NL,153,153,609,106,200,34,14,26,92,2,4,37,54,6,2,5,7,21,153)