2014-10-06 66 views
0

我有35個Csv文件我想用Pig加載數據。我有以下嘗試在PIG中加載多個文件

1) A = LOAD '/home/mrinmoy/Desktop/Sampath Project/Household/{HLPCA-00000,HLPCA-01000,HLPCA-02000,HLPCA-03000,HLPCA-04000,HLPCA-05000,HLPCA-06000,HLPCA-07000,HLPCA-08000,HLPCA-09000,HLPCA-10000,HLPCA-11000,HLPCA-12000,HLPCA-13000,HLPCA-14000,HLPCA-15000,HLPCA-16000,HLPCA-17000,HLPCA-18000,HLPCA-19000,HLPCA-20000,HLPCA-21000,HLPCA-22000,HLPCA-23000,HLPCA-24000,HLPCA-25000,HLPCA-26000,HLPCA-27000,HLPCA-28000,HLPCA-29000,HLPCA-30000,HLPCA-31000,,HLPCA-32000,,HLPCA-33000,,HLPCA-34000,,HLPCA-35000}.csv' UsingPigStorage(','); 

對於這個嘗試我已經得到了錯誤

014-10-06 00:32:07,130 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Can not create a Path from an empty string 
Details at logfile: /home/mrinmoy/Desktop/Sampath Project/Household/pig_1412580582549.log 

在接下來的嘗試,我已經改變腳本使用SomeLoader()試了一下;

2) A = LOAD '/home/mrinmoy/Desktop/Sampath Project/Household/{HLPCA-00000,HLPCA-01000,HLPCA-02000,HLPCA-03000,HLPCA-04000,HLPCA-05000,HLPCA-06000,HLPCA-07000,HLPCA-08000,HLPCA-09000,HLPCA-10000,HLPCA-11000,HLPCA-12000,HLPCA-13000,HLPCA-14000,HLPCA-15000,HLPCA-16000,HLPCA-17000,HLPCA-18000,HLPCA-19000,HLPCA-20000,HLPCA-21000,HLPCA-22000,HLPCA-23000,HLPCA-24000,HLPCA-25000,HLPCA-26000,HLPCA-27000,HLPCA-28000,HLPCA-29000,HLPCA-30000,HLPCA-31000,,HLPCA-32000,,HLPCA-33000,,HLPCA-34000,,HLPCA-35000}.csv' using SomeLoader(); 

但我得到的錯誤說這個

2014-10-06 00:39:42,905 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve SomeLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Details at logfile: /home/mrinmoy/Desktop/Sampath Project/Household/pig_1412580912789.log 

回答

4

豬總是會加載一個目錄中的所有文件。所以你只需要用你的CSV文件指定目錄。

A = LOAD '/home/mrinmoy/Desktop/Sampath Project/Household/' using PigStorage(',');

另外請注意usingPigStorage()缺少一個空格。它應該是using PigStorage()。 而你有一些雙逗號:...HLPCA-31000,,HLPCA-32000,,HLPCA-33000,,HLPCA-34000,,HLPCA-35000}...

0

豬支持提供文件名作爲正則表達式。所以,你可以提供類似:

A = LOAD '/home/mrinmoy/Desktop/Sampath Project/Household/HLPCA*' Using PigStorage(','); 

,它會加載與來自家庭目錄「HLPCA」開頭的所有文件。