FileNotFoundException when reading a file from the Hadoop distributed cache

I am having trouble running a Hadoop job: I get a FileNotFoundException when trying to retrieve a file from the distributed cache, even though the file exists. When I run the job on my local filesystem, it works.

The cluster is hosted on Amazon Web Services and runs Hadoop 1.0.4 with Java 1.7. I have no control over the cluster or how it is set up.

In the main function I add the file to the distributed cache. This seems to work fine, or at least it does not throw any exception.

.... 
JobConf conf = new JobConf(Driver.class); 
conf.setJobName("mean"); 
conf.set("lookupfile", args[2]); 
Job job = new Job(conf); 
DistributedCache.addCacheFile(new Path(args[2]).toUri(), conf); 
...

In the setup function, which is called before map, I create the path to the file and call a function that loads the file into a hash map.

Configuration conf = context.getConfiguration(); 
String inputPath = conf.get("lookupfile");       
Path dataFile = new Path(inputPath); 
loadHashMap(dataFile, context);

The exception occurs on the first line of the function that loads the hash map.

brReader = new BufferedReader(new FileReader(filePath.toString()));

I start the job like this.

hadoop jar Driver.jar Driver /tmp/input output /tmp/DATA.csv

I get the following error:

Error: Found class org.apache.hadoop.mapreduce.Counter, but interface was expected 
attempt_201410300715_0018_m_000000_0: java.io.FileNotFoundException: /tmp/DATA.csv (No such file or directory) 
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.open(Native Method) 
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.<init>(FileInputStream.java:146) 
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.<init>(FileInputStream.java:101) 
attempt_201410300715_0018_m_000000_0: at java.io.FileReader.<init>(FileReader.java:58) 
attempt_201410300715_0018_m_000000_0: at Map.loadHashMap(Map.java:49) 
attempt_201410300715_0018_m_000000_0: at Map.setup(Map.java:98) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:259) 
attempt_201410300715_0018_m_000000_0: at java.security.AccessController.doPrivileged(Native Method) 
attempt_201410300715_0018_m_000000_0: at javax.security.auth.Subject.doAs(Subject.java:415) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1140) 
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.Child.main(Child.java:253) 
14/11/01 02:12:49 INFO mapred.JobClient: Task Id : attempt_201410300715_0018_m_000001_0, Status : FAILED

I have verified that the file exists, both in HDFS and on the local filesystem.

$ hadoop fs -ls /tmp 
Found 2 items 
drwxr-xr-x - hadoop supergroup   0 2014-10-30 11:19 /tmp/input 
-rw-r--r-- 1 hadoop supergroup  428796 2014-10-30 11:19 /tmp/DATA.csv 

$ ls -al /tmp/ 
-rw-r--r-- 1 hadoop hadoop 428796 Oct 30 11:30 DATA.csv

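I really don't understand what is wrong here. The exception lists the correct path for the file, and I have verified that the file exists in both HDFS and the local filesystem. What am I missing?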


2014-11-01 TheSjiraffen123

The input to the BufferedReader should come from the path returned by DistributedCache.getLocalCacheFiles() in setup(). Something more like this:

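// Inside setup(Context context): ask for the task-local copies of the cached files.
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration()); 
if (localFiles != null && localFiles.length > 0) { 
    // Read the local copy rather than the path that was passed on the command line.
    brReader = new BufferedReader(new FileReader(localFiles[0].toString())); 
}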
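Once setup() has a local path, the loading step itself can be sketched roughly as below. This is only an illustration: the class name LookupLoader is hypothetical, and the two-column "key,value" layout is an assumption, since the question does not show what DATA.csv contains or what loadHashMap does with each line.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the name and the "key,value" file layout are assumptions.
class LookupLoader {
    // Reads "key,value" lines from a file on the task's local disk into a map.
    static Map<String, String> load(String localPath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        // try-with-resources closes the reader even if a read fails.
        try (BufferedReader reader = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on the first comma only, so values may contain commas.
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0].trim(), parts[1].trim());
                }
                // Lines without a comma are skipped silently.
            }
        }
        return lookup;
    }
}
```

In setup() this would be called with localFiles[0].toString(), i.e. the path returned by the distributed cache, not the path given on the command line.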


2015-03-24 17:17:26 shiva

I faced the same problem, and the following code worked for me:

Configuration conf = context.getConfiguration(); 
URI[] uriList = DistributedCache.getCacheFiles(conf); 
BufferedReader br = new BufferedReader(new FileReader(uriList[0].getPath()));

As you can see, I use the getCacheFiles method here, then take the path of the file and read it.
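One thing worth keeping in mind with this approach: getCacheFiles returns the URIs as they were registered, and URI.getPath() keeps only the path component, which FileReader then looks up on the node's local disk. A small sketch of that behaviour, using only java.net.URI (the class name CacheUriDemo and the host "namenode:8020" are made up for illustration):

```java
import java.net.URI;

// Hypothetical demo class; only java.net.URI behaviour is shown here.
class CacheUriDemo {
    // Returns the string that would be handed to FileReader in the snippet above.
    static String pathForFileReader(String cacheUri) {
        // getPath() drops the scheme and authority and keeps only the path.
        return URI.create(cacheUri).getPath();
    }
}
```

For "hdfs://namenode:8020/tmp/DATA.csv" this yields "/tmp/DATA.csv", so the read presumably succeeds only when a file also exists at that same path on the local filesystem of the node running the task.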


2016-10-06 06:37:57 Pushkin
