Selecting the input folder in Hadoop

The input files are in the training_set folder, with names like this:

mv_000000 
mv_000001 
mv_000002 
...

The index is available in movie_title.txt.

The movie IDs found in movie_title.txt look like this:

1,2003,Dinosaur Planet 
2,2004,Isle of Man TT 2004 Review 
3,1997,Character 
4,1994,Paula Abdul's Get Up & Dance 
5,2004,The Rise and Fall of ECW 
...

The first column is the index (ID) of a particular movie title.

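I am practicing Hadoop basics on the Netflix Prize dataset. Suppose I pass in a specific movie title, such as "sick". The program should then look in movie_title.txt for the movie title ID of "sick", and finally set the input path from that movie title ID.

For example, if my Hadoop launch command is:

hadoop jar ~ [input path] [output path] [movieA name]

then the input path must be set to training_set/mv_movieAIndex.

As I said, the movie ID information is in movie_title.txt.

Please give me a hint on how to solve this problem.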

2014-11-09 Jungseok Cho

What is your end goal? I mean, what are you emitting from map-reduce as output? – SMA 2014-11-09 06:19:44

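Your requirement does not seem to have anything to do with Hadoop itself. All you need is a lookup of the id for the movieName given as the third argument of the hadoop jar command. The following snippet will do the job: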

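// Imports needed by this method
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Builds a title -> id lookup from lines of the form "id,year,title".
private static Map<String, Integer> getMovieMappings(String filePath)
        throws IOException {
    Map<String, Integer> movieMap = new HashMap<String, Integer>();
    // try-with-resources closes the reader even if parsing fails
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        String line;
        while ((line = br.readLine()) != null) {
            // A limit of 3 keeps commas that appear inside the title itself
            String[] temp = line.split(",", 3);
            if (temp.length < 3) {
                continue; // skip blank or malformed lines
            }
            movieMap.put(temp[2].trim(), Integer.parseInt(temp[0].trim()));
        }
    }
    return movieMap;
}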
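The map above only matches a title that is typed exactly as it appears in the file. If the name on the command line may differ in case or padding (the question's example is "sick"), one option is to normalize the keys before storing and before looking up. This is only a sketch under that assumption: `TitleIndex`, `normalize`, `build` and `lookup` are hypothetical names, and the sample lines are copied from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: a case-insensitive title -> id index.
final class TitleIndex {

    // Trim and lower-case a title so lookups ignore case and surrounding spaces.
    static String normalize(String title) {
        return title.trim().toLowerCase(Locale.ROOT);
    }

    // Build the index from lines of the form "id,year,title".
    static Map<String, Integer> build(List<String> lines) {
        Map<String, Integer> index = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 3); // keep commas inside the title
            if (parts.length < 3) {
                continue; // skip blank or malformed lines
            }
            index.put(normalize(parts[2]), Integer.parseInt(parts[0].trim()));
        }
        return index;
    }

    // Returns the id, or null when no title matches.
    static Integer lookup(Map<String, Integer> index, String title) {
        return index.get(normalize(title));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "1,2003,Dinosaur Planet ",
                "3,1997,Character ");
        Map<String, Integer> index = build(sample);
        System.out.println(lookup(index, "dinosaur planet"));
    }
}
```

Note that if two titles normalize to the same key, the later line overwrites the earlier one, so this only works when normalized titles are unique.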

Now, in the driver, just get the map and set the inputPath accordingly:

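Map<String, Integer> movieMap = getMovieMappings("/pathTo/movie_title.txt");
// args[2] is the movie name passed on the command line
Integer movieId = movieMap.get(args[2]);
if (movieId == null) {
    throw new IllegalArgumentException("Unknown movie title: " + args[2]);
}
// Zero-pad the id to six digits to match names such as mv_000001
String inputName = String.format("mv_%06d", movieId);
System.out.println(inputName);
FileInputFormat.addInputPath(job, new Path("training_set", inputName));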
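The driver snippet hardcodes training_set, while the command in the question also passes an [input path]. If that first argument is meant to be the base folder (an assumption on my part), the path can be resolved in a small Hadoop-free helper and then handed to FileInputFormat.addInputPath. `InputPathResolver`, `inputPathFor` and `resolve` are hypothetical names used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns the command-line arguments into an input path string.
final class InputPathResolver {

    // Assumed layout, taken from the question: <baseDir>/mv_<six-digit id>
    static String inputPathFor(String baseDir, int movieId) {
        return baseDir + "/" + String.format("mv_%06d", movieId);
    }

    // args are assumed to be: [input path] [output path] [movie name]
    static String resolve(String[] args, Map<String, Integer> movieMap) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                    "Usage: [input path] [output path] [movie name]");
        }
        Integer id = movieMap.get(args[2]);
        if (id == null) {
            throw new IllegalArgumentException("No movie titled: " + args[2]);
        }
        return inputPathFor(args[0], id);
    }

    public static void main(String[] args) {
        Map<String, Integer> movieMap = new HashMap<>();
        movieMap.put("Character", 3); // sample entry from the question
        String[] sampleArgs = {"training_set", "output", "Character"};
        System.out.println(resolve(sampleArgs, movieMap));
    }
}
```

In the driver, the returned string would go into new Path(...). A title containing spaces has to be quoted on the command line so that it arrives as a single args[2].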

Hope it helps.

2014-11-10 08:41:24 blackSmith
