Why can't my Hadoop program "do it next"?

Hello everyone! I have a Hadoop program that I run from Eclipse. The source code is:

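import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (token, 1) for every whitespace-separated token in each input line.
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}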

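import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted for each word; also used as the combiner.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}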

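import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] oargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (oargs.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            // Stop here; otherwise oargs[0] / oargs[1] below may be out of bounds.
            System.exit(2);
        }
        System.out.println("input: " + oargs[0]);
        System.out.println("output: " + oargs[1]);
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(oargs[0]));
        FileOutputFormat.setOutputPath(job, new Path(oargs[1]));
        System.out.println("==============================");
        System.out.println("start ...");
        // Blocks until the job finishes; true means progress is printed to the console.
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.out.println("end ...");
        System.out.println("==============================");
    }
}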

Here is the result; please see the log:

[email protected] /cygdrive/f/develop/hadoop/hadoop-1.0.3 
$ ./bin/hadoop jar ./jar/wordcount.jar /tmp/input /tmp/output 
input: /tmp/input 
output: /tmp/output 
============================== 
start ... 
12/07/25 14:59:17 INFO input.FileInputFormat: Total input paths to process : 2 
12/07/25 14:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
12/07/25 14:59:17 WARN snappy.LoadSnappy: Snappy native library not loaded 
12/07/25 14:59:17 INFO mapred.JobClient: Running job: job_201207251447_0001 
12/07/25 14:59:18 INFO mapred.JobClient: map 0% reduce 0%

The log goes no further and stays there forever. Why?

I am running the code in local mode, using Cygwin on Windows XP.

2012-07-23 rory

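What do you mean by "do it next"? What do you expect it to do? Normally you have to wait until your cluster has processed the job and returned control to you; that is what 'waitForCompletion' means. If your job is not successful, you exit the JVM. – 2012-07-23 08:03:57

Can you post the task logs for either of the two map tasks that should have run? You can reach them through the JobTracker web UI at http://localhost:50030 – 2012-07-25 10:27:48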

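@rory, as Thomas asked, can you be more specific about "do it next"? Is this the whole stack trace you get on screen? Do you mean that you compiled it once, got the result, and cannot run it a second time? Have you specified the correct input arguments for the program in the Eclipse IDE, i.e. the input and output directories?

If you mean that you cannot run the program again, it may be because you did not specify a different output directory. But after seeing the stack trace, I don't think that is the case.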
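For context on the output-directory point: Hadoop's FileOutputFormat refuses to start a job whose output directory already exists, so a second run against the same path fails at submission. One way around it is to derive a fresh directory name per run. The following is only a minimal sketch in plain Java; the class name OutputDirs and the helper uniqueOutputDir are hypothetical and not part of Hadoop or of the original program.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class OutputDirs {
    // Hypothetical helper: appends a "-yyyyMMdd-HHmmss" suffix to the base
    // path so that each run writes to a directory that does not exist yet.
    static String uniqueOutputDir(String base, Date now) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss");
        // Drop a single trailing slash so the suffix attaches to the name.
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "-" + fmt.format(now);
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "/tmp/output";
        System.out.println(uniqueOutputDir(base, new Date()));
    }
}
```

The returned string could then be passed as the second program argument in place of a fixed /tmp/output.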

2012-07-23 13:37:13

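Thanks Arun. What I mean is that when I debug my code up to 'job.waitForCompletion(true)', it does not continue and stays there forever. – rory 2012-07-24 08:12:58

If you are asking why you never see the "end ====================" println output, then check your code: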

System.exit(job.waitForCompletion(true)?0:1); 
System.out.println("end ..."); 
System.out.println("==============================");

You wrapped the job.waitForCompletion(true) call in System.exit, so the JVM terminates before the last two System.out calls can execute.
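To illustrate the ordering, here is a minimal plain-Java sketch that prints the closing lines first and only then exits with the job's status. It does not use Hadoop: runJob is a hypothetical stand-in for job.waitForCompletion(true), and the class name ExitAfterLogging and the method finish are likewise made up for this example.

```java
import java.io.PrintStream;

public class ExitAfterLogging {
    // Hypothetical stand-in for job.waitForCompletion(true).
    static boolean runJob() {
        return true;
    }

    // Prints the closing banner first, then returns the process exit code
    // (0 for success, 1 for failure).
    static int finish(boolean success, PrintStream out) {
        out.println("end ...");
        out.println("==============================");
        return success ? 0 : 1;
    }

    public static void main(String[] args) {
        boolean success = runJob();
        // System.exit is the last statement, so nothing after it is skipped.
        System.exit(finish(success, System.out));
    }
}
```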

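Edit

The log appender/logger message here is a clue that other exceptions may be getting swallowed. You should change the signature of your code so that it uses the ToolRunner utility: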

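import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner needs a Tool; extending Configured supplies getConf()/setConf().
public class WordCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // Pass the job status on as the process exit code.
        System.exit(ToolRunner.run(new WordCount(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            return 2;
        }
        System.out.println("input: " + args[0]);
        System.out.println("output: " + args[1]);
        Job job = new Job(getConf(), "word count");
        // Apply any further job settings through the job's own configuration.
        Configuration conf = job.getConfiguration();

        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.out.println("==============================");
        System.out.println("start ...");
        int result = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("end ...");
        System.out.println("==============================");

        return result;
    }
}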

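You should also use the $HADOOP_HOME/bin/hadoop script to submit your job to the cluster (as below; substitute the name of your jar and the fully qualified name of the WordCount class):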

#> hadoop jar wordcount.jar WordCount input output

2012-07-24 00:48:30

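Thanks Chris! I do want to see the "end ====================" println, but my problem is not 'System.exit'. When I debug my code up to 'job.waitForCompletion(true)', it does not continue. – rory 2012-07-24 08:06:40

Thanks Chris! I do want to see the 'end ====================' println, but my problem is not caused by 'System.exit'. When I debug my code up to 'job.waitForCompletion(true)', it does not continue and stays there forever. – rory 2012-07-24 08:15:41

Is your job even being submitted? The warning messages look suspicious; you shouldn't be seeing messages about appenders/loggers. – 2012-07-24 10:39:46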
