2011-04-16 90 views
I have two Map/Reduce classes, named MyMapper1/MyReducer1 and MyMapper2/MyReducer2, and want to use the output of MyReducer1 as the input of MyMapper2, by setting the input path of job2 to the output path of job1.

The types are as follows:

public class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, IntArrayWritable>
public class MyReducer1 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyMapper2 extends Mapper<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyReducer2 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntWritable>

public class IntArrayWritable extends ArrayWritable { 
    public IntArrayWritable() { 
     super(IntWritable.class); 
    } 
} 

and the code that sets the input/output paths looks like this:

Path temppath = new Path("temp-dir-" + temp_time);

FileOutputFormat.setOutputPath(job1, temppath);
...........
FileInputFormat.addInputPath(job2, temppath);

The code that sets the input/output formats is as follows:

job1.setOutputFormatClass(TextOutputFormat.class);
..........
job2.setInputFormatClass(KeyValueTextInputFormat.class);

But when I run job2, I always get this exception:

11/04/16 12:34:09 WARN mapred.LocalJobRunner: job_local_0002 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable 
    at ligon.MyMapper2.map(MyMapper2.java:1) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 

I have tried changing the InputFormat and OutputFormat, but without success; similar (though not identical) exceptions occur in job2.

My complete code package is at: http://dl.dropbox.com/u/7361939/HW2_Q1.zip

Thank you very much!

Answers


The problem is that in job2, KeyValueTextInputFormat produces key-value pairs of type <Text, Text>, and you are trying to process them with a Mapper that accepts <IntWritable, IntArrayWritable>, which causes the ClassCastException. Your best option is to change your mapper to accept <Text, Text> and convert the Text values to integers yourself.
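As a plain-Java sketch of the conversion this answer suggests (the class and method names here are hypothetical, not from the question's code): KeyValueTextInputFormat splits each line at the first tab, handing the mapper the left part as the Text key and the rest as the Text value, so the mapper has to parse those strings back into ints itself.

```java
// Hypothetical helper showing the Text-to-int conversion step.
// KeyValueTextInputFormat gives the mapper the text before the first
// tab as the key and the remainder as the value, both as Text.
public class TextToIntParser {

    /** Parse a whitespace-separated run of integers, e.g. "1 2 3". */
    static int[] parseInts(String s) {
        String[] parts = s.trim().split("\\s+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulate one record as KeyValueTextInputFormat would split it:
        String line = "7\t1 2 3";
        int tab = line.indexOf('\t');
        int key = Integer.parseInt(line.substring(0, tab));
        int[] values = parseInts(line.substring(tab + 1));
        System.out.println(key + " -> " + java.util.Arrays.toString(values));
    }
}
```

Inside the real MyMapper2 (declared as Mapper<Text, Text, ...>), the same parsing would run on key.toString() and value.toString() before wrapping the results back into IntWritable/IntArrayWritable.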


Thanks. The problem now is that the ArrayWritable output by the first reducer looks like the following, without any element values: `ligon.IntArrayWritable@...` repeated once per element. How can I make the second mapper accept this and convert the string back into an object? – 2011-04-17 14:31:07
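Those strings are what the default Object.toString() of the value class produces, since TextOutputFormat writes values by calling toString() on them. One common fix (an assumption here, not stated in the thread) is to override toString() in IntArrayWritable to emit the element values. A minimal plain-Java sketch of the idea, without the Hadoop dependency:

```java
// Plain-Java sketch: print the array's elements instead of the default
// ClassName@hash form. In the real code this join logic would live in a
// toString() override inside the IntArrayWritable subclass of
// ArrayWritable, so TextOutputFormat writes the values themselves.
public class IntArrayToString {

    static String join(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new int[]{1, 2, 3}));
    }
}
```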


I have the same problem and get the same error. I want to use the output of one hadoop job as the input of a second hadoop job. The output of the first job has a MapWritable as its value. The solution for the second job is job.setInputFormatClass(), but which parameter should I use? – Yeameen 2012-04-20 07:21:51


I just faced the same problem and figured out the solution a while ago. Since you are using IntArrayWritable as the output of your Reducer, the easy route is to write, and later read back, the data in binary form.

For the first job:

job1.setOutputFormatClass(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(IntWritable.class);
job1.setOutputValueClass(IntArrayWritable.class);

For the second job:

job2.setInputFormatClass(SequenceFileInputFormat.class); 

This should work in your case.
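A sketch of a driver chaining the two jobs through SequenceFiles, assuming the mapper/reducer classes and temp-path idea from the question (the Driver class name, argument handling, and use of Job.getInstance — a Hadoop 2.x API — are assumptions, not from the original code). SequenceFileOutputFormat stores the (IntWritable, IntArrayWritable) pairs in binary, so job2 reads back the exact types with no text parsing. No assertions are given here since running it requires a Hadoop runtime; treat it as a configuration sketch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path temppath = new Path("temp-dir-" + System.currentTimeMillis());

        // Job 1: write (IntWritable, IntArrayWritable) pairs in binary.
        Job job1 = Job.getInstance(conf, "job1");
        job1.setMapperClass(MyMapper1.class);
        job1.setReducerClass(MyReducer1.class);
        job1.setOutputKeyClass(IntWritable.class);
        job1.setOutputValueClass(IntArrayWritable.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, temppath);
        job1.waitForCompletion(true);

        // Job 2: read the same types straight back from the SequenceFile.
        Job job2 = Job.getInstance(conf, "job2");
        job2.setMapperClass(MyMapper2.class);
        job2.setReducerClass(MyReducer2.class);
        // Map output differs from reduce output, so declare both.
        job2.setMapOutputKeyClass(IntWritable.class);
        job2.setMapOutputValueClass(IntArrayWritable.class);
        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(IntWritable.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, temppath);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        job2.waitForCompletion(true);
    }
}
```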
