2011-04-16 90 views
I have two Map/Reduce classes, named MyMapper1/MyReducer1 and MyMapper2/MyReducer2, and want to use the output of MyReducer1 as the input of MyMapper2, by setting the input path of job2 to the output path of job1.

The types are as follows:

public class MyMapper1 extends Mapper<LongWritable, Text, IntWritable, IntArrayWritable>
public class MyReducer1 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyMapper2 extends Mapper<IntWritable, IntArrayWritable, IntWritable, IntArrayWritable>
public class MyReducer2 extends Reducer<IntWritable, IntArrayWritable, IntWritable, IntWritable>

public class IntArrayWritable extends ArrayWritable { 
    public IntArrayWritable() { 
     super(IntWritable.class); 
    } 
} 

and the code that sets the input/output paths looks like this:

Path temppath = new Path("temp-dir-" + temp_time);

FileOutputFormat.setOutputPath(job1, temppath);
...........
FileInputFormat.addInputPath(job2, temppath);

The code that sets the input/output formats is as follows:

job1.setOutputFormatClass(TextOutputFormat.class);
..........
job2.setInputFormatClass(KeyValueTextInputFormat.class);

But when I run job2, I always get this exception:

11/04/16 12:34:09 WARN mapred.LocalJobRunner: job_local_0002 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable 
    at ligon.MyMapper2.map(MyMapper2.java:1) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 

I have tried changing the InputFormat and OutputFormat, but without success; similar (though not identical) exceptions occur in job2.

My complete code package is at: http://dl.dropbox.com/u/7361939/HW2_Q1.zip

Thank you very much!

Answers


The problem is that in job2, KeyValueTextInputFormat produces key-value pairs of type <Text, Text>, and you are trying to process them with a Mapper that accepts <IntWritable, IntArrayWritable>, which causes the ClassCastException. Your best option is to change your mapper to accept <Text, Text> and convert the Text values to integers yourself.
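As a plain-Java sketch of the conversion this answer suggests (the class and method names here are hypothetical, not from the question's code): KeyValueTextInputFormat splits each line at the first tab, handing the mapper the left part as the Text key and the rest as the Text value, so the mapper has to parse those strings back into ints itself.

```java
// Hypothetical helper showing the Text-to-int conversion step.
// KeyValueTextInputFormat gives the mapper the text before the first
// tab as the key and the remainder as the value, both as Text.
public class TextToIntParser {

    /** Parse a whitespace-separated run of integers, e.g. "1 2 3". */
    static int[] parseInts(String s) {
        String[] parts = s.trim().split("\\s+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulate one record as KeyValueTextInputFormat would split it:
        String line = "7\t1 2 3";
        int tab = line.indexOf('\t');
        int key = Integer.parseInt(line.substring(0, tab));
        int[] values = parseInts(line.substring(tab + 1));
        System.out.println(key + " -> " + java.util.Arrays.toString(values));
    }
}
```

Inside the real MyMapper2 (declared as Mapper<Text, Text, ...>), the same parsing would run on key.toString() and value.toString() before wrapping the results back into IntWritable/IntArrayWritable.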


Thanks. The problem now is that the ArrayWritable output by the first reducer looks like the following, without any element values: `ligon.IntArrayWritable@...` repeated once per element. How can I make the second mapper accept this and convert the string back into an object? – 2011-04-17 14:31:07
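Those strings are what the default Object.toString() of the value class produces, since TextOutputFormat writes values by calling toString() on them. One common fix (an assumption here, not stated in the thread) is to override toString() in IntArrayWritable to emit the element values. A minimal plain-Java sketch of the idea, without the Hadoop dependency:

```java
// Plain-Java sketch: print the array's elements instead of the default
// ClassName@hash form. In the real code this join logic would live in a
// toString() override inside the IntArrayWritable subclass of
// ArrayWritable, so TextOutputFormat writes the values themselves.
public class IntArrayToString {

    static String join(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new int[]{1, 2, 3}));
    }
}
```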


I have the same problem and get the same error. I want to use the output of one hadoop job as the input of a second hadoop job. The output of the first job has a MapWritable as its value. The solution for the second job is job.setInputFormatClass(), but which parameter should I use? – Yeameen 2012-04-20 07:21:51


I just faced the same problem and figured out the solution a while ago. Since you are using IntArrayWritable as the output of your Reducer, the easy route is to write, and later read back, the data in binary form.

For the first job:

job1.setOutputFormatClass(SequenceFileOutputFormat.class);
job1.setOutputKeyClass(IntWritable.class);
job1.setOutputValueClass(IntArrayWritable.class);

For the second job:

job2.setInputFormatClass(SequenceFileInputFormat.class); 

This should work in your case.
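A sketch of a driver chaining the two jobs through SequenceFiles, assuming the mapper/reducer classes and temp-path idea from the question (the Driver class name, argument handling, and use of Job.getInstance — a Hadoop 2.x API — are assumptions, not from the original code). SequenceFileOutputFormat stores the (IntWritable, IntArrayWritable) pairs in binary, so job2 reads back the exact types with no text parsing. No assertions are given here since running it requires a Hadoop runtime; treat it as a configuration sketch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path temppath = new Path("temp-dir-" + System.currentTimeMillis());

        // Job 1: write (IntWritable, IntArrayWritable) pairs in binary.
        Job job1 = Job.getInstance(conf, "job1");
        job1.setMapperClass(MyMapper1.class);
        job1.setReducerClass(MyReducer1.class);
        job1.setOutputKeyClass(IntWritable.class);
        job1.setOutputValueClass(IntArrayWritable.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, temppath);
        job1.waitForCompletion(true);

        // Job 2: read the same types straight back from the SequenceFile.
        Job job2 = Job.getInstance(conf, "job2");
        job2.setMapperClass(MyMapper2.class);
        job2.setReducerClass(MyReducer2.class);
        // Map output differs from reduce output, so declare both.
        job2.setMapOutputKeyClass(IntWritable.class);
        job2.setMapOutputValueClass(IntArrayWritable.class);
        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(IntWritable.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, temppath);
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        job2.waitForCompletion(true);
    }
}
```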
