2014-03-27

Running a simple MR job on CDH4

I'm trying to run a simple MR job with CDH4 and I'm hitting the strangest error; I can't figure out why. Basically my program reads a file with an identity mapper, and the reducer simply emits the key along with a string value. I don't understand why my job fails — I never had problems like this on CDH3. Any advice would be great.

Error:

14/03/26 20:35:45 INFO mapred.JobClient: Task Id : attempt_201403171159_0109_m_000002_2, Status : FAILED 
java.lang.NumberFormatException: For input string: "256MB" 
     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
     at java.lang.Integer.parseInt(Integer.java:492) 
     at java.lang.Integer.parseInt(Integer.java:527) 
     at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1060) 
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:809) 
     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376) 
     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85) 
     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584) 
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656) 
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) 
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) 
     at org.apache.hadoop 
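The stack trace already points at the root cause: `Configuration.getInt` hands the raw property string to `Integer.parseInt`, which rejects any unit suffix. A minimal, Hadoop-free reproduction of that failure mode:

```java
// Minimal reproduction: Integer.parseInt accepts only a plain decimal string,
// so a value carrying a unit suffix such as "256MB" throws the exact
// NumberFormatException seen in the task logs above.
public class ParseIntRepro {
    public static void main(String[] args) {
        System.out.println(Integer.parseInt("256"));   // parses fine: 256
        try {
            Integer.parseInt("256MB");                 // what Configuration.getInt effectively does
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());        // For input string: "256MB"
        }
    }
}
```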

Maven dependencies:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>2.0.0-mr1-cdh4.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.0.0-cdh4.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-tools</artifactId>
  <version>2.0.0-mr1-cdh4.4.0</version>
</dependency>

Maven repositories:

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
  <repository>
    <id>maven-hadoop</id>
    <name>Hadoop Releases</name>
    <url>https://repository.cloudera.com/content/repositories/releases/</url>
  </repository>
</repositories>

MR code:

package com.some.packagename; 

import java.io.IOException; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.conf.Configured; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.apache.hadoop.mapreduce.Reducer; 
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; 
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; 
import org.apache.hadoop.util.Tool; 
import org.apache.hadoop.util.ToolRunner; 


public class MyMRJob extends Configured implements Tool { 

    private static String inputPath = "someHDFSInputPath"; 
    private static String outputPath = "someHDFSOutputPath"; 

    public static void main(String[] args) throws Exception { 


     Configuration conf = new Configuration(); 
     conf.set("mapred.job.tracker", "jtserver:8021"); 
     conf.set("fs.defaultFS", "hdfs://nnserver:8020"); 
     ToolRunner.run(conf, new MyMRJob(), args); 

    } 

    public final int run(final String[] args) throws Exception { 

     // Initialize 
     Job job = new Job(super.getConf(),MyMRJob.class.getSimpleName()); 

     // General Configs 
     job.setJarByClass(MyMRJob.class);  

     // Inputs  
     TextInputFormat.setInputPaths(job, inputPath); 
     job.setInputFormatClass(TextInputFormat.class); 

     // Mapper 
     job.setMapperClass(TheMapper.class); 
     job.setMapOutputKeyClass(Text.class); 
     job.setMapOutputValueClass(Text.class); 

     // Reducer 
     job.setReducerClass(TheReducer.class); 
     job.setOutputKeyClass(Text.class); 
     job.setOutputValueClass(Text.class); 

     // Output 
     TextOutputFormat.setOutputPath(job, new Path(outputPath)); 
     job.setOutputFormatClass(TextOutputFormat.class);

     // Run the job 
     boolean b = job.waitForCompletion(true); 
     if (!b) 
      throw new IOException("Error with the job - it has failed!"); 

     return 1; 
    } 

    private static class TheMapper extends Mapper<Text, Text, Text, Text> { 
     protected void map(Text key, Text value, Context context) throws IOException, InterruptedException { 
      context.write(key, value); 
     } 
    } 

    public static class TheReducer extends Reducer<Text, Text, Text, Text> { 

     public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { 
      context.write(key, new Text("some value")); 
     } 
    } 


} 

Answer

1

Take a look at your mapred-site.xml. It probably has a value configured as "256MB", in particular for one of the following properties:

mapred.child.java.opts and io.sort.mb
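The distinction the answer is drawing: io.sort.mb must be a bare integer (megabytes implied), while a heap size with its unit suffix belongs in the child JVM options. A sketch of corrected mapred-site.xml entries (the -Xmx value here is only an illustrative choice):

```xml
<!-- io.sort.mb is a plain integer number of megabytes: no unit suffix -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>
<!-- A heap size, with its unit suffix, belongs in the child JVM options -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

The same override can also be applied per job in the driver with conf.set("io.sort.mb", "256"), as the follow-up comment confirms.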


Yup, io.sort.mb was set to 256MB. I added conf.set("io.sort.mb", "256") in my configuration to override it, and the error went away. I suppose I should change it in the file as the permanent fix? – Tucker


Yes, change it in the file and it should run without any errors. – malatesh