
Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

My program looks like this:

public class TopKRecord extends Configured implements Tool { 

    public static class MapClass extends Mapper<Text, Text, Text, Text> { 

     public void map(Text key, Text value, Context context) throws IOException, InterruptedException { 
      // your map code goes here 
      String[] fields = value.toString().split(","); 
      String year = fields[1]; 
      String claims = fields[8]; 

      if (claims.length() > 0 && (!claims.startsWith("\""))) { 
       context.write(new Text(year.toString()), new Text(claims.toString())); 
      } 
     } 
    } 
    public int run(String args[]) throws Exception { 
     Job job = new Job(); 
     job.setJarByClass(TopKRecord.class); 

     job.setMapperClass(MapClass.class); 

     FileInputFormat.setInputPaths(job, new Path(args[0])); 
     FileOutputFormat.setOutputPath(job, new Path(args[1])); 

     job.setJobName("TopKRecord"); 
     job.setMapOutputValueClass(Text.class); 
     job.setNumReduceTasks(0); 
     boolean success = job.waitForCompletion(true); 
     return success ? 0 : 1; 
    } 

    public static void main(String args[]) throws Exception { 
     int ret = ToolRunner.run(new TopKRecord(), args); 
     System.exit(ret); 
    } 
} 

The data looks like this:

"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD" 
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,, 
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,, 
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,, 
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,, 

When I run this program, I see the following on the console:

12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED 
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text 
    at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 

I believe the class types are mapped correctly for the Mapper class.

Please let me know what I am doing wrong here.

Answers


When you read a file with an M/R program, the input key of your mapper should be the index of the line in the file, while the input value is the full line.

So what is happening here is that you are trying to receive the line index as a Text object, which is wrong; you need a LongWritable instead, so that Hadoop does not complain about the types.

Try this:

public class TopKRecord extends Configured implements Tool { 

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> { 

     public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
      // your map code goes here 
      String[] fields = value.toString().split(","); 
      String year = fields[1]; 
      String claims = fields[8]; 

      if (claims.length() > 0 && (!claims.startsWith("\""))) { 
       context.write(new Text(year.toString()), new Text(claims.toString())); 
      } 
     } 
    } 

    ... 
} 

One more thing in your code that you may want to reconsider: you are creating two new Text objects for every single record you process. You should create these two objects just once, and then set their values inside the mapper with the set method, as sketched below. This will save you a fair amount of time if you are processing a decent amount of data.
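For example, the mapper could allocate the two Text objects once and reuse them on every call (a minimal sketch based on the code above, using Text.set() to replace their contents; reuse is safe because context.write serializes the objects immediately):

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> { 

     // allocated once and reused for every record instead of creating new Text objects per call 
     private final Text outKey = new Text(); 
     private final Text outValue = new Text(); 

     public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { 
      String[] fields = value.toString().split(","); 
      String year = fields[1]; 
      String claims = fields[8]; 

      if (claims.length() > 0 && (!claims.startsWith("\""))) { 
       outKey.set(year);     // Text.set(String) overwrites the previous contents 
       outValue.set(claims); 
       context.write(outKey, outValue); 
      } 
     } 
    } 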


Alternatively, you need to set the input format class:

job.setInputFormatClass(KeyValueTextInputFormat.class); 
job.setOutputFormatClass(TextOutputFormat.class); 
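Note that KeyValueTextInputFormat splits each line into key and value at the first tab character by default; since the patent data above is comma-separated and contains no tabs, the entire line arrives as the Text key and the value is empty. A minimal sketch of a mapper that keeps the original Mapper<Text, Text, Text, Text> signature, under that assumption:

    public static class MapClass extends Mapper<Text, Text, Text, Text> { 

     public void map(Text key, Text value, Context context) throws IOException, InterruptedException { 
      // with no tab in the line, KeyValueTextInputFormat puts the whole line in the key, 
      // so the fields are parsed from the key rather than from the (empty) value 
      String[] fields = key.toString().split(","); 
      String year = fields[1]; 
      String claims = fields[8]; 

      if (claims.length() > 0 && (!claims.startsWith("\""))) { 
       context.write(new Text(year), new Text(claims)); 
      } 
     } 
    } 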
Good point! That was my mistake, and it was solved by setting the InputFormatClass to 'SequenceFileInputFormat.class'. This works when the input of the job is the output of a previous job. – vefthym 2014-09-17 08:55:24
