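Hadoop MapReduce mapper reads incorrect values from a text file

I am writing a MapReduce program that processes a text file and appends a string to every line. The problem is that the Text value passed to the mapper's map method is incorrect.

Whenever a line in the file is shorter than the previous line, a few characters are automatically appended to it, so that its length equals that of the previously read line.

The map method's parameters are as follows: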

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

I logged the value inside the map method and observed this behaviour there. Any pointers?

Code snippets

Driver 

Configuration configuration = new Configuration();
configuration.set("CLIENT_ID", "Test");
Job job = Job.getInstance(configuration, JOB_NAME);
job.setJarByClass(JobDriver.class);
job.setMapperClass(AdwordsMapper.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

System.exit(job.waitForCompletion(true) ? 0 : 1);


Mapper 

public class AdwordsMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String textLine = new String(value.getBytes());

        textLine = new StringBuffer(textLine).append(",")
                .append(context.getConfiguration().get("CLIENT_ID")).toString();
        context.write(new Text(""), new Text(textLine));
    }

}

Source

2015-03-30 Pradeep S

Can you post your code? – 2015-03-31 04:43:41

Added code snippets for the driver and mapper classes. – 2015-03-31 16:57:35

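As far as I can tell, the problem is the getBytes() call in your mapper. Text.getBytes() returns the whole backing byte array, and only the first getLength() bytes of it are valid. Hadoop reuses the same Text object from one record to the next, so when a line is shorter than the previous one, the leftover bytes of the longer line are still sitting at the end of the array.

Instead of this:

String textLine = new String(value.getBytes());

try this:

String textLine = value.toString();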
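
To make the effect visible without a cluster, here is a minimal plain-Java sketch. ReusableText is a hypothetical, heavily simplified stand-in written for this illustration; it is not Hadoop's Text class and only mimics the one relevant trait, a backing array that is reused and never cleared. The helper names readWrong and readRight are likewise made up.

```java
import java.nio.charset.StandardCharsets;

class BufferReuseDemo {

    /** Hypothetical stand-in for a reusable text holder; NOT Hadoop's Text. */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies new content in; grows the array only when needed, never clears it. */
        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including any stale tail. */
        byte[] getBytes() { return bytes; }

        /** Returns the number of valid bytes. */
        int getLength() { return length; }

        /** Decodes only the valid part of the array. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Mirrors the pattern from the question: decodes the entire backing array. */
    static String readWrong(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Mirrors the suggested fix: decodes only the valid bytes. */
    static String readRight(ReusableText t) {
        return t.toString();
    }

    public static void main(String[] args) {
        ReusableText value = new ReusableText();
        value.set("a long first line"); // first record
        value.set("short");             // second, shorter record reuses the array
        System.out.println(readWrong(value)); // prints "shortg first line"
        System.out.println(readRight(value)); // prints "short"
    }
}
```

If the raw bytes are really needed in a mapper, bounding the read by the valid length, for example new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8), should avoid the stale tail in the same way.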

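Source

2015-03-31 17:18:49

Thanks, Sravan. That solved the problem. I have one more question, about the output key. When I emit an empty Text as the key, each line of the output file starts with a tab character. Is there a way to specify a key without generating any extra characters? – 2015-03-31 17:50:00

Pass null. Thanks. – 2015-03-31 17:54:39

Used NullWritable as the key. Also, I have multiple files in the input folder. Currently only one output file is generated after all input files have been processed. Can we generate one output file per input file? – 2015-03-31 21:45:50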
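
The tab discussed in these comments comes from the way TextOutputFormat writes a record: key, then a separator (a tab by default), then the value. An empty Text still counts as a key, so the separator is still written, whereas a null or NullWritable key causes both key and separator to be skipped. The sketch below is a simplified plain-Java model of that rule, not Hadoop's code; formatRecord and NULL_KEY are hypothetical names introduced only for this illustration.

```java
class KeySeparatorDemo {

    /** Hypothetical marker standing in for Hadoop's NullWritable singleton. */
    static final Object NULL_KEY = new Object();

    /**
     * Simplified model of formatting one output record as a text line:
     * the key and the separator are written only when a key is present.
     */
    static String formatRecord(Object key, Object value, String separator) {
        boolean noKey = (key == null || key == NULL_KEY);
        boolean noValue = (value == null);
        if (noKey && noValue) {
            return ""; // nothing to write at all
        }
        StringBuilder line = new StringBuilder();
        if (!noKey) {
            line.append(key);
        }
        if (!noKey && !noValue) {
            line.append(separator);
        }
        if (!noValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // An empty string is still a key, so the separator appears.
        System.out.print(formatRecord("", "row,Test", "\t").replace("\t", "<TAB>"));
        // An absent key suppresses both key and separator.
        System.out.print(formatRecord(NULL_KEY, "row,Test", "\t").replace("\t", "<TAB>"));
    }
}
```

In the actual job this would presumably mean writing NullWritable.get() as the key in the mapper, declaring NullWritable as the mapper's output key type, and calling job.setOutputKeyClass(NullWritable.class) in the driver.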
