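Hadoop MapReduce mapper reads incorrect values from a text file

I am writing a MapReduce program that processes a text file and appends a string to every line. The problem is that the Text value passed to the mapper's map method is incorrect.
Whenever a line in the file is shorter than the previous line, a few characters are automatically appended to it, so that its length equals the length of the previously read line.
The map method's signature is as follows: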
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
I log the value inside the map method and observe this behavior there. Any pointers?
Code snippets
Driver
Configuration configuration = new Configuration();
configuration.set("CLIENT_ID", "Test");
Job job = Job.getInstance(configuration, JOB_NAME);
job.setJarByClass(JobDriver.class);
job.setMapperClass(AdwordsMapper.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Mapper
public class AdwordsMapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String textLine = new String(value.getBytes());
textLine = new StringBuffer(textLine).append(",")
.append(context.getConfiguration().get("CLIENT_ID")).toString();
context.write(new Text(""), new Text(textLine));
}
}
Can you post your code? – 2015-03-31 04:43:41
Added code snippets for the driver and mapper classes. – 2015-03-31 16:57:35
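A likely cause, offered as a sketch rather than a confirmed diagnosis: Hadoop's `Text.getBytes()` returns the raw backing array, and only the bytes up to `getLength()` are valid. The record reader reuses the same `Text` object from line to line, so after a shorter line the array can still hold the tail of the longer line before it, and `new String(value.getBytes())` decodes that tail too. The example below reproduces the effect without Hadoop. `LineBuffer` is a hypothetical stand-in for `Text`, and `naive` and `bounded` are illustrative names, not Hadoop API.

```java
import java.nio.charset.StandardCharsets;

class ReusedBufferDemo {
    /** Hypothetical stand-in for Hadoop's Text: a reused byte array plus a valid length. */
    static final class LineBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the start of the buffer; grows it only when the new content is longer. */
        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including stale bytes past the valid length. */
        byte[] getBytes() {
            return bytes;
        }

        int getLength() {
            return length;
        }

        /** Decodes only the valid part of the buffer. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Mirrors new String(value.getBytes()): decodes the entire backing array. */
    static String naive(LineBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    /** Decodes only the first getLength() bytes. */
    static String bounded(LineBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LineBuffer value = new LineBuffer();
        value.set("abcdefgh"); // first, longer line
        value.set("xyz");      // second, shorter line reuses the same array
        System.out.println(naive(value));   // prints "xyzdefgh"
        System.out.println(bounded(value)); // prints "xyz"
        System.out.println(value);          // prints "xyz"
    }
}
```

If this is what is happening in the mapper above, building the string with `value.toString()`, or with `new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8)`, should yield only the current line.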