
Running a custom output format implementation from a Hadoop tutorial

I want to run the code described in this tutorial in order to customize the output format in Hadoop. More precisely, the tutorial shows two Java files:

  1. WordCount: a word count Java application (similar to the WordCount v1.0 of the MapReduce tutorial at this link)
  2. XMLOutputFormat: a Java class that extends FileOutputFormat and implements the methods that produce the customized output.

Well, what I did was take the WordCount v1.0 of the MapReduce tutorial (instead of using the word count shown in the tutorial), add job.setOutputFormatClass(XMLOutputFormat.class); to the driver, and execute the Hadoop application this way:

/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java && jar cf wc.jar WordCount*.class && /usr/local/hadoop/bin/hadoop jar wc.jar WordCount /home/luis/Desktop/mytest/input/ ./output_folder

Note: /home/luis/Desktop/mytest/input/ and ./output_folder are the input and output folders, respectively.

Unfortunately, the terminal shows me the following error:

WordCount.java:57: error: cannot find symbol
    job.setOutputFormatClass(XMLOutputFormat.class);
                             ^
  symbol:   class XMLOutputFormat
  location: class WordCount
1 error

Why? WordCount.java and XMLOutputFormat.java are stored in the same folder.

Here is my code.

WordCount code:

import java.io.IOException; 
import java.util.StringTokenizer; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.apache.hadoop.mapreduce.Reducer; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; 

public class WordCount { 

    public static class TokenizerMapper 
     extends Mapper<Object, Text, Text, IntWritable>{ 

    private final static IntWritable one = new IntWritable(1); 
    private Text word = new Text(); 

    public void map(Object key, Text value, Context context 
        ) throws IOException, InterruptedException { 
     StringTokenizer itr = new StringTokenizer(value.toString()); 
     while (itr.hasMoreTokens()) { 
     word.set(itr.nextToken()); 
     context.write(word, one); 
     } 
    } 
    } 

    public static class IntSumReducer 
     extends Reducer<Text,IntWritable,Text,IntWritable> { 
    private IntWritable result = new IntWritable(); 

    public void reduce(Text key, Iterable<IntWritable> values, 
         Context context 
         ) throws IOException, InterruptedException { 
     int sum = 0; 
     for (IntWritable val : values) { 
     sum += val.get(); 
     } 
     result.set(sum); 
     context.write(key, result); 
    } 
    } 

    public static void main(String[] args) throws Exception { 
    Configuration conf = new Configuration(); 
    Job job = Job.getInstance(conf, "word count"); 
    job.setJarByClass(WordCount.class); 
    job.setMapperClass(TokenizerMapper.class); 
    job.setCombinerClass(IntSumReducer.class); 
    job.setReducerClass(IntSumReducer.class); 
    job.setOutputKeyClass(Text.class); 
    job.setOutputValueClass(IntWritable.class); 
    job.setOutputFormatClass(XMLOutputFormat.class); 
    FileInputFormat.addInputPath(job, new Path(args[0])); 
    FileOutputFormat.setOutputPath(job, new Path(args[1])); 
    System.exit(job.waitForCompletion(true) ? 0 : 1); 
    }
} 

XMLOutputFormat code:

import java.io.DataOutputStream; 
import java.io.IOException; 
import org.apache.hadoop.fs.FSDataOutputStream; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.*; 
import org.apache.hadoop.mapreduce.RecordWriter; 
import org.apache.hadoop.mapreduce.TaskAttemptContext; 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; 

public class XMLOutputFormat extends FileOutputFormat<Text, IntWritable> { 

    protected static class XMLRecordWriter extends RecordWriter<Text, IntWritable> { 

     private DataOutputStream out; 

     public XMLRecordWriter(DataOutputStream out) throws IOException{ 

      this.out = out; 
      out.writeBytes("<Output>\n"); 

     } 


     private void writeStyle(String xml_tag,String tag_value) throws IOException { 

      out.writeBytes("<"+xml_tag+">"+tag_value+"</"+xml_tag+">\n"); 

     } 

     public synchronized void write(Text key, IntWritable value) throws IOException { 

      out.writeBytes("<record>\n"); 
      this.writeStyle("key", key.toString()); 
      this.writeStyle("value", value.toString()); 
      out.writeBytes("</record>\n"); 

     } 

     public synchronized void close(TaskAttemptContext job) throws IOException { 

      try { 

       out.writeBytes("</Output>\n"); 

      } finally { 

       out.close(); 

      } 

     } 

    } 

    public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext job) throws IOException { 

     String file_extension = ".xml"; 
     Path file = getDefaultWorkFile(job, file_extension); 
     FileSystem fs = file.getFileSystem(job.getConfiguration()); 
     FSDataOutputStream fileOut = fs.create(file, false); 
     return new XMLRecordWriter(fileOut); 

    } 

} 
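For reference, the record shape that XMLRecordWriter emits can be checked without a cluster. This is a hedged, self-contained mimic of the writer's logic using plain java.io: the Hadoop Text/IntWritable types are replaced with String/int, and one write() call is simulated between the header written in the constructor and the footer written in close(). The class and method names here are illustrative, not part of the tutorial.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class XmlWriterSketch {
    // Mirrors XMLRecordWriter: <Output> header, one <record> per write(), </Output> footer
    static String render(String key, int value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeBytes("<Output>\n");                      // constructor
        out.writeBytes("<record>\n");                      // write(key, value)
        out.writeBytes("<key>" + key + "</key>\n");        //   writeStyle("key", ...)
        out.writeBytes("<value>" + value + "</value>\n");  //   writeStyle("value", ...)
        out.writeBytes("</record>\n");
        out.writeBytes("</Output>\n");                     // close()
        out.close();
        return buf.toString("US-ASCII");
    }

    public static void main(String[] args) throws IOException {
        System.out.print(render("hello", 3));
    }
}
```

Note that writeStyle interpolates the key verbatim, so keys containing characters like '<' or '&' would produce malformed XML; escaping them is left out here, as in the tutorial code.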

Answer


You need to either add package testpackage; at the beginning of your WordCount class, or add import testpackage.XMLOutputFormat; in your WordCount class.

Being in the same directory does not mean they are in the same package.
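Another way to resolve the symbol while keeping both classes in the default package is to compile the two sources in a single javac invocation, then jar every generated class. This is a sketch based on the command from the question (it assumes both .java files sit in the current directory and the same Hadoop installation path):

```shell
# Compile both sources together so WordCount can resolve XMLOutputFormat
/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java XMLOutputFormat.java

# Package all generated classes, including XMLOutputFormat and the inner classes
jar cf wc.jar *.class

# Run as before
/usr/local/hadoop/bin/hadoop jar wc.jar WordCount /home/luis/Desktop/mytest/input/ ./output_folder
```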


I tried that, but it didn't work. Well, I created testpackage by running '/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main -d . XMLOutputFormat.java'. After that, I added 'package testpackage;' at the beginning of 'WordCount' and compiled it with '/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java', but I got the same error. Then, I removed 'package testpackage;' from the beginning of 'WordCount' and added 'import testpackage.XMLOutputFormat;' in 'WordCount', but I got 'error: cannot find symbol testpackage.XMLOutputFormat;' and, again, cannot find symbol for 'job.setOutputFormatClass' –


We need to first add the XMLOutputFormat.jar file to HADOOP_CLASSPATH so that the driver code can find it, and then add it to the classpath of the map and reduce JVMs via the -libjars option:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/abc/xyz/XMLOutputFormat.jar

yarn jar wordcount.jar com.sample.test.Wordcount \
    -libjars /path/to/XMLOutputFormat.jar \
    /lab/mr/input /lab/output/output
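A caveat on the commands above: -libjars is a generic Hadoop option, and it is only honored if the driver parses generic options, e.g. by running through ToolRunner. The WordCount main() in the question does not do this. A minimal sketch of a ToolRunner-based driver (the class name WordCountDriver is illustrative; the job setup from WordCount.main would move into run()):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects -libjars, -D and other generic options;
        // build and submit the Job here as in WordCount.main, using getConf()
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before run() sees args
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}
```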