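Duplicate output from a MapReduce program?

I was getting a lot of duplicate values in my output, so I implemented a reduce function, shown below. However, this reduce function still behaves like an identity function: having the reducer makes no difference to the output. What is wrong with my reduce function?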

 public class search 
{  
    public static String str="And"; 
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> 
    { 
     String mname=""; 
     public void configure(JobConf job) 
     { 
      mname=job.get(str); 
      job.set(mname,str); 
     } 

     private Text word = new Text(); 
     public Text Uinput =new Text(""); 
     public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
     { 

      String mapstr=mname; 
      Uinput.set(mapstr); 
      String line = value.toString(); 
      Text fdata = new Text(); 

      StringTokenizer tokenizer = new StringTokenizer(line); 
      while (tokenizer.hasMoreTokens()) 
      { 
       word.set(tokenizer.nextToken()); 
       fdata.set(line); 

       if(word.equals(Uinput)) 
       output.collect(fdata,new Text("")); 
      } 

     } 
    } 

    public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
    { 
     public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
     { 

      boolean start = true; 
      //System.out.println("inside reduce :"+input); 
      StringBuilder sb = new StringBuilder(); 
      while (values.hasNext()) 
      { 
       if(!start) 

       start=false; 
       sb.append(values.next().toString()); 

      } 
      //output.collect(key, new IntWritable(sum)); 
      output.collect(key, new Text(sb.toString())); 
     } 
    }

    public static void main(String[] args) throws Exception {

JobConf conf = new JobConf(search.class); 
    conf.setJobName("QueryIndex"); 
    //JobConf conf = new JobConf(getConf(), WordCount.class); 
    conf.set(str,args[0]); 

    conf.setOutputKeyClass(Text.class); 
    conf.setOutputValueClass(Text.class); 

    conf.setMapperClass(Map.class); 
    //conf.setCombinerClass(SReducer.class); 
    conf.setReducerClass(SReducer.class); 

    conf.setInputFormat(TextInputFormat.class); 
    conf.setOutputFormat(TextOutputFormat.class); 



    FileInputFormat.setInputPaths(conf, new Path("IIndexOut")); 
    FileOutputFormat.setOutputPath(conf, new Path("searchOut")); 

    JobClient.runJob(conf); 
}

}

Source

2012-04-26 Karan Rekhi

Possible duplicate: http://stackoverflow.com/questions/10305435/hadoop-inverted-index-without-recurrence-of-file-names – 2012-04-26 20:15:11

Hi Matt, I have already gone through that post, but it did not solve my problem. That is why I posted my own question. – 2012-04-26 20:18:07

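Maybe you have not configured the job to actually use your reducer? That is done with

job.setReducerClass().

If you do not set the reducer class to your own class, the default reducer is used. You should do the following:

job.setReducerClass(SReducer.class)

Please post your main function so we can verify.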

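Source

2012-04-26 20:02:15 Chaos

I did that, and I have also posted it above; please check. – 2012-04-26 20:07:53

Are you sure you are reading the latest output? I suggest you delete all previous output files and rerun the job. By the way, what is your job running? – Chaos 2012-04-26 20:10:55

It is a search engine program, so indexout is the output of the inverted index implementation. In this search step I just need to search for a keyword and display the results (which is where I am getting the duplicates). – 2012-04-26 20:13:15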

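I have not gone through the code thoroughly, but one thing I am sure of is that the boolean variable start is useless here. The code after if(!start) should be enclosed in braces in order to de-duplicate the data; otherwise you end up writing out, in the reducer, all the data you received from the mapper.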

public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{ 
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    { 
        boolean start = true; 
        StringBuilder sb = new StringBuilder(); 
        while (values.hasNext()) 
        { 
            // Always advance the iterator; otherwise the loop never terminates. 
            String value = values.next().toString(); 
            if (start) 
            { 
                // Keep only the first value; later duplicates are skipped. 
                start = false; 
                sb.append(value); 
            } 
        } 
        // Emit exactly one record per key. 
        output.collect(key, new Text(sb.toString())); 
    } 
}

Or, more simply, the reduce method can be just:

public static class SReducer extends MapReduceBase implements Reducer<Text, Text, Text, Text> 
{ 
    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException 
    { 
        // reduce() is only called for keys that have at least one value, so next() is safe here. 
        StringBuilder sb = new StringBuilder(); 
        sb.append(values.next().toString()); 

        output.collect(key, new Text(sb.toString())); 
    } 
}

This is enough when you only care about the first value of the iterator.
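To see why emitting once per key removes the duplicates, here is a minimal plain-Java sketch of the grouping and reduce steps, with no Hadoop dependency. The class `FirstValueReduceSketch` and its methods `shuffle`, `reduce` and `run` are hypothetical stand-ins for what the framework does, not Hadoop APIs. The assumption is that the mapper emits the whole matching line as the key, as in the question.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle and reduce phases; not a Hadoop API.
class FirstValueReduceSketch {

    // Groups mapper output (key/value pairs) by key, as the shuffle phase does.
    static Map<String, List<String>> shuffle(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : mapOutput) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Mirrors the reduce method above: keep only the first value for the key.
    static String reduce(String key, Iterator<String> values) {
        String first = values.next(); // each group holds at least one value
        return key + "\t" + first;
    }

    // Runs shuffle, then reduce once per distinct key.
    static List<String> run(List<String[]> mapOutput) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(mapOutput).entrySet()) {
            out.add(reduce(e.getKey(), e.getValue().iterator()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = new ArrayList<>();
        mapOutput.add(new String[] {"And then there were none", ""});
        mapOutput.add(new String[] {"And then there were none", ""});
        mapOutput.add(new String[] {"And so it goes", ""});
        for (String line : run(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

With three map records, two of them identical, `run` returns two lines, one per distinct key. If the real job still shows duplicates with a reducer like this, the reducer is probably not the one being executed.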

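Source

2012-04-27 00:14:29 sulabhc

Use the @Override annotation on the map and reduce functions. That way you can be sure you are actually overriding the base class's methods.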
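A minimal sketch of why the annotation helps, in plain Java with no Hadoop dependency. `BaseReducer` is a hypothetical base class whose default `reduce` passes every value through unchanged; it is an assumption for illustration, not the `MapReduceBase` or `Reducer` types used in the question. A subclass whose `reduce` has a slightly different parameter type defines an overload rather than an override, so the identity default still runs. Adding `@Override` to such a method would turn the mistake into a compile-time error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class with an identity default; not a Hadoop type.
class BaseReducer {
    // Default behavior: emit every value unchanged, duplicates included.
    public List<String> reduce(String key, Iterator<String> values) {
        List<String> out = new ArrayList<>();
        while (values.hasNext()) {
            out.add(key + "\t" + values.next());
        }
        return out;
    }
}

// Mistake: the parameter is Iterable, not Iterator, so this is an overload.
// Putting @Override on this method would make the compiler reject it.
class WrongReducer extends BaseReducer {
    public List<String> reduce(String key, Iterable<String> values) {
        return Arrays.asList(key);
    }
}

// Correct: same signature as the base method, checked by @Override.
class RightReducer extends BaseReducer {
    @Override
    public List<String> reduce(String key, Iterator<String> values) {
        return Arrays.asList(key);
    }
}

class OverrideSketch {
    // The "framework" only knows the base type and the base signature.
    static List<String> runReducer(BaseReducer r, String key, List<String> values) {
        return r.reduce(key, values.iterator());
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("", "");
        System.out.println(runReducer(new WrongReducer(), "line", values).size());
        System.out.println(runReducer(new RightReducer(), "line", values).size());
    }
}
```

For two input values under one key, the wrongly declared reducer yields two records because the base default runs, while the correctly overriding one yields a single record.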

Source

2013-08-20 21:46:17
