Composite key changing in Hadoop MapReduce?

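I have just started learning Hadoop and am running a Hadoop MapReduce program with a custom partitioner and comparator. The problem I am facing is that the primary and secondary sort are not being applied to the composite key. On top of that, one part of a composite key is being replaced by the corresponding part of another composite key.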

For example, I create the following keys in the mapper:

key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1 
key4 -> tagA,1 
key5 -> tagA,2 
key6 -> tagA,2 
key7 -> tagB,1 
key8 -> tagB,1 
key9 -> tagB,1 
key10 -> tagB,1 
key11 -> tagB,2 
key12 -> tagB,2

and I partition and group on these keys as follows:

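// Imports used by both classes (old "mapred" API)
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Partitioner: routes a record by the tag only (the part of the key before the comma)
public static class TaggedJoiningPartitioner implements Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        String line = key.toString();
        String[] tokens = line.split(",");
        // Mask the sign bit so the result is never negative
        return (tokens[0].hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void configure(JobConf conf) {
        // Intentionally empty: no configuration is needed
    }
}

// Comparator: compares the tag only, so "tagA,1" and "tagA,2" compare as equal
public static class TaggedJoiningGroupingComparator extends WritableComparator {

    public TaggedJoiningGroupingComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String[] taggedKey1 = ((Text) a).toString().split(",");
        String[] taggedKey2 = ((Text) b).toString().split(",");
        return taggedKey1[0].compareTo(taggedKey2[0]);
    }
}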
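
To see what these two classes do to the sample keys, here is a minimal plain-Java sketch that mirrors their logic on String keys. It uses no Hadoop classes, and TagKeyDemo, partitionFor, and BY_TAG are hypothetical names introduced only for illustration.

```java
import java.util.Comparator;

// Plain-Java stand-in for the partitioner and comparator above (no Hadoop types).
final class TagKeyDemo {

    // Same arithmetic as getPartition: hash only the tag (text before the comma).
    static int partitionFor(String compositeKey, int numPartitions) {
        String tag = compositeKey.split(",")[0];
        return (tag.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Same comparison as TaggedJoiningGroupingComparator: only the tag is compared.
    static final Comparator<String> BY_TAG =
            Comparator.comparing((String key) -> key.split(",")[0]);

    public static void main(String[] args) {
        // Keys that share a tag always land in the same partition.
        System.out.println(partitionFor("tagA,1", 2) == partitionFor("tagA,2", 2));
        // The comparator treats "tagA,1" and "tagA,2" as equal (result is 0).
        System.out.println(BY_TAG.compare("tagA,1", "tagA,2"));
        // Keys with different tags are still ordered by tag.
        System.out.println(BY_TAG.compare("tagA,1", "tagB,1") < 0);
    }
}
```

Because this comparator returns 0 for any two keys with the same tag, it cannot order the second field of the key if it is used for sorting.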

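In the reducer, the primary grouping by tag is done correctly, but the secondary sort is not. The order and content of the keys in the reducers are as follows: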

//REDUCER 1 
key1 -> tagA,1 
key2 -> tagA,1 
key3 -> tagA,1 
key5 -> tagA,1 //2 changed by 1 here 
key6 -> tagA,1 //2 changed by 1 here 
key4 -> tagA,1 

//REDUCER 2 
key7 -> tagB,1 
key11 -> tagB,1 //2 changed by 1 here 
key12 -> tagB,1 //2 changed by 1 here 
key8 -> tagB,1 
key9 -> tagB,1 
key10 -> tagB,1

I have been trying to solve this for a long time without success. Any help is appreciated.

Source

2014-09-27 Bruce_Wayne

I don't see any secondary sort here. Where does the secondary sort happen? – 2014-09-27 19:14:19

I am using Hadoop's old API, so nothing like job.setSortComparatorClass(CompositeKeyComparator.class); is available. Could you provide the equivalent for the old API? – 2014-09-27 21:28:42

Also, I set the partitioner and comparator on the JobConf object as follows: conf.setPartitionerClass(TaggedJoiningPartitioner.class); conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class); – 2014-09-27 21:36:59

I finally got it working. I changed

conf.setOutputKeyComparatorClass(TaggedJoiningGroupingComparator.class);

to

conf.setOutputValueGroupingComparator(TaggedJoiningGroupingComparator.class);

The Hadoop API documentation also describes it:

setOutputValueGroupingComparator(Class<? extends RawComparator> theClass) 
Set the user defined RawComparator comparator for grouping keys in the input to the reduce.
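
The difference between the two setters can be illustrated without Hadoop. In the rough sketch below, a full-key comparator plays the role of the sort comparator and a tag-only comparator plays the role of the grouping comparator. SortThenGroupDemo, sortAndGroup, and the String keys are hypothetical stand-ins, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of "sort by full key, group by tag" (no Hadoop types).
final class SortThenGroupDemo {

    // Sort comparator stand-in: compares the whole composite key as text.
    static final Comparator<String> FULL_KEY = Comparator.naturalOrder();

    // Grouping comparator stand-in: compares only the tag.
    static final Comparator<String> BY_TAG =
            Comparator.comparing((String key) -> key.split(",")[0]);

    // Sorts the keys with sortCmp, then starts a new group whenever groupCmp
    // reports that the current key differs from the previous one.
    static List<List<String>> sortAndGroup(List<String> keys,
                                           Comparator<String> sortCmp,
                                           Comparator<String> groupCmp) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(sortCmp);
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : sorted) {
            if (previous == null || groupCmp.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys =
                Arrays.asList("tagB,2", "tagA,2", "tagA,1", "tagB,1", "tagA,1");
        // Prints [[tagA,1, tagA,1, tagA,2], [tagB,1, tagB,2]]
        System.out.println(sortAndGroup(keys, FULL_KEY, BY_TAG));
    }
}
```

In old-API terms, setOutputKeyComparatorClass controls the sort and setOutputValueGroupingComparator controls the grouping, so passing the tag-only comparator to the first one removes the ordering on the second field. Note that plain text order places "10" before "2", so a multi-digit second field would probably need a sort comparator that parses it as a number.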

Source

2014-09-28 17:40:43
