2013-03-23 104 views
3

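MapReduce: custom data type

While writing a MapReduce program, I ran into a case where the key is a tuple (A, B), and A and B are both sets of integers. How can I define a custom data type for this?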

public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>.... 

public class Tuple implements WritableComparable<Tuple>{ 


     @Override 
     public void readFields(DataInput arg0) throws IOException { 
      // TODO Auto-generated method stub 

     } 

     @Override 
     public void write(DataOutput arg0) throws IOException { 
      // TODO Auto-generated method stub 

     } 

     @Override 
     public int compareTo(Tuple o) { 
      // TODO Auto-generated method stub 
      return 0; 
     } 
    } 
+0

Take a look at this: http://hadoop-blog.blogspot.in/2012/04/hadoop-example-using-custom-java-class.html – Amar 2013-03-23 08:37:16

+0

Thanks for sharing the link, but the data type I want to implement is not a pair of Strings. A is a set of Integers, for example A = {0,1,2,3}. – user2178911 2013-03-23 13:01:18

Answers

3

You're almost there: just add the fields a and b, then fill in the serialization methods and compareTo():

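import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.io.WritableComparable;

public class Tuple implements WritableComparable<Tuple> {
    // TreeSet keeps elements sorted, so serialization and comparison order is deterministic
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public void readFields(DataInput in) throws IOException {
        // Hadoop reuses Writable instances, so drop values left over from the previous record
        a.clear();
        b.clear();

        int count = in.readInt();      // size of a, followed by its elements
        while (count-- > 0) {
            a.add(in.readInt());
        }

        count = in.readInt();          // size of b, followed by its elements
        while (count-- > 0) {
            b.add(in.readInt());
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields must be written in exactly the order readFields() reads them
        out.writeInt(a.size());
        for (int v : a) {
            out.writeInt(v);
        }
        out.writeInt(b.size());
        for (int v : b) {
            out.writeInt(v);
        }
    }

    @Override
    public int compareTo(Tuple o) {
        // One possible ordering (change it to suit your job): compare a first, then b
        int cmp = compareSets(a, o.a);
        return cmp != 0 ? cmp : compareSets(b, o.b);
    }

    // Element-by-element comparison of two sorted sets; a proper prefix sorts first
    private static int compareSets(Set<Integer> x, Set<Integer> y) {
        Iterator<Integer> ix = x.iterator();
        Iterator<Integer> iy = y.iterator();
        while (ix.hasNext() && iy.hasNext()) {
            int cmp = Integer.compare(ix.next(), iy.next());
            if (cmp != 0) {
                return cmp;
            }
        }
        return Boolean.compare(ix.hasNext(), iy.hasNext());
    }
}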
+0

But A and B are not just integers, they are sets of integers. For example, A = {0,1,2,3}. – user2178911 2013-03-23 12:57:26

+0

Well, that's a simple fix; see the edit. – 2013-03-23 13:57:04

+0

Just remember to clear the sets between map iterations, otherwise you will accumulate every value seen so far. – 2013-03-23 14:04:04
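The point about clearing the sets can be checked without a cluster. The sketch below is a stand-in, not Hadoop code: the hypothetical class SetPairRecord copies the answer's wire format (a length prefix followed by the integers, once for a and once for b) using only java.io streams, and the hypothetical helper readTwoIntoOne deserializes two records into one reused object, roughly the way Hadoop recycles key/value instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for Tuple: same wire format, but no Hadoop dependency.
class SetPairRecord {
    final Set<Integer> a = new TreeSet<>();
    final Set<Integer> b = new TreeSet<>();

    void write(DataOutput out) throws IOException {
        writeSet(out, a);
        writeSet(out, b);
    }

    void readFields(DataInput in) throws IOException {
        // Without these clear() calls, a reused instance keeps earlier values.
        a.clear();
        b.clear();
        readSet(in, a);
        readSet(in, b);
    }

    private static void writeSet(DataOutput out, Set<Integer> s) throws IOException {
        out.writeInt(s.size());              // length prefix
        for (int v : s) {
            out.writeInt(v);
        }
    }

    private static void readSet(DataInput in, Set<Integer> s) throws IOException {
        for (int n = in.readInt(); n > 0; n--) {
            s.add(in.readInt());
        }
    }

    // Serializes two records back to back, then reads both into ONE reused instance.
    static SetPairRecord readTwoIntoOne(SetPairRecord first, SetPairRecord second) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            first.write(out);
            second.write(out);
            out.flush();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            SetPairRecord reused = new SetPairRecord();
            reused.readFields(in);   // now holds the first record
            reused.readFields(in);   // now holds only the second record
            return reused;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SetPairRecord first = new SetPairRecord();
        first.a.addAll(Arrays.asList(0, 1, 2, 3));
        first.b.add(7);
        SetPairRecord second = new SetPairRecord();
        second.a.add(5);

        SetPairRecord result = readTwoIntoOne(first, second);
        System.out.println(result.a + " " + result.b);   // prints "[5] []"
    }
}
```

Running main prints `[5] []` because readFields() clears both sets first; delete the two clear() calls and it prints `[0, 1, 2, 3, 5] [7]` instead. The same applies when a mapper reuses one output key object across map() calls: clear its sets before adding the next record's values.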

1

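To implement a custom data type in Hadoop, you must implement the WritableComparable interface and provide your own implementations of the readFields() and write() methods. Besides implementing readFields() and write(), you must also override the equals() and hashCode() methods inherited from java.lang.Object; the default HashPartitioner uses hashCode() to decide which reducer receives each key.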

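If the custom data type is used as a key, it must also implement the Comparable interface (WritableComparable already extends it), because keys are sorted during the shuffle.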
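A minimal sketch of the equals()/hashCode() advice, assuming the key holds two sorted integer sets as in the question. To keep it compilable without Hadoop on the classpath, the hypothetical class SetPairKey implements plain Comparable; in a real job it would implement WritableComparable<SetPairKey> and also carry write()/readFields(). The ordering in compareTo() (a first, then b, element by element) is one arbitrary choice.

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical key class. Plain Comparable here so it compiles without Hadoop;
// a real key would implement org.apache.hadoop.io.WritableComparable<SetPairKey>.
class SetPairKey implements Comparable<SetPairKey> {
    final Set<Integer> a = new TreeSet<>();
    final Set<Integer> b = new TreeSet<>();

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SetPairKey)) return false;
        SetPairKey o = (SetPairKey) other;
        return a.equals(o.a) && b.equals(o.b);   // Set.equals compares contents
    }

    @Override
    public int hashCode() {
        // Content-based, so equal keys hash alike in every JVM (needed for partitioning)
        return Objects.hash(a, b);
    }

    @Override
    public int compareTo(SetPairKey o) {
        int cmp = compareSets(a, o.a);
        return cmp != 0 ? cmp : compareSets(b, o.b);
    }

    // Element-by-element comparison of two sorted sets; a proper prefix sorts first
    private static int compareSets(Set<Integer> x, Set<Integer> y) {
        Iterator<Integer> ix = x.iterator();
        Iterator<Integer> iy = y.iterator();
        while (ix.hasNext() && iy.hasNext()) {
            int cmp = Integer.compare(ix.next(), iy.next());
            if (cmp != 0) return cmp;
        }
        return Boolean.compare(ix.hasNext(), iy.hasNext());
    }
}
```

Because compareTo() returns 0 exactly when both sets have the same contents, it stays consistent with equals(), so the keys grouped into one reduce() call are the ones equals() treats as identical.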