2013-03-20

Why doesn't this code iterate over the reducer values twice?

I have code like this:

public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    String name = null;
    String sid = null;
    String predicate = null;
    String oid = null;
    String id = null;
    String outKey = null;
    String outVal = null;

    // First pass: look for the topic_name record and buffer every value.
    LinkedList<Text> valuesList = new LinkedList<Text>();
    Iterator<Text> ite = values.iterator();
    while (ite.hasNext()) {
        Text t = ite.next();
        String[] entities = t.toString().split("#-#-#-#");
        if (entities[entities.length - 1].equalsIgnoreCase("topic_name")) {
            name = entities[0];
        }
        valuesList.add(t); // buffers the Text object itself
    }

    // Second pass: emit one line for every value that is not a topic_name record.
    Iterator<Text> ite2 = valuesList.iterator();
    while (ite2.hasNext()) {
        Text t2 = ite2.next();
        String[] entities = t2.toString().split("#-#-#-#");
        if (!entities[entities.length - 1].contains("topic_name")) {
            if (name != null) {
                outKey = entities[0] + "\t" + entities[1] + "\t" + name;
            } else {
                outKey = entities[0] + "\t" + entities[1] + "\t" + key.toString();
            }
            context.write(new Text(outKey), null);
        }
    }
}

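I've noticed that when I iterate over the values a second time, every element comes back as the last value, as if I were getting a cached copy of it.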
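The symptom can be reproduced without Hadoop. The following is a minimal sketch under stated assumptions: StringBuilder stands in for Hadoop's mutable Text, and reusingIterable is a hypothetical helper (not a Hadoop API) that imitates an iterator which refills and hands back one shared object on every call to next(). Buffering the objects themselves, as the reducer above does with valuesList.add(t), leaves a list whose entries all show the last value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for Hadoop's value iterator: every call to next()
    // clears and refills the SAME StringBuilder instead of allocating a new one.
    static Iterable<StringBuilder> reusingIterable(List<String> raw) {
        return () -> {
            StringBuilder shared = new StringBuilder();
            Iterator<String> it = raw.iterator();
            return new Iterator<StringBuilder>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public StringBuilder next() {
                    shared.setLength(0);      // discard the previous contents
                    shared.append(it.next()); // overwrite in place
                    return shared;
                }
            };
        };
    }

    // Mirrors the question: buffer the objects themselves, then read them back.
    static List<String> bufferReferences(Iterable<StringBuilder> values) {
        List<StringBuilder> buffered = new ArrayList<>();
        for (StringBuilder t : values) {
            buffered.add(t); // every element is a reference to one shared object
        }
        List<String> seen = new ArrayList<>();
        for (StringBuilder t : buffered) {
            seen.add(t.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("alpha", "beta", "gamma");
        System.out.println(bufferReferences(reusingIterable(raw)));
        // prints [gamma, gamma, gamma]
    }
}
```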

Answer


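The iterator over values actually returns the same Text object every time; Hadoop just refills it with a different string on each call to next(). It does this to avoid instantiating a new object for every value. So you are really building a List<Text> that holds many references to one object, and once the loop finishes that object contains the last value. To fix this, store the values in a List<String> that holds the actual "unboxed" string contents, like this: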

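public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    String name = null;
    String outKey = null;

    // First pass: copy each value out of the reused Text into an immutable String.
    LinkedList<String> valuesList = new LinkedList<String>();
    Iterator<Text> ite = values.iterator();
    while (ite.hasNext()) {
        Text t = ite.next();
        String s = t.toString(); // snapshot of the Text's current contents
        String[] entities = s.split("#-#-#-#");
        if (entities[entities.length - 1].equalsIgnoreCase("topic_name")) {
            name = entities[0];
        }
        valuesList.add(s);
    }

    // Second pass: iterate over the buffered Strings, which Text reuse cannot change.
    Iterator<String> ite2 = valuesList.iterator();
    while (ite2.hasNext()) {
        String t2 = ite2.next();
        String[] entities = t2.split("#-#-#-#");
        if (!entities[entities.length - 1].contains("topic_name")) {
            if (name != null) {
                outKey = entities[0] + "\t" + entities[1] + "\t" + name;
            } else {
                outKey = entities[0] + "\t" + entities[1] + "\t" + key.toString();
            }
            context.write(new Text(outKey), null);
        }
    }
}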
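A self-contained sketch of two ways to avoid the problem, again using StringBuilder as a hypothetical stand-in for Text and a hypothetical reusingIterable helper in place of Hadoop's value iterator. bufferStrings follows the answer's approach of storing String snapshots; bufferCopies instead stores an independent copy of each mutable object, which with Hadoop's Text would correspond to the copy constructor new Text(t).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CopyFixDemo {
    // Hypothetical stand-in for Hadoop's value iterator: one shared
    // StringBuilder is cleared and refilled on every call to next().
    static Iterable<StringBuilder> reusingIterable(List<String> raw) {
        return () -> {
            StringBuilder shared = new StringBuilder();
            Iterator<String> it = raw.iterator();
            return new Iterator<StringBuilder>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public StringBuilder next() {
                    shared.setLength(0);
                    shared.append(it.next());
                    return shared;
                }
            };
        };
    }

    // Fix 1 (the answer's approach): keep an immutable String snapshot.
    static List<String> bufferStrings(Iterable<StringBuilder> values) {
        List<String> buffered = new ArrayList<>();
        for (StringBuilder t : values) {
            buffered.add(t.toString()); // toString() creates a new String
        }
        return buffered;
    }

    // Fix 2: keep a deep copy of the mutable object itself
    // (analogous to new Text(t) when the values are Hadoop Text objects).
    static List<StringBuilder> bufferCopies(Iterable<StringBuilder> values) {
        List<StringBuilder> buffered = new ArrayList<>();
        for (StringBuilder t : values) {
            buffered.add(new StringBuilder(t)); // independent copy of the contents
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("alpha", "beta", "gamma");
        System.out.println(bufferStrings(reusingIterable(raw))); // prints [alpha, beta, gamma]
        System.out.println(bufferCopies(reusingIterable(raw)));  // prints [alpha, beta, gamma]
    }
}
```

Either way, every value for the key is held in memory until the second pass, so this pattern assumes the values for a single key fit in the reducer's heap.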

This is a perfect answer. Thanks! – 2013-03-20 20:30:18