Inserting multiple rows into HBase using MapReduce

I want to batch-insert N rows into an HBase table from each mapper. I currently know of two ways to do this:

Create a list of Put objects and use the put(List<Put> puts) method of an HTable instance, making sure the autoFlush parameter is disabled.
Use the TableOutputFormat class and the context.write(rowKey, put) method.

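Which one is better?

In the first approach, context.write() is not needed, because hTable.put(putsList) puts the data directly into the table. My mapper class extends Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>, so which classes should I use for KEYOUT and VALUEOUT?

In the second approach, I have to call context.write(rowKey, put) N times. Is there any way to pass a list of Put operations to context.write()?

Is there any other way to do this with MapReduce?

Thanks in advance.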

2016-06-21 hp36

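Why a single mapper, and why not multiple mappers? How are you specifying the number of mappers? Even if you specify it, the setting is only a suggestion; there is no guarantee that exactly one mapper will run.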

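You can change the number of mappers with setNumMapTasks or conf.set('mapred.map.tasks', 'numberofmappersyouwanttoset'), but this is only a hint to the framework and does not guarantee that this many mapper instances will be created. The actual number also depends on the input splits.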

Look at http://stackoverflow.com/questions/37239944/is-it-possible-to-run-multiple-mappers-on-one-node for my detailed answer. Feel free to ask questions.

I prefer the second option, where batching comes naturally with MapReduce (no list of Puts is needed). For more detail, see my second point below.

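1) Your first option, List<Put>, is generally used in a standalone HBase Java client. Internally it is controlled by hbase.client.write.buffer, set as below in one of your configuration XML files: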

<property> 
     <name>hbase.client.write.buffer</name> 
     <value>20971520</value> <!-- 20971520 bytes = 20 MB; the HBase default is 2097152 (2 MB) -->
</property>

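The default size is 2 MB. Once the buffer fills up, it flushes all the puts so that they are actually inserted into your table. BufferedMutator works the same way, as explained in point 2.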
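The flush-on-full behaviour described above can be illustrated with a minimal sketch that does not depend on HBase. `WriteBufferSketch`, its methods and the byte sizes are hypothetical stand-ins, not HBase client code; the only idea taken from the answer is that puts accumulate until a size threshold (the role played by hbase.client.write.buffer) is reached and are then sent as one batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client-side write buffer; this is not HBase code. */
final class WriteBufferSketch {
    private final long thresholdBytes;                 // plays the role of hbase.client.write.buffer
    private final List<byte[]> pending = new ArrayList<>();
    private final List<Integer> flushedBatchSizes = new ArrayList<>();
    private long pendingBytes = 0;

    WriteBufferSketch(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Buffers one serialized row and flushes once the threshold is reached. */
    void add(byte[] row) {
        pending.add(row);
        pendingBytes += row.length;
        if (pendingBytes >= thresholdBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch (here it only records the batch size). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushedBatchSizes.add(pending.size());
        pending.clear();
        pendingBytes = 0;
    }

    List<Integer> flushedBatchSizes() {
        return flushedBatchSizes;
    }

    public static void main(String[] args) {
        WriteBufferSketch buffer = new WriteBufferSketch(1024); // 1 KB threshold for the demo
        for (int i = 0; i < 10; i++) {
            buffer.add(new byte[300]);                          // ten 300-byte rows
        }
        buffer.flush();                                         // final flush, as on close
        System.out.println(buffer.flushedBatchSizes());         // prints [4, 4, 2]
    }
}
```

With a 1 KB threshold and 300-byte rows, every fourth row triggers a flush, and the last two rows go out only in the final explicit flush. This is why a buffered client has to be flushed or closed before the task ends.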

2) Regarding the second option, if you look at the TableOutputFormat documentation:

org.apache.hadoop.hbase.mapreduce 
Class TableOutputFormat<KEY> 

java.lang.Object 
org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation> 
org.apache.hadoop.hbase.mapreduce.TableOutputFormat<KEY> 
All Implemented Interfaces: 
org.apache.hadoop.conf.Configurable 

@InterfaceAudience.Public 
@InterfaceStability.Stable 
public class TableOutputFormat<KEY> 
extends org.apache.hadoop.mapreduce.OutputFormat<KEY,Mutation> 
implements org.apache.hadoop.conf.Configurable 
Convert Map/Reduce output and write it to an HBase table. The KEY is ignored while the output value must be either a Put or a Delete instance.

- Another way to see this is through the code, as below.

    /**
     * Writes a key/value pair into the table.
     *
     * @param key The key.
     * @param value The value.
     * @throws IOException When writing fails.
     * @see RecordWriter#write(Object, Object)
     */
    @Override
    public void write(KEY key, Mutation value)
        throws IOException {
      if (!(value instanceof Put) && !(value instanceof Delete)) {
        throw new IOException("Pass a Delete or a Put");
      }
      mutator.mutate(value);
    }

Conclusion: context.write(rowkey, putlist) is not possible with this API.
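Because the writer accepts one mutation per call, a list of puts has to be unrolled into N write calls. The sketch below mimics that contract with hypothetical stand-in types (`StubMutation`, `StubPut`, `StubDelete`, `StubWriter`) so that it runs without HBase on the classpath. It is an illustration of the calling pattern only; in a real mapper the loop body would be the context.write(rowKey, put) call from the question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins that mimic the shape of the API quoted above; not HBase classes. */
class StubMutation {
    final String row;
    StubMutation(String row) { this.row = row; }
}

class StubPut extends StubMutation {
    StubPut(String row) { super(row); }
}

class StubDelete extends StubMutation {
    StubDelete(String row) { super(row); }
}

/** Accepts exactly one mutation per call, like the write(KEY, Mutation) method shown above. */
class StubWriter {
    final List<StubMutation> received = new ArrayList<>();

    void write(String ignoredKey, StubMutation value) throws IOException {
        if (!(value instanceof StubPut) && !(value instanceof StubDelete)) {
            throw new IOException("Pass a Delete or a Put");
        }
        received.add(value); // the real writer hands the value to a BufferedMutator here
    }
}

class WriteEachPutSketch {
    /** Unrolls a list into one write call per element; there is no list overload to call. */
    static void writeAll(StubWriter writer, List<StubPut> puts) throws IOException {
        for (StubPut put : puts) {
            writer.write(put.row, put);
        }
    }

    public static void main(String[] args) throws IOException {
        List<StubPut> puts = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            puts.add(new StubPut("row-" + i));
        }
        StubWriter writer = new StubWriter();
        writeAll(writer, puts);
        System.out.println(writer.received.size() + " mutations written"); // prints 3 mutations written
    }
}
```

Calling write N times does not by itself cost N round trips, because the batching happens behind the writer rather than in the mapper.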

However, the BufferedMutator documentation (see mutator.mutate in the code above) says:

Map/reduce jobs benefit from batching, but have no natural flush point. {@code BufferedMutator} receives the puts from the M/R job and will batch puts based on some heuristic, such as the accumulated size of the puts, and submit batches of puts asynchronously so that the M/R logic can continue without interruption.

So your batching is natural (with BufferedMutator), as described above.

2016-06-21 14:08:36
