Hadoop reduce context and reduce-side join with another input file

I have the following simple reducer:

IntWritable count = new IntWritable(); // reused output value

@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int sum = 0;          // total purchase price for this customer
    int numPurchases = 0; // number of purchases for this customer
    for (IntWritable val : values) {
        sum += val.get();
        numPurchases++;
    }
    count.set(sum / numPurchases); // integer division: the average is truncated
    context.write(key, count);
}

The above simply produces the following output:

customerId | avgPurchasePrice

The reducer above gets its data from a file, File1. I have two questions:

1) Can I add the number of purchases, numPurchases, to the output file? Any pointers on how to do this would be much appreciated.

2) I now have another file, File2. File2 basically has the following layout:

customerId | customerName | customerPhone | customerAddress

Can I do a reduce-side join so that the output file has the following format:

customerId | name | phone | avgPurchasePrice | totalPurchases?

If so, are there any examples I could look at?


2013-11-23 rkh

I would suggest this:

Create two custom types, CustomerKey and PurchaseSummary.

1) CustomerKey: holds the customerId, name and phone number. It should implement WritableComparable.

Implement public int compareTo so that it compares by customerId.
Override the toString method.

2) PurchaseSummary: holds avgPurchasePrice and totalPurchases. It can implement Writable.

Override the toString method.

I am assuming that totalPurchases is the number of entries for each customer.
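The two types described above could look roughly like the sketch below. To keep it compilable without Hadoop on the classpath, the classes only mirror the `write(DataOutput)` / `readFields(DataInput)` methods; in a real job CustomerKey would additionally declare `implements WritableComparable<CustomerKey>` and PurchaseSummary `implements Writable` (both from `org.apache.hadoop.io`). The field types and the `" | "` separator are assumptions, not something stated in the question.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch only. In Hadoop: class CustomerKey implements WritableComparable<CustomerKey>
class CustomerKey implements Comparable<CustomerKey> {
    String customerId = "";
    String name = "";
    String phone = "";

    CustomerKey() {} // Hadoop needs a no-arg constructor to deserialize keys

    CustomerKey(String customerId, String name, String phone) {
        this.customerId = customerId;
        this.name = name;
        this.phone = phone;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(customerId);
        out.writeUTF(name);
        out.writeUTF(phone);
    }

    // Deserialize in exactly the same order as write().
    public void readFields(DataInput in) throws IOException {
        customerId = in.readUTF();
        name = in.readUTF();
        phone = in.readUTF();
    }

    // Compare on customerId only, as suggested above.
    @Override
    public int compareTo(CustomerKey other) {
        return customerId.compareTo(other.customerId);
    }

    // Keep equals/hashCode consistent with compareTo; the default
    // HashPartitioner uses hashCode() to pick the reducer.
    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerKey && customerId.equals(((CustomerKey) o).customerId);
    }

    @Override
    public int hashCode() {
        return customerId.hashCode();
    }

    @Override
    public String toString() {
        return customerId + " | " + name + " | " + phone;
    }
}

// Sketch only. In Hadoop: class PurchaseSummary implements Writable
class PurchaseSummary {
    int avgPurchasePrice;
    int totalPurchases;

    PurchaseSummary() {}

    PurchaseSummary(int avgPurchasePrice, int totalPurchases) {
        this.avgPurchasePrice = avgPurchasePrice;
        this.totalPurchases = totalPurchases;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(avgPurchasePrice);
        out.writeInt(totalPurchases);
    }

    public void readFields(DataInput in) throws IOException {
        avgPurchasePrice = in.readInt();
        totalPurchases = in.readInt();
    }

    @Override
    public String toString() {
        return avgPurchasePrice + " | " + totalPurchases;
    }
}
```

The toString overrides matter because the default text output format writes keys and values using toString, so they determine what the output file looks like.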

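In the mapper, read the text and create your CustomerKey instance. The value should be the same as what you are emitting now.
In the reducer, create an instance of PurchaseSummary and populate its values accordingly.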
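As a rough illustration of the reducer step, and of question 1 (getting the purchase count into the output), here is a minimal sketch in plain Java. `PurchaseAggregator` and `summarize` are hypothetical names. In an actual reducer the amounts would come from `val.get()` on each IntWritable, and the result could be written with something like `context.write(key, new Text(...))` after changing the reducer's output value type to Text.

```java
// Sketch only: the aggregation a reducer would perform for one customer.
class PurchaseAggregator {

    // Returns "avgPurchasePrice | totalPurchases" for one customer's purchases.
    static String summarize(Iterable<Integer> purchaseAmounts) {
        long sum = 0;
        int totalPurchases = 0;
        for (int amount : purchaseAmounts) {
            sum += amount;
            totalPurchases++;
        }
        if (totalPurchases == 0) {
            return "0 | 0"; // cannot happen in a real reduce call, guarded anyway
        }
        long avgPurchasePrice = sum / totalPurchases; // truncating, like the question's code
        return avgPurchasePrice + " | " + totalPurchases;
    }
}
```

Carrying both numbers in one value, whether as a Text or as a custom Writable such as PurchaseSummary, is what lets a single context.write call produce both columns.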


2013-11-23 04:24:05

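That seems to make sense, but the data is in two files... Should your solution use MultipleInputs, or read the files separately somehow? I tried to put everything together as an answer... – rkh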

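You can put all the files into an HDFS directory and pass that directory as an argument to the program. FileInputFormat processes all files in the input directory. –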

I know that part. But then how does the mapper know which file it is reading, and how does the reducer know what to join... – rkh
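One common way to handle this in a reduce-side join is to tag each record with its source in the mapper and let the reducer sort the values out by tag. The sketch below shows only that logic in plain Java, with no Hadoop dependency. It assumes File1 lines look like `customerId | purchasePrice` (the question does not show File1's layout) and that the file names start with "File1" and "File2"; `ReduceSideJoin`, `mapLine` and the "C"/"P" tags are hypothetical names. In a real job the mapper could obtain the file name via `((FileSplit) context.getInputSplit()).getPath().getName()`, or each file could be given its own mapper class with `MultipleInputs.addInputPath`.

```java
// Sketch only: tagging in the map step, joining in the reduce step.
class ReduceSideJoin {
    static final String CUSTOMER_TAG = "C"; // record came from File2
    static final String PURCHASE_TAG = "P"; // record came from File1

    // Map step: returns {customerId, taggedValue} for one input line.
    static String[] mapLine(String fileName, String line) {
        String[] f = line.split("\\|", -1);
        String customerId = f[0].trim();
        if (fileName.startsWith("File2")) {
            // customerId | customerName | customerPhone | customerAddress
            return new String[] { customerId, CUSTOMER_TAG + "|" + f[1].trim() + "|" + f[2].trim() };
        }
        // Assumed File1 layout: customerId | purchasePrice
        return new String[] { customerId, PURCHASE_TAG + "|" + f[1].trim() };
    }

    // Reduce step: all tagged values for one customerId arrive together.
    static String reduce(String customerId, Iterable<String> taggedValues) {
        String name = "";
        String phone = "";
        long sum = 0;
        int totalPurchases = 0;
        for (String value : taggedValues) {
            String[] f = value.split("\\|", -1);
            if (CUSTOMER_TAG.equals(f[0])) {
                name = f[1];
                phone = f[2];
            } else {
                sum += Integer.parseInt(f[1]);
                totalPurchases++;
            }
        }
        long avgPurchasePrice = totalPurchases == 0 ? 0 : sum / totalPurchases;
        return customerId + " | " + name + " | " + phone
                + " | " + avgPurchasePrice + " | " + totalPurchases;
    }
}
```

Because both mappers emit the same customerId as the key, the shuffle brings the customer record and all purchase records for that customer to the same reduce call, which is what makes the join possible without the reducer reading File2 itself.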
