2015-11-05 70 views

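How to copy data to another table without overwriting existing columns

I have two tables in Amazon DynamoDB: Elements and Containers. The hierarchy is that one container can hold several elements.
An element looks like: uuid, timestamp, container_id, data.
I want to aggregate the data of all elements into the corresponding container, for example:
Elements: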

| uuid | container_id | data |
| 1    | 1            | 100  |
| 2    | 1            | 150  |
| 3    | 2            | 100  |

So in the Containers table I want to get:

| uuid | data | 
| 1 | 250 | 
| 2 | 100 | 

So, using Hive, I wrote this script (launched on an EMR cluster):

-- Hive view of the DynamoDB table Elements
CREATE EXTERNAL TABLE element (`uuid` string, `container_id` bigint, `data` double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Elements", "dynamodb.column.mapping" = "uuid:UUID,container_id:container_id,data:data");

-- Hive view of the DynamoDB table Containers
CREATE EXTERNAL TABLE container (`uuid` string, `data` double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Containers", "dynamodb.column.mapping" = "uuid:UUID,data:data");

-- Sum the element data per container and write it to Containers
INSERT INTO TABLE container
SELECT container_id AS `uuid`, sum(`data`) AS `data`
FROM element
WHERE container_id IS NOT NULL
GROUP BY container_id;

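It works well, but now I need to write some additional data to the Containers table, so an item should look like uuid, data, another_data. However, when I run the script above, it overwrites all of the another_data values (that attribute is not listed in the external table). I have tried many variants but cannot find a solution.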
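To make the symptom concrete, here is a toy Python model, not DynamoDB or Hive code. It assumes the Hive INSERT behaves like a whole-item put, which matches the overwrite described above. The helper names `put_item` and `update_item` and the value 7.0 are made up for illustration.

```python
# Toy model of a key-value table: item key -> dict of attributes.
# Assumption: the Hive INSERT acts like a whole-item put.

def put_item(table, key, attributes):
    """Replace the whole item; attributes that are not supplied are lost."""
    table[key] = dict(attributes)

def update_item(table, key, attributes):
    """Change only the supplied attributes and keep the others."""
    table.setdefault(key, {}).update(attributes)

# Hypothetical existing item with an extra attribute.
containers = {"1": {"data": 0.0, "another_data": 7.0}}

replaced = {k: dict(v) for k, v in containers.items()}
put_item(replaced, "1", {"data": 250.0})

merged = {k: dict(v) for k, v in containers.items()}
update_item(merged, "1", {"data": 250.0})

print(replaced)  # {'1': {'data': 250.0}}
print(merged)    # {'1': {'data': 250.0, 'another_data': 7.0}}
```

Under that assumption, any attribute the Hive table does not map is simply missing from the item that gets written back.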


When you add the additional column to the Containers table, what value does that column take for the existing data? – madhu


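Adding the new data is not a performance-critical operation, so I do it from Java with `amazonDynamoDBClient.updateItem(tableName, key, attributeUpdates)`, which writes only the given attributes and does not affect the others. –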

Answer


OK, I found the answer:

-- Hive view of the DynamoDB table Elements
CREATE EXTERNAL TABLE element (`uuid` string, `container_id` bigint, `data` double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Elements", "dynamodb.column.mapping" = "uuid:UUID,container_id:container_id,data:data");

-- Map another_data as well, so it can be read and written back
CREATE EXTERNAL TABLE container (`uuid` string, `data` double, `another_data` double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "Containers", "dynamodb.column.mapping" = "uuid:UUID,data:data,another_data:another_data");

-- Join each element to its container to carry the existing another_data along
INSERT INTO TABLE container
SELECT element.`container_id` AS `uuid`,
       sum(element.`data`) AS `data`,
       collect_set(container.`another_data`)[0] AS `another_data`
FROM element
LEFT JOIN container ON (element.`container_id` = container.`uuid`)
WHERE element.container_id IS NOT NULL
GROUP BY element.container_id;
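The logic of that query can be sketched in plain Python over the sample rows from the question. This is only an illustration of the join and aggregation, not Hive code. The existing another_data value 7.0 and the function name `aggregate` are made up, and `None` stands in for SQL NULL.

```python
from collections import defaultdict

# Sample rows from the question.
elements = [
    {"uuid": "1", "container_id": 1, "data": 100.0},
    {"uuid": "2", "container_id": 1, "data": 150.0},
    {"uuid": "3", "container_id": 2, "data": 100.0},
]

# Hypothetical existing Containers items; container 2 does not exist yet.
containers = {"1": {"data": 0.0, "another_data": 7.0}}

def aggregate(elements, containers):
    """Sum data per container and carry over the existing another_data."""
    sums = defaultdict(float)
    for e in elements:
        if e["container_id"] is not None:              # WHERE ... IS NOT NULL
            sums[str(e["container_id"])] += e["data"]  # GROUP BY + sum(data)
    rows = {}
    for cid, total in sums.items():
        existing = containers.get(cid, {})             # LEFT JOIN, may not match
        rows[cid] = {
            "data": total,
            # One container row per uuid, so there is at most one value to pick.
            "another_data": existing.get("another_data"),
        }
    return rows

new_rows = aggregate(elements, containers)
containers.update(new_rows)  # whole-item write, now including another_data
print(containers)
# {'1': {'data': 250.0, 'another_data': 7.0}, '2': {'data': 100.0, 'another_data': None}}
```

Because the written row now includes the value read from the existing item, a whole-item replacement no longer loses it. Any other attribute that is not mapped in the external table would presumably still be dropped.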