2016-07-25

Answers

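Yes.
Partitioning is how you split your data into directories on HDFS. Each directory is one partition. For example, if your table definition is something like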

CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING) 
COMMENT 'A bucketed copy of user_info' 
PARTITIONED BY(ds STRING) 
CLUSTERED BY(user_id) INTO 256 BUCKETS; 

then you will have HDFS directories, one for each distinct ds value, such as

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/ 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/ 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-13/ 
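A minimal Python sketch (a simulation, not Hive's own code) of how rows end up in these directories: each distinct ds value becomes one ds=value subdirectory under the table's directory. The warehouse root /user/hive/warehouse is Hive's usual default and is assumed here; the sample rows are hypothetical.

```python
from collections import defaultdict

# Assumed default Hive warehouse location; a cluster may configure another path.
WAREHOUSE = "/user/hive/warehouse"


def partition_dir(table: str, ds: str) -> str:
    """Return the Hive-style directory for one value of the partition column ds."""
    return f"{WAREHOUSE}/{table}/ds={ds}/"


def group_by_partition(table, rows):
    """Group (user_id, firstname, lastname, ds) rows by their partition directory."""
    dirs = defaultdict(list)
    for user_id, firstname, lastname, ds in rows:
        # The partition column is not written into the data files;
        # its value is encoded in the directory name instead.
        dirs[partition_dir(table, ds)].append((user_id, firstname, lastname))
    return dict(dirs)


# Hypothetical sample rows.
rows = [
    (1, "Ann", "Lee", "2011-01-11"),
    (2, "Bob", "Kim", "2011-01-12"),
    (3, "Cy", "Ode", "2011-01-11"),
]
for path, part_rows in group_by_partition("user_info_bucketed", rows).items():
    print(path, part_rows)
```

A query that filters on the partition column (for example WHERE ds = '2011-01-12') only needs to read the matching directory; this is partition pruning.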

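Bucketing is about how your data is distributed inside each partition: every partition directory holds one file per bucket, so you will have files like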

/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000000_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000001_0 
... 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000255_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000000_0 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000001_0 
... 
/user/hive/warehouse/user_info_bucketed/ds=2011-01-12/000255_0 
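How does a row's user_id pick its file? A minimal Python sketch, assuming Hive's legacy bucketing (the Hive 1.x/2.x behaviour): a BIGINT is hashed as Java's (int)(v ^ (v >>> 32)), and the bucket is (hash & Integer.MAX_VALUE) % number_of_buckets. Newer Hive releases default to a Murmur3-based hash, so exact bucket numbers may differ there. The helpers bucket_file and row_path are hypothetical; bucket_file assumes bucket N is written to a file named N, zero-padded to six digits, followed by _0.

```python
WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse location


def hive_long_hash(value: int) -> int:
    """Legacy Hive hash of a BIGINT: Java's (int)(v ^ (v >>> 32))."""
    v = value & 0xFFFFFFFFFFFFFFFF                  # two's-complement 64-bit view
    h = (v ^ (v >> 32)) & 0xFFFFFFFF                # unsigned shift, keep low 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h   # reinterpret as signed 32-bit int


def bucket_for(user_id: int, num_buckets: int = 256) -> int:
    """Bucket index = (hash & Integer.MAX_VALUE) % num_buckets (assumed legacy rule)."""
    return (hive_long_hash(user_id) & 0x7FFFFFFF) % num_buckets


def bucket_file(bucket: int) -> str:
    """Hypothetical helper: assumes bucket N lives in file '<N padded to 6 digits>_0'."""
    return f"{bucket:06d}_0"


def row_path(user_id: int, ds: str, table: str = "user_info_bucketed") -> str:
    """Partition directory comes from ds; the file inside it from the user_id bucket."""
    return f"{WAREHOUSE}/{table}/ds={ds}/{bucket_file(bucket_for(user_id))}"


for uid in (1, 300, 2**32):
    print(uid, row_path(uid, "2011-01-11"))
# 1 /user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000001_0
# 300 /user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000044_0
# 4294967296 /user/hive/warehouse/user_info_bucketed/ds=2011-01-11/000001_0
```

Because the bucket depends only on the hashed user_id, all rows for the same user within a partition end up in the same file; that property is what makes bucket sampling and bucket-wise joins possible.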

References: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables http://www.hadooptpoint.com/hive-buckets-optimization-techniques/

You can! In that case, you are using buckets inside partitioned data.

Yes. It's straightforward.
Try something like the following:

CREATE TABLE IF NOT EXISTS employee_partition_bucket 
( 
employeeID Int, 
firstName String, 
designation String, 
salary Int 
) 
PARTITIONED BY (department string) 
CLUSTERED BY (designation) INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'; 

In this example, I have partitioned the table by department and bucketed it by designation.
Hope this helps.
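To see where each employee row would be stored, here is a minimal Python sketch under an assumed legacy-bucketing rule: the partition directory comes from department, and the bucket file from a 31-based hash over the bytes of designation (for ASCII strings this equals Java's String.hashCode), masked and taken modulo 2 buckets. The table location, the sample rows, and the names hive_string_hash and employee_path are hypothetical; newer Hive versions may use a different hash.

```python
from collections import defaultdict

ROOT = "/user/hive/warehouse/employee_partition_bucket"  # assumed table location
NUM_BUCKETS = 2  # matches CLUSTERED BY (designation) INTO 2 BUCKETS


def hive_string_hash(s: str) -> int:
    """Assumed legacy Hive STRING hash: r = r * 31 + byte over the UTF-8 bytes."""
    r = 0
    for b in s.encode("utf-8"):
        signed_b = b - 256 if b >= 128 else b  # Java bytes are signed
        r = (r * 31 + signed_b) & 0xFFFFFFFF   # emulate 32-bit int overflow
    return r - (1 << 32) if r >= (1 << 31) else r


def employee_path(row) -> str:
    """Directory from the partition column, file from the bucketing column."""
    employee_id, first_name, designation, salary, department = row
    bucket = (hive_string_hash(designation) & 0x7FFFFFFF) % NUM_BUCKETS
    return f"{ROOT}/department={department}/{bucket:06d}_0"


# Hypothetical rows: (employeeID, firstName, designation, salary, department)
employees = [
    (1, "Asha", "Engineer", 5000, "IT"),
    (2, "Ben", "Manager", 7000, "IT"),
    (3, "Chen", "Engineer", 5200, "IT"),
    (4, "Dina", "Analyst", 4800, "HR"),
]

# Collect employee IDs per (directory, bucket file) location.
layout = defaultdict(list)
for emp in employees:
    layout[employee_path(emp)].append(emp[0])
for path, ids in sorted(layout.items()):
    print(path, ids)
```

With only two buckets, rows with the same designation always share one file inside their department's directory, while different designations may or may not collide in the same file.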

How will the data be distributed across the filesystem directories? Could you elaborate? – Farooque