Hive: map join on partitioned tables

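Consider a typical data warehouse scenario in Hive with fact and dimension tables, where the fact table is partitioned and spread across multiple data nodes. When joining the partitioned fact table with the unpartitioned dimension tables, a map join seems the logical choice: the dimension tables are small, so they can be held in memory on every node and joined efficiently against the fact data there.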
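For reference, whether Hive turns a regular join into a map join is governed by session settings. The lines below are a minimal sketch, assuming a Hive release in which these properties are available (0.11 or later, as far as I know); the threshold value is only the documented default, not a recommendation.

```sql
-- Allow the optimizer to convert a common join into a map join
-- when one side of the join is small enough to fit in memory.
set hive.auto.convert.join=true;

-- Size threshold in bytes below which a table is treated as "small".
-- 25000000 is the documented default; tune it to your dimension tables.
set hive.mapjoin.smalltable.filesize=25000000;
```

With these in place, the small (dimension) side is loaded into an in-memory hash table and the large (fact) side is streamed through the mappers.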

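However, a few online resources claim that for a map join to be performed on partitioned tables, the partition key on both tables must be the same as the join key.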

So this is the question I am looking for an answer to:
Can a partitioned table (fact) be map joined with a non-partitioned table (dimension)?

Source

2017-06-12 Kirthika Ramachandran

The answer is: yes.

Look for the Map Join Operator in the execution plan.

Demo

create table fact (rec_id int,dim_id int) partitioned by (dt date); 
create table dim (dim_id int,descr string);

explain
select *
from fact f join dim d
     on d.dim_id = f.dim_id;

STAGE DEPENDENCIES: 
    Stage-4 is a root stage 
    Stage-3 depends on stages: Stage-4 
    Stage-0 depends on stages: Stage-3 

STAGE PLANS: 
    Stage: Stage-4 
    Map Reduce Local Work 
     Alias -> Map Local Tables: 
     d 
      Fetch Operator 
      limit: -1 
     Alias -> Map Local Operator Tree: 
     d 
      TableScan 
      alias: d 
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
      Filter Operator 
       predicate: dim_id is not null (type: boolean) 
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
       HashTable Sink Operator 
       keys: 
        0 dim_id (type: int) 
        1 dim_id (type: int) 

    Stage: Stage-3 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: f 
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
      Filter Operator 
       predicate: dim_id is not null (type: boolean) 
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
       Map Join Operator 
       condition map: 
        Inner Join 0 to 1 
       keys: 
        0 dim_id (type: int) 
        1 dim_id (type: int) 
       outputColumnNames: _col0, _col1, _col2, _col6, _col7 
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: int), _col1 (type: int), _col2 (type: date), _col6 (type: int), _col7 (type: string) 
        outputColumnNames: _col0, _col1, _col2, _col3, _col4 
        Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
        File Output Operator 
        compressed: false 
        Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
     Local Work: 
     Map Reduce Local Work 

    Stage: Stage-0 
    Fetch Operator 
     limit: -1 
     Processor Tree: 
     ListSink
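The same check can be repeated with a partition filter on the fact table, which is the more common shape of a warehouse query. This is a sketch only, reusing the fact and dim tables from the demo; the date value is illustrative, and the second statement assumes you want to force the join with a hint rather than rely on automatic conversion.

```sql
-- Variant 1: automatic conversion, restricted to one partition of fact.
set hive.auto.convert.join=true;

explain
select f.rec_id, f.dt, d.descr
from fact f join dim d
     on d.dim_id = f.dim_id
where f.dt = date '2017-06-12';   -- illustrative partition value

-- Variant 2: explicit MAPJOIN hint naming the small table.
-- Hints are ignored by default in recent releases, so enable them first.
set hive.ignore.mapjoin.hint=false;

explain
select /*+ MAPJOIN(d) */ f.rec_id, f.dt, d.descr
from fact f join dim d
     on d.dim_id = f.dim_id
where f.dt = date '2017-06-12';
```

In both cases the thing to look for in the output is again a Map Join Operator inside the map-side operator tree of the fact table scan, with dim handled as local work.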

Source

2017-06-12 12:11:37
