2017-04-24 93 views

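Need help understanding the explain plan in Hive

I have a query that I need to tune. With help from many great people here, it works as it is, without any of the suggested changes, but I would really like to understand the explain plan in Hive and try to tune the query myself. Can anyone please help me?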

Query:

CREATE TABLE admin.FctPrfitAmt_rpt AS 
SELECT * FROM admin.FctPrfitAmt t2 
WHERE NOT EXISTS (SELECT 1 FROM admin.FctPrfitAmt_incr t3 WHERE t2.scenario_id = t3.scenario_id) 
UNION ALL 
SELECT * FROM admin.FctPrfitAmt_incr 
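A quick way to pin down what this query should return, before reading the plan, is to run the same shape of query on toy data. The sketch below uses Python's built-in sqlite3 as a stand-in for Hive (an assumption: SQLite and Hive agree on NOT EXISTS / NOT IN results here, but nothing about Hive's execution is modeled), and the tables `fct` / `fct_incr` and the column `amount` are hypothetical. It also contrasts NOT EXISTS with NOT IN: the Stage-11/Stage-12 branch of the plan, which counts NULL scenario_id values in t3 and only lets rows through when that count is 0, looks like the null check Hive adds for NOT IN, because a single NULL in the subquery makes NOT IN reject every base row.

```python
import sqlite3

# Hypothetical stand-ins for admin.FctPrfitAmt / admin.FctPrfitAmt_incr.
NOT_EXISTS_SQL = """
SELECT * FROM fct t2
WHERE NOT EXISTS (SELECT 1 FROM fct_incr t3 WHERE t2.scenario_id = t3.scenario_id)
UNION ALL
SELECT * FROM fct_incr
"""

# Same idea written with NOT IN, for comparison.
NOT_IN_SQL = """
SELECT * FROM fct t2
WHERE t2.scenario_id NOT IN (SELECT scenario_id FROM fct_incr)
UNION ALL
SELECT * FROM fct_incr
"""


def run_query(sql, base_rows, incr_rows):
    """Load both toy tables into an in-memory DB and return the sorted result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE fct (scenario_id INTEGER, amount REAL)")
        conn.execute("CREATE TABLE fct_incr (scenario_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO fct VALUES (?, ?)", base_rows)
        conn.executemany("INSERT INTO fct_incr VALUES (?, ?)", incr_rows)
        # Sort by repr so rows containing None can still be ordered.
        return sorted(conn.execute(sql).fetchall(), key=repr)
    finally:
        conn.close()


if __name__ == "__main__":
    base = [(1, 10.0), (1, 11.0), (2, 20.0), (3, 30.0)]
    incr = [(2, 21.0), (2, 22.0), (4, 40.0), (None, 99.0)]
    print("NOT EXISTS:", run_query(NOT_EXISTS_SQL, base, incr))
    print("NOT IN:    ", run_query(NOT_IN_SQL, base, incr))
```

With a NULL scenario_id in the increment table, the NOT IN version keeps none of the base rows, while NOT EXISTS keeps every base row whose key is absent from the increment.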

EXPLAIN PLAN

STAGE DEPENDENCIES: 
    Stage-10 is a root stage 
    Stage-15 depends on stages: Stage-1, Stage-10, Stage-16 , consists of Stage-18, Stage-2 
    Stage-18 has a backup stage: Stage-2 
    Stage-14 depends on stages: Stage-18 
    Stage-3 depends on stages: Stage-2, Stage-14 
    Stage-9 depends on stages: Stage-3 , consists of Stage-6, Stage-5, Stage-7 
    Stage-6 
    Stage-0 depends on stages: Stage-6, Stage-5, Stage-8 
    Stage-20 depends on stages: Stage-0 
    Stage-4 depends on stages: Stage-20 
    Stage-5 
    Stage-7 
    Stage-8 depends on stages: Stage-7 
    Stage-2 
    Stage-11 is a root stage 
    Stage-12 depends on stages: Stage-11 
    Stage-17 depends on stages: Stage-12 , consists of Stage-19, Stage-1 
    Stage-19 has a backup stage: Stage-1 
    Stage-16 depends on stages: Stage-19 
    Stage-1 

STAGE PLANS: 
    Stage: Stage-10 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: t3 
      Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: scenario_id (type: bigint) 
       outputColumnNames: scenario_id 
       Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
       Group By Operator 
       keys: scenario_id (type: bigint) 
       mode: hash 
       outputColumnNames: _col0 
       Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
       Reduce Output Operator 
        key expressions: _col0 (type: bigint) 
        sort order: + 
        Map-reduce partition columns: _col0 (type: bigint) 
        Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
     Reduce Operator Tree: 
     Group By Operator 
      keys: KEY._col0 (type: bigint) 
      mode: mergepartial 
      outputColumnNames: _col0 
      Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
      File Output Operator 
      compressed: false 
      table: 
       input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
       serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 

    Stage: Stage-15 
    Conditional Operator 

    Stage: Stage-18 
    Map Reduce Local Work 
     Alias -> Map Local Tables: 
     reconcile-subquery1:t1-subquery1:$INTNAME1 
      Fetch Operator 
      limit: -1 
     Alias -> Map Local Operator Tree: 
     reconcile-subquery1:t1-subquery1:$INTNAME1 
      TableScan 
      HashTable Sink Operator 
       keys: 
       0 _col0 (type: bigint) 
       1 _col0 (type: bigint) 

    Stage: Stage-14 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      Map Join Operator 
       condition map: 
        Left Outer Join0 to 1 
       keys: 
       0 _col0 (type: bigint) 
       1 _col0 (type: bigint) 
       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col11 
       Statistics: Num rows: 715121683 Data size: 39113453068 Basic stats: COMPLETE Column stats: NONE 
       Filter Operator 
       predicate: _col11 is null (type: boolean) 
       Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
        expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13)) 
        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 
        Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE 
        File Output Operator 
        compressed: false 
        table: 
         input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 
     Local Work: 
     Map Reduce Local Work 

    Stage: Stage-3 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      Union 
       Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
       compressed: false 
       Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE 
       table: 
        input format: org.apache.hadoop.mapred.TextInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
        name: admin.FctPrfitAmt_reporting_k_benchmark 
      TableScan 
      alias: FctPrfitAmt_incr 
      Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: scenario_id (type: bigint), facility_id (type: bigint), process_id (type: bigint), mp_surrogate_id (type: int), units (type: double), raw_amount (type: decimal(25,13)), allocation_percent (type: decimal(25,13)), capacity_allocation_percent (type: decimal(25,13)) 
       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 
       Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
       Union 
       Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
        compressed: false 
        Statistics: Num rows: 396637128 Data size: 21840112219 Basic stats: COMPLETE Column stats: NONE 
        table: 
         input format: org.apache.hadoop.mapred.TextInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
         name: admin.FctPrfitAmt_reporting_k_benchmark 

    Stage: Stage-9 
    Conditional Operator 

    Stage: Stage-6 
    Move Operator 
     files: 
      hdfs directory: true 
      destination: hdfs://nameservice1/admin/.hive-staging_hive_2017-04-24_04-17-27_639_6500987676644679103-777/-ext-10001 

    Stage: Stage-0 
    Move Operator 
     files: 
      hdfs directory: true 
      destination: hdfs://nameservice1/admin/FctPrfitAmt_reporting_k_benchmark 

    Stage: Stage-20 
     Create Table Operator: 
     Create Table 
      columns: scenario_id bigint, facility_id bigint, process_id bigint, mp_surrogate_id int, units double, raw_amount decimal(25,13), allocation_percent decimal(25,13), capacity_allocation_percent decimal(25,13) 
      input format: org.apache.hadoop.mapred.TextInputFormat 
      output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat 
      serde name: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
      name: admin.FctPrfitAmt_reporting_k_benchmark 

    Stage: Stage-4 
    Stats-Aggr Operator 

    Stage: Stage-5 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      File Output Operator 
       compressed: false 
       table: 
        input format: org.apache.hadoop.mapred.TextInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
        name: admin.FctPrfitAmt_reporting_k_benchmark 

    Stage: Stage-7 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      File Output Operator 
       compressed: false 
       table: 
        input format: org.apache.hadoop.mapred.TextInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
        name: admin.FctPrfitAmt_reporting_k_benchmark 

    Stage: Stage-8 
    Move Operator 
     files: 
      hdfs directory: true 
      destination: hdfs://nameservice1/admin/.hive-staging_hive_2017-04-24_04-17-27_639_6500987676644679103-777/-ext-10001 

    Stage: Stage-2 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      Reduce Output Operator 
       key expressions: _col0 (type: bigint) 
       sort order: + 
       Map-reduce partition columns: _col0 (type: bigint) 
       Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE 
       value expressions: _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13)) 
      TableScan 
      Reduce Output Operator 
       key expressions: _col0 (type: bigint) 
       sort order: + 
       Map-reduce partition columns: _col0 (type: bigint) 
       Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
     Reduce Operator Tree: 
     Join Operator 
      condition map: 
       Left Outer Join0 to 1 
      keys: 
      0 _col0 (type: bigint) 
      1 _col0 (type: bigint) 
      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col11 
      Statistics: Num rows: 715121683 Data size: 39113453068 Basic stats: COMPLETE Column stats: NONE 
      Filter Operator 
      predicate: _col11 is null (type: boolean) 
      Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: int), _col4 (type: double), _col5 (type: decimal(25,13)), _col6 (type: decimal(25,13)), _col7 (type: decimal(25,13)) 
       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 
       Statistics: Num rows: 357560841 Data size: 19556726506 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
       compressed: false 
       table: 
        input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 

    Stage: Stage-11 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: t3 
      filterExpr: scenario_id is null (type: boolean) 
      Statistics: Num rows: 39076287 Data size: 2283385713 Basic stats: COMPLETE Column stats: NONE 
      Filter Operator 
       predicate: scenario_id is null (type: boolean) 
       Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
       Select Operator 
       expressions: null (type: bigint) 
       outputColumnNames: scenario_id 
       Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
       Group By Operator 
        keys: scenario_id (type: bigint) 
        mode: hash 
        outputColumnNames: _col0 
        Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
        Reduce Output Operator 
        key expressions: _col0 (type: bigint) 
        sort order: + 
        Map-reduce partition columns: _col0 (type: bigint) 
        Statistics: Num rows: 19538143 Data size: 1141692827 Basic stats: COMPLETE Column stats: NONE 
     Reduce Operator Tree: 
     Group By Operator 
      keys: KEY._col0 (type: bigint) 
      mode: mergepartial 
      outputColumnNames: _col0 
      Statistics: Num rows: 9769071 Data size: 570846384 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
      Statistics: Num rows: 9769071 Data size: 570846384 Basic stats: COMPLETE Column stats: NONE 
      Group By Operator 
       aggregations: count() 
       mode: hash 
       outputColumnNames: _col0 
       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
       compressed: false 
       table: 
        input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 

    Stage: Stage-12 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      Reduce Output Operator 
       sort order: 
       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
       value expressions: _col0 (type: bigint) 
     Reduce Operator Tree: 
     Group By Operator 
      aggregations: count(VALUE._col0) 
      mode: mergepartial 
      outputColumnNames: _col0 
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
      Filter Operator 
      predicate: (_col0 = 0) (type: boolean) 
      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
      Select Operator 
       expressions: 0 (type: bigint) 
       outputColumnNames: _col0 
       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
       Group By Operator 
       keys: _col0 (type: bigint) 
       mode: hash 
       outputColumnNames: _col0 
       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
        compressed: false 
        table: 
         input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
         serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 

    Stage: Stage-17 
    Conditional Operator 

    Stage: Stage-19 
    Map Reduce Local Work 
     Alias -> Map Local Tables: 
     reconcile-subquery1:t1-subquery1:$INTNAME 
      Fetch Operator 
      limit: -1 
     Alias -> Map Local Operator Tree: 
     reconcile-subquery1:t1-subquery1:$INTNAME 
      TableScan 
      HashTable Sink Operator 
       keys: 
        0 
        1 

    Stage: Stage-16 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: t2 
      Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE 
      Map Join Operator 
       condition map: 
        Left Semi Join 0 to 1 
       keys: 
        0 
        1 
       outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 
       Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE 
       File Output Operator 
       compressed: false 
       table: 
        input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
        serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 
     Local Work: 
     Map Reduce Local Work 

    Stage: Stage-1 
    Map Reduce 
     Map Operator Tree: 
      TableScan 
      alias: t2 
      Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE 
      Reduce Output Operator 
       sort order: 
       Statistics: Num rows: 591009630 Data size: 32325166424 Basic stats: COMPLETE Column stats: NONE 
       value expressions: scenario_id (type: bigint), facility_id (type: bigint), process_id (type: bigint), mp_surrogate_id (type: int), units (type: double), raw_amount (type: decimal(25,13)), allocation_percent (type: decimal(25,13)), capacity_allocation_percent (type: decimal(25,13)) 
      TableScan 
      Reduce Output Operator 
       sort order: 
       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE 
     Reduce Operator Tree: 
     Join Operator 
      condition map: 
       Left Semi Join 0 to 1 
      keys: 
      0 
      1 
      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 
      Statistics: Num rows: 650110607 Data size: 35557683837 Basic stats: COMPLETE Column stats: NONE 
      File Output Operator 
      compressed: false 
      table: 
       input format: org.apache.hadoop.mapred.SequenceFileInputFormat 
       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat 
       serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe 
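The STAGE DEPENDENCIES section is easier to follow as a graph. This is a small sketch (the helper names `parse_stage_deps` and `execution_waves` are hypothetical, not part of Hive) that parses that section and groups the stages into waves that could start once all their parents have finished. Two assumptions: stages listed after "consists of" are treated as children of that conditional stage, and every branch is treated as runnable, whereas at run time each Conditional Operator (Stage-15, Stage-17, Stage-9) picks one branch, for example the map-join path (Stage-18, then Stage-14) or its backup common join Stage-2.

```python
# STAGE DEPENDENCIES copied from the plan above, whitespace trimmed.
PLAN_DEPS = """\
Stage-10 is a root stage
Stage-15 depends on stages: Stage-1, Stage-10, Stage-16 , consists of Stage-18, Stage-2
Stage-18 has a backup stage: Stage-2
Stage-14 depends on stages: Stage-18
Stage-3 depends on stages: Stage-2, Stage-14
Stage-9 depends on stages: Stage-3 , consists of Stage-6, Stage-5, Stage-7
Stage-6
Stage-0 depends on stages: Stage-6, Stage-5, Stage-8
Stage-20 depends on stages: Stage-0
Stage-4 depends on stages: Stage-20
Stage-5
Stage-7
Stage-8 depends on stages: Stage-7
Stage-2
Stage-11 is a root stage
Stage-12 depends on stages: Stage-11
Stage-17 depends on stages: Stage-12 , consists of Stage-19, Stage-1
Stage-19 has a backup stage: Stage-1
Stage-16 depends on stages: Stage-19
Stage-1
"""


def parse_stage_deps(text):
    """Return (deps, backups); deps maps each stage to the stages it waits for."""
    deps = {}
    backups = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        stage = line.split()[0]
        waits = deps.setdefault(stage, set())
        head, _, members = line.partition(" , consists of ")
        if "depends on stages:" in head:
            for dep in head.split("depends on stages:")[1].split(","):
                waits.add(dep.strip())
        # Assumption: the alternatives of a conditional stage run after it.
        for member in (m.strip() for m in members.split(",")):
            if member:
                deps.setdefault(member, set()).add(stage)
        if "has a backup stage:" in line:
            backups[stage] = line.split("has a backup stage:")[1].strip()
    # Make sure every referenced stage also appears as a key.
    for waits in list(deps.values()):
        for dep in waits:
            deps.setdefault(dep, set())
    return deps, backups


def stage_num(name):
    return int(name.split("-")[1])


def execution_waves(deps):
    """Kahn's topological sort, grouping stages whose parents are all done."""
    remaining = {stage: set(waits) for stage, waits in deps.items()}
    waves = []
    while remaining:
        ready = sorted((s for s, w in remaining.items() if not w), key=stage_num)
        if not ready:
            raise ValueError("cycle in stage graph")
        waves.append(ready)
        for stage in ready:
            del remaining[stage]
        for waits in remaining.values():
            waits.difference_update(ready)
    return waves


if __name__ == "__main__":
    deps, backups = parse_stage_deps(PLAN_DEPS)
    for i, wave in enumerate(execution_waves(deps)):
        print(i, wave)
    print("backup stages:", backups)
```

Read top to bottom, the waves roughly say: the two root stages scan t3 (Stage-10 builds the distinct keys, Stage-11 the NULL check), then come the semi join against t2 (Stage-16 or Stage-1), the anti join (Stage-14 or Stage-2), the UNION ALL write (Stage-3), the optional small-file merge (the Stage-9 branches), the final move (Stage-0), the CREATE TABLE (Stage-20) and the stats update (Stage-4).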

Answer


It looks like an incremental (delta) load.

You can use a full join instead, which generates only a single MapReduce job.

CREATE TABLE admin.FctPrfitAmt_rpt AS 
SELECT 
    case when t3.scenario_id is null then t2.scenario_id else t3.scenario_id end as scenario_id, 
    case when t3.scenario_id is null then t2.COL1 else t3.COL1 end as COL1, 
    case when t3.scenario_id is null then t2.COL2 else t3.COL2 end as COL2, 
    case when t3.scenario_id is null then t2.COL3 else t3.COL3 end as COL3, 
    ........ 
FROM admin.FctPrfitAmt t2 
full join admin.FctPrfitAmt_incr t3 on t2.scenario_id = t3.scenario_id 

This solution does not apply, because 'scenario_id' is not unique in either table.
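The comment's objection can be checked on toy data. This sketch models each table as a list of (scenario_id, payload) tuples (a hypothetical simplification that ignores NULL keys) and compares the question's anti join + UNION ALL with the answer's full join + CASE. When a scenario_id appears m times in the base table and n times in the increment, the full join emits m × n rows for it instead of the n increment rows.

```python
from collections import defaultdict


def anti_join_union(base, incr):
    """Question's approach: base rows whose key is absent from incr, plus all incr rows."""
    incr_keys = {key for key, _ in incr}
    return [row for row in base if row[0] not in incr_keys] + list(incr)


def full_join_pick_incr(base, incr):
    """Answer's approach: FULL JOIN on the key, keep the incr row when one matched.

    Assumes no NULL keys (SQL never matches NULL = NULL, a Python dict would).
    """
    base_by_key = defaultdict(list)
    incr_by_key = defaultdict(list)
    for row in base:
        base_by_key[row[0]].append(row)
    for row in incr:
        incr_by_key[row[0]].append(row)
    out = []
    for key in set(base_by_key) | set(incr_by_key):
        # An unmatched side contributes one all-NULL row, as in an outer join.
        lefts = base_by_key.get(key) or [None]
        rights = incr_by_key.get(key) or [None]
        for left in lefts:
            for right in rights:
                # CASE WHEN t3.scenario_id IS NULL THEN t2.* ELSE t3.* END
                out.append(left if right is None else right)
    return out


if __name__ == "__main__":
    base = [(1, "b1"), (1, "b2"), (2, "b3")]
    incr = [(1, "i1"), (1, "i2"), (1, "i3")]
    print(sorted(anti_join_union(base, incr)))
    print(sorted(full_join_pick_incr(base, incr)))
```

With unique keys the two functions return the same rows; with scenario_id repeated on both sides, the full join duplicates every matching increment row.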