
Hive: GC overhead or heap space error - dynamic partitioned table

Please guide me in resolving this GC overhead and heap space error.

I am trying to insert into a partitioned table (dynamic partition) from another table using the query below:

INSERT OVERWRITE table tbl_part PARTITION(county) 
SELECT col1, col2.... col47, county FROM tbl; 

I have run it with the following parameters:

export HADOOP_CLIENT_OPTS=" -Xmx2048m" 
set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.max.dynamic.partitions=2048; 
SET hive.exec.max.dynamic.partitions.pernode=256; 
set mapreduce.map.memory.mb=2048; 
set yarn.scheduler.minimum-allocation-mb=2048; 
set hive.exec.max.created.files=250000; 
set hive.vectorized.execution.enabled=true; 
set hive.merge.smallfiles.avgsize=283115520; 
set hive.merge.size.per.task=209715200; 
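
For context, I have not set the map or reduce JVM heap explicitly. As I understand it, mapreduce.map.memory.mb only sets the YARN container size, and the heap the map task actually runs with comes from mapreduce.map.java.opts, which is where a GC overhead / heap space error would originate. A minimal sketch of what I believe the corresponding settings would look like (assuming the MapReduce execution engine, with the heap at roughly 80% of the 2048 MB container):

-- map/reduce JVM heap, sized at roughly 80% of the 2048 MB container 
set mapreduce.map.java.opts=-Xmx1638m; 
set mapreduce.reduce.memory.mb=2048; 
set mapreduce.reduce.java.opts=-Xmx1638m; 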

I have also added the following to yarn-site.xml:

<property> 
<name>yarn.nodemanager.vmem-check-enabled</name> 
<value>false</value> 
<description>Whether virtual memory limits will be enforced for containers</description> 
</property> 

<property> 
<name>yarn.nodemanager.vmem-pmem-ratio</name> 
<value>4</value> 
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description> 
</property> 

Output of free -m:

                  total       used       free     shared    buffers     cached
Mem:              15347      11090       4256          0        174       6051
-/+ buffers/cache:            4864      10483
Swap:             15670         18      15652

This is a standalone cluster with one core. I am preparing test data to run my unit test cases in Spark.

Could you please guide me on what I can do?

The source table has the following properties:

Table Parameters:  
    COLUMN_STATS_ACCURATE true     
    numFiles    13     
    numRows     10509065    
    rawDataSize    3718599422   
    totalSize    3729108487   
    transient_lastDdlTime 1470909228   

Thank you.

Answer


Add DISTRIBUTE BY county to your query:

INSERT OVERWRITE table tbl_part PARTITION(county) SELECT col1, col2.... col47, county FROM tbl DISTRIBUTE BY county; 
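
DISTRIBUTE BY county sends all rows for a given county to the same reducer, so each task only has to hold open file writers for a handful of partitions instead of every partition it happens to see, which is what typically exhausts the heap during a dynamic-partition insert. As an alternative sketch (assuming Hive 0.13 or later, where hive.optimize.sort.dynamic.partition is available), a similar effect can be had without rewriting the query:

-- sort rows by the partition column so each reducer keeps only one writer open at a time 
set hive.optimize.sort.dynamic.partition=true; 
INSERT OVERWRITE table tbl_part PARTITION(county) 
SELECT col1, col2.... col47, county FROM tbl; 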

I ran it with DISTRIBUTE BY and still got the heap space error. – Aavik


You did not provide any logs, so I am guessing here; they usually help. Have you tried increasing the allocated memory? Perhaps it really is running out of memory. See this: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-hive-out-of-memory-error-oom/ and this: https://blogs.msdn.microsoft.com/shanyu/2014/07/31/hadoop-yarn-memory-settings-in-hdinsight/ – leftjoin


set hive.vectorized.execution.enabled=true; 
set hive.vectorized.execution.reduce.enabled=true; 
set hive.vectorized.execution.reduce.groupby.enabled=true; 
set yarn.nodemanager.resource.memory-mb=8192; 
set yarn.scheduler.minimum-allocation-mb=2048; 
set yarn.scheduler.maximum-allocation-mb=8192; 
SET hive.tez.container.size=7168; 
SET hive.tez.java.opts=-Xmx4096m; 
– Aavik
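
As a rough cross-check (a rule of thumb assumed here, not stated in the thread): hive.tez.java.opts is commonly sized at about 80% of hive.tez.container.size, so a 7168 MB container would pair with a heap of roughly 0.8 × 7168 ≈ 5734 MB rather than 4096 MB:

SET hive.tez.container.size=7168; 
-- heap at ~80% of the container size (assumed rule of thumb) 
SET hive.tez.java.opts=-Xmx5734m; 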