2016-11-12

Hive on Spark fails with a straight SELECT, but succeeds with SELECT GROUP BY

I have been successfully using Hive on Spark for a few weeks now, running CTAS statements that aggregate data. Recently I needed to run a CTAS with a straight SELECT statement (no GROUP BY) and was surprised when it failed. I found that I could run the same statement just fine with Hive on MR, so I created a test case to isolate my problem.

Create the source table:

hive -e " 

    CREATE DATABASE IF NOT EXISTS testhiveonspark; 
    DROP TABLE IF EXISTS testhiveonspark.temptable1; 
    CREATE TABLE testhiveonspark.temptable1 (num1 INT, num2 INT) 
    STORED AS TEXTFILE; 
    INSERT into table testhiveonspark.temptable1 values (1, 1); 
    select * from testhiveonspark.temptable1; 

" 

Run the working CTAS script (SELECT with GROUP BY)... this completes just fine:

cat > testhiveonspark_working.hql << EOF 

    set mapred.job.queue.name=root.apps10; 
    set spark.master=yarn-client; 
    set hive.server2.enable.doAs=false; 
    set hive.execution.engine=spark; 
    set spark.eventLog.enabled=true; 
    set spark.shuffle.blockTransferService=nio; 
    set spark.eventLog.dir=hdfs://hdfsnode:8020/user/spark/applicationHistory; 
    set spark.shuffle.service.enabled=true; 
    set spark.dynamicAllocation.enabled=true; 

    DROP TABLE IF EXISTS testhiveonspark.temptable2; 
    CREATE TABLE testhiveonspark.temptable2 
    STORED AS TEXTFILE 
    AS SELECT num1, num2 FROM testhiveonspark.temptable1 GROUP BY num1, num2; 
    SELECT * FROM testhiveonspark.temptable2; 

EOF 
hive -f testhiveonspark_working.hql 

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties 
OK 
Time taken: 0.931 seconds 
Query ID = jolsen_20161112104040_4e398c2c-4691-47c8-ab79-dfd56195bf2a 
Total jobs = 1 
Launching Job 1 out of 1 
In order to change the average load for a reducer (in bytes): 
    set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
    set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
    set mapreduce.job.reduces=<number> 
Starting Spark Job = 3f263511-afa9-48ba-b224-1496194d5d3e 
Running with YARN Application = application_1478903884683_0019 
Kill Command = /usr/lib/hadoop/bin/yarn application -kill application_1478903884683_0019 

Query Hive on Spark job[0] stages: 
0 
1 

Status: Running (Hive on Spark job[0]) 
Job Progress Format 
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost] 
2016-11-12 10:41:15,453 Stage-0_0: 0/1 Stage-1_0: 0/1 
2016-11-12 10:41:18,482 Stage-0_0: 0/1 Stage-1_0: 0/1 
2016-11-12 10:41:21,504 Stage-0_0: 0(+1)/1  Stage-1_0: 0/1 
2016-11-12 10:41:24,533 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1 
2016-11-12 10:41:25,542 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished 
Status: Finished successfully in 18.15 seconds 
Moving data to: hdfs://hdfsnode:8020/user/hive/warehouse/testhiveonspark.db/temptable2 
Table testhiveonspark.temptable2 stats: [numFiles=1, numRows=1, totalSize=4, rawDataSize=3] 
OK 
Time taken: 83.03 seconds 
OK 
1  1 
Time taken: 0.203 seconds, Fetched: 1 row(s) 

Run the non-working CTAS script (plain SELECT, no GROUP BY)... this does not complete:

cat > testhiveonspark_not_working.hql << EOF 

    set mapred.job.queue.name=root.apps10; 
    set spark.master=yarn-client; 
    set hive.server2.enable.doAs=false; 
    set hive.execution.engine=spark; 
    set spark.eventLog.enabled=true; 
    set spark.shuffle.blockTransferService=nio; 
    set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory; 
    set spark.shuffle.service.enabled=true; 
    set spark.dynamicAllocation.enabled=true; 

    DROP TABLE IF EXISTS testhiveonspark.temptable2; 
    CREATE TABLE testhiveonspark.temptable2 
    STORED AS TEXTFILE 
    AS SELECT num1, num2 FROM testhiveonspark.temptable1; 

EOF 
hive -f testhiveonspark_not_working.hql 

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties 
OK 
Time taken: 1.041 seconds 
Query ID = jolsen_20161112111313_018a2914-6a3b-428c-af7e-81e0a99a302d 
Total jobs = 1 
Launching Job 1 out of 1 
In order to change the average load for a reducer (in bytes): 
    set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
    set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
    set mapreduce.job.reduces=<number> 
Starting Spark Job = c2affd61-1a39-4765-b3ac-e27bc79562ab 
Job hasn't been submitted after 61s. Aborting it. 
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. 
Please check YARN or Spark driver's logs for further information. 
Status: SENT 
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask 

At the time I was running Hive 1.1.0-cdh5.8.0 and Spark version 1.6.0, and figured this might be related to a bug or default setting I wasn't aware of. The CDH 5.8.2 release notes pointed to a ticket that seemed related to my problem (https://issues.apache.org/jira/browse/HIVE-12650), so I decided to upgrade my cluster to the latest release, CDH 5.9.0. Unfortunately, I still have the same problem.

I *think* the logs show Hive timing out before it gets a response from the Application Master, but I'm honestly not sure. I'm still investigating. I also ran Hive with debug logging ("hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql"), but didn't see anything useful; the logs only seem to confirm what I had already observed from the error.

What is causing the failure? The two CTAS statements are not very different; why would one work and the other fail?

Thanks.

I ran both my working and non-working tests with debug logging ("hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql") and compared the output. From the delta, I've tried to pick out what I believe are the most relevant parts of the logs and pasted them below. I'm still struggling to understand why the job "hasn't been submitted after 61s" and what the root cause is. Any help? Thanks.
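
In case it helps, this is roughly the workflow I used to capture and compare the two debug runs (the log file names are just placeholders). Stripping the per-line timestamps first keeps the diff readable:

    hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_working.hql 2>&1 | tee working.log 
    hive --hiveconf hive.root.logger=DEBUG,console -f testhiveonspark_not_working.hql 2>&1 | tee not_working.log 

    # Remove the leading "16/11/12 11:42:58" timestamps so only real differences remain 
    sed 's|^[0-9/]* [0-9:]* ||' working.log > working.clean 
    sed 's|^[0-9/]* [0-9:]* ||' not_working.log > not_working.clean 
    diff working.clean not_working.clean 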

... 

16/11/12 11:42:56 [main]: INFO client.SparkClientImpl: Running client driver with argv: /usr/lib/spark/bin/spark-submit --properties-file /tmp/spark-submit.5047982490856668419.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar --remote-host HDFSNode --remote-port 42944 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Class path contains multiple SLF4J bindings. 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:57 INFO RemoteDriver: Connecting to: HDFSNode:42944 
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 WARN Rpc: Invalid log level null, reverting to default. 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.ResourceLeakDetector: -Dio.netty.leakDetectionLevel: simple 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.Recycler: -Dio.netty.recycler.maxCapacity.default: 262144 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (41 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG internal.Cleaner0: java.nio.ByteBuffer.cleaner(): available 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message... 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response... 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (98 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (275 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message... 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response... 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (45 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth. 
16/11/12 11:42:58 [main]: DEBUG session.SparkSessionManagerImpl: New session (23ef000b-8281-4c2d-a1db-077a3d40366f) is created. 
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitJob from=org.apache.hadoop.hive.ql.exec.spark.SparkTask> 
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 INFO SparkContext: Running Spark version 1.6.0 
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 0 (org.apache.hive.spark.client.BaseProtocol$SyncJobRequest). 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$SyncJobRequest (168 bytes) 
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1: masked=rwx------ 
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #22 
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #22 
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 9ms 
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #23 
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #23 
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms 
16/11/12 11:42:58 [main]: INFO ql.Context: New scratch dir is hdfs://HDFSNode:8020/tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1 
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1/-mr-10003: masked=rwxr-xr-x 
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #24 
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #24 
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 5ms 
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 1 (org.apache.hive.spark.client.BaseProtocol$JobRequest). 
16/11/12 11:42:58 [main]: DEBUG client.SparkClientImpl: Send JobRequest[3e50626e-3a1e-47d0-a18c-76869e1277ae]. 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes) 
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$JobRequest (174708 bytes) 
16/11/12 11:42:58 [main]: INFO log.PerfLogger: </PERFLOG method=SparkSubmitJob start=1478968978795 end=1478968978987 duration=192 from=org.apache.hadoop.hive.ql.exec.spark.SparkTask> 
Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae 
16/11/12 11:42:58 [main]: INFO exec.Task: Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae 
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkRunJob from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> 
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitToRunning from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> 
state = SENT 
16/11/12 11:42:58 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing view acls to: jolsen 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing modify acls to: jolsen 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen) 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriver' on port 43254. 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Slf4jLogger: Slf4jLogger started 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Starting remoting 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:53727] 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:53727] 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53727. 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering MapOutputTracker 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering BlockManagerMaster 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1353a7a8-14f0-4b9d-b369-1224fbc9b331 
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO MemoryStore: MemoryStore started with capacity 530.3 MB 
state = SENT 
16/11/12 11:42:59 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkEnv: Registering OutputCommitCoordinator 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkUI: Started SparkUI at http://192.168.1.99:4040 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkContext: Added JAR file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar at spark://192.168.1.99:43254/jars/hive-exec-1.1.0-cdh5.9.0.jar with timestamp 1478968980504 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO RMProxy: Connecting to ResourceManager at HDFSNode/192.168.1.99:8032 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Requesting a new application from cluster with 10 NodeManagers 
state = SENT 
16/11/12 11:43:00 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container) 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Setting up container launch context for our AM 
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Setting up the launch environment for our AM container 
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Preparing resources for our AM container 
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar 
state = SENT 
16/11/12 11:43:01 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:02 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:02 INFO Client: Uploading resource file:/tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6/__spark_conf__2574250824979235523.zip -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/__spark_conf__2574250824979235523.zip 
state = SENT 
16/11/12 11:43:02 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing view acls to: jolsen 
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing modify acls to: jolsen 
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen) 
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO Client: Submitting application 32 to ResourceManager 
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO YarnClientImpl: Submitted application application_1478903884683_0032 
state = SENT 
16/11/12 11:43:04 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 
16/11/12 11:43:05 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 
16/11/12 11:43:06 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 
16/11/12 11:43:07 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 
16/11/12 11:43:08 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:08 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms 
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed 
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0 
state = SENT 
16/11/12 11:43:09 [main]: INFO status.SparkJobMonitor: state = SENT 
state = SENT 
16/11/12 11:43:10 [main]: INFO status.SparkJobMonitor: state = SENT 
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null) 
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> HDFSNode, PROXY_URI_BASES -> http://HDFSNode:8151/proxy/application_1478903884683_0032), /proxy/application_1478903884683_0032 
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
state = SENT 
16/11/12 11:43:11 [main]: INFO status.SparkJobMonitor: state = SENT 
... (the same "state = SENT" pair repeats once per second, interleaved with periodic ZooKeeper ping responses, until the timeout) ... 
state = SENT 
16/11/12 11:44:00 [main]: INFO status.SparkJobMonitor: state = SENT 
Job hasn't been submitted after 61s. Aborting it. 
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. 
Please check YARN or Spark driver's logs for further information. 
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Job hasn't been submitted after 61s. Aborting it. 
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. 
Please check YARN or Spark driver's logs for further information. 
Status: SENT 
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Status: SENT 
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=SparkRunJob start=1478968978989 end=1478969040067 duration=61078 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor> 
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 2 (org.apache.hive.spark.client.BaseProtocol$CancelJob). 

I just tried the same steps on a Cloudera CDH QuickStart VM updated from cdh5.8.0 and got the same results. – jolsen

Answer


I dug through the Hive source code to locate the origin of the log message I was seeing on the console (e.g., "Job hasn't been submitted after 61s"). From the code I found a property, "hive.spark.job.monitor.timeout", which defaults to 60 seconds, exactly when my job was timing out, so it had to be the right property. I retried my job, increasing "hive.spark.job.monitor.timeout" each time, and after raising it to "180s" my job finally executed before the timeout. Problem solved.
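
As a quick sanity check, you can ask the Hive CLI to echo a property's current value by issuing "set" with no value; it should print something like the default below:

    hive -e "set hive.spark.job.monitor.timeout;" 
    # should print: hive.spark.job.monitor.timeout=60s 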

I don't know why my job should take up to 3 minutes before it actually starts executing, which seems like a very long delay, but I'll leave that for further research.

Here is my final code, which works:

set mapred.job.queue.name=root.apps10; 
set spark.master=yarn-client; 
set hive.server2.enable.doAs=false; 
set hive.execution.engine=spark; 
set spark.eventLog.enabled=true; 
set spark.shuffle.blockTransferService=nio; 
set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory; 
set spark.shuffle.service.enabled=true; 
set spark.dynamicAllocation.enabled=true; 
set hive.spark.job.monitor.timeout=180s; 

DROP TABLE IF EXISTS testhiveonspark.temptable2; 
CREATE TABLE testhiveonspark.temptable2 
STORED AS TEXTFILE 
AS SELECT num1, num2 FROM testhiveonspark.temptable1;
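
Saved to a file and run the same way as the earlier tests (the file name below is just an example):

    hive -f testhiveonspark_final.hql 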