Apache Spark deployment issue (cluster mode) with Hive

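Edit:

I am developing a Spark application that reads data from multiple structured schemas, and I want to aggregate information from those schemas. The application runs fine when I run it locally. When I run it on the cluster, however, I hit problems that come either from the configuration (most likely hive-site.xml) or from the spark-submit arguments. I have looked at other related posts but could not find a solution that applies to my scenario. The commands I tried and the errors I got are detailed below. I am new to Spark and may be missing something trivial, but I can provide more information to support my question.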

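Original question:

I have been running my Spark application on a 6-node Hadoop cluster bundled with HDP 2.3 components.

Here is the component information that may be useful to you in suggesting solutions:

Cluster information: 6-node cluster:

128 GB RAM, 24 cores, 8 TB hard disk

Components used in the application

HDP - 2.3

Spark - 1.3.1

$ hadoop version: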

Hadoop 2.7.1.2.3.0.0-2557 
Subversion git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1
Compiled by jenkins on 2015-07-14T13:08Z 
Compiled with protoc 2.5.0 
From source with checksum 54f9bbb4492f92975e84e390599b881d

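Scenario:

I am trying to use SparkContext and HiveContext together so that I can take full advantage of real-time querying on Spark data structures like DataFrames. The dependencies used in my application are: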

<dependency> <!-- Spark dependency -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Below are the submit commands and the corresponding error logs I get:

Submit command 1:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    application-with-all-dependencies.jar

Error log 1:

User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Submit command 2:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 2:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"
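
The "5s" in this message looks like a duration written with a unit suffix. One plausible cause, stated here as an assumption rather than something confirmed in this post, is that the hive-site.xml under /etc/hive/conf holds time settings such as 5s that the Hive client classes bundled with Spark 1.3.1 try to parse as plain numbers. A minimal sketch for listing such values is below; the function name find_suffixed_durations is made up for illustration and is not part of Spark or Hive.

```shell
#!/usr/bin/env bash
# List <value> entries in a hive-site.xml that consist of a number followed
# by a time-unit suffix (for example 5s or 1800s).
# find_suffixed_durations is a hypothetical helper name, not a Spark or Hive tool.
find_suffixed_durations() {
    local conf_file="$1"
    # -n prints the line number of each match; -E enables extended regex.
    grep -nE '<value>[0-9]+(ms|s|m|h|d)</value>' "$conf_file"
}

# Example call (path taken from the submit command above):
# find_suffixed_durations /etc/hive/conf/hive-site.xml
```

If this prints any lines, comparing them with the same properties in the other hive-site.xml on the cluster may show whether the two files differ in how they write durations.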

Since I do not have admin rights, I cannot modify the configuration myself. I could contact the IT engineers and ask for the changes, but I am looking for a solution that requires as few changes to the configuration files as possible.

The suggested configuration changes are described here.

I then tried passing various jar files as arguments, as suggested in other forums.

Submit command 3:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-rdbms-3.2.9.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 3:

User class threw exception: java.lang.NumberFormatException: For input string: "5s"

I do not understand what happened with the following command, and I could not make sense of the error log.

Submit command 4:

spark-submit --class working.path.to.Main \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 17 \
    --executor-cores 8 \
    --executor-memory 25g \
    --driver-memory 25g \
    --num-executors 5 \
    --jars /usr/hdp/2.3.0.0-2557/spark/lib/*.jar \
    --files /etc/hive/conf/hive-site.xml \
    application-with-all-dependencies.jar

Error log 4:

Application application_1461686223085_0014 failed 2 times due to AM Container for appattempt_1461686223085_0014_000002 exited with exitCode: 10 
For more detailed output, check application tracking page:http://cluster-host:XXXX/cluster/app/application_1461686223085_0014Then, click on links to logs of each attempt. 
Diagnostics: Exception from container-launch. 
Container id: container_e10_1461686223085_0014_02_000001 
Exit code: 10 
Stack trace: ExitCodeException exitCode=10: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) 
at org.apache.hadoop.util.Shell.run(Shell.java:456) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 
Container exited with a non-zero exit code 10 
Failing this attempt. Failing the application.

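Are there any other options I could try? Any kind of help would be greatly appreciated. Please let me know if you need more information.

Thanks.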

2016-04-29 accssharma

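What a cocktail of a question! Please ask one question per post. This is unacceptable. – eliasah

Dear @eliasah, I understand that my question is long. But if you look at how it is organized, I am asking what is wrong with my submit commands that makes them throw Hive-related errors. I thought that providing more information along with the question would help readers understand the scenario. Sorry if you do not like it, but my intention was not to make it a cocktail question. My question remains the same, because I have tried and have not found an answer. – accssharma

You could have suggested a proper way to ask my question before downvoting. I am not afraid of improving on my shortcomings, and my intent is to find solutions to problems I do not know how to solve. – accssharma

The solution explained here worked in my case. hive-site.xml resides in two locations, which can be confusing. Use --files /usr/hdp/current/spark-client/conf/hive-site.xml instead of --files /etc/hive/conf/hive-site.xml. I did not have to add the jars for my configuration. I hope this helps someone with a similar problem. Thanks.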
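
For reference, the fix above can be sketched as a small shell function that prints the adjusted submit command. The function name submit_command is hypothetical. The class, jar, and resource values are copied from the question, except that a single --num-executors value is kept; choosing 5 (the later of the two values in the question) is an assumption.

```shell
#!/usr/bin/env bash
# Print the spark-submit command with the Spark client's hive-site.xml.
# submit_command is a hypothetical helper name used for illustration only.
# The leading echo makes this a dry run that only prints the command;
# remove the echo to actually submit the job.
submit_command() {
    # Default to the Spark client copy of hive-site.xml named in the answer.
    local hive_site="${1:-/usr/hdp/current/spark-client/conf/hive-site.xml}"
    echo spark-submit \
        --class working.path.to.Main \
        --master yarn \
        --deploy-mode cluster \
        --num-executors 5 \
        --executor-cores 8 \
        --executor-memory 25g \
        --driver-memory 25g \
        --files "$hive_site" \
        application-with-all-dependencies.jar
}

submit_command
```

Note that no --jars option is passed, matching the statement above that the extra jars were not needed.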

2016-04-29 21:01:31 accssharma
