
I have a Spark standalone installation (v 1.4.1) with 3 workers, and the Apache Spark worker executors keep exiting with exit status 1.

I have an application that reads a stream from a Kafka topic, processes the data, and stores the result in another Kafka topic.

Last night the application went down and all the workers crashed.

The workers' logs report entries like the following:

16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:52180/user/CoarseGrainedScheduler" "--executor-id" "24279" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stdout with daily rolling 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stderr with daily rolling 
16/02/04 21:02:10 INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:10 INFO Worker: Asked to launch executor app-20160129184621-0001/1431 for stream-elaboration 
16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:57297/user/CoarseGrainedScheduler" "--executor-id" "1431" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stdout with daily rolling 
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24279 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24280 for stream-elaboration 
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:52180/user/CoarseGrainedScheduler" "--executor-id" "24280" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stdout with daily rolling 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160129184621-0001/1431 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160129184621-0001/1432 for stream-elaboration 
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://[email protected]:57297/user/CoarseGrainedScheduler" "--executor-id" "1432" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://[email protected]:57853/user/Worker" 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stdout with daily rolling 
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stderr with daily rolling 
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24280 finished with state EXITED message Command exited with code 1 exitStatus 1 
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24281 for stream-elaboration 

At the end of the log:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-42" 
Exception in thread "qtp291507283-37" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "ExecutorRunner for app-20160201182749-0007/29488" java.lang.OutOfMemoryError: GC overhead limit exceeded 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-scheduler-1" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-38" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "JMX server connection timeout 81" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "JMX server connection timeout 81" 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-10" 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-40" 
Exception in thread "qtp291507283-35" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 

Exception in thread "qtp291507283-39" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-41" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "qtp291507283-36" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)" 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded 
Exception in thread "RMI TCP Connection(idle)" 

.... 

Running:

ps aux | grep "worker" 

the worker process is still alive, but I can't see it in the Spark UI.

Why are the worker executors restarted so frequently?

Answers

The log shows multiple java.lang.OutOfMemoryError: GC overhead limit exceeded messages, which means your executors are throwing an error that makes them exit.

This error indicates that your program is spending too much time in GC (see more details here). To work around it, you can try one of these paths:

  • The brute-force approach is to disable this safeguard by adding -XX:-UseGCOverheadLimit to your executors' JVM options, but it may leave your application doing mostly GC and therefore running very slowly
  • Analyze your job's memory usage and optimize it - your code may be consuming more memory than it needs, forcing the GC to work overtime
  • Tune the memory settings - for example, if you can give the executors more heap space, the GC pressure might go down (see the sketch after this list)
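For example, a minimal spark-submit sketch combining the first and third points; the master URL, class name, jar name, and memory size below are assumptions, not values taken from the question:

# Give each executor more heap and, only if you accept mostly-GC behaviour,
# disable the GC overhead safeguard (first bullet above).
./bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.StreamElaboration \
  --executor-memory 2g \
  --conf "spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit" \
  stream-elaboration.jar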

You essentially don't have enough memory to run this process smoothly. Options that come to mind:

  1. Specify more memory; as you mentioned, try something beyond a setting like -Xmx512m.
  2. Debug your code to look for possible memory leaks (see the sketch below).
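A hedged sketch of how point 2 could be approached; the dump directory and the PID placeholder are assumptions, and jmap comes from the same JDK that runs the executors:

# Ask every executor JVM to write a heap dump when it dies with an
# OutOfMemoryError, e.g. by adding this line to conf/spark-defaults.conf:
#
#   spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dati/heapdumps
#
# A still-running executor can also be inspected directly on the worker:
# find its PID, then print a class histogram of its live heap and check
# which objects dominate.
ps aux | grep CoarseGrainedExecutorBackend
/opt/jdk1.8.0_45/bin/jmap -histo:live <executor-pid> | head -n 30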

Why do the worker executors restart so frequently?

You are using Spark Streaming, which is built on top of Spark, so it gets the same fault tolerance for worker nodes: if an executor goes down because of an unexpected error, as in your case, the Spark engine will try to restart it.