
Apache Spark stops the JVM when the master is not available

In my application, the Java Spark context is created with a master URL that is not reachable (you can assume the master is down for maintenance). Creating the Java Spark context then brings down the JVM that runs the Spark driver; the JVM exits with code 50.

When I checked the logs, I found that SparkUncaughtExceptionHandler was calling System.exit. My program is supposed to run forever. How should I work around this?

I have tried this scenario with Spark versions 1.4.1 and 1.6.0.

My code is as follows:

package test.mains; 

import org.apache.spark.SparkConf; 
import org.apache.spark.api.java.JavaSparkContext; 

public class CheckJavaSparkContext {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {

        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        conf.setMaster("spark://sunshine:7077");

        try {
            new JavaSparkContext(conf);
        } catch (Throwable e) {
            System.out.println("Caught an exception : " + e.getMessage());
            //e.printStackTrace();
        }

        System.out.println("Waiting to complete...");
        while (true) {
        }
    }

}

Part of the output log:

16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors 
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down 
16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master 
16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main] 
java.lang.InterruptedException 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) 
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) 
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) 
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) 
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) 
    at scala.concurrent.Await$.result(package.scala:107) 
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
    at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101) 
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446) 
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582) 
    at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731) 
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229) 
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1730) 
    at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134) 
    at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163) 
    at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called 
16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called 
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52 
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd 
Java Result: 50 

Answer


If the master is down, the application will try to connect to it three times, with a 20 second timeout between attempts. These parameters appear to be hardcoded and are not configurable. If the application fails to connect, your only option is to try resubmitting it.
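
Not part of the original answer, but as a rough illustration of the point above: since the retry count and timeout are fixed, a driver that has to stay alive could check whether the master's RPC port is reachable before it ever constructs the JavaSparkContext, and only create the context once that check succeeds. The host name sunshine and port 7077 come from the question; the polling loop itself is an assumption, not something Spark provides.

package test.mains;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WaitForMasterThenStart {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean masterReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll until the standalone master answers on its RPC port, so the
        // driver never enters the fatal registration-retry path at all.
        while (!masterReachable("sunshine", 7077, 5000)) {
            System.out.println("Master not reachable yet, retrying in 10s...");
            Thread.sleep(10000);
        }

        SparkConf conf = new SparkConf()
                .setAppName("test")
                .setMaster("spark://sunshine:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Context created, Spark version " + sc.version());
    }
}

Note that this only helps at startup: if the master goes down after the context has been created, the same SparkUncaughtExceptionHandler exit path can still be hit, which is why the high-availability setup below is the real fix.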

That is why you should configure your cluster in high-availability mode. Spark Standalone supports two different modes:

- Single-Node Recovery with Local File System
- Standby Masters with ZooKeeper

The second option is the one applicable to production and useful in the scenario you describe; a sketch of it follows.
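
A minimal sketch, not from the original answer, of what the ZooKeeper-backed standby-master setup looks like from the driver's side. The host names master1, master2 and the ZooKeeper address zk:2181 are placeholders; the spark.deploy.recoveryMode and spark.deploy.zookeeper.url properties and the multi-master URL form are standard Spark Standalone configuration.

package test.mains;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HaAwareDriver {

    public static void main(String[] args) {
        // On each master, spark-env.sh enables ZooKeeper recovery, e.g.:
        //   SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
        //                           -Dspark.deploy.zookeeper.url=zk:2181"
        //
        // The driver then lists every master; the client fails over to a
        // standby instead of giving up when the active master is unreachable.
        SparkConf conf = new SparkConf()
                .setAppName("test")
                .setMaster("spark://master1:7077,master2:7077");

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Connected, Spark version " + sc.version());
        sc.stop();
    }
}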


What is the reason for calling System.exit with exit code 50? Doing something like that inside third-party API code is not acceptable. When I looked at the SparkUncaughtExceptionHandler.scala source, it does contain this ill-advised JVM termination code. – era
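
Purely as an illustration of the commenter's complaint, and not something suggested in the answer: on Java 8 a driver can install a SecurityManager that vetoes System.exit calls made by library code, which at least makes the forced shutdown visible and catchable. The exit(50) call below merely stands in for the one made by SparkUncaughtExceptionHandler.

import java.security.Permission;

public class BlockLibraryExit {

    public static void main(String[] args) {
        // Veto exit attempts from any code in this JVM; allow every other
        // permission check through unchanged.
        System.setSecurityManager(new SecurityManager() {
            @Override
            public void checkExit(int status) {
                throw new SecurityException("Blocked System.exit(" + status + ")");
            }

            @Override
            public void checkPermission(Permission perm) {
                // allow everything else
            }
        });

        try {
            System.exit(50); // stand-in for the handler's exit call
        } catch (SecurityException e) {
            System.out.println("Caught: " + e.getMessage());
        }

        System.out.println("Still running");
    }
}

Whether keeping the driver alive after Spark has decided to tear itself down is wise is a separate question: the handler fires precisely because the scheduler is already in a state it considers unrecoverable, so the high-availability setup above remains the supported approach.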