
I am trying to run SimpleApp.java against a Spark Standalone cluster with a single worker, but no matter what I change I keep getting the error below: Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018) 
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604) 
    at scala.Option.foreach(Option.scala:236) 
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604) 
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190) 
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) 
    at akka.actor.ActorCell.invoke(ActorCell.scala:456) 
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) 
    at akka.dispatch.Mailbox.run(Mailbox.scala:219) 
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) 
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) 
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) 
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) 
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 

I have the following setup:

  • a Standalone master running on localhost
  • one worker added to it (screenshot of the master web UI omitted); both were started as sketched below
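
For reference, the standard way to bring up the master and worker in this layout is via the sbin scripts (a sketch; the path assumes the /usr/local/spark install seen in the logs):

/usr/local/spark/sbin/start-master.sh 
/usr/local/spark/sbin/start-slave.sh spark://192.168.97.128:7077 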

The following lines are from the master log:

Spark Command: /usr/lib/jvm/java-7-oracle/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --host 192.168.97.128 --port 7077 --webui-port 8080 
======================================== 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/11/18 12:36:57 INFO Master: Started daemon with process name: [email protected] 
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for TERM 
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for HUP 
16/11/18 12:36:57 INFO SignalUtils: Registered signal handler for INT 
16/11/18 12:36:57 WARN MasterArguments: SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST 
16/11/18 12:36:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/11/18 12:36:58 INFO SecurityManager: Changing view acls to: vinay 
16/11/18 12:36:58 INFO SecurityManager: Changing modify acls to: vinay 
16/11/18 12:36:58 INFO SecurityManager: Changing view acls groups to: 
16/11/18 12:36:58 INFO SecurityManager: Changing modify acls groups to: 
16/11/18 12:36:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vinay); groups with view permissions: Set(); users with modify permissions: Set(vinay); groups with modify permissions: Set() 
16/11/18 12:36:59 INFO Utils: Successfully started service 'sparkMaster' on port 7077. 
16/11/18 12:36:59 INFO Master: Starting Spark master at spark://192.168.97.128:7077 
16/11/18 12:36:59 INFO Master: Running Spark version 2.0.1 
16/11/18 12:36:59 INFO Utils: Successfully started service 'MasterUI' on port 8080. 
16/11/18 12:36:59 INFO MasterWebUI: Bound MasterWebUI to 192.168.97.128, and started at http://192.168.97.128:8080 
16/11/18 12:36:59 INFO Utils: Successfully started service on port 6066. 
16/11/18 12:36:59 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066 
16/11/18 12:36:59 INFO Master: I have been elected leader! New state: ALIVE 
16/11/18 12:38:58 INFO Master: 192.168.97.128:34770 got disassociated, removing it. 

SimpleApp.java

import org.apache.spark.SparkConf; 
import org.apache.spark.api.java.JavaRDD; 
import org.apache.spark.api.java.JavaSparkContext; 
import org.apache.spark.api.java.function.Function; 

public class SimpleApp { 
    public static void main(String[] args) { 
        System.out.println("hellow world!!"); 
        String logFile = "/usr/local/spark/README.md"; // Should be some file on your system 
        SparkConf conf = new SparkConf().setAppName("Simple Application"); 
        conf.setMaster("spark://192.168.97.128:7077"); // the standalone master from the log above 
        // conf.set(key, value) 
        // conf.setMaster("local[4]"); 
        JavaSparkContext sc = new JavaSparkContext(conf); 
        JavaRDD<String> logData = sc.textFile(logFile).cache(); 

        // Count the lines containing "a" and "b" respectively 
        long numAs = logData.filter(new Function<String, Boolean>() { 
            public Boolean call(String s) { return s.contains("a"); } 
        }).count(); 

        long numBs = logData.filter(new Function<String, Boolean>() { 
            public Boolean call(String s) { return s.contains("b"); } 
        }).count(); 

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs); 

        sc.stop(); 
    } 
} 
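
For context, a standard way to submit such a job to the standalone master is spark-submit; a sketch with a placeholder jar name:

/usr/local/spark/bin/spark-submit --class SimpleApp --master spark://192.168.97.128:7077 simple-app.jar 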

With these entries in the modified spark-env.sh:

SPARK_MASTER_HOST=192.168.97.128 
SPARK_MASTER_IP=192.168.97.128 
SPARK_LOCAL_IP=192.168.97.128 
SPARK_PUBLIC_DNS=192.168.97.128 
SPARK_WORKER_CORES=2 
SPARK_WORKER_MEMORY=2g 

And the environment variables as well:

SPARK_LOCAL_IP=192.168.97.128 
SPARK_MASTER_IP=192.168.97.128 
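
These are set in the environment of the shell that launches the daemons and the driver, along these lines:

export SPARK_LOCAL_IP=192.168.97.128 
export SPARK_MASTER_IP=192.168.97.128 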

Update 1: output of free -m

[email protected]:/usr/local/spark/sbin$ free -m 
              total        used        free      shared  buff/cache   available 
Mem:           7875        4500         970         531        2404        2756 
Swap:          8082           6        8076 

Update 2: program output

16/11/18 15:33:05 INFO slf4j.Slf4jLogger: Slf4jLogger started 
16/11/18 15:33:05 INFO Remoting: Starting remoting 
16/11/18 15:33:05 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.97.128:43526] 
16/11/18 15:33:05 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.97.128:43526] 
16/11/18 15:33:05 INFO spark.SparkEnv: Registering BlockManagerMaster 
16/11/18 15:33:06 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20161118153305-9cf5 
16/11/18 15:33:06 INFO storage.MemoryStore: MemoryStore started with capacity 1050.6 MB. 
16/11/18 15:33:06 INFO network.ConnectionManager: Bound socket to port 46557 with id = ConnectionManagerId(192.168.97.128,46557) 
16/11/18 15:33:06 INFO storage.BlockManagerMaster: Trying to register BlockManager 
16/11/18 15:33:06 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.97.128:46557 with 1050.6 MB RAM 
16/11/18 15:33:06 INFO storage.BlockManagerMaster: Registered BlockManager 
16/11/18 15:33:06 INFO spark.HttpServer: Starting HTTP Server 
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106 
16/11/18 15:33:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33688 
16/11/18 15:33:06 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.97.128:33688 
16/11/18 15:33:06 INFO spark.SparkEnv: Registering MapOutputTracker 
16/11/18 15:33:06 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-633ba798-963f-4b02-ab23-1edb4e677fde 
16/11/18 15:33:06 INFO spark.HttpServer: Starting HTTP Server 
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106 
16/11/18 15:33:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46433 
16/11/18 15:33:06 INFO server.Server: jetty-7.6.8.v20121106 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null} 
16/11/18 15:33:06 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null} 
16/11/18 15:33:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:4040 
16/11/18 15:33:06 INFO ui.SparkUI: Started Spark Web UI at http://192.168.97.128:4040 
16/11/18 15:33:06 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077... 
16/11/18 15:33:07 INFO storage.MemoryStore: ensureFreeSpace(32856) called with curMem=0, maxMem=1101633945 
16/11/18 15:33:07 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 32.1 KB, free 1050.6 MB) 
16/11/18 15:33:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
16/11/18 15:33:08 WARN snappy.LoadSnappy: Snappy native library not loaded 
16/11/18 15:33:08 INFO mapred.FileInputFormat: Total input paths to process : 1 
16/11/18 15:33:08 INFO spark.SparkContext: Starting job: count at SimpleApp.java:20 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Got job 0 (count at SimpleApp.java:20) with 2 output partitions (allowLocal=false) 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Final stage: Stage 0 (count at SimpleApp.java:20) 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Parents of final stage: List() 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Missing parents: List() 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Submitting Stage 0 (FilteredRDD[2] at filter at SimpleApp.java:18), which has no missing parents 
16/11/18 15:33:08 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (FilteredRDD[2] at filter at SimpleApp.java:18) 
16/11/18 15:33:08 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 
16/11/18 15:33:23 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
16/11/18 15:33:26 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077... 
16/11/18 15:33:38 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
16/11/18 15:33:46 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.97.128:7077... 
16/11/18 15:33:53 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 
16/11/18 15:34:06 ERROR client.AppClient$ClientActor: All masters are unresponsive! Giving up. 
16/11/18 15:34:06 ERROR cluster.SparkDeploySchedulerBackend: Spark cluster looks dead, giving up. 
16/11/18 15:34:06 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/11/18 15:34:06 INFO scheduler.DAGScheduler: Failed to run count at SimpleApp.java:20 
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Spark cluster looks down 
    ... (same stack trace as at the top of the question) 
Can you print the output of "free -m"? –

Added the program output as well. – vinay

Answer


The free output shows 970 MB of free memory, but you have 2 GB configured for the worker. Try giving SPARK_WORKER_MEMORY a value of 500MB and try again.
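
For example, in conf/spark-env.sh on the worker machine (restart the worker afterwards so it picks up the new limit):

SPARK_WORKER_MEMORY=500m 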

Hope this helps.

Still not working; also tried 100MB. – vinay

See this line: "WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". It seems to be a memory problem. –

Can you post the updated spark-env.sh? –
