2016-11-29

Using kbastani/spark-neo4j via docker-compose on a MacBook Pro (16 GB RAM), I am trying to analyze the strongly_connected_components of my graph with neo4j-mazerunner. How do I increase the memory size in docker-compose.yml?

I have a graph of about 60,000 nodes of the form (n1:Node {id:1})-[r:NEXT {count:100}]->(n2:Node {id:2})

Using the neo4j browser, I managed to get pagerank processed and written back to my nodes.

However, when I try to run a more complex algorithm like strongly_connected_components, I get the following error:

mazerunner_1 | 16/11/29 14:58:01 ERROR Utils: Uncaught exception in thread SparkListenerBus 
mazerunner_1 | java.lang.OutOfMemoryError: Java heap space 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189) 
mazerunner_1 |  at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:198) 
mazerunner_1 |  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
mazerunner_1 |  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198) 
mazerunner_1 |  at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34) 
mazerunner_1 |  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
mazerunner_1 |  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
mazerunner_1 |  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) 
mazerunner_1 |  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60) 
mazerunner_1 | Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: Java heap space 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189) 
mazerunner_1 |  at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:200) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:198) 
mazerunner_1 |  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) 
mazerunner_1 |  at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) 
mazerunner_1 |  at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198) 
mazerunner_1 |  at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34) 
mazerunner_1 |  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
mazerunner_1 |  at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) 
mazerunner_1 |  at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61) 
mazerunner_1 |  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) 
mazerunner_1 |  at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60) 

I tried to modify my docker-compose.yml file like this:

hdfs:
    environment:
        - "JAVA_OPTS=-Xmx5g"
    image: sequenceiq/hadoop-docker:2.4.1
    command: /etc/bootstrap.sh -d -bash
mazerunner:
    environment:
        - "JAVA_OPTS=-Xmx5g"
    image: kbastani/neo4j-graph-analytics:latest
    links:
        - hdfs
graphdb:
    environment:
        - "JAVA_OPTS=-Xmx2g"
    image: kbastani/docker-neo4j:latest
    ports:
        - "7474:7474"
        - "1337:1337"
    volumes:
        - /opt/data
    links:
        - mazerunner
        - hdfs

without success. How can I configure Spark & HDFS to use the maximum available memory?
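As an aside, `JAVA_OPTS` only reaches the Spark JVM if the image's start script happens to read it. Spark itself is normally sized through its own well-known variables, so a sketch worth trying is to set those instead. Whether the kbastani/neo4j-graph-analytics entrypoint actually honors them is an assumption; the variable names themselves (`SPARK_DRIVER_MEMORY`, `SPARK_EXECUTOR_MEMORY`) are standard Spark configuration:

```yaml
mazerunner:
    environment:
        # Standard Spark memory settings (spark-env.sh convention).
        # Assumption: the container's launch script passes these
        # through to the Spark driver/executor JVMs.
        - "SPARK_DRIVER_MEMORY=4g"
        - "SPARK_EXECUTOR_MEMORY=4g"
    image: kbastani/neo4j-graph-analytics:latest
    links:
        - hdfs
```

Even with these set, the containers cannot exceed the memory of the machine (or VM) Docker is running in, which turned out to be the real limit here.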

回答


My solution was to increase the memory size of the virtual machine. In the VirtualBox UI, I adjusted the "Base Memory" size slider.
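For context: on a Mac of that era, Docker Toolbox ran all containers inside a VirtualBox VM managed by docker-machine, so no container could use more RAM than the VM itself was given. The same slider change can be made from the command line; a sketch, assuming the default machine name `default` (adjust to your own machine name):

```shell
# Stop the docker-machine VM before resizing it
docker-machine stop default

# Raise the VM's base memory to 8 GB (VBoxManage ships with VirtualBox)
VBoxManage modifyvm default --memory 8192

# Restart the VM and re-export the Docker environment
docker-machine start default
eval "$(docker-machine env default)"
```

After the restart, re-run docker-compose so the containers come up inside the larger VM.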

(screenshot: the VirtualBox "Base Memory" slider in the VM's System settings)