There has been a lot of discussion about Spark 2.0 supporting multiple SparkContexts. The configuration variable to enable it has existed for a long time, but has never actually been effective. Was the restriction to a single SparkContext actually lifted in Spark 2.0?
In $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.allowMultipleContexts true
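(A minimal sketch of the equivalent programmatic route, assuming a standalone app rather than spark-shell: the same property can also be set on the SparkConf before the context is created.)

import org.apache.spark.{SparkConf, SparkContext}

// Set the flag on the conf itself instead of spark-defaults.conf.
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("multi-ctx-test")          // hypothetical app name
  .set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)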
Let's verify that the property was picked up:
scala> println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")
allowMultiCtx = true
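(A slightly safer variant of the same check, assuming the property might be unset: SparkConf.getBoolean takes a default instead of throwing when the key is missing.)

// Returns false rather than throwing NoSuchElementException if the key is absent.
val allowMultiCtx = sc.getConf.getBoolean("spark.driver.allowMultipleContexts", false)
println(s"allowMultiCtx = $allowMultiCtx")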
Here is a small proof-of-concept program:
import org.apache.spark._
import org.apache.spark.streaming._

println(s"allowMultiCtx = ${sc.getConf.get("spark.driver.allowMultipleContexts")}")

// Create a dedicated SparkContext/StreamingContext per input directory.
def createAndStartFileStream(dir: String) = {
  val sc = new SparkContext("local[1]", s"Spark-$dir" /*, conf */)
  val ssc = new StreamingContext(sc, Seconds(4))
  val dstream = ssc.textFileStream(dir)
  val valuesCounts = dstream.countByValue()
  ssc.start()
  ssc.awaitTermination()  // note: this blocks, so only the first stream would ever run
}

val dirs = Seq("data10m", "data50m", "dataSmall").map { d =>
  s"/shared/demo/data/$d"
}
dirs.foreach { d =>
  createAndStartFileStream(d)
}
But attempting to use that capability does not succeed:
16/08/14 11:38:55 WARN SparkContext: Multiple running SparkContexts detected
in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in
this JVM (see SPARK-2243). To ignore this error,
set spark.driver.allowMultipleContexts = true.
The currently running SparkContext was created at:
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:814)
org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
Does anyone have insight into how to use multiple contexts?
SPARK-2243's resolution is "Won't Fix", so it looks like the answer is "no, it wasn't". – 2016-08-14 19:28:55
@LostInOverflow please post that as an answer - you deserve the credit. I've added the specifics from Sean Owen, who is the authority on this stuff. – javadba
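(For reference, a minimal sketch of the usual workaround, assuming the same directory layout as the POC above: reuse the shell's single sc and run all the file streams inside one StreamingContext, rather than one context per stream.)

import org.apache.spark.streaming._

// Reuse the shell's existing SparkContext; one StreamingContext hosts all streams.
val ssc = new StreamingContext(sc, Seconds(4))

Seq("data10m", "data50m", "dataSmall")
  .map(d => s"/shared/demo/data/$d")
  .foreach { dir =>
    // Each directory becomes its own input DStream within the shared context.
    ssc.textFileStream(dir).countByValue().print()
  }

ssc.start()
ssc.awaitTermination()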