
Strange error initializing SparkContext in Python

I have been using Spark 2.0.1, but tried to upgrade to a newer version, i.e. 2.1.1, by downloading the tar file to my local machine and changing the paths.

However, now when I try to run any program, it fails while initializing the SparkContext, i.e.

sc = SparkContext() 

and I tried running the full sample code:

    import os
    os.environ['SPARK_HOME'] = "/opt/apps/spark-2.1.1-bin-hadoop2.7/"

    from pyspark import SparkContext
    from pyspark.sql import *

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    df_tract_alpha = sqlContext.read.parquet("tract_alpha.parquet")
    print(df_tract_alpha.count())

The exception I get comes right at the start, i.e.:

    Traceback (most recent call last):
      File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 47, in <module>
        sc = SparkContext()
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 182, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/pyspark/context.py", line 249, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
      File "/opt/apps/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.NumberFormatException: For input string: "Ubuntu"
      at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

I am not passing "Ubuntu" anywhere in my variables or my environment variables either.
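
To double-check that, here is a minimal sketch that scans the environment for the offending string; the prefix filter below is just my guess at which variables matter, not anything Spark mandates:

    import os

    # Minimal sketch: look for "Ubuntu" leaking in through the environment.
    # The prefix filter is an assumption, not an exhaustive list.
    for key, value in sorted(os.environ.items()):
        if "Ubuntu" in value and key.startswith(("SPARK", "PYSPARK", "JAVA", "HADOOP")):
            print(key, "=", value)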

I also tried changing it to sc = SparkContext(master='local'), but the issue is the same.
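
One way I tried to narrow this down, as a minimal sketch (the app name is arbitrary): build the context from an explicit SparkConf and print what the driver is about to hand to the JVM.

    from pyspark import SparkConf, SparkContext

    # Sketch: dump every config value before the JVM context is created.
    conf = SparkConf().setMaster("local[*]").setAppName("conf-debug")
    for key, value in conf.getAll():  # typically also shows values picked up from spark-defaults.conf
        print(key, "=", value)
    sc = SparkContext(conf=conf)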

Kindly suggest what the issue could be.

EDIT: Contents of spark-defaults.conf

 

    spark.master                     spark://master:7077
    # spark.eventLog.enabled         true
    # spark.eventLog.dir             hdfs://namenode:8021/directory
    spark.serializer                 org.apache.spark.serializer.KryoSerializer
    spark.driver.memory              8g
    spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.driver.extraClassPath      /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar
    spark.executor.extraClassPath    /opt/apps/spark-2.1.1-bin-hadoop2.7/jars/mysql-connector-java-5.1.35-bin.jar

Answer

Have you checked your configuration files (e.g. spark-defaults.conf)? This is likely a parse error on a field that expects an integer. For example, if you tried to set spark.executor.cores to Ubuntu, that exception could occur.
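
As a minimal sketch of that check, assuming the file lives under $SPARK_HOME/conf and that the key-name hints below cover the integer-valued settings (an assumption, not an exhaustive list):

    import os

    # Flag keys Spark parses as integers (cores, ports, instance counts)
    # whose configured value is not a plain number.
    conf_path = os.path.join(os.environ["SPARK_HOME"], "conf", "spark-defaults.conf")
    numeric_hints = ("cores", "port", "instances", "retries")  # assumption

    with open(conf_path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # the file is whitespace-separated
            if len(parts) == 2:
                key, value = parts
                if any(hint in key for hint in numeric_hints) and not value.isdigit():
                    print("suspicious:", key, "=", value)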


I checked my configs. They seem fine, and I've now added the contents to the question. I'm not even using spark.executor.cores. – Viv


Even grep -R "Ubuntu" . in the Spark folder turned up nothing. – Viv
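
(One hedged thing that grep over the Spark folder would not catch: the string arriving from the machine itself rather than from any Spark file, e.g. a hostname like "user-Ubuntu". A one-line check:)

    import socket

    # Sketch: see whether "Ubuntu" comes from the host name rather than any Spark file.
    print("hostname:", socket.gethostname())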


Strange. I might try the command-line shell tools to see whether you can open a context there. Sometimes Scala ('spark-shell') gives better error messages; pyspark error messages tend to get obscured by the py4j interface. – santon