AttributeError: 'SparkContext' object has no attribute 'createDataFrame' using Spark 1.6

A previous question asking about this error has an answer that says all you need to do is update your version of Spark. I just deleted my earlier version of Spark and installed Spark 1.6.3, built for Hadoop 2.6.0.
I tried to do this:
s_df = sc.createDataFrame(pandas_df)
and got this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-4e8b3fc80a02> in <module>()
1 #creating a spark dataframe from the pandas dataframe
----> 2 s_df = sc.createDataFrame(pandas_df)
AttributeError: 'SparkContext' object has no attribute 'createDataFrame'
Does anyone know why? I tried deleting Spark and reinstalling the same 1.6 version, but it didn't work for me.
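For reference, my reading of the Spark 1.6 docs is that createDataFrame is a method of SQLContext rather than SparkContext, so a sketch along these lines is what I would expect to work (the toy pandas_df below is just for illustration; in the pyspark shell a sqlContext may already be predefined):

import pandas as pd
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)                   # wrap the shell-provided SparkContext
pandas_df = pd.DataFrame({"a": [1, 2, 3]})    # toy data, illustrative only
s_df = sqlContext.createDataFrame(pandas_df)  # createDataFrame lives on SQLContext
s_df.show()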
Here are the environment variables I've been messing with to get pyspark working:
PATH="/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin"
export PATH
# Setting PATH for Python 2.7
# The original version is saved in .bash_profile.pysave
PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
# added by Anaconda installer
export PATH="/Users/pr/anaconda:$PATH"
# path to JAVA_HOME
export JAVA_HOME=$(/usr/libexec/java_home)
#Spark
export SPARK_HOME="/Users/pr/spark" #version 1.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
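Since I reinstalled over an older version, this is the quick sanity check I'd run in the notebook to confirm which Spark the driver actually picks up (sc.version and sc.master are standard SparkContext attributes):

print(sc.version)        # should report 1.6.3 if the new install is in use
print(sc.master)         # should match the local[2] from PYSPARK_SUBMIT_ARGS
import pyspark
print(pyspark.__file__)  # should point under /Users/pr/spark/python if PYTHONPATH is right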
Did I maybe need to install Hadoop separately? I skipped that step because I didn't need it for the code I was running.