2016-09-15

I am getting an "unbound method createDataFrame()" error when trying to create a DataFrame from an RDD.
My code:

from pyspark import SparkConf, SparkContext 
from pyspark import sql 


conf = SparkConf() 
conf.setMaster('local') 
conf.setAppName('Test') 
sc = SparkContext(conf = conf) 
print sc.version 

rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)]) 

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect() 

print df 

Error:

df = sql.SQLContext.createDataFrame(rdd, ["id", "score"]).collect() 
TypeError: unbound method createDataFrame() must be called with SQLContext 
      instance as first argument (got RDD instance instead) 

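I have done the same task in the Spark shell, where running the last three lines of code directly prints the values. I mainly suspect the import statements, because they are the difference between the IDE and the shell.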

Answer


You need to use an instance of SQLContext. So you could try something like the following:

sqlContext = sql.SQLContext(sc) 
df = sqlContext.createDataFrame(rdd, ["id", "score"]).collect() 

For more details, see the pyspark documentation.
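
To see why the call on the class fails while the call on an instance works, here is a minimal sketch in plain Python that does not need Spark. FakeContext and create_frame are hypothetical stand-ins invented for this illustration; they are not pyspark names and only mimic the shape of SQLContext.createDataFrame. The exact wording of the TypeError depends on the Python version, but in both Python 2 and Python 3 the call on the class raises one, because no instance is supplied for self.

```python
class FakeContext(object):
    """Hypothetical stand-in for SQLContext (not part of pyspark)."""

    def __init__(self, name):
        self.name = name

    def create_frame(self, rows, columns):
        # Pair each row tuple with the column names.
        return [dict(zip(columns, row)) for row in rows]


rows = [(0, 1), (1, 2)]

# Calling the method on the class itself: no instance is passed as `self`,
# so Python raises TypeError (the message differs between Python 2 and 3).
try:
    FakeContext.create_frame(rows, ["id", "score"])
    class_call_failed = False
except TypeError as exc:
    class_call_failed = True
    print("TypeError: %s" % exc)

# Calling the method on an instance: `self` is bound automatically.
ctx = FakeContext("test")
frame = ctx.create_frame(rows, ["id", "score"])
print(frame)
```

The same pattern applies to the fix above: sql.SQLContext(sc) builds the instance, and createDataFrame is then called on that instance rather than on the class.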
