Possible bug in DataFrame initialization on Spark 2.0
The following code produces what looks like a bug:
from pyspark.sql import Row, types

# sc and sqlContext are the objects provided by the PySpark shell
_struct = [
    types.StructField('string_field', types.StringType(), True),
    types.StructField('long_field', types.LongType(), True),
    types.StructField('double_field', types.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])
_schema = types.StructType(_struct)
_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
I expected a DataFrame with 1 row to be created from the RDD.
Instead, I get the following error:
DoubleType can not accept object '1' in type <type 'str'>
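PS: I am using Spark 2.0 compiled with Scala 2.10.
Edit
Thanks to the answerers' suggestions, I now understand what is going on. In short, make sure the fields of the struct are sorted by name. The following code illustrates this: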
from pyspark.sql import Row, types as SparkTypes

# This doesn't work: the schema fields are not sorted by name,
# while Row(**kwargs) on Spark 2.0 orders its fields by name.
_struct = [
    SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
    SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
    SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])

# But this will work, since the schema is sorted by field name:
_struct = sorted([
    SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
    SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
    SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
], key=lambda x: x.name)
params = {'string_field': '1', 'long_field': 1, 'double_field': 1.1}
_rdd = sc.parallelize([Row(**params)])

_schema = SparkTypes.StructType(_struct)
_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
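For anyone who would rather keep the schema in its declared order, here is a minimal sketch of the mismatch in plain Python, with no Spark needed. It assumes that Row(**kwargs) on Spark 2.0 stores its fields sorted by name, which is consistent with the error above. The helper values_in_schema_order is hypothetical and not part of PySpark.

```python
# Hypothetical helper (not a PySpark API): return a record's values as a
# tuple ordered by the schema's field names, so they line up positionally.
def values_in_schema_order(record, field_names):
    return tuple(record[name] for name in field_names)


# Field order as declared in the unsorted schema from the question.
schema_order = ['string_field', 'long_field', 'double_field']
params = {'string_field': '1', 'long_field': 1, 'double_field': 1.1}

# Assumption: Row(**params) on Spark 2.0 orders its fields by name,
# so its values are laid out like this.
sorted_order = sorted(params)
row_like = tuple(params[name] for name in sorted_order)

# Matched by position against schema_order, the string '1' lands in the
# double_field slot, which is where the DoubleType error comes from.
value_seen_by_double_field = row_like[schema_order.index('double_field')]

# Reordering the values to follow the schema avoids the mismatch.
aligned = values_in_schema_order(params, schema_order)

print(sorted_order)
print(row_like)
print(aligned)
```

With a live SparkContext, passing sc.parallelize([aligned]) to createDataFrame together with the unsorted schema should then line up, since plain tuples are matched to the schema by position. I have only reasoned this through for the case above, so treat it as a sketch rather than a verified fix.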
Do you mean Scala 2.10? – eliasah