How to create a DataFrame from a list in Spark SQL?
Spark version: 2.1
For example, in pyspark, I create a list:
test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
How can I create a DataFrame from test_list whose type looks like this?
DataFrame[words: array<string>]
Here is how:
from pyspark.sql.types import StructType, StructField, ArrayType, StringType
# Schema with a single array<string> column
cSchema = StructType([StructField("WordList", ArrayType(StringType()))])
# Note the extra square brackets around each element: each row is a list holding one array value
test_list = [[['Hello', 'world']], [['I', 'am', 'fine']]]
df = spark.createDataFrame(test_list, schema=cSchema)
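The extra brackets can also be added programmatically, so the original test_list stays as written in the question. The following is a minimal sketch, assuming an active SparkSession named `spark` (as in the pyspark shell); the helper names `to_rows` and `words_dataframe` are hypothetical and not part of pyspark. It passes only a column name and lets Spark infer the array<string> type instead of using an explicit StructType.

```python
def to_rows(nested):
    """Wrap each inner list in a one-element tuple: one row, one column."""
    return [(list(words),) for words in nested]


def words_dataframe(spark, nested, column="words"):
    """Build a single-column DataFrame; Spark infers the array type from the data.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.createDataFrame(to_rows(nested), [column])


test_list = [['Hello', 'world'], ['I', 'am', 'fine']]
rows = to_rows(test_list)
print(rows)

# With a SparkSession available:
#   df = words_dataframe(spark, test_list)
#   df.printSchema()
```

Inference is convenient for quick experiments; an explicit schema like cSchema is the safer choice when the list may be empty or contain None values, since Spark then has nothing to infer the element type from.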
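You can also create an RDD from the input first and then convert that RDD to a DataFrame. Note that the flatMap step flattens the nested arrays, so the result has one string per row rather than an array<string> column; calling toDF directly on testListRDD keeps each inner array as a single row.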
<code>
import sqlContext.implicits._
val testList = Array(Array("Hello", "world"), Array("I", "am", "fine"))
// Create an RDD from the nested array
val testListRDD = sc.parallelize(testList)
// flatMap flattens the nested arrays into individual words
val flatTestListRDD = testListRDD.flatMap(entry => entry)
// Convert the RDD to a DataFrame
val testListDF = flatTestListRDD.toDF
testListDF.show
</code>