2017-07-31 97 views

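Scala - Convert array data into a table or DataFrame?

I want to create and save a table filled with random ints. So far everything works, but I don't understand how to convert the multidimensional array tmp into a DataFrame that uses the schema defined at the top.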

import org.apache.spark.sql.types.{ 
StructType, StructField, StringType, IntegerType, DoubleType} 
import org.apache.spark.sql.Row 

val schema = StructType(
StructField("rowId", IntegerType, true) :: 
StructField("t0_1", DoubleType, true) :: 
StructField("t0_2", DoubleType, true) ::  
StructField("t0_3", DoubleType, true) :: 
StructField("t0_4", DoubleType, true) :: 
StructField("t0_5", DoubleType, true) :: 
StructField("t0_6", DoubleType, true) :: 
StructField("t0_7", DoubleType, true) :: 
StructField("t0_8", DoubleType, true) :: 
StructField("t0_9", DoubleType, true) :: 
StructField("t0_10", DoubleType, true) :: Nil) 

val columnNo = 10; 
val rowNo = 50; 

var c = 0; 
var r = 0; 

val tmp = Array.ofDim[Double](10,rowNo) 

for (r <- 1 to rowNo){ 
for (c <- 1 to columnNo){ 
    val temp = new scala.util.Random 
    tmp(c-1)(r-1) = temp.nextDouble 
    println("Value of " + c + "/"+ r + ":" + tmp(c-1)(r-1)); 
} 
} 

val df = sc.parallelize(tmp).toDF 
df.show 
dataframe.show 

Answer


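An Array of Arrays cannot be converted directly into a DataFrame with the columns you want; you need an array of tuples or case class instances instead. Here is a variant based on a case class that corresponds to your desired schema: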

case class Record(
    rowId:Option[Int],
    t0_1:Option[Double], 
    t0_2:Option[Double], 
    t0_3:Option[Double], 
    t0_4:Option[Double], 
    t0_5:Option[Double], 
    t0_6:Option[Double], 
    t0_7:Option[Double], 
    t0_8:Option[Double], 
    t0_9:Option[Double], 
    t0_10:Option[Double] 
) 

val rowNo = 50; 
val temp = new scala.util.Random 

val data = (1 to rowNo).map(r => 
Record(
    Some(r), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble), 
    Some(temp.nextDouble) 
) 
) 

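// toDF comes from the SQL implicits (sqlContext.implicits._ or spark.implicits._),
// which spark-shell imports automatically
val df = sc.parallelize(data).toDF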
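
If you would rather keep the array and the explicit StructType from the question, another option is to wrap each inner array in a Row and pass the schema to createDataFrame. The following is only a sketch under some assumptions: it targets Spark 2.x or later (SparkSession), it builds a local session for illustration, it stores the data row-major (one inner array per row, unlike tmp in the question), and the names rnd, rows and the app name are made up for the example.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType}
import scala.util.Random

// Assumption: a local session for illustration; in spark-shell this returns the existing `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ArrayToDataFrame") // hypothetical app name
  .getOrCreate()

val columnNo = 10
val rowNo = 50

// Same column names as the schema in the question, built programmatically.
val schema = StructType(
  StructField("rowId", IntegerType, nullable = true) +:
    (1 to columnNo).map(i => StructField(s"t0_$i", DoubleType, nullable = true))
)

// Row-major layout: rowNo inner arrays, each holding columnNo random doubles.
val rnd = new Random
val tmp: Array[Array[Double]] = Array.fill(rowNo, columnNo)(rnd.nextDouble())

// Prepend a 1-based row id to each inner array and wrap the result in a Row.
val rows = tmp.zipWithIndex.map { case (values, i) =>
  Row.fromSeq(Seq[Any](i + 1) ++ values.toSeq)
}

// Apply the explicit schema to the RDD of Rows.
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows.toSeq), schema)

df.printSchema()
df.show(5)
```

Compared with the case class variant, this keeps the column names and types in one place (the schema) and scales to any number of columns, at the cost of losing compile-time checking of the row contents.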

Thank you very much! That solved my problem and shortened my code considerably!