2017-04-13

Answers

2

You can use a UDF together with the `explode` function:

import org.apache.spark.sql.functions.{udf, explode}

// UDF that turns each value n into the array [0, 1, ..., n]
val range = udf((i: Int) => (0 to i).toArray)
// explode generates one output row per array element
df.withColumn("num", explode(range($"num")))
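To see what this expansion does without a Spark session, here is a minimal sketch on plain Scala collections, using made-up `(id, num)` rows (the names and sample values are assumptions for illustration):

```scala
object ExplodeSketch extends App {
  // Hypothetical input rows: (id, num)
  val rows = Seq((1, 1), (2, 3))

  // Mirror of range + explode: each (id, num) becomes (id, 0) ... (id, num)
  val exploded = rows.flatMap { case (id, num) => (0 to num).map(n => (id, n)) }

  exploded.foreach(println)
}
```

Each input row contributes `num + 1` output rows, which is exactly what `explode` does to the array produced by the UDF.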
+0

Thank you very much – Liangpi

0

Try `DataFrame.explode`:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col

df.explode(col("id"), col("num")) { case row: Row =>
  val id = row(0).asInstanceOf[Int]
  val num = row(1).asInstanceOf[Int]
  // Emit one tuple per value in 0..num
  (0 to num).map((id, _))
}

Or, in RDD land, you can use `flatMap` for this:

// df.rdd yields RDD[Row], so convert to typed tuples first
df.as[(Int, Int)].rdd.flatMap(x => (0 to x._2).map((x._1, _)))
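The RDD `flatMap` behaves the same as Scala's collection `flatMap`, so the logic can be checked locally on an assumed `Seq` of `(Int, Int)` pairs:

```scala
object FlatMapSketch extends App {
  // Assumed tuple rows, as if obtained from df.as[(Int, Int)].rdd
  val pairs = Seq((10, 2))

  // Same lambda as the RDD version
  val out = pairs.flatMap(x => (0 to x._2).map((x._1, _)))

  println(out)  // List((10,0), (10,1), (10,2))
}
```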
+0

Thank you very much – Liangpi