Here is a simple solution using a UDF, but you need to build the list of ranges manually.
// DataFrame with an "Age" column (strings, as in the question)
import org.apache.spark.sql.functions.udf
import spark.implicits._

val df = spark.sparkContext
  .parallelize(Seq("-1", "12", "18", "28", "38", "38", "388", "3", "41"))
  .toDF("Age")

val updateUDF = udf((age: String) => {
  // Each tuple is (inclusive lower bound, exclusive upper bound, label)
  val ranges = Seq(
    (-1, 12, "(-1 - 12)"),
    (12, 17, "(12 - 17)"),
    (17, 24, "(17 - 24)"),
    (24, 34, "(24 - 34)"),
    (34, 44, "(34 - 44)"),
    (44, 54, "(44 - 54)"),
    (54, 64, "(54 - 64)"),
    (64, 100, "(64 - 100)"),
    (100, 1000, "(100 - 1000)")
  )
  // Return the label of the first matching range; empty string if none matches
  // (the original map/filter/(0) version threw an exception on no match)
  ranges
    .find { case (lo, hi, _) => age.toInt >= lo && age.toInt < hi }
    .map(_._3)
    .getOrElse("")
})

df.withColumn("Age-Range", updateUDF($"Age")).show(false)
Here is the output:
+---+------------+
|Age|Age-Range   |
+---+------------+
|-1 |(-1 - 12)   |
|12 |(12 - 17)   |
|18 |(17 - 24)   |
|28 |(24 - 34)   |
|38 |(34 - 44)   |
|38 |(34 - 44)   |
|388|(100 - 1000)|
|3  |(-1 - 12)   |
|41 |(34 - 44)   |
+---+------------+
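As a side note, the same bucketing can be sketched without a UDF by folding the range list into a chained `when`/`otherwise` expression. This is only a sketch assuming the same `df` and range bounds as above, but it is usually preferable because Spark's Catalyst optimizer can see through built-in expressions, while a UDF is opaque to it:

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Same (inclusive lower, exclusive upper) bounds as the UDF version
val ranges = Seq((-1, 12), (12, 17), (17, 24), (24, 34), (34, 44),
                 (44, 54), (54, 64), (64, 100), (100, 1000))

// Fold the ranges into one nested when(...).otherwise(...) column expression;
// ages matching no range fall through to the empty-string default
val ageRange = ranges.foldLeft(lit("")) { case (acc, (lo, hi)) =>
  when(col("Age").cast("int") >= lo && col("Age").cast("int") < hi,
       s"($lo - $hi)").otherwise(acc)
}

df.withColumn("Age-Range", ageRange).show(false)
```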
I hope this helps!
Thank you so much, Daniel!!! .... it worked for me!!! ... – Bhavesh