我在星火以下數據框和模式火花SQL轉換函數創建了NULLS列
val df = spark.read.options(Map("header"-> "true")).csv("path")
scala> df show()
+-------+-------+-----+
| user| topic| hits|
+-------+-------+-----+
| om| scala| 120|
| daniel| spark| 80|
|3754978| spark| 1|
+-------+-------+-----+
scala> df printSchema
root
|-- user: string (nullable = true)
|-- topic: string (nullable = true)
|-- hits: string (nullable = true)
我想改變列命中整數
我嘗試這樣做:
scala> df.createOrReplaceTempView("test")
val dfNew = spark.sql("select *, cast('hist' as integer) as hist2 from test")
scala> dfNew.printSchema
root
|-- user: string (nullable = true)
|-- topic: string (nullable = true)
|-- hits: string (nullable = true)
|-- hist2: integer (nullable = true)
但是當我打印數據框列HIST 2被用空值填充
scala> dfNew show()
+-------+-------+-----+-----+
| user| topic| hits|hist2|
+-------+-------+-----+-----+
| om| scala| 120| null|
| daniel| spark| 80| null|
|3754978| spark| 1| null|
+-------+-------+-----+-----+
我也試過這樣:
scala> val df2 = df.withColumn("hitsTmp",
df.hits.cast(IntegerType)).drop("hits"
).withColumnRenamed("hitsTmp", "hits")
,並得到這個:
<console>:26: error: value hits is not a member of org.apache.spark.sql.DataFram
e
也試過這樣:
scala> val df2 = df.selectExpr ("user","topic","cast(hits as int) hits")
and got this:
org.apache.spark.sql.AnalysisException: cannot resolve '`topic`' given input col
umns: [user, topic, hits]; line 1 pos 0;
'Project [user#0, 'topic, cast('hits as int) AS hits#22]
+- Relation[user#0, topic#1, hits#2] csv
與
scala> val df2 = df.selectExpr ("cast(hits as int) hits")
我得到SI類似錯誤。
任何幫助將不勝感激。我知道這個問題已經解決過,但我嘗試了3種不同的方法(這裏發表),沒有一個工作。
謝謝。
我使用的是2.1.0版本 –