
In Spark I have the DataFrame and schema below, and a Spark SQL cast expression creates a column full of NULLs.

val df = spark.read.options(Map("header"-> "true")).csv("path") 

scala> df.show() 

+-------+-------+-----+ 
| user| topic| hits| 
+-------+-------+-----+ 
|  om| scala| 120| 
| daniel| spark| 80| 
|3754978| spark| 1| 
+-------+-------+-----+ 

scala> df.printSchema 

root 
|-- user: string (nullable = true) 
|-- topic: string (nullable = true) 
|-- hits: string (nullable = true) 

I want to change the hits column to an integer.

I tried this:

scala> df.createOrReplaceTempView("test") 
scala> val dfNew = spark.sql("select *, cast('hist' as integer) as hist2 from test") 

scala> dfNew.printSchema 

root 
|-- user: string (nullable = true) 
|-- topic: string (nullable = true) 
|-- hits: string (nullable = true) 
|-- hist2: integer (nullable = true) 

But when I display the DataFrame, the hist2 column is filled with nulls:

scala> dfNew.show() 

+-------+-------+-----+-----+ 
| user| topic| hits|hist2| 
+-------+-------+-----+-----+ 
|  om| scala| 120| null| 
| daniel| spark| 80| null| 
|3754978| spark| 1| null| 
+-------+-------+-----+-----+ 

I also tried this:

scala> val df2 = df.withColumn("hitsTmp", 
         df.hits.cast(IntegerType)).drop("hits") 
         .withColumnRenamed("hitsTmp", "hits") 

and got this:

<console>:26: error: value hits is not a member of org.apache.spark.sql.DataFrame 

I also tried this:

scala> val df2 = df.selectExpr("user", "topic", "cast(hits as int) hits") 

and got this: 
org.apache.spark.sql.AnalysisException: cannot resolve '`topic`' given input columns: [user, topic, hits]; line 1 pos 0; 
'Project [user#0, 'topic, cast('hits as int) AS hits#22] 
+- Relation[user#0, topic#1, hits#2] csv 

scala> val df2 = df.selectExpr("cast(hits as int) hits") 

and I got a similar error.

Any help would be appreciated. I know this question has been answered before, but I tried three different approaches (posted here) and none of them worked.

Thanks.


I am using Spark version 2.1.0.

Answer


You can cast the column to an integer type in any of the following ways:

df.withColumn("hits", df("hits").cast("integer"))
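
To verify that the cast took effect, a quick check along these lines should do (a sketch; df is the DataFrame read from the CSV in the question):

// After the cast, hits should be reported as integer instead of string.
df.withColumn("hits", df("hits").cast("integer")).printSchema()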

Or, converting the column with an explicit IntegerType (note the required import):

import org.apache.spark.sql.types.IntegerType

data.withColumn("hitsTmp", data("hits").cast(IntegerType)) 
  .drop("hits") 
  .withColumnRenamed("hitsTmp", "hits") 

Or:

data.selectExpr("user", "topic", "cast(hits as int) hits") 
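
As a side note on the Spark SQL attempt in the question: cast('hist' as integer) casts the string literal 'hist' (which is also not the column name), and casting a non-numeric string to an integer yields null, which is why hist2 came out entirely null. Here is a sketch of the same query with the column referenced by name (df and the test view are the ones from the question; hits2 is just an illustrative alias):

df.createOrReplaceTempView("test")
// Unquoted hits refers to the column, so the cast operates on the actual values.
val dfNew = spark.sql("select *, cast(hits as int) as hits2 from test")
dfNew.printSchema()   // hits2: integer (nullable = true)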

I tried all of them without success.


scala> val df2 = df.withColumn("hits", df("hits").cast("integer")) 
org.apache.spark.sql.AnalysisException: Cannot resolve column name "hits" among (user, topic, hits); 
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:219) 
  at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:219) 
  at scala.Option.getOrElse(Option.scala:121) 
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:218) 
  at org.apache.spark.sql.Dataset.col(Dataset.scala:1073) 
  at org.apache.spark.sql.Dataset.apply(Dataset.scala:1059) 
  ... 48 elided


scala> val df2 = df.selectExpr("user", "topic", "cast(hits as int) hits") 
org.apache.spark.sql.AnalysisException: cannot resolve '`topic`' given input columns: [user, topic, hits]; line 1 pos 0; 
'Project [user#0, 'topic, cast('hits as int) AS hits#40] 
+- Relation[user#0, topic#1, hits#2] csv