2017-06-14 56 views
0

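I am very new to Spark machine learning (two days in). I am running the code below in the Spark shell to try to predict a value. I have seen this error posted before, but I could not fix my code with the solution given there, so I am posting a question about the same error: java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT

Input data: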

1.00,1.00,9.00 
1.00,2.00,10.00 
1.00,3.00,9.00 
1.00,4.00,9.00 
1.00,5.00,9.00 
1.00,6.00,9.45 
1.00,7.00,9.45 
1.00,8.00,9.45 
1.00,9.00,9.45 

Code:

val df = spark.read.csv("/root/Predictiondata.csv").toDF("Userid", "Date", "Intime") 
import org.apache.spark.sql.types.DoubleType 
val featureDf = df.select(df("Userid").cast(DoubleType).as("Userid"),df("Date").cast(DoubleType).as("Date"),df("Intime").cast(DoubleType).as("Intime")).toDF() 
import org.apache.spark.mllib.linalg.Vectors 
import org.apache.spark.mllib.regression.LabeledPoint 
val data = featureDf.select("Userid","Date","Intime").map(r => LabeledPoint(r(0).toString.toDouble,Vectors.dense(r(1).toString.toDouble,r(2).toString.toDouble))).toDF() 
import org.apache.spark.ml.regression.LinearRegression 
val lr = new LinearRegression() 
val lrModel = lr.fit(data) 

Error:

scala> val lrModel = lr.fit(data) 
java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT but was actually org.apache.spark.mllib.linalg.VectorUDT
at scala.Predef$.require(Predef.scala:224) 
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42) 
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51) 
at org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:72) 
at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:122) 
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74) 
at org.apache.spark.ml.Predictor.fit(Predictor.scala:90) 
... 48 elided 

Any help or suggestions are highly appreciated.

Thanks in advance.

Answers

1

Use Spark 2+ with the DataFrame API together with VectorAssembler.

Something like this (not tested):

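import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import spark.implicits._

val data = spark.read
    .option("inferSchema", true)
    .csv("/root/Predictiondata.csv")
    .toDF("Userid", "Date", "Intime")

// Combine the input columns into a single ml.linalg vector column.
val dataWithFeatures = new VectorAssembler()
    .setInputCols(Array("Date", "Intime"))
    .setOutputCol("features") // LinearRegression reads "features" by default
    .transform(data)

// LinearRegression reads the label from a column named "label" by default.
val dataWithLabelFeatures = dataWithFeatures
    .withColumn("label", $"Userid")

val lrModel = new LinearRegression().fit(dataWithLabelFeatures)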

Also, take a look at Pipeline.
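For illustration, here is a minimal sketch of how the same steps might be chained in a Pipeline. It assumes the `spark` session that spark-shell provides, and it builds the DataFrame inline from a few rows of the question's data instead of reading the CSV. The column roles (Userid as label, Date and Intime as features) simply follow the question; the names `raw`, `assembler`, `pipeline` and `model` are only for this example.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// Assumption: `spark` is the SparkSession created by spark-shell.
// A few rows copied from the question, so no CSV file is needed.
val raw = spark.createDataFrame(Seq(
  (1.0, 1.0, 9.0),
  (1.0, 2.0, 10.0),
  (1.0, 3.0, 9.0),
  (1.0, 6.0, 9.45)
)).toDF("Userid", "Date", "Intime")

// Stage 1: pack the numeric input columns into one ml.linalg vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("Date", "Intime"))
  .setOutputCol("features")

// Stage 2: the regressor reads "features" and uses "Userid" as the label.
val lr = new LinearRegression()
  .setFeaturesCol("features")
  .setLabelCol("Userid")

// Fitting the pipeline runs the assembler first, then fits the regressor.
val pipeline = new Pipeline().setStages(Array(assembler, lr))
val model = pipeline.fit(raw)

// The fitted PipelineModel applies both stages and adds a "prediction" column.
val predictions = model.transform(raw)
predictions.select("Userid", "features", "prediction").show()
```

With a Pipeline, the feature assembly no longer has to be repeated by hand when the model is applied to new data, since `model.transform` runs every stage.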

+1

Thank you very much for your help! It worked after a few modifications. Thanks again for your help! – Bhavesh

0

If your Spark version is 2.x or later, import

org.apache.spark.ml.linalg.VectorUDT

instead of

org.apache.spark.mllib.linalg.VectorUDT 
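Applied to the question's code, this would mean taking `Vectors` and `LabeledPoint` from the `ml` packages instead of `mllib`. The following is a rough sketch under that assumption; it expects the `spark` session from spark-shell and uses a few inline rows from the question in place of the CSV file.

```scala
import org.apache.spark.ml.feature.LabeledPoint // instead of mllib.regression.LabeledPoint
import org.apache.spark.ml.linalg.Vectors       // instead of mllib.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import spark.implicits._ // encoders for map(...) and toDF()

// Assumption: `spark` is the SparkSession created by spark-shell.
val featureDf = spark.createDataFrame(Seq(
  (1.0, 1.0, 9.0),
  (1.0, 2.0, 10.0),
  (1.0, 3.0, 9.0),
  (1.0, 6.0, 9.45)
)).toDF("Userid", "Date", "Intime")

// Same shape as the question's code: Userid is the label,
// Date and Intime are the features.
val data = featureDf
  .map(r => LabeledPoint(r.getDouble(0), Vectors.dense(r.getDouble(1), r.getDouble(2))))
  .toDF() // columns: label, features

// The features column now holds ml.linalg vectors, which is what
// org.apache.spark.ml.regression.LinearRegression checks for.
val lrModel = new LinearRegression().fit(data)
```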