2016-11-11 101 views

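When I try to create labeled points from the output of a vector transformer, I run into the following problem: how do I convert a variable of the ML sparse vector type into the MLlib sparse vector type?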

val realout = output.select("label", "features").rdd.map(row => LabeledPoint(
    row.getAs[Double]("label"),
    row.getAs[org.apache.spark.mllib.linalg.SparseVector]("features")
))

The error I get is:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 13, localhost): java.lang.ClassCastException: org.apache.spark.ml.linalg.SparseVector cannot be cast to org.apache.spark.mllib.linalg.Vector
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:107)
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:105)
[error]     at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
[error]     at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:213)

I looked at the solution given in link 1, which explains how to convert vectors in Spark 2.0.0, but I ran into a compilation error:

object linalg is not a member of package org.apache.spark.ml 

Please help. Thanks!

Answers


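org.apache.spark.mllib.linalg.SparseVector has a static method called fromML that converts the new spark.ml vector type into the spark.mllib type. You can use it to convert an ML sparse vector into an MLlib sparse vector. Keep in mind that it is a light copy: it only copies references to the underlying arrays.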

It can be used as follows:

// SparseVector here is org.apache.spark.mllib.linalg.SparseVector
val realout: RDD[LabeledPoint] = features1.rdd.map(row => LabeledPoint(
    row.getAs[Double]("label"),
    SparseVector.fromML(row.getAs[org.apache.spark.ml.linalg.SparseVector]("features"))
))
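To try the conversion on its own, without a DataFrame, here is a minimal sketch. It assumes Spark 2.x with spark-mllib on the classpath. The object name FromMLSketch, the helper toLabeledPoint, and the vector values are made up for illustration; only SparseVector.fromML and LabeledPoint come from the code above.

```scala
import org.apache.spark.ml.linalg.{SparseVector => MLSparseVector}
import org.apache.spark.mllib.linalg.{SparseVector => MLlibSparseVector}
import org.apache.spark.mllib.regression.LabeledPoint

object FromMLSketch {
  // Hypothetical helper: wraps a spark.ml sparse vector in a spark.mllib LabeledPoint.
  def toLabeledPoint(label: Double, features: MLSparseVector): LabeledPoint =
    LabeledPoint(label, MLlibSparseVector.fromML(features))

  def main(args: Array[String]): Unit = {
    // A spark.ml sparse vector of size 4 with non-zeros at indices 0 and 2 (made-up values).
    val mlVec = new MLSparseVector(4, Array(0, 2), Array(1.0, 3.0))

    val point = toLabeledPoint(1.0, mlVec)
    println(point)

    // fromML is a light copy, so the converted vector is expected to reuse
    // the same underlying values array rather than duplicating it.
    val converted = MLlibSparseVector.fromML(mlVec)
    println(converted.values eq mlVec.values)
  }
}
```

Because the arrays are shared rather than copied, mutating the values array of one vector would also affect the other.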

Reference (Spark documentation): https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/mllib/linalg/SparseVector.html

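P.S.: The documentation link points to the Java API, while my example code is in Scala. That is not a problem, because Scala is interoperable with Java: a method written in either language can be called from the other.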


The link points to the Java docs, but your answer is in Scala. – hshihab


@hshihab That is fine, since Scala is compatible with Java, so you can use the method mentioned above from either language. Thanks for your concern. –


The Scala docs are here: https://spark.apache.org/docs/2.0.2/api/scala/index.html#org.apache.spark.mllib.linalg.SparseVector$ –