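Column names with dots in Spark
I am trying to take the columns of a DataFrame and convert them to an RDD[Vector].
The problem is that my columns have a dot in their names, as in the following dataset: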
"col0.1","col1.2","col2.3","col3.4"
1,2,3,4
10,12,15,3
1,12,10,5
Here is what I am doing:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val df = spark.read.format("csv").options(Map("header" -> "true", "inferSchema" -> "true")).load("C:/Users/mhattabi/Desktop/donnee/test.txt")
// Wrap every column name in backticks to try to escape the dots.
val column = df.columns.map(c => s"`${c}`")
val rows = new VectorAssembler().setInputCols(column).setOutputCol("vs")
  .transform(df)
  .select("vs")
  .rdd
val data = rows.map(_.getAs[org.apache.spark.ml.linalg.Vector](0))
  .map(org.apache.spark.mllib.linalg.Vectors.fromML)
val mat: RowMatrix = new RowMatrix(data)
// Compute the SVD with k = number of columns, i.e. all singular values and the corresponding singular vectors.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = mat.computeSVD(mat.numCols().toInt, computeU = true)
val U: RowMatrix = svd.U // The U factor is a RowMatrix.
val s: Vector = svd.s // The singular values are stored in a local dense vector.
val V: Matrix = svd.V // The V factor is a local dense matrix.
println(V)
Any help with handling columns that have dots in their names would be appreciated. Thanks.
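Have you tried changing the column names?
@RameshMaharjan It works fine with columns that have no dots, but I need to solve it with the dots. Any help? Thanks.
What I would suggest is to save the schema with the dots, rename the columns, and once you are done, rename the new columns back to the dotted names. Would that not work?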
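The rename-and-restore idea from the last comment could look roughly like the sketch below. It is only an illustration and has not been run against the asker's data: the object `DottedColumns` and its helpers `sanitize`, `toVectors` and `restoreNames` are hypothetical names. It assumes every column is numeric and that replacing "." with "_" does not make two column names collide.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object DottedColumns {
  // Hypothetical helper: swap dots for underscores so the names need no escaping.
  // Assumption: no two columns end up with the same sanitized name.
  def sanitize(name: String): String = name.replace(".", "_")

  // Assemble all columns into mllib vectors, working on dot-free names.
  def toVectors(df: DataFrame): RDD[Vector] = {
    val safe = df.columns.map(sanitize)
    // toDF renames columns by position, so the dotted names are never resolved.
    val renamed = df.toDF(safe: _*)
    new VectorAssembler()
      .setInputCols(safe)
      .setOutputCol("vs")
      .transform(renamed)
      .select("vs")
      .rdd
      .map(_.getAs[org.apache.spark.ml.linalg.Vector](0))
      .map(Vectors.fromML)
  }

  // Put the saved dotted names back on a DataFrame with the same column order.
  def restoreNames(renamed: DataFrame, original: Array[String]): DataFrame =
    renamed.toDF(original: _*)
}
```

With this in place, `val data = DottedColumns.toVectors(df)` could be passed to `new RowMatrix(data)` as in the question, and `DottedColumns.restoreNames(someDf, df.columns)` would reapply the dotted names afterwards, provided the column count and order are unchanged.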