My DataFrame looks like this. How do I get the first value of each list?
+-------------------+----------------------------------------+
|ID                 |probability                             |
+-------------------+----------------------------------------+
|583190715ccb64f503a|[0.49128147201958017,0.5087185279804199]|
|58326da75fc764ad200|[0.42143416087939345,0.5785658391206066]|
|583270ff17c76455610|[0.3949217100212508,0.6050782899787492] |
|583287c97ec7641b2d4|[0.4965059792664432,0.5034940207335569] |
|5832d7e279c764f52e4|[0.49128147201958017,0.5087185279804199]|
|5832e5023ec76406760|[0.4775830044196701,0.52241699558033]   |
|5832f88859cb64960ea|[0.4360509428173421,0.563949057182658]  |
|58332e6238c7643e6a7|[0.48730029128352853,0.5126997087164714]|
+-------------------+----------------------------------------+
I get the probabilities using:
val proVal = Data.select("probability").rdd.map(r => r(0)).collect()
proVal.foreach(println)
The result is:
[0.49128147201958017,0.5087185279804199]
[0.42143416087939345,0.5785658391206066]
[0.3949217100212508,0.6050782899787492]
[0.4965059792664432,0.5034940207335569]
[0.49128147201958017,0.5087185279804199]
[0.4775830044196701,0.52241699558033]
[0.4360509428173421,0.563949057182658]
[0.48730029128352853,0.5126997087164714]
What I want instead is only the first value from each row, like this:
0.49128147201958017
0.42143416087939345
0.3949217100212508
0.4965059792664432
0.49128147201958017
0.4775830044196701
0.4360509428173421
0.48730029128352853
How can I do this?
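One way to get exactly that list, sketched under the assumption that probability holds org.apache.spark.ml.linalg.Vector values (what the DataFrame-based random forest in Spark 2.x and later produces). In the code above, r(0) is typed Any, so println falls back to the vector's own toString and prints the whole [p0,p1] pair. Recovering the static type with Row.getAs[Vector] lets you call v(0) for just the first element. The object name, the helper firstProbabilities, and the two sample rows are illustrative only.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}

object FirstProbabilityRdd {
  // Collects the first element of every vector in the "probability" column to the driver.
  def firstProbabilities(df: DataFrame): Array[Double] =
    df.select("probability").rdd
      .map(r => r.getAs[Vector](0)) // r(0) is typed Any; getAs restores the ml Vector type
      .map(v => v(0))               // Vector.apply(i) returns element i as a Double
      .collect()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("first-probability").getOrCreate()
    // Two rows copied from the question, standing in for its Data DataFrame.
    val data = spark.createDataFrame(Seq(
      ("583190715ccb64f503a", Vectors.dense(0.49128147201958017, 0.5087185279804199)),
      ("58326da75fc764ad200", Vectors.dense(0.42143416087939345, 0.5785658391206066))
    )).toDF("docID", "probability")

    firstProbabilities(data).foreach(println)
    spark.stop()
  }
}
```

Collecting is convenient for inspecting results, but it pulls every value to the driver, so it is best kept to modest prediction sets.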
The input is standard random forest output; the DataFrame above comes from val Data = predictions.select("docID", "probability")
predictions.printSchema()
root
 |-- docID: string (nullable = true)
 |-- label: double (nullable = false)
 |-- features: vector (nullable = true)
 |-- indexedLabel: double (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = true)
 |-- predictedLabel: string (nullable = true)
I want to get a column containing the first value of "probability".
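If the goal is a new column rather than an array on the driver, a Scala UDF is one common approach. This is a sketch, not the only option: firstElement, withFirstProbability and the output column name prob0 are hypothetical, the vector type is assumed to be org.apache.spark.ml.linalg.Vector, and the UDF assumes no row has a null probability vector (calling v(0) on null would throw).

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object FirstProbabilityUdf {
  // Spark hands each VectorUDT value to the UDF as an ml Vector; v(0) is its first element.
  // Assumes probability is never null: a null vector would make v(0) throw.
  val firstElement = udf((v: Vector) => v(0))

  // Keeps docID and adds the first probability as a Double column named "prob0".
  def withFirstProbability(df: DataFrame): DataFrame =
    df.select(col("docID"), firstElement(col("probability")).as("prob0"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("first-probability-udf").getOrCreate()
    // Two rows copied from the question, standing in for its Data DataFrame.
    val data = spark.createDataFrame(Seq(
      ("583190715ccb64f503a", Vectors.dense(0.49128147201958017, 0.5087185279804199)),
      ("58326da75fc764ad200", Vectors.dense(0.42143416087939345, 0.5785658391206066))
    )).toDF("docID", "probability")

    withFirstProbability(data).show(false)
    spark.stop()
  }
}
```

Because the extraction runs inside the query plan, the result stays distributed and can be joined, filtered, or written out like any other column.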
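Thanks, I tried those methods, but both throw the same error: Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from probability#177; yet row 177 has the same structure as the other rows – John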
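If you can provide a sample input on which it fails, I can try to help; otherwise I can't see any obvious reason. Also, could you edit the question and add the output of 'Data.printSchema()'? –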
The input is standard random forest output, and the end result I want is the first value of the "probability" column. The output of 'Data.printSchema()' is: root |-- docID: string (nullable = true) |-- label: double (nullable = false) |-- features: vector (nullable = true) |-- indexedLabel: double (nullable = true) |-- rawPrediction: vector (nullable = true) |-- probability: vector (nullable = true) |-- prediction: double (nullable = true) |-- predictedLabel: string (nullable = true) – John
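A likely explanation for that error, offered as an assumption since the code from the suggested methods is not shown here: indexing a column directly, as in col("probability")(0) or col("probability").getItem(0), only works on array, map, and struct columns. The schema shows probability as type vector, which Spark implements as a user-defined type (VectorUDT), so the analyzer refuses the extraction. The #177 in probability#177 is the attribute ID Spark's analyzer assigned to that column, not a row number, so no particular row is at fault. On Spark 3.0 or later, converting the vector to an array first with org.apache.spark.ml.functions.vector_to_array avoids the problem; the object, helper, and column name prob0 below are illustrative.

```scala
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FirstProbabilityArray {
  // Requires Spark 3.0+. vector_to_array converts the VectorUDT column to array<double>;
  // getItem(0) is valid on arrays, whereas applying it to the vector column itself
  // fails analysis with "Can't extract value from ...".
  def withFirstProbability(df: DataFrame): DataFrame =
    df.select(col("docID"), vector_to_array(col("probability")).getItem(0).as("prob0"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("first-probability-array").getOrCreate()
    // Two rows copied from the question, standing in for its Data DataFrame.
    val data = spark.createDataFrame(Seq(
      ("583190715ccb64f503a", Vectors.dense(0.49128147201958017, 0.5087185279804199)),
      ("58326da75fc764ad200", Vectors.dense(0.42143416087939345, 0.5785658391206066))
    )).toDF("docID", "probability")

    withFirstProbability(data).show(false)
    spark.stop()
  }
}
```

On Spark 2.x, where vector_to_array is not available, a UDF that calls v(0) on each vector achieves the same result.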