2017-09-27 36 views
-1

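pyspark - Get array type values from a dataframe

My dataframe looks like the one below. I need to extract the values from the input array-type column. Could you let me know how I can do this in PySpark?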

None 
root 
|-- input: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- A: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- B: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- C: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- D: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- E: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 
|-- timestamp: array (nullable = true) 
| |-- element: map (containsNull = true) 
| | |-- key: string 
| | |-- value: map (valueContainsNull = true) 
| | | |-- key: string 
| | | |-- value: double (valueContainsNull = true) 

Answers

-1

Hope this helps!

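from itertools import chain
# Flatten each row's array of maps, then return every map's values as a plain list
# (a dict_values view cannot be pickled under Python 3, so wrap it in list()).
df.select('input').rdd.flatMap(lambda row: chain(*row)).map(lambda m: list(m.values())).collect()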
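To see what that pipeline does without a Spark session, here is a minimal pure-Python sketch of the same two steps. The sample rows are made up for illustration and only mimic the shape of the input column (an array of map<string, map<string, double>>); they are not taken from the asker's data.

```python
from itertools import chain

# Hypothetical sample rows: each row is a 1-tuple holding the 'input' array,
# which is how a Row from df.select('input') unpacks.
rows = [
    ([{"a": {"x": 1.0}}, {"b": {"y": 2.0}}],),
    ([{"c": {"z": 3.0}}],),
]

# Step 1 (flatMap): unpack each row and emit every map in its array.
flattened = [m for row in rows for m in chain(*row)]

# Step 2 (map): keep only the values of each outer map.
values = [list(m.values()) for m in flattened]

print(values)  # [[{'x': 1.0}], [{'y': 2.0}], [{'z': 3.0}]]
```

Each result entry is the list of inner maps for one array element, so a further pass over the inner maps would be needed to reach the double values themselves.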
+0

Care to explain the downvote? – Prem