2017-08-25 112 views

What type should it be after using .toArray() on a Spark vector?

I want to convert my vector column to an array, so I use:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, DoubleType

get_array = udf(lambda x: x.toArray(), ArrayType(DoubleType()))
result3 = result2.withColumn('list', get_array('features'))
result3.show()

where the column features has the vector dtype. But Spark tells me:

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct) 

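I know the cause must be the type I declare in the UDF, so I tried get_array = udf(lambda x: x.toArray(),ArrayType(FloatType())), which does not work either. I know the value is a numpy.ndarray after the conversion, but how can I display it correctly?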

Here is the code that produces my DataFrame result2:

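from pyspark.ml.clustering import BisectingKMeans
from pyspark.ml.feature import VectorAssembler

# One row per uuid, one column per name, holding the summed fre values.
df4 = indexed.groupBy('uuid').pivot('name').sum('fre')
df4 = df4.fillna(0)
# Assemble every column except uuid into a single vector column.
assembler = VectorAssembler(
    inputCols=df4.columns[1:],
    outputCol="features")
dataset = assembler.transform(df4)
bk = BisectingKMeans(k=8, seed=2, featuresCol="features")
result2 = bk.fit(dataset).transform(dataset)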

Here is what indexed looks like:

+------------------+------------+---------+-------------+------------+----------+--------+----+
|              uuid|    category|     code|   servertime|         cat|       fre|catIndex|name|
+------------------+------------+---------+-------------+------------+----------+--------+----+
|   351667085527886|         398|     null|1503084585000|         398|0.37951264|     2.0|  a2|
|   352279079643619|         403|     null|1503105476000|         403| 0.3938634|     3.0|  a3|
|   352279071621894|         398|     null|1503085396000|         398|0.38005984|     2.0|  a2|
|   357653074851887|         398|     null|1503085552000|         398| 0.3801652|     2.0|  a2|
|   354287077780760|         407|     null|1503085603000|         407|0.38019964|     5.0|  a5|
|0_8f394ebf3f67597c|         403|     null|1503084183000|         403|0.37924168|     3.0|  a3|
|   353528084062994|         403|     null|1503084234000|         403|0.37927604|     3.0|  a3|
|   356626072993852|   100000504|100000504|1503104781000|   100000504| 0.3933774|     0.0|  a0|
|   351667081062615|   100000448|      398|1503083901000|         398|0.37905172|     2.0|  a2|
|   354330089551058|1.00000444E8|     null|1503084004000|1.00000444E8|0.37912107|    34.0| a34|
+------------------+------------+---------+-------------+------------+----------+--------+----+

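In result2, I have several columns of double type. I use VectorAssembler to assemble those double columns into a single vector column, features, which is the column I want to convert to an array.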

I have updated the post, please check it.

Answers
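
The PickleException in the question mentions numpy.core.multiarray._reconstruct, which suggests the UDF is returning a numpy.ndarray that Spark cannot turn into an ArrayType value, whatever element type is declared. A minimal sketch of one way around it, assuming the column holds pyspark.ml vectors with no nulls: convert the result of toArray() to a list of built-in Python floats before returning it. The helper names vector_to_list and with_array_column are made up for this illustration.

```python
def vector_to_list(vector):
    """Convert an ML vector to a list of built-in Python floats.

    toArray() returns a numpy.ndarray; returning plain Python floats
    instead gives Spark values it can store in an array<double> column.
    """
    return [float(x) for x in vector.toArray()]


def with_array_column(df, vector_col="features", array_col="list"):
    """Return df with array_col holding vector_col as array<double>."""
    # Imported lazily so vector_to_list can be used without a Spark install.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, DoubleType

    get_array = udf(vector_to_list, ArrayType(DoubleType()))
    return df.withColumn(array_col, get_array(df[vector_col]))
```

With the DataFrame from the question this would be called as result3 = with_array_column(result2) followed by result3.show(). On Spark 3.0 or later, pyspark.ml.functions.vector_to_array should do the same conversion without a Python UDF, which may be the simpler choice if that version is available.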
