2017-04-03

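Subclassed ndarray drops information when broadcast in PySpark

I'm hoping someone can help me debug an issue we're seeing in Spark with subclasses of ndarray. Specifically, when a subclassed array is broadcast, its extra attributes appear to be lost. A simple example follows: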

>>> import numpy as np 
>>> 
>>> class Test(np.ndarray): 
...  def __new__(cls, input_array, info=None): 
...   obj = np.asarray(input_array).view(cls) 
...   obj.info = info 
...   return obj 
...  
...  def __array_finalize__(self, obj): 
...   if not hasattr(self, "info"): 
...    self.info = getattr(obj, 'info', None) 
...   else: 
...    print("has info attribute: %s" % getattr(self, 'info')) 
... 
>>> test = Test(np.array([[1,2,3],[4,5,6]]), info="info") 
>>> print(test.info) 
info 
>>> print(sc.broadcast(test).value) 
[[1 2 3] 
[4 5 6]] 
>>> print(sc.broadcast(test).value.info) 
None 
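Assuming PySpark serializes broadcast values with pickle (worth confirming for your Spark version), the loss can likely be reproduced without Spark at all. In this sketch, which uses a simplified __array_finalize__ without the debug print, slicing keeps info but a pickle round trip does not, because ndarray's default pickling saves the array data but not the instance's __dict__.

```python
import pickle

import numpy as np


class Test(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and unpickled instances. Copy info from the
        # source array if it has one (obj is None when numpy builds an array
        # from scratch, e.g. while unpickling), otherwise fall back to None.
        self.info = getattr(obj, "info", None)


test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")
print(test[1:].info)  # info  (slices keep the attribute)

# A pickle round trip, roughly what happens to a broadcast value.
restored = pickle.loads(pickle.dumps(test))
print(type(restored).__name__, restored.info)  # Test None
```

The restored object is still a Test with the right data; only the extra attribute is gone, which matches the None seen from sc.broadcast(test).value.info.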

This thread solves it: http://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array – David
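A minimal sketch of one common fix, assuming the root cause is pickling as that thread's title suggests: override __reduce__ to append info to the state tuple that ndarray.__reduce__ returns, and override __setstate__ to strip it off again before handing the rest to ndarray.__setstate__. The __array_finalize__ here is simplified from the question's version.

```python
import pickle

import numpy as np


class Test(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Views and slices inherit info from the array they were made from.
        self.info = getattr(obj, "info", None)

    def __reduce__(self):
        # ndarray.__reduce__ returns (reconstructor, args, state); append info
        # to the state tuple so it is pickled along with the array data.
        reconstructor, args, state = super().__reduce__()
        return reconstructor, args, state + (self.info,)

    def __setstate__(self, state):
        # Peel off info, then let ndarray restore shape, dtype and data.
        self.info = state[-1]
        super().__setstate__(state[:-1])


test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")
restored = pickle.loads(pickle.dumps(test))
print(restored.info)  # info
```

With this change, sc.broadcast(test).value.info would be expected to return "info" as well, assuming broadcast serialization goes through pickle as above.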

Answer


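At the very least, you have a small typo: you're checking hasattr(obj, "info") when you should be checking if hasattr(self, "info"). Because the if statement is flipped, the info isn't carried over.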

test = Test(np.array([[1,2,3],[4,5,6]]), info="info")
print(test.info)   # info
test2 = test[1:]   # slicing creates a view, which goes through __array_finalize__
print(test2.info)  # info

Good catch! That doesn't solve the main issue, though :( – David