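Subclassing ndarray drops information during broadcast in pyspark
I'm hoping someone can help me debug a problem we're seeing in Spark with a subclass of ndarray. Specifically, when a subclassed array is broadcast, it seems to lose its extra information. A simple example follows: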
>>> import numpy as np
>>>
>>> class Test(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         obj = np.asarray(input_array).view(cls)
...         obj.info = info
...         return obj
...
...     def __array_finalize__(self, obj):
...         if not hasattr(self, "info"):
...             self.info = getattr(obj, 'info', None)
...         else:
...             print("has info attribute: %s" % getattr(self, 'info'))
...
>>> test = Test(np.array([[1,2,3],[4,5,6]]), info="info")
>>> print(test.info)
info
>>> print(sc.broadcast(test).value)
[[1 2 3]
 [4 5 6]]
>>> print(sc.broadcast(test).value.info)
None
This thread addresses it: http://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array – David
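
For reference, here is a minimal sketch of the approach that thread describes. It assumes the attribute is lost because broadcasting serializes the value with pickle, and ndarray's default pickling only carries the array's own state. The `__reduce__` and `__setstate__` overrides below are additions that are not in the question's class, and `__array_finalize__` is simplified from the original. The example uses a plain pickle round trip in place of `sc.broadcast`, so it runs without Spark.

```python
import pickle

import numpy as np


class Test(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and reconstruction; inherit info if present.
        self.info = getattr(obj, "info", None)

    def __reduce__(self):
        # ndarray.__reduce__ returns (reconstruct_function, args, state).
        reconstruct, args, state = super().__reduce__()
        # Append the custom attribute to the end of ndarray's state tuple.
        return reconstruct, args, state + (self.info,)

    def __setstate__(self, state):
        # The last element is our attribute; the rest belongs to ndarray.
        self.info = state[-1]
        super().__setstate__(state[:-1])


test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")

# Stand-in for the broadcast round trip (assumption: broadcast uses pickle).
restored = pickle.loads(pickle.dumps(test))
print(restored.info)  # prints "info"
```

If that assumption holds, `sc.broadcast(test).value.info` should come back as "info" with this class as well, though I have only checked the pickle round trip here.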