2010-08-15 74 views
37

What is the most efficient way to serialize a numpy array using simplejson? (SimpleJSON and NumPy array)

+0

[Related](http://stackoverflow.com/questions/11561932/why-do-json-dumpslistnp-arange5-fail-while-json-dumpsnp-arange5-tolis) and a [simple solution](http://stackoverflow.com/questions/8230315/python-sets-are-not-json-serializable) by explicitly passing a [default handler](http://docs.python.org/2/library/json.html#json.dumps) for non-serializable objects. – 2013-08-22 05:39:42

+0

Another answer is here: http://stackoverflow.com/questions/26646362/numpy-array-is-not-json-serializable/32850511#32850511 – travelingbones 2015-09-29 18:36:56

Answers

26

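I would use simplejson.dumps(somearray.tolist()) as the most convenient approach (if I were still using simplejson at all, which implies being stuck with Python 2.5 or earlier; 2.6 and later have a standard library module json which works the same way, so of course I'd use that if the Python release in use supported it ;-).

In a quest for greater efficiency, you could subclass json.JSONEncoder (in json; I don't know whether the older simplejson already offered such customization possibilities) and, in the default method, special-case instances of numpy.ndarray by turning them into lists or tuples "just in time". I rather doubt, though, that the performance gain from such an approach would be enough to justify the effort.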

+0

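JSONEncoder's default method must return a serializable object, so it amounts to the same thing as returning 'somearray.tolist()'. If you want something faster, you have to encode it element by element yourself. – 2016-03-04 22:36:47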
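
The point made in the comment above can be checked with a small sketch. `ArrayEncoder` is a hypothetical name used only for this comparison; the sketch assumes that the `default` hook simply returns `tolist()`, in which case both routes should produce the same JSON text.

```python
import json
import numpy as np

class ArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder used only for this comparison."""
    def default(self, obj):
        # default() must hand back something json already knows how to encode.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

somearray = np.arange(6).reshape(2, 3)

# Route 1: convert up front. Route 2: let the encoder convert on demand.
via_tolist = json.dumps(somearray.tolist())
via_encoder = json.dumps(somearray, cls=ArrayEncoder)
```

Since the encoder still goes through `tolist()`, it is a convenience for nested data rather than a faster path.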

10

This shows how to convert a 1-D NumPy array to JSON and back to an array:

try: 
    import json 
except ImportError: 
    import simplejson as json 
import numpy as np 

def arr2json(arr): 
    return json.dumps(arr.tolist()) 
def json2arr(astr,dtype): 
    return np.fromiter(json.loads(astr),dtype) 

arr=np.arange(10) 
astr=arr2json(arr) 
print(repr(astr)) 
# '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]' 
dt=np.int32 
arr=json2arr(astr,dt) 
print(repr(arr)) 
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 

Building on tlausch's answer, here is a way to JSON-encode a NumPy array while preserving the shape and dtype of any NumPy array, including those with complex dtype.

import base64
import io
import json
import numpy as np

class NDArrayEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            # Store the array as a compressed .npz archive, base64-encoded.
            output = io.BytesIO()
            np.savez_compressed(output, obj=obj)
            # b64encode returns bytes; decode so json can serialize it on Python 3.
            return {'b64npz': base64.b64encode(output.getvalue()).decode('ascii')}
        return json.JSONEncoder.default(self, obj)


def ndarray_decoder(dct):
    if isinstance(dct, dict) and 'b64npz' in dct:
        output = io.BytesIO(base64.b64decode(dct['b64npz']))
        output.seek(0)
        return np.load(output)['obj']
    return dct

# Make expected non-contiguous structured array: 
expected = np.arange(10)[::2] 
expected = expected.view('<i4,<f4') 

dumped = json.dumps(expected, cls=NDArrayEncoder) 
result = json.loads(dumped, object_hook=ndarray_decoder) 

assert result.dtype == expected.dtype, "Wrong Type" 
assert result.shape == expected.shape, "Wrong Shape" 
assert np.array_equal(expected, result), "Wrong Values" 
17

I found this JSON subclass code for serializing one-dimensional numpy arrays within a dictionary. I tried it and it works for me.

import json
import numpy

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # Only 1-D arrays are handled here; anything else falls through.
        if isinstance(obj, numpy.ndarray) and obj.ndim == 1:
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

My dictionary is 'results'. Here is how I write it to the file "data.json":

j = json.dumps(results, cls=NumpyAwareJSONEncoder)
with open("data.json", "w") as f:
    f.write(j)
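
As a hedged sketch of the same write step plus reading the file back: the dictionary contents and the temporary path below are made up for illustration, and `json.dump` is used so the text goes straight to the file object. Note that the arrays come back as plain lists and have to be converted again.

```python
import json
import os
import tempfile
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)

# Made-up example data standing in for the 'results' dictionary.
results = {"x": np.array([1.0, 2.5]), "label": "run-1"}

# Write to a temporary location instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as f:
    json.dump(results, f, cls=NumpyAwareJSONEncoder)

# Reading back yields ordinary lists; rebuild the array explicitly.
with open(path) as f:
    loaded = json.load(f)
restored = np.asarray(loaded["x"])
```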
+2

This approach also works for numpy arrays nested inside a dictionary. This answer (I think) implies what I just said, but it is an important point. – 2013-01-30 19:59:53

+1

This did not work for me. I had to use 'return obj.tolist()' instead of 'return [x for x in obj]'. – nwhsvc 2013-07-31 00:04:01

+0

I prefer using numpy's own object-to-list conversion: it should be faster to let numpy iterate over the array than to have Python iterate over it. – 2014-06-24 21:56:29
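
A short sketch of why the two spellings discussed in these comments can behave differently. It assumes Python 3 and an integer array: `tolist()` produces built-in `int` objects, while a list comprehension keeps NumPy scalar objects, which `json` does not accept without extra help.

```python
import json
import numpy as np

obj = np.arange(3)

as_python = obj.tolist()       # list of built-in int
as_numpy = [x for x in obj]    # list of NumPy integer scalars

encoded = json.dumps(as_python)

# The comprehension result is rejected because its elements are NumPy scalars.
try:
    json.dumps(as_numpy)
    comprehension_ok = True
except TypeError:
    comprehension_ok = False
```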

2

Improving on Russ's answer, I would also include the np.generic scalars:

import json
import numpy as np

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray) and obj.ndim == 1:
            return obj.tolist()
        elif isinstance(obj, np.generic):
            # NumPy scalar (np.float32, np.int64, ...): unwrap to a Python value.
            return obj.item()
        return json.JSONEncoder.default(self, obj)
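
To illustrate what the np.generic branch buys you, here is a minimal sketch written as a `default=` hook instead of a subclass. The function name `to_native` and the sample dictionary are assumptions made for this example.

```python
import json
import numpy as np

def to_native(obj):
    """Hypothetical default= hook: unwrap arrays and NumPy scalars."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError("%r is not JSON serializable" % (obj,))

# NumPy scalars typically show up as the result of reductions or indexing.
stats = {"count": np.int64(3), "mean": np.float32(0.5), "ok": np.bool_(True)}
dumped = json.dumps(stats, default=to_native, sort_keys=True)
```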
73

To preserve dtype and shape, try this:

import base64 
import json 
import numpy as np 

class NumpyEncoder(json.JSONEncoder):

    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            # b64encode returns bytes; decode so json can serialize it on Python 3.
            data_b64 = base64.b64encode(obj_data).decode('ascii')
            return dict(__ndarray__=data_b64,
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)


def json_numpy_obj_hook(dct): 
    """Decodes a previously encoded numpy ndarray with proper shape and dtype. 

    :param dct: (dict) json encoded ndarray 
    :return: (ndarray) if input was an encoded ndarray 
    """ 
    if isinstance(dct, dict) and '__ndarray__' in dct: 
     data = base64.b64decode(dct['__ndarray__']) 
     return np.frombuffer(data, dct['dtype']).reshape(dct['shape']) 
    return dct 

expected = np.arange(100, dtype=np.float64)  # np.float was removed from recent NumPy
dumped = json.dumps(expected, cls=NumpyEncoder) 
result = json.loads(dumped, object_hook=json_numpy_obj_hook) 


# None of the following assertions will be broken. 
assert result.dtype == expected.dtype, "Wrong Type" 
assert result.shape == expected.shape, "Wrong Shape" 
assert np.allclose(expected, result), "Wrong Values" 
+6

Not clear why this isn't more upvoted! – tacaswell 2014-11-12 02:30:40

+0

Agreed, this solution works in general for nested arrays, i.e. a dictionary of arrays. http://stackoverflow.com/questions/27909658/json-encoder-and-decoder-for-complex-numpy-arrays/27913569#27913569 – 2015-01-13 17:49:24

+1

This is one of those hidden but precious SO gems that can save you hours and hours of work. – 2015-09-10 22:46:11
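
The nested case mentioned in the comments can be sketched as follows. This is a simplified variant written for illustration, not the answer's exact code: it uses `tobytes()` to get the raw buffer, and the sample payload is made up. The `object_hook` is called for every decoded dictionary, innermost first, which is why arrays at any nesting level are restored.

```python
import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            raw = obj.tobytes()  # C-order copy of the data, even for strided views
            return {"__ndarray__": base64.b64encode(raw).decode("ascii"),
                    "dtype": str(obj.dtype),
                    "shape": obj.shape}
        return json.JSONEncoder.default(self, obj)

def json_numpy_obj_hook(dct):
    # Only dictionaries carrying the marker key are turned back into arrays.
    if "__ndarray__" in dct:
        data = base64.b64decode(dct["__ndarray__"])
        return np.frombuffer(data, dct["dtype"]).reshape(dct["shape"])
    return dct

# Made-up nested payload: arrays at two different depths.
payload = {"weights": np.arange(6, dtype=np.float32).reshape(2, 3),
           "meta": {"mask": np.array([True, False])}}

dumped = json.dumps(payload, cls=NumpyEncoder)
restored = json.loads(dumped, object_hook=json_numpy_obj_hook)
```

One design consequence worth knowing: arrays rebuilt with `np.frombuffer` are read-only views over the decoded bytes, so call `.copy()` if you need to modify them.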

3

If you want to apply Russ's method to n-dimensional numpy arrays, you can try this:

import json
import numpy

class NumpyAwareJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, numpy.ndarray):
            if obj.ndim == 1:
                return obj.tolist()
            else:
                # Recurse along the first axis until 1-D slices are reached.
                return [self.default(obj[i]) for i in range(obj.shape[0])]
        return json.JSONEncoder.default(self, obj)

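This simply converts an n-dimensional array into a list of lists of depth "n". To turn such a list back into a numpy array, my_nparray = numpy.array(my_list) will work regardless of the list "depth".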
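
A small round-trip sketch of the statement above, using a made-up 3-D array. One caveat it shows: the nested list carries no dtype information, so `numpy.array` infers one unless you pass the original dtype yourself.

```python
import json
import numpy as np

original = np.arange(24, dtype=np.int16).reshape(2, 3, 4)

# Through JSON and back: the result is a list of lists of depth 3.
my_list = json.loads(json.dumps(original.tolist()))

my_nparray = np.array(my_list)                    # shape recovered, dtype inferred
typed = np.array(my_list, dtype=original.dtype)   # dtype restored explicitly
```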

1

You can also solve this with just a function passed into json.dumps, like this:

json.dumps(np.array([1, 2, 3]), default=json_numpy_serializer) 

with

import json
import numpy as np

def json_numpy_serializer(o):
    """ Serialize numpy types for json

    Parameters:
        o (object): any python object which fails to be serialized by json

    Example:

        >>> import json
        >>> a = np.array([1, 2, 3])
        >>> json.dumps(a, default=json_numpy_serializer)

    """
    numpy_types = (
        np.bool_,
        # np.bytes_, -- python `bytes` class is not json serializable
        # np.complex64, -- python `complex` class is not json serializable
        # np.complex128, -- python `complex` class is not json serializable
        # np.complex256, -- special handling below
        # np.datetime64, -- python `datetime.datetime` class is not json serializable
        np.float16,
        np.float32,
        np.float64,
        # np.float128, -- special handling below
        np.int8,
        np.int16,
        np.int32,
        np.int64,
        # np.object_ -- should already be evaluated as python native
        np.str_,
        np.timedelta64,
        np.uint8,
        np.uint16,
        np.uint32,
        np.uint64,
        np.void,
    )

    if isinstance(o, np.ndarray):
        return o.tolist()
    elif isinstance(o, numpy_types):
        return o.item()
    elif isinstance(o, np.longdouble):
        # np.longdouble is np.float128 on platforms that provide that name
        return o.astype(np.float64).item()
    # elif isinstance(o, np.complex256): -- no python native for np.complex256
    #     return o.astype(np.complex128).item() -- python `complex` class is not json serializable
    else:
        raise TypeError("{} of type {} is not JSON serializable".format(repr(o), type(o)))

Verification:

need_additional_json_handling = (
    np.bytes_,
    np.complex64,
    np.complex128,
    np.clongdouble,  # np.complex256 on platforms that provide that name
    np.datetime64,
    np.longdouble,   # np.float128 on platforms that provide that name
)


# np.typeDict is gone from recent NumPy releases; np.sctypeDict is the same mapping
numpy_types = tuple(set(np.sctypeDict.values()))

for numpy_type in numpy_types:
    print(numpy_type)

    if numpy_type == np.void:
        # complex dtypes evaluate as np.void, e.g.
        numpy_type = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
    elif numpy_type in need_additional_json_handling:
        print('python native can not be json serialized')
        continue

    a = np.ones(1, dtype=numpy_type)
    json.dumps(a, default=json_numpy_serializer)
0

A quick, though not truly optimal, way is to use Pandas:

import pandas as pd 
pd.Series(your_array).to_json(orient='values') 
0

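I just discovered tlausch's answer to this question and realized it gives almost the correct answer to my problem, but at least for me it did not work in Python 3.5, because of a couple of errors: 1 - infinite recursion; 2 - the data was saved as None.

Since I can't yet comment directly on the original answer, here is my version: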

import base64 
import json 
import numpy as np 

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """If input object is an ndarray it will be converted into a dict
        holding dtype, shape and the data, base64 encoded.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags['C_CONTIGUOUS']:
                obj_data = obj.data
            else:
                cont_obj = np.ascontiguousarray(obj)
                assert(cont_obj.flags['C_CONTIGUOUS'])
                obj_data = cont_obj.data
            data_b64 = base64.b64encode(obj_data)
            return dict(__ndarray__=data_b64.decode('utf-8'),
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Anything else is rejected by the base class with a TypeError
        # instead of being silently written as null.
        return json.JSONEncoder.default(self, obj)


def json_numpy_obj_hook(dct):
    """Decodes a previously encoded numpy ndarray with proper shape and dtype.

    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

expected = np.arange(100, dtype=np.float64)  # np.float was removed from recent NumPy
dumped = json.dumps(expected, cls=NumpyEncoder) 
result = json.loads(dumped, object_hook=json_numpy_obj_hook) 


# None of the following assertions will be broken. 
assert result.dtype == expected.dtype, "Wrong Type" 
assert result.shape == expected.shape, "Wrong Shape" 
assert np.allclose(expected, result), "Wrong Values"
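
The second symptom mentioned in this answer, data saved as None, is something that can happen whenever a `default` method returns nothing for an object it does not handle. The following is a minimal sketch with hypothetical class names, and not necessarily the exact cause the author ran into: a `default` without a fallback branch writes `null`, while delegating to the base class raises a `TypeError`.

```python
import json
import numpy as np

class ForgetfulEncoder(json.JSONEncoder):
    """Hypothetical encoder whose default() has no fallback branch."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # No return here: Python returns None implicitly.

class StrictEncoder(json.JSONEncoder):
    """Hypothetical encoder that delegates unknown objects to the base class."""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)  # raises TypeError

# A complex number is not JSON serializable, so it reaches default().
data = {"arr": np.arange(2), "z": complex(1, 2)}

silent = json.dumps(data, cls=ForgetfulEncoder, sort_keys=True)

try:
    json.dumps(data, cls=StrictEncoder)
    strict_raised = False
except TypeError:
    strict_raised = True
```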