2016-08-18

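Why is there a difference between 32-bit and 64-bit numpy/pandas?

I develop with numpy/pandas on a 64-bit Fedora box. In production the code was pushed to a 32-bit CentOS box, where json.dumps fails. It throws "repr(0) is not Serializable".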

I tried testing on 64-bit CentOS and it runs fine. On 32-bit (CentOS 6.8, to be precise) it throws an error. Has anyone run into this before?

Here is the session on 64-bit Fedora:

Python 2.6.6 (r266:84292, Jun 30 2016, 09:54:10) 
[GCC 5.3.1 20160406 (Red Hat 5.3.1-6)] on linux4 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import pandas as pd
>>> a = pd.DataFrame([{'a': 1}])
>>> a
   a
0  1
>>> a.to_dict() 
{'a': {0: 1}} 
>>> import json 
>>> json.dumps(a.to_dict()) 
'{"a": {"0": 1}}' 

Here is the same code on 32-bit CentOS:

import json 
import pandas as pd 

a = pd.DataFrame([ {'a': 1} ]) 
json.dumps(a.to_dict()) 

Traceback (most recent call last):
  File "sample.py", line 5, in <module>
    json.dumps(a.to_dict())
  File "/usr/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 268, in _iterencode_dict
    raise TypeError("key {0!r} is not a string".format(key))
TypeError: key 0 is not a string

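What is the usual workaround for this problem? I can't use a custom JSON encoder, because the library I use to push this data expects a dict, serializes it internally with the json module, and sends it over the wire.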

Update: Python is 2.6.6 and pandas is 0.16.1 on both machines.

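This doesn't look like it should have worked on either version. My guess is that the systems are running different Pandas versions, and one of them has a weird quirk that makes the 0 behave like some kind of int-string hybrid. – user2357112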

@user2357112 It should have worked on both versions. –

You forgot to add the most important details: the Python version on the 32-bit CentOS box and the pandas version on both machines. –

Answer

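I believe this happens because the index uses integer types of different sizes (Python int versus numpy.intNN), and these do not convert natively from one to the other.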

For example, on my 64-bit Python 2.7 with numpy:

>>> isinstance(numpy.int64(5), int) 
True 
>>> isinstance(numpy.int32(5), int) 
False 
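
To check which key types to_dict() actually hands to json on each machine, something like the following might help. It is only a sketch: key_types is a hypothetical helper name, and the sample dict stands in for a.to_dict(). On the failing box the inner keys would presumably show up under a numpy type name rather than int.

```python
def key_types(obj, found=None):
    """Collect the type names of every dict key in a nested structure."""
    if found is None:
        found = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.add(type(key).__name__)
            # Recurse so the inner {index: value} dicts are inspected too.
            key_types(value, found)
    return found


# Stand-in for a.to_dict(); replace with the real call when debugging.
sample = {'a': {0: 1}}
print(sorted(key_types(sample)))  # ['int', 'str'] for this plain-Python sample
```

Comparing the result on the two machines should show whether the index keys really differ in type.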

Then:

>>> json.dumps({numpy.int64(5): '5'}) 
'{"5": "5"}' 
>>> json.dumps({numpy.int32(5): '5'}) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string

You can try changing the index to numpy.int32, numpy.int64 or int. With numpy.int32 the error reproduces:

>>> df = pd.DataFrame([ {'a': 1}, {'a': 2} ]) 
>>> df.index = df.index.astype(numpy.int32) # perhaps your index was of these? 
>>> json.dumps(df.to_dict()) 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string

So you can try changing the index type from int32 to int64, or simply to a plain Python int:

>>> df.index = df.index.astype(numpy.int64) 
>>> json.dumps(df.to_dict()) 
'{"a": {"0": 1, "1": 2}}' 

>>> df.index = df.index.astype(int) 
>>> json.dumps(df.to_dict()) 
'{"a": {"0": 1, "1": 2}}'
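
If changing the index type is not convenient, another option is to clean the dict just before handing it to the library, which avoids the need for a custom encoder. This is a minimal sketch, not something tested on the 32-bit box: plain_keys is a hypothetical helper name, and it assumes the offending keys implement __index__ (numpy integer scalars do), so operator.index() can turn them into built-in ints.

```python
import json
import operator


def plain_keys(obj):
    """Return a copy of a nested dict with integer-like keys as built-in int."""
    if not isinstance(obj, dict):
        return obj
    cleaned = {}
    for key, value in obj.items():
        if not isinstance(key, bool):
            try:
                # Works for anything implementing __index__.
                key = int(operator.index(key))
            except TypeError:
                pass  # not integer-like (e.g. a str column name): keep as-is
        cleaned[key] = plain_keys(value)
    return cleaned


# Plain-Python stand-in; in practice this would be plain_keys(df.to_dict()).
payload = plain_keys({'a': {0: 1, 1: 2}})
print(json.dumps(payload, sort_keys=True))  # {"a": {"0": 1, "1": 2}}
```

The helper sticks to syntax available in Python 2.6 (no dict comprehensions), since that is the version in the question.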