2014-09-12 82 views

HDFStore error with pandas

On append, I get the following error:

exportStore.append(key, hdfStoreLocal, index = False, data_columns = True) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 911, in append 
    **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 1270, in _write_to_group 
    s.write(obj=value, append=append, complib=complib, **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3605, in write 
    **kwargs) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3293, in create_axes 
    raise e 
ValueError: invalid itemsize in generic type tuple 

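Any ideas why this happens? This is a fairly large project, so I'm not sure what code I can provide, but the error occurs on the very first append. Any help would be much appreciated.

Edit:

Output of pd.show_versions():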

INSTALLED VERSIONS 
------------------ 
commit: None 
python: 2.7.6.final.0 
python-bits: 64 
OS: Linux 
OS-release: 3.13.0-35-generic 
machine: x86_64 
processor: x86_64 
byteorder: little 
LC_ALL: None 
LANG: en_US.UTF-8 

pandas: 0.14.1 
nose: None 
Cython: 0.20.2 
numpy: 1.8.1 
scipy: 0.13.3 
statsmodels: None 
IPython: 1.2.1 
sphinx: 1.2.2 
patsy: None 
scikits.timeseries: None 
dateutil: 1.5 
pytz: 2012c 
bottleneck: None 
tables: 3.1.1 
numexpr: 2.2.2 
matplotlib: 1.3.1 
openpyxl: None 
xlrd: None 
xlwt: None 
xlsxwriter: None 
lxml: None 
bs4: None 
html5lib: 0.999 
httplib2: 0.8 
apiclient: None 
rpy2: None 
sqlalchemy: None 
pymysql: None 
psycopg2: None 

Output of df.info():

<class 'pandas.core.frame.DataFrame'> 
Int64Index: 61500 entries, 0 to 61499 
Data columns (total 48 columns): 
Sequential_Code_1  61500 non-null float64 
Age_1     61500 non-null float64 
Sex_1     61500 non-null object 
Race_1     61500 non-null object 
Ethnicity_1    61500 non-null object 
Principal_Code_1   61500 non-null object 
Admitting_Code_1   61500 non-null object 
Principal_Code_2   61500 non-null object 
Other_Codes_1   61500 non-null object 
Other_Codes_2   61500 non-null object 
Other_Codes_3   61500 non-null object 
Other_Codes_4   61500 non-null object 
Other_Codes_5   61500 non-null object 
Other_Codes_6   61500 non-null object 
Other_Codes_7   61500 non-null object 
Other_Codes_8   61500 non-null object 
Other_Codes_9   61500 non-null object 
Other_Codes_10   61500 non-null object 
Other_Codes_11   61500 non-null object 
Other_Codes_12   61500 non-null object 
Other_Codes_13   61500 non-null object 
Other_Codes_14   61500 non-null object 
Other_Codes_15   61500 non-null object 
Other_Codes_16   61500 non-null object 
Other_Codes_17   61500 non-null object 
Other_Codes_18   61500 non-null object 
Other_Codes_19   61500 non-null object 
Other_Codes_20   61500 non-null object 
Other_Codes_21   61500 non-null object 
Other_Codes_22   61500 non-null object 
Other_Codes_23   61500 non-null object 
Other_Codes_24   61500 non-null object 
External_Code_1   61500 non-null object 
Place_Code_1    61500 non-null object 

Output of df.head():

head  Sequential_Number_1 Age_1 Sex_1 Race_1 \ 
1128     2.000000e+13  73    F    01 
2185     2.000000e+13  52    M    01 
2202     2.000000e+13  64    M    01 
2283     2.000000e+13  72    F    01 
4471     2.000000e+13  62    F    01 

Show `pd.show_versions()`, `df.info()`, and the output of `ptdump -av`. – Jeff 2014-09-12 20:02:17

I'm not sure what to do about ptdump, since I'm doing an append; the rest has been added! – Cenoc 2014-09-12 20:17:22

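You could also post `df.head()` to show some of the data. I suspect that not everything in your data is a string (you have object columns) and that some values are actually other Python objects. – Jeff 2014-09-12 22:24:20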

Answers

The problem is that you need to specify a min_itemsize; see the pandas documentation on HDFStore string columns.

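This lets you control the size of the string columns. If ANY value has zero length, the append fails (this could probably use a better error message). Otherwise, pandas takes the maximum length of the values passed in to work out the size the column needs to be.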
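To see whether zero-length strings are present, a quick check along the following lines may help. This is only a sketch: `count_empty_strings` is a hypothetical helper name, not part of pandas, and the sample frame is made up to resemble the columns in the question.

```python
import pandas as pd


def count_empty_strings(df):
    """Return {column: number of zero-length strings} for columns that have any.

    Hypothetical helper for illustration; not a pandas function.
    """
    counts = {}
    for col in df.columns:
        # Count only genuine str values that are exactly "".
        n_empty = sum(1 for value in df[col] if isinstance(value, str) and value == "")
        if n_empty:
            counts[col] = n_empty
    return counts


# Made-up sample data resembling the frame in the question.
sample = pd.DataFrame({"Race_1": ["01", "", ""], "Age_1": [73.0, 52.0, 64.0]})
empty_counts = count_empty_strings(sample)
print(empty_counts)
```

Columns that show up in the result are the ones to look at first when choosing a min_itemsize or cleaning the data.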

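The reason for specifying it: suppose you are appending multiple chunks. Chunk 2 could contain a longer string than anything in chunk 1, which means the column has to be at least that wide, but looking at chunk 1 alone won't tell you that.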
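A minimal sketch of that idea, assuming all chunks can be inspected before writing: compute the longest string per column across every chunk and pass the result as `min_itemsize`. The helper names `string_itemsizes` and `append_chunks`, and the floor of 1, are assumptions made for illustration; `HDFStore.append` and its `min_itemsize`, `index`, and `data_columns` arguments are the real pandas API. Writing requires PyTables to be installed.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def string_itemsizes(chunks, floor=1):
    """Return {column: widest string in bytes} across all chunks.

    Hypothetical helper. `floor` keeps every size at least 1 so that a
    column holding only empty strings never gets a zero itemsize.
    """
    sizes = {}
    for chunk in chunks:
        for col in chunk.columns:
            series = chunk[col]
            if not (is_object_dtype(series) or is_string_dtype(series)):
                continue
            # Measure encoded length, since strings are stored as bytes.
            longest = max(
                (len(str(value).encode("utf-8")) for value in series.dropna()),
                default=0,
            )
            sizes[col] = max(sizes.get(col, floor), longest)
    return sizes


def append_chunks(path, key, chunks):
    """Append every chunk under `key`, using one shared min_itemsize mapping."""
    chunks = list(chunks)
    sizes = string_itemsizes(chunks)
    with pd.HDFStore(path) as store:
        for chunk in chunks:
            store.append(key, chunk, index=False, data_columns=True,
                         min_itemsize=sizes)
```

For example, `append_chunks("export.h5", "records", [chunk_1, chunk_2])` would size each string column for the longest value in either chunk; the file name and key here are placeholders.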

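Alternatively, you could preprocess the data so that it does not use zero-length strings and instead uses np.nan for missing values, which HDFStore/pandas handles correctly.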
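One way that preprocessing might look, as a sketch: replace each zero-length string with a missing value before appending. `empty_to_nan` is a hypothetical helper name and the sample frame is made up; only values that are exactly `""` are touched.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def empty_to_nan(df):
    """Return a copy of `df` with zero-length strings replaced by missing values.

    Hypothetical helper for illustration; the input frame is left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if not (is_object_dtype(series) or is_string_dtype(series)):
            continue
        # True where the value is a str of length zero.
        is_empty = series.map(lambda v: isinstance(v, str) and v == "").astype(bool)
        # mask() substitutes NaN wherever the condition is True.
        out[col] = series.mask(is_empty)
    return out


# Made-up sample data resembling the frame in the question.
raw = pd.DataFrame({"Race_1": ["01", "", ""], "Age_1": [73.0, 52.0, 64.0]})
clean = empty_to_nan(raw)
print(clean)
```

The cleaned frame can then be passed to the append call in place of the original.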