2016-06-28 31 views

I have a list of DataFrames. How can I store a list of pandas DataFrames so that it is easy to access?

df1 = 
    Stock Year Profit CountPercent 
    AAPL 2012 1  38.77 
    AAPL 2013 1  33.33 
df2 = 
    Stock Year Profit CountPercent 
    GOOG 2012 1  43.47 
    GOOG 2013 1  32.35 

df3 = 
    Stock Year Profit CountPercent 
    ABC 2012 1  40.00 
    ABC 2013 1  32.35 

A function returns a list like [df1,df2,df3,......]. The columns are the same in all of the DataFrames, but the rows differ.

How can I store these on disk and retrieve the list again in the fastest and most efficient way?


Do all of your DataFrames have the same shape (number of rows and columns)? – MaxU

Answers

1

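If the values in the Stock column are the same within each DataFrame, you can drop that column with iloc and build a dict with a dict comprehension (the key is the first value of each df's Stock column):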

dfs = {df['Stock'].iat[0]: df.iloc[:, 1:] for df in [df1, df2, df3]}

print (dfs['AAPL']) 
    Year Profit CountPercent 
0 2012  1   38.77 
1 2013  1   33.33 

print (dfs['ABC']) 
    Year Profit CountPercent 
0 2012  1   40.00 
1 2013  1   32.35 

print (dfs['GOOG']) 
    Year Profit CountPercent 
0 2012  1   43.47 
1 2013  1   32.35 
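The dict comprehension above silently assumes that each DataFrame holds exactly one Stock value. A minimal sketch of the same idea with that assumption checked explicitly is shown here; `frames_by_stock` is a hypothetical helper name, not part of pandas, and the sample data is a cut-down copy of the frames in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [38.77, 33.33]})
df2 = pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [43.47, 32.35]})


def frames_by_stock(frames):
    """Hypothetical helper: key each DataFrame by its single Stock value."""
    result = {}
    for df in frames:
        symbols = df["Stock"].unique()
        # Refuse frames that mix several stocks, since the key would be ambiguous.
        if len(symbols) != 1:
            raise ValueError(f"expected one Stock value, found {list(symbols)}")
        result[symbols[0]] = df.drop(columns="Stock")
    return result


dfs = frames_by_stock([df1, df2])
print(sorted(dfs))
print(dfs["AAPL"])
```

Dropping the column by name rather than by position also keeps the helper working if Stock is not the first column.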

For storage on disk, I think the best option is HDF5 (PyTables).

If the values in the Stock column are the same within each DataFrame, you can concat all of the DataFrames and then store the result:

df = pd.concat([df1.set_index('Stock'), df2.set_index('Stock'), df3.set_index('Stock')]) 
print (df) 
     Year Profit CountPercent 
Stock        
AAPL 2012  1   38.77 
AAPL 2013  1   33.33 
GOOG 2012  1   43.47 
GOOG 2013  1   32.35 
ABC 2012  1   40.00 
ABC 2013  1   32.35 
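If the original list is needed again after loading the concatenated frame, it can be rebuilt by grouping on the Stock index level. This is only a sketch: the frame is rebuilt by hand here so the example runs on its own, and `frames` is an illustrative name.

```python
import pandas as pd

# Same layout as the concatenated frame above: Stock is the index.
df = pd.DataFrame(
    {"Year": [2012, 2013, 2012, 2013, 2012, 2013],
     "Profit": [1, 1, 1, 1, 1, 1],
     "CountPercent": [38.77, 33.33, 43.47, 32.35, 40.00, 32.35]},
    index=pd.Index(["AAPL", "AAPL", "GOOG", "GOOG", "ABC", "ABC"], name="Stock"),
)

# sort=False keeps the groups in order of first appearance (AAPL, GOOG, ABC).
# reset_index() turns Stock back into an ordinary column.
frames = [group.reset_index() for _, group in df.groupby(level="Stock", sort=False)]

for frame in frames:
    print(frame)
```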

store = pd.HDFStore('store.h5') 
store['df'] = df 
print (store) 
<class 'pandas.io.pytables.HDFStore'> 
File path: store.h5 
/df   frame  (shape->[1,4]) 
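If PyTables is not available, one alternative (not HDF5, and not what this answer recommends) is to pickle the whole list in a single call with `pd.to_pickle` and load it back with `pd.read_pickle`. The sketch below assumes a temporary directory as the file location; the path and variable names are illustrative only.

```python
import os
import tempfile

import pandas as pd

frames = [
    pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                  "Profit": [1, 1], "CountPercent": [38.77, 33.33]}),
    pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                  "Profit": [1, 1], "CountPercent": [43.47, 32.35]}),
]

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "frames.pkl")

pd.to_pickle(frames, path)       # write the whole list in one call
restored = pd.read_pickle(path)  # read it back as a list of DataFrames

# Check that every frame survived the round trip unchanged.
print(all(a.equals(b) for a, b in zip(frames, restored)))
```

Pickle files should only be loaded from sources you trust, and they are tied to Python, whereas HDF5 can be read from other tools.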
1

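I think that if all of your DataFrames have the same shape, it would be more natural to store your data as a pandas.Panel instead of a list of DataFrames. This is how pandas_datareader works.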

import io 
import pandas as pd 

df1 = pd.read_csv(io.StringIO(""" 
Stock,Year,Profit,CountPercent 
AAPL,2012,1,38.77 
AAPL,2013,1,33.33 
""" 
)) 

df2 = pd.read_csv(io.StringIO(""" 
Stock,Year,Profit,CountPercent 
GOOG,2012,1,43.47 
GOOG,2013,1,32.35 
""" 
)) 

df3 = pd.read_csv(io.StringIO(""" 
Stock,Year,Profit,CountPercent 
ABC,2012,1,40.0 
ABC,2013,1,32.35 
""" 
)) 


# I had to drop the `Stock` column and use it as the Panel item axis
# because saving the Panel to HDFStore otherwise failed with:
# TypeError: Cannot serialize the column [%s] because its data contents are [mixed-integer] object dtype
p = pd.Panel({df.iat[0, 0]: df.drop('Stock', axis=1) for df in [df1, df2, df3]})

# write the panel to HDFStore
store = pd.HDFStore('c:/temp/stocks.h5')
store.append('stocks', p, data_columns=True, mode='w')
store.close()

# read panel from HDFStore 
store = pd.HDFStore('c:/temp/stocks.h5') 
p = store.select('stocks') 

Store:

In [18]: store 
Out[18]: 
<class 'pandas.io.pytables.HDFStore'> 
File path: c:/temp/stocks.h5 
/stocks   wide_table (typ->appendable,nrows->6,ncols->3,indexers->[major_axis,minor_axis],dc->[AAPL,ABC,GOOG]) 

Panel dimensions:

In [19]: p['AAPL'] 
Out[19]: 
    Year Profit CountPercent 
0 2012.0  1.0   38.77 
1 2013.0  1.0   33.33 

In [20]: p[:, :, 'Profit'] 
Out[20]: 
    AAPL ABC GOOG 
0 1.0 1.0 1.0 
1 1.0 1.0 1.0 

In [21]: p[:, 0] 
Out[21]: 
       AAPL  ABC  GOOG 
Year   2012.00 2012.0 2012.00 
Profit   1.00  1.0  1.00 
CountPercent 38.77 40.0 43.47
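Note that pandas.Panel was removed in pandas 0.25, so the code above only runs on older versions. On current pandas, a rough equivalent is a single DataFrame with a two-level index built by `pd.concat` with `keys`. The following is a sketch under that assumption; the level names `Stock` and `row` and the variable names are chosen for illustration.

```python
import pandas as pd

# Per-stock frames without the Stock column, as in the Panel example.
aapl = pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1],
                     "CountPercent": [38.77, 33.33]})
goog = pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1],
                     "CountPercent": [43.47, 32.35]})
abc = pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1],
                    "CountPercent": [40.00, 32.35]})

# keys become the outer index level; the original row labels become the inner one.
stacked = pd.concat([aapl, goog, abc], keys=["AAPL", "GOOG", "ABC"])
stacked = stacked.rename_axis(["Stock", "row"])

one_stock = stacked.loc["AAPL"]                # similar to p['AAPL']
profit = stacked["Profit"].unstack("Stock")    # similar to p[:, :, 'Profit']
first_rows = stacked.xs(0, level="row").T      # similar to p[:, 0]

print(one_stock)
print(profit)
print(first_rows)
```

Unlike the Panel, which showed Year and Profit as floats, the stacked frame keeps a separate dtype per column, and it can be written to disk under a single key in the same way as any other DataFrame.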