2017-01-10
1

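Reset the index of a pandas DataFrame created from groupby or pivot?

I have data containing prices, volumes, and other information about various financial securities. My input data looks like this: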

import numpy as np 
import pandas 

prices = np.random.rand(15) * 100 
volumes = np.random.randint(15, size=15) * 10 
idx = pandas.Series([2007, 2007, 2007, 2007, 2007, 2008, 
        2008, 2008, 2008, 2008, 2009, 2009, 
        2009, 2009, 2009], name='year') 
df = pandas.DataFrame({'price': prices, 'volume': volumes})
df.index = idx 

# BELOW IS AN EXAMPLE OF WHAT THE INPUT MIGHT LOOK LIKE
# IT WON'T BE EXACT BECAUSE THE DATA IS RANDOM
#   price volume 
# year 
# 2007 0.121002  30 
# 2007 15.256424  70 
# 2007 44.479590  50 
# 2007 29.096013  0 
# 2007 21.424690  0 
# 2008 23.019548  40 
# 2008 90.011295  0 
# 2008 88.487664  30 
# 2008 51.609119  70 
# 2008 4.265726  80 
# 2009 34.402065  140 
# 2009 10.259064  100 
# 2009 47.024574  110 
# 2009 57.614977  140 
# 2009 54.718016  50 

I want to produce a DataFrame that looks like this:

year  2007  2008  2009 
0  0.121002 23.019548 34.402065 
1  15.256424 90.011295 10.259064 
2  44.479590 88.487664 47.024574 
3  29.096013 51.609119 57.614977 
4  21.424690 4.265726 54.718016 

I know one way to produce the output above using groupby:

df = df.reset_index() 
grouper = df.groupby('year') 
df2 = None 
for group, data in grouper: 
    series = data['price'].copy() 
    series.index = range(len(series)) 
    series.name = group 
    df2 = pandas.DataFrame(series) if df2 is None else pandas.concat([df2, series], axis=1) 

I also know that you can pivot, which gives a DataFrame with NaN wherever an index label has no value for a given year:

# df = df.reset_index() 
df.pivot(columns='year', values='price') 

# Output 
# year  2007  2008  2009 
# 0  0.121002  NaN  NaN 
# 1  15.256424  NaN  NaN 
# 2  44.479590  NaN  NaN 
# 3  29.096013  NaN  NaN 
# 4  21.424690  NaN  NaN 
# 5   NaN 23.019548  NaN 
# 6   NaN 90.011295  NaN 
# 7   NaN 88.487664  NaN 
# 8   NaN 51.609119  NaN 
# 9   NaN 4.265726  NaN 
# 10   NaN  NaN 34.402065 
# 11   NaN  NaN 10.259064 
# 12   NaN  NaN 47.024574 
# 13   NaN  NaN 57.614977 
# 14   NaN  NaN 54.718016 

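My question is:

Is there a way to build my desired output DataFrame with groupby without creating a Series for each group, or is there a way to re-index my input DataFrame so that pivot gives the desired output?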

Answers

3

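You need to label the rows within each year 0-4. To do that, use cumcount after grouping. You can then pivot correctly, using that new column as the index.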

df['year_count'] = df.groupby(level='year').cumcount() 
df.reset_index().pivot(index='year_count', columns='year', values='price') 

year    2007  2008  2009 
year_count         
0   61.682275 32.729113 54.859700 
1   44.231296 4.453897 45.325802 
2   65.850231 82.023960 28.325119 
3   29.098607 86.046499 71.329594 
4   67.864723 43.499762 19.255214 
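
To make the approach above reproducible, here is a minimal sketch that uses fixed data instead of random values. The three-rows-per-year frame and the name `wide` are assumptions made for illustration; the prices are copied from the question's sample input.

```python
import pandas

# Hypothetical fixed sample: three prices per year, taken from the question's example.
years = [2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009]
prices = [0.121002, 15.256424, 44.479590,
          23.019548, 90.011295, 88.487664,
          34.402065, 10.259064, 47.024574]
df = pandas.DataFrame({'price': prices}, index=pandas.Index(years, name='year'))

# Number the rows inside each year group: 0, 1, 2, ...
df['year_count'] = df.groupby(level='year').cumcount()

# Pivot on the per-year counter so every year shares the same row labels.
wide = df.reset_index().pivot(index='year_count', columns='year', values='price')
print(wide)
```

If the years had different numbers of rows, the shorter columns would end in NaN rather than raise an error.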
0

You can use groupby with apply to create a Series from each group's values (a numpy array), then reshape with unstack:

print(df.groupby(level='year')['price'].apply(lambda x: pandas.Series(x.values)).unstack(0))
year  2007  2008  2009 
0  55.360804 68.671626 78.809139 
1  50.246485 55.639250 84.483814 
2  17.646684 14.386347 87.185550 
3  54.824732 91.846018 60.793002 
4  24.303751 50.908714 22.084445 
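
As a variation that avoids `apply`, the per-year counter can be appended to the index and the `year` level unstacked directly. This is only a sketch and not part of the original answers; the sample frame and the names `counter` and `wide` are assumptions for illustration.

```python
import pandas

# Hypothetical fixed sample: three prices per year, taken from the question's example.
years = [2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2009]
prices = [0.121002, 15.256424, 44.479590,
          23.019548, 90.011295, 88.487664,
          34.402065, 10.259064, 47.024574]
df = pandas.DataFrame({'price': prices}, index=pandas.Index(years, name='year'))

# Position of each row within its year: 0, 1, 2, ...
counter = df.groupby(level='year').cumcount()

# Append the counter as a second index level, then move 'year' into the columns.
wide = df.set_index(counter, append=True)['price'].unstack('year')
print(wide)
```

The result should match the pivot-based output, except that the row index has no name.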