2017-07-29

Sort index by list - Python pandas

I have a DataFrame that I have pivoted:

FinancialYear  2014/2015  2015/2016  2016/2017  2017/2018
Month
April                 42         32         29         27
August                34         28         32          0
December              45         51         28          0
February              28         20         28          0
January               32         28         33          0
July                  40         66         31         30
June                  32         67         37         35
March                 43         36         39          0
May                   34         30         24         29
November              39         32         31          0
October               38         39         28          0
September             29         19         34          0

This is the code I used:

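import pandas as pd

# hmdf is the source DataFrame (not shown in the question)
new_hm01 = hmdf[['FinancialYear', 'Month', 'FirstReceivedDate']]

hm05 = new_hm01.pivot_table(index=['FinancialYear', 'Month'], aggfunc='count')

# Count rows per (Month, FinancialYear) pair: one column per financial year, 0 where a pair has no rows
df_hm = new_hm01.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0).rename(columns=lambda x: '{}'.format(x))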
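The `groupby`/`size`/`unstack` chain above can be illustrated on a few made-up rows (the `toy` frame and its values are hypothetical, not the asker's `hmdf`). The sketch shows that `size()` counts rows per (Month, FinancialYear) pair, that `fill_value=0` fills pairs with no rows, and that `groupby` keeps the `Month` strings exactly as stored, trailing spaces included.

```python
import pandas as pd

# Hypothetical toy data standing in for hmdf; the Month values carry trailing spaces
toy = pd.DataFrame({
    'FinancialYear': ['2014/2015', '2014/2015', '2015/2016', '2015/2016', '2015/2016', '2015/2016'],
    'Month': ['April ', 'May ', 'April ', 'April ', 'May ', 'June '],
    'FirstReceivedDate': pd.to_datetime(['2014-04-02', '2014-05-10', '2015-04-01',
                                         '2015-04-20', '2015-05-03', '2015-06-15']),
})

# size() counts rows per (Month, FinancialYear) pair; unstack() moves FinancialYear
# into the columns and fill_value=0 covers pairs with no rows (June in 2014/2015)
counts = toy.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0)
print(counts)

# The index is sorted alphabetically and the labels keep their whitespace
print(counts.index.tolist())
```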

The months are not in the order I want, so I reindexed the table according to a list with the following code:

vals = ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'] 

df_hm = df_hm.reindex(vals) 

This works, but most of the values in my table now show up as NaN:

FinancialYear  2014/2015  2015/2016  2016/2017  2017/2018
Month
April                NaN        NaN        NaN        NaN
May                  NaN        NaN        NaN        NaN
June                 NaN        NaN        NaN        NaN
July                 NaN        NaN        NaN        NaN
August               NaN        NaN        NaN        NaN
September             29         19         34          0
October              NaN        NaN        NaN        NaN
November             NaN        NaN        NaN        NaN
December             NaN        NaN        NaN        NaN
January              NaN        NaN        NaN        NaN
February             NaN        NaN        NaN        NaN
March                NaN        NaN        NaN        NaN

Any idea what is happening? How can I fix it? Is there a better alternative?


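Please post 'df_hm.index.tolist()' *before* calling 'reindex'. (Most likely the labels in the original index are not exactly the same as the labels used in the reindex; perhaps there is a whitespace difference...) – unutbu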


I did what you said, and the index shows: [u'April ', u'August ', u'December ', u'February ', u'January ', u'July ', u'June ', u'March ', u'May ', u'November ', u'October ', u'September'] – ScoutEU


Ahh, there's a space after them. Ta! – ScoutEU
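Following unutbu's suggestion, one minimal way to spot such mismatches is to print each label with `repr()`, which makes trailing spaces visible, and to use `Index.difference` to list the wanted labels that the current index lacks. The small `df_hm` below is a hypothetical stand-in for the asker's table, built so that only 'September' has no trailing space.

```python
import pandas as pd

# Hypothetical stand-in for df_hm: every label except 'September' has a trailing space
df_hm = pd.DataFrame(
    {'2014/2015': [42, 34, 29]},
    index=['April ', 'August ', 'September'],
)
vals = ['April', 'August', 'September']

# repr() shows the trailing spaces that a plain print() hides
for label in df_hm.index:
    print(repr(label))

# Wanted labels that do not occur in the current index; reindex fills these rows with NaN
missing = pd.Index(vals).difference(df_hm.index)
print(missing.tolist())  # ['April', 'August']

# Only 'September' matches exactly, so it is the only row that keeps its value
reindexed = df_hm.reindex(vals)
print(reindexed)
```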

Answer


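Unexpected NaNs after reindexing are usually caused by the new index labels not exactly matching the old ones. For example, if the original index labels contain whitespace but the new labels do not, you get NaNs: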

import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3]}, index=['April ', 'June ', 'May '])
print(df)
#         col
# April     1
# June      2
# May       3

df2 = df.reindex(['April', 'May', 'June'])
print(df2)
#        col
# April  NaN
# May    NaN
# June   NaN

This can be fixed by stripping the whitespace so that the labels match:

df.index = df.index.str.strip()
df3 = df.reindex(['April', 'May', 'June'])
print(df3)
#        col
# April    1
# May      3
# June     2
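Applied to the question's pipeline, a sketch of the same fix strips `Month` once, before grouping, so that `reindex(vals)` finds every label; passing `fill_value=0` keeps months with no rows at 0 instead of NaN. As one possible alternative to reindexing by a list, an ordered `CategoricalIndex` sorts the rows in financial-year order without adding rows for months that never occur. The `hmdf` rows are made up, and the question's `.rename(columns=...)` step is omitted because the `FinancialYear` values here are already strings.

```python
import pandas as pd

# Financial-year month order from the question
vals = ['April', 'May', 'June', 'July', 'August', 'September',
        'October', 'November', 'December', 'January', 'February', 'March']

# Hypothetical stand-in for hmdf; the Month values carry trailing spaces
hmdf = pd.DataFrame({
    'FinancialYear': ['2016/2017', '2016/2017', '2017/2018', '2017/2018'],
    'Month': ['April ', 'March ', 'April ', 'May '],
    'FirstReceivedDate': pd.to_datetime(['2016-04-05', '2017-03-01', '2017-04-03', '2017-05-09']),
})

new_hm01 = hmdf[['FinancialYear', 'Month', 'FirstReceivedDate']].copy()
# Strip the whitespace once, at the source, so every later step sees clean labels
new_hm01['Month'] = new_hm01['Month'].str.strip()

df_hm = new_hm01.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0)

# Option 1: reindex by the list; months with no rows at all get 0 instead of NaN
by_list = df_hm.reindex(vals, fill_value=0)
print(by_list)

# Option 2: an ordered CategoricalIndex sorts by category order (April first, March last)
# and keeps only the months that actually occur
by_category = df_hm.copy()
by_category.index = pd.CategoricalIndex(by_category.index, categories=vals, ordered=True)
by_category = by_category.sort_index()
print(by_category)
```

Option 1 matches the question's table shape (all twelve months); option 2 avoids the list lookup entirely, which may suit tables where empty months should not appear.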

Thank you, this was very helpful! – ScoutEU