2015-08-09 51 views

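Reindexing the second level of a multi-level DataFrame

I need to reindex the second level of a pandas DataFrame so that, for each first-level index, the second level becomes the list np.arange(N). I tried to follow this, but unfortunately it only creates an index with as many rows as there were before. What I want is for new rows (with nan values) to be inserted for each new index value.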

In [79]: 

import pandas as pd

df = pd.DataFrame({ 
    'first': ['one', 'one', 'one', 'two', 'two', 'three'], 
    'second': [0, 1, 2, 0, 1, 1], 
    'value': [1, 2, 3, 4, 5, 6] 
}) 
print(df) 
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6
In [80]: 

# Renumber 'second' so it counts 0, 1, 2, ... within each 'first' group
df['second'] = df.reset_index().groupby(['first']).cumcount() 
print(df) 
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

My desired result is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2    nan
5  three       0      6
5  three       1    nan
5  three       2    nan
Can't you first create the DataFrame with all the rows you need, and then update it with your values? – Pekka
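The suggestion in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: N is taken to be 3 (as in the desired result), the names skeleton and filled are made up for the example, and a left merge is used in place of a literal "update" step.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Renumber 'second' within each 'first' group, as in the question.
df['second'] = df.groupby('first').cumcount()

# Assumption: N = 3, matching the desired result in the question.
n = 3

# Build a frame that already holds every (first, second) pair that is needed.
skeleton = pd.DataFrame(
    list(itertools.product(df['first'].unique(), range(n))),
    columns=['first', 'second'],
)

# Bring in the known values; pairs with no match get NaN in 'value'.
filled = skeleton.merge(df, on=['first', 'second'], how='left')
print(filled)
```

Because the merge is a left join on the skeleton, the row order follows the skeleton and every missing pair appears with a NaN value.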

Is 'second' always a contiguous index starting from '0'? –

@chris-sc: Yes. – orange

Answer


I think you can first set the columns 'first' and 'second' as a multi-level index, and then reindex.

import numpy as np
import pandas as pd

# your data 
# ========================== 
df = pd.DataFrame({ 
    'first': ['one', 'one', 'one', 'two', 'two', 'three'], 
    'second': [0, 1, 2, 0, 1, 1], 
    'value': [1, 2, 3, 4, 5, 6] 
}) 

df 

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

# processing 
# ============================ 
# every combination of a 'first' label with second = 0, 1, 2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second']) 

# rows missing from the original data are filled with NaN
df.set_index(['first', 'second']).reindex(multi_index).reset_index() 

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
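
The output above starts from the original 'second' values, so the single 'three' row stays at second = 1. To get the layout asked for in the question, the cumcount renumbering can be applied before the reindex. The following is a minimal sketch of that combination; taking N as the size of the largest group and the names n, full_index and result are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6],
})

# Renumber 'second' within each 'first' group, as in the question.
df['second'] = df.groupby('first').cumcount()

# Assumption: N is the size of the largest group (3 for this data).
n = df['second'].max() + 1

# Every combination of a 'first' label with second = 0 .. N-1.
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(n)],
    names=['first', 'second'],
)

# Reindexing inserts the missing (first, second) pairs with NaN values.
result = df.set_index(['first', 'second']).reindex(full_index).reset_index()
print(result)
```

Note that the 'value' column becomes floating point, since NaN cannot be stored in an integer column.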