Reindex the second level of a MultiIndex DataFrame

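I need to reindex the second level of a pandas DataFrame so that, for every first-level index value, the second level becomes the list np.arange(N). I tried to follow this, but unfortunately it only creates an index with as many rows as there were before. What I want is for a new row (with NaN values) to be inserted for every new index value.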

In [79]:

import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6
In [80]:

# renumber 'second' as 0, 1, 2, ... within each 'first' group (no rows are added)
df['second'] = df.reset_index().groupby(['first']).cumcount()
print(df)
   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       0      6

The result I expect is:

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
4    two       2    nan
5  three       0      6
5  three       1    nan
5  three       2    nan
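One way to get this table, sketched under the assumption that N is the size of the largest group (3 in this sample): renumber 'second' with cumcount as in In [80], then reindex against every (first, second) pair built with pd.MultiIndex.from_product. Pairs that did not exist come back with NaN in 'value', which also turns that column into floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

# Step 1 (as in In [80]): renumber 'second' as 0, 1, 2, ... within each 'first' group
df['second'] = df.groupby('first').cumcount()

# Step 2: assume N is the size of the largest group (3 for this data)
N = df.groupby('first').size().max()

# Step 3: build every (first, second) pair and reindex; missing pairs get NaN
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(N)], names=['first', 'second'])
result = df.set_index(['first', 'second']).reindex(full_index).reset_index()
print(result)
```

The row labels come out as 0 to 8 rather than repeating 4 and 5 as in the table above.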


2015-08-09 orange

Can't you first create the DataFrame with all the rows you need, and then update it with your values? – Pekka
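A rough sketch of that suggestion, assuming N = 3 and a single 'value' column: build an all-NaN frame indexed by every (first, second) pair, then let DataFrame.update copy the known values in. update aligns the other frame on the index and modifies the calling frame in place (it returns None).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})
N = 3  # assumed target length of the 'second' level

# 1. Create a frame that already has every row you need, filled with NaN
full_index = pd.MultiIndex.from_product(
    [df['first'].unique(), np.arange(N)], names=['first', 'second'])
full = pd.DataFrame(np.nan, index=full_index, columns=['value'])

# 2. Copy the known values in; update aligns on the (first, second) index, in place
full.update(df.set_index(['first', 'second']))
result = full.reset_index()
print(result)
```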

Is 'second' always an index that is contiguous and starts at 0? –

@chris-sc: Yes. – orange
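Since reindexing against np.arange(N) relies on that property, it can be worth checking per group. A minimal sketch; is_contiguous_from_zero is a hypothetical helper name. With the sample df, 'three' fails the check because its only row has second = 1; after the In [80] renumbering every group passes.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

def is_contiguous_from_zero(s):
    # hypothetical helper: True when the values are exactly 0, 1, ..., len(s) - 1 in order
    return s.tolist() == list(range(len(s)))

# one boolean per 'first' group
check = df.groupby('first')['second'].apply(is_contiguous_from_zero)
print(check)

# after renumbering with cumcount, every group satisfies the property
renumbered = df.assign(second=df.groupby('first').cumcount())
check_after = renumbered.groupby('first')['second'].apply(is_contiguous_from_zero)
print(check_after)
```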

I think you can first set the columns first and second as a MultiIndex, and then reindex.

import numpy as np
import pandas as pd

# your data
# ==========================
df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

df 

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5  three       1      6

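# processing
# ============================
# every (first, second) pair: each unique 'first' value crossed with second = 0, 1, 2
multi_index = pd.MultiIndex.from_product([df['first'].unique(), np.arange(3)], names=['first', 'second'])

# move the keys into the index, reindex to the full product (missing pairs become NaN), move them back to columns
df.set_index(['first', 'second']).reindex(multi_index).reset_index()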

   first  second  value
0    one       0      1
1    one       1      2
2    one       2      3
3    two       0      4
4    two       1      5
5    two       2    NaN
6  three       0    NaN
7  three       1      6
8  three       2    NaN
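The answer hardcodes np.arange(3). A sketch that wraps the same set_index/reindex/reset_index steps in a reusable function with N as a parameter; pad_second_level is a hypothetical name, and inferring N as the largest existing 'second' plus one is an assumption. Note that reindex also drops any existing row whose 'second' is N or larger.

```python
import numpy as np
import pandas as pd

def pad_second_level(df, n=None):
    """Hypothetical helper: give every 'first' group the rows second = 0..n-1.

    Missing (first, second) pairs are added with NaN. If n is None it is
    assumed to be the largest existing 'second' value plus one.
    """
    if n is None:
        n = int(df['second'].max()) + 1
    full_index = pd.MultiIndex.from_product(
        [df['first'].unique(), np.arange(n)], names=['first', 'second'])
    return df.set_index(['first', 'second']).reindex(full_index).reset_index()

df = pd.DataFrame({
    'first': ['one', 'one', 'one', 'two', 'two', 'three'],
    'second': [0, 1, 2, 0, 1, 1],
    'value': [1, 2, 3, 4, 5, 6]
})

print(pad_second_level(df))       # N inferred as 3
print(pad_second_level(df, n=4))  # every group padded to 4 rows
```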


2015-08-09 09:08:02
