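Python Pandas: sorting by MultiIndex and column

In Pandas 0.17 I am trying to sort by a specific column while keeping the hierarchical index (A and B). B is a running number that was created when the dataframe was built through concatenation. My data looks like this: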

                C   D
A   B
bar one     shiny  10
    two      dull   5
    three  glossy   8
foo one      dull   3
    two     shiny   9
    three    matt  12
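For anyone who wants to reproduce the question, the frame above can be rebuilt roughly as follows. This is only a sketch: the values are taken from the printed table, and the construction via `MultiIndex.from_tuples` is an assumption, not necessarily how the original data was created.

```python
import pandas as pd

# A and B form the hierarchical index; C and D are ordinary columns.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    {"C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
     "D": [10, 5, 8, 3, 9, 12]},
    index=index,
)
print(df)
```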

This is what I need:

                C   D
A   B
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9

Below is the code I used and the result. Note: Pandas 0.17 warns that dataframe.sort will be deprecated.

df.sort_values(by="C", ascending=True) 
                C   D
A   B
bar two      dull   5
foo one      dull   3
bar three  glossy   8
foo three    matt  12
bar one     shiny  10
foo two     shiny   9

Adding .groupby produces the same result:

df.sort_values(by="C", ascending=True).groupby(axis=0, level=0, as_index=True)

Similarly, sorting the index first and then grouping by the column does not help either:

df.sort_index(axis=0, level=0, as_index=True).groupby(C, as_index=True)

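I am not sure about reindexing. I need to keep the first index A; the second index B can be reassigned, but does not have to be. I would be surprised if there were no simple solution; I guess I just have not found it. Any suggestions are appreciated.

Edit: In the meantime I dropped the second index B, turned the first index A into a column so that I could sort by multiple columns, and then set it as the index again: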

df.index = df.index.droplevel(1) 
df.reset_index(level=0, inplace=True) 
df_sorted = df.sort_values(["A", "C"], ascending=[1,1]) #A is a column here, not an index. 
df_reindexed = df_sorted.set_index("A")

Still quite verbose.

Source
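The same workaround can be written as one method chain that leaves the original frame untouched. This is a sketch of the steps described in the edit above, assuming the sample data from the question; the variable name `df_reindexed` follows the snippet above.

```python
import pandas as pd

# Sample data from the question, with A and B as the hierarchical index.
df = pd.DataFrame(
    {
        "A": ["bar", "bar", "bar", "foo", "foo", "foo"],
        "B": ["one", "two", "three", "one", "two", "three"],
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    }
).set_index(["A", "B"])

df_reindexed = (
    df.reset_index(level="B", drop=True)  # discard the running number B
      .reset_index()                      # A becomes an ordinary column
      .sort_values(["A", "C"])            # sort by A first, then by C
      .set_index("A")                     # restore A as the index
)
print(df_reindexed)
```

Unlike the in-place version, this does not modify `df`, so the B level is still available afterwards if it turns out to be needed.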

2015-10-17 raummensch

It feels like there could be a better way, but here is one approach:

In [163]: def sorter(sub_df): 
    ...:  sub_df = sub_df.sort_values('C') 
    ...:  sub_df.index = sub_df.index.droplevel(0) 
    ...:  return sub_df 

In [164]: df.groupby(level='A').apply(sorter) 
Out[164]: 
                C   D
A   B
bar two      dull   5
    three  glossy   8
    one     shiny  10
foo one      dull   3
    three    matt  12
    two     shiny   9
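A self-contained variant of this idea is sketched below. It assumes the sample data from the question and passes `group_keys=False` so that pandas does not prepend the group key to the result, which should make the `droplevel(0)` step unnecessary.

```python
import pandas as pd

# Sample data from the question, with A and B as the hierarchical index.
df = pd.DataFrame(
    {
        "A": ["bar", "bar", "bar", "foo", "foo", "foo"],
        "B": ["one", "two", "three", "one", "two", "three"],
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    }
).set_index(["A", "B"])

# Sort each A group by C; group_keys=False keeps the original (A, B) index
# instead of adding a second copy of A in front of it.
result = df.groupby(level="A", group_keys=False).apply(
    lambda group: group.sort_values("C")
)
print(result)
```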

Source

2015-10-17 20:55:57 chrisb

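Your approach is more advanced than my interim solution, but I agree there should be a better way. – raummensch

Based on chrisb's code:

Note that in my case it is a Series, not a DataFrame: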
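On the question of a better way: in later pandas releases (0.23 and newer, so not the 0.17 used in the question) `sort_values` also accepts index level names in `by`. Under that assumption, a shorter sketch using the sample data from the question would be:

```python
import pandas as pd

# Sample data from the question, with A and B as the hierarchical index.
df = pd.DataFrame(
    {
        "A": ["bar", "bar", "bar", "foo", "foo", "foo"],
        "B": ["one", "two", "three", "one", "two", "three"],
        "C": ["shiny", "dull", "glossy", "dull", "shiny", "matt"],
        "D": [10, 5, 8, 3, 9, 12],
    }
).set_index(["A", "B"])

# "A" is an index level and "C" is a column; both can be named in `by`.
result = df.sort_values(["A", "C"])
print(result)
```

Both index levels are kept, so no dropping or resetting of the index is needed.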

s.groupby(level='A', group_keys=False).apply(lambda x: x.sort_values(ascending=False))
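To try the Series version end to end, a small sketch follows. The Series `s` here is an assumption for illustration: it is simply column D from the question's sample data, sorted in descending order within each A group.

```python
import pandas as pd

# Hypothetical Series: column D of the question's sample data.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("bar", "three"),
     ("foo", "one"), ("foo", "two"), ("foo", "three")],
    names=["A", "B"],
)
s = pd.Series([10, 5, 8, 3, 9, 12], index=index, name="D")

# Sort the values inside each A group, largest first, keeping the (A, B) index.
sorted_s = s.groupby(level="A", group_keys=False).apply(
    lambda x: x.sort_values(ascending=False)
)
print(sorted_s)
```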

Source

2016-05-17 06:35:36 Cheng
