2017-08-09 222 views

I have a pandas DataFrame, df, whose index I want to sort. The index is:

df.index 
Out[4]: 
Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2', 
     u'2015-03-30_p001_2', u'2015-03-31_p001_2', 
     u'2015-03-31_p002_3', u'2015-04-01_p001_2', 
     u'2015-04-01_p002_3', u'2015-04-02_p001_2', 
     u'2015-04-02_p002_3', u'2015-04-03_p001_2', 
     ... 
     u'2016-03-31_p127_1', u'2016-04-01_p127_1', 
     u'2016-04-01_p128_3', u'2016-04-02_p127_1', 
     u'2016-04-02_p128_3', u'2016-04-03_p127_1', 
     u'2016-04-03_p128_3', u'2016-04-04_p127_1', 
     u'2016-04-05_p127_1', u'2016-04-06_p127_1'], 
     dtype='object', length=781) 

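The DataFrame df is the result of merging two DataFrames.

As you can see, the index is not sorted the way I need. For example, '2015-03-31_p002_3' (5th position) comes before '2015-04-01_p001_2' (6th position).

I would like to group all the _p001_2 labels together and sort them by date, then all the _p002_3 labels, and so on.

But I have not managed to do it...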

Possible duplicate of [Python, pandas: how to sort dataframe by index](https://stackoverflow.com/questions/22211737/python-pandas-how-to-sort-dataframe-by-index) – Zero

Answer

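If sort_index does not give the order you need, it is a bit more complicated: create a helper DataFrame by splitting the index labels with split, sort it with sort_values, and finally reindex the original DataFrame by the sorted helper index.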

import pandas as pd

idx = pd.Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2', 
     u'2015-03-30_p001_2', u'2015-03-31_p001_2', 
     u'2015-03-31_p002_3', u'2015-04-01_p001_2', 
     u'2015-04-01_p002_3', u'2015-04-02_p001_2', 
     u'2015-04-02_p002_3', u'2015-04-03_p001_2', 

     u'2016-03-31_p127_1', u'2016-04-01_p127_1', 
     u'2016-04-01_p128_3', u'2016-04-02_p127_1', 
     u'2016-04-02_p128_3', u'2016-04-03_p127_1', 
     u'2016-04-03_p128_3', u'2016-04-04_p127_1', 
     u'2016-04-05_p127_1', u'2016-04-06_p127_1']) 

df = pd.DataFrame({'a':range(len(idx))}, index=idx) 
print (df) 
        a 
2015-03-28_p001_2 0 
2015-03-29_p001_2 1 
2015-03-30_p001_2 2 
2015-03-31_p001_2 3 
2015-03-31_p002_3 4 
2015-04-01_p001_2 5 
2015-04-01_p002_3 6 
2015-04-02_p001_2 7 
2015-04-02_p002_3 8 
2015-04-03_p001_2 9 
2016-03-31_p127_1 10 
2016-04-01_p127_1 11 
2016-04-01_p128_3 12 
2016-04-02_p127_1 13 
2016-04-02_p128_3 14 
2016-04-03_p127_1 15 
2016-04-03_p128_3 16 
2016-04-04_p127_1 17 
2016-04-05_p127_1 18 
2016-04-06_p127_1 19 

df = df.sort_index() 
print (df) 
        a 
2015-03-28_p001_2 0 
2015-03-29_p001_2 1 
2015-03-30_p001_2 2 
2015-03-31_p001_2 3 
2015-03-31_p002_3 4 
2015-04-01_p001_2 5 
2015-04-01_p002_3 6 
2015-04-02_p001_2 7 
2015-04-02_p002_3 8 
2015-04-03_p001_2 9 
2016-03-31_p127_1 10 
2016-04-01_p127_1 11 
2016-04-01_p128_3 12 
2016-04-02_p127_1 13 
2016-04-02_p128_3 14 
2016-04-03_p127_1 15 
2016-04-03_p128_3 16 
2016-04-04_p127_1 17 
2016-04-05_p127_1 18 
2016-04-06_p127_1 19 
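
The output of sort_index is the same as the input here because the labels are plain strings, which are compared character by character, so the leading date decides the order. A minimal sketch with two labels from the question (the names labels, s and sorted_labels are only for illustration):

```python
import pandas as pd

# Two labels taken from the question's index, deliberately out of order.
labels = ['2015-04-01_p001_2', '2015-03-31_p002_3']
s = pd.Series([0, 1], index=labels)

# String labels are compared lexicographically, so the date prefix wins
# and the p002 row sorts ahead of the later-dated p001 row.
sorted_labels = s.sort_index().index.tolist()
print(sorted_labels)
```

This is why the p-code has to be split out of the label before sorting.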

df1 = df.index.to_series().str.split('_', expand=True) 
df1[0] = pd.to_datetime(df1[0]) 
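# sort by the p-code (column 1), then the suffix (column 2), then the date (column 0);
# change the order of the columns in by= if a different sort priority is needed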
df1 = df1.sort_values(by=[1,2,0]) 
print (df1) 
          0  1 2 
2015-03-28_p001_2 2015-03-28 p001 2 
2015-03-29_p001_2 2015-03-29 p001 2 
2015-03-30_p001_2 2015-03-30 p001 2 
2015-03-31_p001_2 2015-03-31 p001 2 
2015-04-01_p001_2 2015-04-01 p001 2 
2015-04-02_p001_2 2015-04-02 p001 2 
2015-04-03_p001_2 2015-04-03 p001 2 
2015-03-31_p002_3 2015-03-31 p002 3 
2015-04-01_p002_3 2015-04-01 p002 3 
2015-04-02_p002_3 2015-04-02 p002 3 
2016-03-31_p127_1 2016-03-31 p127 1 
2016-04-01_p127_1 2016-04-01 p127 1 
2016-04-02_p127_1 2016-04-02 p127 1 
2016-04-03_p127_1 2016-04-03 p127 1 
2016-04-04_p127_1 2016-04-04 p127 1 
2016-04-05_p127_1 2016-04-05 p127 1 
2016-04-06_p127_1 2016-04-06 p127 1 
2016-04-01_p128_3 2016-04-01 p128 3 
2016-04-02_p128_3 2016-04-02 p128 3 
2016-04-03_p128_3 2016-04-03 p128 3 

df = df.reindex(df1.index) 
print (df) 
        a 
2015-03-28_p001_2 0 
2015-03-29_p001_2 1 
2015-03-30_p001_2 2 
2015-03-31_p001_2 3 
2015-04-01_p001_2 5 
2015-04-02_p001_2 7 
2015-04-03_p001_2 9 
2015-03-31_p002_3 4 
2015-04-01_p002_3 6 
2015-04-02_p002_3 8 
2016-03-31_p127_1 10 
2016-04-01_p127_1 11 
2016-04-02_p127_1 13 
2016-04-03_p127_1 15 
2016-04-04_p127_1 17 
2016-04-05_p127_1 18 
2016-04-06_p127_1 19 
2016-04-01_p128_3 12 
2016-04-02_p128_3 14 
2016-04-03_p128_3 16 
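
The steps above can be wrapped in a small helper. This is only a sketch: the function name sort_by_code_then_date and the four-row example frame are made up for illustration, and it assumes every label has the form date_pcode_suffix and that the labels are unique.

```python
import pandas as pd

def sort_by_code_then_date(frame):
    """Reorder rows by p-code, then suffix, then date, all parsed from the index.

    Assumes labels look like 'YYYY-MM-DD_pNNN_N' and are unique,
    because reindex cannot be used with duplicate labels.
    """
    # Split each label into three columns: 0 = date, 1 = p-code, 2 = suffix.
    parts = frame.index.to_series().str.split('_', expand=True)
    parts[0] = pd.to_datetime(parts[0])
    # The helper keeps the original labels as its index, so its sorted index
    # is the desired row order for the original frame.
    order = parts.sort_values(by=[1, 2, 0]).index
    return frame.reindex(order)

# Hypothetical data using labels in the same format as the question.
example = pd.DataFrame(
    {'a': range(4)},
    index=['2015-03-31_p001_2', '2015-03-31_p002_3',
           '2015-04-01_p001_2', '2015-04-01_p002_3'],
)
result = sort_by_code_then_date(example)
print(result)
```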

Edit:

If the index contains duplicates, it is necessary to create new columns, sort by them, and finally drop them:

df[[0,1,2]] = df.index.to_series().str.split('_', expand=True) 
df[0] = pd.to_datetime(df[0]) 
df = df.sort_values(by=[1,2,0]) 
df = df.drop([0,1,2], axis=1) 
print (df) 
        a 
2015-03-28_p001_2 0 
2015-03-29_p001_2 1 
2015-03-30_p001_2 2 
2015-03-31_p001_2 3 
2015-04-01_p001_2 5 
2015-04-02_p001_2 7 
2015-04-03_p001_2 9 
2015-03-31_p002_3 4 
2015-04-01_p002_3 6 
2015-04-02_p002_3 8 
2016-03-31_p127_1 10 
2016-04-01_p127_1 11 
2016-04-02_p127_1 13 
2016-04-03_p127_1 15 
2016-04-04_p127_1 17 
2016-04-05_p127_1 18 
2016-04-06_p127_1 19 
2016-04-01_p128_3 12 
2016-04-02_p128_3 14 
2016-04-03_p128_3 16
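
A possible alternative for the duplicate case, not part of the approach above, is to sort a helper with a default integer index and then select rows by position, which avoids adding temporary columns to df. This is a sketch under the same label-format assumption; the frame dup and the names parts and positions are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which the label '2015-03-31_p001_2' appears twice.
dup = pd.DataFrame(
    {'a': range(5)},
    index=['2015-04-01_p002_3', '2015-03-31_p001_2', '2015-03-31_p002_3',
           '2015-04-01_p001_2', '2015-03-31_p001_2'],
)

# Build the helper from the labels as values, so it gets a 0..n-1 index.
parts = pd.Series(dup.index).str.split('_', expand=True)
parts[0] = pd.to_datetime(parts[0])

# After sorting, the helper's index holds the original row positions in the
# desired order, which iloc can use even when labels repeat.
positions = parts.sort_values(by=[1, 2, 0]).index.to_numpy()
result = dup.iloc[positions]
print(result)
```

Selecting by position sidesteps label lookups entirely, so repeated labels do not matter.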