Pandas: complex condition with last and previous values

I have a table of orders (a pandas DataFrame):

>>> ord = pd.DataFrame([[241147,'01.01.2016'], [241148,'01.01.2016']], columns=['order_id','created']) 
>>> ord 
    order_id created 
0 241147 01.01.2016 
1 241148 01.01.2016

There is also a table with the history of order status changes:

>>> ord_status_history[['ord_id','osh_id','osh_created','osh_status_id','osh_status_reason']]

    ord_id osh_id osh_id_created osh_status_id osh_status_reason 
0 241147 124632 01.01.2016 1 None 
1 241147 124682 02.01.2016 2 None 
2 241147 124719 03.01.2016 10 None 
7 241148 124633 01.01.2016 1 None 
8 241148 126181 06.01.2016 5 Test_reason

I want to add to the ord table information about each order's last status and second-to-last status (the ordering is determined by the 'osh_created' field).

order_id created Last_status_id Last_status_date Prev_status_id Prev_status_date reason 
0 241147 01.01.2016 10 03.01.2016 9 02.01.2016 NaN 
1 241148 01.01.2016 5 06.01.2016 1 01.01.2016 Test Reason
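Because the ordering depends on the date field, the date strings probably need to be parsed first: compared as text, values like '01.02.2016' and '02.01.2016' sort in the wrong order. A minimal sketch, assuming the strings are day.month.year (the frame name `hist` and the format string are assumptions, not something stated above):

```python
import pandas as pd

# Sample history rows copied from the tables above; `hist` is a made-up name.
hist = pd.DataFrame({
    'ord_id': [241147, 241147, 241147, 241148, 241148],
    'osh_id_created': ['01.01.2016', '02.01.2016', '03.01.2016',
                       '01.01.2016', '06.01.2016'],
    'osh_status_id': [1, 2, 10, 1, 5],
})

# Assumption: the strings are day.month.year.
hist['osh_id_created'] = pd.to_datetime(hist['osh_id_created'],
                                        format='%d.%m.%Y')

# With real datetimes the sort is chronological rather than lexicographic.
hist = hist.sort_values(['ord_id', 'osh_id_created'])
```

If the dates are actually month.day.year, the format string would be '%m.%d.%Y' instead.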

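But I don't understand how to use np.where or loc conditions for this, because ord_status_history has several rows per order, and I need to select only one row per order.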

I tried something like this (but it is very bad):

for i in range(ord_stat['order_id'].count()-1): 
    if (ord_stat.loc[i,'order_id']==ord_stat.loc[i+1,'order_id']): 
     if (ord_stat.loc[i,'osh_id_created']<=ord_stat.loc[i+1,'osh_id_created']): 
      if (ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=='NAN'): 
       ord.loc[ord_stat.loc[i,'order_id'],'Prev_status_date']=ord_stat.loc[i,'osh_id_created'] 
       ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=ord_stat.loc[i+1,'osh_id_created'] 
      else: 
       ord.loc[ord_stat.loc[i,'order_id'],'Prev_status_date']=ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date'] 
       ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=ord_stat.loc[i+1,'osh_id_created'] 
     else: 
      if (ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=='NAN'): 
       ord.loc[ord_stat.loc[i,'order_id'],'Prev_status_date']=ord_stat.loc[i+1,'osh_id_created'] 
       ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=ord_stat.loc[i,'osh_id_created'] 
      else: 
       ord.loc[ord_stat.loc[i,'order_id'],'Prev_status_date']=ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date'] 
       ord.loc[ord_stat.loc[i,'order_id'],'Last_status_date']=ord_stat.loc[i,'osh_id_created']

I have read about nlargest, but I don't understand how I can get the status_id if I use nlargest on 'osh_created':

ord_stat.groupby('order_id')['osh_id_created'].nlargest(2)
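One possible way around this, sketched here under assumptions rather than taken from the thread, is to sort by date and take the last two rows of each group with `tail(2)`. Unlike `nlargest` on a single column, `tail` returns whole rows, so the status id stays attached to its date. The sample frame reuses the `ord_stat` and `order_id` names from the loop above and assumes the dates are day.month.year:

```python
import pandas as pd

ord_stat = pd.DataFrame({
    'order_id': [241147, 241147, 241147, 241148, 241148],
    'osh_id_created': pd.to_datetime(
        ['01.01.2016', '02.01.2016', '03.01.2016', '01.01.2016', '06.01.2016'],
        format='%d.%m.%Y'),  # assumption: day.month.year
    'osh_status_id': [1, 2, 10, 1, 5],
})

# Sort chronologically, then keep the last two rows of every order.
# All columns survive, not just the date column.
last_two = (ord_stat.sort_values('osh_id_created')
                    .groupby('order_id')
                    .tail(2))
```

This still leaves two rows per order, so a further reshaping step would be needed to get the last and previous status side by side in one row.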

Source

2017-03-17 Zzema

Assume we have the following DataFrames:

In [291]: ord 
Out[291]: 
    order_id created 
0 241147 2016-01-01 
1 241148 2016-01-01 

In [292]: hst 
Out[292]: 
    ord_id osh_id osh_id_created osh_status_id osh_status_reason 
0 241147 124632  2016-01-01    1    None 
1 241147 124682  2016-02-01    2    None 
2 241147 124719  2016-03-01    10    None 
7 241148 124633  2016-01-01    1    None 
8 241148 126181  2016-06-01    5  Test_reason

We can aggregate as follows:

In [293]: funcs = { 
    ...:  'osh_status_id':{ 
    ...:   'Last_status_id':'last', 
    ...:   'Prev_status_id':lambda x: x.shift().iloc[-1] 
    ...:  }, 
    ...:  'osh_id_created':{ 
    ...:   'Last_status_date':'last', 
    ...:   'Prev_status_date':lambda x: x.shift().iloc[-1] 
    ...:  } 
    ...: } 
    ...: 

In [294]: x = (hst.sort_values('osh_id_created') 
    ...:   .groupby('ord_id')[['osh_status_id','osh_id_created']] 
    ...:   .agg(funcs) 
    ...:) 
    ...:

Result:

In [295]: x 
Out[295]: 
     Last_status_id Prev_status_id Last_status_date Prev_status_date 
ord_id 
241147    10    2  2016-03-01  2016-02-01 
241148    5    1  2016-06-01  2016-01-01
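The nested-dict renaming form of `agg` shown above was deprecated in pandas 0.20 and removed in pandas 1.0. On a recent pandas the same aggregation can be sketched with named aggregation (available from pandas 0.25). The helper name `second_to_last` is made up for this example, and the sample frame simply rebuilds `hst` as printed above:

```python
import pandas as pd

hst = pd.DataFrame({
    'ord_id': [241147, 241147, 241147, 241148, 241148],
    'osh_id': [124632, 124682, 124719, 124633, 126181],
    'osh_id_created': pd.to_datetime(
        ['2016-01-01', '2016-02-01', '2016-03-01', '2016-01-01', '2016-06-01']),
    'osh_status_id': [1, 2, 10, 1, 5],
    'osh_status_reason': [None, None, None, None, 'Test_reason'],
})

def second_to_last(s):
    """Return the value before the last one (missing if the group has one row)."""
    return s.shift().iloc[-1]

# Each keyword is an output column: (source column, aggregation).
x = (hst.sort_values('osh_id_created')
        .groupby('ord_id')
        .agg(Last_status_id=('osh_status_id', 'last'),
             Prev_status_id=('osh_status_id', second_to_last),
             Last_status_date=('osh_id_created', 'last'),
             Prev_status_date=('osh_id_created', second_to_last)))
```

The column order follows the keyword order, so the output columns can be arranged however is convenient.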

Now we can join it back onto the original ord DF:

In [296]: ord.set_index('order_id').join(x).reset_index() 
Out[296]: 
    order_id created Last_status_id Prev_status_id Last_status_date Prev_status_date 
0 241147 2016-01-01    10    2  2016-03-01  2016-02-01 
1 241148 2016-01-01    5    1  2016-06-01  2016-01-01

Or, using the merge() method:

In [297]: pd.merge(ord, x, left_on='order_id', right_index=True) 
Out[297]: 
    order_id created Last_status_id Prev_status_id Last_status_date Prev_status_date 
0 241147 2016-01-01    10    2  2016-03-01  2016-02-01 
1 241148 2016-01-01    5    1  2016-06-01  2016-01-01
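One difference between the two variants is worth keeping in mind: `DataFrame.join` defaults to `how='left'`, while `pd.merge` defaults to `how='inner'`, so an order with no history rows would be dropped by the `merge` call as written. A small sketch with a made-up third order (999999, not in the original data) and a cut-down `x`:

```python
import pandas as pd

# Note: `ord` shadows the Python builtin; the name is kept to match the thread.
ord = pd.DataFrame({
    'order_id': [241147, 241148, 999999],  # 999999: hypothetical order without history
    'created': pd.to_datetime(['2016-01-01'] * 3),
})

# Reduced stand-in for the aggregated frame, indexed by ord_id.
x = pd.DataFrame({'Last_status_id': [10, 5], 'Prev_status_id': [2, 1]},
                 index=pd.Index([241147, 241148], name='ord_id'))

# how='left' keeps every order; missing statuses come back as NaN.
merged = pd.merge(ord, x, how='left', left_on='order_id', right_index=True)
```

Whether orders without history should be kept or dropped depends on the use case; the question does not say.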

Source

2017-03-17 14:58:09 MaxU
