2016-04-20 51 views

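Compare two columns in the same DataFrame and compute statistics from the comparison

I have a DataFrame with a datetime index and three columns: id, revenue, and cost.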

import numpy as np
import pandas as pd
from datetime import datetime

d = {'id': ['4573', '4573', '4573', '958245', '958245', '958245'],
     'revenue': np.random.uniform(size=6), 'cost': np.random.uniform(size=6)}
e = ['2014-03-01', '2014-04-01', '2014-05-01', '2014-05-01', '2015-03-01', '2015-02-01']

dateindex = [datetime.strptime(a, '%Y-%m-%d') for a in e]

df = pd.DataFrame(d)
df.index = dateindex

                cost      id   revenue
2014-03-01  0.445597    4573  0.901713
2014-04-01  0.774029    4573  0.908302
2014-05-01  0.104274    4573  0.278444
2014-05-01  0.938426  958245  0.755022
2015-03-01  0.647886  958245  0.125072
2015-02-01  0.267773  958245  0.557496

I want to perform various comparisons between revenue and cost for each id.

For example, in pseudocode:

If Revenue > Cost > 0
    CountA = CountA + 1
Elif 0 < Revenue < Cost
    CountB = CountB + 1
Elif Revenue > 0 > Cost
    CountC = CountC + 1
Elif Revenue = 0 and Cost > 0
    CountD = CountD + 1

For case A, I thought I could do this:

df[['revenue']][df['id'] == '4573'] > df[['cost']][df['id'] == '4573'] 

But I get:

ValueError: Can only compare identically-labeled DataFrame objects 

Is there a more efficient way to do what I want to do?

Answer


First write the function you want, then wrap it so that it can be applied to a DataFrame, and finally group by 'id' and apply it:

import collections
import datetime

import numpy as np
import pandas as pd

d = {'id': ['4573', '4573', '4573', '958245', '958245', '958245'],
     'revenue': np.random.uniform(size=6), 'cost': np.random.uniform(size=6)}
e = ['2014-03-01', '2014-04-01', '2014-05-01', '2014-05-01', '2015-03-01', '2015-02-01']

dateindex = [datetime.datetime.strptime(a, '%Y-%m-%d') for a in e]

df = pd.DataFrame(d)
df.index = dateindex

# classify a single (cost, revenue) pair into one of the cases
def Func(Cost, Revenue):
    if Revenue > Cost > 0:
        return 'A'
    elif Cost > Revenue > 0:
        return 'B'
    elif Revenue > 0 > Cost:
        return 'C'
    elif Revenue == 0 and Cost > 0:
        return 'D'
    # any other combination falls through and returns None

# count the cases within one group of rows
def Func_df(df):
    cases_list = [Func(x, y) for x, y in zip(df.cost.values, df.revenue.values)]
    return collections.Counter(cases_list)

df.groupby('id').apply(Func_df)

Output (hopefully; the exact counts depend on the random data):

id 
4573  {u'A': 1, u'B': 2} 
958245 {u'A': 1, u'B': 2}
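
As a possible alternative, here is a minimal vectorized sketch of the same counting. It also shows why the attempt in the question failed: `df[['revenue']]` and `df[['cost']]` are one-column DataFrames whose column labels differ, whereas comparing the two Series (single brackets) works. The fixed sample values, the `case` column name, and the `"other"` label are assumptions made for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Fixed sample data (made up for illustration) so the result is reproducible.
df = pd.DataFrame(
    {
        "id": ["4573", "4573", "4573", "958245", "958245", "958245"],
        "revenue": [0.9, 0.2, 0.0, 0.5, 0.3, 0.7],
        "cost": [0.4, 0.8, 0.6, -0.1, 0.9, 0.2],
    },
    index=pd.to_datetime(
        ["2014-03-01", "2014-04-01", "2014-05-01",
         "2014-05-01", "2015-03-01", "2015-02-01"]
    ),
)

rev, cost = df["revenue"], df["cost"]

# Comparing two Series with the same index works. The question compared
# two DataFrames with different column labels, hence the ValueError.
case_a_4573 = (rev > cost)[df["id"] == "4573"]

# One boolean mask per case, in the same order as the pseudocode.
conditions = [
    (rev > cost) & (cost > 0),  # A
    (cost > rev) & (rev > 0),   # B
    (rev > 0) & (cost < 0),     # C
    (rev == 0) & (cost > 0),    # D
]
labels = ["A", "B", "C", "D"]

# Rows matching none of the cases get the hypothetical label "other".
df["case"] = np.select(conditions, labels, default="other")

# One row per id, one column per case that occurs, values are counts.
counts = pd.crosstab(df["id"], df["case"])
print(counts)
```

Compared with calling a Python function per row, this evaluates each condition on whole columns at once, which tends to scale better on large frames. Rows that match none of the four cases are counted under "other" rather than silently becoming `None`.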