2017-09-03 60 views
1

Category assignment based on percentile

I have the following DataFrame:

Group  Country  GDP
A      a        ***
A      b        ***
B      a        ***
B      b        ***

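I want to create a new column that assigns each row a category (high, low) based on the percentile rank of its GDP within its group. This is what I tried: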

def c(gr): 
     ser=gr['gdp'] 
     p=np.nanpercentile(ser,50) 
     for i in ser: 
      if i>p: 
       return "high" 
      else: 
       return "low" 

grouped = df.groupby('Group') 
df['perf']=grouped.apply(c) 

The perf column comes back as NaN. What am I doing wrong here?

+1

Are you looking for something like 'pd.cut'? –

+0

Hi, I have posted a solution; it is similar to R :) – Wen

Answers

3

Use quantile, numpy.where and a custom function:

def c(gr): 
    ser=gr['gdp'] 
    # q=0.5 is the default, so it can be omitted
    p = ser.quantile() 
    gr['perf'] = np.where(ser > p, 'high', 'low') 
    return gr 

df = df.groupby('Group').apply(c) 

This can be simplified with transform:

q = df.groupby('Group')['gdp'].transform('quantile') 
df['perf1'] = np.where(df['gdp'] > q, 'high', 'low') 
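
As for why the attempt in the question yields NaN, here is a minimal sketch on made-up data (the frame `demo` and the name `first_label` are illustrative, not from the question). The `return` inside the loop exits on the first value, so `apply` produces one label per group, indexed by group name. Assigning that to a column aligns on the index, and the group names do not match the row labels. Selecting `[['gdp']]` before `apply` is an assumption made here to keep newer pandas versions from warning about the grouping column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
demo = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'],
                     'gdp': [1.0, 3.0, 2.0, 8.0]})

def first_label(gr):
    # Mirrors the question's function: `return` leaves the loop
    # on the first value, so each group yields a single string.
    ser = gr['gdp']
    p = np.nanpercentile(ser, 50)
    for i in ser:
        if i > p:
            return "high"
        else:
            return "low"

per_group = demo.groupby('Group')[['gdp']].apply(first_label)
print(per_group.index.tolist())   # group labels, not row labels

# Assignment aligns on the index: rows 0..3 find no match in 'A', 'B'.
demo['perf'] = per_group
print(demo['perf'].isna().all())
```

Returning a whole group (as in the custom function above) or using transform keeps the original row index, which is why those versions fill the column.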

Sample:

import numpy as np
import pandas as pd

np.random.seed(12)

N = 15 
L = list('abcd') 
df = pd.DataFrame({'Group': np.random.choice(L, N), 
        'gdp': np.random.rand(N)}) 
df = df.sort_values('Group').reset_index(drop=True) 
df.loc[[0,4,5,10,13,14], 'gdp'] = np.nan 
#print (df) 

def c(gr): 
    ser=gr['gdp'] 
    # q=0.5 is the default, so it can be omitted
    p = ser.quantile() 
    gr['perf'] = np.where(ser > p, 'high', 'low') 
    return gr 

df = df.groupby('Group').apply(c) 

q = df.groupby('Group')['gdp'].transform('quantile') 
df['perf1'] = np.where(df['gdp'] > q, 'high', 'low') 
print (df) 
   Group       gdp  perf perf1
0      a       NaN   low   low
1      a  0.907267  high  high
2      a  0.456051   low   low
3      b  0.675998   low   low
4      b       NaN   low   low
5      b       NaN   low   low
6      b  0.563141   low   low
7      b  0.801265  high  high
8      c  0.372834   low   low
9      c  0.481530  high  high
10     c       NaN   low   low
11     d  0.082524   low   low
12     d  0.725954  high  high
13     d       NaN   low   low
14     d       NaN   low   low
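
Note that rows with a missing gdp end up labelled low, because any comparison with NaN is False. If those rows should stay missing instead, one possible sketch is to mask the labels afterwards. The frame `demo` and its values below are made up for illustration, and whether NaN rows should be left unlabelled is an assumption about the desired result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value.
demo = pd.DataFrame({'Group': ['a', 'a', 'a', 'b', 'b'],
                     'gdp': [np.nan, 0.9, 0.4, 0.6, 0.8]})

# Per-group median, broadcast back to the original rows.
q = demo.groupby('Group')['gdp'].transform('quantile')

# np.where returns a plain array, so wrap it in a Series to keep the index.
labels = pd.Series(np.where(demo['gdp'] > q, 'high', 'low'), index=demo.index)

# Keep a label only where gdp is present; other rows become NaN.
demo['perf'] = labels.where(demo['gdp'].notna())
print(demo)
```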
+0

To add more conditions I am doing this: def tr(group): ser = group['gdp'].dropna() for i in ser: if i > (ser.quantile(q=.75)): group['perf'] = "h" elif i < (ser.quantile(q=.25)): group['perf'] = "L" else: group['perf'] = "m" return group (this returns "h" for every gdp value) – mezz

+1

Sorry, I totally forgot. It is best to use 'q1 = df.groupby('Group')['gdp'].transform(lambda x: x.quantile(q=0.75))' 'q2 = df.groupby('Group')['gdp'].transform(lambda x: x.quantile(q=0.25))' and then 'df['perf1'] = np.where(df['gdp'] > q1, 'H', np.where(df['gdp'] ... – jezrael
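
A three-way split along the lines of this comment exchange could be sketched as follows. This version uses np.select rather than nested np.where, which is a substitution made here for readability. The data, the names `q_hi` and `q_lo`, and the labels 'h', 'm', 'l' are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups of five values each.
demo = pd.DataFrame({'Group': ['a'] * 5 + ['b'] * 5,
                     'gdp': [1.0, 2.0, 3.0, 4.0, 5.0,
                             10.0, 20.0, 30.0, 40.0, 50.0]})

g = demo.groupby('Group')['gdp']
# Per-group quartiles, broadcast back to the original rows.
q_hi = g.transform(lambda x: x.quantile(q=0.75))
q_lo = g.transform(lambda x: x.quantile(q=0.25))

# Conditions are checked in order; rows matching neither get the default.
demo['perf'] = np.select([demo['gdp'] > q_hi, demo['gdp'] < q_lo],
                         ['h', 'l'], default='m')
print(demo)
```

Because the comparison is done on whole columns, each row gets its own label, whereas assigning `group['perf'] = "h"` inside a loop overwrites the entire column on every pass.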

1

R

df['output']=df.groupby('Group').gdp.apply(lambda x : np.where(x>x.quantile(0.75),'High','Low')).apply(pd.Series).stack().dropna().values 

df 
Out[333]: 
   Group       gdp output
0      a       NaN    Low
1      a  0.772128    Low
2      a  0.070406    Low
3      a  0.859301   High
4      a       NaN    Low
5      a       NaN    Low
6      b  0.681299   High
7      b  0.040839    Low
8      c  0.896475   High
9      c  0.726527    Low
10     c       NaN    Low
11     c  0.244783    Low
12     c  0.563001    Low
13     c       NaN    Low
14     d       NaN    Low
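
The one-liner above writes the stacked labels back by position (`.values`), so it appears to rely on the rows already being sorted by Group. A sketch of the same 0.75-quantile rule with transform, which aligns on the index and so should not depend on row order, might look like this. The frame `demo`, its values and the name `q75` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, deliberately not sorted by Group.
demo = pd.DataFrame({'Group': ['b', 'a', 'b', 'a', 'a'],
                     'gdp': [0.7, 0.1, 0.2, 0.9, 0.5]})

# Per-group 0.75 quantile, returned with the same index as demo.
q75 = demo.groupby('Group')['gdp'].transform(lambda x: x.quantile(0.75))

demo['output'] = np.where(demo['gdp'] > q75, 'High', 'Low')
print(demo)
```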