2015-09-06 78 views

Finding quartiles in pandas

I have a dataframe:

Av_Temp Tot_Precip 
278.001 0 
274  0.0751864 
270.294 0.631634 
271.526 0.229285 
272.246 0.0652201 
273  0.0840059 
270.463 0.0602944 
269.983 0.103563 
268.774 0.0694555 
269.529 0.010908 
270.062 0.043915 
271.982 0.0295718 

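I want to find the percentile values (25%, 50%, 75%) of the column 'Tot_Precip' for each decile of the column Av_Temp (that is, for the rows whose Av_Temp falls in the top 10%, the next 10%, and so on). Currently, I am doing this: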

import numpy, pandas, pdb 
expl_var = 'Av_Temp' 
cname = 'Tot_Precip' 
num_samples = 10  # must be an integer: recent NumPy versions reject a float for linspace's num
max_val = df[expl_var].max() 
min_val = df[expl_var].min() 

expl_bins = numpy.linspace(min_val, max_val, num = num_samples) 

for index, val in enumerate(expl_bins): 
    print(index)
    if index < (len(expl_bins) - 1): 
        cur_val = val
        nxt_val = expl_bins[index+1]

        # Subset dataframe to rows with values of expl_var between
        # cur_val and nxt_val
        sub_ind_df = df[(df[expl_var] >= cur_val) & (df[expl_var] <= nxt_val)]

        sub_ind_df[cname+'_quartiles'] = pandas.qcut(sub_ind_df[cname], 4)
        # Merge with sub_df
        pdb.set_trace()

I am not sure how to proceed after this.

The answer might look something like this:

Av_Temp_decile  Tot_Precip_25  Tot_Precip_50 Tot_Precip_75 
270 - 272   0.03     0.05    0.08 

Answer

1

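I'm only splitting your data into halves here rather than deciles, because the example dataset is so small, but everything should work the same if you just increase the number of bins in the initial cut: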

import pandas as pd

# Change this to 10 to get deciles
df['Temp_Halves'] = pd.qcut(df['Av_Temp'], 2)

def get_quartiles(group): 
    # Add retbins=True to get the bin edges 
    qs, bins = pd.qcut(group['Tot_Precip'], [.25, .5, .75], retbins=True) 
    # Returning a series from a function means groupby.apply() will 
    # expand it into separate columns 
    return pd.Series(bins, index=['Precip_25', 'Precip_50', 'Precip_75'])

df.groupby('Temp_Halves').apply(get_quartiles) 
Out[21]: 
        Precip_25 Precip_50 Precip_75 
Temp_Halves           
[268.774, 270.995] 0.048010 0.064875 0.095036 
(270.995, 278.001] 0.038484 0.070203 0.081801 
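
For comparison, here is a minimal sketch of another way to get the same table, assuming a reasonably recent pandas: it groups by the temperature bins and calls `quantile` on `Tot_Precip` directly instead of going through `qcut` with `retbins=True`. This is not the approach used above, just an alternative. The names `n_bins`, `temp_bins` and `quartiles` are illustrative, and the DataFrame is rebuilt from the sample data in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})

# Hypothetical parameter: 2 gives halves (as above), 10 would give deciles
n_bins = 2

# Equal-count bins of Av_Temp (quantile-based, like the qcut call above)
temp_bins = pd.qcut(df["Av_Temp"], n_bins)

# Quantiles of Tot_Precip within each temperature bin; unstack() turns the
# (bin, quantile) MultiIndex into one column per quantile
quartiles = (
    df.groupby(temp_bins, observed=True)["Tot_Precip"]
      .quantile([0.25, 0.5, 0.75])
      .unstack()
)
quartiles.columns = ["Precip_25", "Precip_50", "Precip_75"]
print(quartiles)
```

On the sample data this should give the same numbers as the output above, up to rounding. Note that `pd.qcut` produces bins with roughly equal numbers of rows, whereas the `numpy.linspace` loop in the question produces equal-width temperature ranges; if equal-width ranges are what you want, `pd.cut` would be the call to swap in.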

Excellent, this is great! TY – user308827