2017-02-24

Answer


I have outlined one approach below.

Note that to round values to the nearest integer, you should use Python's built-in round() function. For details, see round() in the Python documentation.
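As a small illustration of the rounding note above, here is a minimal sketch assuming Python 3. The names `scores` and `rounded` are made up for this example, and the sample values are taken from the quantile output further down. Keep in mind that Python 3's round() rounds exact halves to the nearest even integer, which may or may not be what you want.

```python
import pandas as pd

# built-in round(): exact halves go to the nearest even integer in Python 3
print(round(2.5))  # 2
print(round(3.5))  # 4

# round() also accepts a number of decimal places
print(round(0.585087, 2))  # 0.59

# for a whole pandas column, Series.round() does the same element-wise
scores = pd.Series([0.585087, 0.476404, 0.426252])  # illustrative values
rounded = scores.round(2)
print(rounded.tolist())
```

If you need whole numbers in a column, `scores.round()` with no argument rounds to zero decimal places but keeps the float dtype.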

import pandas as pd 
import numpy as np 
# set random seed for reproducibility 
np.random.seed(748) 

# initialize base example dataframe 
df = pd.DataFrame({"date":np.arange(10), 
        "score":np.random.uniform(size=10)}) 

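# note: duplicate_dates is not used below, but this call still advances
# the random state, so keep it to reproduce the output shown
duplicate_dates = np.random.choice(df.index, 5)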

df_dup = pd.DataFrame({"date":np.random.choice(df.index, 5), 
         "score":np.random.uniform(size=5)}) 

# finish compiling example data
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df = pd.concat([df, df_dup], ignore_index=True)

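# calculate the 0.7 quantile of "score" for each date
# interpolation='midpoint' averages the two neighbouring values
# when the quantile falls between them
# (axis=0 is dropped here: groupby quantile in recent pandas versions
# does not accept an axis argument)
result = df.groupby("date").quantile(q=0.7, interpolation='midpoint')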

# print resulting dataframe 
# contains one unique 0.7 quantile value per date 
print(result) 

"""
0.7      score
date
0     0.585087
1     0.476404
2     0.426252
3     0.363376
4     0.165013
5     0.927199
6     0.575510
7     0.576636
8     0.831572
9     0.932183
"""

# to add the quantile information as a new column
# in our original dataframe `df`, we can map each
# value in the "date" column through a dictionary

# create dictionary of {date: 0.7 quantile of score}
mapping = result["score"].to_dict()

# map onto `df` to produce the desired new column
df["quantile_0.7"] = df["date"].map(mapping)

print(df) 

"""
    date     score  quantile_0.7
0      0  0.920895      0.585087
1      1  0.476404      0.476404
2      2  0.380771      0.426252
3      3  0.363376      0.363376
4      4  0.165013      0.165013
5      5  0.927199      0.927199
6      6  0.340008      0.575510
7      7  0.695818      0.576636
8      8  0.831572      0.831572
9      9  0.932183      0.932183
10     7  0.457455      0.576636
11     6  0.650666      0.575510
12     6  0.500353      0.575510
13     0  0.249280      0.585087
14     2  0.471733      0.426252
"""
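
As an alternative to building a dictionary, the same per-date column can probably be produced in one step with groupby plus transform. This is only a sketch on a tiny hand-made dataframe (the values below are invented for illustration, not taken from the example above), so the arithmetic is easy to check by hand.

```python
import numpy as np
import pandas as pd

# small hand-made example: three dates with 2, 3 and 1 scores
df = pd.DataFrame({
    "date": [0, 0, 1, 1, 1, 2],
    "score": [0.2, 0.8, 0.1, 0.5, 0.9, 0.4],
})

# transform returns one value per original row, so the group-level
# 0.7 quantile is broadcast back onto every row of that date
df["quantile_0.7"] = df.groupby("date")["score"].transform(
    lambda s: s.quantile(0.7, interpolation="midpoint")
)

print(df)
# date 0: quantile falls between 0.2 and 0.8 -> midpoint 0.5
# date 1: quantile falls between 0.5 and 0.9 -> midpoint 0.7
# date 2: single value -> 0.4
```

This skips the intermediate `result` and `mapping` objects. The dictionary approach is still useful if you also want the one-row-per-date table on its own.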