2017-05-30

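I'm working with a pandas DataFrame in Python. How can I label every row with the quartile (q1, q2, q3 or q4) that its value in the 'rate' column falls into? The interval here is the range of 'rate', so the whole range runs from the minimum, -0.112192, to the maximum, 0.913056. For each row, I want to show which quartile of that range its 'rate' value falls into. How do I assign quartiles for a specific column in a pandas DataFrame?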

    name                              rate
0   3POWER ENERGY GROUP INC      -0.000000
1   808 RENEWABLE ENERGY CORP    -0.112192
2   YORK WATER CO                 0.774955
3   ZTO EXPRESS (CAYM) INC -ADR   0.086352
4   AEP GENERATING CO             0.850960
5   AEP TEXAS CENTRAL CO          0.600301
6   AIR T INC                     0.254511
7   ALABAMA GAS CORP              0.611631
8   ALABAMA POWER CO              0.913056
9   ALLEGIANT TRAVEL CO           0.227421
10  COMCAST CORP                  0.012037
11  HAWAIIAN ELECTRIC CO          0.670980
12  HAWAIIAN ELECTRIC INDS        0.775778

I'd like the DataFrame to look like this:

    name                              rate  quartile
0   3POWER ENERGY GROUP INC      -0.000000  q1
1   808 RENEWABLE ENERGY CORP    -0.112192  q1
2   YORK WATER CO                 0.774955  q3
3   ZTO EXPRESS (CAYM) INC -ADR   0.086352  q1
4   AEP GENERATING CO             0.850960  q4
5   AEP TEXAS CENTRAL CO          0.600301  q3
6   AIR T INC                     0.254511  q2
7   ALABAMA GAS CORP              0.611631  q3
8   ALABAMA POWER CO              0.913056  q4
9   ALLEGIANT TRAVEL CO           0.227421  q2
10  COMCAST CORP                  0.012037  q1
11  HAWAIIAN ELECTRIC CO          0.670980  q4
12  HAWAIIAN ELECTRIC INDS        0.775778  q4
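Splitting the range of 'rate' itself is a different operation from splitting by quantiles:

* pd.cut with an integer bin count divides [min, max] into equal-width, right-closed intervals.
* pd.qcut makes groups of roughly equal size.

The sketch below assumes the equal-width reading and rebuilds only the 'rate' column from the table above. The variable names rate, equal_width and edges are illustrative. pandas lowers the first edge by 0.1% of the range so that the minimum value is included.

```python
import pandas as pd

# 'rate' column from the question (-0.000000 entered as -0.0)
rate = pd.Series([-0.0, -0.112192, 0.774955, 0.086352, 0.850960, 0.600301, 0.254511,
                  0.611631, 0.913056, 0.227421, 0.012037, 0.670980, 0.775778], name='rate')

# An integer bin count makes pd.cut split [min, max] into 4 equal-width, right-closed bins;
# pandas nudges the lowest edge down by 0.1% of the range so the minimum is included
equal_width, edges = pd.cut(rate, 4, labels=['q1', 'q2', 'q3', 'q4'], retbins=True)
print(edges)
print(equal_width.tolist())
```

With this sample data the equal-width bins reproduce the desired table above for every row except row 2 (0.774955). That row lands in q4 rather than q3, because the top slice starts at about 0.6567.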

Answer


You need qcut:

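import pandas as pd
# split 'rate' at its 25/50/75% quantiles into four roughly equal-count groups
df['quartile'] = pd.qcut(df['rate'], 4, labels=['q1', 'q2', 'q3', 'q4'])
print(df)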
    name                              rate  quartile
0   3POWER ENERGY GROUP INC      -0.000000  q1
1   808 RENEWABLE ENERGY CORP    -0.112192  q1
2   YORK WATER CO                 0.774955  q3
3   ZTO EXPRESS (CAYM) INC -ADR   0.086352  q1
4   AEP GENERATING CO             0.850960  q4
5   AEP TEXAS CENTRAL CO          0.600301  q2
6   AIR T INC                     0.254511  q2
7   ALABAMA GAS CORP              0.611631  q3
8   ALABAMA POWER CO              0.913056  q4
9   ALLEGIANT TRAVEL CO           0.227421  q2
10  COMCAST CORP                  0.012037  q1
11  HAWAIIAN ELECTRIC CO          0.670980  q3
12  HAWAIIAN ELECTRIC INDS        0.775778  q4
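A self-contained version of the qcut answer shows why row 5 (0.600301) ends up in q2:

* qcut cuts at the 25%, 50% and 75% sample quantiles. With these 13 rows, those land exactly on data values: 0.086352, 0.600301 and 0.774955.
* The bins are closed on the right, so a value equal to an edge goes into the lower quartile.
* retbins=True returns those edges.
* A boolean mask on the new column selects all rows in one quartile.

The DataFrame is rebuilt from the question's data, with -0.000000 entered as -0.0. The names quartile, edges and q1_rows are illustrative.

```python
import pandas as pd

# Data copied from the question (-0.000000 entered as -0.0)
df = pd.DataFrame({
    'name': ['3POWER ENERGY GROUP INC', '808 RENEWABLE ENERGY CORP', 'YORK WATER CO',
             'ZTO EXPRESS (CAYM) INC -ADR', 'AEP GENERATING CO', 'AEP TEXAS CENTRAL CO',
             'AIR T INC', 'ALABAMA GAS CORP', 'ALABAMA POWER CO', 'ALLEGIANT TRAVEL CO',
             'COMCAST CORP', 'HAWAIIAN ELECTRIC CO', 'HAWAIIAN ELECTRIC INDS'],
    'rate': [-0.0, -0.112192, 0.774955, 0.086352, 0.850960, 0.600301, 0.254511,
             0.611631, 0.913056, 0.227421, 0.012037, 0.670980, 0.775778],
})

# retbins=True also returns the edges qcut chose: the 0/25/50/75/100% quantiles of 'rate'
quartile, edges = pd.qcut(df['rate'], 4, labels=['q1', 'q2', 'q3', 'q4'], retbins=True)
df['quartile'] = quartile
print(edges)

# Boolean mask on the categorical column: every row whose rate falls in q1
q1_rows = df[df['quartile'] == 'q1']
print(q1_rows)
```

Because qcut targets equal counts, q1 here gets 4 rows and the other quartiles get 3 each. This is the usual outcome when the row count is not divisible by 4.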