2017-04-17 92 views
1

Using SciPy's percentileofscore on a groupby DataFrame

I created a DataFrame by reading CSV data in the format below:

Date,Open,High,Low,Close,Volume,Adj Close,Ticker,Indicator1,Indicator2 
42255,91.760002,92.790001,90.400002,92.720001,3085500,86.16844,LB,302.911961,45.621095920339 
42251,88.550003,90.860001,88,90.379997,3230200,83.993779,LB,211.511385,45.7675721184876 
42250,87.110001,90.769997,87.110001,89.279999,3989900,82.971506,LB,177.1386378,46.0213252964444 
42255,65.82,66.790001,65.739998,66.769997,6397600,64.544698,DD,140.6188408,46.1284286660104 
42251,30.559999,31.41,30.559999,31.4,13911700,31.4,EBAY,128.3615396,46.6328167692573 
42250,64.279999,66.199997,64.279999,66.110001,6612700,63.906699,DD,111.3219234,47.1501954595785 
42255,173.699997,177.410004,173.699997,177.279999,7107100,177.279999,BRK-B,103.1589082,48.0697637559109 
42251,30.309999,30.860001,30.27,30.68,17892900,30.68,EBAY,100.6122268,48.3165158150696 
42250,29.809999,30.559999,29.75,30.49,20272000,30.49,EBAY,94.75403852,49.066388420196 
42255,84.68,86.010002,83.32,85.730003,3411000,79.672352,LB,88.39444803,50.0061610393543 
42251,68.629997,70.099998,68.470001,69.910004,4018100,69.910004,AKAM,84.82357186,50.7093832981117 
42250,28.870001,30.309999,28.790001,29.93,44959100,29.93,EBAY,80.94104725,51.6730513843059 
42255,49.02,49.240002,47,47.650002,14153200,47.461114,DAL,78.71521075,51.6915087811999 
42251,70.360001,74.75,70.360001,71.75,3296300,71.75,EVHC,78.54129955,51.9876960547054 

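I want to add another column to the DataFrame that holds the percentile score of Indicator1 for each row, computed against the Indicator1 values of all tickers on that same date.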

Can anyone please help me with the Python code for this? I am new to Python.

Answer

1

IIUC: use the rank method.

print(df) 
    Date  Open  High   Low  Close Volume Adj Close Ticker Indicator1 Indicator2 
0 42255 91.760002 92.790001 90.400002 92.720001 3085500 86.168440  LB 302.911961 45.621096 
1 42251 88.550003 90.860001 88.000000 90.379997 3230200 83.993779  LB 211.511385 45.767572 
2 42250 87.110001 90.769997 87.110001 89.279999 3989900 82.971506  LB 177.138638 46.021325 
3 42255 65.820000 66.790001 65.739998 66.769997 6397600 64.544698  DD 140.618841 46.128429 
4 42251 30.559999 31.410000 30.559999 31.400000 13911700 31.400000 EBAY 128.361540 46.632817 
5 42250 64.279999 66.199997 64.279999 66.110001 6612700 63.906699  DD 111.321923 47.150195 
6 42255 173.699997 177.410004 173.699997 177.279999 7107100 177.279999 BRK-B 103.158908 48.069764 
7 42251 30.309999 30.860001 30.270000 30.680000 17892900 30.680000 EBAY 100.612227 48.316516 
8 42250 29.809999 30.559999 29.750000 30.490000 20272000 30.490000 EBAY 94.754039 49.066388 
9 42255 84.680000 86.010002 83.320000 85.730003 3411000 79.672352  LB 88.394448 50.006161 
10 42251 68.629997 70.099998 68.470001 69.910004 4018100 69.910004 AKAM 84.823572 50.709383 
11 42250 28.870001 30.309999 28.790001 29.930000 44959100 29.930000 EBAY 80.941047 51.673051 
12 42255 49.020000 49.240002 47.000000 47.650002 14153200 47.461114 DAL 78.715211 51.691509 
13 42251 70.360001 74.750000 70.360001 71.750000 3296300 71.750000 EVHC 78.541300 51.987696 


df['Indicator1_percentile'] = df.Indicator1.rank(pct=True) 

print(df['Indicator1_percentile'])
0  1.000000 
1  0.928571 
2  0.857143 
3  0.785714 
4  0.714286 
5  0.642857 
6  0.571429 
7  0.500000 
8  0.428571 
9  0.357143 
10 0.285714 
11 0.214286 
12 0.142857 
13 0.071429 
Name: Indicator1_percentile, dtype: float64
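
The question title mentions SciPy's percentileofscore, so a quick cross-check may be useful. The sketch below assumes SciPy is installed and uses the Indicator1 values from the question; the variable names indicator1, pct_rank and pct_scipy are made up for illustration. It suggests that rank(pct=True) over the whole column matches percentileofscore with kind="rank", apart from the scale (0 to 1 versus 0 to 100).

```python
import pandas as pd
from scipy.stats import percentileofscore

# Indicator1 values copied from the question's sample data.
indicator1 = pd.Series([
    302.911961, 211.511385, 177.1386378, 140.6188408, 128.3615396,
    111.3219234, 103.1589082, 100.6122268, 94.75403852, 88.39444803,
    84.82357186, 80.94104725, 78.71521075, 78.54129955,
])

# pandas: fractional rank of each value within the whole column (0..1].
pct_rank = indicator1.rank(pct=True)

# SciPy: percentile of each value against the same column (0..100].
pct_scipy = indicator1.apply(
    lambda v: percentileofscore(indicator1, v, kind="rank")
)

# Largest absolute difference after putting both on the 0..100 scale.
print((pct_rank * 100 - pct_scipy).abs().max())
```

Note that this still ranks across all dates at once, as the answer above does; it does not group by Date.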
+0

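I want to group by Date (column 1) and then compute the percentile of the indicator. The code below gives the error TypeError: 'DataFrameGroupBy' object does not support item assignment: df2 = df1.groupby('Date') df2['Indicator1_pct'] = df2.Indicator1.rank(pct=True) print(df2) – vj80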

+0

I would suggest posting that as a new question. It is actually fairly involved, because a 'DataFrameGroupBy' object is not meant to be used that way. – Grr
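
For readers who land here with the same TypeError: one possible way around it, sketched below, is to assign the new column on the original DataFrame rather than on the groupby object. The example rebuilds only the Date, Ticker and Indicator1 columns from the question's sample rows, and the column name Indicator1_pct follows the comment above. It is a minimal sketch, not necessarily what either commenter had in mind.

```python
import pandas as pd

# Date, Ticker and Indicator1 copied from the question's sample rows.
df = pd.DataFrame({
    "Date": [42255, 42251, 42250, 42255, 42251, 42250, 42255,
             42251, 42250, 42255, 42251, 42250, 42255, 42251],
    "Ticker": ["LB", "LB", "LB", "DD", "EBAY", "DD", "BRK-B",
               "EBAY", "EBAY", "LB", "AKAM", "EBAY", "DAL", "EVHC"],
    "Indicator1": [302.911961, 211.511385, 177.1386378, 140.6188408,
                   128.3615396, 111.3219234, 103.1589082, 100.6122268,
                   94.75403852, 88.39444803, 84.82357186, 80.94104725,
                   78.71521075, 78.54129955],
})

# Rank Indicator1 within each Date. The result keeps df's index,
# so it can be assigned straight back onto the original DataFrame.
df["Indicator1_pct"] = df.groupby("Date")["Indicator1"].rank(pct=True)

# Sort only for display, so each date's rows appear together.
print(df.sort_values(["Date", "Indicator1_pct"]))
```

The values come out as fractions between 0 and 1 within each date; multiply by 100 if a 0 to 100 scale like percentileofscore's is wanted.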