2016-06-01

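Pandas column manipulation

I want to manipulate a pandas DataFrame so that I can count how often a data point occurs in one column after a specific event occurs in another column. The pseudocode below probably sums it up best. Any help would be greatly appreciated!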

import pandas as pd

# Set number of rows to skip
rows_to_skip = 0
# Rows to use after skipped rows
rows_to_use = 10000

# Read the file (adjust arguments accordingly).
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv('example.csv', skiprows=rows_to_skip, on_bad_lines='skip',
                   nrows=rows_to_use, low_memory=False)

# Add headers when skipping rows
data.columns = ["X", "Y", "Z"]

# Intended logic (the original pseudocode, written out as a plain loop):
# for every row where X > 0, count the subsequent rows where
#   condition 1) Y > 0
#   condition 2) Z <= Z of the row where X was > 0
# and store that count in a new column, on the same row as the X > 0 value.
data['count'] = float('nan')
for i in data.index[data['X'] > 0]:
    z_ref = data.at[i, 'Z']
    later = data.loc[i + 1:]  # all subsequent rows (assumes the default RangeIndex)
    data.at[i, 'count'] = ((later['Y'] > 0) & (later['Z'] <= z_ref)).sum()

Any help?

Answer


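Assuming you want, for each instance where X > 0, the number of matching cases up to the next X > 0, rather than the count over the entire rest of the DataFrame after each X > 0:

You can create a new column that holds the index label of each row where X > 0, forward-fill it (`.ffill()`, formerly `.fillna(method='ffill')`) so every following row carries the label of the most recent X > 0 row, and `.groupby()` on that column. Then `.apply()` gives, for each group, the `len()` of the rows where the other conditions are True.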
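To make the labelling step concrete, here is a minimal sketch on a tiny hand-made frame; the values in `demo` are hypothetical and not taken from the question. It uses `.ffill()`, since newer pandas versions deprecate the `method` argument of `fillna`.

```python
import pandas as pd

# Hypothetical values, chosen so that X > 0 at index labels 1 and 4
demo = pd.DataFrame({'X': [-1, 3, -2, -5, 4, -1]})

# Keep the row's index label where X > 0 and NaN everywhere else ...
marker = demo.index.to_series().where(demo['X'] > 0)
# ... then forward-fill, so each row inherits the label of the latest X > 0 row
demo['condition'] = marker.ffill()
print(demo)
```

Rows before the first X > 0 stay NaN, and `groupby` drops NaN keys by default, so those rows never form a group.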

Some sample data:

import numpy as np
df = pd.DataFrame(data=np.random.randint(-10, 10, size=(100, 3)), columns=list('XYZ'))

    X Y Z 
0 -3 6 -7 
1 -4 -10 -1 
2 9 -10 -9 
3 5 0 -8 
4 -2 1 -8 

The steps are:

df['condition'] = df.index.to_series().where(df.X > 0).ffill()
# counts rows with Y > 0 and Z greater than the Z of the X > 0 row that opens the group
df['count'] = df.groupby('condition').apply(lambda x: len(x[(x.Y > 0) & (x.Z > x.Z.iloc[0])]))

Result:

X Y Z condition count 
0 -3 6 -7  NaN NaN 
1 -4 -10 -1  NaN NaN 
2 9 -10 -9  2.0 0.0 
3 5 0 -8  3.0 0.0 
4 -2 1 -8  3.0 NaN 
5 6 -6 -3  5.0 1.0 
6 0 6 3  5.0 NaN 
7 -6 -7 -6  5.0 NaN 
8 7 -2 -5  8.0 0.0 
9 0 -1 5  8.0 NaN 
10 5 8 -3  10.0 0.0 
11 -2 -2 1  10.0 NaN 
12 3 4 2  12.0 1.0 
13 -5 1 -9  12.0 NaN 
14 -7 2 6  12.0 NaN 
15 1 -10 6  15.0 0.0 
16 1 -8 6  16.0 0.0 
17 -4 -9 -8  16.0 NaN 
18 -9 4 6  16.0 NaN 
19 5 -6 2  19.0 0.0 
20 5 7 -1  20.0 0.0 
21 2 -2 -3  21.0 0.0 
22 -6 -10 -2  21.0 NaN 
23 -7 -9 3  21.0 NaN 
24 -8 7 -8  21.0 NaN 
25 3 -3 6  25.0 0.0 
26 1 -6 -3  26.0 1.0 
27 -4 6 -1  26.0 NaN 
28 6 -4 9  28.0 0.0 
29 -8 2 1  28.0 NaN 
.. .. .. ..  ... ... 
70 -5 7 -6  68.0 NaN 
71 6 6 -7  71.0 1.0 
72 -3 0 3  71.0 NaN 
73 -5 3 2  71.0 NaN 
74 -6 -8 8  71.0 NaN 
75 1 0 -4  75.0 0.0 
76 7 -9 -5  76.0 0.0 
77 1 0 -1  77.0 0.0 
78 5 9 -2  78.0 0.0 
79 -8 -9 -6  78.0 NaN 
80 2 -3 3  80.0 3.0 
81 -7 -5 8  80.0 NaN 
82 -4 -5 -7  80.0 NaN 
83 -3 5 -6  80.0 NaN 
84 -5 1 4  80.0 NaN 
85 -1 6 7  80.0 NaN 
86 -7 4 4  80.0 NaN 
87 -7 -4 -1  80.0 NaN 
88 -2 -8 2  80.0 NaN 
89 4 6 4  89.0 0.0 
90 4 -10 -8  90.0 0.0 
91 -7 -9 5  90.0 NaN 
92 5 3 -1  92.0 0.0 
93 6 6 6  93.0 0.0 
94 9 -2 0  94.0 1.0 
95 -1 5 5  94.0 NaN 
96 2 8 -9  96.0 2.0 
97 -6 7 -4  96.0 NaN 
98 -1 7 -8  96.0 NaN 
99 -4 0 -1  96.0 NaN 
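Note that the code above counts rows whose Z is greater than the trigger row's Z, whereas the question asks for `Z <=`. Below is a minimal sketch of a vectorized variant for the stated requirement, assuming the same `X`/`Y`/`Z` column names. It swaps the `ffill` + `.apply()` approach for a cumulative-sum segment id and explicitly excludes the X > 0 row itself, since that row trivially satisfies the Z condition. The function name `count_until_next_trigger` and the sample values are hypothetical.

```python
import pandas as pd


def count_until_next_trigger(df: pd.DataFrame) -> pd.Series:
    """For every row with X > 0, count the rows that follow it, up to (not
    including) the next X > 0 row, where Y > 0 and Z <= the trigger row's Z.
    Non-trigger rows get NaN once the result is assigned back to df."""
    trigger = df['X'] > 0
    # Segment id: 0 before the first trigger, then 1, 2, ... per trigger row
    segment = trigger.cumsum()
    # Z of the most recent trigger row, carried forward (NaN before the first)
    z_ref = df['Z'].where(trigger).ffill()
    # Both conditions, excluding the trigger row itself; NaN z_ref compares False
    hit = ~trigger & (df['Y'] > 0) & (df['Z'] <= z_ref)
    counts = hit.groupby(segment).sum()
    # One value per trigger row, indexed by that row's label
    return segment[trigger].map(counts)


# Hypothetical sample values
df = pd.DataFrame({'X': [-1, 3, -2, -4, 5, -1, -3],
                   'Y': [2, 1, 4, 3, -1, 2, 6],
                   'Z': [0, 2, 1, 5, 0, 0, -1]})
df['count'] = count_until_next_trigger(df)
print(df)
```

Here the X > 0 row at index 1 gets a count of 1 (row 2 qualifies, row 3 fails `Z <= 2`, and row 1 itself is excluded), and the row at index 4 gets 2. Assigning the returned Series to a column aligns on the index, so every other row is NaN, matching the layout of the output above.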

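Stefan, thanks, that's very helpful. But I'm trying to count the number of times both conditions (data.Y > 0 and data.Z <= row.Z) are met after an instance of X > 0 but before the next row where X > 0. Thanks again! –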


That's what I believe the first example does. – Stefan


I've updated the answer to leave only the first example, which solves your problem. Please let me know if you have any questions. – Stefan