Delete all rows before the first occurrence of a value

I have a DataFrame like this:

Year ID Count 
1997 1 0 
1998 2 0 
1999 3 1 
2000 4 0 
2001 5 1

I want to find the first occurrence of 1 in Count and delete all rows before it, which would give me:

Year ID Count 
1999 3 1 
2000 4 0 
2001 5 1

I can delete all rows AFTER the first occurrence like this:

df=df.loc[: df[(df['Count'] == 1)].index[0], :]

But I can't seem to follow the slicing logic to make it do the opposite.

Source

2016-08-01 Stefano Potter

Here is how I would do it:

df[(df.Count == 1).idxmax():]

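df.Count == 1 returns a boolean Series. idxmax() identifies the index of the maximum value. I know the maximum will be True, and when there is more than one True, it returns the position of the first one found. That is exactly what you want. In this example, that value is 2. Finally, I slice everything from 2 onward with df[2:]. In the answer above I put all of this on one line.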
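The steps described above can be sketched one at a time. This is a minimal sketch that rebuilds the question's sample frame; the names mask, first and result are illustrative and not from the original answer. It slices with .loc so the label returned by idxmax() is used as a label, and it adds a guard (my assumption about the desired behaviour) because idxmax() returns the first label when no row matches, which would otherwise keep the whole frame.

```python
import pandas as pd

# Rebuild the sample frame from the question (default RangeIndex 0..4).
df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})

mask = df.Count == 1      # boolean Series: False, False, True, False, True
first = mask.idxmax()     # index label of the first True

# Guard (an addition, not part of the original one-liner):
# if nothing matches, return an empty frame instead of everything.
result = df.loc[first:] if mask.any() else df.iloc[0:0]
print(result)
```

On a default integer index, df[first:] and df.loc[first:] select the same rows; the .loc form also works when the index holds other kinds of labels.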

Source

2016-08-01 20:07:55 piRSquared

You can use the cumsum() method:

In [13]: df[(df.Count == 1).cumsum() > 0]
Out[13]:
   Year  ID  Count
2  1999   3      1
3  2000   4      0
4  2001   5      1

Explanation:

In [14]: (df.Count == 1).cumsum()
Out[14]:
0    0
1    0
2    1
3    1
4    2
Name: Count, dtype: int32
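The running total stays at 0 until the first match and is positive from then on, so comparing it with 0 gives a mask that is True from the first match to the end. A short sketch on the question's sample data; the names running, keep and no_match are illustrative only. Because this is a boolean mask rather than a slice, it gives an empty frame when nothing matches.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [1997, 1998, 1999, 2000, 2001],
    "ID": [1, 2, 3, 4, 5],
    "Count": [0, 0, 1, 0, 1],
})

running = (df.Count == 1).cumsum()   # 0, 0, 1, 1, 2
keep = running > 0                   # False, False, True, True, True
result = df[keep]

# No row has Count == 7, so the mask is all False and the result is empty.
no_match = df[(df.Count == 7).cumsum() > 0]
```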

Timing against a 500K-row DF:

In [18]: df = pd.concat([df] * 10**5, ignore_index=True) 

In [19]: df.shape 
Out[19]: (500000, 3) 

In [20]: %timeit df[(df.Count == 1).idxmax():] 
100 loops, best of 3: 3.7 ms per loop 

In [21]: %timeit df[(df.Count == 1).cumsum() > 0] 
100 loops, best of 3: 16.4 ms per loop 

In [22]: %timeit df.loc[df[(df['Count'] == 1)].index[0]:, :] 
The slowest run took 4.01 times longer than the fastest. This could mean that an intermediate result is being cached. 
100 loops, best of 3: 7.02 ms per loop

Conclusion: @piRSquared's idxmax() solution is a clear winner...

Source

2016-08-01 20:04:19 MaxU

Just slice the other way:

If idx is your index, do:

df.loc[idx:]

instead of

df.loc[:idx]

Which means:

df.loc[df[(df['Count'] == 1)].index[0]:, :]
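Both slices go through .loc, which slices by label and includes both endpoints, so the matching row appears in either result. A minimal sketch, assuming a hypothetical string index to make the label-based behaviour visible (the question's frame uses the default integer index); note that .index[0] raises an IndexError when no row matches.

```python
import pandas as pd

df = pd.DataFrame(
    {"Year": [1997, 1998, 1999, 2000, 2001], "Count": [0, 0, 1, 0, 1]},
    index=list("abcde"),  # hypothetical labels, for illustration only
)

idx = df[df["Count"] == 1].index[0]   # label of the first matching row

up_to_match = df.loc[:idx]     # rows a, b, c (what the question already had)
from_match_on = df.loc[idx:]   # rows c, d, e (the opposite direction)
```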

Source

2016-08-01 20:14:08

Using np.where (with numpy imported as np):

df[np.where(df['Count']==1)[0][0]:]
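np.where returns positional offsets rather than index labels, so a version that slices with .iloc makes that explicit. This is a sketch under assumptions: the non-default index values and the names positions and result are illustrative, and the empty-result guard is an addition, since taking [0] of an empty array would raise an IndexError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Year": [1997, 1998, 1999, 2000, 2001], "Count": [0, 0, 1, 0, 1]},
    index=[10, 11, 12, 13, 14],  # hypothetical labels that differ from positions
)

# np.where on a single condition returns a tuple; element 0 holds the positions.
positions = np.where(df["Count"] == 1)[0]

# Slice by position from the first match; fall back to an empty frame if none.
result = df.iloc[positions[0]:] if positions.size else df.iloc[0:0]
```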

Timings

Timings were run on a larger version of the DataFrame:

df = pd.concat([df]*10**5, ignore_index=True)

Results:

%timeit df[np.where(df['Count']==1)[0][0]:] 
100 loops, best of 3: 2.74 ms per loop 

%timeit df[(df.Count == 1).idxmax():] 
100 loops, best of 3: 6.18 ms per loop 

%timeit df[(df.Count == 1).cumsum() > 0] 
10 loops, best of 3: 26.6 ms per loop 

%timeit df.loc[df[(df['Count'] == 1)].index[0]:, :] 
100 loops, best of 3: 11.2 ms per loop

Source

2016-08-01 20:24:03 root
