2017-10-10 117 views

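The title may be a bit confusing, so here is an example of filtering a DataFrame so that only the row with the latest timestamp is kept for each id.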

Source:

id |  timestamp 
1 | 2015-12-02 00:00:00 
1 | 2015-12-03 00:00:00 <--- latest for id 1 
2 | 2015-12-02 00:00:00 
2 | 2015-12-04 00:00:00 
2 | 2015-12-06 00:00:00 <--- latest for id 2 

Desired result:

id |  timestamp 
1 | 2015-12-03 00:00:00 
2 | 2015-12-06 00:00:00 

df.groupby('id').tail(1)? – jezrael

Answer

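Use nth to take the last row of each group (this relies on the rows already being ordered by timestamp within each id):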

In [599]: df.groupby('id', as_index=False).nth(-1) 
Out[599]: 
    id   timestamp 
1 1 2015-12-03 00:00:00 
4 2 2015-12-06 00:00:00 

Ideally, use max, since you need the latest date:

In [601]: df.groupby('id', as_index=False).max() 
Out[601]: 
    id   timestamp 
0 1 2015-12-03 00:00:00 
1 2 2015-12-06 00:00:00 
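
The max approach can be sketched as a self-contained script. It assumes the column names id and timestamp from the question and that timestamp is parsed as datetime. The value column is hypothetical and is not part of the question's data; it is included only to show that max() is computed column by column, so the result need not correspond to a single original row. If whole rows are wanted, selecting by idxmax on timestamp is one possible alternative.

```python
import pandas as pd

# Data from the question, plus a hypothetical "value" column for illustration.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime([
        "2015-12-02", "2015-12-03",
        "2015-12-02", "2015-12-04", "2015-12-06",
    ]),
    "value": [9, 1, 8, 7, 2],  # hypothetical, not in the original question
})

# max() is evaluated per column within each group, so "value" here is the
# group's largest value, not the value on the row with the latest timestamp.
per_column_max = df.groupby("id", as_index=False).max()

# idxmax() returns the index label of the latest timestamp for each id;
# df.loc then pulls those complete rows.
latest_rows = df.loc[df.groupby("id")["timestamp"].idxmax()]

print(per_column_max)
print(latest_rows)
```

With only the id and timestamp columns, as in the question, both results contain the same timestamps.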

Also, tail, as mentioned in the comments:

In [602]: df.groupby('id').tail(1) 
Out[602]: 
    id   timestamp 
1 1 2015-12-03 00:00:00 
4 2 2015-12-06 00:00:00 
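
Both nth(-1) and tail(1) return the last row of each group in the frame's current order, so they give the latest timestamp only when the rows are already sorted. A minimal sketch, assuming the same id and timestamp columns but with the rows deliberately shuffled, sorts by timestamp first:

```python
import pandas as pd

# Same data as in the question, but in shuffled row order.
df = pd.DataFrame({
    "id": [2, 1, 2, 1, 2],
    "timestamp": pd.to_datetime([
        "2015-12-06", "2015-12-03",
        "2015-12-02", "2015-12-02", "2015-12-04",
    ]),
})

# Sort by timestamp so that the last row of each group is the latest one,
# then keep that last row per id. The original index labels are preserved.
latest = df.sort_values("timestamp").groupby("id").tail(1)

print(latest)
```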

max() takes the maximum of each column, is that correct? – kbball