2017-09-24 101 views
2
select df.id, count(distinct airport) as num
from df
group by df.id
having count(distinct airport) > 3

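I want to do the equivalent of the above in Python pandas, that is, a GROUP BY with a HAVING clause. I have tried various combinations of filter, nunique, and agg, and none of them worked. Any suggestions?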

For example:

df 
id  airport 
1  lax 
1  ohare 
2  phl 
3  lax 
2  mdw 
2  lax 
2  sfw 
2  tpe 

So the result I would like to get is:

id  num 
2  5 

Answers

1

You can use SeriesGroupBy.nunique together with boolean indexing or query:

s = df.groupby('id')['airport'].nunique()
print(s)
id
1    2
2    5
3    1
Name: airport, dtype: int64

df1 = s[s > 3].reset_index()
print(df1)
   id  airport
0   2        5

Or:

df1 = df.groupby('id')['airport'].nunique().reset_index().query('airport > 3')
print(df1)
   id  airport
1   2        5
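
Both variants above leave the count in a column named airport, whereas the SQL in the question aliases it as num. A minimal, self-contained sketch that rebuilds the sample data from the question and renames the result; the variable names num and result are only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 2, 2, 2],
    "airport": ["lax", "ohare", "phl", "lax", "mdw", "lax", "sfw", "tpe"],
})

# COUNT(DISTINCT airport) per id, renamed to match the SQL alias "num"
num = df.groupby("id")["airport"].nunique().rename("num")

# HAVING num > 3, then turn the id index back into a regular column
result = num[num > 3].reset_index()
print(result)
```

With the sample data this should leave a single row for id 2.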
0

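Use groupby and count (this matches the SQL only if no id repeats an airport, because count does not deduplicate):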

df_new = df.groupby('id').count() 

Then filter:

df_new = df_new[(df_new['airport'] > 3)]
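
On the sample data, count and nunique give the same numbers because no id lists the same airport twice. A small sketch of where they diverge, using made-up data (df_dup is hypothetical, not from the question) in which id 1 has four rows for a single airport:

```python
import pandas as pd

# Hypothetical data: id 1 repeats one airport, id 2 has four different ones
df_dup = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "airport": ["lax", "lax", "lax", "lax", "phl", "mdw", "sfw", "tpe"],
})

counts = df_dup.groupby("id")["airport"].count()     # non-null rows per id
uniques = df_dup.groupby("id")["airport"].nunique()  # distinct airports per id

by_count = counts[counts > 3].index.tolist()
by_nunique = uniques[uniques > 3].index.tolist()

print(by_count)    # ids with more than 3 rows
print(by_nunique)  # ids with more than 3 distinct airports
```

Here the count-based filter keeps both ids, while the nunique-based filter keeps only id 2, which is what count(distinct ...) in the SQL would return.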