2017-09-06 139 views

Pandas: finding the most frequent values in one column based on another column

I have data like this:

                 end station name   User Type
0              Carmine St & 6 Ave  Subscriber
1      South End Ave & Liberty St  Subscriber
2   Christopher St & Greenwich St  Subscriber
3        Lafayette St & Jersey St  Subscriber
4                W 52 St & 11 Ave  Subscriber
5         E 53 St & Lexington Ave  Subscriber
6                 W 17 St & 8 Ave  Subscriber
7             St Marks Pl & 2 Ave  Subscriber
8   Washington St & Gansevoort St    Customer
9          Barclay St & Church St  Subscriber
10  Washington St & Gansevoort St    Customer
11        E 37 St & Lexington Ave  Subscriber
12                E 51 St & 1 Ave  Subscriber
13                W 33 St & 7 Ave  Subscriber
14            Pike St & Monroe St  Subscriber
15           E 24 St & Park Ave S  Subscriber
16                1 Ave & E 15 St  Subscriber
17             Broadway & W 32 St    Customer
18                E 39 St & 3 Ave    Customer
19               W 59 St & 10 Ave  Subscriber
20        Centre St & Chambers St  Subscriber
21                9 Ave & W 45 St    Customer
22                8 Ave & W 33 St  Subscriber
23        Suffolk St & Stanton St  Subscriber
24               W 47 St & 10 Ave  Subscriber
25                W 33 St & 7 Ave  Subscriber
26                8 Ave & W 33 St  Subscriber
27                1 Ave & E 15 St    Customer
28                8 Ave & W 33 St  Subscriber
29                W 33 St & 7 Ave  Subscriber
..                            ...         ...

I want to find the five (5) most popular end stations for users of type Customer, in descending order of popularity.

Here is my code:

import pandas as pd 
rides = pd.read_csv(csv_file_path, low_memory=False, parse_dates=True) 
five_popular_station_end_trip = rides['end station name'].value_counts().head() 

I can find the most popular stations from a single column, but I have no idea how to find them based on the value of another column.

Possible duplicate of [Select rows from a DataFrame based on values in a column in pandas](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas)

How is my solution working? – jezrael

Answer


I think you need to filter first with boolean indexing:

df1 = rides[rides['User Type'] == 'Customer'] 
five_popular_station_end_trip = df1['end station name'].value_counts().head() 
print (five_popular_station_end_trip) 
Washington St & Gansevoort St    2
Broadway & W 32 St               1
1 Ave & E 15 St                  1
E 39 St & 3 Ave                  1
9 Ave & W 45 St                  1
Name: end station name, dtype: int64
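
To make the filtering step easier to follow, here is a minimal, self-contained sketch. The small `rides` frame is hypothetical sample data built from a few rows of the question (not the full CSV), and the names `is_customer` and `top_customer_stations` are mine. It shows that the comparison produces a boolean mask, which `.loc` then uses to keep only the matching rows of one column before counting:

```python
import pandas as pd

# Hypothetical sample: a handful of rows copied from the question's data.
rides = pd.DataFrame({
    "end station name": [
        "Washington St & Gansevoort St",
        "Barclay St & Church St",
        "Washington St & Gansevoort St",
        "Broadway & W 32 St",
        "E 39 St & 3 Ave",
        "W 33 St & 7 Ave",
    ],
    "User Type": [
        "Customer", "Subscriber", "Customer",
        "Customer", "Customer", "Subscriber",
    ],
})

# Boolean mask: True for every row whose user type is Customer.
is_customer = rides["User Type"] == "Customer"

# Select only the masked rows of the station column, count each station,
# and keep the five most frequent (value_counts sorts in descending order).
top_customer_stations = (
    rides.loc[is_customer, "end station name"].value_counts().head(5)
)
print(top_customer_stations)
```

Passing `5` to `head` is the same as the default, but it makes the "top five" intent explicit.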

But if you need the top five for every user type:

df = rides.groupby('User Type')['end station name'] \ 
      .apply(lambda x: x.value_counts().head()) \ 
      .reset_index(name='count') \ 
      .rename(columns={'level_1':'end station name'}) 
print (df) 
    User Type               end station name  count
0    Customer  Washington St & Gansevoort St      2
1    Customer             Broadway & W 32 St      1
2    Customer                1 Ave & E 15 St      1
3    Customer                E 39 St & 3 Ave      1
4    Customer                9 Ave & W 45 St      1
5  Subscriber                8 Ave & W 33 St      3
6  Subscriber                W 33 St & 7 Ave      3
7  Subscriber               W 59 St & 10 Ave      1
8  Subscriber           E 24 St & Park Ave S      1
9  Subscriber                W 17 St & 8 Ave      1
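
As an alternative to the `apply` with a lambda, the same per-category result could probably be obtained by counting with `groupby(...).size()` and then taking the first rows of each group. This is only a sketch on hypothetical sample data (the `rides` frame below is a made-up subset, and `counts` and `top_per_type` are names I chose), not the code from the answer:

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's DataFrame.
rides = pd.DataFrame({
    "end station name": [
        "Washington St & Gansevoort St", "Washington St & Gansevoort St",
        "Broadway & W 32 St", "1 Ave & E 15 St",
        "8 Ave & W 33 St", "8 Ave & W 33 St", "8 Ave & W 33 St",
        "W 33 St & 7 Ave", "W 33 St & 7 Ave", "1 Ave & E 15 St",
    ],
    "User Type": [
        "Customer", "Customer", "Customer", "Customer",
        "Subscriber", "Subscriber", "Subscriber",
        "Subscriber", "Subscriber", "Subscriber",
    ],
})

# Count trips per (user type, station) pair, then order each user type's
# stations from most to least frequent.
counts = (
    rides.groupby(["User Type", "end station name"])
         .size()
         .reset_index(name="count")
         .sort_values(["User Type", "count"], ascending=[True, False])
)

# head(5) on the grouped frame keeps the first five rows of every user type.
top_per_type = counts.groupby("User Type").head(5).reset_index(drop=True)
print(top_per_type)
```

Stations with equal counts may come out in a different order than with `value_counts`, so treat the order within ties as unspecified.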