2016-12-15
1

`nth` disrupts the sorting of a pandas DataFrame

I have the following DataFrame census_df, which contains population data for the United States:

  STNAME    CTYNAME CENSUS2010POP 
0  Alabama  Autauga County   54571 
1  Alabama  Baldwin County   182265 
2  Alabama  Barbour County   27457 
3  Alabama   Bibb County   22915 
4  Alabama  Blount County   57322 
5  Alabama  Bullock County   10914 
6  Alabama  Butler County   20947 
7  Alabama  Calhoun County   118572 
8  Alabama  Chambers County   34215 
9  Alabama  Cherokee County   25989 
10  Alabama  Chilton County   43643 
11  Alabama  Choctaw County   13859 
12  Alabama  Clarke County   25833 
13  Alabama   Clay County   13932 
14  Alabama  Cleburne County   14972 
15  Alabama  Coffee County   49948 
16  Alabama  Colbert County   54428 
17  Alabama  Conecuh County   13228 
18  Alabama  Coosa County   11539 
19  Alabama Covington County   37765 
20  Alabama  Crenshaw County   13906 
21  Alabama  Cullman County   80406 
22  Alabama   Dale County   50251 
23  Alabama  Dallas County   43820 
24  Alabama  DeKalb County   71109 
25  Alabama  Elmore County   79303 
26  Alabama  Escambia County   38319 
27  Alabama  Etowah County   104430 
28  Alabama  Fayette County   17241 
29  Alabama  Franklin County   31704 
...   ...     ...   ... 
3112 Wisconsin  Washburn County   15911 
3113 Wisconsin Washington County   131887 
3114 Wisconsin  Waukesha County   389891 
3115 Wisconsin  Waupaca County   52410 
3116 Wisconsin  Waushara County   24496 
3117 Wisconsin Winnebago County   166994 
3118 Wisconsin   Wood County   74749 
3119 Wyoming  Albany County   36299 
3120 Wyoming  Big Horn County   11668 
3121 Wyoming  Campbell County   46133 
3122 Wyoming  Carbon County   15885 
3123 Wyoming  Converse County   13833 
3124 Wyoming  Crook County   7083 
3125 Wyoming  Fremont County   4
3126 Wyoming  Goshen County   13249 
3127 Wyoming Hot Springs County   4812 
3128 Wyoming  Johnson County   8569 
3129 Wyoming  Laramie County   91738 
3130 Wyoming  Lincoln County   18106 
3131 Wyoming  Natrona County   75450 
3132 Wyoming  Niobrara County   2484 
3133 Wyoming   Park County   28205 
3134 Wyoming  Platte County   8667 
3135 Wyoming  Sheridan County   29116 
3136 Wyoming  Sublette County   10247 
3137 Wyoming Sweetwater County   43806 
3138 Wyoming  Teton County   21294 
3139 Wyoming  Uinta County   21118 
3140 Wyoming  Washakie County   8533 
3141 Wyoming  Weston County   7208 

[3142 rows x 3 columns] 

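The columns hold the state name, the county name, and the population. I am trying to find the three most populous counties in each state, and then sum their populations so that I end up with one number per state. To get the most populous counties in each state, I tried the following: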

# Sort all the counties according to their population
census_df = census_df.sort_values(by='CENSUS2010POP', ascending=False).reset_index(drop=True)

# Group counties according to their states and choose the first 3 members from each state
group = census_df.groupby('STNAME').nth([0, 1, 2])
print(group.tail())

This gives me the following (I am showing only the last few values):

  CENSUS2010POP   CTYNAME 
STNAME         
Wisconsin   488073  Dane County 
Wisconsin   389891 Waukesha County 
Wyoming   91738 Laramie County 
Wyoming   46133 Campbell County 
Wyoming   75450 Natrona County 

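As you can see, for the last state, Wyoming, the ordering by population has been disturbed after using nth. The same thing happens for many other states. Can someone tell me what is going on, and how to keep the sorted order when selecting the first three rows?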

Answers

1

I believe you want to do:

group = census_df.groupby('STNAME').head(3) 

This returns the first 3 rows of each group.
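
A minimal sketch of that behaviour, using a hypothetical toy frame (the state and county labels and the populations are invented, not census figures). It assumes the frame is first sorted by state and then by population, largest first, so each state's rows are contiguous before head(3) is applied:

```python
import pandas as pd

# Hypothetical toy data, not real census figures.
toy = pd.DataFrame({
    'STNAME': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'CTYNAME': ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4'],
    'CENSUS2010POP': [10, 40, 30, 20, 35, 5, 25, 15],
})

# Sort by state, then by population (largest first) within each state.
ordered = toy.sort_values(by=['STNAME', 'CENSUS2010POP'],
                          ascending=[True, False]).reset_index(drop=True)

# head(3) keeps the first three rows of every group, in the frame's row order.
top3 = ordered.groupby('STNAME').head(3)
print(top3)
```

Because head(3) keeps the rows in the order they already have in the frame, the sort step decides whether each state's counties come out together.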

To get the sum for each state, just run the aggregate function with sum on a groupby of your group:

summed = group.groupby('STNAME').aggregate(sum) 
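
If only the population total is wanted, one possible variant is to select the population column before summing, so the county-name column is not aggregated as well. This is a sketch, not the answerer's code; the group frame below is rebuilt by hand from the Wyoming and Alabama rows printed elsewhere in this thread, which is an assumption about what group contains:

```python
import pandas as pd

# Rows copied from the excerpt shown in this thread (assumed to be the
# three selected counties per state).
group = pd.DataFrame({
    'STNAME': ['Wyoming', 'Wyoming', 'Wyoming',
               'Alabama', 'Alabama', 'Alabama'],
    'CTYNAME': ['Laramie County', 'Natrona County', 'Campbell County',
                'Baldwin County', 'Calhoun County', 'Etowah County'],
    'CENSUS2010POP': [91738, 75450, 46133, 182265, 118572, 104430],
})

# Select the numeric column first, then sum per state.
summed = group.groupby('STNAME')['CENSUS2010POP'].sum()
print(summed)
```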

This disrupts the 'groupby'. It gives me 3 results from each state, but the top three counties of all the states are mixed together. – Peaceful

In your sort, run 'census_df = census_df.sort_values(by = ['STNAME','CENSUS2010POP'],ascending = False).reset_index(drop = True)'. Then run 'group = census_df.groupby('STNAME').head(3)' –

Yes, this did work! Now, how do I sum these top three values and get a single value for each state? – Peaceful

3

You can use groupby with SeriesGroupBy.nlargest, which is faster than .sort_values(ascending=False).head(n):

print (census_df.set_index('CTYNAME') 
       .groupby('STNAME')['CENSUS2010POP'] 
       .nlargest(3) 
       .sort_index(ascending=False) 
       .reset_index()) 

     STNAME   CTYNAME CENSUS2010POP 
0 Wyoming  Natrona County   75450 
1 Wyoming  Laramie County   91738 
2 Wyoming Campbell County   46133 
3 Wisconsin Winnebago County   166994 
4 Wisconsin Waukesha County   389891 
5 Wisconsin Washington County   131887 
6 Alabama  Etowah County   104430 
7 Alabama  Calhoun County   118572 
8 Alabama  Baldwin County   182265 

Sum of the top 3 values:

print (census_df.set_index('CTYNAME') 
       .groupby('STNAME')['CENSUS2010POP'] 
       .apply(lambda x: x.nlargest(3).sum()) 
       .sort_index(ascending=False) 
       .reset_index()) 

     STNAME CENSUS2010POP 
0 Wyoming   213321 
1 Wisconsin   688772 
2 Alabama   405267 
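
As a quick sanity check, here is a small sketch comparing the nlargest route with the sort-then-head route on a hypothetical toy frame (labels and numbers are invented for illustration). Under these assumptions both should give the same per-state totals:

```python
import pandas as pd

# Hypothetical toy data, not real census figures.
toy = pd.DataFrame({
    'STNAME': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'CTYNAME': ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4'],
    'CENSUS2010POP': [10, 40, 30, 20, 35, 5, 25, 15],
})

# Route 1: take the 3 largest values per state and sum them.
via_nlargest = (toy.groupby('STNAME')['CENSUS2010POP']
                   .apply(lambda s: s.nlargest(3).sum()))

# Route 2: sort by population, keep the first 3 rows per state, then sum.
via_sort = (toy.sort_values('CENSUS2010POP', ascending=False)
               .groupby('STNAME').head(3)
               .groupby('STNAME')['CENSUS2010POP'].sum())

print(via_nlargest)
print(via_sort)
```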

I added an answer for the sum of the 3 highest values, you can check it. Thanks. – jezrael

That was helpful. Thanks, +1 – Peaceful