2017-10-20 89 views

I'm practicing with the babynames data in R. What I want is max([column]) where name = (each name in the name column), per year.

total_n <- babynames %>%
    mutate(name_gender = paste(name, sex)) %>%
    group_by(year) %>%
    summarise(total_n = sum(n, na.rm = TRUE)) %>%
    arrange(total_n)

bn <- inner_join(babynames, total_n, by = "year")

df <- bn %>%
    mutate(pct_of_names = n / total_n) %>%
    group_by(name, year) %>%
    summarise(pct = sum(pct_of_names))

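The resulting data frame has, for each name, the years it appears in and the pct associated with each year. I'm stuck on getting, for each name, the year with the highest pct. How do I do that?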

Answer

It's simple once you know where the babynames data comes from. This is everything you need:

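library(dplyr)
library(babynames)

# Total number of babies recorded per year
# (name_gender is created here but not used in the later steps)
total_n <- babynames %>%
    mutate(name_gender = paste(name, sex)) %>%
    group_by(year) %>%
    summarise(total_n = sum(n, na.rm = TRUE)) %>%
    arrange(total_n)

# Attach each year's total to every row of the original data
bn <- inner_join(babynames, total_n, by = "year")

# Share of each name within its year, summed over both sexes
df <- bn %>%
    mutate(pct_of_names = n / total_n) %>%
    group_by(name, year) %>%
    summarise(pct = sum(pct_of_names))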

You were missing this last step:

df %>% 
    group_by(name) %>% 
    filter(pct == max(pct)) 

# A tibble: 95,025 x 3
# Groups:   name [95,025]
        name  year          pct
       <chr> <dbl>        <dbl>
 1     Aaban  2014 4.338256e-06
 2     Aabha  2014 2.440269e-06
 3     Aabid  2003 1.316094e-06
 4 Aabriella  2015 1.363073e-06
 5      Aada  2015 1.363073e-06
 6     Aadam  2015 5.997520e-06
 7     Aadan  2009 6.031433e-06
 8   Aadarsh  2014 4.880538e-06
 9     Aaden  2009 3.335645e-04
10    Aadesh  2011 1.370356e-06
# ... with 95,015 more rows
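
To check the pattern by hand, here is a minimal sketch on a made-up data frame. The names and pct values in toy are invented for illustration and are not taken from babynames. One thing it shows: filter(pct == max(pct)) keeps every row tied for the maximum, so a name can come back with more than one year. If exactly one row per name is wanted, slice_max() with with_ties = FALSE (assuming dplyr 1.0.0 or later) is one option.

```r
library(dplyr)

# Hypothetical toy data: invented values, not from babynames
toy <- data.frame(
  name = c("Ann", "Ann", "Ann", "Bob", "Bob"),
  year = c(2000, 2001, 2002, 2000, 2001),
  pct  = c(0.10, 0.30, 0.30, 0.05, 0.02)
)

# Same pattern as the answer: keep the row(s) with the highest pct per name.
# "Ann" has a tie (2001 and 2002), so both rows are kept.
peak <- toy %>%
  group_by(name) %>%
  filter(pct == max(pct)) %>%
  ungroup()

# Alternative that returns a single row per name even when there is a tie
peak_one <- toy %>%
  group_by(name) %>%
  slice_max(pct, n = 1, with_ties = FALSE) %>%
  ungroup()

print(peak)
print(peak_one)
```

Which of the two behaviours is preferable depends on whether tied years are meaningful for the analysis.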

group_by and filter are your friends.
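
As a side note, the yearly totals do not strictly need a separate summarise and inner_join: a grouped mutate can compute each row's share within its year directly. The sketch below is one possible way to write the whole pipeline, shown on a hypothetical stand-in data frame (toy_names, with invented counts) that has the same columns as babynames, so it runs without the babynames package. The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in with the same columns as babynames; counts are invented
toy_names <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  sex  = c("F", "M", "F", "F", "M", "F"),
  name = c("Ann", "Ann", "Eve", "Ann", "Ann", "Eve"),
  n    = c(30, 10, 60, 10, 10, 80)
)

peak_year <- toy_names %>%
  group_by(year) %>%
  # Share of each row within its year; replaces the summarise + inner_join
  mutate(pct_of_names = n / sum(n, na.rm = TRUE)) %>%
  group_by(name, year) %>%
  # Combine the shares of both sexes for the same name and year
  summarise(pct = sum(pct_of_names), .groups = "drop") %>%
  group_by(name) %>%
  # Keep the year(s) in which each name had its highest share
  filter(pct == max(pct)) %>%
  ungroup()

print(peak_year)
```

Swapping toy_names for babynames should give the same kind of result as the two-step version above, since both divide n by the per-year total.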

omg, I can't believe it was that simple, I was thinking about loops. Thanks!

Also, feel free to accept the answer! – Steven