How to get sums based on two columns in R

I have a data frame (df) with 5 columns: Area.Name, Age, Total, Rural and Urban. I need the sum of Total grouped by Area.Name and then by two Age categories, 0-2 and 3-4.

df <- 
structure(list(Area.Name = structure(c(6L, 6L, 6L, 6L, 6L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("District - Central (06)", "District - East (04)", 
"District - New Delhi (05)", "District - North (02)", "District - North East (03)", 
"District - North West (01)", "District - South (09)", "District - South West (08)", 
"District - West (07)", "NCT OF DELHI (07)"), class = "factor"), 
    Age = c(0L, 1L, 2L, 3L, 4L, 0L, 1L, 2L, 3L, 4L, 5L), Total = c(56131L, 
    58644L, 63835L, 63859L, 64945L, 24556L, 27076L, 27234L, 27604L, 
    27725L, 30780L), Rural = c(3589L, 3757L, 4200L, 4102L, 4223L, 
    52L, 56L, 61L, 47L, 67L, 53L), Urban = c(52542L, 54887L, 
    59635L, 59757L, 60722L, 24504L, 27020L, 27173L, 27557L, 27658L, 
    30727L)), .Names = c("Area.Name", "Age", "Total", "Rural", 
"Urban"), row.names = c(102L, 103L, 104L, 105L, 106L, 405L, 406L, 
407L, 408L, 409L, 410L), class = "data.frame")

My expected output is:

Area.Name                   Age  Total
District - North West (01)  0-2  178610
District - North West (01)  3-4  128804
District - East (04)        0-2  78866
District - East (04)        3-4  55329

I tried using the dplyr package, but I am not very proficient with it, so I got stuck here:

df %>% group_by(Area.Name) %>% summarize(Age = Age[0],Tot = sum(Total))

The problem is that I cannot give a range for Age here.
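As a side note on the attempt above, here is a minimal sketch of what `Age[0]` evaluates to. R vectors are indexed from 1, so index 0 selects nothing rather than the first element. The vector below is a hypothetical stand-in for one group's Age values.

```r
# Hypothetical stand-in for the Age values of one Area.Name group
Age <- c(0L, 1L, 2L, 3L, 4L)

first <- Age[1]   # first element, because R indexes from 1
empty <- Age[0]   # zero-length integer vector, not the first element

print(first)
print(length(empty))
```

Even with `Age[1]`, the result would be a single age per area rather than a range, which is why the ages need to be binned into categories before grouping.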

Source

2016-07-05 rar

I am trying `df %>% group_by(Area.Name) %>% summarize(Age = Age[0], Tot = sum(Total))`, but the problem is that I cannot give a range for Age here. – rar

Here is one approach, with cut() applied to Age inline in the group_by() call:

library(dplyr) 

df %>% 
    group_by(Area.Name, Age = cut(Age, breaks = c(0, 2, 4, +Inf), 
           labels = c("0-2", "3-4", "4+"), include.lowest = TRUE)) %>% 
    summarise(Total = sum(Total)) 

#                    Area.Name    Age  Total
#                       <fctr> <fctr>  <int>
# 1       District - East (04)    0-2  78866
# 2       District - East (04)    3-4  55329
# 3       District - East (04)     4+  30780
# 4 District - North West (01)    0-2 178610
# 5 District - North West (01)    3-4 128804

To keep only the desired groups, you can add %>% filter(Age %in% c("0-2", "3-4")).
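Putting the grouping and the filter together, a self-contained sketch might look like the following. It rebuilds only the columns needed from the question's data (row names and unused factor levels are omitted), and it assumes dplyr 1.0 or later for the `.groups` argument of `summarise()`.

```r
library(dplyr)

# Same Area.Name, Age and Total values as the question's df, rebuilt compactly
df <- data.frame(
  Area.Name = rep(c("District - North West (01)", "District - East (04)"),
                  times = c(5, 6)),
  Age   = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

result <- df %>%
  # Bin Age into [0,2], (2,4] and (4,Inf] while grouping
  group_by(Area.Name,
           Age = cut(Age, breaks = c(0, 2, 4, Inf),
                     labels = c("0-2", "3-4", "4+"), include.lowest = TRUE)) %>%
  # .groups = "drop" (assumes dplyr >= 1.0) returns an ungrouped result
  summarise(Total = sum(Total), .groups = "drop") %>%
  # Keep only the two categories asked for in the question
  filter(Age %in% c("0-2", "3-4"))

print(result)
```

The "4+" label covers every age above 4, so filtering it out leaves exactly the two requested categories per area.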

Source

2016-07-05 14:34:43 JasonAizkalns

Here is an approach using cut and aggregate in base R:

df$ageCat <- cut(df$Age, breaks = c(0, 2, max(df$Age)), include.lowest = TRUE)
aggregate(Total ~ Area.Name + ageCat, data = df, sum)
#                    Area.Name ageCat  Total
# 1       District - East (04)  [0,2]  78866
# 2 District - North West (01)  [0,2] 178610
# 3       District - East (04)  (2,5]  86109
# 4 District - North West (01)  (2,5] 128804

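cut splits the Age variable into the desired categories, and the data.frame is then aggregated over the desired variables. Note that with the upper break set to max(df$Age), the second category is (2,5], so for District - East (04) it also includes age 5 (86109 rather than the expected 55329). Using breaks = c(0, 2, 4) instead leaves age 5 as NA, which the formula interface of aggregate drops by default.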
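To match the expected output in the question, the breaks can stop at 4 so that only ages 0-2 and 3-4 are binned. The sketch below is a variant of this answer, not the answerer's original code. It rebuilds only the needed columns of the question's data, and the labels "0-2" and "3-4" are added here for readability.

```r
# Same Area.Name, Age and Total values as the question's df, rebuilt compactly
df <- data.frame(
  Area.Name = rep(c("District - North West (01)", "District - East (04)"),
                  times = c(5, 6)),
  Age   = c(0:4, 0:5),
  Total = c(56131L, 58644L, 63835L, 63859L, 64945L,
            24556L, 27076L, 27234L, 27604L, 27725L, 30780L)
)

# Ages above 4 fall outside the breaks and become NA
df$ageCat <- cut(df$Age, breaks = c(0, 2, 4),
                 labels = c("0-2", "3-4"), include.lowest = TRUE)

# The formula method uses na.action = na.omit by default,
# so rows with an NA ageCat are left out of the sums
res <- aggregate(Total ~ Area.Name + ageCat, data = df, FUN = sum)
print(res)
```

Relying on NA to drop the out-of-range ages keeps the code short. An explicit `subset(df, Age <= 4)` before aggregating would make the intent more visible.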

Source

2016-07-05 14:33:52 lmo

For building the groups, you can also use `.bincode`, like `df$group <- .bincode(df$Age, c(0, 2, 4), include.lowest = T)` – Jimbou

Thanks @Jimbou. I had not seen `.bincode` before. – lmo
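As a small illustration of the `.bincode` suggestion above: it returns integer bin codes rather than a labelled factor. The `age` vector mirrors the Age column of the question's data, and the conversion to a factor with the labels "0-2" and "3-4" is an optional step added here, not part of the comment.

```r
# Age values as in the question's df (two areas, ages 0-4 and 0-5)
age <- c(0:4, 0:5)

# Integer codes: 1 for ages 0-2, 2 for ages 3-4, NA outside the breaks
bin <- .bincode(age, breaks = c(0, 2, 4), include.lowest = TRUE)

# Optional: attach readable labels to the codes
group <- factor(bin, levels = 1:2, labels = c("0-2", "3-4"))

print(data.frame(age, bin, group))
```

Because the codes are plain integers, they can be used directly as a grouping column in `aggregate()` or `group_by()`.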
