2016-11-30
FAMILY<- c('FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYA', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYB', 'FAMILYC', 'FAMILYC', 'FAMILYC', 'FAMILYC', 'FAMILYC') 

CHILDREN<-c('JAKE', 'PETE', 'JASON', 'KEVIN', 'ALFRED','DALE', 'STEVE', 'MELISSA', 'DAN', 'THOMAS', 'CAIT', 'BRANDON', 'DEAN', 'ADAM', 'KELSEY') 

CHANGE<-c(1000, -1000, 2000, 3000, 5000, 100, 300, 1234, -1022, -1111, -1112, 1000, 1002, 2131, 1231) 

df1<-data.frame(FAMILY, CHILDREN, CHANGE) 

df1 

    FAMILY CHILDREN CHANGE
1  FAMILYA     JAKE   1000
2  FAMILYA     PETE  -1000
3  FAMILYA    JASON   2000
4  FAMILYA    KEVIN   3000
5  FAMILYA   ALFRED   5000
6  FAMILYB     DALE    100
7  FAMILYB    STEVE    300
8  FAMILYB  MELISSA   1234
9  FAMILYB      DAN  -1022
10 FAMILYB   THOMAS  -1111
11 FAMILYC     CAIT  -1112
12 FAMILYC  BRANDON   1000
13 FAMILYC     DEAN   1002
14 FAMILYC     ADAM   2131
15 FAMILYC   KELSEY   1231

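I want to transform this data frame so that it has 4 new columns: the first two show 1) the child with the largest value and 2) the child with the second-largest value, and the last two show 3) the child with the smallest value and 4) the child with the second-smallest value. In short, I want to reshape the data frame into the top 2 (and bottom 2) values per family.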

I would also like each child's change to appear next to that child's name.

The final format should look like this:

FAMILY   TOTAL CHANGE  INCREASE #1    INCREASE #2   DECREASE #1    DECREASE #2
FAMILYA         10000  ALFRED: 5000   KEVIN: 3000   PETE: -1000    JAKE: 1000
FAMILYB          -499  MELISSA: 1234  STEVE: 300    THOMAS: -1111  DAN: -1022
FAMILYC          4252  ADAM: 2131     KELSEY: 1231  CAIT: -1112    BRANDON: 1000
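The TOTAL CHANGE column in the target table is simply the per-family sum of CHANGE. A quick check in base R (using tapply() rather than dplyr; the data frame is rebuilt inline so the snippet runs on its own):

```r
# Same data as df1 above, rebuilt compactly with rep()
df1 <- data.frame(
  FAMILY   = rep(c("FAMILYA", "FAMILYB", "FAMILYC"), each = 5),
  CHILDREN = c("JAKE", "PETE", "JASON", "KEVIN", "ALFRED",
               "DALE", "STEVE", "MELISSA", "DAN", "THOMAS",
               "CAIT", "BRANDON", "DEAN", "ADAM", "KELSEY"),
  CHANGE   = c(1000, -1000, 2000, 3000, 5000,
               100, 300, 1234, -1022, -1111,
               -1112, 1000, 1002, 2131, 1231),
  stringsAsFactors = FALSE
)

# Sum CHANGE within each FAMILY; the result is named by family
total_change <- tapply(df1$CHANGE, df1$FAMILY, sum)
print(total_change)
# FAMILYA: 10000, FAMILYB: -499, FAMILYC: 4252
```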

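If you think it would be easier to put each child's value in a separate column next to the child's name, that works too; it's the overall concept and how to execute it that I need help with.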
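If a separate value column is acceptable, a long table with one row per selected child may be simpler than four pasted columns. A minimal base-R sketch (swapping in ave() rather than dplyr), assuming ties within a family can be ignored; the SLOT column and its labels are made up here to mirror the headers above:

```r
# Same data as df1 above, rebuilt compactly with rep()
df1 <- data.frame(
  FAMILY   = rep(c("FAMILYA", "FAMILYB", "FAMILYC"), each = 5),
  CHILDREN = c("JAKE", "PETE", "JASON", "KEVIN", "ALFRED",
               "DALE", "STEVE", "MELISSA", "DAN", "THOMAS",
               "CAIT", "BRANDON", "DEAN", "ADAM", "KELSEY"),
  CHANGE   = c(1000, -1000, 2000, 3000, 5000,
               100, 300, 1234, -1022, -1111,
               -1112, 1000, 1002, 2131, 1231),
  stringsAsFactors = FALSE
)

# Sort by family, then from largest to smallest CHANGE
s <- df1[order(df1$FAMILY, -df1$CHANGE), ]

# Position of each row within its family (1 = largest) and the family size
pos <- ave(s$CHANGE, s$FAMILY, FUN = seq_along)
n   <- ave(s$CHANGE, s$FAMILY, FUN = length)

# Keep the two largest and the two smallest rows of each family
keep <- pos <= 2 | pos >= n - 1
long <- s[keep, ]

# Hypothetical label column: INCREASE #k counts from the top, DECREASE #k from the bottom
long$SLOT <- ifelse(pos[keep] <= 2,
                    paste0("INCREASE #", pos[keep]),
                    paste0("DECREASE #", n[keep] - pos[keep] + 1))
rownames(long) <- NULL
print(long)
```

Each family contributes four rows, and CHILDREN and CHANGE stay in their own columns, so no string pasting is needed.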

Any help would be great, thanks!

Answers

library(dplyr) 
library(tidyr) 

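# Helper to get the second-largest (y = 2) or second-smallest (y < 0) distinct value
myfun <- function(x, y) { 
    u <- unique(x)                    # distinct values
    u <- sort(u, decreasing = TRUE)   # largest first
    if (y < 0) 
        u[length(u) - 1]              # second smallest: index by length(u), not length(x)
    else 
        u[y]                          # y-th largest
} 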

df2 <- df1 %>% group_by(FAMILY) %>% 
     summarise(a1 = CHILDREN[which(CHANGE == max(CHANGE))], a2 = max(CHANGE),  # biggest
               b2 = myfun(CHANGE, 2),  b1 = CHILDREN[which(CHANGE == b2)],      # 2nd biggest
               c1 = CHILDREN[which(CHANGE == min(CHANGE))], c2 = min(CHANGE),  # smallest
               d2 = myfun(CHANGE, -2), d1 = CHILDREN[which(CHANGE == d2)])     # 2nd smallest
#df2 
#    FAMILY      a1    a2    b2     b1     c1    c2    d2      d1
#    <fctr>  <fctr> <dbl> <dbl> <fctr> <fctr> <dbl> <dbl>  <fctr>
# 1 FAMILYA  ALFRED  5000  3000  KEVIN   PETE -1000  1000    JAKE
# 2 FAMILYB MELISSA  1234   300  STEVE THOMAS -1111 -1022     DAN
# 3 FAMILYC    ADAM  2131  1231 KELSEY   CAIT -1112  1000 BRANDON

# A little clumsy here... I'd welcome suggestions for a more efficient way to unite these columns
# unite(data, new_col, i, j) pastes columns i and j together and drops the originals
df3 <- unite(df2, "A1", 2, 3, sep = ":")   # a1 + a2 -> ALFRED:5000
df4 <- unite(df3, "B1", 4, 3, sep = ":")   # b1 + b2 -> KEVIN:3000
df5 <- unite(df4, "C1", 4, 5, sep = ":")   # c1 + c2 -> PETE:-1000
df6 <- unite(df5, "D1", 6, 5, sep = ":")   # d1 + d2 -> JAKE:1000

#df6 
#    FAMILY           A1          B1           C1           D1
#    <fctr>        <chr>       <chr>        <chr>        <chr>
# 1 FAMILYA  ALFRED:5000  KEVIN:3000   PETE:-1000    JAKE:1000
# 2 FAMILYB MELISSA:1234   STEVE:300 THOMAS:-1111    DAN:-1022
# 3 FAMILYC    ADAM:2131 KELSEY:1231   CAIT:-1112 BRANDON:1000
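In myfun, the second-smallest branch needs to index the de-duplicated vector u by length(u); using length(x) gives the same answer only when a family has no repeated CHANGE values. A small check with a hypothetical vector x that contains a duplicate (myfun_len_x is a made-up name for the length(x) variant):

```r
# Helper indexing the de-duplicated vector u
myfun <- function(x, y) {
  u <- sort(unique(x), decreasing = TRUE)
  if (y < 0) u[length(u) - 1] else u[y]
}

# Variant for comparison: indexes u with the length of x instead
myfun_len_x <- function(x, y) {
  u <- sort(unique(x), decreasing = TRUE)
  if (y < 0) u[length(x) - 1] else u[y]
}

x <- c(5, 5, 3, 1, 2)  # hypothetical values with a duplicate; u is 5, 3, 2, 1
myfun(x, 2)            # second largest: 3
myfun(x, -2)           # second smallest: 2
myfun_len_x(x, -2)     # 1, the smallest, because length(x) - 1 = 4 points past the 2nd smallest

# With FAMILYA's values (no duplicates) both variants agree
myfun(c(1000, -1000, 2000, 3000, 5000), -2)  # 1000
```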

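Note: I forgot to add the TOTAL_CHANGE column. Add TOTAL_CHANGE = sum(CHANGE) as the first expression in summarise(), then add 1 to each column index in the unite() calls.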
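On the "better way of uniting" question, one option that avoids positional indices altogether (so adding TOTAL_CHANGE shifts nothing) is to paste each name/value pair by column name. A base-R sketch, assuming df2 has the a1/a2 through d1/d2 columns shown above; here df2 is typed in by hand from that printed output, with TOTAL_CHANGE values taken from the question's target table:

```r
# df2 as printed above (a = biggest, b = 2nd biggest, c = smallest, d = 2nd smallest),
# plus the TOTAL_CHANGE column from the note
df2 <- data.frame(
  FAMILY       = c("FAMILYA", "FAMILYB", "FAMILYC"),
  TOTAL_CHANGE = c(10000, -499, 4252),
  a1 = c("ALFRED", "MELISSA", "ADAM"), a2 = c(5000, 1234, 2131),
  b2 = c(3000, 300, 1231),            b1 = c("KEVIN", "STEVE", "KELSEY"),
  c1 = c("PETE", "THOMAS", "CAIT"),   c2 = c(-1000, -1111, -1112),
  d2 = c(1000, -1022, 1000),          d1 = c("JAKE", "DAN", "BRANDON"),
  stringsAsFactors = FALSE
)

# Paste each name column to its value column by name, so column order doesn't matter
out <- df2[c("FAMILY", "TOTAL_CHANGE")]
for (p in c("a", "b", "c", "d")) {
  out[[toupper(paste0(p, "1"))]] <-
    paste(df2[[paste0(p, "1")]], df2[[paste0(p, "2")]], sep = ":")
}
print(out)
```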


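Thanks for the feedback; I really like how you're applying this concept with dplyr. If you compare it with the example above, though, I don't think the values are right.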


Oh my god! How did I miss that! I'm sorry... fixing it!


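I think you were taking max-1 and min-1 across the whole list instead of within each group, which caused the error. And thanks for your attention!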


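Here is an approach that uses a custom function together with do() (from dplyr) to apply it to each FAMILY group. The custom function also relies on dplyr (for the %>% pipe).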

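First, the custom function puts the changes in (sorted) order. It then returns the total change (the sum) along with the first two and the last two changes in that order. The result must be returned as a data.frame for do() to work correctly.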

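# Returns a one-row data.frame: the total change plus the two biggest and two
# smallest changes, each formatted as "NAME: value"
myFamFunction <- function(CHILDREN, CHANGE){ 
    # "NAME: value" labels, reordered from largest to smallest change
    toOut <- 
    paste(CHILDREN, CHANGE, sep = ": ")[order(CHANGE, decreasing = TRUE)] 

    # c() coerces the numeric sum to character, so every column comes out as <chr>
    c(sum(CHANGE) 
    , head(toOut, 2)        # biggest, second biggest
    , tail(toOut, 2)) %>%   # second smallest, smallest
    rbind() %>%             # vector -> one-row matrix
    data.frame(stringsAsFactors = FALSE) %>% 
    setNames(c("Total Change" 
       , "Biggest Change" 
       , "Second Biggest Change" 
       , "Second Smallest Change" 
       , "Smallest Change")) 
} 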
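To see what the ordering and head()/tail() steps produce, here they are applied by hand to FAMILYA's rows from df1 (plain base R, no pipe):

```r
# FAMILYA's rows from df1
CHILDREN <- c("JAKE", "PETE", "JASON", "KEVIN", "ALFRED")
CHANGE   <- c(1000, -1000, 2000, 3000, 5000)

# Label each child with its change, then reorder from largest to smallest change
toOut <- paste(CHILDREN, CHANGE, sep = ": ")[order(CHANGE, decreasing = TRUE)]
# toOut: "ALFRED: 5000" "KEVIN: 3000" "JASON: 2000" "JAKE: 1000" "PETE: -1000"

head(toOut, 2)  # "ALFRED: 5000" "KEVIN: 3000"  (biggest, second biggest)
tail(toOut, 2)  # "JAKE: 1000" "PETE: -1000"   (second smallest, smallest)

# c() coerces the numeric total to character, which is why Total Change prints as <chr>
row <- c(sum(CHANGE), head(toOut, 2), tail(toOut, 2))
# row[1] is "10000"
```

Because tail() keeps the sorted order, the last two columns come out as second smallest, then smallest, which is why this answer's column order differs from the DECREASE #1 / DECREASE #2 order in the question.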

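Note that this may throw an error if a family has fewer than 2 children (although with fewer than 4, the results are already questionable). If your real data are more complex, telling us what you want to happen in those cases would let us guard against these edge cases.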
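One possible guard, assuming NA is an acceptable value for slots a small family cannot fill (the answer deliberately leaves that choice to the asker), is to index past the end of the sorted vector, which R fills with NA. fam_summary_safe is a hypothetical name, and unlike myFamFunction it builds the data.frame directly, so Total Change stays numeric:

```r
# Hypothetical guarded variant of myFamFunction: slots a small family can't fill become NA
fam_summary_safe <- function(CHILDREN, CHANGE) {
  toOut  <- paste(CHILDREN, CHANGE, sep = ": ")[order(CHANGE, decreasing = TRUE)]
  top    <- toOut[1:2]            # biggest, second biggest; indexing past the end yields NA
  bottom <- rev(rev(toOut)[1:2])  # second smallest, smallest (NA-padded the same way)
  out <- data.frame(sum(CHANGE), top[1], top[2], bottom[1], bottom[2],
                    stringsAsFactors = FALSE)
  names(out) <- c("Total Change", "Biggest Change", "Second Biggest Change",
                  "Second Smallest Change", "Smallest Change")
  out
}

# A one-child family no longer errors; the single child fills both ends
fam_summary_safe("ONLY", 42)
```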

Then just group_by, pass the columns you want into the function, and voilà:

df1 %>% 
    group_by(FAMILY) %>% 
    do(myFamFunction(.$CHILDREN, .$CHANGE)) 

Returns:

  FAMILY   `Total Change`  `Biggest Change`  `Second Biggest Change`  `Second Smallest Change`  `Smallest Change`
  <fctr>   <chr>           <chr>             <chr>                    <chr>                     <chr>
1 FAMILYA  10000           ALFRED: 5000      KEVIN: 3000              JAKE: 1000                PETE: -1000
2 FAMILYB  -499            MELISSA: 1234     STEVE: 300               DAN: -1022                THOMAS: -1111
3 FAMILYC  4252            ADAM: 2131        KELSEY: 1231             BRANDON: 1000             CAIT: -1112
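For comparison, group_by() + do() maps onto base R's split() / lapply() / rbind(): split the rows by family, build one summary row per piece, then stack the rows. A sketch that reproduces the table above; fam_row is a hypothetical pipe-free copy of myFamFunction so the block runs without dplyr:

```r
# Same data as df1 above, rebuilt compactly with rep()
df1 <- data.frame(
  FAMILY   = rep(c("FAMILYA", "FAMILYB", "FAMILYC"), each = 5),
  CHILDREN = c("JAKE", "PETE", "JASON", "KEVIN", "ALFRED",
               "DALE", "STEVE", "MELISSA", "DAN", "THOMAS",
               "CAIT", "BRANDON", "DEAN", "ADAM", "KELSEY"),
  CHANGE   = c(1000, -1000, 2000, 3000, 5000,
               100, 300, 1234, -1022, -1111,
               -1112, 1000, 1002, 2131, 1231),
  stringsAsFactors = FALSE
)

# Pipe-free copy of myFamFunction's logic (hypothetical name fam_row)
fam_row <- function(CHILDREN, CHANGE) {
  toOut <- paste(CHILDREN, CHANGE, sep = ": ")[order(CHANGE, decreasing = TRUE)]
  out <- data.frame(rbind(c(sum(CHANGE), head(toOut, 2), tail(toOut, 2))),
                    stringsAsFactors = FALSE)
  names(out) <- c("Total Change", "Biggest Change", "Second Biggest Change",
                  "Second Smallest Change", "Smallest Change")
  out
}

# One data frame per family -> one summary row per family -> stacked together
pieces <- lapply(split(df1, df1$FAMILY), function(d) fam_row(d$CHILDREN, d$CHANGE))
result <- data.frame(FAMILY = names(pieces), do.call(rbind, pieces),
                     stringsAsFactors = FALSE, check.names = FALSE)
rownames(result) <- NULL
print(result)
```

check.names = FALSE keeps the column names with spaces intact when FAMILY is added back.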