2017-01-10 138 views
2

R: aggregate by multiple columns, then merge back into the data frame?

I have a data frame that looks like:

id <- c(1, 1, 1, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
type <- c("A", "B", "A", "B", "B")
df <- data.frame(id, date, type)
df$date <- as.Date(as.character(df$date), format = "%d-%m-%y")

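What I want is to add a new column containing the earliest date for each type within each id. This first attempt works fine, aggregating and merging on the id alone.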

d = aggregate(df$date, by=list(df$id), min) 
df2 = merge(df, d, by.x="id", by.y="Group.1") 

What I'd like, though, is to also group by type and get this result:

data.frame(df2, desired=c("2007-11-30","2007-11-01", "2007-11-30","2007-12-17","2007-12-17")) 

I've tried many possibilities. I really think this can be done with a list, but I'm at a loss as to how.

d = aggregate(df$date, by=list(df$id, df$type), min) 

# And merge the result of aggregate with the original data frame 
df2 = merge(df,d,by.x=list("id","type"),by.y=list("Group.1","Group.2")) 
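
A likely reason the merge above fails is that `merge()` expects its `by.x`/`by.y` keys as a character vector (or numeric/logical indices), not a list. Below is a minimal base-R sketch, assuming the example data from the question with the column named `date` throughout. The output column name `desired` is borrowed from the desired result above, and passing `df[c("id", "type")]` as `by` is just one way to keep the original key names instead of `Group.1`/`Group.2`.

```r
# Rebuild the example data from the question.
id   <- c(1, 1, 1, 3, 3)
date <- as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                format = "%d-%m-%y")
type <- c("A", "B", "A", "B", "B")
df   <- data.frame(id, date, type)

# Earliest date per id/type combination. A data frame is a list, so it can be
# passed as `by`, and its column names (id, type) carry over to the result.
d <- aggregate(df["date"], by = df[c("id", "type")], FUN = min)
names(d)[names(d) == "date"] <- "desired"

# Key columns are given as a character vector, not a list.
df2 <- merge(df, d, by = c("id", "type"))
df2
```

Note that `merge()` sorts the result by the key columns, so the row order of `df2` may differ from that of `df`.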

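For this simple example I could just split each type into its own data frame, build the new column, and then combine the resulting two data frames. In practice, though, there are many types, and a third column has to be handled the same way, so that would not be practical...

Thanks!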

+0

You have a typo: `date1` and `date` are mismatched in `df`. – thelatemail

+0

@thelatemail You're right. I went around in circles getting that date column to work... – Soran

Answers

2

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), order by 'date', group by 'id' (or by 'id' and 'type'), and assign (:=) the first 'date' in each group as the 'earliest' column.

library(data.table) 
setDT(df)[order(date), earliestdateid := date[1], by = id 
    ][order(date), earliestdateidtype := date[1], by = .(id, type)] 
df 
#    id       date type earliestdateid earliestdateidtype
# 1:  1 2008-01-23    A     2007-11-01         2007-11-30
# 2:  1 2007-11-01    B     2007-11-01         2007-11-01
# 3:  1 2007-11-30    A     2007-11-01         2007-11-30
# 4:  3 2007-12-17    B     2007-12-17         2007-12-17
# 5:  3 2008-12-12    B     2007-12-17         2007-12-17

A similar approach with dplyr is

library(dplyr) 
df %>% 
    group_by(id) %>% 
    arrange(date) %>% 
    mutate(earliestdateid = first(date)) %>% 
    group_by(type, .add = TRUE) %>%   # dplyr >= 1.0.0; older versions used add = TRUE
    mutate(earliestdateidtype = first(date)) 

Note: this avoids doing it in two steps, i.e. computing the aggregated output and then joining it back.

+1

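Wow, this is why I love R. A complicated bunch of operations taken care of in 1 line. I think 2 lines is great. If I run into something similar, but on a numeric column instead of a date, do I just change order(date) to order(number) (or numbers), or something to that effect for the data.table way? – Soran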

+1

@Soran If you just want 'mean(numbers)', then there is no need for 'order', i.e. 'setDT(df)[, Mean := mean(numbers), .(id, type)]' – akrun

2

You can get both minimums, each by a different grouping, with ave instead:

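# drop=TRUE is passed through ave() to interaction(), so unused id/type
# combinations are dropped before min() is applied
df$minid <- with(df, ave(date, id, FUN=min, drop=TRUE))
df$minidtype <- with(df, ave(date, list(id,type), FUN=min, drop=TRUE))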
df 

#   id       date type      minid  minidtype
# 1  1 2008-01-23    A 2007-11-01 2007-11-30
# 2  1 2007-11-01    B 2007-11-01 2007-11-01
# 3  1 2007-11-30    A 2007-11-01 2007-11-30
# 4  3 2007-12-17    B 2007-12-17 2007-12-17
# 5  3 2008-12-12    B 2007-12-17 2007-12-17

If you want to be tricky, you can also do it all in one call:

df[c("minid", "minidtype")] <- lapply(list("id", c("id","type")), 
            FUN=function(x) ave(df$date, df[x], FUN=min, drop=TRUE))
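
The same `ave` pattern should carry over to a numeric column, which is what the comment under the first answer asks about. Below is a minimal sketch; the data frame `df_num`, its `numbers` column, and the values in it are made up purely for illustration.

```r
# Hypothetical example data: same id/type layout as the question, but with a
# numeric column (values invented for illustration) instead of dates.
df_num <- data.frame(
  id      = c(1, 1, 1, 3, 3),
  type    = c("A", "B", "A", "B", "B"),
  numbers = c(10, 4, 7, 9, 2)
)

# Group-wise minimum and mean, written back row by row. No ordering is needed,
# because min() and mean() do not depend on row order within a group.
df_num$minidtype  <- with(df_num, ave(numbers, id, type, FUN = min,  drop = TRUE))
df_num$meanidtype <- with(df_num, ave(numbers, id, type, FUN = mean, drop = TRUE))
df_num
```

Swapping `FUN` is all that changes between summaries, and the grouping columns stay the same.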