2017-08-03 85 views
1

Keep observations based on another column

This question is an extension of the one linked here.
Suppose my data has a column named Remark:

ID Name Type Date   Amount Remark 
1  AAAA First 2009/7/20  100  Not want 
1  AAAA First 2010/2/3  200  want ya 
2  BBBB First 2015/3/10  250  
2  CCC  Second 2009/2/23  300  good 
2  CCC  Second 2010/1/25  400  OK Right123 
2  CCC  Third 2015/4/9  500  
2  CCC  Third 2016/6/25  700  Stackoverflow is awesome 

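For each group, I want my result to keep the Remark from the row where Date is the maximum.
First, if I ignore the Remark column, I can use max() to get this: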

dt[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)] 
    ID Name Type  Date Amount 
1: 1 AAAA First 2010-02-03  300 
2: 2 BBBB First 2015-03-10  250 
3: 2 CCC Second 2010-01-25  700 
4: 2 CCC Third 2016-06-25 1200 

However, how can I also keep Remark, so that the result looks like this?

ID Name Type  Date Amount  Remark 
1: 1 AAAA First 2010-02-03  300  want ya 
2: 2 BBBB First 2015-03-10  250  
3: 2 CCC Second 2010-01-25  700  OK Right123 
4: 2 CCC Third 2016-06-25 1200  Stackoverflow is awesome 

Here is my data:

dt <- fread(" 
     ID Name Type Date   Amount Remark 
     1  AAAA First 2009/7/20  100  Not.want 
     1  AAAA First 2010/2/3  200  want.ya 
     2  BBBB First 2015/3/10  250  
     2  CCC  Second 2009/2/23  300  good 
     2  CCC  Second 2010/1/25  400  OK.Right123 
     2  CCC  Third 2015/4/9  500  
     2  CCC  Third 2016/6/25  700  Stackoverflow.is.awesome 
     ") 
dt$Date <- as.Date(dt$Date) 
+0

Please provide the data in a reproducible format. – Frank

+0

@Frank I have edited my question. –

+1

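Please see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 We should be able to copy-paste the code into a new R session and see the same example data. I still see non-dates in there... Also, I get an error when running 'fread'. – Frank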

Answers

2

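We can use a join: aggregate by group first, then join back to the original data on the group columns and Date to pick up Remark.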

setcolorder(dt[, setdiff(names(dt), "Amount"), with = FALSE][dt[, .(Date = max(Date), 
       Amount = sum(Amount)), 
     by = .(ID, Name, Type)], on = .(ID, Name, Type, Date)], names(dt))[] 
# ID Name Type  Date Amount     Remark 
#1: 1 AAAA First 2010-02-03 300     want ya 
#2: 2 BBBB First 2015-03-10 250       
#3: 2 CCC Second 2010-01-25 700    OK Right123 
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome 
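
The one-liner above can be easier to follow when split into steps. The following is only a sketch under stated assumptions: the data is rebuilt by hand from the table in the question (with blank remarks as empty strings), and `agg`, `lookup` and `res` are hypothetical helper names that do not appear in the original answer.

```r
library(data.table)

# Example data rebuilt by hand from the question's table (assumption:
# blank Remark cells are empty strings and Date is a proper Date column).
dt <- data.table(
  ID     = c(1, 1, 2, 2, 2, 2, 2),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# Step 1: one row per group with the latest Date and the summed Amount.
agg <- dt[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]

# Step 2: keep only the key columns and Remark from the original rows.
lookup <- dt[, .(ID, Name, Type, Date, Remark)]

# Step 3: for each row of agg, fetch the original row with the same
# group keys and the same (latest) Date, which carries the Remark.
res <- lookup[agg, on = .(ID, Name, Type, Date)]
setcolorder(res, names(dt))
res
```

Note that if several rows in a group share the same latest Date, the join returns one result row for each of them, so the output can contain more than one row per group.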

Or without a join:

dt1 <- dt[, c(Amount = sum(.SD[["Amount"]]), .SD[which.max(Date), 
    setdiff(names(.SD), "Amount"), with = FALSE]), .(ID, Name, Type)] 

setcolorder(dt1, names(dt)) 
dt1 
# ID Name Type  Date Amount     Remark 
#1: 1 AAAA First 2010-02-03 300     want ya 
#2: 2 BBBB First 2015-03-10 250       
#3: 2 CCC Second 2010-01-25 700    OK Right123 
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome 

If there are several 'Amount' columns to be summed:

nm1 <- grep("Amount\\d*", names(dt), value = TRUE) 
# wrap max(Date) in list() so that c() builds a list of columns 
setcolorder(dt[, setdiff(names(dt), nm1), with = FALSE][dt[, c(list(Date = max(Date)), 
     lapply(.SD, sum)), by = .(ID, Name, Type), .SDcols = nm1], 
     on = .(ID, Name, Type, Date)], names(dt))[] 
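
As a rough illustration of the multi-column case, here is a small self-contained sketch. The table is made up, `Amount1` and `Amount2` are hypothetical extra columns, and `amount_cols`, `agg` and `res` are illustrative names. The sketch wraps `max(Date)` in `list()` so that `c()` returns a list of columns, and it anchors the pattern so that only the intended column names match.

```r
library(data.table)

# Small made-up table; Amount1 and Amount2 are hypothetical extra columns.
dt <- data.table(
  ID      = c(1, 1, 2),
  Name    = c("AAAA", "AAAA", "BBBB"),
  Type    = c("First", "First", "First"),
  Date    = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10")),
  Amount  = c(100, 200, 250),
  Amount1 = c(1, 2, 3),
  Amount2 = c(10, 20, 30),
  Remark  = c("Not want", "want ya", "")
)

# Columns to sum: Amount, Amount1, Amount2.
amount_cols <- grep("^Amount\\d*$", names(dt), value = TRUE)

# Latest Date plus the sum of every amount column, per group.
agg <- dt[, c(list(Date = max(Date)), lapply(.SD, sum)),
          by = .(ID, Name, Type), .SDcols = amount_cols]

# Join the non-summed columns back on the group keys plus the latest Date.
res <- dt[, setdiff(names(dt), amount_cols), with = FALSE][
  agg, on = .(ID, Name, Type, Date)]
setcolorder(res, names(dt))
res
```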
+1

What should I do if I have three or more columns that need to be summed ('Amount', 'Amount1', 'Amount2')? –

+2

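@PeterChen In that case, use 'dt[, c(list(Date = max(Date)), lapply(.SD, sum)), by = .(ID, Name, Type), .SDcols = AmountCols]' as the second chain of the first solution, and use 'setdiff' to drop those columns in the first chain. – akrun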

1
> df 
    ID Name Type  Date Amount     Remark 
1: 1 AAAA First 03-02-2010 200     want ya 
2: 2 CCC Third 09-04-2015 500       
3: 2 BBBB First 10-03-2015 250       
4: 1 AAAA First 20-07-2009 100     Not want 
5: 2 CCC Second 23-02-2009 300      good 
6: 2 CCC Second 25-01-2010 400    OK Right123 
7: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome 

> df2=df[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)] 
> df2 
    ID Name Type  Date Amount 
1: 2 BBBB First 10-03-2015 250 
2: 1 AAAA First 20-07-2009 300 
3: 2 CCC Second 25-01-2010 700 
4: 2 CCC Third 25-06-2016 1200 


> df[df2,] 
    ID Name Type  Date Amount     Remark i.ID i.Name i.Type i.Amount 
1: 2 BBBB First 10-03-2015 250        2 BBBB First  250 
2: 1 AAAA First 20-07-2009 100     Not want 1 AAAA First  300 
3: 2 CCC Second 25-01-2010 400    OK Right123 2 CCC Second  700 
4: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome 2 CCC Third  1200 


> df3=df[df2,c("ID","Name","Type","Date","Remark","i.Amount")] 
> df3 
    ID Name Type  Date     Remark i.Amount 
1: 2 BBBB First 10-03-2015        250 
2: 1 AAAA First 20-07-2009     Not want  300 
3: 2 CCC Second 25-01-2010    OK Right123  700 
4: 2 CCC Third 25-06-2016 Stackoverflow is awesome  1200 
+1

Regarding the change to the 'Amount' column: your answer has some problems. The result is not correct, but the approach is right. –
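
One possible reason for the incorrect rows, assuming the Date column in this answer was stored as dd-mm-yyyy text rather than as a Date: `max()` on character values compares strings, not calendar dates. That would be consistent with the AAAA group showing 20-07-2009 as its "latest" date. A minimal base R sketch (the vector `txt` is a hypothetical stand-in for that group's dates):

```r
# Dates as dd-mm-yyyy text, as they appear in the printed output above.
txt <- c("03-02-2010", "20-07-2009")

# String comparison: "2" sorts after "0", so the 2009 date wins.
max_as_text <- max(txt)
max_as_text
# "20-07-2009"

# Converting to Date first compares actual calendar dates.
max_as_date <- max(as.Date(txt, format = "%d-%m-%Y"))
max_as_date
# "2010-02-03"
```

Converting the column with `as.Date()` before aggregating avoids this kind of mismatch.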