Summarize data frame by date and group
I want to summarize a dataset by several different factors. Here is an example of my data:
household <- rep(c("household1", "household2", "household3"), each = 3)
# nine random dates, so the dates printed below are just one possible draw
date <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 9)
value <- (1:9) * 100  # scaled to match the values printed below
type <- rep(c("income", "water", "energy"), times = 3)
df <- data.frame(household, date, value, type)
   household       date value   type
1 household1 1999-05-10   100 income
2 household1 1999-05-25   200  water
3 household1 1999-10-12   300 energy
4 household2 1999-02-02   400 income
5 household2 1999-08-20   500  water
6 household2 1999-02-19   600 energy
7 household3 1999-07-01   700 income
8 household3 1999-10-13   800  water
9 household3 1999-01-01   900 energy
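I want to summarize the data by month. Ideally, the final dataset would have 12 rows per household (one per month) and one column per transaction type (water, energy, income), each holding that month's total for the type.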
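I started by adding a column with a short date. My plan was then to filter by each type, build a separate data frame of summed values for each transaction type, and merge those data frames into one summarized df. I tried to do the summarizing with ddply, but it aggregated too much and I lost the household-level information.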
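library(plyr)
# shortdate is the short-date column mentioned above (its creation is not shown)
ddply(df, .(shortdate), summarize, mean_value = mean(value))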
  shortdate mean_value
1     14/07   15.88235
2     14/09    5.00000
3     14/10    5.00000
4     14/11   21.81818
5     14/12   20.00000
6     15/01   10.00000
7     15/02   12.50000
8     15/04    5.00000
Any help would be greatly appreciated!
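A rough sketch of one way this could be done in base R (not plyr): group by household, month and type together so the household level is kept, then spread the types into columns. The fixed dates, the `YYYY-MM` month key and the names `month`, `sums` and `wide` are assumptions made up for illustration, not part of the original data.

```r
# Example data shaped like the question's df. The dates are fixed here
# (an assumption) instead of sampled, so the result is reproducible.
df <- data.frame(
  household = rep(c("household1", "household2", "household3"), each = 3),
  date = as.Date(c("1999-05-10", "1999-05-25", "1999-10-12",
                   "1999-02-02", "1999-08-20", "1999-02-19",
                   "1999-07-01", "1999-10-13", "1999-01-01")),
  value = (1:9) * 100,
  type = rep(c("income", "water", "energy"), times = 3)
)

# Month key in "YYYY-MM" form (hypothetical column name)
df$month <- format(df$date, "%Y-%m")

# Sum value for every household / month / type combination present in the data
sums <- aggregate(value ~ household + month + type, data = df, FUN = sum)

# Spread type into columns: one row per household-month,
# with columns value.energy, value.income and value.water
wide <- reshape(sums,
                idvar = c("household", "month"),
                timevar = "type",
                direction = "wide")

print(wide)
```

Household-months with no transactions of a given type come out as NA, and only months that actually appear in the data get a row, so this alone does not give 12 rows per household.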
Yes, I was just being lazy and didn't print the full example df.
Yes, ideally I would have 12 rows per household (unless you can recommend a better way). That matches another df I have from another source.
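If 12 rows per household are needed to line up with another data frame, one possible approach is to build the full household-by-month grid and left-join the monthly totals onto it. This is only a sketch: the `monthly` data frame, its column names and the 1999 date range are hypothetical, and filling empty months with 0 is an assumption (leaving them as NA may suit the data better).

```r
# Hypothetical monthly totals in wide form: one row per household-month with data
monthly <- data.frame(
  household = c("household1", "household1", "household2"),
  month = c("1999-05", "1999-10", "1999-02"),
  income = c(100, NA, 400),
  water = c(200, NA, NA),
  energy = c(NA, 300, 600)
)

# Every household crossed with every month of 1999
grid <- expand.grid(
  household = unique(monthly$household),
  month = format(seq(as.Date("1999-01-01"), by = "month", length.out = 12),
                 "%Y-%m"),
  stringsAsFactors = FALSE
)

# Left join: keep every grid row and fill in totals where they exist
full <- merge(grid, monthly, by = c("household", "month"), all.x = TRUE)

# Treat months without transactions as zero (an assumption)
value_cols <- c("income", "water", "energy")
full[value_cols][is.na(full[value_cols])] <- 0

print(full)
```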