2017-09-16 71 views

R: average daily stock data by month and year, and append the result to the data set

I have the following ten years of daily data:

library(lubridate) 
library(dplyr) 

head(infy_close_subset,24) 
     date INFY.NS.Close 
1 2007-01-02  568.162 
2 2007-01-03  577.838 
3 2007-01-04  571.325 
4 2007-01-05  568.763 
5 2007-01-08  551.400 
6 2007-01-09  547.525 
7 2007-01-10  541.112 
8 2007-01-11  545.750 
9 2007-01-12  555.850 
10 2007-01-15  560.737 
11 2007-01-16  555.550 
12 2007-01-17  551.362 
13 2007-01-18  556.037 
14 2007-01-19  550.588 
15 2007-01-22  563.500 
16 2007-01-23  558.787 
17 2007-01-24  558.513 
18 2007-01-25  560.250 
19 2007-01-29  561.100 
20 2007-01-31  561.825 
21 2007-02-01  567.237 
22 2007-02-02  566.388 
23 2007-02-05  567.325 
24 2007-02-06  568.237 

I want to create a new column holding the average by year and month, as follows:

Infy_monthlyAvg <- infy_close_subset %>% 
    group_by(yr = year(date), mon = month(date)) %>% 
    summarize(mean_close = mean(INFY.NS.Close)) 

What I get instead is just a single mean value:

head(Infy_monthlyAvg) 
    mean_close 
1 731.6223 

I expected a mean_close column to be appended to the infy_close_subset data frame, like this:

 date INFY.NS.Close yr mon mean_close 
     <date>   <dbl> <dbl> <dbl> 
1 2007-01-02  568.162 2007  1 731.6223 
2 2007-01-03  577.838 2007  1 731.6223 
3 2007-01-04  571.325 2007  1 731.6223 
4 2007-01-05  568.763 2007  1 731.6223 
5 2007-01-08  551.400 2007  1 731.6223 
6 2007-01-09  547.525 2007  1 731.6223 
................. 

999 2017-09-08  988.400 2007  9 921.3333 
1000 2017-09-09  977.525 2007  9 921.3333 

You only get one result because you are grouping by year and month, and there is only one valid group, "2017-01".


Are you using the `year` and `month` functions from the lubridate package? Please say so, since they are not part of the base libraries.
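
A further possibility, offered only as an assumption since the thread does not confirm it: if plyr is attached after dplyr, its `summarize` masks dplyr's, ignores the grouping, and returns one overall row with no `yr` or `mon` columns, which resembles the output in the question. The sketch below uses a four-row toy data frame built from values in the question and qualifies the call with its namespace so that masking cannot interfere.

```r
library(dplyr)
library(lubridate)

# Toy data: four rows copied from the question (two in January, two in February)
infy_close_subset <- data.frame(
  date = as.Date(c("2007-01-02", "2007-01-03", "2007-02-01", "2007-02-02")),
  INFY.NS.Close = c(568.162, 577.838, 567.237, 566.388)
)

# dplyr::summarise is called explicitly, so a masking plyr::summarise is never used
Infy_monthlyAvg <- infy_close_subset %>%
  group_by(yr = year(date), mon = month(date)) %>%
  dplyr::summarise(mean_close = mean(INFY.NS.Close)) %>%
  ungroup()

print(Infy_monthlyAvg)  # one row per (yr, mon) group
```

Running `conflicts()` or `search()` in the session shows whether plyr is attached and which functions it masks.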

Answers


I would be inclined to build a period column and join on it:

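df <- left_join(
    # original rows, with a year-month key plus the yr and mon columns
    infy_close_subset %>% 
        mutate(
            period = format(date, "%Y-%m"), 
            yr = year(date), 
            mon = month(date)
        ), 
    # one row per period holding that month's mean close
    infy_close_subset %>% 
        mutate(period = format(date, "%Y-%m")) %>% 
        group_by(period) %>% 
        summarise(mean_close = mean(INFY.NS.Close)), 
    by = "period"
) %>% 
    select(-period)  # drop the helper key once the join is done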

#   date INFY.NS.Close yr mon mean_close 
# 1 2007-01-02  568.162 2007 1 558.2987 
# 2 2007-01-03  577.838 2007 1 558.2987 
# 3 2007-01-04  571.325 2007 1 558.2987 
# 4 2007-01-05  568.763 2007 1 558.2987 
# 5 2007-01-08  551.400 2007 1 558.2987 
# 6 2007-01-09  547.525 2007 1 558.2987 
# 7 2007-01-10  541.112 2007 1 558.2987 
# 8 2007-01-11  545.750 2007 1 558.2987 
# 9 2007-01-12  555.850 2007 1 558.2987 
# 10 2007-01-15  560.737 2007 1 558.2987 
# 11 2007-01-16  555.550 2007 1 558.2987 
# 12 2007-01-17  551.362 2007 1 558.2987 
# 13 2007-01-18  556.037 2007 1 558.2987 
# 14 2007-01-19  550.588 2007 1 558.2987 
# 15 2007-01-22  563.500 2007 1 558.2987 
# 16 2007-01-23  558.787 2007 1 558.2987 
# 17 2007-01-24  558.513 2007 1 558.2987 
# 18 2007-01-25  560.250 2007 1 558.2987 
# 19 2007-01-29  561.100 2007 1 558.2987 
# 20 2007-01-31  561.825 2007 1 558.2987 
# 21 2007-02-01  567.237 2007 2 567.2967 
# 22 2007-02-02  566.388 2007 2 567.2967 
# 23 2007-02-05  567.325 2007 2 567.2967 
# 24 2007-02-06  568.237 2007 2 567.2967 
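
As a possible alternative to the join, the same shape of result can be sketched with `group_by()` followed by `mutate()`, which keeps every row and repeats the group mean on each one. This is a minimal sketch on a four-row toy data frame taken from the question, not the method used in the answer above.

```r
library(dplyr)
library(lubridate)

# Toy data: four rows copied from the question
infy_close_subset <- data.frame(
  date = as.Date(c("2007-01-02", "2007-01-03", "2007-02-01", "2007-02-02")),
  INFY.NS.Close = c(568.162, 577.838, 567.237, 566.388)
)

# group_by() creates yr and mon; mutate() adds the per-group mean to every row
with_mean <- infy_close_subset %>%
  group_by(yr = year(date), mon = month(date)) %>%
  dplyr::mutate(mean_close = mean(INFY.NS.Close)) %>%
  ungroup()

print(with_mean)
```

Because `mutate()` never collapses rows, no separate summary table or merge step is needed.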

If you add the yr and mon columns to the original data frame:

infy_close_subset = infy_close_subset %>% 
    mutate(yr = year(date), mon = month(date)) 

then you can merge the two tables on yr and mon:

answer = merge(infy_close_subset, Infy_monthlyAvg, by = c("yr", "mon"))

I am assuming you want the monthly means. If you want the overall mean instead, the answer becomes simpler:

answer = infy_close_subset %>% 
    mutate(mean_close = mean(infy_close_subset$INFY.NS.Close)) 

That avoids the intermediate steps of grouping, summarizing, and merging.


A solution using data.table:

library(data.table) 
setDT(infy_close_subset) 
infy_close_subset[, mean_close := mean(INFY.NS.Close), by = format(date, "%Y-%m")] 
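
To illustrate what the `:=` call does, here is a small self-contained sketch on a four-row toy table built from values in the question. `:=` adds the column by reference, so no assignment back to the table is needed. The extra `yr` and `mon` columns are an addition for illustration, using data.table's own `year()` and `month()` helpers.

```r
library(data.table)

# Toy data: four rows copied from the question
dt <- data.table(
  date = as.Date(c("2007-01-02", "2007-01-03", "2007-02-01", "2007-02-02")),
  INFY.NS.Close = c(568.162, 577.838, 567.237, 566.388)
)

# Add yr and mon in place (illustrative; not part of the answer above)
dt[, `:=`(yr = year(date), mon = month(date))]

# Per-month mean, written onto every row of its group, modifying dt in place
dt[, mean_close := mean(INFY.NS.Close), by = format(date, "%Y-%m")]

print(dt)
```

Since the table is modified in place, take a `copy()` first if the original needs to stay unchanged.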