2011-02-24 64 views
5

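R script to average every <x> days

I am having trouble working out how to calculate the average over "x" days. If I try to plot this csv file over 1 year, there is too much data to display properly on the plot (screenshot attached). What I am looking for is the data averaged over every few days (maybe 2, a week, etc.) so that the graph is not so hard to read. Any suggestions on how to solve this with R?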

results.csv

POSTS,PROVIDER,TYPE,DATE 
29337,FTP,BLOG,2010-01-01 
26725,FTP,BLOG,2010-01-02 
27480,FTP,BLOG,2010-01-03 
31187,FTP,BLOG,2010-01-04 
31488,FTP,BLOG,2010-01-05 
32461,FTP,BLOG,2010-01-06 
33675,FTP,BLOG,2010-01-07 
38897,FTP,BLOG,2010-01-08 
37122,FTP,BLOG,2010-01-09 
41365,FTP,BLOG,2010-01-10 
51760,FTP,BLOG,2010-01-11 
50859,FTP,BLOG,2010-01-12 
53765,FTP,BLOG,2010-01-13 
56836,FTP,BLOG,2010-01-14 
59698,FTP,BLOG,2010-01-15 
52095,FTP,BLOG,2010-01-16 
57154,FTP,BLOG,2010-01-17 
80755,FTP,BLOG,2010-01-18 
227464,FTP,BLOG,2010-01-19 
394510,FTP,BLOG,2010-01-20 
371303,FTP,BLOG,2010-01-21 
370450,FTP,BLOG,2010-01-22 
268703,FTP,BLOG,2010-01-23 
267252,FTP,BLOG,2010-01-24 
375712,FTP,BLOG,2010-01-25 
381041,FTP,BLOG,2010-01-26 
380948,FTP,BLOG,2010-01-27 
373140,FTP,BLOG,2010-01-28 
361874,FTP,BLOG,2010-01-29 
265178,FTP,BLOG,2010-01-30 
269929,FTP,BLOG,2010-01-31 

R script

library(ggplot2); 
data <- read.csv("results.csv", header=T); 
dts <- as.POSIXct(data$DATE, format="%Y-%m-%d"); 
attach(data); 
a <- ggplot(dataframe, aes(dts,POSTS/1000, fill = TYPE)) + opts(title = "Report") + labs(x = NULL, y = "Posts (k)", fill = NULL); 
b <- a + geom_bar(stat = "identity", position = "stack"); 
plot_theme <- theme_update(axis.text.x = theme_text(angle=90, hjust=1), panel.grid.major = theme_line(colour = "grey90"), panel.grid.minor = theme_blank(), panel.background = theme_blank(), axis.ticks = theme_blank(), legend.position = "none"); 
c <- b + facet_grid(TYPE ~ ., scale = "free_y"); 
d <- c + scale_x_datetime(major = "1 months", format = "%Y %b"); 
ggsave(filename="/root/results.png",height=14,width=14,dpi=600); 

Graph image (not included)

Have you tried using 'geom_smooth' instead of 'geom_bar'? – hadley 2011-02-25 22:45:16

Answers

4

Try this:

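Average <- function(Data, n) {
    # Sorted unique dates; ISO-formatted dates sort chronologically
    dates <- sort(unique(as.character(Data$DATE)))
    # Make an index to be used for aggregating:
    # the 0-based number of the n-date period each row falls in
    ID <- (match(as.character(Data$DATE), dates) - 1) %/% n
    # Aggregate over ID and TYPE for all numeric data
    out <- aggregate(Data[sapply(Data, is.numeric)],
                     by = list(ID, Data$TYPE),
                     FUN = mean)
    # Format output
    names(out)[1:2] <- c("dts", "TYPE")
    # Add the correct dates as the beginning of every period
    # (looked up in the unique dates, so several TYPEs per date also work)
    out$dts <- as.POSIXct(dates[(out$dts * n) + 1])
    out
}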

Data <- read.csv("results.csv", header = TRUE)
dataframe <- Average(Data, 3)

This works with the plotting script you provided.
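For comparison, here is a minimal sketch of the same idea using base R's cut() on Date values, which bins by calendar period rather than by row position. It is not the answer's method. The names `df`, `period` and `binned` are made up for illustration, and the rows are the first seven from results.csv.

```r
# Sample rows copied from results.csv (first seven days)
df <- data.frame(
    POSTS = c(29337, 26725, 27480, 31187, 31488, 32461, 33675),
    TYPE  = "BLOG",
    DATE  = as.Date("2010-01-01") + 0:6
)

# Bin the dates into consecutive 3-day periods starting at the first date
df$period <- cut(df$DATE, breaks = "3 days")

# Mean of POSTS for every period and TYPE
binned <- aggregate(POSTS ~ period + TYPE, data = df, FUN = mean)

# The bin labels are the start dates; convert them back to Date for plotting
binned$period <- as.Date(as.character(binned$period))
```

Changing "3 days" to "1 week" or "2 weeks" should give coarser bins. The last bin may cover fewer days than the others if the date range is not a multiple of the bin width.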

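Some remarks:

  • Never name a variable after a function (data, c, ...)
  • Avoid using attach(). If you do use it, add detach() afterwards, otherwise you will run into trouble at some point. Better still is to use the functions with() and within()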
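A small sketch of what the second remark could look like in practice. The data frame `posts_df` and the variable `mean_k` are hypothetical names; the rows are the first three from results.csv.

```r
# Sample rows copied from results.csv
posts_df <- data.frame(
    POSTS = c(29337, 26725, 27480),
    DATE  = c("2010-01-01", "2010-01-02", "2010-01-03")
)

# with() evaluates an expression using the columns of posts_df,
# without putting them on the search path the way attach() does
mean_k <- with(posts_df, mean(POSTS / 1000))

# within() returns a modified copy of the data frame,
# here with a new POSIXct column built from DATE
posts_df <- within(posts_df, dts <- as.POSIXct(DATE, format = "%Y-%m-%d"))
```

Because nothing is attached, there is nothing to detach() afterwards, and the new column lives inside the data frame instead of the global workspace.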
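
Edited to add the correct format to the dts variable. – 2011-02-24 15:42:12

Thanks for the quick response. This is exactly what I need. I will follow your advice. – 2011-02-24 16:39:08

You might want to remove the browser statement. – hadley 2011-02-25 22:44:52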

3

The TTR package also has several moving-average functions that will do this in a single statement:

library(TTR) 
mavg.3day <- SMA(data$POSTS, n=3) # Simple moving average 

Substitute a different value of "n" for the desired moving-average length.
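If TTR is not installed, a trailing moving average can be sketched in base R with stats::filter(). This is a stand-in for illustration, not TTR's implementation. The names `posts` and `mavg` are hypothetical, and the values are the first five POSTS entries from results.csv.

```r
# First five POSTS values from results.csv
posts <- c(29337, 26725, 27480, 31187, 31488)
n <- 3

# Equal weights over the current value and the n - 1 values before it;
# sides = 1 makes the window look backwards only
mavg <- as.numeric(stats::filter(posts, rep(1 / n, n), sides = 1))

# The first n - 1 entries are NA because no full window is available yet
```

Note that a moving average smooths the series but keeps one value per day, so it does not reduce the number of points to plot the way averaging over blocks of days does.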