Removing seasonality from daily time series data

dput(df) 
structure(list(Process = c("PROC050D", "PROC051D", "PROC100D", 
"PROC103D", "PROC104D", "PROC106D", "PROC106D", "PROC110D", "PROC111D", 
"PROC112D", "PROC113D", "PROC114D", "PROC130D", "PROC131D", "PROC132D", 
"PROC154D", "PROC155D", "PROC156D", "PROC157D", "PROC158D", "PROC159D", 
"PROC160D", "PROC161D", "PROC162D", "PROC163D", "PROC164D", "PROC165D", 
"PROC166D", "PROC170D", "PROC171D", "PROC173D", "PROC174D", "PROC177D", 
"PROC180D", "PROC181D", "PROC182D", "PROC185D", "PROC186D", "PROC187D", 
"PROC190D", "PROC191D", "PROC192D", "PROC196D", "PROC197D", "PROC201D", 
"PROC202D", "PROC203D", "PROC204D", "PROC205D", "PROC206D"), 
    Date = structure(c(15393, 15393, 15393, 15393, 15393, 15393, 
    15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 
    15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 
    15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 
    15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393, 
    15393, 15393, 15393, 15393, 15393, 15393, 15393, 15393), class = "Date"), 
    Duration = c(30L, 78L, 20L, 15L, 129L, 56L, 156L, 10L, 1656L, 
    1530L, 52L, 9L, 10L, 38L, 48L, 9L, 26L, 90L, 15L, 23L, 13L, 
    9L, 34L, 12L, 11L, 16L, 24L, 11L, 236L, 104L, 9L, 139L, 11L, 
    10L, 22L, 11L, 55L, 35L, 12L, 635L, 44L, 337L, 44L, 9L, 231L, 
    32L, 19L, 170L, 22L, 19L)), .Names = c("Process", "Date", 
"Duration"), row.names = c(NA, 50L), class = "data.frame")

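I am trying to use the IQR method to capture outliers in my data. But when I apply it to this data, I also capture data points that may well be normal. I would like to remove the seasonality from my data points and then apply the outlier rule.

The Process column contains thousands of different processes. I only need to capture the durations that are abnormal for a given process. Any ideas on how to remove seasonality from my data set? The code below computes the outliers, but because of seasonal factors the flagged values may actually be normal. I would like to remove the seasonality from my data frame before computing the outliers.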

library(data.table)

setDT(df)  # := only works on a data.table, so convert the data.frame in place

# per-process quartiles and interquartile range
df[, seventyFifth := quantile(Duration, .75), by = Process]
df[, twentyFifth := quantile(Duration, .25), by = Process]
df[, IQR := seventyFifth - twentyFifth]

df[, diff := Duration - seventyFifth]

# flag durations more than 3 IQRs above the third quartile
df[, outlier := diff > 3 * IQR]

Source

2012-11-05 user1471980

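@GSee, no. I updated the post. I want to remove or massage the data so that seasonality does not show up in my outlier calculation. I need to capture outliers from my data set, excluding seasonal data points. – user1471980

It depends on how predictable or smooth the seasonality is. Would it be possible to fit a loose model? For example,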

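# sin()/cos() are not defined for Date objects, so convert to numeric (days since 1970-01-01)
LM <- lm(Duration ~ sin(as.numeric(Date)) + cos(as.numeric(Date)), data = df)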

or some variation of it.

P <- predict(LM)
DIF <- P - df$Duration

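You can then apply the IQR rule to DIF, so that you are analyzing the data only in terms of its deviation from the predicted seasonality. Speaking of diff, you can also get some useful information by sorting the data by Date and using diff.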

df <- df[order(df$Date),]
DIF2 <- diff(df$Duration)  # change in Duration between consecutive rows
plot(DIF2)

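In theory, DIF2 should approximate the derivative of the function fitted by LM.

As a side note, I would not recommend a fully automated approach (i.e. loading a package and calling BlindlyGetRidOfOultliersAdjustingForSeasonality(df)) if the seasonality really is complex.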
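
The residual-based idea in this answer can be sketched end to end as follows. This is only an illustration under stated assumptions: the data are synthetic, the 7-day period is assumed rather than taken from the question, and the names toy, period, and upper are made up for the example. The sign is also flipped relative to DIF (actual minus fitted), so that unusually long durations show up as large positive residuals.

```r
# Sketch: fit a harmonic model, then apply the IQR rule to the residuals.
# Synthetic data, the 7-day period and the 3 * IQR fence are assumptions.
set.seed(1)
dates  <- seq(as.Date("2012-01-01"), by = "day", length.out = 140)
period <- 7
tnum   <- as.numeric(dates)  # days since 1970-01-01
duration <- 100 + 20 * sin(2 * pi * tnum / period) + rnorm(length(tnum), sd = 2)
duration[50] <- duration[50] + 60  # inject one abnormal run

toy <- data.frame(Date = dates, Duration = duration)

# one sine/cosine pair at the assumed period
fit <- lm(Duration ~ sin(2 * pi * as.numeric(Date) / period) +
                     cos(2 * pi * as.numeric(Date) / period), data = toy)
toy$resid <- unname(residuals(fit))  # actual minus fitted

# IQR rule on the deseasonalized values
upper <- quantile(toy$resid, 0.75, names = FALSE) + 3 * IQR(toy$resid)
toy$outlier <- toy$resid > upper

which(toy$outlier)
```

With real data, the same fit would need to be done per process, and the period should come from inspecting the series rather than being assumed.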

Source

2012-11-05 20:00:46

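To deal with a possible seasonal pattern, I would first use acf(df$Duration) to look for autocorrelation at different lags. If I did not see anything, I probably would not worry about it unless I had an a priori reason to model it. Your sample data show no evidence of seasonality: apart from the autocorrelation at lag 0, which is always 1, the only correlation is at lag 1, and it is modest: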

[Image: ACF plot of df$Duration]
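
For comparison, here is a small sketch of what acf() reports when a cycle is present and when it is not. The two series are synthetic and the weekly cycle is an assumption made purely for illustration; weekly, flat, acf_weekly, and acf_flat are hypothetical names.

```r
# Sketch with synthetic data; the weekly cycle is an assumption for illustration.
set.seed(42)
n <- 210
weekly <- 100 + 15 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 3)
flat   <- 100 + rnorm(n, sd = 3)

acf_weekly <- acf(weekly, lag.max = 21, plot = FALSE)
acf_flat   <- acf(flat,   lag.max = 21, plot = FALSE)

# acf objects store lag 0 first, so lag k sits at index k + 1
round(acf_weekly$acf[c(7, 14, 21) + 1], 2)
round(acf_flat$acf[c(7, 14, 21) + 1], 2)
```

A series with a weekly cycle should show clear peaks at lags 7, 14, and 21, while the series without one should stay close to zero at those lags.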

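An approach that handles not only the seasonal component (periodically recurring events) but also the trend (slow shifts in the norm) is stl(), particularly as implemented by Rob J Hyndman in this posting.

The decomp function given by Hyndman (reproduced below) is very useful for checking for seasonality and then decomposing the time series into seasonal (if present), trend, and remainder components.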

decomp <- function(x,transform=TRUE) 
{ 
    #decomposes time series into seasonal and trend components 
    #from http://robjhyndman.com/researchtips/tscharacteristics/ 
    require(forecast) 
    # Transform series 
    if(transform & min(x,na.rm=TRUE) >= 0) 
    { 
    lambda <- BoxCox.lambda(na.contiguous(x)) 
    x <- BoxCox(x,lambda) 
    } 
    else 
    { 
    lambda <- NULL 
    transform <- FALSE 
    } 
    # Seasonal data 
    if(frequency(x)>1) 
    { 
    x.stl <- stl(x,s.window="periodic",na.action=na.contiguous) 
    trend <- x.stl$time.series[,2] 
    season <- x.stl$time.series[,1] 
    remainder <- x - trend - season 
    } 
    else #Nonseasonal data 
    { 
    require(mgcv) 
    tt <- 1:length(x) 
    trend <- rep(NA,length(x)) 
    trend[!is.na(x)] <- fitted(gam(x ~ s(tt))) 
    season <- NULL 
    remainder <- x - trend 
    } 
    return(list(x=x,trend=trend,season=season,remainder=remainder, 
    transform=transform,lambda=lambda)) 
}

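As you can see, it uses stl() (which uses loess) if there is seasonality, and a penalized regression spline if there is none.

In your case, you could use the function like this: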

# make model
df.decomp <- decomp(df$Duration)

# add results into df
if (!is.null(df.decomp$season)) {
    df$season <- df.decomp$season
} else {
    df$season <- 0
}
df$trend <- df.decomp$trend 
df$Durationsmoothed <- df.decomp$remainder 

# if you don't want to detrend 
df$Durationsmoothed <- df$Durationsmoothed+df$trend
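
Note that frequency() returns 1 for a plain numeric vector such as df$Duration, so decomp() takes the non-seasonal branch unless it is given a ts object with a frequency greater than 1. The sketch below calls stl() directly on such an object and then applies the IQR rule to the remainder. The data are synthetic, frequency = 7 (a weekly cycle in daily data) is an assumption, and robust = TRUE is a choice made here, not something taken from the original function.

```r
# Sketch with synthetic data; frequency = 7 is an assumption about the cycle length.
set.seed(7)
n <- 140
x <- 100 + 0.1 * seq_len(n) + 20 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 2)
x[60] <- x[60] + 50  # inject one abnormal value

x.ts <- ts(x, frequency = 7)  # tell R the series repeats every 7 observations
fit  <- stl(x.ts, s.window = "periodic", robust = TRUE)

# what is left after removing the seasonal and trend components
remainder <- as.numeric(fit$time.series[, "remainder"])

# IQR rule on the remainder
upper <- quantile(remainder, 0.75, names = FALSE) + 3 * IQR(remainder)
which(remainder > upper)
```

For the question's data this would have to be run once per process, on that process's durations ordered by Date.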

You should consult the referenced blog post, as it develops this analysis further.

Source

2012-11-05 20:14:33 MattBagg
