2015-02-23 71 views

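Drop the first year of data by ID in R

Is there a quick way to drop the first X years of data from a data frame in R? I want to drop the first year of data for each ID. My data is sorted by id and date, and within each id the dates are one month apart. My current idea is to somehow create a count from 1 to N for each ID and then drop the rows where N is 1 to 12, but I would like to know whether there is a better way, in case my data contains some missing dates.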

For example, my data might look like this:

id | date 
__________ 
a | 2009-01-01 
a | 2009-02-01 
a | 2009-03-01 
a | 2009-04-01 
a | 2009-05-01 
a | 2009-06-01 
a | 2009-07-01 
a | 2009-08-01 
a | 2009-09-01 
a | 2009-10-01 
a | 2009-11-01 
a | 2009-12-01 
a | 2010-01-01 
a | 2010-02-01 
a | 2010-03-01 
b | 2003-07-01 
b | 2003-08-01 
b | 2003-09-01 
b | 2003-10-01 
b | 2003-11-01 
b | 2003-12-01 
b | 2004-01-01 
b | 2004-02-01 
b | 2004-03-01 
b | 2004-04-01 
b | 2004-05-01 
b | 2004-06-01 
b | 2004-07-01 
b | 2004-08-01 
c | 2007-03-01 

And my desired output, with the first year of data dropped for each ID:

id | date 
__________ 
a | 2010-01-01 
a | 2010-02-01 
a | 2010-03-01 
b | 2004-07-01 
b | 2004-08-01 
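
The row-count idea described in the question could be sketched in base R roughly as follows. This is only a sketch under assumptions: the data frame is called `df`, it is sorted by `id` and `date`, and every id has exactly one row per month with no gaps. The helper column `n` is a made-up name for illustration.

```r
# toy data: two ids with monthly rows (assumed sorted by id, then date)
df <- data.frame(
  id   = rep(c("a", "b"), times = c(15, 14)),
  date = c(seq(as.Date("2009-01-01"), by = "month", length.out = 15),
           seq(as.Date("2003-07-01"), by = "month", length.out = 14))
)

# running row count within each id (1, 2, ..., N)
df$n <- ave(seq_along(df$id), df$id, FUN = seq_along)

# drop the first 12 rows of every id
dfnew <- df[df$n > 12, ]
```

If a month is missing for some id, this drops more than one year of data for that id, which is why a cutoff based on the dates themselves is likely the safer choice.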

Answers


Easy peasy:

df = read.csv(text="id,date 
a,2009-01-01 
a,2009-02-01 
a,2009-03-01 
a,2009-04-01 
a,2009-05-01 
a,2009-06-01 
a,2009-07-01 
a,2009-08-01 
a,2009-09-01 
a,2009-10-01 
a,2009-11-01 
a,2009-12-01 
a,2010-01-01 
a,2010-02-01 
a,2010-03-01 
b,2003-07-01 
b,2003-08-01 
b,2003-09-01 
b,2003-10-01 
b,2003-11-01 
b,2003-12-01 
b,2004-01-01 
b,2004-02-01 
b,2004-03-01 
b,2004-04-01 
b,2004-05-01 
b,2004-06-01 
b,2004-07-01 
b,2004-08-01 
c,2007-03-01") 


library(lubridate) 
df$date <- ymd(df$date) 

library(dplyr) 
df %>% group_by(id) %>% filter(year(date) > min(year(date))) 
#> id  date 
#> 1 a 2010-01-01 
#> 2 a 2010-02-01 
#> 3 a 2010-03-01 
#> 4 b 2004-01-01 
#> 5 b 2004-02-01 
#> 6 b 2004-03-01 
#> 7 b 2004-04-01 
#> 8 b 2004-05-01 
#> 9 b 2004-06-01 
#> 10 b 2004-07-01 
#> 11 b 2004-08-01 
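
Note that filtering on the calendar year keeps b's rows from 2004-01 onward, which is more than the desired output in the question. A possible variant compares each date with that id's first date plus one year instead. This is a sketch, assuming `date` is already a Date column; the toy data and the name `res` are only for illustration.

```r
library(dplyr)
library(lubridate)

# same shape as the example data: ids a, b, c with monthly dates
df <- data.frame(
  id   = rep(c("a", "b", "c"), times = c(15, 14, 1)),
  date = c(seq(ymd("2009-01-01"), by = "month", length.out = 15),
           seq(ymd("2003-07-01"), by = "month", length.out = 14),
           ymd("2007-03-01"))
)

# keep rows dated at least one year after each id's first date
res <- df %>%
  group_by(id) %>%
  filter(date >= min(date) %m+% years(1)) %>%
  ungroup()
```

With this filter, id c disappears entirely because it has less than one year of data.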

Using base R:

# attach the year (as.Date might not be needed if yours is already a date) 
df$year <- format(as.Date(df$date),format = '%Y') 

# attach the minimum year for each id 
df$minyear <- ave(x = df$year,df$id,FUN = min) 

# subset by the minyear variable 
dfnew <- df[df$year != df$minyear, ] 

Update

Oh, I see: you don't want to drop the first calendar year, but the data within one year of each id's first date. Using lubridate makes this simple.

# add one year to every date
library(lubridate) 
df$addyear <- ymd(df$date) %m+% years(1) 

# find the cutoff date for each id (first date plus one year)
df$mindate <- ave(x = df$addyear,df$id,FUN = min) 

# keep rows on or after the cutoff (convert date in case it is not a Date yet)
dfnew <- df[ymd(df$date) >= df$mindate, ] 
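
Since the question asks about dropping X years in general and mentions possible missing dates, the same cutoff idea could be wrapped in a small helper. This is a hedged sketch in base R only: the function name `drop_first_years` and its argument `n_years` are hypothetical, and it assumes the `date` column can be converted with `as.Date`.

```r
# hypothetical helper: drop the first n_years of data for each id
drop_first_years <- function(df, n_years = 1) {
  dates <- as.Date(df$date)

  # first observed date per id, repeated on every row of that id
  first <- ave(dates, df$id, FUN = min)

  # shift the first date forward by n_years using POSIXlt year arithmetic
  cutoff <- as.POSIXlt(first)
  cutoff$year <- cutoff$year + n_years
  cutoff <- as.Date(cutoff)

  df[dates >= cutoff, , drop = FALSE]
}

# toy data with one missing month (2009-05-01 removed for id a)
df_gap <- data.frame(
  id   = rep(c("a", "b"), times = c(15, 14)),
  date = c(seq(as.Date("2009-01-01"), by = "month", length.out = 15),
           seq(as.Date("2003-07-01"), by = "month", length.out = 14))
)
df_gap <- df_gap[-5, ]

kept <- drop_first_years(df_gap, n_years = 1)
```

Because the cutoff depends only on each id's first date, the missing month does not change which rows are kept.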

Thank you. I modified your code to get the desired result. Since it doesn't format well as a comment, I'll post it below. – samuraiexe 2015-02-24 17:29:45


This is the code I put together to get the result I wanted, using ARobertson's code as a starting point:

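# make sure date is a Date before taking per-id minima
df$date <- as.Date(df$date)

# first observed date for each id
# (as.Date() on a bare year string such as "2009" fails, so use the date itself)
df$mindate <- ave(x = df$date,df$id,FUN = min) 

# the cutoff is one year after each id's first date
d <- as.POSIXlt(df$mindate) 
d$year <- d$year + 1 
df$cutoff_date <- as.Date(d) 

# keep rows on or after the cutoff
dfnew <- df[df$date >= df$cutoff_date, ]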