2017-04-19

How can I join two data.tables with a rolling join within each group? (data.table rolling join by group)

library(data.table) 
alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01')) 
gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B','B','B'), value = c(1,10,100, 2, 20, 200)) 
gdp 
         date country value
1: 2000-01-01       A     1
2: 2010-01-01       A    10
3: 2020-01-01       A   100
4: 2000-01-01       B     2
5: 2010-01-01       B    20
6: 2020-01-01       B   200

price = data.table(date = alldates, price = c(101, 102, 103, 104, 105)) 
price 
         date price
1: 2000-01-01   101
2: 2005-01-01   102 # gdp table is missing mid decade data
3: 2010-01-01   103
4: 2015-01-01   104
5: 2020-01-01   105

The result I want:

          date country value price
 1: 2000-01-01       A     1   101
 2: 2000-01-01       B     2   101
 3: 2005-01-01       A     1   102 # fill in value using previous gdp for each country
 4: 2005-01-01       B     2   102
 5: 2010-01-01       A    10   103
 6: 2010-01-01       B    20   103
 7: 2015-01-01       A    10   104
 8: 2015-01-01       B    20   104
 9: 2020-01-01       A   100   105
10: 2020-01-01       B   200   105

NB

  1. Row order doesn't matter.
  2. It doesn't need to be a one-liner.
  3. gdp[price, on = 'date', roll = TRUE] does not work.

Answer


After tidying the data...

# fill in missing levels 
gdpf = gdp[CJ(date = price$date, country = country, unique = TRUE), on=.(date, country)] 

# fill in values for missing levels 
gdpf[order(country), value := first(value), by=.(country, cumsum(!is.na(value)))] 

Then an update join can pull in the price:

gdpf[price, on=.(date), price := i.price ] 

          date country value price
 1: 2000-01-01       A     1   101
 2: 2000-01-01       B     2   101
 3: 2005-01-01       A     1   102
 4: 2005-01-01       B     2   102
 5: 2010-01-01       A    10   103
 6: 2010-01-01       B    20   103
 7: 2015-01-01       A    10   104
 8: 2015-01-01       B    20   104
 9: 2020-01-01       A   100   105
10: 2020-01-01       B   200   105
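
For comparison, a rolling join can also be made to work per group if country is part of the join. This is only a sketch under that assumption, not the method used above: the names grid and res are made up for illustration. Every (country, date) pair is built with CJ, then gdp is joined on both columns with roll = TRUE, so that country is matched exactly and only date (the last column in on) is rolled forward.

```r
library(data.table)

alldates <- as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp <- data.table(date = alldates[c(1,3,5,1,3,5)],
                  country = c('A','A','A','B','B','B'),
                  value = c(1, 10, 100, 2, 20, 200))
price <- data.table(date = alldates, price = c(101, 102, 103, 104, 105))

# hypothetical helper table: every country paired with every price date
grid <- CJ(country = unique(gdp$country), date = price$date)

# country is matched exactly; date rolls to the most recent earlier gdp row
res <- gdp[grid, on = .(country, date), roll = TRUE]

# update join to attach the price for each date
res[price, on = .(date), price := i.price]
res
```

The rows come out ordered by country and then date, which is acceptable here because row order doesn't matter.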

Another way to fill values down for the missing levels is value := na.locf(value), by=country, using the zoo package.
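
A minimal sketch of that zoo variant, rebuilding the example data so it runs on its own. The na.rm = FALSE argument is an addition not in the answer: it keeps any leading NA in place so each group's result has the same length as its input (with this data every country has a value at the first date, so it makes no difference to the output).

```r
library(data.table)
library(zoo)  # provides na.locf()

alldates <- as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01'))
gdp <- data.table(date = alldates[c(1,3,5,1,3,5)],
                  country = c('A','A','A','B','B','B'),
                  value = c(1, 10, 100, 2, 20, 200))
price <- data.table(date = alldates, price = c(101, 102, 103, 104, 105))

# fill in missing levels: one row per (date, country), value is NA where gdp has no row
gdpf <- gdp[CJ(date = price$date, country = country, unique = TRUE), on = .(date, country)]

# carry the last observed value forward within each country
gdpf[, value := na.locf(value, na.rm = FALSE), by = country]

# update join to attach the price for each date
gdpf[price, on = .(date), price := i.price]
gdpf
```

This relies on gdpf being sorted by date within each country, which holds here because CJ returns its rows in sorted order.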