Expand data frame rows by date range, with NA values

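Using the data below, I would like to expand the rows for each level of the IndID factor so that there are as many rows as there are years between CptrDt and MortDt, including the start and end years. For individuals without a MortDt, I would like to fill in the years sequentially up to 2017.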

dat <- data.frame(IndID = c("AAA","BBB","CCC"), 
       CptrDt = as.Date(c("01-01-2013" ,"01-01-2013", "01-01-2014"),"%m-%d-%Y"), 
       MortDt = as.Date(c("01-01-2015" ,"01-01-2016", NA),"%m-%d-%Y")) 

> dat 
  IndID     CptrDt     MortDt
1   AAA 2013-01-01 2015-01-01
2   BBB 2013-01-01 2016-01-01
3   CCC 2014-01-01       <NA>

A simplified result would return only the year, as shown below, but I can work with other date formats.

Result <- data.frame(IndID = c(rep("AAA",3), rep("BBB",4), rep("CCC",4)), 
       Year = c(2013,2014,2015,2013,2014,2015,2016,2014,2015,2016,2017)) 

   IndID Year
1    AAA 2013
2    AAA 2014
3    AAA 2015
4    BBB 2013
5    BBB 2014
6    BBB 2015
7    BBB 2016
8    CCC 2014
9    CCC 2015
10   CCC 2016
11   CCC 2017

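I recognize that this question is very similar to a previous post, but given the NA values and the slightly different data structure, I have not been able to adapt the previous responses to produce the desired result, and would appreciate any suggestions. Other solutions are also shown in the posted answers.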
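Whatever method is used, the NA in MortDt has to be replaced before a year sequence can be built. A minimal base R sketch of that step, assuming 2017-01-01 as the stand-in date (our reading of "up to 2017"); the names `end_date` and `filled_bad` are illustrative only:

```r
# Example data from the question
dat <- data.frame(IndID = c("AAA", "BBB", "CCC"),
                  CptrDt = as.Date(c("01-01-2013", "01-01-2013", "01-01-2014"), "%m-%d-%Y"),
                  MortDt = as.Date(c("01-01-2015", "01-01-2016", NA), "%m-%d-%Y"))

# Assumed stand-in for a missing mortality date (the 2017 cutoff)
end_date <- as.Date("2017-01-01")

# Pitfall: ifelse() takes its attributes from the logical test, so the
# Date class is lost and the result is a bare count of days since 1970-01-01
filled_bad <- ifelse(is.na(dat$MortDt), end_date, dat$MortDt)
print(class(filled_bad))  # "numeric"

# Indexed assignment writes into the existing Date column, keeping its class
dat$MortDt[is.na(dat$MortDt)] <- end_date
print(dat)
```

The indexed-assignment form is the one both answers below use (via `dat[is.na(...), "MortDt"] <- ...`).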


2017-02-10 B. Davis

You can use a list column or 'do': 'library(tidyverse); dat %>% group_by(IndID) %>% mutate(MortDt = coalesce(MortDt, Sys.Date()), Year = seq(CptrDt, MortDt, by = "year") %>% lubridate::year() %>% list()) %>% unnest()' – alistaire
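The list-column idea in this comment can also be sketched in base R. This is not alistaire's code: it swaps `Sys.Date()` for a fixed 2017-01-01 cutoff (an assumption, so the output stays "through 2017" when run in later years) and replaces tidyr's `unnest()` with `lengths()`, `rep()` and `unlist()`. The names `from_yr`, `to_yr` and `long` are hypothetical.

```r
dat <- data.frame(IndID = c("AAA", "BBB", "CCC"),
                  CptrDt = as.Date(c("01-01-2013", "01-01-2013", "01-01-2014"), "%m-%d-%Y"),
                  MortDt = as.Date(c("01-01-2015", "01-01-2016", NA), "%m-%d-%Y"))

# Assumption: a fixed cutoff instead of Sys.Date()
dat$MortDt[is.na(dat$MortDt)] <- as.Date("2017-01-01")

# Integer start and end years for every individual
from_yr <- as.integer(format(dat$CptrDt, "%Y"))
to_yr   <- as.integer(format(dat$MortDt, "%Y"))

# List column: element i holds all years for individual i
dat$Year <- Map(seq, from_yr, to_yr)

# Base R "unnest": repeat each ID once per year, then flatten the list
long <- data.frame(IndID = rep(dat$IndID, lengths(dat$Year)),
                   Year  = unlist(dat$Year))
print(long)
```

Keeping the years in a list column leaves one row per individual until the last step, which is the same shape `unnest()` flattens in the tidyverse version.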

Or use 'purrr::by_slice': 'dat %>% group_by(IndID) %>% mutate_if(lubridate::is.Date, coalesce, Sys.Date()) %>% by_slice(~seq(.x$CptrDt, .x$MortDt, by = "year") %>% lubridate::year(), .collate = "rows", .to = "year")' – alistaire
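`by_slice()` applies a function to each group; it has since been deprecated in purrr and moved to the purrrlyr package. A base R sketch of the same slice-then-stack idea with `split()` and `lapply()`, assuming one row per IndID (with several rows per individual, `d$IndID` would no longer be length one). `slices`, `expanded` and `res` are illustrative names.

```r
dat <- data.frame(IndID = c("AAA", "BBB", "CCC"),
                  CptrDt = as.Date(c("01-01-2013", "01-01-2013", "01-01-2014"), "%m-%d-%Y"),
                  MortDt = as.Date(c("01-01-2015", "01-01-2016", NA), "%m-%d-%Y"))
dat$MortDt[is.na(dat$MortDt)] <- as.Date("2017-01-01")

# One data frame ("slice") per IndID level
slices <- split(dat, dat$IndID)

# Expand each slice to one row per calendar year, start and end included
expanded <- lapply(slices, function(d) {
  yrs <- seq(as.integer(format(d$CptrDt, "%Y")),
             as.integer(format(d$MortDt, "%Y")))
  data.frame(IndID = d$IndID, Year = yrs)
})

# Stack the slices; unname() drops the list names, which rbind() would
# otherwise use to build the row names
res <- do.call(rbind, unname(expanded))
print(res)
```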

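1. Replace the missing dates with 2017-01-01. Then, using gsub, extract the years from each row and build a sequence from them, and use expand.grid to expand the IndID value over that sequence. Finally, rbind the resulting list of data frames into a single data frame.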

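# replace missing dates with 2017-01-01 so open-ended records run through 2017
dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y") 
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y") 

do.call('rbind', apply(dat, 1, function(x) { 
  # apply() passes each row as a character vector, so dates arrive as "YYYY-MM-DD"
  pattern <- '([0-9]{4})-[0-9]{2}-[0-9]{2}'
  # keep only the four-digit year of CptrDt and MortDt
  y <- as.numeric(gsub(pattern, '\\1', x[2:3]))
  # one row per year, start and end years included
  expand.grid(IndID = x[1], 
              Year = seq(y[1], y[2], by = 1)) 
}))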

# IndID Year 
# 1 AAA 2013 
# 2 AAA 2014 
# 3 AAA 2015 
# 4 BBB 2013 
# 5 BBB 2014 
# 6 BBB 2015 
# 7 BBB 2016 
# 8 CCC 2014 
# 9 CCC 2015 
# 10 CCC 2016 
# 11 CCC 2017
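The regular expression works here only because of what `apply()` does to a data frame: it first calls `as.matrix()`, and because the data frame has non-numeric columns (IndID and the two Date columns) the result is a character matrix in which the dates are already formatted as "YYYY-MM-DD" strings. A small check of that behaviour on the question's data; `m` and `yrs_row1` are illustrative names:

```r
dat <- data.frame(IndID = c("AAA", "BBB", "CCC"),
                  CptrDt = as.Date(c("01-01-2013", "01-01-2013", "01-01-2014"), "%m-%d-%Y"),
                  MortDt = as.Date(c("01-01-2015", "01-01-2016", NA), "%m-%d-%Y"))
dat$MortDt[is.na(dat$MortDt)] <- as.Date("2017-01-01")

# What apply() actually iterates over: a character matrix
m <- as.matrix(dat)
print(m)

# For the first row, m[1, 2:3] holds the two date strings and the
# capture group ([0-9]{4}) keeps just the year
pattern <- "([0-9]{4})-[0-9]{2}-[0-9]{2}"
yrs_row1 <- as.numeric(gsub(pattern, "\\1", m[1, 2:3]))
print(yrs_row1)
```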

2. Using format, as suggested in the comment below.

# replace missing dates with 2017-01-01, as in the first approach
dat[is.na(dat$CptrDt), "CptrDt"] <- as.Date("01-01-2017", "%m-%d-%Y") 
dat[is.na(dat$MortDt), "MortDt"] <- as.Date("01-01-2017", "%m-%d-%Y") 

# keep only the year, as a "YYYY" string
dat$CptrDt <- format(dat$CptrDt, "%Y") 
dat$MortDt <- format(dat$MortDt, "%Y") 

do.call('rbind', apply(dat, 1, function(x) {
  # x[2] and x[3] hold the start and end years as strings
  expand.grid(IndID = x[1],
              Year = seq(as.numeric(x[2]), as.numeric(x[3]), by = 1))
}))
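Both answers loop over rows with `apply()`. A vectorized sketch of the `format()` idea that avoids the row loop and the character conversion altogether: `rep()` repeats each ID once per year and `sequence()` concatenates the per-individual year ranges. It assumes R 4.0.0 or later, where `sequence()` gained its `from` argument; `from`, `to`, `n` and `res` are illustrative names.

```r
dat <- data.frame(IndID = c("AAA", "BBB", "CCC"),
                  CptrDt = as.Date(c("01-01-2013", "01-01-2013", "01-01-2014"), "%m-%d-%Y"),
                  MortDt = as.Date(c("01-01-2015", "01-01-2016", NA), "%m-%d-%Y"))
dat$MortDt[is.na(dat$MortDt)] <- as.Date("2017-01-01")

# Start and end years for all rows at once, via format()
from <- as.integer(format(dat$CptrDt, "%Y"))
to   <- as.integer(format(dat$MortDt, "%Y"))
n    <- to - from + 1L  # years per individual, both ends included

# rep() repeats each ID n[i] times; sequence() chains from[i]:to[i]
res <- data.frame(IndID = rep(dat$IndID, n),
                  Year  = sequence(n, from = from))
print(res)
```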

Data:

dat <- data.frame(IndID = c("AAA","BBB","CCC"), 
        CptrDt = as.Date(c("01-01-2013" ,"01-01-2013", "01-01-2014"),"%m-%d-%Y"), 
        MortDt = as.Date(c("01-01-2015" ,"01-01-2016", NA),"%m-%d-%Y"))


2017-02-10 05:14:00 Sathish

Don't use a regular expression to parse dates; just use 'format' with '%Y'. – alistaire
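To illustrate the comment: `format()` with "%Y" returns the year as a string, which `as.integer()` turns into a number. For comparison, `as.POSIXlt()` exposes the year as a field counted from 1900, which avoids string handling entirely. A short sketch (the example dates are arbitrary):

```r
d <- as.Date(c("2013-01-01", "2016-06-30"))

# format() with "%Y" gives the year as a character string
yr_chr <- format(d, "%Y")

# as.integer() makes it usable in seq() or `:`
yr_int <- as.integer(yr_chr)

# Alternative without strings: POSIXlt stores years since 1900
yr_lt <- as.POSIXlt(d)$year + 1900

print(yr_chr)
print(yr_int)
print(yr_lt)
```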

@alistaire Thanks for your comment. I have added it to the answer. – Sathish
