2016-02-29

Extracting years from string and text data

I need to extract the start year and the end year from a character vector with values like these:

yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")


yr 
June 2013 – Present (2 years 9 months) 
January 2012 – June 2013 (1 year 6 months) 
2006 – Present (10 years) 
2002 – 2006 (4 years) 

I expect output like this. Does anyone have a suggestion?

start_yr  end_yr 

2013   2016 
2012   2013 
2006   2016 
2002   2006 

Use gsub to replace "Present" with 2016, then extract the four-digit numbers. Try it. – rawr

Answers

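# Replace "Present" (in any case) with the current year, 2016
x <- gsub("present", "2016", yr, ignore.case = TRUE)
# Pull every four-digit number out of each string
x <- regmatches(x, gregexpr("\\d{4}", x))
# The first match is the start year, the second is the end year
start_yr <- sapply(x, "[[", 1)
end_yr <- sapply(x, "[[", 2)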

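This stores the start year and the end year in two separate variables. If you want them in a single object, edit the code to assign to y$start_yr and y$end_yr instead.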
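
As a minimal sketch of the single-object variant, the two vectors can be collected into one data frame. The name `y` follows the wording of the answer; converting the years to integers is an added assumption, not something the answer specifies.

```r
yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Same two steps as in the answer: substitute the year, then extract matches
x <- gsub("present", "2016", yr, ignore.case = TRUE)
x <- regmatches(x, gregexpr("\\d{4}", x))

# Collect both columns in one data frame (assumption: integer years are wanted)
y <- data.frame(
  start_yr = as.integer(sapply(x, "[[", 1)),
  end_yr   = as.integer(sapply(x, "[[", 2))
)
y
```

Note that this still assumes every string contains at least two four-digit numbers after the substitution.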


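Some "character(0)" entries are creeping into my results, and I get this error: "Error in FUN(X[[i]], ...) : subscript out of bounds". Any suggestions for removing those rows? – user3570187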
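
One possible way to handle the error in the comment above: `[[` fails on an empty match, while single-bracket indexing past the end of a vector returns NA. The sketch below relies on that. The helper name `extract_years`, its `present_year` argument, and the sample input are hypothetical and not part of the original answers.

```r
# Hypothetical helper: returns NA instead of failing when a year is missing
extract_years <- function(yr, present_year = 2016L) {
  x <- gsub("present", present_year, yr, ignore.case = TRUE)
  m <- regmatches(x, gregexpr("\\d{4}", x))
  # v[1] and v[2] are NA when a string has fewer than one or two matches
  data.frame(
    start_yr = as.integer(vapply(m, function(v) v[1], character(1))),
    end_yr   = as.integer(vapply(m, function(v) v[2], character(1)))
  )
}

# Made-up input: one complete entry, one with no year, one with a single year
res <- extract_years(c("2002 – 2006 (4 years)", "no dates here", "2010"))

# Keep only the rows where both years were found
clean <- res[complete.cases(res), ]
```

Whether incomplete rows should be dropped or kept as NA depends on the data. `complete.cases` is just one way to filter them.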


Another solution is to use the stringr package:

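library(stringr)
# The replacement is passed as a string rather than a number
x <- str_replace(yr, "Present", "2016")
# simplify = TRUE returns a character matrix with one row per input string
DF <- as.data.frame(str_extract_all(x, "\\d{4}", simplify = TRUE))
names(DF) <- c("start_yr", "end_yr")
DF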

which gives:

 start_yr end_yr 
1  2013 2016 
2  2012 2013 
3  2006 2016 
4  2002 2006