2017-04-06 69 views
1

Replacing dates in a character vector with a specific format

I am given a character vector as follows:

"On the evening of 2017-04-23, I was too tired" 
"to complete my homework that was due on 24.04.2017." 

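I need to search through it for all occurrences of dates and replace them with the format MONTHNAME d, YYYY.

I know the general format should be %B %d, %Y, and that I will probably have to use the sub() function, but I am not sure how to combine the two.

When I try something like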

sub("[0-9]{2}.[0-9]{2}.[0-9]{4}","%B %d, %Y",x) 

I just get the following result:

"On the evening of 2001-01-15, I was too tired to complete my homework that was due on %B %d, %Y." 

Could someone please help me figure out how to put it all together?


With the help of fellow Stack Overflow users, my new code is as follows:

streamlineDates <- function(x) 
{ 
#set pattern to dates in form of YYYY-MM-DD or DD.MM.YYYY 
pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 

y <- c(x) 

val <- unlist(regmatches(y, gregexpr(pattern, y))) 

val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y")) 
val2 <- format(val1,"%B %d, %Y") 

y1 <- list() 
for (i in 1:length(y)){ 
    y1[i] <- gsub(pattern,val2[i],y[i]) 
} 
} 

However, when I pass in only:

x <- "to complete my homework that was due on 24.04.2017." 

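...it only returns NA. I have narrowed the problem down to gsub and its replacement value, for which the documentation says: "If NA, all elements in the result corresponding to matches will be set to NA". So when only the last line is supplied, the first date is missing and the function returns just NA.

How can I make it accept either one or two dates?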

+0

A date format (e.g. '%B %d %Y') cannot be used in the 'sub' or 'gsub' functions; it has to be used in 'as.Date'. – emilliman5

+0

@sooki-sooki see my solution; I hope it helps. Thanks – PKumar

Answers

2

First approach:

A base R solution (without using any packages):

pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 


val <- unlist(regmatches(rep, gregexpr(pattern, rep))) 

val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y")) 
val2 <- format(val1,"%B %d, %Y") 
val2 
rep1 <- list() 
for (i in 1:length(rep)){ 
rep1[i] <- gsub(pattern,val2[i],rep[i]) 
} 

Output:

do.call("c",rep1) 

> do.call("c",rep1)             
[1] "On the evening of April 23, 2017, I was too tired"  
[2] "to complete my homework that was due on April 24, 2017." 
> 

Second approach:

Using the stringr library:

library(stringr) 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 
val <- str_extract(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}") 
val1 <- as.Date(val,format=c("%Y-%m-%d","%d.%m.%Y")) 
val2 <- format(val1,"%B %d, %Y") 
rep1 <- str_replace_all(rep,"\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}",val2) 
rep1 

Output:

> rep1 
[1] "On the evening of April 23, 2017, I was too tired"  
[2] "to complete my homework that was due on April 24, 2017." 
> 

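Edit: since the OP has changed the question a bit, here is a more generic solution. It assumes the month is always in the middle position and that the separators are limited to the dash (-) and the dot (.):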

pattern <- "\\d{2,4}[.-]\\d{2}[.-]\\d{2,4}" 
rep <- c("On the evening of 2017-04-23, I was too tired","to complete my homework that was due on 24.04.2017.") 


val <- unlist(regmatches(rep, gregexpr(pattern, rep))) 

year <- regmatches(val, gregexpr("\\d{4}", val)) 

month <- regmatches(val, gregexpr("(?<=[.-])\\d{1,2}(?=[.-])", val,perl=T)) 

date <- regmatches(val, gregexpr("(?<=[.-])\\d{2}$|^\\d{2}(?=[.-])", val,perl=T)) 
#Extracting year month and date , assuming month always falls in middle string 

date1 <- paste0(year,"-",month,"-",date) 
date1 <- as.Date(date1,"%Y-%m-%d") 
val2 <- format(date1,"%B %d, %Y") 

rep1 <- list() 
for (i in 1:length(rep)){ 
    rep1[i] <- gsub(pattern,val2[i],rep[i]) 
} 


do.call("c",rep1) 
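
The loop above pairs the i-th extracted date with the i-th string, so it only lines up when every string contains exactly one date. A minimal base R sketch of an alternative follows, assuming the same two date layouts as in the question. The function name `reformat_dates` and its arguments are hypothetical, the pattern is deliberately narrower than the one used above, and pinning `LC_TIME` to "C" is an assumption made only so that month names come out in English.

```r
# Hypothetical helper (not part of the original answer): rewrites every
# YYYY-MM-DD or DD.MM.YYYY date found in each element of `x`, however many
# dates (zero, one or several) that element contains.
reformat_dates <- function(x,
                           pattern = "\\d{4}-\\d{2}-\\d{2}|\\d{2}\\.\\d{2}\\.\\d{4}",
                           formats = c("%Y-%m-%d", "%d.%m.%Y")) {
  # %B follows the LC_TIME locale; switch to "C" temporarily (assumption)
  old <- Sys.getlocale("LC_TIME")
  Sys.setlocale("LC_TIME", "C")
  on.exit(Sys.setlocale("LC_TIME", old))

  m <- gregexpr(pattern, x)
  # Replace each match in place; elements without a match are left as they are
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(hits, function(h) {
      d <- as.Date(NA)
      for (f in formats) {          # try each format on this one match
        d <- as.Date(h, format = f)
        if (!is.na(d)) break
      }
      # keep the original text if no format parses it
      if (is.na(d)) h else format(d, "%B %d, %Y")
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

reformat_dates("to complete my homework that was due on 24.04.2017.")
```

Because each match is parsed on its own, a single input string no longer depends on where its format sits in the `formats` vector. Note that `%d` zero-pads the day, so 2017-04-03 would be written as "April 03, 2017".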
+0

This is great, but would there be a way to do this without any additional libraries, i.e. with only the standard preloaded R libraries? –

+0

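I just ran some further tests on the code and noticed that if only "to complete my homework that was due on 24.04.2017." is supplied as input, the code does not work and returns only NA. Would you happen to know how to fix this? –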

+1

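Please understand that if you want the correct result for every scenario, you have to supply the matching format in 'as.Date(val, format = c("%Y-%m-%d", "%d.%m.%Y"))', just as here we supplied two formats for the two differently formatted dates. A single format cannot cover every date format you have. Give me some time; I am trying to make it generic. I do not know whether it is possible, but I will certainly try. – PKumar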

1

First you need to specify all the formats of your dates. Then convert to Date and format the result to get your desired output, i.e.

#Note that I don't use any delimiter in the formatting simply because 
#I will use gsub to replace all except the numbers with '' from the string 
v1 <- c('%Y%m%d', '%d%m%Y') 

format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y") 
#[1] "April 23, 2017" "April 24, 2017" 

You can then use this with a (rather ugly) regular expression in str_replace_all from stringr, i.e.

stringr::str_replace_all(x, '\\d+-\\d+-\\d+|\\d+\\.\\d+\\.\\d+', 
         format(as.Date(gsub('\\D+', '', x), format = v1), "%B %d, %Y")) 

#[1] "On the evening of April 23, 2017, I was too tired"  
#[2] "to complete my homework that was due on April 24, 2017."
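
Both answers pass a vector of formats to `as.Date`, which pairs dates and formats by position. That is why a lone "24.04.2017" is parsed with the first format and yields NA, as reported in the question. A small base R sketch of an order-independent parse step follows; `parse_any` is a hypothetical name, and the two formats are taken from the question rather than being a general-purpose list.

```r
# Hypothetical helper: parse a character vector of dates by trying each
# format on every element that is still unparsed, instead of pairing
# dates and formats by position.
parse_any <- function(dates, formats = c("%Y-%m-%d", "%d.%m.%Y")) {
  out <- rep(as.Date(NA), length(dates))
  for (f in formats) {
    todo <- is.na(out)                       # elements not parsed so far
    out[todo] <- as.Date(dates[todo], format = f)
  }
  out
}

fmts <- c("%Y-%m-%d", "%d.%m.%Y")

# Positional pairing: the first element is parsed with the first format only
recycled <- as.Date("24.04.2017", format = fmts)
is.na(recycled[1])

# Order-independent parsing of one date, and of two dates in swapped order
single  <- parse_any("24.04.2017")
swapped <- parse_any(c("24.04.2017", "2017-04-23"))
format(swapped, "%Y-%m-%d")
```

The resulting Date vector can then be passed to `format(..., "%B %d, %Y")` as in the answers above. Elements that match none of the formats stay NA, so they should be checked before being used as a replacement value in `gsub`.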