2017-04-03

How can I return, as an additional column, a date sampled between the minimum date and the maximum date in each row of an R data frame?

Course MinEnrollmentDate MaxEnrollmentDate 
Maths 3/11/2016 3/4/2016 
Chemistry 6/11/2016 6/4/2016 
Physics 9/11/2016 9/4/2016 
English 12/11/2016 12/4/2016 
Science 3/11/2017 3/4/2017 

I think the column names 'MinEnrollmentDate' and 'MaxEnrollmentDate' have been swapped. Ideally, 'MaxEnrollmentDate' must be >= 'MinEnrollmentDate'. – Aramis7d
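The observation in this comment can be checked with a few lines of base R. This is only a sketch: it assumes the dates are month/day/year (the question does not say), and the data frame name `enroll` and the columns `start` and `end` are hypothetical names introduced here. `pmin()` and `pmax()` give the earlier and later date per row regardless of which column they came from.

```r
# Assumption: dates are month/day/year; `enroll`, `start`, `end` are hypothetical names.
enroll <- data.frame(
  Course = c("Maths", "Chemistry", "Physics", "English", "Science"),
  MinEnrollmentDate = c("3/11/2016", "6/11/2016", "9/11/2016", "12/11/2016", "3/11/2017"),
  MaxEnrollmentDate = c("3/4/2016", "6/4/2016", "9/4/2016", "12/4/2016", "3/4/2017"),
  stringsAsFactors = FALSE
)

min_d <- as.Date(enroll$MinEnrollmentDate, format = "%m/%d/%Y")
max_d <- as.Date(enroll$MaxEnrollmentDate, format = "%m/%d/%Y")

# TRUE for rows where the "min" column is later than the "max" column
swapped <- min_d > max_d

# Order-independent bounds: earliest and latest date per row
enroll$start <- pmin(min_d, max_d)
enroll$end <- pmax(min_d, max_d)
```

With bounds ordered this way, any sampling approach in the answers can run from `start` to `end` without worrying about the direction of the sequence.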

Answers


Assuming you are working with a data frame named mydata, you can use the following snippet:

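to_date <- function(x) as.Date(x, format = "%m/%d/%Y") # assumes month/day/year
mydata$sampledate <- do.call(c, Map(function(a, b) sample(seq(min(a, b), max(a, b), by = "day"), 1),
                                    to_date(mydata$MinEnrollmentDate), to_date(mydata$MaxEnrollmentDate)))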

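Basically, this first generates the sequence of all days between the start date and the end date, then randomly draws one sample from that sequence and writes it to your data frame.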
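To make the two steps concrete, here is a minimal sketch for a single pair of dates. The dates are the first row of the question's data read as month/day/year, which is an assumption; the variable names and the seed are illustrative only.

```r
set.seed(42)  # only so the sketch is reproducible

# Assumed reading of the first row: 4 March 2016 to 11 March 2016
start <- as.Date("2016-03-04")
end <- as.Date("2016-03-11")

# Step 1: every calendar day from start to end, inclusive
all_days <- seq(start, end, by = "day")

# Step 2: draw one date at random from that sequence
picked <- sample(all_days, 1)
```

Note that `seq()` takes a single start and a single end, so for a whole data frame the two steps have to be repeated once per row.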


Using dplyr, we can do this:

library(dplyr) 

df <- df %>% 
    rowwise() %>% 
    mutate(MinEnrollmentDate = as.Date(MinEnrollmentDate, format = '%m/%d/%Y'), 
      MaxEnrollmentDate = as.Date(MaxEnrollmentDate, format = '%m/%d/%Y'), 
      sampleDate = sample(seq(MinEnrollmentDate, MaxEnrollmentDate, '-1 day'), 1)) 

df 
#> Source: local data frame [5 x 4] 
#> Groups: <by row> 
#> 
#> # A tibble: 5 x 4 
#>  Course MinEnrollmentDate MaxEnrollmentDate sampleDate 
#>  <chr>   <date>   <date>  <date> 
#> 1  Maths  2016-03-11  2016-03-04 2016-03-08 
#> 2 Chemistry  2016-06-11  2016-06-04 2016-06-09 
#> 3 Physics  2016-09-11  2016-09-04 2016-09-06 
#> 4 English  2016-12-11  2016-12-04 2016-12-09 
#> 5 Science  2017-03-11  2017-03-04 2017-03-06 

I'm not sure I got your date format right; it is ambiguous, so feel free to correct the format= part. Data:

df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate 
        Maths 3/11/2016 3/4/2016 
        Chemistry 6/11/2016 6/4/2016 
        Physics 9/11/2016 9/4/2016 
        English 12/11/2016 12/4/2016 
        Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F) 
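On the ambiguity of the date format mentioned in this answer: the same string parses to two different dates depending on the `format=` argument. A small sketch using one value from the question (the variable names are illustrative):

```r
x <- "3/11/2016"

as_mdy <- as.Date(x, format = "%m/%d/%Y")  # read as month/day/year
as_dmy <- as.Date(x, format = "%d/%m/%Y")  # read as day/month/year

format(as_mdy)  # "2016-03-11"
format(as_dmy)  # "2016-11-03"
```

Both readings are valid for every row of the sample data, so only the asker can say which one is intended.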

You can compute the number of days between the two dates:

days <- as.Date(data$MinEnrollmentDate, format="%d/%m/%Y") - as.Date(data$MaxEnrollmentDate, format="%d/%m/%Y") 

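Then offset MinEnrollmentDate by a random number of days between 1 and that difference, drawn with sample(). MinEnrollmentDate is the later of the two dates in this data, so the offset is subtracted to keep the result inside the range:

for(i in seq_along(days)) { 
    data[i,4] <- as.character(as.Date(data$MinEnrollmentDate, format="%d/%m/%Y")[i] - sample(1:days[i],1)) 
} 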
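The same idea can also be written without relying on which column holds the later date. The following is a sketch, not the answerer's code: `dat`, `sampleDate` and the other names are hypothetical, it keeps this answer's day/month/year assumption, and unlike the loop it allows an offset of 0 so that both endpoints can be drawn.

```r
set.seed(1)  # only for reproducibility of the sketch

# Hypothetical stand-in for the question's data (day/month/year assumed)
dat <- data.frame(
  MinEnrollmentDate = c("3/11/2016", "6/11/2016"),
  MaxEnrollmentDate = c("3/4/2016", "6/4/2016"),
  stringsAsFactors = FALSE
)

d1 <- as.Date(dat$MinEnrollmentDate, format = "%d/%m/%Y")
d2 <- as.Date(dat$MaxEnrollmentDate, format = "%d/%m/%Y")

earliest <- pmin(d1, d2)
span <- abs(as.integer(d1 - d2))  # whole days between the two dates

# One uniform offset in 0..span per row, added to the earlier date
offset <- vapply(span, function(n) sample.int(n + 1L, 1L) - 1L, integer(1))
dat$sampleDate <- earliest + offset
```

Keeping the result as a `Date` column, rather than converting it with `as.character()`, leaves it usable for further date arithmetic.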

A step-by-step lubridate solution, for completeness (using GGamba's df):

if (!require(lubridate)) { 
    install.packages("lubridate") 
    library(lubridate)  # load the package after installing it
} 

df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate 
        Maths 3/11/2016 3/4/2016 
        Chemistry 6/11/2016 6/4/2016 
        Physics 9/11/2016 9/4/2016 
        English 12/11/2016 12/4/2016 
        Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F) 

no_days <- as.POSIXct(df$MinEnrollmentDate, format = "%d/%m/%Y") - as.POSIXct(df$MaxEnrollmentDate, format = "%d/%m/%Y") 

random_days <- sapply(no_days, function(x) sample(x = 1:x, size = 1, replace = T)) 

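# MinEnrollmentDate is the later date in this data, so subtract the offset to stay inside the range
df$random_date <- as.POSIXct(df$MinEnrollmentDate, format = "%d/%m/%Y") - days(random_days)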