2017-09-16

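Dynamically creating a new column in a data.table

I have a data.table in R and want to create a new column. Suppose the name of the date column is stored in a variable, and I want the new column's name to be that name with _year appended. I can do this the usual way by spelling out the new name, but how do I build the new column name from the date_col variable?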

Here is what I tried. The last two statements, which are the ones I actually want, do not work.

library(data.table)
dat = data.table(one = 1:5, two = 1:5, 
       order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01", 
          "2015-04-01","2015-05-01")) 
dat 
date_col = "order_date" 
dat[,`:=`(OrderDate_year = substr(get(date_col)[!is.na(get(date_col))],1,4))][] 
dat[,`:=`(new = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][] 
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][] 
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][] 

Answers


The last two statements return an error message:

dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][] 
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") =" 
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][] 
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) =" 

The correct syntax for calling the `:=`() function is:

dat[, `:=`(paste0(date_col, "_year", sep = ""), 
      substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][] 
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")), 
      substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][] 

That is, replace the = with a comma.
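Why the comma matters can be illustrated with a minimal sketch. The table dt and the variable new_col below are hypothetical and not part of the question. In the named-argument form the name to the left of = must be a literal name, which is why a call such as paste0(...) is a parse error there, whereas in the two-argument form the first argument is evaluated to obtain the column name.

```r
library(data.table)

dt <- data.table(x = 1:3)   # hypothetical toy table

# Named-argument form: the name left of `=` is taken literally.
dt[, `:=`(y = x * 2L)]

# Two-argument form: the first argument is evaluated, so the
# column name can be computed from a variable.
new_col <- "z"              # hypothetical variable holding a name
dt[, `:=`(paste0(new_col, "_sq"), x^2)]

names(dt)
```

After both calls, dt should contain the columns x, y and z_sq.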


However, both the assignment syntax and the right-hand side are more complicated than they need to be.

The order_date column is already of class Date:

str(dat) 
Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables: 
$ one  : int 1 2 3 4 5 
$ two  : int 1 2 3 4 5 
$ order_date: Date, format: "2015-01-01" "2015-02-01" ... 
- attr(*, ".internal.selfref")=<externalptr> 

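To extract the year, the year() function can be used (from either the data.table package or the lubridate package, whichever was loaded last), so there is no need to convert back to character and cut out the year as a string: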

date_col = "order_date" 
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][] 
one two order_date order_date_year 
1: 1 1 2015-01-01   2015 
2: 2 2 2015-02-01   2015 
3: 3 3 2015-03-01   2015 
4: 4 4 2015-04-01   2015 
5: 5 5 2015-05-01   2015 

Alternatively,

dat[, paste0(date_col, "_year") := year(get(date_col))][] 
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][] 

work as well.
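Another common way to write this, sketched here under the assumption that the new name is first stored in a helper variable (year_col is a hypothetical name, and the table is a cut-down copy of the one in the question), is to wrap that variable in parentheses on the left-hand side of :=. The parentheses tell data.table to use the variable's value as the column name instead of creating a column literally called year_col.

```r
library(data.table)

# Cut-down version of the table from the question
dat <- data.table(
  one = 1:5,
  order_date = as.Date(c("2015-01-01", "2015-02-01", "2015-03-01",
                         "2015-04-01", "2015-05-01"))
)

date_col <- "order_date"
year_col <- paste0(date_col, "_year")   # hypothetical helper variable

# (year_col) is evaluated, so the new column is named "order_date_year"
dat[, (year_col) := data.table::year(get(date_col))]

names(dat)
```

Calling data.table::year() explicitly sidesteps the question of whether data.table or lubridate was loaded last.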


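The set() function is a good way to do this. It is also faster than assigning inside the data.table's [ ], because it skips the overhead of [.data.table. Is this what you are after? http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#fast-looping-with-set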

library(data.table) 
dat = data.table(one = 1:5, two = 1:5, 
       order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01", 
          "2015-04-01","2015-05-01")) 
dat 
date_col = "order_date" 

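# paste0() never inserts a separator, so sep = "" is not needed
year_col <- paste0(date_col, "_year") 
# substr() on a Date gives the year as a character string, e.g. "2015"
set(dat, j = year_col, value = substr(dat[[date_col]], 1, 4))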
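The speed advantage of set() shows up mainly when it is called repeatedly, for example once per column in a loop. The following is a minimal sketch of that pattern, assuming a table with two date columns. The ship_date column and the date_cols vector are hypothetical and do not appear in the question. It uses data.table::year() so that the new columns hold integers instead of character strings.

```r
library(data.table)

dat <- data.table(
  order_date = as.Date(c("2015-01-01", "2016-02-01", "2017-03-01")),
  ship_date  = as.Date(c("2015-01-05", "2016-02-07", "2018-01-02"))  # hypothetical
)

date_cols <- c("order_date", "ship_date")   # hypothetical list of date columns

# Add one "<name>_year" column per date column, by reference
for (col in date_cols) {
  set(dat, j = paste0(col, "_year"), value = data.table::year(dat[[col]]))
}

names(dat)
```

Like :=, set() modifies dat in place, so the result does not have to be assigned back.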