2015-03-31 78 views

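Does R's transform() overwrite every value?

I need to update the values of one column when another column matches a condition. Here is an example: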

zz1 <- "or,d,ddate,rdate,changes,class,price,fdate,company,number,minutes,added,source 
VA1,VA2,2014-05-24,,0,0,2124,2014-05-22 15:50:16,,,,2014-05-22 12:20:03,ss 
VA1,VA2,2014-05-26,,0,0,2124,2014-05-22 15:03:44,,,,2014-05-22 12:20:03,s1 
VA1,VA2,2014-06-05,,0,0,2124,2014-05-22 15:48:24,,,,2014-05-22 12:20:03,s1 
VA1,VA2,2014-06-09,,0,0,2124,2014-05-22 15:37:35,,,,2014-05-22 12:20:03,s2 
VA1,VA2,2014-06-16,,0,0,2124,2014-05-22 14:17:33,,,,2014-05-22 12:20:03,ss" 

columnClasses <- c("factor", "factor", "POSIXct", "factor", "integer", "factor", "integer", "factor", "factor", "factor", "integer", "factor", "factor") 
dt1 <- read.table(text=zz1, header = TRUE, sep = ",", comment.char = "", quote = "", na.strings = c(""), colClasses = columnClasses) 

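The values in the first column ('or') should be changed to the combined values of columns 'or' and 'd' wherever 'source' equals 'ss' or 's2'.

I tried to do it like this: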

dt1$or[dt1$source == "ss" | dt1$source == "s2"] <- paste0(dt1$or, as.character(dt1$d)) 

But it returns an error: number of items to replace is not a multiple of replacement length

For now I do it with the following code:

dt1$or <- as.character(dt1$or) 
dt1 <- transform(dt1, or = ifelse(source == "ss" | source == "s2", paste0(dt1$or, as.character(dt1$d)), dt1$or)) 

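It works well, but I am afraid it also rewrites every value where 'source' is not equal to 'ss' or 's2'. If that is true, how should I change my code to avoid it?

Answers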


First, judging by your previous questions, you have been using data.table so far, so let's stick with it and use fread instead of read.table.

So the first step would be:

library(data.table) 
dt1 <- fread(zz1, colClasses = columnClasses) 

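The second step is to key your data by 'source' (a bad name for a column, by the way) so that you can perform a binary join and avoid the overhead of ifelse (which you correctly pointed out), i.e.: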

setkey(dt1, source) 
dt1[.(c("ss", "s2")), or := paste0(or, d)][] 
#        or   d      ddate rdate changes class price               fdate company number minutes               added source
# 1:    VA1 VA2 2014-05-26    NA       0     0  2124 2014-05-22 15:03:44      NA     NA      NA 2014-05-22 12:20:03     s1
# 2:    VA1 VA2 2014-06-05    NA       0     0  2124 2014-05-22 15:48:24      NA     NA      NA 2014-05-22 12:20:03     s1
# 3: VA1VA2 VA2 2014-06-09    NA       0     0  2124 2014-05-22 15:37:35      NA     NA      NA 2014-05-22 12:20:03     s2
# 4: VA1VA2 VA2 2014-05-24    NA       0     0  2124 2014-05-22 15:50:16      NA     NA      NA 2014-05-22 12:20:03     ss
# 5: VA1VA2 VA2 2014-06-16    NA       0     0  2124 2014-05-22 14:17:33      NA     NA      NA 2014-05-22 12:20:03     ss
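
For comparison, the first attempt in the question can probably be repaired in base R as well. The left-hand side there selects only the matching rows (three in the sample data), while the right-hand side is built from all five rows, which would explain the length complaint. Below is a minimal sketch that subsets both sides with the same logical index, so rows that do not match are never written to. The small data frame 'df' and the index 'idx' are made up for illustration, and 'or' is assumed to be character already (as in the question's as.character step).

```r
# Hypothetical three-row stand-in for the question's data (not the original dt1)
df <- data.frame(
  or     = c("VA1", "VA1", "VA1"),
  d      = c("VA2", "VA2", "VA2"),
  source = c("ss", "s1", "s2"),
  stringsAsFactors = FALSE
)

# Logical index of the rows that should change
idx <- df$source %in% c("ss", "s2")

# Subset BOTH sides with the same index, so the lengths always agree
# and only the matching rows are assigned
df$or[idx] <- paste0(df$or[idx], df$d[idx])

print(df)
```

By contrast, the ifelse call inside transform builds a new full-length vector and replaces the whole column with it, although the values of the non-matching rows come out unchanged.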

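Thanks! Strangely, 'fread' reads the 'or' column as 'character' even though 'factor' is given in 'columnClasses'. So in your case the resulting 'or' column has class 'character', whereas in my case (when using 'read.csv') I need 'factor' in the end. I could change 'read.csv' to 'fread', but then I end up with more questions. For example, is it more efficient to read the column as 'character' and then convert it to 'factor'? Or should the 'fread' call be modified to get 'factor' from the start? – 2015-03-31 18:22:35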


Also, why is 'source' a bad name? – 2015-03-31 18:23:01


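If you want to modify 'or', it has to be of class 'character'. You did that yourself with 'dt1$or <- as.character(dt1$or)'. After the modification you can simply run 'dt1[, or := as.factor(or)]', which will modify it *by reference*. As for your second question: 'source' is a function in R, and to prevent unexpected behaviour it is better *not* to name columns or data sets after existing functions. – 2015-03-31 19:00:17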
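
The round trip described in this comment could look roughly like the sketch below. It assumes the data.table package is installed. The table 'dt' and the column name 'src' (used here instead of 'source', following the naming advice) are made up for illustration and are not part of the original data.

```r
library(data.table)

# Hypothetical three-row table; 'or' and 'd' start out as factors
dt <- data.table(
  or  = factor(c("VA1", "VA1", "VA1")),
  d   = factor(c("VA2", "VA2", "VA2")),
  src = c("ss", "s1", "s2")
)

# 1. Convert 'or' to character so that new values can be assigned to it
dt[, or := as.character(or)]

# 2. Update only the matching rows; paste0 uses the labels of the factor 'd'
dt[src %in% c("ss", "s2"), or := paste0(or, d)]

# 3. Convert back to factor, again by reference
dt[, or := as.factor(or)]

print(dt)
```

Filtering with '%in%' in 'i' is used here instead of setting a key, which keeps the sketch short; the keyed binary join from the answer should give the same result.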