R: a faster solution to base::interaction with character variables?

Consider the following simulated data:

df <- data.frame(a=c("John", "Susan", "Eric", "John", "Susan"), 
       b=c("K", NA, "J", "K", "S"), 
       c=c("Smith", "Johnson", "May", "Smith", "Johnson")) 
df$a <- as.character(df$a) 
df$b <- as.character(df$b) 
df$c <- as.character(df$c)

which looks like this:

> df
      a    b       c
1  John    K   Smith
2 Susan <NA> Johnson
3  Eric    J     May
4  John    K   Smith
5 Susan    S Johnson

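I generate a column named unique that holds a unique number for each combination (interaction) of the three character variables.

I use an ifelse statement so that only columns a and c are interacted when column b is NA.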

df$unique <- NA 
df$unique <- ifelse(is.na(df$b), 
      as.integer(interaction(df$a, df$c)), 
      as.integer(interaction(df$a, df$b, df$c)))

This results in:

> df
      a    b       c unique
1  John    K   Smith     23
2 Susan <NA> Johnson      3
3  Eric    J     May     10
4  John    K   Smith     23
5 Susan    S Johnson      9
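
The codes above (23, 10, 9) are positions in the full cross product of the three variables' levels, not a count of observed combinations. The sketch below, which rebuilds the toy data and is only an illustration, shows that interaction() by default keeps a level for every possible combination (3 * 3 * 3 = 27 here). This is probably one reason the call scales poorly when character columns have many distinct values, though that has not been measured on the real data.

```r
# Toy data from the question (character columns, one NA in b)
df <- data.frame(a = c("John", "Susan", "Eric", "John", "Susan"),
                 b = c("K", NA, "J", "K", "S"),
                 c = c("Smith", "Johnson", "May", "Smith", "Johnson"),
                 stringsAsFactors = FALSE)

full <- interaction(df$a, df$b, df$c)               # keeps every possible combination
used <- interaction(df$a, df$b, df$c, drop = TRUE)  # keeps only observed combinations

nlevels(full)  # 27 = 3 * 3 * 3
nlevels(used)  # 3; row 2 is NA because b is NA

# Rebuild the code for row 1 by hand: the first factor varies fastest
ia <- as.integer(factor(df$a))
ib <- as.integer(factor(df$b))
ic <- as.integer(factor(df$c))
ia[1] + 3 * (ib[1] - 1) + 9 * (ic[1] - 1)  # 23, matching as.integer(full)[1]
```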

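When I use this code to build the unique variable on my real data, which contains millions of rows, the computation runs for 21 hours.

Is there a way to speed this up? A smarter solution?

Is the ifelse statement the bottleneck?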

Source

2017-06-22 wake_wake

Does this work?

library(data.table) 
dt1 <- as.data.table(df) 
dt1[, unique := .GRP, by = names(dt1)] 

       a  b       c unique
1:  John  K   Smith      1
2: Susan NA Johnson      2
3:  Eric  J     May      3
4:  John  K   Smith      1
5: Susan  S Johnson      4
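
If data.table is not available, roughly the same numbering (groups numbered by first appearance) can be sketched in base R with paste() and match(). This is an assumption-laden alternative, not part of the answer above: the "\r" separator is an arbitrary choice assumed not to occur in the data, and because paste() turns NA into the string "NA", a literal "NA" value in b would be merged with a missing one.

```r
# Toy data from the question (character columns, one NA in b)
df <- data.frame(a = c("John", "Susan", "Eric", "John", "Susan"),
                 b = c("K", NA, "J", "K", "S"),
                 c = c("Smith", "Johnson", "May", "Smith", "Johnson"),
                 stringsAsFactors = FALSE)

# One string key per row; a missing b becomes "NA", so such rows are
# effectively grouped by a and c only
key <- paste(df$a, df$b, df$c, sep = "\r")

# Number each distinct key by the order in which it first appears
df$unique <- match(key, unique(key))
df$unique  # 1 2 3 1 4
```

The resulting numbers differ from the interaction() codes in the question, which should not matter if the column is only used as a group identifier.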

Source

2017-06-22 10:09:12
