EDIT: Following @thelatemail's comment, I changed `.SD` to `sum`; this should improve speed. A data.table solution is:
dt[,percent := sum*100/sum[espEvent=="s_All"], by = (visitDate)]
dt
# visitDate espEvent sum percent
#1: 1/2/05 s_All 1352 100.0000000
#2: 1/2/05 s_Animal 6 0.4437870
#3: 1/2/05 s_CD 4 0.2958580
#4: 1/4/05 s_All 1412 100.0000000
#5: 1/4/05 s_Animal 4 0.2832861
#6: 1/4/05 s_CD 2 0.1416431
This always computes the percentage relative to the row where `espEvent == "s_All"` within each `visitDate` group.
Data:
dt <- structure(list(visitDate = c("1/2/05", "1/2/05", "1/2/05", "1/4/05",
"1/4/05", "1/4/05"), espEvent = c("s_All", "s_Animal", "s_CD",
"s_All", "s_Animal", "s_CD"), sum = c(1352L, 6L, 4L, 1412L, 4L,
2L)), .Names = c("visitDate", "espEvent", "sum"), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
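For anyone more familiar with pandas, the same group-wise calculation can be mirrored in Python as a cross-check (this is an illustrative sketch, not part of the original answer; the column names match the data above):

```python
import pandas as pd

# Same data as the data.table above
dt = pd.DataFrame({
    "visitDate": ["1/2/05"] * 3 + ["1/4/05"] * 3,
    "espEvent": ["s_All", "s_Animal", "s_CD"] * 2,
    "sum": [1352, 6, 4, 1412, 4, 2],
})

def add_percent(g):
    # Divide every row's sum by the sum of the "s_All" row in the group
    base = g.loc[g["espEvent"] == "s_All", "sum"].iloc[0]
    g = g.copy()
    g["percent"] = g["sum"] * 100 / base
    return g

dt = dt.groupby("visitDate", group_keys=False).apply(add_percent)
print(dt)
```

As in the data.table version, each group's `s_All` row ends up at 100 and the other rows are scaled relative to it.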
EDIT: Speed test. Since I was curious, I benchmarked the `sum` version against my original `.SD` version, and `sum` turns out to be much faster:
library(microbenchmark)
microbenchmark(sum = dt[,percent := sum*100/sum[espEvent=="s_All"], by = (visitDate)],
.SD = dt[,percent := sum*100/.SD[espEvent=="s_All", sum], by = (visitDate)])
#Unit: microseconds
# expr min lq mean median uq max neval
# sum 814.043 934.400 1035.136 984.082 1105.372 1670.071 100
# .SD 1630.884 1846.173 1987.738 1977.260 2093.886 2496.242 100
No need to use `.SD` - `dat[, percent := sum*100/sum[espEvent=="s_All"], by = visitDate]` will do it. On a large dataset this makes a huge relative difference in speed. – thelatemail
@thelatemail Thanks! I'll update my answer –