2017-06-12 57 views
3

Fill the gaps between 1s in a 0/1 column with a cumulative count

I have data with a 0/1 column as follows:

id <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4) 
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA) 
e <- as.data.frame(cbind(id, start)) 

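I want to fill the NAs with a running count that restarts at 0 whenever start == 1 and goes back to NA whenever a new id begins. I wrote a for loop, but my actual data is so long that the loop will not finish within the next few days. Is there a way to speed up my solution? My target variable can be reproduced as follows: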

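e$target <- NA 
for (i in 2:nrow(e)) { 
  if (e$id[i] != e$id[i-1]) { 
    # a new id begins: nothing has been counted yet 
    e$target[i] <- NA 
  } else { 
    # same id: continue the count (stays NA until the first 1 is seen) 
    e$target[i] <- e$target[i-1] + 1 
    # start == 1 resets the counter to 0 
    if (!is.na(e$start[i]) && e$start[i] == 1) { 
      e$target[i] <- 0 
    } 
  } 
} 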
+0

Based on your reproducible example, target would be 0 for the last element of 'id' 3. Could you confirm that? – akrun

+1

@akrun, that is correct – user3349993

Answers

2

We can do this with data.table:

library(data.table) 
setDT(e)[, target1 := seq_len(.N)-1,.(grp = cumsum(!is.na(start)), id)] 
e[e[, c(.I[all(is.na(start))], .I[seq_len(which.max(!is.na(start))-1)]), 
        id]$V1, target1 := NA] 
e 
# id start target target1 
# 1: 1 NA  NA  NA 
# 2: 1 NA  NA  NA 
# 3: 1 NA  NA  NA 
# 4: 1  1  0  0 
# 5: 1 NA  1  1 
# 6: 1 NA  2  2 
# 7: 2 NA  NA  NA 
# 8: 2 NA  NA  NA 
# 9: 2  1  0  0 
#10: 2 NA  1  1 
#11: 3 NA  NA  NA 
#12: 3 NA  NA  NA 
#13: 3  1  0  0 
#14: 3 NA  1  1 
#15: 3 NA  2  2 
#16: 3 NA  3  3 
#17: 3 NA  4  4 
#18: 3 NA  5  5 
#19: 3  1  0  0 
#20: 4 NA  NA  NA 
#21: 4 NA  NA  NA 
#22: 4 NA  NA  NA 
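To see why grouping on a cumulative sum works, it may help to build the helper column step by step. The following is only a sketch of the same idea, not the answer's exact code: the names `dt`, `grp`, `counter` and `seen` are made up for illustration, and the NA masking is done with a per-id flag instead of the `.I` indexing used above.

```r
library(data.table)

# Sample data from the question
id    <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA)
dt <- data.table(id, start)

# grp (hypothetical name) goes up by one at every non-NA start,
# so all rows between two 1s share the same value
dt[, grp := cumsum(!is.na(start))]

# within each (id, grp) block the counter restarts at 0
dt[, counter := seq_len(.N) - 1L, by = .(id, grp)]

# rows that come before the first 1 of their id have not started counting
dt[, seen := cumsum(!is.na(start)) > 0, by = id]
dt[seen == FALSE, counter := NA_integer_]

dt$counter
# NA NA NA 0 1 2 NA NA 0 1 NA NA 0 1 2 3 4 5 0 NA NA NA
```

Because every step is a grouped, vectorised operation, nothing is iterated row by row in R, which is where the speed-up over the loop comes from.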
2

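You can try the tidyverse. Use fill to carry the most recent non-NA entry downward, then replace those values with a sequence along their length (the -1 makes the sequence start at 0):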

library(tidyverse) 

e %>% 
group_by(id) %>% 
mutate(target = start) %>% 
fill(target) %>% 
mutate(target = replace(target, !is.na(target), seq(length(target[!is.na(target)]))-1), 
     target = replace(target, start == 1, 0)) 
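
One thing to watch: on the sample data this reproduces the loop's target, but as far as can be traced the sequence runs across all filled rows of an id and only the row holding a 1 is set back to 0. If a second 1 is followed by more rows in the same id, those rows keep the old count. A minimal sketch of a variant that restarts at every 1, assuming dplyr is available; `e2`, `run` and `res` are hypothetical names, and `e2` is a made-up example, not data from the question:

```r
library(dplyr)

# Made-up example: a second 1 in the middle of an id
e2 <- data.frame(
  id    = c(1, 1, 1, 1, 1, 1, 1, 1),
  start = c(NA, NA, 1, NA, NA, 1, NA, NA)
)

res <- e2 %>%
  group_by(id) %>%
  # run is 0 before the first 1 and increases by one at every 1
  mutate(run = cumsum(!is.na(start) & start == 1)) %>%
  group_by(id, run) %>%
  # count from 0 inside each run; rows before the first 1 stay NA
  mutate(target = if_else(run == 0L, NA_integer_, row_number() - 1L)) %>%
  ungroup()

res$target
# NA NA 0 1 2 0 1 2
```

On the same `e2`, the fill-based code above would give NA NA 0 1 2 0 4 5.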
2

Another data.table option is:

library(data.table) 
setDT(e)[, subgroup := cumsum(start==1 & !is.na(start)), by = id] 
e[ , target2 := cumsum(is.na(start)), by = .(id, subgroup)][subgroup == 0, target2 := NA_integer_] 

# id start target subgroup target2 
#1: 1 NA  NA  0  NA 
#2: 1 NA  NA  0  NA 
#3: 1 NA  NA  0  NA 
#4: 1  1  0  1  0 
#5: 1 NA  1  1  1 
#6: 1 NA  2  1  2 
#7: 2 NA  NA  0  NA 
#8: 2 NA  NA  0  NA 
#9: 2  1  0  1  0 
#10: 2 NA  1  1  1 
#11: 3 NA  NA  0  NA 
#12: 3 NA  NA  0  NA 
#13: 3  1  0  1  0 
#14: 3 NA  1  1  1 
#15: 3 NA  2  1  2 
#16: 3 NA  3  1  3 
#17: 3 NA  4  1  4 
#18: 3 NA  5  1  5 
#19: 3  1  0  2  0 
#20: 4 NA  NA  0  NA 
#21: 4 NA  NA  0  NA 
#22: 4 NA  NA  0  NA 
+0

As an alternative, I think your second line could also be 'e[subgroup != 0, ix := 1:.N - 1L, by = .(id, subgroup)]' – Henrik
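
The variant from Henrik's comment can be written out as a runnable sketch. It assumes the same sample data as the question; `dt` is a hypothetical name used here so the example does not modify `e`, and `ix` is the column name from the comment.

```r
library(data.table)

# Sample data from the question
id    <- c(1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4)
start <- c(NA, NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA, NA, NA)
dt <- data.table(id, start)

# subgroup is 0 until the first 1 of an id, then increases at every 1
dt[, subgroup := cumsum(!is.na(start) & start == 1), by = id]

# only rows with subgroup != 0 are numbered; the rest are left as NA
# because := on a subset fills the untouched rows of a new column with NA
dt[subgroup != 0, ix := 1:.N - 1L, by = .(id, subgroup)]

dt$ix
# NA NA NA 0 1 2 NA NA 0 1 NA NA 0 1 2 3 4 5 0 NA NA NA
```

Filtering in `i` removes the need for the separate step that overwrites subgroup 0 with NA.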