2014-08-27

Replace with NA all but the last row of each id in R

Sample data:

x <- data.frame(id=c(1,1,1,2,2,7,7,7,7),dna=c(232,424,5345,45345,45,345,4543,345345,4545)) 
y <- data.frame(id=c(1,1,1,2,2,7,7,7),year=c(2001,2002,2003,2005,2006,2000,2001,2002)) 
x <- transform(x, rec = ave(id, id, FUN = seq_along)) 
y <- transform(y, rec = ave(id, id, FUN = seq_along)) 

df <- merge(x, y, c("id", "rec")) 
df 
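The `rec` column is built with `ave(id, id, FUN = seq_along)`. Here is a minimal sketch of what that call produces, using a short toy `id` vector chosen only for illustration:

```r
# Toy id vector, assumed for illustration only
id <- c(1, 1, 1, 2, 2, 7)

# ave() splits id by group, applies seq_along to each piece and writes the
# results back into the original row positions, giving a running counter per id
rec <- ave(id, id, FUN = seq_along)
print(rec)
```

Merging on both `id` and `rec` therefore pairs the first row of an id in `x` with the first row of that id in `y`, the second with the second, and so on.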

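I want to replace the values of the `dna` column with NA in every row except the last one (by `rec`) for each `id`. How can I do this efficiently? Ideally a base R solution. Thanks!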

Desired output:

id rec dna year 
1 1 1  NA 2001 
2 1 2  NA 2002 
3 1 3 5345 2003 
4 2 1  NA 2005 
5 2 2  45 2006 
... 
... 

Answers

Try this:

df$dna <- with(df, ave(dna, id, FUN = function(x) {
  # Only blank out the earlier rows when the id has more than one row
  if ((len <- length(x)) > 1)
    x[1:(len - 1)] <- NA
  x
}))
df 
# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002 

This doesn't work if any id has only one rec, though. – rawr 2014-08-27 12:03:16

Good point @rawr, I didn't think of that... I've updated the answer. – lukeA 2014-08-27 12:07:51
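The edge case from the comments can be checked directly. The sketch below uses a small hypothetical data frame in which id 9 has a single row. `naive` is a hypothetical name for the version without the length check, and `keep_last` is a hypothetical helper that swaps `1:(len - 1)` for `seq_len(len - 1)`, which is empty when `len` is 1:

```r
# Version without the length check: for a single element, 1:0 is c(1, 0),
# so the assignment blanks out the only (i.e. last) value
naive <- function(x) { x[1:(length(x) - 1)] <- NA; x }
print(naive(99))  # the lone value is lost

# Hypothetical data: id 9 has only one row
df <- data.frame(id  = c(1, 1, 1, 9),
                 rec = c(1, 2, 3, 1),
                 dna = c(232, 424, 5345, 99))

# Blank every element of a group except the last one;
# seq_len(0) is empty, so single-row groups come back unchanged
keep_last <- function(x) {
  len <- length(x)
  if (len > 1) x[seq_len(len - 1)] <- NA
  x
}

df$dna <- ave(df$dna, df$id, FUN = keep_last)
print(df)
```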

Although you asked for a base R solution, here is a data.table solution (in case efficiency matters):

library(data.table) 
setDT(df)[, indx := .N, by = id][rec != indx, dna := NA_real_, by = id] 

# id rec dna year indx 
# 1: 1 1  NA 2001 3 
# 2: 1 2  NA 2002 3 
# 3: 1 3 5345 2003 3 
# 4: 2 1  NA 2005 2 
# 5: 2 2  45 2006 2 
# 6: 7 1  NA 2000 3 
# 7: 7 2  NA 2001 3 
# 8: 7 3 345345 2002 3 
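The same idea (compare each row's `rec` with the group size, the analogue of `.N`) can be sketched in base R. This assumes, as in the question, that `rec` runs 1, 2, ..., N within each `id`; the data frame below is typed in by hand to match the merged `df` from the question:

```r
# Merged data from the question, written out directly
df <- data.frame(id   = c(1, 1, 1, 2, 2, 7, 7, 7),
                 rec  = c(1, 2, 3, 1, 2, 1, 2, 3),
                 dna  = c(232, 424, 5345, 45345, 45, 345, 4543, 345345),
                 year = c(2001, 2002, 2003, 2005, 2006, 2000, 2001, 2002))

# Number of rows per id, repeated on every row of that id (like .N)
n_per_id <- ave(df$rec, df$id, FUN = length)

# A row is the last one of its id exactly when rec equals the group size
df$dna[df$rec != n_per_id] <- NA
print(df)
```

Unlike the data.table version, this does not add an `indx` column to `df`.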
Another approach:

transform(df, dna = ave(dna, id, FUN = function(x) "is.na<-"(x, -length(x)))) 

# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002 
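This works because the default `is.na<-` method assigns NA at the positions given as its value, and a negative index means "every position except this one". A minimal sketch on toy vectors, chosen only for illustration:

```r
x <- c(10, 20, 30)
# -length(x) selects every position except the last
is.na(x) <- -length(x)
print(x)

y <- 5
# For a single element, -1 excludes the only position, so nothing is changed
is.na(y) <- -length(y)
print(y)
```

So, unlike the original first answer, this version needs no special case for ids with a single row.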
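You can use the `duplicated` function on the `id` column with its `fromLast` argument, which flags every occurrence of an id except the last one. We can then use that logical vector to subset the `dna` column and assign NA to the result. This relies on the rows being sorted by `id` and `rec`, so that the last occurrence of each id is also its last rec.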

> df$dna[duplicated(df$id, fromLast = TRUE)] <- NA 
> df 
# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002
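If the rows might not already be in `rec` order within each id, a minimal sketch is to sort first with `order()`. The unsorted data frame below is hypothetical:

```r
# Hypothetical data: the rows of id 1 are NOT in rec order
df <- data.frame(id = c(1, 1, 1), rec = c(3, 1, 2), dna = c(5345, 232, 424))

# Sort by id, then rec, so the last occurrence of each id is its highest rec
df <- df[order(df$id, df$rec), ]

# duplicated(..., fromLast = TRUE) is TRUE for every occurrence but the last
df$dna[duplicated(df$id, fromLast = TRUE)] <- NA
print(df)
```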