Replace with NA except the last row for a given ID, in R

Sample data:

x <- data.frame(id=c(1,1,1,2,2,7,7,7,7),dna=c(232,424,5345,45345,45,345,4543,345345,4545)) 
y <- data.frame(id=c(1,1,1,2,2,7,7,7),year=c(2001,2002,2003,2005,2006,2000,2001,2002)) 
x <- transform(x, rec = ave(id, id, FUN = seq_along)) 
y <- transform(y, rec = ave(id, id, FUN = seq_along)) 

df <- merge(x, y, c("id", "rec")) 
df

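I want to replace the values in the dna column with NA, except in the last row (the highest rec) of each id. How can I do this efficiently? Ideally this would be a base R solution. Thanks!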

Desired output:

id rec dna year 
1 1 1  NA 2001 
2 1 2  NA 2002 
3 1 3 5345 2003 
4 2 1  NA 2005 
5 2 2  45 2006 
... 
...

Source

2014-08-27 Maximilian

Try this:

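df$dna <- with(df, ave(dna, id, FUN = function(x) {
    # Blank out everything but the last element of each id group.
    # The length check leaves single-row groups untouched.
    if ((len <- length(x)) > 1)
        x[1:(len - 1)] <- NA
    x
}))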
df 
# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002

Source

2014-08-27 12:00:18 lukeA

This doesn't work, though, if any id has only one rec – rawr 2014-08-27 12:03:16

Good point @rawr, I hadn't thought of that... I've made an update – lukeA 2014-08-27 12:07:51
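
The edge case raised in the comments can be reproduced in isolation. The following is a minimal sketch, not part of the original answer; the variable names `x`, `y`, `len` and `idx` are illustrative. It shows why `1:(len - 1)` goes wrong for a group of length one, and that `seq_len()` is one possible way to avoid the explicit length check.

```r
# A group that contains a single value
x <- c(42)
len <- length(x)
idx <- 1:(len - 1)   # 1:0 counts DOWN, so this is c(1, 0), not an empty index
x[idx] <- NA         # index 0 is ignored, index 1 is overwritten
x                    # the only value in the group has been lost

# seq_len(0) is an empty index, so nothing is overwritten
y <- c(42)
y[seq_len(length(y) - 1)] <- NA
y                    # still 42
```

This is why the updated answer guards the assignment with a length check.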

Although you asked for a base R solution, here is a data.table solution (just in case efficiency matters):

library(data.table) 
setDT(df)[, indx := .N, by = id][rec != indx, dna := NA_real_, by = id] 

# id rec dna year indx 
# 1: 1 1  NA 2001 3 
# 2: 1 2  NA 2002 3 
# 3: 1 3 5345 2003 3 
# 4: 2 1  NA 2005 2 
# 5: 2 2  45 2006 2 
# 6: 7 1  NA 2000 3 
# 7: 7 2  NA 2001 3 
# 8: 7 3 345345 2002 3
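
The same idea (compare each row's rec with the number of rows in its id group) can also be written in base R. This is only a sketch under the assumption that rec is a 1..n sequence within each id, as it is in the sample data; the helper vector `n` is a name introduced here, and `df` is rebuilt by hand to match the merged sample data.

```r
# Merged sample data, rebuilt so the example is self-contained
df <- data.frame(id   = c(1, 1, 1, 2, 2, 7, 7, 7),
                 rec  = c(1, 2, 3, 1, 2, 1, 2, 3),
                 dna  = c(232, 424, 5345, 45345, 45, 345, 4543, 345345),
                 year = c(2001, 2002, 2003, 2005, 2006, 2000, 2001, 2002))

# Group size for every row, analogous to .N by id in data.table
n <- ave(df$rec, df$id, FUN = length)

# Keep dna only where rec equals the group size, i.e. in the last row per id
df$dna[df$rec != n] <- NA
df
```

Unlike the data.table version, this leaves no extra indx column in the result.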

Source

2014-08-27 12:15:40

Another approach:

transform(df, dna = ave(dna, id, FUN = function(x) "is.na<-"(x, -length(x)))) 

# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002
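
To see what the `"is.na<-"` call does inside `ave`, it can help to apply it to a single group by hand. A small sketch, with `x` and `single` as illustrative vectors: the replacement form `is.na(x) <- i` sets `x[i]` to NA, and a negative index means "everything except that position".

```r
# A group of three values: everything except the last element becomes NA
x <- c(232, 424, 5345)
is.na(x) <- -length(x)             # equivalent to x[-3] <- NA
x

# A group of one value: -1 selects nothing, so the value survives
single <- 45
is.na(single) <- -length(single)
single
```

Because the negative index selects nothing for a single-element group, this approach needs no separate length check.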

Source

2014-08-27 13:34:36

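On the id column, you can use the duplicated function with its fromLast argument. It flags every row whose id appears again later in the data, which is every row except the last one for each id. We can then use that logical vector to subset the dna column and assign NA to the result.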

> df$dna[duplicated(df$id, fromLast = TRUE)] <- NA 
> df 
# id rec dna year 
# 1 1 1  NA 2001 
# 2 1 2  NA 2002 
# 3 1 3 5345 2003 
# 4 2 1  NA 2005 
# 5 2 2  45 2006 
# 6 7 1  NA 2000 
# 7 7 2  NA 2001 
# 8 7 3 345345 2002
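
The effect of fromLast is easiest to see on the id values alone. A short sketch, using an illustrative vector `id` with the same values as the sample data and the made-up names `first_flag` and `last_flag`:

```r
id <- c(1, 1, 1, 2, 2, 7, 7, 7)

# Default direction: FALSE marks the FIRST occurrence of each id
first_flag <- duplicated(id)
first_flag

# fromLast = TRUE scans from the end: FALSE marks the LAST occurrence of each id
last_flag <- duplicated(id, fromLast = TRUE)
last_flag
```

The TRUE positions in `last_flag` are exactly the rows whose dna is set to NA. Note that "last" here means last in row order, so the data should already be sorted the way you want within each id.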

Source

2014-08-27 14:50:21
