How to remove duplicated rows based on a condition in another column?

How can I select, among rows that are duplicated on the first column only, the row with the maximum value in the second column?

data<-data.frame(a=c(1,3,3,3),b=c(1,4,6,3),d=c(1,5,7,1)) 

This data frame:

a b d
1 1 1
3 4 5
3 6 7
3 3 1

should become:

a b d
1 1 1
3 6 7

In the second column, 6 is the maximum among 4, 6, and 3.

Source

2015-05-09 Soheil

You can try something like the following, using "dplyr":

library(dplyr) 

data %>%     ## Your data 
    group_by(a) %>%   ## grouped by "a" 
    filter(b == max(b))  ## filtered to only include the rows where b == max(b) 
# Source: local data frame [2 x 3] 
# Groups: a 
# 
# a b d 
# 1 1 1 1 
# 2 3 6 7

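Note, however, that this returns more than one row per group if several rows match b == max(b). An alternative that always returns exactly one row per group is: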

data %>%     ## Your data 
    group_by(a) %>%   ## grouped by "a" 
    arrange(desc(b)) %>% ## sorted by descending values of "b" 
    slice(1)    ## with just the first row extracted
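
To make the difference between the two approaches concrete, here is a minimal sketch. The data frame `tied` is hypothetical (it is not from the question); it is the original data with one value changed so that two rows in group a == 3 share the maximum b.

```r
library(dplyr)

# Hypothetical data with a tie: two rows in group a == 3 have b == 6
tied <- data.frame(a = c(1, 3, 3, 3), b = c(1, 6, 6, 3), d = c(1, 5, 7, 1))

# filter() keeps every row that equals the group maximum
all_max <- tied %>%
  group_by(a) %>%
  filter(b == max(b)) %>%
  ungroup()

# arrange() + slice(1) keeps exactly one row per group
one_max <- tied %>%
  group_by(a) %>%
  arrange(desc(b)) %>%
  slice(1) %>%
  ungroup()

nrow(all_max)  # 3: both tied rows survive
nrow(one_max)  # 2: one row per value of a
```

Which approach is appropriate depends on whether tied rows should all be kept or collapsed to a single row.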

Source

2015-05-09 03:32:52 A5C1D2H2I1M1N2O1R2T1

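Thanks, what exactly does `%>%` do? – Soheil

@Soheil, it "pipes" the output from one step to the next, and lets you build up a "sentence" (in my opinion) about what you are trying to do. – A5C1D2H2I1M1N2O1R2T1

Thanks, I read up on it in the package. A new door has opened for me. – Soheil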
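
As a rough illustration of the pipe discussed in these comments: `x %>% f(y)` is evaluated as `f(x, y)`, so a piped chain is the same computation as nested calls read from the inside out. The sketch below uses the question's data; the names `piped` and `nested` are only for illustration.

```r
library(dplyr)  # makes %>% available

data <- data.frame(a = c(1, 3, 3, 3), b = c(1, 4, 6, 3), d = c(1, 5, 7, 1))

# Piped form: reads left to right, one step per stage
piped <- data %>% group_by(a) %>% filter(b == max(b))

# Equivalent nested form: the same calls, read from the inside out
nested <- filter(group_by(data, a), b == max(b))

# Both forms select the same rows
all.equal(as.data.frame(piped), as.data.frame(nested))
```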

An option using data.table is

library(data.table) 
setDT(data)[, .SD[which.max(b)], a] 
# a b d 
#1: 1 1 1 
#2: 3 6 7

Or use .I to get the row index (this would be a bit faster)

setDT(data)[data[, .I[which.max(b)], a]$V1] 
# a b d 
#1: 1 1 1 
#2: 3 6 7
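
The `.I` one-liner above can be read as two steps. The following sketch splits it up, assuming the same data as in the question; the names `dt` and `idx` are introduced here only for illustration.

```r
library(data.table)

dt <- data.table(a = c(1, 3, 3, 3), b = c(1, 4, 6, 3), d = c(1, 5, 7, 1))

# Step 1: within each group of a, .I holds the row numbers in the full table,
# and which.max(b) is the position of the largest b inside the group.
# The unnamed result column gets the default name V1.
idx <- dt[, .I[which.max(b)], by = a]
idx$V1  # row numbers 1 and 3

# Step 2: subset the original table by those row numbers
dt[idx$V1]
```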

Or

setkey(setDT(data), a,b)[,.SD[.N], a] 
# a b d 
#1: 1 1 1 
#2: 3 6 7

If there are ties for the maximum value

setDT(data)[, .SD[max(b)==b], a] 
# a b d 
#1: 1 1 1 
#2: 3 6 7
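
The question's data contains no ties, so the output above looks the same as for `which.max`. A small sketch with hypothetical tied data (the table `tied` is an assumption, not part of the question) shows where the two differ:

```r
library(data.table)

# Hypothetical data: two rows in group a == 3 share the maximum b == 6
tied <- data.table(a = c(1, 3, 3, 3), b = c(1, 6, 6, 3), d = c(1, 5, 7, 1))

# which.max() returns only the first position of the maximum
first_only <- tied[, .SD[which.max(b)], by = a]

# The logical comparison keeps every row equal to the group maximum
all_ties <- tied[, .SD[b == max(b)], by = a]

nrow(first_only)  # 2
nrow(all_ties)    # 3
```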

Source

2015-05-09 11:36:38 akrun
