Select first and last row from grouped data

Using dplyr, how do I select the top and bottom observations/rows of grouped data in one statement?

Data & Example

Given a data frame

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), 
       stopId=c("a","b","c","a","b","c","a","b","c"), 
       stopSequence=c(1,2,3,3,1,4,3,1,2))

I can get the top and bottom observations from each group using slice, but only with two separate statements:

firstStop <- df %>% 
    group_by(id) %>% 
    arrange(stopSequence) %>% 
    slice(1) %>% 
    ungroup 

lastStop <- df %>% 
    group_by(id) %>% 
    arrange(stopSequence) %>% 
    slice(n()) %>% 
    ungroup

Can I combine these two statements into one that selects both the top and bottom observations?

Source

2015-07-21 tospig

126

There is probably a faster way:

df %>% 
    group_by(id) %>% 
    arrange(stopSequence) %>% 
    filter(row_number()==1 | row_number()==n())

Source

2015-07-21 01:48:01 jeremycg

+37

'rownumber() %in% c(1, n())' would avoid the need to run the vector scan twice – MichaelChirico

@MichaelChirico I suspect you omitted an '_'? i.e. 'filter(row_number() %in% c(1, n()))' –

Something like:
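The suggestion in the two comments above can be written out in full. This is only a sketch using the question's data; the result name `ends` is made up for illustration, and `arrange()` is placed before `group_by()` so the outcome does not depend on whether `arrange()` sorts within groups in your dplyr version.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
                 stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2))

# Sort first, then keep the rows whose position within each id group
# is either 1 (first) or n() (last); a single %in% test replaces the
# two separate comparisons joined by `|`.
ends <- df %>%
  arrange(stopSequence) %>%
  group_by(id) %>%
  filter(row_number() %in% c(1, n())) %>%
  ungroup()

ends
```

Note that `filter()` keeps the rows in their sorted order, so the result is ordered by stopSequence rather than by id.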

library(dplyr) 

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), 
       stopId=c("a","b","c","a","b","c","a","b","c"), 
       stopSequence=c(1,2,3,3,1,4,3,1,2)) 

first_last <- function(x) { 
    bind_rows(slice(x, 1), slice(x, n())) 
} 

df %>% 
    group_by(id) %>% 
    arrange(stopSequence) %>% 
    do(first_last(.)) %>% 
    ungroup 

## Source: local data frame [6 x 3] 
## 
## id stopId stopSequence 
## 1 1  a   1 
## 2 1  c   3 
## 3 2  b   1 
## 4 2  c   4 
## 5 3  b   1 
## 6 3  a   3

With do you can perform pretty much any number of operations on each group, but @jeremycg's answer is far more appropriate for just this task.

Source

2015-07-21 01:48:43 hrbrmstr

I hadn't considered writing a function - certainly a more complex approach. – tospig

This seems overcomplicated compared to just using 'slice', as in 'df %>% arrange(stopSequence) %>% group_by(id) %>% slice(c(1, n()))' – Frank

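I don't disagree (and I pointed out in the post that jeremycg's is the better answer), but having a 'do' example here might help others when 'slice' won't work (e.g. for more complex operations on a group). Also, you could post your comment as an answer (it's the best one). – hrbrmstr

Not dplyr, but it's much more direct with data.table: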
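For readers on a recent dplyr: `do()` is marked as superseded in the current documentation, and `group_modify()` covers the same "run a function on each group" case. The sketch below assumes a dplyr version that provides `group_modify()` and reuses the `first_last()` helper from the answer; the result name `res` is made up for illustration.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
                 stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2))

# Same helper as in the answer: first row stacked on top of the last row
first_last <- function(x) {
  bind_rows(slice(x, 1), slice(x, n()))
}

# group_modify() calls the function once per group; .x is that group's
# rows without the grouping column, which is added back afterwards
res <- df %>%
  arrange(stopSequence) %>%
  group_by(id) %>%
  group_modify(~ first_last(.x)) %>%
  ungroup()

res
```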

library(data.table) 
setDT(df) 
df[ df[order(id, stopSequence), .I[c(1L,.N)], by=id]$V1 ] 
# id stopId stopSequence 
# 1: 1  a   1 
# 2: 1  c   3 
# 3: 2  b   1 
# 4: 2  c   4 
# 5: 3  b   1 
# 6: 3  a   3

A more detailed explanation:

# 1) get row numbers of first/last observations from each group 
# * basically, we sort the table by id/stopSequence, then, 
#  grouping by id, name the row numbers of the first/last 
#  observations for each id; since this operation produces 
#  a data.table 
# * .I is data.table shorthand for the row number 
# * here, to be maximally explicit, I've named the variable V1 
#  as row_num to give other readers of my code a clearer 
#  understanding of what operation is producing what variable 
first_last = df[order(id, stopSequence), .(row_num = .I[c(1L,.N)]), by=id] 
idx = first_last$row_num 

# 2) extract rows by number 
df[idx]

Be sure to check out the Getting Started wiki to get the data.table basics covered.

Source

2015-07-21 02:05:52 MichaelChirico

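Or 'df[ df[order(stopSequence), .I[c(1, .N)], keyby = id]$V1 ]'. Seeing 'id' appear twice looks odd to me. – Frank

You can set the keys in the 'setDT' call, so the 'order' call isn't needed here. –

@ArtemKlevtsov - you may not always want to set keys, though. – SymbolixAU

Just for completeness: you can pass slice a vector of indices: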
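A minimal sketch of what the keyed variant from the comments might look like, assuming it is acceptable for `df` to be converted and physically sorted by reference (the side effect SymbolixAU is cautioning about). The result name `res` is made up for illustration.

```r
library(data.table)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
                 stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2))

# Convert in place and key by id, then stopSequence; setting the key
# sorts the table, so no separate order() call is needed afterwards
setDT(df, key = c("id", "stopSequence"))

# Within each id, take the first (1L) and last (.N) row of the subset
res <- df[, .SD[c(1L, .N)], by = id]

res
```

If the original row order matters later on, work on a copy (for example `as.data.table(df)`) instead of keying `df` itself.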

df %>% arrange(stopSequence) %>% group_by(id) %>% slice(c(1,n()))

which gives

id stopId stopSequence 
1 1  a   1 
2 1  c   3 
3 2  b   1 
4 2  c   4 
5 3  b   1 
6 3  a   3
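One edge case worth keeping in mind: in a group with a single row, `c(1, n())` is `c(1, 1)`, so that row may come back twice. The sketch below guards against that by wrapping the index vector in `unique()`. The data frame `df1` is hypothetical (it is not from the question) and exists only to include a one-row group.

```r
library(dplyr)

# Hypothetical data: id 2 has just one observation
df1 <- data.frame(id = c(1, 1, 2),
                  stopSequence = c(2, 1, 5))

# unique() collapses c(1, 1) to 1 for single-row groups, so each
# selected row appears only once
res <- df1 %>%
  arrange(stopSequence) %>%
  group_by(id) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()

res
```

The 'filter(row_number() %in% c(1, n()))' form does not need this guard, since a filter can only keep or drop each row.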

Source

2015-07-21 17:11:14 Frank

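I know the question specified dplyr. But since others have already posted solutions using other packages, I decided to have a go with other packages too:

Base package: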

df <- df[with(df, order(id, stopSequence, stopId)), ] 
merge(df[!duplicated(df$id), ], 
     df[!duplicated(df$id, fromLast = TRUE), ], 
     all = TRUE)

data.table:

df <- setDT(df) 
df[order(id, stopSequence)][, .SD[c(1,.N)], by=id]

sqldf:

library(sqldf) 
min <- sqldf("SELECT id, stopId, min(stopSequence) AS StopSequence 
     FROM df GROUP BY id 
     ORDER BY id, StopSequence, stopId") 
max <- sqldf("SELECT id, stopId, max(stopSequence) AS StopSequence 
     FROM df GROUP BY id 
     ORDER BY id, StopSequence, stopId") 
sqldf("SELECT * FROM min 
     UNION 
     SELECT * FROM max")

In one query:

sqldf("SELECT * 
     FROM (SELECT id, stopId, min(stopSequence) AS StopSequence 
       FROM df GROUP BY id 
       ORDER BY id, StopSequence, stopId) 
     UNION 
     SELECT * 
     FROM (SELECT id, stopId, max(stopSequence) AS StopSequence 
       FROM df GROUP BY id 
       ORDER BY id, StopSequence, stopId)")

Output:

id stopId StopSequence 
1 1  a   1 
2 1  c   3 
3 2  b   1 
4 2  c   4 
5 3  a   3 
6 3  b   1

Source

2015-07-21 18:06:04 mpalanco
