Repeat a data.frame N times

I have the following data frame:

data.frame(a = c(1,2,3),b = c(1,2,3)) 
    a b 
1 1 1 
2 2 2 
3 3 3

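and I would like to turn it into a data frame in which these rows are repeated several times, or more generally N times. Is there a simple function for this in R? Thanks!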


2012-01-06 Michael

You can use replicate() and then rbind the results back together. The row names are automatically renumbered to run from 1 to the total number of rows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3)) 
n <- 3 
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the change to the row names is not quite as neat (though it is more informative):

d[rep(seq_len(nrow(d)), n), ]
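To make the row-name difference between the two approaches concrete, here is a minimal sketch using the data frame from the question. The names by_rbind and by_index are only illustrative and do not appear in the answer.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3

# Approach 1: stack n copies; row names are renumbered 1..(n * nrow(d))
by_rbind <- do.call("rbind", replicate(n, d, simplify = FALSE))

# Approach 2: index the rows repeatedly; repeated row names get a suffix
by_index <- d[rep(seq_len(nrow(d)), n), ]

rownames(by_rbind)  # "1" "2" "3" "4" "5" "6" "7" "8" "9"
rownames(by_index)  # "1" "2" "3" "1.1" "2.1" "3.1" "1.2" "2.2" "3.2"

# Resetting the row names of the indexed result gives the same labels
rownames(by_index) <- NULL
identical(rownames(by_rbind), rownames(by_index))
```

The suffixes in the indexed version show which original row each copy came from, which is presumably what "more informative" refers to.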


2012-01-06 05:23:35 mdsumner

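Beware of zero-row data frames. seq_len is probably a better choice. – hadley 2012-01-06 09:17:42

Thanks, I feel bad about that one (I always think it's seq_along and don't make the effort). I appreciate the heads-up. – mdsumner 2012-01-06 13:34:03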
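The zero-row caveat from the comment above can be sketched as follows. This assumes the alternative being warned against is an index built with 1:nrow(d), which the thread does not show; d0, idx_colon and idx_seq are hypothetical names.

```r
# An empty data frame with the same columns as in the question
d0 <- data.frame(a = numeric(0), b = numeric(0))
n <- 3

# 1:nrow(d0) is 1:0, i.e. c(1, 0), so the index is not empty
idx_colon <- rep(1:nrow(d0), n)

# seq_len(0) is integer(0), so the index stays empty
idx_seq <- rep(seq_len(nrow(d0)), n)

length(idx_colon)  # 6
length(idx_seq)    # 0

# With the empty index the result is still a zero-row data frame
nrow(d0[idx_seq, ])
```

With idx_colon the index refers to a row 1 that does not exist, so the result is no longer an empty data frame.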

d <- data.frame(a = c(1,2,3),b = c(1,2,3)) 
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])
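As a sketch of what the one-liner above does, it can be split into two steps. The intermediate names copies and r_manual are illustrative only.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# Step 1: indexing a one-element list with c(1, 1, 1) yields three copies of d
copies <- list(d)[rep(1L, times = 3L)]
length(copies)

# Step 2: Reduce folds rbind over the list from the left,
# i.e. rbind(rbind(d, d), d)
r <- Reduce(rbind, copies)
r_manual <- rbind(rbind(copies[[1]], copies[[2]]), copies[[3]])
identical(r, r_manual)
```

Unlike do.call("rbind", ...), which binds all copies in a single call, Reduce calls rbind once per additional copy.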


2012-01-06 19:34:05

Care to elaborate on what you just did and how it compares to mdsumner's answer? Maybe paste some results? – 2012-01-07 01:28:51

For data.frame objects, this solution is several times faster than @mdsumner's and @wojciech-sobala's:

d[rep(seq_len(nrow(d)), n), ]

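For data.table objects, @mdsumner's solution is a bit faster than converting to a data.frame and then applying the indexing above. For large n this might flip.

(Figure: microbenchmark results.)

Full code: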

Repeat1 <- function(d, n) { 
    return(do.call("rbind", replicate(n, d, simplify = FALSE))) 
} 

Repeat2 <- function(d, n) { 
    return(Reduce(rbind, list(d)[rep(1L, times=n)])) 
} 

Repeat3 <- function(d, n) { 
    if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)]) 
    return(d[rep(seq_len(nrow(d)), n), ]) 
} 

Repeat3.dt.convert <- function(d, n) { 
    if ("data.table" %in% class(d)) d <- as.data.frame(d) 
    return(d[rep(seq_len(nrow(d)), n), ]) 
} 

# Try with data.frames 
mtcars1 <- Repeat1(mtcars, 3) 
mtcars2 <- Repeat2(mtcars, 3) 
mtcars3 <- Repeat3(mtcars, 3) 

library(RUnit) 
checkEquals(mtcars1, mtcars2) 
# Only the row names differ: indexing yields "Mazda RX4.1" where rbind yields
# "Mazda RX41", so compare the contents without attributes
checkEquals(mtcars1, mtcars3, check.attributes = FALSE)

# Works with data.tables too 
library(data.table) 
mtcars.dt <- data.table(mtcars) 
mtcars.dt1 <- Repeat1(mtcars.dt, 3) 
mtcars.dt2 <- Repeat2(mtcars.dt, 3) 
mtcars.dt3 <- Repeat3(mtcars.dt, 3) 

# No row.names mismatch since data.tables don't have row.names 
checkEquals(mtcars.dt1, mtcars.dt2) 
checkEquals(mtcars.dt1, mtcars.dt3) 

# Time test 
library(microbenchmark) 
res <- microbenchmark(Repeat1(mtcars, 10), 
         Repeat2(mtcars, 10), 
         Repeat3(mtcars, 10), 
         Repeat1(mtcars.dt, 10), 
         Repeat2(mtcars.dt, 10), 
         Repeat3(mtcars.dt, 10), 
         Repeat3.dt.convert(mtcars.dt, 10)) 
print(res) 
library(ggplot2) 
ggsave("~/gdrive/repeat_microbenchmark.png", autoplot(res))


2015-02-13 18:55:46

Just use simple indexing with the rep function.

mydata <- data.frame(a = c(1,2,3), b = c(1,2,3)) # create the data frame 
n <- 10   # number of times the rows of the data frame should be repeated 

mydata <- mydata[rep(rownames(mydata), n), ] # repeat the row names inside the index 
rownames(mydata) <- 1:NROW(mydata) # renumber the rows for a cleaner look
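The indexing idea can also be wrapped in a small helper. The function name repeat_rows is hypothetical and not part of any package. This sketch indexes by position with seq_len rather than by row name, and uses drop = FALSE so that a single-column data frame is not simplified to a vector.

```r
# Hypothetical helper: repeat all rows of df n times and reset the row names
repeat_rows <- function(df, n) {
  idx <- rep(seq_len(nrow(df)), times = n)  # positions 1..nrow, repeated n times
  out <- df[idx, , drop = FALSE]            # keep the data frame structure
  rownames(out) <- NULL                     # back to automatic row names
  out
}

mydata <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
res <- repeat_rows(mydata, 10)
dim(res)
```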


2016-04-01 11:22:13 learner

The dplyr package contains the function bind_rows(), which directly combines all data frames in a list, so there is no need to use do.call() together with rbind():

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3)) 
library(dplyr) 
bind_rows(replicate(3, df, simplify = FALSE))

For a large number of repetitions, bind_rows() is also much faster than rbind():

library(microbenchmark) 
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)), 
       bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)), 
       times = 20) 
## Unit: milliseconds 
##       expr       min        lq      mean   median        uq       max neval cld 
##      rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112    20   b 
##  bind_rows  1.765956  1.818087  1.881697  1.86207  1.898839  2.321621    20  a


2017-08-11 15:30:22 Stibu

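I guess 'slice(rep(row_number(), 3))' is better, according to Max's benchmark. Oh, I just saw your benchmark... Personally, I think scaling up the size of the data frame would be the right direction, rather than the number of tables, but I don't know. – Frank 2017-08-11 15:34:24

Nice one! When I benchmarked it, 'slice(df, rep(row_number(), 3))' was a little slower than 'bind_rows(replicate(...))' (1.9 vs. 2.1 ms). Anyway, I thought it would be good to have a 'dplyr' solution as well... – Stibu 2017-08-11 15:42:42

@Frank You may be right. I didn't check what happens with large data frames, because I just used the one provided in the question. – Stibu 2017-08-11 15:45:42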
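For reference, a minimal sketch of the slice-based variant discussed in these comments, next to the bind_rows() version. It assumes dplyr is installed, and it builds the index with n() instead of row_number() as a choice made for this sketch. The names by_slice and by_bind are illustrative.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# slice() selects rows by position; repeating the positions repeats the rows
by_slice <- slice(df, rep(seq_len(n()), times = 3))

# The bind_rows() approach from the answer, for comparison
by_bind <- bind_rows(replicate(3, df, simplify = FALSE))

nrow(by_slice)
identical(by_slice$a, by_bind$a)
```

No timings are given here. As the comments note, the relative speed may depend on the size of the data frame as well as on the number of repetitions.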
