2017-06-16 70 views
Create a data.frame of 100 random samples of a column in another data.frame

I am working with a data frame:

df <- data.frame(a = c("gene1", "gene2", "gene3", ...), 
       b = c(10, 20, 30, ...)) 

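I want to create a new data frame with 100 columns, each containing a different random selection of 250 genes from column a of the original data frame. This is what I have tried so far: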

data.frame(matrix(data = df[sample(nrow(df), 250), 1], 
        ncol = 100, nrow = 250)) 

However, this fills every column with the same random sample instead of a unique sample per column.

Use 'replicate(100, sample...)'. Your 'sample' expression is fine. You can wrap the whole thing in 'data.frame'.
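A minimal sketch of the 'replicate' suggestion above, written out in full. The 1000-gene 'df' is a hypothetical stand-in for the asker's data (the real one is elided in the question), and 'set.seed(42)' is there only so the sketch gives the same result each run.

```r
# Hypothetical stand-in for the asker's data: 1000 genes, one value per gene.
df <- data.frame(a = paste0("gene", 1:1000),
                 b = seq(10, 10000, by = 10),
                 stringsAsFactors = FALSE)

set.seed(42)  # assumption: only here to make the sketch reproducible

# replicate() re-evaluates the sampling expression 100 times, so each
# column is its own draw of 250 distinct genes (sample() defaults to
# sampling without replacement).
samples <- data.frame(replicate(100, df[sample(nrow(df), 250), 1]),
                      stringsAsFactors = FALSE)

dim(samples)  # 250 rows, 100 columns
```

Because the sampling expression is evaluated once per column, genes are unique within a column but can appear in more than one column.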

Thanks, that works great!

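The 'data' you supply to 'matrix' has fewer elements than the dimensions you are creating, so it gets recycled. Depending on how you want replacement to work, you could increase the sample size to match, e.g. 'sample(nrow(df), 100 * 250)' – alistaire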
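The recycling described in this comment is easy to see on a toy vector, and the same sketch shows one way to draw all 100 * 250 values at once. The gene names and the 1000-row 'df' are hypothetical. 'replace = TRUE' is an assumption added here: without it, 'sample' raises an error whenever 'nrow(df)' is smaller than 100 * 250.

```r
# Six values poured into a 6 x 3 matrix: matrix() silently repeats the
# vector to fill the remaining columns, so every column is identical.
x <- c("g1", "g2", "g3", "g4", "g5", "g6")
m <- matrix(data = x, nrow = 6, ncol = 3)
all(m[, 1] == m[, 2])  # TRUE

# Drawing enough values to fill the whole matrix avoids recycling.
# Hypothetical data with 1000 genes.
df <- data.frame(a = paste0("gene", 1:1000), stringsAsFactors = FALSE)
idx <- sample(nrow(df), 100 * 250, replace = TRUE)
big <- matrix(df[idx, 1], nrow = 250, ncol = 100)
dim(big)  # 250 rows, 100 columns
```

With replacement, a gene may occur more than once within a single column. If each column must hold 250 distinct genes, sampling column by column is the safer route.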

Answer

Here you go, with 10 instead of 100 and 5 instead of 250:

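df <- data.frame(a = paste0("gene", 1:100), 
       b = seq(10, 1000, 10))  # one value per gene, as in the question 
# replicate() re-runs the sampling expression once per column 
random_samples <- replicate(10, df[sample(nrow(df), 5), 1]) 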

# [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] 
# [1,] "gene14" "gene100" "gene13" "gene5" "gene20" "gene68" "gene24" "gene57" "gene54" "gene44" 
# [2,] "gene71" "gene67" "gene44" "gene25" "gene90" "gene45" "gene46" "gene69" "gene76" "gene3" 
# [3,] "gene54" "gene34" "gene97" "gene67" "gene10" "gene50" "gene62" "gene54" "gene49" "gene58" 
# [4,] "gene81" "gene18" "gene50" "gene60" "gene56" "gene7" "gene42" "gene82" "gene50" "gene51" 
# [5,] "gene12" "gene71" "gene31" "gene19" "gene50" "gene2" "gene15" "gene95" "gene59" "gene23" 

# with seeds, so each column is reproducible (%>% comes from magrittr) 
library(magrittr) 
seeds <- 1:10 
seeds %>% 
  sapply(function(x) { set.seed(x); df[sample(nrow(df), 5), 1] }) %>% 
  as.data.frame %>% setNames(paste0("S", seeds)) 

#   S1  S2  S3  S4  S5  S6  S7  S8  S9 S10 
#   1 gene27 gene19 gene17 gene59 gene21 gene61 gene99 gene47 gene23 gene51 
#   2 gene37 gene70 gene80 gene1 gene68 gene93 gene40 gene21 gene3 gene31 
#   3 gene57 gene57 gene38 gene29 gene90 gene26 gene12 gene79 gene21 gene42 
#   4 gene89 gene17 gene32 gene27 gene28 gene37 gene7 gene64 gene98 gene68 
#   5 gene20 gene91 gene58 gene79 gene11 gene78 gene24 gene31 gene43 gene9 
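For readers who would rather not depend on the pipe, the seeded variant can be sketched in base R. The helper names 'draw' and 'seeded_samples' are hypothetical, introduced only for this illustration. The exact genes drawn depend on the R version's sampling algorithm, so no output is shown.

```r
# Same toy data as in the answer.
df <- data.frame(a = paste0("gene", 1:100), stringsAsFactors = FALSE)
seeds <- 1:10

# Hypothetical helper: fix the RNG state, then draw 5 distinct genes.
draw <- function(seed) {
  set.seed(seed)
  df[sample(nrow(df), 5), 1]
}

# Hypothetical helper: one column per seed, named S1, S2, ...
seeded_samples <- function(seeds) {
  out <- as.data.frame(sapply(seeds, draw), stringsAsFactors = FALSE)
  setNames(out, paste0("S", seeds))
}

seeded <- seeded_samples(seeds)

# Re-running with the same seeds reproduces the table exactly.
identical(seeded, seeded_samples(seeds))  # TRUE
```

Seeding each column separately means any single column can be regenerated later from its seed without recomputing the others.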