Efficiently applying conditions to a matrix

I have an integer matrix:
set.seed(1)
counts.mat <- matrix(sample(50, 29 * 10, replace = TRUE), nrow = 10, ncol = 29)
colnames(counts.mat) <- c("ww.1m_1","ww.1m_2","wm.1m_1","wm.1m_2","wm.1m_3","wn.1m_1","wn.1m_2",
                          "A_1","A_2","B_1","B_2","C_1","C_2",
                          "ww.2m_1","ww.2m_2","ww.2m_3","wm.2m_1","wm.2m_2","wn.2m_1","wn.2m_2",
                          "ww.3m_1","ww.3m_2","ww.3m_3","wm.3m_1","wm.3m_2","wm.3m_3","wn.3m_1","wn.3m_2","wn.3m_3")
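Its elements are the counts of a particular measurement obtained from a set of experiments (3 in this example), which are described in this list of data.frames: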
df.list <- list(df1 = data.frame(gt1 = c("ww.1m","wm.1m","wn.1m"), kt1 = c("A","B","C"), stringsAsFactors = FALSE),
                df2 = data.frame(gt2 = c("ww.2m","wm.2m","wn.2m"), stringsAsFactors = FALSE),
                df3 = data.frame(gt2 = c("ww.3m","wm.3m","wn.3m"), stringsAsFactors = FALSE))
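Each data.frame in df.list corresponds to one experiment: its columns are that experiment's factors, and the column values are the actual factor levels. The colnames of counts.mat are replicates of these factor levels, and they follow the format:
<factor.level>_<replicate>
so every column of counts.mat can be matched to a factor level in df.list.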
For example, gt1 in df.list$df1 is a factor with the levels:
"ww.1m" "wm.1m" "wn.1m"
and its corresponding replicates in counts.mat are:
"ww.1m_1","ww.1m_2","wm.1m_1","wm.1m_2","wm.1m_3","wn.1m_1","wn.1m_2"
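A small sketch of how the naming convention links counts.mat to df.list, assuming every column name is exactly <factor.level>_<replicate> with a numeric replicate: strip the suffix with sub(), collect the levels of all data.frames in df.list in order, and group the column names by level with split(). The variable names (cn, col.levels, all.levels, reps.by.level) are hypothetical.

```r
# Column names of counts.mat and df.list, copied from the question
cn <- c("ww.1m_1", "ww.1m_2", "wm.1m_1", "wm.1m_2", "wm.1m_3", "wn.1m_1", "wn.1m_2",
        "A_1", "A_2", "B_1", "B_2", "C_1", "C_2",
        "ww.2m_1", "ww.2m_2", "ww.2m_3", "wm.2m_1", "wm.2m_2", "wn.2m_1", "wn.2m_2",
        "ww.3m_1", "ww.3m_2", "ww.3m_3", "wm.3m_1", "wm.3m_2", "wm.3m_3",
        "wn.3m_1", "wn.3m_2", "wn.3m_3")
df.list <- list(df1 = data.frame(gt1 = c("ww.1m", "wm.1m", "wn.1m"), kt1 = c("A", "B", "C"),
                                 stringsAsFactors = FALSE),
                df2 = data.frame(gt2 = c("ww.2m", "wm.2m", "wn.2m"), stringsAsFactors = FALSE),
                df3 = data.frame(gt2 = c("ww.3m", "wm.3m", "wn.3m"), stringsAsFactors = FALSE))

# Duplicate column names would make replicates ambiguous
stopifnot(anyDuplicated(cn) == 0)

# Factor level of each column: strip the trailing "_<replicate>"
col.levels <- sub("_\\d+$", "", cn)

# Every factor level in df.list, data.frame by data.frame, column by column
all.levels <- unlist(lapply(df.list, unlist, use.names = FALSE), use.names = FALSE)

# Replicate columns of each level, listed in df.list order
reps.by.level <- split(cn, factor(col.levels, levels = all.levels))
reps.by.level[["wm.1m"]]  # "wm.1m_1" "wm.1m_2" "wm.1m_3"

# Every level should have at least one replicate column
stopifnot(all(lengths(reps.by.level) > 0))
```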
Given:
min.replicates <- 1
min.counts <- 10
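What I want to do is, for every level of each factor (column) in each data.frame in df.list, and for every row of counts.mat, return TRUE if at least min.replicates of that level's replicates have at least min.counts counts, and FALSE otherwise.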
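The result should be a matrix with one column per factor level, so its number of columns equals the total number of factor levels across df.list, and with as many rows as counts.mat.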
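The condition for a single cell of the result can be written as a small function. This is a sketch; passes() is a hypothetical helper, not part of the question, and x stands for the counts of one factor level's replicates in one row of counts.mat.

```r
# Hypothetical helper: x holds one row's counts for the replicates of a single factor level
passes <- function(x, min.counts = 10, min.replicates = 1) {
  # count replicates reaching min.counts, then compare with the required number
  sum(x >= min.counts) >= min.replicates
}

passes(c(3, 12, 7))                       # TRUE: one replicate reaches 10
passes(c(3, 9, 7))                        # FALSE: no replicate reaches 10
passes(c(3, 12, 7), min.replicates = 2)   # FALSE: only one of the required two
```

Each cell of the result is this test applied to one row and one level. With the df.list above that gives 3 + 3 + 3 + 3 = 12 columns and, like counts.mat, 10 rows.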
Here is my implementation, which I think is slow:
res.mat <- do.call(rbind, lapply(1:nrow(counts.mat), function(i) {      # each row of counts.mat
  do.call(cbind, lapply(1:length(df.list), function(l) {                # each data.frame (experiment)
    do.call(cbind, lapply(1:ncol(df.list[[l]]), function(j) {           # each factor (column)
      do.call(cbind, lapply(1:nrow(df.list[[l]]), function(k) {         # each factor level
        # columns holding the replicates of this level
        reps <- grepl(paste0("^", df.list[[l]][k, j], "_\\d+$"), colnames(counts.mat), perl = TRUE)
        # TRUE if at least min.replicates replicates reach min.counts
        sum(counts.mat[i, reps] >= min.counts) >= min.replicates
      }))
    }))
  }))
}))
So I'm looking for something significantly faster.
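A minimal sketch of a vectorized alternative, assuming every column name follows <factor.level>_<replicate> exactly. The idea: derive each column's level once with sub(), compare the whole matrix against min.counts in a single step, and then, for each factor level, count the passing replicates of all rows at once with rowSums(). This should avoid running a regex for every row × level combination, since the only loop left is over the 12 levels. The names col.levels, all.levels and hits are my own; no timings are claimed here.

```r
# Data from the question
set.seed(1)
counts.mat <- matrix(sample(50, 29 * 10, replace = TRUE), nrow = 10, ncol = 29)
colnames(counts.mat) <- c("ww.1m_1","ww.1m_2","wm.1m_1","wm.1m_2","wm.1m_3","wn.1m_1","wn.1m_2",
                          "A_1","A_2","B_1","B_2","C_1","C_2",
                          "ww.2m_1","ww.2m_2","ww.2m_3","wm.2m_1","wm.2m_2","wn.2m_1","wn.2m_2",
                          "ww.3m_1","ww.3m_2","ww.3m_3","wm.3m_1","wm.3m_2","wm.3m_3",
                          "wn.3m_1","wn.3m_2","wn.3m_3")
df.list <- list(df1 = data.frame(gt1 = c("ww.1m","wm.1m","wn.1m"), kt1 = c("A","B","C"),
                                 stringsAsFactors = FALSE),
                df2 = data.frame(gt2 = c("ww.2m","wm.2m","wn.2m"), stringsAsFactors = FALSE),
                df3 = data.frame(gt2 = c("ww.3m","wm.3m","wn.3m"), stringsAsFactors = FALSE))
min.replicates <- 1
min.counts <- 10

# Factor level of every column: drop the trailing "_<replicate>" once, up front
col.levels <- sub("_\\d+$", "", colnames(counts.mat))

# All factor levels in df.list order (data.frame by data.frame, column by column)
all.levels <- unlist(lapply(df.list, unlist, use.names = FALSE), use.names = FALSE)

# One vectorized comparison for the whole matrix
hits <- counts.mat >= min.counts

# For each level: number of passing replicates per row, compared with min.replicates
res.mat <- vapply(all.levels, function(lv) {
  rowSums(hits[, col.levels == lv, drop = FALSE]) >= min.replicates
}, logical(nrow(counts.mat)))

dim(res.mat)  # 10 12
```

The result should match res.mat from the question apart from the column names, which vapply() fills in with the level names. An equivalent formulation is a single matrix product: with M <- outer(col.levels, all.levels, "==") * 1 (a 0/1 column-by-level membership matrix), (hits %*% M) >= min.replicates gives the same 10 × 12 logical matrix.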
In your 'counts.mat' you have duplicate column names 'wm.3m_1' and 'wm.3m_2'. Should the ones on the second-to-last line be '2m' rather than '3m'?
Sorry, fixed. – dan