2011-05-12 62 views
1

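R: Generating a data frame from a list by checking against a reference set

My colleague Samantha asked a question that was unclear, so I am asking it again here. She has a variable goterms that holds the names of all the data frames to be analysed.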

goterms <- c('df1','df2','df3') 

The variable interestedGO is a list that holds the ILMN numbers for each goterm. So the first list element contains the ILMN codes of df1, and so on.

df1 <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383", 
     "ILMN_1772387", "ILMN_1783910", "ILMN_1784863") 
df2 <- c("ILMN_1651599", "ILMN_1652693", "ILMN_1652825", "ILMN_1653324", 
     "ILMN_1655595", "ILMN_1656057", "ILMN_1659077", "ILMN_1659923", 
     "ILMN_1659947", "ILMN_1662322", "ILMN_1662619", "ILMN_1664565", 
     "ILMN_1665132", "ILMN_1665738", "ILMN_1665859") 
df3 <- c("ILMN_1661695", "ILMN_1665132", "ILMN_1716446", "ILMN_1737314", 
     "ILMN_1772387", "ILMN_1784863", "ILMN_1796094", "ILMN_1800317", 
     "ILMN_1800512", "ILMN_1807074") 
interestedGO <- list(df1,df2,df3) 
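As a side note, naming the list elements after the goterms can make later results easier to read. The following is only a minimal sketch: it assumes the names in goterms are in the same order as the list elements, and the `_demo` objects are hypothetical, shortened stand-ins so that the real vectors above are not overwritten.

```r
# Hypothetical, shortened stand-ins for df1, df2 and df3
df1_demo <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446")
df2_demo <- c("ILMN_1651599", "ILMN_1665132")
df3_demo <- c("ILMN_1661695", "ILMN_1716446")
goterms_demo <- c("df1", "df2", "df3")

# Attach the goterm names to the list elements
interestedGO_demo <- setNames(list(df1_demo, df2_demo, df3_demo), goterms_demo)

names(interestedGO_demo)      # the goterm names
interestedGO_demo[["df2"]]    # look up one goterm by name
```

With a named list, functions such as sapply carry the names through to the column names of their result.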

xx2 is the comparison set. The variable xx2 contains a subset of all possible ILMN numbers.

xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1769383","ILMN_1832921") 

x is the reference set. The variable x contains all possible ILMN numbers.

x <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383", "ILMN_1772387", 
     "ILMN_1783910", "ILMN_1784863","ILMN_1651599", "ILMN_1652693", "ILMN_1652825", 
     "ILMN_1653324", "ILMN_1655595","ILMN_1656057", "ILMN_1659077", "ILMN_1659923", 
     "ILMN_1659947", "ILMN_1662322","ILMN_1662619", "ILMN_1664565", "ILMN_1665132", 
     "ILMN_1665738", "ILMN_1665859","ILMN_1661695", "ILMN_1665132", "ILMN_1716446", 
     "ILMN_1737314", "ILMN_1772387","ILMN_1784863", "ILMN_1796094", "ILMN_1800317", 
     "ILMN_1800512", "ILMN_1807074") 

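The goal of all these variables is to check, for each goterm, whether its ILMN codes occur in the comparison set xx2. To check this, the match function is used: every code without a match gets 0, and every matched value is replaced by 1. To get an overview across all goterms, I want to create a loop like the one below that also lists every gene in the reference set x. The final result must be a single data.frame that compares the results of each goterm side by side.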

test <- list() 
for (i in seq_along(goterms)) { 
    # ILMN codes of the i-th goterm
    goilmn <- as.data.frame(interestedGO[i]) 
    # 1 if the code occurs in xx2, 0 otherwise
    resultILMN <- match(goilmn[,1], xx2, nomatch=0) 
    resultILMN[resultILMN!=0] <- 1 
    result <- cbind(goilmn, resultILMN) 
    colnames(result) <- c('x', 'result') 

    # add the remaining codes of the reference set x and flag them as 0
    zz <- merge(result, data.frame(x=x), all=TRUE) 
    zz[is.na(zz)] <- 0 
    test[[i]] <- zz 
} 
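The matching step used in the loop above can be tried on its own for a single goterm. This is a small sketch using the df1 and xx2 values from the question; the names pos, flag_match and flag_in are made up for illustration.

```r
# Codes of one goterm and the comparison set (values from the question)
df1 <- c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446", "ILMN_1769383",
         "ILMN_1772387", "ILMN_1783910", "ILMN_1784863")
xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1769383", "ILMN_1832921")

# match() returns the position in xx2, or 0 where there is no match
pos <- match(df1, xx2, nomatch = 0)

# Collapse the positions to a 0/1 indicator
flag_match <- as.integer(pos != 0)

# %in% is built on match() and gives the same indicator directly
flag_in <- as.integer(df1 %in% xx2)

data.frame(x = df1, result = flag_match)
```

Both variants flag the same codes, so `%in%` can replace the two-step match-and-overwrite pattern when only membership matters.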

The final output would look like this:

1 ILMN_1651599  0 0 0 
2 ILMN_1652693  0 0 0 
3 ILMN_1652825  0 0 0 
4 ILMN_1653324  0 0 0 
5 ILMN_1655595  0 0 0 
6 ILMN_1656057  0 0 0 
7 ILMN_1659077  0 0 0 
8 ILMN_1659923  0 0 0 
9 ILMN_1659947  0 0 0 
10 ILMN_1661695  0 0 0 
11 ILMN_1662322  0 0 0 
12 ILMN_1662619  0 0 0 
13 ILMN_1664565  0 0 0 
14 ILMN_1665132  0 0 0 
15 ILMN_1665132  0 0 0 
16 ILMN_1665132  0 0 0 
17 ILMN_1665738  0 0 0 
18 ILMN_1665859  0 0 0 
19 ILMN_1691487  0 0 1 
20 ILMN_1716446  1 0 1 
21 ILMN_1716446  1 0 1 
22 ILMN_1737314  0 0 0 
23 ILMN_1769383  0 0 1 
24 ILMN_1772387  0 0 0 
25 ILMN_1772387  0 0 0 
26 ILMN_1783910  0 0 0 
27 ILMN_1784863  0 0 0 
28 ILMN_1784863  0 0 0 
29 ILMN_1796094  0 0 0 
30 ILMN_1800317  0 0 0 
31 ILMN_1800512  0 0 0 
32 ILMN_1807074  0 0 0 

Can anyone help me? Thanks!

Answers

3

Does this work for you?

data.frame(code=x, sapply(interestedGO, function(curdf){ 
     ifelse(x %in% xx2, x %in% curdf, 0) 
    })) 
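For readers who want to see what this approach produces, here is a hedged variant on a small data set. It is not the answerer's exact code: it uses `&` instead of ifelse, names the list elements so the columns are labelled, and de-duplicates the reference set, which is an assumption (the expected output in the question keeps duplicate codes). The data are a hypothetical, shortened subset of the question's vectors.

```r
# Hypothetical, shortened stand-in data
interestedGO <- list(
  df1 = c("ILMN_1665132", "ILMN_1691487", "ILMN_1716446"),
  df2 = c("ILMN_1651599", "ILMN_1665132"),
  df3 = c("ILMN_1661695", "ILMN_1716446")
)
xx2 <- c("ILMN_1691487", "ILMN_1716446", "ILMN_1832921")

# Assumption: a sorted, de-duplicated reference set
x <- sort(unique(unlist(interestedGO)))

# One 0/1 column per goterm: 1 only if the code is in xx2 AND in that goterm
flags <- sapply(interestedGO, function(codes) {
  as.integer(x %in% xx2 & x %in% codes)
})

# sapply returns a matrix whose columns are named after the list elements
overview <- data.frame(code = x, flags)
overview
```

Each row is one ILMN code from the reference set and each further column is one goterm, which matches the layout requested in the question.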
+0

+1 Nice. I was working on a similar approach, but your solution is very compact. – Andrie 2011-05-12 10:02:11

+0

This is brilliant! Thank you very much! – Lisann 2011-05-12 10:06:20