解讀R代碼功能

我想進行通路富集分析。我有21個重要基因的列表，以及我想檢查的多種類型的通路（即檢查KEGG通路，GOterms，複合體等的富集）。解讀R代碼功能

我發現這個代碼例子，在一箇舊BIOC崗位。但是，我無法適應自己。

首先， 1這是什麼意思？我不知道這個多重冒號語法。

hyperg <- Category:::.doHyperGInternal

2 - 我不明白這條線是如何工作的。 hyperg.test是一個需要傳遞給它3個變量的函數，對嗎？這是行不知何故通過「genes.by.pathways，significant.genes和all.geneIDs到THR hyperg.test？

，我想

pVals.by.pathway<-t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))

代碼以適應

library(KEGGREST) 
library(org.Hs.eg.db) 

    # created named list, length 449, eg: 
    # path:hsa00010: "Glycolysis/Gluconeogenesis" 

pathways <- keggList("pathway", "hsa") 

    # make them into KEGG-style human pathway identifiers 
human.pathways <- sub("path:", "", names(pathways)) 

    # for demonstration, just use the first ten pathways 

demo.pathway.ids <- head(human.pathways, 10) 
demo.pathways <- setNames(keggGet(demo.pathway.ids), demo.pathway.ids) 

genes.by.pathway <- lapply(demo.pathways, function(demo.pathway) { 
    demo.pathway$GENE[c(TRUE, FALSE)] 
     }) 

all.geneIDs <- keys(org.Hs.eg.db) 

    # chose one of these for demonstration. the first (a whole genome random 
    # set of 100 genes) has very little enrichment, the second, a random set 
    # from the pathways themselves, has very good enrichment in some pathways 

set.seed(123) 
significant.genes <- sample(all.geneIDs, size=100) 
#significant.genes <- sample(unique(unlist(genes.by.pathway)), size=10) 

    # the hypergeometric distribution is traditionally explained in terms of 
    # drawing a sample of balls from an urn containing black and white balls. 
    # to keep the arguments straight (in my mind at least), I use these terms 
    # here also 

hyperg <- Category:::.doHyperGInternal 
hyperg.test <- 
    function(pathway.genes, significant.genes, all.genes, over=TRUE) 
{ 
    white.balls.drawn <- length(intersect(significant.genes, pathway.genes)) 
    white.balls.in.urn <- length(pathway.genes) 
    total.balls.in.urn <- length(all.genes) 
    black.balls.in.urn <- total.balls.in.urn - white.balls.in.urn 
    balls.pulled.from.urn <- length(significant.genes) 
    hyperg(white.balls.in.urn, black.balls.in.urn, 
      balls.pulled.from.urn, white.balls.drawn, over) 
} 

pVals.by.pathway <- 
    t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs)) 

print(pVals.by.pathway)

來源

2014-11-24 user2814482

是像您期望或代碼行爲不，你只是想只是更好地理解它？目前還不清楚你是否有問題或者只是想要信息。 – cdeterman 2014-11-24 13:43:19

@cdeterman - 兩個，我真的不明白，尤其是最後一行，這使得它很難改變它，我的「M收到此錯誤> pVals.by.pathway < - + T（sapply（基因。 by.pathway，hyperg.test，significant.genes，all.geneIDs）） FUN（X [[1L]]，...的錯誤）：找不到函數「hyperg」 – user2814482 2014-11-24 13:54:37

原因你是讓你的錯誤是因爲它似乎你沒有Category包從Bioconductor的安裝。我懷疑，因爲三冒號操作:::了這一點。這種操作是非常相似的雙冒號操作::。而與::您可以訪問導出來自包的對象不加載它，:::允許訪問非導出對象（在這種情況下函數從Category）。如果您安裝了Category包，代碼將無誤地運行。

關於sapply聲明：

pVals.by.pathway<-t(sapply(genes.by.pathway, hyperg.test, significant.genes, all.geneIDs))

您可以將其分爲單獨的部分去了解它。首先，sapply正在迭代gene.by.pathway的元素並將它們傳遞給hyperg.test的第一個參數。以下參數是兩個附加參數。這是有點不清楚，我個人建議人們明確確定參數，以避免意外的驚喜，並避免需要完全相同的順序。這是在這種情況下，有點重複，但一個很好的方式，以避免愚蠢的錯誤（比如把significant.genesall.geneIds後）

改寫爲：

pVals.by.pathway <- 
    t(sapply(genes.by.pathway, hyperg.test, significant.genes=significant.genes, all.genes=all.geneIDs))

一旦這個循環完成，sapply功能簡化了輸出到一個矩陣。但是，通過轉置t，輸出更加便於用戶使用。

一般來說，試圖瞭解複雜的apply報表時，我發現最好打散他們以更小的部分，看看對象本身的樣子。

來源

2014-11-24 14:16:40 cdeterman

謝謝，這有幫助，我不知道你可以通過這種方式將參數傳遞給函數 – user2814482 2014-11-24 14:26:08

解讀R代碼功能

回答

相關問題