循環遍歷R中的S4對象中的列

我試圖使用snpStats包執行關聯。我有一個名爲'plink'的snp矩陣，其中包含我的基因型數據（如 $基因型列表，$ map，$ fam），並且plink $ genotype具有：SNP名稱作爲列名稱（2個SNP）和主題標識符作爲行名稱：循環遍歷R中的S4對象中的列

plink$genotype 
SnpMatrix with 6 rows and 2 columns 
Row names: 1 ... 6 
Col names: 203 204

的砰砰數據集可以再現複製下面的PED和映射文件並將其保存爲「plink.ped」分別plink.map」：

plink.ped: 

1 1 0 0 1 -9 A A G G 
2 2 0 0 2 -9 G A G G 
3 3 0 0 1 -9 A A G G 
4 4 0 0 1 -9 A A G G 
5 5 0 0 1 -9 A A G G 
6 6 0 0 2 -9 G A G G 

plink.map: 

1 203 0 792429 
2 204 0 819185

然後以這種方式使用plink：

./plink --file plink --make-bed 

@[email protected] 
|  PLINK!  |  v1.07  | 10/Aug/2009  | 
|----------------------------------------------------------| 
| (C) 2009 Shaun Purcell, GNU General Public License, v2 | 
|----------------------------------------------------------| 
| For documentation, citation & bug-report instructions: | 
|  http://pngu.mgh.harvard.edu/purcell/plink/  | 
@[email protected] 

Web-based version check (--noweb to skip) 
Recent cached web-check found...Problem connecting to web 

Writing this text to log file [ plink.log ] 
Analysis started: Tue Nov 29 18:08:18 2011 

Options in effect: 
--file /ugi/home/claudiagiambartolomei/Desktop/plink 
--make-bed 

2 (of 2) markers to be included from [ /ugi/home/claudiagiambartolomei/Desktop /plink.map ] 
6 individuals read from [ /ugi/home/claudiagiambartolomei/Desktop/plink.ped ] 
0 individuals with nonmissing phenotypes 
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss) 
Missing phenotype value is also -9 
0 cases, 0 controls and 6 missing 
4 males, 2 females, and 0 of unspecified sex 
Before frequency and genotyping pruning, there are 2 SNPs 
6 founders and 0 non-founders found 
Total genotyping rate in remaining individuals is 1 
0 SNPs failed missingness test (GENO > 1) 
0 SNPs failed frequency test (MAF < 0) 
After frequency and genotyping pruning, there are 2 SNPs 
After filtering, 0 cases, 0 controls and 6 missing 
After filtering, 4 males, 2 females, and 0 of unspecified sex 
Writing pedigree information to [ plink.fam ] 
Writing map (extended format) information to [ plink.bim ] 
Writing genotype bitfile to [ plink.bed ] 
Using (default) SNP-major mode 

Analysis finished: Tue Nov 29 18:08:18 2011

我也有一個包含結果的表型數據幀（outcome1，outcome2，...）我想與基因型，這是該關聯：

ID<- 1:6 
sex<- rep(1,6) 
age<- c(59,60,54,48,46,50) 
bmi<- c(26,28,22,20,23, NA) 
ldl<- c(5, 3, 5, 4, 2, NA) 
pheno<- data.frame(ID,sex,age,bmi,ldl)

協會工程當我做的唯一條件是：（用公式「snp.rhs.test」）：

bmi<-snp.rhs.tests(bmi~sex+age,family="gaussian", data=pheno, snp.data=plink$genotype)

我的問題是，我怎麼通過成果循環？這種類型的數據似乎不同於其他所有人，我在操作它時遇到問題，所以如果您有一些教程的建議可以幫助我瞭解如何執行此操作以及其他操作（如子集），我也將不勝感激。例如，snp.matrix數據。

這是我曾嘗試循環：

rhs <- function(x) { 
x<- snp.rhs.tests(x, family="gaussian", data=pheno, 
snp.data=plink$genotype) 
} 
res_ <- apply(pheno,2,rhs)

Error in x$terms : $ operator is invalid for atomic vectors

然後我嘗試這樣的：

for (cov in names(pheno)) { 
association<-snp.rhs.tests(cov, family="gaussian",data=pheno, snp.data=plink$genotype) 
}

Error in eval(expr, envir, enclos) : object 'bmi' not found

感謝您像往常一樣對你的幫助！ -f

來源

2011-11-29 user971102

你有沒有我們玩的例子數據集？ –

你不能在矩陣上使用'$'，而是使用'snp.matrix [，'genotype']'代替 – James

對不起，我厭倦了讓我的問題更清楚，我會嘗試添加生成的snpmatrix數據plink如果仍然令人困惑... – user971102

snpStats的作者是David Clayton。雖然在包裝說明中列出的網站是錯誤的，他仍然在該領域，並有可能做一個搜索的文檔與此規範的谷歌高級搜索功能：

snpStats site:https://www-gene.cimr.cam.ac.uk/staff/clayton/

可能的原因爲您的難度與訪問是這是一個S4包，訪問的方法是不同的。而不是打印方法S4對象通常具有show-methods。這裏有一個小插件：https://www-gene.cimr.cam.ac.uk/staff/clayton/courses/florence11/practicals/practical6.pdf，他的整個短期課程目錄是開放的訪問：https://www-gene.cimr.cam.ac.uk/staff/clayton/courses/florence11/

很明顯，從snp.rhs.tests返回的對象可以用「[」使用序列號或名稱，如第7頁所示。你可以得到名稱：

# Using the example on the help(snp.rhs.tests) page: 

> names(slt3) 
[1] "173760" "173761" "173762" "173767" "173769" "173770" "173772" "173774" 
[9] "173775" "173776"

您可致電列中的東西可能是「插槽」

> getSlots(class(slt3)) 
    snp.names var.names  chisq   df   N 
     "ANY" "character" "numeric" "integer" "integer" 
> str(getSlots(class(slt3))) 
Named chr [1:5] "ANY" "character" "numeric" "integer" "integer" 
- attr(*, "names")= chr [1:5] "snp.names" "var.names" "chisq" "df" ... 
> names(getSlots(class(slt3))) 
[1] "snp.names" "var.names" "chisq"  "df"  "N"

但對於遍歷這些插槽名稱沒有[i,j]方法。您應該轉到幫助頁?"GlmTests-class"，其中列出了爲該S4類定義的方法。

來源

2011-11-29 16:41:48

做的正確的方式所需的初始海報是什麼：

for (i in ncol(pheno)) { 
    association <- snp.rhs.tests(pheno[,i], family="gaussian", snp.data=plink$genotype) 
}

的snp.rhs.tests()的文件說，如果數據丟失，表型是從父幀拍攝 - 或許它是在措辭相反意義：如果指定了數據，表型將在指定的data.frame中進行評估。

這是一個清晰的版本：

for (i in ncol(pheno)) { 
    cc <- pheno[,i] 
    association <- snp.rhs.tests(cc, family="gaussian", snp.data=plink$genotype) 
}

來源

2012-02-18 15:29:36 hello123

文檔說data=parent.frame()是snp.rhs.tests()默認。

apply()代碼有一個明顯的錯誤 - 請不要做x <- some.fun(x)，因爲它確實很糟糕。試試這個 - 刪除data=，並使用不同的變量名稱。

rhs <- function(x) { 
y<- snp.rhs.tests(x, family="gaussian", snp.data=plink$genotype) 
} 
res_ <- apply(pheno,2,rhs)

此外，最初的海報的問題是誤導。

plink $ genotype是一個S4對象，pheno是一個data.frame（一個S3對象）。您真的只想選擇S3 data.frame中的列，但您將如何從snp.rhs.tests（）查找列（如果給出data.frame）或向量表型（if它是作爲一個普通的矢量給出的 - 即在父框架或你的「當前」框架中，因爲子程序是在「子框架」中評估的！）

來源

2012-02-18 15:47:32 hello123

循環遍歷R中的S4對象中的列

回答

相關問題