2016-07-22 49 views
0

I am running into a problem with factors in an R function. I have a data.frame named gwas.data:

            SNP A1 A2         EFF     FRQ
2353 rs10001803  A  G -0.06620391 0.06860
2307 rs10002573  T  C -0.03969763 0.78100
504  rs10003143  A  C  0.03829721 0.53170
1802  rs1001022  T  C  0.08159842 0.96174
461  rs10011564  T  C  0.04930432 0.27840
2331 rs10013187  A  C -0.03600030 0.54490

I have a second data.frame named correct.orientation:

              SNP   CLST A1 A2  FRQ IMP       POS CHR BVAL
54445  rs10001803 Brahui  G  A 1.00   1 157121506   4  898
49713  rs10002573 Brahui  C  T 0.26   0  31120097   4  983
52885  rs10003143 Brahui  A  C 0.42   0 114272159   4  918
193805  rs1001022 Brahui  T  C 0.98   0  24733488  22  970
48257  rs10011564 Brahui  T  C 0.10   1  18734768   4  863
52313  rs10013187 Brahui  C  A 0.34   1 103040573   4  908

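I would like the A1 and A2 columns to match between the two files. If the alleles in gwas.data are flipped relative to correct.orientation, I want to flip them to the correct orientation. Whenever I flip a row, I also want to change the sign of its EFF value and replace its FRQ value with (1 - FRQ). This is the code I am currently trying to use to do this: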

MatchAlleles <- function (gwas.data , assoc.loci.freqs) {

  if (nrow (gwas.data) != nrow (correct.orientation)) {
    stop ("GWAS dataset and Orientation Matching dataset contain differing numbers of SNPs")
  }

  flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
  dont.flip <- gwas.data$A1 == correct.orientation$A1 & gwas.data$A2 == correct.orientation$A2
  for (i in 1 : nrow (gwas.data)) {
    if (flip [ i ]) {
      gwas.data$A1 [ i ] <- correct.orientation$A1 [ i ]
      gwas.data$A2 [ i ] <- correct.orientation$A2 [ i ]
      gwas.data$EFF [ i ] <- - gwas.data$EFF [ i ]
      gwas.data$FRQ [ i ] <- 1 - gwas.data$FRQ [ i ]
    } else if (dont.flip [ i ]) {
      #do nothing
    } else {
      stop ("Strand Issue")
    }
  }
  return (gwas.data)
}

gwas.data <- MatchAlleles (gwas.data , assoc.loci.freqs)

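The assoc.loci.freqs argument is irrelevant here: it is used further up in the function in the original code and does not affect this part. When I try to run this code, I get the error:

Error in Ops.factor(gwas.data$A1, correct.orientation$A2) : level sets of factors are different

What could be causing this?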

+4

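Have you tried just converting the factors to character? Or reading the files with 'as.is = TRUE' so that factors are not created in the first place? – dayne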

+0

Are the two data frames the same length (number of rows)? –

+0

I'm an idiot... yes, they are the same length, but @Dayne's suggestion fixed it – Evan

Answer

0

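Could it be that some nucleotide appears in correct.orientation$A2 but not in gwas.data$A1 (or the other way round)? Using the example given above, the factor levels of gwas.data$A1 are A and T, while the factor levels of correct.orientation$A2 are A, T, and C. R refuses to compare two factors with == unless both have the same set of levels.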
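
A minimal sketch that reproduces the error, assuming both allele columns were read in as factors (the default for data.frame() and read.table() before R 4.0.0). The vector names gwas.A1 and correct.A2 are made up for illustration; the values are the six rows shown in the question.

```r
# Allele columns from the six example rows, stored as factors
gwas.A1    <- factor(c("A", "T", "A", "T", "T", "A"))  # stands in for gwas.data$A1
correct.A2 <- factor(c("A", "T", "C", "C", "C", "A"))  # stands in for correct.orientation$A2

# The two factors do not share the same set of levels
print(levels(gwas.A1))
print(levels(correct.A2))

# Comparing them with == fails; tryCatch captures the error message
# instead of stopping the script
result <- tryCatch(
  gwas.A1 == correct.A2,
  error = function(e) conditionMessage(e)
)
print(result)

# The same comparison on character vectors works element by element
same <- as.character(gwas.A1) == as.character(correct.A2)
print(same)
```

The comparison only fails when both sides are factors with different level sets, which is why converting to character is enough to get past it.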

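The suggestion in @dayne's comment (convert the factors to character, or read the files with as.is = TRUE) should get around this problem.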
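
One way this could look in practice is sketched below. It is not the asker's final code: the toy data frames contain only the first three rows from the question, they are deliberately built with stringsAsFactors = TRUE to mimic the original situation, and the logical indexing that replaces the for loop is a substitution of mine. It also assumes, as the original function does, that both data frames list the SNPs in the same row order.

```r
# Toy data: first three rows of each data frame from the question,
# built with factors on purpose to mimic the original problem
gwas.data <- data.frame(
  SNP = c("rs10001803", "rs10002573", "rs10003143"),
  A1  = c("A", "T", "A"),
  A2  = c("G", "C", "C"),
  EFF = c(-0.06620391, -0.03969763, 0.03829721),
  FRQ = c(0.06860, 0.78100, 0.53170),
  stringsAsFactors = TRUE
)
correct.orientation <- data.frame(
  SNP = c("rs10001803", "rs10002573", "rs10003143"),
  A1  = c("G", "C", "A"),
  A2  = c("A", "T", "C"),
  stringsAsFactors = TRUE
)

# Convert the allele columns to character so == compares plain strings
for (col in c("A1", "A2")) {
  gwas.data[[col]] <- as.character(gwas.data[[col]])
  correct.orientation[[col]] <- as.character(correct.orientation[[col]])
}

# Same flip / dont.flip logic as in the question
flip <- gwas.data$A1 == correct.orientation$A2 &
  gwas.data$A2 == correct.orientation$A1
dont.flip <- gwas.data$A1 == correct.orientation$A1 &
  gwas.data$A2 == correct.orientation$A2

# Rows that match in neither orientation are treated as an error, as before
if (any(!flip & !dont.flip)) {
  stop("Strand Issue")
}

# Flip alleles, negate the effect size and use 1 - FRQ on the flipped rows
gwas.data$A1[flip]  <- correct.orientation$A1[flip]
gwas.data$A2[flip]  <- correct.orientation$A2[flip]
gwas.data$EFF[flip] <- -gwas.data$EFF[flip]
gwas.data$FRQ[flip] <- 1 - gwas.data$FRQ[flip]

print(gwas.data)
```

If the data come from text files, passing as.is = TRUE or stringsAsFactors = FALSE to read.table() avoids creating the factors in the first place, so the conversion loop would not be needed.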