2012-08-05 132 views
2

Using Biopython: how can I improve my code?

I have the following code:

from Bio import AlignIO 
import itertools 

out=open("test.csv","a") 
align = AlignIO.read("HPV16_CG.aln.fas", "fasta") 
n=0 
def SNP(line): 
    result=[] 
    result.append(str(n+1)) 
    result.append(line[0]) 
    result.append(align[y].id.rsplit("|")[3]) 
    result.append(x) 
    return result 



while n<len(align[0]):
    line = align[:,n]
    y=0
    for x in line:
        if line[0]!=x:
            print >> out, ','.join(map(str,SNP(line)))
            y=y+1
        else:
            y=y+1
    y=0
    n=n+1
out.close() 

f=open("test.csv","rU") 

out=open("test_2.csv","a") 
lines=f.read().split() 

for key, group in itertools.groupby(lines, lambda line: line.partition(',')[0]): 
    print >>out, ','.join(group) 

out.close() 
f.close() 

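As you can see, I am currently writing two files, but I really only need the second one. Does anyone have any suggestions for merging the two "sub-scripts" into one?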

The input file "HPV16_CG.aln.fas" looks like this:

>gi|333031|lcl|HPV16REF.1| Alpha-9 - Human Papillomavirus 16, complete genome. 
ACTACAATAATTCATGTATAAAACTAAGGGCGTAACCGAAATCGGTTGAACCGAAACCGG 

>gi|333031|gb|K02718.1|PPH16 Human papillomavirus type 16 (HPV16), complete genome 
ACTACAATAATTCATGTATAAAACTAAGGGCGTAACCGAAATCGGTTGAACCGAAACCGG 

>gi|196170262|gb|FJ006723.1| Human papillomavirus type 16, complete genome 
ACTACAATAATTCATGTATAAAACTAAGGGCGTAACCGAAATCGGTTGAACCGAAACCGG 
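
For reference, the `id.rsplit("|")[3]` expression in the script picks the fourth pipe-separated field of each record id. The sketch below checks that against the three headers above without needing Biopython. It is written for Python 3, the helper name `accession` is hypothetical, and it assumes the record id is the header text up to the first whitespace.

```python
headers = [
    ">gi|333031|lcl|HPV16REF.1| Alpha-9 - Human Papillomavirus 16, complete genome.",
    ">gi|333031|gb|K02718.1|PPH16 Human papillomavirus type 16 (HPV16), complete genome",
    ">gi|196170262|gb|FJ006723.1| Human papillomavirus type 16, complete genome",
]

def accession(header):
    # Assumption: the record id is the header up to the first whitespace,
    # with the leading ">" removed.
    record_id = header[1:].split()[0]
    # Same expression as in the script: take the fourth "|"-separated field.
    return record_id.rsplit("|")[3]

accessions = [accession(h) for h in headers]
print(accessions)  # ['HPV16REF.1', 'K02718.1', 'FJ006723.1']
```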

I would really appreciate any help or suggestions for improving this!

+0

I'm afraid you need to fix your indentation before anyone can help you. See [How do I format my code blocks?](http://meta.stackexchange.com/q/22186) – 2012-08-05 15:10:46

+3

If you want a general code review, consider using http://codereview.stackexchange.com – kojiro 2012-08-05 16:01:30

Answers

1

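The simplest approach would be to keep the file's lines in memory, but I suspect that won't work, because any useful bioinformatics file is likely to be quite large.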
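
As a rough illustration of that in-memory approach, the sketch below collects every SNP row in a list and only groups the rows afterwards. It is written for Python 3 and uses plain strings instead of a Biopython alignment; the function name `snp_rows` and the toy sequences are hypothetical and not taken from the real file.

```python
import itertools

def snp_rows(ids, seqs):
    """Return one [position, ref_base, seq_id, alt_base] row per mismatch.

    The first sequence is treated as the reference, as in the question.
    """
    rows = []
    for n in range(len(seqs[0])):
        ref = seqs[0][n]
        for seq_id, seq in zip(ids, seqs):
            if seq[n] != ref:
                rows.append([str(n + 1), ref, seq_id, seq[n]])
    return rows

# Toy data standing in for the alignment.
ids = ["REF", "A", "B"]
seqs = ["ACTG", "ACTA", "GCTA"]

rows = snp_rows(ids, seqs)  # every row is held in memory at this point
merged = [
    ",".join(",".join(row) for row in group)
    for _, group in itertools.groupby(rows, key=lambda row: row[0])
]
print(merged)  # ['1,A,B,G', '4,G,A,A,4,G,B,A']
```

The list `rows` grows with the number of mismatches, which is the memory cost mentioned above.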

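Here is an attempt at cleaning up the script. It removes the use of global variables and adds a generator function that yields the lines built by the SNP function in a streaming fashion, which should be compatible with your call to itertools.groupby.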

from Bio import AlignIO 
import itertools 

n=0 
align = AlignIO.read("HPV16_CG.aln.fas", "fasta") 

def SNP(line, n, y, x):
    """Pass n and y as parameters rather than relying on globals"""
    result=[]
    result.append(str(n+1))
    result.append(line[0])
    result.append(align[y].id.rsplit("|")[3])
    result.append(x)
    return result

def generate_snp_lines(align, n):
    """this is a generator function that'll produce lines without writing them to a file"""
    while n<len(align[0]):
        # column n of the alignment; line[0] is the reference base
        line = align[:,n]
        y=0
        for x in line:
            if line[0]!=x:
                # pass the current column index, not the module-level n
                yield ','.join(map(str,SNP(line, n, y, x)))
            y+=1
        n+=1

def main():

    # let's use a context manager to open and cleanup this file for us:
    with open("test.csv","a") as out:
        # construct the generator:
        lines = generate_snp_lines(align, n)
        # pass it to itertools.groupby like we'd pass any iterable:
        for key, group in itertools.groupby(lines, lambda line: line.partition(',')[0]):
            print >>out, ','.join(group)

if __name__=="__main__":
    main()
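
The reason a generator can be handed straight to itertools.groupby is that groupby only merges consecutive items with the same key, and the SNP lines come out ordered by position. A minimal Python 3 sketch of that behaviour follows; the generator `toy_snp_lines` and its three lines are hypothetical stand-ins, and an `io.StringIO` buffer takes the place of the single output file.

```python
import io
import itertools

def toy_snp_lines():
    """Hypothetical stand-in for the SNP generator: lines arrive ordered by position."""
    yield "1,A,B,G"
    yield "4,G,A,A"
    yield "4,G,B,A"

def position(line):
    # The grouping key is the text before the first comma.
    return line.partition(",")[0]

out = io.StringIO()  # stands in for the single output file
for _, group in itertools.groupby(toy_snp_lines(), key=position):
    # Each group holds the consecutive lines that share one position.
    out.write(",".join(group) + "\n")

result = out.getvalue()
print(result, end="")
```

If the lines were not ordered by position, the same position could show up in more than one output line, because groupby starts a new group whenever the key changes.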
+0

Thanks for helping me. The code produced an empty file... do you know why that might be? – Stylize 2012-08-05 15:41:15

+0

I edited the code above... it works, but it doesn't do quite what I need – Stylize 2012-08-05 15:53:28

+0

What is it not doing? – stderr 2012-08-06 13:30:53