2012-01-02 44 views
1

Translating my sequence? I have to write a script that translates this sequence:

dict = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser", 
       "TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp", 
       "TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu", 
       "CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro", 
       "CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg", 
       "CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met", 
       "ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn", 
       "AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg", 
       "GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala", 
       "GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu", 
       "GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"} 

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA" 
a="" 

for y in range(0, len(seq)):
    c = seq[y:y+3]
    #print(c)
    for k, v in dict.items():
        if seq[y:y+3] == k:
            alle_amino = v[::3]  # all amino acids in a row: a1.1 - a2.1 - a3.1 - a1.2, and so on
            print(v)

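With this script I get the amino acids of the three frames printed one below the other. How can I fix this so that all amino acids from frame 1 are next to each other, all amino acids from frame 2 are next to each other, and the same for frame 3?

For example, my result must be: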

+3 SerIleLeuAlaStpProLysTrpGluProProTyrValAlaStpProIleTyrIleTyrIle

+2 PheAsnThrSerMetThrLysValGlyThrProLeuArgSerMetThrHisIleTyrIleTyr

+1 PheGlnTyrStpHisAspGlnSerGlyAsnProLeuThrStpHisAspProTyrIleTyrIle

TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA

I am using Python 3.

I have one more question: can I get this result by making some changes to my own script?

+0

I removed my first question; I hope my question is clearer now – 2012-01-02 12:47:24

+3

It's a bit clearer now. By the way, you shouldn't use 'dict' as a variable name, because that shadows the built-in 'dict'! – 2012-01-02 12:54:35

+1

I don't get the concept of "frame". Please clarify... – 2012-01-02 12:55:08

Answers

5

You can use the following (note: this would be ridiculously easier using biopython's translate method):

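dicti = {your dictionary here}  # placeholder: paste the full codon table from the question

def translate(seq):
    x = 0
    aaseq = []
    while True:
        try:
            # a trailing partial codon (or the empty string) is not a key in dicti,
            # so the resulting KeyError ends the loop
            aaseq.append(dicti[seq[x:x+3]])
            x += 3
        except (IndexError, KeyError):
            break
    return aaseq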

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA" 

for frame in range(3): 
    print('+%i' %(frame+1), ''.join(item.split('|')[1] for item in translate(seq[frame:]))) 

Note that I renamed your dictionary to dicti (so that it does not overwrite dict).


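Some comments to help you understand:

translate takes your sequence and returns a list in which each item is the amino acid encoded by the triplet at that position. Something like: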

aaseq = ["L|Leu","L|Leu","P|Pro", ....] 

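You can process this data further (keeping only the one-letter or the three-letter code) inside translate, or return it as it is and process it afterwards, as I have done.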
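As a rough illustration of the first option, the sketch below picks the code inside the function. The `three_letter` flag and the four-codon `sample_table` are assumptions made for this example only; they are not part of the code above.

```python
# Small sample of the question's table, for illustration only (assumption).
sample_table = {"TTT": "F|Phe", "CAA": "Q|Gln", "TAC": "Y|Tyr", "TAG": "*|Stp"}

def translate(seq, table, three_letter=True):
    """Translate seq codon by codon, keeping one side of each "X|Xxx" value."""
    index = 1 if three_letter else 0  # hypothetical switch between the two codes
    result = []
    # Stop at len(seq) - 2 so that a trailing partial codon is ignored.
    for start in range(0, len(seq) - 2, 3):
        result.append(table[seq[start:start + 3]].split("|")[index])
    return "".join(result)

three = translate("TTTCAATACTAG", sample_table)
one = translate("TTTCAATACTAG", sample_table, three_letter=False)
print(three)  # PheGlnTyrStp
print(one)    # FQY*
```

Doing the selection inside the function keeps the caller simple; returning the raw "X|Xxx" items, as the answer does, leaves both codes available afterwards.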

translate is called in

''.join(item.split('|')[1] for item in translate(seq[frame:])) 

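for each frame. For frame values 0, 1 and 2, it passes seq[frame:] as the argument to translate. That is, you are sending the sequences that correspond to the three different reading frames and processing them one after the other. Then, in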

''.join(item.split('|')[1] 

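I split the one-letter and three-letter codes of each amino acid and take the item at index 1 (the second one, the three-letter code). These are then joined into a single string.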
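Putting these steps together, here is a self-contained sketch that can be run as is. To keep it short, the codon table is rebuilt from the standard genetic code instead of being typed out; `codon_table`, `translate_frames` and the helper constants are names chosen for this example, and the table is assumed to have the same content as the dictionary in the question.

```python
from itertools import product

# Standard genetic code listed in T, C, A, G order (third base varies fastest).
BASES = "TCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "*": "Stp", "C": "Cys",
    "W": "Trp", "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile",
    "M": "Met", "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala",
    "D": "Asp", "E": "Glu", "G": "Gly",
}
# Same "X|Xxx" value format as the question's dictionary (assumed equivalent).
codon_table = {
    "".join(codon): "%s|%s" % (aa, THREE_LETTER[aa])
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}

def translate_frames(seq, table):
    """Return the three-letter translations of reading frames +1, +2 and +3."""
    frames = []
    for offset in range(3):
        # Non-overlapping codons starting at this offset; a partial tail is dropped.
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(table[c].split("|")[1] for c in codons))
    return frames

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
frames = translate_frames(seq, codon_table)

# Print +3 first so the layout matches the one requested in the question.
for number in (3, 2, 1):
    print("+%d %s" % (number, frames[number - 1]))
print(seq)
```

The only difference from the loop in the answer is that the frames are collected in a list first, which makes it easy to print them in whatever order is needed.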

+0

+1 [biopython](http://biopython.org/wiki/Main_Page) – gecco 2012-01-02 13:59:00

+0

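Thank you very much for your help, but could you help me further by writing this script out more fully? I am a beginner and do not understand everything in the script you wrote for me – 2012-01-02 14:11:52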

+0

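Not sure what you want. If you put this code in a file without any changes (obviously you should include the complete translation dictionary), then this code works (I tested it, and the same can be said of the code from @RicardoCárdenes). I have edited my answer to make the code easier to follow. If this is homework, you should tag it accordingly – joaquin 2012-01-02 14:21:18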

1

Not pretty, but it does what you want:

dct = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser", 
"TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp", 
"TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu", 
"CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro", 
"CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg", 
"CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met", 
"ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn", 
"AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg", 
"GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala", 
"GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu", 
"GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"} 


seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA" 

def get_amino_list(s):
    # yield one list of codons per reading frame (offsets 0, 1 and 2)
    for y in range(3):
        yield [s[x:x+3] for x in range(y, len(s) - 2, 3)]

for n, amn in enumerate(get_amino_list(seq), 1):
    print("+%d " % n + "".join(dct[x][2:] for x in amn))

print(seq) 

1

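Here is my solution. I renamed your "dict" variable to "aminos". The function method3 returns the list of values to the right of the "|". To merge them into a single string, just join them with "".

From looking at your code, I believe your aminos dictionary contains all possible three-letter combinations. I therefore removed the check that verified this. It should run faster.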

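def codon_groups(seq, group_len=3):
    """Yield consecutive, non-overlapping groups of `group_len` items from `seq`.

    A trailing group shorter than `group_len` is dropped.
    """
    # step by group_len: a step of 1 would mix all three reading frames together
    for i in range(0, len(seq) - group_len + 1, group_len):
        yield seq[i:i+group_len]

def method3(seq, aminos):
    # aminos is the codon table from the question (called "dict" there)
    return [aminos[k][2:] for k in codon_groups(seq, 3)]

for i in range(3):
    print("%d: %s" % (i, "".join(method3(seq[i:], aminos))))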
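The assumption that the table covers every possible triplet can be checked once before dropping the lookup guard. A minimal sketch follows; `missing_codons` is a hypothetical helper name, and the one-entry table is deliberately incomplete to show what the check reports.

```python
from itertools import product

def missing_codons(table, bases="TCAG"):
    """Return the triplets over `bases` that have no entry in `table`."""
    all_codons = ("".join(triplet) for triplet in product(bases, repeat=3))
    return [codon for codon in all_codons if codon not in table]

incomplete = {"TTT": "F|Phe"}  # deliberately incomplete sample table
gaps = missing_codons(incomplete)
print(len(gaps), gaps[:3])  # 63 ['TTC', 'TTA', 'TTG']
```

Called on the full dictionary from the question, it should return an empty list. Characters outside T, C, A and G (an N in the sequence, for example) would still raise a KeyError during translation.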