2013-03-06 54 views
1

Python: string formatting where line length can vary

So, I have the following script:

def single_to_tripple(res): 
    aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL'} 
    return(aa[res]) 
seq = 'ASALKDYYAIMGVKPTDDLKTIKTAYRRLARKYHPDVSKEPDAEARFKEVAEAWEVLSDEQRRAEYDQMWQHRNDPQFNRQFHHGDGQSFNAEDFDDIFSSIFGQHARQSRQRPATRGHDIEIEVAVFLEETLTEHKRTISYNLPVYNAFGMIEQEIPKTLNVKIPAGVGNGQRIRLKGQGTPGENGGPNGDLWLVIHIAPHPLFDIVGQDLEIVVPVSPWEAALGAKVTVPTLKESILLTIPPGSQAGQRLRVKGKGLVSKKQTGDLYAVLKIVMPPKPDENTAALWQQLADAQSSFDPRKDWGKA' 
length = len(seq) 

for i,v in enumerate(xrange(0,len(seq),13)): 
    line = seq[v:v+13] 
    out_line = ('{:<3} '*13).format(single_to_tripple(line[0]),single_to_tripple(line[1]),single_to_tripple(line[2]),single_to_tripple(line[3]),single_to_tripple(line[4]),single_to_tripple(line[5]),single_to_tripple(line[6]),single_to_tripple(line[7]),single_to_tripple(line[8]),single_to_tripple(line[9]),single_to_tripple(line[10]),single_to_tripple(line[11]),single_to_tripple(line[12])) 
    print out_line 

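I use the script to slice the seq string every 13 elements and then convert each element of the slice from its one-letter code to its three-letter code with single_to_tripple. My output needs to contain 13 columns separated by spaces. The problem occurs at the last slice if it does not contain 13 elements. How can I catch this and still format the string as usual?

I use enumerate in my for loop because I need to add line numbers later.

My current code outputs: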

ALA SER ALA LEU LYS ASP TYR TYR ALA ILE MET GLY VAL 
LYS PRO THR ASP ASP LEU LYS THR ILE LYS THR ALA TYR 
ARG ARG LEU ALA ARG LYS TYR HIS PRO ASP VAL SER LYS 
GLU PRO ASP ALA GLU ALA ARG PHE LYS GLU VAL ALA GLU 
ALA TRP GLU VAL LEU SER ASP GLU GLN ARG ARG ALA GLU 
TYR ASP GLN MET TRP GLN HIS ARG ASN ASP PRO GLN PHE 
ASN ARG GLN PHE HIS HIS GLY ASP GLY GLN SER PHE ASN 
ALA GLU ASP PHE ASP ASP ILE PHE SER SER ILE PHE GLY 
GLN HIS ALA ARG GLN SER ARG GLN ARG PRO ALA THR ARG 
GLY HIS ASP ILE GLU ILE GLU VAL ALA VAL PHE LEU GLU 
GLU THR LEU THR GLU HIS LYS ARG THR ILE SER TYR ASN 
LEU PRO VAL TYR ASN ALA PHE GLY MET ILE GLU GLN GLU 
ILE PRO LYS THR LEU ASN VAL LYS ILE PRO ALA GLY VAL 
GLY ASN GLY GLN ARG ILE ARG LEU LYS GLY GLN GLY THR 
PRO GLY GLU ASN GLY GLY PRO ASN GLY ASP LEU TRP LEU 
VAL ILE HIS ILE ALA PRO HIS PRO LEU PHE ASP ILE VAL 
GLY GLN ASP LEU GLU ILE VAL VAL PRO VAL SER PRO TRP 
GLU ALA ALA LEU GLY ALA LYS VAL THR VAL PRO THR LEU 
LYS GLU SER ILE LEU LEU THR ILE PRO PRO GLY SER GLN 
ALA GLY GLN ARG LEU ARG VAL LYS GLY LYS GLY LEU VAL 
SER LYS LYS GLN THR GLY ASP LEU TYR ALA VAL LEU LYS 
ILE VAL MET PRO PRO LYS PRO ASP GLU ASN THR ALA ALA 
LEU TRP GLN GLN LEU ALA ASP ALA GLN SER SER PHE ASP 
Traceback (most recent call last): 
    File "make_seq_res.py", line 10, in <module> 
    out_line = ('{:<3} '*13).format(single_to_tripple(line[0]),single_to_tripple(line[1]),single_to_tripple(line[2]),single_to_tripple(line[3]),single_to_tripple(line[4]),single_to_tripple(line[5]),single_to_tripple(line[6]),single_to_tripple(line[7]),single_to_tripple(line[8]),single_to_tripple(line[9]),single_to_tripple(line[10]),single_to_tripple(line[11]),single_to_tripple(line[12])) 
IndexError: string index out of range 

Answers

3

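The fact that you have to type out that many arguments by hand should be a hint that you are doing more work than necessary to produce the output.

Without changing much of the original code, you can do it like this: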

for i,v in enumerate(xrange(0,len(seq),13)): 
    line = seq[v:v+13] 
    out_line = ' '.join('{:<3}'.format(single_to_tripple(part)) for part in line) 
    print out_line 

As Martijn pointed out, the triplets are always three characters long, so you can actually skip the formatting:

out_line = ' '.join(single_to_tripple(part) for part in line) 
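Since the question mentions adding line numbers later, here is a minimal Python 3 sketch of how the join approach could carry them. The names `AA` and `format_rows`, the abbreviated mapping, and the right-aligned number column are assumptions made for illustration, not part of the original script.

```python
# Python 3 sketch. AA is deliberately abbreviated to a few residues;
# the real script would use the full mapping from single_to_tripple.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS', 'D': 'ASP'}


def format_rows(seq, width=13):
    """Yield one numbered, space-separated row per `width` residues."""
    for number, start in enumerate(range(0, len(seq), width), 1):
        chunk = seq[start:start + width]  # slicing never raises; the last chunk may be shorter
        yield '{:>3} {}'.format(number, ' '.join(AA[res] for res in chunk))


# Small demonstration with a width of 3 instead of 13.
for row in format_rows('ASALKDAS', width=3):
    print(row)
```

Because the join iterates over whatever the slice contains, the short final row needs no special handling.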
+1

Since the strings are always 3 characters long, there is no need for formatting. *At all*. – 2013-03-06 12:08:35

+0

@MartijnPieters Yes, that's true as well. Well, my example was meant to show how to apply a format... ^^ – poke 2013-03-06 12:10:07

+0

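I realized I could use 'join' right after posting the question. Sometimes I feel really stupid. Thanks for the answer, I will accept it once the time limit is up. – Harpal 2013-03-06 12:12:29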

0

You can store the length of your line and then use it:

def single_to_tripple(res): 
    aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL'} 
    return(aa[res]) 

seq = 'ASALKDYYAIMGVKPTDDLKTIKTAYRRLARKYHPDVSKEPDAEARFKEVAEAWEVLSDEQRRAEYDQMWQHRNDPQFNRQFHHGDGQSFNAEDFDDIFSSIFGQHARQSRQRPATRGHDIEIEVAVFLEETLTEHKRTISYNLPVYNAFGMIEQEIPKTLNVKIPAGVGNGQRIRLKGQGTPGENGGPNGDLWLVIHIAPHPLFDIVGQDLEIVVPVSPWEAALGAKVTVPTLKESILLTIPPGSQAGQRLRVKGKGLVSKKQTGDLYAVLKIVMPPKPDENTAALWQQLADAQSSFDPRKDWGKA' 
length = len(seq) 

for i,v in enumerate(xrange(0,len(seq),13)): 
    line = seq[v:v+13] 
    length = len(line) 
    out_line = ('{:<3} '*length).format(*[single_to_tripple(a) for a in line]) 
    print out_line 
2

You simply don't need formatting here at all:

for i,v in enumerate(xrange(0,len(seq),13)): 
    line = seq[v:v+13] 
    print ' '.join([single_to_tripple(part) for part in line]) 

No need to overcomplicate things here. :-)

Note that with str.join(), a list comprehension is faster than a generator expression (hence the [...]), because .join() converts its argument to a list anyway.
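If you want to check that claim on your own machine, a rough timeit sketch along these lines should do. It is written for Python 3, and the `parts` list and the helper names `with_list` and `with_generator` are made up for the comparison. Absolute timings vary by interpreter and hardware, so no numbers are quoted here.

```python
import timeit

# Hypothetical test data: 3000 three-letter codes.
parts = ['ALA', 'SER', 'LEU'] * 1000


def with_list():
    # A list comprehension hands str.join() a ready-made list.
    return ' '.join([p for p in parts])


def with_generator():
    # A generator expression has to be consumed into a sequence by join() first.
    return ' '.join(p for p in parts)


# Both spellings must produce the same string.
assert with_list() == with_generator()

print('list comprehension:  ', timeit.timeit(with_list, number=200))
print('generator expression:', timeit.timeit(with_generator, number=200))
```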

Result (last 3 lines):

ILE VAL MET PRO PRO LYS PRO ASP GLU ASN THR ALA ALA 
LEU TRP GLN GLN LEU ALA ASP ALA GLN SER SER PHE ASP 
PRO ARG LYS ASP TRP GLY LYS ALA 

You could also use the itertools-based grouper recipe to simplify your loop:

from itertools import izip_longest 

def grouper(n, iterable, padvalue=None): 
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" 
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue) 

aa = {'R':'ARG','H':'HIS','K':'LYS','D':'ASP','E':'GLU','S':'SER','T':'THR','N':'ASN','Q':'GLN','C':'CYS','U':'SEC','G':'GLY','P':'PRO','A':'ALA','I':'ILE','L':'LEU','M':'MET','F':'PHE','W':'TRP','Y':'TYR','V':'VAL', None: ''} 
def single_to_tripple(res): 
    return(aa[res]) 

for line in grouper(13, seq): 
    print ' '.join([single_to_tripple(part) for part in line]) 

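Note that I improved your single_to_tripple() function by moving the mapping out of the function (there is no need to define it every time you call it), and I added a None key (grouper pads the last group with None values).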
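For readers on Python 3, where izip_longest is named zip_longest, the same idea might look like the sketch below. It makes one different design choice, as an assumption on my part: instead of mapping None to an empty string, it drops the padding values before joining, which avoids trailing spaces on the last row. `AA` is abbreviated and `format_groups` is a hypothetical helper name.

```python
from itertools import zip_longest  # called izip_longest in Python 2

# Abbreviated mapping for illustration only; use the full table in practice.
AA = {'A': 'ALA', 'S': 'SER', 'L': 'LEU', 'K': 'LYS', 'D': 'ASP'}


def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


def format_groups(seq, width=13):
    """Return one space-separated row of three-letter codes per group."""
    rows = []
    for group in grouper(width, seq):
        # Skip the None padding that grouper adds to the final group.
        rows.append(' '.join([AA[res] for res in group if res is not None]))
    return rows


for row in format_groups('ASALKDAS', width=3):
    print(row)
```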

+0

Heh, I was going to suggest the grouper recipe too, but then didn't because I didn't want to touch the 'single_to_tripple' function ^^ – poke 2013-03-06 12:20:55