2016-11-09

I am trying to solve the Rosalind "Consensus and Profile" problem: http://rosalind.info/problems/cons/

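My script fills one count list per nucleotide and outputs a consensus string of the same length. I don't think there are any math or indexing errors, and I'm stuck. My code: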

with open('C:/users/steph/downloads/rosalind_cons (3).txt') as f: 
    seqs = f.read().splitlines() 

#remove all objects that are not sequences of interest 
for s in seqs:
    if s[0] == '>':
        seqs.remove(s)

n = range(len(seqs[0])+1) 

#lists to store counts for each nucleotide 
A, C, G, T = [0 for i in n], [0 for i in n], [0 for i in n], [0 for i in n] 

#see what nucleotide is at each index and augment the 
#same index of the respective list 
def counter(Q):
    for q in Q:
        for k in range(len(q)):
            if q[k] == 'A':
                A[k] += 1
            elif q[k] == 'C':
                C[k] += 1
            elif q[k] == 'G':
                G[k] += 1
            elif q[k] == 'T':
                T[k] += 1
counter(seqs) 

#find the max of all the counter lists at every index 
#and add the respective nucleotide to the consensus sequence 
def consensus(a,t,c,g):
    consensus = ''
    for k in range(len(a)):
        if (a[k] > t[k]) and (a[k]>c[k]) and (a[k]>g[k]):
            consensus = consensus+"A"
        elif (t[k] > a[k]) and (t[k]>c[k]) and (t[k]>g[k]):
            consensus = consensus+ 'T'
        elif (c[k] > t[k]) and (c[k]>a[k]) and (c[k]>g[k]):
            consensus = consensus+ 'C'
        elif (g[k] > t[k]) and (g[k]>c[k]) and (g[k]>a[k]):
            consensus = consensus+ 'G'
        #ensure a nucleotide is added to consensus sequence
        #when more than one index has the max value
        else:
            if max(a[k],c[k],t[k],g[k]) in a:
                consensus = consensus + 'A'
            elif max(a[k],c[k],t[k],g[k]) in c:
                consensus = consensus + 'C'
            elif max(a[k],c[k],t[k],g[k]) in t:
                consensus = consensus + 'T'
            elif max(a[k],c[k],t[k],g[k]) in g:
                consensus = consensus + 'G'
    print(consensus)
    #debugging, ignore this --> print('len(consensus)',len(consensus))
consensus(A,T,C,G) 

#debugging, ignore this --> print('len(A)',len(A)) 

print('A: ',*A, sep=' ') 
print('C: ',*C, sep=' ') 
print('G: ',*G, sep=' ') 
print('T: ',*T, sep=' ') 

Thanks for your time.


So what is the problem? You haven't explained what isn't working.

Answer

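  • The following line is wrong:

    n = range(len(seqs[0])+1)

This makes the output one position too long: the consensus gets an extra A at the end and each of the four profile rows gets an extra 0 (the extra position has all counts at 0, so the tie branch falls back to A). Remove the +1 and it should work.

  • Also, your output has two spaces after each label; remove the space after the ":" in your print statements.
  • If you fix these two lines, it will work for the sample dataset, but it will fail for sequences that span more than one line (as in the real dataset).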
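The two fixes above can be sketched as follows. This is a minimal sketch, not the asker's code: it assumes the sequences are already merged into equal-length strings containing only A, C, G and T, and the helper names profile and profile_lines are hypothetical.

```python
def profile(seqs):
    """Count A, C, G, T at each position of equal-length sequences."""
    # len(seqs[0]) positions, indexed 0 .. length-1; adding +1 here would
    # create an extra all-zero column at the end.
    length = len(seqs[0])
    counts = {base: [0] * length for base in 'ACGT'}
    for seq in seqs:
        for k, base in enumerate(seq):
            counts[base][k] += 1  # KeyError if a non-ACGT character appears
    return counts


def profile_lines(counts):
    # Same text as print('A:', *counts['A']): one space after the colon and
    # one space between numbers. print('A: ', *A) would produce two spaces.
    return [' '.join([base + ':'] + [str(n) for n in counts[base]])
            for base in 'ACGT']
```

For example, profile(["ATCC", "AGCA"]) has rows of length 4, and printing each entry of profile_lines(...) gives "A: 2 0 0 1", "C: 0 0 2 1", "G: 0 1 0 0" and "T: 0 1 0 0".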

Try merging the lines with something like the following snippet:

new_seqs = list()
for s in seqs:
    if s.startswith('>'):
        new_seqs.append('')
    else:
        new_seqs[-1]+=s
seqs = new_seqs

and try again.
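The same merging idea can be wrapped in a function that also strips line endings and skips blank lines, which rules out stray "\r" or "\n" characters ending up inside a sequence. This is a sketch under the assumption that the first non-blank line of the file is a ">" header; the name read_fasta_sequences is hypothetical.

```python
def read_fasta_sequences(lines):
    """Join multi-line FASTA records into one sequence string per record."""
    seqs = []
    for line in lines:
        line = line.strip()      # remove '\n', '\r\n' and surrounding spaces
        if not line:
            continue             # ignore blank lines
        if line.startswith('>'):
            seqs.append('')      # a header starts a new record
        else:
            seqs[-1] += line     # continuation line of the current record
    return seqs
```

Because a file object iterates line by line, it can be called as seqs = read_fasta_sequences(f) inside the existing with open(...) block.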


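These suggestions are good, but unfortunately I still get an incorrect answer. After looking through ideas from the Rosalind community, I think the problem is some error in the output formatting or a hidden newline character.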


@SankFinatra: Your formatting is fine and there are no hidden newlines; I have updated the answer accordingly.


@Maxmilian Peters: I see now that I built my 'seqs' list incorrectly. I implemented your suggested changes, but for some reason I still get an incorrect answer.
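One more spot worth checking, offered as a possibility rather than a confirmed diagnosis: in the tie branch of consensus, max(a[k],c[k],t[k],g[k]) in a tests whether that value occurs anywhere in the whole list a, not whether a[k] itself is the maximum. A minimal sketch that only looks at the counts at position k is below; the name consensus_by_index and its A, C, G, T argument order are hypothetical, and it assumes the four lists have equal length.

```python
def consensus_by_index(a, c, g, t):
    """Pick the most frequent nucleotide at each position k.

    Only the counts at the same index k are compared. On a tie, max() keeps
    the first maximal key it sees, so ties resolve in A, C, G, T order.
    """
    result = []
    for k in range(len(a)):
        counts = {'A': a[k], 'C': c[k], 'G': g[k], 'T': t[k]}
        result.append(max('ACGT', key=counts.get))
    return ''.join(result)
```

With a = [2, 1], c = [0, 2], g = [1, 2], t = [0, 0], position 1 is a C/G tie and the sketch returns "AC". Tracing the original consensus(A,T,C,G) by hand on the same lists, position 1 reaches the else branch and appends 'A', because the value 2 occurs in a at position 0.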