2016-04-25

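DNA sequence manipulation

So, I'm very new to programming and not really proficient in any language. I bought a book on programming for biologists and have been fumbling my way through it. What I want to do: read sequences from a file, then find and extract a variable region from them. My code is below: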


#!/usr/bin/python 
#for extracting GAA sequences 
import os 
import sys 
import re 
#opens sequence file and defines it as reps 
reps = open('142sequences.txt') 
#defining what to read 
line = reps.readlines() 
#defines what we are looking for in rep lines 
for line in reps:
    sear = re.search(r"C[A]{2,}G[ATCG]{17, 2700}AAT[A]{2,4}G[A]{2,}", reps)
    if sear:
        repeats = sear.group()
        print(repeats)
    else:
        print('Not Recognized')

I get nothing back. Please help.

Answer


You need to search each individual line, not reps, which is the file object holding all of the lines:

import re

with open('142sequences.txt') as reps:
    # iterate over each line in the file
    for line in reps:
        # pass each line (a string) to re.search, not the file object
        # write {17,2700} with no space: with a space, Python's re reads the braces as literal text
        sear = re.search(r"C[A]{2,}G[ATCG]{17,2700}AAT[A]{2,4}G[A]{2,}", line)
        if sear:
            repeats = sear.group()
            print(repeats)
        else:
            print('Not Recognized')
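To check what the pattern actually pulls out, here is a minimal sketch run on a made-up sequence instead of 142sequences.txt. It assumes the "variable region" is the [ATCG]{17,2700} stretch between the two flanking motifs and wraps that stretch in a capture group so it can be read out on its own; the sample string and the `pattern` name are hypothetical. The quantifier is written {17,2700} with no space.

```python
import re

# Hypothetical test sequence: a "CAAG" flank, a 20-base middle stretch,
# and an "AATAAGAA" flank. Real data from 142sequences.txt will differ.
sample = "TT" + "CAAG" + "ACGT" * 5 + "AATAAGAA" + "TT"

# Same pattern as above, with the middle stretch in a capture group
# (assumption: that stretch is the variable region you want to extract).
pattern = re.compile(r"C[A]{2,}G([ATCG]{17,2700})AAT[A]{2,4}G[A]{2,}")

match = pattern.search(sample)
if match:
    print(match.group())   # the whole matched span, flanks included
    print(match.group(1))  # only the variable region between the flanks
else:
    print('Not Recognized')
```

group() returns the full match including both flanks, while group(1) returns just the captured middle stretch, which is usually what "extract the variable region" means.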

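Calling the readlines method reads all of the lines into a list, so the loop in your code never actually runs: the initial readlines call has already used up the file's iterator. Even if the loop had run, it would have raised an error, because re.search must be given a string, not the file object reps.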
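The behaviour described above can be reproduced without a real file. This is a minimal sketch that uses io.StringIO as a stand-in for open('142sequences.txt'); the two lines of text and the names `all_lines`, `loop_count` and `raised` are made up for illustration.

```python
import io
import re

# io.StringIO stands in for open('142sequences.txt'); the contents are hypothetical.
reps = io.StringIO("CAAGACGTACGT\nTTTTAATAAGAA\n")

all_lines = reps.readlines()  # reads to the end of the "file" and returns a list
print(len(all_lines))         # 2

loop_count = 0
for line in reps:             # nothing is left to read, so the body never runs
    loop_count += 1
print(loop_count)             # 0

# Passing the file object itself (instead of a string) raises TypeError.
raised = False
try:
    re.search(r"AAT", reps)
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

If you do want the list from readlines, loop over that list (for line in all_lines:) rather than over the exhausted file object.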


Thanks! Still figuring it out.