Adding multiple sequences from a FASTA file to a list in Python

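I'm trying to organize a file that contains multiple sequences. To do this, I'm adding the names to one list and the sequences to a separate list that runs parallel to the name list. I figured out how to add the names to a list, but I can't work out how to add the sequence that follows each name to the separate list. I tried appending the sequence lines to an empty string, but that appends every line of every sequence into a single string.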

All the names start with '>'.

def Name_Organizer(FASTA, output):

    import os
    import re

    in_file = open(FASTA, 'r')
    dir, file = os.path.split(FASTA)
    temp = os.path.join(dir, output)
    out_file = open(temp, 'w')

    data = ''
    name_list = []

    for line in in_file:

        line = line.strip()
        for i in line:
            if i == '>':
                name_list.append(line)
                break
            else:
                line = line.upper()
        if all([k == k.upper() for k in line]):
            data = data + line

    print(data)

How do I add the sequences to a list, one string per sequence?

The input file looks like this:

[Screenshot of the input file (not reproduced)]


2012-03-04 O.rka

You need to reset the string whenever you hit a marker line, like this:

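def Name_Organizer(FASTA, output):

    import os

    in_file = open(FASTA, 'r')
    dir, file = os.path.split(FASTA)
    temp = os.path.join(dir, output)
    out_file = open(temp, 'w')

    data = ''
    name_list = []
    seq_list = []

    for line in in_file:

        line = line.strip()
        for i in line:
            if i == '>':
                name_list.append(line)
                # A new marker line means the previous sequence is complete
                if data:
                    seq_list.append(data)
                    data = ''
                break
            else:
                line = line.upper()
        # Skip marker lines so an all-uppercase name is not added to the sequence
        if not line.startswith('>') and all([k == k.upper() for k in line]):
            data = data + line

    # The last sequence is not followed by a marker line, so store it here
    if data:
        seq_list.append(data)

    in_file.close()
    out_file.close()
    print(seq_list)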

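Of course, depending on the size of your file, it may also be faster to collect the lines in a list and join them instead of repeatedly concatenating strings: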

data = [] 

# ... 

data.append(line) # repeatedly 

# ... 

seq_list.append(''.join(data)) # each time you get to a new marker line 
data = []
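A self-contained sketch of the list-and-join variant, assuming the FASTA content arrives as an iterable of lines (a list here, though an open file object works the same way). The function name `parse_fasta_join` is hypothetical, and it uppercases sequence lines only to mirror the question's code.

```python
def parse_fasta_join(lines):
    """Hypothetical helper: return parallel lists (names, sequences)."""
    name_list = []
    seq_list = []
    chunks = []  # pieces of the sequence currently being read
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new marker line: finish the previous record, if there is one
            if name_list:
                seq_list.append(''.join(chunks))
            name_list.append(line)
            chunks = []
        else:
            chunks.append(line.upper())  # one chunk per sequence line
    # The last record has no marker line after it
    if name_list:
        seq_list.append(''.join(chunks))
    return name_list, seq_list


example = [">seq1", "acgt", "ACGT", ">seq2", "ttgg"]
names, seqs = parse_fasta_join(example)
print(names)  # ['>seq1', '>seq2']
print(seqs)   # ['ACGTACGT', 'TTGG']
```

Testing `name_list` rather than the current sequence before appending keeps the two lists the same length even when a name is followed by no sequence lines.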


2012-03-04 18:49:27 Amber

It works! I'm just confused by the line 'if data:'. How can a string's name be used as an if condition? – 2012-03-04 18:53:45

In Python, an empty string is falsy and a non-empty string is truthy, so 'if data:' is equivalent to "if data is not empty". – Amber 2012-03-04 18:56:37
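A small illustration of the truthiness rule described in the comment above; `describe` is a hypothetical helper used only for demonstration.

```python
def describe(data):
    """Hypothetical helper: report what `if data:` decides for a string."""
    if data:  # True for any non-empty string
        return 'non-empty'
    return 'empty'  # '' is falsy, so the if-body was skipped


print(describe(''))      # empty
print(describe('ACGT'))  # non-empty
# The same rule applies to other containers such as lists:
print(bool([]), bool(['ACGT']))  # False True
```

A line containing only whitespace, such as ' ', is also truthy, which is one reason the answers call `strip()` on each line first.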

@draconisthe0ry, Amber, I think I should mention that iterating over every character of every line like this is a bit odd. Isn't it unnecessary? Am I missing something? – senderle 2012-03-04 19:01:49
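To illustrate senderle's point: since a FASTA name line is identified by its first character, `str.startswith` can replace the character loop. Both function names below are hypothetical; `is_header_loop` mimics the question's loop, which also treats a '>' anywhere in the line as a marker.

```python
def is_header_loop(line):
    """Mimics the question's loop: scan every character looking for '>'."""
    for ch in line:
        if ch == '>':
            return True
    return False


def is_header(line):
    """Only the first character matters for a FASTA name line."""
    return line.startswith('>')


for line in ['>seq1 description', 'ACGTACGT', 'AC>GT']:
    print(repr(line), is_header_loop(line), is_header(line))
```

For well-formed sequence lines the two agree, but `startswith` inspects a single character instead of the whole line and does not misclassify a stray '>' in the middle.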

If you are working with Python and FASTA files, you may want to consider installing BioPython. It already includes this parsing functionality and much more.

Parsing a FASTA file would then be as simple as:

from Bio import SeqIO
for record in SeqIO.parse('filename.fasta', 'fasta'):
    print(record.id, record.seq)


2012-03-04 18:55:28 Tim

I load the sequences into a dictionary first:

import sys

# read the file and remove surrounding white space from each line
with open(sys.argv[1]) as handle:
    lines = [x.strip() for x in handle]
fasta = {}
for line in lines:
    if not line:
        continue
    # create the sequence name in the dict and a variable
    if line.startswith('>'):
        sname = line
        if line not in fasta:
            fasta[line] = ''
        continue
    # add the sequence to the last sequence name variable
    fasta[sname] += line
# just to facilitate the input for my function
lst = list(fasta.values())
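Wrapped as a function, the dictionary approach can also produce the asker's parallel lists: since Python 3.7, dicts preserve insertion order, so `list(fasta)` and `list(fasta.values())` line up. The name `fasta_to_dict` is hypothetical, and it takes lines instead of reading `sys.argv[1]` so it can be tried without a file.

```python
def fasta_to_dict(lines):
    """Hypothetical helper: map each '>' name line to its joined sequence."""
    fasta = {}
    sname = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            sname = line
            fasta.setdefault(sname, '')
        elif sname is not None:  # ignore sequence lines before the first name
            fasta[sname] += line
    return fasta


fasta = fasta_to_dict([">seq1", "ACGT", "TTAA", "", ">seq2", "GGCC"])
name_list = list(fasta)          # ['>seq1', '>seq2']
seq_list = list(fasta.values())  # ['ACGTTTAA', 'GGCC']
print(name_list, seq_list)
```

One trade-off: records that share a name are merged into a single entry, so the lists can end up shorter than the number of records in the file.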


2017-06-29 03:19:42
