2013-05-12 102 views

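I am trying to modify an earlier script that uses Biopython to fetch information about a species' phylum. The script was written to retrieve information for one species at a time. I would like to modify it so that I can process 100 organisms in one run. Here is the original code, which tries to get taxonomic information from Biopython: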

import sys 
from Bio import Entrez 

def get_tax_id(species):
    """To get data from NCBI Taxonomy, we need the taxid. We can
    get that by passing the species name to esearch, which will return
    the taxid."""
    species = species.replace(" ", "+").strip() 
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml") 
    record = Entrez.read(search) 
    return record['IdList'][0] 

def get_tax_data(taxid): 
    """once we have the taxid, we can fetch the record""" 
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml") 
    return Entrez.read(search) 

Entrez.email = "" 
if not Entrez.email: 
    print("you must add your email address")
    sys.exit(2) 
taxid = get_tax_id("Erodium carvifolium") 
data = get_tax_data(taxid) 
lineage = {d['Rank']:d['ScientificName'] for d in 
    data[0]['LineageEx'] if d['Rank'] in ['family', 'order']} 

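I have successfully modified the script so that it accepts a local file containing one of the organisms I am working with, but I need to extend it to 100 organisms. The idea is to build a list from my file of organisms and somehow feed each item of that list, one at a time, into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with the name of my organism. I don't know how to do that.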
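The first half of that idea, building the list from a file, could be sketched roughly as follows. This assumes the file holds one organism name per line; the function name read_species_list and the file name organisms.txt are hypothetical and do not come from the original script.

```python
def read_species_list(path):
    """Return the non-empty lines of a text file as a list of organism names.

    Assumption: the file contains one organism name per line.
    """
    with open(path) as handle:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in handle if line.strip()]


# Hypothetical usage (organisms.txt is an assumed file name):
# species_list = read_species_list("organisms.txt")
```

The resulting list can then be looped over in the same way as a hard-coded list of names.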

Here is a sample version of the code with some of my adjustments:

import sys 
from Bio import Entrez 


def get_tax_id(species):
    """To get data from NCBI Taxonomy, we need the taxid. We can
    get that by passing the species name to esearch, which will return
    the taxid."""
    species = species.replace(' ', "+").strip() 
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml") 
    record = Entrez.read(search) 
    return record['IdList'][0] 

def get_tax_data(taxid): 
    """once we have the taxid, we can fetch the record""" 
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml") 
    return Entrez.read(search) 

Entrez.email = "" 
if not Entrez.email: 
    print("you must add your email address")
    sys.exit(2) 
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304'] 
i = iter(list) 
item = i.next() 
for item in list: 
    ??? 
taxid = get_tax_id(?) 
data = get_tax_data(taxid) 
lineage = {d['Rank']:d['ScientificName'] for d in 
    data[0]['LineageEx'] if d['Rank'] in ['phylum']} 
print(lineage, taxid)

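The question marks show where I am stumped about what to do next. I don't understand how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow modify each item in the list so that it reads get_tax_id(Helicobacter pylori 26695), and then find some way to place it in the line containing taxid = ?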

You should ask on Biostars: http://www.biostars.org/ – Pierre 2013-05-12 17:51:17

Thank you for the advice. – user2374216 2013-05-12 23:09:46

Answer


Here is what you need. Place it below your function definitions, that is, after the line that says sys.exit(2):

species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304'] 

taxid_list = []  # Lists that will hold the results for each species
data_list = []
lineage_list = []

print('parsing taxonomic data...')  # Announce that parsing has started

for species in species_list:
    print('\t' + species)  # Progress message

    taxid = get_tax_id(species)  # Apply your functions
    data = get_tax_data(taxid)
    lineage = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}

    taxid_list.append(taxid)  # Append the results to the lists created above
    data_list.append(data)
    lineage_list.append(lineage)

print('complete!')
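With 100 organisms, two things may be worth adding to the loop: a way to survive a name that esearch cannot find (record['IdList'][0] raises IndexError when IdList is empty), and a pause between organisms so the requests do not arrive faster than NCBI's E-utilities usage guidelines allow. The following is only a sketch under those assumptions. The function name collect_phyla and the delay value are hypothetical, and the two lookup functions are passed in as arguments so that the loop can be tried out with stand-ins before it touches the network.

```python
import time


def collect_phyla(species_names, get_tax_id, get_tax_data, delay=1.0):
    """Map each species name to (taxid, phylum), or to None if no taxid was found.

    get_tax_id and get_tax_data are expected to behave like the functions
    defined in the question. delay is an assumed pause in seconds between
    organisms; adjust it to your own situation.
    """
    results = {}
    for name in species_names:
        try:
            taxid = get_tax_id(name)
        except IndexError:
            # esearch returned an empty IdList for this name.
            results[name] = None
            continue

        data = get_tax_data(taxid)
        # Pick the phylum entry from the lineage, if there is one.
        phylum = next(
            (d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] == 'phylum'),
            None,
        )
        results[name] = (taxid, phylum)
        time.sleep(delay)  # Assumed pause to keep the request rate low
    return results


# Hypothetical usage with the functions from the question:
# phyla = collect_phyla(species_list, get_tax_id, get_tax_data)
```

Keeping the results in one dictionary keyed by species name, rather than in three parallel lists, makes it easier to see afterwards which organisms returned nothing.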