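I'm trying to adapt an earlier script that uses Biopython to retrieve the phylum of a species. The script was written to look up one species at a time, and I want to modify it so that I can process 100 organisms in one run. Here is the original code for getting taxonomy information with Biopython: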
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(" ", "+").strip()
    search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
    return Entrez.read(search)

# NCBI asks E-utilities users to supply a contact email address
Entrez.email = ""
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
# keep only the family and order entries of the lineage
lineage = {d['Rank']: d['ScientificName'] for d in
           data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
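`get_tax_data` above sends one EFetch request per taxid. EFetch also accepts several IDs in a single call as a comma-separated string, which reduces the number of requests needed for 100 organisms. A minimal sketch of the batching step, assuming the taxids have already been collected; the helper name `batched_ids` and the batch size of 50 are illustrative choices, not NCBI limits:

```python
def batched_ids(taxids, size=50):
    """Yield comma-separated strings of at most `size` taxids each.

    Each string is meant to be passed as the `id` argument of one
    Entrez.efetch(db="taxonomy", ...) call instead of one call per taxid.
    The default size of 50 is an arbitrary illustrative choice.
    """
    taxids = list(taxids)
    for start in range(0, len(taxids), size):
        yield ",".join(str(t) for t in taxids[start:start + size])
```

With Bio.Entrez, each batch could then be fetched with `Entrez.read(Entrez.efetch(id=batch, db="taxonomy", retmode="xml"))`, which gives a list of records; matching each record back by its `record['TaxId']` field is safer than relying on the order of the results.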
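I have already modified the script so that it reads the organism I'm currently using from a local file, but I need to extend it to 100 organisms. So the idea is to build a list from my organism file and somehow feed each item of that list, one at a time, into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with the organism name. But I don't know how to do that.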
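The "build a list from my organism file" step can be sketched as below, assuming a plain-text file with one organism name per line (the helper name `read_organisms` and the file format are assumptions, not part of the original script):

```python
def read_organisms(path):
    """Return the organism names in `path`, one per non-blank line.

    Assumes a plain-text file with one name per line, for example
    "Helicobacter pylori 26695"; surrounding whitespace is stripped
    and blank lines are skipped.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Something like `organisms = read_organisms("organisms.txt")` (a hypothetical file name) would then take the place of a hard-coded list of names.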
Here is a sample version of the code with some of my changes:
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxonomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(' ', "+").strip()
    search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = next(i)
for item in list:
    ???
    taxid = get_tax_id(?)
    data = get_tax_data(taxid)
    lineage = {d['Rank']: d['ScientificName'] for d in
               data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    print(lineage, taxid)
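The question marks are where I'm stuck on what to do next. I don't understand how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow modify each item in the list so that it reads get_tax_id(Helicobacter pylori 26695), and then find a way to put each one into the line that contains taxid =?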
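The loop variable itself is what goes in place of the `?`: each pass through `for item in list:` binds `item` to the next organism name, so calling `get_tax_id(item)` inside the loop body is enough, and the separate `iter`/`next` step before the loop is not needed (the loop overwrites `item` immediately anyway). A minimal sketch, assuming the question's `get_tax_id` and `get_tax_data` are passed in as arguments so the loop can be tried without network access; `phyla_for` is a hypothetical name. Because `get_tax_id` indexes `record['IdList'][0]`, a name with no NCBI match raises `IndexError`, so the sketch records `None` for such names instead of stopping the whole run:

```python
def phyla_for(organisms, get_tax_id, get_tax_data):
    """Look up the phylum of each organism name, one name at a time.

    `get_tax_id` and `get_tax_data` are the functions from the question;
    they are passed in so this loop does not itself depend on Bio.Entrez.
    Returns a dict mapping each name to (taxid, lineage), or to None when
    no taxid was found for that name.
    """
    results = {}
    for item in organisms:
        try:
            taxid = get_tax_id(item)  # `item` is what replaces the "?"
        except IndexError:            # empty IdList: no match for this name
            results[item] = None
            continue
        data = get_tax_data(taxid)
        # keep only the phylum entry of the lineage, as in the question
        lineage = {d['Rank']: d['ScientificName']
                   for d in data[0]['LineageEx'] if d['Rank'] == 'phylum'}
        results[item] = (taxid, lineage)
    return results
```

With the real functions, `for name, result in phyla_for(list, get_tax_id, get_tax_data).items(): print(name, result)` prints one line per organism, much like the question's `print` inside the loop.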
You should ask on Biostars: http://www.biostars.org/ – Pierre 2013-05-12 17:51:17
Thanks for the advice – user2374216 2013-05-12 23:09:46