How to find a species' position in the NCBI Taxonomy from its Latin name

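Problem description:

I want to know which order, family, and genus a species is assigned to in NCBI's classification system. For a single species this can be looked up by hand on the NCBI website, but how can it be done when there are many species?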

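I came across this kind of functionality earlier while reading the ete3 documentation: http://etetoolkit.org/docs/latest/tutorial/tutorial_ncbitaxonomy.html. I will probably need it soon, so I am recording the code I use here. (The first step is installing ete3: my Windows 10 machine has Anaconda3 installed, so running pip install ete3 in a command prompt window is enough.)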

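  • Single species, using pomegranate (Punica granatum) as an example
from ete3 import NCBITaxa

# NCBITaxa must be instantiated; the first call downloads the NCBI
# taxonomy dump and builds a local database.
ncbi = NCBITaxa()

# Map the Latin name to its taxid(s); names that are not found are absent from the dict.
name2taxid = ncbi.get_name_translator(["Punica granatum"])
for name, taxids in name2taxid.items():
    # List of taxids from the root down to the species itself.
    lineage = ncbi.get_lineage(taxids[0])
    # Translate each taxid in the lineage back to a scientific name.
    names = ncbi.get_taxid_translator(lineage)
    for taxid in lineage:
        print(names[taxid])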

Output

root
cellular organisms
Eukaryota
Viridiplantae
Streptophyta
Streptophytina
Embryophyta
Tracheophyta
Euphyllophyta
Spermatophyta
Magnoliophyta
Mesangiospermae
eudicotyledons
Gunneridae
Pentapetalae
rosids
malvids
Myrtales
Lythraceae
Punica
Punica granatum

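The lineage above lists every node from the root down, while the original question only asks for the order, family, and genus. The following is a minimal sketch of one way to filter the lineage by rank. It assumes ete3's `get_rank`, which maps taxids to rank strings; the helper names `pick_ranks` and `classify` are hypothetical and not part of ete3.

```python
def pick_ranks(lineage, names, ranks, wanted=("order", "family", "genus")):
    """Return {rank: scientific name} for the wanted ranks found in a lineage.

    lineage: list of taxids, root first (shape of NCBITaxa.get_lineage output)
    names:   dict taxid -> name         (shape of get_taxid_translator output)
    ranks:   dict taxid -> rank string  (shape of get_rank output)
    """
    picked = {}
    for taxid in lineage:
        rank = ranks.get(taxid)
        if rank in wanted:
            picked[rank] = names[taxid]
    return picked


def classify(species_name, wanted=("order", "family", "genus")):
    """Hypothetical wrapper: look up one species with ete3, keep the wanted ranks."""
    # Imported here so that pick_ranks can be used without ete3 installed.
    from ete3 import NCBITaxa

    ncbi = NCBITaxa()
    name2taxid = ncbi.get_name_translator([species_name])
    taxids = name2taxid.get(species_name)
    if not taxids:
        return {}  # name not present in the local NCBI taxonomy database
    lineage = ncbi.get_lineage(taxids[0])
    return pick_ranks(
        lineage,
        ncbi.get_taxid_translator(lineage),
        ncbi.get_rank(lineage),
        wanted,
    )
```

Calling `classify("Punica granatum")` should return a small dict keyed by "order", "family", and "genus" instead of the full lineage. Ranks that NCBI does not assign for a given species are simply missing from the result.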
  • Multiple species: put the Latin names in a text file, one per line
Lumnitzera littorea
Punica granatum
Heimia myrtifolia
Sonneratia alba
Epilobium ulleungensis

Code

import sys
from ete3 import NCBITaxa

input_file = sys.argv[1]   # text file with one Latin name per line
output_file = sys.argv[2]  # one comma-separated lineage per resolved species

ncbi = NCBITaxa()

with open(input_file, "r") as fr, open(output_file, "w") as fw:
    for line in fr:
        species_name = line.strip()
        if not species_name:
            continue  # skip blank lines
        name2taxid = ncbi.get_name_translator([species_name])
        for name, taxids in name2taxid.items():
            lineage = ncbi.get_lineage(taxids[0])
            names = ncbi.get_taxid_translator(lineage)
            # Write the lineage from root to species, comma-separated.
            fw.write(",".join(names[taxid] for taxid in lineage) + "\n")
        # Report whether the name was found in the NCBI taxonomy.
        print(species_name + ":", "OK" if name2taxid else "NOT FOUND")

Usage

python .\get_species_placement_in_NCBI.py .\Organism_name.txt placement.txt

Output

root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Combretaceae,Lumnitzera,Lumnitzera littorea
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Punica,Punica granatum
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Heimia,Heimia myrtifolia
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Lythraceae,Sonneratia,Sonneratia alba
root,cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliophyta,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Myrtales,Onagraceae,Onagroideae,Epilobieae,Epilobium,Epilobium ulleungensis
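
A name that the local NCBI database does not recognize (a misspelling, for example) produces no line in the output file, so with a long list it may help to check all names up front. The sketch below is one possible way to do that. It relies on `get_name_translator` accepting a list of names and returning only the ones it could resolve; the helpers `read_names`, `split_resolved`, and `check_file` are hypothetical names introduced for illustration.

```python
def read_names(path):
    """Read one species name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fr:
        return [line.strip() for line in fr if line.strip()]


def split_resolved(species_names, name2taxid):
    """Split names into (resolved, missing) given a name -> [taxid, ...] dict."""
    resolved, missing = [], []
    for name in species_names:
        if name in name2taxid:
            resolved.append(name)
        else:
            missing.append(name)
    return resolved, missing


def check_file(path):
    """Hypothetical wrapper: resolve every name in a file with a single query."""
    # Imported here so that the helpers above work without ete3 installed.
    from ete3 import NCBITaxa

    species_names = read_names(path)
    name2taxid = NCBITaxa().get_name_translator(species_names)
    return split_resolved(species_names, name2taxid)
```

`check_file("Organism_name.txt")` returns two lists: the names that resolved and the names that need to be corrected before running the main script.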