mVISTA:在線程序展示葉綠體基因組相似性小實例

2020 年 3 月 3 日
筆記

葉綠體基因組類的文章通常會有一幅圖來展示葉綠體基因組的相似性（Sequence identity plot），出圖的工具是mVISTA：mVISTA分為本地版和在線版兩種。本文簡要介紹使用在線版mVISTA獲得Sequence identity plot的步驟。

本文使用到的序列數據為5個蘋果屬植物的葉綠體基因組序列

Malus florentina | NC_035625

Malus micromalus | NC_036368

Malus prunifolia | NC_031163

Malus trilobata | NC_035671

Malus tschonoskii | NC_035672

第一步：下載序列

下載每個葉綠體基因組的fasta格式；下載作為參考基因組的genbank格式文件。（序列不是很多可以手動下載；如果序列條數非常多自己寫個一個簡單的python腳本批量下載葉綠體基因組序列）

import os  import sys  import argparse  parser = argparse.ArgumentParser(description='This script was used to download gb or fasta file of cp genome from NCBI nucleotides database')  parser.add_argument('-f','--format',help='Please input file format you want to download',required=True)  parser.add_argument('-a','--accession',help='file name contain accession number of cp genome you want to download',required=True)  args = parser.parse_args()      fr = open(args.accession,'r')  acc = []  for line in fr:      acc.append(line.strip())  print("The number of sequence will be downloaded is: ",len(acc))    from Bio import Entrez  from Bio import SeqIO  Entrez.email = "" #在這裡添加郵箱    fold_name = "downladed_seq_"+args.format  os.mkdir(fold_name)  os.chdir(fold_name)    for line in acc:      print(line + " will be downloaded!")      fw = open(line+"."+args.format,'w')      cp = Entrez.efetch(db='nucleotide',id=[line],rettype=args.format)      seq = SeqIO.read(cp,args.format)      SeqIO.write(seq,fw,args.format)      fw.close()      print(line + " is downloaded!")    print(len(os.listdir()),"sequence have been downloaded")

使用方式

#下載fasta格式  python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f fasta -a accession_numbers.txt  # 下載genbank格式  python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f gb -a accession_numbers.txt  # 序列號放到文本文件中每行一個

第二步：上傳序列至mVISTA

http://genome.lbl.gov/vista/mvista/submit.shtml

image.png 本例中 total number of seuqences is 5

image.png

填寫郵箱地址，運行完結果發送至郵箱；上傳下載好的fasta格式序列
選擇比對程序 論文中通常使用第三個

image.png

上傳注釋文件 mVISTA要求的注釋文件格式為 http://genome.lbl.gov/vista/mvista/instructions.shtml

image.png 可以通過處理genbank格式文件得到

import argparse    from Bio import SeqIO    parser = argparse.ArgumentParser(description = 'This script was used to get mVISTA annotation file used to cp genome comparative analysis')  parser.add_argument('-i','--genbank',help="Please input genBank format file", required = True)  args = parser.parse_args()    fw = open(args.genbank + "_mVISTA_annotation","w")    for rec in SeqIO.parse(args.genbank,"gb"):      for feature in rec.features:          if feature.type == "gene":              for part in feature.location.parts:                  if part.strand == -1:                      start_location = str(part.start)                      end_location = str(part.end)                      gene_name = feature.qualifiers["gene"][0]                      fw.write("<t%st%st%sn"%(start_location,end_location,gene_name))                  else:                      start_location = str(part.start)                      end_location = str(part.end)                      gene_name = feature.qualifiers["gene"][0]                      fw.write("<t%st%st%sn"%(start_location,end_location,gene_name))          elif feature.type == "CDS":              for part in feature.location.parts:                  if part.strand == -1:                      start_location = str(part.start)                      end_location = str(part.end)                      fw.write("%st%st%sn"%(start_location,end_location,"exon"))                  else:                      start_location = str(part.start)                      end_location = str(part.end)                      fw.write("%st%st%sn"%(start_location,end_location,"exon"))          elif feature.type == "tRNA" or feature.type == "rRNA":              for part in feature.location.parts:                  start_location = str(feature.location.start)                  end_location = str(feature.location.end)                  fw.write("%st%st%sn"%(start_location,end_location,"utr"))          else:              print("%s %s"%(feature.type,"is not needed"))    print("The process is done!")    fw.close()    # 使用方法  python get_mVISTA_annotation_file_from_genbank_1.py -i NC_031163.gb

mage.png 這裡只上傳參考序列的注釋文件和全部上傳注釋文件是否有區別自己還沒有搞清楚，這裡暫時只選擇上傳參考序列的注釋文件。其他參數選擇默認，具體參數是什麼功能自己還需要仔細查看幫助文檔。