mVISTA:在線程序展示葉綠體基因組相似性小實例
- 2020 年 3 月 3 日
- 筆記
葉綠體基因組類的文章通常會有一幅圖來展示葉綠體基因組的相似性(Sequence identity plot),出圖的工具是mVISTA:mVISTA分為本地版和在線版兩種。本文簡要介紹使用在線版mVISTA獲得Sequence identity plot的步驟。
本文使用到的序列數據為5個蘋果屬植物的葉綠體基因組序列
Malus florentina | NC_035625
Malus micromalus | NC_036368
Malus prunifolia | NC_031163
Malus trilobata | NC_035671
Malus tschonoskii | NC_035672
第一步:下載序列
下載每個葉綠體基因組的fasta格式;下載作為參考基因組的genbank格式文件。(序列不是很多可以手動下載;如果序列條數非常多自己寫個一個簡單的python腳本批量下載葉綠體基因組序列)
import os import sys import argparse parser = argparse.ArgumentParser(description='This script was used to download gb or fasta file of cp genome from NCBI nucleotides database') parser.add_argument('-f','--format',help='Please input file format you want to download',required=True) parser.add_argument('-a','--accession',help='file name contain accession number of cp genome you want to download',required=True) args = parser.parse_args() fr = open(args.accession,'r') acc = [] for line in fr: acc.append(line.strip()) print("The number of sequence will be downloaded is: ",len(acc)) from Bio import Entrez from Bio import SeqIO Entrez.email = "" #在這裡添加郵箱 fold_name = "downladed_seq_"+args.format os.mkdir(fold_name) os.chdir(fold_name) for line in acc: print(line + " will be downloaded!") fw = open(line+"."+args.format,'w') cp = Entrez.efetch(db='nucleotide',id=[line],rettype=args.format) seq = SeqIO.read(cp,args.format) SeqIO.write(seq,fw,args.format) fw.close() print(line + " is downloaded!") print(len(os.listdir()),"sequence have been downloaded")
使用方式
#下載fasta格式 python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f fasta -a accession_numbers.txt # 下載genbank格式 python download_gb_or_fa_from_NCBI_cp_genome_database_1.py -f gb -a accession_numbers.txt # 序列號放到文本文件中每行一個
第二步:上傳序列至mVISTA
http://genome.lbl.gov/vista/mvista/submit.shtml
image.png 本例中 total number of seuqences is 5
image.png
- 填寫郵箱地址,運行完結果發送至郵箱;上傳下載好的fasta格式序列
- 選擇比對程序 論文中通常使用第三個
image.png
- 上傳注釋文件 mVISTA要求的注釋文件格式為 http://genome.lbl.gov/vista/mvista/instructions.shtml
image.png 可以通過處理genbank格式文件得到
import argparse from Bio import SeqIO parser = argparse.ArgumentParser(description = 'This script was used to get mVISTA annotation file used to cp genome comparative analysis') parser.add_argument('-i','--genbank',help="Please input genBank format file", required = True) args = parser.parse_args() fw = open(args.genbank + "_mVISTA_annotation","w") for rec in SeqIO.parse(args.genbank,"gb"): for feature in rec.features: if feature.type == "gene": for part in feature.location.parts: if part.strand == -1: start_location = str(part.start) end_location = str(part.end) gene_name = feature.qualifiers["gene"][0] fw.write("<t%st%st%sn"%(start_location,end_location,gene_name)) else: start_location = str(part.start) end_location = str(part.end) gene_name = feature.qualifiers["gene"][0] fw.write("<t%st%st%sn"%(start_location,end_location,gene_name)) elif feature.type == "CDS": for part in feature.location.parts: if part.strand == -1: start_location = str(part.start) end_location = str(part.end) fw.write("%st%st%sn"%(start_location,end_location,"exon")) else: start_location = str(part.start) end_location = str(part.end) fw.write("%st%st%sn"%(start_location,end_location,"exon")) elif feature.type == "tRNA" or feature.type == "rRNA": for part in feature.location.parts: start_location = str(feature.location.start) end_location = str(feature.location.end) fw.write("%st%st%sn"%(start_location,end_location,"utr")) else: print("%s %s"%(feature.type,"is not needed")) print("The process is done!") fw.close() # 使用方法 python get_mVISTA_annotation_file_from_genbank_1.py -i NC_031163.gb
mage.png 這裡只上傳參考序列的注釋文件和全部上傳注釋文件是否有區別自己還沒有搞清楚,這裡暫時只選擇上傳參考序列的注釋文件。其他參數選擇默認,具體參數是什麼功能自己還需要仔細查看幫助文檔。
image.png 結果以郵件的形式發送郵箱
選擇輸出的長度
點擊此處下載結果
image.png 輸出結果為pdf文件自己調節細節