ALTER: a small tool for converting sequence alignment formats
- March 3, 2020
- Notes
Multiple sequence alignments can be stored in a large variety of formats.
The most common ones include:
FASTA
>ccsA1
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA2
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA3
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA4
ATGATATTTTCAACTTTAGAGCATATAT
Clustal
CLUSTAL W (1.8) multiple sequence alignment (ALTER 1.3.3)

ccsA1 ATGATATTTTCAACTTTAGAGCATATAT
ccsA2 ATGATATTTTCAACTTTAGAGCATATAT
ccsA3 ATGATATTTTCAACTTTAGAGCATATAT
ccsA4 ATGATATTTTCAACTTTAGAGCATATAT
      ****************************
NEXUS
#NEXUS
BEGIN DATA;
dimensions ntax=4 nchar=28;
format missing=? interleave=yes datatype=DNA gap=- match=.;
matrix
ccsA1 ATGATATTTTCAACTTTAGAGCATATAT
ccsA2 ATGATATTTTCAACTTTAGAGCATATAT
ccsA3 ATGATATTTTCAACTTTAGAGCATATAT
ccsA4 ATGATATTTTCAACTTTAGAGCATATAT
;
end;
PHYLIP
4 28
ccsA1 atgatatttt caactttaga gcatatat
ccsA2 atgatatttt caactttaga gcatatat
ccsA3 atgatatttt caactttaga gcatatat
ccsA4 atgatatttt caactttaga gcatatat
MEGA
#mega
TITLE: MSA converted with ALTER 1.3.3

#ccsA1 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA2 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA3 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA4 ATGATATTTT CAACTTTAGA GCATATAT
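The examples above all hold the same four aligned sequences; only the wrapping around them changes. As a rough illustration of what a format conversion involves, here is a minimal Python sketch that parses FASTA text and writes a sequential NEXUS DATA block. It is not how ALTER is implemented, and the function names `parse_fasta` and `to_nexus` are made up for this note. It assumes DNA data, sequences already aligned to equal length, and names without spaces.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_nexus(records):
    """Render aligned DNA records as a sequential NEXUS DATA block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("need at least one record and equal sequence lengths")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"dimensions ntax={len(records)} nchar={nchar};",
        "format missing=? datatype=DNA gap=-;",
        "matrix",
    ]
    # One taxon per line: padded name, a space, then the full sequence.
    lines += [f"{name.ljust(width)} {seq}" for name, seq in records]
    lines += [";", "end;"]
    return "\n".join(lines)
```

For real data a dedicated converter is still the better choice, since it also handles interleaved layouts, protein data, and program-specific quirks.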
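Different alignment programs write different output formats, and downstream analysis tools differ in the input formats they accept. For example, I usually align sequences with MAFFT, whose output is FASTA by default, with Clustal as an option. To build a Bayesian tree with MrBayes afterwards, the alignment has to be converted to NEXUS. For this conversion I recommend ALTER: http://www.sing-group.org/ALTER/. If you only have a few sequences, the web version is enough; if you have many, you can install the local version from https://github.com/sing-group/ALTER. Just follow the installation steps; I did not run into any errors during my own installation.
Installation steps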
git clone https://github.com/sing-group/ALTER.git
cd ALTER
mvn package
Dependencies
- Git tool for cloning the latest version
- A Java compiler and tool
- The Maven tool
All of the dependencies above can be installed with conda.
After installation, run
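Before running `mvn package`, it can save a failed build to confirm that the three tools are actually on your `PATH`. A small Python sketch follows; the executable names `git`, `javac`, and `mvn` are my assumption about a typical install, and `missing_tools` is a made-up helper name.

```python
import shutil

# Assumed executable names for Git, the Java compiler, and Maven.
REQUIRED_TOOLS = ("git", "javac", "mvn")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All build dependencies found.")
```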
java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar help
# Output
No argument is allowed: help
 -c (--collapse)               : Collapse sequences to haplotypes.
 -cg (--collapseGaps)          : Treat gaps as missing data when collapsing.
 -cl (--collapseLimit) N       : Connection limit (sequences differing at <= l sites will be collapsed) (default is l=0).
 -cm (--collapseMissing)       : Count missing data as differences when collapsing.
 -i (--input) FILE             : Input file.
 -ia (--inputAutodetect)       : Autodetect format (other input options are omitted).
 -if (--inputFormat) VAL       : Input format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -io (--inputOS) VAL           : Input operating system (Linux, MacOS or Windows).
 -ip (--inputProgram) VAL      : Input program (Clustal, MAFFT, MUSCLE, PROBCONS or TCoffee).
 -o (--output) FILE            : Output file.
 -of (--outputFormat) VAL      : Output format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -ol (--outputLowerCase)       : Lowe case output.
 -om (--outputMatch)           : Output match characters.
 -on (--outputResidueNumbers)  : Output residue numbers (only ALN format).
 -oo (--outputOS) VAL          : Output operating system (Linux, MacOS or Windows).
 -op (--outputProgram) VAL     : Output program (jModelTest, MrBayes, PAML, PAUP, PhyML, ProtTest, RAxML, TCS, CodABC, BioEdit, MEGA, dnaSP, Se-Al, Mesquite, SplitsTree, Clustal, MAFFT, MUSCLE, PROBCONS, TCoffee, Gblocks, SeaView, trimAl or GENERAL)
 -os (--outputSequential)      : Sequential output (only NEXUS and PHYLIP formats).
Here is how I converted a FASTA alignment to NEXUS format:
java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar -i ~/mingyan/practice_assorted/Myrtales_CP_genomes/another/Myrtales_cp_genome_aligned.fasta-gb -ia -o ./output.nex -of NEXUS -op MrBayes -oo Linux
# Run output
<INFO> : FASTA format detected.
<INFO> : MSA read in FASTA format (Taxa = 90, Length = 106571).
<INFO> : Nucleotide MSA type inferred.
<INFO> : MSA successfully converted to NEXUS format!
Paper describing the tool
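If you need to run this conversion for many alignments, the same command can be assembled from a script. The Python sketch below only uses the options shown in the help output above (`-i`, `-ia`, `-o`, `-of`, `-op`, `-oo`). The function names `build_alter_command` and `run_alter` are hypothetical, and the defaults simply mirror my FASTA-to-NEXUS run for MrBayes on Linux.

```python
import os
import subprocess


def build_alter_command(jar, input_path, output_path,
                        out_format="NEXUS", out_program="MrBayes",
                        out_os="Linux"):
    """Build the argument list for one ALTER conversion with format autodetection."""
    return [
        "java", "-jar", os.path.expanduser(jar),
        "-i", os.path.expanduser(input_path),   # expand "~" ourselves: no shell is involved
        "-ia",                                   # autodetect the input format
        "-o", os.path.expanduser(output_path),
        "-of", out_format,
        "-op", out_program,
        "-oo", out_os,
    ]


def run_alter(jar, input_path, output_path, **options):
    """Run ALTER and return whatever it printed; raises if java exits with an error."""
    cmd = build_alter_command(jar, input_path, output_path, **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, which is why `~` is expanded explicitly here.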
ALTER: program-oriented conversion of DNA and protein alignments
Journal
Nucleic Acids Research, 2010