ALTER: a small tool for converting sequence alignment formats

Multiple sequence alignments (MSAs) can be stored in a large variety of formats.

The most common ones include:

FASTA

>ccsA1
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA2
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA3
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA4
ATGATATTTTCAACTTTAGAGCATATAT

Clustal

CLUSTAL W (1.8) multiple sequence alignment (ALTER 1.3.3)


ccsA1           ATGATATTTTCAACTTTAGAGCATATAT
ccsA2           ATGATATTTTCAACTTTAGAGCATATAT
ccsA3           ATGATATTTTCAACTTTAGAGCATATAT
ccsA4           ATGATATTTTCAACTTTAGAGCATATAT
                ****************************

NEXUS

#NEXUS
BEGIN DATA;
dimensions ntax=4 nchar=28;
format missing=?
interleave=yes datatype=DNA gap=- match=.;

matrix
ccsA1       ATGATATTTTCAACTTTAGAGCATATAT
ccsA2       ATGATATTTTCAACTTTAGAGCATATAT
ccsA3       ATGATATTTTCAACTTTAGAGCATATAT
ccsA4       ATGATATTTTCAACTTTAGAGCATATAT

;
end;

PHYLIP

4 28
ccsA1       atgatatttt caactttaga gcatatat
ccsA2       atgatatttt caactttaga gcatatat
ccsA3       atgatatttt caactttaga gcatatat
ccsA4       atgatatttt caactttaga gcatatat

MEGA

#mega
TITLE: MSA converted with ALTER 1.3.3

#ccsA1       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA2       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA3       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA4       ATGATATTTT CAACTTTAGA GCATATAT
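To make the differences between these layouts concrete, here is a minimal Python sketch that parses FASTA and writes the same alignment as strict sequential PHYLIP and as a bare-bones NEXUS DATA block. It is not how ALTER works internally: it assumes the sequences are already aligned (equal length), keeps only the first word of each FASTA header as the taxon name, and ignores the program-specific details (interleaving, match characters, name truncation rules) that ALTER's -op option takes care of.

```python
def parse_fasta(text):
    """Parse FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the taxon name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_length(records):
    """All sequences in an alignment must have the same length."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("input is empty or sequences are not aligned")
    return lengths.pop()


def to_phylip(records):
    """Sequential strict PHYLIP: 'ntax nchar' header, names padded to 10 columns."""
    nchar = alignment_length(records)
    lines = [f"{len(records)} {nchar}"]
    for name, seq in records:
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


def to_nexus(records, datatype="DNA"):
    """Minimal non-interleaved NEXUS DATA block."""
    nchar = alignment_length(records)
    width = max(len(name) for name, _ in records) + 2
    lines = ["#NEXUS", "BEGIN DATA;",
             f"dimensions ntax={len(records)} nchar={nchar};",
             f"format datatype={datatype} missing=? gap=-;",
             "matrix"]
    lines += [f"{name:<{width}}{seq}" for name, seq in records]
    lines += [";", "end;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">ccsA1\nATGATATTTTCAACTTTAGAGCATATAT\n>ccsA2\nATGATATTTTCAACTTTAGAGCATATAT\n"
    records = parse_fasta(fasta)
    print(to_phylip(records))
    print(to_nexus(records))
```

The header line is where the formats diverge most: PHYLIP puts the taxon and site counts on a bare first line, while NEXUS states them in a `dimensions` command inside a `DATA` block.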

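Different alignment programs produce different output formats, and the programs used for downstream analysis expect different input formats. For example, I usually align sequences with MAFFT, which writes FASTA by default and Clustal format as an option; to build a Bayesian tree with MrBayes afterwards, the alignment has to be converted to NEXUS. For this kind of conversion I recommend ALTER (http://www.sing-group.org/ALTER/). If you only have a few sequences, the web version is enough; for larger numbers of sequences, install the local version (https://github.com/sing-group/ALTER). Just follow the installation steps; I did not run into any errors during my own installation.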
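The MAFFT to MrBayes workflow described above can be chained in a small Python wrapper. This is a sketch under assumptions: MAFFT is called as `mafft --auto`, which writes the FASTA alignment to standard output; the ALTER flags are the ones listed in the local version's help output and mirror the author's own conversion command; the jar path and the example file names are hypothetical and depend on where and which version you built.

```python
import shlex
import subprocess

# Hypothetical jar location produced by `mvn package`; the version may differ.
ALTER_JAR = "alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar"


def mafft_command(unaligned_fasta):
    """MAFFT with automatic strategy selection; the alignment goes to stdout."""
    return ["mafft", "--auto", unaligned_fasta]


def alter_command(aligned_fasta, output_nexus, jar=ALTER_JAR):
    """ALTER: autodetect the input, write NEXUS tailored for MrBayes on Linux."""
    return ["java", "-jar", jar,
            "-i", aligned_fasta, "-ia",
            "-o", output_nexus,
            "-of", "NEXUS", "-op", "MrBayes", "-oo", "Linux"]


def align_and_convert(unaligned_fasta, aligned_fasta, output_nexus):
    """Run MAFFT, save its stdout to a file, then convert it with ALTER."""
    with open(aligned_fasta, "w") as out:
        subprocess.run(mafft_command(unaligned_fasta), stdout=out, check=True)
    subprocess.run(alter_command(aligned_fasta, output_nexus), check=True)


if __name__ == "__main__":
    # Only print the commands (hypothetical file names); call
    # align_and_convert(...) to actually run them.
    print(shlex.join(mafft_command("ccsA.fasta")))
    print(shlex.join(alter_command("ccsA_aligned.fasta", "ccsA.nex")))
```

Building the argument lists separately from running them makes it easy to inspect the exact commands before launching a long alignment.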

Installation steps

git clone https://github.com/sing-group/ALTER.git
cd ALTER
mvn package

Dependencies

Git, for cloning the latest version
A Java compiler and runtime (a JDK)
Maven

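All of the above dependencies can be installed with conda. For a conda installation tutorial, search WeChat for "教程价值999的全外显子教学视频–免费送" (a whole-exome sequencing tutorial video series offered for free).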
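Before running `mvn package`, it can save time to confirm that the build tools are actually on `PATH` (for example, that the conda environment holding them is activated). A minimal Python sketch; the executable names `git`, `javac`, `java` and `mvn` are my assumption of what the three dependencies provide.

```python
import shutil

# Assumed executables: git (cloning), javac/java (JDK), mvn (Maven).
REQUIRED_TOOLS = ["git", "javac", "java", "mvn"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All build dependencies found.")
```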

After installation, run:

java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar help

# Output
No argument is allowed: help
 -c (--collapse)              : Collapse sequences to haplotypes.
 -cg (--collapseGaps)         : Treat gaps as missing data when collapsing.
 -cl (--collapseLimit) N      : Connection limit (sequences differing at <= l sites will be collapsed) (default is l=0).
 -cm (--collapseMissing)      : Count missing data as differences when collapsing.
 -i (--input) FILE            : Input file.
 -ia (--inputAutodetect)      : Autodetect format (other input options are omitted).
 -if (--inputFormat) VAL      : Input format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -io (--inputOS) VAL          : Input operating system (Linux, MacOS or Windows).
 -ip (--inputProgram) VAL     : Input program (Clustal, MAFFT, MUSCLE, PROBCONS or TCoffee).
 -o (--output) FILE           : Output file.
 -of (--outputFormat) VAL     : Output format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -ol (--outputLowerCase)      : Lowe case output.
 -om (--outputMatch)          : Output match characters.
 -on (--outputResidueNumbers) : Output residue numbers (only ALN format).
 -oo (--outputOS) VAL         : Output operating system (Linux, MacOS or Windows).
 -op (--outputProgram) VAL    : Output program (jModelTest, MrBayes, PAML, PAUP, PhyML, ProtTest, RAxML, TCS, CodABC, BioEdit, MEGA, dnaSP, Se-Al, Mesquite, SplitsTree, Clustal, MAFFT, MUSCLE, PROBCONS, TCoffee, Gblocks, SeaView, trimAl or GENERAL)
 -os (--outputSequential)     : Sequential output (only NEXUS and PHYLIP formats).
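The help text describes the collapsing options (-c, -cl, -cg, -cm) only briefly. The Python sketch below illustrates one plausible reading: count the sites where two aligned sequences differ, then link any pair within the limit l and report each connected group (single-linkage grouping via union-find) as one haplotype. This is my assumption, not ALTER's documented algorithm, and the set of missing-data characters (`?`, `N`) is likewise hypothetical.

```python
# Assumption: characters treated as missing data (not taken from ALTER).
MISSING = frozenset("?Nn")


def differences(a, b, gaps_as_missing=False, count_missing=False):
    """Count the sites at which two aligned sequences differ."""
    missing = (MISSING | {"-"}) if gaps_as_missing else MISSING
    diff = 0
    for x, y in zip(a, b):
        if x in missing or y in missing:
            # Sites with missing data are ignored unless count_missing is set.
            if count_missing and x.upper() != y.upper():
                diff += 1
        elif x.upper() != y.upper():
            diff += 1
    return diff


def collapse(records, limit=0, gaps_as_missing=False, count_missing=False):
    """Single-linkage grouping: pairs within `limit` differences are joined."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            d = differences(records[i][1], records[j][1],
                            gaps_as_missing, count_missing)
            if d <= limit:
                parent[find(i)] = find(j)

    groups = {}
    for i, (name, _) in enumerate(records):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    seqs = [("s1", "ATGA"), ("s2", "ATGA"), ("s3", "ATGC"), ("s4", "AT-A")]
    print(collapse(seqs))                        # → [['s1', 's2'], ['s3'], ['s4']]
    print(collapse(seqs, gaps_as_missing=True))  # → [['s1', 's2', 's4'], ['s3']]
    print(collapse(seqs, limit=1))               # → [['s1', 's2', 's3', 's4']]
```

With single linkage, a limit above 0 can chain sequences together: s3 and s4 differ at two sites but still end up in one group at l=1 because both are within one site of s1.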

In my case, I converted a FASTA alignment to NEXUS format:

java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar -i ~/mingyan/practice_assorted/Myrtales_CP_genomes/another/Myrtales_cp_genome_aligned.fasta-gb -ia -o ./output.nex -of NEXUS -op MrBayes -oo Linux

# Output
<INFO> : FASTA format detected.
<INFO> : MSA read in FASTA format (Taxa = 90, Length =  106571).
<INFO> : Nucleotide MSA type inferred.
<INFO> : MSA successfully converted to NEXUS format!
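A quick sanity check after conversion is to compare the `dimensions ntax=... nchar=...;` statement in the NEXUS output with the Taxa and Length values ALTER logged. A minimal Python sketch; it assumes `ntax` is written before `nchar`, as in the NEXUS example near the top of this post, and reads the whole file into memory, which is fine for an alignment of this size.

```python
import re

# Matches e.g. "dimensions ntax=4 nchar=28;" (case-insensitive).
DIMENSIONS = re.compile(
    r"dimensions\s+ntax\s*=\s*(\d+)\s+nchar\s*=\s*(\d+)\s*;", re.IGNORECASE)


def nexus_dimensions(text):
    """Return (ntax, nchar) from the first DIMENSIONS statement."""
    match = DIMENSIONS.search(text)
    if match is None:
        raise ValueError("no 'dimensions ntax=... nchar=...;' statement found")
    return int(match.group(1)), int(match.group(2))


def check_conversion(nexus_path, expected_taxa, expected_length):
    """True if the NEXUS header matches the Taxa/Length values ALTER logged."""
    with open(nexus_path) as handle:
        ntax, nchar = nexus_dimensions(handle.read())
    return ntax == expected_taxa and nchar == expected_length
```

For the run above, `check_conversion("./output.nex", 90, 106571)` should return `True` if the header agrees with the log.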

Paper describing the tool

ALTER: program-oriented conversion of DNA and protein alignments

Journal

Nucleic Acids Research, 2010