Authors : Tokumasa Horiike
Disciplines : Molecular biology
Keywords : Molecular evolution, phylogenetic analysis, phylogenetic tree


Phylogenetic analysis using molecular data, such as DNA sequences of genes and amino acid sequences of proteins, is now common not only in evolutionary biology but across many areas of molecular biology. The reason is that DNA sequencing has become routine and a huge amount of gene and protein sequence data is available in public online databases. Because the available molecules (genes or proteins) evolve at different rates, it is important to choose a molecule suited to the phylogenetic analysis of a given lineage. For example, when the evolutionary rate of a gene (or protein) is too high for a given lineage, nucleotide (or amino acid) substitutions become saturated and the accuracy of the phylogenetic analysis decreases. Methods for phylogenetic analysis have improved along with advances in computer science. As a result, there are many methods for inferring phylogenetic trees, and many programs are available for each method. This mini review describes the general procedure of phylogenetic analysis and explains some representative methods (the unweighted pair group method with arithmetic mean (UPGMA), the neighbor-joining method, the maximum parsimony method, the maximum likelihood method, and the Bayesian method). The most important part of a phylogenetic analysis is the interpretation of the phylogenetic tree. Therefore, several points for evaluating a phylogenetic tree are also explained: “validity of the tree shape”, “evolutionary distance”, and “validation of each internal branch”. Finally, the procedure for evaluating a phylogenetic tree is presented with an example using MEGA 7.
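The point about saturation can be illustrated with a small sketch. It assumes the Jukes-Cantor (1969) substitution model, which is not named in the abstract and is used here only because it is the simplest correction for multiple substitutions at the same site; the function names `p_distance` and `jc69_distance` are hypothetical. Under this assumption the corrected distance is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. As p approaches 0.75 the corrected distance grows without bound, which is one way to see why a molecule that evolves too fast for the lineage gives unreliable distances.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (an assumed model, not taken from the review).

    Raises ValueError when p >= 0.75, where the correction is undefined
    because the sites are effectively saturated.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); larger values indicate saturation")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # The corrected distance diverges from the observed one as p nears 0.75.
    for p in (0.05, 0.25, 0.50, 0.70, 0.74):
        print(f"p = {p:.2f}  corrected d = {jc69_distance(p):.3f}")
```

For small p the corrected and observed distances nearly coincide, so the choice of molecule matters most for distantly related sequences.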


Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25: 3389-3402.

Duret L and Mouchiroud D (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol., 17: 68-74.

Eck RV and Dayhoff MO (1966) Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, D.C.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32: 1792-1797.

Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17: 368-376.

Felsenstein J (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.

Fitch WM (1971) Toward defining the course of evolution: minimum change for a specified tree topology. Systematic Zoology 20: 406-416.

Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, and Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307-321.

Hastings KE (1996) Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J. Mol. Evol., 42: 631-640.

Hennig W (1966) Phylogenetic Systematics. University of Illinois Press, Urbana.

Huelsenbeck JP and Ronquist F (2001) MrBayes: Bayesian inference of phylogeny. Bioinformatics, 17: 754-755.

Katoh K, Misawa K, Kuma K, and Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30: 3059-3066.

Kumar S, Stecher G, and Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol. 33:1870-1874.

Milne I, Lindner D, Bayer M, Husmeier D, McGuire G, Marshall DF, and Wright F (2009) TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops. Bioinformatics. 25:126-127.

Miyata T, Yasunaga T, and Nishida T (1980) Nucleotide sequence divergence and functional constraint in mRNA evolution. Proc Natl Acad Sci U S A. 77: 7328-7332.

Nei M and Kumar S (2000) Molecular Evolution and Phylogenetics. Oxford University Press, Inc., New York.

Niimura Y and Nei M (2005) Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. Proc Natl Acad Sci U S A. 102: 6039-6044.

Page RDM and Holmes EC (1998) Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford.

Price MN, Dehal PS, and Arkin AP (2009) FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 26:1641-1650.

Saitou N and Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol., 4: 406-425.

Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, and Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol., 7: 539.

Sokal R and Michener C (1958) A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 38: 1409-1438.

Stamatakis A, Ludwig T, and Meier H (2005) RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics, 21: 456-463.

Tamura K, Nei M, and Kumar S (2004) Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci U S A. 101: 11030-11035.

Thompson JD, Higgins DG, and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22: 4673-4680.

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, and Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res., 25: 4876-4882.