Phylogenetic-Tree Reconstruction
Genome biases and selective processes acting at the protein and nucleic-acid levels determine how homologous protein sequences diverge. Methods for reconstructing the phylogenetic relations among protein sequences seek to model how these factors shape the relation between sequence similarity and evolutionary distance. Sequence dissimilarity, measured as the p-distance (p = 1 - q, where q is the fractional sequence identity), is known to be an unreliable indicator of the phylogenetic relation between two sequences and can support wrong tree topologies when evolutionary rates vary among lineages (Rzhetsky and Sitnikova 1996). In distance methods, the true evolutionary distance (defined as the average number of mutational events per site that occurred along the path from one sequence to the other) is inferred by some transformation of the p-distance. Different transformations have been proposed to estimate the average number of mutations per site between two sequences from their p-distance (e.g., Zuckerkandl and Pauling 1965, Dayhoff et al. 1972, 1978, Kimura 1981, Ota and Nei 1994, Grishin 1995). Because they rest on different assumptions, these transformations yield different evolutionary distances from the same sequence similarity, and the estimates diverge substantially once distances exceed roughly 0.5 mutations per site (Brocchieri 2001). As a consequence, a reconstructed phylogenetic tree can depend strongly on which transformation is adopted (see also Grishin 1999).
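To make the dependence on the chosen transformation concrete, here is a minimal Python sketch (not the project's method). It computes the p-distance of two aligned sequences and applies two widely used transformations: the Poisson correction d = -ln(1 - p), which assumes equal rates at all sites and is often associated with Zuckerkandl and Pauling (1965), and one common gamma-corrected form, d = α[(1 - p)^(-1/α) - 1], which lets rates vary among sites according to a gamma distribution with shape parameter α. Rate variation among sites is the problem addressed by Ota and Nei (1994) and Grishin (1995), but the exact formulas of those papers are not reproduced here. The function names, the gap handling (gap columns are simply skipped) and the value α = 1.0 are illustrative assumptions.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of differing sites (p = 1 - q) between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson correction d = -ln(1 - p): equal substitution rates at all sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, alpha: float) -> float:
    """Gamma-corrected distance d = alpha * ((1 - p)^(-1/alpha) - 1).

    alpha is the (hypothetical, user-supplied) shape parameter of the gamma
    distribution of rates among sites; as alpha grows, d approaches the
    Poisson distance.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    # Two toy aligned protein fragments (illustrative only): 2 of 10 sites differ.
    print("p =", p_distance("MKTAYIAKQR", "MKSAYLAKQR"))
    alpha = 1.0  # hypothetical shape parameter
    for p in (0.1, 0.3, 0.5, 0.7):
        print(f"p={p:.1f}  Poisson={poisson_distance(p):.3f}  "
              f"gamma(alpha={alpha})={gamma_distance(p, alpha):.3f}")
```

With α = 1, the two estimates differ by about 5% at p = 0.1, whereas at p = 0.7 the gamma-corrected value is almost twice the Poisson value. This is the same qualitative point made above: the choice of transformation matters most for distant sequences.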
Distance transformations proposed in the literature implicitly average the effects of positive selection and neutral evolution on protein differentiation, interpreting position-specific rates of differentiation as a combination of these two factors. In a new approach, we will develop models and procedures for estimating evolutionary distances that seek to model the respective contributions of neutral differentiation, divergence due to positive selection, and mutational biases (e.g., genome G+C content) to the divergence between sequences.

References:
- Brocchieri L (2001). Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol 59: 27-40.
- Dayhoff MO, Schwartz RM and Orcutt BC (1978). A model of evolutionary change in proteins, in: Atlas of protein sequence and structure (Dayhoff MO, Ed.), vol. 5, suppl. 3, pp. 345-352. National Biomedical Research Foundation.
- Grishin NV (1995). Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J Mol Evol 41: 675-679.
- Grishin NV (1999). A novel approach to phylogeny reconstruction from protein sequences. J Mol Evol 48: 264-273.
- Kimura M (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci USA 78: 454-458.
- Ota T and Nei M (1994). Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J Mol Evol 38: 642-643.
- Rzhetsky A and Sitnikova T (1996). When is it safe to use an oversimplified substitution model in tree-making? Mol Biol Evol 13: 1255-1265.
- Zuckerkandl E and Pauling L (1965). Molecules as documents of evolutionary history. J Theor Biol 8: 357-366.
