University of Florida
Bioinformatics

Phylogenetic Tree Reconstruction

Genome biases and selective processes acting at the protein and nucleic-acid levels determine the patterns of differentiation among homologous protein sequences. Methods for reconstructing the phylogenetic relations among protein sequences seek to model how these factors shape the relation between protein sequence similarity and evolutionary distance. Sequence dissimilarity (p-distance = 1 - q, where q is the fractional sequence identity) is known to be an unreliable indicator of the phylogenetic relation between two sequences, and it can support wrong tree topologies when evolutionary rates vary among lineages (Rzhetsky and Sitnikova 1996). In distance methods, true evolutionary distances (defined as the average number of mutational events per site that occurred in the evolution from one sequence to the other) are estimated by applying a transformation to p-distances. Several transformations have been proposed to estimate the average number of mutations per site between two sequences from their p-distance (e.g., Zuckerkandl and Pauling 1965, Dayhoff et al. 1972, 1978, Kimura 1981, Ota and Nei 1994, Grishin 1995). Applied to the same sequence similarity, these transformations yield different estimates of evolutionary distance, and the estimates diverge substantially for distances greater than roughly 0.5 mutations per site (Brocchieri 2001). As a consequence, phylogenetic tree reconstructions can depend strongly on the choice of transformation (see also Grishin 1999).
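The following is a minimal sketch, not the project's method, showing how a p-distance is computed from two aligned sequences and how two standard textbook transformations turn the same p into different distance estimates. The Poisson correction d = -ln(1 - p) assumes equal rates at all sites; the gamma distance d = a[(1 - p)^(-1/a) - 1] lets rates vary among sites according to a gamma distribution with shape parameter a. These two formulas are chosen for illustration and are not claimed to match the exact transformations of the works cited above. The function names, the rule of skipping gapped sites, and the shape values a = 2 and a = 0.5 are hypothetical assumptions.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites (p = 1 - q) between two aligned sequences.

    Assumption (illustrative): sites where either sequence has a gap ('-')
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # ignore gapped positions
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Poisson correction d = -ln(1 - p); assumes equal rates at all sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, a: float) -> float:
    """Gamma distance d = a * ((1 - p)^(-1/a) - 1).

    Rates vary among sites following a gamma distribution with shape a;
    a smaller a means stronger rate heterogeneity among sites.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)


if __name__ == "__main__":
    print("p-distance of toy pair:", p_distance("MKV-LAG", "MRVELSG"))
    # Same p, different transformations: the estimates drift apart as p grows.
    print(f"{'p':>5} {'Poisson':>8} {'gamma a=2':>10} {'gamma a=0.5':>12}")
    for p in (0.1, 0.3, 0.5, 0.7):
        print(f"{p:>5.1f} {poisson_distance(p):>8.3f} "
              f"{gamma_distance(p, 2.0):>10.3f} {gamma_distance(p, 0.5):>12.3f}")
```

For example, at p = 0.5 the Poisson correction gives about 0.69 substitutions per site, while the gamma distance with a = 0.5 gives 1.5. For small p, both transformations stay close to p itself. This illustrates why the choice of transformation matters most for distantly related sequences.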
Distance transformations proposed in the literature implicitly average the effects of positive selection and neutral evolution on protein differentiation, and they interpret position-specific rates of differentiation as a combination of these two factors. In a new approach, we will develop models and procedures for estimating evolutionary distances that explicitly account for the separate contributions of neutral differentiation, divergence due to positive selection, and mutational biases (e.g., genomic G+C content) to the divergence between sequences.

Publications supported by this project:

 

Brocchieri Lab
McIntyre Lab
Riva Lab