University of Florida
Bioinformatics

Brocchieri Lab

Amino Acid and Codon Usages

The purpose of this research project is to investigate how protein, DNA and mRNA composition relate to environmental conditions and patterns of expression in eukaryotic, prokaryotic and viral organisms. The sequence of amino acids that compose a protein largely determines the structural and functional properties of the protein. A multitude of factors may condition the frequencies of amino acids in proteins.

Amino acid usages have been studied in relation to environmental conditions and adaptation. For example, acidic residues are found to predominate over basic residues in halophilic prokaryotes (Gandbhir et al. 1995, Kennedy et al. 2001). Among the properties that distinguish thermo-resistant proteins are a reduction of surface loops; increased numbers of hydrophobic interactions, long-range interactions, hydrogen bonds and salt bridges; a higher frequency of branched residues; increased solubility; a larger fraction of polar surface; and a higher proportion of charged residues (see, e.g., Kumar and Nussinov 2001, for review). Akashi and Gojobori (2002) and Seligmann (2003) provide evidence that amino acid composition in the proteomes of E. coli and B. subtilis reflects the action of natural selection to enhance metabolic efficiency. We have proposed (Brocchieri and Karlin, 2005) that the shorter median protein length observed in certain prokaryotic species may reflect an advantage of producing less expensive proteins in species subject to starving conditions.

Amino acid usages are also largely influenced by other factors that primarily affect DNA composition. A large number of studies have been devoted to the relation between amino acid usage and genomic DNA properties, particularly genome G+C content. These studies have verified that the usage of amino acid types encoded by codons rich or poor in G+C content correlates with the genomic G+C content (Sueoka 1961, D'Onofrio et al. 1991, 1999, Karlin et al. 1992, Porter 1995, Lobry 1997, Gu et al. 1998, Nishizawa and Nishizawa 1998, Besemer and Borodovsky 1999, Wilquet and Van de Casteele 1999, Singer and Hickey 2000, Knight et al. 2001). Kreil and Ouzounis (2001) and Tekaia et al. (2002) conclude that the two principal components explaining most of the amino acid usage variability among proteomes correlate with genome G+C content and living temperature (mesophile vs. thermophile).

A biologically relevant conclusion from the effect of G+C content on amino acid usages is that the usage of amino acids in proteins largely depends on processes that are not related to selection at the protein level. In this respect, amino acid usages could also be influenced by processes of mutation and selection at the DNA and mRNA level, e.g., those affecting DNA or mRNA stability or the efficiency of transcription and translation, and by how these processes relate to the different patterns of expression of the genes.
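
To make the two quantities discussed above concrete, the following minimal Python sketch computes the G+C content of a DNA sequence and the amino acid usage (relative frequencies) of the protein it encodes. It is an illustration only, not the method used in this project or in the cited studies. The function names gc_content and amino_acid_usage are hypothetical, and the sketch assumes the standard genetic code and an in-frame coding sequence on the sense strand.

```python
from collections import Counter

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
# Assumption: alternative genetic codes (e.g. mitochondrial) are not handled.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def gc_content(dna):
    """Return the fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def amino_acid_usage(cds):
    """Return relative amino acid frequencies for an in-frame coding sequence.

    Stop codons and codons containing ambiguous bases are skipped.
    """
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3])
        if aa is None or aa == "*":
            continue
        counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence: Met, Ala (GCC), Gly (GGC), Lys (AAA), stop.
    toy_cds = "ATGGCCGGCAAATAA"
    print(gc_content(toy_cds))
    print(amino_acid_usage(toy_cds))
```

In the toy sequence, alanine and glycine are encoded by G+C-rich codons (GCC, GGC) and lysine by an A+T-rich codon (AAA). Applied across the genes of many genomes, per-genome values of this kind are what one would compare to look for the correlation between amino acid usage and genomic G+C content described above.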

Publications supported by this project: