Amino Acid and Codon Usages
The purpose of this research project is to investigate how protein, DNA and mRNA composition relate to environmental conditions and to patterns of expression in eukaryotic, prokaryotic and viral organisms. The sequence of amino acids that compose a protein largely determines the structural and functional properties of the protein. A multitude of factors may condition the frequencies of amino acids in proteins. Amino acid usages have been studied in relation to environmental conditions and adaptation. For example, acidic residues are found to predominate over basic residues in halophilic prokaryotes (Gandbhir et al. 1995, Kennedy et al. 2001). Among the properties that distinguish thermo-resistant proteins are surface-loop reduction; an increased number of hydrophobic interactions; increased frequency of branched residues and of long-range interactions; increased solubility; an increased number of hydrogen bonds and a larger fraction of polar surface; and an increased proportion of charged residues and salt bridges (see, e.g., Kumar and Nussinov 2001, for review). Akashi and Gojobori (2002) and Seligmann (2003) provide evidence that amino acid composition in the proteomes of E. coli and B. subtilis reflects the action of natural selection to enhance metabolic efficiency. We proposed (Brocchieri and Karlin 2005) that the shorter median protein length observed in certain prokaryotic species may reflect an advantage of producing less expensive proteins in species subject to starvation conditions.
Amino acid usages are also largely influenced by other factors that primarily affect DNA composition. A large number of studies have examined the relation between amino acid usage and genomic DNA properties, particularly genome G+C content. These studies have verified that the usage of amino acid types encoded by codons rich or poor in G+C correlates with the genomic G+C content (Sueoka 1961, D'Onofrio et al. 1991, 1999, Karlin et al. 1992, Porter 1995, Lobry 1997, Gu et al. 1998, Nishizawa and Nishizawa 1998, Besemer and Borodovsky 1999, Wilquet and Van de Casteele 1999, Singer and Hickey 2000, Knight et al. 2001). Kreil and Ouzounis (2001) and Tekaia et al. (2002) conclude that the two principal components explaining most of the amino acid usage variability among proteomes correlate with genome G+C content and living temperature (mesophile vs. thermophile).
A biologically relevant conclusion from the effect of G+C content on amino acid usages is that the usage of amino acids in proteins largely depends on processes that are not related to selection at the protein level. In this respect, amino acid usages could also be influenced by processes of mutation and selection acting at the DNA and mRNA levels, e.g., those affecting DNA or mRNA stability or the efficiency of transcription and translation, and by how these processes relate to the different expression patterns of genes.
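The relation between G+C content and amino acid usage described above can be illustrated with a minimal Python sketch. It is not the method used in any of the cited studies. It assumes a coding sequence in the standard genetic code, and the two residue groups (GC_RICH and AT_RICH) are an illustrative grouping based only on the composition of their codons. All function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative grouping (an assumption of this sketch): residues whose codons
# are rich in G+C (Gly, Ala, Arg, Pro) or rich in A+T (Phe, Tyr, Met, Ile,
# Asn, Lys).
GC_RICH = "GARP"
AT_RICH = "FYMINK"


def gc_content(dna):
    """Return the fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in dna) / len(dna)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing unrecognized characters are translated as "X".
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def amino_acid_usage(protein):
    """Return the relative frequency of each residue type in a protein."""
    if not protein:
        return {}
    counts = Counter(protein)
    return {aa: n / len(protein) for aa, n in counts.items()}


def group_fraction(usage, group):
    """Sum the usage frequencies of the residues listed in `group`."""
    return sum(usage.get(aa, 0.0) for aa in group)


if __name__ == "__main__":
    # Hypothetical toy sequence; a real analysis would use whole proteomes.
    cds = "ATGGGCGCCCCGCGTTAA"
    usage = amino_acid_usage(translate(cds))
    print("G+C content:", round(gc_content(cds), 3))
    print("GC-rich residue fraction:", group_fraction(usage, GC_RICH))
    print("AT-rich residue fraction:", group_fraction(usage, AT_RICH))
```

Applied to every gene of several genomes, the per-genome G+C content could then be compared with the two residue fractions, for example by a correlation coefficient. That comparison is left out here to keep the sketch short.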
Publications supported by this project:
- Akashi H and Gojobori T (2002). Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci USA 99: 3695-3700.
- Besemer J and Borodovsky M (1999). Heuristic approach to deriving models for gene finding. Nucleic Acids Res 27: 3911-3920.
- Brocchieri L and Karlin S (2005). Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res 33: 3390-3400.
- D'Onofrio G, Jabbari K, Musto H and Bernardi G (1999). The correlation of protein hydropathy with the base composition of coding sequences. Gene 238: 3-14.
- D'Onofrio G, Mouchiroud D, Aissani B, Gautier C and Bernardi G (1991). Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J Mol Evol 32: 493-503.
- Gandbhir M, Rasched I, Marlière P and Mutzel R (1995). Convergent evolution of amino acid usage in archaebacterial and eubacterial lineages adapted to high salt. Res Microbiol 146: 113-120.
- Gu X, Hewett-Emmett D and Li WH (1998). Directional mutational pressure affects the amino acid composition and hydrophobicity of proteins in bacteria. Genetica 102-103: 383-391.
- Karlin S, Blaisdell BE and Bucher P (1992). Quantile distributions of amino acid usage in protein classes. Prot Eng 5: 729-738.
- Kennedy SP, Ng WV, Salzberg SL, Hood L and DasSarma S (2001). Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res 11: 1641-1650.
- Knight RD, Freeland SJ and Landweber LF (2001). A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biology 2: research0010.1-0010.13.
- Kreil DP and Ouzounis CA (2001). Identification of thermophilic species by the amino acid compositions deduced from their genomes. Nucleic Acids Res 29: 1608-1615.
- Kumar S and Nussinov R (2001). How do thermophilic proteins deal with heat? Cell Mol Life Sci 58: 1216-1233.
- Lobry JR (1997). Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species. Gene 205: 309-316.
- Nishizawa M and Nishizawa K (1998). Biased usages of arginines and lysines in proteins are correlated with local-scale fluctuations of the G+C content of DNA sequences. J Mol Evol 47: 385-393.
- Porter TD (1995). Correlation between codon usage, regional genomic nucleotide composition, and amino acid composition in the cytochrome P-450 gene superfamily. Biochim Biophys Acta 1261: 394-400.
- Seligmann H (2003). Cost-minimization of amino acid usage. J Mol Evol 56: 151-161.
- Singer GAC and Hickey DA (2000). Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17: 1581-1588.
- Sueoka N (1961). Correlation between base composition of deoxyribonucleic acid and amino acid composition of proteins. Proc Natl Acad Sci USA 47: 1141-1149.
- Tekaia F, Yeramian E and Dujon B (2002). Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 297: 51-60.
- Wilquet V and Van de Casteele M (1999). The role of the first letter in the relationship between genomic GC content and protein amino acid composition. Res Microbiol 150: 21-32.
