Patterns of nucleotide and amino acid substitution

Similar documents
Hands on Simulation of Mutation

Mutation. Mutation provides raw material to evolution. Different kinds of mutations have different effects

Gene Finding CMSC 423

Molecular Facts and Figures

( TUTORIAL. (July 2006)

Protein Synthesis Simulation

DNA Bracelets

Hiding Data in DNA. 1 Introduction

Biological One-way Functions

Insulin mrna to Protein Kit

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein

a. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled

UNIVERSITETET I OSLO Det matematisk-naturvitenskapelige fakultet

SEAC 2012 Medical Director Potpourri BANNER. WILLIAM PENN. YOUR COMPANY FOR LIFE

Gene and Chromosome Mutation Worksheet (reference pgs in Modern Biology textbook)

Ribosomal Protein Synthesis

CHALLENGES IN THE HUMAN GENOME PROJECT

Problem Set 3 KEY

Chapter 9. Applications of probability. 9.1 The genetic code

Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.

Part ONE. a. Assuming each of the four bases occurs with equal probability, how many bits of information does a nucleotide contain?

Mutations and Genetic Variability. 1. What is occurring in the diagram below?

BIOLÓGIA ANGOL NYELVEN

Name: Date: Problem How do amino acid sequences provide evidence for evolution? Procedure Part A: Comparing Amino Acid Sequences

The p53 MUTATION HANDBOOK

Mutation, Repair, and Recombination

Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?

Shu-Ping Lin, Ph.D.

PRINCIPLES OF POPULATION GENETICS

The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION LIVING ENVIRONMENT. Tuesday, January 25, :15 a.m. to 12:15 p.m.

ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes

PRACTICE TEST QUESTIONS

AP BIOLOGY 2010 SCORING GUIDELINES (Form B)

Molecular Genetics. RNA, Transcription, & Protein Synthesis

GENEWIZ, Inc. DNA Sequencing Service Details for USC Norris Comprehensive Cancer Center DNA Core

Protein Synthesis. Page 41 Page 44 Page 47 Page 42 Page 45 Page 48 Page 43 Page 46 Page 49. Page 41. DNA RNA Protein. Vocabulary

The Puzzle of Life A Lesson Plan for Life S cien ce Teach ers From: The G reat Lakes S cien ce C ent er, C lev elan d, OH

Introduction. What is Ecological Genetics?

13.2 Ribosomes & Protein Synthesis

DnaSP, DNA polymorphism analyses by the coalescent and other methods.

Genomes and SNPs in Malaria and Sickle Cell Anemia

BOC334 (Proteomics) Practical 1. Calculating the charge of proteins

Transcription and Translation of DNA

Genetics Lecture Notes Lectures 1 2

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

10 µg lyophilized plasmid DNA (store lyophilized plasmid at 20 C)

Bio 102 Practice Problems Genetic Code and Mutation

From DNA to Protein

The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION LIVING ENVIRONMENT. Monday, January 26, :15 a.m. to 12:15 p.m.

Academic Nucleic Acids and Protein Synthesis Test

Umm AL Qura University MUTATIONS. Dr Neda M Bogari

Two-locus population genetics

Table S1. Related to Figure 4

Ms. Campbell Protein Synthesis Practice Questions Regents L.E.

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

Guidelines for Writing a Scientific Paper

Recombinant DNA Technology

Amino Acids. Amino acids are the building blocks of proteins. All AA s have the same basic structure: Side Chain. Alpha Carbon. Carboxyl. Group.

Specific problems. The genetic code. The genetic code. Adaptor molecules match amino acids to mrna codons

Protein Synthesis How Genes Become Constituent Molecules

DNA Sample preparation and Submission Guidelines

IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon. V. Polypeptides and Proteins

(A) Microarray analysis was performed on ATM and MDM isolated from 4 obese donors.

Sickle cell anemia: Altered beta chain Single AA change (#6 Glu to Val) Consequence: Protein polymerizes Change in RBC shape ---> phenotypes

Thymine = orange Adenine = dark green Guanine = purple Cytosine = yellow Uracil = brown

Drosophila NK-homeobox genes

Microbial Genetics (Chapter 8) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College. Eastern Campus

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure enzymes control cell chemistry ( metabolism )

Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK

Translation Study Guide

GenesCodeForProteins

Amino Acids and Their Properties

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

Concluding lesson. Student manual. What kind of protein are you? (Basic)

Amino Acids, Peptides, Proteins

Recap. Lecture 2. Protein conformation. Proteins. 8 types of protein function 10/21/10. Proteins.. > 50% dry weight of a cell

SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE

BCOR101 Midterm II Wednesday, October 26, 2005

LECTURE 6 Gene Mutation (Chapter )

Inquiry Labs In The High School Classroom. EDTL 611 Final Project By Misha Stredrick and Laura Rettig

RNA & Protein Synthesis

Protein Physics. A. V. Finkelstein & O. B. Ptitsyn LECTURE 1

Introduction to Bioinformatics (Master ChemoInformatique)

Actual Quiz 1 (closed book) will be given Monday10/4 at 10:00 am

Next Generation Sequencing

Characterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.)

GENETIC CODING. A mathematician considers the problem of how genetic information is encoded for transmission from parent to offspring,.

Regents Biology REGENTS REVIEW: PROTEIN SYNTHESIS

Structure and Function of DNA

A. A peptide with 12 amino acids has the following amino acid composition: 2 Met, 1 Tyr, 1 Trp, 2 Glu, 1 Lys, 1 Arg, 1 Thr, 1 Asn, 1 Ile, 1 Cys

Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)

Part A: Amino Acids and Peptides (Is the peptide IAG the same as the peptide GAI?)

Introduction to Perl Programming Input/Output, Regular Expressions, String Manipulation. Beginning Perl, Chap 4 6. Example 1

RNA and Protein Synthesis

Module 6: Digital DNA

Biology Final Exam Study Guide: Semester 2

Input Data Files (FASTA format; MEGA format; NBRF/PIR format; NEXUS format; PHYLIP format; HapMap3

The Steps. 1. Transcription. 2. Transferal. 3. Translation

Transcription:

Patterns of nucleotide and amino acid substitution Introduction So I ve just suggested that the neutral theory of molecular evolution explains quite a bit, but it also ignores quite a bit. 1 The derivations we did assumed that all substitutions are equally likely to occur, because they are selectively neutral. That isn t plausible. We need look no further than sickle cell anemia to see an example of a protein polymorphism in which a single amino acid difference has a very large effect on fitness. Even reasoning from first principles we can see that it doesn t make much sense to think that all nucleotide substitutions are created equal. Just as it s unlikely that you ll improve the performance of your car if you pick up a sledgehammer, open its hood, close your eyes, and hit something inside, so it s unlikely that picking a random amino acid in a protein and substituting it with a different one will improve the function of the protein. 2 The genetic code Of course, not all nucleotide sequence substitutions lead to amino acid substitutions in protein-coding genes. There is redundancy in the genetic code. Table 1 is a list of the codons in the universal genetic code. 3 Notice that there are only two amino acids, methionine and tryptophan, that have a single codon. All the rest have at least two. Serine, arginine, and leucine have six. Moreover, most of the redundancy is in the third position, where we can distinguish 2-fold from 4-fold redundant sites (Table 2). 2-fold redundant sites are those at which either one 1 I won t make my bikini joke. I ll save that for when we get to quantitative genetics. Nonetheless, the pure version of the neutral theory of molecular evolution makes a lot of simplifying assumptions. 2 Obviously it happens sometimes. If it didn t, there wouldn t be any adaptive evolution. It s just that, on average, mutations are more likely to decrease fitness than to increase it. 3 By the way, the universal genetic code is not universal. There are at least eight, but all of them have similar redundancy properties. c 2001-2017 Kent E. Holsinger

Amino Amino Amino Amino Codon Acid Codon Acid Codon Acid Codon Acid UUU Phe UCU Ser UAU Tyr UGU Cys UUC Phe UCC Ser UAC Tyr UGC Cys UUA Leu UCA Ser UAA Stop UGA Stop UUG Leu UCG Ser UAG Stop UGG Trp CUU Leu CCU Pro CAU His CGU Arg CUC Leu CCC Pro CAC His CGC Arg CUA Leu CCA Pro CAA Gln CGA Arg CUG Leu CCG Pro CAG Gln CGG Arg AUU Ile ACU Thr AAU Asn AGU Ser AUC Ile ACC Thr AAC Asn AGC Ser AUA Ile ACA Thr AAA Lys AGA Arg AUG Met ACG Thr AAG Lys AGG Arg GUU Val GCU Ala GAU Asp GGU Gly GUC Val GCC Ala GAC Asp GGC Gly GUA Val GCA Ala GAA Glu GGA Gly GUG Val GCG Ala GAG Glu GGG Gly Table 1: The universal genetic code. 2

Amino Codon Acid Redundancy CCU Pro 4-fold CCC CCA CCG AAU Asn 2-fold AAC AAA Lys 2-fold AAG Table 2: Examples of 4-fold and 2-fold redundancy in the 3rd position of the universal genetic code. of two nucleotides can be present in a codon for a single amino acid. 4-fold redundant sites are those at which any of the four nucleotides can be present in a codon for a single amino acid. In some cases there is redundancy in the first codon position, e.g, both AGA and CGA are codons for arginine. Thus, many nucleotide substitutions at third positions do not lead to amino acid substitutions, and some nucleotide substitutions at first positions do not lead to amino acid substitutions. But every nucleotide substitution at a second codon position leads to an amino acid substitution. Nucleotide substitutions that do not lead to amino acid substitutions are referred to as synonymous substitutions, because the codons involved are synonymous, i.e., code for the same amino acid. Nucleotide substitutions that do lead to amino acid substituions are non-synonymous substitutions. Rates of synonymous and non-synonymous substitution By using a modification of the simple Jukes-Cantor model we encountered before, it is possible make separate estimates of the number of synonymous substitutions and of the number of non-synonymous substitutions that have occurred since two sequences diverged from a common ancestor. If we combine an estimate of the number of differences with an estimate of the time of divergence we can estimate the rates of synonymous and nonsynonymous substitution (number/time). Table 3 shows some representative estimates for the rates of synonymous and non-synonymous substitution in different genes studied in mammals. Two very important observations emerge after you ve looked at this table for awhile. The 3

Locus Non-synonymous rate Synonymous rate Histone H4 0.00 3.94 H2 0.00 4.52 Ribosomal proteins S17 0.06 2.69 S14 0.02 2.16 Hemoglobins & myoglobin α-globin 0.56 4.38 β-globin 0.78 2.58 Myoglobin 0.57 4.10 Interferons γ 3.06 5.50 α1 1.47 3.24 β1 2.38 5.33 Table 3: Representative rates of synonymous and non-synonymous substitution in mammalian genes (from [11]). Rates are expressed as the number of substitutions per 10 9 years. first won t come as any shock. The rate of non-synonymous substitution is generally lower than the rate of synonymous substitution. This is a result of my sledgehammer principle. Mutations that change the amino acid sequence of a protein are more likely to reduce that protein s functionality than to increase it. As a result, they are likely to lower the fitness of individuals carrying them, and they will have a lower probability of being fixed than those mutations that do not change the amino acid sequence. The second observation is more subtle. Rates of non-synonymous substitution vary by more than two orders of magnitude: 0.02 substitutions per nucleotide per billion years in ribosomal protein S14 to 3.06 substitutions per nucleotide per billion years in γ-interferon, while rates of synonymous substitution vary only by a factor of two (2.16 in ribosomal protein S14 to 5.50 in γ interferons. If synonymous substitutions are neutral, as they probably are to a first approximation, 4 then the rate of synonymous substitution should equal the mutation rate. Thus, the rate of synonymous substitution should be approximately the same at every locus, which is roughly what we observe. But proteins differ in the degree to which 4 We ll see that they may not be completely neutral a little later, but at least it s reasonable to believe that the intensity of selection to which they are subject is less than that to which non-synonymous substitutions are subject. 4

their physiological function affects the performance and fitness of the organisms that carry them. Some, like histones and ribosomal proteins, are intimately involved with chromatin or translation of messenger RNA into protein. It s easy to imagine that just about any change in the amino acid sequence of such proteins will have a detrimental effect on its function. Others, like interferons, are involved in responses to viral or bacterial pathogens. It s easy to imagine not only that the selection on these proteins might be less intense, but that some amino acid substitutions might actually be favored by natural selection because they enhance resistance to certain strains of pathogens. Thus, the probability that a nonsynonymous substitution will be fixed is likely to vary substantially among genes, just as we observe. Revising the neutral theory So we ve now produced empirical evidence that many mutations are not neutral. Does this mean that we throw the neutral theory of molecular evolution away? Hardly. We need only modify it a little to accomodate these new observations. Most non-synonymous substitutions are deleterious. We can actually generalize this assertion a bit and say that most mutations that affect function are deleterious. After all, organisms have been evolving for about 3.5 billion years. Wouldn t you expect their cellular machinery to work pretty well by now? Most molecular variability found in natural populations is selectively neutral. If most function-altering mutations are deleterious, it follows that we are unlikely to find much variation in populations for such mutations. Selection will quickly eliminate them. Natural selection is primarily purifying. Although natural selection for variants that improve function is ultimately the source of adaptation, even at the molecular level, most of the time selection is simply eliminating variants that are less fit than the norm, not promoting the fixation of new variants that increase fitness. Alleles enhancing fitness are rapidly incorporated. 5 They do not remain polymorphic for long, so we aren t likely to find them when they re polymorphic. As we ll see, even these revisions aren t entirely sufficient, but what we do from here on out is more to provide refinements and clarifications than to undertake wholesale revisions. 5 To be more precise I should have written Alleles enhancing fitness are rapidly incorporated, when they are not lost quickly as a result of genetic drift. 5

References [1] The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature, 449(7164):851 861, 2007. [2] J C Fay and C.-I. Wu. Hitchhiking under positive Darwinian selection. Genetics, 155:1405 1413, 2000. [3] Y X Fu. Statistical properties of segregating sites. Theoretical Population Biology, 48:172 197, 1995. [4] Y.-X. Fu. Statistical tests of neutrality of mutations against population growth, hitchhiking, and background selection. Genetics, 147:915 925, 1997. [5] Feng Guo, Dipak K Dey, and Kent E Holsinger. A Bayesian hierarchical model for analysis of SNP diversity in multilocus, multipopulation samples. Journal of the American Statistical Association, 104(485):142 154, March 2009. [6] J L Hubby and R C Lewontin. A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura. Genetics, 54:577 594, 1966. [7] M Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature, 304:412 417, 1983. [8] M Kreitman and M Aguadé. Excess polymorphism at the alcohol dehydrogenase locus in Drosophila melanogaster. Genetics, 114:93 110, 1986. [9] M Kreitman and R R Hudson. Inferring the evolutionary history of the Adh and Adhdup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics, 127:565 582, 1991. [10] R C Lewontin and J L Hubby. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics, 54:595 609, 1966. [11] W.-H. Li. Molecular Evolution. Sinauer Associates, Sunderland, MA, 1997. [12] F Tajima. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123:585 595, 1989. [13] K Zeng, Y.-X. Fu, S Shi, and C.-I. Wu. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics, 174:1431 1439, 2006. 6

Creative Commons License These notes are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. 7