Amino Acids and Their Properties
Recap: ss-rRNA and mutations. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. ss-rRNA is typically used, so by aligning the ss-rRNA of one organism with that of another, we can estimate relatedness.
Amino Acid Substitutions: Recall that we can align DNA and RNA sequences. What does that mean? We can also align two amino acid sequences. Can 2 nucleotides partially match? Can 2 amino acids partially match?
Amino Acid Substitutions (aligning sequences): Can 2 nucleotides partially match? Are some nucleotide mutations more significant than others? Can 2 amino acids partially match? Are some amino acid mismatches more significant than others?
Amino Acid Substitutions: Can 2 nucleotides partially match? Significance of a nucleobase mutation: does name matter? Does location matter? Can 2 amino acids partially match? Significance of an amino acid mutation: name? Location?
Sequence matching and evolution rate: Proteins tend to evolve more slowly than DNA. Many DNA changes have no effect on a protein: a changed codon may map to the same amino acid, and changes in non-coding DNA may have no effect. What does this mean for gauging the relatedness of humans and chimpanzees? Of humans and fish?
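A minimal sketch of the "changed codon, same amino acid" point. The table is only a small excerpt of the standard genetic code, and the function name `classify_change` is made up for this illustration.

```python
# Small excerpt of the standard genetic code (DNA codon -> one-letter amino acid).
CODON_TABLE = {
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "GAT": "D", "GAC": "D",                          # aspartate
    "GAA": "E", "GAG": "E",                          # glutamate
}

def classify_change(codon_before: str, codon_after: str) -> str:
    """Label a single-codon change as silent or as an amino acid change."""
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    if aa_before == aa_after:
        return "silent"
    return f"{aa_before}->{aa_after}"

print(classify_change("CTT", "CTC"))  # DNA changed, protein unchanged
print(classify_change("GAT", "GAA"))  # DNA changed, protein changed
```

Both calls involve a single nucleotide change, but only the second one is visible at the protein level, which is one reason protein sequences diverge more slowly than the DNA that encodes them.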
Sequence matching and evolution rate: Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. What might rRNA matching be good for measuring the relatedness of? Humans and chimpanzees? Humans and fish? Humans and what?
Sequence matching and evolution rate: Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. ss-rRNA is typically used (what's that?). However, different regions of ss-rRNA mutate at different rates. (Ribosome images next.)
The Ribosome. Source: www.buzzle.com/articles/ribosomesfunction.html
Ribosomes: diagrams and images. Check images.google.com for "Ribosome diagram" and "Ribosome structure". Videos include http://www.youtube.com/watch?v=id7tdar39ow
Recap: ss-rRNA and mutations. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. ss-rRNA is typically used, so by aligning the ss-rRNA of one organism with that of another, we can estimate relatedness.
Relatedness and Mutations: Much DNA mutates relatively quickly. Much ss-rRNA mutates relatively slowly. Much protein mutates at intermediate rates. Let's focus on protein mutation next.
Amino acid substitutions: Some amino acid substitutions are more likely than others. Why?
Amino acid substitutions: Some amino acid substitutions are more likely than others. Why? Some amino acids are closer to others in terms of nucleobase codons. Some are closer in terms of resulting protein function.
Amino acid substitutions II: Substituting similar amino acids is likely to retain the protein structure and function. Substituting dissimilar ones is likely to change the protein structure and function. Similarity of amino acids means what?
Amino acid substitutions III: Similarity of amino acids means similar physicochemical properties. Physicochemical: concerning the physical and chemical; concerning physical chemistry. Physical chemistry: connecting macroscopic properties of substances with their molecular properties.
Amino acid physicochemical properties: Nonpolar (hydrophobic): ACFGILMPVW. Polar (hydrophilic): NQSTY. Aromatic: FHWY (having to do with ring structures such as the 6-carbon benzene ring). Basic: HKR. Acidic: DE. (See http://www.bio.davidson.edu/courses/genomics/jmol/aatable.html) By way of contrast, can anyone think of a nonphysicochemical property of some amino acids?
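One rough way to make "similar amino acids" concrete is to ask which of the groups listed on this slide two residues share. The sketch below uses exactly the slide's groupings; the names `GROUPS` and `shared_groups` are invented for the example, and other textbooks draw the group boundaries a little differently.

```python
# Groupings copied from the slide (one-letter amino acid codes).
GROUPS = {
    "nonpolar": set("ACFGILMPVW"),
    "polar": set("NQSTY"),
    "aromatic": set("FHWY"),
    "basic": set("HKR"),
    "acidic": set("DE"),
}

def shared_groups(aa1: str, aa2: str) -> list:
    """Return the sorted names of the groups that contain both amino acids."""
    return sorted(
        name for name, members in GROUPS.items()
        if aa1 in members and aa2 in members
    )

print(shared_groups("F", "W"))  # both aromatic and nonpolar
print(shared_groups("D", "K"))  # acidic vs. basic: nothing in common
```

A substitution between residues that share groups (F and W) is more likely to preserve protein function than one between residues that share none (D and K).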
Aromatic: a special type of ring-shaped molecule, characterized by an unusual stabilizing property. Aliphatic: non-aromatic.
Amino acid abbreviations: G = glycine, P = proline, T = threonine, A = alanine, but why the following? F = phenylalanine, Y = tyrosine, N = asparagine, Q = glutamine, W = tryptophan.
Scoring protein sequence alignments: The simple way: two matching (identical) amino acids score 1; two mismatching (non-identical) ones score 0. Goal: maximize the % of matching amino acids. This works well for very similar sequences. Example: CADQH aligned with CADPM. Alignment score = ?
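The simple match/mismatch scheme can be sketched in a few lines. The function names are made up for this example, and the two sequences are assumed to be already aligned, with equal length and no gaps.

```python
def identity_score(seq_a: str, seq_b: str) -> int:
    """Score 1 for each identical aligned pair and 0 for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Express the identity score as a percentage of aligned positions."""
    return 100.0 * identity_score(seq_a, seq_b) / len(seq_a)

print(identity_score("CADQH", "CADPM"))    # 3
print(percent_identity("CADQH", "CADPM"))  # 60.0
```

Note that Q/P and H/M both count as plain mismatches here, however similar or dissimilar the residues are.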
Scoring protein sequence alignments II: The simple way ignores degree of similarity; it is better to account for it! Solution: substitution matrices. The PAM (Accepted Point Mutation, but "PAM" is easier to say than "APM") matrix was developed in the 1970s by Margaret Dayhoff. The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?
Scoring protein sequence alignments II: Substitution matrices. The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one? PAM2 matrix: not 2%! Rather, 1%, twice. What is the difference?
Scoring protein sequence alignments II: Substitution matrices. PAM250 matrix: not 250%, obviously. Why obviously? It is 1%, repeated 250 times!
Scoring protein sequence alignments II: Substitution matrices. PAM250 matrix: it is 1%, repeated 250 times! The BLOSUM matrix is also a popular type.
Scoring protein sequences: here is PAM250 (source: http://bioinfo.cnio.es/docus/courses/sek2003filogenias/seq_analysis/pam250matrix.gif). Example: CADQH aligned with CADPM. Alignment score = ?
Scoring protein sequences: BLOSUM62 (default in BLAST 2.0). Source: http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/pairwise.html
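The CADQH / CADPM example can be scored with a substitution matrix by summing one matrix entry per aligned pair. The sketch below hard-codes only the five PAM250 entries this example needs, transcribed by hand from the commonly published Dayhoff PAM250 table; treat them as an assumption and check them against the matrix image before reusing them. The names `PAM250_SUBSET` and `matrix_score` are invented here.

```python
# Hand-transcribed PAM250 entries (assumed values; verify against the table).
# Keys are alphabetically sorted pairs because the matrix is symmetric.
PAM250_SUBSET = {
    ("C", "C"): 12,
    ("A", "A"): 2,
    ("D", "D"): 4,
    ("P", "Q"): 0,
    ("H", "M"): -2,
}

def matrix_score(seq_a: str, seq_b: str, matrix: dict) -> int:
    """Sum the substitution score of every aligned pair (no gaps assumed)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        total += matrix[tuple(sorted((a, b)))]
    return total

print(matrix_score("CADQH", "CADPM", PAM250_SUBSET))  # 12 + 2 + 4 + 0 - 2 = 16
```

Unlike the match/mismatch scheme, the two mismatches contribute differently: with these values Q/P is neutral while H/M is penalized, and the rare C/C match counts for much more than A/A.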
Why do self substitutions have the highest numbers?
Why use PAM, BLOSUM, etc.? Sequence similarity is related to evolutionary distance. Simple base matching (match/not) may work OK for closely related organisms: humans and chimps, for example. Amino acid matching works better as evolutionary distance increases (why?). We'd like to be able to assess the relatedness of organisms that diverged long ago: humans and worms, for example.
Relatedness Long Ago: See images.google.com for domains of life. We still are not sure, but the 3-domain system seems likely. But cladistics demands binary splits, so 3 domains require 2 splits, and 2 of the domains are more closely related to each other than to the 3rd.
Why use PAM, BLOSUM (II): Organisms that diverged long ago have divergent analogous amino acid sequences. Since different amino acid substitutions occur at different frequencies, we can measure relatedness back farther, e.g. when the fraction of identical amino acids is surprisingly low and the fraction of identical base pairs is even lower.
Comparing Sequences with PAMs (+ recap)
What does PAM mean? PAM is considered an acronym for: Point Accepted Mutation; Accepted Point Mutation (original); Percent Accepted Mutations. A point mutation is a substitution of 1 amino acid for another. An accepted mutation is one that is passed down through the generations. Will a mutation be accepted if it is helpful? Harmful? Neutral? Helpful in some circumstances, harmful in others?
What Does PAM Mean, cont.: PAM has two meanings. PAM is a unit of evolutionary time. PAM is a kind of substitution matrix. (The meanings are related.)
PAM as a Unit of Time: A PAM is the amount of evolutionary change resulting in 1 amino acid mutation per 100 amino acids. It is an average over >>100 amino acids, because mutations have randomness. After 1 PAM, will an organism have exactly 1% of its amino acids different from what they started out as?
PAM, Evolution, and Gaps: PAM ignores insertions, deletions, and silent nucleotide substitutions (which are?). PAM counts a change from A to B and back to A as 2 accepted point mutations. 2 sequences 200 PAMs apart will have about 25% of amino acids the same!
PAM Matrices: They describe the substitutability of amino acids, based on empirical evidence (empirical = experiential). The matrices are derived from repositories of actual homologous sequences. A PAM 1 matrix is geared to best compare 2 sequences that are 1 PAM apart. A PAM 250 matrix is good for comparing quite diverged sequences. The PAM 250 matrix is standard.
Creating a PAM Matrix: Let $f_i$ be the frequency of amino acid $i$, expressed as a fraction of the total: $f_i = \frac{\text{instances of } i}{\text{instances of any amino acid}}$. Frequencies range from 0.091 (L) down to 0.014 (W). The most common amino acid occurs about 6.5 times (0.091 / 0.014) more commonly than the least.
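The frequency definition on this slide amounts to counting residues. A minimal sketch, assuming the input is simply a list of protein sequences as strings; the function name and the two toy sequences are illustrative only, not real data.

```python
from collections import Counter

def amino_acid_frequencies(sequences):
    """Return f_i = (instances of i) / (instances of any amino acid)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)          # count each one-letter residue
    total = sum(counts.values())    # instances of any amino acid
    return {aa: n / total for aa, n in counts.items()}

# Toy input: 10 residues in total.
freqs = amino_acid_frequencies(["CADQH", "CADPM"])
print(freqs["C"])  # 2 of the 10 residues are C
```

With a real sequence database instead of the toy input, the same counting would yield values like the 0.091 (L) and 0.014 (W) quoted on the slide.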
Creating PAM matrix, cont.: Determine the mutabilities of the amino acids. Some amino acids tend to change easily, others not. If alanine's mutability is set to 100, serine's mutability is 117 (highest, 1991 data) and tryptophan's mutability is 25 (lowest, 1991). Let's look more closely at $m_i$...
Creating PAM matrix, cont.: Mutability is a number. Given an evolutionary interval of 1 PAM, let $m_i = \frac{\text{\# mutations of amino acid } i}{\text{\# instances of amino acid } i}$. Alternatively, $m_i = p(\text{an instance of } i \text{ mutates})$.
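As a rough illustration of the mutability formula, the sketch below divides mutation counts by instance counts and then rescales so that alanine is 100, as on the earlier mutability slide. All counts are made up for the example (they are not the 1991 data), and the function names are hypothetical.

```python
def mutabilities(mutation_counts, instance_counts):
    """m_i = (# mutations of amino acid i) / (# instances of amino acid i)."""
    return {
        aa: mutation_counts.get(aa, 0) / instance_counts[aa]
        for aa in instance_counts
    }

def relative_to_alanine(m):
    """Rescale mutabilities so that alanine's value is 100."""
    return {aa: 100.0 * value / m["A"] for aa, value in m.items()}

# Hypothetical counts over one PAM of evolutionary change.
instances = {"A": 1000, "S": 1000, "W": 1000}
mutations = {"A": 10, "S": 12, "W": 2}

m = mutabilities(mutations, instances)
relative = relative_to_alanine(m)
print(m["A"])                 # 0.01, i.e. 1 in 100 alanines mutated
print(round(relative["S"]))   # 120 on the alanine = 100 scale
```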
Are the formulas on the previous slide identical?
Creating PAM matrix, cont.: Next, we break $m_i$ into its constituent $m_{i,j}$'s. That is, $i$ mutates, but into $j$ at what rate? Use actual data from observed mutations to populate a matrix of probabilities.
The Diagonal: Values on the matrix diagonal do not really describe $i$ mutating into itself! (In reality, can that happen?) They basically show $p(i \text{ does not mutate})$. Thus, the columns add up to 1.
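A toy version of this construction, assuming a three-letter alphabet and invented numbers throughout: each column belongs to an original amino acid i, the off-diagonal entries split m_i among the possible replacements, and the diagonal holds the probability that i does not mutate. The names `MUTABILITY`, `TARGET_SHARE`, and `build_mutation_matrix` are hypothetical.

```python
AMINO_ACIDS = ["A", "S", "W"]  # toy alphabet, not the real 20

# m_i: probability that an instance of i mutates in 1 PAM (made-up values).
MUTABILITY = {"A": 0.010, "S": 0.012, "W": 0.002}

# Given that i mutates, the fraction of mutations going to each j (made up;
# each inner dictionary sums to 1).
TARGET_SHARE = {
    "A": {"S": 0.90, "W": 0.10},
    "S": {"A": 0.95, "W": 0.05},
    "W": {"A": 0.50, "S": 0.50},
}

def build_mutation_matrix():
    """Return matrix[j][i] = P(original i is replaced by j)."""
    matrix = {j: {} for j in AMINO_ACIDS}
    for i in AMINO_ACIDS:
        for j in AMINO_ACIDS:
            if i == j:
                matrix[j][i] = 1.0 - MUTABILITY[i]                 # i stays i
            else:
                matrix[j][i] = MUTABILITY[i] * TARGET_SHARE[i][j]  # m_{i,j}
    return matrix

matrix = build_mutation_matrix()
column_sums = {i: sum(matrix[j][i] for j in AMINO_ACIDS) for i in AMINO_ACIDS}
print(column_sums)  # each column sums to 1, up to floating-point rounding
```

Each column sums to 1 because an instance of i either stays i or becomes exactly one other amino acid.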
Is the matrix on the last slide Symmetric? Are there about 1% changed?
PAM0 What do you think a PAM 0 matrix might look like?
PAMn: Use matrix multiplication. PAM2 = PAM1 x PAM1. PAM3 = PAM2 x PAM1. PAM250? Do it 250 times!
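Repeated multiplication can be sketched with a toy 2 x 2 matrix in which each of two hypothetical residues has a 1% chance of changing per PAM. The matrix is made up purely for illustration (a real PAM1 matrix is 20 x 20), and the helper names are invented.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def mat_power(m, n):
    """Return m multiplied by itself n times; n = 0 gives the identity."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Toy "PAM1": 99% chance of staying the same, 1% chance of switching.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam2 = mat_power(TOY_PAM1, 2)
toy_pam250 = mat_power(TOY_PAM1, 250)
print(toy_pam2[0][0])    # about 0.9802, not 0.98
print(toy_pam250[0][0])  # about 0.503 in this two-letter toy
```

The PAM2 diagonal is slightly above 0.98 because a residue can change and then change back, which is why "1%, twice" is not the same as 2%. The same back-mutation effect is why heavily diverged sequences still retain some identical positions.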
PAM What do you imagine a PAM matrix might look sort of like?
Logarithmicize: Actually, we take logarithms to get the usual matrix from the probability matrices. First, build another, reference matrix of expected probabilities. Assume all amino acids are equally mutable. Also assume they mutate into each other in proportion to their frequencies (i.e., overall amino acid frequencies are maintained, but otherwise they don't care what they mutate into).
Logarithmicize: Now we have two matrices. Make a 3rd, in which each entry is $\frac{\text{observed probability}}{\text{expected probability}}$; we're comparing reality to what we would see if mutations were truly random. Take the log of each entry to make a 4th. An entry of 1 means 10x more mutations of that type than expected. An entry of -1 means what?
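The last two steps (ratio, then logarithm) can be sketched for a single matrix entry. A base-10 logarithm is assumed here because the slide equates an entry of 1 with a 10x excess; the probabilities are invented, and `log_odds` is a hypothetical helper name.

```python
import math

def log_odds(observed: float, expected: float) -> float:
    """Return log10(observed / expected) for one substitution."""
    return math.log10(observed / expected)

# Made-up probabilities for three kinds of substitution.
print(log_odds(0.10, 0.01))   # 10x more often than expected
print(log_odds(0.01, 0.01))   # exactly as often as expected
print(log_odds(0.001, 0.01))  # 10x less often than expected
```

Positive entries mark substitutions seen more often than chance would predict, zero marks chance level, and negative entries mark substitutions that are rarer than chance.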
Carrying On We now use the matrix to measure relative evolutionary distance