Amino Acids and Their Properties




Recap: ss-rRNA and mutations. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. ss-rRNA is typically used, so by aligning the ss-rRNA of one organism with that of another, we can estimate relatedness.

Amino Acid Substitutions. Recall that we can align DNA and RNA sequences. What does that mean? We can also align two amino acid sequences. Can 2 nucleotides partially match? Can 2 amino acids partially match?

Amino Acid Substitutions. Aligning sequences: Can 2 nucleotides partially match? Are some nucleotide mutations more significant than others? Can 2 amino acids partially match? Are some amino acid mismatches more significant than others?

Amino Acid Substitutions. Can 2 nucleotides partially match? Consider the significance of a nucleobase mutation: does the name (which base) matter? Does the location matter? Can 2 amino acids partially match? Consider the significance of an amino acid mutation: does the name matter? Does the location?

Sequence matching and evolution rate. Proteins tend to evolve more slowly than DNA. Many DNA changes have no effect on a protein: a changed codon may map to the same amino acid, and changes in non-coding DNA may have no effect. What does this mean for gauging the relatedness of humans and chimpanzees? Of humans and fish?

Sequence matching and evolution rate. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. What might rRNA matching be good for measuring the relatedness of? Humans and chimpanzees? Humans and fish? Humans and what?

Sequence matching and evolution rate. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. ss-rRNA is typically used (what's that?). However, different regions of ss-rRNA mutate at different rates. (Ribosome images next.)

The Ribosome (image). Source: www.buzzle.com/articles/ribosomesfunction.html

Ribosomes: diagrams and images. Check images.google.com for "ribosome diagram" and "ribosome structure". Videos include http://www.youtube.com/watch?v=id7tdar39ow

Recap: ss-rRNA and mutations. Ribosomal RNA (rRNA) evolves very slowly, much more slowly than proteins. Small-subunit rRNA (ss-rRNA) is typically used, so by aligning the ss-rRNA of one organism with that of another, we can estimate relatedness.

Relatedness and Mutations. Much DNA mutates relatively quickly. Much ss-rRNA mutates relatively slowly. Much protein mutates at intermediate rates. Let's focus on protein mutation next.

Amino acid substitutions. Some amino acid substitutions are more likely than others. Why?

Amino acid substitutions. Some amino acid substitutions are more likely than others. Why? Some amino acids are closer to others in terms of their nucleobase codons. Some are closer in terms of the resulting protein function.
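One way to make "closer in terms of nucleobase codons" concrete is to count the fewest single-base changes needed to turn a codon for one amino acid into a codon for another. The sketch below assumes the standard genetic code written with DNA bases; the function names are illustrative and not from the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order
# (first base, then second, then third); '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_for(amino_acid):
    """All codons that translate to the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]

def min_base_changes(aa1, aa2):
    """Fewest base differences between any codon for aa1 and any codon for aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )

print(min_base_changes("D", "E"))  # aspartate vs. glutamate
print(min_base_changes("W", "D"))  # tryptophan vs. aspartate
```

Pairs that are one base change apart can be interconverted by a single point mutation, which is one reason some substitutions are seen more often than others.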

Amino acid substitutions II. Substituting similar ones is likely to retain the protein structure and function. Substituting dissimilar ones is likely to change the protein structure and function. Similarity of amino acids means what?

Amino acid substitutions III. Similarity of amino acids means similar physicochemical properties. Physicochemical: concerning the physical and chemical; concerning physical chemistry. Physical chemistry: connecting macroscopic properties of substances with their molecular properties.

Amino acid physicochemical properties. Nonpolar (hydrophobic): ACFGILMPVW. Polar (hydrophilic): NQSTY. Aromatic: FHWY (having to do with 6-carbon rings). Basic: HKR. Acidic: DE. (See http://www.bio.davidson.edu/courses/genomics/jmol/aatable.html.) By way of contrast, can anyone think of a non-physicochemical property of some amino acids?
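The groupings above can be written down as sets, which gives a crude similarity test: two amino acids are "similar" if they share at least one class. This is only a sketch of the idea using the class lists from the slide; the helper name is made up for illustration.

```python
# Class membership exactly as listed on the slide (one-letter codes).
CLASSES = {
    "nonpolar": set("ACFGILMPVW"),
    "polar": set("NQSTY"),
    "aromatic": set("FHWY"),
    "basic": set("HKR"),
    "acidic": set("DE"),
}

def shared_classes(aa1, aa2):
    """Names of the classes that contain both amino acids, in alphabetical order."""
    return sorted(
        name for name, members in CLASSES.items()
        if aa1 in members and aa2 in members
    )

print(shared_classes("D", "E"))  # both acidic
print(shared_classes("F", "W"))  # both aromatic and nonpolar
print(shared_classes("D", "K"))  # acidic vs. basic: nothing in common
```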

Aromatic: a special type of ring-shaped molecule, characterized by an unusual stabilizing property. Aliphatic: non-aromatic.

Amino acid abbreviations. G = glycine, P = proline, T = threonine, A = alanine, ..., but why the following? F = phenylalanine, Y = tyrosine, N = asparagine, Q = glutamine, W = tryptophan.

Scoring protein sequence alignments. The simple way: two matching (identical) amino acids score 1; two mismatching (non-identical) ones score 0. Goal: maximize the % of matching amino acids. This works well for very similar sequences. Example: CADQH vs. CADPM. Alignment score = ?
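The simple match/mismatch scheme is easy to try out. The following is a minimal sketch for two already-aligned, equal-length sequences without gaps; the function names are illustrative.

```python
def identity_score(seq1, seq2):
    """Score 1 for each identical position and 0 for each mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(a == b for a, b in zip(seq1, seq2))

def percent_identity(seq1, seq2):
    """Percentage of aligned positions that are identical."""
    return 100.0 * identity_score(seq1, seq2) / len(seq1)

print(identity_score("CADQH", "CADPM"))
print(percent_identity("CADQH", "CADPM"))
```

Note that Q/P and H/M are penalized exactly as heavily as any other mismatch, however chemically similar or dissimilar the pair is.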

Scoring protein sequence alignments II. The simple way ignores the degree of similarity; it is better to account for it! Solution: substitution matrices. The PAM matrix (from "Accepted Point Mutation", but "PAM" is easier to say than "APM") was developed in the 1970s by Margaret Dayhoff. The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one?

Scoring protein sequence alignments II. Substitution matrices: the PAM matrix (from "Accepted Point Mutation", but "PAM" is easier to say than "APM"). The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one? The PAM2 matrix: not 2%! Rather, 1%, twice. What is the difference?

Scoring protein sequence alignments II. Substitution matrices: the PAM matrix (from "Accepted Point Mutation", but "PAM" is easier to say than "APM"). The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one? The PAM250 matrix: not 250%, obviously. Why obviously? It is 1%, repeated 250 times!

Scoring protein sequence alignments II. Substitution matrices: the PAM matrix (from "Accepted Point Mutation", but "PAM" is easier to say than "APM"). The PAM1 matrix answers the question: if 1% of the amino acids in a sequence change, at what rates would each amino acid be substituted for each other one? The PAM250 matrix: it is 1%, repeated 250 times! The BLOSUM matrix is another popular type.

Scoring protein sequences: here is PAM250 (matrix image). Source: http://bioinfo.cnio.es/docus/courses/sek2003filogenias/seq_analysis/pam250matrix.gif. Example: CADQH vs. CADPM. Alignment score = ?

Scoring protein sequences: BLOSUM62 (default in BLAST 2.0) (matrix image). Source: http://bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/pairwise.html
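Scoring an ungapped alignment with a substitution matrix just means looking up each aligned pair and adding the entries. The sketch below holds only the five BLOSUM62 entries needed for the CADQH / CADPM example; these values are assumed to match the commonly distributed BLOSUM62 table and should be checked against the matrix you actually use. The function names are illustrative.

```python
# Partial BLOSUM62 table: only the pairs needed for this example.
# Values are assumed to match the standard table; verify before relying on them.
# The matrix is symmetric, so each pair is stored once with its letters sorted.
BLOSUM62_SUBSET = {
    ("C", "C"): 9,
    ("A", "A"): 4,
    ("D", "D"): 6,
    ("P", "Q"): -1,
    ("H", "M"): -2,
}

def pair_score(a, b, matrix):
    """Look up the score for one aligned pair, ignoring the order of the letters."""
    return matrix[tuple(sorted((a, b)))]

def alignment_score(seq1, seq2, matrix):
    """Sum of substitution scores over an ungapped, equal-length alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))

print(alignment_score("CADQH", "CADPM", BLOSUM62_SUBSET))
```

Unlike the match/mismatch scheme, identical pairs do not all score the same, and mismatches can be mildly or strongly penalized.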

Why do self substitutions have the highest numbers?

Why use PAM, BLOSUM, etc.? Sequence similarity is related to evolutionary distance. Simple base matching (match or not) may work OK for closely related organisms: humans and chimps, for example. Amino acid matching works better as evolutionary distance increases (why?). We'd like to be able to assess the relatedness of organisms that diverged long ago: humans and worms, for example.

Relatedness Long Ago. See images.google.com for "domains of life". We still are not sure, but the 3-domain system seems likely. But cladistics demands binary splits, so 3 domains require 2 splits, and 2 of the domains are more closely related to each other than to the 3rd.

Why use PAM, BLOSUM (II). Organisms that diverged long ago have divergent analogous amino acid sequences. Since different amino acid substitutions occur at different frequencies, we can measure relatedness farther back, e.g. when the fraction of identical amino acids is surprisingly low and the fraction of identical base pairs is even lower.

Comparing Sequences with PAMs (+ recap)

What does PAM mean? PAM is considered an acronym for: Point Accepted Mutation; Accepted Point Mutation (original); Percent Accepted Mutations. A point mutation is a substitution of 1 amino acid for another. An accepted mutation is one that is passed down through the generations. Will a mutation be accepted if it is helpful? Harmful? Neutral? Helpful in some circumstances, harmful in others?

What Does PAM Mean, cont. PAM has two meanings: PAM is a unit of evolutionary time, and PAM is a kind of substitution matrix. (The meanings are related.)

PAM as a Unit of Time. A PAM is the amount of evolutionary change resulting in 1 amino acid mutation per 100 amino acids. It is an average over >>100 amino acids, because mutations have randomness. After 1 PAM, will an organism have exactly 1% of its amino acids different from what they started out as?

PAM, Evolution, and Gaps. PAM ignores insertions, deletions, and silent nucleotide substitutions (which are?). PAM counts a change from A to B and back to A as 2 accepted point mutations. 2 sequences 200 PAMs apart will have about 25% of their amino acids the same!

PAM Matrices. They describe the substitutability of amino acids, based on empirical evidence (empirical = experiential). The matrices are derived from repositories of actual homologous sequences. A PAM1 matrix is geared to best compare 2 sequences that are 1 PAM apart. A PAM250 matrix is good for comparing quite diverged sequences. The PAM250 matrix is standard.

Creating a PAM Matrix. Let $f_i$ be the frequency of amino acid $i$. We express $f_i$ as a fraction of the total: $f_i = \frac{\text{instances of } i}{\text{instances of any amino acid}}$. Frequencies range from 0.091 (L) down to 0.014 (W), so the most common amino acid occurs about 6.5 times more commonly than the least.
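The frequency definition can be sketched directly: count each amino acid and divide by the total. The example sequence is made up purely for illustration, and the function name is hypothetical; real frequencies come from a large collection of protein sequences.

```python
from collections import Counter

def amino_acid_frequencies(sequences):
    """Return f_i = (instances of i) / (instances of any amino acid)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

# Hypothetical toy data, not real protein statistics.
freqs = amino_acid_frequencies(["LLAW", "LAGL"])
print(freqs)
print(max(freqs.values()) / min(freqs.values()))  # most common vs. least common
```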

Creating a PAM matrix, cont. Determine the mutabilities of the amino acids: some amino acids tend to change easily, others do not. If alanine's mutability is set to 100, serine's mutability is 117 (highest, 1991 data) and tryptophan's mutability is 25 (lowest, 1991 data). Let's look more closely at $m_i$...

Creating a PAM matrix, cont. Mutability is a number. Given an evolutionary interval of 1 PAM, let $m_i = \frac{\#\text{ mutations of amino acid } i}{\#\text{ instances of amino acid } i}$. Alternatively, $m_i = p(\text{an instance of } i \text{ mutates})$.

Are the formulas on the previous slide identical?

Creating a PAM matrix, cont. Next, we break $m_i$ into its constituent $m_{i,j}$ values. That is, $i$ mutates, but into $j$ at what rate? Use actual data from observed mutations. Populate a matrix of probabilities.

The Diagonal. Values on the matrix diagonal do not really describe $i$ mutating into itself! (In reality, can that happen?) They basically show $p(i \text{ does not mutate})$. Thus, the columns add up to 1.
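To see how the mutabilities, the per-pair rates, and the diagonal fit together, here is a minimal sketch that builds a small mutation probability matrix from counts. The three-letter alphabet and all counts are hypothetical, chosen so that roughly 1% of instances change; the published matrices are built from much larger data sets with additional normalization that is not shown here.

```python
def mutation_matrix(instances, changes):
    """Build M where M[new][old] = p(an instance of old is replaced by new).

    instances: {amino acid: number of instances observed}
    changes:   {(old, new): number of observed replacements of old by new}
    """
    acids = sorted(instances)
    matrix = {new: {old: 0.0 for old in acids} for new in acids}
    for old in acids:
        mutated = 0
        for new in acids:
            if new != old:
                count = changes.get((old, new), 0)
                matrix[new][old] = count / instances[old]  # rate of old -> new
                mutated += count
        # Diagonal: probability that old does not mutate, i.e. 1 - mutability.
        matrix[old][old] = 1.0 - mutated / instances[old]
    return matrix

# Hypothetical counts for illustration only.
instances = {"A": 1000, "S": 800, "T": 600}
changes = {("A", "S"): 6, ("A", "T"): 4, ("S", "A"): 6,
           ("S", "T"): 4, ("T", "A"): 3, ("T", "S"): 3}
M = mutation_matrix(instances, changes)

for old in instances:
    column_sum = sum(M[new][old] for new in instances)
    print(old, M[old][old], column_sum)
```

Each column describes everything that can happen to one original amino acid, which is why each column sums to 1.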

Is the matrix on the last slide symmetric? Are about 1% of the amino acids changed?

PAM0. What do you think a PAM0 matrix might look like?

PAMn. Use matrix multiplication: PAM2 = PAM1 x PAM1, PAM3 = PAM2 x PAM1. PAM250? Multiply 250 copies of PAM1 together!
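Repeated multiplication is easy to experiment with on a toy matrix. The sketch below uses a hypothetical 2-letter alphabet in which each letter has a 1% chance of changing per step; it is not a real PAM1 matrix, and the helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(m, n):
    """Return m multiplied by itself n times (n = 0 gives the identity)."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Hypothetical toy "PAM1": two letters, each changing with probability 0.01.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

pam2 = mat_power(TOY_PAM1, 2)
pam250 = mat_power(TOY_PAM1, 250)
print(pam2[0][0])    # slightly more than 0.98: some changes are undone
print(pam250[0][0])  # still well above 0, even after 250 steps
```

After two steps the unchanged fraction is a little more than 98%, because a letter that changed in the first step can change back in the second. That is the difference between "2%" and "1%, twice".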

PAM What do you imagine a PAM matrix might look sort of like?

Logarithmicize. Actually, we take logarithms to get the usual matrix from the probability matrices. First, build another, reference matrix of expected probabilities. Assume all amino acids are equally mutable. Also assume they mutate into each other in proportion to their frequencies (i.e., overall amino acid frequencies are maintained, but otherwise they don't care what they mutate into).

Logarithmicize. Now we have two matrices. Make a 3rd, in which each entry is the ratio $\frac{\text{observed probability}}{\text{expected probability}}$ (we're comparing reality to what we would see if mutations were truly random). Take the log (base 10) of each entry to make a 4th. An entry of 1 means 10x more mutations of that type than expected. An entry of -1 means what?
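The ratio-then-logarithm step can be sketched in a few lines. The probabilities below are hypothetical, and the function names are illustrative; published matrices are usually also scaled and rounded to integers, which this sketch leaves out.

```python
import math

def log_odds(observed, expected):
    """Base-10 log of observed probability over expected probability."""
    return math.log10(observed / expected)

def log_odds_matrix(observed, expected):
    """Apply log_odds entry by entry to two matrices given as lists of rows."""
    return [[log_odds(o, e) for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(observed, expected)]

# Hypothetical probabilities for illustration only.
print(log_odds(0.10, 0.01))   # observed 10x more often than expected
print(log_odds(0.02, 0.02))   # observed exactly as often as expected
print(log_odds(0.001, 0.01))  # observed 10x less often than expected
```

Positive entries mark substitutions seen more often than chance would predict, negative entries mark those seen less often, and 0 means no difference from the random reference.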

Carrying On. We now use the matrix to measure relative evolutionary distance.