Lecture 19: Proteins, Primary Struture



Similar documents
The peptide bond is rigid and planar

CSC 2427: Algorithms for Molecular Biology Spring Lecture 16 March 10

Built from 20 kinds of amino acids

(c) How would your answers to problem (a) change if the molecular weight of the protein was 100,000 Dalton?

This class deals with the fundamental structural features of proteins, which one can understand from the structure of amino acids, and how they are

The peptide bond Peptides and proteins are linear polymers of amino acids. The amino acids are

Amino Acids. Amino acids are the building blocks of proteins. All AA s have the same basic structure: Side Chain. Alpha Carbon. Carboxyl. Group.

Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK

Combinatorial Biochemistry and Phage Display

Biological Molecules

PROTEINS THE PEPTIDE BOND. The peptide bond, shown above enclosed in the blue curves, generates the basic structural unit for proteins.

Disulfide Bonds at the Hair Salon

4. Which carbohydrate would you find as part of a molecule of RNA? a. Galactose b. Deoxyribose c. Ribose d. Glucose

Role of Hydrogen Bonding on Protein Secondary Structure Introduction

How To Understand The Chemistry Of Organic Molecules

18.2 Protein Structure and Function: An Overview

Myoglobin and Hemoglobin

Chapter 5: The Structure and Function of Large Biological Molecules

Lecture Overview. Hydrogen Bonds. Special Properties of Water Molecules. Universal Solvent. ph Scale Illustrated. special properties of water

Peptide Bonds: Structure

Recap. Lecture 2. Protein conformation. Proteins. 8 types of protein function 10/21/10. Proteins.. > 50% dry weight of a cell

A disaccharide is formed when a dehydration reaction joins two monosaccharides. This covalent bond is called a glycosidic linkage.

Part A: Amino Acids and Peptides (Is the peptide IAG the same as the peptide GAI?)

Proteins. Proteins. Amino Acids. Most diverse and most important molecule in. Functions: Functions (cont d)

Hydrogen Bonds The electrostatic nature of hydrogen bonds

Peptide Bond Amino acids are linked together by peptide bonds to form polypepetide chain.

The Molecules of Cells

Amino Acids and Proteins

Paper: 6 Chemistry University I Chemistry: Models Page: 2 of Which of the following weak acids would make the best buffer at ph = 5.0?

Carbohydrates, proteins and lipids

Molecular Genetics. RNA, Transcription, & Protein Synthesis

Peptide bonds: resonance structure. Properties of proteins: Peptide bonds and side chains. Dihedral angles. Peptide bond. Protein physics, Lecture 5

Disaccharides consist of two monosaccharide monomers covalently linked by a glycosidic bond. They function in sugar transport.

Chapter 3. Protein Structure and Function

Chapter 5. The Structure and Function of Macromolecule s

Helices From Readily in Biological Structures

Proteins and Nucleic Acids

Chapter 3 Molecules of Cells

Basic Concepts of DNA, Proteins, Genes and Genomes

RNA & Protein Synthesis

Chapter 3: Biological Molecules. 1. Carbohydrates 2. Lipids 3. Proteins 4. Nucleic Acids

Shu-Ping Lin, Ph.D.

Protein Physics. A. V. Finkelstein & O. B. Ptitsyn LECTURE 1

Introduction to Proteins and Enzymes

NO CALCULATORS OR CELL PHONES ALLOWED

Lecture Conformation of proteins Conformation of a protein three-dimensional structure native state. native condition

8/20/2012 H C OH H R. Proteins

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

Elements in Biological Molecules

MCAT Organic Chemistry - Problem Drill 23: Amino Acids, Peptides and Proteins

Structure of proteins

PHYSIOLOGY AND MAINTENANCE Vol. II - On The Determination of Enzyme Structure, Function, and Mechanism - Glumoff T.

IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon. V. Polypeptides and Proteins

Introduction to Protein Folding

Bioinformatics for Biologists. Protein Structure

Structure and Function of DNA

Previously published in Biophysical Society On-line Textbook PROTEINS CHAPTER 1. PROTEIN STRUCTURE. Section 1. Primary structure, secondary motifs,

Biochemistry of Cells

Section I Using Jmol as a Computer Visualization Tool

FTIR Analysis of Protein Structure

Worksheet Chapter 13: Human biochemistry glossary

A. A peptide with 12 amino acids has the following amino acid composition: 2 Met, 1 Tyr, 1 Trp, 2 Glu, 1 Lys, 1 Arg, 1 Thr, 1 Asn, 1 Ile, 1 Cys

Translation Study Guide

Structure Determination

Lab 3 Organic Molecules of Biological Importance

Biological molecules:

Acidic amino acids: Those whose side chains can carry a negative charge at certain ph values. Typically aspartic acid, glutamic acid.

Chapter 11: Molecular Structure of DNA and RNA

In addition to being shorter than a single bond, the double bonds in ethylene don t twist the way single bonds do. In other words, the other atoms

Name: Hour: Elements & Macromolecules in Organisms

Structure and properties of proteins. Vladimíra Kvasnicová

Ionization of amino acids

LECTURE-2. Basics of Amino acids and Proteins HANDOUT. Proteins are the most complex and versatile macromolecules comprised of amino acids

1 Peptide bond rotation

The Steps. 1. Transcription. 2. Transferal. 3. Translation

Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?


Linear Sequence Analysis. 3-D Structure Analysis

Recognizing Organic Molecules: Carbohydrates, Lipids and Proteins

Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in

Translation. Translation: Assembly of polypeptides on a ribosome

Problem Set 1 KEY

Carbon-organic Compounds

INTRODUCTION TO PROTEIN STRUCTURE

Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+

Protein 3D-structure analysis. why and how

Ms. Campbell Protein Synthesis Practice Questions Regents L.E.

PRACTICE TEST QUESTIONS

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure enzymes control cell chemistry ( metabolism )

Chapter 12 - Proteins

Proteins the primary biological macromolecules of living organisms

Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler

Kaustubha Qanungo Ph.D Biological Sciences Trident Technical College 7000 Rivers Avenue Charleston SC 29464

Transcription and Translation of DNA

Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.

Structural Bioinformatics (C3210) Experimental Methods for Macromolecular Structure Determination

Molecular Docking. - Computational prediction of the structure of receptor-ligand complexes. Receptor: Protein Ligand: Protein or Small Molecule

AP BIOLOGY 2008 SCORING GUIDELINES

LESSON 5. Learning to Use Cn3D: A Bioinformatics Tool. Introduction. Learning Objectives. Key Concepts

Transcription:

CPS260/BGT204.1 Algorithms in Computational Biology November 04, 2003 Lecture 19: Proteins, Primary Struture Lecturer: Pankaj K. Agarwal Scribe: Qiuhua Liu 19.1 The Building Blocks of Protein [1] Proteins are polypeptide chains obtained by translation from strands of messenger RNA. The functional properties of proteins depend on their three-dimensional structures. The three-dimensional structure arises because particular sequences of amino acids in polypeptide chains fold to generate, from linear chains, compact domains with specific three-dimensional structures(figure 19.1). Totally there are 20 different amino acids. The amino acid sequence of a protein s polypeptide is called its primary structure. Different regions of the sequence form local regular secondary structures, such as alpha (α) helices or beta (β)strands. The tertiary structure is formed by packing such structural elements into one or several compact globular units called domains. The final protein may contain several polypeptide chains arranged in a quaternary struture. By formation of such tertiary and quaternary structure amino acids far apart in the sequence are brought close together in three dimensions to form a functional region. Figure 19.1: The protein structures [1] To understand the biological function of proteins we would like to be able to deduce or predict the threedimensional structure from the amino acid sequence. However, this folding problem is still unsolved and remains one of the most basic intellectual challenges in molecular biology. Instead, the three-dimensional structures of individual proteins are determined experimentally by x-ray crystallography, electron crystallography or neclear magnetic resonance (NMR) techniques. 19-1

Lecture 19: November 04, 2003 19-2 19.1.1 Proteins are polypeptide chains All of the 20 amino acids have in common a central carbon atom (C α ) to which are attached a hygrogen atom, an amino group (NH 2 ), and a carboxyl group (COOH) (Figure 19.2a). What distinguishes one amino acid from another is the side chain attached to the C α through its fourth valence. Amino acids are joined end-to-end during protein synthesis by the formation of peptide bonds which takes place when the carboxyl group of one amino acid condenses with the amino group of the next to eliminate water (Figure 19.2b). This process gets repeated as the chain elongates. The resulting repeating sequence of nitrogen, α-carbon and carbon atoms is the backbone or main chain of the protein. Amino acids that are linked into the polypeptide are referred to as residues. Figure 19.2: Proteins are built up by amino acids that are linked by peptide bonds. (a) Schematic diagram of an amino acids. (b) The amino acid residues are joined together by a peptide bond [2]. The four neighbours of an α-carbon, C α, are at the vertex positions of a tetrahedron around C α. This tetrahedron has two orientations, one being the mirror image of the other, as illustrated in Figure 19.3. The two oriented forms are referred to as isomers and distinguished by letters L and D. Only L-amino acids occur in nature as building blocks of proteins.

Lecture 19: November 04, 2003 19-3 Figure 19.3: The two isomers of an amino acids [3]. 19.1.2 The genetic code specifies 20 different amino acids As mentioned earlier, there are 20 different amino acid side chains in all, specified by the genetic code. The sequence of nucleotides is read in groups of three, called codons. Totally we have 4 3 = 64 codons (Figure 19.4). The 20 amino acids are usually divided into three different groups defined by the chemical nature of the side chain (Figure 19.5): hydrophobic, hydrophilic and in-between. Their names are abbreviated with a three-letter code. 19.1.3 Ramachandran Plot Since the peptide units are effectively rigid groups that are linked into a chain by covalent bonds at the C α atoms, the only degrees of freedom they have are rotations around these bonds. Each unit can rotate around two such bonds: the C α - C and the N-C α bonds. By convention, the angle of rotation around the N-C α bond is called phi (φ) and the angle around the C α -C bond from the same C α atoms is called psi (ψ). In this way, the conformation of the whole main chain is completely determined when the φ and ψ angles for each amino acids are defined. Most combinations of φ and ψ angles of an amino acids are not allowed because of steric collisions between the side chains and main chain. The angle pairs φ and ψ are usually plotted against each other in a diagram called a Ramachandran plot after the indian biophysicist G.N.Ramachandran who first made calculations of sterically allowed regions. Figure 19.6 shows the results of such calculation for all amino acids except glycline from a number of accurately determined protein structures. The major allowed regions in Figure 19.6 are the right-handed α -helical cluster (Figure 19.7) in the lower left quadrant; the broad region of extended β strands (Figure 19.7) of both parallel and antiparallel β structures in the upper left quadrant; and the small, sparsely popluated left-handed α-helical region in the upper right quadrant.

Lecture 19: November 04, 2003 19-4 Figure 19.4: The genetic code [2]. 19.1.4 Protein Structure Classification-CATH [5] The CATH is a protein structure database which curretly contains more than 1200 evolutionary superfamilies, constructed by both automatic and manual evaluation of structure relationships. The first level in the CATH hierachy describes the protein (C)lass; that is whether the strcture comprises mainly α-helices, mainly β-strands or a mixture of both. At the next (A)rchitectural level, proteins are grouped according to the orienetations of their secondary structures in 3-D. A large portion of structures adopt very simple layered architectures such as sandwiches (e.g. two or three-layer α-β proteins) or barrel-like arrangements (Figure 19.8). The (T)opology level or fold group then discriminates according to differences in the connectivities between the secondary structures in these architectures. 19.2 Representation of the Backbones for Proteins As mentioned in the first section, the sequence of C α atoms of a protein is called its backbone. The representation of the backbones are listed below. Coordinates of C α atoms If a protein has n amino acids, it will need 3n real numbers (x 1, y 1, z 1, x 2, y 2, z 2,..., x n, y n, z n ) to

Lecture 19: November 04, 2003 19-5 represent its backbone. The problem is that usually the proteins contain thousands of amino acids. Therefore, we need to compress the presentation. Coordinates of the first C α atom and (φ, ψ) angles for the rest If a protein has n amino acids, this representation will need 2n + 1 real numbers (x 1, y 1, z 1, φ 2, ψ 2,..., φ n, ψ n ). This method still needs a lot of space. HP model In the HP model [6], a protein is modelled as a sequence of hydrophobic (H) and hydrophilic (P) monomers. The sequence is grown into a two-dimensional lattice using a self-avoiding walk, and the resulting conformation is calculated by summing interactions between pairs of monomers that occupy adjacent lattice sites but are not covalently bounded (Figure 19.9). Each grid point has 4 neighbours, and for each of the C α atoms, 2 of its neighbours are occupied by its adjacent C α atoms. 19.3 Protein Structure Alignment 19.3.1 Structure similarity measure Many protein structure alignment algorithms use geometry for comparison purposes, but ignore the similarities in the enviroment of the residues. To account for the structure similarity, two different root mean square (RMS) values have been proposed, crms and drms. Given an alignment of proteins A and B (Figure 19.10), where the dashed lines represent the alignment between the two corresponding C α atoms i r and j r in A and B, r = 1,...k), the crms is defined as the norm of the distance vector of the alignment: crms = 1 k d k 2 (A(i r ) B(j r )), (19.1) r=1 where k is the number of atoms that are aligned, A(i r ) and B(j r ) are the transformation (including translation and rotation) of the atoms indexed i r and j r and d(.,.) represents the Euclidean distance. The drms measures the difference between the respecitve distance matrices of the alignment: 1 drms = (k d(a(i r ) A(i s )) d(b(j r ) B(j s )) 2) 2, (19.2) 1 r<s k 19.3.2 Structure alignment algorithm The alignment algorithm using crms and drms including two steps: Given an alignment, define the score of the alignment by crms and drms as, σ(a, B) = 1 1 + crms + G (19.3)

Lecture 19: November 04, 2003 19-6 and where G represents the gap penalty. σ(a, B) = Find an alignment that maximizes the score: -Translate and Rotate B 1 1 + drms + G (19.4) -For a fixed embedding of B, compute an alignment that maximize the score The above two steps can be done by dynamic programming. For drms, the score is not affacted by translation and rotation, therefore, in the second step we only need to find an alignment that maximizes the score (Equation 19.3); For crms, in the second step we need to find the transformation (translation and rotation), such that an aligment minimizes the score, which is done by the EM alogrithm: Fix an initial embedding of B; Find an alignment µ that maximizes the score (Equation 19.4); Find the translation and rotation for µ that maximize the score (Equation 19.4). Like all the EM algorithms, this iteration method can get trapped in a local optima. To reduce the probability of this, one usually tries multiple initial embeddings of B or appplies the simulated annealing method. 19.3.3 Programs of structure alignment Many research groups in structure alignment have generously made their programs available for use over the Internet and the World Wide Web. Here, we give four popularly used alignment algorithms and their websites: DALI: http://www2.ebi.ac.uk/dali STRUCTAL: http://bioinfo.mbb.yale.edu/align/server.cgi. LOCK: http://gene.stanford.edu/lock/. VAST: http://www.ncbi.nlm.nih.gov/structure/vast/vastsearch.html. Among them, DALI uses drms as the similarity measure and both STRUCTAL and LOCK use crms as the similarity measure. VAST is based on graph heuristic appoarch and are different from the other three. For more information about those programs, please refer to their websites.

Lecture 19: November 04, 2003 19-7 References [1] C. Branden and J. Tooze, Introduction to Protein Structure, Second Edition, Chapter 1,1999. [2] P.K. Agarwal, class powerpoint notes: Protein.pdf. [3] http://www.cs.duke.edu/education/courses/fall02/cps296.1/ [4] http://www.expasy.org/swissmod/course/text/chapter1.htm. [5] C.A. Orengo and et al., The CATH protein family databases: A resource for structural and functional anotation of genomes, Proteomics 2002, 2:11-21. [6] P. Keohl, Protein structure similarities, Current Opinion in Structure Biology 2001, 11:348-353. [7] A. R. Leach, Molecular Modelling, Principles and Applications, Second Edition, Chapter 10, 2001.

Lecture 19: November 04, 2003 19-8 Figure 19.5: The 20 different amino acids [2].

Lecture 19: November 04, 2003 19-9 Figure 19.6: Ramachandran plot[4].

Lecture 19: November 04, 2003 19-10 Figure 19.7: α-helix and β-strand [2].

Lecture 19: November 04, 2003 19-11 Figure 19.8: database [5]. Schematic representation of the (C)lass, (A)rchitecture and (T)oplogy levels in the CATH

Lecture 19: November 04, 2003 19-12 Figure 19.9: The HP model for Backbond Representation [6] Figure 19.10: Protein structure alignment