Biomolecular Structure Modelling
DKFZ, Department of Molecular Biophysics
Michaela Knapp-Mohammady
I) Structure of proteins, basics
- Primary structure
- Secondary structure
- Tertiary structure
II) Protein modelling, tools and techniques
- Primary structure analysis
- Secondary structure prediction
- Tertiary structure analysis and modelling
- Protein simulation
The complete gene, given as the complementary sequence (the position of the last base is given at the end of each line):

GGATCCTGCC AGAGCCTCCT CCCACCTGGA GGGGTCCCAG CGTCCACCTT CCCTGCCCCA   60
GCCCCCCTCC TCGAGGTACT GGGAGGCTGG ATAAAGTCTT CGGCTGGGCC ACACCCCACC  120
CCAAATTCTC CCTGTCCCAC CCTAGTGCCC AGGCCACCCC GGCCTGCTCC CTTCCGCAAG  180
GCACCTCACC TTCTGTGCCC AGACCATTAG CCAACGCGGT GACCTTGACC CCGGCCCAGG  240
CCCTGCTAAT GAAGAGGAAA GCCCGTACGC ACTCGGCCTG ACCCACGGCG ACCCTCTGTG  300
ACCAATCATA CTACCAACCT CTTAAACAGA GCTCCACCGA CGCAATGCCC AGGCATAAAA  360
AGGCCAGGCC GAGAGACCGC CACCAGTCAC GGACCCTGGA CCCAGCGCAC CCGCACCATG  420
GCCGGCCCCA GCCTCGCTTG CTGTCTGCTC GGCCTCCTGG CGCTGACCTC CGCCTGCTAC  480
ATCCAGAACT GCCCCCTGGG AGGCAAGAGG GCCGCGCCGG ACCTCGACGT GCGCAAGGTG  540
AGTCCCCAGC CCTGGTCCCG CGGCGCTCCG GGGAGGGAGG GACCCGCAGC CACAGGGGCG  600
CGCCCCGCTC CGGCCTCGCC TGAGAACTCC AGGAGCTGAG CGGATTTTGA CGCCCCGCCC  660
TTGACCGCGG TCGAGGCCCC CACGGCGCCC CAGCGTCTCA GCCCCGCTGT CCCCGCCCGA  720
ACTCCGAACC CCGGACCCCA GCATCCTTGC CCGGCGCACC CCGGCCGGCC TCGCAGGGTC  780
CTCCGAGCGA GTCCCCAGCG CCGCCCCGCG TCCCGCTCAC CCCGCCCGTC CCCCGAGTGC  840
CTCCCCTGCG GCCCCGGGGG CAAAGGCCGC TGCTTCGGGC CCAATATCTG CTGCGCGGAA  900
GAGCTGGGCT GCTTCGTGGG CACCGCCGAA GCGCTGCGCT GCCAGGAGGA GAACTACCTG  960
CCGTCGCCCT GCCAGTCCGG CCAGAAGGCG TGCGGGAGCG GGGGCCGCTG CGCCTTGGGC 1020
CTCTGCTGCA GCCCGGGTGA GCGGGGCAAG GCGCTCCGGG GCCAGGGGGA GGCGGGCGGG 1080
GGTGCGGCCG GGATTCCCCT GACTCCACCT CTTCCTCCAG ACGGCTGCCA CGCCGACCCT 1140
GCCTGCGACG CGGAAGCCAC CTTCTCCCAG CGCTGAAACT TGATGGCTCC GAACACCCTC 1200
GAAGCGCGCC ACTCGCTTCC CCCATAGCCA CCCCAGAAAT GGTGAAAATA AAATAAAGCA 1260
GGTTTTTCTC CTCTACCTTG ACTCGTGTCT AAGTGCCAGA AATGGGACGG GGAGGGGGCA 1320
TTGTGGGACT GGAAGATC                                                1338
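A sequence listed in this numbered layout is easy to process programmatically. The following is a minimal Python sketch, assuming only that the text contains the bases A, C, G, T plus whitespace and position numbers. It strips everything else, forms the reverse complement (the other strand, read 5' to 3'), and translates a reading frame with the standard genetic code. The function names are illustrative, and the short excerpt in the usage part is taken from positions 401 to 440 of the listing above.

```python
from itertools import product

# Standard genetic code; codons enumerated in T, C, A, G order ("*" = stop codon).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}


def clean_sequence(text: str) -> str:
    """Keep only nucleotide letters; drops spaces, line breaks and position numbers."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G) and reverse the result."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, start: int = 0) -> str:
    """Translate codon by codon from `start` (0-based) until a stop codon or the end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    raw = "CCCAGCGCAC CCGCACCATG 420\nGCCGGCCCCA GCCTCGCTTG"
    dna = clean_sequence(raw)
    start = dna.find("ATG")           # first possible start codon in the excerpt
    print(translate(dna, start))      # prints MAGPSLA
```

The sketch deliberately ignores introns: translating a genomic sequence straight through is only meaningful up to the first exon boundary.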
The 20 amino acids differ only in their side chains (functional groups).
Different amino acids
Amino acids have different biochemical and physical properties that influence their relative replaceability in evolution.
(Figure: Venn diagram grouping the amino acids into the overlapping classes aliphatic, tiny, small, hydrophobic, aromatic, charged, positive and polar; cysteine appears twice, as C(S-S) and C(SH).)
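One way to make "replaceability" concrete is to count the property classes two residues share. The sketch below assumes a partial version of the grouping in the figure: only the aliphatic, aromatic, positive, charged and tiny classes are encoded, and hydrophobic, polar and small are left out. The class memberships are an assumption for illustration, and the functions are hypothetical helpers, not a substitution matrix.

```python
# Partial, assumed property classes (Taylor-style grouping); extend as needed.
PROPERTIES = {
    "aliphatic": set("ILV"),
    "aromatic": set("FWYH"),
    "positive": set("HKR"),
    "charged": set("DEHKR"),
    "tiny": set("AGCS"),
}


def shared_properties(a: str, b: str) -> set[str]:
    """Names of all property classes that contain both residues."""
    return {name for name, members in PROPERTIES.items() if a in members and b in members}


def more_replaceable(a: str, b: str, c: str) -> bool:
    """True if residue b shares more property classes with a than residue c does."""
    return len(shared_properties(a, b)) > len(shared_properties(a, c))


if __name__ == "__main__":
    print(shared_properties("K", "R"))   # both positive and charged
    print(more_replaceable("K", "R", "D"))
```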
With the release of one water molecule, amino acids combine to form a dipeptide. A so-called peptide bond is formed between the carbonyl C atom of one amino acid and the amino N atom of the other.
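Because each peptide bond releases one water molecule, the mass of a linear peptide of n residues is the sum of the free amino-acid masses minus (n - 1) water masses. A minimal sketch using monoisotopic element masses; only glycine, alanine and serine are tabulated here as an example subset, so any other residue would have to be added to the (hypothetical) `AMINO_ACID_FORMULA` table first.

```python
# Monoisotopic element masses in atomic mass units (u).
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental composition of the free amino acids (example subset only).
AMINO_ACID_FORMULA = {
    "G": {"C": 2, "H": 5, "N": 1, "O": 2},  # glycine
    "A": {"C": 3, "H": 7, "N": 1, "O": 2},  # alanine
    "S": {"C": 3, "H": 7, "N": 1, "O": 3},  # serine
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula: dict[str, int]) -> float:
    """Sum of element masses weighted by atom counts."""
    return sum(ELEMENT_MASS[element] * count for element, count in formula.items())


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acids minus one water per peptide bond."""
    total = sum(formula_mass(AMINO_ACID_FORMULA[aa]) for aa in sequence)
    n_bonds = max(len(sequence) - 1, 0)
    return total - n_bonds * formula_mass(WATER)


if __name__ == "__main__":
    print(f"Gly-Gly: {peptide_mass('GG'):.4f} u")
```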
Here the peptide bond is shown close up (blue = nitrogen, red = oxygen, black = carbon, grey = hydrogen, green = side chain). The bonds coloured dark red lie in one plane and are quite rigid. The reason is that the C-N peptide bond has partial double-bond character, caused by resonance with the C=O double bond. Elsewhere in the peptide backbone, by contrast, the bonds can rotate comparatively freely. Tripeptides form when three amino acids (or a dipeptide and an amino acid) react with one another with elimination of water (a reaction in which water is released is called a condensation). In general, peptides consisting of only a few amino acids are called oligopeptides; their counterpart are polypeptides, which consist of many amino acids. Peptides made up of more than 100 amino acids are called proteins.
Secondary structure - alpha-helix
Properties of the α-helix: one turn spans 5.4 Å along the helix axis, i.e. the α-helix has a pitch of 5.4 Å. α-helices have 3.6 amino acid residues per turn, so each residue advances 1.5 Å along the axis, and a helix 36 amino acids long forms 10 turns.
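Pitch and residues per turn fix the whole helix geometry: rise per residue = 5.4 Å / 3.6 = 1.5 Å, and rotation per residue = 360° / 3.6 = 100°. The small sketch below builds idealised Cα positions from these two values. The Cα radius of 2.3 Å is an assumed textbook value that is not given on the slide.

```python
import math

PITCH = 5.4                          # Å per turn (from the slide)
RESIDUES_PER_TURN = 3.6              # from the slide
RISE = PITCH / RESIDUES_PER_TURN     # 1.5 Å along the axis per residue
TWIST = 360.0 / RESIDUES_PER_TURN    # 100 degrees of rotation per residue
RADIUS = 2.3                         # Å, assumed Cα radius of an ideal α-helix


def helix_turns(n_residues: int) -> float:
    """Number of turns formed by n residues."""
    return n_residues / RESIDUES_PER_TURN


def ideal_helix_ca(n_residues: int) -> list[tuple[float, float, float]]:
    """Cα coordinates of an ideal α-helix whose axis lies along z."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(i * TWIST)
        coords.append((RADIUS * math.cos(angle), RADIUS * math.sin(angle), i * RISE))
    return coords


if __name__ == "__main__":
    ca = ideal_helix_ca(36)
    print(helix_turns(36), "turns")
    print(f"Cα-Cα distance: {math.dist(ca[0], ca[1]):.2f} Å")
```

With these values consecutive Cα atoms come out roughly 3.8 Å apart, the usual distance across a trans peptide unit, which is a quick sanity check on the assumed radius.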
Helix structures
Secondary Structure - β-Sheet
The β-sheet structure
In a β-sheet, two or more stretches of polypeptide chain (β-strands) run alongside each other and are linked in a regular manner by hydrogen bonds between the main-chain C=O and N-H groups. Therefore all hydrogen bonds in a β-sheet are between different segments of the polypeptide. This contrasts with the α-helix, where all hydrogen bonds lie within the same element of secondary structure.
Secondary Structure - β-Sheet
Secondary structure - Reverse turns
A reverse turn is a region of the polypeptide that has a hydrogen bond from one main-chain carbonyl oxygen to the main-chain N-H group three residues further along the chain (i.e. O(i) to N(i+3)). Helical regions are excluded from this definition, and turns between β-strands form a special class of turn known as the β-hairpin.
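Given a list of main-chain hydrogen bonds (for example from a program that assigns them from PDB coordinates), the reverse-turn definition translates directly into a filter. A minimal sketch; the input format (pairs of residue indices, carbonyl acceptor first) and the set of helical residues are assumptions, and `find_reverse_turns` is a hypothetical helper.

```python
def find_reverse_turns(hbonds, helical_residues):
    """Return (i, i+3) pairs where the C=O of residue i bonds to the N-H of residue i+3.

    hbonds: iterable of (acceptor_i, donor_j) residue indices, meaning O(i) ... H-N(j).
    helical_residues: indices assigned to helices; bonds touching them are excluded.
    """
    turns = []
    for i, j in sorted(hbonds):
        if j - i == 3 and not any(k in helical_residues for k in range(i, j + 1)):
            turns.append((i, j))
    return turns


if __name__ == "__main__":
    bonds = {(5, 8), (10, 14), (20, 23)}   # (10, 14) is an i -> i+4 helix-type bond
    print(find_reverse_turns(bonds, helical_residues={20, 21, 22, 23}))
```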
Tertiary structure
Tertiary structure describes the packing of α-helices, β-sheets and random coils relative to each other at the level of one whole polypeptide chain. The figure shows the tertiary structure of chain B of Protein Kinase C Interacting Protein.
Quaternary structure
Quaternary structure only exists if a protein complex contains more than one polypeptide chain; it then describes the spatial arrangement of the chains. The figure shows the Protein Kinase C Interacting Protein.
Summary of Part I
The wide variety of three-dimensional protein structures corresponds to the diversity of functions that proteins fulfil. Proteins fold in three dimensions. Protein structure is organised hierarchically, from the so-called primary structure up to the quaternary structure; motifs and domains are additional, higher levels of organisation. The primary structure is the sequence of residues in the polypeptide chain.
II) Tasks of bioinformatics
How can protein structures be predicted?
Structure prediction methods are coarsely divided into three categories:
1. Comparative modelling: if the sequence to be modelled (the target) has a close homologue of known structure in the PDB (Brookhaven Protein Data Bank), that homologue is used as a template, and a structural model of the target is built on the basis of this template.
2. Fold recognition: in the absence of a significantly similar sequence with known structure, various methods, collectively called "fold recognition", try to identify a known fold with which the sequence is compatible.
3. Ab initio prediction: in contrast to the methods above, ab initio prediction aims to build a model for a given sequence without using a template, e.g. by minimising a knowledge-based energy function (a potential energy function, PEF, that assigns a potential energy to any protein conformation).
Secondary Structure Prediction
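The three categories can be read as a decision cascade, tried in order. The sketch below is a hedged illustration only: the 30 % sequence-identity cut-off for comparative modelling is an assumed rule of thumb, not a value from this course, and the two inputs would come from a database search and a fold-recognition run respectively. `choose_prediction_method` is a hypothetical helper.

```python
from typing import Optional


def choose_prediction_method(best_template_identity: Optional[float],
                             fold_recognised: bool,
                             identity_cutoff: float = 0.30) -> str:
    """Pick a structure-prediction strategy.

    best_template_identity: sequence identity (0..1) to the best PDB hit, or None.
    fold_recognised: whether a fold-recognition method found a compatible fold.
    identity_cutoff: assumed rule-of-thumb threshold for comparative modelling.
    """
    if best_template_identity is not None and best_template_identity >= identity_cutoff:
        return "comparative modelling"
    if fold_recognised:
        return "fold recognition"
    return "ab initio prediction"


if __name__ == "__main__":
    print(choose_prediction_method(0.45, fold_recognised=False))
```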
1. Protein structure database - PDB
Experimental methods for determining protein structure, namely X-ray crystallography and NMR spectroscopy, are essential. The Brookhaven Protein Data Bank (PDB) is the repository for these structures. Its files contain atom coordinates and are suitable for visualisation with graphical molecule viewers such as RasMol.
(Figure labels: Atom coordinates; Sequences (NRL3D))
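In a PDB file the coordinates sit in fixed columns of the ATOM and HETATM records (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x in 31-38, y in 39-46, z in 47-54). A minimal parsing sketch that relies only on this column layout; in practice a library such as Biopython's `Bio.PDB` would normally be used instead, and the helper names here are illustrative.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Parse ATOM/HETATM records using the fixed-column PDB layout (0-based slices)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def ca_trace(atoms):
    """Coordinates of the Cα atoms only, e.g. for a simple backbone trace."""
    return [(a.x, a.y, a.z) for a in atoms if a.name == "CA"]


if __name__ == "__main__":
    example = "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C"
    print(ca_trace(parse_atoms([example])))
```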
How are secondary structures detected in a PDB file?
The figure below shows the three main-chain torsion angles of a polypeptide: phi (φ), psi (ψ) and omega (ω).
(Figure: regions labelled beta and alpha.) ω is essentially fixed because the peptide bond is planar.
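φ, ψ and ω are ordinary dihedral angles defined by four consecutive backbone atoms (φ: C(i-1), N, Cα, C; ψ: N, Cα, C, N(i+1); ω: Cα, C, N(i+1), Cα(i+1)), so they can be computed directly from PDB coordinates. A self-contained sketch follows. The rectangular α and β windows in `rough_region` are crude assumed boundaries for illustration, not the regions of a real Ramachandran analysis.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def rough_region(phi: float, psi: float) -> str:
    """Very crude secondary-structure region from assumed (phi, psi) windows."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"
```

Applied to every residue of a PDB chain, `dihedral` yields the φ/ψ pairs that are plotted in a Ramachandran diagram.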
Sequence Analysis on the Web
2. Sequence Databases
SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation (such as the description of a protein's function, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. TrEMBL is a computer-annotated supplement to SWISS-PROT that contains all translations of EMBL nucleotide sequence entries not yet integrated into SWISS-PROT. These databases are developed by the SWISS-PROT groups at SIB and at EBI.
Size: SWISS-PROT Release 40 with updates up to 15-Nov-2001: 102164 entries; TrEMBL (Nov. 2001): 557388 entries.
Homology modelling
Quick and easy: use the SWISS-MODEL server, HTTP://www.expasy.ch/swissmod/SWISS-MODEL.html
SWISS-MODEL is an automated protein modelling server running at GlaxoWellcome Experimental Research in Geneva, Switzerland.
Disclaimer: the result of any modelling procedure is NON-EXPERIMENTAL and MUST be considered with care. This is especially true since there is no human intervention during model building.
A newer 3D modelling server, Geno3d: HTTP://geno3d-pbil.ibcp.fr/
TASK
PROBLEM: Determine the function of an uncharacterised protein sequence.
DESIGN: DomainSweep compares a protein sequence with a range of protein family databases. Its output consists of an overview of the individual database search results and a graphical report showing where family patterns were found in the sequence.
Protein Domain Databases - Evaluation (Protein Analysis)
Each database has different strengths and weaknesses:
- PFAM, PRODOM: identify members of highly divergent superfamilies, but are less likely to give specific sub-family diagnoses, and their quality is low.
- PRINTS, BLOCKS: give specific sub-family diagnoses, but have less coverage.
- Pattern part of PROSITE: good detection of very short motifs, but the least coverage, and unreliable in identifying members of highly divergent superfamilies.
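PROSITE patterns use a compact syntax (x = any residue, [..] = allowed residues, {..} = forbidden residues, x(n) or x(n,m) = repeats, < and > = anchored at the N- or C-terminus, elements separated by "-") that maps directly onto regular expressions. This is why they are good at finding very short motifs. A minimal converter sketch; it ignores rarer syntax such as ">" inside brackets, and the example pattern in the usage part is made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern such as 'C-x(2,4)-C-x(3)-[LIVMFYWC]' to a Python regex."""
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{"):         # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # Single letters and [allowed residues] are already valid regex syntax.
        if high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        regex_parts.append(prefix + core + suffix)
    return "".join(regex_parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
    print(regex, re.search(regex, "AACGGCAAALW") is not None)
```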
Fold classes (figure): all alpha, all beta, alpha+beta
Fold class prediction - FoldClass
FoldClass (HUSAR) predicts protein fold classes and protein domains from sequence data. The predictions are generated by artificial neural networks (Reczko, M. and Bohr, H. Nucl. Ac. Res. 22: 3616-3619 (1994)). This program predicts:
- a specific overall fold class,
- a super fold class with respect to secondary-structure content and spatial distribution,
- optionally, a profile of possible fold classes along the sequence.
Fold class prediction - (Gen)Threader
Algorithm:
1. A library of unique protein domain folds is derived from the PDB.
2. The test sequence is optimally fitted to every fold (allowing insertions and deletions).
3. The energy of each possible fit is calculated by summing pairwise interaction and solvation terms.
4. The fold with the lowest energy is selected.
Unlike most threading methods, such as the original THREADER, GenTHREADER also attempts to make inferences about possible evolutionary relationships.
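The core loop of threading can be illustrated with a deliberately simplified toy: each fold in the library is reduced to a string of environments (B = buried, E = exposed), the sequence is slid along it without gaps, an energy is summed over positions, and the lowest-energy fit wins. Everything here is an assumption for illustration. Real THREADER-style methods allow insertions and deletions and use pairwise contact and solvation potentials, not the single-residue toy score below, and the set of hydrophobic residues is likewise assumed.

```python
HYDROPHOBIC = set("AVILMFWC")   # assumed set of hydrophobic residues


def position_energy(residue: str, environment: str) -> float:
    """Toy energy: hydrophobic residues favoured when buried ('B'), disfavoured when exposed."""
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return -1.0 if hydrophobic else 1.0
    return 0.5 if hydrophobic else -0.5


def thread(sequence: str, fold_library: dict[str, str]):
    """Gapless threading: try every offset on every fold; return (energy, fold, offset)."""
    best = None
    for name, profile in fold_library.items():
        for offset in range(len(profile) - len(sequence) + 1):
            energy = sum(position_energy(r, env)
                         for r, env in zip(sequence, profile[offset:]))
            if best is None or energy < best[0]:
                best = (energy, name, offset)
    return best


if __name__ == "__main__":
    library = {"fold_a": "BBEE", "fold_b": "EEBB"}
    print(thread("LLKK", library))   # hydrophobic L's prefer the buried start of fold_a
```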
The number of analysis programs is huge: which one should be used for what purpose? It is difficult to feed the results of one program as input into the next. Users need compact, presentable reports on analysis results.
3. Energy Minimisation - Start
Calculate the potential energy of a given molecule from its atom coordinates: with R denoting the set of nuclear positions of all atoms, the energy is a function E(R).
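To make E(R) concrete, here is a minimal sketch in which the potential energy is a sum of Lennard-Jones pair terms over all atom pairs. Real force fields add bond, angle, torsion and electrostatic terms; the LJ-only form and the reduced units (ε = σ = 1) are assumptions for illustration.

```python
import math
from itertools import combinations


def lj_pair(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy 4*eps*((sigma/r)**12 - (sigma/r)**6) for one atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def potential_energy(R) -> float:
    """E(R): sum of Lennard-Jones terms over all atom pairs; R is a list of (x, y, z)."""
    return sum(lj_pair(math.dist(a, b)) for a, b in combinations(R, 2))


if __name__ == "__main__":
    r0 = 2 ** (1 / 6)    # separation at which a single LJ pair has its minimum, -epsilon
    print(potential_energy([(0.0, 0.0, 0.0), (r0, 0.0, 0.0)]))
```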
Energy Minimisation - Method
We move the atoms of the molecule so as to reduce its potential energy. There are several routines to do this:
- Steepest descent
- Conjugate gradient
- and more
Unfortunately, no technique can guarantee finding the global energy minimum of a complex problem (although simulated annealing is a partial solution).
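Steepest descent simply steps the coordinates against the energy gradient until the gradient (the net force) becomes negligible. A minimal sketch with a fixed step size, applied to a Lennard-Jones dimer whose only coordinate is the separation r (reduced units, ε = σ = 1, both assumed for illustration). Like any local method, it stops in the nearest minimum; with a single minimum that is also the global one, which is not true for real molecules.

```python
def steepest_descent(gradient, x0, step=0.01, tol=1e-8, max_iter=100_000):
    """Move coordinates against the gradient until its norm drops below tol.

    Returns (final coordinates, number of iterations used).
    """
    x = list(x0)
    for iteration in range(max_iter):
        g = gradient(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            return x, iteration
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


def lj_gradient(x, epsilon=1.0, sigma=1.0):
    """dV/dr of the Lennard-Jones potential for a dimer with separation r = x[0]."""
    r = x[0]
    return [4.0 * epsilon * (-12.0 * sigma ** 12 / r ** 13 + 6.0 * sigma ** 6 / r ** 7)]


if __name__ == "__main__":
    x_min, n_iter = steepest_descent(lj_gradient, [1.5])
    print(f"r_min = {x_min[0]:.6f} after {n_iter} steps (exact: {2 ** (1 / 6):.6f})")
```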
Modelling Programs
WHATIF, INSIGHTII, GAUSSIAN, SCC-DFTB, GROMOS, DISCOVER, ...
Model Viewers
Rasmol, Kinemage, Molden, Gaussview, Sybyl, MSViewer, Insight, WebLab, Swiss...
SWISS-3DIMAGE is an image database that strives to provide high-quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well-accepted theoretical protein models. The images are provided in several useful formats; both mono and stereo pictures are generally available.
Molecule Simulation - Molecular Dynamics
- The starting point for most simulations is the experimental crystal or NMR structure.
- This structure is energy-minimised and solvated in a box of water.
- The system is heated (high-energy state).
- It is equilibrated and then simulated for about 1 nanosecond; only short times are possible.
The detailed atomic motions are usually unimportant. What really matters are the "ensemble average" properties, i.e. what happens on average (MD is in fact chaotic, with sensitive dependence on initial conditions, like the weather!).
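At the core of every MD run is a time-stepping integrator for Newton's equations. The slide does not name one, so the widely used velocity Verlet scheme is assumed here. The sketch integrates a single particle in a harmonic potential (reduced units) and illustrates the ensemble-average point: the position keeps oscillating, but the time average of the potential energy settles at half the total energy.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations; returns the list of (x, v) after each step."""
    trajectory = []
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)                               # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt          # velocity from averaged force
        f = f_new
        trajectory.append((x, v))
    return trajectory


K = 1.0  # spring constant (reduced units, assumed)
M = 1.0  # particle mass (reduced units, assumed)


def harmonic_force(x):
    return -K * x


if __name__ == "__main__":
    traj = velocity_verlet(1.0, 0.0, harmonic_force, M, dt=0.01, n_steps=10_000)
    potential = [0.5 * K * x * x for x, _ in traj]
    print("time-averaged potential energy:", sum(potential) / len(potential))
```

The total energy here is 0.5, so the printed average should be close to 0.25. In a real simulation the same loop runs over all atoms with force-field forces, and averages are taken over many more degrees of freedom.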
Molecular Dynamics Proteins are not the static structures that X-ray crystallography can suggest, but are continuously moving. This is a short simulation of crambin, calculated using the AMBER force field.
DNA is not static either. This simulation was calculated using AMBER and a continuum model for water.
MD-Simulation