Bioinformatics for Biologists. Protein Structure



Similar documents
Structure Tools and Visualization

Lecture 19: Proteins, Primary Struture

Linear Sequence Analysis. 3-D Structure Analysis

CSC 2427: Algorithms for Molecular Biology Spring Lecture 16 March 10

Hydrogen Bonds The electrostatic nature of hydrogen bonds

Consensus alignment server for reliable comparative modeling with distant templates

Amino Acids. Amino acids are the building blocks of proteins. All AA s have the same basic structure: Side Chain. Alpha Carbon. Carboxyl. Group.

Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK

Disulfide Bonds at the Hair Salon

Built from 20 kinds of amino acids

Recap. Lecture 2. Protein conformation. Proteins. 8 types of protein function 10/21/10. Proteins.. > 50% dry weight of a cell

The peptide bond is rigid and planar

Protein annotation and modelling servers at University College London

Helices From Readily in Biological Structures

Peptide Bonds: Structure

Guide for Bioinformatics Project Module 3

Computational Systems Biology. Lecture 2: Enzymes

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Peptide bonds: resonance structure. Properties of proteins: Peptide bonds and side chains. Dihedral angles. Peptide bond. Protein physics, Lecture 5

Steffen Lindert, René Staritzbichler, Nils Wötzel, Mert Karakaş, Phoebe L. Stewart, and Jens Meiler

This class deals with the fundamental structural features of proteins, which one can understand from the structure of amino acids, and how they are

PROTEINS THE PEPTIDE BOND. The peptide bond, shown above enclosed in the blue curves, generates the basic structural unit for proteins.

18.2 Protein Structure and Function: An Overview

Structure Determination

Section I Using Jmol as a Computer Visualization Tool

Pairwise Sequence Alignment

Protein 3D-structure analysis. why and how

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?

Bioinformatics Grid - Enabled Tools For Biologists.

3D structure visualization and high quality imaging. Chimera

(c) How would your answers to problem (a) change if the molecular weight of the protein was 100,000 Dalton?

INTRODUCTION TO PROTEIN STRUCTURE

The peptide bond Peptides and proteins are linear polymers of amino acids. The amino acids are

Functional Architecture of RNA Polymerase I

Basic Concepts of DNA, Proteins, Genes and Genomes

Introduction to Bioinformatics AS Laboratory Assignment 6

Role of Hydrogen Bonding on Protein Secondary Structure Introduction

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

Gold (Genetic Optimization for Ligand Docking) G. Jones et al. 1996

Biological Databases and Protein Sequence Analysis

Molecular Visualization. Introduction

Nafith Abu Tarboush DDS, MSc, PhD

IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon. V. Polypeptides and Proteins

T cell Epitope Prediction

NO CALCULATORS OR CELL PHONES ALLOWED

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Myoglobin and Hemoglobin

Refinement of a pdb-structure and Convert

K'NEX DNA Models. Developed by Dr. Gary Benson Department of Biomathematical Sciences Mount Sinai School of Medicine

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Part A: Amino Acids and Peptides (Is the peptide IAG the same as the peptide GAI?)

Chapter 3. Protein Structure and Function

Overview'of'Solid-Phase'Peptide'Synthesis'(SPPS)'and'Secondary'Structure'Determination'by'FTIR'

Protein Physics. A. V. Finkelstein & O. B. Ptitsyn LECTURE 1

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet

Paper: 6 Chemistry University I Chemistry: Models Page: 2 of Which of the following weak acids would make the best buffer at ph = 5.0?

CD-HIT User s Guide. Last updated: April 5,

Protein Sequence Analysis - Overview -

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Combinatorial Biochemistry and Phage Display

BIOINFORMATICS TUTORIAL

Introduction to Principal Components and FactorAnalysis

Module 1. Sequence Formats and Retrieval. Charles Steward

TITLE PAGE - CURRENT PROTOCOLS IN BIOINFORMATICS

EMBL-EBI. EM map display & Visualization

Protein Protein Interaction Networks

A successful market segmentation initiative answers the following critical business questions: * How can we a. Customer Status.

Protein Studies Using CAChe

Clone Manager. Getting Started

Final Project Report

FTIR Analysis of Protein Structure

AISMIG an interactive server-side molecule image generator

Integrated design of antibodies for systems biology using AbDesigner

LESSON 5. Learning to Use Cn3D: A Bioinformatics Tool. Introduction. Learning Objectives. Key Concepts

Peptide Bond Amino acids are linked together by peptide bonds to form polypepetide chain.

UGENE Quick Start Guide

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

Proteins. Proteins. Amino Acids. Most diverse and most important molecule in. Functions: Functions (cont d)

How To Cluster

NMR Nuclear Magnetic Resonance

Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+

Structure of proteins

Lectures 2 & 3. If the base pair is imbedded in a helix, then there are several more angular attributes of the base pair that we must consider:

Multiobjective Robust Design Optimization of a docked ligand

Antibody responses to linear and conformational epitopes

Proteins and Nucleic Acids

Identification algorithms for hybrid systems

Overview. Swarms in nature. Fish, birds, ants, termites, Introduction to swarm intelligence principles Particle Swarm Optimization (PSO)

file:///c /Documents%20and%20Settings/terry/Desktop/DOCK%20website/terry/Old%20Versions/dock4.0_faq.txt

Transcription:

Bioinformatics for Biologists Comparative Protein Analysis: Part III. Protein Structure Prediction and Comparison Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research Protein Structure Why is protein structure information useful? WIBR Bioinformatics Course, Whitehead Institute, October 2003 2 1

Predicting Important AAs S265 ATP Binding M244 E236 L266D276 E275 DomainE258 Y257 V270 F311 F317 A269 L301 L284 E255 T315 L248 K271 E281 E316 M278 E292 E286 Y253 E279 Q252 E282 L384 Q346 M351 V339 Y342 Binding Site E494F486E352 H396 Y440M472 Activation Domain L451 E450 WIBR Bioinformatics Course, Whitehead Institute, October 2003 3 Surface Mapping WIBR Bioinformatics Course, Whitehead Institute, October 2003 4 2

Protein Interfaces WIBR Bioinformatics Course, Whitehead Institute, October 2003 5 Property Comparisons WIBR Bioinformatics Course, Whitehead Institute, October 2003 6 3

Syllabus Protein Structure Classification Structure Coordinate Files & Databases Comparing Protein Structures Aligning 3D Structures Predicting Protein Structure Specialized Structural Regions Secondary Structure Prediction Tertiary Structure Prediction Threading Modeling Structure Visualization WIBR Bioinformatics Course, Whitehead Institute, October 2003 7 Structure Classification Proteins can adopt only a limited number of possible 3D conformations Combinations of a helices, b sheets, loops, and coils Completely different sequences can fold into similar shapes Protein Structure Classes Class a: bundles of a helices Class b: anti-parallel b sheets (sandwiches and barrels) Class a / b: parallel b sheets with intervening a helices Class a + b: segregated a helices and anti-parallel b sheets Multi-domain Membrane/Cell surface proteins WIBR Bioinformatics Course, Whitehead Institute, October 2003 8 *http://info.bio.cmu.edu/courses/03231/protstruc/protstruc2.htm * 4

Structure Families Divide structures into the limited number of possible structure families Homologous proteins can be identified by examining their respective structures for conserved fold patterns Representative members can be used for modeling sequences of unknown structure WIBR Bioinformatics Course, Whitehead Institute, October 2003 9 Structure Family Databases SCOP: Structural Classification Of Proteins based on a definition of structural similarities. Hierarchical levels to reflect evolutionary and structural relationships http://scop.mrc-lmb.cam.ac.uk/scop CATH: Classification by Class, Architecture, Topology, and Homology classified first into hierarchical levels like SCOP http://www.biochem.ucl.ac.uk/bsm/cath/ FSSP: Fold classification based on Structure-structure alignment of proteins based on structural alignment of all pair-wise combinations of proteins in PDB by DALI (used to id common folds and place into groups) http://www2.embl-ebi.ac.uk/dali/fssp/fssp.html MMDB Aligns 3D structures based on similar arrangements of secondary structural elements (VAST) http://www.ncbi.nlm.nih.gov/structure/mmdb/mmdb.shtml SARF categorized on the basis of structural similarity, categories are similar to other dbs http://123d.ncifcrf.gov/ WIBR Bioinformatics Course, Whitehead Institute, October 2003 10 5

Syllabus Protein Structure Classification Structure Coordinate Files & Databases Comparing Protein Structures Aligning 3D Structures Predicting Protein Structure Specialized Structural Regions Secondary Structure Prediction Tertiary Structure Prediction Threading Modeling Structure Visualization WIBR Bioinformatics Course, Whitehead Institute, October 2003 11 Coordinates Coordinate Data: location of a molecule s atoms in space (XYZ triple) XYZ triple is labeled with an atom, residue, chain Modified aa are labeled with X, H s not usually listed Atom Residue Chain X Y Z 54 ALA C 35.4-9.3 102.5 Data Representation Chemistry Rules Approach: connect the dots utilizing a standard rules base to specify bond distances (not consistent among applications) Explicit Bonding Approach: explicit bonding information is specified in the file (very consistent) WIBR Bioinformatics Course, Whitehead Institute, October 2003 12 6

Coordinate File Formats MMDB Molecular Modeling DataBank Format ASN.1 standard data description language (explicit bond information) mmcif Chemical Interchange Format (relational db format) PDB Protein DataBank Format Column oriented, flexible format (chemistry rules) Example PDB File tag Residue Atom# Atom type Chain Residue# X Y Z Structure scores ATOM 1432 N ALA A 259 15.711 12.486 46.370 1.00 28.54 ATOM 1433 CA ALA A 259 17.047 12.953 46.726 1.00 27.48 ATOM 1434 C ALA A 259 17.029 14.459 46.979 1.00 25.31 ATOM 1435 O ALA A 259 17.787 15.207 46.367 1.00 25.19 ATOM 1436 CB ALA A 259 18.035 12.617 45.610 1.00 25.32 ATOM 1437 N TRP A 260 16.149 14.897 47.875 1.00 23.61 ATOM 1438 CA TRP A 260 16.033 16.312 48.210 1.00 21.03 ATOM 1439 C TRP A 260 17.121 16.700 49.211 1.00 20.94 ATOM 1440 O TRP A 260 17.917 17.601 48.957 1.00 19.84 WIBR Bioinformatics Course, Whitehead Institute, October 2003 13 Coordinate Databases RCSB (Research Collaboratory for Structural Bioinformatics) http://www.rcsb.org/ Formally know as the Protein Data Bank at Brookhaven National Laboratories Structure Explorer PDB search engine Text and PDB ID (4 letter code) searching MMDB (Molecular Modeling Database @NCBI) Compilation of structures represented in multiple formats Provides structure summaries BLAST sequences to search for available structures WIBR Bioinformatics Course, Whitehead Institute, October 2003 14 7

Syllabus Protein Structure Classification Structure Coordinate Files & Databases Comparing Protein Structures Aligning 3D Structures Predicting Protein Structure Specialized Structural Regions Secondary Structure Prediction Tertiary Structure Prediction Threading Modeling Structure Visualization WIBR Bioinformatics Course, Whitehead Institute, October 2003 15 Sequence & Structure Homology Sequence Identify relationships between sets of linear protein sequences Structure Categorize related structures based on 3D shapes Structure families do not necessarily share sequence homology WIBR Bioinformatics Course, Whitehead Institute, October 2003 16 8

Structure Comparison Compare Structures that are: Identical Similarity/difference of independent structures, x-ray vs. nmr, apo vs. holo forms, wildtype vs. mutant Similar Predict function, evolutionary history, important domains Unrelated Identify commonalities between proteins with no apparent common overall structure - focus on active sites, ligand binding sites Superimpose Structures by 3D Alignment for Comparison WIBR Bioinformatics Course, Whitehead Institute, October 2003 17 Structural Alignment Structure alignment forms relationships in 3D space similarity can be redundant for multiple sequences Considerations Which atoms/regions between two structure will be compared Will the structures be compared as rigid or flexible bodies Compare all atoms including side chains or just the backbone/ca Try to maximize the number of atoms to align or focus on one localized region (biggest differences usually in solvent-exposed loop structures) How does the resolution of each structure affect comparison WIBR Bioinformatics Course, Whitehead Institute, October 2003 18 9

Translation and Rotation Alignment Translate center of mass to a common origin Rotate to find a suitable superposition Algorithms Identify equivalent pairs (3) of atoms between structures to seed alignment Iterate translation/rotation to maximize the number of matched atom pairs Examine all possible combinations of alignments and identify the optimal solution WIBR Bioinformatics Course, Whitehead Institute, October 2003 19 Alignment Methods Initially examine secondary structural elements and Ca-Cb distances to identify folds and the ability to align Gap penalties for structures that have discontinuous regions that do not align (alignment-gap-alignment) Anticipate that two different regions may align separately, but not in the same alignment Proceed with alignment method: Fast, Secondary Structure-Based Dynamic Programming Distance Matrix WIBR Bioinformatics Course, Whitehead Institute, October 2003 20 10

Fast Alignment by SS Secondary structure elements can be represented by a vector starting at the beginning of the element Position & length Compare the arrangement of clustered vectors between two structures to identify common folds Sometimes supplement vectors with information about the arrangement of the side chains (burial/exposure) Significance of alignment Likelihood that a cluster of secondary structural elements would be expected between unrelated structures WIBR Bioinformatics Course, Whitehead Institute, October 2003 21 VAST and SARF Implement automatic methods to assign secondary structure VAST http://www.ncbi.nlm.nih.gov:80/structure/vas T/vastsearch.html SARF http://123d.ncifcrf.gov/ WIBR Bioinformatics Course, Whitehead Institute, October 2003 22 11

Exhaustive Alignment Dynamic Programming Local environment defined in terms of Interatomic distances, bond angles, side chain identity, side chain burial/exposure Align structures by matching local environments - for example, draw vectors representing each Ca-Cb bond, superimpose vectors Distance Matrix Graphic procedure similar to a dot matrix alignment of two sequences to identify atoms that lie most closely together in a 3D structure (based on Ca distances) Similar structures have super-imposable graphs WIBR Bioinformatics Course, Whitehead Institute, October 2003 23 DALI Distance Alignment DALI - http://www2.embl-ebi.ac.uk/dali/ Aligns two structures Determines if a new structure is similar to one already in database (for classification) WIBR Bioinformatics Course, Whitehead Institute, October 2003 24 12

Alignment Quality Calculate deviation between two aligned structures RMSD (Root Mean Square Deviation) Goodness of fit between two sets of coordinates Best if < 3 Å Calculate Ca-Ca distances, sum square of distances, divide by the number of pairs, square root RMSD = N D i 2 /N WIBR Bioinformatics Course, Whitehead Institute, October 2003 25 Syllabus Protein Structure Classification Structure Coordinate Files & Databases Comparing Protein Structures Aligning 3D Structures Predicting Protein Structure Specialized Structural Regions Secondary Structure Prediction Tertiary Structure Prediction Threading Modeling Structure Visualization WIBR Bioinformatics Course, Whitehead Institute, October 2003 26 13

Predicting Specialized Structures Leucine Zippers Antiparallel a helices held together by interactions between L residues spaced at ever 7th position Coiled Coils 2 or three a helices coiled around each other in a left-handed supercoil Multicoil http://jura.wi.mit.edu/cgi-bin/multicoil/multicoil.pl COILS2 http://www.ch.embnet.org/software/coils_form.html Transmembrane Regions 20-30aa domains with strong hydrophobicity PHDhtm, PHDtopology, TMpred (TMbase) http://www.embl-heidelberg.de/predictprotein/predictprotein.html WIBR Bioinformatics Course, Whitehead Institute, October 2003 27 Predicting Secondary Structure Recognizing Potential Secondary Structure 50% of a sequence is usually alpha helices and beta sheet structures Helices: 3.6 residues/turn, N+4 bonding Strands: extended conformation, interactions between strands, disrupted by beta bulges Coils: A,G,S,T,P are predominant Sequences with >45% sequence identity should have similar structures Databases of sequences and accompanying secondary structures (DSSP) WIBR Bioinformatics Course, Whitehead Institute, October 2003 28 14

SS Prediction Algorithms Chou-Fasman/GOR Analyze the frequency of each of the 20 aa in every secondary structure (Chou, 1974) A,E,L,M prefer a helices; P,G break helices Use a 4-6aa examination window to predict probability of a helix, 3-5aa window for beta strands Extend regions by moving window along sequence 50-60% effective (Higgins, 2000) GOR method assumes that residues flanking the central window/core also influence secondary structure WIBR Bioinformatics Course, Whitehead Institute, October 2003 29 SS Prediction Algorithms Neural Networks Examine patterns in secondary structures by computationally learning to recognize combinations of aa that are prevalent within a particular secondary structure Program is trained to distinguish between patterns located in a secondary structure from those that are not usually located in it PHDsec (Profile network from HeiDelberg) ~ 70% correct predictions http://www.embl-heidelberg.de/predictprotein/submit_def.html WIBR Bioinformatics Course, Whitehead Institute, October 2003 30 15

SS Prediction Algorithms Nearest Neighbor Generate an iterated list of peptide fragments by sliding a fixed-size window along sequence Predict structure of aa in center of the window by examining its k neighbors (Yi, 1993) Propensity of center position to adopt a structure within the context of the neighbors Method relies on an initial training set to teach it how neighbors influence secondary structure NNSSP http://bioweb.pasteur.fr/seqanal/interfaces/nnssp-simple.html WIBR Bioinformatics Course, Whitehead Institute, October 2003 31 SS Prediction Tools NNpredict - 65 % effective*, outputs H,E,- http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html PredictProtein - query sequence examined against SWISS-PROT to find homologous sequences MSA of results given to PHD for prediction 72% effective* http://www.embl-heidelberg.de/predictprotein/submit_def.html Jpred - integrates multiple structure prediction applications and returns a consensus, 73% effective* http://www.compbio.dundee.ac.uk/~www-jpred/submit.html WIBR Bioinformatics Course, Whitehead Institute, October 2003 32 16

Tertiary Structure Prediction Goal Build a model to use for comparison with other structures, identify important residues/interactions, determine function Challenges Reveal interactions that occur between residues that are distant from each other in a linear sequence Slight changes in local structure can have large effects on global structure Methods Sequence Homology - use a homologous sequence as a template Threading - search for structures that have similar fold configurations without any obvious sequence similarity WIBR Bioinformatics Course, Whitehead Institute, October 2003 33 Threading - Approaches Sequence is compared for its compatibility (structural similarity) with existing structures Approaches to determine compatibility Environmental Template: environment of ea. aa in a structure is classified into one of 18 types, evaluate ea. position in query sequence for how well it fits into a particular type (Mount, 2001) Contact Potential Method: analyze the closeness of contacts between aa in the structure, determine whether positions within query sequence could produce similar interactions (find most energetically favorable) (Mount, 2001) WIBR Bioinformatics Course, Whitehead Institute, October 2003 34 17

Threading Process Sequence moved position-by-position through a structure Protein fold modeled by pair-wise inter-atomic calculations to align a sequence with the backbone of the template Comparisons between local and non-local atoms Compare position i with every other position j and determine whether interactions are feasible Optimize model with pseudo energy minimizations - most energetically stable alignment assumed to be most favorable Thread the smallest segment reasonable! Computationally intensive. 123D http://123d.ncifcrf.gov/123d+.html MYNPQGGYQQQFNPQGGRGNYKNFNYNNNLQGYQAGFQPQSQGMSLNDFQKQQKQAAPKPKKTLKLVSSSGIKLANATKK VGTKPAESDKKEEEKSAETKEPTKEPTKVEEPVKKEEKPVQTEEKTEEKSELPKVEDLKISESTHNTNNANVTSADALIK EQEEEVDDEVVNDMFGGKDHVSLIFMGHVDAGKSTMGGNLLYLTGSVDKRTIEKYEREAKDAGRQGWYLSWVMDTNKEER WIBR Bioinformatics Course, Whitehead Institute, October 2003 35 Model Building Perform automated model constructions SWISS-MODEL Compare sequence to ExPdb to find a template (homology) Define your own templates (from threading) http://www.expasy.ch/swissmod/swiss-model.html GENO3D PSI-BLAST to identify homologs possessing structures to be used as templates http://geno3d-pbil.ibcp.fr WIBR Bioinformatics Course, Whitehead Institute, October 2003 36 18

Model Evaluation Manually examine model and alignments Find similar structures through database searches DALI How does the model compare to other structures with the template family? Remember, it s only a MODEL (but even models can be useful) WIBR Bioinformatics Course, Whitehead Institute, October 2003 37 Structure Visualization Different representations of molecule wire, backbone, space-filling, ribbon NMR ensembles Models showing dynamic variation of molecules in solution VIEWERS RasMol (Chime is the Netscape plug-in) http://www.umass.edu/microbio/rasmol/index2.html Cn3D MMDB viewer (See in 3D) with explicit bonding http://www.ncbi.nlm.nih.gov/structure SwissPDB Viewer http://www.expasy.ch/spdbv/mainpage.html imol http://www.pirx.com/imol WIBR Bioinformatics Course, Whitehead Institute, October 2003 38 19

Pulling It All Together YFP What is it? Linear Sequence Homology 3D Structural Homology Blast Search Identify Homologs and Perform MSAs Domain Search Identify Functional Domains Specialized Structure Search Functional Domain Homology Threading/Model Building Structural Similarity Motif Search Locate Conserved Sequence Patterns WIBR Bioinformatics Course, Whitehead Institute, October 2003 39 Demonstration Thread sequence to identify template Web-based: 123D http://123d.ncifcrf.gov/123d+.html Model sequence with template http://www.expasy.ch/swissmod/swiss-model.html Visualization WIBR Bioinformatics Course, Whitehead Institute, October 2003 40 20

References Bioinformatics: Sequence and genome Analysis. David W. Mount. CSHL Press, 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Andreas D. Baxevanis and B.F. Francis Ouellete. Wiley Interscience, 2001. Bioinformatics: Sequence, structure, and databanks. Des Higgins and Willie Taylor. Oxford University Press, 2000. Chou, P.Y. and Fasman, G. D. (1974). Biochemistry, 13, 211. Yi, T-M. and Lander, E.S.(1993) J. Mol. Biol., 232,1117. WIBR Bioinformatics Course, Whitehead Institute, October 2003 41 21