DBDB : a Disulfide Bridge DataBase for the predictive analysis of cysteine residues involved in disulfide bridges



Similar documents
The Ramachandran Map of More Than. 6,500 Perfect Polypeptide Chains

Advanced Medicinal & Pharmaceutical Chemistry CHEM 5412 Dept. of Chemistry, TAMUK

Peptide bonds: resonance structure. Properties of proteins: Peptide bonds and side chains. Dihedral angles. Peptide bond. Protein physics, Lecture 5

IV. -Amino Acids: carboxyl and amino groups bonded to -Carbon. V. Polypeptides and Proteins

Structure of proteins

Computational Systems Biology. Lecture 2: Enzymes

Recap. Lecture 2. Protein conformation. Proteins. 8 types of protein function 10/21/10. Proteins.. > 50% dry weight of a cell

Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data

Built from 20 kinds of amino acids

Data Mining Analysis of HIV-1 Protease Crystal Structures

Amino Acids. Amino acids are the building blocks of proteins. All AA s have the same basic structure: Side Chain. Alpha Carbon. Carboxyl. Group.

Ionization of amino acids

A. A peptide with 12 amino acids has the following amino acid composition: 2 Met, 1 Tyr, 1 Trp, 2 Glu, 1 Lys, 1 Arg, 1 Thr, 1 Asn, 1 Ile, 1 Cys

Part A: Amino Acids and Peptides (Is the peptide IAG the same as the peptide GAI?)

CSC 2427: Algorithms for Molecular Biology Spring Lecture 16 March 10

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

Proteins. Proteins. Amino Acids. Most diverse and most important molecule in. Functions: Functions (cont d)

Myoglobin and Hemoglobin

Protein Physics. A. V. Finkelstein & O. B. Ptitsyn LECTURE 1

Pipe Cleaner Proteins. Essential question: How does the structure of proteins relate to their function in the cell?

Carbohydrates, proteins and lipids

Amino Acids, Peptides, Proteins

Hydrogen Bonds The electrostatic nature of hydrogen bonds

The peptide bond is rigid and planar

Peptide Bonds: Structure

Nafith Abu Tarboush DDS, MSc, PhD

Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+

(c) How would your answers to problem (a) change if the molecular weight of the protein was 100,000 Dalton?

Papers listed: Cell2. This weeks papers. Chapt 4. Protein structure and function

18.2 Protein Structure and Function: An Overview

H H N - C - C 2 R. Three possible forms (not counting R group) depending on ph

Refinement of a pdb-structure and Convert

The Organic Chemistry of Amino Acids, Peptides, and Proteins

Chapter 12 - Proteins

Guide for Bioinformatics Project Module 3

Outline. Market & Technology Trends. LifeTein Technology Portfolio. LifeTein Services

Biological Molecules

Helices From Readily in Biological Structures

8/20/2012 H C OH H R. Proteins


AP BIOLOGY 2008 SCORING GUIDELINES

Paper: 6 Chemistry University I Chemistry: Models Page: 2 of Which of the following weak acids would make the best buffer at ph = 5.0?

A disaccharide is formed when a dehydration reaction joins two monosaccharides. This covalent bond is called a glycosidic linkage.

Shu-Ping Lin, Ph.D.

Graph theoretic approach to analyze amino acid network

Protein Folding. The resulting three-dimensional structure is determined by the amino acid sequence (Anfinsen's dogma).

INTRODUCTION TO PROTEIN STRUCTURE

Consensus alignment server for reliable comparative modeling with distant templates

Lecture 4: Peptides and Protein Primary Structure [PDF] Key Concepts. Objectives See also posted Peptide/pH/Ionization practice problems.

MAKING AN EVOLUTIONARY TREE

Combinatorial Biochemistry and Phage Display

Lecture 19: Proteins, Primary Struture

BRENDA Exercises Quick Search

Introduction to Bioinformatics 3. DNA editing and contig assembly

Algorithm and computational complexity of Insulin

CONFORMATIONAL B-CELL EPITOPES CLASSIFICATION USING MACHINE LEARNING TECHNIQUES

AMINO ACIDS & PEPTIDE BONDS STRUCTURE, CLASSIFICATION & METABOLISM

TEACHING CONTEMPORARY CHEMISTRY, BIOCHEMISTRY AND BIOLOGY: FREE AVAILABLE DATABASES, WEB TUTORIALS AND ON-LINE TOOLS

FINDING RELATION BETWEEN AGING AND

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture-11 Enzyme Mechanisms II

--not necessarily a protein! (all proteins are polypeptides, but the converse is not true)

Invariant residue-a residue that is always conserved. It is assumed that these residues are essential to the structure or function of the protein.

Peptide Bond Amino acids are linked together by peptide bonds to form polypepetide chain.

Amino Acids, Proteins, and Enzymes. Primary and Secondary Structure Tertiary and Quaternary Structure Protein Hydrolysis and Denaturation

Choices, choices, choices... Which sequence database? Which modifications? What mass tolerance?

Lecture 8. Protein Trafficking/Targeting. Protein targeting is necessary for proteins that are destined to work outside the cytoplasm.

PROTEINS THE PEPTIDE BOND. The peptide bond, shown above enclosed in the blue curves, generates the basic structural unit for proteins.

Course Outline. 1. COURSE INFORMATION Session Offered Winter 2012 Course Name Biochemistry

Chapter 3 Molecules of Cells

Structure Tools and Visualization

Exam 4 Outline CH 105 Spring 2012

Keystone Review Practice Test Module A Cells and Cell Processes. 1. Which characteristic is shared by all prokaryotes and eukaryotes?

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Proteins and Nucleic Acids

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

Bioinformatics Grid - Enabled Tools For Biologists.

Introduction to Principal Components and FactorAnalysis

Lecture Overview. Hydrogen Bonds. Special Properties of Water Molecules. Universal Solvent. ph Scale Illustrated. special properties of water

Lab 2/Phylogenetics/September 16, PHYLOGENETICS

Structure and properties of proteins. Vladimíra Kvasnicová

Acidic amino acids: Those whose side chains can carry a negative charge at certain ph values. Typically aspartic acid, glutamic acid.

Chapter 5. The Structure and Function of Macromolecule s

Disaccharides consist of two monosaccharide monomers covalently linked by a glycosidic bond. They function in sugar transport.

GenBank, Entrez, & FASTA

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

*Corresponding author: Yoshitaka Nishiyama,

Isomers Have same molecular formula, but different structures

Structures of Proteins. Primary structure - amino acid sequence

PROTEIN SEQUENCING. First Sequence

High Throughput Network Analysis

4. Which carbohydrate would you find as part of a molecule of RNA? a. Galactose b. Deoxyribose c. Ribose d. Glucose

Disulfide Bonds at the Hair Salon

Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing

Activity 7.21 Transcription factors

Section I Using Jmol as a Computer Visualization Tool

Preliminary MFM Quiz

Biological Sequence Data Formats

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Amino Acids and Proteins

Guidance for Industry

Transcription:

DBDB : a Disulfide Bridge DataBase for the predictive analysis of cysteine residues involved in disulfide bridges Emmanuel Jaspard Gilles Hunault Jean-Michel Richer Laboratoire PMS UMR A 9, Université d Angers, Boulevard Lavoisier, 9 Angers Cedex emmanuel.jaspard@univ-angers.fr Université d Angers, Boulevard Lavoisier, 9 Angers Cedex gilles.hunault@univ-angers.fr LERIA, Université d Angers, Boulevard Lavoisier, 9 Angers Cedex jean-michel.richer@univ-angers.fr Abstract: We present a database of structured, verified and useful information about cysteine residues involved in disulfide bridges. The database has been filled with proteins of a subset of the PDB database called select. A rough analysis of the proteins contained in the database shows that it is possible to identify cysteine residues which are involved or not in a disulfide bond from the amino acid sequence that surrounds the cysteine. The goal of this database is to serve as a reference for the evaluation of disulfide bridge prediction softwares and will help us develop our own prediction software. Keywords: Disulfide bridge prediction, database, bioinformatics. Biological context Proteins contain cysteine residues that can be oxidized to form a covalent bond called a disulfide bridge. Past experiments [] showed that disulfide bridges can increase the thermodynamic stability of the native structure of proteins by reducing the number of unfolded conformations. Therefore, an exact prediction of disulfide connectivity can strongly reduce the conformational search space and increase the accuracy in protein structure prediction. Based on an analysis of a part of the Protein Data Bank (PDB) [], the select subset, we found that about % of the proteins of this subset contain disulfide bridges. Cysteine residues are considered as if they are not involved in a disulfide bridge. Otherwise they can be implied in an or bond if they belong to the same polypeptidic chain or to two different chains, respectively. Note that the PDB select contains only % of bridges.. Presentation and purpose of the Disulfide Bridge DataBase (DBDB) Up to now it has not been possible to assert if a given cysteine is involved in a disulfide bridge, unless the three dimensional structure of the protein is known. http://homepages.fh-giessen.de/ hg/pdbselect/ This analysis is based on the data downloaded from ftp://ftp.rcsb.org/pub/pdb/data/biounit/coordinates/all

Figure. The three disulfide bridges of the scorpion toxin peptide P (PDB ACW). The sequence is VSC EDC PEHC STQKAQAKC 9 DNDKC VC EPI. Nevertheless much work has been devoted to the prediction of disulfide bonds [,,,]. The main drawback of these approaches is that they rely on sets of proteins, called by extension databases which are not organized as genuine databases, but which are a list of PDB identifiers. Some other researchers have developed specialized databases [8,7] for the study of disulfide bridges. Our goal is to provide the bioinformatics community with a database of selected proteins issued from the PDB, that we have called DBDB for Disulfide Bridge DataBase, which has two main purposes : the first one is to provide structured, verified and useful information on the number of and/or bonds of a protein and of the s of cysteine residues involved or not in disulfide bridge; the taxonomy and classification information of proteins of the DBDB allows to focus on specific sets of proteins in order to study functions and/or species, the second one is to serve has a benchmark set specifically designed for the evaluation and comparison of cysteine bridges prediction softwares [,9]. The database is currently available at the following address : http://www.info.univ-angers.fr/pub/richer/rec/bio/dbdb and contains up to now proteins, 8 amino acids among which cysteines, organized as follows (see table ) : The first set of proteins (Working set) contains proteins with and/or bonds. The second set contains proteins with no bonds and serves as a Control set. This discrimination enables us to define four kinds of environments and their related sets for cysteine residues (see fig. ) : - T, the set of cysteines which belong to proteins with no bond - for cysteines which belong to proteins - F, the set of cysteines, not involved in a bond - I a, the set of cysteines involved in an bond - I e, the set of cysteines involved in an bond Each entry in the Disulfide Bridge DataBase contains the following type of information which are mostly extracted from the PDB files : the PDB identifier of the protein,

Working Set Control set Containing proteins with / bonds proteins without bonds Number of proteins Number of chains 9 Number of bonds not applicable Number of bonds not applicable Number of cysteines 9 9 Cysteines PolyPeptidic chains Total 7 Total bond 8 with no bond (Control set) bond 8 with bond 78 9 with bond with and bonds Table. Information currently available in the Disulfide Bridge DataBase the classification of the protein such as hydrolase or toxin, for enzymes, the E.C. number, the taxonomy of the organism like Homo sapiens, Bos taurus, the s of cysteines (for and bonds); these s are validated, i.e., they were manually adjusted when discrepancies between the Fasta sequence and the s given by the PDB files were found, bibliographic information. The DBDB allows the user to view cysteine residues directly ed in the Fasta sequence of each chain that composes the protein (see fig ).. Analysis of cysteine environments In the following we will compare the evolution of frequencies of occurrence of amino acids that belong to the environment of a cysteine. This analysis is based on the amino acid sequence of proteins.. Amino acid frequencies The whole database contains more than 8 residues. The most present amino acids are L, G, and A, while the less present are respectively W, M, H and C. The diagram on figure represents the frequencies of occurrence of each amino acid in the database.. Statistical analysis of amino acid frequencies A first analysis of the DBDB was performed in order to determine the influence of amino acids that appear around all cysteines as they are considered to influence the formation of disulfide bridges. If a residue is more

T Cumulative or specific environment of Cys Cumulative or specific environment of Cys involved in disulfide bridge I a C + + + + + (Control set)... C +... +... C +... +... C +... +... C +... +... C +... + F Cumulative or specific environment of Cys of chains with disulfide bridges Cumulative or specific environment of Cys involved in disulfide bridge I e Figure. The four different kinds of cysteine environments in proteins (or less) present than it should be in one kind of environment, then we might infer that it has some influence on the environment and might play some key role. We first made two separate analysis for the Working and Control sets. For the Working set we studied the three subsets of segments for the different environments : F, I a and I e. For the Control set we can only focus on the cysteine set T. For each segment of residues (see [] for the choice of the segment size) centered on a cysteine we computed the difference between the frequency of occurrence of an amino acid near the Cys residue and its global frequency of occurrence over the whole database. We then performed 9 8 7 % A R N D C Q E G H I L K M F P S T W Y V Figure. Percentage of occurrence of each amino acid in DBDB

- a cumulative analysis of all segments for each environment taking into account residues X n...x C X +...X +n, for a given distance n [..] in order to study the global influence of amino acids, - a -specific analysis only taking into account residues X n and X +n in order to determine the local evolution of the frequencies of amino acids. From a general point of view we can say that : the four kinds of environments are different and show different variations of frequencies of amino acids around cysteine residues, except for D which is not influent (see graphs,, of figure ). the bond environment I e shows the most important variations of the frequencies for a great number of amino acids : V, A, F, R, S, W, N, Q and G. In particular G is overabundant when it is very close to the cysteine residue and varies from +8 % to +. % (see graph of figure ) the cysteine residue is very rare in and for proteins which have no bridge (see of fig. and ). Amino acids that are typical of an environment are listed below : Amino Acid First set Second set D no influence L, C T F, I a, I e A, D T, F I a, I e H, S I a I e E, Q, K I a T, F, I e T, M I e T, F, I a Table. Discriminant amino acids. Conclusion and future work The Disulfide Bridge DataBase appears to us as a convenient database for anyone involved in the prediction of disulfide bridge prediction. We think that the growing number of proteins into the DBDB should lead to more precise results. Further analysis have to be performed for example by using information about the secondary structure. Our next step toward disulfide bridge prediction will be to use physico-chemical properties of amino acids in order to characterize the environment of cysteine residues in terms of volume, hydrophobicity, surface area, charge and polarity. References [] Martelli Pier Luigi, Fariselli Piero, Malaguti Luca and Casadio Rita, Prediction of the disulfide-bonding state of cysteines in proteins at 88 % accuracy, Protein Science, :7-79,. [] Matsumura M., Signor G. and Mathews B.W., Substantial increase in protein stability by multiple disulfide-bonds, Nature,, 9-, 989. [] Berman H.M., Westbrook J., Feng Z., Gilliland,G., Bhat T.N., Weissig H., Shindyalov I.N. and Bourne P.E., The Protein Data Bank. Nucleic Acids Res., 8, -,.

[] Neves Petersen Maria Teresa, Johnson Per Harald and Petersen Steffen B., Amino acid neighbours and detailed conformational analysis of cysteines in proteins, Protein Engineering, vol, no. 7, pp -8, 999. [] Hyunsoo Kim and Haesun Park, Protein secondary structure prediction based on an improved support vector machines approach, Protein Engineering vol. no. 8 pp. -,. [] Vullo A. and Frasconi P., Disulfide connectivity prediction using recursive neural networks and evolutionary information. Bioinformatics,, -9,. [7] Vinayagam A., Pugalenthi G., Rajesh R. and Sowdhamini R. DSDBASE: a consortium of native and modelled disulphide bonds in proteins. Nucleic Acids Res.,, D-D,. [8] Srinivasan K.N., Gopalakrishnakone P., Tan P.T., Chew K.C., Cheng B., Kini R.M., Koh J.L., SCORPION, a molecular database of scorpion toxins. Toxicon,, -,. [9] O Connor D, Yeates TO., GDAP: a web tool for genome-wide protein disulfide bond prediction., Nucleic Acids Res.,,. [] Fariselli P. and Casadio R., Prediction of disulfide connectivity in proteins. Bioinformatics, 7, 97-9,.

- () Cys (C) for all sets - 7.. -. - -. - () H and S for and sets 7 H H S S () Glu (E) for all sets () Thr (T) for all sets - - - 7-7 () Gly (G) for all sets 8-7 Figure. Some variations of frequencies from the cumulative analysis

- - - () Cys (C) for all sets 7 (a) Cysteine - - - () Gln (Q) for all sets 7 (b) His and Glu - - - () Glu (E) for all sets 7 (c) Gln Figure. Some variations of frequencies from the -specific analysis

Figure. Sample information for protein BI which contains chains L and H, and bonds