Introduction to Bioinformatics (Master ChemoInformatique)




Roland Stote
Institut de Génétique et de Biologie Moléculaire et Cellulaire, Biocomputing Group
03.90.244.730
rstote@igbmc.fr

Biological Function at the Molecular Level

- 3.1 x 10^9 letters in the DNA code in every one of the 100 x 10^12 cells in the human body.
- Humans have between 30,000 and 40,000 genes.
- There is approximately 2 m of DNA in each cell, packed into nucleosomes.
- If all the DNA in the human body were put end-to-end, it would reach to the Sun and back more than 600 times.

What is Bioinformatics?

Bioinformatics is the study of information contained within biological, chemical or medical systems through the use of computers. Bioinformatics methods are used in a wide variety of fields, including basic science, biotechnology, medicine, pharmaceutical development and public health. Bioinformatics is continually evolving: new approaches and tools are being developed that allow the researcher to acquire, analyze and present, more accurately and efficiently, the large amounts of data generated in today's research environment.
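The "Sun and back" figure quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the per-cell DNA length and the cell count come from the slide, while the Earth-Sun distance (one astronomical unit, about 1.496 x 10^11 m) is a value assumed here.

```python
# Back-of-the-envelope check of the "to the Sun and back" estimate.
DNA_PER_CELL_M = 2.0        # metres of DNA per cell (from the slide)
CELLS_IN_BODY = 100e12      # number of cells in the body (from the slide)
SUN_DISTANCE_M = 1.496e11   # mean Earth-Sun distance (assumed: 1 astronomical unit)

total_dna_m = DNA_PER_CELL_M * CELLS_IN_BODY
round_trips = total_dna_m / (2 * SUN_DISTANCE_M)

print(f"Total DNA length: {total_dna_m:.1e} m")
print(f"Round trips to the Sun: {round_trips:.0f}")
```

With these inputs the estimate comes out between 600 and 700 round trips, which is consistent with "more than 600 times".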

What is Bioinformatics?

- Development and application of computerized methods for the study of biological information and data (generation of databases)
- Analysis and interpretation of these data (software tools)
- Developing algorithms for text string comparison (sequence alignment and keyword searches)
- Developing algorithms for pattern matching (data mining, cluster analysis)
- Algorithms for geometry analysis (docking, visualisation)
- Physical simulations and model building (molecular dynamics, molecular mechanics, homology modeling)

Bioinformatics is situated at the interface of multiple domains of research.

Objectives of this module (20 hours)

1. Introduce protein and DNA sequence and structure.
2. Present different biological databases (sequence and structure) and their associated query tools.
3. Find information on a protein from its sequence.
4. Visualize, analyze and find information on a protein from its 3-D structure, using the visualization program VMD.
5. Present the basics of molecular modeling applied to biological molecules: an introduction to energy minimization and molecular dynamics.

An introduction to DNA and protein structure
Relationship to function
Roland Stote
rstote@igbmc.fr

Sources of supplementary information
- Introduction à la structure des protéines, Branden & Tooze, Ed. DeBoeck Université
- Proteins: structures and molecular properties, Thomas E. Creighton, W. H. Freeman

On the web:
- http://www.expasy.ch/swissmod/course/course-index.htm
- http://www.cryst.bbk.ac.uk/pps2/course/index.html

Structure of nucleic acids
Roland Stote
Based on the course by Annick Dejaegere at the ESBS

Nucleic acids are made of:
- Phosphates
- Sugars
- Bases

Bases: Pyrimidines
- Cytosine: 2-oxy-4-aminopyrimidine
- Uracil: 2,4-dioxypyrimidine
- Thymine: 5-methyluracil

Bases: Purines
- Adenine: 6-aminopurine
- Guanine: 2-amino-6-oxypurine

Hydrogen bonds (diagrams: cytosine and uracil; adenine and guanine)
A: interaction with an acceptor
D: interaction with a donor

In-plane interactions: hydrogen bonds

Base pairs: possible pairings
- 10 purine-pyrimidine pairs
- 11 purine-purine pairs
- 7 pyrimidine-pyrimidine pairs

http://www.imb-jena.de/imglibdoc/

Vertical interactions
- Vertical stacking
- Hydrophobic effect
- Electrostatic interactions between the bases

A and B helices

The grooves

Z-DNA
- G-C sequences
- Alternating syn and anti conformations

A-, B- and Z-DNA
- B-DNA: low ionic strength; native conformation in chromatin
- A-DNA: high ionic strength, or in the presence of alcohol
- A-RNA: native conformation of RNA
- Z-DNA: alternating poly(dG-dC) sequences at high ionic strength

Nucleosome

Molecular Biology

DNA
- A string over a four-letter alphabet of nucleotides: A, C, G, T
- Usually double-stranded, with complementary anti-parallel strands

5' ATCGCCTTATTCAT 3'
3' TAGCGGAATAAGTA 5'
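The complementary, anti-parallel relationship between the two strands above can be sketched in a few lines of Python. The function names are chosen here for illustration only; the example sequence is the one shown on the slide.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the complementary bases, read in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """Return the opposite strand written 5' to 3' (complement, then reverse)."""
    return complement(strand)[::-1]

top = "ATCGCCTTATTCAT"               # 5' -> 3' strand from the slide
print(complement(top))               # bottom strand as drawn, 3' -> 5'
print(reverse_complement(top))       # same bottom strand, written 5' -> 3'
```

Because the strands are anti-parallel, sequence databases conventionally report the second strand as the reverse complement, so that both are read 5' to 3'.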

Genetic Code

A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins, which provide the structural components of cells and tissues as well as enzymes for essential biochemical reactions. The human genome is estimated to comprise more than 30,000 genes.

The Genetic Code describes the translation of genes into protein.

Genetic Code (* Ter: termination codon; i: codon that can also act as an initiation codon)

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp
CTT L Leu      CCT P Pro      CAT H His      CGT R Arg
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg
ATT I Ile      ACT T Thr      AAT N Asn      AGT S Ser
ATC I Ile      ACC T Thr      AAC N Asn      AGC S Ser
ATA I Ile      ACA T Thr      AAA K Lys      AGA R Arg
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg
GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly
GTG V Val      GCG A Ala      GAG E Glu      GGG G Gly
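As a minimal sketch of how the table above is used, the following Python builds the 64-codon lookup in the same T, C, A, G order and translates a DNA coding sequence codon by codon. The function name `translate` and its `to_stop` parameter are illustrative choices, not part of the course material, and the input sequences are made-up examples.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids in codon order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence read in frame from its first base."""
    dna = dna.upper()
    protein = []
    # Ignore any incomplete codon at the end of the sequence.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and to_stop:
            break                      # stop at the first termination codon
        protein.append(aa)
    return "".join(protein)

print(translate("TTTTCTGAAAAA"))       # codons TTT TCT GAA AAA
```

The example sequence encodes Phe-Ser-Glu-Lys, so the one-letter result is FSEK.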

Biology at the Molecular Level

Proteins are essentially biological polymers.
- The N-terminus ends in an amino group.
- The C-terminus ends in a carboxyl group.
- Amino acids are linked by peptide bonds.

A peptide: Phe-Ser-Glu-Lys (F-S-E-K)

General form of amino acids: H2N-CαH(R)-COOH

The alpha carbon is asymmetric, so 2 stereoisomers are possible: D or L.

Chirality in proteins: D form, L form

Interactions between amino acids and their environment

Electrostatic interactions: $E(r) = A/r$

van der Waals interactions: $E(r) = B/r^{12} - C/r^{6}$

Interactions with solvent, measured by:
- solubility
- chromatography
- surface tension

Hydrophobic interactions
Hydrophilic interactions
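The two energy terms above can be evaluated directly. The sketch below uses arbitrary illustrative constants (A, B and C are not values from the course) and also locates the minimum of the van der Waals term: setting dE/dr = 0 gives r_min = (2B/C)^(1/6) and E(r_min) = -C^2/(4B).

```python
def coulomb(r: float, a: float) -> float:
    """Electrostatic term E(r) = A / r."""
    return a / r

def lennard_jones(r: float, b: float, c: float) -> float:
    """van der Waals term E(r) = B / r**12 - C / r**6."""
    return b / r**12 - c / r**6

def lj_minimum(b: float, c: float) -> tuple[float, float]:
    """Distance and energy at the minimum, from dE/dr = 0."""
    r_min = (2.0 * b / c) ** (1.0 / 6.0)
    return r_min, lennard_jones(r_min, b, c)

B, C = 1.0, 2.0                      # arbitrary illustrative constants
r_min, e_min = lj_minimum(B, C)
print(r_min, e_min)
for r in (0.9, 1.0, 1.5, 3.0):       # repulsive wall, minimum, attractive tail
    print(r, lennard_jones(r, B, C), coulomb(r, -1.0))
```

The comparison shows the different ranges of the two terms: the r^-6 attraction fades within a few distance units, whereas the 1/r electrostatic term decays much more slowly.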

The hydrogen bond

D(δ-)-H(δ+) --- A(δ-), where D is the donor and A is the acceptor.

Dual character:
- electrostatic
- covalent

An acceptor can be shared.

Geometry: linear to within +/-20 degrees; d = 1.7 Å to 2.0 Å

Classification of amino acids (diagram): amphipathic, charged, small hydrophilic, small hydrophobic and bulky hydrophobic residues.

The peptide bond (diagram: resonance forms of the peptide group linking Cα(i) and Cα(i+1))

Cis-trans isomerization (diagram): ω = 180° is TRANS, ω = 0° is CIS

Four Levels of Structure Determine the Shape of Proteins

- Primary structure: the linear arrangement (sequence) of amino acids and the location of covalent (mostly disulfide) bonds within a polypeptide chain. Determined by the genetic code.
- Secondary structure: local folding of a polypeptide chain into regular structures, including the α helix, β sheet, and U-shaped turns and loops.
- Tertiary structure: overall three-dimensional form of a polypeptide chain, which is stabilized by multiple non-covalent interactions between side chains.
- Quaternary structure: the number and relative positions of the polypeptide chains in multisubunit proteins. Not all proteins have a quaternary structure.

Primary structure of a protein: determined by the nucleotide sequence of its gene

Bovine insulin: the first sequenced protein

In 1953, Frederick Sanger determined the amino acid sequence of insulin, a protein hormone. This work is a landmark in biochemistry because:
- It showed for the first time that a protein has a precisely defined amino acid sequence.
- It demonstrated that insulin consists only of amino acids linked by peptide bonds between α-amino and α-carboxyl groups.

The complete amino acid sequences of more than 100,000 proteins are now known. Each protein has a unique, precisely defined amino acid sequence.

Amino acid substitution in proteins from different species

- Conservative: substitution of an amino acid by another amino acid of similar polarity (Val for Ile at position 10 of insulin).
- Non-conservative: substitution involving replacement of an amino acid by another of different polarity (in sickle cell anemia, the glutamic acid at position 6 of hemoglobin is replaced by a valine, which induces precipitation of hemoglobin in red blood cells).
- Invariant residues: amino acid found at the same position in different species (critical for the structure or function of the protein).
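The three categories above can be illustrated with a small sketch. The polarity grouping used here is one common textbook grouping and is an assumption of this example, as are the function names; the three aligned sequences are invented toy data, not real proteins.

```python
# Assumed polarity classes (one common textbook grouping).
GROUPS = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
POLARITY_CLASS = {aa: name for name, residues in GROUPS.items() for aa in residues}

def is_conservative(a: str, b: str) -> bool:
    """True if both residues fall in the same assumed polarity class."""
    return POLARITY_CLASS[a] == POLARITY_CLASS[b]

def invariant_positions(aligned: list[str]) -> list[int]:
    """0-based columns where every aligned sequence has the same residue."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) == 1]

print(is_conservative("I", "V"))   # Ile -> Val, the insulin example
print(is_conservative("E", "V"))   # Glu -> Val, the sickle cell example

toy_alignment = ["MKVLA", "MRVIA", "MKVLG"]   # hypothetical aligned sequences
print(invariant_positions(toy_alignment))
```

Under this grouping the Ile/Val exchange is classed as conservative and the Glu/Val exchange as non-conservative, in line with the two examples on the slide.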

Protein conformation: many (but not all) proteins fold into a stable conformation, known as the native conformation.

A chain of more than about 50 amino acids is called a protein; shorter chains are known as peptides.

Secondary structure of proteins

The 3D structure is defined by the orientation of successive peptide planes.
There are two degrees of freedom for each amino acid (phi and psi).

Conformation of the polypeptide chain: Phi

Conformation of the polypeptide chain: Psi

The Ramachandran plot
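Phi and psi are dihedral angles, each defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for phi and N(i), Cα(i), C(i), N(i+1) for psi. The following is a minimal pure-Python sketch of a dihedral calculation from four 3D points; the helper names are illustrative, and the test coordinates are idealized points rather than atoms from a real structure.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Idealized geometry: central bond along z, outer bonds in the xy-plane.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))    # eclipsed (cis)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))   # opposite (trans)
```

Computing (phi, psi) for every residue of a structure and plotting one against the other gives the Ramachandran plot.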

The alpha helix

Helix parameters
- Helix pitch: 5.4 Å
- 3.6 residues per turn
- Hydrogen bonds: O(i) to N(i+4)
- phi = -60°, psi = -50°

Axial view:
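Two further quantities follow from the helix parameters above: the rise per residue along the axis (pitch divided by residues per turn) and the rotation per residue. The short sketch below derives them; the 20-residue example length is an arbitrary choice, and the length estimate simply multiplies the residue count by the rise.

```python
PITCH_ANGSTROM = 5.4        # helix pitch (from the slide)
RESIDUES_PER_TURN = 3.6     # from the slide

rise_per_residue = PITCH_ANGSTROM / RESIDUES_PER_TURN     # Å along the axis
rotation_per_residue = 360.0 / RESIDUES_PER_TURN          # degrees about the axis

def helix_length(n_residues: int) -> float:
    """Approximate axial length in Å of an ideal alpha helix of n residues."""
    return n_residues * rise_per_residue

print(f"Rise per residue: {rise_per_residue:.2f} Å")
print(f"Rotation per residue: {rotation_per_residue:.0f} degrees")
print(f"20-residue helix: about {helix_length(20):.0f} Å long")
```

This gives a rise of 1.5 Å and a rotation of 100 degrees per residue, so an ideal 20-residue helix spans roughly 30 Å.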

The 3_10 helix
- 3 residues per turn
- H-bond: O(i) to N(i+3)
- phi = -50°, psi = -25°

The beta strand

Assembly of beta strands into sheets

Parallel and anti-parallel beta sheets

The beta sheet is not flat but curved

The beta turns

Triple helix of collagen
- Limited to the tropocollagen molecule
- 3 left-handed helices wound together to give a right-handed superhelix
- Stable superhelix: glycines (small R group) located on the central axis of the triple helix
- One interchain H-bond for each triplet of amino acids, between the NH of Gly and the CO of X (or proline) in the adjacent chain

Side chain conformations
- Definition of dihedral angles along the side chain
- Most frequently observed rotamers

Non-covalent interactions involved in the shape of proteins

Tertiary structure: the overall shape of a protein or a telephone cord!!!

The secondary structure of a telephone cord: a telephone cord, specifically the coil of a telephone cord, can be used as an analogy to the alpha helix secondary structure of a protein.

The tertiary structure of a telephone cord: the tertiary structure of a protein refers to the way the secondary structure folds back upon itself or twists around to form a three-dimensional structure. The secondary coil structure is still there, but the tertiary tangle has been superimposed on it.

Secondary structure motifs

Motifs: simple assemblies of secondary structural elements
- Helix-turn-helix
- β hairpin

Secondary structure motifs
- β-α-β motif
- Greek key motif

Fold classification: α, β, α/β, α+β

Tertiary structure: the overall shape of a protein
- Full three-dimensional organization of a protein
- The three-dimensional structure of a protein kinase

The role of side chains in the shape of proteins
Where is water? Hydrophilic versus hydrophobic

TERTIARY STRUCTURE
- R-group interactions result in the 3D structures of globular proteins.
- Types of interactions: H-bonds, ionic bonds (salt linkages), hydrophobic interactions and disulphide bonds.
- Hydrophilic R groups lie on the surface, while hydrophobic R groups are buried inside the molecule.
- There is a wide variety of tertiary (3°) structures, since protein sizes and amino acid sequences vary greatly.

After X-ray crystallographic studies of hen lysozyme (Phillips, 1966) and papain (Drenth et al., 1968), and limited proteolysis studies of immunoglobulins (Porter, 1973; Edelman, 1973), Donald B. Wetlaufer defined domains as stable units of protein structure that could fold autonomously.

A protein domain is a part of a protein that can evolve, function, and exist independently of the rest of the protein chain.
- Each domain forms a compact three-dimensional structure and often can be independently stable and folded.
- Many proteins consist of several structural domains.
- One domain may appear in a variety of evolutionarily related proteins.
- Domains vary in length from about 25 to about 500 amino acids.
- Examples: zinc fingers (stabilized by metal ions), the calcium-binding EF hand domain.

Protein domains
- Cytochrome b562: a single-domain protein involved in electron transport in mitochondria
- The NAD*-binding domain of the enzyme lactic dehydrogenase
- The variable domain of an immunoglobulin

*nicotinamide adenine dinucleotide

The Src protein

Quaternary structure: if a protein is formed as a complex of more than one protein chain, the complete structure is designated the quaternary structure.
- Generally formed by non-covalent interactions between subunits
- Either as homo- or hetero-multimers

Primary structure Secondary structure Tertiary structure Quaternary structure Function of peptides and proteins

STRUCTURE-FUNCTION RELATIONSHIPS

In general, all globular proteins have distinctive 3D structures that are specialized for their particular functions.

Shape and function

Membrane transport proteins

Mechanical support: skin and bone are strengthened by the protein collagen. Abnormal collagen synthesis or structure causes dysfunction of cardiovascular organs, bone, skin, joints and eyes. Refer to Devlin, Clinical correlation 3.4, p. 121.

Transport and storage: small molecules are often carried by proteins in the physiological setting (for example, the protein hemoglobin is responsible for the transport of oxygen to tissues). Many drug molecules are partially bound to serum albumins in the plasma.

The binding of oxygen is affected by molecules such as carbon monoxide (CO), for example from tobacco smoking, cars and furnaces. CO competes with oxygen at the heme binding site. Hemoglobin's binding affinity for CO is 200 times greater than its affinity for oxygen, meaning that small amounts of CO dramatically reduce hemoglobin's ability to transport oxygen. When hemoglobin combines with CO, it forms a very bright red compound called carboxyhemoglobin.

When inspired air contains CO levels as low as 0.02%, headache and nausea occur; if the CO concentration is increased to 0.1%, unconsciousness will follow. In heavy smokers, up to 20% of the oxygen-active sites can be blocked by CO.

Figure: 3-dimensional structure of hemoglobin. The four subunits are shown in red and yellow, and the heme groups in green.

Cell adhesion and signaling: integrins are cell adhesion molecules that couple the cytoskeleton to the extracellular matrix (diagram labels: inside, signal, outside).
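The effect of the 200-fold affinity for CO quoted above can be estimated with a simple competitive-binding relation (often called the Haldane relation): at equilibrium, [HbCO]/[HbO2] = M x pCO/pO2, with M the affinity ratio. The sketch below is a simplification that assumes equilibrium, every heme site occupied by either O2 or CO, and 21% oxygen in inspired air; the function name is illustrative.

```python
def hbco_fraction(co_percent: float,
                  o2_percent: float = 21.0,       # assumed O2 content of air
                  affinity_ratio: float = 200.0   # CO vs O2 affinity (from the slide)
                  ) -> float:
    """Equilibrium fraction of heme sites occupied by CO."""
    ratio = affinity_ratio * co_percent / o2_percent   # [HbCO] / [HbO2]
    return ratio / (1.0 + ratio)

for co in (0.02, 0.1):
    print(f"{co}% CO in air: about {100 * hbco_fraction(co):.0f}% of sites carry CO")
```

Under these assumptions, 0.02% CO already occupies roughly one site in six at equilibrium, and 0.1% CO occupies nearly half of them, which illustrates why such low concentrations are dangerous.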

Integrin Topology http://www.multimedia.mcb.harvard.edu/

The relationship between shape and function of proteins:

The shape of proteins:
- Folding occurs spontaneously.
- The native conformation is determined by the different levels of structure.

Disease and protein folding:
- Example: neurodegenerative diseases

Figure: an X-ray diffraction image for the protein myoglobin.

The first protein crystal structure was of sperm whale myoglobin, as determined by Max Perutz and Sir John Cowdery Kendrew in 1958, which led to a Nobel Prize in Chemistry.

Protein NMR is a field of structural biology that applies nuclear magnetic resonance spectroscopy to the investigation of proteins. The field was pioneered by, among others, Richard Ernst (Nobel Prize 1991) and Kurt Wüthrich (Nobel Prize 2002).

Figure: Pacific Northwest National Laboratory's high magnetic field (800 MHz) NMR spectrometer being loaded with a sample.

The NMR sample is prepared in a thin-walled glass tube. Protein NMR is performed on aqueous samples of highly purified protein. A sample consists of between 300 and 600 microlitres with a protein concentration in the range 0.1 to 3 millimolar. The source of the protein can be either natural or produced in an expression system using recombinant DNA techniques through genetic engineering.
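The sample volume and concentration quoted above translate into an amount of protein once a molecular weight is fixed. The sketch below does that conversion; the 15 kDa molecular weight and the 500 microlitre, 1 millimolar sample are assumed example values, not figures from the course.

```python
def protein_amount(volume_ul: float, conc_mM: float, mw_da: float) -> tuple[float, float]:
    """Return (moles, mass in mg) of protein in a sample."""
    litres = volume_ul * 1e-6
    moles = litres * conc_mM * 1e-3          # mol = L x mol/L
    mass_mg = moles * mw_da * 1e3            # g/mol -> g -> mg
    return moles, mass_mg

# Assumed example: 500 microlitres at 1 mM of a 15 kDa protein.
moles, mass_mg = protein_amount(volume_ul=500, conc_mM=1.0, mw_da=15_000)
print(f"{moles * 1e6:.2f} micromol, {mass_mg:.1f} mg")
```

For this example the sample holds 0.5 micromol of protein, about 7.5 mg, which shows why milligram quantities of purified protein are typically needed.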

Acknowledgements
Marie-Véronique Clement, Annick Dejaegere, Bruno Kieffer