Statistical mechanics of secondary structures formed by random RNA sequences. Ralf Bundschuh The Ohio State University




Collaborator: Terence Hwa, University of California at San Diego
Outline:
  Introduction to RNA
  Uniform sequences: the molten phase
  Disorder: glass phase and transition?
  Bias: the native phase
  Conclusions
Supported by DAAD, Beckman Foundation, and NSF

Introduction I
RNA is a heteropolymer of four different bases: G, C, A, and U.
Primary structure: the sequence, e.g., GCGGUUUGCUCGUUGGGGGCGCCGCUGUCUGGGGUCCUGUGUUCGUCCCGUUCGCCC
Strongest interaction: Watson-Crick base pairing (G-C and A-U), which gives the secondary structure.
Spatial arrangement: the tertiary structure (looks locally like a DNA double helix).

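Introduction II
Secondary structure: the set of base pairs formed.
Pseudo-knots are neglected.
[Figure: diagrammatic representation of a secondary structure]
Assign an energy $E[S]$ to each structure $S$, which gives the partition function
  $Z = \sum_{\{S\}} \exp(-E[S]/T)$
The partition function is generated exactly by a Hartree equation, drawn diagrammatically on the slide. Written out for the partition function $Z_{i,j}$ of the segment from base $i$ to base $j$, it reads
  $Z_{i,j} = Z_{i,j-1} + \sum_{k=i}^{j-1} Z_{i,k-1}\, e^{-\varepsilon_{kj}/T}\, Z_{k+1,j-1}$
where $\varepsilon_{kj}$ is the pairing energy of bases $k$ and $j$.
Related problems: electron in a disordered medium, meanders.
The Hartree equation provides
  a handle for analytical treatment
  an $O(N^3)$ algorithm for the exact partition function (McCaskill, Biopolymers 29, 1990)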
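The recursion behind the $O(N^3)$ algorithm can be sketched in a few lines of Python. This is a minimal sketch and not McCaskill's full algorithm: it assumes the recursion Z(i,j) = Z(i,j-1) + sum over k of Z(i,k-1) w(k,j) Z(k+1,j-1), where w(k,j) = exp(-ε_kj/T) is the Boltzmann weight for pairing bases k and j. The names `partition_function`, `uniform_weight`, `weight` and `min_loop` are hypothetical and introduced only for illustration.

```python
import math


def partition_function(n, weight, min_loop=0):
    """Partition function of all pseudo-knot-free structures on n bases.

    weight(k, j) is the Boltzmann factor exp(-eps_kj / T) for pairing
    bases k < j (0-based); return 0.0 to forbid a pair.
    min_loop is the minimum number of unpaired bases inside a hairpin
    (an illustrative parameter, not taken from the slides).
    """
    # Z[i][j] holds the partition function of the half-open segment
    # i..j-1, so Z[i][i] = 1 is the empty segment.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1          # last base of the segment
            total = Z[i][j]             # case 1: base j is unpaired
            for k in range(i, j - min_loop):
                # case 2: base j pairs with base k; the pair splits the
                # segment into an outer part i..k-1 and an inner part k+1..j-1
                total += Z[i][k] * weight(k, j) * Z[k + 1][j]
            Z[i][j + 1] = total
    return Z[0][n]


def uniform_weight(eps0, T):
    """Same pairing energy eps0 for every pair (uniform sequence)."""
    q = math.exp(-eps0 / T)
    return lambda k, j: q
```

With weight 1 for every pair the function simply counts structures; for example, `partition_function(5, lambda k, j: 1.0)` returns 21.0. The three nested loops over `length`, `i` and `k` are what make the cost grow as $N^3$.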

Introduction III
Two important parameters for generic properties:
  temperature
  bias for native structure
Use a long hairpin as the (designed) native structure.
Create sequences by
  randomly choosing the first half of the sequence
  assigning the exact complement as the second half
  changing the bias by mutations with probability p
[Figure: expected phase diagram; axes: bias for native structure vs. temperature]
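The sequence construction can be sketched as follows. This is a minimal sketch under stated assumptions: the second half is taken to be the reverse complement of the first (so that base i faces base N-1-i in the hairpin), and a "mutation" is taken to mean replacing a position by a uniformly random base. Neither detail is spelled out on the slide, and the names `biased_hairpin_sequence` and `native_pair_fraction` are hypothetical.

```python
import random

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def biased_hairpin_sequence(half_length, p, rng=None):
    """Random sequence of length 2 * half_length biased towards a hairpin."""
    rng = rng or random.Random()
    # Step 1: choose the first half at random.
    first = [rng.choice("ACGU") for _ in range(half_length)]
    # Step 2: second half is the reverse complement, so that base i
    # can pair with base N-1-i (assumed reading of "exact complement").
    second = [COMPLEMENT[b] for b in reversed(first)]
    seq = first + second
    # Step 3 (assumed mutation model): each position is independently
    # replaced by a uniformly random base with probability p.
    for pos in range(len(seq)):
        if rng.random() < p:
            seq[pos] = rng.choice("ACGU")
    return "".join(seq)


def native_pair_fraction(seq):
    """Fraction of hairpin positions (i, N-1-i) that are still complementary."""
    n = len(seq)
    matches = sum(COMPLEMENT[seq[i]] == seq[n - 1 - i] for i in range(n // 2))
    return matches / (n // 2)
```

At p = 0 every hairpin position is complementary, and increasing p lowers the bias towards the hairpin.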

Molten Phase I
De Gennes (1968) proposed: start with uniform sequences AUAUAUAUAU... or GCGCGCGCGC...
Uniform attraction between any two elements of the polymer, so there is only one effective interaction parameter $\varepsilon_0$.
$\varepsilon_0$ contains binding energy and entropic terms relative to unbound RNA.
Most monomers are engaged in base pairs.
Main effect in the molten phase: branching entropy.
In the Laplace domain the Hartree equation becomes
  $\hat{Z}(\mu)^{-1} = (e^{\mu} - 1) - \Pi(\mu)$, with $\Pi(\mu) = e^{-\varepsilon_0/T}\,\hat{Z}(\mu)$
which gives
  $Z(N) \sim N^{-\theta} e^{\mu_0 N}$ with $\theta = 3/2$
de Gennes, Biopolymers 6, 1968; Waterman, Adv. Math. Suppl. Studies 1, 1978
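The exponent 3/2 can be traced to a square-root singularity of the Laplace-domain solution. The following is a sketch of that step, assuming the Laplace-domain Hartree equation $\hat{Z}(\mu)^{-1} = (e^{\mu} - 1) - e^{-\varepsilon_0/T}\hat{Z}(\mu)$ and a transform of the form $\hat{Z}(\mu) = \sum_N Z(N) e^{-\mu N}$ (up to a shift of the index); the shorthand $q$ is introduced here and is not on the slide.

```latex
% q = e^{-\varepsilon_0/T} is shorthand introduced for this sketch
\begin{align*}
\hat{Z}(\mu)^{-1} &= (e^{\mu} - 1) - q\,\hat{Z}(\mu)
\;\Longrightarrow\;
q\,\hat{Z}^2 - (e^{\mu} - 1)\,\hat{Z} + 1 = 0, \\
\hat{Z}(\mu) &= \frac{(e^{\mu} - 1) - \sqrt{(e^{\mu} - 1)^2 - 4q}}{2q}, \\
e^{\mu_0} - 1 &= 2\sqrt{q}
\;\Longrightarrow\;
\mu_0 = \ln\!\left(1 + 2\sqrt{q}\right).
\end{align*}
```

The root with the minus sign is the one that vanishes for $\mu \to \infty$. The square root vanishes at $\mu_0$, and a singularity of the form $(\mu - \mu_0)^{1/2}$ in the transform corresponds to $N^{-3/2} e^{\mu_0 N}$ at large $N$.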

Molten Phase II
Mountain representation (N = 18)
[Figure: a secondary structure of N = 18 bases and the corresponding mountain, with bases numbered 1 to 18]
One-to-one correspondence: RNA secondary structures correspond to mountains.
All bases attract equally strongly, so counting structures amounts to counting mountains.
Free random walk in the presence of a hard wall:
  $Z(N) \sim N^{-3/2} e^{\mu_0 N}$
  $\langle h \rangle \sim N^{1/2}$
Branched polymer: $R_g \sim N^{1/4}$
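The mapping from a structure to its mountain can be sketched in code. This is a minimal sketch that assumes the structure is given in dot-bracket notation (a common convention not used on the slides), and it takes the height at a position to be the number of base pairs opened and not yet closed up to and including that position. The name `mountain_heights` is hypothetical.

```python
def mountain_heights(dot_bracket):
    """Height profile of the mountain for a dot-bracket structure.

    '(' opens a base pair (step up), ')' closes one (step down),
    '.' is an unpaired base (flat step).
    """
    heights = []
    h = 0
    for position, symbol in enumerate(dot_bracket):
        if symbol == "(":
            h += 1
        elif symbol == ")":
            h -= 1
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at {position}")
        if h < 0:
            # The hard wall: the walk may never go below zero.
            raise ValueError(f"unbalanced ')' at position {position}")
        heights.append(h)
    if h != 0:
        raise ValueError("unbalanced '(' in structure")
    return heights
```

The walk never drops below zero and returns to zero at the end, which is the hard-wall random walk picture; averaging the heights of one profile gives the size of that structure.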

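Glass Transition I
Is the molten phase stable? Use a perturbative approach.
Interaction energy between base $i$ and base $j$: $\varepsilon_{ij} = \varepsilon_0 + \delta\varepsilon_{ij}$
Assume the $\delta\varepsilon_{ij}$ are independent Gaussian variables with
  $\overline{\delta\varepsilon_{ij}} = 0$, $\overline{\delta\varepsilon_{ij}\,\delta\varepsilon_{kl}} = \epsilon\,\delta_{ik}\delta_{jl}$
First term in the free energy expansion in powers of $\epsilon$: the two-replica system $\overline{Z^2}$,
i.e. two replicas which gain energy $\varepsilon$ for every common bond.
[Figure: two replicas with their common bonds marked]
Exactly solvable: phase transition at finite $\varepsilon_c$
  $\varepsilon < \varepsilon_c$: the 2 replicas fluctuate independently (molten)
  $\varepsilon > \varepsilon_c$: the 2 replicas have the same configuration (glass)
The molten phase is perturbatively stable.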

Glass Transition II
Is the molten phase stable? Study pinching excitations.
Assume the molten phase is stable for all temperatures, so that
  $Z(T, N) = A(T)\,N^{-3/2} \exp[-f_0(T) N]$
Tang and Chaté, PRL 86 (2001)
On the one hand:
  $\Delta F(N) = -2T \log (N/2)^{-3/2} + T \log N^{-3/2} \approx \frac{3}{2}\, T \log N$
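The logarithmic pinching cost follows from a ratio of partition functions. As a worked sketch, assume that pinching the molecule at its midpoint splits it into two independent halves of length $N/2$, that $\Delta F$ is the free energy of the pinched minus the unpinched molecule, and that $Z(T,N) = A(T)\,N^{-3/2}\exp[-f_0(T)N]$:

```latex
\begin{align*}
\Delta F(N) &= -T \log \frac{Z(T, N/2)^2}{Z(T, N)} \\
&= -T \log \frac{A^2\,(N/2)^{-3}\,e^{-f_0 N}}{A\,N^{-3/2}\,e^{-f_0 N}} \\
&= \frac{3}{2}\,T \log N - T \log (8A) \\
&\approx \frac{3}{2}\,T \log N \qquad (N \to \infty).
\end{align*}
```

The extensive terms cancel exactly, so only the power-law prefactor contributes, and the constant $T \log(8A)$ is negligible against $\log N$ for long molecules.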

Glass Transition III
On the other hand: find a piece of length $\log N / \log 2$ in the first half that is exactly complementary to a piece in the second half.
All pairs in this piece contribute the pairing energy $\varepsilon_P$ in the unpinched configuration.
Estimate for the pinching free energy:
  $\Delta F(N) \geq [\varepsilon_P + 2 f_0(T)] \log N / \log 2$
Combine the two results:
  $\frac{3}{2}\, T \geq [\varepsilon_P + 2 f_0(T)] / \log 2$
Since $\varepsilon_P + 2 f_0(T \to 0) > 0$ (for an alphabet of four or more bases, see next talk), this is a contradiction for $T < T^*$.
RNA cannot be in the molten phase for $T < T^*$, so there is a different (glass) phase at low temperatures.

Glass Transition IV
Properties of the glass phase (important for folding behavior)?
Probe the free energy $\Delta F$ of low-energy, large-scale excitations (droplets):
  $\Delta F(N) \sim N^{\theta}$, with $\theta > 0$: glass and $\theta < 0$: no glass
Pinching provides such excitations.
Just seen: $\Delta F(N) \geq [\varepsilon_P + 2 f_0(T)] \log N / \log 2$
Numerically at low temperatures: $\Delta F(N) \approx a(T) \log N$
[Figure: $\Delta F$ versus $N$ (logarithmic axis, 10 to 10000) at T = 0.025]
Glass very weak.
For practical purposes there is no difference between the molten and the glass phase.

Molten-Native Transition I
How does native structure appear?
Start from the molten phase.
Add a bias towards the native structure (hairpin).
[Figure: hairpin with positions 1, N/2 and N marked]
Use a simplified model in the spirit of the molten phase description.
[Figure: possible Watson-Crick pairs in the plane of base positions 1 to N, for a random sequence with perfect bias and for the simplified model]
Simplified model: only two different interaction energies
  strong interaction $\varepsilon_0 + U_0$ for native base pairs
  weak interaction $\varepsilon_0$ for all other base pairs
$U_0$ is an effective measure of the bias.
The model is similar to the Gō model of protein folding (Gō, J. Stat. Phys. 30, 1983).
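The two-energy model is simple enough to state as code. A minimal sketch, assuming 0-based positions and a hairpin in which base i is natively paired with base n-1-i; the function names `go_model_energy` and `native_pairs` are hypothetical, and the sign convention of `eps0` and `u0` is left to the caller.

```python
def go_model_energy(i, j, n, eps0, u0):
    """Pairing energy of bases i < j in the simplified (Go-like) model.

    Native pairs of the assumed hairpin get eps0 + u0,
    every other pair gets eps0.
    """
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    is_native = (i + j == n - 1)
    return eps0 + u0 if is_native else eps0


def native_pairs(n):
    """All native pairs (i, n-1-i) of a hairpin spanning n bases."""
    return [(i, n - 1 - i) for i in range(n // 2)]
```

Setting `u0 = 0` removes the bias and recovers the uniform model with the single parameter `eps0`.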

Molten-Native Transition II
The model can be exactly solved by Laplace transform.
Phase transition between the molten and the native phase at a finite critical bias $U_c$ or at a critical temperature $T_c$.
The phase transition is second order with a finite jump in the specific heat ($\alpha = 0$), but large finite-size effects are possible.
Calculate the fraction of native contacts $Q$. It exhibits a scaling form with a scaling function $g$:
  $Q \approx N^{-1/2}\, g\!\left(N^{1/2}\,\frac{T_c - T}{T_c}\right)$ ($\nu = 2$)
where
  $g(y) \sim y$ for $y \gg 1$
  $g(y) \sim 1$ for $|y| \ll 1$
  $g(y) \sim 1/|y|$ for $y \ll -1$
[Figure: scaling function $g(y)$ for $y$ from -10 to 10, with the asymptotes $y$ and $1/y$]
Can be numerically verified to apply to randomly chosen RNA sequences.

Conclusions and Outlook I
[Figure: phase diagram; axes: bias for native structure vs. temperature]
RNA shows a large variety of interesting behavior.
RNA secondary structure formation is tractable analytically and numerically by methods of statistical mechanics.
RNA secondary structures offer an alternative approach to studying a variety of issues of general heteropolymer behavior.

Conclusions and Outlook II
[Figure: phase diagram; axes: bias for native structure vs. temperature]
Future work:
  Understand glass phase properties analytically
  molten-glass transition
  more realistic RNA models, self-avoidance
  interactions between several molecules
  kinetics
  pseudo-knots
  tertiary structure
  biological applications (RNA finding, Huntington's disease)

Biological function of RNA
Biological functions:
  Structure (cf. proteins)
    Ribosomal RNA
    Transfer RNA
  Information (cf. DNA)
    Messenger RNA
    Single-stranded DNA: T instead of U, more rigid backbone
  Interplay of structure and information
    Splicing
    Ribozymes
    RNA world (origin of life)

Molten Phase in Natural Molecules I
Application to real sequences.
Use experimentally determined parameters from RNA secondary structure prediction, which take all energetic details into account (Hofacker et al., Monatshefte f. Chemie 125, 1994).
Uniform sequences AUAUAUAUAU... and GCGCGCGCGC... need very long sequences (about 8000 bases; Tsunglin Liu & RB, B9.013):
  Hairpin loops must contain at least 3 bases.
  The loss in binding energy is large in hairpin loops.

Molten Phase in Natural Molecules II
Naturally occurring in human DNA: $(CAG)_n$ with large n.
Connected with Huntington's disease, a hereditary neurodegenerative disease:
  n < 35: normal
  n > 35: Huntington's disease
  If n > 35, n is usually very large.
CAG codes for glutamine, so the repeats appear in the protein.
Single-stranded DNA can undergo self-binding during replication.
Biologist's model: only the minimal free energy structure competes with the single-stranded configuration.
[Figure: minimal free energy hairpin of a CAG repeat, built from C-G and G-C pairs]

Molten Phase in Natural Molecules III
$(CAG)_n$ can be in the molten phase.
[Figure: alternative branched structures of a CAG repeat, built from C-G and G-C pairs]
Crossover length: 7 bases (Tsunglin Liu & RB, B9.013)
Molten phase relevant.
Kinetics possibly important.
(Single molecule?) experiments necessary.

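Solution of Gō model
Partition function: order an arbitrary structure by its number of native contacts.
[Diagram: $Z(N, U_0)$ written as a sum of diagrams with 0, 1, 2, ... native contacts separated by bubbles]
Bubble $W(l)$: sum over all ways to place non-native bonds.
$W$ is similar to the molten phase partition function, so expect $W(l) \sim l^{-3/2}$.
Relation between the bubble ($W$) and full ($Z$) partition functions in the Laplace domain:
  $\hat{Z}(\mu; U_0) = \hat{W}(\mu) + \hat{W}(\mu)\, e^{-(\varepsilon_0+U_0)/T}\, \hat{W}(\mu) + \hat{W}(\mu)\, e^{-(\varepsilon_0+U_0)/T}\, \hat{W}(\mu)\, e^{-(\varepsilon_0+U_0)/T}\, \hat{W}(\mu) + \dots$
Summing the geometric series gives the partition function relation
  $\hat{Z}(\mu; U_0)^{-1} = \hat{W}(\mu)^{-1} - e^{-(\varepsilon_0+U_0)/T}$
Exact expression for $\hat{Z}(\mu; U_0)$: the boundary condition $Z(N, U_0 = 0) = Z_0(2N)$ gives $\hat{W}$.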

Molten-Native Transition III
Free energy as a function of the number of total contacts $K$ and the number of native contacts $Q$.
[Figure: three panels with axes Q and K, one for each of the two phases and one at the phase transition]

Molten-Native Transition IV
Applicability to heterogeneous sequences: average numerically over many self-complementary random sequences.
Critical temperature found from the specific heat.
[Figure: specific heat C versus rescaled temperature for N = 200, 400, 800, 1600]
Fraction of native contacts vanishes at the phase transition.
[Figure: Q versus rescaled temperature for N = 200, 400, 800, 1600]
Scaling plot confirms the power laws predicted in the framework of the Gō-like model.
[Figure: $Q N^{1/2}$ versus $N^{1/2} (T_c - T)/T_c$ for L = 100, 200, 400, 800, compared with the Gō model]

DNA Hybridization I
The same effects play a role in DNA hybridization.
Hybridization is a widely used experimental method in molecular biology:
  Homology detection without sequencing
  PCR
  DNA chips
    Sequencing by hybridization
    Gene expression analysis
  DNA computer
Two single-stranded DNA molecules can form base pairs
  with each other
  with themselves
Connect the ends of the two DNA strands in a Gedanken experiment: the task becomes RNA structure formation.
Different applications need different experimental conditions.

DNA Hybridization II
Example: detection of weak homologies
Single-stranded DNA molecules in solution form a hybrid if they are complementary enough.
If they are not complementary enough, they remain single-stranded.
The existence of hybrids is detected by an enzyme.
Weak homology: need to reduce stringency (e.g. lower temperature).
Problem: self-binding instead of hybrid formation.
The experimentalist has to know which phase is present.
[Figure: phase diagram; axes: complementarity vs. temperature, with the hybridized region marked]

Branched Polymers
RNA in the molten phase is equivalent to a branched polymer.
[Figure: possible forms of branched polymers built from A and B units]
Zimm and Stockmayer, J. Chem. Phys. 17, 1949
Lubensky et al., J. Physique 41, 1981
Lubensky and Isaacson, Phys. Rev. A 20, 1979
Parisi and Sourlas, Phys. Rev. Lett. 49, 1981
All results without self-avoidance agree with $Z \sim N^{-3/2} e^{\mu_0 N}$ and $R_g \sim N^{1/4}$:
  multiple branching irrelevant
  pseudo-knots irrelevant

Structure Size Scaling
Characterize all phases by scaling laws.
Choose random sequences and calculate for each of them the typical size $\langle h \rangle$ numerically.
Different behavior in all three compact phases:
[Figure: phase diagram (bias vs. temperature) with three log-log plots of $\langle h \rangle$ versus $N$, showing $\langle h \rangle \sim N$, $\langle h \rangle \sim N^{0.7}$ and $\langle h \rangle \sim N^{1/2}$]

Denaturation I
Description of denaturation.
Have to include the spatial entropy $W(l) \sim l^{-d/2}$ of loops of $l$ unbound bases.
[Figure: structure with loops of l = 4, 4, 9, 5 and 4 unbound bases]
The Hartree equation changes from
  $\hat{Z}(\mu)^{-1} = G_0^{-1}(\mu) - e^{-\varepsilon_0/T}\,\hat{Z}(\mu)$ with $G_0^{-1} = e^{\mu} - 1$
to
  $\hat{Z}(\mu, k)^{-1} = G_0^{-1}(\mu, k) - e^{-\varepsilon_0/T} \int \hat{Z}(\mu, k)\,dk$ with $G_0^{-1} = \mu + k^2$
Studied by de Gennes (1968) for RNA in three-dimensional space: no phase transition.

Denaturation II
Repeat the de Gennes calculation for arbitrary $d$.
Changing the binding strength $\varepsilon_0$ or the temperature $T$ leads to
  no phase transition for $d < 4$
  a second order phase transition for $4 < d < 6$
  a first order phase transition for $6 < d$
For any dimension we should get a transition to a self-avoiding walk for repulsive interactions.
What is missing to get a phase transition in $d = 3$?
  Stacking: arbitrarily sharp pseudo-transition
  Self-avoidance of a single loop: $d/2 \to \nu d$ (better but not yet enough)
  Self-avoidance between different loops
A different description is necessary in the denatured phase, since contacts of secondary structure do not make sense for repulsive interactions.
D. Moroz and T. Hwa

Two Replicas I
Ensemble average $\overline{Z^2}$: two replicas which gain energy $\varepsilon$ for every common bond.
[Figure: two replicas with their common bonds marked]
Order the configurations of the 2-replica system by the configurations of common bonds.
The common bonds form an RNA structure themselves.
Each bubble represents the sum over all possible choices of non-common bonds in the two replicas:
  1 replica: $l^{-3/2}$
  2 replicas: $(l^{-3/2})^2 = l^{-6/2}$
Effective picture: a single RNA with 6-dimensional loop entropies.

Two Replicas II
Exactly solvable: phase transition at finite $\varepsilon_c$
  $\varepsilon < \varepsilon_c$: the 2 replicas fluctuate independently (molten)
  $\varepsilon > \varepsilon_c$: the 2 replicas have the same contacts (glass)
Specific heat exponent $\alpha = 1$: marginally first order transition.
The fraction of common contacts $Q$ agrees well with Monte Carlo simulations.
[Figure: Q versus $\exp(\varepsilon/T)$ for the exact solution and the MC simulation, with $\varepsilon_c$ marked]
Large critical disorder $\varepsilon_c$: the molten phase is stable towards disorder.

Directed Polymer Analogy
Direct relation between RNA structure formation and the unbinding of a directed polymer (Lipowsky, Europhys. Lett. 15, 1991).
[Diagram: the same series of bubble diagrams for RNA structure formation and for directed polymer unbinding]
Different sources of entropy:
  RNA: binding/branching entropy, $W \sim l^{-3/2}$
  DP: spatial entropy, $W \sim l^{-d/2}$
Same critical behavior as the unbinding transition in $d = 3$.
Note: RNA interactions are long-ranged, DP interactions are local.

Typical Bias
How much bias is needed?
Numerical result for a toy model: a ground state of a random sequence contains 95% Watson-Crick pairs, so a biologically useful structure must beat this threshold.
Numerical results for real RNA hairpins with different mutation rates p (U. Gerland, RB, and T. Hwa):
[Figure: Q versus mutation rate p at T = 40 °C, N = 200]
Has to be compared with natural RNA sequences.
Systematic experiments necessary.
Evolution has to find a very small number of good RNA sequences out of a vast amount of molten/glassy sequences.

Statistical mechanics of secondary structures formed by random RNA sequences
Ralf Bundschuh, The Ohio State University
Collaborator: Terence Hwa, University of California at San Diego
Outline:
  Introduction to RNA
  RNA phase diagram
  Possible applications
Supported by DAAD (RB), Beckman Foundation (TH), and NSF (TH)


RNA phase diagram I
Concentrate on secondary structure.
Questions:
  What are generic properties of structures formed by random sequences?
  What are evolution or human RNA designers up against?
Depends on external parameters. C.f. water, ice, and vapor:
[Figure: phases of water at a pressure of 1 atm along the temperature axis: ice below 0 °C, water between 0 °C and 100 °C, vapor above 100 °C]
Three phases with vastly different properties.

RNA phase diagram II
Three important ingredients:
  thermal fluctuations
  sequence disorder
  bias for structure (biology)
Possible two-parameter phase diagram:
[Figure: phase diagram; axes: bias for structure vs. temperature]
Phases have very different properties:
  native: molecule takes biologically meaningful structure
  molten: many structures coexist
  glass: molecule gets stuck in a random configuration
  denatured: no structure at all

RNA phase diagram III
[Figure: phase diagram; axes: bias for structure vs. temperature]
Questions:
  Do all of these phases really exist?
  How do we recognize which phase is present and when we change from one phase to another?
Questions not resolved in spite of
  50 years of knowledge of DNA structure
  10 years of computer simulations
Results:
  Properties of phases for the first time determined mathematically
  Proof of existence of molten-glass phase transition
  Quantitative characterization of molten-native phase transition

Applications
Basic research
  Physics: basis for understanding of glassy systems in general (spin glasses, structural glasses, protein folding)
  Biology: basis for understanding evolution of RNA sequences
Possible practical applications
  Identification of RNA sequences in genomes
  [Figure: genomic sequence with a candidate hairpin structure]
  Quantitative modeling of Huntington's disease
  [Figure: structures formed by a CAG repeat, built from C-G and G-C pairs]
  Optimization of experimental parameters in DNA hybridization