Statistical mechanics of secondary structures formed by random RN sequences Ralf Bundschuh The Ohio State University Collaborator: Terence Hwa, University of California at San Diego Outline: Introduction to RN Uniform sequences: the phase Disorder: phase and transition? Bias: the phase Conclusions supported by DD, Beckman foundation, and NSF
Introduction I 2 RN is heteropolymer of four different bases G, C,, and U Primary structure: Sequence, e.g., GCGGUUUGCUCGUUGGGGGCGCCGCUGUCUGGGGUCCUGUGUUCGUCCCGUUCGCCC Strongest interaction: Watson-Crick base pairing (G C and U) secondary structure Spatial arrangement tertiary structure (looks locally like DN double helix)
Introduction II 3 Secondary structure: Set of base pairs formed Pseudo-knots neglected Diagramatic representation ssign energy E[S] to each structure S partition function Z = {S} exp( E[S]/T ) Partition function generated exactly by Hartree equation = + i j i j 1 j Σ k i k j 1 j Electron in disordered medium, meanders Handle for analytical treatment O(N 3 ) algorithm for exact partition function (McCaskill, Biopolymers 29, 1990.)
Introduction III 4 Two important parameters for generic properties: Temperature Bias for structure Use long hairpin as (designed) structure Create sequences by randomly choosing first half of the sequence assigning exact complement as second half changing bias by mutations with probability p Expected phase diagram: bias for structure temperature
Molten Phase I 5 De Gennes (1968) proposed: start with uniform sequences UUUUU... or GCGCGCGCGC... Uniform attraction between any two elements of the polymer only one effective interaction parameter ε 0 ε 0 contains binding energy and entropic terms relative to unbound RN Most monomers engaged in base pairs Main effect in phase: branching entropy Hartree equation i j = i j 1 j + Σ becomes k i k j 1 j Ẑ(µ) 1 = (e µ 1) Π(µ) Π(µ) = e ε 0 /T Ẑ(µ) in Laplace domain Z(N) N θ e µ 0N with θ = 3 2 de Gennes, Biopolymers 6, 1968; Waterman, dv. Math. Suppl. Studies 1, 1978
5 Molten Phase II 6 mountain representation (N = 18) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 4 3 2 1 18 17 16 6 7 8 9 10 13 12 14 15 11 one to one correspondence: RN secondary structures mountains all bases attract equally strong counting structures counting mountains free random walk in presence of a hard wall Z(N) N 3/2 e µ 0N h N 1/2 branched polymer R g N 1/4 <h>
Glass Transition I 7 Is phase stable? perturbative approach Interaction energy between base i and base j ε ij = ε 0 + ε ij ssume ε ij independent Gaussian variables ε ij = 0 ε ij ε kl = ɛ δ ik δ jl First term in free energy expansion in powers of ɛ: two-replica system Z 2 two replicas which gain energy ε for every common bond ( ) 7 ε exactly solvable phase transition at finite ε c ε< ε c : 2 replicas fluctuate independently () ε> ε c : 2 replicas have same configuration () phase perturbatively stable
Glass Transition II 8 Is phase stable? Study pinching excitations ssume phase is stable for all temperatures Z(T, N) = (T )N 3/2 exp[ f 0 (T )N] Tang and Chaté, PRL 86 (2001) On the one hand: F (N) = 2T log(n/2) 3 2 + T log N 3 2 3 2 T log N
Glass Transition III 9 On the other hand: find piece of length log N/ log 2 in first half exactly complementary to piece in second half ll pairs in this piece contribute pairing energy ε P in unpinched configuration Estimate for pinching free energy: F (N) [ɛ P + 2f 0 (T )] log N/ log 2 Combine two results: 3 2 T [ɛ P + 2f 0 (T )]/ log 2 ɛ P + 2f 0 (T 0) > 0 (for four or more base alphabet next talk) contradiction for T < T RN cannot be in phase for T < T different () phase at low temperatures
Glass Transition IV 10 Properties of the phase (important for folding behavior)? Probe free energy F of low energy, large scale excitations (droplets) F F F (N) N θ θ > 0 θ < 0 no Pinching provides such excitations Just seen: F (N) [ɛ P + 2f 0 (T )] log N/ log 2 Numerically at low temperatures: F (N) a(t ) log N F 20 15 10 T=0.025 Glass very weak 5 0 10 100 1000 10000 For practical purposes no difference between and phase N
Molten-Native Transition I 11 How does structure appear? Start from phase dd bias towards structure (hairpin) 1 N/2 N Use simplified model in spirit of phase description Possible Watson-Crick pairs: 1 1 N 1 1 N N Random sequence with perfect bias N Simplified model only two different interaction energies strong interaction ε 0 + U 0 for base pairs weak interaction ε 0 for all other base pairs U 0 is an effective measure of the bias Model similar to Gō model of protein folding (Gō, J. Stat. Phys. 30, 1983)
Molten-Native Transition II 12 Model can be exactly solved by Laplace transform Phase transition between and phase at finite critical bias U c or at critical temperature T c Phase transition is second order with finite jump in specific heat (α = 0) but large finite size effects possible Calculate fraction of contacts Q Exhibits scaling form and scaling function ( ) T Q N 1/2 Tc g N 1/2 T c (ν = 2) where g(y) y 1 y 1 1 1 y 1 y 1 y 1 g(y) 3 2 1 y 1/y 0 10 5 0 5 10 y Can be numerically verified to apply to randomly chosen RN sequences 1
Conclusions and Outlook I 13 bias for structure temperature RN shows a large variety of interesting behavior RN secondary structure formation is tractable analytically and numerically by methods of statistical mechanics RN secondary structures offer an alter approach to studying a variety of issues of general heteropolymer behavior + + + + +...
Conclusions and Outlook II 14 bias for structure temperature Future work: Understand phase properties analytically - transition more realistic RN models, self-avoidance interactions between several molecules kinetics pseudo-knots tertiary stucture biological applications (RN finding, Huntington s desease)
Biological function of RN 15 Biological functions: Structure ( proteins) Ribosomal RN Transfer RN Information ( DN) Messenger RN single-stranded DN T instead of U more rigid backbone Interplay of structure and information Splicing Ribozymes RN world (origin of life)
Molten Phase in Natural Molecules I 16 pplication to real sequences Use experimentally determined parameters from RN secondary structure prediction which take all energetic details into account Hofacker et al., Monatshefte f. Chemie 125, 1994 Uniform sequences UUUUU... and GCGCGCGCGC... need very long sequences ( 8000 bases, Tsunglin Liu & RB B9.013) Hairpin loops must contain at least 3 bases Loss in binding energy large in hairpin loops
Molten Phase in Natural Molecules II 17 Naturally occuring in human DN: (CG) n with large n Connected with Huntington s disease Hereditary neurodegenerative disease n < 35 normal n > 35 Huntington s disease If n > 35, n usually very large CG codes for Glutamine repeats appear in protein Single-stranded DN can undergo self-binding during replication Biologist s model: only minimal free energy structure competes with single-stranded configuration C G G C C G G C C G C G G C G C C G G C C G G C C G G C
Molten Phase in Natural Molecules III 18 (CG) n can be in phase C G G C C G G C C G C G C G G C G C C G G C C G G C C G G C C G C G G C G C C G G C C G G C C G G C Crossover length 7 bases (Tsunglin Liu & RB B9.013) Molten phase relevant Kinetics possibly important (single molecule?) experiments necessary
Solution of Gō model 19 Partition function: order arbitrary structure by number of contacts Z(N, U 0 ) = + +... + + = W (l) = sum over all ways to place non- bonds W similar to phase partition function expect W (l) l 3/2 Relation between bubble (W ) and full (Z) partition functions Ẑ(µ; U 0 ) = Ŵ (µ) + Ŵ (µ) e (ε 0+U 0 )/T Ŵ (µ) (Laplace domain) + Ŵ (µ) e (ε 0+U 0 )/T Ŵ (µ) e (ε 0+U 0 )/T Ŵ (µ) +... Ẑ(µ; U 0) 1 = Ŵ (µ) 1 e (ε 0+U 0 )/T Exact expression for Ẑ(µ; U 0) Partition function relation Ẑ(µ; U 0) 1 = Ŵ (µ) 1 e ε 0 +U 0 T Boundary condition Z(N, U 0 = 0) = Z 0 (2N) gives Ŵ
Molten-Native Transition III 20 Free energy as a function of the number of total contacts K and the number of contacts Q In the phase t the phase transition In the phase Q Q Q K K K
Molten-Native Transition IV 21 pplicability to heterogeneous sequences verage numerically over many self-complementary random sequences C 2.0 1.5 1.0 N=200 N=400 N=800 N=1600 0.5 Critical temperature found from specific heat 0.0 0.0 0.2 0.4 0.6 0.8 T/ ε + Fraction of contacts vanishes at phase transition Q 0.8 0.6 0.4 0.2 N=200 N=400 N=800 N=1600 Scaling plot confirms power laws predicted in the framework of the Gō-like model QN 1/2 0.0 0.0 0.2 0.4 0.6 0.8 T/ ε + 12 10 8 6 4 L=800 L=400 L=200 L=100 Go 2 0 6 4 2 0 2 4 6 N 1/2 (T C T)/T C
DN Hybridization I 22 Same effects play a role in DN hybridization Hybridization is widely used experimental method in molecular biology Homology detection without sequencing PCR DN chips Sequencing by hybridization Gene expression analysis DN computer Two single stranded DN can form base pairs with each other with themselves Connect ends of the two DN strands in Gedanken task RN structure formation Different applications need different experimental conditions
DN Hybridization II 23 Example: detection of weak homologies Single stranded DN in solution form hybrid, if complementary enough Not complementary enough remain single-stranded Existence of hybrids detected by enzyme Weak homology: need to reduce stringency (e.g. lower temperature) Problem: self-binding instead of hybrid formation Experimentalist has to know which phase is present complementarity hybridized temperature
Branched Polymers 24 RN in phase equivalent to branched polymer Possible forms of branched polymers: B B B B B B B B B B B B... Zimm and Stockmeyer, J. Chem. Phys. 17, 1949 Lubensky et al., J. Physique 41, 1981 Lubensky and Isaacson, Phys. Rev. 20, 1979 Parisi and Sourlas, Phys. Rev. Lett. 49, 1981 ll results without self-avoidance agree with Z N 3/2 e µ 0N and R g N 1/4 multiple branching irrelevant pseudo-knots irrelevant =
Structure Size Scaling 25 Characterize all phases by scaling laws Choose random sequences and calculate for each of them their typical size h numerically Different behavior in all three compact phases: bias 1000 <h> 1 <h>~n <h> 100 N 1 <h>~n 0.7 10 100 1000 N <h>~n 1/2 100 100 <h> 10 N 0.7 <h> 10 N 1/2 1 100 1000 N 1 100 1000 N temperature
Denaturation I 26 Description of denaturation Have to include spatial entropy W (l) l d/2 of loops of l unbound bases l=4 l=4 l=9 l=5 l=4 Hartree equation changes from with G 1 0 = e µ 1 to with G 1 0 = µ + k 2 Ẑ(µ) 1 = G 1 0 (µ) e ε 0/T Ẑ(µ) Ẑ(µ, k) 1 = G 1 0 (µ, k) e ε 0/T Ẑ(µ, k)dk Studied by de Gennes (1968) for RN in three dimensional space no phase transition
Denaturation II 27 Repeat de Gennes calculation for arbitrary d Changing binding strength ε 0 or temperature T leads to no phase transition for d < 4 second order phase transition for 4 < d < 6 first order phase transition for 6 < d For any dimension we should get transition to self-avoiding walk for repulsive interactions What is missing to get phase transition in d = 3? Stacking arbitrarily sharp pseudo-transition Self-avoidance of a single loop: d/2 νd (better but not yet enough) Self-avoidance between different loops Different description necessary in phase, since contacts of secondary structure do not make sense for repulsive interactions D. Moroz and T. Hwa
Two Replicas I 28 Ensemble average Z 2 two replicas which gain energy ε for every common bond ( ) 7 ε Order configurations of 2 replica system by configurations of common bonds Common bonds ( ) form RN structure themselves represents sum over all possible choices of non-common bonds in the two replicas 1 replica l 3/2 2 replicas (l 3/2 ) 2 = l 6/2 effective picture: single RN with 6-dimensional loop entropies
Two Replicas II 29 exactly solvable phase transition at finite ε c ε< ε c : 2 replicas fluctuate independently () ε> ε c : 2 replicas have same contacts () specific heat exponent α = 1 marginally first order transition 1.2 1.0 fraction of common contacts agrees well with Monte Carlo simulations Q 0.8 0.6 Exact solution MC simulation 0.4 Large critical disorder ε c phase stable towards disorder 0.2 0.0 ε C 0 10 20 30 40 exp( ε/t)
Directed Polymer nalogy 30 Direct relation between RN structure formation + +... + + and unbinding of a directed polymer + +... + + Lipowsky, Europhys. Lett. 15, 1991 Different sources of entropy: RN Binding/branching entropy W l 3/2 DP Spatial entropy W l d/2 Same critical behavior as unbinding transition in d = 3 Note: RN interactions long-ranged, DP interactions local
Typical Bias 31 How much bias is needed? Numerical result for toy model: a ground state of a random sequence contains 95% Watson-Crick pairs biologically useful structure must beat this threshold Numerical results for real RN hairpins with different mutation rates p Q 0.8 0.6 0.4 0.2 U. Gerland, RB, and T. Hwa 1 0 T=40 o C, N=200 0 0.2 0.4 0.6 0.8 1 p Has to be compared with natural RN sequences Systematic experiments necessary Evolution has to find very small number of good RN sequences out of a vast amount of /y sequences
32 Statistical mechanics of secondary structures formed by random RN sequences Ralf Bundschuh The Ohio State University Collaborator: Terence Hwa, University of California at San Diego Outline: Introduction to RN RN phase diagram Possible applications supported by DD (RB), Beckman foundation (TH), and NSF (TH)
Introduction I 33 RN is heteropolymer of four different bases G, C,, and U Primary structure: Sequence, e.g., GCGGUUUGCUCGUUGGGGGCGCCGCUGUCUGGGGUCCUGUGUUCGUCCCGUUCGCCC Strongest interaction: Watson-Crick base pairing (G C and U) secondary structure Spatial arrangement tertiary structure (looks locally like DN double helix)
RN phase diagram I 34 Concentrate on secondary structure Questions: What are generic properties of structures formed by random sequences? What are evolution or human RN designers up against? Depends on external parameters C.f., water, ice, and vapor: pressure 1 atm ice water vapor o o 0 C 100 C temperature Three phases with vastly different properties
RN phase diagram II 35 Three important ingredients: thermal fluctuations sequence disorder bias for structure (biology) Possible two parameter phase diagram bias for structure Phases have very different properties temperature : molecule takes biologically meaningful structure : many structures coexist : molecule gets stuck in a random configuration : no structure at all
RN phase diagram III 36 bias for structure Questions: Do all of these phases really exist? temperature How do we recognize which phase is present and when we change from one phase to another? Questions not resolved in spite of 50 years of knowledge of DN structure 10 years of computer simulations Properties of phases for the first time determined mathematically Proof of existence of - phase transition Quantitative characterization of - phase transition
pplications 37 Basic research Physics: basis for understanding of y systems in general (spin es, structural es, protein folding) Biology: basis for understanding evolution of RN sequences Possible practical applications Identification of RN sequences in genomes G C T...TGCGTTC TTGGTGCCCTTGCTTCGGCTGCTTG... G C G C? T T T Quantitative modeling of Huntigton s desease G C C C G G G C C G G C G C G C G C C G C G C G C G C G C G C G C G C G G C G C G C G C G C G C G C Optimization of experimental parameters in DN hybridization