An Application of Model Checking to a Realistic Biological Problem: Qualitative Simulation and Model Checking in Genetic Regulatory Networks
A presentation of Formal Methods in Biology
Justin Hogg, justinshogg@gmail.com
4 March 2008
Motivation
We have lots of Model Checking machinery. But so far, we haven't seen a non-trivial application in biology. The challenge is posing a biological problem in a way that is meaningful and suitable for Model Checking. Further, the questions we can ask the Model Checker should be relevant to the biology.
This presentation will review an application to Gene Regulatory Networks using Qualitative Simulation and Model Checking. This method can be easily adapted to any system that can be modeled as a system of piecewise-linear differential equations.
Qualitative Simulation / Model Checking Literature
Analysis of Genetic Regulatory Networks: A Model-Checking Approach. Gregory Batt, Hidde de Jong, Johannes Geiselmann, Michel Page. Pre-print.
Qualitative Simulation of Genetic Regulatory Networks Using Piecewise-Linear Models. Hidde de Jong, Celine Hernandez, Michel Page, Tewfik Sari, Johannes Geiselmann. Bulletin of Mathematical Biology (2004).
Genetic Network Analyzer: Qualitative Simulation of Genetic Regulatory Networks. Hidde de Jong, Johannes Geiselmann, Celine Hernandez, Michel Page. Bioinformatics (2003).
Proving Properties of Continuous Systems: Qualitative Simulation and Temporal Logic. B. Shults and B. J. Kuipers. Artificial Intelligence (1997).
Focusing Qualitative Simulation Using Temporal Logic. Brajnik and Clancy. Annals of Mathematics and Artificial Intelligence (1998).
Gene Regulatory Networks
Central Dogma of Molecular Biology
Information flow in the cell: genes encoded in DNA are transcribed to messenger RNA, which relays the message to ribosomes that translate it into a protein. Proteins are the machinery of the cell.
BUT, this is only half of the story. A cell must decide which genes to express. This is the subject of Gene Regulation.
Regulation happens at many levels:
Transcription (repressors, promoters)
mRNA (micro RNAs)
Protein (phosphorylation, degradation)
Regulation often involves many proteins and genes interacting in complex ways. This is a Gene Regulatory Network.
[Figure: central dogma diagram. Source: http://faculty.uca.edu/~johnc/central%20dogma.gif]
Regulation at the Transcription Level http://biomserv.univ-lyon1.fr/baobab/images/reg.jpg
The Lac Operon, a Classic Example
E. coli prefers to dine on glucose. But if it can't find glucose in a small town, lactose will do in a pinch.
The Lac Operon encodes lactose metabolism genes. The operon is expressed only when glucose is absent and lactose is present. How is this regulated?
[Figure: lac operon regulation diagram; labels: Lactose, LacI, LacZYA, Glucose]
A Complex Example: Arabidopsis Flower Development
Modeling Regulatory Networks: Differential Equations
Let x_N be the concentration of protein N.
[Figure: rate equations for x_A, x_B and x_C with activator and inhibitor terms, alongside plots of dx/dt against x for an activator and an inhibitor]
Step 1. Describe the system using differential equations.
Step 2. Plug into a numerical solver and see the results! Or use dynamical analysis to find stable fixed points, periodic solutions, etc.
Problem: Measuring kinetic parameters is a lot of work. In complex systems this may be nearly impossible!
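To make Steps 1 and 2 concrete, here is a minimal sketch in Python. It is not the model from the slide: the two-protein mutual-inhibition network, the Hill-type inhibition term, and every parameter value (`k_a`, `k_b`, `g_a`, `g_b`, `K`) are assumptions chosen for illustration. The integrator is a hand-written fourth-order Runge-Kutta step so that the example has no dependencies.

```python
def hill_inhibition(x, k_half, n=2):
    """Fraction of maximal expression left when an inhibitor is at level x."""
    return k_half**n / (k_half**n + x**n)


def rhs(state, params):
    """dx/dt for two mutually inhibiting proteins A and B (hypothetical model)."""
    x_a, x_b = state
    dx_a = params["k_a"] * hill_inhibition(x_b, params["K"]) - params["g_a"] * x_a
    dx_b = params["k_b"] * hill_inhibition(x_a, params["K"]) - params["g_b"] * x_b
    return (dx_a, dx_b)


def rk4_step(state, dt, params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))

    k1 = rhs(state, params)
    k2 = rhs(shifted(state, k1, dt / 2), params)
    k3 = rhs(shifted(state, k2, dt / 2), params)
    k4 = rhs(shifted(state, k3, dt), params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, params, dt=0.01, steps=5000):
    """Integrate from the given initial state and return the final state."""
    for _ in range(steps):
        state = rk4_step(state, dt, params)
    return state


# Assumed kinetic parameters; in practice these are exactly what is hard to measure.
PARAMS = {"k_a": 2.0, "k_b": 2.0, "g_a": 1.0, "g_b": 1.0, "K": 0.5}

final_a_high = simulate((1.5, 0.1), PARAMS)  # start with A ahead
final_b_high = simulate((0.1, 1.5), PARAMS)  # start with B ahead
```

With these assumed values the network behaves like a switch: whichever protein starts ahead ends up high and suppresses the other. The catch noted on the slide still applies, since the outcome depends on parameters that are rarely known.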
Experimental Tools: measuring mRNA expression
Microarray
Purpose: Measure the relative quantity of mRNA templates in a collection of cells.
Benefits:
1. Analyze entire genome in one experiment.
2. Generic protocols.
Disadvantages:
1. Expensive to perform experiments under many conditions.
2. Difficult to calibrate quantitative data.
3. Noisy data, cross-hybridization.
Source: www.microarray.lu
Experimental Tools: measuring mRNA expression
Reporter Constructs
Purpose: Measure the expression at a single promoter.
Benefits:
1. Allows detailed research using precisely targeted experiments.
2. No expensive equipment.
Disadvantages:
1. Unique construct required for each promoter.
2. Noisy data.
3. Difficult to calibrate quantitative data.
[Figure: reporter construct; labels: target gene, reporter]
Experimental Tools: measuring protein concentration
Immunoblot Assays
Purpose: Detect the quantity of a protein in a population of cells.
Benefits:
1. Direct measure of protein concentration.
2. No fancy equipment required.
Disadvantages:
1. Requires antibodies against each distinct protein.
2. Noisy data.
3. Difficult to calibrate quantitative data.
[Figure: immunoblot schematic showing immobilized protein target, primary antibody and labelled secondary antibody; dot plot. Source: westernblotting.org]
Experimental Tools: measuring protein concentration
DIGE + Mass Spec
Purpose: Measure differences in protein quantity between two cell populations.
Benefits:
1. Whole proteome approach.
2. Easy to compare results from two populations.
Disadvantages:
1. Requires Mass Spec to identify proteins.
2. Not suitable for targeted experiments.
3. Difficult to calibrate quantitative data.
[Figure: 2-D gel electrophoresis and DIGE gel. Source: www.smbs.buffalo.edu/bch/images/figures/dige.jpg]
What's the best way to proceed?
Facts: Kinetic parameters are usually not available. Protein and mRNA data are noisy and qualitative. We need another approach:
Boolean Networks?
Successfully applied to some gene regulatory networks. (REF)
Feedback loops are a problem.
Not applicable when quantity is important.
Use training data to find maximum likelihood solutions for DE models?
A traditional approach.
Small, noisy datasets may lead to overfitting.
Sloppy Parameter problem: System behavior may be very sensitive to some parameters, and indifferent to others.
Requires a noise model.
Qualitative Simulation?
Observation: Reliable quantitative data is rare. But LOTS of qualitative data is available.
Idea: Focus on qualitative aspects of the data. Think like a Cell Biologist!
Protein expression is High, Medium, or Low. Real numbers aren't very useful.
Inhibition is Strong or Weak. Exact parameter values aren't interesting.
Expression is Activated when a transcription factor is present above a Threshold level. The precise mathematical form isn't important.
MODELING PROBLEM! In general, it's tough to constrain a DE model using parameter ranges. The possibilities are wide.
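Qualitative Simulation in Piecewise Linear Models
Restrict the class of ODE models to piecewise-linear equations:
$$\frac{dx}{dt} = f(x) - G(x)\,x, \qquad x \ge 0$$
$$x = (x_1, x_2, \ldots, x_n)', \qquad f = (f_1, f_2, \ldots, f_n)', \qquad G = \mathrm{diag}(g_1, g_2, \ldots, g_n)$$
$$f_i(x) = \sum_{l \in L} k_{il}\, b_{il}(x)$$
k is a rate parameter. b is a boolean function of x, a product of s+ and s- step functions:
$$s^{+}(x_j, \theta_j) = \begin{cases} 1, & x_j > \theta_j \\ 0, & x_j < \theta_j \end{cases} \qquad s^{-}(x_j, \theta_j) = 1 - s^{+}(x_j, \theta_j)$$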
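As a rough illustration of this model class, the sketch below writes out dx/dt = f(x) - G(x) x for a hypothetical two-gene network in which A and B inhibit each other. The step function s+ is taken to be 1 above a threshold and 0 below it, with s- = 1 - s+. The network, the rate values in `KAPPA` and `GAMMA`, and the thresholds in `THETA` are all assumptions made for the example, and G is taken to be constant for simplicity.

```python
def s_plus(x, theta):
    """Step function s+: 1 above the threshold, 0 below it.

    The model leaves the value at x == theta undefined; returning 0 there
    is only a convention of this sketch.
    """
    return 1.0 if x > theta else 0.0


def s_minus(x, theta):
    """Step function s-: the complement of s+."""
    return 1.0 - s_plus(x, theta)


# Hypothetical parameters for a two-gene mutual-inhibition network.
KAPPA = {"a": 2.0, "b": 2.0}  # synthesis rate parameters (the k in f_i)
GAMMA = {"a": 1.0, "b": 1.0}  # degradation rates (the diagonal of G)
THETA = {"a": 1.0, "b": 1.0}  # threshold concentrations


def f(x):
    """Synthesis terms: each gene is expressed only while its inhibitor is low."""
    return {
        "a": KAPPA["a"] * s_minus(x["b"], THETA["b"]),
        "b": KAPPA["b"] * s_minus(x["a"], THETA["a"]),
    }


def dxdt(x):
    """Right-hand side f(x) - G x of the piecewise-linear model."""
    fx = f(x)
    return {gene: fx[gene] - GAMMA[gene] * x[gene] for gene in x}


both_low = dxdt({"a": 0.5, "b": 0.5})  # both genes expressed
a_high = dxdt({"a": 1.5, "b": 0.5})    # A represses B
```

Because each b is built from step functions, f only changes when some concentration crosses a threshold, which is what makes the right-hand side linear between thresholds.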
Example: Piecewise Linear Models
Qualitative Simulation in Piecewise-Linear Models
Threshold parameters partition the state space into domains. The behavior of the system is linear in each domain!
[Figure: example network of genes A and B with inhibition and self-inhibition interactions]
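One way to picture the partition is to label each state by the interval, between consecutive thresholds, that every concentration falls into. The sketch below does this with Python's standard `bisect` module. The variable names and threshold values in `THRESHOLDS` are hypothetical, and a state lying exactly on a threshold (a switching domain) is simply assigned to the interval above it, which is a simplification.

```python
from bisect import bisect_right

# Hypothetical ordered thresholds for each variable.
THRESHOLDS = {"a": [1.0, 2.0], "b": [1.0]}


def domain_of(state):
    """Return one interval index per variable, in alphabetical variable order.

    Index 0 means "below the first threshold", index 1 means "between the
    first and second threshold", and so on.
    """
    return tuple(
        bisect_right(THRESHOLDS[name], state[name]) for name in sorted(THRESHOLDS)
    )


low_low = domain_of({"a": 0.5, "b": 0.2})
mid_high = domain_of({"a": 1.5, "b": 1.2})
high_low = domain_of({"a": 2.5, "b": 0.2})
```

With two thresholds on `a` and one on `b`, this partition has 3 x 2 = 6 regular domains, each of which becomes a candidate state in the qualitative model.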
Qualitative Simulation in Piecewise-Linear Models
In each domain, the solution moves monotonically towards a stable fixed point (the focal point):
$$\phi(D^j) = \left( \mu_1^j / \nu_1^j,\; \mu_2^j / \nu_2^j,\; \ldots,\; \mu_n^j / \nu_n^j \right)$$
where $\mu_i^j$ and $\nu_i^j$ are the constant values taken by $f_i$ and $g_i$ inside domain $D^j$.
If the fixed point is in the same domain, the solution stays in that domain forever.
If the fixed point is outside the domain, the solution must leave the domain.
Starting from the initial state, we can iteratively construct a transition graph.
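The rule for staying in or leaving a domain can be sketched as follows. Inside a domain each variable obeys dx_i/dt = mu_i - nu_i x_i, where mu_i and nu_i are the constant synthesis and degradation terms there, so the focal point is mu_i / nu_i per variable. The function names, the thresholds, and the mu/nu values are assumptions for a hypothetical two-gene mutual-inhibition network. Switching domains and focal points lying exactly on a threshold are not handled.

```python
from bisect import bisect_right

# Hypothetical thresholds for variables 0 and 1.
THRESHOLDS = [[1.0], [1.0]]


def focal_point(mu, nu):
    """Fixed point of dx_i/dt = mu_i - nu_i * x_i, one coordinate per variable."""
    return [m / n for m, n in zip(mu, nu)]


def exit_directions(domain, mu, nu):
    """For each variable, report which way the solution heads.

    +1 means towards a higher interval, -1 towards a lower one, and 0 means
    the focal point lies in the current interval for that variable.
    """
    phi = focal_point(mu, nu)
    target = [bisect_right(t, p) for p, t in zip(phi, THRESHOLDS)]
    return [(t > d) - (t < d) for t, d in zip(target, domain)]


# Both proteins low: both genes are expressed, so mu = (2, 2).
from_low_low = exit_directions((0, 0), mu=[2.0, 2.0], nu=[1.0, 1.0])

# A high, B low: A is still expressed but B is repressed, so mu = (2, 0).
from_high_low = exit_directions((1, 0), mu=[2.0, 0.0], nu=[1.0, 1.0])
```

A result of all zeros marks a domain that contains its own focal point, so the solution never leaves it. Any nonzero entry names a neighbouring domain that would receive an edge in the transition graph.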
Switching Domains are trickier (no details here)
Kripke Structure
Atomic Propositions in Piecewise Linear Models
Atomic Propositions in Piecewise Linear Models
Asking Questions about the Model in CTL Input the Kripke Structure and CTL statement into a Model Checker, such as NuSMV2.
Method Overview
Bacillus subtilis
Gram-positive soil bacterium.
Able to form a tough, protective endospore which tolerates extreme environmental conditions.
A model organism: easy to genetically manipulate.
Sporulation is a model of cellular differentiation.
Sporulation: DNA is replicated and a membrane wall known as a spore septum begins to form between it and the rest of the cell. The plasma membrane of the cell surrounds this wall and pinches off to leave a double membrane around the DNA, and the developing structure is now known as a forespore. Calcium dipicolinate is incorporated into the forespore during this time. Next the peptidoglycan cortex forms between the two layers and the bacterium adds a spore coat to the outside of the forespore.
Source: Wikipedia (B. subtilis and Endospore)
[Figures: electron micrograph of B. subtilis; sporulating B. subtilis (blue)]
Sporulation regulatory network in B. subtilis
Key to Regulation Symbols
State Transition Graph generated by Genetic Network Analyzer
Partial State Transition Graph for Vegetative Growth Conditions
Qualitative Simulations with Genetic Network Analyzer (software)
[Figure panels: spo+ and spo-]
Expressing Experimental Observations in Temporal Logic
T1 = early stationary phase hpr expression
English: There is a path where hpr expression will increase and then eventually reach a steady state.
Temporal Logic:
$$\mathrm{EF}\big(\operatorname{sign}(\dot{x}_{hpr}) = 1 \;\wedge\; \mathrm{EF}\,\mathrm{EG}\,(\operatorname{sign}(\dot{x}_{hpr}) = 0)\big)$$
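To show what checking such a property involves, here is a minimal explicit-state sketch of the CTL operators EX, EF and EG, applied to a formula of the shape "EF (increasing and EF EG steady)". It is not how NuSMV or Genetic Network Analyzer work internally. The three-state transition graph `TRANS` and the derivative-sign labelling `SIGN` are invented for illustration and do not come from the B. subtilis model.

```python
def ex(trans, target):
    """EX: states with at least one successor in target."""
    return {s for s, succs in trans.items() if succs & target}


def ef(trans, target):
    """EF: least fixed point, states from which some path reaches target."""
    result = set(target)
    while True:
        grown = result | ex(trans, result)
        if grown == result:
            return result
        result = grown


def eg(trans, target):
    """EG: greatest fixed point, states with some path staying in target forever."""
    result = set(target)
    while True:
        shrunk = result & ex(trans, result)
        if shrunk == result:
            return result
        result = shrunk


# Hypothetical transition graph; every state has a successor, as a Kripke
# structure requires.
TRANS = {"s0": {"s1"}, "s1": {"s2"}, "s2": {"s2"}}

# Hypothetical sign of the hpr derivative in each state.
SIGN = {"s0": 0, "s1": 1, "s2": 0}

increasing = {s for s, v in SIGN.items() if v == 1}
steady = {s for s, v in SIGN.items() if v == 0}

steady_forever = eg(TRANS, steady)
satisfying = ef(TRANS, increasing & ef(TRANS, steady_forever))
```

In this toy graph the property holds in `s0` and `s1`: from either state there is a path through the increasing state `s1` into `s2`, where the derivative stays zero forever. The property fails in `s2`, because no path from there ever passes through an increasing state.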