A Brief Introduction to Systems Biology: Gene Regulatory Networks Rajat K. De


Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108
Email: rajat@isical.ac.in

Bioinformatics: informatics related to various molecules
Genome projects providing information on the structures of genes
Structures of proteins
Mapping and functional annotation
Expression levels of genes

Central dogma (figure from Molecular Biology of the Cell, 4th Edition, Garland Science)

No use of isolated information/individual
December 2, 2010

Information processing
The activity of a living system is based on information stored within the system and an information processing mechanism within the system.
System components must be dynamic in nature.
There must be some sort of interaction among the system components and the environment.
Knowledge about an organism = Information + Information processing mechanism

Cell: a huge information processing system!
Information is contained in genes in the form of the relative positions of nucleotides.
Basic mechanisms for information processing include gene regulation, metabolism, signal transduction and protein binding.

Cellular information processing: interactions among cellular components
Gene regulation: (indirect) interactions among genes
Metabolism: interactions among metabolites and enzymes
Signal transduction: interactions among various proteins and other small molecules
Protein binding: interactions among proteins
Gene regulatory networks and metabolic pathways: pathway-pathway interactions, i.e., cross talk among pathways

Gene regulation

A hypothetical gene regulatory network

Metabolic pathways/networks

Signal transduction pathways

Protein interaction networks

Pathway-pathway interactions

Interaction among cells

Whole organism = complex network

Systems Biology
First described in 1999 by Leroy Hood, President and co-founder of the Institute for Systems Biology, Seattle.
Systems biology: the study of the interactions underlying complex biological processes as integrated systems of many interacting components.

Objectives
Unlike bioinformatics, which focuses on individual molecules, such as sequences of nucleotides and amino acids, systems biology focuses on systems that are composed of molecular components and their interactions. Within this context, the objectives are:
Understanding the structure of the system, such as biochemical pathways and cellular interactions.
Understanding the dynamics of the system, through both quantitative and qualitative analysis, as well as construction of a theory/model with powerful prediction capability.
Understanding the control methods of the system.
Understanding the design methods of the system, under the framework called synthetic biology.
Both the structure of the system and its components play an indispensable role.

Systems Biology involves:
1. Collection of large sets of experimental data related to a system.
2. Proposing mathematical models that can account for at least some significant aspects of the collected data set.
3. Providing in silico solutions of the mathematical equations to obtain numerical predictions.
4. Assessment of the quality of the model by comparing numerical simulations with the experimental data.
5. Generation of novel hypotheses that can be validated through laboratory experiments.

Underlying issues/constraints

Approach for solving a problem in systems biology: four key parameters (Kitano, Science, 2002)
System structure: networks of gene interactions, biochemical pathways
System dynamics: behavior of a system over time under various conditions, e.g., time series microarray gene expression data for gene regulatory networks
Control method: minimizing malfunctions and providing potential therapeutic targets for treatment of disease
Design method: strategies to modify and construct biological systems having desired properties

An example


Gene Regulatory Networks
Tasks involved:
Reverse engineering: based on biological information, e.g., gene expression data, known experimental results, etc.
Analysis of a gene regulatory network: finding an optimal pathway, testing robustness, etc.
Effect of gene expression/regulation on other pathways

Reverse engineering gene regulatory networks
Expressed genes produce mRNA through transcription.
mRNAs form proteins through translation.
A protein binds to the promoter region of a gene to activate or repress its expression.
This makes the expression of a gene depend on the binding protein, and thereby on the expression of other genes.

Temporal gene expression pattern
In a particular tissue, not every gene is expressed; only a subset of them is.
Due to an external influence, some genes become differentially expressed.
Gene expression values over time exhibit a pattern.
The objective is to find this temporal pattern.

Approaches for Reverse Engineering Gene Regulatory Networks
Kinetic equations: based on differential equations involving rate parameters
Boolean: the state of a gene is either ON or OFF
Bayesian networks: based on statistical methods
Circuit: interpreting the genetic control system as a circuit similar to an electrical one
Artificial neural networks: considering gene interactions as a weight matrix

Kinetic model (Yeung et al., Proc. National Academy of Sciences, 2002)
Assumption: the dynamics of the gene expression profile follow a linear differential equation, i.e., $dX/dt = WX$
$X$ = $n \times m$ gene expression matrix, where $n$ = number of genes, $m$ = number of time points
$W$ = $n \times n$ gene interaction matrix, where $w_{ij}$ = influence of gene $j$ on gene $i$
$w_{ij} > 0$: gene $j$ activates gene $i$
$w_{ij} < 0$: gene $j$ represses gene $i$
$w_{ij} = 0$: no interaction
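The linear model can be made concrete with a small simulation. The following is a minimal sketch, assuming a forward Euler integrator and a hypothetical two-gene interaction matrix (the function name, step size and values are illustrative and not taken from Yeung et al.); the expression values are read as deviations from a reference level, so they may become negative.

```python
def simulate_linear_grn(W, x0, dt, steps):
    """Integrate dx/dt = W x with forward Euler; returns the list of state vectors."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        # dx_i/dt = sum_j w_ij * x_j
        dx = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory

# Hypothetical 2-gene network (illustrative values only):
# w_01 > 0: gene 1 activates gene 0; w_10 < 0: gene 0 represses gene 1.
W = [[0.0, 0.5],
     [-0.5, 0.0]]
traj = simulate_linear_grn(W, x0=[1.0, 0.0], dt=0.1, steps=3)
```

Each entry of `traj` corresponds to one column (time point) of the expression matrix X.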

Kinetic model (continued)
Using the existence theorem of singular value decomposition, $X^T = U D V^T$
$U$ = $m \times m$ unitary matrix, $V$ = $n \times n$ unitary matrix
$D$ = $m \times n$ diagonal matrix with diagonal elements being the singular values of $X$ in decreasing order
$W = (dX/dt)\, U D^{-1} V^T$, where $D^{-1}$ = $m \times n$ diagonal matrix with diagonal elements being the reciprocals of the corresponding elements of $D$
Sparse $W$: using $L_1$ minimization
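The expression for W can be checked by substitution. The short derivation below is a sketch under one assumption: the decomposition is taken of the transposed matrix $X^T$ (size $m \times n$), which is what makes the stated dimensions of $U$, $D$ and $V$ consistent. The notation $\dot{X}$ for $dX/dt$ is introduced here for brevity.

```latex
\begin{aligned}
X^{T} &= U D V^{T} \;\Rightarrow\; X = V D^{T} U^{T},\\
\dot{X} &= W X = W V D^{T} U^{T},\\
\dot{X}\, U D^{-1} V^{T} &= W V D^{T} (U^{T} U) D^{-1} V^{T}
                          = W V \left(D^{T} D^{-1}\right) V^{T}.
\end{aligned}
```

When all n singular values are non-zero, $D^{T} D^{-1}$ is the $n \times n$ identity and the right-hand side reduces to $W$. When there are fewer time points than genes, $D^{T} D^{-1}$ has zeros on part of its diagonal, so the data do not determine $W$ uniquely; this is where the sparsity criterion comes in.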

Boolean Modeling of Genetic Regulatory Networks
Each mRNA or protein is represented by a node of a network.
The interactions between them are encoded as directed edges.
The state of each node is 1 or 0, according to whether the corresponding substance is present or not.
The states of the nodes can change in time.
Choose a time interval, or length of a unit time step, that is greater than or equal to the duration of all transcription and translation processes.
The next state of node i is determined by a Boolean function of its own state and the states of those nodes that have edges incident on it.
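The update rule can be sketched in a few lines. This is a minimal illustration, assuming synchronous updating (all nodes change at once) and a hypothetical three-node network; the node names and Boolean rules are invented for the example.

```python
def step(state, rules):
    """Synchronous update: every node's next state is computed from the current state."""
    return {node: rule(state) for node, rule in rules.items()}

# Hypothetical three-node network (names and rules are illustrative only).
rules = {
    "A": lambda s: s["A"],                  # A keeps its state (external input)
    "B": lambda s: s["A"] and not s["C"],   # B is activated by A and repressed by C
    "C": lambda s: s["B"],                  # C is activated by B
}

state = {"A": True, "B": False, "C": False}
trajectory = [state]
for _ in range(4):
    state = step(state, rules)
    trajectory.append(state)
```

With these rules the B/C pair forms a negative feedback loop, so the network cycles through its states rather than settling at a fixed point.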

Methodology Based on Perceptron Model
Kim et al., Genomics, vol. 67, pp. 201-209, 2000
Linear model
Using time series data
Based on the perceptron model and backpropagation training
Confirmed some known interactions and found some additional ones

Hybridization of Artificial Neural Networks and Genetic Algorithms
Keedwell and Narayanan, IEEE/ACM Trans. on Computational Biology and Bioinformatics, vol. 2, pp. 231-242, 2005
Using time series gene expression data
Artificial neural networks for estimation
Genetic algorithms for optimization
Tested on various time series gene expression data sets, including yeast cell cycle data

Based on Recurrent Networks
D'haeseleer et al., Bioinformatics, vol. 16, pp. 707-726, 2000
Measuring the rate of change in gene expression
Using recurrent networks
Input: all regulatory genes, input from kainate level, constant bias term, tissue-specific differences in regulation
Sigmoid transfer function, proportional decay term
Backpropagation through time algorithm
Pruning
Connectivity matrix giving the interactions
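A rate law built from a sigmoid transfer function, a constant bias term and a proportional decay term can be sketched as follows. This is a generic form written for illustration; the parameter names (`max_rate`, `decay`, `bias`) and the exact functional form are assumptions and are not claimed to match the cited paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def expression_rate(x, W, bias, max_rate, decay):
    """Assumed form: dx_i/dt = max_rate_i * sigmoid(sum_j W[i][j]*x[j] + bias_i) - decay_i * x_i."""
    n = len(x)
    return [
        max_rate[i] * sigmoid(sum(W[i][j] * x[j] for j in range(n)) + bias[i])
        - decay[i] * x[i]
        for i in range(n)
    ]

# Hypothetical two-gene example with no interactions and zero expression:
# the sigmoid evaluates to 0.5 and the decay term vanishes.
rates = expression_rate(
    x=[0.0, 0.0],
    W=[[0.0, 0.0], [0.0, 0.0]],
    bias=[0.0, 0.0],
    max_rate=[1.0, 2.0],
    decay=[0.3, 0.3],
)
```

The connectivity matrix `W` plays the role of the interaction weights that training (and subsequent pruning) would determine.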

Based on Two Feedforward Networks
Vohradsky, FASEB J., vol. 15, pp. 846-854, 2001
Using time series gene expression data
Using two feedforward networks: one corresponding to transcription and the other to translation

Recurrent Analog Neural Networks
Mjolsness et al., Journal of Theoretical Biology, vol. 152, pp. 429-453, 1991
Considered ordinary differential equations for the rate of change in gene expression in terms of a sigmoid function of the connectivity matrix and expression values
Connectivity matrix determined through training the neural network
Gene regulation considered as a combination of cis-acting regulation by the extended promoter of a gene through the transcription complex, and trans-acting regulation by transcription factor products of other genes
One neural network for cis-acting regulation, another for trans-acting regulation

Flux Balance Analysis (FBA)
A constraint-based approach.
Assumption: the system is at steady state.
Based on the principle of conservation of mass in a network.
Utilizes the stoichiometric matrix and a biologically relevant objective function, such as maximization of biomass production or minimization of nutrient utilization, on the premise that selection pressures during evolution guide systems towards optimality.
Identifies the optimal reaction flux distribution.
Steady-state constraint: $S \cdot v = 0$
Objective function: $z = \sum_{j=1}^{s} c_j v_j$
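The two ingredients, the mass-balance constraint and the linear objective, can be illustrated on a toy network. The sketch below is not a full FBA: a real analysis solves a linear program over all feasible fluxes, whereas here a hand-written list of candidate flux vectors stands in for the solver. The network, the flux values and the helper names are all hypothetical.

```python
def mat_vec(S, v):
    """Matrix-vector product S.v for a matrix given as a list of rows."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if S.v = 0 (within tolerance), i.e. every metabolite is mass balanced."""
    return all(abs(r) <= tol for r in mat_vec(S, v))

def objective(c, v):
    """z = sum_j c_j * v_j"""
    return sum(c_j * v_j for c_j, v_j in zip(c, v))

# Toy network (illustrative): R1: -> A, R2: A -> B, R3: B -> biomass, R4: A -> by-product
# Rows are metabolites A and B; columns are reactions R1..R4.
S = [[1, -1, 0, -1],
     [0, 1, -1, 0]]
c = [0, 0, 1, 0]          # objective: flux through the biomass reaction R3

candidates = [
    [10, 10, 10, 0],      # all uptake goes to biomass
    [10, 4, 4, 6],        # part of the uptake is lost to the by-product
    [10, 10, 8, 0],       # violates the mass balance of B
]
feasible = [v for v in candidates if is_steady_state(S, v)]
best = max(feasible, key=lambda v: objective(c, v))
```

Among the feasible candidates, the one with the largest objective value is kept, which mirrors what the optimization step does over the whole feasible set.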

FBA (continued)
(Raman and Nagasuma, Briefings in Bioinformatics, 2009)

Analysis of gene regulatory pathways: determining optimal regulatory pathways (Das, Mukhopadhyay, De, PLoS One, 2010)
1. Input reactions from the reaction database
2. Compute the node-edge incidence matrix
3. Generate flow vectors
4. Formulate the new constraint
5. Formulate the new objective function
6. Minimize the objective function using the gradient descent technique
7. Generate the optimal regulatory pathway

Data generation
Let $g$ denote the expression levels of the genes in the network and $f$ denote the vector of non-linear functions.
The rate equations describing the change of the expression levels of the genes over time are $dg/dt = f(g, u)$, where $u$ is the set of transcriptional perturbations.
For small perturbations, the non-linear system can be approximated by a linear set of equations.
At steady state, $dg/dt = Bv$, so that $Bv = 0$.
We generate $p$ random numbers $a_j$, $j = 1, 2, \ldots, p$, and a vector $v = \sum_{j=1}^{p} a_j v_{bj}$, until a certain inequality constraint on $v$ is satisfied for all its components.

Formulation of a new constraint
All the TFs that are not shown in a system may not be expressed at the required level, so the corresponding target genes may not be fully expressed or inhibited.
This leads to variation in the concentration of other TFs, and hence another constraint can be defined as $B \cdot (C \cdot v) = 0$, where $C$ is an $n \times n$ diagonal matrix whose diagonal elements are the components of the vector $c$. That is, if $C = [\gamma_{ij}]_{n \times n}$, then $\gamma_{ij} = \delta_{ij} c_i$, where $\delta_{ij}$ is the Kronecker delta.
Thus the optimization problem of determining a gene regulatory pathway yielding maximum expression of the target gene B, starting from the initial gene A, reduces to a maximization problem in which $z$ is maximized with respect to $c$, subject to the above constraint along with the inequality constraints.
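Because C is diagonal, the product C.v simply scales each flow component by its weight, and the constraint asks that the weighted flows still balance. A minimal sketch, assuming a hypothetical one-node incidence matrix and hand-picked weights (none of these values come from the paper):

```python
def weighted_flow(c, v):
    """C.v for diagonal C: component i is c_i * v_i."""
    return [c_i * v_i for c_i, v_i in zip(c, v)]

def constraint_residual(B, c, v):
    """B.(C.v); the constraint holds when every entry is zero."""
    cv = weighted_flow(c, v)
    return [sum(b * x for b, x in zip(row, cv)) for row in B]

# Hypothetical node with one incoming edge (column 0) and two outgoing edges (columns 1, 2).
B = [[1, -1, -1]]
v = [2.0, 1.0, 1.0]                     # unweighted flows, B.v = 0

balanced = constraint_residual(B, [0.5, 1.0, 0.0], v)     # weighted flows [1, 1, 0]
unbalanced = constraint_residual(B, [1.0, 1.0, 0.0], v)   # weighted flows [2, 1, 0]
```

The first weight vector routes all of the reduced inflow through one outgoing edge and satisfies the constraint; the second leaves a surplus at the node and does not.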

Estimation of weighting coefficients $c_i$
The reformulated objective function is $y = 1/z + \Lambda^T \cdot (B \cdot (C \cdot v))$, which needs to be minimized with respect to the weighting factors $c_i$ for all $i$.
The term $\Lambda = [\lambda_1, \lambda_2, \ldots, \lambda_m]^T$ is the regularizing parameter.
The $c_i$ are initialized with random values in $[0, 1]$.
The $c_i$ are then modified iteratively by the new learning algorithm incorporating the modulus of the second order derivative, where the amount of modification for $c_i$ in each iteration is defined as $\Delta c_i = -\eta\, \partial y / \partial c_i$.
Thus the modified value of $c_i$ is given by $c_i(t+1) = c_i(t) + \Delta c_i$, for all $i$, $t = 0, 1, 2, \ldots$
Here $c_i(t+1)$ is the value of $c_i$ at iteration $(t+1)$, computed from the value of $c_i$ at iteration $t$.
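The iterative update of the weights can be sketched with plain gradient descent. This is only an illustration of the rule c_i(t+1) = c_i(t) + Δc_i with Δc_i = -η ∂y/∂c_i: the objective `toy_objective` is a hypothetical stand-in for y, the derivative is approximated by central differences, the starting point is fixed rather than random, and the scaling by the modulus of the second order derivative mentioned on the slide is not included.

```python
def numerical_gradient(y, c, h=1e-6):
    """Central-difference approximation of dy/dc_i for every i."""
    grad = []
    for i in range(len(c)):
        up = list(c); up[i] += h
        down = list(c); down[i] -= h
        grad.append((y(up) - y(down)) / (2 * h))
    return grad

def gradient_descent(y, c0, eta, iterations):
    """Repeat c_i(t+1) = c_i(t) - eta * dy/dc_i."""
    c = list(c0)
    for _ in range(iterations):
        grad = numerical_gradient(y, c)
        c = [c_i - eta * g_i for c_i, g_i in zip(c, grad)]
    return c

# Hypothetical stand-in objective with its minimum at c = (0.3, 0.8).
def toy_objective(c):
    return (c[0] - 0.3) ** 2 + (c[1] - 0.8) ** 2

c_final = gradient_descent(toy_objective, c0=[0.5, 0.5], eta=0.1, iterations=200)
```

The learning rate `eta` controls the step size; too large a value makes the iteration overshoot, too small a value slows convergence.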

Path diagram of Th Cell Gene Regulatory Network
There are 33 reactions and 23 genes in the network.
The starting gene is TCR and the target gene is STAT3.
The objective function
The optimal pathway obtained by the proposed method is v1 → v4 → v10 → v11 → v12 → v22 → v27 → v16 → v17 → v19 → v20 → v21, shown by bold black arrows.
The EPA method generates the extreme regulatory pathway v1 → v4 → v10 → v11 → v12 → v30 → v15 → v16 → v17 → v19 → v20 → v21, shown by bold white arrows.
