A Computer Scientist's Guide to the Regulatory Genome


Fundamenta Informaticae 103 (2010), DOI /FI, IOS Press

Bartek Wilczyński
Institute of Informatics, Warsaw University, Banacha 2, Warsaw, Poland
and European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, Germany
bartek@mimuw.edu.pl

Torgeir R. Hvidsten
Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
and Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
torgeir.hvidsten@plantphys.umu.se

Abstract. Recent years have seen a wealth of computational methods applied to problems stemming from molecular biology. In particular, with the completion of many new full genome sequences, great advances have been made in studying the role of the non-protein-coding parts of the genome, reshaping our understanding of the role of DNA sequences. Recent breakthroughs in experimental technologies that allow us to inspect the innards of cells on a genomic scale have provided us with unprecedented amounts of data, posing new computational challenges for scientists working to uncover the secrets of life. Due to the binary-like nature of the DNA code and the switch-like behavior of many regulatory mechanisms, many of the questions currently in focus in biology are surprisingly closely related to problems that have long been of interest to computer scientists. In this review, we present a glimpse into the current state of research in computational methods applied to modeling the regulatory genome. Our aim is to cover current approaches to selected problems from molecular biology that we consider most interesting from the perspective of computer scientists, and to highlight new challenges that will most likely draw the attention of computational biologists in the coming years.
Keywords: computational biology, gene regulation, DNA motifs, regulatory elements

Address for correspondence: Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden

B. Wilczyński and T.R. Hvidsten / Guide to the Regulatory Genome

1. Introduction

Current research in molecular biology continues to provide inspiration for quantitative scientists. The volumes of data flowing from the work of an ever-growing army of experimental biologists can overwhelm even the fastest computers available, making it necessary to use state-of-the-art computational methods and to develop new algorithms for data analysis. While statistical data analysis is the most important approach in most cases, one field of molecular biology seems particularly interesting for computer scientists: regulatory genomics, which studies the architecture and function of the non-protein-coding DNA sequences involved in the regulation of gene expression.

Since this review is addressed to computer scientists, we shall first recall some relevant basic facts from molecular biology. We will focus our attention on the genome, which is the total content of the DNA sequences in a given species. Genes are the parts of the genomic sequence that encode proteins (i.e. protein-coding DNA), the building blocks of a living cell. While DNA molecules are sequences written in a four-letter alphabet (four nucleotides), proteins are sequences written in a 20-letter alphabet (20 amino acids). Simplistically speaking, fragments of DNA (i.e. genes) are directly translated into corresponding protein sequences by interpreting each nucleotide triplet as one specific amino acid. While the structure of DNA is relatively homogeneous (i.e. the famous double-stranded helix), the structure of a protein is a direct result of the size, shape and chemical properties of its amino acids and varies enormously between different proteins. It is the structure of proteins that in turn determines their functions, much like the shapes of different tools in a mechanical workshop determine their possible uses.
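As a rough illustration of the triplet-to-amino-acid mapping described above, the following Python sketch translates a coding DNA string using the standard genetic code (NCBI translation table 1, written with DNA letters). It deliberately ignores the intermediate RNA step, splicing and reading-frame detection; the function name `translate` and the example sequence are illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): amino acids listed in codon
# order with bases ordered T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate uppercase coding DNA triplet by triplet, stopping at a stop codon.

    A trailing partial codon is ignored; letters other than A, C, G, T raise KeyError.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("ATGTTTAAAGGCTGGTAA"))  # → MFKGW
```

Building the table from the compact 64-letter string avoids typing out every codon by hand while keeping the mapping identical to the standard code.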
And it is this direct path from DNA sequence to protein sequence to protein structure, and ultimately to protein function, that implements the classic effects of genetic events such as mutations and cross-overs on the fitness of organisms, and thus drives the evolution of life. However, while genes and proteins are very important for the possible functions of a cell, they are surprisingly conserved across species. For example, the protein catalog of humans is up to 99% identical to that of chimpanzees [22]. Nonetheless, we can clearly tell individuals of the two species apart. These differences originate to a large degree from the non-protein-coding parts of the genome, which are also affected by mutations and cross-overs, and which contain regulatory sequences determining the timing and scale of gene transcription (DNA → RNA) and translation (RNA → protein). The process by which a gene is used to produce a protein is referred to as gene expression.

The function of the non-coding sequences is mediated by specific interactions between particular classes of proteins and DNA motifs (i.e. relatively short words written in the four-letter alphabet of DNA, e.g. TGAT). Some proteins, most notably transcription factors, can bind DNA motifs (often referred to as binding sites) and, through this binding, affect the process of transcription in a localized fashion. Such binding events usually depend on specific sequence motifs being present in the DNA sequence near the gene subject to regulation (the so-called promoter region). In a simplistic, switch-like interpretation of the regulatory genome, transcription factors bind DNA motifs in the non-coding parts of the genome to turn protein-coding genes on or off, depending on whether the corresponding proteins are needed in the cell.
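The switch-like picture above rests on locating short motif words in promoter DNA. A minimal Python sketch of that step, assuming exact matches and assuming the motif may be read on either strand of the double helix, scans a promoter for a word such as TGAT and for its reverse complement. The promoter string and the function names are hypothetical; real binding sites tolerate mismatches, which is why mismatch-tolerant representations are usually preferred.

```python
# Maps each base to its Watson-Crick partner (uppercase DNA assumed).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """The same site as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif(promoter, motif):
    """Return (position, strand) for every exact occurrence of `motif` on either
    strand; positions are forward-strand coordinates of the window start.
    A palindromic motif is reported once per strand."""
    width = len(motif)
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(promoter) - width + 1):
        window = promoter[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        if window == rc:
            hits.append((i, "-"))
    return hits

if __name__ == "__main__":
    # Hypothetical promoter with TGAT on the forward strand and ATCA
    # (TGAT on the reverse strand) further downstream.
    print(find_motif("GGTGATCCATCAGG", "TGAT"))  # → [(2, '+'), (8, '-')]
```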
If we accept this simplified description of the aforementioned biological processes, we can draw some analogies between the components of regulatory systems and computer systems. In particular, we can think of the protein world as hardware. It contains fixed components based on a very slowly evolving DNA blueprint. It also includes peripheral devices for communication: signal receptors for input and secretory pathways for output. In the same metaphorical way, non-coding sequences can be considered similar to software. They are modular and fast-evolving, and they carry information on how and when the protein hardware should be utilized, and how it should respond to external stimuli.

The aim of this review is to spark interest in regulatory genomics among computer scientists. We provide the reader with a few, hopefully interesting, applications of concepts well known in computer science that are useful for solving real problems of modern molecular biology. Towards the end, we provide an overview of currently open problems and directions we believe will be of interest to computer scientists attracted to the study of biological phenomena connected to gene regulation.

2. Fishing for Informative Bits: Sequence Motifs

Transcription factors (TFs) are regulatory proteins that play a key role in regulating transcription: they bind DNA at specific sites and direct the transcription initiation machinery to the requested locations. TFs thus determine which genes are going to be transcribed at which time [36]. In this way, TFs can be considered the machines for reading and executing the regulatory code of the genome. In order to fully understand their function, one needs to be able to accurately describe their DNA-reading abilities: namely, which sequences can be recognized by a given TF, and whether the TF binds different sequences with different affinities. In bacteria, this can be done quite accurately, because there are usually only a few TFs in a given bacterium and the sequences they recognize are long (> 16 symbols) and strongly conserved, both between species and between different binding sites in the same species (not more than 1 error per 8 symbols).
In eukaryotes, however, the number of different TFs can be very high (there are an estimated > 2048 TFs in the human genome), and the sequence motifs they recognize can be very short and degenerate (e.g. the motif for the TF called activator protein (AP) is so degenerate that its recognition site occurs by chance about once every 256 nucleotides in any genome). Even though the motifs of different TFs can vary greatly both in length and in error tolerance, we will show in this section that they share common properties that allow us to sift these bits of information from a sea of non-informative genomic sequence.

Since the experimental techniques for discovering genomic sites recognized by TFs give only approximate positions, we need computational methods to further narrow down the search space and find the true recognition sites. To define this computational problem properly, we need to specify the space of acceptable descriptions of the sequences that TFs recognize for binding. The range of possible representations is quite broad. Even if we only consider representations that have been shown to successfully capture the biological properties of TF-DNA binding, the possibilities span simple words and regular expressions [16], probability matrices [17] and Bayesian models [33]. The most popular description, however, represents the binding specificity as a so-called Position-Specific Scoring Matrix (PSSM) with as many columns as the motif is long, where each column contains a probability distribution over symbols (4 nucleotides in the case of DNA motifs) for the corresponding position in the binding site. These column distributions are assumed to be independent of each other, which is a serious simplification of the biological reality. Nonetheless, PSSM models have proven very useful in practical applications because of their simplicity and the relative ease of estimating them from limited amounts of experimental data.
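To make the PSSM representation concrete, here is a minimal Python sketch that turns a probability matrix into log-odds scores against a background distribution and slides it along a sequence. Because the columns are treated as independent, the score of a window is simply the sum of its per-column scores. The 4-column matrix `PROBS` (with consensus TGAT), the uniform background and the score threshold are illustrative assumptions, not a published motif or tool; practical scanners would also score the reverse strand and use pseudocounts when estimating the probabilities.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position motif: one probability distribution over A, C, G, T
# per column. Each column sums to 1.
PROBS = [
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.80, "C": 0.10, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
]

def log_odds_matrix(probs, background=None):
    """Convert column probabilities into log2-odds scores against a background."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in probs]

def scan(sequence, pssm, threshold=0.0):
    """Score every window of the motif's width; return (position, score) hits."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Column independence: the window score is a sum over positions.
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits

if __name__ == "__main__":
    pssm = log_odds_matrix(PROBS)
    print(scan("CCTGATAAGTGATCC", pssm, threshold=4.0))  # → [(2, 6.414), (9, 6.414)]
```

Using log-odds rather than raw probabilities turns the product over independent columns into a sum, and a positive score means the window is more likely under the motif model than under the background.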
To fully specify our computational task, we need a cost function that selects the optimal PSSM model with respect to the available experimental data. As indicated at the beginning of this section, such data usually comes in the form of a number of genomic sub-sequences that are known to be bound by a TF of interest but that are much longer than the expected motif. Hence we should expect the true motif to be present in most of these sub-sequences. While such constraints eliminate some obviously wrong PSSMs, we still need to find the real motif among very many potential motifs. Surprisingly, the best cost function for PSSMs does not come from the data itself; instead, it is based on the observation that DNA motifs are in fact messages encoding information about gene regulation. If we think of them as binary codes, we can recall Shannon's work on information theory [31]. Namely, efficient codes should have high information content: if TFs are to decode information quickly and reliably, DNA-binding motifs should contain substantial information. Schneider [28] was the first to notice the importance of the information content (IC) of sequence motifs, which led to the development of the now-standard ways of evaluating and presenting motifs.

The problem of finding TF binding motifs has in recent years seen an explosive growth of different approaches [35], which all make use of the IC measure in some way but differ greatly in their optimization strategies and in the way they treat the original experimental data. Such a proliferation of methods can make results difficult to interpret. In particular, if we compare the results of different motif-finding procedures, it is not always clear which results are variations of the same motif and which are qualitatively different motifs (e.g. the binding sequence of another TF, biologically related to the one we are interested in). This task can be solved by clustering the results of different motif-finding methods [41, 24].
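The IC measure can be illustrated with a short Python sketch. Against a uniform DNA background, the IC of one PSSM column is 2 + Σ_b p_b log2 p_b bits (2 bits for a fully conserved position, 0 bits for a uniform one), and, under the independence assumption, the IC of a motif is the sum over its columns. The sketch also encodes a common rule of thumb, stated here as an assumption: a pattern carrying I bits is expected to match random sequence about once every 2^I positions. Under that rule, a site recurring about every 256 = 2^8 nucleotides, as in the AP example above, would carry roughly 8 bits. The example motif is hypothetical.

```python
import math

def column_ic(column):
    """IC (bits) of one PSSM column against a uniform DNA background:
    log2(4) + sum_b p_b * log2(p_b), with 0 * log2(0) taken as 0."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)

def motif_ic(columns):
    """Total IC of a motif, summing columns (PSSM independence assumption)."""
    return sum(column_ic(col) for col in columns)

def expected_spacing(ic_bits):
    """Rule of thumb (assumption): a pattern carrying `ic_bits` bits matches
    uniform random sequence about once every 2**ic_bits positions (one strand)."""
    return 2.0 ** ic_bits

if __name__ == "__main__":
    # Hypothetical motif: two fully conserved positions, one position allowing
    # A or T equally, and one uninformative position.
    motif = [
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
        {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    ]
    ic = motif_ic(motif)
    print(ic, expected_spacing(ic))  # → 5.0 32.0
```

A motif-finding cost function built on this idea would favor candidate PSSMs whose columns concentrate probability on few bases, since those carry more bits.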
Another, similar problem occurs when we want to compare a newly discovered motif with a database of known TF binding motifs, such as JASPAR [7]. Here, again, while there exist multiple measures for comparing PSSMs, the common ground of these methods seems to be the use of the IC measure to restrict the similarity computation to the most informative columns, as pioneered by the CompareACE method [18].

3. Building Reliable Modules: Regulatory Elements

Information theory can help us find small regulatory motifs. However, in all organisms but the simplest bacteria, the transcription of any single gene is regulated by larger sequence elements containing meaningful combinations of binding motifs. These drive the assembly of condition-specific TF combinations, which in turn determine the condition-specific switching of gene transcription. These sequences containing several motifs, so-called cis-regulatory modules (CRMs), form the core of the regulatory information-processing system in higher organisms. Keeping to our software analogy, we can think of these modules as software modules: their job is to combine the atomic bits of information into larger sequences able to reliably perform a given function. Interestingly, for reasons that are both evolutionary and physical, such regulatory sequences show a highly modular architecture [39] and tend to be reused by different genes in the same species as well as by homologous genes across related species. This presents us with a great opportunity to find such modules by selecting motif combinations that tend to be reused in different contexts. The first approaches to finding CRMs date back more than 10 years [40]. They were based on finding unusual concentrations of motifs corresponding to TFs involved in a particular developmental process.
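The idea of finding unusual concentrations of motifs can be sketched in Python as a sliding window that counts binding-site occurrences and reports windows above a fixed threshold. This is a deliberately crude illustration, not the method of [40]: motifs are matched exactly, the window width (50), step (10) and threshold (3 sites) are arbitrary assumptions, and no background model is used to judge what counts as "unusual".

```python
def find_sites(sequence, motifs):
    """Return sorted (position, motif) pairs for exact, possibly overlapping hits."""
    sites = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            sites.append((start, motif))
            start = sequence.find(motif, start + 1)
    return sorted(sites)

def dense_windows(sequence, motifs, window=50, step=10, min_sites=3):
    """Report (window_start, site_count) for windows that fully contain
    at least `min_sites` motif occurrences (a crude CRM-candidate detector)."""
    sites = find_sites(sequence, motifs)
    results = []
    for start in range(0, max(len(sequence) - window, 0) + 1, step):
        end = start + window
        count = sum(1 for pos, m in sites if pos >= start and pos + len(m) <= end)
        if count >= min_sites:
            results.append((start, count))
    return results

if __name__ == "__main__":
    # Hypothetical promoter: a C-only background with one cluster of three
    # sites for two hypothetical motifs, starting at position 100.
    promoter = "C" * 100 + "TGAT" + "CC" + "GGAA" + "CC" + "TGAT" + "C" * 86
    print(dense_windows(promoter, ["TGAT", "GGAA"]))
    # → [(70, 3), (80, 3), (90, 3), (100, 3)]
```

Overlapping windows report the same cluster several times; a fuller sketch would merge adjacent hits and replace the fixed count threshold with a score compared against shuffled or random sequence.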
These results were later verified and generalized to other species [6], while at the same time it was observed that restricting the analysis of regulatory motifs to those conserved across species significantly increases the chance of finding a functional binding site [23]. It did not take long for researchers to combine the two approaches and search for clusters of binding sites within highly conserved regions [32]. However, it took much longer to describe the first truly integrated model that used both conservation and motif sequence alignment and was applicable to multiple species on a genome scale [14]. The Enhancer Element Locator (EEL) method used a very elegant binding-site alignment method; however, it depended on the assumption that the order of binding sites along the sequence is exactly conserved, which was later shown to be a serious simplification of the biology [15]. This assumption was dropped by later approaches (such as [42]), which are able to detect conserved CRMs with rearranged binding sites.

4. Evaluating Functions: Linking Modules with Gene Expression

Knowing that gene regulation is organized in cis-regulatory modules (CRMs), and that these modules tend to be reused by different genes in the genome, can we say something about the function that these modules implement? Thanks to high-throughput technologies that can measure the expression profiles of all genes in a genome over time or in different conditions, we can use machine learning and data mining methods to model the regulatory logic hard-wired in the DNA. By observing the dynamic execution of the system in terms of gene expression (the output), we can learn CRMs (the inputs) that agree with the assumption that the underlying function should produce similar output given similar input. Thus, by assuming that genes exhibiting similar expression profiles also contain common CRMs in their promoter regions, we can discover the CRMs and, in principle, also reverse-engineer the underlying function.
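One simple way to test the assumption that genes sharing regulatory elements also share expression profiles is to compute the average pairwise Pearson correlation ("expression coherence") of the profiles of genes whose promoters carry a given motif combination. The Python sketch below does this on made-up toy data; the gene names, motif labels (M1, M2), profiles and function names are hypothetical, and a real analysis would compare each value against random gene sets of the same size to judge significance.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / (sx * sy)

def genes_with(promoter_motifs, required):
    """Genes whose promoter annotation contains every motif in `required`."""
    return [g for g, motifs in promoter_motifs.items() if set(required) <= motifs]

def coherence(genes, profiles):
    """Mean pairwise Pearson correlation of the given genes' profiles."""
    pairs = list(itertools.combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Toy data (hypothetical): expression at four time points per gene.
    profiles = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.0],
        "g3": [4.0, 3.0, 2.0, 1.0],
        "g4": [1.0, 3.0, 2.0, 4.0],
    }
    promoter_motifs = {"g1": {"M1", "M2"}, "g2": {"M1", "M2"},
                       "g3": {"M1"}, "g4": {"M2"}}
    for required in (["M1", "M2"], ["M1"]):
        genes = genes_with(promoter_motifs, required)
        print(required, genes, round(coherence(genes, profiles), 3))
```

In this toy example, the genes sharing both motifs are perfectly correlated, while the larger set sharing only M1 has a negative mean correlation, mimicking the kind of contrast a motif-combination analysis looks for.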
Here we consider two quite different examples separately: microbes, which are highly specialized but robust single-celled organisms, and animals, which are enormously complex systems of specialized cells.

Highly specialized circuits: microbial regulatory systems

Microbial gene regulation often implies a relatively simple system in which typically one gene corresponds to one protein, promoters are short and well defined, and regulatory motifs are organized into one CRM per gene. Furthermore, these systems consist of a single cell type. Thus, perturbing the system either indirectly, by altering its environment (temperature stress, starvation, etc.), or directly, by knocking out genes, yields high-quality data that make it possible to reverse-engineer the function producing the observed response.

Yeast has been the main microbial model organism for the computational modeling of gene regulation. In a seminal paper, Pilpel et al. [26] showed in 2001 that genes sharing pairs of transcription factor binding sites in their promoters exhibit significantly higher expression similarity than genes sharing only single binding sites. The conclusion was that in order to regulate a large number of processes, and to respond to a large number of stress factors, with a relatively small number of transcription factors (roughly 200), yeast makes extensive use of combinatorial regulation, in which more than one transcription factor is required to produce a response. This relatively simple computational approach was followed by a large number of more advanced, machine learning-based approaches [30, 29, 4]. The aim of these studies was to identify non-overlapping sets of genes (gene modules) with common regulatory mechanisms. Segal et al. used Bayesian models and the EM algorithm to iteratively refine initial clusters by re-assigning genes whose promoters did not match the current motif profile of the other genes in the cluster.
Beer and Tavazoie took a slightly different approach, in which a Bayesian network model was used to predict the expression profile of genes (defined by a set of fixed clusters) from their promoter content. In both studies, complex models with many parameters were used to describe the system. Later, Yuan et al. [45] showed that a simpler model based on the naive Bayes classifier obtained similar results. We proposed an alternative approach in which rule induction was used to associate sets of binding sites with possibly overlapping clusters of genes characterized by similar expression profiles [19, 44, 1].

As we have seen, several heuristics have been used to model the underlying function that maps regulatory sequence to dynamic expression. But how to determine the most biologically relevant approach is still open for discussion. In yeast, predicted regulatory mechanisms have been evaluated against high-throughput measurements of interactions between transcription factors and promoters (sometimes difficult to interpret due to their context dependence), against low-throughput, experimentally confirmed interactions (typically giving anecdotal evidence for one or a few of the predictions), or against gene function information. The rationale for the latter is that genes regulated together often participate in the same pathway or biological process and should therefore be associated with similar functional information in relevant databases [2]. This is often measured by computing the probability that the correspondence between predictions and prior functional knowledge could have occurred by chance (i.e. the p-value).

System integration: how to make an animal

The regulation of gene expression in multicellular organisms, and animals in particular, is a much more complex process.
While adult tissues are frequently assumed to behave like populations of homogeneous cells, the developmental processes that give rise to complex body plans, made up of billions of cells that all share the same genome and originate from a single fertilized cell, pose completely new challenges when attempting to understand regulatory mechanisms. From the very early years, scientists with a background in mathematics were interested in modeling the processes of pattern formation in biology. For example, it is not widely known among computer scientists that the most cited work of Alan Turing is actually his groundbreaking work on modeling self-organizing pattern formation inspired by developmental biology [37]. We now know that the substances he called morphogens, which were able to diffuse in space and generate different patterns in developing organisms, are in fact TFs, and that their function is exerted through regulatory modules.

However, in order to fully understand the action of TFs in the developmental context of pattern formation, we need to incorporate an additional step of signal integration. Since the spatio-temporal patterns of gene expression depend on the action of multiple TFs through multiple CRMs, we need to learn how to assemble complex gene regulatory functions from simpler rules governing the activity of single CRMs [43]. Currently, it is usually assumed that different CRMs can activate genes independently [38]; however, there is some experimental evidence of long-range repression mechanisms [8] in developmental contexts, which makes the problem of integrating inputs from multiple CRMs more complicated.

Once we make the step from single CRMs to gene regulatory functions, we can describe how different TFs affect pattern formation in morphogenesis. Since cells make discrete choices concerning their fate during development (e.g. a cell can become either a muscle cell or a bone cell), Boolean networks are typically chosen as the formalism for describing gene regulatory networks. In this field, the pioneering work was done by Stuart Kauffman [20], who showed how computer simulations of randomized Boolean networks can give new insights into biological phenomena such as homeostasis. Importantly, these models can be used both to discuss general properties of biological systems, such as evolvability and robustness [10], and to provide biologists with a formalism for describing particular biological systems, such as the segmentation network [27], and for predicting their behavior under different perturbations.

5. A Look Ahead: Debugging and Code Generation

In the present review, we have climbed through multiple levels of abstraction, from tiny regulatory motifs carrying atoms of regulatory information to Boolean gene regulatory networks describing phenomena in multicellular organisms. Yet we are still very far from a sufficient understanding of all important aspects of regulatory processes. In particular, two areas of biological research emerge as rich sources of new problems for computational modeling approaches: personal genomics and synthetic biology.

Personal genomics aims at understanding how observable characteristics (phenotypes) are linked to the underlying variability in the genomes of individuals (genotypes). Multiple ongoing scientific endeavors, such as the Personal Genome Project [9] and the 1000 Genomes Project [11], explore the differences between the genetic codes of different individuals. While these studies focus on the medical aspects of personal genomics, their results will undoubtedly influence our understanding of regulatory mechanisms. As we get to know more and more individual genome sequences, we are no longer looking at a single regulatory genome. Keeping to our software analogy, we are in fact confronted not with one regulatory program of a given species but with many imperfect copies of the same program coming from different individuals. We already know about many mutations of the code that lead to buggy programs, i.e. genetic diseases. For example, the Human Gene Mutation Database [34] lists more than 1500 non-coding mutations associated with different diseases.
Even though studying these mutations and their role in diseases is of great value for medical applications, the remaining challenge for basic science is to understand the majority of mutations that currently remain unassigned to any known disease. The question is whether these mutations are truly innocuous or whether their relevance is masked by our incomplete understanding of regulatory processes. In any case, just as understanding the semantics of a programming language is indispensable for finding bugs in programs written in that language, we should expect that understanding gene regulation will increase our knowledge of the basis of genetic diseases.

Synthetic biology looks at regulation from a completely different angle. Its main goal is to create new life forms, in a much more creative way than current biotechnology, which focuses on modifying existing organisms by deleting genes or transplanting them between species. Synthetic biology aims at creating new genes, new regulatory mechanisms and ultimately new organisms [5]. Even though we are far from creating truly new life forms, the first strides have been made by successfully creating simple regulatory circuits in bacteria [12] and eukaryotes [21]. Combining the ability to build simple working circuits in living cells with massive synthesis of different sequences opens many possibilities for testing hypotheses about regulatory mechanisms. For example, a novel approach by Patwardhan et al. [25] finds new regulatory motifs by testing a large number of random sequences. However, without a better understanding of regulatory systems, it is unlikely, if possible at all, that these approaches can be scaled up to make new kinds of living cells. Nonetheless, a technical milestone towards this end was recently reached when Venter et al. successfully transferred a synthetic genome (although virtually identical to that of a natural bacterium) into a new bacterium that, based on its new genome, started replicating and making proteins [13].

The system-level approach to modeling biological systems is often referred to as systems biology. The aim is to model the entire biological system by considering its entities (genes, RNAs, proteins and metabolites) not in isolation but in the context of each other. Gene regulation is at the heart of systems biology modeling, since it is here that dynamic responses are initiated. Unraveling the hard-coded regulatory logic in the regulatory genome, and identifying the transcription factors that cooperatively bind (synergistically or competitively) the discovered cis-regulatory modules, is a hard computational problem. First and foremost, it requires large amounts of high-quality data. Although molecular biology today is considered a data-rich science, the number of measurement points (time points, conditions) is still small compared to the number of variables (e.g. genes). For example, extensive research on network inference from expression data [3] indicates that this is an enormous challenge and that a large number of hard links (e.g. experimentally observed binding of transcription factors to promoters or promoter motifs) are needed as constraints to lift the quality of these models to an acceptable level.

References

[1] Andersson, C. R., Hvidsten, T. R., Isaksson, A., Gustafsson, M. G., Komorowski, J.: Revealing cell cycle control by combining model-based detection of periodic expression with cis-regulatory descriptors, BMC Systems Biology, 1, 2007, 45.
[2] Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., Sherlock, G.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nature Genetics, 25(1), 2000.
[3] Bansal, M., Belcastro, V., Ambesi-Impiombato, A., di Bernardo, D.: How to infer gene networks from expression profiles, Molecular Systems Biology, 3, 2007, 78.
[4] Beer, M. A., Tavazoie, S.: Predicting gene expression from sequence, Cell, 117(2), 2004.
[5] Benner, S., Sismour, A.: Synthetic biology, Nature Reviews Genetics, 6(7), 2005.
[6] Berman, B. P., Nibu, Y., Pfeiffer, B. D., Tomancak, P., Celniker, S. E., Levine, M., Rubin, G. M., Eisen, M. B.: Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome, Proceedings of the National Academy of Sciences of the United States of America, 99(2), 2002.
[7] Bryne, J., Valen, E., Tang, M., Marstrand, T., Winther, O., da Piedade, I., Krogh, A., Lenhard, B., Sandelin, A.: JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update, Nucleic Acids Research, 36, 2008, D102-D106.
[8] Cai, H., Arnosti, D., Levine, M.: Long-range repression in the Drosophila embryo, Proceedings of the National Academy of Sciences of the United States of America, 93, 1996.
[9] Church, G., et al.: Personal Genome Project.
[10] Ciliberti, S., Martin, O., Wagner, A.: Robustness can evolve gradually in complex regulatory gene networks with varying topology, PLoS Computational Biology, 3(2), 2007, e15.
[11] Durbin, R., Altshuler, D., McVean, G., Abecasis, G., Brooks, L.: 1000 Genomes Project.
[12] Gardner, T., Cantor, C., Collins, J.: Construction of a genetic toggle switch in Escherichia coli, Nature, 403, 2000.

[13] Gibson, D. G., Glass, J. I., Lartigue, C., Noskov, V. N., Chuang, R. Y., Algire, M. A., Benders, G. A., Montague, M. G., Ma, L., Moodie, M. M., Merryman, C., Vashee, S., Krishnakumar, R., Assad-Garcia, N., Andrews-Pfannkoch, C., Denisova, E. A., Young, L., Qi, Z. Q., Segall-Shapiro, T. H., Calvey, C. H., Parmar, P. P., Hutchison, C. A., 3rd, Smith, H. O., Venter, J. C.: Creation of a bacterial cell controlled by a chemically synthesized genome, Science, 329(5987), 2010.
[14] Hallikas, O., Palin, K., Sinjushina, N., Rautiainen, R., Partanen, J., Ukkonen, E., Taipale, J.: Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity, Cell, 124(1), 2006.
[15] Hare, E., Peterson, B., Iyer, V., Meier, R., Eisen, M.: Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation, PLoS Genetics, 4(6), 2008, e
[16] van Helden, J.: Regulatory Sequence Analysis Tools, Nucleic Acids Research, 31(13), 2003.
[17] Hertz, G. Z., Stormo, G. D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics, 15(7-8), 1999.
[18] Hughes, J., Estep, P., Tavazoie, S., Church, G.: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae, Journal of Molecular Biology, 296(5), 2000.
[19] Hvidsten, T. R., Wilczynski, B., Kryshtafovych, A., Tiuryn, J., Komorowski, J., Fidelis, K.: Discovering regulatory binding-site modules using rule-based learning, Genome Research, 15(6), 2005.
[20] Kauffman, S.: Homeostasis and differentiation in random genetic control networks, Nature, 224(5215), 1969.
[21] Kim, J., White, K., Winfree, E.: Construction of an in vitro bistable circuit from synthetic transcriptional switches, Molecular Systems Biology, 2, 2006, 68.
[22] King, M., Wilson, A.: Evolution at two levels in humans and chimpanzees, Science, 188(4184), 1975.
[23] Loots, G., Ovcharenko, I., Pachter, L., Dubchak, I., Rubin, E.: rVISTA for comparative sequence-based discovery of functional transcription factor binding sites, Genome Research, 12(5), 2002.
[24] Mahony, S., Benos, P.: STAMP: a web tool for exploring DNA-binding motif similarities, Nucleic Acids Research, 35, 2007, W253-W258.
[25] Patwardhan, R., Lee, C., Litvin, O., Young, D., Pe'er, D., Shendure, J.: High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis, Nature Biotechnology, 27(12), 2009.
[26] Pilpel, Y., Sudarsanam, P., Church, G. M.: Identifying regulatory networks by combinatorial analysis of promoter elements, Nature Genetics, 29(2), 2001.
[27] Sánchez, L., Chaouiya, C., Thieffry, D.: Segmenting the fly embryo: logical analysis of the role of the segment polarity cross-regulatory module, International Journal of Developmental Biology, 52(8), 2008.
[28] Schneider, T., Stephens, R.: Sequence logos: a new way to display consensus sequences, Nucleic Acids Research, 18(20), 1990.
[29] Segal, E., Shapira, M., Regev, A., Pe'er, D., Botstein, D., Koller, D., Friedman, N.: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data, Nature Genetics, 34(2), 2003.
[30] Segal, E., Yelensky, R., Koller, D.: Genome-wide discovery of transcriptional modules from DNA sequence and gene expression, Bioinformatics, 19(Suppl 1), 2003, I273-I282.

10 332 B. Wilczyński and T.R. Hvidsten / Guide to the Regulatory Genome [31] Shannon, C., Petigara, N., Seshasai, S.: A Mathematical Theory of Communication, Bell System Technical Journal, 27, 1948, [32] Sharan, R., Ben-Hur, A., Loots, G., Ovcharenko, I.: CREME: Cis-Regulatory Module Explorer for the human genome, Nucleic acids research, 32, 2004, W253 W256. [33] Sharon, E., Lubliner, S., Segal, E.: A Feature-Based Approach to Modeling ProteinDNA Interactions, PLoS Computational Biology, 4(8), 2008, e [34] Stenson, P., Ball, E., Mort, M., Phillips, A., Shiel, J., Thomas, N., Abeysinghe, S., Krawczak, M., Cooper, D.: Human gene mutation database (HGMD R ): 2003 update, Human mutation, 21(6), 2003, [35] Tompa, M., Li, N., Bailey, T. L., Church, G. M., De Moor, B., Eskin, E., Favorov, A. V., Frith, M. C., Fu, Y., Kent, W. J., Makeev, V. J., Mironov, A. A., Noble, W. S., Pavesi, G., Pesole, G., Regnier, M., Simonis, N., Sinha, S., Thijs, G., van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C., Zhu, Z.: Assessing computational tools for the discovery of transcription factor binding sites, Nature Biotechnology, 23(1), 2005, [36] Tsonis, P.: Anatomy of gene regulation: A three-dimensional structural analysis, Garland Publishing, [37] Turing, A. M.: The chemical basis of morphogenesis, Philosophical Transactions of the Royal Society of London, 237(641), 1952, [38] Visel, A., Akiyama, J., Shoukry, M., Afzal, V., Rubin, E., Pennacchio, L.: Functional autonomy of distantacting human enhancers, Genomics, 93(6), 2009, [39] Wasserman, W., Sandelin, A.: Applied bioinformatics for the identification of regulatory elements, Nature Reviews Genetics, 5(4), 2004, [40] Wasserman, W. W., Fickett, J. 
W.: Identification of regulatory regions which confer muscle-specific gene expression, Journal of Molecular Biology, 278(1), 1998, [41] Wilczynski, B., Darzynkiewicz, M., Tiuryn, J.: MEMOFinder: combining de novo motif prediction methods with a database of known motifs, Nature Precedings, 2008, Available from [42] Wilczynski, B., Dojer, N., Patelak, M., Tiuryn, J.: Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs, BMC bioinformatics, 10, 2009, 82. [43] Wilczynski, B., Furlong, E.: Challenges for modeling global gene regulatory networks during development: Insights from Drosophila, Developmental Biology, 340(2), 2010, [44] Wilczynski, B., Hvidsten, T. R., Kryshtafovych, A., Tiuryn, J., Komorowski, J., Fidelis, K.: Using local gene expression similarities to discover regulatory binding site modules, BMC Bioinformatics, 7, 2006, 505. [45] Yuan, Y., Guo, L., Shen, L., Liu, J. S.: Predicting gene expression from sequence: a reexamination, PLoS Computational Biology, 3(11), 2007, e243.

More information

Genome and DNA Sequence Databases. BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009

Genome and DNA Sequence Databases. BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009 Genome and DNA Sequence Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2009 Admin Reading: Chapters 1 & 2 Notes available in PDF format on-line (see class calendar page): http://www.soe.ucsc.edu/classes/bme110/spring09/bme110-calendar.html

More information

14.3 Studying the Human Genome

14.3 Studying the Human Genome 14.3 Studying the Human Genome Lesson Objectives Summarize the methods of DNA analysis. State the goals of the Human Genome Project and explain what we have learned so far. Lesson Summary Manipulating

More information

Bob Jesberg. Boston, MA April 3, 2014

Bob Jesberg. Boston, MA April 3, 2014 DNA, Replication and Transcription Bob Jesberg NSTA Conference Boston, MA April 3, 2014 1 Workshop Agenda Looking at DNA and Forensics The DNA, Replication i and Transcription i Set DNA Ladder The Double

More information

CPO Science and the NGSS

CPO Science and the NGSS CPO Science and the NGSS It is no coincidence that the performance expectations in the Next Generation Science Standards (NGSS) are all action-based. The NGSS champion the idea that science content cannot

More information

Principles of Evolution - Origin of Species

Principles of Evolution - Origin of Species Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X

More information

FINDING RELATION BETWEEN AGING AND

FINDING RELATION BETWEEN AGING AND FINDING RELATION BETWEEN AGING AND TELOMERE BY APRIORI AND DECISION TREE Jieun Sung 1, Youngshin Joo, and Taeseon Yoon 1 Department of National Science, Hankuk Academy of Foreign Studies, Yong-In, Republic

More information

DNA and the Cell. Version 2.3. English version. ELLS European Learning Laboratory for the Life Sciences

DNA and the Cell. Version 2.3. English version. ELLS European Learning Laboratory for the Life Sciences DNA and the Cell Anastasios Koutsos Alexandra Manaia Julia Willingale-Theune Version 2.3 English version ELLS European Learning Laboratory for the Life Sciences Anastasios Koutsos, Alexandra Manaia and

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

National Education Technology Standards

National Education Technology Standards National Education Technology Standards Objectives Satisfied by Each Deliverable in the Program 1 Basic operations and concept Students demonstrate a sound understanding of the nature and operation of

More information

Essentials of Human Anatomy & Physiology 11 th Edition, 2015 Marieb

Essentials of Human Anatomy & Physiology 11 th Edition, 2015 Marieb A Correlation of Essentials of Human Anatomy Marieb To the Next Generation Science Standards Life A Correlation of, HS-LS1 From Molecules to Organisms: Structures and Processes HS-LS1-1. Construct an explanation

More information

CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA

CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA CHAPTER 6: RECOMBINANT DNA TECHNOLOGY YEAR III PHARM.D DR. V. CHITRA INTRODUCTION DNA : DNA is deoxyribose nucleic acid. It is made up of a base consisting of sugar, phosphate and one nitrogen base.the

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Qualitative modeling of biological systems

Qualitative modeling of biological systems Qualitative modeling of biological systems The functional form of regulatory relationships and kinetic parameters are often unknown Increasing evidence for robustness to changes in kinetic parameters.

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Network Motifs COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Recall Not all subgraphs occur with equal frequency Motifs are subgraphs that

More information

A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions

A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES, Vol. 59, No. 1, 2011 DOI: 10.2478/v10175-011-0015-0 Varia A greedy algorithm for the DNA sequencing by hybridization with positive and negative

More information