Genetomic Promototypes

Mirkó Palla and Dana Pe'er

Department of Mechanical Engineering, Clarkson University, Potsdam, New York
and
Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts

I. Introduction

Transcriptional regulation plays a vital role in all living organisms. It influences development, complexity, diversity, homeostasis and other important biological functions (Davidson, 2001). Transcription is the first stage in the universal flow of information from the genome, where all genetic programs are stored, to the proteome, through which these programs are executed. Understanding the complex mechanism that controls the transcription machinery is therefore one of the fundamental goals of quantitative biology.

At the most fundamental level, transcription is controlled by the combinatorial interplay of cis-regulatory elements (or motifs) present in the gene's promoter region¹ and the associated regulatory proteins (or transcription factors) present in the cytoplasm (Jacob and Monod, 1961). Because all transcription factors are themselves gene products, this mechanism is in turn regulated by the set of motifs present in each gene's promoter. The elementary principles governing transcription can therefore be understood through a quantitative description of how a motif's influence on gene expression depends on its promoter context.

Despite major efforts to identify motifs in different species using a variety of approaches, and to analyze their precise influence on gene expression (McGuire et al., 2000), little is known about the principles by which a gene's motifs translate into an expression level. In other words, the quantitative effects of motifs on gene expression as a function of their promoter context are still poorly understood.

II. Background

Modern molecular biology has brought researchers many new tools, as well as an expanding database of genomes and new genes for study. Of particular use in the analysis of these genes is the synthetic promoter region, a 600-1000 base pair nucleotide sequence, designed to the specifications of the investigator, that controls the transcription machinery.
Synthetic promoters control the same product as the gene of interest, but the bioengineered nucleotide sequence regulating that protein may express it differently under various environmental conditions. Designing synthetic promoters by hand is a time-consuming and error-prone process that may involve several computer programs. For this reason, an integrated bioengineering tool (a design program called BASHER) is under development; it combines many modules to provide a platform for high-throughput design of synthetic promoter regions for multi-kilobase sequences. Of all sequenced genomes, that of the yeast Saccharomyces cerevisiae has gained the most attention, owing to the availability of multiple yeast genomes and high-quality mRNA data. For this reason, this yeast species was chosen as the core model for our genomic analysis.

¹ See Figure 1 for a hypothetical gene control mechanism.

III. Research methodology

The power and flexibility of oligonucleotide synthesis are increasingly being recognized in the bioengineering community. Traditional applications of promoter region synthesis include facilitating site-directed mutagenesis, structural analysis and the investigation of transcription regulation. The new theory of promoter variant design takes the combinatorial and spatial effects (Beer and Tavazoie, 2004) of cis-binding sites² into account and incorporates them into the modeling process. Because binding sites can act as activators or inhibitors, and can form modules (sets of cis-elements) whose interactions produce linear, epistatic, synergistic or switch effects, a deep combinatorial analysis is needed to decipher the governing regulatory logic.

Previous studies show that the spatial organization of these regulatory elements has functional and mechanistic implications. The elements interact physically: certain transcription factor binding sites overlap, which implies the possibility of protein complex formation. In addition, the higher-order chromatin structure contains regions of three-dimensional occlusion that block protein binding to regulatory motif sequences.
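
The observation that some binding sites overlap can be checked directly on a cis-element map. The following is a minimal sketch, assuming each site is recorded as a factor name plus half-open coordinates on the promoter. The `Site` record, the function name and the coordinates are illustrative assumptions and are not part of BASHER.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    """One predicted binding site on a promoter (hypothetical record layout)."""
    factor: str  # transcription factor name
    start: int   # 0-based start, inclusive
    end: int     # 0-based end, exclusive


def overlapping_pairs(sites):
    """Return all pairs of sites whose intervals share at least one base."""
    ordered = sorted(sites, key=lambda s: s.start)
    pairs = []
    for a, b in combinations(ordered, 2):
        # Two half-open intervals overlap when each starts before the other ends.
        if a.start < b.end and b.start < a.end:
            pairs.append((a, b))
    return pairs


# Illustrative coordinates only; not taken from any real promoter map.
sites = [
    Site("TF_A", 100, 110),
    Site("TF_B", 106, 118),
    Site("TF_C", 300, 308),
]
overlaps = overlapping_pairs(sites)
for a, b in overlaps:
    print(f"{a.factor} overlaps {b.factor}")
```

Pairs flagged this way are only candidates for protein complex formation; whether the two factors actually interact would still have to be tested experimentally.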
Motif positioning relative to the transcription start site plays a significant role in the transcription regulatory mechanism, so insertions of synthetic DNA segments might reveal some functionality. Finally, the distance between cis-elements plays a major role in regulation: certain motif pairs occur only at a particular base pair distance from each other, and some pairs occur more frequently than others in the promoter. It has also been shown that motif orientation and order have regulatory effects, i.e., a regulatory module will influence gene expression only in the right spatial combination (orientation, order).

² For an example of cis-binding sites (motifs), see those of promoter YCL027W in Figure 3.

To decipher the governing regulatory logic, combinations of elements must first be removed or replaced with new synthetic motif sequences, so that the resulting gene expression profile can be analyzed under various environmental conditions³. Further logical design steps should include randomly moving a binding site to other locations, making small changes to cis-elements, and adding new motifs based on new statistical data. BASHER performs these design steps, producing a set of systematic promoter variants in a high-throughput manner.

In the past, researchers used many different programs to address the requirements of the separate steps of synthetic promoter design. Alternatively, they sent their requirements to a black box provided by a gene synthesis company and let it use its proprietary programs to design the nucleotide sequences of interest. To facilitate the use of synthetic promoter regions in both traditional and high-throughput applications, new and more flexible solutions are required.

BASHER is a useful tool for investigators who wish to optimize protein expression and/or redesign their promoter of interest for detailed structure/function studies (Giaever et al., 2002), e.g., mutagenesis. The objective of this research project is to create a Web-based program that can perform all of the functions outlined above for promoter design in a directed, step-wise manner.
It accepts as input both ortholog promoter sequences and global transcription factor binding site maps of the organism of interest, and allows users to move through the design process in a series of modules that address practical issues surrounding oligonucleotide design. Users can follow the main "design a promoter" path or use the modules individually as needed.

³ See Figure 2 for a flow chart of the experimental steps.
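
The design steps described in this section (removing or replacing a motif, moving it to another location, and making small changes to it) can be sketched as simple string operations on a promoter sequence. This is an illustration under stated assumptions, not BASHER's implementation: all function names are hypothetical, the sequence and coordinates are toy values, and "removal" is modeled as overwriting the motif with random bases so that the promoter length stays fixed.

```python
import random

BASES = "ACGT"


def replace_site(promoter, start, end, new_seq):
    """Overwrite promoter[start:end] with new_seq of the same length."""
    if len(new_seq) != end - start:
        raise ValueError("replacement must match the site length")
    return promoter[:start] + new_seq + promoter[end:]


def random_filler(length, rng):
    """Random bases used as a stand-in for a removed motif (an assumption)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def remove_site(promoter, start, end, rng):
    """'Remove' a motif by overwriting it with random filler."""
    return replace_site(promoter, start, end, random_filler(end - start, rng))


def move_site(promoter, start, end, new_start, rng):
    """Blank the motif at its old position and write it at new_start."""
    motif = promoter[start:end]
    if new_start < 0 or new_start + len(motif) > len(promoter):
        raise ValueError("new position falls outside the promoter")
    blanked = remove_site(promoter, start, end, rng)
    return replace_site(blanked, new_start, new_start + len(motif), motif)


def mutate_site(promoter, start, end, n_mutations, rng):
    """Introduce n_mutations substitutions at distinct positions in the site."""
    seq = list(promoter)
    for pos in rng.sample(range(start, end), n_mutations):
        # Always pick a base different from the current one.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


rng = random.Random(0)  # fixed seed so the variants are reproducible
promoter = "ATATATATAT" + "GGGCCC" + "ATATATATAT"  # toy sequence
site = (10, 16)  # toy motif "GGGCCC"

variants = {
    "replaced": replace_site(promoter, *site, "TTTTTT"),
    "removed": remove_site(promoter, *site, rng),
    "moved": move_site(promoter, *site, 0, rng),
    "mutated": mutate_site(promoter, *site, 2, rng),
}
for name, seq in variants.items():
    print(name, seq)
```

Overwriting rather than deleting keeps every variant the same length, so the distances between the remaining motifs and to the transcription start are unchanged; a true deletion would shift them and confound the spacing effects discussed above. Random filler can by chance recreate a motif, so a real pipeline would need to rescan each variant.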

IV. References

1. Davidson EH (2001) Genomic Regulatory Systems: Development and Evolution. San Diego: Academic Press
2. Giaever G, et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418: 387-391
3. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3: 318-356
4. Beer MA, Tavazoie S (2004) Predicting gene expression from sequence. Cell 117: 185-198
5. McGuire AM, Hughes JD, Church GM (2000) Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res 10: 744-757

V. Figures and tables

Figure 1. Gene control mechanism for gene X.

[Diagram labels: Ortholog promoters; Expression data; PSSMs; YFG; BASHER; Cis element map; Conditions; Promoter variants]

Figure 2. Flow chart for experimental steps.

Figure 3. Transcription factor binding sites for promoter YCL027W. [Output example of the visualization software; see the Manual for more.]