Algorithms for genomes 2b: Biology introduction
Dr. Jan M. Kooter, Department of Genetics, FALW
jan.kooter@falw.vu.nl
http://www.bio.vu.nl/genetica/

Topics:
- Transcription: RNA synthesis, RNA processing, intron-exon structure
- Promoters: regulation, chromatin structure, CpG islands
- Protein domains: modular structure of proteins, exon shuffling
- Protein homologs: paralogues and orthologues

Central dogma: DNA is transcribed into RNA, and RNA is translated into protein.
Transcription starts at the promoter (start) and ends at the termination site (stop), producing the RNA.
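The central dogma can be made concrete in a few lines of code. The sketch below assumes the standard genetic code and a coding-strand DNA sequence that begins at the start codon and contains no introns (as in a prokaryotic gene); `transcribe` and `translate` are hypothetical helper names, not part of any library.

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA: the RNA has the coding-strand sequence with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from position 0 until the first stop ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ATGGCCTTTTAA")))  # MAF
```

In eukaryotes the introns would first have to be removed, which is why the sketch is limited to intron-free input.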
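RNA polymerases
Pribnow box of the promoter (prokaryotes; consensus TATAAT, about 10 bp upstream of the transcription start): where does RNA polymerase bind?
Template (anticoding) strand >> terminology: RNA polymerase reads the template strand, so the RNA has the same sequence as the other (coding) strand, with U instead of T.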
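The strand terminology can be checked in code: RNA polymerase reads the template (anticoding) strand 3'->5' and synthesizes RNA 5'->3', so the RNA equals the reverse complement of the template strand. The sketch also scans for a -10 (Pribnow) box by counting mismatches against the TATAAT consensus. The function names, the one-mismatch threshold and the toy sequences are assumptions for illustration; real promoter finders typically use position weight matrices instead.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_from_template(template_5to3: str) -> str:
    """Return the RNA made from a template strand written 5'->3'.

    The polymerase reads the template 3'->5', so the RNA is the reverse
    complement of the template (= the coding strand with U for T)."""
    coding = template_5to3.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")


def find_box(seq: str, consensus: str = "TATAAT", max_mismatches: int = 1) -> list[int]:
    """0-based positions where `seq` matches `consensus` with few mismatches."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


print(rna_from_template("TTACGGCAT"))  # AUGCCGUAA
print(find_box("GGTATAATGG"))           # [2]
```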
Terminator regions of DNA
Initiation of transcription (prokaryotes):
- RNA polymerase
- Sigma factor
Rho-independent terminators: a GC-rich hairpin followed by a run of U's; termination depends on these U's! (Rho-dependent terminators do not have these U's.)
Transcription overview (prokaryotes): DNA is transcribed into RNA.
Some terminology: 5' and 3' untranslated regions (leader, trailer); upstream; downstream.
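A Rho-independent terminator can be sketched as a search for an inverted repeat (which folds into a hairpin in the RNA) followed directly by a run of T's in the coding-strand DNA (U's in the RNA). This is a deliberately simplified sketch: it assumes perfectly paired stems of a fixed length, a loop of 3 to 8 nt and at least four T's, and it ignores G-U pairs, mismatches and the GC-richness of real stems. The parameter values and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_terminators(seq: str, stem: int = 6, loop_range=(3, 8), min_t: int = 4):
    """Return (stem_start, loop_length) for candidate terminators.

    A candidate is a left stem arm, a loop, a right arm that is the exact
    reverse complement of the left arm, and then at least `min_t` T's."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop           # start of the right stem arm
            right = seq[j:j + stem]
            if len(right) < stem:
                break                     # longer loops cannot fit either
            if right != revcomp(left):
                continue
            # The run of T's (U's in the RNA) must follow the hairpin directly.
            if seq[j + stem:j + stem + min_t] == "T" * min_t:
                hits.append((i, loop))
    return hits


print(find_terminators("AAAGCCCGCTTCGGCGGGCTTTTAA"))  # [(3, 4)]
```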
In prokaryotes, transcription and translation are coupled! This is not the case in eukaryotes, where the mRNA is transported from the nucleus to the cytoplasm.
Transcription start, termination and RNA processing in eukaryotes: gene (DNA) with promoter -> transcription -> capped pre-mRNA -> splicing (spliceosome) -> mRNA with cap and poly(A) tail, (A)n -> translation -> protein.
An important difference between prokaryotes and eukaryotes is that eukaryotic genes consist of exons and introns.
RNA polymerase II: the transcription initiation complex (many subunits!)
RNA polymerase II promoter regulation:
- Interactions between proteins bound to promoter elements
- Specific sequences recognized by transcription factors (activators, repressors)
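In its simplest form, scanning a promoter for a transcription factor binding site is a pattern search on both DNA strands. The sketch below assumes the site can be written as a regular expression. The example uses a spaced CGG-N11-CCG pattern of the kind bound by the yeast activator Gal4, chosen here only for illustration and not taken from the slide. Because this particular site is palindromic, it is reported on both strands at the same position.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def scan_both_strands(seq: str, pattern: str) -> list[tuple[str, int]]:
    """Return (strand, start) for every match of `pattern` on either strand.

    Starts are 0-based coordinates on the + strand. A zero-width lookahead
    lets overlapping matches be reported."""
    seq = seq.upper()
    rx = re.compile(f"(?=({pattern}))")
    hits = [("+", m.start()) for m in rx.finditer(seq)]
    rc = seq.translate(COMPLEMENT)[::-1]
    for m in rx.finditer(rc):
        length = len(m.group(1))
        # Convert a start on the reverse complement back to + strand coordinates.
        hits.append(("-", len(seq) - m.start() - length))
    return hits


promoter = "AA" + "CGG" + "A" * 11 + "CCG" + "TT"
print(scan_both_strands(promoter, "CGG.{11}CCG"))  # [('+', 2), ('-', 2)]
```

For a non-palindromic site the two strands give different hits, which is why both are scanned.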
Example: a yeast transcription factor. DNA binding: cis-acting elements (the transcription factor binds a promoter element = cis-acting element).
DNA is associated/packaged with proteins: chromatin.
- DNA winds around histone proteins (nucleosomes).
- Other proteins wind the DNA into a more tightly packed form, the chromosome.
- Unwinding portions of the chromosome is important for mitosis, replication and making RNA.
Histone tails extend beyond the nucleosome and are sites of (mostly) reversible post-translational modification. H3 and H4 have been studied most extensively to date.
Basically: the type of histone modification (mostly acetylation and methylation) and the position of the modified amino acid determine whether a gene will be expressed or not. Transcription factors and associated proteins can modify the amino acids in the histone tails.
[Figure: a HAT acetylates (Ac) histone tails; a bromodomain protein binds the acetylated tails.]
The action of histone acetyltransferases (HATs) both decondenses chromatin and provides recognition sites for bromodomain-containing proteins (e.g. Gcn5).
CpG islands:
- CpG-rich regions (GC% > 50%; > 200 bp long, often much longer (> 1000 bp); observed/expected CpG ratio > 0.6)
- Part of a promoter, and actually function as a promoter
- Current estimate: > 50% of the genes in the human genome contain a CpG island
- The CpGs in these islands are normally unmethylated
- But when they are methylated, this leads to transcriptional repression
- Several tumors contain methylated CpG islands!
Example: CpG islands in the retinoblastoma (Rb) gene region. The dotted line represents the statistically expected frequency of CpG sites (1/16), while the solid line represents the measured frequency of CpG sites in the 180 kb of DNA sequence that encompasses the Rb gene exons and introns. The locations of two CpG islands are indicated by arrows. Only the most 5' CpG island corresponds to the promoter of the gene.
http://www.mdanderson.org/departments/methylation/
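The CpG-island criteria listed above translate directly into a window scan. The sketch uses the common observed/expected definition, (number of CpG x length) / (number of C x number of G), which compares the CpG count with what the sequence's own C and G content predicts. With equal base frequencies, the expected CpG frequency is 1/4 x 1/4 = 1/16, as in the Rb figure. The window size, step and function names are assumptions, and real island finders also merge overlapping windows.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC percentage, observed/expected CpG ratio) for a sequence.

    observed/expected = (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def is_cpg_island(seq: str, min_len: int = 200, min_gc: float = 50.0,
                  min_obs_exp: float = 0.6) -> bool:
    """Apply the length, GC% and observed/expected thresholds."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


def island_windows(seq: str, window: int = 200, step: int = 50) -> list[int]:
    """Start positions of the windows that pass the CpG-island criteria."""
    return [i for i in range(0, len(seq) - window + 1, step)
            if is_cpg_island(seq[i:i + window], min_len=window)]


print(cpg_stats("CG" * 100))  # (100.0, 2.0)
```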
Transcription start, termination and RNA processing in eukaryotes
Processing of pre-mRNAs synthesized by RNA polymerase II:
- Capping: addition of a 7-methylguanosine (m7G) at the 5' end of the mRNA
- Polyadenylation: addition of 50-200 adenosines at the 3' end of the mRNA
- Splicing: removal of the intron(s) by the spliceosome. This is a nuclear process, so mRNA in the cytoplasm does not contain introns.
Eukaryotic genes consist of exons and introns; the introns are removed by splicing.
>> Transport of mRNA from the nucleus to the cytoplasm is an active and regulated process.
[Figure: messenger RNA with 5' cap. Example: mouse beta-globin gene.]
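The three processing steps can be modeled as string operations on a pre-mRNA. In this sketch the exon coordinates are assumed to be known (0-based, half-open, in transcript order), and the cap is represented only as a label. The poly(A) length is a parameter whose default lies within the 50-200 range mentioned above. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatureMRNA:
    cap: str            # label for the 5' cap structure
    body: str           # spliced exon sequence, 5' -> 3'
    poly_a_length: int  # number of adenosines added at the 3' end

    def sequence(self) -> str:
        return self.body + "A" * self.poly_a_length


def process_pre_mrna(pre_mrna: str, exons: list[tuple[int, int]],
                     poly_a_length: int = 150) -> MatureMRNA:
    """Splice out the introns, then add a 5' cap label and a 3' poly(A) tail."""
    prev_end = 0
    for start, end in exons:
        if not (prev_end <= start < end <= len(pre_mrna)):
            raise ValueError("exons must be ordered, non-overlapping and inside the transcript")
        prev_end = end
    # Splicing: keep only the exon sequence; everything between exons is intron.
    body = "".join(pre_mrna[start:end] for start, end in exons)
    return MatureMRNA(cap="m7G", body=body, poly_a_length=poly_a_length)


pre = "AUGCC" + "GUAAGUUUUAG" + "GGUAA"   # exon 1 + intron + exon 2
mrna = process_pre_mrna(pre, [(0, 5), (16, 21)], poly_a_length=50)
print(mrna.body)  # AUGCCGGUAA
```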
Nuclear introns and splicing (removal of intron sequences)
Nuclear mRNA intron removal: role of the U-RNAs (U snRNAs) in recognizing the exon-intron boundaries and the branch point.
- Exon-intron junctions are characterized by specific, conserved base sequences (most introns start with GU and end with AG), although some junctions have different bases!
>> This property can be used to identify genes and exons/introns.
[Figure: intron removal in nuclear RNA splicing.]
Gene recognition in the genome:
- Scan genomic DNA for exon/intron sequences, promoter sequences, open reading frames, etc.
- Relevant information comes from RNA, because gene sequences are expressed via RNA:
  - cDNA (complementary or "copy" DNA) sequences (isolate mRNA > cDNA > sequence): compare the cDNA (no introns!) with the genomic DNA.
  - Expressed sequence tags (ESTs; not necessarily complete mRNAs; also non-coding RNAs) (RNAs > cDNA): compare the ESTs with the genomic DNA.
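Comparing a cDNA with genomic DNA reveals where the introns are. The recursive sketch below makes several assumptions: the match is exact, the sequences are uppercase, the genomic sequence starts at the first transcribed base, and every intron follows the GT-AG rule (GU-AG in the RNA). When a junction is ambiguous, it simply takes the first valid split. The function names and the minimum intron length are hypothetical; practical spliced aligners tolerate mismatches and score splice sites.

```python
def find_exons(genomic: str, cdna: str, g: int = 0, c: int = 0,
               min_intron: int = 20):
    """Return exons as (start, end) genomic coordinates, or None if nothing fits.

    `g` and `c` are the current positions in the genomic DNA and the cDNA.
    Every inferred intron must start with GT and end with AG."""
    if c == len(cdna):
        return []
    # Extend the current exon as far as genomic DNA and cDNA agree.
    k = 0
    while (c + k < len(cdna) and g + k < len(genomic)
           and genomic[g + k] == cdna[c + k]):
        k += 1
    if c + k == len(cdna):
        return [(g, g + k)]
    # The exon may end at or before the first mismatch: try the longest first.
    for length in range(k, 0, -1):
        intron_start = g + length
        if genomic[intron_start:intron_start + 2] != "GT":
            continue
        for next_exon in range(intron_start + min_intron, len(genomic)):
            if (genomic[next_exon - 2:next_exon] == "AG"
                    and genomic[next_exon] == cdna[c + length]):
                rest = find_exons(genomic, cdna, next_exon, c + length, min_intron)
                if rest is not None:
                    return [(g, g + length)] + rest
    return None


def introns_from_exons(exons):
    """Introns are the genomic gaps between consecutive exons."""
    return [(end1, start2) for (_, end1), (start2, _) in zip(exons, exons[1:])]


exon1, intron, exon2 = "ATGGCC", "GTAAGTACTAACCCTTTTAG", "CTGAAATAA"
exons = find_exons(exon1 + intron + exon2, exon1 + exon2)
print(exons, introns_from_exons(exons))  # [(0, 6), (26, 35)] [(6, 26)]
```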
Protein domains and the exon-shuffling hypothesis for the assembly and origin of new genes.
Many proteins have a modular structure built from functional domains:
> Each domain has a specific function and can be shared by different proteins; some proteins contain multiple copies of a domain.
[Figure: examples of the modular build-up of proteins, visualized.]
Model: different modules are assembled into a single protein by exon shuffling at the DNA level:
>> One or more exons encode a particular protein domain. Through DNA rearrangements, or via an RNA intermediate, exon sequences can be duplicated and inserted at other genomic sites, for example into other genes. This mechanism is assumed to create new genes.
Example: calmodulin and a kinase (an enzyme that phosphorylates proteins) are often separate proteins, so the calcium-binding domain and the kinase domain are usually encoded by different genes. In Arabidopsis and many other plants, however, these specific domains are brought together in one protein.
- Two distinct proteins: calmodulin (Ca2+-binding; senses [Ca2+]) and a kinase (activated by calmodulin-Ca2+)
- Both functions in one protein
Model: the DNAs are combined by exon shuffling. (Not to scale: principle!)
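The exon-shuffling model can be illustrated with a toy data structure in which a gene is an ordered list of exons, each labeled with the domain it helps encode. This is only a conceptual sketch. The class and function names, the exon labels and the domain order chosen for the fused gene are assumptions, and list slicing stands in for the DNA rearrangements or RNA intermediates described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    name: str
    domain: str  # protein domain that this exon (partly) encodes


def domain_architecture(gene: list[Exon]) -> list[str]:
    """Domains in N- to C-terminal order; consecutive exons of one domain merge.

    Tandem copies of a domain need distinct labels (e.g. "EGF#1", "EGF#2")."""
    arch: list[str] = []
    for exon in gene:
        if not arch or arch[-1] != exon.domain:
            arch.append(exon.domain)
    return arch


def shuffle_exons(acceptor: list[Exon], donor: list[Exon], domain: str,
                  position: int) -> list[Exon]:
    """Copy the donor's exons for `domain` into `acceptor` before `position`.

    The donor gene is left intact, mimicking duplication plus insertion."""
    copied = [exon for exon in donor if exon.domain == domain]
    return acceptor[:position] + copied + acceptor[position:]


kinase_gene = [Exon("k1", "kinase"), Exon("k2", "kinase")]
calmodulin_gene = [Exon("c1", "Ca-binding"), Exon("c2", "Ca-binding")]
fused = shuffle_exons(kinase_gene, calmodulin_gene, "Ca-binding", len(kinase_gene))
print(domain_architecture(fused))  # ['kinase', 'Ca-binding']
```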
Principle of exon shuffling: know the principle and how to apply it to explain the modular build-up of genes/proteins.
Protein/gene homology: definitions of important concepts in (comparative) genomics
- Homologous genes (homologs): genes derived from a common ancestral gene. The level of similarity often reflects the time since they diverged.
  >>>> Homologs can be divided into orthologs and paralogs.
- Orthologous genes (orthologs): genes are orthologous if their divergence reflects a speciation event. They usually have similar developmental and physiological functions and are very similar in protein sequence. Example: alpha-globin (human) <-> alpha-globin (chimpanzee)
- Paralogous genes (paralogs): genes are paralogous if their divergence reflects a gene duplication event within a species; they often have novel functions. Example: alpha-globin (human) <-> beta-globin (human) <-> myoglobin (human)
Gene/protein evolution:
- Homologs: common ancestor, common 3D structure, at least some sequence similarity (sequence motifs or closer similarity)
  > Paralogs: derived by duplication
  > Orthologs: derived by speciation
(Anastasia Nikolskaya, Georgetown Univ.)
Homologs > paralogs + orthologs
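Given a gene tree in which each internal node is labeled as a speciation or a duplication, the definitions above reduce to one rule: two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node. The sketch assumes these labels are already known; inferring them, by reconciling the gene tree with the species tree, is the hard part. The small alpha/beta-globin tree is a simplified illustration built from the examples above, and the class and function names are hypothetical.

```python
class GeneNode:
    """Node in a gene tree; internal nodes record the event that split them."""

    def __init__(self, name, event=None, children=()):
        self.name = name
        self.event = event  # "speciation" or "duplication" for internal nodes
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """The node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    seen = {id(n) for n in ancestors(a)}
    for node in ancestors(b):
        if id(node) in seen:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a, b):
    """Orthologs if the genes split at a speciation, paralogs if at a duplication."""
    if a is b:
        raise ValueError("compare two different genes")
    event = last_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("ancestor node has no event label")


alpha_h = GeneNode("alpha-globin (human)")
alpha_c = GeneNode("alpha-globin (chimpanzee)")
beta_h = GeneNode("beta-globin (human)")
beta_c = GeneNode("beta-globin (chimpanzee)")
root = GeneNode("ancestral globin", "duplication", [
    GeneNode("alpha", "speciation", [alpha_h, alpha_c]),
    GeneNode("beta", "speciation", [beta_h, beta_c]),
])
print(relationship(alpha_h, alpha_c))  # orthologs
print(relationship(alpha_h, beta_h))   # paralogs
```

Note that alpha-globin (human) and beta-globin (chimpanzee) also come out as paralogs, because their last common ancestor is the duplication.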
Orthologs and paralogs
[Figure: gene tree of myoglobin (Myo) and hemoglobin (Hb; HbA, HbB) genes from hagfish, cod, frog and rat, shown against the taxa Craniata, Vertebrata, Myxinidae, Teleostomi, Tetrapoda, Amphibia and Mammalia. Source: Anastasia Nikolskaya, Georgetown Univ.]