DNA as a Mass-Storage Device: 1



Similar documents
Transcription and Translation of DNA

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, b.patel@griffith.edu.

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Translation Study Guide

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

Basic Concepts of DNA, Proteins, Genes and Genomes

The world of non-coding RNA. Espen Enerly

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure enzymes control cell chemistry ( metabolism )

Chapter 18 Regulation of Gene Expression

Protein Synthesis How Genes Become Constituent Molecules

Complex multicellular organisms are produced by cells that switch genes on and off during development.

Structure and Function of DNA

Name Class Date. Figure Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Gene Transcription in Prokaryotes

Control of Gene Expression

Chapter 5: Organization and Expression of Immunoglobulin Genes

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Molecular Genetics. RNA, Transcription, & Protein Synthesis

Lecture Series 7. From DNA to Protein. Genotype to Phenotype. Reading Assignments. A. Genes and the Synthesis of Polypeptides

a. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled

Sample Questions for Exam 3

The Nucleus: DNA, Chromatin And Chromosomes

GENE REGULATION. Teacher Packet

Control of Gene Expression

Gene Models & Bed format: What they represent.

How To Understand How Gene Expression Is Regulated

Transcription in prokaryotes. Elongation and termination

AP BIOLOGY 2009 SCORING GUIDELINES

Biological Sciences Initiative. Human Genome

MUTATION, DNA REPAIR AND CANCER

2. The number of different kinds of nucleotides present in any DNA molecule is A) four B) six C) two D) three

4. DNA replication Pages: Difficulty: 2 Ans: C Which one of the following statements about enzymes that interact with DNA is true?

PRACTICE TEST QUESTIONS

PrimePCR Assay Validation Report

To be able to describe polypeptide synthesis including transcription and splicing

RNA & Protein Synthesis

1 Mutation and Genetic Change

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Human Genome and Human Genome Project. Louxin Zhang

RNA: Transcription and Processing

mirnaselect pep-mir Cloning and Expression Vector

Control of Gene Expression

13.4 Gene Regulation and Expression

Introduction to Genome Annotation

restriction enzymes 350 Home R. Ward: Spring 2001

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B

Sickle cell anemia: Altered beta chain Single AA change (#6 Glu to Val) Consequence: Protein polymerizes Change in RBC shape ---> phenotypes

Central Dogma. Lecture 10. Discussing DNA replication. DNA Replication. DNA mutation and repair. Transcription

Searching Nucleotide Databases

Next Generation Sequencing

Activity 7.21 Transcription factors

DNA, RNA, Protein synthesis, and Mutations. Chapters

Transfection-Transfer of non-viral genetic material into eukaryotic cells. Infection/ Transduction- Transfer of viral genetic material into cells.

Transcription: RNA Synthesis, Processing & Modification

13.2 Ribosomes & Protein Synthesis

The Steps. 1. Transcription. 2. Transferal. 3. Translation

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

Biotechnology and Recombinant DNA (Chapter 9) Lecture Materials for Amy Warenda Czura, Ph.D. Suffolk County Community College

From DNA to Protein

Description: Molecular Biology Services and DNA Sequencing

European Medicines Agency

Replication Study Guide

PrimePCR Assay Validation Report

Translation. Translation: Assembly of polypeptides on a ribosome

Genomes and SNPs in Malaria and Sickle Cell Anemia

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Mitochondrial DNA Analysis

Specific problems. The genetic code. The genetic code. Adaptor molecules match amino acids to mrna codons

CCR Biology - Chapter 8 Practice Test - Summer 2012

somatic cell egg genotype gamete polar body phenotype homologous chromosome trait dominant autosome genetics recessive

Problem Set 3 KEY

Special report. Chronic Lymphocytic Leukemia (CLL) Genomic Biology 3020 April 20, 2006

Problem Set 1 KEY

Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in

Genetics Lecture Notes Lectures 1 2

BCOR101 Midterm II Wednesday, October 26, 2005

Error Tolerant Searching of Uninterpreted MS/MS Data

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

AP BIOLOGY 2007 SCORING GUIDELINES

Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.

Micro RNAs: potentielle Biomarker für das. Blutspenderscreening

Next Generation Sequencing: Technology, Mapping, and Analysis

Lab # 12: DNA and RNA

Outline. interfering RNA - What is dat? Brief history of RNA interference. What does it do? How does it work?

Plant Growth & Development. Growth Stages. Differences in the Developmental Mechanisms of Plants and Animals. Development

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein

1.5 page 3 DNA Replication S. Preston 1

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG)

MICROBIAL GENETICS. Gene Regulation: The Operons

Milestones of bacterial genetic research:

Overview of Eukaryotic Gene Prediction

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Gene Regulation -- The Lac Operon

HCS Exercise 1 Dr. Jones Spring Recombinant DNA (Molecular Cloning) exercise:

Thymine = orange Adenine = dark green Guanine = purple Cytosine = yellow Uracil = brown

Transcription:

DNA as a Mass-Storage Device Robert J. Robbins Johns Hopkins University rrobbins@gdb.org DNA as a Mass-Storage Device: 1

Goals of the Genome Project Sequence the Genome equivalent to obtaining an image of a mass-storage device Map the Genome equivalent to developing a fileallocation table for the mass-storage device Understand the Genome equivalent to reverse engineering the files on the mass-storage device all the way back to design and maintenance specifications DNA as a Mass-Storage Device: 2

Sequencing the Genome DNA as a Mass-Storage Device: 3

Getting the Sequence Obtaining one full human sequence will be a technical challenge. If the DNA sequence from a single human sperm cell were typed on a continuous ribbon in ten-pitch type, that ribbon could be stretched from San Francisco to Chicago to Washington to Houston to Los Angeles, and back to San Francisco, with about 60 miles of ribbon left over. The amount of human sequence currently sequenced is equal to less than onethird of that left-over 60-mile fragment. We have a long way to go, and getting there will be expensive. Computers will play a crucial role in the entire process, from robotics to control experimental equipment to complex analytical methods for assembling sequence fragments. year per base percent cost budget year cumulative completed 1995 $0.50 16,000,000 10,774,411 10,774,411 0.33% 1996 $0.40 25,000,000 21,043,771 31,818,182 0.96% 1997 $0.30 35,000,000 39,281,706 71,099,888 2.15% 1998 $0.20 50,000,000 84,175,084 155,274,972 4.71% 1999 $0.15 75,000,000 168,350,168 323,625,140 9.81% 2000 $0.10 100,000,000 336,700,337 660,325,477 20.01% 2001 $0.05 100,000,000 673,400,673 1,333,726,150 40.42% 2002 $0.05 100,000,000 673,400,673 2,007,126,824 60.82% 2003 $0.05 100,000,000 673,400,673 2,680,527,497 81.23% 2004 $0.05 100,000,000 673,400,673 3,353,928,171 101.63% DNA as a Mass-Storage Device: 4

Understanding the Genome DNA as a Mass-Storage Device: 5

Reverse Engineering Codes Understanding the genome involves the equivalent of reverse engineering binary codes for an unknown computer system. This is nearly impossible for a single program, but comparive analyses of similar programs can provide a start. Suppose you had two programs, one of which caused a computer to undergo a cold boot, the other a warm boot. A comparison of these programs would give some small insights into the workings of that computer. WARMBOOT: BA 40 00 8E DA BB 72 00 C7 07 00 00 EA 00 00 FF FF COLDBOOT: BA 40 00 8E DA BB 72 00 C7 07 34 12 EA 00 00 FF FF Alignments of the codes can provide insights into regions of common function: WARMBOOT: COLDBOOT: BA 40 00 8E DA BB 72 00 C7 07 00 00 EA 00 00 FF FF BA 40 00 8E DA BB 72 00 C7 07 34 12 EA 00 00 FF FF DNA as a Mass-Storage Device: 6

Reverse Engineering Codes Similar methods could be used to analyze programs that caused messages to be displayed on screen. Assume that you have the sources for four short programs, each of which causes a short message to be written to the screen: 1 = Hello world 2 = Hi world 3 = Goodbye world 4 = Hello Aligning the sequences (inserting blanks where necessary) allows the detection of common features: 1: EB 0D 90 48 65 6C 6C 6F -- -- 20 77 6F 72 6C 64 24 B4 00 B4 09 BA 03 01 CD 21 C3 2: 3: 4: EB 0A 90 48 69 -- -- -- -- -- 20 77 6F 72 6C 64 24 B4 00 B4 09 BA 03 01 CD 21 C3 EB 0F 90 47 6F 6F 64 62 79 65 20 77 6F 72 6C 64 24 B4 00 B4 09 BA 03 01 CD 21 C3 EB 07 90 48 65 6C 6C 6F -- -- -- -- -- -- -- -- 24 B4 00 B4 09 BA 03 01 CD 21 C3 DNA as a Mass-Storage Device: 7

Reverse Engineering Codes Now, suppose you get a fifth program, that writes the same "Hello world" message as the first program, but which has different binaries. At first, the sequences appear fairly different: 1: EB 0D 90 48 65 6C 6C 6F 20 77 6F 72 6C 64 24 B4 00 B4 09 BA 03 01 CD 21 C3 5: EB 01 90 B4 00 B4 09 BA 0F 01 CD 21 EB 0D 90 48 65 6C 6C 64 20 77 6F 72 6C 6C 24 C3 Again, sequence similarities can provide clues... 1: -- -- -- EB 0D 90 48 65 6C 6C 6F 20 77 6F 72 6C 64 24 B4 00 B4 09 BA 03 01 CD 21 C3 5: EB 01 90 B4 00 B4 09 BA 0F 01 CD 21 EB 0D 90 48 65 6C 6C 64 20 77 6F 72 6C 6C 24 C3 DNA as a Mass-Storage Device: 8

Reverse Engineering Codes This kind of comparative technique is used in biology. gene name DNA sequence near transcription initiation site lacz malt arac galp1 deop2 cat tnaa arae ---ccaggc TTtACA ctttatgcttccggctcg- TATgtT --gtgtgga--- --tcatcgc TTGcat tagaaaggtttctggcc-- gacctt --ataacca--- --atccatg TgGACt tttctgccgtgattata-- gacact tttgttacg--- ----catgt cacact tttcgcatctttgttatgc TATggT --tatttca--- ----gtgta TcGAag tgtgttgcggagtagatgt TAgAAT --actaaca--- gatcggcac gtaaga ggttccaactttcac---- cataat -gaaataag--- -tttcagaa TaGACA aaaactctgagtgtaa--- TAatgT --agcctcg--- ----ccgac ctgaca cctgcgtgagttgttcacg TATttT ttcactatg--- consensus --------- TTGACA ------------------- TATAAT ------------ - 35-10 DNA as a Mass-Storage Device: 9

Mapping the Genome DNA as a Mass-Storage Device: 10

The Simplistic View of a Gene P coding region T T 1 T 2 P 1 P 2 coding region -- gene2 coding region -- gene 1 ATGCTGCATAGACGATC Genes are sequences of DNA. Mapping the genome involves identifying and locating specific functional regions of DNA. DNA as a Mass-Storage Device: 11

DNA as a Mass-Storage Device Mass Storage System: Underlying primitive structure is linked list, not physical medium List has polarity Addressing is associative, not physical No content restriction on linear order of data 00 01 10 11 01 10 00 01 10 11 01 10 00 01 10 11 01 10 00 01 10 11 01 10 Understanding how DNA is constructed is necessary for understanding how it functions. DNA as a Mass-Storage Device: 12

DNA as a Mass-Storage Device Redundant Mass Storage System: Full structure is dual linked list Lists have opposite polarity Total content restriction on paired data Either list can be data, the other redundant 00 01 10 11 01 10 01 10 00 01 10 11 DNA has much in common with a mass-storage device, but a mass-storage device based on an underlying linked list, not a physical system with spatial addresses independent of what is being addressed. DNA as a Mass-Storage Device: 13

DNA as a Mass-Storage Device An expressed region of the genome consists of a coding region flanked by an upstream associative start signal and a downstream associative stop signal. Associative start-reading signal (promoter) Associative stop reading signal (terminator) Intervening coding region Expressed regions may occur on either strand of the DNA double helix. The two strands are transcribed in opposite directions. Gene 1 P 1 T 1 Gene 3 P 3 T 3 Gene 4 P 4 T 4 T 2 Gene 2 P 2 DNA as a Mass-Storage Device: 14

DNA as a Mass-Storage Device The expression of an individual region occurs as a series of steps... Gene 1 P 1 T 1 Primary Transcript: Transcription mrna: Polypeptide: Post-transcriptional modification Translation Modified Polypeptide: Post-translational modification Self-asembly to final protein DNA as a Mass-Storage Device: 15

The MMS Operon Escherichia coli P 1 P2 P x P 3 P a nut Pb eq T 1 P T 2 hs orfx rpsu dnag rpod S21 Primase Sigma-70 Translation Replication Transcription Lupski, J.R., Godson, G.N., 1989, DNA ->DNA, and DNA -> RNA->Protein: Orchestration by a single complex operon, BioEssays, 10:152-157. In practice, many genomic regions exhibit considerable complexity. DNA as a Mass-Storage Device: 16

The Gart/Lcp Locus Drosophila melanogaster T G P L T L P x 3' 5' 3' 5' 5' 3' Henikoff, S., Keene, M.A., Fechtel, K., and Fristrom, J.W., 1986, Gene within a gene: Nested Drosophila genes encode unrelated proteins on opposite strands, Cell 44:33. Genes can be broken up with non-coding sub-sequences (introns) interspersed among coding sub-sequences (exons). An entire gene may lie within an intro of another gene. DNA as a Mass-Storage Device: 17

The UGT1 Locus P f P e P d P c P b P a T UGT1F UGT1E UGT1D UGT1C UGT1BP UGT1A 2 3 4 5 phenol UDP-glucuronosyltransferase: UGT1F 2 3 4 5 bilirubin UDP-glucuronosyltransferases: UGT1D 2 3 4 5 UGT1A 2 3 4 5 Ritter, J.K., Chen, F., et al., 1992, A novel complex locus UGT1 encodes human bilirubin, phenol, and other UDPglucuronosyltransferase isozymes with identical carboxyl termini, J. Biol. Chem. 267:3257. The human UGT1 locus seems to be an example of a nested gene family. DNA as a Mass-Storage Device: 18

The POMC Locus Products Homo sapiens Preproopiomelanocortin 26 48 12 40 14 21 40 18 31 signal peptide 12 14 21 40 18 31 γ-msh Corticotropin β-lipotropin 14 18 31 α-msh β-msh Endorphin 40 18 γ Lipotropin Enkephalin Some regions of the genome produce multiple proteins from a single polypeptide precursor. DNA as a Mass-Storage Device: 19

Guide RNAs P T P T gene 1 gene 2 enzyme grna + RNA edited mrna protein Complex interactions among transcipts from different genomic locations may be required to produce a single protein. DNA as a Mass-Storage Device: 20

The rrna Loci Homo sapiens P T P T P rrna gene cluster IGS rrna gene cluster IGS rrna gene cluster 8-14 kbp 18S RNA 5S RNA 28S RNA 2300 bp 4200 bp 156 bp pre rrna transcription unit Some regions of the genome are characterized by multiple tandem repeats of the same gene. DNA as a Mass-Storage Device: 21

D14S1 is a VNTR Locus Frequency of PstI fragment sizes (kb) 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA A A A A A A A A A A A AA A A A A AA A A A A AA AA AA A A AA AA A AA A AA AA A AA 3 4 5 6 7 8 9 1 0 1 2 1 4 1 6 1 8 Balazs, I., Neuweiler, J., Gunn, P., Kidd, J., Kidd, K.K., Kuhl, J., and Mingjun, L., 1992, Human population genetic studies using hypervariable loci, Genetics, 131:191-198. Regions of the genome may differ in length among normal individuals. DNA as a Mass-Storage Device: 22