"First generation" sequencing technologies and genome assembly. Roger Bumgarner, Associate Professor, Microbiology, UW. Rogerb@u.washington.edu
Why discuss a technology that is apparently being replaced? Next-gen technologies are great for obtaining large numbers of sequences (thousands to billions) but are not necessarily applicable to smaller projects. Most clinical sequencing is done using Sanger sequencing methods.
Overview: How to sequence any DNA. How to sequence a lot of DNA. What have we learned from 20 years of the genome project? What's next?
Intended outcomes: An understanding of the process of DNA sequencing and of the types, sources and rates of errors in DNA sequence data. A historical perspective of genome sequencing. An understanding of the methods used to sequence and assemble genomes.
Automated DNA Sequencing
Goal - To Read the Sequence of the Basepairs in a Region of DNA
DNA Structure
DNA Sequencing: Process Overview. Generation of a nested set of fragments. Separation of the fragments. Detection. Analysis or base calling.
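The four steps in the process overview can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real chemistry or instrument: the template string, the function names, and the assumption of exactly one fragment ending at every position are all illustrative.

```python
# Toy model: nested fragments -> separation by size -> detection -> base calling.
# Assumption: one fragment ends at every template position and carries a label
# identifying its terminal base.

def nested_fragments(template):
    """Return (length, terminal_base) for every prefix of the template."""
    return [(i + 1, template[i]) for i in range(len(template))]

def separate(fragments):
    """Stand-in for electrophoresis: order fragments from shortest to longest."""
    return sorted(fragments, key=lambda frag: frag[0])

def call_bases(ordered_fragments):
    """Read off the terminal-base label of each fragment in size order."""
    return "".join(base for _, base in ordered_fragments)

template = "GATTACAGGC"          # hypothetical example sequence
fragments = nested_fragments(template)
fragments.reverse()              # the reaction mix has no inherent order
read = call_bases(separate(fragments))
print(read)
```

Because the fragments differ in length by one base at a time, sorting them by size and reading the terminal label of each recovers the template sequence.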
Maxam-Gilbert Sequencing. Maxam AM, Gilbert W (February 1977). "A new method for sequencing DNA". Proc. Natl. Acad. Sci. U.S.A. 74 (2): 560-4.
DNA Replication (figure labels: single-stranded DNA binding proteins, helicase, primosome, primase, replicating DNA polymerase III active sites, RNA primer, DNA polymerase I, ligase; strand ends marked 5' and 3')
The 3' hydroxyl group is the point of attachment of the next base. What happens if the 3' OH is not there?
Sanger Sequencing. Sanger F, Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441-8.
An Autorad of a Sequencing Gel (lanes A, C, G, T)
With 4 colors, all reactions can be run in one lane. Label each with a different color. Mix all reactions prior to loading. Hood LE, Hunkapiller MW, Smith LM. Automated DNA sequencing and analysis of the human genome. Genomics. 1987 Nov;1(3):201-12.
The Principle of 4-color Fluorescent DNA Sequencing
The First 4-Color Sequencing Instrument. Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE. Fluorescence detection in automated DNA sequence analysis. Nature. 321(6071):674-9 (1986).
The Perkin Elmer/ABI 370/373 Fluorescence-Based DNA Sequencer. By about 1987 Applied Biosystems had developed a slab gel system capable of sequencing 16 samples to about 250 bp in a 24-hour run.
Sequencing Gel Image
Automated DNA Sequencing (figure: a nested set of fragments of increasing length, separated by electrophoresis between the - and + electrodes)
Different labeling chemistries can be used. Dye primer: dye is attached to the 5' end of the sequencing primer. Dye terminator: dye is attached to the ddNTP; allows all 4 reactions to be run in the same tube. Internal labeling: dye is attached to a dNTP; signal/molecule increases with length (rarely used today).
Processed Electropherogram
Errors and error rate. Average error rate <1%. Highest error rates are: at the beginning of the run (due to misalignment of the peaks and noise from unpurified fluorescent material); at the end of the run (due to loss in gel resolution, which often results in indel errors). Also have errors due to: compressions; mixed samples (heterozygosity, repeats and PCR, etc.).
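To give a feel for what an average error rate just under 1% means for a single read, the arithmetic can be sketched as follows. The 750 bp read length is an assumed value, 1% is used as an upper bound, and errors are treated as independent and uniform along the read, which is a simplification given that they cluster at the start and end of a run.

```python
# Rough arithmetic only; function names are illustrative.

def expected_errors(read_length_bp, error_rate):
    """Expected number of miscalled bases in one read."""
    return read_length_bp * error_rate

def prob_error_free(read_length_bp, error_rate):
    """Chance that a read has no errors, assuming independent errors."""
    return (1.0 - error_rate) ** read_length_bp

read_length_bp = 750   # assumed read length
error_rate = 0.01      # upper bound on the average error rate

print(expected_errors(read_length_bp, error_rate))
print(prob_error_free(read_length_bp, error_rate))
```

Under these assumptions a single read is expected to contain several errors, which is one reason each region is usually covered by more than one read.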
Higher Voltages Produce Faster Rates of Electrophoresis. Speed is proportional to voltage (V). Current (I) depends on the resistance of the gel: I = V/R. Power in watts is P = V*I, so P = V^2/R. Thinner gels give higher R and therefore generate less heat at a given voltage. Hence, thin or otherwise small gels must be used for higher voltages.
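Combining the two relations on this slide (I = V/R and power = V*I) gives power = V^2/R, which is why a higher-resistance gel tolerates a higher voltage. The sketch below works through this with made-up numbers; the resistance and voltage values are hypothetical and only illustrate the scaling.

```python
# Ohmic heating in a gel. All numeric values are hypothetical.

def current_amps(voltage_v, resistance_ohm):
    """I = V / R"""
    return voltage_v / resistance_ohm

def power_watts(voltage_v, resistance_ohm):
    """P = V * I, which equals V**2 / R"""
    return voltage_v * current_amps(voltage_v, resistance_ohm)

thick_gel_ohm = 50_000.0    # hypothetical lower-resistance (thicker) gel
thin_gel_ohm = 500_000.0    # hypothetical higher-resistance (thinner) gel

for voltage_v in (1_000.0, 2_000.0):
    for label, resistance in (("thick", thick_gel_ohm), ("thin", thin_gel_ohm)):
        heat = power_watts(voltage_v, resistance)
        print(f"{label} gel at {voltage_v:.0f} V: {heat:.1f} W")
```

Doubling the voltage quadruples the heat generated in a given gel, while a tenfold increase in resistance cuts it tenfold.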
Capillaries automate loading of samples (figure labels: - and + electrodes, sample, buffer)
The current range of capillary sequencers
Typical Specifications. Read length: 400-900 bp. Run times: 36 min to 3-4 hours. Total throughput per machine: ABI 3730, 96 capillaries: 2100 kbp/day (run 24 hours/day); ABI 310, 1 capillary: 5200 bp/day.
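The daily throughput figures follow from capillary count, read length and run time. The sketch below shows the arithmetic; the specific read-length and run-time pairings are assumptions chosen from within the ranges quoted on this slide, and turnaround time between runs is ignored.

```python
# Throughput arithmetic. The function name and example pairings are assumptions.

def throughput_bp_per_day(capillaries, read_length_bp, run_time_hours,
                          hours_per_day=24.0):
    """Bases read per day if the machine runs back to back with no downtime."""
    runs_per_day = hours_per_day / run_time_hours
    return capillaries * read_length_bp * runs_per_day

# Assumed pairing for a 96-capillary machine: 900 bp reads, 1-hour runs.
print(throughput_bp_per_day(96, 900, 1.0))

# Assumed pairing for a 1-capillary machine: 650 bp reads, 3-hour runs.
print(throughput_bp_per_day(1, 650, 3.0))
```

With these assumed pairings the results are 96 x 900 x 24 = 2,073,600 bp/day and 650 x 8 = 5,200 bp/day, in line with the quoted figures, although the actual operating points behind those figures are not stated on the slide.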
Large Scale Sequencing
The (Human) Genome Project. The ultimate goal of the Human Genome Project is to decode, letter by letter, the exact sequence of all 3 billion nucleotide bases that make up the human genome. Just a single misplaced letter is sufficient to cause disease. GCTTCTGGTCTGTGCTTCGT 3,400,000,000 letters total
The (Human) Genome Project. Begun in 1990 with a 15-year budget of $3.0B overall. Goals: to obtain the sequences of human and model organisms (E. coli, Drosophila (fruit fly), C. elegans (a worm), yeast, mouse); to develop the necessary technologies to obtain the above.
How do we begin to analyze a genome? We want DNA sequence for the entire genome (3.5 Bbp for human, 4 Mbp for a bacterium). Sequencing allows one to read about 750 base pairs/sample. We need a method to sequence bigger pieces.
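The gap between read length and genome size can be made concrete with a quick calculation. This sketch uses the genome sizes and read length quoted on this slide; the `coverage` parameter is an added assumption that stands for how many times, on average, each base is read.

```python
import math

def reads_needed(genome_size_bp, read_length_bp, coverage=1.0):
    """Minimum number of reads to reach the given average coverage.

    With coverage=1.0 this is a lower bound: reads laid end to end, no overlap.
    """
    return math.ceil(genome_size_bp * coverage / read_length_bp)

human_bp = 3_500_000_000     # 3.5 Bbp, from the slide
bacterium_bp = 4_000_000     # 4 Mbp, from the slide
read_length_bp = 750         # about 750 bp per sample, from the slide

print(reads_needed(human_bp, read_length_bp))
print(reads_needed(bacterium_bp, read_length_bp))
print(reads_needed(human_bp, read_length_bp, coverage=8.0))  # 8x is a hypothetical choice
```

Even with perfectly placed, non-overlapping reads, the human genome needs more than 4.6 million reads, and any real strategy needs overlap between reads to order them.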
Primer Walking (figure labels: vector, clone to sequence). Primer, sequence; new primer, sequence; repeat.
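The walking loop can be sketched as below. This is a simplified illustration: it assumes each new read starts exactly where the known sequence ends and that every read has the full length, whereas in practice the new primer sits a little inside the known sequence. The function name and example clone are hypothetical.

```python
def primer_walk(clone, read_length):
    """Sequence a clone serially; return (assembled_sequence, rounds)."""
    known = ""
    rounds = 0
    while len(known) < len(clone):
        # Round 1 primes from the vector; later rounds prime from the end
        # of the sequence determined so far.
        start = len(known)
        known += clone[start:start + read_length]
        rounds += 1
    return known, rounds

clone = "ACGT" * 500            # hypothetical 2,000 bp insert
sequence, rounds = primer_walk(clone, read_length=750)
print(rounds)
```

Each round depends on the sequence from the previous one, so the rounds cannot be run in parallel. That serial dependency is what makes walking hard to automate.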
Shotgun sequencing: Copy. Subclone. Clone to sequence. Sequence and assemble. Example reads: .gtctcctgtctgtctgc.... CCTGTCTGTCTGCTT.... GTCTGTCTGCTTCG...
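The "sequence and assemble" step can be illustrated with a greedy overlap merge. This is a minimal sketch of one simple assembly idea, not the method used by any particular assembler: it assumes error-free reads from one strand, no repeats, and no read wholly contained in another. The reads below are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))   # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                          # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Hypothetical reads sampled from the sequence AGCTTCGGATACCTGAAC
reads = ["TACCTGAAC", "AGCTTCGGAT", "TCGGATACCTG"]
contigs = greedy_assemble(reads)
print(contigs)
```

Repeats break this simple scheme, because a repeated stretch produces overlaps between reads that do not actually lie next to each other in the genome.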
Shotgun vs. walking
Method: Shotgun. Advantage: easy to automate. Disadvantage: highly redundant.
Method: Walking. Advantage: not very redundant. Disadvantage: harder to automate.
Methods for very large scale sequencing. Hierarchical approach: map on a large scale (physical mapping), then sequence specific clones whose position in the genome is known. Shotgun sequencing: tear up the genome and sequence random fragments until it is done. Sequence tagged connectors (STC): sequence the ends of many clones and use this information to pick overlapping clones.
Making a genomic library: cells, isolate DNA, fragment DNA, clone, library.
Library Types. Chromosome-specific libraries: chromosomes can be sorted from one another based on size and GC content. Genomic libraries: made from the entire genome. Large insert/small insert: a combination of vector choice (YAC, BAC, plasmid, M13), fragmentation method (enzymatic, shearing, sonication), and size selection (by gel or other method).
Another view of a library. Multiple copies of the genome (stretched out). Randomly fragment and clone. Can we order these fragments relative to one another?
Restriction Enzymes - 1970. Copyright 1998 Access Excellence, www.gene.com
Physical Mapping: Digest and look for common features in clones
Repeat many times to construct a physical map. Pick a minimal tiling path. Sequence these mapped clones (typically by the shotgun method).
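Once clones have positions on the physical map, picking a minimal tiling path amounts to covering the region with as few clones as possible. The sketch below uses the standard greedy interval-cover approach. The clone names and coordinates are invented, and clones that merely abut are treated as acceptable neighbours, a simplification since a real path needs some overlap between adjacent clones.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones covering [region_start, region_end).

    clones: list of (name, start, end) map positions.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Of the clones starting at or before the covered point,
        # keep the one that reaches furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path

# Hypothetical clones with map positions in kbp
clones = [("A", 0, 150), ("B", 100, 260), ("C", 120, 300),
          ("D", 250, 420), ("E", 290, 480), ("F", 400, 500)]
path = minimal_tiling_path(clones, 0, 500)
print(path)
```

Choosing the furthest-reaching clone at each step keeps the number of clones to be sequenced as small as possible for the region.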
Path that was initially used for genome sequencing: YACs, map (Mbp); BACs or cosmids, map (200 kbp); M13, plasmid, sequence (kbp).
Shotgun the genome: Genome to sequence. Subclone. Sequence and assemble. Example reads: .gtctcctgtctgtctgc.... CCTGTCTGTCTGCTT.... GTCTGTCTGCTTCG...
Sequence tagged connectors (STC): Genome to sequence. Subclone. Sequence the ends and store in a db. Sequence a clone, look for overlaps in the db.
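The lookup step of the STC approach can be sketched as a search of the end-sequence database against a fully sequenced clone. This is an illustrative sketch only: the example genome string, the clone names, the 6-base end sequences and the exact-match search are all assumptions, and it ignores reverse complements, sequencing errors and repeats.

```python
def overlapping_clones(seed_sequence, end_db):
    """Find clones with exactly one end sequence inside the seed clone.

    end_db: dict of clone name -> (left_end, right_end) end sequences.
    Returns (name, direction, position_of_match_in_seed) tuples.
    """
    hits = []
    for name, (left_end, right_end) in end_db.items():
        left_pos = seed_sequence.find(left_end)
        right_pos = seed_sequence.find(right_end)
        if (left_pos >= 0) != (right_pos >= 0):
            # One end lies inside the seed, so the clone extends beyond it.
            if left_pos >= 0:
                hits.append((name, "extends right", left_pos))
            else:
                hits.append((name, "extends left", right_pos))
    return hits

# Hypothetical 57-base "genome" and clones defined by their coordinates on it
genome = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTC"

def clone_ends(start, end, n=6):
    """End sequences (first and last n bases) of the clone genome[start:end]."""
    return (genome[start:start + n], genome[end - n:end])

end_db = {
    "clone1": clone_ends(0, 24),    # overlaps the left side of the seed
    "clone2": clone_ends(30, 57),   # overlaps the right side of the seed
    "clone3": clone_ends(18, 36),   # lies entirely inside the seed
    "clone4": clone_ends(45, 57),   # does not touch the seed
}
seed = genome[15:40]                # the clone that has been fully sequenced
hits = overlapping_clones(seed, end_db)
print(hits)
```

A clone with both ends inside the seed adds nothing new, and one with neither end inside does not connect, so only clones with a single matching end are candidates for the next round of sequencing.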
Which method? Whole-genome shotgun: method of choice today for small genomes and genomes with good reference sequences (good implies conserved genomic structure across the species). Celera's approach to the human genome, but what about repeats? Physical mapping: traditional method, rarely used today. STC: hybrid method, not as difficult as physical mapping, can resolve some issues with repeats. Common method today in genomic sequencing.
Who can do sequencing for me? Visit www.iths.org/resources and put in the word "sequencing". University of Washington - CEEH Facility Core #1: Functional Genomics & Proteomics; CFRTC - Genomics Core; DERC - Virus, Molecular Biology and Cell Core; DNA Sequencing and Gene Analysis Center; High Throughput Genomics Unit. Fred Hutchinson Cancer Research Center - Genomics Resource at the Fred Hutchinson Cancer Research Center. University of Idaho - IBEST DNA Sequence Analysis Core Facility. Institute for Systems Biology - DNA Sequencing Core. Idaho State University - SU Molecular Research Core Facility. University of Montana - Murdock Molecular Biology Facility. Seattle Biomed - Sequencing Core at Seattle Biomed. University of Wyoming - The Nucleic Acid Exploration Facility.