July 7th 2009 DNA sequencing
Overview
- Sequencing technologies
- Sequencing strategies
- Sample preparation
Sequencing instruments at MPI EVA: 2 x 5 x ABI 3730/3730xl, 454 FLX Titanium, Illumina Genome Analyzer, ABI SOLiD, HeliScope (pictures by Illumina, Roche, ABI, Helicos)
Sanger sequencing - principle
= dideoxy method = chain termination method
Template (PCR product, plasmid), dNTPs, ddNTPs
Sanger sequencing - principle
Original method: autoradiogram with annotated bands
Sanger sequencing - technology
Improved over time, automated sequencing:
dye-labelled ddNTPs + capillary electrophoresis = ABI 3730/3730xl
Sanger sequencing accuracy Phred-scores = quality scores: - peak height - peak shape - peak density
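Note on the scale: a Phred score Q encodes the estimated probability P that a base call is wrong, Q = -10 * log10(P). The sketch below only converts between Q and P. The function names are illustrative and not taken from any base-calling software, and the code does not derive scores from peak height, shape or density.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred quality score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for an error probability P: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: about 1 error in {round(1 / phred_to_error_prob(q))} bases")
```

On this scale Q20 corresponds to 1 expected error in 100 base calls and Q30 to 1 in 1,000.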
Sanger sequencing throughput
Technology  Read length  Sequences per run  Bases per run  Cost per run  Cost per base
Sanger      500-1100 b   96                 50-100 kb      200           2 cents
Sanger sequencing - what you need
1) Sample: clonal copies of your sequencing template
- PCR product
- plasmid
2) Sequencing primer
Sanger sequencing strategies
A small and simple exon (1,000 bp): DNA -> PCR -> PCR product -> sequencing -> 2 (4) sequences
A human mitochondrial genome (16,500 bp): PCR -> PCR product -> sequencing -> 64 sequences
Sanger sequencing strategies
Lysozyme, short exon 1 (500 bp), many paralogues! DNA -> PCR -> PCR product -> subcloning -> sequencing -> many sequences
A bush baby mitochondrial genome (16,500 bp), divergent! LR-PCR -> LR-PCR product -> sequencing by primer walking -> 64 sequences
Sanger sequencing strategies: genome sequencing
Venter style (WGS, whole-genome shotgun): DNA -> chop into pieces -> subcloning -> a lot of sequencing -> assembly
Sanger sequencing strategies: genome sequencing
Consortium style (hierarchical shotgun). Lander et al. 2001. Nature 409:860-921
454 sequencing principle
Pyrosequencing (Nyrén / Ronaghi 1996)
Sequencing by synthesis:
- Successive addition of nucleotides (dATPαS, dCTP, dGTP, dTTP)
- Nucleotide incorporation enzymatically translated into light
Figure: nucleotide flows (dTTP dATPαS dCTP dTTP dGTP dATPαS dCTP dGTP dTTP dATPαS dCTP dGTP) over a primed template
Strand being extended: TACACGACGCTCTTCCGATCTAAGTTG
Template (shown as the complementary strand): GATGTGCTGCGAGAAGGCTAGATTCAACGAGGAGCATTGCACTAGCCTTCTCGAGCATACG
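The flow principle can be sketched in a few lines of Python. This is an idealised toy model, not the 454 signal processing: each flow offers one nucleotide, and the light signal is taken to be exactly the number of bases incorporated. The cyclic flow order "TACG" and the function names are assumptions made for illustration; the example input AAGTTG is the stretch added after the primer on the slide.

```python
def flowgram(to_synthesize: str, flow_order: str = "TACG", n_flows: int = 8):
    """Return one (nucleotide, signal) pair per flow.

    signal = number of bases incorporated in that flow (idealised, noise-free).
    """
    pos = 0
    flows = []
    for i in range(n_flows):
        nuc = flow_order[i % len(flow_order)]
        run = 0
        # A whole homopolymer run is incorporated within a single flow.
        while pos < len(to_synthesize) and to_synthesize[pos] == nuc:
            run += 1
            pos += 1
        flows.append((nuc, run))
    return flows


def call_bases(flows) -> str:
    """Turn ideal integer signals back into a sequence."""
    return "".join(nuc * signal for nuc, signal in flows)


if __name__ == "__main__":
    flows = flowgram("AAGTTG")
    print(flows)
    # [('T', 0), ('A', 2), ('C', 0), ('G', 1), ('T', 2), ('A', 0), ('C', 0), ('G', 1)]
    print(call_bases(flows))  # AAGTTG
```

Real signals are analog light intensities, so the length of a long homopolymer run has to be estimated from signal strength rather than read off as an integer.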
454 sequencing principle
- Pyrosequencing massively parallelized by 454 Life Sciences
- 454 sequencing is not single molecule sequencing
- Parallelization of sample preparation and amplification required
454 sequencing principle (454 sequencing technology) I - III: Preparation of a sequencing library
454 sequencing principle (454 sequencing technology) IV: Emulsion PCR (emPCR)
454 sequencing principle (454 sequencing technology) V: Bead enrichment, primer annealing
454 sequencing principle (454 sequencing technology) Sequencing
454 sequencing accuracy Phred Q44 Show homopolymer problems
454 sequencing throughput
Technology    Read length [bp]  Sequences per run  Bases per run  Cost per run  Cost per base
Sanger        500-1100          96                 50-100 kb      200           2 cents
454 Titanium  500               ~1 million         500 Mb         6000          0.001 cents
454 sequencing applications
Genome sequencing, Sanger, Venter style (shotgun): DNA -> chop into pieces -> subcloning -> a lot of sequencing -> assembly
Genome sequencing, 454, Venter style (shotgun): DNA -> chop into pieces -> library preparation -> less sequencing -> assembly
454 sequencing applications
Bush baby mitochondrial genome, Sanger: LR-PCR -> LR-PCR product -> sequencing by primer walking -> 64 sequences
Bush baby mitochondrial genome, 454: LR-PCR -> LR-PCR product -> shotgun sequencing -> 660 sequences, ~20x oversampling
454 sequencing applications
PCR product sequencing (Lysozyme), Sanger: DNA -> PCR -> PCR product -> subcloning -> sequencing -> many sequences
PCR product sequencing (Lysozyme), 454: DNA -> PCR -> PCR product -> library preparation -> sequencing -> a LOT of sequences
454 sequencing limitations and solutions
Limitation: large amounts of starting material (5 μg)
Solution: quantitative PCR reduces material demands from ~5 μg to ~20 pg (Meyer et al., Nucleic Acids Research 2008)
454 sequencing limitations and solutions
Sequencing samples in parallel
- Initially limited to 16 samples (GS FLX Titanium platform)
- 1/16th lane: ~25,000 sequences of 500 bp
  ~2,000 x coverage of a 6 kb plasmid
  ~700 x coverage of a mitochondrial genome
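The coverage figures follow from simple arithmetic: average coverage = number of reads x read length / target length. A minimal Python sketch, assuming a read length of 500 bp (the 454 Titanium value from the throughput table) and a mitochondrial genome of 16,500 bp (the size given on the Sanger strategy slides); the function name is illustrative.

```python
def average_coverage(n_reads: int, read_length_bp: int, target_length_bp: int) -> float:
    """Average fold coverage, assuming every read maps to the target."""
    return n_reads * read_length_bp / target_length_bp


if __name__ == "__main__":
    reads_per_region = 25_000  # ~1/16th lane
    read_length = 500          # assumed average read length in bp

    print(round(average_coverage(reads_per_region, read_length, 6_000)))   # 6 kb plasmid
    print(round(average_coverage(reads_per_region, read_length, 16_500)))  # mitochondrial genome
```

The results, roughly 2,080 x and 760 x, match the rounded ~2,000 x and ~700 x on the slide, far more than a single sample of this size needs.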
454 sequencing limitations and solutions
Meyer et al., Nucleic Acids Research 2007; Nature Protocols 2008
454 sequencing limitations and solutions
Using barcoding (e.g. PTS, parallel tagged sequencing)
- 1/16th lane: ~25,000 sequences of 500 bp
  ~100 plasmids (6 kb) with 20 x coverage
  ~35 mitochondrial genomes with 20 x coverage
  ~1,250 PCR products with 20 x coverage
Limitations in sequencing throughput => Limitations in sample preparation
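How many barcoded samples fit into one region at a given coverage can be estimated as total bases / (target length x desired coverage). The Python sketch below makes several assumptions not stated on the slide: reads of 500 bp, a mitochondrial genome of 16,500 bp, PCR products no longer than one read (so one read covers one product), and perfectly even representation of all samples. The function name is illustrative.

```python
def samples_per_region(n_reads: int, read_length_bp: int,
                       target_length_bp: int, coverage: int) -> int:
    """Number of equally represented samples that reach the desired coverage."""
    total_bases = n_reads * read_length_bp
    return total_bases // (target_length_bp * coverage)


if __name__ == "__main__":
    n_reads, read_len, cov = 25_000, 500, 20
    print(samples_per_region(n_reads, read_len, 6_000, cov))   # 6 kb plasmids
    print(samples_per_region(n_reads, read_len, 16_500, cov))  # mitochondrial genomes
    print(samples_per_region(n_reads, read_len, 500, cov))     # PCR products (<= one read)
```

This gives 104 plasmids, 37 mitochondrial genomes and 1,250 PCR products, in line with the rounded numbers on the slide. Uneven pooling would lower the usable number in practice.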
454 sequencing limitations and solutions Direct multiplex sequencing Stiller et al. Genome Research (in press)
Solexa (Illumina) sequencing principle
Reversible terminator sequencing
Modified polymerase incorporates dye-labeled, terminated nucleotides:
1) Incorporation of a single nucleotide
2) Detection of label
3) Removal of terminator/label
Figure: labelled nucleotides (A G G T T C A C), sequencing primer with the first incorporated bases, and template
Primer + incorporated bases: ACACTCTTTCCCTACACGACGCTCTTCCGATCT TGC
Template: TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAACGTTGCAGGAGCATTGCACTAGCCTTCTCGAGCATACGGCAGAAGACGAAC
Solexa (Illumina) sequencing principle Sodium hydroxide melting flow cell pictures by Illumina, Inc.
Solexa (Illumina) sequencing principle pictures by Illumina, Inc.
Solexa (Illumina) sequencing accuracy
Artifact sequences:
1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
1 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATAAATTA
1 AAAAAAAAAAAAACAAAAAACAAAAAAAAAACAAACAAAACAACAAATAA
1 AAAAAAAATATTTAATTATTTTTATTTATAATTTTTTTGTTTTTTGTTTT
1 AAACAAACCACACAAACAAAAAAACACAACAAAACAACACCACCACCCAA
1 ATTCTATTTAATACAAATAAAATATCAATTTAAAACTACACTATACATAA
1 CAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACA
1 CAAATATATTTATATTTATTTTTTTATTTAATTTTTATATTTTTATTTAT
1 CATTTATTCTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTT
1 CCCCCCCCCCCCCCCCCCCCACCCCCCCCCCACCCACCCCACCCCCCCCC
1 CCCCCCCCCCCCCCTTCCCCCCTCTTCTTCTCTCTTTTCTTTTTTTTTTT
1 CCCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
1 CCCGCGCCCCCCCGCCGCCGCGCCCAGCCCAGGCCACCACACACGCACCC
1 CCTCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Solexa (Illumina) sequencing accuracy
1) Map all reads against a reference sequence
2) Eliminate reads with > 2 mismatches in the first 36 bp
3) Check error profiles for the remaining reads
Figure: mismatch rate (0.00-0.03) against position in read (0-50), per substitution type (A/C, A/G, A/T, C/A, C/G, C/T, G/A, G/C, G/T, T/A, T/C, T/G, N). Panels: Bustard (average raw error: 2.02%) and Ibis (average raw error: 1.13%)
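Steps 2 and 3 can be sketched as follows, assuming step 1 (mapping) has already produced an ungapped start position on the reference for every read. The input format, function names and toy data are hypothetical; a real analysis would read alignments from a mapper's output and handle indels and strand.

```python
from collections import Counter


def count_mismatches(read: str, ref_segment: str, window: int) -> int:
    """Mismatches between read and reference within the first `window` bases."""
    pairs = list(zip(read, ref_segment))[:window]
    return sum(1 for r, g in pairs if r != g)


def error_profile(alignments, reference: str, window: int = 36, max_mismatches: int = 2):
    """alignments: iterable of (read, start) pairs with ungapped start positions.

    Returns (number of reads kept, {position in read: mismatch rate}).
    Rates are per kept read, which assumes all reads have the same length.
    """
    per_position = Counter()
    kept = 0
    for read, start in alignments:
        segment = reference[start:start + len(read)]
        if count_mismatches(read, segment, window) > max_mismatches:
            continue  # step 2: drop reads with too many early mismatches
        kept += 1
        for pos, (r, g) in enumerate(zip(read, segment)):
            if r != g:
                per_position[pos] += 1  # step 3: tally mismatches by read position
    rates = {pos: n / kept for pos, n in sorted(per_position.items())}
    return kept, rates


if __name__ == "__main__":
    reference = "ACGTACGTACGTACGTACGT"
    alignments = [("ACGTACGA", 0), ("TTTTACGT", 4), ("ACGTACGT", 8)]
    print(error_profile(alignments, reference))  # (2, {7: 0.5})
```

In the toy data the second read has three mismatches and is discarded; the remaining two reads show one mismatch at the last position.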
Solexa (Illumina) sequencing throughput
Technology          Read length [bp]  Sequences per run  Bases per run  Cost per run  Cost per base
Sanger              500-1100          96                 50-100 kb      200           2 cents
454 Titanium        500               ~1 million         500 Mb         6,000         0.001 cents
Solexa (currently)  2 x 100           ~140 million       28 Gb          10,000        0.00004 cents
Ultra high-throughput sequencing
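The "bases per run" column can be checked from the other two: bases per run = sequences per run x read length (x 2 for paired reads). A short Python sketch using the table values; reading "2 x 100" as two reads of 100 bp per sequenced fragment is an interpretation, and the function name is illustrative.

```python
def bases_per_run(sequences_per_run: int, read_length_bp: int, reads_per_fragment: int = 1) -> int:
    """Total bases produced in one run."""
    return sequences_per_run * read_length_bp * reads_per_fragment


if __name__ == "__main__":
    print(bases_per_run(96, 500), bases_per_run(96, 1100))  # Sanger: 48,000 to 105,600 b
    print(bases_per_run(1_000_000, 500))                    # 454 Titanium: 500 Mb
    print(bases_per_run(140_000_000, 100, 2))               # Solexa: 28 Gb
```

All three rows agree with the table, which spans roughly five orders of magnitude between Sanger and Solexa.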
Solexa (Illumina) sequencing applications
Genome re-sequencing: 8x coverage of human genome
DNA -> chop into pieces -> library preparation -> sequencing -> mapping, assembly
Targeted sequencing: ~1 lane of the flowcell, ~20 million sequences?
- 1 million PCR products
- 12,500 mitochondrial genomes at 20 x coverage
Target enrichment methods
Array capture (figure labels: probes, glass slide)
- ~5 Mb targeted per array
- 7 arrays, whole exome
- ~98% of exons retrieved
- 6,000 LR-PCRs
Genome-wide in situ exon capture for selective resequencing. Hodges et al., Nature Genetics, 2007
Target enrichment methods
Combine multiplex array capture and sequencing
DNA shearing -> prepare barcoded libraries -> pooling -> array with different targets for each project -> Solexa sequencing
How long until we only sequence genomes?
Other sequencing technologies
- ABI/SOLiD
- Polonator
- Helicos
And dozens under development:
- PacBio
- Oxford Nanopore
- ...
Be warned... Skills required for DNA sequencing projects 1 % 99 %
Thanks! For your attention... MPI EVAN Martin Kircher