Overview of Next Generation Sequencing Platform Technologies. Dr. Bernd Timmermann, Next Generation Sequencing Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
Outline: 1. Technologies (Illumina; Roche / 454). 2. Projects and Applications (whole-genome resequencing; sequence capture; amplicon sequencing). 3. Outlook.
Max Planck Society: 80 institutes and research facilities; 20,435 people; budget of 1,400 million euros in 2010. Max Planck Institute for Molecular Genetics.
Development of Sequencing Throughput. [Figure: throughput per system in kilobases/day (logarithmic axis, 10 to 1,000,000,000) plotted against year (1980 to 2010) for gel-based systems, first- and second-generation capillary sequencers, microwell pyrosequencing and short-read sequencers (next generation sequencing).] Modified after M. R. Stratton et al., Nature 458, 719-724 (2009).
Development of Sequencing Technologies. Human Genome Project: 96 sequences in parallel. 1000 Genomes Project: 3.2 billion sequences per run.
Sequencing Capacities at the MPI-MG: 7 x Illumina; 5 x Roche GS; 3 x SOLiD; 3 x capillary systems.
IT Infrastructure. Short read technologies: TB. Long read technologies: GB. 25 x 32 (64) compute servers with 128 GB (512 GB) RAM; 4 petabytes storage capacity.
Technologies
Genome Sequencer FLX: de novo sequencing; metagenome analyses; amplicon sequencing; full-length transcriptome analyses; sequencing of target regions. HiSeq 2000 / SOLiD: ChIP-Seq; MeDIP-Seq; miRNA; RNA-Seq; sequencing of target regions; whole-genome resequencing.
Principle of Illumina Sequencing. Library preparation. Cluster generation: attachment of single molecules to the surface; amplification to form clusters.
Sequencing by Synthesis (SBS). Cycle 1: add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal. Cycles 2 to n: add sequencing reagents and repeat.
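The cycle structure of sequencing by synthesis can be illustrated with a toy simulation. This is a minimal sketch, assuming an idealized chemistry in which exactly one complementary base is incorporated and detected per cycle; the function name `sequence_by_synthesis` and the template sequence are hypothetical and not taken from the slides.

```python
# Toy model of sequencing by synthesis: one base is incorporated per cycle.
# Assumptions: error-free chemistry, no phasing; all names are illustrative.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Return the read obtained from `template` after `cycles` cycles.

    `template` is written 3'->5', so the read comes out 5'->3'.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # Add reagents: only the labeled base complementary to the current
        # template position is incorporated; the rest is washed away.
        incorporated = COMPLEMENT[template[cycle]]
        # Detect signal: the label identifies the incorporated base.
        read.append(incorporated)
    return "".join(read)


template = "TAGTCAGACG"  # hypothetical template strand
read = sequence_by_synthesis(template, cycles=6)
print(read)
```

The read length equals the number of cycles, which is why run time on the instrument scales with read length.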
Conversion of image data to DNA sequences
  Sequence reads, aligned to the reference sequence (last line):
                  TCGGGAGTCCTAATGAGCCCGTAATCCCGTTAGTA
             TGAAGTCGGGAGTCCTAATGAGCCCGTAATCCCGTT
         CGAATGAAGTCGGGAGTCCTAATGAGCCCGTAATCC
      GAGCGAATGAAGTCGGGAGTCCTAATGAGCCCGTAA
     CGAGCGAATGAAGTCGGGAGTCCTAATGAGCCCGTA
  ...CGAGCGAATGAAGTCGGGAGTCGTAATGAGCCCGTAATCCCGTTAGTA...
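The reads and reference shown on this slide can be compared programmatically. The following is a minimal sketch, assuming ungapped alignment by brute-force search for the placement with the fewest mismatches; production pipelines use dedicated read mappers and variant callers instead. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Reference and reads copied from the slide.
REFERENCE = "CGAGCGAATGAAGTCGGGAGTCGTAATGAGCCCGTAATCCCGTTAGTA"
READS = [
    "TCGGGAGTCCTAATGAGCCCGTAATCCCGTTAGTA",
    "TGAAGTCGGGAGTCCTAATGAGCCCGTAATCCCGTT",
    "CGAATGAAGTCGGGAGTCCTAATGAGCCCGTAATCC",
    "GAGCGAATGAAGTCGGGAGTCCTAATGAGCCCGTAA",
    "CGAGCGAATGAAGTCGGGAGTCCTAATGAGCCCGTA",
]


def best_offset(read, reference):
    """Ungapped placement of `read` with the fewest mismatches."""
    def mismatches(offset):
        return sum(a != b for a, b in zip(read, reference[offset:]))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def call_mismatches(reads, reference):
    """Pile up reads and report positions whose majority base differs."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = best_offset(read, reference)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        if base != reference[pos]:
            # (0-based position, reference base, observed base, read support)
            variants.append((pos, reference[pos], base, count))
    return variants


variants = call_mismatches(READS, REFERENCE)
print(variants)
```

All five reads carry a C where the reference has a G, which is the kind of difference that resequencing is designed to find.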
Facts Illumina Sequencing (HiSeq 2000)
  Input material: ~1-3 µg DNA (shotgun sequencing); ~10 ng (ChIP-Seq)
  Library preparation: ~1.5 days
  Cluster generation: ~1 day
  Run time / read length: single read ~2 days (36 b); paired end ~10 days (2 x 100 b)
  Data processing: ~1 day
  Output: paired end ~500 Gb
  Reads: up to 4,800 million
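The output figures on this slide can be cross-checked with simple arithmetic. This is a rough sketch; the genome size of about 3.2 Gb used for the coverage estimate is an assumption that does not appear on the slide, and the variable names are illustrative.

```python
# Values taken from the slide (HiSeq 2000, paired end, 2 x 100 b).
total_reads = 4800 * 10**6   # "up to 4800 Mio" reads
read_length_b = 100          # bases per read
run_time_days = 10           # paired-end run time

# Assumption (not from the slide): haploid human genome of about 3.2 Gb.
genome_size_b = 3.2e9

yield_b = total_reads * read_length_b
yield_gb = yield_b / 10**9              # close to the quoted ~500 Gb
gb_per_day = yield_gb / run_time_days   # throughput per day of run time
mean_coverage = yield_b / genome_size_b

print(f"Yield per run: {yield_gb:.0f} Gb")
print(f"Throughput:    {gb_per_day:.0f} Gb/day")
print(f"Mean coverage of one human genome: {mean_coverage:.0f}x")
```

The product of read count and read length lands close to the stated output of about 500 Gb, so the two figures on the slide are consistent with each other.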
454 Sequencing Instrument: 1. Genome is loaded into a PicoTiter plate. 2. Load PicoTiter plate into instrument. 3. Load reagents in a single rack. 4. Sequencing.
Principle of 454 Sequencing: library preparation; emulsion PCR; emulsion breaking; depositing DNA beads into the PicoTiter plate; pyrosequencing.
Facts 454 Sequencing
  Input material: ~0.5 µg DNA
  Library preparation: ~4 hours
  Emulsion PCR: ~1 day
  Run time: 20 hours
  Data processing: ~10 hours
  Output: Titanium+ 700-1,000 Mb
  Reads: Titanium+ 1,000,000-1,600,000
  Read length: 700-800 bases
Sequencing Pipeline: library preparation; library quantification; bead enrichment; sequencing.
Projects and Applications
Goals: a public database of essentially all SNPs and detectable CNVs with allele frequency >1% in each of multiple human population samples. Pioneer and evaluate methods for: generating data from next-generation sequencing platforms; exchanging and combining data and analytical methods; discovering and genotyping SNPs and CNVs from next-generation data; imputation with and from next-generation sequencing data. Platforms: 454, Illumina and AB SOLiD. Participants: academic genome centers in the US, UK, Germany and China, and platform companies (Nature 2010, Science 2010 and Nature 2011).
OncoTrack (Methods for systematic next generation oncology biomarker development) is an international consortium of over 60 scientists that has launched one of Europe's largest collaborative academic-industry research projects to develop and assess novel approaches for the identification of new markers for colon cancer.
Samples: cell lines, tissues. Analytes: DNA (mutations, methylation); RNA (mRNA, miRNA); protein. Sequencing. Bioinformatics.
RNA-Seq: total RNA isolation; small RNA depletion; quality control; dsDNA generation using random hexamers; Illumina library preparation; massively parallel sequencing; mapping; expression profiling.
Sequence Capture
  GWAS, candidate genes: 0.5 to 5 Mb; 385k array (NimbleGen); in-solution enrichment
  Whole exome: 35 Mb; 2.1M array (NimbleGen); in-solution enrichment
Targeted Resequencing: Project outline. Workflow: identification of patients; sample preparation; sequence capture; next-generation sequencing; bioinformatics; follow-up sequencing; functional characterization.
Principle of sequence capture. DNA preparation: genomic DNA; fragments (200 to 500 bp); ligation of adapters (figure labels A1, SP1, A2); amplification and quantification. Enrichment of target regions: hybridization; selection with streptavidin beads. Sequencing.
Cleft lip with or without cleft palate (CL/P). Cooperation with M. Nöthen and E. Mangold. Epidemiology of nonsyndromic CL/P: prevalence among live births ~1:1,000; risk for siblings 1:20 to 1:25; λs 40-50. Mangold E. et al. (2010), Nature Genetics
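The sibling recurrence risk ratio λs is the risk for siblings divided by the population prevalence, so the three figures on this slide are consistent with each other:

```latex
\lambda_s = \frac{\text{risk for siblings}}{\text{population prevalence}},
\qquad
\frac{1/25}{1/1000} = 40,
\qquad
\frac{1/20}{1/1000} = 50
```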
Cleft lip with or without cleft palate (CL/P). Resequencing as follow-up of GWAS: 3 loci on chr 8 (640 kb), chr 10 (161 kb) and chr 17 (340 kb) in 20 affected individuals; MID tagging and pooling of 10 samples; enrichment using the 2.1M NimbleGen array; sequencing on a Roche GS FLX system.
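MID tagging means that reads from a pooled run must be assigned back to their sample of origin before analysis. The following is a minimal sketch of that step, assuming the MID sits at the very start of each read and must match exactly; the tag sequences, sample names and the `demultiplex` function are hypothetical and are not the vendor's MID set or software.

```python
from collections import defaultdict

# Hypothetical 10-base MID tags; real MID sequences are defined by the vendor.
MIDS = {
    "AAAACCCCGG": "sample_01",
    "TTTTGGGGCC": "sample_02",
}
MID_LENGTH = 10


def demultiplex(reads):
    """Group reads by their leading MID tag and strip the tag."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        # Reads whose tag is not recognized are kept apart, not discarded.
        bins[MIDS.get(tag, "unassigned")].append(insert)
    return dict(bins)


pooled_reads = [
    "AAAACCCCGG" + "TTGACC",
    "TTTTGGGGCC" + "GGATCA",
    "AAAACCCCGG" + "CATGCA",
    "NNNNNNNNNN" + "CCCC",
]
bins = demultiplex(pooled_reads)
for sample, inserts in sorted(bins.items()):
    print(sample, inserts)
```

Exact matching is the simplest choice; tolerating one mismatch in the tag would rescue more reads at the cost of a small risk of misassignment.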
Mapping
Cleft lip with or without cleft palate (CL/P). Preliminary results: 6,726 unique variants (>10x coverage); 3,783 variants not listed in dbSNP (hg19); 4 coding variants; detection of structural variations not yet finished.
Mutation detection pipeline quality: concordance with the Affymetrix Genome-Wide Human SNP Array 6.0.
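Concordance between sequencing-based genotype calls and array genotypes can be computed as the fraction of shared sites with identical genotypes. This is a minimal sketch under that definition; the site identifiers, genotypes and the `genotype_concordance` function are hypothetical, and the slide does not state how concordance was actually calculated.

```python
def genotype_concordance(seq_calls, array_calls):
    """Fraction of sites typed by both methods with the same genotype.

    Genotypes are two-letter strings; allele order is ignored ("AG" == "GA").
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites typed by both methods")
    matches = sum(
        sorted(seq_calls[site]) == sorted(array_calls[site]) for site in shared
    )
    return matches / len(shared)


# Hypothetical example data.
seq_calls = {"snp_1": "AG", "snp_2": "CC", "snp_3": "TT", "snp_4": "AC"}
array_calls = {"snp_1": "GA", "snp_2": "CC", "snp_3": "CT", "snp_5": "GG"}

concordance = genotype_concordance(seq_calls, array_calls)
print(f"Concordance over shared sites: {concordance:.1%}")
```

Only sites covered by both methods enter the calculation, so the result says nothing about variants that the array does not assay.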
Amplicon Sequencing. Aim: detection and quantification of new and known variants. Method: amplification and sequencing of target regions; multiple alignments of patient sequences against a reference.
Amplicon Sequencing. Amplicon layout: A-primer (21 bp), key, MID, sequence of interest, MID, key, B-primer (21 bp). Steps: locus-specific PCR amplification; emPCR amplification and sequencing. Long reads are required to sequence through the locus-specific primer and enable haplotyping over longer distances. 100s to 1000s of amplicon clones are sequenced simultaneously.
Amplicon Sequencing IRON Study Interlaboratory Robustness of NGS
Amplicon Sequencing IRON Study Hematology Focus Group
Amplicon Sequencing, IRON Study, Results: For each amplicon, the median coverage reached was 713-fold, ranging from 553-fold to 878-fold. A total of 92 variants (44 distinct mutations and 10 SNPs) were observed. In comparison to data available from Sanger sequencing, 454 amplicon deep-sequencing detected all mutations and SNPs that were previously known. We here confirm in a multicenter analysis that amplicon-based deep-sequencing is technically feasible, achieves a high concordance across multiple laboratories, and therefore allows a broad and in-depth molecular characterization of hematological malignancies. Kohlmann et al. (2011), Leukemia
Sensitivity of mutation detection as a function of tumor cell content. Querings et al. (2011), PLoS ONE
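One way to reason about this dependence is a simple binomial model. This is a sketch under stated assumptions and not the method of Querings et al.: the mutation is heterozygous in a diploid region, so the expected variant allele fraction is half the tumor cell content; sequencing errors are ignored; and a hypothetical calling rule requires at least `min_variant_reads` reads that carry the variant.

```python
from math import comb


def detection_probability(tumor_content, depth, min_variant_reads):
    """P(at least `min_variant_reads` variant reads) under a binomial model.

    Assumes a heterozygous mutation, so the variant allele fraction is
    tumor_content / 2, and error-free reads.
    """
    vaf = tumor_content / 2
    p_below_threshold = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_variant_reads)
    )
    return 1 - p_below_threshold


# Hypothetical depth and threshold, chosen only for illustration.
for tumor_content in (0.05, 0.10, 0.20, 0.50):
    p = detection_probability(tumor_content, depth=100, min_variant_reads=5)
    print(f"{tumor_content:.0%} tumor cells: P(detect) = {p:.3f}")
```

Under this model, sensitivity falls as tumor cell content drops and can be recovered by sequencing more deeply, which is one motivation for deep amplicon sequencing.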
Outlook: establishment of small-scale NGS systems; analysis of complete genomes; personalized medicine.
Acknowledgments Sequencing Facility: Ilona Hauenschild Sonia Paturej Tina Moser Ina Lehmann Norbert Merges Daniela Roth Sabrina Rau Heiner Kuhl Sven Klages Martin Werber Hans Lehrach Bernhard Herrmann Hilger Ropers Martin Vingron Michal Schweiger Martin Kerick Markus Ralser
Thanks for your attention!