Next Generation Sequencing I: Technologies. Jim Noonan Department of Genetics

Similar documents
Next generation DNA sequencing technologies. theory & prac-ce

Next Generation Sequencing

Introduction to next-generation sequencing data

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

July 7th 2009 DNA sequencing

Overview of Next Generation Sequencing platform technologies

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

NGS data analysis. Bernardo J. Clavijo

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Next Generation Sequencing

Introduction to NGS data analysis

Next Generation Sequencing: Technology, Mapping, and Analysis

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

FOR REFERENCE PURPOSES

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Illumina Sequencing Technology

Introduction Bioo Scientific

Single Nucleotide Polymorphisms (SNPs)

G E N OM I C S S E RV I C ES

SEQUENCING. From Sample to Sequence-Ready

Data Analysis for Ion Torrent Sequencing

Next Generation Sequencing for DUMMIES

How many of you have checked out the web site on protein-dna interactions?

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage

Targeted. sequencing solutions. Accurate, scalable, fast TARGETED

Computational Genomics. Next generation sequencing (NGS)

DNA Sequencing & The Human Genome Project

Concepts and methods in sequencing and genome assembly

PreciseTM Whitepaper

DNA Sequence Analysis

Lectures 1 and February 7, Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

How Sequencing Experiments Fail

Core Facility Genomics

TruSeq Custom Amplicon v1.5

Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne

Third Generation Sequencing

NEXT GENERATION SEQUENCING

Software Getting Started Guide

Complete Genomics Sequencing

NGS Technologies for Genomics and Transcriptomics

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production

Personal Genome Sequencing with Complete Genomics Technology. Maido Remm

Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

Forensic DNA Testing Terminology

Microbial Oceanomics using High-Throughput DNA Sequencing

Introduction To Real Time Quantitative PCR (qpcr)

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

The author(s) shown below used Federal funds provided by the U.S. Department of Justice and prepared the following final report:

Analysis of DNA methylation: bisulfite libraries and SOLiD sequencing

TGC AT YOUR SERVICE. Taking your research to the next generation

Welcome to Pacific Biosciences' Introduction to SMRTbell Template Preparation.

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

BIOO LIFE SCIENCE PRODUCTS

An Overview of DNA Sequencing

Introduction. Preparation of Template DNA

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

The RNAi Consortium (TRC) Broad Institute

RNAseq / ChipSeq / Methylseq and personalized genomics

Non-invasive prenatal detection of chromosome aneuploidies using next generation sequencing: First steps towards clinical application

14/12/2012. HLA typing - problem #1. Applications for NGS. HLA typing - problem #1 HLA typing - problem #2

Techniques in Molecular Biology (to study the function of genes)

Procedures For DNA Sequencing

Next Generation Sequencing; Technologies, applications and data analysis

School of Nursing. Presented by Yvette Conley, PhD

SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications

Microarray Technology

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

Consistent Assay Performance Across Universal Arrays and Scanners

Genetic diagnostics the gateway to personalized medicine

BRCA1 / 2 testing by massive sequencing highlights, shadows or pitfalls?

Mitochondrial DNA Analysis

LifeScope Genomic Analysis Software 2.5

Illumina GAIIx Sequencing Service

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Tools for human molecular diagnosis. Joris Vermeesch

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v SMRT Analysis v2.2.0 Overview. Notes:

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

All your base(s) are belong to us

Recombinant DNA & Genetic Engineering. Tools for Genetic Manipulation

Lecture 13: DNA Technology. DNA Sequencing. DNA Sequencing Genetic Markers - RFLPs polymerase chain reaction (PCR) products of biotechnology

DNA Sequencing. Ben Langmead. Department of Computer Science

Cluster Generation. Module 2: Overview

Sequencing and microarrays for genome analysis: complementary rather than competing?

Services. Updated 05/31/2016

Discovery & Modeling of Genomic Regulatory Networks with Big Data

Genomics GENterprise

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Whole genome Bisulfite Sequencing for Methylation Analysis Preparing Samples for the Illumina Sequencing Platform

SRA File Formats Guide

Next-generation DNA sequencing techniques

escience and Post-Genome Biomedical Research

Assuring the Quality of Next-Generation Sequencing in Clinical Laboratory Practice. Supplementary Guidelines

Reading DNA Sequences:

Keeping up with DNA technologies

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Transcription:

Next Generation Sequencing I: Technologies Jim Noonan Department of Genetics

Sequence as the readout for biological processes Determining the biological state of cells, tissues and organisms requires the quantification of sequence information Gene expression Protein-DNA interactions (ChIP) DNA-DNA interactions (3C/4C/5C) Chromatin state DNA methylation Genetic variation (SNPs/CNVs) Indirect measures Direct measures CTATGATCAGTC... TCAATCTGATCTG... GGACTTCGAGATC... AAGTCGCTGACGT... microarrays, PCR, etc. Sequencing

Outline First-generation sequencing technology Sanger sequencing Parallelization in human genome project Current massively parallel sequencing platforms 454 Illumina SOLiD Helicos Applications, advantages, issues Third-generation sequencing Pacific Biosciences Complete Genomics Ion Torrent

Metrics for evaluating sequencing methods Throughput Number of high quality bases per unit time Number of independent samples run in parallel - multiplexing Difficulty of sample prep Yield Number of useful/mappable reads per sample Read length Cost Per run and per base Equipment Reagents Infrastructure Labor Analysis The goal of all new sequencing technologies is to increase throughput and yield while reducing cost

Sanger sequencing (1975-1977) 1980 Nobel Prize in chemistry gels read by hand phi X 174 ~5300 bp radiolabeled dideoxyntps one lane per nucleotide 800 bp reads low throughput (several kb/gel)

Parallelization of Sanger sequencing: Technology Semi-automated gel electrophoresis four color ddntp labeling 800 bp reads 96 samples/gel 70,000 bp/gel automated readout metrics for basecalling and quality scoring Capillary electrophoresis 800-1000 bp reads 384 samples/cap 300,000 bp/cap

Parallelization of Sanger sequencing: Infrastructure enormous increase in sequencing production capacity throughout the HGP industrialization of Sanger sequencing, library construction, sample preparation, analysis, etc. $3 billion total cost 1 billion bp/month at largest centers (2005)

Second-generation sequencing Democratizing sequencing production Massive parallelization Reduction in per-base cost Eliminate need for huge infrastructure Millions of reads - >1Gb sequence per run Novel sequencing applications RNA-seq ChIP-seq Methyl-seq Counting applications Whole-genome and targeted resequencing Challenges Read length Quality Data analysis

1 cycle: T-A-G-C flowed in sequence across plate Intensity of signal determines how many nt (i.e. A vs. AAAAA) are incorporated 454 pyrosequencing

454 pyrosequencing Throughput & Yield 1 million 400 bp reads/10 hour run >8 samples/run (more with barcoding) Cost Machine: $500k; reagents ~$8000k/run Issues High indel rate in homopolymers Longer reads but fewer than other systems

454 sequencing applications 106.5 million reads 24.5 Gb seq 7.4 x coverage >10,000 amino acid replacements CNVs up to 1.5 Mb Nature 452:872 (2008)

Illumina Short read technologies (currently dominant) Sequencing by synthesis 250 million 36-100 bp reads/run ~32-40 Gb alignable sequence $6500-$12000 in reagent cost/run 3-10 day run time SOLiD Sequencing by ligation ~400 million 35-50 bp reads/run 100 Gb/run (SOLiD 4) ~$5000 in reagent cost/run 3-6 day run time Helicos Sequencing by synthesis No amplification 750 million reads/run $18k run cost 8 day run time

Illumina Cluster PCR on flowcell (8 lanes) Sequencing by synthesis with reversible dye terminators 1 cycle Scan flowcell Reverse termination Add next base

Error rates increase with read length in Illumina sequencing

Up to 100 bp paired end reads Paired end sequencing

Multiplex sequencing Up to 96 samples on a flowcell

Yoruba male from Nigeria Nature 456:53 (2008)

ChIP-seq: enhancer identification in vivo Visel et al. Nature 457:854 (2009) p300 = enhancer-associated factor p300 binding = ~90% predictive of enhancer activity

30 ng Gene expression profiling by massively parallel RNA sequencing (RNA-seq)

Array Sequence Capture NimbleGen whole exome arrays: 2.1 M features; > 60 bp probes

HiSeq 2000 2 flowcells/run 60x coverage of 1 human genome SINGLE READ Reads/flowcell* Reads/lane GB per lane Length of run in bp 50 75 100 GAIIe 100,000,000 12,500,000 0.63 0.94 1.25 GAIIx 250,000,000 31,250,000 1.56 2.34 3.13 HiSeq2000 1,000,000,000 62,500,000 3.13 4.69 6.25 Big data: Tb-scale datasets YCGA : ~1 Pb storage for 12 Illumina GAIIx PAIRED- END READ Reads/flowcell* Reads/lane GB per lane Length of run in bp 50 75 100 GAIIe 200,000,000 25,000,000 1.25 1.88 2.50 GAIIx 500,000,000 62,500,000 3.13 4.69 6.25 HiSeq2000 2,000,000,000 125,000,000 6.25 9.38 12.50

SOLiD

SOLiD Two base encoding in color space

Single molecule sequencing: Helicos

Single molecule sequencing: Helicos No PCR 800 million reads, 50 samples high indel rate (3%)

Third-generation sequencing Extremely high-throughput sequencing at very low cost Pacific Biosciences Sequence in real time with fluorescent NTPs Rate limited by processivity of polymerase Very long reads (>10 kb) Not well parallelized (few reads) Complete Genomics Sequencing hybridization & ligation of DNA nanoballs Short reads (70 bp) Very low cost ($5,000/genome) Ion Torrent Sequencing on semiconductors

Sequencing in real time: Pacific Biosciences SMRT cells Zero Mode Waveguides

PacBio Specs Autumn 2010 160,000 reads per SMRT cell; 96 SMRT cells per run (~15 M reads) ~1 kb reads 5% error rate (deletions due to dark bases ) ~15 b/second $695k per instrument; $99 per SMRT cell 2 Gb data footprint per SMRT cell Targets 320,000 reads per SMRT cell 50,000-70,000 bp reads Methylation studies and direct RNA seq

Complete Genomics

Complete Genomics Sequencing chemistry Sequencing service model Provide summary stats, variant calls (SNP and CNV), reads, alignments Proof of principle: sequenced 3 human genomes (2 HapMap and George Church) Drmanac et al. Science 327:78 (2010)

Ion Torrent Sequencing on semiconductor chip $45,000 per instrument; $300-$500/run 750,000 100 bp reads; 75 Mb 2 hour run time

Conclusions High-throughput sequencing has become democratized - moved out of industrial-scale genome centers Sequence is no longer limiting - next generation of sequencers will make sequencing very inexpensive Earlier methods for counting / resequencing applications are largely obsolete Scale of data production outstripping our ability to store and analyze it Next: dealing with the data and personal genomics