The Nancy and Stephen Grand Israel National Center for Personalized Medicine (G-INCPM). Introduction to Illumina Next Generation Sequencing Technology. Shmulik Motola, PhD. March 2016.
DNA Sequencing: the process of determining the precise order of nucleotides (A, C, G, T) within a DNA molecule. The order, or sequence, of nucleotides determines the genetic information available for building and maintaining an organism. Sequence variation: natural polymorphism, mutation.
Sanger Sequencing (est. 1975). Frederick Sanger, 1918-2013. Figure labels: sequencing primer; DNA template (cloned and isolated from an E. coli colony). Replication products are separated by electrophoresis.
The Human Genome Project. Used Sanger sequencing. A global effort involving 20 research centers. Lasted 13 years (first completed draft published in 2003). Cost: $3,000,000,000! Facilitated the discovery of more than 1,800 disease-associated genes.
Figure: Cost per human genome sequence, Sanger sequencing vs. Next Generation Sequencing. Source: van Dijk EL et al. Trends Genet. 2014 Sep;30(9):418-26.
DNA Sequencing Methods: From Sanger to Next Generation Sequencing (NGS)
Feature | Sanger Seq. | Next Generation Seq.
Throughput | Low | High
Prior knowledge of template DNA sequence? | Required, for PCR and seq. primer design | Not required; PCR and seq. primers are universal
DNA template to be sequenced | Single DNA region | May map anywhere on the genome (in case of whole genome seq.)
Quantitative or gene expression assay? | Not supported | Supported
Source: http://support.illumina.com/training/online-courses/sequencing.html
Schematic view of Illumina NGS technology. NGS resolves hundreds of millions of DNA sequences in a single run! Steps: complex DNA sample -> attachment to solid surface -> parallel sequencing of all DNA fragments -> data analysis.
Common NGS Applications: Whole Transcriptome (RNA-Seq); Whole Genome Sequencing; Whole Exome Sequencing; 16S Microbiome; Small-RNA Seq; DNA Methylation Analysis; Chromatin Immunoprecipitation (ChIP)-Seq.
Illumina Sequencing Overview. FOR RESEARCH USE ONLY
Illumina Sequencing Workflow: Library Preparation -> Cluster Generation (cBot, MiSeq, HiSeq 2500) -> Sequencing (HiSeq, MiSeq, NextSeq 500) -> Data Analysis (ICS/RTA, CASAVA, MSR, BaseSpace).
Sample (Library) Preparation Overview. Aim: obtain nucleic acid fragments with adapters attached at both ends. Steps: start from nucleic acid (DNA/RNA); modify to the proper insert size; add adapters with sites for flow cell binding and sequencing primer binding. The same general template architecture applies regardless of application.
Sample (Library) Preparation Overview: Sample Indexing. Index = a known short DNA sequence, included in the DNA adapter, that labels all DNA molecules of a particular sample. Adapted from Illumina.
Single vs. Dual-Indexed NGS Libraries. Single-indexed libraries (figure labels: index sequence, P5, P7). Dual-indexed libraries (figure labels: P5, P7). The number of samples pooled determines the need for single vs. dual indexing.
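Because an index labels every molecule from one sample, reads from a pooled run can be sorted back to their samples by looking up the index read. The following is a minimal Python sketch of that idea, not the algorithm used by any Illumina software. The index sequences, sample names, and the `demultiplex` helper are hypothetical, and matching is exact, with no tolerance for sequencing errors.

```python
from collections import defaultdict

# Hypothetical sample sheet: (index 1, index 2) -> sample name.
SAMPLE_SHEET = {
    ("ATCACG", "TAGATC"): "sample_A",
    ("CGATGT", "TAGATC"): "sample_B",
    ("ATCACG", "CTCTCT"): "sample_C",
}


def demultiplex(reads, sample_sheet):
    """Group insert reads by sample using the index pair read with each one."""
    bins = defaultdict(list)
    for insert_seq, index1, index2 in reads:
        # Index pairs missing from the sample sheet go to "undetermined".
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)


reads = [
    ("ACGTACGT", "ATCACG", "TAGATC"),
    ("TTGACCAA", "CGATGT", "TAGATC"),
    ("GGCATGCA", "ATCACG", "CTCTCT"),
    ("CCCCAAAA", "GGGGGG", "TAGATC"),  # index 1 not in the sample sheet
]
result = demultiplex(reads, SAMPLE_SHEET)
```

In this toy pool, sample_A and sample_C share the same index 1 and are told apart only by index 2, which illustrates how a second index increases the number of samples that can be pooled.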
Illumina Sequencing Workflow: Library Preparation -> Cluster Generation (cBot, MiSeq, HiSeq 2500) -> Sequencing (HiSeq, MiSeq, NextSeq 500, GAIIx) -> Data Analysis (ICS/RTA, CASAVA, MSR, BaseSpace).
Cluster Generation: Aims. Attachment of DNA molecules to the flow cell; amplification of single DNA molecules into clonal clusters. Figure: flow cell (HiSeq High Output).
What is a Flow Cell? Cluster generation occurs on a flow cell. A flow cell is a thick glass slide with channels or lanes. Each lane is randomly coated with a lawn of oligos that are complementary to library adapters.
Instrumentation. Figure labels: single DNA library; amplified clonal cluster; cBot; sequencer.
Hybridize Fragment & Extend. The surface of the flow cell is coated with a lawn of oligo pairs. Single-stranded DNA libraries are hybridized to the primer lawn through their adapter sequence. Bound libraries are then extended by polymerases (3' extension).
Denature Double-Stranded DNA. The double-stranded molecule is denatured. The original template is washed away (discarded). The newly synthesized strand is covalently attached to the flow cell surface.
Single-Stranded DNA. NOTE: Single molecules bind to the flow cell in a random pattern.
Bridge Amplification. The single-stranded molecule flips over and forms a bridge by hybridizing to an adjacent, complementary primer. The hybridized primer is extended by polymerases.
Bridge Amplification. A double-stranded bridge is formed.
Denature Double-Stranded Bridge. The double-stranded bridge is denatured. Result: two copies of covalently bound single-stranded templates.
Bridge Amplification. Single-stranded molecules flip over to hybridize to adjacent primers. The hybridized primer is extended by polymerase.
Bridge Amplification. The bridge amplification cycle is repeated until multiple bridges are formed.
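Since one denatured bridge leaves two covalently bound strands, repeating the cycle grows a cluster roughly geometrically. A small Python sketch of that idealized arithmetic follows. It assumes every bound strand is copied in every cycle, which real chemistry will not achieve, and the cycle count used is hypothetical rather than an instrument setting.

```python
def strands_after_cycles(cycles, start=1):
    """Idealized strand count: each bridge amplification cycle doubles it."""
    return start * 2 ** cycles


# Hypothetical example: one seeded molecule after 10 ideal cycles.
cluster_size = strands_after_cycles(10)
print(cluster_size)  # 1024
```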
Linearization. dsDNA bridges are denatured.
Reverse Strand Cleavage. Reverse strands are cleaved and washed away, leaving a cluster with forward strands only.
Blocking. Free 3' ends are blocked to prevent unwanted DNA priming.
Read 1 Primer Hybridization. The sequencing primer is hybridized to the adapter sequence.
Illumina Sequencing Workflow: Library Preparation -> Cluster Generation (cBot, MiSeq, HiSeq 2500) -> Sequencing (HiSeq, NextSeq, GAIIx, MiSeq) -> Data Analysis (ICS/RTA, CASAVA, MSR, BaseSpace).
Sequencing By Synthesis. Cycle: (1) add 4 Fl-NTPs + polymerase; (2) the incorporated Fl-NTP is imaged; (3) the terminator and fluorescent dye are cleaved from the Fl-NTP. Repeated x 36-251 cycles.
Reversible Terminator Chemistry. All 4 labeled nucleotides in 1 reaction; higher accuracy; no problems with homopolymer repeats. Cycle: incorporation -> detection -> deblock and fluor removal -> next cycle.
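The one-base-per-cycle logic of reversible terminator chemistry can be sketched in a few lines of Python. This is a toy model under stated assumptions: the template string is written in the order the polymerase traverses it, incorporation is error-free, and the function name `sequence_by_synthesis` is made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Build a read by adding exactly one complementary base per cycle.

    `template` is given in the order the polymerase traverses it.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # Incorporation: the terminator allows only one base to be added.
        base = COMPLEMENT[template[cycle]]
        # Detection: the fluorescent label identifies the added base.
        read.append(base)
        # Deblock and fluor removal would happen here, before the next cycle.
    return "".join(read)


# A run of four identical bases is read in four separate cycles.
print(sequence_by_synthesis("TTTTACG", 5))  # AAAAT
```

Because each cycle adds a single base, a homopolymer stretch is read one base at a time instead of being inferred from signal strength, which is the point behind "no problems with homopolymer repeats".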
Clusters (of DNA molecules sequenced): cluster intensities are collected following every base addition. Scale bar: 100 microns.
Illumina Sequencing Workflow: Library Preparation -> Cluster Generation (cBot, MiSeq, HiSeq 2500) -> Sequencing (HiSeq, MiSeq, NextSeq 500, GAIIx) -> Data Analysis (ICS/RTA, CASAVA, MSR, BaseSpace).
Data Analysis Overview
Analysis Type | Software | Outputs
Sequencing | ICS/RTA | Images/TIFF files
Primary Analysis | ICS/RTA | Intensities, base calling
Secondary Analysis | HiSeq Analysis Software | Alignments and variant detection
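To make the step from intensities to base calls concrete, here is a deliberately simplified Python sketch. It assumes one intensity value per base channel per cycle and calls the brightest channel. This is not how RTA works internally, and the intensity values are invented for illustration.

```python
BASES = "ACGT"


def call_bases(intensities):
    """Call one base per cycle from per-cycle [A, C, G, T] intensities."""
    read = []
    for cycle_values in intensities:
        # Pick the channel with the highest intensity in this cycle.
        brightest = max(range(4), key=lambda i: cycle_values[i])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over three cycles.
cluster = [
    [910.0, 35.0, 40.0, 22.0],
    [18.0, 30.0, 875.0, 41.0],
    [25.0, 44.0, 31.0, 902.0],
]
print(call_bases(cluster))  # AGT
```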
Paired End Sequencing
Single End Sequencing
Paired End Sequencing
Paired End Sequencing: a text analogy
Reference: This is really the best way to do sequencing
Single-reads: This is / is really / really the / the best / sequencing
Paired-reads: This is (----100 characters-------) sequencing
Assembly becomes easier!!
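In sequence terms, a read pair samples both ends of one fragment, with the second read coming from the opposite strand. A minimal Python sketch, assuming an error-free fragment given as its forward strand (the fragment and the helper names are hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2) for one fragment.

    Read 1 is the start of the forward strand; read 2 is the start of the
    reverse strand, i.e. the reverse complement of the fragment's far end.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


read1, read2 = paired_end_reads("ACGTTGCAAGGCTA", 4)
print(read1, read2)  # ACGT TAGC
```

The two reads are known to come from the same fragment, so the approximate distance between them is known too, which is what makes assembly and alignment easier.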
Paired End Sequencing. The sequenced strand is stripped off. The 3' ends of template strands and lawn primers are unblocked. Figure labels: blocked 3' ends; sequenced strand.
Paired End Sequencing. The single-stranded template loops over to form a bridge by hybridizing with a lawn primer. The 3' end of the lawn primer is extended. Figure labels: bridge formation; 3' extension.
Paired End Sequencing. Double-stranded DNA.
Paired End Sequencing. Bridges are linearized and the original forward template is cleaved. Figure label: original forward strand.
Paired End Sequencing. Free 3' ends of the reverse template and lawn primers are blocked to prevent unwanted DNA priming. The sequencing primer is hybridized to the adapter sequence. Figure labels: blocked 3' ends; sequencing primer; reverse strand template.
Sequencing By Synthesis, 2nd Read. Cycle: (1) add 4 Fl-NTPs + polymerase; (2) the incorporated Fl-NTP is imaged; (3) the terminator and fluorescent dye are cleaved from the Fl-NTP. Repeated x 36-251 cycles.
Sequencing Paired End Libraries with Single Index Read. Figure labels: DNA insert; index.
Paired End Sequencing of Single-Indexed Libraries. Utilizes 3 sequencing reads: (1) Read 1 Seq Primer (HP6); (2) Index Seq Primer (HP8); paired-end turnaround; (3) Read 2 Seq Primer (HP7).
Sequencing Paired End Libraries with Dual Index Read. Figure labels: DNA insert; Index2; Index1.
Paired End Sequencing of Dual-Indexed Libraries. Utilizes 4 sequencing reads (1, 2, 3, 4), with a paired-end turnaround.
Questions?
Part II: NGS Library Preparation and Quality Control
Figure: workflow stages labeled by responsibility (user, Illumina, or user / Illumina). Taken from: http://rnaseq.uoregon.edu/library_prep.html
Common NGS Applications: RNA-Seq; DNA-Seq (whole genome, ChIP-Seq).
A 5-step procedure, with steps separated by bead-based size selection.
Step 1: DNA/RNA Fragmentation
- Physical fragmentation. Acoustic shearing (Covaris): breaks DNA into 100 bp-5 kb fragments. Sonication (Bioruptor): shears chromatin and DNA into 150 bp-1 kb fragments.
- Enzymatic fragmentation (DNA endonucleases, transposase). Considered consistent, but less random than physical DNA-shearing methods.
- Chemical fragmentation. Heat and a divalent metal cation (Mg2+/Zn2+): used to break up RNA molecules. Ideally results in 115-350 nt RNA molecules.
Step 2: End repair and bead-based size selection
Step 3: Adenylate 3' ends
Step 4: Ligate indexed paired end adapters
Step 5: PCR enrich ligation product
RNA-Seq library preparation protocol (TruSeq RNA v2, Illumina; similar to a DNA-Seq library prep procedure): total RNA -> purify and fragment mRNA -> cDNA synthesis (first and second strand) -> end repair -> adenylate 3' ends -> ligate indexed paired-end adapters -> PCR amplification.
Library Validation: Critical for Successful Sequencing. Workflow: Sample Preparation -> Library Validation (accurate quantification; library size and quality) -> Cluster Generation (cBot, MiSeq) -> Sequencing (HiSeq, HiScan SQ, GA IIx, MiSeq) -> Data Analysis.
Accurate Library Quantification (Why?): It Maximizes Data Quality and Quantity. Optimized flow cell clustering determines data quality and overall data yield. Figure panels: 20 pM, 10 pM, 5 pM, 1 pM.
Overclustering can result in: loss of data quality and data output; loss of focus; reduced base calls and Q30 scores; complete run failure.
Underclustering can result in: loss of time and money; loss of focus; complete run failure.
Accurate Quantification Is Critical When Multiplexing
Calculated concentration is 10X higher for one library in pool:
Sample | Expected Output | Actual Output
1 | 16% | 20%
2 | 16% | 20%
3 | 16% | 20%
4 | 16% | 20%
5 | 16% | 20%
6 | 16% | 2%
Calculated concentration is 10X lower for one library in pool:
Sample | Expected Output | Actual Output
1 | 16% | 66%
2 | 16% | 6%
3 | 16% | 6%
4 | 16% | 6%
5 | 16% | 6%
6 | 16% | 6%
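The output shares in these two scenarios follow from simple proportions, sketched below in Python. The sketch assumes that libraries are pooled to equal calculated amounts and that read output is proportional to the amount actually loaded. The function name `pool_output_fractions` is made up for illustration.

```python
def pool_output_fractions(calc_over_true):
    """Return each library's share of the reads in a pool.

    calc_over_true[i] is (calculated concentration) / (true concentration)
    for library i. Libraries are pooled to equal calculated amounts, so the
    amount actually loaded is proportional to 1 / calc_over_true[i].
    """
    loaded = [1.0 / ratio for ratio in calc_over_true]
    total = sum(loaded)
    return [amount / total for amount in loaded]


# Sample 6 has its concentration overestimated 10X, so it is under-loaded.
overestimated = pool_output_fractions([1, 1, 1, 1, 1, 10])

# Sample 1 has its concentration underestimated 10X, so it is over-loaded.
underestimated = pool_output_fractions([0.1, 1, 1, 1, 1, 1])

print([f"{share:.1%}" for share in overestimated])
print([f"{share:.1%}" for share in underestimated])
```

Under these assumptions the first case gives about 19.6% for each correct library and 2.0% for the overestimated one, and the second gives about 66.7% and 6.7%, in line with the rounded figures on the slide.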
Quantification Methods of NGS Libraries
- UV spectrophotometer (NanoDrop): detects nucleic acids nonspecifically; contaminants elevate values; should not be used for input or library quantification.
- Bioanalyzer 2100: accuracy is highly dependent on dilution and sample handling; recommended for quality control only.
- Fluorescence-based dsDNA assay (Qubit or PicoGreen): specifically detects double-stranded DNA; does not discriminate incomplete libraries.
- qPCR: specifically measures full-length libraries; detection is very sensitive.
Library Quantification using qPCR
Library qPCR Overview. qPCR is designed to quantify only cluster-forming fragments in the samples. It uses primers complementary to the adapters to mimic amplification on the flow cell. It only amplifies and quantifies library fragments with proper adapters at both ends.
Steps for Quantifying Libraries with qPCR
Step 1: Create a control standard curve using a control template of known concentration.
Step 2: Run qPCR on the control template standard curve and the unknown libraries.
Step 3: Extrapolate the concentration of the unknown libraries from the standard curve.
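The three steps above can be sketched in Python as a straight-line fit of quantification cycle (Cq) against log10 concentration, followed by reading unknowns off that line. This is a minimal sketch with invented numbers: the dilution series, Cq values, and function names are hypothetical, and it omits corrections a real protocol would apply, such as the library dilution factor or a fragment-size adjustment.

```python
import math


def fit_standard_curve(concentrations_pm, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to get a concentration from a Cq value."""
    return 10 ** ((cq - intercept) / slope)


# Step 1: hypothetical 10-fold dilution series of the control template (pM).
standards_pm = [20.0, 2.0, 0.2, 0.02]
# Step 2: hypothetical Cq values measured for those standards.
standards_cq = [10.0, 13.3, 16.6, 19.9]
slope, intercept = fit_standard_curve(standards_pm, standards_cq)

# Step 3: estimate an unknown library from its measured Cq.
unknown_pm = concentration_from_cq(13.3, slope, intercept)
```

With these made-up standards the fitted slope is -3.3 cycles per 10-fold dilution, and an unknown with a Cq of 13.3 maps back to about 2 pM.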
Assessing Library Quality with Bioanalyzer
Agilent Bioanalyzer 2100: Overview. Image from: Bioanalyzer Applications for Next-Gen Sequencing: updates and tips from Agilent Technologies.
Understanding a Bioanalyzer Trace. Figure labels: lower marker; upper marker; sample peak; baseline.
Understanding a Bioanalyzer Report. Figure labels: summary page; sample details.
Bioanalyzer Details. The region can be set in the 2100 Expert software. Reports the average library size. Don't use it to quantify.
Calculation of Library Molar Concentration. Library concentration (ng/ul; fluorometric assay such as Qubit, or qPCR) + average library size (bp; Bioanalyzer/TapeStation) -> library molar concentration -> optimized flow cell clustering and sequencing data.
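The conversion from mass concentration and average size to molarity can be written out as a short Python sketch. It assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA, a value that is not stated on the slides. The input numbers and the function name are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair


def library_molarity_nm(conc_ng_per_ul, avg_size_bp):
    """Convert a mass concentration (ng/ul) to a molar concentration (nM).

    ng/ul equals 1e-3 g/L; dividing by the fragment's molar mass (g/mol)
    gives mol/L, and multiplying by 1e9 gives nM, hence the factor of 1e6.
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * avg_size_bp)


# Hypothetical library: 2 ng/ul with an average size of 400 bp.
molarity = library_molarity_nm(2.0, 400)
print(f"{molarity:.2f} nM")  # 7.58 nM
```

The same mass concentration corresponds to fewer molecules when the fragments are longer, which is why the size estimate is needed alongside the concentration measurement.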
Summary. Accurate quantitation is critical for maximizing high-quality data output. Library quantitation is especially critical when pooling indexed libraries. Library validation: use the recommended method to quantify final libraries prior to sequencing, and check library quality using a Bioanalyzer 2100.
Garbage in, garbage out: bad sample -> bad library -> bad sequencing data.
Figure: workflow stages labeled by responsibility (user, Illumina, or user / Illumina). Taken from: http://rnaseq.uoregon.edu/library_prep.html
RNA Handling Best Practices (to avoid RNA degradation): harvest RNA quickly; use filter pipette tips; treat the work area and equipment with RNase decontamination solution; use RNase-free plastics and solutions; store RNA by freezing; wear gloves.
Best Practices Summary: follow the protocol as written; take care when adding viscous reagents; complete all wash steps; follow magnetic bead best practices; heat the thermocycler lid during incubations; don't over-amplify libraries! Validate your libraries for quality and quantity.
Questions?