Introduction to next-generation sequencing data

David Simpson, Centre for Experimental Medicine, Queen's University Belfast, http://www.qub.ac.uk/research-centres/cem/

Outline
- History of DNA sequencing
- NGS, or massively parallel sequencing
- How it works: Illumina sequencing by synthesis; library preparation; clonal amplification; the future: single-molecule sequencing
- Characteristics of the data: quality control; base calling and quality (FastQ format); phasing and homopolymers; trimming; implications of PCR (duplicates and bias); contamination

Sequencing timeline (figure: Andy Vierstraete). 2014: Illumina HiSeq X Ten, the $1,000 genome?

Conventional DNA sequencing: the dideoxy terminator (Sanger) method
Four reactions are run from the same primer, each containing DNA polymerase, all four dNTPs and one chain-terminating dideoxynucleotide: the "G" tube contains ddGTP, the "A" tube ddATP, the "T" tube ddTTP and the "C" tube ddCTP. Products labelled with fluorescent dyes are separated by size, originally by gel electrophoresis (1 lane = 1 sequence) and later by capillary electrophoresis, and the sequence is read from the electropherogram. Source: http://www.bio.davidson.edu/courses/molbio/molstudents/spring2003/obenrader/sanger_method_page.htm
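The four-tube logic can be sketched as a toy Python model. It assumes an idealised reaction in which every possible terminated fragment is produced and the template is written 3'->5'; the function names are illustrative and not from any library.

```python
# Toy model of the four-tube dideoxy (Sanger) reaction.
# Assumption: the template is written 3'->5', so complementing it base by
# base gives the newly synthesised strand 5'->3' starting at the primer.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_tubes(template: str) -> dict[str, list[int]]:
    """Lengths of the chain-terminated fragments produced in each ddNTP tube."""
    new_strand = "".join(COMPLEMENT[base] for base in template.upper())
    tubes: dict[str, list[int]] = {base: [] for base in "GATC"}
    for length, base in enumerate(new_strand, start=1):
        # In the ddXTP tube, some chains stop wherever X is added.
        tubes[base].append(length)
    return tubes


def read_gel(tubes: dict[str, list[int]]) -> str:
    """Sort all fragments by length (shortest runs furthest) and read off the bases."""
    bands = sorted((length, base) for base, lengths in tubes.items() for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    tubes = sanger_tubes("TACGGA")
    print(tubes)            # fragment lengths seen in each lane
    print(read_gel(tubes))  # prints ATGCCT
```

Reading the bands from shortest to longest across all four lanes recovers the synthesised strand, which is the complement of the template.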

Next Generation Sequencing (NGS)
NGS processes millions of sequencing reads in parallel. The common concept across platforms is the analysis of millions of sequences attached to a solid surface (or held in wells), in contrast with traditional gel electrophoresis. A range of platforms is available; they are illustrated here with Illumina and Ion Torrent (Life Technologies/Thermo Fisher).

NGS workflow
- Library preparation (from RNA or DNA): fragmentation/size selection, then addition of adaptors
- Template preparation (single-molecule clonal amplification): bridge PCR on a slide (cluster generation) or emulsion PCR
- Sequencing: reversible terminator (Illumina), semiconductor (Ion Torrent) or single molecule (Nanopore)
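A minimal sketch of the library-preparation steps (random fragmentation, size selection, adaptor addition) applied to a simulated genome. The adaptor sequences, fragment lengths and size window are hypothetical placeholders chosen for illustration, not values from any kit.

```python
import random

# Hypothetical placeholder adaptors; NOT real Illumina or Ion Torrent sequences.
ADAPTOR_A = "AGTCAGTCAG"
ADAPTOR_B = "TCGATCGATC"


def shear(genome: str, n_fragments: int, rng: random.Random,
          min_len: int = 20, max_len: int = 200) -> list[str]:
    """Random fragmentation: each fragment gets a random length and start."""
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, min(max_len, len(genome)))
        start = rng.randint(0, len(genome) - length)
        fragments.append(genome[start:start + length])
    return fragments


def size_select(fragments: list[str], low: int, high: int) -> list[str]:
    """Keep only fragments inside the chosen size window."""
    return [f for f in fragments if low <= len(f) <= high]


def add_adaptors(fragments: list[str]) -> list[str]:
    """Attach adaptor A to one end of each insert and adaptor B to the other."""
    return [ADAPTOR_A + f + ADAPTOR_B for f in fragments]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    genome = "".join(rng.choice("ACGT") for _ in range(1_000))
    sheared = shear(genome, 500, rng)
    library = add_adaptors(size_select(sheared, 80, 120))
    print(f"{len(sheared)} fragments sheared, {len(library)} kept for the library")
```

Every library molecule then has the same known sequence at both ends, which is what the later amplification and sequencing primers rely on.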

Overview of DNA-Seq and RNA-Seq
[Figure: genomic DNA, or RNA extracted and converted into a cDNA library, is fragmented into a library; massively parallel sequencing yields >10 million reads, which are aligned to a reference sequence (shown spanning exon 1 and exon 2).]
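The "align to reference" step can be illustrated with a toy seed-and-verify search over a k-mer index. This sketch only finds exact, forward-strand matches; real aligners also handle mismatches, indels, the reverse complement and, for RNA-Seq, reads split across exons. The function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the 0-based positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def align_exact(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Seed with the read's first k-mer, then keep positions where the whole read matches."""
    if len(read) < k:
        raise ValueError("read is shorter than the seed length")
    return [pos for pos in index.get(read[:k], [])
            if reference[pos:pos + len(read)] == read]


if __name__ == "__main__":
    reference = "ACGTACGTTTGCA"
    index = build_kmer_index(reference, k=3)
    for read in ("CGTT", "ACGT", "GGGG"):
        print(read, align_exact(read, reference, index, k=3))
```

The index is built once and reused for every read, which is why indexing the reference is the usual first step before aligning millions of reads.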

Library preparation http://res.illumina.com/documents/products/research_reviews/sequencing-methods-review.pdf

Illumina: cluster generation
Clonal amplification is achieved by generating clusters on the surface of a flow cell (a glass slide). See the SBS technology video at www.illumina.com/

Massively parallel sequencing
Each glowing dot on the glass slide marks a cluster of cloned DNA being sequenced.

Reading the sequence
All four nucleotides, each carrying a different fluorescent dye, are washed over the flow cell, and only one complementary nucleotide is incorporated per cycle.
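To illustrate how one incorporation per cycle becomes a base call, the sketch below assumes one intensity reading per dye channel in each cycle and calls the brightest channel. The channel order and the confidence ratio are assumptions for illustration; real base callers also correct for cross-talk between dyes, phasing and signal decay.

```python
# Hypothetical dye-to-base assignment: one intensity channel per nucleotide.
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycle_intensities: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Call the brightest channel in each cycle and report a crude confidence
    (brightest / total intensity). This is NOT a calibrated Phred score."""
    calls, confidences = [], []
    for intensities in cycle_intensities:
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        total = sum(intensities)
        calls.append(CHANNELS[best])
        confidences.append(intensities[best] / total if total > 0 else 0.0)
    return "".join(calls), confidences


if __name__ == "__main__":
    cycles = [(900, 50, 30, 20), (40, 60, 850, 50), (300, 280, 200, 220)]
    sequence, confidence = call_bases(cycles)
    print(sequence, [round(c, 2) for c in confidence])
```

The third cycle shows an ambiguous signal: the call is still made, but with a much lower confidence.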

Illumina sequencing by synthesis: multiplexing
Libraries are prepared with different index sequences, then pooled and sequenced together.
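Demultiplexing a pooled run can be sketched as matching each read's index sequence against the known sample indexes. The index sequences and the one-mismatch tolerance below are hypothetical choices; reads matching no index, or more than one, are set aside as undetermined.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical 6-base index sequences for two pooled libraries.
SAMPLE_INDEXES = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}


def hamming(a: str, b: str) -> int:
    """Mismatches between two sequences of equal length."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, indexes: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample whose index is within max_mismatches, else None."""
    hits = [name for name, index in indexes.items()
            if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: list[tuple[str, str]], indexes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Group (read_id, index_read) pairs by sample; unassigned reads go to 'undetermined'."""
    bins = defaultdict(list)
    for read_id, index_read in reads:
        sample = assign_sample(index_read, indexes, max_mismatches)
        bins[sample or "undetermined"].append(read_id)
    return dict(bins)


if __name__ == "__main__":
    reads = [("r1", "ACGTAC"), ("r2", "TGCATC"), ("r3", "AAAAAA")]
    print(demultiplex(reads, SAMPLE_INDEXES))
```

Tolerating mismatches only works if the indexes differ from each other by enough positions; these two differ at all six.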

Platforms
Illumina offers several instruments: the desktop-sized MiSeq, which can complete smaller runs in under a day; the NextSeq 500; and the high-throughput HiSeq 2500. Ion Torrent semiconductor sequencing (Life Technologies) is fast, with a cheap entry level and rapidly increasing output; its instruments are the Personal Genome Machine (PGM) and the Proton.

                        HiSeq 2500        PGM (314 chip)     Proton (P1 chip)
Total output            600/120 Gb        up to 100 Mb       10 Gb
Run time                11 days/27 h      2-4 h              2-4 h
Output per day          55 Gb             up to 200 Mb       ~20 Gb
Read length             2 x 100/150 bp    up to 400 bp       up to 200 bp
Number of single reads  3/0.6 billion     up to 0.6 million  up to 82 million
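The HiSeq 2500 figures in the table can be cross-checked with simple arithmetic. This sketch assumes that "single reads" counts clusters and that a 2 x 100 bp run reads 100 bp from each end of every cluster; under that reading, 3 billion clusters give 600 Gb and 0.6 billion give 120 Gb. The function names are hypothetical.

```python
def run_yield_gb(clusters: float, read_length: int, paired_end: bool = True) -> float:
    """Total bases sequenced, in gigabases: clusters x reads per cluster x read length."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * read_length / 1e9


def output_per_day_gb(total_gb: float, run_hours: float) -> float:
    """Average throughput over the run, in gigabases per day."""
    return total_gb / (run_hours / 24)


if __name__ == "__main__":
    high_output = run_yield_gb(3e9, 100)   # 3 billion clusters, 2 x 100 bp
    rapid = run_yield_gb(0.6e9, 100)       # 0.6 billion clusters, 2 x 100 bp
    print(high_output, rapid)                               # 600.0 120.0
    print(round(output_per_day_gb(high_output, 11 * 24)))  # 55
```

The 55 Gb per day figure matches the 600 Gb, 11-day run; the same arithmetic applied to the Ion Torrent columns does not reproduce the table exactly, since those figures are "up to" values.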

Ion Torrent: semiconductor sequencing
Incorporation of a nucleotide releases a hydrogen ion and changes the pH, which is detected on a semiconductor sequencing chip, so no optics are required. The template is attached to beads prepared by emulsion PCR.

Signal processing to optimise base calling
Base calling corrects for signal decay and for phasing (phase correction). Phasing is the rate at which single molecules within a cluster lose sync with each other, for example through incomplete extension. Signal decay and phasing limit read length. Further discussion:
- Ion Torrent: http://biolektures.wordpress.com/2011/08/10/fundamentals-of-base-calling-part-1/
- Illumina: http://pathogenomics.bham.ac.uk/blog/2013/11/diagnosing-problems-with-phasing-and-pre-phasing-on-illumina-platforms/
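Why phasing limits read length can be explored with a simple probabilistic model. This sketch assumes each molecule in a cluster independently fails to extend with a fixed, hypothetical probability p_lag per cycle and ignores pre-phasing, so the lag is binomial and the fraction still in sync decays as (1 - p_lag)^cycles.

```python
from math import comb


def lag_distribution(p_lag: float, cycles: int) -> list[float]:
    """P(a molecule is k bases behind after `cycles` cycles), for k = 0..cycles.

    Simplifying assumption: each cycle, each molecule independently fails to
    extend with probability p_lag (no pre-phasing), so the lag is binomial.
    """
    return [comb(cycles, k) * p_lag ** k * (1 - p_lag) ** (cycles - k)
            for k in range(cycles + 1)]


def in_phase_fraction(p_lag: float, cycles: int) -> float:
    """Fraction of molecules in a cluster still in sync: the k = 0 term above."""
    return (1 - p_lag) ** cycles


if __name__ == "__main__":
    for cycles in (50, 100, 150):
        print(cycles, round(in_phase_fraction(0.01, cycles), 3))
```

Even a 1% per-cycle lag leaves only about a third of the molecules in phase after 100 cycles, so the signal for the current base is increasingly mixed with signal from earlier positions.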

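Read length and quality: per-base sequence quality
The Phred quality score Q is an integer mapping of p, the probability that the corresponding base call is incorrect: Q = -10 log10(p). Q20 therefore means a 1 in 100 chance of an error and Q30 a 1 in 1,000 chance. (Figure: Damien Gregory, http://www.somewhereville.com/?p=1508)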
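The Phred relationship and its inverse are one-liners. This sketch is a plain transcription of Q = -10 log10(p); the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 log10(p): the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> int:
    """Q = -10 log10(p), rounded to the nearest integer."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return round(-10 * math.log10(p))


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.2%}")
```

Each additional 10 units of Q means a tenfold lower error probability.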

FASTQ format
Each record holds the nucleotide sequence and an associated quality score for every base, encoded as ASCII characters. For Illumina reads, the identifier records the flowcell lane and tile, the X and Y coordinates of the cluster, and the index of the multiplexed sample:

@PSI179204_0007:4:1:1025:10482#0/1
GAGCAAAATTGTAGAAGAATTCAGGATCTCGTATGCCGTC
+PSI179204_0007:4:1:1025:10482#0/1
C-:AC:?5:C-AAA-5>-,A5A>5:A?-DD?5A::>;><B

Reference: P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer and P. M. Rice, "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants", Nucleic Acids Research 38(6):1767-1771 (2010), doi:10.1093/nar/gkp1137
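A minimal reader for records like the example above. It assumes unwrapped four-line records and Phred+33 (Sanger) quality encoding, and the identifier parser assumes the older Illumina layout instrument:lane:tile:x:y#index/read seen in the example. The function names are hypothetical; production code would normally use an established parser.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: list[int]


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ; qualities decoded as ord(char) - offset.

    offset=33 is the Sanger/Phred+33 encoding (older Illumina pipelines used 64).
    Assumes sequence and qualities each sit on a single line.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        chunk = list(islice(it, 3))
        if len(chunk) < 3:
            raise ValueError("truncated FASTQ record")
        sequence, plus, quality = (line.rstrip("\n") for line in chunk)
        if not plus.startswith("+") or len(sequence) != len(quality):
            raise ValueError(f"malformed record {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


def parse_illumina_1x_id(read_id: str) -> dict[str, object]:
    """Split an old-style Illumina identifier: instrument:lane:tile:x:y#index/read."""
    name, read_number = read_id.rsplit("/", 1)
    name, index = name.rsplit("#", 1)
    instrument, lane, tile, x, y = name.rsplit(":", 4)
    return {"instrument": instrument, "lane": int(lane), "tile": int(tile),
            "x": int(x), "y": int(y), "index": index, "read": int(read_number)}


EXAMPLE = """@PSI179204_0007:4:1:1025:10482#0/1
GAGCAAAATTGTAGAAGAATTCAGGATCTCGTATGCCGTC
+PSI179204_0007:4:1:1025:10482#0/1
C-:AC:?5:C-AAA-5>-,A5A>5:A?-DD?5A::>;><B
"""

if __name__ == "__main__":
    for record in parse_fastq(EXAMPLE.splitlines()):
        print(parse_illumina_1x_id(record.read_id))
        print("mean quality:", sum(record.qualities) / len(record.qualities))
```

With Phred+33, the first quality character "C" (ASCII 67) decodes to Q34 and "-" (ASCII 45) to Q12.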

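Homopolymers (runs of the same nucleotide)
- Illumina: all four nucleotides are flowed together and a single one is incorporated per cycle
- Ion Torrent: individual unmodified nucleotides are flowed sequentially, so a whole homopolymer run is incorporated in a single flow and its length must be inferred from the signal strength, shown in the ionogram
(Figure: EBI)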
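Why homopolymers are harder for Ion Torrent can be shown with a toy flowgram. This sketch assumes a simple repeating TACG flow order (real flow orders may differ) and noise-free integer signals when building the flowgram; calling bases back from noisy signals by rounding shows how a long run can be miscounted.

```python
def flowgram(sequence: str, flow_order: str = "TACG") -> list[int]:
    """Number of bases incorporated in each flow until the sequence is used up."""
    if not set(sequence) <= set(flow_order):
        raise ValueError("sequence contains bases missing from the flow order")
    counts = []
    i = flow = 0
    while i < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(sequence) and sequence[i] == nucleotide:
            # A whole homopolymer run is incorporated in a single flow.
            run += 1
            i += 1
        counts.append(run)
        flow += 1
    return counts


def call_from_flows(signals: list[float], flow_order: str = "TACG") -> str:
    """Round each (noisy) flow signal to a run length and rebuild the sequence."""
    return "".join(flow_order[f % len(flow_order)] * max(0, round(s))
                   for f, s in enumerate(signals))


if __name__ == "__main__":
    print(flowgram("TTAG"))                        # [2, 1, 0, 1]
    print(call_from_flows([2.1, 0.9, 0.2, 1.4]))   # TTAG
    print(call_from_flows([4.4]))                  # TTTT
```

If the 4.4 signal in the last line actually came from a 5-base run, rounding calls only four bases: a deletion error. The longer the run, the smaller the relative difference between n and n + 1 incorporations, so such errors become more likely.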

Trimming
- Quality: trim low-quality read ends
- Adaptors: clip adaptor sequence (e.g. fastx_clipper from the FASTX-Toolkit by Assaf Gordon)
[Figure: fragment structure, Adaptor A | insert | Adaptor B]
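Both trimming steps can be sketched in a few lines. The adaptor sequence in the test is only an example (use the sequence from your own library kit), matching is exact only, and the quality rule simply removes 3' bases below a threshold. This is not the algorithm of fastx_clipper or any other specific tool.

```python
def clip_adaptor(sequence: str, adaptor: str, min_overlap: int = 3) -> str:
    """Cut the read at the first position where the adaptor matches exactly,
    or, near the 3' end, where a prefix of it at least min_overlap long does."""
    for start in range(len(sequence)):
        window = sequence[start:start + len(adaptor)]
        if len(window) >= min_overlap and adaptor.startswith(window):
            return sequence[:start]
    return sequence


def trim_low_quality_end(sequence: str, qualities: list[int],
                         min_quality: int = 20) -> tuple[str, list[int]]:
    """Drop bases from the 3' end while their Phred score is below min_quality."""
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


if __name__ == "__main__":
    example_adaptor = "AGATCGGAAGAGC"  # example only; an assumption
    print(clip_adaptor("ACGTACGTAGATCGGAAG", example_adaptor))
    print(trim_low_quality_end("ACGTA", [30, 30, 30, 10, 5]))
```

Allowing short partial matches at the 3' end catches reads that run only a few bases into the adaptor, at the cost of occasionally clipping a genuine 3-base match.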

Implications of PCR
- Duplicate reads can cause erroneous quantification or variant detection
- Coverage is uneven, so additional sequencing is required to achieve a minimal coverage
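A simplified duplicate check for single-end alignments: reads sharing chromosome, start position and strand are treated as PCR duplicates and only the first is kept. The tuple layout is hypothetical; dedicated tools such as Picard MarkDuplicates also use mate positions for paired reads and choose which copy to keep by base quality.

```python
def mark_duplicates(alignments: list[tuple[str, str, int, str]]) -> list[str]:
    """Return ids of reads whose (chromosome, start, strand) was already seen.

    alignments: (read_id, chromosome, start, strand) tuples in file order; the
    first read at each position is kept as the representative copy.
    """
    seen: set[tuple[str, int, str]] = set()
    duplicates: list[str] = []
    for read_id, chromosome, start, strand in alignments:
        key = (chromosome, start, strand)
        if key in seen:
            duplicates.append(read_id)  # likely a PCR copy of an earlier read
        else:
            seen.add(key)
    return duplicates


def duplication_rate(alignments: list[tuple[str, str, int, str]]) -> float:
    """Fraction of reads flagged as duplicates."""
    return len(mark_duplicates(alignments)) / len(alignments) if alignments else 0.0


if __name__ == "__main__":
    alignments = [
        ("r1", "chr1", 100, "+"),
        ("r2", "chr1", 100, "+"),  # same start and strand as r1
        ("r3", "chr1", 100, "-"),  # opposite strand: kept
        ("r4", "chr2", 100, "+"),
    ]
    print(mark_duplicates(alignments), duplication_rate(alignments))  # ['r2'] 0.25
```

Removing or flagging duplicates stops one over-amplified fragment from being counted many times in quantification or variant calling.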

Single nucleotide resolution, high specificity
Example: a ZEB1 mutation in exon 7, c.1920G>T (p.Gln640His), which changes codon CAG (Gln) to CAT (His).
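The codon arithmetic behind the ZEB1 example can be checked in code. Under HGVS coding numbering (c.1 is the A of the ATG start codon), position 1920 is the third base of codon 640. The sketch below applies that arithmetic with the standard genetic code; it assumes a position inside the coding sequence, takes the reference codon as input, and uses hypothetical function names.

```python
BASES = "TCAG"
# Standard genetic code: the 64 codons in TCAG order map onto this string.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
                "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
                "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
                "Y": "Tyr", "V": "Val", "*": "Ter"}


def protein_change(ref_codon: str, c_position: int, ref_base: str, alt_base: str) -> str:
    """Translate a coding substitution c.<pos><ref>><alt> into a p. description."""
    codon_number = (c_position - 1) // 3 + 1
    offset = (c_position - 1) % 3  # 0, 1 or 2: position within the codon
    ref_codon = ref_codon.upper()
    if ref_codon[offset] != ref_base.upper():
        raise ValueError("reference base does not match the codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    if ref_aa == alt_aa:
        return f"p.{ref_aa}{codon_number}="  # synonymous change
    return f"p.{ref_aa}{codon_number}{alt_aa}"


if __name__ == "__main__":
    print(protein_change("CAG", 1920, "G", "T"))  # p.Gln640His
```

Because the change is at the third codon position, a different substitution at the same base (G>A, giving CAA) would be synonymous.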

Contamination
- Sample mix-ups (!), e.g. through indexing
- Carry-over from a previous run
- Check with FastQ Screen
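FastQ Screen itself aligns reads against a set of reference genomes with a short-read aligner. As a lighter-weight illustration of the same idea, this sketch counts what fraction of reads share most of their k-mers with each reference. The genomes, k and the threshold are toy assumptions; real screens use much larger k and whole genomes.

```python
def kmer_set(sequence: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def screen_reads(reads: list[str], references: dict[str, str],
                 k: int = 5, min_fraction: float = 0.5) -> dict[str, float]:
    """Fraction of reads assigned to each reference. A read is a hit for a
    reference when at least min_fraction of its k-mers occur in it; one read
    can hit several references."""
    ref_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    hits = dict.fromkeys(references, 0)
    for read in reads:
        read_kmers = kmer_set(read, k)
        if not read_kmers:
            continue  # read shorter than k
        for name, kmers in ref_kmers.items():
            if len(read_kmers & kmers) / len(read_kmers) >= min_fraction:
                hits[name] += 1
    total = max(len(reads), 1)
    return {name: count / total for name, count in hits.items()}


if __name__ == "__main__":
    # Toy, hypothetical sequences standing in for the expected genome and a contaminant.
    references = {"expected": "ACGTACGTTTGCAACGTTAG", "contaminant": "GGGCCCAAATTTGGGCCCAA"}
    reads = ["ACGTACGTTT", "GGCCCAAATT", "TTTTTTTTTT"]
    print(screen_reads(reads, references))
```

A substantial fraction of reads hitting an unexpected genome is a signal to check for sample mix-ups or carry-over.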

Single-molecule sequencing: Nanopore
A single-stranded DNA polymer is passed through a protein nanopore, and individual DNA bases on the strand are identified in sequence as the molecule passes through. Oxford Nanopore: https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing

Summary
- NGS works by sequencing millions of reads in parallel
- Library preparation adds adaptors to the DNA of interest
- Clonal amplification (template preparation) is required
- Sequence data are presented in FastQ format
- Quality control is critical: errors are inherent in the technology (e.g. phasing, homopolymers, PCR), and trimming and contamination checks are needed

To analyse NGS data effectively, you need to understand the technology.