Next generation DNA sequencing technologies: theory & practice




Outline
- Next-Generation sequencing (NGS) technologies overview
- NGS applications
- NGS workflow: data collection and processing, the exome sequencing pipeline

PART I: NGS technologies. Next-Generation sequencing (NGS) technologies overview

Landmarks in DNA sequencing
- 1953: Discovery of DNA double helix structure
- 1977: A. Maxam and W. Gilbert, "DNA seq by chemical degradation"; F. Sanger, "DNA sequencing with chain-terminating inhibitors"
- 1984: DNA sequence of the Epstein-Barr virus, 170 kb
- 1987: Applied Biosystems, first automated sequencer
- 1991: Sequencing of human genome in Venter's lab
- 1996: P. Nyrén and M. Ronaghi, pyrosequencing
- 2001: A draft sequence of the human genome
- 2003: Human genome completed
- 2004: 454 Life Sciences markets first NGS machine

Massively parallel sequencing 11/10/13

DNA Sequencing the next generation
Commercially available technologies:
- Roche 454: GS FLX Titanium, Junior
- Illumina: HiSeq 2000/2500, MiSeq
- Life: SOLiD 5500, Ion Torrent/Proton
- (Helicos BioSciences: HeliScope)
- Pacific Biosciences: PacBio RS

DNA Sequencing the next generation
The newer technologies constitute various strategies that rely on a combination of:
- Library/template preparation
- Parallel sequencing

Template preparation: STEP1

Template preparation
- Produce a non-biased source of nucleic acid material from the genome
- Current methods: randomly break genomic DNA into smaller sizes, ligate adaptors, and attach or immobilize the template to a solid surface or support
- The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously
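The fragmentation and adaptor-ligation steps can be illustrated with a toy simulation. This is only a sketch: the adaptor sequences, the helper names `fragment` and `ligate_adaptors`, and the 1 kb random "genome" are all made up for illustration, and immobilization on a surface is not modeled.

```python
import random

# Hypothetical adaptor sequences, chosen only for illustration.
ADAPTOR_5 = "ACACTC"
ADAPTOR_3 = "GATCGG"

def fragment(genome, mean_size, rng):
    """Break a sequence at random positions into fragments of roughly mean_size."""
    n_breaks = max(len(genome) // mean_size - 1, 0)
    breaks = sorted(rng.sample(range(1, len(genome)), n_breaks))
    edges = [0] + breaks + [len(genome)]
    return [genome[a:b] for a, b in zip(edges, edges[1:])]

def ligate_adaptors(fragments):
    """Add the same adaptor pair to both ends of every fragment."""
    return [ADAPTOR_5 + f + ADAPTOR_3 for f in fragments]

rng = random.Random(0)  # fixed seed so the example is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(1000))
library = ligate_adaptors(fragment(genome, 200, rng))
print(len(library), "adaptor-ligated templates")
```

Because every template carries the same adaptors, one primer pair can amplify or sequence all fragments regardless of their insert sequence.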

Template preparation
- Clonal amplification: Roche 454, Illumina HiSeq, Life SOLiD, Life Ion Torrent
- Single molecule sequencing: Helicos BioSciences HeliScope, Pacific Biosciences PacBio RS

Template preparation: Clonal amplification
- In solution, emulsion PCR (emPCR): Roche 454, Life SOLiD
- Solid phase, bridge PCR: Illumina HiSeq, Life SOLiD - Wildfire

Template preparation: Clonal amplification, emPCR

Template preparation: SOLiD, 454, Ion Torrent. Jeroen Van Houdt - Genomics Core - UZ Leuven - KU Leuven

Template preparation: Clonal amplification, bridge PCR

Template preparation: Single molecule templates (HeliScope, PacBio)

Sequencing
- Sequencing By Synthesis (SBS): Roche 454, Illumina HiSeq, Life Ion Torrent (label-free), Helicos BioSciences HeliScope, Pacific Biosciences PacBio RS
- Sequencing By Ligation: Life SOLiD

454 - Pyrosequencing: Picotitre plate, pyrosequencing

Ion Torrent: label-free sequencing

HiSeq, HeliScope

Sequencing: PacBio single molecule

Sequencing by ligation

DNA Sequencing
- The major advance offered by NGS is the ability to cheaply produce an enormous volume of data
- The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research
- NGS makes it possible to study different aspects of the genetic architecture at the whole-genome scale

DNA SEQUENCING: Whole-genome sequencing

WGS - Copy number variation analysis

WGS - Structural variation analysis

Whole-Genome Sequencing (WGS)
- Copy number variation analysis: sequencing a genome at 0.1-0.3x; sequencing a genome at 1-3x
- Structural variation analysis: sequencing a genome at 5-10x
- Whole genome re-sequencing: sequencing a genome at >30x (yeast, fruit fly, bacterial genomes, human)
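The coverage figures above translate into read counts through the usual relation coverage = (number of reads x read length) / genome size. A minimal sketch, assuming a 3 Gb human genome (the figure used later in these slides) and a 100 bp read length, which is an assumption and not a value from this slide:

```python
import math

def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Number of reads required to reach a given average coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

HUMAN_GENOME_BP = 3_000_000_000  # 3 Gb
READ_LENGTH_BP = 100             # assumed read length

for label, coverage in [
    ("structural variation, 5x", 5),
    ("structural variation, 10x", 10),
    ("re-sequencing, 30x", 30),
]:
    n = reads_needed(coverage, HUMAN_GENOME_BP, READ_LENGTH_BP)
    print(f"{label}: {n:,} reads")
```

This is an average: real coverage varies along the genome, so practical designs usually add a margin on top of the computed read count.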

DNA SEQUENCING: Targeted re-sequencing

The beginning: random genome sequencing; ?; Sanger sequencing (targeted, 700-1000 bp)

DNA Sequencing the next generation
- Library/template preparation
- Library enrichment for target
- Sequencing and imaging

Target enrichment strategies: random genome sequencing; hybrid capture; PCR based; Sanger sequencing

Target enrichment strategies

Hybrid capture
- In solution: Agilent, Nimblegen, ...
- Solid phase: Agilent, Nimblegen, Febit, ...

Hybrid capture
- In solution: relatively cheap; high throughput is possible; small amounts of DNA sufficient
- Solid phase: straightforward method; flexible; higher amounts of DNA

PCR-based approaches
- Uniplex and multiplex: Fluidigm, Raindance, Multiplicon
- Long-range PCR products: Raindance

RNA Sequencing: rapid expression profiling, transcriptome sequencing and small RNAs

RNA-seq

RNAseq: Gene expression through sequencing
- Supports discovery, screening, and profiling
- Does not require prior gene knowledge or annotation
- Unique combination of qualitative and quantitative measurement
- Digital counts vs analog intensities
- Increased dynamic range and sensitivity
- No probes or primers
- Any species, even when a reference genome is not available
- Analyze gene expression

RNAseq: summary
- Counting or profiling: 10 million total reads of 35 bp length from poly-A selected RNA will give performance better than any microarray
- Studying alternative splicing or quantifying cSNPs for most transcripts: deeper profiling of 50 to 100 million reads, with read lengths of 50 to 100 bp, from poly-A selected RNA using the mRNA-Seq assay
- Complete annotation of an entirely new transcriptome: ~500 million reads of 100 bp read length from multiple tissues; normalized stranded mRNA-Seq & ncRNAs; small RNA-Seq for microRNAs
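To compare the three experiment types by sequencing volume, the read counts and read lengths above can be multiplied out. A small sketch; the dictionary keys are labels invented here, and the middle tier is shown at both ends of its quoted range:

```python
def total_bases(n_reads, read_length_bp):
    """Raw sequencing yield in bases for a given read count and read length."""
    return n_reads * read_length_bp

# (number of reads, read length in bp), taken from the summary slide
designs = {
    "counting / profiling": (10_000_000, 35),
    "splicing / cSNPs, low end": (50_000_000, 50),
    "splicing / cSNPs, high end": (100_000_000, 100),
    "new transcriptome annotation": (500_000_000, 100),
}

for name, (n_reads, length) in designs.items():
    print(f"{name}: {total_bases(n_reads, length) / 1e9:.2f} Gbases")
```

The spread is large: full annotation of a new transcriptome needs roughly 140 times the raw yield of a simple counting experiment.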

PART III: NGS workflow. Data collection and processing, the exome sequencing pipeline

Whole Exome Sequencing
The human genome:
- Genome = 3 Gb
- Exome = 30 Mb
- 180 000 exons
- Protein coding genes constitute only approximately 1% of the human genome
- It is estimated that 85% of the mutations with large effects on disease-related traits can be found in exons or splice sites
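The "approximately 1%" figure follows directly from the genome and exome sizes quoted on this slide, and the exon count gives a rough mean exon length. A quick check of the arithmetic (the variable names are chosen here for illustration):

```python
GENOME_BP = 3_000_000_000  # 3 Gb
EXOME_BP = 30_000_000      # 30 Mb
N_EXONS = 180_000

exome_fraction = EXOME_BP / GENOME_BP  # share of the genome that is exonic
mean_exon_bp = EXOME_BP / N_EXONS      # average exon length implied by the slide

print(f"exome fraction: {exome_fraction:.1%}")
print(f"mean exon length: {mean_exon_bp:.0f} bp")
```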

Exome sequencing: gDNA (3 Gb), exome (38 Mb), NGS

[Figure: The past, present & future. Cost chart with series "exome capture", "Seq - 2.5 Gbases" and "total cost" at 1/01/2010, 1/08/2010 and 1/01/2011. Plotted values as extracted: 7000, 5900, 3460, 2600, 1100, 860, 300, 1000, 1300.]

Exome sequencing capacity
HiSeq specifications:
- 2 flow cells
- 16 lanes (8 per flow cell)
- 200-300 Gbases per flow cell
- 10 days for a single run
Exome throughput:
- 96 @ 60x coverage per run
- 3000 @ 60x coverage per year
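The throughput numbers can be cross-checked with simple arithmetic. This sketch uses the specifications above plus the 38 Mb capture target quoted earlier; the even split of a run over 96 samples and the assumption of back-to-back runs with no downtime are simplifications made here, not statements from the slides.

```python
FLOW_CELLS_PER_RUN = 2
GBASES_PER_FLOW_CELL = (200, 300)  # range quoted in the slides
RUN_DAYS = 10
EXOMES_PER_RUN = 96
COVERAGE = 60
TARGET_MB = 38                     # capture target size quoted in the slides

# Raw sequence available per exome if a run is split evenly over 96 samples.
raw_gb_per_exome = tuple(
    FLOW_CELLS_PER_RUN * gb / EXOMES_PER_RUN for gb in GBASES_PER_FLOW_CELL
)

# Bases that must land on target to reach 60x over a 38 Mb target.
on_target_gb = TARGET_MB * COVERAGE / 1000

# Ceiling on yearly throughput: back-to-back runs, no downtime (assumption).
runs_per_year = 365 // RUN_DAYS
max_exomes_per_year = runs_per_year * EXOMES_PER_RUN

print(f"raw yield per exome: {raw_gb_per_exome[0]:.1f}-{raw_gb_per_exome[1]:.1f} Gbases")
print(f"on-target bases needed: {on_target_gb:.2f} Gbases")
print(f"ceiling: {max_exomes_per_year} exomes per year")
```

The quoted 3000 exomes per year sits below this ceiling, which would be consistent with some idle time between runs. The gap between raw yield per exome and on-target bases needed reflects reads lost to off-target capture, duplicates and uneven coverage.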

Exome sequencing

Data processing workflow
- Data formatting & QC
- Mapping & QC
- Variant calling
- Variant annotation
- Variant filtering/comparison

DATA GENERATION / DATA PROCESSING / DATA STORAGE / INTERPRETATION / RESULTS / REPORTING & VALIDATION

DATA GENERATION
- Prepare sample library
- Perform exome capture
- Perform sequencing

DATA GENERATION / DATA PROCESSING / DATA STORAGE
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome

NGS data processing: overview
1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
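Of these steps, duplicate marking is the easiest to illustrate in a few lines. The sketch below is a deliberate simplification: reads that map to the same chromosome, start position and strand are treated as PCR duplicates, and only the one with the highest summed base quality is kept unmarked. The `Read` class and its fields are hypothetical; production tools work on paired-end alignments and handle many more cases.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Read:
    # Hypothetical minimal read record, for illustration only.
    name: str
    chrom: str
    pos: int
    strand: str
    base_quality_sum: int
    is_duplicate: bool = False

def mark_duplicates(reads):
    """Flag all but the best-quality read at each (chrom, pos, strand)."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)
    for group in groups.values():
        best = max(group, key=lambda r: r.base_quality_sum)
        for read in group:
            read.is_duplicate = read is not best
    return reads

reads = [
    Read("r1", "chr1", 100, "+", 900),
    Read("r2", "chr1", 100, "+", 950),
    Read("r3", "chr1", 100, "-", 800),
    Read("r4", "chr2", 500, "+", 700),
]
mark_duplicates(reads)
print([r.name for r in reads if r.is_duplicate])  # prints ['r1']
```

Duplicates are flagged rather than deleted, so downstream steps can decide whether to ignore them.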

DATA GENERATION / DATA PROCESSING / DATA STORAGE
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome
- Processing: QC sequencing, mapping sequences, QC capture experiment

DATA PROCESSING: QC NGS, mapping, QC HC

DATA GENERATION / DATA PROCESSING / DATA STORAGE
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome
- Processing: QC sequencing, mapping sequences, QC capture experiment. Storage: mapping results, 5 Gb / exome
- Processing: variant calling, variant annotation

DATA GENERATION / DATA PROCESSING / DATA STORAGE
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome
- Processing: QC sequencing, mapping sequences, QC capture experiment. Storage: mapping results, 5 Gb / exome
- Processing: variant calling, variant annotation. Storage: variant calls, 100 Mb / exome
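The per-exome storage figures add up quickly at the throughput quoted earlier. A rough yearly budget, assuming 1 Gb = 1000 Mb, that all three data types are retained, and the 3000 exomes per year from the capacity slide; the function name is invented for this sketch:

```python
SEQUENCE_DATA_MB = (10_000, 15_000)  # 10-15 Gb per exome
MAPPING_RESULTS_MB = 5_000           # 5 Gb per exome
VARIANT_CALLS_MB = 100               # 100 Mb per exome
EXOMES_PER_YEAR = 3000               # from the capacity slide

def yearly_storage_mb(sequence_data_mb):
    """Total storage per year if every file type is kept for every exome."""
    per_exome = sequence_data_mb + MAPPING_RESULTS_MB + VARIANT_CALLS_MB
    return per_exome * EXOMES_PER_YEAR

low, high = (yearly_storage_mb(mb) for mb in SEQUENCE_DATA_MB)
print(f"yearly storage: {low / 1e6:.1f}-{high / 1e6:.1f} Tb")
```

Variant calls are a negligible share of the total; raw sequence data and mapping results dominate, so any retention policy mainly has to decide what to do with those two.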

[Figure: SNPs vs indels. Bar chart of variant counts, axis 0 to 1,200,000; series: INDEL, SNP.]

[Figure: Exonic vs non-exonic. Bar chart of variant counts, axis 0 to 1,000,000; categories: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion.]

[Figure: Exonic. Bar chart of variant counts, axis 0 to 20,000; categories: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion.]

[Figure: Exonic. Bar chart of variant counts, axis 0 to 500; categories: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion.]

DATA GENERATION / DATA PROCESSING / DATA STORAGE
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome
- Processing: QC sequencing, mapping sequences, QC capture experiment. Storage: mapping results, 5 Gb / exome
- Processing: variant calling, variant annotation. Storage: variant calls, 100 Mb / exome
- Processing: variant filtering. Storage: database of known variants (public & private)
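Filtering against known-variant databases amounts to a set difference on variant identity. A minimal sketch, assuming each variant is identified by chromosome, position, reference allele and alternate allele; the dictionary layout, function names and example variants are all made up for illustration:

```python
def variant_key(variant):
    """Identity of a variant: same site and same allele change."""
    return (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])

def filter_novel(calls, known_databases):
    """Keep only calls that appear in none of the known-variant databases."""
    known = set()
    for database in known_databases:
        known.update(variant_key(v) for v in database)
    return [v for v in calls if variant_key(v) not in known]

# Made-up example data: one public and one private database.
public_db = [{"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"}]
private_db = [{"chrom": "chr2", "pos": 2000, "ref": "C", "alt": "T"}]
calls = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G"},  # in public_db
    {"chrom": "chr2", "pos": 2000, "ref": "C", "alt": "T"},  # in private_db
    {"chrom": "chr3", "pos": 3000, "ref": "G", "alt": "A"},  # not known
]

novel = filter_novel(calls, [public_db, private_db])
print(novel)
```

Matching on all four fields matters: a different alternate allele at a known position is a different variant and should not be filtered out.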

DATA GENERATION / DATA PROCESSING / DATA STORAGE / INTERPRETATION / RESULTS / REPORTING & VALIDATION
- Processing: image processing, base calling. Storage: sequence data, 10-15 Gb / exome
- Processing: QC sequencing, mapping sequences, QC capture experiment. Storage: mapping results, 5 Gb / exome
- Processing: variant calling, variant annotation. Storage: variant calls, 100 Mb / exome
- Processing: variant filtering. Storage: database of known variants (public & private)
- Results: validated variants in candidate genes

DNA Sequencing the next generation