New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova




Around 2004 the race for the $1000 genome started. A few questions... When? How? Why?

Standard strategy for shotgun sequencing

Why genomic sequencing is (was) difficult... Three main problems make it difficult to sequence long regions of DNA:
1. Current Sanger technology cannot read more than 900-1000 bases per run. Therefore longer pieces of DNA must be sequenced in parts that are then assembled (typically by a shotgun approach).
2. The sensitivity of current Sanger technology is not sufficient to read the signal from a single DNA molecule. Therefore the DNA fragment to be sequenced must be physically amplified, and the signal is obtained from many identical fragments.
3. Sanger technology requires the products of each sequencing reaction to be separated by their own electrophoresis.
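To make the assembly step in point 1 concrete, here is a toy greedy overlap assembler: it repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a minimal sketch, assuming error-free reads and no repeats longer than the reads; `overlap` and `greedy_assemble` are hypothetical names, and real shotgun assemblers are considerably more involved.

```python
from itertools import permutations

# Hypothetical helpers: a toy greedy assembler, not a production algorithm.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the read pair with the largest overlap; return the contigs."""
    # Drop reads fully contained in another read (they add no new sequence).
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:  # no overlaps left: remaining reads stay separate contigs
            break
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_k:])  # merge, keeping the overlap only once
    return contigs


print(greedy_assemble(["ATGCGTAC", "CGTACGTT", "ACGTTAGC"]))
```

The greedy choice breaks down when a repeat is longer than the reads, since two different genomic copies then overlap equally well; this is one reason read length matters so much for assembly.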

Sequencing by DNA synthesis (Sanger)
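The chain-termination principle can be sketched as a toy simulation: during synthesis, some strands incorporate a chain-terminating ddNTP at each position, so the reaction contains one fragment length per position, each ending in a known base. Sorting the fragments by length (the job of electrophoresis) and reading their terminal bases recovers the sequence. A minimal sketch, assuming every termination product is present; `sanger_fragments` and `read_gel` are hypothetical names.

```python
import random

# Hypothetical helpers illustrating chain termination; not instrument software.


def sanger_fragments(synthesized: str) -> list[tuple[int, str]]:
    """Every chain-termination product: (fragment length, terminating ddNTP base)."""
    # A ddNTP incorporated at position i stops synthesis, giving a fragment of length i+1
    # that ends with that base. `synthesized` is the strand complementary to the template.
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by length, as electrophoresis does, and read off the terminal bases."""
    return "".join(base for _, base in sorted(fragments))


strand = "GATTACAGGC"
products = sanger_fragments(strand)
random.shuffle(products)  # the reaction yields a mixture in no particular order
print(read_gel(products))
```

The sketch also shows why the slide's point 3 matters: the length-sorting step is exactly what electrophoresis provides, and it is what next-generation methods set out to eliminate.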

Next (now) generation sequencing
1. Cloning in bacteria should not be required.
2. An electrophoresis step should not be required.
Three main technologies are currently available:
- Roche/454 (pyrosequencing)
- Illumina/Solexa (modified Sanger)
- Applied Biosystems SOLiD (sequencing by ligation)

How can we avoid the bacterial cloning process?
1. Single molecule sequencing
2. Molecular cloning without bacteria
There are two main strategies to perform molecular cloning without bacteria, using PCR colonies, often called polymerase colonies or "polonies":
- Emulsion PCR (used by Roche and SOLiD)
- PCR bridge amplification (used by Illumina)
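For emulsion PCR to yield polonies, each droplet should start from at most one template molecule so that every bead carries a clonal population. If templates are loaded at an average of λ molecules per droplet, a Poisson model (an assumption here; the slide gives no loading figures) gives the fraction of droplets that are empty, clonal or mixed. A minimal sketch; `droplet_fractions` and the λ values are illustrative.

```python
import math

# Hypothetical helper: Poisson model of template loading into emulsion droplets.


def droplet_fractions(lam: float) -> dict[str, float]:
    """Poisson occupancy of droplets with mean lam templates per droplet."""
    p0 = math.exp(-lam)          # no template: bead stays empty
    p1 = lam * math.exp(-lam)    # exactly one template: clonal bead
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


for lam in (0.1, 0.5, 1.0):
    f = droplet_fractions(lam)
    # Among droplets that received any template, what fraction is clonal?
    clonal_share = f["clonal"] / (1.0 - f["empty"])
    print(f"lambda={lam}: empty={f['empty']:.2f} clonal={f['clonal']:.2f} "
          f"clonal among occupied={clonal_share:.2f}")
```

Lowering λ makes occupied beads purer at the cost of more empty droplets.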

Sequencing without electrophoresis

Pyrosequencing
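In pyrosequencing the four nucleotides are dispensed one at a time in a fixed cyclic order. Each incorporation releases pyrophosphate, which is converted into light roughly proportional to the number of identical bases incorporated in that flow. The sketch builds an idealized, noise-free flowgram and decodes it by rounding. The flow order TACG is an assumption (it is the order commonly described for 454 instruments), and `flowgram` and `decode` are hypothetical names.

```python
# Hypothetical sketch of pyrosequencing flows; FLOW_ORDER is an assumed value.
FLOW_ORDER = "TACG"


def flowgram(seq: str, n_flows: int, order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow: how many bases are incorporated when each nucleotide is dispensed."""
    signals, pos = [], 0
    for i in range(n_flows):
        nuc = order[i % len(order)]
        run = 0
        # The polymerase extends through a whole homopolymer in a single flow.
        while pos < len(seq) and seq[pos] == nuc:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def decode(signals: list[float], order: str = FLOW_ORDER) -> str:
    """Call bases by rounding each flow's signal to the nearest homopolymer length."""
    return "".join(order[i % len(order)] * round(s) for i, s in enumerate(signals))


print(flowgram("TTAGGGC", 8))
print(decode([2.1, 0.9, 0.1, 2.8, 0.2, 0.0, 1.1, 0.1]))  # noisy signals
```

Long homopolymers are the weak point: distinguishing, say, 7 from 8 identical bases means resolving a small relative change in light intensity, so rounding errors there become insertion/deletion errors.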

Illumina chemistry
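Illumina sequencing by synthesis uses reversibly terminated, fluorescently labelled nucleotides, so each cycle adds exactly one base per cluster; after imaging, the dye and terminator are removed and the next cycle begins. A simplified base caller then picks, for each cycle, the channel with the strongest signal. This sketch assumes one channel per base and clean, crosstalk-free intensities, which real base callers do not; `call_bases` is a hypothetical name, and its confidence ratio is an illustrative stand-in, not Illumina's calibrated quality score.

```python
# Hypothetical simplified base caller: one intensity channel per base (A, C, G, T).
BASES = "ACGT"


def call_bases(intensities: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Per cycle, call the brightest channel and report its share of the total signal."""
    calls, confidence = [], []
    for cycle in intensities:
        best = max(range(4), key=lambda i: cycle[i])
        total = sum(cycle)
        calls.append(BASES[best])
        # Illustrative confidence: fraction of the total signal in the winning channel.
        confidence.append(cycle[best] / total if total > 0 else 0.0)
    return "".join(calls), confidence


cycles = [
    (0.90, 0.05, 0.03, 0.02),  # cycle 1: mostly A signal
    (0.10, 0.05, 0.80, 0.05),  # cycle 2: G
    (0.05, 0.40, 0.05, 0.50),  # cycle 3: ambiguous C/T, low confidence
]
seq, conf = call_bases(cycles)
print(seq, [round(c, 2) for c in conf])
```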

SOLiD chemistry: ligation probes. Ligation probes are octamers (XXnnnzzz) in which the ligation site, the cleavage site and the dye are spatially separated. The fluorescent dye interrogates the bases at the 1st and 2nd positions (XX); n = degenerate bases, z = universal bases. 4^5 = 1024 probes (256 probes per color). [Figure: 4×4 color matrix, 1st base (A, C, G, T) vs 2nd base (A, C, G, T)]
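Because each probe reports a pair of adjacent bases as one of four colors, a SOLiD read is recorded as a string of colors rather than bases. With the 2-bit codes A=0, C=1, G=2, T=3 commonly used to describe SOLiD, the color of a dinucleotide equals the XOR of the two codes, so AA, CC, GG and TT all share color 0. The sketch encodes and decodes color space. Decoding needs a known first base, assumed here to come from the known adapter sequence. `to_colors` and `from_colors` are hypothetical names.

```python
# Hypothetical sketch of two-base (color-space) encoding using the XOR convention.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def to_colors(seq: str) -> list[int]:
    """Two-base encoding: one color (0-3) for every pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def from_colors(first_base: str, colors: list[int]) -> str:
    """Decode colors back to bases; requires the first base to be known."""
    bases = [first_base]
    for c in colors:
        # XOR is its own inverse: next base = previous base XOR color.
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)


colors = to_colors("ATGGCA")
print(colors)                    # [3, 1, 0, 3, 1]
print(from_colors("A", colors))  # ATGGCA
```

Decoding also shows the catch: a single wrong color corrupts every base decoded after it, which is why SOLiD reads are best aligned in color space.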

SOLiD 4-color ligation: ligation reaction [Figure: a 1 µm bead carries the template (P1 primer, template sequence); a universal sequencing primer is annealed, and ligase joins one of four dye-labelled probes (Y, B, G, R; XXnnnzzz)]


SOLiD 4-color ligation: visualization [Figure: the dye of the ligated probe (Y) is imaged, reporting template bases 1-2]

SOLiD ligation-based sequencing chemistry (2): image; cap unextended strands; cleave off the fluor.

SOLiD 4-color ligation: cleavage [Figure: the dye-bearing end of the probe is cleaved off, leaving a 5' phosphate (p5) on the extended strand; bases 1-2 already called (Y)]

SOLiD 4-color ligation: ligation (2nd cycle) [Figure: ligase joins a second probe (Y, B, G or R) to the extended strand; bases 1-2 already called (Y)]

SOLiD 4-color ligation: visualization (2nd cycle) [Figure: the second probe reports bases 6-7 (R); bases 1-2 were Y]

SOLiD 4-color ligation: cleavage (2nd cycle) [Figure: the dye is cleaved off again, leaving a 5' phosphate (p5); bases 1-2 (Y) and 6-7 (R) called]

SOLiD 4-color ligation interrogates two bases out of every five [Figure: successive cycles read template bases 1-2, 6-7, 11-12, 16-17 and 21-22]

SOLiD 4-color ligation: reset [Figure: the extended product is removed, leaving the template (adapter oligo sequence, template sequence) on the 1 µm bead]

SOLiD 4-color ligation: 1st cycle after reset [Figure: a new universal sequencing primer offset by one base (n-1) is annealed, and ligase joins one of the four probes (Y, B, G, R)]

SOLiD 4-color ligation: 1st cycle after reset, visualization [Figure: with the n-1 primer, the probe reports bases 0-1 (R)]

SOLiD 4-color ligation: 2nd round [Figure: with the n-1 primer, successive cycles read bases 0-1, 5-6, 10-11, 15-16 and 20-21]

Sequential rounds of sequencing: multiple cycles per round. After each reset, a universal sequencing primer with a different offset is annealed. Base pairs interrogated by each primer:
Primer n: 1-2, 6-7, 11-12, 16-17, 21-22
Primer n-1: 0-1, 5-6, 10-11, 15-16, 20-21
Primer n+3: 4-5, 9-10, 14-15, 19-20, 24-25
Primer n+2: 3-4, 8-9, 13-14, 18-19, 23-24
Primer n+1: 2-3, 7-8, 12-13, 17-18, 22-23
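The purpose of the five primer rounds is that, together, they interrogate every base twice. A small sketch can check the arithmetic: a primer shifted by `offset` bases reads the pair (1+offset, 2+offset), then every 5 positions, for 5 ligation cycles. The offsets and the five cycles per round are taken from the slide; `interrogated_positions` is a hypothetical name.

```python
from collections import Counter

# Hypothetical helper reproducing the slide's primer/offset layout.


def interrogated_positions(offset: int, cycles: int = 5, step: int = 5) -> list[tuple[int, int]]:
    """Base pairs read by one primer round: (1+offset, 2+offset), then every `step` bases."""
    start = 1 + offset
    return [(start + step * j, start + 1 + step * j) for j in range(cycles)]


offsets = {"n": 0, "n-1": -1, "n+1": 1, "n+2": 2, "n+3": 3}
coverage = Counter()
for name, off in offsets.items():
    pairs = interrogated_positions(off)
    print(name, pairs)
    for a, b in pairs:
        coverage[a] += 1
        coverage[b] += 1

# Every template position from 1 to 24 is interrogated exactly twice.
print(all(coverage[pos] == 2 for pos in range(1, 25)))
```

This double interrogation is what makes the two-color-change rule for SNPs possible.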

Two-base encoding: reference alignment in color space
Reference: ACGGTCGTCGTGTGCGT
Observed: ACGGTCGCCGTGTGCGT
[Figure: expected and observed color sequences aligned to the reference]
A SNP, to be real, must be encoded by two color changes.
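The rule can be checked on the slide's own sequences, which differ at the 8th base (T in the reference, C in the observed read). A single-base change alters both colors that include that base, and under the XOR color code it alters them by the same XOR amount. An isolated color difference is therefore more likely a measurement error. A minimal sketch, assuming the XOR convention (A=0, C=1, G=2, T=3) commonly used to describe SOLiD; `snp_candidates` is a hypothetical name.

```python
# Hypothetical sketch: SNP detection from color-space differences.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq: str) -> list[int]:
    """SOLiD-style two-base encoding (XOR of 2-bit base codes)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def snp_candidates(ref_colors: list[int], obs_colors: list[int]) -> list[int]:
    """Return 0-based base indices where a SNP explains two adjacent color changes."""
    diff = [r ^ o for r, o in zip(ref_colors, obs_colors)]
    hits = []
    for i in range(len(diff) - 1):
        # A base change with XOR delta d alters colors i and i+1 by the same d.
        if diff[i] != 0 and diff[i] == diff[i + 1]:
            hits.append(i + 1)  # color i covers bases i and i+1
    return hits


reference = "ACGGTCGTCGTGTGCGT"
observed = "ACGGTCGCCGTGTGCGT"
print(snp_candidates(to_colors(reference), to_colors(observed)))  # [7], i.e. the 8th base
```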

3rd generation sequencing: single molecule sequencing

Pacific Biosciences

Michael R. Stratton, Peter J. Campbell & P. Andrew Futreal Nature 458, 719-724 (9 April 2009)

~2004: start of the race to the $1000 genome. Target 1: 1000× cost reduction in 5 years*. Target 2: a further 100× cost reduction in another 5 years*.
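The two targets compound to a 10^5-fold reduction over ten years. Expressed as average yearly factors (assuming a steady exponential decline, which the slide does not state):

```latex
\[
\underbrace{10^{3}}_{\text{Target 1}} \times \underbrace{10^{2}}_{\text{Target 2}} = 10^{5},
\qquad
\left(10^{3}\right)^{1/5} = 10^{0.6} \approx 3.98\ \text{per year},
\qquad
\left(10^{2}\right)^{1/5} = 10^{0.4} \approx 2.51\ \text{per year},
\qquad
\left(10^{5}\right)^{1/10} = 10^{0.5} \approx 3.16\ \text{per year overall}.
\]
```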

Genomics at CRIBI, 1990-2010 [Timeline figure; projects shown include the yeast genome, Yeast FAN, Arabidopsis, tomato, grape genome, wheat, olive, Telethon: muscle functional genomics, P. profundum, AGER, CHROMUS, Nannochloropsis and more; DNA sequencing service: BMR Genomics]

SOLiD 5500: main applications
- Whole genome sequencing
- Chromatin immunoprecipitation (ChIP)
- Microbial and eukaryotic resequencing
- Digital karyotyping
- Structural variations
- Genotyping
- Gene expression
- Small RNA discovery

Bioinformatics at CRIBI: RNA-seq, resequencing, miRNA, de novo genomic sequencing, DNA methylation, ChIP; de novo assembly, mapping reads, gene prediction, SNPs & structural variations, expression analysis, gene annotation, data management and analysis.

Mapping reads on the genome: developmentally regulated alternative splicing

Reads alignment
Svetlin A. Manavski and Giorgio Valle. CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics 2008, 9: S10.
Davide Campagna, Alessandro Albiero, Alessandra Bilardi, Elisa Caniato, Claudio Forcato, Svetlin Manavski, Nicola Vitulo and Giorgio Valle. PASS: a Program to Align Short Sequences. Bioinformatics 2009.
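The first paper accelerates Smith-Waterman local alignment on GPUs. For reference, here is a plain CPU version of the Smith-Waterman scoring recurrence with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions, and this is neither the CUDA implementation from the paper nor the PASS algorithm.

```python
# Plain Smith-Waterman scoring (linear gaps); scoring parameters are illustrative.


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, int, int]:
    """Best local alignment score and the (i, j) cell where it ends (1-based into a and b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned to a gap in b
            left = h[i][j - 1] + gap    # b[j-1] aligned to a gap in a
            h[i][j] = max(0, diag, up, left)  # 0 lets a new local alignment start here
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    return best, best_i, best_j


print(smith_waterman("TTACGTTT", "ACGT"))  # (8, 6, 4): ACGT ends at base 6 of the first sequence
```

The quadratic table is what makes Smith-Waterman expensive for millions of reads, which motivates both GPU acceleration and faster short-read aligners.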

http://pass.cribi.unipd.it

Mate pair signatures

Mate pair libraries

CRIBI approach
Step 1: use of insert-length statistics for the identification of structural variations.
Step 2: use of sequence alignment (splice-like alignment) for the identification of the precise points of insertion/deletion.

First step: 4 indexes are created:
1. Uniquely mapped mate pairs (MP), right distance, right orientation: useful for short indels (difference between observed and expected insert size, after filtering low coverage).
2. Uniquely mapped MP, wrong distance, right orientation: useful for long deletions (physical coverage).
3. Uniquely mapped MP, wrong orientation: useful for inversions (physical coverage).
4. Unique reads lacking the partner: useful for long insertions (number of reads).
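The four indexes amount to classifying each uniquely mapped mate pair. A minimal sketch follows. It assumes the library's expected insert size is summarized by a mean and standard deviation, and that a pair counts as "wrong distance" when it deviates by more than k = 3 standard deviations. The threshold, the `MatePair` fields, the category names and the 3 kb example library are hypothetical, not CRIBI's actual code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record and helpers illustrating the four mate-pair indexes.


@dataclass
class MatePair:
    """A uniquely mapped mate pair (hypothetical minimal record)."""
    partner_mapped: bool
    orientation_ok: bool
    insert_size: Optional[int] = None  # None when the partner is unmapped


def classify(mp: MatePair, mean: float, sd: float, k: float = 3.0) -> str:
    """Assign a mate pair to one of the four indexes described above."""
    if not mp.partner_mapped:
        return "orphan"             # long insertions (number of reads)
    if not mp.orientation_ok:
        return "wrong_orientation"  # inversions (physical coverage)
    if abs(mp.insert_size - mean) > k * sd:
        return "wrong_distance"     # long deletions (physical coverage)
    return "concordant"             # short indels (observed vs expected insert size)


def mean_shift(insert_sizes: list[int], mean: float) -> float:
    """Observed minus expected insert size of concordant pairs spanning a locus.

    A positive shift suggests a deletion in the sample, a negative one an insertion.
    """
    return sum(insert_sizes) / len(insert_sizes) - mean


pairs = [MatePair(True, True, 3050), MatePair(True, True, 9000),
         MatePair(True, False, 3000), MatePair(False, True)]
for p in pairs:
    print(classify(p, mean=3000.0, sd=300.0))
```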

Second step: a structural variation aligns like a splice site. Reads that cover a breakpoint can be split-aligned like spliced reads, showing an alignment pattern compatible with that specific structural variation. By analysing these patterns, it is possible to locate the breakpoint with single-base precision.
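The splice-like idea can be illustrated for the simplest case, a deletion. A read crossing the breakpoint is split into a left part and a right part that map to the reference with a gap between them, and the gap gives the deleted interval at single-base resolution. This is a minimal exact-match sketch, assuming error-free reads and a repeat-free reference; `find_deletion` is a hypothetical name, and this is not PASS's split-alignment algorithm. When the sequence flanking the breakpoint is repeated (microhomology), several equivalent breakpoints exist, and this sketch reports the first one found when scanning split points from left to right.

```python
from typing import Optional

# Hypothetical exact-match split-read sketch for deletions.


def find_deletion(ref: str, read: str, min_anchor: int = 5) -> Optional[tuple[int, int]]:
    """Split-align a read as left part + gap + right part; return (breakpoint, deleted length)."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        pos = ref.find(left)
        while pos != -1:
            bp = pos + split               # first reference base missing from the read
            nxt = ref.find(right, bp + 1)  # right part must map downstream, leaving a gap
            if nxt != -1:
                return bp, nxt - bp
            pos = ref.find(left, pos + 1)
    return None


ref = "GATTACAGCCTGAAGTCCTTAGCGCATCGATCAGGT"
read = ref[4:12] + ref[20:30]   # simulated read spanning a deletion of ref[12:20]
print(find_deletion(ref, read))  # (12, 8)
```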

Long deletions

Short deletions

Long insertions

Inversions and more

Results on random SVs: a random genome with structural variations randomly added (type, position, length, hetero/homozygosity) and SNPs randomly added (type, position, nucleotide).
SV type \ Coverage: 5X, 10X, 20X, 40X
Deletions: 58%, 59%, 72%, 90%
Insertions: 43%, 76%, 78%, 85%
Inversions: 56%, 58%, 74%, 88%

How can we make sense of the data?

Advanced query platform

Acknowledgements. Genomics and Bioinformatics: Stefano Campanaro, Alessandro Vezzi, Michela D'Angelo, Rosanna Zimbello, Riccardo Schiavon, Chiara Rigobello, Elisa Corteggiani Carpinelli, Riccardo Rosselli, Fabio De Pascale, Nicola Vitulo, Davide Campagna, Erika Feltrin, Claudio Forcato, Alessandro Albiero, Elisa Caniato, Alessandro Maccagnan, Gianpiero Zamperin, Andrea Telatin, Georgine Faulkner, Rusha Guha, Lisa Marchioretto