Application of high-throughput genomic and transcriptomic sequencing at Genoscope. Jean-Marc Aury

Introduction
- Presentation of Genoscope and NGS activities
- Overview of sequencing technologies
- Sequencing and assembly of prokaryotic and eukaryotic genomes
- Annotating genomes using massive-scale RNA-Seq
- Future projects, applications and Next-Next Generation Sequencing

Genoscope (National Sequencing Center)
http://www.genoscope.cns.fr
- One of the largest sequencing centers in Europe
- Part of the CEA Institut de Génomique since 05/2007
- Provides high-throughput sequencing data to the French academic community and carries out in-house genomic projects
- Involved in large genome projects: human genome project, Arabidopsis, rice, ...
- Coordination of large genome projects: Tetraodon, Paramecium, Vitis, Oikopleura, as well as fungal genomes (Botrytis, Tuber) and prokaryotic genomes

Genoscope (National Sequencing Center): NGS activities
http://www.genoscope.cns.fr
- Sequencing of prokaryotic genomes (2007)
- RNA-Seq / annotation of eukaryotic genomes (2008)
- SNP calling: identification of mutations (2008)
- Metagenomic projects (2008)
- Sequencing of large eukaryotic genomes (2009/2010)
- ChIP-Seq, detection of structural variations, ... (2009/2010)

Genoscope (National Sequencing Center): sequencing capacity
http://www.genoscope.cns.fr
- 19 ABI 3730
- 4 454/Roche Titanium
- 2 Illumina GA2x
- 3 Illumina HiSeq 2000
- 1 SOLiD v3

Genoscope (National Sequencing Center)
Access to the Genoscope sequencing capacity is by call for tender. Stages of a project:
- Project
- Sequencing
- Assembly / finishing
- Prokaryotic genome annotation (MAGE)
- Eukaryotic genome annotation (GAZE)

Sequencing technologies
- Applied Biosystems ABI 3730XL
- Roche / 454 Genome Sequencer FLX
- Illumina / Solexa Genome Analyzer
- Applied Biosystems SOLiD
What's different:
- Quantity and types of data
- Quality of data

Sequencing technologies: 454 / Roche Genome Sequencer FLX
Year | Model | Yield per run | Read length
2006 | GS20 | 20 Mb | 100 bp
2007 | GS FLX Standard | 100 Mb | 250 bp
2009 | GS FLX Titanium | 500 Mb | 500 bp
Current version (GS FLX Titanium):
- Majority of 500 bp reads
- Around 1,000,000 reads and 500 Mbp per run
- Run duration: 8 h
- High error rate in homopolymer sequences
- Good assemblies at 20X coverage
- No cloning biases

Sequencing technologies: 454 / Roche Genome Sequencer FLX
- 1 fragment -> 1 bead
- 1 bead -> 1 read

Sequencing technologies: Illumina / Solexa Genome Analyzer
Platform evolution, 2007 to 2010:
Model | Yield per run | Read length | Run time | Libraries
GA 1 | 1 Gb | 32 bp | - | -
GA II | 8 Gb | 50 bp | - | Paired reads
GA IIx | 90 Gb | 150 bp | 14 days | Paired reads and mate pairs
HiSeq 2000 | 200 Gb | 100 bp | 8 days | Paired reads and mate pairs
Current version (Genome Analyzer IIx):
- Reads of 108 bp
- Around 640M reads and 70 Gbp per run
- 10 days per run
- Very low error rate (70% of perfect reads)
- No indels => good complementarity with the 454 technology
- No coverage gaps
- Price

Sequencing technologies: Illumina / Solexa Genome Analyzer

Sequencing technologies: 454 / Roche Genome Sequencer FLX
¼ run on Acinetobacter baylyi (3.5 Mb):
- ~300,000 reads, cumulative size of 130 Mb
- 99.9% of reads aligned
- Average error rate: 0.55% (37% deletions, 53% insertions, 10% substitutions)
- Errors accumulate around homopolymers => the error rate is not constant

Sequencing technologies: Illumina / Solexa Genome Analyzer
1 lane on Acinetobacter baylyi (3.5 Mb):
- 11.4M reads, cumulative size of 900 Mb
- 98.5% of reads aligned
- Average error rate: 0.38% (3% deletions, 2% insertions, 95% substitutions)
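
To put the two Acinetobacter baylyi test runs on the same scale, the cumulative sizes can be converted into an average sequencing depth (total sequenced bases divided by genome size). The short Python sketch below only redoes that arithmetic with the figures quoted on the slides; the function name `coverage_depth` is illustrative and not part of any Genoscope tool.

```python
def coverage_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted on the slides (Acinetobacter baylyi, ~3.5 Mb genome).
GENOME_SIZE = 3.5e6
runs = {
    "454, 1/4 run": 130e6,      # cumulative size of 130 Mb
    "Illumina, 1 lane": 900e6,  # cumulative size of 900 Mb
}

for name, bases in runs.items():
    print(f"{name}: ~{coverage_depth(bases, GENOME_SIZE):.0f}X")
```

With these inputs the quarter 454 run comes out at roughly 37X and the single Illumina lane at roughly 257X, before discarding unaligned reads.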

Sequencing technologies

Sequencing of prokaryotic genomes

Whole genome shotgun sequencing

Prokaryotic genome sequencing: 454 / Roche Genome Sequencer FLX
Required genome coverage:

Prokaryotic genome sequencing
Metric | Sanger | Unpaired 454 | Unpaired + PE 454
Coverage | 7.4X | 20X | 25X
Assembler | Arachne (Broad Institute) | Newbler (454/Roche) | Newbler (454/Roche)
# of contigs | 173 | 119 | 119
Contigs N50 (Kb) | 39.0 | 48.7 | 58.2
# of scaffolds | 2 | 119 | 10
Scaffolds N50 (Kb) | 2,200 | 48.7 | 1,000
Assembly size (% of reference) | 3.417 Mb (95%) | 3.542 Mb (98%) | 3.544 Mb (98%)
Mis-assemblies | 0 | 0 | 0
# of errors | 3,442 | 420 | 431
Substitutions | 2,494 | 67 | 75
Insertions / Deletions | 948 | 353 | 356
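
The contig and scaffold N50 values in this comparison summarise contiguity: the N50 is the length L such that sequences of length L or more cover at least half of the assembly. A minimal Python sketch of the standard definition follows; the contig lengths are made up for illustration and are not the lengths from the assemblies reported here.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    Sort lengths from longest to shortest and walk down the list until
    the running total reaches half of the assembly size; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in Kb (total 300 Kb).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half of 300, so N50 = 70
```

A higher N50 for the same assembly size means fewer, longer pieces, which is why adding paired-end 454 reads raises the scaffold N50 so sharply in the comparison.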

Prokaryotic genome sequencing
Metric | Sanger | Unpaired + PE 454 | Unpaired + paired 454 with Illumina / Solexa GA1
Coverage | 7.4X | 25X | 25X and 50X
Assembler | Arachne (Broad Institute) | Newbler (454/Roche) | Newbler (454/Roche)
# of contigs | 173 | 119 | 119
Contigs N50 (Kb) | 39.0 | 58.2 | 58.2
# of scaffolds | 2 | 10 | 10
Scaffolds N50 (Kb) | 2,200 | 1,000 | 1,000
Assembly size (% of reference) | 3.417 Mb (95%) | 3.544 Mb (98%) | 3.544 Mb (98%)
Mis-assemblies | 0 | 0 | 0
# of errors | 3,442 | 431 (1 error / 8 Kb) | 163 (1 error / 22 Kb)
Substitutions | 2,494 | 75 | 71
Insertions / Deletions | 948 | 356 | 92

Microbial genome sequencing evolution
- Propose a strategy to sequence prokaryotic genomes that accounts for assembly quality and cost
- Mix 454 and Illumina technologies to obtain high-quality drafts (454 provides read length, Illumina a low error rate)
Timeline, 2002 to 2009 (strategies in order of appearance):
- Sanger (12X): ~40 projects
- 454 (15X) + Sanger (4X): ~20 projects
- 454 (20X) + Sanger (1X) + Solexa (~50X): ~15 projects
- 454 (20X) + Solexa (~50X): 1 project
Assemblers over the same period: Phrap, Arachne, Newbler

Prokaryotic genome sequencing
Metric | Sanger | Unpaired + paired 454 + Illumina GAI | Illumina 76 bp + Illumina MP 10 Kb 52 bp | Illumina 76 bp + Illumina MP 10 Kb 52 bp
Coverage | 7.4X | 25X and 50X | 100X and 50X | 100X and 50X
Assembler | Arachne (Broad Institute) | Newbler (454/Roche) | Soap (BGI) | Velvet (EBI)
# of contigs | 173 | 53 | 123 | 37
Contigs N50 (Kb) | 39.0 | 139.0 | 59.7 | 251.1
# of scaffolds | 2 | 2 | 58 | 1
Scaffolds N50 (Kb) | 2,200 | 3,600 | 3,183 | 3,601
Assembly size (% of reference) | 3.417 Mb (95%) | 3.552 Mb (98%), 58K N | 3.523 Mb (98%), 457K N | 3.606 Mb (100%), 11K N
Mis-assemblies | 0 | 0 | 0 | 0
# of errors | 3,442 | <127 (1 error / 28 Kb) | 20 (1 error / 175 Kb) | 170 (1 error / 17 Kb)
Substitutions | 2,494 | 43 | 19 | 163
Insertions / Deletions | 948 | 84 | 1 | 7

Eukaryotic genome sequencing evolution
- Extend the prokaryotic strategy to eukaryotic genomes
- Sanger sequencing is still used to sequence long DNA fragments: >20 Kb, BAC ends, ...
Timeline, 2002 to 2009:
- Sanger (10X): >12 projects (Tetraodon, Paramecium, Vitis, Meloidogyne, Tuber, Blastocystis, Chondrus, Podospora, Oikopleura, Ectocarpus)
- 454 (~15X) + Sanger BAC ends + Solexa (~50X): 9 projects (cocoa, banana, trypanosome, citrus, Clytia, Adineta, coffee, trout, colza)
Assemblers over the same period: Arachne, then Newbler

Annotating genomes using RNA-Seq

Annotating genomes using RNA-Seq
Goal: annotate eukaryotic genomes using transcriptomic data from ultra-high-throughput sequencers (Illumina and SOLiD).
Difficulties:
- Predicting complete gene structures with 40 bp reads
- Aligning short reads to exon/exon junctions (mapping algorithms allow only a limited number of gaps in alignments)
Reference: Molecular biology: Power sequencing. Brenton R. Graveley. Nature 453, 1197-1198 (26 June 2008)

Typical RNA-Seq experiments

Annotating genomes using RNA-Seq
Step 1. Covtig construction
Reads are mapped to the genome, and regions where the coverage depth exceeds a threshold are kept as covtigs.
[Figure: mapped reads, genome coverage depth, threshold, covtigs]
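
Step 1 can be illustrated with a small Python sketch: compute a per-base depth from read alignments, then keep maximal runs of positions whose depth reaches a threshold. This is an illustration of the idea only, not the G-Mo.R-Se implementation; the function names, the `(start, length)` read representation and the threshold values are assumptions made for the example.

```python
def depth_from_reads(reads, genome_length):
    """Per-base coverage depth from (start, length) alignments (0-based)."""
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def build_covtigs(depth, threshold=1):
    """Return half-open (start, end) intervals where depth >= threshold."""
    covtigs = []
    start = None
    for pos, d in enumerate(depth):
        if d >= threshold and start is None:
            start = pos                   # a covered region opens here
        elif d < threshold and start is not None:
            covtigs.append((start, pos))  # region closes just before pos
            start = None
    if start is not None:                 # region runs to the end
        covtigs.append((start, len(depth)))
    return covtigs


# Hypothetical alignments on a 20-base toy genome.
toy_reads = [(2, 5), (4, 5), (15, 4)]
toy_depth = depth_from_reads(toy_reads, 20)
print(build_covtigs(toy_depth, threshold=1))  # [(2, 9), (15, 19)]
print(build_covtigs(toy_depth, threshold=2))  # [(4, 7)]
```

Raising the threshold trims weakly covered ends and drops low-expression regions, so the choice trades sensitivity against noise.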

Annotating genomes using RNA-Seq
Inaccurate detection of splicing junctions
Covtig:
GGTGTTCACTACTTAGCCATGAAGATCTAGATTTCACACTTTTAGAAGCCTTAGAAAGCTG...
Mapped reads:
TGAAGATCTAGATTTCACACTTTTAGAAGCCTT (x10)
TGAAGATCTAGATTTCAAACTTTTAGAAGCCTT
TGAAGATCTAGATTTCACACTTTTAGAAGCCT (x5)
CTAGATTTCACACTTTTAGAAGCCTTAGAAAGC (x2)
TCTAGATTTCACACTTTTAGAAGCCTTAGAAAG
CTAGATTTCACACTTTTAGAAGCCTTATAAAG (x2)
ATCTAGATTTCACACTTTTAGAAGCCTTAGAA (x2)
CTAGATTTCACACTTTTAGAAGCCTTAGAAAG
GACACCATGAAGATCTAGATTTCACACTTTTAG
CACCAACACCATGAAGATCTAGATTTCACACTT (x2)
Unmapped reads:
CGACACCATGAAGATCTAGATTTCACACTTTTA
CCAGCACCCACCAACACCATGAAGATCTAGATT
CACCAACACCATGAAGATCTAGATTTCACACTT (x2)
GGTGCACCCACCAACACCATGAAGATCTAGATT
CAACACCATGAAGATCTAGATTTCACACTTTTA
CCAACACCATGAAGATCTAGATTTCACACTTTT
Improvement: extension of covtigs using unmapped reads

Step 2. Extraction of candidate exons
[Figure: a covtig with 100 nt windows, ag (acceptor) and gt (donor) splice sites, and the resulting forward and reverse candidate exons]

Annotating genomes using RNA-Seq
Step 3. Validation of exon/exon junctions
Junctions between candidate exons are validated using a word dictionary built from the unmapped reads:
- Build a dictionary of words (k-mer 1, k-mer 2, ..., k-mer n) from the unmapped reads
- For each pair of candidate exons (donor gt, acceptor ag), build the words that span the junction
- Verify that these words exist in the dictionary; if so, the junction is validated
Example:
covtig1 ...ggtgttcactacttacccatgt...agatctacacacttttagaagcctgaaag... covtig2
Junction k-mers (end of covtig1 + start of covtig2):
TTACCCAT + ATCTACACACTTTTAGA
CTTACCCAT + ATCTACACACTTTTAG
ACTTACCCAT + ATCTACACACTTTTA
TACTTACCCAT + ATCTACACACTTTT
CTACTTACCCAT + ATCTACACACTTT
ACTACTTACCCAT + ATCTACACACTT
CACTACTTACCCAT + ATCTACACACT
TCACTACTTACCCAT + ATCTACACAC
TTCACTACTTACCCAT + ATCTACACA
GTTCACTACTTACCCAT + ATCTACAC
Unmapped reads:
TGTTCACTACTTACCCATATCTACACACTTTTAGAA
TCACTACTTACCCATATCTACACACTTTTAGAAGCC
GTTCACTACTTACCCATATCTACACACTTTTAGAAG
TTCACTACTTACCCATATCTACACACTTTTAGAAGC
TGTTCACTACTTACCCATATCTACACACTTTTAGAA
GTTCACTACTTACCCATATCTACACACTTTTAGAAG
GTGTTCACTACTTACCCATATCTACACACTTTTAGA
...
Creation of the dictionary (words extracted from the unmapped reads):
TGTTCACTACTTACCCATATCTACA
GTTCACTACTTACCCATATCTACAC
TTCACTACTTACCCATATCTACACA
TCACTACTTACCCATATCTACACAC
CACTACTTACCCATATCTACACACT
...
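
The junction validation of Step 3 can be sketched in a few lines of Python: index every k-mer of the unmapped reads in a set, build the k-mers that straddle a candidate junction, and accept the junction if enough of them are found. This is a simplified illustration, not the G-Mo.R-Se code. The word length `k=25` and the minimum overhang of 8 bases are read off the example on the slide; `min_hits` and all function names are assumptions.

```python
def kmer_dictionary(reads, k):
    """Set of all k-mers found in the (unmapped) reads."""
    words = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            words.add(read[i:i + k])
    return words


def junction_words(left_exon, right_exon, k, min_overhang=8):
    """k-mers spanning the junction, with >= min_overhang bases per side."""
    words = []
    for left_len in range(min_overhang, k - min_overhang + 1):
        right_len = k - left_len
        if left_len > len(left_exon) or right_len > len(right_exon):
            continue
        words.append(left_exon[-left_len:] + right_exon[:right_len])
    return words


def junction_is_validated(left_exon, right_exon, dictionary, k, min_hits=1):
    """True if at least min_hits junction-spanning k-mers are in the dictionary."""
    hits = sum(w in dictionary
               for w in junction_words(left_exon, right_exon, k))
    return hits >= min_hits


# Sequences taken from the slide: end of covtig1 (before gt) and
# start of covtig2 (after ag), plus two of the unmapped reads.
left = "GGTGTTCACTACTTACCCAT"
right = "ATCTACACACTTTTAGAAGCCTGAAAG"
unmapped = [
    "TGTTCACTACTTACCCATATCTACACACTTTTAGAA",
    "TCACTACTTACCCATATCTACACACTTTTAGAAGCC",
]
dictionary = kmer_dictionary(unmapped, k=25)
print(junction_is_validated(left, right, dictionary, k=25))  # True
```

Because lookups are exact set membership tests, the cost per junction is independent of the number of unmapped reads once the dictionary is built, which is what makes testing many candidate exon pairs affordable.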

Annotating genomes using RNA-Seq
Step 4. Graph of candidate exons linked by validated junctions
Step 5. Model construction and coding sequence detection (open reading frame)
[Figure: G-Mo.R-Se models M1 to M7 compared with real transcripts T1 to T5]

Annotating genomes using RNA-Seq
Method set up to annotate the Vitis genome (500 Mb):
- Around 175 million Illumina reads
- 4 tissues: leaf, root, stem and callus
- 140 million uniquely aligned reads (73.5 Mb)
- Around 380,000 covtigs (38.5 Mb)
- 46,062 transcript models (19,486 loci), of which 28,399 have a plausible CDS (12,341 loci)
- Around one week of computation on a desktop computer

Annotating genomes using RNA-Seq

Annotating genomes using RNA-Seq
- G-Mo.R-Se (Gene MOdeling using Rna-Seq) is downloadable from the Genoscope website: http://www.genoscope.cns.fr/gmorse
- Used with Illumina data, but it can easily be adapted to handle SOLiD data (in colorspace)
- Method used to annotate the whole Vitis genome
[Figure: example of a novel gene identified by G-Mo.R-Se]

Capture and sequencing for mutation discovery
- Laboratoire de Ressources Génomique: Gabòr Gyapay
- Laboratoire de Séquençage: Patrick Wincker
- Laboratoire d'Analyse Bioinformatique des Séquences: François Artiguenave, Vincent Meyer, Marc Wessner, Benjamin Noel

Capture and sequencing for mutation discovery
Goal: detection of mutations (SNPs) in large genomes (typically the human genome), with
- parallel processing of several samples
- reasonable cost
Principle:
- Target regions of the human genome totalling several megabases
- Amplify these targeted regions
- High-throughput sequencing
- Nimblegen chips and 454 sequencing
For which projects?
- Rare genetic diseases
- Cancer research

Capture and sequencing for mutation discovery Digital Light Processing technology

Capture and sequencing for mutation discovery

Capture and sequencing for mutation discovery
Pilot project:
- 1,251 genes, 13,315 exons, 4 Mb cumulative
- 8 samples: 4 tumor samples and the 4 corresponding normal samples, with 1 GS FLX run per sample (~100 Mb)
- 13,315 selected regions: 3.97 Mb (average size of 300 bp)
- Nimblegen design: 13,944 targeted regions, 5.6 Mb (average size of 400 bp)
[Figure: selected regions versus targeted regions]

Mutation detection platform
Alignment of sequencing data with the human genome

Capture and sequencing for mutation discovery
- Fragment size: 300-700 bp
- Sequencing of ~225 bp
[Figure: targeted region versus sequenced region]

Capture and sequencing for mutation discovery
- Detection of around 20,000 SNPs
- Extraction of 50 SNPs after selection and classification
Criteria:
- Quality of the predicted SNP (sequencing depth and quality values)
- Localization of the predicted SNP
- Comparison between samples and with known SNPs
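
The selection criteria listed above can be pictured as a chain of simple filters. The Python sketch below is purely illustrative: the slides do not give the actual thresholds or rules, so the `Snp` record, the `min_depth` and `min_quality` defaults, and the reading of "localization" as "inside a targeted region" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    alt: str
    depth: int        # sequencing depth at the position
    quality: float    # quality value of the call
    in_target: bool   # assumed meaning of "localization": inside a target


def select_candidates(tumor, normal, known, min_depth=10, min_quality=20.0):
    """Keep tumor SNPs that pass quality, location and comparison filters.

    Thresholds are hypothetical placeholders, not values from the slides.
    """
    normal_keys = {(s.chrom, s.pos, s.alt) for s in normal}
    kept = []
    for s in tumor:
        key = (s.chrom, s.pos, s.alt)
        if s.depth < min_depth or s.quality < min_quality:
            continue  # criterion 1: depth and quality values
        if not s.in_target:
            continue  # criterion 2: localization
        if key in normal_keys or key in known:
            continue  # criterion 3: seen in matched normal or already known
        kept.append(s)
    return kept


# Toy data: only the first tumor SNP survives all three filters.
tumor = [
    Snp("chr1", 100, "A", 30, 40.0, True),
    Snp("chr1", 200, "T", 5, 40.0, True),    # too shallow
    Snp("chr2", 300, "G", 25, 35.0, True),   # also in the normal sample
    Snp("chr3", 400, "C", 50, 50.0, True),   # already a known SNP
    Snp("chr4", 500, "A", 40, 45.0, False),  # outside targeted regions
]
normal = [Snp("chr2", 300, "G", 20, 30.0, True)]
known = {("chr3", 400, "C")}
print([s.pos for s in select_candidates(tumor, normal, known)])  # [100]
```

Successive filters of this kind are how a raw call set of around 20,000 SNPs can shrink to a few dozen candidates worth manual review.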

Current and future projects/applications
- Diversification of applications and projects: de novo sequencing, genome annotation, re-sequencing, metagenomics, functional genomics, identification of mutations and structural variations
- Increase in the number and size of projects

Current and future projects
De novo sequencing:
- Assembly of prokaryotic genomes with Illumina sequencing only
- Sequencing of large eukaryotic genomes with 454 and Sanger: banana (~500 Mb; WGS with 20X 454 + 4X Sanger), cocoa (~400 Mb; WGS with 20X 454 + Sanger BAC ends), trout (~2 Gb; WGS with 454 and Sanger BAC ends), wheat chromosome 3B (~1 Gb; 454)
- Benchmarking assembly of Illumina data for large eukaryotic genomes: banana, cocoa, ...
Re-sequencing:
- 100 Arabidopsis genomes: transposon mobility and methylation

Current and future projects: Tara Oceans Project
- Eukaryotic metagenomics and metatranscriptomics
- 3-year expedition with regular sampling at different depths and for different cell size fractions
- Pilot project in progress: 1 sampling station, 3 depths, 4 cell size fractions, 2 sequencing technologies (454 and Illumina)
- Sequencing of DNA, total RNA, messenger RNA and tags (V4 and V9 18S)
- Establish a collection of reference genome sequences (culture and single cell) and a gene catalogue

Structural variation
Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome. Jan O. Korbel et al. Science. Oct 2007
- Two individuals: NA15510 and NA18505
- Recircularization of large DNA fragments (3 Kb) and 454 sequencing

Structural variation
Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome. Jan O. Korbel et al. Science. Oct 2007
- NA15510: 10M reads (2.1X); NA18505: 21M reads (4.3X)
- Identification of 1,175 SV indels and 122 inversions (45% of SVs shared)
- Distributed across the whole genome, with a few hotspots

Targeted RNA-Seq
Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Levin JZ et al. Genome Biology. Oct. 2009
- Combines RNA-Seq and sequence capture (in-solution capture: Agilent)
- Tested on 467 genes (protein tyrosine kinase, nuclear hormone-receptor, Cancer Gene Census) from a cDNA library derived from a tumor (K-562)
- RNA-Seq is still too expensive when the aim is to detect rare events (structural rearrangements, aberrant transcripts, ...) and/or when working on weakly expressed genes

Targeted RNA-Seq
Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Levin JZ et al. Genome Biology. Oct. 2009
- In addition to BCR-ABL1 and NUP214-XKR3 (also detected without capture), 4 other fusions were detected
- In every case, only one of the two genes was targeted

Next-Next Generation Sequencing

Next-Next Generation Sequencing
- Several different technologies are available today
- They are flow-based methods: cycles of incorporation, reading, new incorporation
- They require amplification of the starting DNA
- Arrival of "single-molecule, real-time" sequencing

Next-Next Generation Sequencing: Heliscope (Helicos Biosciences)
Single-molecule, flow sequencing
Specifications:
- 30 Gb per run; reads of 25 to 55 bases; 600M reads
- 0.2% substitutions; 1.5% insertions and 3.0% deletions
- Trouble with homopolymers, but no phasing issue
Applications: aged DNA, expression level

Next-Next Generation Sequencing Heliscope Helicos Biosciences

Next-Next Generation Sequencing: Pacific Biosciences SMRT (Single Molecule Real Time)
The polymerase is attached to the bottom of a well about ten nanometres in diameter, with a volume of 20 zeptolitres (20 x 10^-21 litres). Within this volume, the activity of a single nucleotide can be detected against a background of several hundred fluorescent nucleotides.

Next-Next Generation Sequencing: Pacific Biosciences SMRT (Single Molecule Real Time)
- Speed of 4-5 bases per second and 150,000 wells per chip (36 Mb per hour), for reads > 1,000 bp
- 5-year forecast: 50 bases per second, with chips of 1M wells, for a throughput of about 100 Gb per hour

Next-Next Generation Sequencing: Pacific Biosciences SMRT (Single Molecule Real Time)
- Long reads are possible, but signal loss is a problem
- New concept: "strobe sequencing"
Template: ATGGTGTCCTGCATAGCATACAGTATGGTGGCATAGCATACAGTATGGTGTCCTGTCCTGCATAGCATACAGT
Laser: on, off, on
Read: ATGGTGTCCTGC..CAGTATGGTGTCC.

Next-Next Generation Sequencing

Next-Next Generation Sequencing

Next-Next Generation Sequencing
Here's how the technology is used to call a base: if a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by our proprietary ion sensor.

Next-Next Generation Sequencing
If there are two identical bases on the DNA strand, the voltage will be double, and the chip will record two identical bases called. Because this is direct detection (no scanning, no cameras, no light), each nucleotide incorporation is recorded in seconds.
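
The proportionality between signal and number of incorporated bases suggests a very simple base-calling rule: for each nucleotide flow, round the normalized signal to the nearest integer and emit that many copies of the base. The Python sketch below illustrates this under stated assumptions: the cyclic flow order `TACG`, the normalization of one incorporation to a signal of 1.0, and the example signal values are all hypothetical, and real base callers model noise and signal decay far more carefully.

```python
def call_bases(flow_order, signals):
    """Decode a flowgram: one signal per nucleotide flow.

    A signal of ~1.0 means one base incorporated, ~2.0 means a
    homopolymer of two identical bases, ~0.0 means no incorporation.
    """
    if len(flow_order) != len(signals):
        raise ValueError("one signal is required per flow")
    sequence = []
    for base, signal in zip(flow_order, signals):
        count = max(0, int(signal + 0.5))  # round half up, never negative
        sequence.append(base * count)
    return "".join(sequence)


# Hypothetical normalized signals for two cycles of the flow order TACG.
flows = "TACGTACG"
signals = [1.02, 0.05, 2.10, 0.00, 0.96, 0.10, 0.03, 1.90]
print(call_bases(flows, signals))  # TCCTGG
```

The rounding step also shows why long homopolymers are the weak point of flow-based chemistries: a fixed amount of noise matters more when deciding between 7 and 8 bases than between 1 and 2.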

Next-Next Generation Sequencing
- ~$50,000 per machine
- Run cost: $250 per run
- Read lengths: 100-200 bases
- Reads per run: 100,000
- Error rate: ~1%
- Portability and fast turnaround: suited to rapid tests close to the field
- Throughput insufficient for massive applications (e.g. human genome)

Next-Next Generation Sequencing: FRETing
FRET: Fluorescence Resonance Energy Transfer. Energy is transferred from a "quantum dot" attached to a polymerase to a nucleotide. When a nucleotide is incorporated, the proximity between the donor fluorophore on the polymerase and the acceptor fluorophore on the nucleotide triggers a FRET signal.

Next-Next Generation Sequencing Nanopore Sequencing

Next-Next Generation Sequencing Continuous base identification for single-molecule nanopore DNA sequencing. Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H. Nature Nanotechnology 2009 Apr;4(4):265-70.

Next-Next Generation Sequencing
Outlook: third-generation sequencers
Advantages of real-time, single-molecule sequencing:
- Speed: no sample preparation step
- No amplification (no bias, possible to work on old or degraded samples, ...)
- Long reads: easier assembly; haplotype resolution
- Cost probably much reduced

Conclusion
- Price cuts => burst of novel applications
- Still at the beginning; NGS changes every month:
  - Illumina announces 200 Gbases per run (8 days) with the HiSeq 2000
  - 454/Roche will provide Kb reads in 2011
- The next-next generation is coming: single-molecule sequencing, longer reads, shorter run times, ...
- Need to provide adequate IT infrastructure: production, storage and analysis

Conclusion Because of next-generation sequencing (NGS), the cost of sequencing a base is dropping faster than the cost of storing a byte. Scales are logarithmic and not corrected for inflation or costs of personnel, overheads and depreciation. Image is reprinted from Stein, L.D. Genome Biol. 11, 207 (2010).

Thank you
- Laboratoire d'Analyse Bioinformatique des Séquences: François Artiguenave, Jean-Marc Aury, Christophe Battail, Maria Bernard, France Denoeud, Kamel Jabbari, Olivier Jaillon, Hamid Khalili, Amine Madoui, Vincent Meyer, Benjamin Noel, Corinne da Silva, Marc Wessner
- Laboratoire de séquençage: Patrick Wincker, Corinne Cruaud, Julie Poulain, Adriana Alberti, Karine Labadie
- Laboratoire finishing: Valérie Barbe, Sophie Mangenot
- Laboratoire d'informatique: Claude Scarpelli, Veronique Anthouard, Arnaud Couloux, Frederic Gavory, Eric Pelletier, Gaelle Samson