NGS Technologies for Genomics and Transcriptomics



Similar documents
Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

NGS data analysis. Bernardo J. Clavijo

Next Generation Sequencing

July 7th 2009 DNA sequencing

Introduction to next-generation sequencing data

Nazneen Aziz, PhD. Director, Molecular Medicine Transformation Program Office

Introduction to NGS data analysis

Next generation DNA sequencing technologies. theory & prac-ce

Next Generation Sequencing: Technology, Mapping, and Analysis

Computational Genomics. Next generation sequencing (NGS)

MiSeq: Imaging and Base Calling

Putting Genomes in the Cloud with WOS TM. ddn.com. DDN Whitepaper. Making data sharing faster, easier and more scalable

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

Microbial Oceanomics using High-Throughput DNA Sequencing

History of DNA Sequencing & Current Applications

Automated and Scalable Data Management System for Genome Sequencing Data

Overview of Next Generation Sequencing platform technologies

SEQUENCING. From Sample to Sequence-Ready

14/12/2012. HLA typing - problem #1. Applications for NGS. HLA typing - problem #1 HLA typing - problem #2

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

How long is long enough?

The NGS IT notes. George Magklaras PhD RHCE

OpenCB a next generation big data analytics and visualisation platform for the Omics revolution

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Analysis of NGS Data

RNAseq / ChipSeq / Methylseq and personalized genomics

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille

Next Generation Sequencing for DUMMIES

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA

Removing Sequential Bottlenecks in Analysis of Next-Generation Sequencing Data

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

SMRT Analysis v2.2.0 Overview. 1. SMRT Analysis v SMRT Analysis v2.2.0 Overview. Notes:

PAGANTEC: OPENMP PARALLEL ERROR CORRECTION FOR NEXT-GENERATION SEQUENCING DATA

Next Generation Sequencing data Analysis at Genoscope. Jean-Marc Aury

How Sequencing Experiments Fail

Keeping up with DNA technologies

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence

NGS and complex genetics

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production

DNA Sequencing & The Human Genome Project

Genomic Testing: Actionability, Validation, and Standard of Lab Reports

Bacterial Next Generation Sequencing - nur mehr Daten oder auch mehr Wissen? Dag Harmsen Univ. Münster, Germany dharmsen@uni-muenster.

Intro to Bioinformatics

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

DNA Sequencing and Personalised Medicine

Data Processing of Nextera Mate Pair Reads on Illumina Sequencing Platforms

Worldwide Collaborations in Molecular Profiling

Illumina Sequencing Technology

Concepts and methods in sequencing and genome assembly

-> Integration of MAPHiTS in Galaxy

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

Managing and Conducting Biomedical Research on the Cloud Prasad Patil

NEXT GENERATION SEQUENCING

Searching Nucleotide Databases

Data Analysis & Management of High-throughput Sequencing Data. Quoclinh Nguyen Research Informatics Genomics Core / Medical Research Institute

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage

A Primer of Genome Science THIRD

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Data Analysis for Ion Torrent Sequencing

Big Data Challenges. technology basics for data scientists. Spring Jordi Torres, UPC - BSC

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

G E N OM I C S S E RV I C ES

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

OpenCB development - A Big Data analytics and visualisation platform for the Omics revolution

HADOOP IN THE LIFE SCIENCES:

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Typing in the NGS era: The way forward!

Integrated Rule-based Data Management System for Genome Sequencing Data

Analysis of VDI Storage Performance During Bootstorm

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Next generation sequencing (NGS)

Personalized Medicine and IT

MoBEDAC -- Integrated data and analysis for the indoor and built environment. Folker Meyer Argonne National Laboratory GSC 13 Shenzhen, China

Scaling from 1 PC to a super computer using Mascot

GC3 Use cases for the Cloud

OpenPower: IBM s Strategy for Best of Breed 64-bit Linux

4. GS Reporter Application 65

SAP HANA Enabling Genome Analysis

Analysis of ChIP-seq data in Galaxy

Introduction Bioo Scientific

Transcription:

NGS Technologies for Genomics and Transcriptomics Massimo Delledonne Department of Biotechnologies - University of Verona http://profs.sci.univr.it/delledonne 13 years and $3 billion required for the Human Genome Project's reference genome 1

Clone-by-clone shotgun sequencing Whole-genome shotgun sequencing 3 The $300,000,000 Celera effort was intended to proceed at a faster pace and at a fraction of the cost.. 4 2

Sanger sequencing (invented in the early 1970s) 96 samples 800 nt / sample 1.600.000 nt / day Human genome: 3.000.000.000 bp Minimum coverage for an accurate analysis: 8X = 24.000.0000.000 nt 24.000.000.000 1.600.000 = 15.000 days! No Generation Sequencing 3

New Generation Sequencing! GS20 (2005) 100 bp x 200.000 Reads = 20 Mbp x5 GS FLX (2007) 200 bp x 400.000 Reads = 100 Mbp x5 GS FLX Titanium (2008) 400 bp x 1 M. Reads = 500 Mbp 454 Revolution GS20 (2005) 100 bp x 200.000 Reads = 20 Mbp x5 GS FLX (2007) 200 bp x 400.000 Reads = 100 Mbp x5 GS FLX Titanium (2008) 400 bp x 1 M. Reads = 500 Mbp >00405_2045_2005 CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC ATC >00343_3489_2007 CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC ATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTAT GCATC >01384_3992_2008 CAGTCTCGTCGTCGTACGATCGTACGTAGCTTCACTTACGTACGCGGCG GGGCGCATCTGCGCGCGCGATTATATCATCATCATACTCAGCATCGTCC ATCGATCGATCGATGATCGACGATCGTAGTCTACGTAGTACGTAGCTAG CTTCGATCGATCGTACGTACGTCGTACGTAGTCAGACGTCAGCTACAGT CATCTACGTAGCTCTACGTCGTGCATGCTAGCTATCGATCACGACTTAT GCATCCTCGGTATTCGGCGTACGATCGCTGACTGCTCGATTTTCGATCG TACGTACCGTCAGCTAGCTAAAAAAAAAGCTGCGCGGTCAGCATCGTTG CATGCGCTACTGCTAGTACGTAGTAGTGCTACGTTTTATATCGTCGCCA AATCTCTTCGGCTTTCG 4

April 2008 2 months and 2.000.000 USD with 454 Life Sciences apoe unknown Now Generation Sequencing GSFLX 1.100.000 reads ( 800 nt) GSJunior 100.000 reads (400-500 nt) 5

Genome Analyzer II (x) End 2007: 30 million reads, 33 nucleotides each (1 Gbase) End 2008: 50 million reads, 50 nucleotides each (2.5 Gbases) 11 2010: 300 million single reads, 150 nucleotides each 600 million paired reads, 150 nucleotides each (90 Gbases) HiSeq 2000 HIGHEST OUTPUT 600 Gb per run HIGHEST NUMBER OF READS Three billion (100 or 2 x 100) reads 12 6

5500xl SOLiD System 7

8

9

The commercial PacBio RS instrument generates average read lengths of 1,000 bases, with a small fraction of reads longer than 10 kilobases, and have a typical run time of 30 minutes. The commercial instrument runs chips with 75,000 wells, or zero-mode waveguides. The commercial raw read accuracy is between 85 and 90 percent. September 27, 2011 With the C2 chemistry, the Expression Analysis team achieved average read lengths of 2,715 base pairs, with a mean maximum read length of 13,091 base pairs. "Our personal best is just over 14,000 bases," Hurban said. Despite the potential of PacBio's platform to produce very long reads, which would enable better assemblies, adoption of the instrument since its launch in the second quarter of this year has been slower than expected. As a result, PacBio last week announced that it would have to lay off around 130 employees 10

December, 2010 11

MiSeq January, 2011 23 Next Generation Sequencing Ion Proton Sequencer The Benchtop Genome Center January 10th, 2012 Supports Ion Proton I and Proton II chips 209,800 system list price (sequencer and server) $164,900 for Ion PGM owners State-of-the-art electronics to support highest throughput 24 12

January 10th, 2012 Resequencing vs. de-novo assembling Long reads sequencing Short reads sequencing 700-800 bases 35-150 bases Reference genome 26 13

Resequencing is good only if the reference genome is good Next Next (Next) Generation Sequencing Technologies for haplotyping and de-novo assembling 14

Biological nanopores http://www.nanoporetech.com/sequences 15

16

Current Status: alpha testing (beta testing to start by end of 2012) Projected commercial availability: 2013 Sequencing the Human Genome Human Genome Project James Watson Personal genome 1988 2001 13 years 3.000.000.000 dollars 2007 2 months 2.000.000 dollars 2010 7 days 10.000 dollars 17

18

Center for Functional Genomics Grid-ION-Proton? Illumina HiSeq 1000 Verona, 9 Febbraio 2007 PowerEdge TM R900 4 Intel Xeon 7450 6-core 128 Gb RAM Cluster HP DL380 G7 6 Xeon 5645 6-core 144 Gb RAM 19

Alberto Ferrarini Luciano Xumerle Luca Venturini Andrea Minio Plant Biology Mario Pezzotti (UniVR) Grant Cramer (USA) Medicine Giovanni Martinelli (UniBO) Paolo Garagnani (UniBO) Marina Noris (Mario Negri) Alberto Zamò (UniVR) Elisabetta Trabetti (UniVR) Domenico Girelli (UniVR) Giovanni Malerba Paola Tononi Gianpiero Zamperin Genny Buson Elisa Zago Anne-Marie Digby 20