History of DNA Sequencing & Current Applications
Christopher McLeod, President & CEO, 454 Life Sciences, A Roche Company

IMPORTANT NOTICE Intended Use Unless explicitly stated otherwise, all Roche Applied Science and 454 Sequencing products and services referenced in this presentation are intended for the following use: For Life Science Research Only. Not for Use in Diagnostic Procedures.
Sequencing James Watson's Genome: The first of the rest of us
- First whole human genome to be sequenced with next-generation technology
- 24.5 billion bases of genomic DNA sequence generated at the 454 Sequencing Center
- 3.6 million variants detected, including several disease susceptibility gene associations

              Jim Watson (454 Life Sciences, A Roche Company)    Human Genome Project (Sanger)
Time          2 months, 3 instruments                            10-13 years
Cost          <$1 million ($250,000 with Titanium)               $100 million - $2.7 billion
Coverage      7.4x                                               7.5x
Read length   250 bp (400 bp with Titanium)                      500-800 bp

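The coverage figure on this slide can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted above (24.5 billion bases, 7.4x coverage, 250 bp reads); the function names are hypothetical, and it assumes that coverage means total sequenced bases divided by genome size and that every read is exactly 250 bp long, which real runs are not.

```python
def implied_genome_size(total_bases, coverage):
    """Genome size implied by coverage = total_bases / genome_size."""
    return total_bases / coverage


def approx_read_count(total_bases, read_length):
    """Rough number of reads if every read had the same length (an assumption)."""
    return total_bases / read_length


# Figures quoted on the slide for the Watson genome project.
TOTAL_BASES = 24.5e9   # bases of genomic DNA sequence generated
COVERAGE = 7.4         # fold coverage
READ_LENGTH = 250      # bp per read (pre-Titanium)

genome_size = implied_genome_size(TOTAL_BASES, COVERAGE)
read_count = approx_read_count(TOTAL_BASES, READ_LENGTH)

print(f"Implied genome size: {genome_size / 1e9:.2f} billion bases")
print(f"Approximate read count: {read_count / 1e6:.0f} million reads")
```

The implied genome size of a little over 3 billion bases is consistent with the commonly cited size of the haploid human genome, so the slide's bases and coverage figures agree with each other.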
Genome Preparation: Sanger vs. 454 Sequencing Systems
Replaced by Bead Emulsion Technology
- Unique: one sample preparation per genome, any size genome
- Faster: hours instead of months
- Cheaper: no robotics or expensive infrastructure needed
- Improved data quality: less bias, no cloning or cloning artifacts

Genome Sequencing: Sanger vs. 454 Sequencing Systems
Genome Sequencer FLX System with PicoTiterPlate device technology
- Faster: 500 times the throughput
- Cheaper: 50 times cheaper
- Improved sensitivity: digital precision, detection of rare variants
[Figure label: Diameter of Human Hair]

The DNA Sequencing Revolution: Impact on nearly every field of biological research
- Human Genetics & Genomics
- Plants & Agriculture
- Microbes, Viruses & Infectious Diseases
- Environmental Genomics

Uncovering the Past and Present: Sequencing the Bonobo & Neanderthal Genomes
Bonobo Genome
- Complete sequence and assembly of the bonobo (pygmy chimpanzee) genome, a close living relative of humans
- The relationship between the bonobo and the common chimpanzee is analogous to the relationship between humans and Neanderthals
Neanderthal Genome
- Sequencing of ancient DNA extracted from bone fossils to compare the genome to those of modern humans and chimpanzees
- The goal is to identify areas of the genome where humans have undergone rapid evolution since the split from Neanderthals
- Initial nucleotide differences in mitochondrial DNA established a divergence date of 660,000 +/- 140,000 years
Green et al. (2008) Cell.

Microbes, Viruses & Infectious Disease: Sequencing to identify drug resistance in HIV
- HIV drug resistance is attributed to minority viral variants, which can lead to regimen failure
- Current methodologies, based on Sanger technology, can only detect variants present at >20% frequency
- A research study used 454 Sequencing Systems to detect rare drug-resistance variants in as little as 1% of the viral population
- Low-frequency mutations had a significant impact on clinical outcomes, i.e. early antiretroviral treatment failure
- The fraction of infected subjects harboring drug-resistant variants was twice as high as previously thought
- FDA guidance now requires a viral population profiling test before, during and after antiretroviral therapy in drug trials to identify drug resistance
[Figure label: HIV virus]
Simen et al. (2009) Journal of Infectious Diseases.

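Why read depth matters for the 1% versus 20% detection limits can be illustrated with a small probability calculation. This is a minimal sketch, not the method used in the cited study: it assumes each read samples the viral population independently (a binomial model), ignores sequencing error, and uses a hypothetical rule that a variant is called once at least 5 reads carry it.

```python
from math import comb


def prob_at_least(k, depth, freq):
    """P(X >= k) where X ~ Binomial(depth, freq): the chance that at least
    k of `depth` reads carry a variant present at frequency `freq`."""
    p_fewer = sum(
        comb(depth, i) * freq**i * (1 - freq) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


MIN_VARIANT_READS = 5  # hypothetical calling threshold, not from the slide

for depth in (100, 500, 1000, 5000):
    p_rare = prob_at_least(MIN_VARIANT_READS, depth, 0.01)    # 1% variant
    p_common = prob_at_least(MIN_VARIANT_READS, depth, 0.20)  # 20% variant
    print(f"depth {depth:>5}: P(detect 1%) = {p_rare:.3f}, "
          f"P(detect 20%) = {p_common:.3f}")
```

Under these assumptions a 20% variant is found at almost any depth, while a 1% variant needs many hundreds of reads covering the same position, which is the kind of depth that massively parallel sequencing of amplicons makes practical.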
Sequencing the Immune System Response: High-resolution HLA Genotyping
- High level of genetic variation in the HLA region between individuals
- New alleles discovered every month
- The genes encode the cell-surface proteins that differentiate between self, non-self and other antigens
- Accurate HLA genotyping is critical for research on:
  - Autoimmune diseases
  - Cancer
  - Infectious diseases
  - Tissue transplantation

Class   Gene locus   # of variant alleles*
I       HLA-A        893
I       HLA-B        1,431
I       HLA-C        547
* As of Oct 2009, from the IMGT-HLA database
[Figure labels: class I, class II]

Human Gut Metagenomics: Characterizing the communities within each of us
- Metagenomics: sequencing a mixed sample to identify the diversity of organisms present and their function
- The human body harbors trillions of microbial organisms, which collectively make up the human microbiome
- We depend on these organisms for known functions such as digestion and immune defense
- Sequencing studies characterized the human gut microbiome by transplanting human microbes into germ-free mouse models:
  - Two groups of mice received the same transplanted human gut microbial community
  - One group was put on a new high-fat diet; the other stayed on the same low-fat diet as before the transplant
  - The types of bacteria changed rapidly and dramatically with the high-fat, high-sugar diet
- What we eat has a significant impact on our gut microbial communities!
- This has significant implications for research on human nutrition, obesity and famine
Turnbaugh et al. (2006) Nature; Turnbaugh et al. (2009) Science

Environmental Metagenomics: Characterizing Earth's extreme environments
- Metagenomic sequencing study to explore the role of viral pathogens in declining coral health
- Coral samples were sequenced under varying environmental stressors (reduced pH, elevated nutrients, increased temperature) to mimic current ecological changes
- The study found high levels of a herpes-like virus in stressed coral samples
- The virus was not detected in healthy, unstressed coral
- The study sheds light on one of the many factors that explain the death of coral reefs as ocean temperatures rise and pollution increases

Plant & Agricultural Research: Sequencing the Oil Palm genome
- Oil harvested from the tree's fruit is commonly used in vegetable oils, detergents, and biofuels
- Malaysia is the largest producer of palm oil, with nearly 50% of world production
- Project in collaboration with Sime Darby Plantations and Synamatix of Malaysia
- Plans to mine the 1.7 billion base genome database for genetic variations related to agriculturally important features such as drought resistance and oil yield

DNA Sequencing Revolution: The sequence data tsunami
- The number of sequenced base pairs increased 10x from 2000 to 2008
- Sequencing costs are shifting from data generation to bioinformatics (data storage and analysis)
Source: http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html

DNA Sequencing Revolution: Moore's Law-Like Performance Improvement
[Figure: log-scale plot by year, 1994-2010. One axis: millions of instructions per second for Intel processors (Pentium Pro, Pentium III, Pentium 4, Core Duo, Core 2 Extreme). Other axis: millions of nucleotides per run for 454 Sequencing (GS 20 System, GS FLX System, GS FLX Titanium Series).]

Not All Sequencing Data Is Created Equal: Read length and data quality
A number of factors determine the usability of data obtained from high-throughput systems:
- Read length: the size of the chunks of sequencing data generated. Large puzzle pieces are easier to assemble than small pieces.
- Data quality: the accuracy of the sequence bases generated, i.e. how confident you are that a G is really a G.
454 Sequencing data offers long reads and high single-read accuracy, which simplify bioinformatic analysis.
You don't need to be a sequencing expert to go from data to discovery.

The Next-Generation Sequencing Revolution: Remaining challenges
Next-generation sequencing has revolutionized genomic research in nearly every field of biology, but...
Access to high-throughput sequencing is still primarily limited to large research facilities:
- IT infrastructure requirements
- Cost of capital equipment & disposables
- Data analysis
Many scientists send samples across oceans to service centers or do not have access to next-generation sequencing at all.
[Figure label: DNA Sample]

Personal Computing: What happened when the barriers were eliminated?
- Changed the way individuals carry out their daily work and planning
- Opened the doors to completely new applications of the technology, e.g. the Internet
- Empowered individual users by giving them control of their computing needs
The next revolution in genomic research is next-generation sequencing for all researchers and scientists!
[Figure label: GS Junior System]

The next big thing in sequencing is small
Perfectly suited for medical research applications; tailored to the needs of individual labs
Perfectly sized for labs that require:
- Targeted sequencing for researching genomic regions associated with disease, e.g. diabetes, cancer
- Genotyping research, e.g. HLA typing
- Whole microbial genome sequencing
- Metagenomics
- Novel pathogen detection
[Figure label: GS Junior Bench Top System]

454, 454 SEQUENCING, 454 LIFE SCIENCES, emPCR, GS FLX, GS JUNIOR, GS FLX TITANIUM and PICOTITERPLATE are trademarks of Roche.

454 Sequencing System Overview: Sequencing from individual DNA molecules
- Library of DNA molecules
- One DNA molecule per bead
- Clonal amplification to ~10 million copies
- Independent sequencing of each bead
One Bead = One Read = One DNA molecule

Library Preparation: Easy-to-use strategies for every sample type
- Shotgun: whole genomes, BACs, long-range PCR, full-length transcripts
- Amplicons: any 200-400 bp amplified product (HIV, exons, 16S)
- Blunt-end ligation: ncRNA, ancient DNA, short ESTs, short DNA fragments
- Paired-end reads (3 kb, 8 kb, 20 kb; ~150 bp + ~150 bp): de novo assembly, structural variation detection
Output: DNA library

GS FLX Titanium Sequencing Workflow: Short hands-on time, quick total time to result

Step                  Hands-on time   Total time
emPCR amplification   2 h             6 h
Prep run              2 h             2 h
Sequencing            0 h             10 h
Total                 4 h             18 h

GS FLX Titanium Series Technology: Sequencing by Synthesis
- Bases (T, A, C, G) are flowed sequentially, always in the same order, across the PicoTiterPlate device during a sequencing run.
- Incorporation of a nucleotide complementary to the template strand generates a light signal.
- The light signal is recorded by the CCD camera.
- The signal strength is proportional to the number of nucleotides incorporated.

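The flow-and-signal principle described on this slide can be sketched in a few lines. The function below is a hypothetical, idealized model: it takes the strand being synthesized (the complement of the template), cycles through the T, A, C, G flow order from the slide, and records how many bases are incorporated in each flow. It assumes noise-free signals and complete extension in every flow, which a real instrument does not achieve.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # flow order stated on the slide


def ideal_flowgram(read, flow_order=FLOW_ORDER):
    """Return the idealized signal for each flow, where each signal equals
    the number of bases incorporated (the homopolymer run length)."""
    if not set(read) <= set(flow_order):
        raise ValueError("read contains bases outside the flow order")
    signals = []
    pos = 0  # next position of the read still to be synthesized
    for base in cycle(flow_order):
        if pos >= len(read):
            break  # the whole read has been synthesized
        run = 0
        # Extend through every consecutive matching base in this one flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(run)  # 0 means no incorporation, hence no light
    return signals


print(ideal_flowgram("TTCTGCGAA"))
```

A flow that meets a run of identical bases incorporates all of them at once, which is why the signal height, not the number of flows, encodes homopolymer length.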
GS FLX Titanium Series: PicoTiterPlate Device
3.5 million wells
[Figure: raw image]

Genome Sequencer FLX Instrument Data: Image Processing Overview
1. Raw data is a series of images
2. Each well's data is extracted, quantified and normalized
3. Read data is converted into flowgrams
[Figure labels: T, A, G, C, T; dNTP base addition]

Genome Sequencer FLX Instrument Data: Flowgram Generation
[Figure: flowgram with signal levels marked 1-mer, 2-mer, 3-mer and 4-mer; flow order T, A, C, G; example sequence T T C T G C G A A]
Key = 4-base sequence for read identification and signal calibration

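Reading a sequence back out of a flowgram can be sketched as follows. This is a deliberately naive illustration, not the instrument's base-calling algorithm: it assumes the signals are already normalized so that 1.0 corresponds to a single incorporated base, rounds each signal to the nearest whole number, and emits that many copies of the base flowed. The function name and the noisy signal values are made up for the example; the target sequence is the lettered sequence shown on the slide.

```python
def call_bases(flowgram, flow_order="TACG"):
    """Convert normalized flow signals into a base sequence by rounding
    each signal to the nearest non-negative integer run length."""
    bases = []
    for i, signal in enumerate(flowgram):
        # Round half up; clamp so small negative noise counts as zero.
        run_length = max(0, int(signal + 0.5))
        base = flow_order[i % len(flow_order)]  # base flowed at step i
        bases.append(base * run_length)
    return "".join(bases)


# Hypothetical noisy signals, one per flow, in T, A, C, G order.
noisy_flowgram = [2.1, 0.1, 0.9, 0.0, 1.1, 0.05, 0.1,
                  0.95, 0.0, 0.1, 1.0, 1.05, 0.1, 1.9]

print(call_bases(noisy_flowgram))
```

Because run lengths come from signal height, longer homopolymers leave less margin between neighboring levels (a 4-mer versus a 5-mer), which is one reason signal calibration against a known key sequence is useful.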
Sequencing the Immune System Response: Human Blood Cells and the Immune System
Adaptive Immunity

Sequencing the Immune System Response: Human Blood Cells and the Immune System
- B cells generate antibodies
- T cells play regulatory and cytotoxic roles
HLA Sequencing