How To Improve Sequencing With An OGT Custom Bait




Fishing for variants in the deep end of the gene pool: OGT's custom bait designs

Jolyon Holdstock, Simon Hughes and Daniel Swan

Abstract

Oxford Gene Technology (OGT) has extensive expertise in probe design for solid and liquid phase hybridisation and is now applying this experience to the design of custom baits for targeted sequencing. Targeted custom sequencing offers significant benefits over whole genome or whole exome sequencing, including:

- Focused sequencing of particular regions of interest
- Greater depth of sequencing coverage
- Reduced cost
- Simpler data analysis
- Shorter time to results
- Potential to study larger numbers of patients

However, custom bait design is not straightforward, and incorrect bait design can render results unusable. This is where OGT's custom bait designs can add significant value, by ensuring:

- Increased depth of coverage
- Decreased off-target noise
- Increased sensitivity for variant detection
- Improved capture of GC-rich targets and even coverage across the entire region of interest

Introduction

The ongoing development of next generation sequencing (NGS) technologies has given researchers the ability to screen a single patient for tens of thousands of sequence variants of possible clinical relevance simultaneously. NGS offers the possibility to identify aneuploidy, unbalanced chromosomal rearrangements, sub-chromosomal deletions or duplications, loss of heterozygosity, SNPs and indels, as well as the more difficult to detect copy-neutral variants (e.g. balanced chromosomal inversions or translocations). In contrast with other technologies, NGS offers the capability to scan for disease-causing variants without a priori information about the causative genes, and gives hugely increased levels of sensitivity at single base resolution.

Exome and custom targeted approaches to sequencing have already had a major impact on the diagnosis of disease by permitting the successful identification of causal mutations for a number of monogenic disorders [1-3] as well as for some cancers [4,5]. There has also been success in using exome screening for complex disorders [6] and targeted sequencing in assessing personal disease risk [7]. This article examines how OGT's custom bait design can facilitate sequencing projects by:

- Targeting specific regions of interest for variant detection (rather than the whole genome or exome)
- Improving call accuracy by increasing depth of coverage (>1000-fold versus 30-fold)
- Enabling high-throughput processing of samples at reduced cost (i.e. hundreds of custom samples rather than tens of exomes, or even fewer whole genomes, per sequencing lane)
- Streamlining data processing by analysing only your regions of interest rather than an exome's or genome's worth of clinically irrelevant sequence data

What to sequence

Project design, part of OGT's Targeted Sequencing service, is essential to a successful sequencing project and starts with the selection of the most appropriate genomic content for your study. When c

- The human genome is 3 billion base pairs and, when sequenced at 30x coverage, allows for 6 genomes to be sequenced in a single run*.
- The human exome, which is 1.5% of the human genome and corresponds to the gene-encoding regions, when sequenced at 30x coverage allows for ~200 exomes to be sequenced in a single run (depending on multiplexing capacity)*.
- Custom targets ranging from 0.2 to 34 Mb, when sequenced at 30x coverage, allow for several hundred to tens of thousands of samples to be sequenced in a single run (depending on multiplexing capacity)*.

* At OGT, sequencing is performed using the Illumina HiSeq 2000, running the latest chemistry.

Custom Sequencing

While both whole genome and whole exome sequencing generate large amounts of data, sequencing more of the genome is not always better; indeed, many of the findings from whole genome studies would have been discovered more quickly, more cost-effectively and with lower data complexity using an exome-based approach. Similarly, custom targeted sequencing provides a logical, focused approach rather than the nonselective sequencing of informative regions.

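The per-run capacity figures quoted under "What to sequence" follow from simple arithmetic: the bases a run produces divided by the bases each sample needs. The sketch below illustrates that calculation only. The run yield of 600 Gb and the on-target fraction of 0.5 are assumptions chosen for illustration and are not stated in this article; the function and constant names are hypothetical.

```python
import math

# Assumption (not from the article): total bases produced by one run.
RUN_YIELD_BP = 600_000_000_000


def samples_per_run(run_yield_bp, target_bp, mean_depth, on_target_fraction=1.0):
    """Estimate how many samples fit in one run at a given mean depth.

    on_target_fraction is the share of sequenced bases that land on the
    target; 1.0 suits whole genome sequencing, lower values suit capture.
    """
    if run_yield_bp <= 0 or target_bp <= 0 or mean_depth <= 0:
        raise ValueError("yield, target size and depth must be positive")
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    bases_needed_per_sample = target_bp * mean_depth / on_target_fraction
    return math.floor(run_yield_bp / bases_needed_per_sample)


# Whole genome: 3 billion bases at 30x.
genomes = samples_per_run(RUN_YIELD_BP, 3_000_000_000, 30)

# Exome: 1.5% of the genome (45 Mb) at 30x, assuming half the reads are on target.
exomes = samples_per_run(RUN_YIELD_BP, 45_000_000, 30, on_target_fraction=0.5)

# Custom targets at the two ends of the 0.2-34 Mb range, same assumption.
custom_small = samples_per_run(RUN_YIELD_BP, 200_000, 30, on_target_fraction=0.5)
custom_large = samples_per_run(RUN_YIELD_BP, 34_000_000, 30, on_target_fraction=0.5)

print(genomes, exomes, custom_small, custom_large)
```

Under these assumed inputs the estimates fall in the same range as the figures quoted above. In practice the number of samples is also capped by multiplexing capacity (the number of available sample indexes), which this sketch does not model.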
As a consequence, analysis of custom regions offers significant benefits for some studies, including:

- Enabling the sequencing of non-coding regions, or a focus on particular candidate regions (exonic and intronic) identified by genome-wide association studies (GWAS)
- Much greater depth of sequencing coverage, increasing the chance of mutation detection when studying heterogeneous tumour samples or circulating cell-free tumour or foetal DNA
- When combined with high-throughput processing, an attractive route for NGS diagnostic development, with quicker turn-around of samples at reduced cost, simpler data analysis and the potential to study larger numbers of patients

The importance of accuracy

Targeted approaches using off-the-shelf exome capture kits can lead to significant imbalances in sequence coverage. Achieving 99.999% accuracy in calling heterozygous bases (assuming no allelic bias) requires a minimum depth of 25 reads at the site of interest. However, a typical off-the-shelf exome capture run may only have 70% of bases covered at 25x depth.

Intelligent bait design with OGT

Custom bait design requires extensive optimisation of the capture probes to ensure that the entire region of interest receives even coverage. OGT's expertise in probe design for solid and liquid phase hybridisation, and more than 10 years of experience in microarray design and analysis, ensure that we can add significant value in this area.

When attempting to increase sequencing coverage there are two options. The first is to perform more runs on your platform of choice and increase coverage by generating more reads. However, this increases the coverage of all targets proportionally, inflating costs while still not guaranteeing good data for hard to capture (and thus hard to sequence) regions.
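The link between read depth and heterozygous call accuracy described under "The importance of accuracy" can be illustrated with a binomial sampling model. The article does not state which model lies behind its 25-read figure, so the sketch below is one possible reading and not OGT's method: it assumes each read samples the variant allele independently with probability 0.5 (no allelic bias), that a heterozygous call needs at least 3 variant-supporting reads (a hypothetical threshold), and it ignores sequencing error. All function names are illustrative.

```python
from math import comb


def prob_het_detected(depth, min_alt_reads=3, alt_fraction=0.5):
    """Probability that at least min_alt_reads of depth reads carry the variant.

    Reads are modelled as independent draws, each carrying the variant
    allele with probability alt_fraction (0.5 for an unbiased heterozygote).
    """
    p_missed = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_missed


def min_depth_for(target_prob, min_alt_reads=3, alt_fraction=0.5, max_depth=100_000):
    """Smallest depth at which the detection probability reaches target_prob."""
    for depth in range(min_alt_reads, max_depth + 1):
        if prob_het_detected(depth, min_alt_reads, alt_fraction) >= target_prob:
            return depth
    raise ValueError("target probability not reached within max_depth")


# Unbiased heterozygote, 99.999% target, at least 3 supporting reads (assumed).
print(min_depth_for(0.99999))

# A variant present in only 10% of reads, e.g. in a heterogeneous tumour sample.
print(min_depth_for(0.99999, alt_fraction=0.1))
```

With these assumed settings the first call returns 25, in line with the depth quoted above, although a different supporting-read threshold would give a different answer. Lowering `alt_fraction` shows why low-frequency variants in mixed samples call for much greater depth.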

The second option, offered by OGT, is to carry out refined, intelligent design of capture baits. This can increase the coverage of hard to sequence loci without increasing the amount of sequencing that needs to be performed. It is a cost-effective way to generate more even coverage and increase the power to detect variants.

Bait design considerations

Designing baits for sequence capture is not a straightforward process. Bait design software is freely available but not generally user-friendly, and the draft designs generated by such software often need additional refinement before the baits are ready to be used in an experimental setting. It is easy to introduce capture bias by creating regions of interest (ROIs) that are too short, or that are affected by thermodynamic properties such as GC content or melting temperature (Tm). OGT has extensive experience in designing oligonucleotide probes, and this allows us to provide bait capture designs that minimise these issues, giving the best possible opportunity for variant detection.

All that is required to start the bait design process is a list of genes or chromosomal regions and the genome build version on which these are based. An initial draft is produced and then assessed for coverage of the ROIs, bait distribution and sequence complexity. Iterative rounds of improvement are then applied to the design. Singleton baits (regions spanning less than 120 bases and thus covered by a single bait) are corrected by adding baits to the design to ensure even coverage in these regions. GC content is calculated for all baits, and where extreme biases of GC content are identified (baits with GC <40% or >65%) additional copies of these baits are added. This corrects a common issue in targeted capture, where regions of extreme GC content lead to reduced coverage [8]. Similarly, Tm is calculated for each bait, and where Tm is extreme (e.g. >75 °C) additional copies of these baits are also added.
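The refinement rules described above (flag singleton regions, then add copies of baits with extreme GC content or Tm) can be sketched as a simple filter. This is a minimal illustration and not OGT's design pipeline: the thresholds come from the text, but the number of extra copies per flagged bait, the half-open coordinate convention and all names are assumptions. Tm is taken as a precomputed input because the article does not say how it is calculated.

```python
from dataclasses import dataclass
from typing import Optional

BAIT_LENGTH = 120             # bases; regions shorter than this get a single bait
GC_LOW, GC_HIGH = 0.40, 0.65  # GC fractions outside this range count as extreme
TM_HIGH_C = 75.0              # example Tm threshold from the text, in Celsius


@dataclass
class Bait:
    name: str
    sequence: str
    tm_celsius: Optional[float] = None  # assumed to come from an external model


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_singleton_region(start: int, end: int) -> bool:
    """True if a region (half-open coordinates, an assumption) spans < 120 bases."""
    return (end - start) < BAIT_LENGTH


def copies_for(bait: Bait, extra_copies: int = 1) -> int:
    """Number of copies of a bait to include in the design.

    extra_copies is a hypothetical tuning parameter: the article says
    additional copies are added but not how many.
    """
    copies = 1
    gc = gc_fraction(bait.sequence)
    if gc < GC_LOW or gc > GC_HIGH:
        copies += extra_copies
    if bait.tm_celsius is not None and bait.tm_celsius > TM_HIGH_C:
        copies += extra_copies
    return copies


balanced = Bait("balanced", "ACGT" * 30)
gc_rich = Bait("gc_rich", "GC" * 60, tm_celsius=80.0)
for bait in (balanced, gc_rich):
    print(bait.name, round(gc_fraction(bait.sequence), 2), copies_for(bait))
```

Keeping the rule as a per-bait copy count makes it easy to re-run after each iterative round of the design and to tune `extra_copies` against observed coverage.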
Custom baits in action

OGT has designed custom baits and compared their performance to publicly available exome data from the 1000 Genomes Project [9] and to whole exome data from OGT on the HapMap sample NA12878.

Increased depth of coverage

Figure 1 shows coverage for a representative exon captured by OGT custom baits: a 3.5-fold increase in coverage (1024x vs. 282x at the centre of the capture target). The OGT whole exome capture has similar depth to the 1000 Genomes data at this position (251x). OGT custom baits will generally provide 3 to 5.5-fold more coverage than a standard exome capture.

Decreased off-target noise

Figure 2 shows how the OGT bait design decreases off-target noise. Reducing off-target hits increases the certainty that the variations observed are true positives and biologically relevant, removing SNPs that have been called in intronic or extragenic regions.

Increased sensitivity for variant detection

As depth increases, the tail-off at the edges of the coverage peak is reduced, which allows nucleotides towards the outer edge of the capture regions to be assayed more accurately for variations. Figure 3 shows a deletion that would not be detected from whole exome capture, but is clearly seen in the OGT custom bait capture.

Figure 1: Depth of coverage is increased with OGT custom baits (above) vs. 1000 Genomes whole exome capture (below). The yellow boxes show the total read count covering the position and the distribution of nucleotides at that position, along with their strand distribution.

Figure 2: Custom baits reduce hybridisation artefacts in off-target regions: OGT custom bait design (above) vs. 1000 Genomes whole exome (below).

Figure 3: OGT custom bait capture (above) vs. 1000 Genomes whole exome capture (below), showing that the increased read depth of custom baits allows detection of a deletion at the edge of the target region.

Increased depth also increases the number of accurate calls. The example in Figure 4 shows a SNP that is unambiguously detected with OGT custom capture baits but is not detected in the whole exome capture using the same analysis pipeline, even though the whole exome capture has good read depth and allelic spread for the heterozygous SNP present at this location.

Figure 4: Even at >50x coverage, whole exome capture (below) does not accurately identify all SNPs, whereas the increased coverage with OGT custom capture baits allows the variant to be detected.

Improved capture of GC-rich targets

GC content bias is a known issue in whole exome capture, where areas of high GC content (>65%) are under-represented due to thermodynamic constraints of the hybridisation. OGT's custom bait designs are refined to compensate for this bias, which often affects the capture of first exons, as these are frequently GC-rich relative to the rest of the transcribed sequence. Figures 5 and 6 show this clearly. Figure 5 shows the first exon of HDAC10, which is targeted for capture by an OGT custom bait design and by the Agilent SureSelect 50Mb kit. The data shows a patient sample vs. a test HapMap sample at the same site, with a GC content of 70% in the target interval.

Figure 5: OGT custom bait capture of a region with 70% GC content, showing a maximum read depth of 50x (above). The Agilent SureSelect 50Mb kit does not capture any reads in this region (below).

Figure 6 shows a region of 65% GC content sequenced with capture by both OGT custom baits (above) and the Agilent SureSelect 50Mb kit (below). Whilst the Agilent SureSelect kit achieves 20x coverage on the leftmost target region, it has very low coverage of the two capture regions to the right of the figure. In contrast, OGT's custom baits in this GC-biased region achieve a coverage of 425x.

Figure 6: Relative capture of targets within a single gene. Agilent coverage is 20x for the target with no GC content bias, and minimal for targets with a GC content of 65%. In contrast, OGT custom baits perform excellently in this region.

Summary

Whilst whole exome sequencing offers a powerful route into the analysis of Mendelian disorders and provides a platform for GWAS, custom designs offer significant advantages where the biological question is more focused, such as GWAS follow-up or mutational analysis of specific pathways or genes in a clinical context. Focusing on a smaller number of targets brings a number of advantages, including:

- Higher sample throughput with increased multiplexing opportunities
- Increased read depth and decreased off-target noise, leading to improved sensitivity and specificity for variant detection
- Decreased computational complexity of analysis
- The ability to capture and sequence regions not covered by whole exome capture kits
- Increased cost-efficiency, saving money or allowing more samples

For more information about Genefficiency Sequencing Services, visit www.ogt.co.uk/genefficiency or contact us on +44 (0) 1865 856826.

References

1. Choi, M. et al (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences of the United States of America 106, 19096-19101
2. Ng, S.B. et al (2010) Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30-35
3. Ng, S.B. et al (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-276
4. Wei, X. et al (2011) Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nature Genetics 43, 442-446
5. Yan, X.J. et al (2011) Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nature Genetics 43, 309-315
6. Lehne, B. et al (2011) Exome localisation of complex disease association signals. BMC Genomics 12:92
7. Klassen, T. et al (2011) Exome sequencing of ion channel genes reveals complex profiles confounding personal risk assessment in epilepsy. Cell 145, 1036-1048
8. Tewhey, R. et al (2009) Enrichment of sequencing targets from the human genome by solution hybridisation. Genome Biology 10(10): R116
9. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073

Genomic Services from OGT: Combining industry-leading platforms, expert people, unparalleled process power and performance to rapidly deliver high quality genomic data to you. We call this Genefficiency.

Oxford Gene Technology
T: +44 (0)1865 856826
E: services@ogt.co.uk
W: www.ogt.co.uk

This document and its contents are © Oxford Gene Technology IP Limited 2012. All rights reserved. OGT, Genefficiency and Oxford Gene Technology are trademarks of Oxford Gene Technology IP Limited.