Data Analysis for Ion Torrent Sequencing




IFU022 v140202
Research Use Only

Instructions For Use Part III
Data Analysis for Ion Torrent Sequencing

MANUFACTURER: Multiplicom N.V., Galileilaan 18, 2845 Niel, Belgium

Revision date: August 21, 2014

TABLE OF CONTENTS

1. KITS AND INTENDED USE
2. PRINCIPLE OF THE METHOD
3. MATERIALS AND EQUIPMENT REQUIRED BUT NOT PROVIDED
4. FILES PROVIDED
5. GENERAL CONSIDERATIONS
   5.1. DATA FILES
   5.2. STRUCTURE OF THE SEQUENCING READS
   5.3. DEMULTIPLEXING OF THE SEQUENCING READS
   5.4. TRIMMING OF THE SEQUENCING READS
   5.5. ALIGNMENT TO THE REFERENCE SEQUENCE
   5.6. VARIANT CALLING
      5.6.1. MINIMAL COVERAGE
      5.6.2. QUALITY SCORES
   5.7. CNV ANALYSIS
6. SPECIFIC INSTRUCTIONS
   6.1. TORRENT SUITE™ SOFTWARE
   6.2. DROPGEN INSTRUCTIONS
7. LIST OF ABBREVIATIONS

1. KITS AND INTENDED USE

The combined use of Multiplicom's MASTR (Multiplex Amplification of Specific Targets for Resequencing) kits with one or more of Multiplicom's molecular identifier (MID) kit(s) or Short Read Amplification kit enables the preparation of libraries for sequencing the gene(s) of interest using massively parallel sequencing (MPS) instruments. A list of available MASTR assays and complementary MASTR products can be found on Multiplicom's website (http://), under the Products section.

These MASTR assays are for Research Use Only, unless otherwise stated, and enable the identification or confirmation of the presence or absence of mutations and/or copy number variations (CNV) in target regions.

2. PRINCIPLE OF THE METHOD

Multiplicom's MASTR assays enable multiplex PCR amplification of all required target regions of the gene(s) of interest in a limited number of PCR reactions. For germline MASTRs, and for somatic MASTRs used with DNA derived from fresh frozen tissue (FFT), the recommended input for each multiplex PCR reaction is 20 to 50 ng of purified genomic DNA. For somatic MASTRs used with DNA derived from FFPE (formalin fixed paraffin embedded) material, a minimum of 20 ng is recommended.

Next, the resulting amplicons are barcoded, pooled and sequenced using an MPS instrument according to the manufacturer's instructions. The resulting sequence reads are subsequently analyzed to identify variant positions compared with the reference sequence of the targeted gene(s). Comparing those variants with public and/or private databases and analyzing the predicted change at the protein level allows the identification of mutations associated with health and disease. Moreover, a number of MASTR assays enable CNV analysis directly from MPS data.

MASTR assays serve as front end amplification for sequence analysis on all commercially available bench top MPS instruments. The technology is based on target amplification.
The principle of the MASTR assays relies on two key technologies: multiplex PCR amplification and Massively Parallel Sequencing (the detection method).

In the first step, all target regions of the gene of interest are amplified per individual in separate multiplex PCR amplification reactions (the number of multiplex reactions is defined per MASTR assay), using a hot start DNA polymerase (Figure 1). The resulting amplicons of each multiplex are diluted 2,000 fold.

Figure 1. First step: multiplex PCR

For the detailed workflow of this first step, please refer to the Instructions for Use Part I, Multiplex PCR with amplicon specific primers: MASTR assays (IFU016).

In the second step, a second round of PCR is performed to tag all amplicons, incorporating the MID and the A and P1 adaptors required for Ion Torrent sequencing (Figure 2).

Figure 2. Second step: Universal PCR (example for Ion Torrent systems)

The resulting tagged amplicons are mixed per individual, applying a predefined assay specific mixing scheme. Each amplicon library is subsequently purified from small residual DNA fragments and its DNA concentration is determined. For the detailed workflow of the second (Universal) PCR and the subsequent mixing, purification and pooling steps, please refer to the IFU Part II, MID for Ion PGM™ System (IFU241 or IFU242).

Next, these purified and individually tagged amplicon libraries are pooled in equimolar amounts, resulting in an amplicon pool or sequencing sample. This pool is further processed with the Ion PGM™ Template OT2 400 Kit, resulting in a template that is sequenced on an MPS instrument according to the manufacturer's instructions. The positions of the Ion Torrent sequencing primers are indicated in Figure 3.

Figure 3. Third step: Sequencing run.

3. MATERIALS AND EQUIPMENT REQUIRED BUT NOT PROVIDED

Equipment: Analysis software for read counts and variant calling of the MPS data
Recommendations/Comments: Several software packages are commercially available.

4. FILES PROVIDED

Table 1. Explanation of files supplied for data analysis

File description: MID sequences* (IFU333)
Type and content: General .pdf file listing the sequences of the MIDs present in the MID for Ion PGM™ System kits: for demultiplexing of reads (Section 5.3)

File description: PCR specific primers
Type and content: MASTR specific .txt file listing the primers used for the amplification of the different amplicons: for sequence trimming (Section 5.4)

File description: BED file
Type and content: MASTR specific .txt file listing the amplicon positions in Homo sapiens hg19 (MASTR specific primers are trimmed off): target info for data analysis in general format (Section 5.5)

All files listed above can be downloaded from http:///keycode using the KEY CODE printed on the box label of the specific MASTR kit (or, for the file marked *, of the MID for Ion PGM™ System kit).

5. GENERAL CONSIDERATIONS

5.1. Data files

For Ion Torrent sequencing, the Torrent Suite™ Software generates for each MID an SFF (Standard Flowgram Format) file or a FASTQ file containing all filter passed sequencing reads generated during the run.

5.2. Structure of the sequencing reads

The structure of the sequencing reads is depicted in Figure 3: the reads start with the MID, followed by the universal tag sequence (Tag1 or Tag2), the PCR specific primer (Forward or Reverse) and the amplified region. Depending on the size of the amplified region and the length of the read, the sequence of the amplified region is followed by the other PCR specific primer, the other universal tag and the P1 adaptor.

5.3. Demultiplexing of the sequencing reads

The MID sequences at the beginning and/or at the end of the reads are used to demultiplex the sequencing reads, that is, to attribute each read to one of the analysed samples or to a "no match" residual category. In most software tools (the default being the Torrent Suite™ Software), the number of allowed mismatches between the observed MID sequence and the expected MID sequences is an input parameter for the demultiplexing step. We advise allowing at most 2 mismatches (tolerant setting). Reducing the number of allowed mismatches reduces the risk of barcode misassignment; however, the number of reads assigned to a barcode is reduced concomitantly.
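The mismatch-tolerant MID assignment described in Section 5.3 can be sketched as follows. This is only an illustration, not the Torrent Suite™ implementation: the MID sequences, sample names and the function `assign_mid` are hypothetical (the real MID sequences are listed in IFU333), the MID is assumed to sit at the very start of the read, mismatches are counted as substitutions only, and reads matching two MIDs equally well are assumed to go to the "no match" category.

```python
# Hypothetical MIDs for illustration only; the real sequences are in IFU333.
MIDS = {"sample_A": "ACGTACGTAC", "sample_B": "TGCATGCATG"}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_mid(read, mids, max_mismatches=2):
    """Return the sample whose MID best matches the read start, else 'no_match'.

    Assumption: a read that matches two MIDs equally well is treated as
    ambiguous and is not assigned to either sample.
    """
    best_sample, best_dist, tie = "no_match", max_mismatches + 1, False
    for sample, mid in mids.items():
        prefix = read[: len(mid)]
        if len(prefix) < len(mid):
            continue  # read is too short to carry this MID
        dist = hamming(prefix, mid)
        if dist < best_dist:
            best_sample, best_dist, tie = sample, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            tie = True
    return "no_match" if tie else best_sample


# A read with two substitutions in the MID, followed by the start of Tag1.
read = "ACGAACGTAG" + "AAGACT"
print(assign_mid(read, MIDS))                    # sample_A
print(assign_mid(read, MIDS, max_mismatches=1))  # no_match
```

Lowering `max_mismatches` in this sketch shows the trade-off stated above: the same read is assigned with the tolerant setting and falls into the residual category with the stricter one.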

5.4. Trimming of the sequencing reads

The PCR specific primer part of the sequencing reads is by definition equal to the genomic reference sequence and thus independent of the individual sample that is sequenced. As depicted in Figure 4, when 2 amplicons overlap, failure to trim the PCR primer sequences from the reads can result in skewed variant allele frequencies. Since virtually all MASTR assays contain overlapping amplicons, primer trimming is a mandatory step in the data analysis.

The sequences of PCR primers (Figure 4a, Forward2 and Reverse2) should be removed from those reads generated directly with them (Figure 4a, Amplicon2 reads), and should not be removed from reads generated with other PCR primers (ie, from overlapping amplicons; Figure 4a, Amplicon1 reads). This discrimination can be made based on the fact that the sequences of the PCR primers are flanked by the universal tags (Tag1, AAGACTCGGCAGCATCTCCA, or Tag2, GCGATCGTCACTGTTCTCCA), while the same sequences in the overlapping amplicons are not.

Figure 4. PCR primer trimming. a) Illustration before PCR primer trimming: alignment of Amplicon1 and Amplicon2 reads with Forward and Reverse primers. b) Illustration after PCR primer trimming.

Remark: During design, great care was taken to select primer binding sites avoiding regions with variants. In addition, a periodic review is performed to identify newly reported variants in those regions and to test their impact on amplification. It cannot, however, be excluded that a variant is present in a primer binding site in a sample. This may lead to the amplification of only one of the alleles, masking the presence of a clinically relevant mutation in the amplicon. If such a case is suspected, calculation of the dosage quotient of each amplicon can be used for confirmation (as described in Section 5.7). For further support, contact customer services at customerservice@multiplicom.com.

5.5. Alignment to the reference sequence

The sequence reads can be aligned to the targeted regions or to the entire human genomic sequence. To facilitate the transfer of assay specific information to the different analysis software packages, a BED file with the trimmed amplicon positions on hg19 is available for download on our website.

5.6. Variant calling

Different parameters can be analyzed to discriminate true positive variants from false positive or background signals. Below is a non exhaustive list of parameters whose effect on the sensitivity and specificity of variant calling might be evaluated.

5.6.1. Minimal coverage

The coverage, or number of aligned reads, at the site of the variant has to reach a given threshold for confident variant detection. The minimal coverage recommended by Multiplicom for MASTRs in combination with an Ion PGM System is 100 reads for each position in the region of interest (50 reads per allele) for SNV analysis and 300 reads per amplicon for CNV analysis. It is advised that target regions that do not reach this minimal coverage are eliminated from the list of analysed target regions in the final variant calling report.
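The minimal coverage rule above can be sketched as a simple filter. The thresholds (100 reads per position for SNV analysis, 300 reads per amplicon for CNV analysis) come from this section; the input dictionaries, amplicon names and function names are hypothetical, and "reaching" the threshold is assumed to mean greater than or equal to it.

```python
SNV_MIN_DEPTH = 100  # reads per position, from Section 5.6.1
CNV_MIN_READS = 300  # reads per amplicon, from Section 5.6.1


def regions_passing_snv(depths_by_region, min_depth=SNV_MIN_DEPTH):
    """Return the regions in which every position reaches min_depth."""
    return [
        region
        for region, depths in depths_by_region.items()
        if depths and min(depths) >= min_depth
    ]


def amplicons_passing_cnv(read_counts, min_reads=CNV_MIN_READS):
    """Return the amplicons whose read count reaches min_reads."""
    return [amp for amp, n in read_counts.items() if n >= min_reads]


# Hypothetical per-position depths and per-amplicon read counts.
DEPTHS = {"amplicon_1": [250, 180, 120], "amplicon_2": [140, 95, 130]}
COUNTS = {"amplicon_1": 512, "amplicon_2": 288}

print(regions_passing_snv(DEPTHS))    # ['amplicon_1']
print(amplicons_passing_cnv(COUNTS))  # ['amplicon_1']
```

In this sketch, amplicon_2 would be dropped from the variant calling report because one position has only 95 reads, and from CNV analysis because its total read count is below 300.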

For an amplicon library derived from a tumor tissue sample (FFPE or FFT), deeper sequencing might be needed to obtain the required minimal coverage of 50 reads per affected allele, for example when the sample contains clonal populations of tumor cells and/or has a lower percentage of tumor cells. In these cases the minimal number of reads should be recalculated accordingly (eg, 2 fold higher to identify positions with a variant allele frequency (VAF) of 25%, or for a sample with 50% tumor tissue content).

5.6.2. Quality scores

The quality of the aligned bases at the position of the potential variant affects the confidence in the variant call. This quality is generally influenced by the position in the read (the overall quality decreases along the reads) and by the genomic context (eg, homopolymer stretches have a negative impact on the quality of the following bases). This leads to two derived parameters:

Presence in forward and reverse reads: Since the quality decreases along the reads, and forward and reverse reads start at opposite ends of an amplicon, the quality of the forward reads is highest where the quality of the reverse reads is lowest (and vice versa). If all target positions are covered by both forward and reverse reads, the presence of a variant in both forward and reverse reads is a good predictor of a true positive variant call.

Changes in/around homopolymeric stretches: In view of the inherent difficulty of the Ion Torrent sequencing technology in calling the actual length of homopolymer stretches, special care has to be taken when calling variants in or flanking a homopolymeric stretch. Based on our experience, homopolymeric stretches with a length of 4 bp or more require special care.

Remark: For specific MASTR assays, we offer a complementary homopolymer (HP) kit. For an overview of all available HP kits, please refer to the Products section on Multiplicom's website (http://).

5.7. CNV analysis

CNV analysis is possible for a selected number of MASTR assays. These MASTR assays contain a separate set of control amplicons for each plex (located on chromosomes different from the target genes), which are amplified, tagged and sequenced in parallel with the targeted region. Only MASTRs listing such control amplicons on their GS Reference Pattern are suited for CNV analysis.

Remark: Excel template sheets are available upon request (at customerservice@multiplicom.com) for the specific MASTR assays enabling CNV analysis. To use these sheets, the read counts (number of reads) of all amplicons in all samples should be extracted from the sequencing data.

For CNV analysis using MPS data, the read counts of target and control amplicons are compared to calculate the Dosage Quotient (DQ) as follows:

o The read count of the amplicon of interest is divided by the sum of the read counts of the control amplicons of that plex (in other words, normalized on the sum of the control amplicons) = normalized read count
o The average of the normalized read counts of that amplicon over all samples is calculated = reference normalized read count
o The normalized read count is divided by the reference normalized read count = DQ

When the DQ is above 1.3, the corresponding genomic fragment is considered to be present in 3 copies (duplication of one allele); when the DQ is below 0.7, the genomic fragment is considered to be present in only 1 copy (deletion of one allele).
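The three DQ steps above can be sketched as follows for a single plex. This is a hedged illustration, not the Excel template provided by Multiplicom: the sample and amplicon names are hypothetical, the input layout (`{sample: {amplicon: read_count}}`) is an assumption, whether the 1.3 and 0.7 cut-offs are inclusive is an assumption, and reporting 2 copies for values in between is implied rather than stated in the text.

```python
def dosage_quotients(read_counts, control_amplicons):
    """Compute DQ per sample and target amplicon for ONE plex.

    read_counts: {sample: {amplicon: read_count}} (hypothetical layout).
    """
    # Step 1: normalize each target amplicon on the sum of the controls.
    normalized = {}
    for sample, counts in read_counts.items():
        control_sum = sum(counts[a] for a in control_amplicons)
        if control_sum == 0:
            raise ValueError(f"no control reads for sample {sample}")
        normalized[sample] = {
            a: n / control_sum
            for a, n in counts.items()
            if a not in control_amplicons
        }
    # Step 2: reference = average normalized read count over all samples.
    targets = list(next(iter(normalized.values())))
    reference = {
        a: sum(norm[a] for norm in normalized.values()) / len(normalized)
        for a in targets
    }
    # Step 3: DQ = normalized read count / reference normalized read count.
    return {
        sample: {a: norm[a] / reference[a] for a in targets}
        for sample, norm in normalized.items()
    }


def call_copy_number(dq, gain=1.3, loss=0.7):
    """Map a DQ to a copy number; inclusive cut-offs are an assumption."""
    if dq >= gain:
        return 3
    if dq <= loss:
        return 1
    return 2  # assumed: values between the cut-offs mean no CNV


# Hypothetical read counts: s4 has half the exon_1 reads of the others.
COUNTS = {
    "s1": {"ctrl_1": 500, "ctrl_2": 500, "exon_1": 1000},
    "s2": {"ctrl_1": 500, "ctrl_2": 500, "exon_1": 1000},
    "s3": {"ctrl_1": 500, "ctrl_2": 500, "exon_1": 1000},
    "s4": {"ctrl_1": 500, "ctrl_2": 500, "exon_1": 500},
}
DQ = dosage_quotients(COUNTS, control_amplicons={"ctrl_1", "ctrl_2"})
for sample in COUNTS:
    print(sample, round(DQ[sample]["exon_1"], 3), call_copy_number(DQ[sample]["exon_1"]))
```

Because the reference in this sketch is the average over all samples in the run, a sample carrying a deletion pulls the reference down and shifts the DQ of the other samples slightly above 1, which is one reason to keep the composition of the reference set in mind.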

Remarks:

(1) CNV analysis calculations always need to be made within a plex.

(2) For the proper calculation of the reference normalized read count (in the calculation of the DQ as described above), the set of samples should meet the following requirements:
   o When using a set of known samples as references (no CNVs), the libraries of these samples should be constructed together with those of the unknown samples.
   o When using the other unknown samples of your run as references, at most 40% of the samples in the total set are allowed to have a CNV.

(3) Since polymorphisms in primer sites may lead to amplification of only one of the alleles, resulting in a false positive DQ of 0.5, a detected CNV is only considered valid when 2 adjacent amplicons show a significantly altered DQ and/or when it is confirmed by an independent method.

(4) Compared to variant analysis, deeper sequencing is required for CNV analysis.

For the precise list of amplicons that will be amplified using a certain PCR Mix, refer to the MASTR specific GS Reference Pattern, which can be obtained from http:///keycode using the KEY CODE printed on the box label of the MASTR kit used.

6. SPECIFIC INSTRUCTIONS

Data analysis can be performed using a variety of analysis software packages. Below we provide some specific instructions for the use of the Torrent Suite™ software of Life Technologies (Section 6.1) and the dropgen application of the Integrated Clinical NGS Dry Lab Service of Sophia Genetics (Section 6.2).

6.1. Torrent Suite™ software

Life Technologies advises aligning the generated sequences using the Torrent Suite Software and analysing the generated BAM files with the Torrent Variant Caller. One step in this process is the definition of the target regions. For this, the BED file mentioned in Table 1 should be used. More detailed information on these software solutions can be found on the Ion Community website (http://ioncommunity.lifetechnologies.com).

6.2. dropgen instructions

The dropgen application should be used according to the manufacturer's instructions. To access and use Sophia Genetics' service, laboratories should request the creation of an account on the dropgen application by contacting Sophia Genetics directly: http://www.sophiagenetics.com/contact.php.

7. LIST OF ABBREVIATIONS

CNV: Copy Number Variant
DNA: Deoxyribonucleic acid
FFPE: formalin fixed paraffin embedded
IFU: Instructions For Use
MASTR: Multiplex Amplification of Specific Target for Resequencing
MID: Molecular Identifiers
MPS: Massively Parallel Sequencing
PCR: Polymerase Chain Reaction
Plex: Set of MASTR derived amplicons
ROI: Region of Interest
SFF: Standard Flowgram Format
TTC: Tumor Tissue Content
VAF: Variant Allele Frequency