Amplicon DS App v1.1

Introduction
Running Amplicon DS
Amplicon DS Output
Amplicon DS Methods
Technical Assistance

ILLUMINA PROPRIETARY
15066594 Rev. B
December 2014

This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed, or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.

The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read and understood prior to using such product(s).

FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND DAMAGE TO OTHER PROPERTY.

ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S) DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE) OR ANY USE OF SUCH PRODUCT(S) OUTSIDE THE SCOPE OF THE EXPRESS WRITTEN LICENSES OR PERMISSIONS GRANTED BY ILLUMINA IN CONNECTION WITH CUSTOMER'S ACQUISITION OF SUCH PRODUCT(S).

FOR RESEARCH USE ONLY

© 2014 Illumina, Inc. All rights reserved.

Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio, Epicentre, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect, MiSeq, NeoPrep, Nextera, NextBio, NextSeq, Powered by Illumina, SeqMonitor, SureMDA, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color, and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

Introduction

The BaseSpace Amplicon DS app analyzes DNA samples that have been prepared using a mirrored, dual strand TruSeq Amplicon method, such as TruSight Tumor assay samples. After alignment, variant calling is performed using the Somatic Variant Caller. See the following section for more information:
- Somatic Variant Caller

Variants are called only for the target regions, which are specified in the manifest files. Statistics reporting accumulates coverage and enrichment-specific statistics for each target as well as overall metrics.

The main output files generated by the Amplicon DS app are:
- BAM files, containing the reads after alignment plus alignment details.
- VCF files, containing the variant calls.
- Genome VCF (.genome.vcf) files, describing the calls for all variant and non-variant sites in the genome.
- Annotation file (ANT). This binary file can be loaded into VariantStudio for viewing; see www.illumina.com/clinical/clinical_informatics/illumina-variantstudio.ilmn.

In addition, there are analysis reports and summary.csv files. See Amplicon DS Methods and Amplicon DS Output Files for more information.

Figure 1 Amplicon DS App Workflow

Versions

The following module versions are used in the Amplicon DS app:
- Amplicon DS (BaseSpace Workflow) 1.1
- Isis (Analysis Software) 2.5.40.16
- SAMtools 0.1.19-isis-1.0.3
- Somatic Variant Caller 3.5.2.1
- IAS (Annotation Service) VEP 72.5

Limitations

Before running the Amplicon DS app, be aware of the limitations described in this topic.

Current Limitations:
- This app supports the analysis of TruSight Tumor samples or other samples prepared using a mirrored, dual strand TruSeq Amplicon assay.

Technical Limitations:
- Variants are found only in the regions that are targeted in the manifest, which is loaded automatically for the TruSight Tumor assay.
- Reads must be at least 50 bases in length.
- No minimum number of reads is required, but use sufficient data for each sample to support appropriate depth of coverage for variant calling.
- The app supports only one manifest per analysis.
- All samples must be either paired-end or single-end; a warning is issued if the samples are single-end.
- All samples must have the same read length.
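The technical limitations above can be checked before launching an analysis. The following Python sketch is illustrative only: the SampleInfo record and the check_samples function are hypothetical names introduced here and are not part of BaseSpace or the Amplicon DS app. It assumes that you already know the read length and read layout of each sample.

```python
from dataclasses import dataclass

MIN_READ_LENGTH = 50  # "Reads must be at least 50 bases in length."


@dataclass
class SampleInfo:
    """Hypothetical per-sample summary; not a BaseSpace object."""
    name: str
    read_length: int
    paired_end: bool


def check_samples(samples):
    """Return (errors, warnings) for a list of SampleInfo records."""
    errors, warnings = [], []
    for sample in samples:
        if sample.read_length < MIN_READ_LENGTH:
            errors.append(
                f"{sample.name}: read length {sample.read_length} "
                f"is below {MIN_READ_LENGTH}"
            )
    # All samples must share one read length.
    if len({s.read_length for s in samples}) > 1:
        errors.append("samples do not all have the same read length")
    # Samples must be all paired-end or all single-end.
    layouts = {s.paired_end for s in samples}
    if len(layouts) > 1:
        errors.append("samples mix paired-end and single-end reads")
    elif layouts == {False}:
        warnings.append("samples are single-end")
    return errors, warnings


if __name__ == "__main__":
    errs, warns = check_samples(
        [SampleInfo("FPA", 150, True), SampleInfo("FPB", 150, True)]
    )
    print(errs, warns)
```

The sketch reports single-end input as a warning rather than an error, mirroring the behavior described in the list; the app's own messages may be worded differently.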

Running Amplicon DS

1 Click the Apps button. The BaseSpace Apps page, which lists all the available BaseSpace applications, opens.
2 Find Amplicon DS in the list of applications, and then click the Launch button. An End-User License Agreement (EULA) page might open; otherwise, the Amplicon DS Input page opens.
3 If applicable, read the EULA, and then click Accept. The EULA closes and the Amplicon DS Input page opens.

Figure 2 Amplicon DS Input Form

4 Enter the input information for the Amplicon DS app.
   Analysis Name: A default name (App name and the current date and time) is provided. You can leave the default name as-is, or modify it.
   Save Results To: Click Select Projects to open the Select Projects popup, and then do one of the following:
   - Select an existing project for which you have Write permissions to store the app results, and then click Confirm.
   - Click New to open the Create Project popup, and then do the following:
      - Enter a name, and optionally, a description for the new project.
      - Click Create to display the new project at the top of the Select Projects popup.
      - Select the new project, and then click Confirm.
   NOTE You can enter search criteria in the Search field to search for a project in the list.
   Variant Caller: Somatic is the only variant caller available, and you cannot clear this option.

   NOTE Somatic is the only variant caller that is available because the Amplicon DS app is designed to analyze samples that have been prepared with the Illumina TruSight Tumor library prep kit. The Somatic caller is designed to detect low frequency mutations (approximately 5% mutant allele frequencies).
   Annotation: Select the gene and transcript annotation reference database for annotating the called variants.
   Samples: To add each sample for a pooled sample pair, do the following:
   - Click Select Pairs(s) to open the Select Pairs popup.
   - If needed, search for the samples you want to analyze.
   - Select a sample from the search results, or click Select All to select all the samples, and then drag and drop the samples to the appropriate location (FPA or FPB).
   - Click Confirm.
   NOTE Select matching pooled sample pairs.
5 Click Continue. The Amplicon DS app is launched and the analysis of the samples is started. The Analysis Info page opens. This page shows the following information about the analysis: the analysis name, the selected application, the analysis start date, the analysis completion date, the duration of the analysis, the session type, and the analysis status.

Figure 3 Amplicon DS Analysis Info page

You receive an email when the analysis is complete. You can then open the specific project to view the analysis results. See Amplicon DS Output.

Amplicon DS Output

After you launch the Amplicon DS app and start the analysis of the samples, you can sign out of the application. You receive an email when the analysis is complete. You can then sign back in to the application, and open the specific project to view the analysis results, which include various reports and output files. This chapter describes the Amplicon DS app output.

NOTE When the project analysis is complete, you receive a notification email. You can click the link in the email to navigate to the specific project, or you can log on to BaseSpace and navigate to the project as described in Steps 1 and 2.

1 Click the Projects button. The Projects page, which lists all the BaseSpace projects for which you have permissions, opens.
2 Click the name of the appropriate project. The Analyses page opens. This page lists all the analyses that have been carried out for the selected project.
3 Click the appropriate analysis. The Project Output page opens. The page shows the Pairwise Analysis report for the matching sample pair that was analyzed last. The left side of the page shows the Output Navigation bar.

Figure 4 Amplicon DS Output Navigation Bar

4 Click an option on the Output Navigation bar to access the associated project output. See:
   - Analysis Info: A read-only summary of the project analysis settings. See Analysis Info for descriptions.
   - Inputs: A read-only summary of the project input settings. See Inputs for descriptions.
   - Output Files: Access to the individual project output files. See Amplicon DS Output Files for descriptions.
   - Analysis Reports
      - Pairwise Analysis reports: Access to the analysis reports for a single sample pair. One entry for each sample pair that was analyzed, with the entries displayed in reverse chronological order of analysis. See Sample Analysis Reports for a description.

      - Aggregate Summary report: Access to the analysis metrics for the aggregate results. The Aggregate Summary option is only displayed if multiple sample pairs were analyzed. See Aggregate Summary Report for descriptions.

Sample Analysis Reports

The Pairwise Analysis report provides an overview of the analysis statistics for each sample pair, separated by pools. The information is shown in tables and in plots. The Pairwise Analysis report page also has an option for downloading the Summary Report (Sample Information, Sample Analysis, and Analysis Details) as a PDF.

Amplicon Summary
   Number of Amplicon Regions: The number of amplicon regions that were sequenced.
   Total Length of Amplicon Regions: The total length of the sequenced amplicon regions in base pairs.

- Read Level Statistics
   Total aligned reads: The total number of reads passing filter present in the data set that aligned to the reference genome.
   Percent aligned reads: The percentage of reads passing filter that aligned to the reference genome.

- Base Level Statistics
   Total Aligned Bases: The total number of bases present in the data set that aligned to the reference genome.
   Percent Aligned Bases: The percentage of bases that aligned to the reference genome.
   Percent Q30: The percentage of bases with a quality score of 30 or higher.
   Mismatch Rate: The average percentage of mismatches across both reads 1 and 2 over all cycles.

Small Variants Summary

The Small Variants Summary provides metrics about the number of SNVs, deletions, and insertions. Data are first analyzed for each individual pool (FPA and FPB). The data are then reconciled for the two pools and a consensus call is made for all the called variants.

   Total Passing: The total number of variants present in the data set that passed the variant quality filters.
   Percent Found in dbSNP: 100*(Number of SNVs in dbSNP/number of SNVs). The SNVs that were found in dbSNP are annotated accordingly.
   Het/Hom Ratio: Number of heterozygous variants/number of homozygous variants.
   Ts/Tv Ratio: Transition rate of SNVs that pass the quality filters divided by transversion rate of SNVs that pass the quality filters. Transitions are interchanges of purines (A, G) or of pyrimidines (C, T). Transversions are interchanges between purine and pyrimidine bases (for example, A to T).

Coverage Summary

The Coverage Summary provides details about the uniformity of coverage and mean coverage separately for each sample pool. Typically, if the two pools were consistently prepared, the results are similar between the two pools.
   Amplicon Mean coverage: The mean coverage across all sequenced amplicons.
   Uniformity of Coverage: The percentage of amplicon regions with coverage values greater than the low coverage threshold, where the low coverage threshold is defined as (0.2 * Amplicon Mean coverage).

Coverage by Amplicon Region (Overall) plot

The Coverage by Amplicon Region plot shows the coverage across the entire panel of amplicons for both pools, with each amplicon shown as an individual data point. The X axis shows the individual amplicon data points, and the Y axis shows the coverage (Log10). This layout is useful for quickly identifying outliers. The plot has the following characteristics:
- Amplicon regions that have coverage values that are greater than the Low Coverage Threshold (0.2 * Amplicon Mean coverage) are shown as blue data points.
- Amplicon regions that have coverage values that are less than the Low Coverage Threshold (0.2 * Amplicon Mean coverage) are shown as red data points.
- The horizontal orange line marks the Moving Average across all coverage values.
- The horizontal red line marks the Low Coverage Threshold.
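The Ts/Tv Ratio, Uniformity of Coverage, and Low Coverage Threshold definitions above can be worked through in a few lines of Python. This is a minimal sketch, not the app's implementation: the function names and input shapes are assumptions made for illustration, and the Ts/Tv ratio is computed here as a simple ratio of counts over SNVs that have already passed the quality filters.

```python
import math

# Transitions: purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """snvs: iterable of (ref, alt) single-base pairs with ref != alt."""
    ts = tv = 0
    for ref, alt in snvs:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ts += 1
        else:
            tv += 1  # any other base change is a transversion
    return ts / tv if tv else math.nan


def coverage_summary(amplicon_coverage):
    """amplicon_coverage: dict mapping amplicon name -> coverage value."""
    if not amplicon_coverage:
        raise ValueError("no amplicons supplied")
    values = list(amplicon_coverage.values())
    mean_coverage = sum(values) / len(values)
    threshold = 0.2 * mean_coverage  # Low Coverage Threshold
    above = sum(1 for cov in values if cov > threshold)
    low = sorted(n for n, cov in amplicon_coverage.items() if cov < threshold)
    return {
        "amplicon_mean_coverage": mean_coverage,
        "low_coverage_threshold": threshold,
        "uniformity_percent": 100.0 * above / len(values),
        "low_coverage_amplicons": low,  # red points in the plot
    }


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "T"), ("G", "A")]))  # 3.0
    summary = coverage_summary(
        {"amp1": 1000, "amp2": 800, "amp3": 100, "amp4": 1100}
    )
    print(summary["low_coverage_amplicons"])  # ['amp3']
    print(summary["uniformity_percent"])      # 75.0
```

In the example, the mean coverage is 750, so the threshold is 150; only amp3 falls below it, and three of the four amplicons (75%) count toward uniformity.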

Figure 5 Coverage by Amplicon Region (Overall) plot

You can place your cursor on any amplicon data point in the plot to open a tooltip that details the coverage that was achieved for the amplicon. The tooltip also provides other information about the amplicon region, including the gene name, the exome name, the chromosome number, and an amplicon-specific string. You can export the plot to a CSV file.

Figure 6 Coverage by Amplicon Region (Overall) plot with tooltip

Coverage by Amplicon Region (Pool) plot

The Coverage by Amplicon Region (Pool) plot is identical to the Coverage by Amplicon Region (Overall) plot with one exception: the information is shown for an individual pool rather than for both pools.

Aggregate Summary Report

The Aggregate Summary report is available if more than one sample pair was analyzed. The Aggregate Summary report aggregates all the individual Pairwise Analysis reports for the project into a single report. The information is shown in tables and in plots. For any plot in the report, you can place your cursor on a data point to open a tooltip that shows the name of the associated sample and the value for the report metric. The Aggregate Summary report page also has an option for downloading the Summary Report (Manifest Information, Sample Information, and Aggregate Summary Details) as a PDF.

Figure 7 Aggregate Summary report plot with tooltip

Manifest Information

The manifest is the list of targeted regions that were analyzed.

NOTE The Manifest Information is a fixed parameter for Amplicon DS.

Amplicon Summary

- Read Level Statistics
   The Percent Aligned Reads is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:
   Total Aligned Reads (R1/R2): The total number of reads passing filter present in the data set that aligned to the reference genome. Numbers are per read.
   Percent Aligned Reads (R1/R2): The percentage of reads passing filter that aligned to the reference genome.
   Overall Percent Aligned Reads: The percentage of reads passing filter that aligned to the reference genome across both reads (R1 and R2). The value is the average of the individual Percent Aligned Reads values.

- Base Level Statistics
   The Percent Aligned Bases is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:
   Total Aligned Bases (R1/R2): The total number of bases present in the data set that aligned to the reference genome. Numbers are per read.

   Overall Total Aligned Bases: The total number of bases present in the data set that aligned to the reference genome across both reads (R1 and R2). The value is the average of the individual Total Aligned Bases values.
   Percent Aligned Bases (R1/R2): The percentage of bases that aligned to the reference genome. Numbers are per read.
   Overall Percent Aligned Bases: The percentage of bases that aligned to the reference genome across both reads (R1 and R2). The value is the average of the individual Percent Aligned Bases values.
   Percent Q30: The percentage of bases with a quality score of 30 or higher.
   Mismatch Rate: The percentage mismatch to the reference genome averaged over all cycles, per read (R1 and R2).

Small Variants Summary

- SNVs
   The total number of SNVs that passed the quality filters is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:
   SNVs: The total number of variants present in the data set that passed the variant quality filters.
   Percent Found in dbSNP: 100*(Number of SNVs in dbSNP/number of SNVs). The SNVs that were found in dbSNP are annotated accordingly.
   Ts/Tv Ratio: Transition rate of SNVs that pass the quality filters divided by transversion rate of SNVs that pass the quality filters. Transitions are interchanges of purines (A, G) or of pyrimidines (C, T). Transversions are interchanges between purine and pyrimidine bases (for example, A to T).
   SNV Het/Hom Ratio: Number of heterozygous variants/number of homozygous variants.

- Insertions
   The total number of insertions that passed the quality filters is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:

   Insertions: The total number of variants present in the data set that passed the variant quality filters.
   Percent Found in dbSNP: 100*(Number of insertions in dbSNP/number of insertions). The insertions that were found in dbSNP are annotated accordingly.
   Insertion Het/Hom Ratio: Number of heterozygous variants/number of homozygous variants.

- Deletions
   The total number of deletions that passed the quality filters is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:
   Deletions: The total number of variants present in the data set that passed the variant quality filters.
   Percent Found in dbSNP: 100*(Number of deletions in dbSNP/number of deletions). The deletions that were found in dbSNP are annotated accordingly.
   Deletions Het/Hom Ratio: Number of heterozygous variants/number of homozygous variants.

Coverage Summary

The Amplicon Mean Coverage Depth is plotted against each sample for each analyzed pair, resulting in a single plot with one data point per sample pair. The following metrics are detailed in a table above the plot:
   Amplicon Mean Coverage Depth: The total number of aligned bases to the targeted region divided by the targeted region size.
   Uniformity of Coverage: The percentage of amplicon regions with coverage values greater than the low coverage threshold, where the low coverage threshold is defined as (0.2 * Amplicon Mean coverage).

Analysis Info

This app provides an overview of the analysis on the Analysis Info page. A brief description of the metrics is below.

Table 1 Analysis Info
   Name: Name of the app session.
   Application: App that generated this analysis.
   Date Started: Date and time the app session started.
   Date Completed: Date and time the app session completed.
   Duration: Duration of analysis.
   Session Type: The number of nodes used.
   Size: Total size of all output files.
   Status: Status of the app session.

Log Files

Clicking the Log Files link on the Analysis Info page provides access to the app log files.

Inputs

The Inputs page provides an overview of the input samples and settings that were specified when the Amplicon DS project was set up.

Amplicon DS Output Files

The Output Files page provides access to the output files for each sample pair that was analyzed, with one top-level folder for each sample pair. If more than one sample pair was analyzed, then an Aggregate Summary folder is also available.

Figure 8 Output Files Folder Structure

You can click a top-level sample pair folder to open it and view the following:
- A folder for each sample that was analyzed for the pair (FPA folder and FPB folder).
- A list of files that detail the consensus data for the sample pair, including a PDF Sequencing Report file that provides a detailed summary of the analysis. The detailed summary consists of Sample Information, Amplicon Summary, Read Level Statistics, Base Level Statistics, Small Variants Summary, Coverage Summary and Plots, Analysis Details, and Software Versions.

Figure 9 Opening a top-level sample pair folder

You can click either the FPA or FPB folder to open and view a list of files that detail the raw data for the sample. You can also click any file link to open a popup that shows an onscreen preview of the file with an option to download the file.

Figure 10 Previewing an Output data file

For detailed descriptions about the different types of Output files that are available for samples and consensus data, see the following:
- BAM Files
- VCF Files
- gVCF Files
- ANT File
- Summary.csv File

BAM Files

The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mb) produced by different sequencing platforms. SAM is a human-readable text format. The Binary Alignment/Map (BAM) format keeps the same information as SAM, but in a compressed, binary format that is only machine-readable.

If you use an app in BaseSpace that uses BAM files as input, the app locates the file when launched. If using BAM files in other tools, download the file to use it in the external tool.

Go to samtools.sourceforge.net/sam1.pdf to see the exact SAM specification.

VCF Files

VCF is a text file format that contains information about variants found at specific positions in a reference genome. The file format consists of meta-information lines, a header line, and then data lines. Each data line contains information about a single variant.

If you use an app in BaseSpace that uses VCF files as input, the app locates the file when launched. If using VCF files in other tools, download the file to use it in the external tool.

A detailed description of the VCF format is provided in the BaseSpace User Guide.

Amplicon DS VCF Entries

The VCF files for Amplicon DS can have the following entries in the FILTER, FORMAT, and INFO fields:

Table 2 VCF FILTER Entries
   Entry: Description
   LowGQ: The genotyping quality (GQ) is below a cutoff.
   LowVariantFreq: The variant frequency is less than the given threshold.
   PB: The prevalence of the variant is significantly biased between the two forward and reverse probe pools.
   R8: For an indel, the number of adjacent repeats (1-base or 2-base) in the reference is greater than 8.
   SB: The strand bias is more than the given threshold.
   LowDP: Applied to sites with depth of coverage that is below a cutoff.

Table 3 VCF FORMAT Entries
   Entry: Description
   AD: Allelic depths for the ref and alt alleles in the order listed. For indels, this value includes only the reads that confidently support each allele (posterior probability 0.999 or higher that read contains indicated allele vs all other intersecting indel alleles).
   GQ: Genotype Quality.

   GQX: Minimum of {Genotype quality assuming variant position, genotype quality assuming non-variant position}.
   GT: Genotype.
   NL: Noise level, as a Q score.
   PB: Probe pool bias.
   SB: Strand bias.
   VF: Variant frequency in the sample.

Table 4 VCF INFO Entries
   Entry: Description
   AA: The inferred allele ancestral to the chimpanzee/human lineage.
   AF1000G: The allele frequency from all populations of 1000 genomes data.
   clinvar: Clinical significance from the ClinVar database (www.ncbi.nlm.nih.gov/clinvar/).
   cosmic: The numeric identifier for the variant in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (www.cancer.sanger.ac.uk/cancergenome/projects/cosmic/).
   CSQR: Regulatory consequence as predicted by Variant Effect Predictor (www.ensembl.org/info/docs/tools/vep/index.html) version 72. A comma-separated list for each affected regulatory region (including transcription factor binding sites) is provided using the following delimited format: RegulatoryID Consequence. The annotations provided in this field come from the Ensembl database of regulatory features even if RefSeq was selected as the annotation source. Many of the RegulatoryIDs begin with ENSR. The consequences are indicated using valid Sequence Ontology (SO) terms (www.ensembl.org/info/genome/variation/predicted_data.html#consequences) and typically are either regulatory_region_variant or TF_binding_site_variant.
   CSQT: Transcript consequence as predicted by Variant Effect Predictor (www.ensembl.org/info/docs/tools/vep/index.html) version 72. Only canonical transcripts are included in the VCF file to maintain readability. The ANT file contains consequences for all affected transcripts. This binary file can be loaded into VariantStudio for viewing. See www.illumina.com/clinical/clinical_informatics/illumina-variantstudio.ilmn.
   DP: The depth (number of base calls aligned to a position and used in variant calling). In regions of high coverage, GATK downsamples the available reads.
   EVS: Allele frequency, sample count, and coverage taken from the Exome Variant Server (EVS). Format: AlleleFreqEVS EVSCoverage EVSSamples.

EXON: A comma-separated list of exon regions read from RefGene.
FC: Functional consequence.
GI: A comma-separated list of gene IDs read from RefGene.
GMAF: Global minor allele frequency (GMAF); technically, the frequency of the second most frequent allele. Format: GlobalMinorAllele AlleleFreqGlobalMinor.
phastcons: Denotes if the variant is an identical or similar sequence that occurs between species and maintained between species throughout evolution.
TI: A comma-separated list of transcript IDs read from RefGene.

gVCF Files

This application also produces the Genome Variant Call Format file (gVCF). gVCF was developed to store sequencing information for both variant and non-variant positions, which is required for human clinical applications. gVCF is a set of conventions applied to the standard variant call format (VCF) 4.1 as documented by the 1000 Genomes Project. These conventions allow representation of genotype, annotation, and other information across all sites in the genome in a compact format. Typical human whole-genome sequencing results expressed in gVCF with annotation are less than 1 Gbyte, or about 1/100 the size of the BAM file used for variant calling. If you are performing targeted sequencing, gVCF is also an appropriate choice to represent and compress the results.

gVCF is a text file format, stored as a gzip compressed file (*.genome.vcf.gz). Compression is further achieved by joining contiguous non-variant regions with similar properties into single block VCF records. To maximize the utility of gVCF, especially for high stringency applications, the properties of the compressed blocks are conservative. Block properties like depth and genotype quality reflect the minimum of any site in the block. The gVCF file can be indexed (creating a *.tbi file) and used with existing VCF tools such as tabix and IGV, making it convenient both for direct interpretation and as a starting point for further analysis.
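The block compression idea can be illustrated with a minimal Python sketch. It joins contiguous non-variant sites into one block and reports the minimum depth and genotype quality seen in that block, which is the conservative behavior described above. The Site and Block records and the function name are hypothetical, and contiguity is the only grouping rule used here; the actual gVCF conventions also require the joined sites to have similar properties, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical per-position record (not a real gVCF parser type)."""
    pos: int          # 1-based reference position
    depth: int        # depth at this position
    gq: int           # genotype quality at this position
    is_variant: bool  # True if a variant was called here


@dataclass
class Block:
    """Hypothetical compressed block of non-variant sites."""
    start: int
    end: int
    min_depth: int
    min_gq: int


def compress_nonvariant(sites: List[Site]) -> List[Block]:
    """Join contiguous non-variant sites; keep the minimum depth and GQ."""
    blocks: List[Block] = []
    current: Optional[Block] = None
    for site in sorted(sites, key=lambda s: s.pos):
        if site.is_variant:
            # A variant site ends the current block and is not compressed.
            current = None
            continue
        if current is not None and site.pos == current.end + 1:
            # Extend the open block and keep the most conservative values.
            current.end = site.pos
            current.min_depth = min(current.min_depth, site.depth)
            current.min_gq = min(current.min_gq, site.gq)
        else:
            current = Block(site.pos, site.pos, site.depth, site.gq)
            blocks.append(current)
    return blocks
```

Because each block carries the minimum depth and genotype quality of its sites, a filter applied to the block is never more permissive than the same filter applied to every site individually.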
For more information, see sites.google.com/site/gvcftools/home/about-gvcf.

ANT File

The Illumina Annotation Service (IAS) generates a binary ANT annotation file, which contains consequences for all affected transcripts. The annotations are more detailed than the annotations in the VCF file. This binary file can be loaded into VariantStudio for viewing; see www.illumina.com/clinical/clinical_informatics/illuminavariantstudio.ilmn.

Summary.csv File

The Amplicon DS app produces an overview of statistics for each sample and the aggregate results in comma-separated values (CSV) format: the *summary.csv file. These files are located in the results folder for each sample and the aggregate results.

Statistic: Description
Sample ID: IDs of samples reported on in the file.
Sample Name: Names of samples reported on in the file.
Run Folder: Run folders for samples reported on in the file.
Manifest: The manifest file used for analysis. This file specifies the targeted regions for the aligner and variant caller.
Reference genome: Reference genome selected.
Number of amplicon regions: The number of amplicon regions that were sequenced.
Total length of amplicon regions: The total length of the sequenced bases in the target region.
Total PF reads: The number of reads passing filter for the sample.
Total aligned reads: The total number of reads passing filter present in the data set that aligned to the reference genome. Numbers are calculated per read, and over both reads.
Percent aligned reads: The percentage of reads passing filter that aligned to the reference genome. Numbers are calculated per read, and over both reads.
Total PF bases: The number of bases passing filter for the sample.
Total aligned bases: The total number of bases present in the data set that aligned to the reference genome. Numbers are calculated per read, and over both reads.
Percent aligned bases: The percentage of bases that aligned to the reference genome. Numbers are calculated per read, and over both reads.
Percent Q30: The percentage of bases with a quality score of 30 or higher. Numbers are calculated per read.
Mismatch rate: The average percentage of mismatches across both reads 1 and 2 over all cycles. Numbers are calculated per read.
Amplicon mean coverage: The total number of aligned bases to the targeted region divided by the targeted region size.
SNVs, Insertions, Deletions: Total number of variants present in the data set that pass the quality filters.
SNVs, Insertions, Deletions (Percent found in dbSNP): 100*(Number of variants in dbSNP/number of variants).
SNV Ts/Tv ratio: The number of Transition SNVs that pass the quality filters divided by the number of Transversion SNVs that pass the quality filters. Transitions are interchanges of purines (A, G) or of pyrimidines (C, T). Transversions are interchanges of purine and pyrimidine bases (for example, A to T).
SNVs, Insertions, Deletions Het/Hom ratio: Number of heterozygous variants/number of homozygous variants.
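Several of the ratio statistics above follow directly from their definitions. The Python sketch below shows one way to compute them. The function names and input shapes (SNVs as (ref, alt) tuples, genotypes as "0/1" or "1/1" strings) are assumptions made for illustration and do not describe how the app itself computes or stores these values.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snvs) -> float:
    """Transitions divided by transversions for (ref, alt) SNV tuples."""
    ts = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = sum(
        1 for ref, alt in snvs
        if ref.upper() != alt.upper() and not is_transition(ref, alt)
    )
    return ts / tv if tv else float("nan")


def het_hom_ratio(genotypes) -> float:
    """Heterozygous (0/1) calls divided by homozygous (1/1) calls."""
    het = sum(1 for gt in genotypes if gt == "0/1")
    hom = sum(1 for gt in genotypes if gt == "1/1")
    return het / hom if hom else float("nan")


def percent_in_dbsnp(n_in_dbsnp: int, n_variants: int) -> float:
    """100 * (number of variants in dbSNP / number of variants)."""
    return 100.0 * n_in_dbsnp / n_variants if n_variants else float("nan")


def amplicon_mean_coverage(aligned_bases_in_target: int, target_size: int) -> float:
    """Aligned bases in the targeted region divided by the region size."""
    return aligned_bases_in_target / target_size
```

For example, two transitions (A to G, C to T) and one transversion (A to T) give a Ts/Tv ratio of 2.0. The functions return NaN when a denominator is zero; that is a choice made for this sketch only.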

Amplicon DS Methods

The Amplicon DS workflow evaluates short regions of amplified DNA, or amplicons, for variants. Focused sequencing of amplicons enables high coverage of particular regions across many samples. Amplicon DS samples are generated using a mirrored, dual strand amplicon assay. This chapter describes the methods that are used in the Amplicon DS application.

Alignment

Clusters from each sample are aligned against amplicon sequences specified in the manifest file. Each paired-end read is initially evaluated in terms of its alignment to the relevant probe sequences for that read. Read 1 is evaluated against the reverse complement of the Downstream Locus-Specific Oligos (DLSO) and Read 2 is evaluated against the Upstream Locus-Specific Oligos (ULSO). If the start of a read matches a probe sequence with no more than one mismatch, the full length of the read is aligned against the amplicon target for that sequence. This alignment is performed along the length of the amplicon target using a banded Smith-Waterman alignment algorithm.

The banded Smith-Waterman algorithm performs local sequence alignments to determine similar regions between two sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths, given the restriction that the maximum indel size is 25 bp. Any alignments that include more than three indels are filtered from alignment results. Filtered alignments are written in alignment (BAM) files as unaligned and are not used in variant calling. Indels within the DLSO and ULSO are not observed given the assay chemistry.

Paired-End Evaluation

For paired-end runs, the top-scoring alignment for each read is considered. Reads are flagged as an unresolved pair under the following conditions:
- If either read did not align, or the paired reads aligned to different chromosomes.
- If two alignments come from different amplicons or different rows in the Targets section of the manifest.

Bin/Sort

The bin/sort step groups reads by sample and chromosome, and then sorts by chromosome position. Results are written to one BAM file per sample.

Variant Calling

SNPs and short indels are identified using the somatic variant caller. Developed by Illumina, the somatic variant caller identifies variants present at low frequency in the DNA sample and minimizes false positives. The somatic variant caller identifies SNPs in three steps:
- Considers each position in the reference genome separately.
- Counts bases at the given position for aligned reads that overlap the position.
- Computes a variant score that measures the quality of the call.
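The base-counting step listed above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each aligned read is represented as a hypothetical (start, sequence) tuple with an ungapped alignment, and the function names are invented for this example. It does not reproduce the somatic variant caller, which also handles indels, base qualities, and scoring.

```python
from collections import Counter


def count_bases_at(position: int, aligned_reads) -> Counter:
    """Count the bases that aligned reads place at a reference position.

    aligned_reads is an iterable of (start, sequence) tuples, where start
    is the reference position of the first base (an assumed, simplified
    representation with no gaps or clipping).
    """
    counts = Counter()
    for start, seq in aligned_reads:
        offset = position - start
        if 0 <= offset < len(seq):  # the read overlaps the position
            counts[seq[offset].upper()] += 1
    return counts


def non_reference_frequencies(counts: Counter, ref_base: str) -> dict:
    """Fraction of overlapping reads supporting each non-reference base."""
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {
        base: n / depth
        for base, n in counts.items()
        if base != ref_base.upper()
    }
```

Given three reads that overlap a position and show G, G, and T, the counts are G: 2 and T: 1, so a T variant against a G reference has a frequency of 1/3 at a depth of 3.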

Variant scores are computed using a Poisson model that excludes variants with a variant quality score below Q20. Additionally, the model only calls variants for bases that are covered at 300x or greater for a single amplicon. Variants are first called for each pool separately. Then, variants from the two pools are compared and combined into a single output file. A variant is marked as PASS in the variant file if it meets all of the following criteria:
- Is present in both pools.
- Has a cumulative depth of 1000x, or an average depth of 500x per pool.
- Meets all the VCF filter requirements as specified. See Amplicon DS VCF Entries on page 16.

For more information, see the Amplicon - DS Variant Caller Technical Note on the TruSight Tumor Sample Preparation support page.

Somatic Variant Caller

The Somatic Variant Caller is designed for variant calling in tumor samples with no paired normal. The Somatic Variant Caller is recommended for detection of low frequency variants, such as those found in heterogeneous cancer samples. Variants are flagged as homozygous or heterozygous in the VCF sample column, with either a 1/1 or a 0/1 respectively. During somatic variant calling, somatic variants can be observed at any frequency. Therefore, het/hom calls are made to indicate the most reasonable diploid genotype that can be assigned to a variant if it is a non-somatic (germline) variant. For more information about the Somatic Variant Caller, see res.illumina.com/documents/products/technotes/technote_somatic_variant_caller.pdf.

Illumina Annotation Service (IAS)

Annotation with IAS populates several values in the VCF file, including dbSNP ID (in the ID column), and some values in the INFO column. More detailed and extensive annotations are stored in a binary ANT file. This binary file can be imported into VariantStudio. Annotation through IAS is available for alignments against the human reference genome: UCSC build hg19 / Ensembl build GRCh37 / NCBI build37.2.
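The per-pool calling thresholds and the PASS criteria described under Variant Calling can be expressed as a short Python sketch. The PoolCall record, the function names, and the single meets_vcf_filters flag are hypothetical simplifications; the guide does not describe the implementation, and with exactly two pools a cumulative depth of 1000 is the same condition as an average depth of 500 per pool.

```python
from dataclasses import dataclass

MIN_VARIANT_Q = 20           # variants below Q20 are excluded
MIN_AMPLICON_COVERAGE = 300  # minimum coverage for a single amplicon
MIN_CUMULATIVE_DEPTH = 1000  # across both pools (500x average per pool)


@dataclass
class PoolCall:
    """Hypothetical summary of one pool's result at a variant position."""
    called: bool  # True if the variant was called in this pool
    depth: int    # depth at the position in this pool


def callable_in_pool(variant_q: float, amplicon_coverage: int) -> bool:
    """Per-pool thresholds: at least Q20 and at least 300x coverage."""
    return variant_q >= MIN_VARIANT_Q and amplicon_coverage >= MIN_AMPLICON_COVERAGE


def is_pass(pool_a: PoolCall, pool_b: PoolCall, meets_vcf_filters: bool) -> bool:
    """Apply the three PASS criteria to a variant compared across pools."""
    present_in_both = pool_a.called and pool_b.called
    deep_enough = pool_a.depth + pool_b.depth >= MIN_CUMULATIVE_DEPTH
    return present_in_both and deep_enough and meets_vcf_filters
```

For example, a variant called in both pools at depths of 600 and 450 satisfies the depth criterion (1050 in total), while the same variant at 400 in each pool does not.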

Technical Assistance

For technical assistance, contact Illumina Technical Support.

Table 5 Illumina General Contact Information

Website: www.illumina.com
Email: techsupport@illumina.com

Table 6 Illumina Customer Support Telephone Numbers

Region: Contact Number
North America: 1.800.809.4566
Austria: 0800.296575
Belgium: 0800.81102
Denmark: 80882346
Finland: 0800.918363
France: 0800.911850
Germany: 0800.180.8994
Ireland: 1.800.812949
Italy: 800.874909
Netherlands: 0800.0223859
Norway: 800.16836
Spain: 900.812168
Sweden: 020790181
Switzerland: 0800.563118
United Kingdom: 0800.917.0041
Other countries: +44.1799.534000

Safety Data Sheets

Safety data sheets (SDSs) are available on the Illumina website at support.illumina.com/sds.html.

Product Documentation

Product documentation in PDF is available for download from the Illumina website. Go to support.illumina.com, select a product, then click Documentation & Literature.

Illumina San Diego, California 92122 U.S.A. +1.800.809.ILMN (4566) +1.858.202.4566 (outside North America) techsupport@illumina.com www.illumina.com