RNA Express
Introduction 3
Run RNA Express 4
RNA Express App Output 6
RNA Express Workflow 12
Technical Assistance


ILLUMINA PROPRIETARY
15052918 Rev. A
February 2014

This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed, or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document. The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read and understood prior to using such product(s). FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND DAMAGE TO OTHER PROPERTY. ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S) DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE) OR ANY USE OF SUCH PRODUCT(S) OUTSIDE THE SCOPE OF THE EXPRESS WRITTEN LICENSES OR PERMISSIONS GRANTED BY ILLUMINA IN CONNECTION WITH CUSTOMER'S ACQUISITION OF SUCH PRODUCT(S).

FOR RESEARCH USE ONLY

© 2014 Illumina, Inc. All rights reserved.
Illumina, IlluminaDx, 24sure, BaseSpace, BeadArray, BeadXpress, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect, MiSeq, MiSeqDx, NeoPrep, Nextera, NextSeq, NuPCR, SeqMonitor, Solexa, TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, VeriSeq, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks of Illumina, Inc. in the U.S. and/or other countries. All other names, logos, and other trademarks are the property of their respective owners.

Introduction
The BaseSpace RNA Express app combines the STAR aligner and the DESeq2 differential expression tool in one simple workflow. It provides the most commonly used RNA analysis features in a single, convenient, and fast analysis package.

Versions
The RNA Express app uses the following module versions:
• STAR 2.3.1s
• DESeq2 1.0.17

Current Limitations
Before running the RNA Express app, be aware of the following limitations:
• Reads must be at least 35 bp and no more than 500 bp long.
• Each sample must contain between 100,000 and 400 million reads.
• The total read count across all samples must be less than 2 billion reads.
• Only the UCSC hg19 (human), UCSC mm10 (mouse), and UCSC rn5 (rat) reference genomes are currently supported.

BaseSpace RNA Express User Guide 3
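The limits above lend themselves to a pre-flight check before you launch the app. The following Python sketch is not part of RNA Express: the function name `check_inputs`, the short genome identifiers, and the treatment of each range as inclusive (with the 2 billion total as a strict upper bound) are assumptions based on the wording of the list.

```python
from typing import Dict, List

# Limits from "Current Limitations"; treating the ranges as inclusive is an assumption.
MIN_READ_LEN, MAX_READ_LEN = 35, 500
MIN_SAMPLE_READS, MAX_SAMPLE_READS = 100_000, 400_000_000
MAX_TOTAL_READS = 2_000_000_000  # the total must be strictly below this value
SUPPORTED_GENOMES = {"hg19", "mm10", "rn5"}  # UCSC human, mouse, and rat builds


def check_inputs(read_length: int, reads_per_sample: Dict[str, int], genome: str) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    problems = []
    if not MIN_READ_LEN <= read_length <= MAX_READ_LEN:
        problems.append(f"read length {read_length} bp is outside 35-500 bp")
    for sample, n_reads in reads_per_sample.items():
        if not MIN_SAMPLE_READS <= n_reads <= MAX_SAMPLE_READS:
            problems.append(f"sample {sample} has {n_reads:,} reads (allowed: 100,000 to 400 million)")
    total = sum(reads_per_sample.values())
    if total >= MAX_TOTAL_READS:
        problems.append(f"total of {total:,} reads is not below 2 billion")
    if genome not in SUPPORTED_GENOMES:
        problems.append(f"reference genome {genome!r} is not supported")
    return problems


# Usage: one undersized sample and an unsupported genome yield two problems.
for problem in check_inputs(75, {"ctrl_1": 20_000_000, "trt_1": 50_000}, "hg38"):
    print(problem)
```

In practice you would fill `reads_per_sample` from your own FASTQ statistics; the app itself enforces these limits in its own way.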

Run RNA Express
1 Navigate to the project or sample that you want to analyze.
2 Click Launch App and select RNA Express from the drop-down list.
3 Read the End-User License Agreement and the permissions, and click Accept if you agree.
4 Fill out the app session storage information:
  a App Session Name: enter a name for the app session. The default name is the app name followed by the date and time the app session started.
  b Save Results To: select the project that stores the app results.
5 Fill out the sample criteria:
  a Reference Genome: select the reference genome.
  b Stranded: indicate whether the samples were stranded.
  c Trim TruSeq Adapters: if selected, the app attempts to trim TruSeq adapters from the FASTQ sequences. This trimming is usually unnecessary because adapters are trimmed during demultiplexing when samples are uploaded. However, if adapter sequences were not specified in the sample sheet during upload, this option provides a second opportunity to trim them.
6 Fill out the control group information:
  a Group Label: enter the control group label. The default label is control.
  b Select Sample: browse to each sample that you want to use as a control and select its checkbox. You can select multiple control samples.
7 Fill out the comparison group information:
  a Group Label: enter the comparison group label. The default label is comparison.
  b Select Sample: browse to each sample that you want to use for comparison and select its checkbox. You can select multiple comparison samples.
8 Click Continue.
RNA Express starts analyzing your samples. When the analysis is complete, the app session status is updated automatically and you receive an email.
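The input form maps onto a small settings record. The sketch below is a hypothetical Python model of the form, not a BaseSpace API object: the class and field names are invented for illustration, the session-name timestamp format and the default values for Stranded and Trim TruSeq Adapters are assumptions, and the requirement that each group contain at least one sample reflects how the form is used rather than a documented rule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class RnaExpressSettings:
    """Hypothetical model of the RNA Express input form (not a BaseSpace API object)."""
    save_results_to: str                # project that stores the app results
    reference_genome: str               # e.g. "hg19", "mm10", or "rn5"
    control_samples: List[str]          # one or more samples
    comparison_samples: List[str]       # one or more samples
    stranded: bool = False              # default value is an assumption
    trim_truseq_adapters: bool = False  # default value is an assumption
    control_label: str = "control"
    comparison_label: str = "comparison"
    app_session_name: str = ""

    def __post_init__(self) -> None:
        if not self.app_session_name:
            # App name plus start date and time; the exact timestamp format is assumed.
            self.app_session_name = f"RNA Express {datetime.now():%Y-%m-%d %H:%M}"
        if not self.control_samples or not self.comparison_samples:
            raise ValueError("select at least one control and one comparison sample")


settings = RnaExpressSettings("MyProject", "hg19", ["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"])
print(settings.app_session_name, settings.control_label, settings.comparison_label)
```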

Figure 1 RNA Express Input Form

RNA Express App Output
This chapter describes the RNA Express output. To view the results, click Projects, select the project, and then select the analysis.

Figure 2 RNA Express Output Navigation Bar

When the analysis is complete, you can access the output through the left navigation bar, which provides the following:
• Analysis Info: an overview of the analysis. See Analysis Info on page 9 for a description.
• Inputs: an overview of the input samples and settings. See RNA Express Inputs Overview on page 10 for a description.
• Output Files: access to the output files, organized by sample and app session. See RNA Express Output Files on page 10 for descriptions.
• Analysis Reports > Summary: access to analysis metrics for the aggregate results. See RNA Express Report on page 6 for a description.

RNA Express Report
The RNA Express app provides an overview of all samples on the Summary page. The metrics are described below.

Primary Analysis Information
Statistic | Definition
Read Length | Number and length of reads.
Number of reads | Total number of reads passing filter for this sample.

Alignment Information
Statistic | Definition
% Total Aligned | Percentage of reads passing filter that align to the reference.
% Abundant | Percentage of reads that align to abundant transcripts, such as mitochondrial and ribosomal sequences.
% Unaligned | Percentage of reads that do not align to the reference.
Multi-mapped (% Aligned Reads) | Percentage of aligned reads that have more than one equally good alignment position in the genome.
Reads with spliced alignment (% Aligned Reads) | Percentage of aligned reads that map across splicing events. Each case where a read alignment skips over a known or discovered intron is counted.
Link to BAM File | Download link to the BAM file for this sample.

Read Counts
Statistic | Definition
Exonic Reads (%) | Reads mapping to exonic regions (% of uniquely aligned reads).
Non-exonic Reads (%) | Reads mapping to non-exonic regions (% of uniquely aligned reads).
Ambiguous Reads (%) | Reads aligning to more than one locus or to a locus overlapping multiple genes (% of uniquely aligned reads).

Differential Expression
Statistic | Definition
Annotation Gene Count | Number of genes in the annotation.
Assessed Gene Count | Number of genes tested for statistical significance.
Differentially Expressed Gene Count | Number of significantly differentially expressed genes.
Link to Merged Gene Counts | Download link to a CSV file with the number of reads mapped to each gene for each sample in the control and comparison groups.
Link to Results | Download link to a CSV file with the mean expression, log2 fold change, standard error of the log2 fold change, p-value, adjusted p-value, and expression status for each gene.

Sample Correlation Matrix
A heat map showing the relative similarity between all replicates in this analysis run. Each row and column represents one replicate, ordered by similarity (hierarchical clustering). The color of each cell indicates the Spearman rho correlation between the two replicates.
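To make the denominators in these tables concrete, here is a hypothetical Python sketch that derives the percentages from raw per-sample tallies. The field names are illustrative, and the denominator used for % Abundant (reads passing filter) is an assumption because the table does not state it; RNA Express computes these values internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadTallies:
    """Hypothetical per-sample tallies; field names are illustrative only."""
    passing_filter: int    # reads passing filter
    aligned: int           # reads aligned to the reference
    abundant: int          # reads aligned to abundant sequences (mitochondrial, ribosomal)
    multi_mapped: int      # aligned reads with more than one equally good position
    splice_events: int     # per the table's wording, each skipped intron counts once
    uniquely_aligned: int
    exonic: int
    non_exonic: int
    ambiguous: int


def pct(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def summary_metrics(t: ReadTallies) -> Dict[str, float]:
    return {
        "% Total Aligned": pct(t.aligned, t.passing_filter),
        "% Abundant": pct(t.abundant, t.passing_filter),  # denominator assumed
        "% Unaligned": pct(t.passing_filter - t.aligned, t.passing_filter),
        "Multi-mapped (% Aligned Reads)": pct(t.multi_mapped, t.aligned),
        "Reads with spliced alignment (% Aligned Reads)": pct(t.splice_events, t.aligned),
        "Exonic Reads (%)": pct(t.exonic, t.uniquely_aligned),
        "Non-exonic Reads (%)": pct(t.non_exonic, t.uniquely_aligned),
        "Ambiguous Reads (%)": pct(t.ambiguous, t.uniquely_aligned),
    }


tallies = ReadTallies(1000, 900, 50, 90, 180, 810, 600, 150, 60)
for name, value in summary_metrics(tallies).items():
    print(f"{name}: {value:.1f}")
```

Note that alignment percentages use reads passing filter or aligned reads as the base, while the read-count percentages use only uniquely aligned reads, so the two groups of numbers are not directly additive.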

Figure 3 Sample Correlation Matrix

Control vs Comparison
The control vs comparison plot is an interactive scatter plot of the log2(fold change) against the mean count for each gene. You can filter the results by the following metrics:
• Test status:
  - OK: test successful.
  - Low: low average expression across samples (mean normalized count across all samples less than 10).
  - Outlier: a single outlier replicate strongly affects the result.
• Significance: genes with a multiple-testing adjusted p-value (q-value) for differential expression of less than 0.05.
• Gene: search for a particular gene in the plot and in the gene table below the plot.

The gene table below the scatter plot lists these metrics for each gene. Clicking a gene circles the corresponding dot in the scatter plot; likewise, clicking a dot in the scatter plot highlights the gene in the table. The gene table also reports the following metrics:
• Std. err. log(fold Change): standard error of the log2(fold change) estimate.
• q value: multiple-testing adjusted p-value for differential expression (used for the Significance filter).

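The q-value used by the Significance filter is a multiple-testing adjusted p-value. DESeq2 adjusts p-values with the Benjamini-Hochberg procedure by default; assuming RNA Express keeps that default (this guide does not say), the adjustment and the q < 0.05 filter look like this in plain Python. The function names are illustrative, and genes removed before testing are represented as `None`.

```python
from typing import List, Optional


def benjamini_hochberg(pvalues: List[Optional[float]]) -> List[Optional[float]]:
    """Benjamini-Hochberg adjusted p-values; None marks an untested gene and stays None."""
    tested = sorted(((p, i) for i, p in enumerate(pvalues) if p is not None), reverse=True)
    m = len(tested)
    adjusted: List[Optional[float]] = [None] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # keeping the adjusted values monotone and capped at 1.
    for rank, (p, i) in zip(range(m, 0, -1), tested):
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(qvalues: List[Optional[float]], threshold: float = 0.05) -> List[int]:
    """Indices of genes passing the Significance filter (q < threshold)."""
    return [i for i, q in enumerate(qvalues) if q is not None and q < threshold]


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005, None])
print(q)
print(significant(q))
```

Only tested genes count toward the number of tests m, which is why filtering low-expressed genes before testing reduces the multiple-testing burden.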
Figure 4 Control vs Comparison Plot

Analysis Info
The Analysis Info page provides an overview of the analysis. The metrics are described below.

Row | Definition
Name | Name of the app session.
Application | App that generated this analysis.
Date started | Date the app session started.
Date completed | Date the app session completed.
Duration | Duration of the analysis.
Session Type | Number of nodes used.
Size | Total size of all output files.
Status | Status of the app session.

Log Files
The Log Files link at the bottom of the Analysis Info page gives access to the RNA Express log files. The following files record information that helps you follow data processing and debug problems:
• WorkflowLog.txt: workflow standard output (details about workflow steps, command-line calls with parameters, timing, and progress).
• WorkflowError.txt: workflow standard error output (error messages generated while the workflow runs).

• Logging.zip: all detailed log files for each step of the workflow.
• IlluminaAppsService.log.copy: wrapper log file with information about communication (GET and POST requests) between BaseSpace and AWS.
• CompletedJobInfo.xml: information about the completed job.
• SampleSheet.csv: sample sheet.

The following files contain additional information in case a component such as Mono does not work as expected:
• monoerr.txt: wrapper Mono call error output (anything that WorkflowError.txt does not capture; usually empty except for one line).
• monoout.txt: wrapper Mono call standard output (the command that calls the workflow and anything that WorkflowLog.txt does not capture).

NOTE For an explanation of Mono, see www.mono-project.com.

RNA Express Status
The status of an RNA Express app session takes the following values, in order:
• Downloading data
• Aligning
• Post-alignment processing
• Read counting
• Differential expression analysis
• Generating report
• Finalizing results
Depending on the size and number of samples, the complete analysis can take from a few hours to several days.

RNA Express Inputs Overview
The Inputs page provides an overview of the input samples and settings. The metrics are described below.

Statistic | Definition
Group Label | Label of the comparison or control group.
Comparison Samples | Samples selected for the comparison group.
Control Samples | Samples selected for the control group.
Reference Genome | Reference genome selected.
Save Results To | Project that stores the app results.
Stranded | Indicates whether the samples were stranded.
Trim TruSeq Adapters | If selected, the app trims TruSeq adapters.

RNA Express Output Files
RNA Express produces the following output files in the indicated folders:

<AppResult>/differential/global
• deseq.corr.pdf: PDF heat map of the sample correlation matrix.
• deseq.corr.png: PNG heat map of the sample correlation matrix.
• deseq.corr.csv: CSV file containing the sample correlation matrix.
• gene.counts.csv: CSV file with the number of reads mapped to each gene for each sample.

<AppResult>/differential/<control>_vs_<comparison>
• <control>_vs_<comparison>.deseq.ma.pdf: PDF scatter plot of log2(fold change) versus the mean of normalized counts. Not available when DESeq2 fails to converge.
• <control>_vs_<comparison>.deseq.counts.csv: CSV file with the number of reads mapped to each gene for each sample in the control and comparison groups.
• <control>_vs_<comparison>.deseq.disp.pdf: PDF scatter plot of dispersion versus the mean of normalized counts. Not available when DESeq2 fails to converge.
• <control>_vs_<comparison>.deseq.heat map.pdf: PDF heat map of the expression of the differentially expressed genes (adjusted p-value < 0.05) for samples in the control and comparison groups. If more than 5000 genes are differentially expressed, only the top 5000 are used. Not available when DESeq2 fails to converge or when no genes are differentially expressed.
• <control>_vs_<comparison>.deseq.res.csv: CSV file with the mean expression, log2(fold change), standard error of log2(fold change), p-value, adjusted p-value, and expression status for each gene.

<AppResult>/samples/<group>/replicates/<sample>/alignments
• <sample>.alignments.sorted.bam: alignments of reads against the genome (and transcriptome). For a description, see BAM Files on page 11.
• <sample>.coverage.bedgraph.gz: genome coverage of the aligned RNA-Seq reads.

<AppResult>/samples/<group>/replicates/<sample>/counts
• <sample>.counts.genes: tab-delimited file with the number of reads mapped to each gene. The last two lines of this file are not gene counts and should be removed before use.

BAM Files
The Sequence Alignment/Map (SAM) format is a generic format for storing read alignments against reference sequences. It supports short and long reads (up to 128 Mb) produced by different sequencing platforms. SAM is a human-readable text format. The Binary Alignment/Map (BAM) format holds the same information as SAM in a compressed binary form that is only machine-readable.
When you launch a BaseSpace app that takes BAM files as input, the app locates the file automatically. To use a BAM file in an external tool, download the file first.
For the exact SAM specification, see samtools.sourceforge.net/sam1.pdf.
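A SAM alignment line consists of 11 mandatory tab-separated fields followed by optional TAG:TYPE:VALUE fields. In a spliced RNA-Seq alignment, each intron that a read skips appears as an N operation in the CIGAR string, which is one plausible way to count the spliced alignments reported on the Summary page (the guide does not say how RNA Express counts them). The minimal Python parser below handles text SAM lines only; reading a BAM file directly requires a tool such as samtools or a library such as pysam, which is not shown. The function names are hypothetical.

```python
import re
from typing import Dict, List, Union

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, Union[str, int, List[str]]]:
    """Split one SAM alignment line into its mandatory fields plus optional tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    record: Dict[str, Union[str, int, List[str]]] = dict(zip(SAM_FIELDS, columns[:11]))
    for key in INT_FIELDS:
        record[key] = int(columns[SAM_FIELDS.index(key)])
    record["TAGS"] = columns[11:]  # optional TAG:TYPE:VALUE fields
    return record


def introns_skipped(cigar: str) -> int:
    """Count N operations (skipped reference regions, i.e. introns) in a CIGAR string."""
    return sum(1 for _length, op in CIGAR_OP.findall(cigar) if op == "N")


def is_unmapped(flag: int) -> bool:
    return bool(flag & 0x4)  # FLAG bit 0x4: segment unmapped


line = "read1\t0\tchr1\t100\t255\t50M1000N50M\t*\t0\t0\t" + "A" * 100 + "\t" + "F" * 100 + "\tNH:i:1"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], introns_skipped(str(rec["CIGAR"])), is_unmapped(int(rec["FLAG"])))
```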

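The per-sample `<sample>.counts.genes` file listed under Output Files needs its last two lines removed before use. A minimal Python reader is sketched below. It assumes that the gene identifier is in the first column and the count in the last column, that `<group>` in the path is the group label, and that the two trailing lines are the last two non-empty lines; the summary-line names in the example are placeholders, not the real file contents.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict, TextIO


def counts_path(app_result: str, group: str, sample: str) -> PurePosixPath:
    """Path of <sample>.counts.genes inside a downloaded AppResult folder."""
    return PurePosixPath(app_result, "samples", group, "replicates", sample,
                         "counts", f"{sample}.counts.genes")


def read_gene_counts(handle: TextIO) -> Dict[str, int]:
    """Read gene -> count pairs, dropping the two trailing non-count lines."""
    lines = [line for line in handle.read().splitlines() if line.strip()]
    counts: Dict[str, int] = {}
    for row in csv.reader(lines[:-2], delimiter="\t"):
        counts[row[0]] = int(row[-1])  # assumed: gene ID first, count last
    return counts


print(counts_path("AppResult", "control", "ctrl_1"))
# Placeholder content: three genes followed by two hypothetical summary lines.
example = io.StringIO("GENE_A\t12\nGENE_B\t0\nGENE_C\t7\nsummary_1\t3\nsummary_2\t1\n")
print(read_gene_counts(example))
```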
RNA Express Workflow
This chapter describes the workflow and modules used in the RNA Express app.
1 Alignment. Reads for each sample are aligned against the corresponding genome with the Spliced Transcripts Alignment to a Reference (STAR) software [1]. STAR alignments are converted to BAM files on the fly with SAMtools [2]. The FASTQ files are not pre-processed (trimmed or filtered); instead, trim5 and trim3 options are passed to STAR, which performs the trimming. In addition, STAR performs local alignment, which lets it soft-clip read ends automatically (for example, low-quality bases or missed splice junctions). STAR runs in a mode that searches for novel junctions. After the initial alignment, RNA Express filters the junction list by confidence and retains only alignments across high-confidence junctions. For paired-end runs, only correctly paired alignments are reported.
2 Post-Alignment. After alignment, the BAM files are sorted and indexed, and bedGraph coverage files are created with BEDTools [3]. Alignments to abundant sequences are identified from the genomic alignments using an annotation of the abundant regions of the genome.
3 Read Counting. Gene expression is estimated at the gene level by counting the aligned reads that overlap each gene in the annotation. The counting strategy is similar to htseq-count in union mode [4]. Ambiguous reads, which align to more than one locus or to a locus overlapping multiple genes, are not counted. A read pair is counted toward a gene only if both reads overlap exons with the same unique gene_id. The counts are written to one .csv file per sample. The counter also reports basic statistics (number of reads filtered, assigned, and unassigned).
4 Global Expression. The raw read counts are the input for differential expression analysis with R and DESeq2 [5]. The workflow writes an R script that loads all .csv files with read counts, builds a data frame from them, and computes pairwise correlations. A Python script uses matplotlib to create a sample-to-sample correlation heat map. The correlations, a merged table of read counts for all samples, and the heat map are written to the output directory.
5 Pairwise Differential Expression. A second R script performs the differential analysis. It loads the counts for all samples in the comparison and runs a pairwise differential expression analysis with DESeq2 (see the online documentation for details of the model [5]). Before testing, the script removes genes with low expression (mean count across all samples less than 10) to reduce the multiple-testing burden. The DESeq2 variance model is used to detect outliers (based on extreme variation between replicates), which are also excluded. Finally, the status (filtered or passed) and the results of the analysis (mean expression, fold change, standard error, p-value, and so on) are reported for each gene. The script also writes a table of raw counts across all replicates and plots a gene-level heat map sorted by hierarchical clustering. The heat map contains up to 5000 significantly differentially expressed genes (q < 0.05).
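The pair-counting rule in step 3 can be expressed compactly. This Python sketch assumes that each mate has already been reduced to the set of gene_ids whose exons it overlaps, and that reads aligning to more than one locus were dropped beforehand; it illustrates the rule as stated in this guide and is not the RNA Express counter itself. The "unassigned" label is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple


def assign_pair(mate1_genes: Set[str], mate2_genes: Set[str]) -> Optional[str]:
    """Return the gene_id a read pair counts toward, or None if it is not counted.

    Each argument is the set of gene_ids whose exons that mate overlaps.
    """
    union = mate1_genes | mate2_genes
    # Counted only when both mates overlap exons and all overlaps name one gene;
    # more than one gene means the pair is ambiguous.
    if mate1_genes and mate2_genes and len(union) == 1:
        return next(iter(union))
    return None


def count_pairs(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Counter:
    """Tally assigned pairs per gene plus an 'unassigned' total (label is hypothetical)."""
    counts: Counter = Counter()
    for mate1, mate2 in pairs:
        gene = assign_pair(mate1, mate2)
        counts[gene if gene is not None else "unassigned"] += 1
    return counts


pairs = [({"GENE_A"}, {"GENE_A"}),            # counted for GENE_A
         ({"GENE_A"}, {"GENE_A", "GENE_B"}),  # ambiguous: two genes
         ({"GENE_A"}, set()),                 # second mate overlaps no exon
         ({"GENE_B"}, {"GENE_B"})]            # counted for GENE_B
print(count_pairs(pairs))
```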

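Step 4 computes the sample-to-sample correlations in R, and the Sample Correlation Matrix in the report shows Spearman rho. For illustration only, the same quantity can be computed in dependency-free Python as the Pearson correlation of tie-averaged ranks. The function names are hypothetical, the hierarchical-clustering order used in the heat map is not reproduced, and a sample whose counts are all identical would cause a division by zero.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_matrix(counts: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Spearman rho for every pair of samples; counts maps sample -> per-gene counts."""
    ranked = {s: average_ranks(v) for s, v in counts.items()}
    return {a: {b: pearson(ranked[a], ranked[b]) for b in ranked} for a in ranked}


counts = {"ctrl_1": [5, 120, 40, 0], "ctrl_2": [8, 150, 35, 1], "trt_1": [60, 10, 45, 2]}
for sample, row in spearman_matrix(counts).items():
    print(sample, [round(rho, 2) for rho in row.values()])
```

Because Spearman rho depends only on ranks, multiplying a sample's counts by a positive normalization factor leaves the matrix unchanged, so raw counts can be used directly.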
Figure 5 RNA Express Workflow

References
1 Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15-21.
2 SAMtools: samtools.sourceforge.net
3 BEDTools: bedtools.readthedocs.org
4 htseq-count: www-huber.embl.de/users/anders/htseq/doc/count.html
5 DESeq2: www.bioconductor.org/packages/2.13/bioc/html/deseq2.html


Technical Assistance
For technical assistance, contact Illumina Technical Support.

Table 1 Illumina General Contact Information
Illumina Website | www.illumina.com
Email | techsupport@illumina.com

Table 2 Illumina Customer Support Telephone Numbers
Region | Contact Number
North America | 1.800.809.4566
Austria | 0800.296575
Belgium | 0800.81102
Denmark | 80882346
Finland | 0800.918363
France | 0800.911850
Germany | 0800.180.8994
Ireland | 1.800.812949
Italy | 800.874909
Netherlands | 0800.0223859
Norway | 800.16836
Spain | 900.812168
Sweden | 020790181
Switzerland | 0800.563118
United Kingdom | 0800.917.0041
Other countries | +44.1799.534000

Safety Data Sheets
Safety data sheets (SDSs) are available on the Illumina website at www.illumina.com/msds.

Product Documentation
Product documentation in PDF is available for download from the Illumina website. Go to www.illumina.com/support, select a product, and then click Documentation & Literature.

Illumina San Diego, California 92122 U.S.A. +1.800.809.ILMN (4566) +1.858.202.4566 (outside North America) techsupport@illumina.com www.illumina.com