Simplifying Data Interpretation with Nexus Copy Number




A White Paper from BioDiscovery, Inc.

Rapid technological advances, such as high-density aCGH and SNP arrays as well as next-generation sequencing, have enabled fine-resolution scanning of the genome to identify copy number changes as well as allelic events such as loss of heterozygosity (LOH). This flood of data, coupled with an ever-growing number of samples processed with these technologies, requires an effective software system that helps the user interpret the results in a reasonable amount of time and with good confidence in the accuracy of the interpretation. Here we describe Nexus Copy Number version 5, software that can drastically improve the interpretation process.

Collecting, Organizing, and Searching Samples and Associated Annotations

Measurements used to estimate the copy number status of a sample can be generated on a variety of platforms, in different file formats, and at different levels of abstraction: raw, normalized log-ratio, or segmented/called. Nexus Copy Number provides direct import of data from all commercial array platforms at various levels of processing (see Figure 1). In this paper we use data generated on four different array platforms: Illumina HumanCytoSNP-12, Affymetrix SNP 6.0, Agilent 44K (eArray design), and Roche NimbleGen 3x720K whole-genome arrays.

Figure 1. Analysis workflow for Nexus Copy Number, showing how it accommodates all platforms at different stages of the analysis pipeline.
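The "normalized log-ratio" level can be made concrete with the textbook relationship between copy number and expected log ratio. The sketch below assumes a diploid reference, a pure (non-mosaic, unadmixed) sample and ideal probe response; it illustrates the arithmetic only and is not how Nexus normalizes data. Observed array log ratios are usually compressed toward zero relative to these ideal values.

```python
import math


def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Ideal log2(test/reference) for a pure sample against a diploid reference.

    Assumes no mosaicism, no normal-cell admixture and a perfectly linear probe
    response; real measurements are noisier and compressed toward 0.
    """
    if copy_number <= 0:
        return float("-inf")  # homozygous deletion: no test signal in theory
    return math.log2(copy_number / reference_copies)


for cn in range(5):
    # 0 -> -inf, 1 -> -1.0, 2 -> 0.0, 3 -> 0.585, 4 -> 1.0
    print(cn, round(expected_log2_ratio(cn), 3))
```

A one-copy loss therefore ideally sits at -1 and a one-copy gain at about +0.585, which is why gains are harder to separate from noise than losses of the same size.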

Nexus Copy Number uses the concept of a Project to refer to a collection of samples. A Nexus project can contain samples from different arrays and different resolutions; the only thing the samples must have in common is the genome (organism and build number). The user can batch load an essentially unlimited number of samples into a project with a single mouse click. Sample annotations, such as gender, ethnicity, age, and phenotype, can also be loaded with a single click by importing a tab-delimited list of samples along with their annotations. Nexus Copy Number refers to such sample annotations as Factors. The user can define an unlimited number of Factors with arbitrary text values. Samples in a project can be organized and selected by sorting on factors of interest. For example, when the project has the factors Age and Gender, selecting all male samples older than 50 takes just two mouse clicks. A text-based search tool is also available to locate any text string of interest.

Figure 2. Searching all samples for a particular text string.

Interpreting Data from a Single Sample

Raw measurements or log-ratio data can be processed within Nexus Copy Number to identify regions of copy number change and, if SNP data is available, allelic events such as LOH. Nexus Copy Number offers the following algorithms for making the calls:

- Rank Segmentation: A robust variation of the well-known Circular Binary Segmentation (CBS) algorithm in which probe ranks are used to minimize the effect of outliers and drastically improve performance.
- SNPRank Segmentation: An extension of the Rank Segmentation algorithm in which B-allele frequency values are also included in the segmentation process, generating both copy number and allelic event calls.
- FASST Segmentation: A novel Hidden Markov Model (HMM) based approach that, unlike other HMM methods, does not aim to estimate the copy number state at each probe. Instead, it uses many states to cover possibilities such as mosaic events and then makes calls based on a second-level threshold.
- SNP-FASST Segmentation: An extension of the FASST algorithm that adds many more states to cover events related to B-allele frequency values, in order to make both copy number and allelic event calls.

Email: info@biodiscovery.com Web: www.biodiscovery.com Page 2
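To illustrate why ranks help, here is a simplified sketch of circular binary segmentation on rank-transformed values. It is not BioDiscovery's Rank Segmentation, whose details are not given in this paper. It scores every arc [i, j) with a standardized Wilcoxon rank-sum statistic, splits when the best score exceeds a fixed threshold (in place of CBS's permutation-based significance test), and recurses on the pieces. Because only ranks enter the statistic, a single extreme probe cannot dominate a segment. The function names, `threshold=3.0` and `min_size=3` are illustrative assumptions.

```python
import math
from typing import List, Tuple


def ranks(values: List[float]) -> List[float]:
    """1-based ranks of values; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j become one shared rank
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def best_arc(r: List[float], min_size: int) -> Tuple[int, int, float]:
    """Arc [i, j) whose rank sum deviates most from its no-change expectation."""
    n = len(r)
    prefix = [0.0]
    for x in r:
        prefix.append(prefix[-1] + x)
    best = (0, 0, 0.0)
    for i in range(n):
        for j in range(i + min_size, n + 1):
            m = j - i
            if n - m < min_size:
                break  # the rest of the sequence would be too short
            mean = m * (n + 1) / 2            # rank-sum mean under no change
            var = m * (n - m) * (n + 1) / 12  # rank-sum variance (ignores ties)
            z = abs(prefix[j] - prefix[i] - mean) / math.sqrt(var)
            if z > best[2]:
                best = (i, j, z)
    return best


def rank_cbs(values: List[float], threshold: float = 3.0,
             min_size: int = 3, offset: int = 0) -> List[int]:
    """Return breakpoints (indices where a new segment starts), found recursively."""
    n = len(values)
    if n < 2 * min_size:
        return []
    i, j, z = best_arc(ranks(values), min_size)
    if z < threshold:
        return []
    bounds = [0] + [c for c in (i, j) if 0 < c < n] + [n]
    breakpoints: List[int] = []
    for a, b in zip(bounds, bounds[1:]):
        if a > 0:
            breakpoints.append(offset + a)
        # Ranks are recomputed inside each piece on the recursive call.
        breakpoints.extend(rank_cbs(values[a:b], threshold, min_size, offset + a))
    return breakpoints


# Toy log-ratio profile: normal, a six-probe loss, normal again.
profile = [0.01, -0.02, 0.03, 0.0, 0.02, -0.01,
           -1.0, -0.95, -1.05, -0.98, -1.02, -0.97,
           0.02, -0.01, 0.01, 0.0, 0.03, -0.02]
print(rank_cbs(profile))
```

On this toy profile `rank_cbs` returns `[6, 12]`. The circular (arc) search matters here: a plain single-split search finds no strong split when the changed segment sits in the middle, because the two normal flanks balance each other. The exhaustive arc search is O(n²) per level, so this sketch is only practical for short profiles.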

Because Nexus has been designed for maximum flexibility, it can easily be set up to execute a different segmentation algorithm developed in R or any other programming language. Additionally, Nexus allows the user to import copy number result files in which segmentation and/or calling of the regions has already been done by another software package.

Once the data is loaded and processed, the user can simply select one or more samples to review. The Sample Drill-Down window provides all the information needed to examine the selected sample. The Overview tab provides a quick graphical view of all aberrations: a single red bar marks a region of heterozygous deletion and a double red bar indicates homozygous deletion, while a single green bar indicates a one-copy gain and a double green bar indicates multi-copy amplification (see Figure 3).

Figure 3. Overview of a single sample showing all chromosomes. All aberrations are marked in green or red. Here, the loss on chromosome 15 is related to Angelman syndrome.

Clicking on an ideogram takes the user to a detailed view of the selected chromosome. In this case, clicking on chromosome 15 shows the detailed annotation of this region as well as the probe-level data, as depicted in Figure 4 below. The annotation tracks selected here include genes, exons, known CNVs from the Database of Genomic Variants (DGV) in Toronto, miRNA locations, DECIPHER database information, the Gene Association Database, and known segmental duplications. Tracks can be added or removed with a single mouse click. Nexus Copy Number version 5 also allows direct import of UCSC-style BED files for even simpler annotation track loading.

Tel: 310-414-8100 Fax: 310-414-8111 Page 3
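BED is the plain-text interval format used by the UCSC Genome Browser: the first three columns are chrom, chromStart (0-based) and chromEnd (exclusive), optionally followed by a name and further columns. The minimal reader below shows what such an annotation track contains. It is a generic sketch, not the Nexus importer; it assumes tab-separated columns and ignores optional columns beyond the name. The example regions are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class BedInterval:
    chrom: str
    start: int  # chromStart: 0-based, inclusive
    end: int    # chromEnd: exclusive
    name: Optional[str] = None


def read_bed(lines: Iterable[str]) -> List[BedInterval]:
    """Parse BED lines, skipping blank, comment, 'track' and 'browser' lines."""
    intervals: List[BedInterval] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")  # assumption: tab-separated columns
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        start, end = int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"chromStart > chromEnd: {line!r}")
        name = fields[3] if len(fields) > 3 else None
        intervals.append(BedInterval(fields[0], start, end, name))
    return intervals


example = [
    'track name=regions description="illustrative regions"',
    "chr15\t22000000\t23000000\tregionA",
    "chr15\t25000000\t25500000",
]
for iv in read_bed(example):
    print(iv.chrom, iv.start, iv.end, iv.name, iv.end - iv.start)
```

Because chromEnd is exclusive, the length of a BED interval is simply `end - start`, with no off-by-one correction.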

Figure 4. A view of chromosome 15 of this sample, clearly indicating several loss and gain regions. Additional tracks showing information from web-based databases are provided for quick reference.

All annotation tracks are active and either provide additional information or are hyperlinked to a web-based resource. For example, clicking on a magenta region of known CNVs provides a list of all events reported at that location in DGV, as shown in Figure 5 below. Hyperlinks are then provided to query the region in DGV or to open the corresponding publication in PubMed.
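Listing every reported CNV at a clicked position is an interval "stabbing" query. A minimal sketch, assuming the known-CNV track is loaded into memory as half-open (chrom, start, end, id) records; the class name `KnownCnvTrack` and the records are invented for illustration and are not DGV data:

```python
import bisect
from typing import Dict, List, Tuple

Cnv = Tuple[str, int, int, str]  # (chrom, start, end, variant_id), half-open [start, end)


class KnownCnvTrack:
    """Hypothetical in-memory known-CNV track supporting point queries."""

    def __init__(self, records: List[Cnv]):
        self.by_chrom: Dict[str, List[Cnv]] = {}
        for rec in sorted(records, key=lambda r: (r[0], r[1])):
            self.by_chrom.setdefault(rec[0], []).append(rec)
        self.starts = {c: [r[1] for r in recs] for c, recs in self.by_chrom.items()}

    def at(self, chrom: str, pos: int) -> List[Cnv]:
        """All records on chrom that contain position pos."""
        recs = self.by_chrom.get(chrom, [])
        hi = bisect.bisect_right(self.starts.get(chrom, []), pos)
        # Every record starting at or before pos is a candidate; keep those that extend past it.
        return [r for r in recs[:hi] if r[2] > pos]


track = KnownCnvTrack([
    ("chr15", 100, 500, "cnvA"),
    ("chr15", 300, 900, "cnvB"),
    ("chr15", 1000, 1200, "cnvC"),
    ("chr3", 100, 200, "cnvD"),
])
print([r[3] for r in track.at("chr15", 350)])
```

This scans every record that starts before the position, which is fine for a sketch; a large track would typically use an interval tree or a binned index instead.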

Figure 5. A drill-down table showing all reported CNVs at a particular location on chromosome 15.

As part of the interpretation process, the user might look at an area of aberration and query various web resources for information about the genes in that area by right-clicking on a gene, as shown in Figure 6. The user can also get immediate information about all annotations on the screen with a single click of the drill-down tool.

Figure 6. A close-up view of a region on chromosome 15. At this level gene names are clearly visible and are hyperlinked to various sources, which can be customized by the user.
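User-customizable gene hyperlinks amount to URL templates with placeholders for the gene symbol and genome build. A sketch, assuming a `{gene}`/`{db}` placeholder convention of our own choosing (this is not Nexus's configuration syntax) and `hg19` as an example default build; the two templates follow the public UCSC Genome Browser (`hgTracks` with `db` and `position` parameters) and NCBI Gene search URL patterns:

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical link templates a user might configure.
LINK_TEMPLATES: Dict[str, str] = {
    "UCSC Genome Browser": "https://genome.ucsc.edu/cgi-bin/hgTracks?db={db}&position={gene}",
    "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/?term={gene}",
}


def gene_links(gene: str, db: str = "hg19",
               templates: Dict[str, str] = LINK_TEMPLATES) -> Dict[str, str]:
    """Fill each template with the URL-encoded gene symbol and genome build."""
    encoded = quote(gene, safe="")
    # str.format ignores keyword arguments a template does not use (e.g. db for NCBI).
    return {label: template.format(gene=encoded, db=db) for label, template in templates.items()}


links = gene_links("FHIT")
for label, url in links.items():
    print(label, url)
```

Encoding the symbol keeps unusual identifiers (for example ones containing `/`) from breaking the query string.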

Nexus Copy Number also offers a powerful report generation tool for each sample. The report contains the following information:

- Chromosomal location
- Event (gain, loss, etc.)
- Region length
- Number of genes in region
- Number of probes in region
- % of overlap with known CNVs

In addition to these fields, the user can customize the report to include flanking probe IDs, known syndromes in the regions, or any other genome-based annotation desired. The user can exclude individual regions from the report using a simple check box. For even more flexibility, the user can right-click on any region in the report and go to that region in a web-based browser (e.g., Ensembl or UCSC) or review the region back in Nexus. Another unique feature is the ability to query the selected region in the Nexus project or in Nexus DB to find any samples with similar events, as shown in Figure 7.

Figure 7. The Report tab of a single sample showing regions of aberration. A single region is selected and queried in all other samples in the project; the query result window is shown.
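Several report columns can be derived directly from a segment's coordinates. The sketch below computes region length, probe count, and the percentage of the region covered by known CNVs, merging overlapping CNV intervals first so that shared bases are not double-counted. The field names and the half-open [start, end) convention are assumptions; the paper does not give Nexus's exact definitions, for example of % overlap. All coordinates are invented.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of overlapping or touching intervals, sorted by start."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def report_row(region: Interval, probe_positions: List[int],
               known_cnvs: List[Interval]) -> Dict[str, float]:
    start, end = region
    length = end - start
    positions = sorted(probe_positions)
    # Probes with start <= position < end.
    n_probes = bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
    covered = sum(max(0, min(end, e) - max(start, s)) for s, e in merge(known_cnvs))
    return {
        "length_bp": length,
        "probes": n_probes,
        "pct_known_cnv_overlap": 100.0 * covered / length if length else 0.0,
    }


row = report_row((1000, 2000), [900, 1000, 1500, 1999, 2000, 2500],
                 [(1200, 1500), (1400, 1600), (1900, 2100), (5000, 6000)])
print(row)
```

Here the two overlapping CNVs merge into 1200-1600 (400 bp), plus 100 bp from 1900-2100 inside the region, giving 50% overlap. Sorting the report on this column surfaces regions that are not known polymorphisms.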

Searching for Samples with Common Aberrations

The new high-density array platforms allow users to detect ever smaller aberrations. Although this is potentially useful, it also creates a challenge during data interpretation, and Nexus provides a number of features to help. First, the % CNV overlap allows the user to sort and easily locate regions that have not been reported in public repositories as polymorphic in the normal population. Second, Nexus provides data filters that remove aberrations smaller than a specified size from the review process. Third, and probably most useful, Nexus provides several ways of searching for common events across multiple arrays.

This search can be done at two different levels. The first is to search all samples in a given project; here, the project can hold thousands of samples processed at a particular location. The second is to search the powerful Nexus DB internet-based repository. We describe both options in turn below.

There are two simple ways to search a project for a genomic event, which we illustrate with an example. Here we use a project that has 57 samples from various aCGH and SNP array platforms. While examining a sample, we notice a small deletion at the FHIT locus. Using the query tool shown in Figure 7 above, we can identify all samples in the project that have a loss in this region and view their annotations (e.g., phenotype, gender). We can also select only these samples for further analysis with a single click.

Another very useful tool in Nexus Copy Number is the Sort tool. Using this feature, we can point to an area of the genome, and all samples with the selected aberration are moved to the top of the screen, with the smallest event on top. Figure 8 shows the samples that have a loss at the FHIT locus. Each sample can be color-coded by a selected factor. Here we chose phenotype and see that all three samples with the deletion are cancer samples of various types.
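The size filter and the Sort tool can be sketched over a simple in-memory list of calls. Assumptions: each call is a (sample, chrom, start, end, event) record with half-open coordinates, the "selected aberration" is an event of a given type covering the clicked position, samples without it keep their original order below the hits, and `Call`, `filter_small` and `sort_by_event` are hypothetical names. Sample names and coordinates are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    sample: str
    chrom: str
    start: int  # half-open [start, end)
    end: int
    event: str  # e.g. "loss" or "gain"

    @property
    def length(self) -> int:
        return self.end - self.start


def filter_small(calls: List[Call], min_length: int) -> List[Call]:
    """Keep only aberrations at least min_length bases long."""
    return [c for c in calls if c.length >= min_length]


def sort_by_event(samples: List[str], calls: List[Call], chrom: str, pos: int,
                  event: str = "loss") -> List[str]:
    """Move samples whose `event` covers chrom:pos to the top, smallest event first."""
    smallest: Dict[str, int] = {}
    for c in calls:
        if c.event == event and c.chrom == chrom and c.start <= pos < c.end:
            smallest[c.sample] = min(smallest.get(c.sample, c.length), c.length)
    hits = sorted((s for s in samples if s in smallest), key=lambda s: smallest[s])
    others = [s for s in samples if s not in smallest]  # original order preserved
    return hits + others


samples = ["S1", "S2", "S3", "S4"]
calls = [
    Call("S1", "chr3", 100, 900, "gain"),
    Call("S2", "chr3", 400, 600, "loss"),
    Call("S3", "chr3", 300, 800, "loss"),
    Call("S4", "chr3", 450, 460, "loss"),
]
print(sort_by_event(samples, calls, "chr3", 500))
print(len(filter_small(calls, 50)))
```

S2 and S3 carry losses covering position 500 and move to the top, S2 first because its loss is shorter. S4's small loss does not cover that position, and it is also the call removed by a 50 bp size filter.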

Figure 8. Sorting samples based on a loss around the FHIT locus. Samples are color-coded by phenotype.

Sharing Results with Collaborators

Nexus Copy Number provides two simple but powerful methods to share profiles and findings with collaborators or the public. The first method allows the user to set a frequency threshold and have the software identify all aberrations present above that threshold. The user can then create a bedGraph file with a single click, which the UCSC Genome Browser can use to display gain and loss frequencies. For example, a user with 1,000 normal European samples can use this function to create a frequency plot of all events at 1% frequency or higher (10 or more samples in the project) and either post it at UCSC for anyone to use or share it only with colleagues to limit its distribution.

The second mechanism is Nexus DB. With a click of a button, the user can make a project accessible to anyone using Nexus DB or only to a specific group of users. Any shared data is then visible in queries and can be downloaded for detailed analysis by other users.

Conclusion

We have outlined here how Nexus Copy Number version 5 can be a powerful resource for understanding and interpreting results from high-density, array-based copy number measurement platforms. Nexus provides all the features necessary to make this process as efficient as possible.
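The frequency-threshold export described under Sharing Results with Collaborators can be sketched as a sweep over segment boundaries that counts, at every position, how many samples carry a given event type, then writes the intervals at or above the threshold as bedGraph rows (a `track type=bedGraph` header followed by chrom, start, end, value lines). This is a minimal sketch under stated assumptions: gains and losses are passed in separately, each sample's overlapping calls are merged so a sample counts once per base, the value column is the percentage of samples, and `frequency_bedgraph` is a hypothetical name, not a Nexus function. The calls are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (chrom, start, end), half-open


def frequency_bedgraph(calls_by_sample: Dict[str, List[Segment]], n_samples: int,
                       min_fraction: float, track_name: str) -> str:
    """bedGraph text of intervals where at least min_fraction of samples carry the event."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    deltas: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for segments in calls_by_sample.values():
        per_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in segments:
            per_chrom[chrom].append((start, end))
        for chrom, intervals in per_chrom.items():
            merged: List[List[int]] = []  # this sample's calls, merged
            for start, end in sorted(intervals):
                if merged and start <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            for start, end in merged:
                deltas[chrom][start] += 1  # sample enters an event
                deltas[chrom][end] -= 1    # sample leaves it
    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom in sorted(deltas):
        count = 0
        positions = sorted(deltas[chrom])
        for pos, nxt in zip(positions, positions[1:]):
            count += deltas[chrom][pos]  # count is constant on [pos, nxt)
            fraction = count / n_samples
            if count > 0 and fraction >= min_fraction:
                # Adjacent rows with equal values are not merged; bedGraph allows that.
                lines.append(f"{chrom}\t{pos}\t{nxt}\t{100 * fraction:.2f}")
    return "\n".join(lines) + "\n"


losses = {
    "A": [("chr3", 100, 300)],
    "B": [("chr3", 200, 400)],
    "C": [("chr3", 250, 260), ("chr3", 255, 270)],
}
print(frequency_bedgraph(losses, n_samples=4, min_fraction=0.5, track_name="loss_freq"))
```

With 1,000 samples and `min_fraction=0.01`, an interval is written only where at least 10 samples carry the event. `n_samples` is passed separately so that samples with no calls still count in the denominator.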