Integrative Analysis of Genomic Copy Number. Cancer.


Integrative Analysis of Genomic Copy Number and Gene Expression Data in Metastatic Prostate Cancer. Elise Chang Agilent Technologies Elise_chang@agilent.com

Agenda
- Introduction
- Features of Copy Number Workflow
- Case study: Integrative Analysis of Genomic Copy Number and Gene Expression Data in Metastatic Prostate Cancer
[Figure labels: SNPs, CNVs, CNPs, CNVRs]

Copy Number Variation: Understanding the Relevance to Human Diseases
- Copy number variation (CNV): DNA segments in which the copy number varies between two or more genomes
- CNVs range from 1 kb to millions of DNA bases in size
- CNVs have been associated with susceptibility to disease, complex behavioral traits, and other phenotypic variability
- Identifying significant CNVs is important for understanding the underlying mechanisms of disease and disease susceptibility

Supported Array Platforms
- Affymetrix: 100K (50K Xba, 50K Hind); 500K (250K Nsp, 250K Sty); SNP 5.0; SNP 6.0
- Illumina: GenomeStudio outputs for all SNP/CNV arrays
- The GeneSpring GX plug-in for GenomeStudio exports data in a format that GeneSpring GX supports. Copy the plug-in from INSTALLDIR\app\Illumina\GX.Genotyping.Export.dll to GenomeStudio\modules\BSGT\ReportPlugins\
- Installation instructions are in section 26.4.1 of the manual.

Supported Arrays
Affymetrix
- Technology available on the Agilent server
- Experiment creation involves importing the CEL files, summarization, and normalization
- GX11 computes log ratio, CN, and LOH
- GX11 uses the CN values to get ASCN and PSCN and to run GISTIC
Illumina
- Technology created on the fly
- Experiment creation involves import from GenomeStudio
- Log ratios, CN values, and LOH are imported from GenomeStudio
- GX11 uses the CN values to get ASCN and PSCN and to run GISTIC

Experimental Designs
Identification of variation requires comparison to a reference DNA source, a reference dataset, or a reference genome sequence. This is important for Affymetrix experiment creation.
1. Analysis against a reference: the control is generated from a pool of individuals. All test samples are then compared against a common, pooled control, also known as the reference.
   - HapMap samples are packaged as the Standard Reference
   - A Custom Reference can be created
2. Paired analysis: the control and the test DNA are from the same individual.
   - Pairing is defined during experiment grouping

Custom Reference Creation
- Menu: Tools > Create Custom Reference
- Typically 30-40 reference samples are needed for accurate genotype calls on non-reference samples
- Once a Custom Reference is created, it is saved for future experiment creation

Reference Creation
References contain:
- Averaged summarized intensities for probe sets from PLIER
- For the Affymetrix 50/100K Set:
  - Statistics from BRLMM
- For the 250/500K Set and SNP 5.0 Affymetrix arrays:
  - Statistics from the Birdseed algorithm
  - Clusters from the Birdseed algorithm (and median and s.d. of clusters)
- For SNP 6.0 Affymetrix arrays:
  - Statistics from the Birdseed algorithm
  - Clusters from the Birdseed algorithm (and median and s.d. of clusters)
  - Clusters from CANARY (and median and s.d. of clusters)

Experimental Set-up for Paired Normal Design
For paired-normal experimental designs, two parameters must be specified:
- Group indicates a set of paired samples
- Condition indicates which sample(s) to use as reference (Normal) for the test sample(s) (Tumor)
The parameters must be named Group and Condition for GeneSpring GX to recognize the experiment as a paired design. An interpretation using Group and Condition must be used for Copy Number Computation.
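The pairing rule above can be illustrated with a small Python sketch. It is not GeneSpring GX code: the function name, the tuple layout, and the sample names are hypothetical, and it assumes exactly one Normal sample per group.

```python
def pair_tumors_with_normals(samples):
    """Map each Tumor sample name to the Normal sample name of its group.

    `samples` is a list of (name, group, condition) tuples that mirrors the
    Group and Condition experiment parameters (hypothetical layout).
    """
    normals = {}
    for name, group, condition in samples:
        if condition == "Normal":
            normals[group] = name
    pairs = {}
    for name, group, condition in samples:
        if condition == "Tumor":
            if group not in normals:
                raise ValueError(f"group {group!r} has no Normal sample")
            pairs[name] = normals[group]
    return pairs


# Hypothetical sample sheet: two patients, each with one Normal sample.
samples = [
    ("P1_normal", "P1", "Normal"),
    ("P1_met_a", "P1", "Tumor"),
    ("P1_met_b", "P1", "Tumor"),
    ("P2_normal", "P2", "Normal"),
    ("P2_met_a", "P2", "Tumor"),
]
pairs = pair_tumors_with_normals(samples)
```

Each tumor is then compared only against the normal from the same Group, which is what distinguishes the paired design from the against-reference design.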

Copy Number Analysis Workflow in GeneSpring GX 11
1. QC / Batch Correction*
2. Copy Number Analysis (CN, LOH, ASCN, log ratio)
3. GISTIC for identification of statistically common CN variation within a set of samples
4. Filter for regions of interest
5. Biological contextualization of genes in regions of interest
* The QC / Batch Correction step is not available for the Illumina workflow

Quality Control on Samples
This window should look familiar to current GeneSpring GX users.

Quality Control Tools: PCA and Batch Effect
- PCA identifies potential sample outliers
- Batch Effect identifies and corrects for systematic error when different samples are processed on different days or under different conditions

Batch Correction
Select the interpretation that groups samples into their respective batches.
- Minimum samples per batch: minimum number of samples per batch for the batch to be considered for correction
- P-value: t-test p-value cutoff for each probe
- Percentage of bad batches allowed: if the percentage of bad batches is below the user-specified value, correction is not performed for the probe
Each batch is t-tested against a pool of all remaining batches. Correction for each flagged entity is performed using a reference batch.

Copy Number Computation
Copy Number Analysis: (CN, LOH, ASCN, log ratio, LOD score)

Copy Number Analysis for Affymetrix Data
The computation produces:
(1) Log ratio values
    - Against-reference design: normalized intensity of sample / normalized intensity of reference
    - Paired design: normalized intensity of case / normalized intensity of control
(2) Genomic copy number
    - Circular Binary Segmentation is used to identify segments
    - Log ratio values are used to estimate genomic copy number
    - The confidence value is given as the negative log10 of the p-value
(3) Allele-specific copy number (ASCN) information
    - The Fawkes algorithm is used to assign allele-specific copy number using SNP probes
(4) Parent-specific copy number (PSCN) information
(5) Loss of heterozygosity (LOH)
    - A Hidden Markov Model (HMM) is used to calculate the LOH score

Log Ratio and Copy Number Computation
The type of copy number computation (paired or against reference) is determined by the interpretation selected. First, log2 ratios are calculated for every probe:
- Against-reference design: normalized intensity of sample / normalized intensity of reference
- Paired design: normalized intensity of case / normalized intensity of control
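As a minimal illustration of the per-probe log2 ratio, the sketch below assumes that the intensities are already normalized and strictly positive. The function name and the example values are hypothetical.

```python
import math


def log2_ratios(test_intensities, baseline_intensities):
    """Return the per-probe log2(test / baseline) of normalized intensities.

    For an against-reference design the baseline is the reference profile;
    for a paired design it is the matched control sample.
    """
    if len(test_intensities) != len(baseline_intensities):
        raise ValueError("both profiles must cover the same probes")
    return [math.log2(t / b) for t, b in zip(test_intensities, baseline_intensities)]


# Hypothetical normalized intensities for three probes.
tumor = [200.0, 100.0, 50.0]
normal = [100.0, 100.0, 100.0]
ratios = log2_ratios(tumor, normal)
```

A probe with twice the baseline intensity gets a log2 ratio of 1, an unchanged probe gets 0, and a probe at half the baseline gets -1.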

Copy Number Computation: Circular Binary Segmentation
- Smooths outliers
- Finds change points in each sample, using a statistic to identify a segment break
- Validates each change point using a t-test with a p-value cutoff of < 0.002
- Outputs are the segment break points and the mean log ratio for each segment
[Figure: segment break points]
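The idea of splitting a log ratio profile at validated change points can be sketched in Python. This is a simplified recursive binary segmentation, not the Circular Binary Segmentation algorithm itself and not GeneSpring GX code. It assumes a Welch t statistic with a fixed threshold (`t_threshold`, a hypothetical parameter) in place of the p-value cutoff, and it skips outlier smoothing.

```python
import math
from statistics import mean, variance


def welch_t(left, right):
    """Absolute Welch t statistic for the difference between two means."""
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    diff = abs(mean(left) - mean(right))
    if se == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / se


def segment(values, start=0, t_threshold=5.0, min_size=2):
    """Recursively split log ratios at the strongest change point.

    Returns a list of (start, end, mean_log_ratio) tuples, end exclusive.
    """
    n = len(values)
    best_t, best_i = 0.0, None
    for i in range(min_size, n - min_size + 1):
        t = welch_t(values[:i], values[i:])
        if t > best_t:
            best_t, best_i = t, i
    # Stop when no candidate split is strong enough to count as a break.
    if best_i is None or best_t < t_threshold:
        return [(start, start + n, mean(values))]
    return (segment(values[:best_i], start, t_threshold, min_size)
            + segment(values[best_i:], start + best_i, t_threshold, min_size))


# Hypothetical profile: five probes near 0 followed by five probes near -1.
log_ratios = [0.0, 0.1, -0.1, 0.05, -0.05, -1.0, -0.9, -1.1, -1.05, -0.95]
segments = segment(log_ratios)
```

For this profile the sketch reports one break between the fifth and sixth probe, with segment means close to 0 and -1.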

Copy Number Computation
Once segments are identified by CBS, copy numbers and confidence scores need to be assigned to them.
Copy Number:
- The HapMap dataset is used to generate a median map
- Using the Birdseed and CANARY outputs, the median and s.d. of the log ratios across all probes are calculated for each possible copy number (0, 1, 2, 3, 4)
- Log ratios for segments from CBS are compared to the median map, and copy numbers are assigned
- Homozygous and hemizygous deletions are given CN values of 0 and 1
- Amplifications are given CN values of 3 and 4
Copy Number Confidence:
- Copy numbers between 1.5 and 2.5 are assigned a p-value of 1
- For any other copy number, a t-test against zero of the log ratios is performed, with multiple testing correction
- The negative logarithm to the base 10 of the final p-value is reported as the confidence

Copy Number Computation: Median Map
Mean log ratio that is mapped to each assigned copy number, per array type.
Columns: SNP 6.0 = Genome-Wide Human SNP Array 6.0; SNP 5.0 = Genome-Wide Human SNP Array 5.0; 500K NSP = Human Mapping 500K Array Set - NSP; 500K STY = Human Mapping 500K Array Set - STY; 100K = Mapping 100K array set.

Copy number   SNP 6.0       SNP 5.0           500K NSP      500K STY      100K
4.0            0.5531951    same as SNP 6.0    0.54314524    0.5104986     0.54314524
3.5            0.43365917   same as SNP 6.0    0.4216105     0.39650044    0.39650044
3.0            0.31824413   same as SNP 6.0    0.30864272    0.26924038    0.28693026
2.5            0.16928099   same as SNP 6.0    0.16363965    0.13422728    0.15135522
2.0            0.0          same as SNP 6.0    0.0           0.0           0.0
1.5           -0.22511256   same as SNP 6.0   -0.2103804    -0.18339391   -0.18339391
1.0           -0.48062363   same as SNP 6.0   -0.44733366   -0.36318222   -0.36318222
0.5           -0.73515093   same as SNP 6.0   -0.68273795   -0.57555604   -0.57555604
0.0           -1.4098581    same as SNP 6.0   -1.2451344    -0.9485139    -0.9485139
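To show how a segment's mean log ratio could be mapped to a copy number, the sketch below uses the first column of values from the median map. The nearest-value lookup and the function name are assumptions made for illustration; the slides do not state the exact comparison rule that GeneSpring GX applies.

```python
# Mean log ratio per copy number, copied from the first value column of the
# median map on the slide.
MEDIAN_MAP = {
    4.0: 0.5531951,
    3.5: 0.43365917,
    3.0: 0.31824413,
    2.5: 0.16928099,
    2.0: 0.0,
    1.5: -0.22511256,
    1.0: -0.48062363,
    0.5: -0.73515093,
    0.0: -1.4098581,
}


def assign_copy_number(segment_mean_log_ratio, median_map=MEDIAN_MAP):
    """Return the copy number whose mapped log ratio is closest to the segment mean.

    Nearest-value lookup is an assumption of this sketch.
    """
    return min(median_map, key=lambda cn: abs(median_map[cn] - segment_mean_log_ratio))


# Hypothetical segment means from a segmentation step.
calls = [assign_copy_number(x) for x in (0.0, -0.5, 0.3, -2.0)]
```

With this lookup, a segment mean of -0.5 is called as copy number 1.0 (a hemizygous deletion), and a mean of 0.3 is called as 3.0 (an amplification).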

Copy Number Analysis
- Log ratios are smoothed to give CN values
- CN segments are created using the Circular Binary Segmentation (CBS) algorithm
- Fractional as well as discrete CN values are assigned, in the range 0-4
[Figure: CN values and log ratios]

CN Computation
1. Paired analysis: CN computation with a Condition-type interpretation
2. Each tumor is paired against the Normal of its group
3. All Normals are compared against the reference
All samples against reference comparison
Only one set of CN Analysis results can be stored.

Allele-specific Copy Number
Given a segment with copy number = 3, which allele was duplicated?
Example output: AAB = A:2, B:1

Parent-specific Copy Number
Consider a section of a chromosome with five SNPs and the following haplotypes:
ChrCopy1: A1 B2 A3 B4 B5
(after duplication): A1 B2 A3 B4 B5
                     A1 B2 A3 B4 B5
ChrCopy2: A1 A2 B3 A4 B5
Suppose Copy1 gets duplicated 2 additional times (CN of region = 4). The ASCN and PSCN become:
SNP 1: A:4, B:0 and PSCN = 4-0
SNP 2: A:1, B:3 and PSCN = 3-1
SNP 3: A:3, B:1 and PSCN = 3-1
SNP 4: A:1, B:3 and PSCN = 3-1
SNP 5: A:0, B:4 and PSCN = 4-0
PSCN is a measure of allelic imbalance.
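The worked example above can be reproduced with a short Python sketch. It counts alleles directly from known haplotypes, which is only an illustration of the definitions; the software has to infer these values from SNP probe data, and the function names here are hypothetical.

```python
from collections import Counter


def allele_specific_copy_numbers(haplotypes_with_counts):
    """Count A and B alleles per SNP position across all chromosome copies.

    `haplotypes_with_counts` is a list of (haplotype, n_copies) pairs, where a
    haplotype is a string of 'A'/'B' alleles, one character per SNP.
    """
    n_snps = len(haplotypes_with_counts[0][0])
    result = []
    for pos in range(n_snps):
        counts = Counter()
        for haplotype, n_copies in haplotypes_with_counts:
            counts[haplotype[pos]] += n_copies
        result.append((counts["A"], counts["B"]))
    return result


def pscn(ascn):
    """Express an (A, B) count pair as (larger, smaller), as on the slide."""
    a, b = ascn
    return (max(a, b), min(a, b))


# Copy1 (A B A B B) is present three times after duplication; Copy2 (A A B A B) once.
ascn = allele_specific_copy_numbers([("ABABB", 3), ("AABAB", 1)])
pscn_values = [pscn(x) for x in ascn]
```

The result matches the slide: ASCN pairs of 4:0, 1:3, 3:1, 1:3, and 0:4, and PSCN values of 4-0, 3-1, 3-1, 3-1, and 4-0.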

Copy Number Computation for Illumina Arrays
- Copy number, log ratio, and LOH scores are calculated in GenomeStudio and imported into GeneSpring GX
- The following are computed in GeneSpring GX:
  - ASCN information
  - PSCN information

Analysis and Filtering
Once you have identified regions of genomic alteration in individual samples, how can you find meaningful events in groups of samples?
- Find Common Genomic Variant Regions
- Filter By Regions
- Identify Copy Neutral LOH
- Filter By PSCN

Finding Common Genomic Variant Regions Across a Set of Samples
Genomic Identification of Significant Targets in Cancer (GISTIC)

Find Common Genomic Variant Regions
- Many tumor samples have large numbers of chromosomal aberrations
- GISTIC was developed to distinguish meaningful, or driver, mutation events from random background somatic, or passenger, events
- Driver mutations are functionally important events that confer advantageous biological properties on the tumor, allowing it to initiate, grow, or persist, and are more likely to drive cancer pathogenesis
- GISTIC can also be applied to non-cancer datasets where you want to find common genomic variant regions

Common Genomic Variant Regions
- Choose Fine or Coarse mode
- Amplified regions
- Deleted regions

Common Variation Results
- Once GISTIC has identified aberrant regions, it uses the biological genome to find overlapping genes for amplified and deleted segments
- For each probeset within the region, the upstream and downstream 1000 bases are scanned and the genes are identified
- Genes overlapping the significant regions are identified and stored in the Project Navigator
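The gene lookup around each probeset can be sketched as an interval overlap test. This is an illustration under stated assumptions rather than the GeneSpring GX implementation: positions are on a single chromosome, gene coordinates are inclusive, and the gene names and coordinates are hypothetical.

```python
def genes_near_probesets(probeset_positions, genes, flank=1000):
    """Return names of genes overlapping a +/- `flank` bp window around any probeset.

    `probeset_positions` holds base positions on one chromosome; `genes` is a
    list of (name, start, end) tuples on the same chromosome.
    """
    hits = set()
    for pos in probeset_positions:
        window_start, window_end = pos - flank, pos + flank
        for name, start, end in genes:
            # Two closed intervals overlap when each starts before the other ends.
            if start <= window_end and end >= window_start:
                hits.add(name)
    return sorted(hits)


# Hypothetical gene annotations and probeset positions.
genes = [("GENE_A", 5000, 9000), ("GENE_B", 20000, 25000), ("GENE_C", 40000, 41000)]
found = genes_near_probesets([9800, 19500], genes)
```

Here GENE_A and GENE_B are reported because each lies within 1000 bases of a probeset, while GENE_C is too far from both.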

Use of Filters to identify genomic landscape prevalent in metastatic prostate cancer

Results Analysis 31 Confidentialit March

Biological Contextualization of Copy Number Data

Case Study

Integrative Analysis of Metastatic Prostate Cancer
- Prostate cancer is the most common cancer in men
- Primary tumors are thought to be composed of multiple genetically distinct cancer cell clones
- Both primary and metastatic prostate cancers are heterogeneous in nature, posing therapeutic challenges

Datasets Used
- Expression: GSE6919. 24 metastatic samples from 4 patients and 18 normal samples
- Genomic copy number: GSE14996. 58 metastatic locations from 14 patients and 16 subject-paired noncancerous samples
Liu et al., Nat Med. 2009 May;15(5):559-65

Copy Number Analysis in Prostate Cancer Samples

Expression Analysis in Prostate Cancer Samples

PCA: Genotyping Data
[Figure: PCA plot. Shape by Condition (Tumor, Normal); color by Patient Group]

PCA: Expression Data
QC using PCA shows separation of the Normal and the Metastatic samples of GSE6919.
[Figure: PCA plot. Groups: Normal, Metastatic]

Histogram view of data tracks in the Genome Browser, showing deletions as green blocks and amplifications as red blocks
[Figure panels: published data, Chr. 6 deletion, Patient #17; Chromosome 6, validated in GX11]

Joint Analysis of Gene Expression and Genomic Copy Number Data in Metastatic Prostate Cancer
[Figure: Copy Number and Gene Expression prostate cancer studies, controlled for regions and metastatic tissues]

Deletions present in chr.6 of patient 17: An Integrative Analysis

Analysis workflow
- Expression: t-test; FC 2.0; p-value: 0.05; 441 differentially expressed entities
- Genotyping: Standard Reference; Copy Number computation; Filters; Genome Browser
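The expression branch of the workflow applies two cutoffs, a fold change of 2.0 and a t-test p-value of 0.05. A minimal Python sketch of that filter follows. It assumes the fold changes and p-values have already been computed, treats the cutoffs as inclusive, and uses hypothetical entity names and values.

```python
def differentially_expressed(entities, min_fold_change=2.0, max_p_value=0.05):
    """Keep entities passing both the fold-change and the p-value cutoffs.

    `entities` maps an entity name to (fold_change, p_value). The fold change
    is signed: negative values denote down-regulation.
    """
    return {
        name: (fc, p)
        for name, (fc, p) in entities.items()
        if abs(fc) >= min_fold_change and p <= max_p_value
    }


# Hypothetical entities with precomputed statistics.
entities = {
    "entity_1": (-2.15, 0.01),  # passes both cutoffs
    "entity_2": (1.4, 0.001),   # fold change too small
    "entity_3": (3.0, 0.2),     # p-value too large
}
selected = differentially_expressed(entities)
```

Only entities that pass both cutoffs are kept; in the case study this step left 441 entities.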

Deletion of PLAGL1
2.15-fold downregulation of PLAGL1 in metastasis
[Figure panels: Expression Data, Genomic Data]

PLAGL1
- Candidate tumor suppressor gene with anti-proliferative activities
- Zinc finger protein with transactivation and DNA-binding activity
- Splice variants allow differential regulation of apoptosis induction and cell cycle arrest
- Frequently deleted in many solid tumors: breast, ovarian, and renal cell carcinomas
- Also known as LOT, or Lost On Transformation

PLAGL1 network analysis

First-order expansion of the PLAGL1 network and overlay with FC data

TCF21
- Genomic data: TCF21 CN = 2; no genomic aberration of TCF21
- Expression data: down-regulation of expression levels of TCF21

TCF21
- First-order expansion of the PLAGL1 network identified TCF21, a tumor suppressor (ts) gene, as down-regulated in the expression analysis
- The CN of TCF21 remains at 2, unlike that of PLAGL1
- TCF21 is known to be frequently silenced epigenetically in head and neck cancer. Consistent with this, TCF21 did not show any deletion in the samples examined, raising the possibility that TCF21 could be epigenetically regulated in prostate cancer

Conclusions
1. Using GX11, we could validate the presence of ERG-TMPRSS2 in several of the metastatic prostate cancer samples.
2. Significant aberrations found in PTEN, FGF18, and TRIB3 by GISTIC indicate that these could be driver mutations of prostate cancer.
3. Additional candidates were identified by the combined use of filters to identify amplified regions and regions of allelic imbalance.
4. Integrative analysis using expression and genotyping data has identified PLAGL1, a candidate ts gene, and TCF21, a ts gene, as having a possible role in prostate cancer.
5. PLAGL1 deletion, though present in a small percentage of the population, is an early event, occurring at a pre-metastatic stage.