Comparative genomic hybridization: Because arrays are more than just a tool for expression analysis

Microarray Data Analysis Workshop, MedVetNet Workshop, DTU 2008
Comparative genomic hybridization: Because arrays are more than just a tool for expression analysis
Carsten Friis (with several slides from H. Willenbrock)

Outline
- Introduction to comparative genomic hybridization (CGH) and array CGH
- Data analysis approaches
  - Breakpoint detection
  - Loss and gain analysis
- Real data example: Comparative genomic profiling of bacterial strains

Comparative Genomic Hybridization
Study types:
- Gain or loss of genetic material
- Finding variations in the genetic material
Purposes:
- Study of chromosomal aberrations often found in cancer and developmental abnormalities
- Study of variations in the baseline sequence in a microbial population (microbial comparative genomics)

Genetic Alterations and Disease
A variety of genetic alterations underlie developmental abnormalities and disease. Inappropriate gene activation or inactivation can be caused by:
- Mutation
- Epigenetic gene silencing (e.g. addition of methyl groups)
- Reciprocal translocation (exchange of fragments between two nonhomologous chromosomes)
- Gain or loss of genetic material
Any of the above may lead to activation of an oncogene or to inactivation of a tumor suppressor.

Detecting structural abnormalities Albertson and Pinkel, Human Molecular Genetics, 2003

Microarrays for copy number analysis
- BAC arrays
- Affymetrix SNP chip (500 K)
- Representational oligonucleotide microarray analysis (ROMA)
- Whole genome tiling arrays
- Own design (NimbleGen/NimbleExpress)

Array CGH
Array CGH maps DNA copy number alterations to positions in the genome.
[Figure labels: test genomic DNA, reference genomic DNA, Cot-1 DNA; gain of DNA copies in tumor, loss of DNA copies in tumor; axes: ratio against position on sequence]
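The quantity plotted in an array CGH profile can be sketched as follows. This is a minimal illustration, assuming already background-corrected intensities for the test and reference channels; the function name and the intensity values are hypothetical and not taken from the slides.

```python
import math

def log2_ratios(test_intensities, reference_intensities):
    """Return log2(test / reference) for each clone, in genome order."""
    if len(test_intensities) != len(reference_intensities):
        raise ValueError("both channels must cover the same clones")
    return [math.log2(t / r)
            for t, r in zip(test_intensities, reference_intensities)]

# Hypothetical intensities for five clones along one chromosome.
test = [1000.0, 1500.0, 1480.0, 510.0, 990.0]
reference = [1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
ratios = log2_ratios(test, reference)

# Positive values suggest a gain in the test sample, negative values a loss.
for position, ratio in enumerate(ratios):
    print(position, round(ratio, 2))
```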

Structural abnormalities (HSR: homogeneously staining region). Albertson and Pinkel, Human Molecular Genetics, 2003

Advantages over Expression Arrays
- DNA, not RNA, is hybridized to the microarray (DNA is much more stable)
- Little normalization is necessary
- Spatial coherence can be used in the analysis
- Only one sample is necessary to draw conclusions about that sample; biological replicates are still necessary to draw general conclusions regarding a certain biological subtype
- Results may be easier to interpret and to correlate with sample phenotypes, e.g. loss of an oncogene repressor -> certain cancer subtype

Analysis of array CGH
Goal: To partition the clones into sets with the same copy number and to characterize the genomic segments in terms of copy number.
Biological model: genomic rearrangements lead to gains or losses of
- sizable contiguous parts of the genome, possibly spanning entire chromosomes
- or, alternatively, to focal high-level amplifications

Copy Number Profiles of a Tumor

Varying genomic complexity Breakpoints

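Observed clone value and spatial coherence
It is useful to exploit the physical dependence between nearby clones, which translates into copy number dependence.
[Figure: a clone value lying between two neighbouring segment distributions, N(-0.3, 0.08^2) and N(0.6, 0.1^2); which one does it belong to?]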

Expected log2 ratio
A function of copy number change, normal cell contamination and ploidy.
[Figure: expected log2 ratios for reference ploidy = 2 and reference ploidy = 3, with curves labelled 100%, 50% and 10%; values shown include 2.58, 2.0, 0.58, 0.42, 0.38, 0.07 and 0.0]
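The dependence of the expected log2 ratio on copy number, contamination and ploidy can be sketched as below. The formula is an assumption made for illustration, not one stated on the slide: normal cells are taken to carry 2 copies everywhere, and the array is assumed to be normalized so that the average ploidy of the tumor/normal mixture corresponds to a ratio of 1. The function and parameter names are hypothetical.

```python
import math

def expected_log2_ratio(tumor_copy_number, tumor_fraction, tumor_ploidy=2.0):
    """Expected log2 ratio at one locus in a tumor/normal cell mixture.

    Assumptions (not from the slides): normal cells have 2 copies, and the
    mixture's average ploidy is the baseline that maps to log2 ratio 0.
    Undefined (math domain error) for 0 copies in a 100% pure tumor sample.
    """
    normal_copies = 2.0
    locus_copies = (tumor_fraction * tumor_copy_number
                    + (1 - tumor_fraction) * normal_copies)
    baseline = (tumor_fraction * tumor_ploidy
                + (1 - tumor_fraction) * normal_copies)
    return math.log2(locus_copies / baseline)

# A single-copy gain (3 copies) becomes harder to see as the tumor
# fraction drops.
for fraction in (1.0, 0.5, 0.1):
    print(fraction, round(expected_log2_ratio(3, fraction), 2))
```

Under these assumptions a gain of one copy shrinks from a clearly visible shift in a pure sample to a value well inside typical noise when only 10% of the cells are tumor cells.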

Simulation of Array CGH Data
Real biological variation considered:
- Breast cancer data used as model data: segment lengths and copy numbers are taken from the empirical distribution observed in breast cancer data (DNAcopy segmentation).
- Mixture of cells (the sample is not pure): each sample was assigned a value P_t, the proportion of tumor cells, drawn from a uniform distribution between 0.3 and 0.7.
- Experimental noise is Gaussian: standard deviations are drawn from a uniform distribution between 0.1 and 0.2 to imitate real data, where the noise may vary between experiments.
- Cancer subtypes are heterogeneous: certain aberrations characteristic of a cancer subtype may exist in only a percentage of the patients with that subtype. Thus, in each sample, segments with copy number alterations (copy number not 2) were removed at random with probability 30%.
Willenbrock and Fridlyand; Bioinformatics 2005
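The simulation scheme above can be sketched roughly as follows. This is a simplified illustration, not the authors' code: the segment template is a hypothetical hand-written list rather than a draw from the empirical breast cancer distribution, and the expected log2 ratio assumes a diploid baseline.

```python
import math
import random

def simulate_profile(segments, rng):
    """Simulate one array CGH sample.

    segments: list of (n_clones, tumor_copy_number) tuples (hypothetical
    template; the original study sampled these from breast cancer data).
    Returns (values, true_copy_numbers, tumor_fraction, noise_sd).
    """
    tumor_fraction = rng.uniform(0.3, 0.7)   # P_t, proportion of tumor cells
    noise_sd = rng.uniform(0.1, 0.2)         # per-sample Gaussian noise level
    values, truth = [], []
    for n_clones, copy_number in segments:
        # Heterogeneity: an aberration is absent from this sample with
        # probability 30%.
        if copy_number != 2 and rng.random() < 0.3:
            copy_number = 2
        # Assumed diploid baseline for the expected log2 ratio.
        mixture = tumor_fraction * copy_number + (1 - tumor_fraction) * 2
        expected = math.log2(mixture / 2)
        for _ in range(n_clones):
            values.append(rng.gauss(expected, noise_sd))
            truth.append(copy_number)
    return values, truth, tumor_fraction, noise_sd

rng = random.Random(0)
template = [(50, 2), (30, 3), (20, 1)]
values, truth, tumor_fraction, noise_sd = simulate_profile(template, rng)
```

Because the truth vector is returned alongside the noisy values, any segmentation method can be scored against it.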

Comparison Scheme
Use of simulated data, where the truth is known and the noise is controlled.
[Figure legend: true breakpoint; falsely predicted breakpoint]
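With simulated data, predicted breakpoints can be scored against the known ones. The sketch below is one possible scoring scheme; the tolerance window (in clones) and the function name are assumptions introduced here, not details given on the slide.

```python
def score_breakpoints(true_breakpoints, predicted_breakpoints, tolerance=2):
    """Return (sensitivity, false_discovery_rate).

    A prediction counts as correct if it lies within `tolerance` clones of
    a true breakpoint (the tolerance is an assumption for illustration).
    """
    matched_true = set()
    false_predictions = 0
    for predicted in predicted_breakpoints:
        hits = [t for t in true_breakpoints if abs(t - predicted) <= tolerance]
        if hits:
            # Credit the closest true breakpoint.
            matched_true.add(min(hits, key=lambda t: abs(t - predicted)))
        else:
            false_predictions += 1
    sensitivity = (len(matched_true) / len(true_breakpoints)
                   if true_breakpoints else 0.0)
    fdr = (false_predictions / len(predicted_breakpoints)
           if predicted_breakpoints else 0.0)
    return sensitivity, fdr

# Two true breakpoints; three predictions, one of them spurious.
sensitivity, fdr = score_breakpoints([50, 80], [51, 65, 80])
```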

Methods for Segmentation
- HMM: Hidden Markov Model (aCGH package). Fits HMMs in which any state is reachable from any other state (Fridlyand et al, JMVA, 2004).
- CBS: Circular binary segmentation (DNAcopy package). Recursively splits the chromosomes into contiguous regions of equal copy number (each split yields two or three subsegments) and assesses the significance of the proposed splits by using a permutation reference distribution (Olshen et al, Biostatistics, 2004).
- GLAD: Gain and Loss Analysis of DNA (GLAD package). Detects chromosomal breakpoints by estimating a piecewise constant function that is based on adaptive weights smoothing (Hupé et al, Bioinformatics, 2004).
Willenbrock and Fridlyand; Bioinformatics 2005
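The idea of splitting a chromosome and testing the split against a permutation reference distribution can be illustrated with a much simplified sketch. This is plain (non-circular) binary segmentation with a CUSUM-style score, not the CBS algorithm as implemented in DNAcopy; the function names, the minimum segment size of 2 clones, and the default parameters are assumptions.

```python
import math
import random

def best_split(values):
    """Return (index, score) of the strongest single change point.

    index is the first clone of the right-hand part; (None, 0.0) if the
    profile is too short or completely flat.
    """
    n = len(values)
    total = sum(values)
    best_index, best_score = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        if i < 2 or n - i < 2:          # keep at least 2 clones per side
            continue
        mean_left = left_sum / i
        mean_right = (total - left_sum) / (n - i)
        score = abs(mean_left - mean_right) * math.sqrt(i * (n - i) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

def segment(values, rng, start=0, n_perm=200, alpha=0.01):
    """Recursively split `values`; return breakpoint positions in order."""
    index, score = best_split(values)
    if index is None:
        return []
    # Permutation reference distribution: shuffling destroys spatial order.
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_split(shuffled)[1] >= score:
            exceed += 1
    if (exceed + 1) / (n_perm + 1) > alpha:
        return []
    return (segment(values[:index], rng, start, n_perm, alpha)
            + [start + index]
            + segment(values[index:], rng, start + index, n_perm, alpha))

# Toy profile: 30 clones near 0 followed by 30 clones near 1.
profile = [0.05, -0.05] * 15 + [1.05, 0.95] * 15
breakpoints = segment(profile, random.Random(0))
```

The permutation test is what keeps the recursion from splitting pure noise: a split is accepted only if shuffled versions of the same values rarely produce an equally strong change point.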

Breakpoint Detection Accuracy

Conclusions so far
Across signal-to-noise levels:
- CBS consistently has the best performance
- HMM has the highest FDR
- GLAD is the least sensitive

Outline Introduction to comparative genomic hybridization (CGH) and array CGH Data analysis approaches Breakpoint detection Loss and gain analysis Application of segmentation to testing Real data example: Comparative genomic profiling of bacterial strains

Merging segments
Note that all procedures operate on individual chromosomes and therefore produce a large number of segments with mean values close to each other.
Additional challenge: reduce the number of segments by merging those that are likely to correspond to the same copy number. This facilitates inference of altered regions.

Merging For estimating actual copy number levels from segmentations
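A rough sketch of the merging idea: segments from anywhere in the genome whose means are close are pooled into a common level. The greedy grouping on sorted means and the fixed threshold are assumptions made for illustration; this is not the merging procedure used in the cited study, which the slides do not describe in detail.

```python
def merge_levels(segments, threshold=0.1):
    """Assign each segment a merged level.

    segments: list of (n_clones, mean_log2_ratio).
    Segments are sorted by mean; a new level starts whenever the gap to
    the previous mean exceeds `threshold` (hypothetical parameter).
    Returns one merged mean per input segment, in the original order.
    """
    order = sorted(range(len(segments)), key=lambda i: segments[i][1])
    merged = [0.0] * len(segments)

    def flush(group):
        # Clone-weighted mean of all segments in the group.
        total = sum(segments[i][0] for i in group)
        level = sum(segments[i][0] * segments[i][1] for i in group) / total
        for i in group:
            merged[i] = level

    group = []
    for i in order:
        if group and segments[i][1] - segments[group[-1]][1] > threshold:
            flush(group)
            group = []
        group.append(i)
    if group:
        flush(group)
    return merged

# Four segments that really represent only two copy number levels.
segments = [(40, 0.02), (10, 0.55), (30, -0.03), (20, 0.60)]
merged = merge_levels(segments)
```

Greedy grouping can chain together a long run of closely spaced means, so a real implementation would need a stricter criterion for deciding when two levels are the same.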

Segmentation and Merging

ROC Curves Identification of copy number alterations for varying thresholds
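How an ROC curve for calling copy number alterations is traced out can be sketched as below. The calling rule (a clone is called altered when its absolute value reaches the threshold) and all names and data are assumptions for illustration.

```python
def roc_points(values, is_altered, thresholds):
    """Return a list of (threshold, false_positive_rate, true_positive_rate).

    A clone is called altered when abs(value) >= threshold (assumed rule).
    """
    positives = sum(1 for a in is_altered if a)
    negatives = len(is_altered) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for v, a in zip(values, is_altered)
                 if a and abs(v) >= threshold)
        fp = sum(1 for v, a in zip(values, is_altered)
                 if not a and abs(v) >= threshold)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((threshold, fpr, tpr))
    return points

# Hypothetical (segmented or raw) log2 ratios and their true status.
values = [0.02, 0.5, -0.4, 0.15, -0.05, 0.6]
is_altered = [False, True, True, False, False, True]
points = roc_points(values, is_altered, [0.1, 0.3])
```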

Using segmentation for testing (phenotype association studies)
Case: Find clones (or whole segments) that differ significantly in copy number between two cancer subtypes.
Task: Investigate whether incorporating spatial information (segmentation) into testing for differential copy number increases detection power.
Data type: Samples with either of 2 different phenotypes (e.g. 2 different cancer subtypes).
How: Comparison of sensitivity and specificity using:
1. Original test statistic (no use of spatial information)
2. Segmented T-statistic derived from original log2 ratios
3. T-statistic computed from segmented log2 ratios
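A per-clone test statistic of the kind compared here can be sketched as follows. The slides do not say which T-statistic was used; the pooled-variance two-sample form below is an assumption, and the data are hypothetical. Feeding it raw log2 ratios corresponds to option 1, and feeding it segmented values corresponds to option 3.

```python
import math
import statistics

def t_statistic(group_a, group_b):
    """Pooled-variance two-sample t-statistic (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))

def clone_t_statistics(profiles, labels):
    """profiles[sample][clone] holds log2 ratios; labels[sample] is 0 or 1."""
    n_clones = len(profiles[0])
    result = []
    for clone in range(n_clones):
        a = [p[clone] for p, lab in zip(profiles, labels) if lab == 0]
        b = [p[clone] for p, lab in zip(profiles, labels) if lab == 1]
        result.append(t_statistic(a, b))
    return result

# Six hypothetical samples, two clones: only the first clone differs
# between the phenotypes.
profiles = [[0.5, 0.0], [0.6, 0.1], [0.7, -0.1],
            [0.0, 0.1], [0.1, 0.0], [-0.1, -0.1]]
labels = [0, 0, 0, 1, 1, 1]
t_values = clone_t_statistics(profiles, labels)
```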

Testing samples (original values)
[Figure legend: red marks the clones that truly differ]

Correction for multiple testing?
Standard p-value cutoff for alpha = 0.05 => many false positives

The maxT Multiple Testing Correction
By repeating random class assignment and testing, e.g. 100 times, a permutation reference distribution of the maximum absolute test statistic is obtained (the maxT distribution).
We wish to control the family-wise error rate (FWER) at alpha = 0.05 (a 5% chance of at least one false positive). Therefore, the cutoff should be chosen such that only 5% of the random cases give a false positive (the 95th percentile): cutoff = 5.
[Figure labels: standard significance threshold; maxT multiple testing corrected threshold]
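The permutation procedure can be sketched as below. It is a minimal illustration, assuming a pooled-variance t-statistic and hypothetical noise-only data; it is not the implementation used for the slides, and the cutoff it returns depends on the data supplied.

```python
import math
import random
import statistics

def abs_t(group_a, group_b):
    """Absolute pooled-variance two-sample t-statistic (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = ((na - 1) * statistics.variance(group_a)
              + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0 if diff == 0 else math.inf
    return abs(diff) / math.sqrt(pooled * (1 / na + 1 / nb))

def max_abs_t(profiles, labels):
    """Largest absolute t-statistic over all clones."""
    stats = []
    for clone in range(len(profiles[0])):
        a = [p[clone] for p, lab in zip(profiles, labels) if lab == 0]
        b = [p[clone] for p, lab in zip(profiles, labels) if lab == 1]
        stats.append(abs_t(a, b))
    return max(stats)

def maxt_cutoff(profiles, labels, rng, n_perm=100, alpha=0.05):
    """(1 - alpha) percentile of the permutation maxT distribution."""
    shuffled = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # random class assignment
        maxima.append(max_abs_t(profiles, shuffled))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]

# Hypothetical data: 12 samples (6 per class), 50 clones of pure noise.
rng = random.Random(1)
labels = [0] * 6 + [1] * 6
profiles = [[rng.gauss(0.0, 0.15) for _ in range(50)] for _ in labels]
cutoff = maxt_cutoff(profiles, labels, rng)
```

Any clone whose observed absolute statistic exceeds the returned cutoff would be declared significant with the family-wise error rate controlled at alpha.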

Testing samples (original values)
[Figure labels: maxT p-value cutoff for alpha = 0.05; standard p-value cutoff for alpha = 0.05]

Testing: Segmenting test statistics

Testing segmented samples
1. Segmentation of individual samples

Testing segmented samples
2. T-statistic from segmented individual samples

Detecting regions with differential copy number
Willenbrock and Fridlyand. Bioinformatics 2005; 21(22): 4084-91

Real Data Example: Comparative genomic profiling of several Escherichia coli strains
The microarray design included probes for:
- 7 known E. coli strains
- 39 known E. coli bacteriophages
- 104 known E. coli virulence genes
Experimentally:
- 2 sequenced control strains (W3110 and EDL933), 3 replicates
- 2 non-sequenced strains (D1 and 3538), 3 replicates
- Bacteriophage: φ3538 (Δstx2::cat), 2 replicates
Willenbrock et al.; J. Bacteriology 2006

Comparative Genomic Profiling: Challenges
- Ratio problems: some genes might be present in the query strain but not in the known reference strain
- Single channel microarrays or dual channel microarrays? In this case, we used an Affymetrix single channel custom-made array (NimbleExpress)
- Partly present genes versus similar but distinct genes

The 7 E. coli strains included on the microarray
Very high similarity between the two K-12 strains and between the two O157:H7 strains.
[Table caption: percentage of homologues for E. coli genomes in columns found in E. coli genomes in rows]
Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

BLAST Atlas Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

Hybridization Atlases Probe hybridizations for experiments (samples) result in a similar pattern as expected from the BLAST atlas Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

Mapping the phage Φ3538 (Δstx2::cat) Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

Zoom of phage Φ3538 (Δstx2::cat) The hybridization pattern is very similar for the phage, strain 3538 and strain D1 Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

Hierarchical Cluster Analysis
D1 is very similar to the K-12 type strains (W3110 + MG1655)

E. coli virulence genes
D1 is probably still a commensal strain.
Commensal: an organism participating in a symbiotic relationship from which it benefits while the other is unaffected.
Willenbrock et al. Journal of Bacteriology. 2006 Nov;188(22):7713-21.

Summary
Comparative genomic profiling of two E. coli strains:
- O175:H16 D1
- O157:H7 3538
Identification of virulence genes and phage elements.
Conclusions:
- D1 is similar to the K-12 type strains
- Characterization of D1 and 3538 genes: identification of a number of genes involved in DNA transfer and recombination

Summary
- Numerous methods have been introduced for segmentation of DNA copy number data and breakpoint identification.
- It is important to benchmark new methods against existing ones (only feasible if the software is publicly available).
- Currently, CBS (DNAcopy package) has the best overall performance.
- Merging of segmentation results improves copy number phenotype characterization.
- Study types: copy number in cancer samples, comparison of bacterial strains, etc.