Measuring gene expression (Microarrays) Ulf Leser


This Lecture
- Gene expression
- Microarrays: idea, technologies, problems
- Quality control
- Normalization
- Analysis (next week)

Recap: Gene Expression (Protein Biosynthesis)
- Gene expression has two phases:
  - Transcription: DNA → mRNA (RNA polymerase)
  - Translation: mRNA → protein (ribosome)
- Alternatively, the transcript may be a non-coding RNA such as miRNA, rRNA or tRNA
- Proteins have various functions: antibodies, enzymes, hormones, storage, transport, structure, regulation
Source: http://learn.genetics.utah.edu/content/molecules/transcribe/

mRNA Quantification
- Differences in gene expression correlate with drug response or disease risk
- Goal: measure the relative amount of mRNA expressed under different experimental conditions
- Assumption: mRNA expression correlates with protein synthesis (not entirely true)
- Techniques:
  - Northern blotting
  - Real-time PCR
  - High-throughput analysis (many genes in one experiment): microarrays (since 1995)
  - Trend towards next-generation sequencing, e.g. RNA-Seq (since about 2007)
Image source: www.affymetrix.com


Microarrays - Overview
- Lab-on-a-chip: measure the mRNA expression of all genes in parallel at a specific time point
- Find differentially expressed genes between experiments/groups (not between genes):
  - Healthy vs. sick
  - Tissue/cell types
  - Developmental state (embryo, adult, cell states, ...)
  - Environment (heat shock, nutrition, therapy)
  - Disease subtypes (ALL vs. AML; 40% chemotherapy-resistant in colon cancer)
- Co-regulation of genes: does a similar expression profile imply similar function/regulation?
- Two types of microarrays: cDNA microarrays and oligonucleotide microarrays

Microarray - Core Principle
- Microarray: a collection of single-stranded DNA sequences (probes) attached to a solid surface (e.g. glass)
- Sample mRNA is labeled with dyes and applied to the chip
- During hybridization, the mRNA binds to complementary sequences on the array
- Hybridization: the process of joining two complementary single-stranded chains of DNA or RNA into one double-stranded molecule
Source: www.wikipedia.com

Microarrays - Workflow
1) Sample preparation and purification, isolation of mRNA
2) Reverse transcription: mRNA → cDNA (complementary DNA)
3) Labeling of the sample cDNA with fluorescent dyes
4) Hybridization (overnight) and washing
5) Scanning of the array with a laser, detection of light intensities, image segmentation
6) Normalization of raw data and data analysis
Source: www.wikipedia.com

cDNA Microarrays (Spotted Arrays)
Manufacturing steps:
- Selection of genes
- Preparation/purification of these genes
- Amplification of the corresponding cDNA via PCR (polymerase chain reaction)
- Micro-spotting of the cDNA sequences (probes) on a glass slide (area: 3.6 cm²)
- 10,000-20,000 spots, each representing one gene
- Negative control spots (for background correction)
Known as in-house printed microarrays:
- Easy customization
- No special laboratory equipment needed
- Relatively low cost
Source: http://grf.lshtm.ac.uk/microarrayoverview.htm

cDNA Microarrays - Two-Color Array
- cDNA microarrays are usually two-color arrays
- Two samples are measured with one array
- Two different dyes: Cy3/Cy5
- The laser uses two different wavelengths
- The ratio Cy3/Cy5 is evaluated per spot:
  - Green: high expression in sample 1
  - Red: high expression in sample 2
  - Black: no signal in both samples
  - Yellow: equal expression
Image source: http://cmr.asm.org/content/22/4/611/f2.large.jpg

Oligonucleotide Microarrays
- Short probes (25 nt to 60 nt) designed to match parts of the sequence of known genes
- 11 to 20 probes form one probeset, which represents one transcript
- The probes of a probeset are scattered over the array
- Perfect match (PM) and mismatch (MM) probes (local/global background)
Figure: probes on an oligonucleotide microarray vs. a cDNA microarray (Affymetrix.com; Staal, F. J. T., et al. Leukemia 17.7 (2003): 1324-1332)

Oligonucleotide Microarray - Photolithography
Manufacturing via photolithographic synthesis:
- UV light is passed through a mask that either transmits the light to the array surface or blocks it
- UV light removes the protecting groups
- The array is flooded with one kind of nucleotide (A, C, G or T)
- One nucleotide is added to each deprotected position
- Unbound nucleotides are washed away
- The process is repeated about 70 times
Source: Miller MB, Tang Y-W. Basic Concepts of Microarrays and Potential Applications in Clinical Microbiology. Clinical Microbiology Reviews. 2009;22(4):611-633. doi:10.1128/cmr.00019-09.

Oligonucleotide Microarrays - Industrial Manufacturing
- Prominent example: Affymetrix microarrays
- No customization of probes possible
- Robust results across arrays of one array type
- Good quality control
- More expensive than cDNA microarrays
- Selection of good oligos is difficult

Probe Selection
Requirements:
- High sensitivity (SN): strong signal if the complementary target sequence is in the sample solution
  $SN = \frac{TP}{TP + FN}$
- High specificity (SP): weak signal if the complementary target sequence is not in the solution (probe uniqueness)
  $SP = \frac{TN}{TN + FP}$
Criteria (binding depends on): probe length, GC content, secondary structure, number of matches on all transcripts, probe self- or cross-hybridization, position of the probe in the transcript, probe uniqueness (sensitivity vs. specificity), ...
Source: http://grf.lshtm.ac.uk/microarrayoverview.htm
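The two formulas can be checked with a few lines of Python. This is only a minimal sketch: the function names and the counts are made up for illustration and do not come from the lecture.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SN = TP / (TP + FN): fraction of present targets that give a signal."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP): fraction of absent targets that give no signal."""
    return tn / (tn + fp)


# Hypothetical counts for one candidate probe (illustrative only).
sn = sensitivity(tp=90, fn=10)   # 90 of 100 present targets detected
sp = specificity(tn=950, fp=50)  # 950 of 1000 absent targets correctly silent
print(f"SN = {sn:.2f}, SP = {sp:.2f}")
```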

Comparison: cDNA vs. Oligonucleotide Microarrays
cDNA microarrays (also: spotted arrays):
- No special laboratory equipment needed
- Good customization
- Error-prone workflow
- Fewer spots/genes (limited redundancy)
- One cDNA or long oligonucleotide per gene
- High sensitivity: long probe sequences
- Lower specificity: cross-hybridization is more probable
- Lower costs
Oligonucleotide microarrays:
- Industrial manufacturing
- Densely packed
- Companies sell kits
- No customization possible
- Higher reproducibility (e.g. no PCR amplification needed)
- Better quality control
- Higher specificity
- Sensitivity: lower, but multiple probes per gene compensate
- Expensive

Experimental Design - Replicates
Replicated measurements are used to control technical or biological bias:
- Technical replicates: the same sample is hybridized to several arrays; this allows a statistical estimation of systematic effects
- Biological replicates: different sample sources are used; they make it possible to estimate biological noise and reduce the randomness of the measurement

Advances in Technology
- Gene expression profiling with microarrays
- Exon arrays: each exon of a gene is measured individually
- SNP arrays, ChIP-on-chip arrays, ... (oligonucleotide microarrays)
Figure: exon array with probesets and probes

Challenges
- Patient data has a high variance:
  - Different genetic background
  - Mixture of cells from different tissues
  - Cells are in different stages (cell cycle, cell development)
- The environment influences hybridization quality
- Noise: technical replicates never produce the same data
- Transient data:
  - Select an appropriate time point
  - Signaling might be very fast for some processes
  - Intermediate steps are lost

Challenges (continued)
- High number of transcripts:
  - Multiple testing correction
  - Choice of statistical test
- Time series result in day/night work
- Cause and effect:
  - Tumors have high cell proliferation
  - EGF and p53 are likely highly expressed
- Biological interpretation is difficult


Data Visualization, Quality Control
- Detect arrays with poor quality (outliers)
- Identify arrays that behave differently from the others
- Boxplot: estimate the homogeneity of the data
- Example: array 14 shows an overall higher signal intensity
Source: https://genevestigator.com

Array-Array Correlation Plot
- Correlate all arrays of an experiment with each other
- Replicates should show high correlation
- Example: array 14 shows poor correlation; possible reasons: higher noise in the data, stronger background, ...
Figure labels: skeletal muscle, bone marrow
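An array-array correlation check can be sketched in plain Python as follows. The array names and intensity values are invented for illustration, and Pearson correlation is an assumption, since the slide does not say which correlation measure is used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(arrays):
    """Correlate every array with every other array (dict of dicts)."""
    names = list(arrays)
    return {a: {b: pearson(arrays[a], arrays[b]) for b in names} for a in names}


# Hypothetical log2 intensities of five transcripts on three arrays.
arrays = {
    "array_1": [7.1, 8.3, 5.2, 9.8, 6.4],
    "array_2": [7.0, 8.5, 5.0, 9.9, 6.6],
    "array_3": [9.5, 5.1, 8.8, 6.0, 7.2],  # behaves differently
}
corr = correlation_matrix(arrays)

# The array with the lowest mean correlation to the others is a QC candidate.
mean_corr = {
    a: sum(corr[a][b] for b in arrays if b != a) / (len(arrays) - 1)
    for a in arrays
}
suspect = min(mean_corr, key=mean_corr.get)
print(suspect, round(mean_corr[suspect], 3))
```

Flagging the array with the lowest mean correlation is only one possible heuristic; in practice the full matrix is usually inspected as a heatmap.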

Data Visualization, Quality Control
Scatter plot:
- Each point represents one transcript in two experimental settings
- Most points should appear around the diagonal (only a few genes are expressed at different levels)
- Variation is higher at low intensities

MA-Plot
- A version of the scatter plot rotated by 45° and subsequently rescaled
- M-value (log2 fold change): the log2 ratio between two values
- Log values are symmetric, which eases visual interpretation (compare the difference between 4 and 16 with that between 0.25 and 0.0625)

$M = \log_2(FC) = \log_2(\text{Value}_1 / \text{Value}_2)$

For example:

Value 1   Value 2   FC (Value 1 / Value 2)   log2(FC)
123       123       1                        0
512       256       2                        1
512       1024      0.5                      -1

MA-Plot (continued)
- A-value: the mean of the log-transformed intensities
  $A = \frac{1}{2} \left( \log_2(\text{Value}_1) + \log_2(\text{Value}_2) \right)$
- Points should scatter around 0 on the y-axis (M)
- Banana shape in two-color arrays: green is brighter than red
Further quality control possibilities: image analysis, RNA degradation plots, residual plots, PCA, ...
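The M- and A-values of the MA-plot follow directly from the two definitions. The sketch below recomputes the three example pairs of the fold-change table; the function name `ma_values` is chosen here for illustration.

```python
import math


def ma_values(value_1: float, value_2: float):
    """Return (M, A) for one transcript measured in two settings.

    M = log2(value_1 / value_2)                  (log2 fold change, y-axis)
    A = (log2(value_1) + log2(value_2)) / 2      (mean log intensity, x-axis)
    """
    log_1, log_2 = math.log2(value_1), math.log2(value_2)
    return log_1 - log_2, 0.5 * (log_1 + log_2)


# The three example pairs from the fold-change table.
pairs = [(123, 123), (512, 256), (512, 1024)]
ma = [ma_values(v1, v2) for v1, v2 in pairs]
for (v1, v2), (m, a) in zip(pairs, ma):
    print(f"{v1:5d} {v2:5d}  M = {m:+.1f}  A = {a:.2f}")
```

A doubling and a halving give M = +1 and M = -1, which is the symmetry that the log scale provides.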


Normalization
- Microarrays are comparative experiments, BUT measurements from two experiments are not directly comparable
- Several levels of variability in measured gene expression:
  - Highest level: biological variability in the population from which the sample derives
  - Experimental level: variability between preparations and labeling of samples (different amounts of RNA or dye, experimenter variability, day/night work, ...) and between hybridizations
  - The signal of replicate features on the same array (probe affinity)
  - Further sources: different scanner settings, ...
Source: http://slideplayer.com/slide/2394201/

Normalization
- Aim: identify the real biological differences among samples by compensating for systematic technical differences
- Assumptions:
  - The overall number of mRNA molecules changes little between samples
  - Only a few genes are expressed at really different levels between samples (not entirely true when comparing highly transformed cancer cells with normal cells)
- Normalization is applied within arrays or across arrays
- Two-color arrays: normally within-array normalization followed by across-array normalization

Normalization
- mRNA in a sample:
  - Assumption: cells contain the same proportion of RNA
  - Measure the total mRNA and divide the intensities by this value
- Z-score (mean): standardization sets the mean to 0 and the standard deviation to 1
  - Centering: subtract the mean from each value
  - Scaling: divide the centered value by the standard deviation
  $z = \frac{x - \text{mean}_{est}}{\text{sd}_{est}}$
  where $z$ is the standardized value, $\text{mean}_{est}$ the sample-based estimate of the population mean, and $\text{sd}_{est}$ the sample-based estimate of the population standard deviation
- Z-score (median)
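The mean-based z-score can be sketched with the Python standard library. The input values are invented, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the slide only speaks of a sample-based estimate.

```python
import statistics


def z_scores(values):
    """Center by the sample mean, then scale by the sample standard deviation."""
    mean_est = statistics.mean(values)
    sd_est = statistics.stdev(values)  # sample SD with n - 1 (assumption)
    return [(x - mean_est) / sd_est for x in values]


# Hypothetical log2 intensities of one array.
log_intensities = [2.0, 4.0, 6.0, 8.0]
z = z_scores(log_intensities)
print([round(v, 3) for v in z])
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```

After standardization the values have mean 0 and standard deviation 1, so arrays with different overall brightness become comparable on a common scale.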

Quantile Normalization
- Normalize so that the quantiles of each array are equal
- The distribution of expression values on each microarray is made identical
- Simple and fast algorithm
- Usually outperforms linear methods
Steps:
1) Given a matrix X of dimension p x n, where each of the n arrays is a column and each of the p transcripts is a row
2) Sort each column of X separately to give Xsort
3) Take the mean across each row of Xsort and assign it to every element of that row to create X'sort
4) Get Xn by rearranging each column of X'sort to have the same ordering as the corresponding column of X
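The steps above can be sketched in plain Python. The function name and the small example matrix are illustrative, and ties within a column are simply broken by row order here, which is a simplification compared with implementations that average tied ranks.

```python
def quantile_normalize(X):
    """Quantile-normalize a matrix given as a list of rows.

    Rows are transcripts, columns are arrays. Ties within a column are
    broken by row order (a simplification).
    """
    p, n = len(X), len(X[0])
    columns = [[X[i][j] for i in range(p)] for j in range(n)]

    # Sort each column separately (Xsort).
    sorted_cols = [sorted(col) for col in columns]

    # Mean across each row of Xsort: one reference value per rank.
    rank_means = [sum(sorted_cols[j][r] for j in range(n)) / n for r in range(p)]

    # Put the rank means back in the original ordering of each column (Xn).
    Xn = [[0.0] * n for _ in range(p)]
    for j, col in enumerate(columns):
        order = sorted(range(p), key=lambda i: col[i])
        for rank, i in enumerate(order):
            Xn[i][j] = rank_means[rank]
    return Xn


# Hypothetical intensities: 4 transcripts (rows) on 3 arrays (columns).
X = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
Xn = quantile_normalize(X)
for row in Xn:
    print([round(v, 2) for v in row])
```

After normalization every column contains the same set of values, so all arrays share one distribution, while the rank order of the transcripts within each array is preserved.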

Quantile Normalization
- Advantage: differences between the separate values are retained
- Advantage: identical distribution for each array
- Disadvantage: some data can be lost, especially in the lower signals

Quantile Normalization
- Boxplots of raw data (left) and normalized data (right) from the previous slide
- MA plots before (top) and after (bottom) quantile normalization (red line: loess line)
Source: Bolstad, Benjamin M., et al. "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias." Bioinformatics 19.2 (2003): 185-193.

RMA - Robust Multichip Average
- Used for the normalization of Affymetrix microarrays
- Three steps:
  1) Background correction (for each array separately)
  2) Quantile normalization (across all arrays)
  3) Summarization of probesets (median polish); returns normalized data in log2 scale
- RMA does not use the mismatch probe intensities (the MM signal is often higher than the PM signal)
- Computing RMA: affy package for R (exercise next week)
Further normalization possibilities:
- Non-linear methods like Lowess (two-color, non-linear, within array)
- Statistical approaches like MAS5 or VSN
- ...
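The probeset summarization of step 3 can be illustrated with Tukey's median polish. This is only a sketch of that single step under stated assumptions: the input is an invented probes x arrays matrix of log2 values that is assumed to be already background-corrected and quantile-normalized, the fixed number of sweeps is an arbitrary choice, and the code is not the implementation used by the affy package.

```python
import statistics


def median_polish_summary(probeset, n_iter=10):
    """Summarize one probeset (rows = probes, columns = arrays, log2 scale).

    Fits value = overall + probe effect + array effect + residual by
    alternately sweeping out row and column medians, and returns
    overall + array effect as one expression value per array.
    """
    resid = [row[:] for row in probeset]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_probes
    col_eff = [0.0] * n_arrays

    for _ in range(n_iter):
        # Sweep the median of every row (probe) into the probe effects.
        for i in range(n_probes):
            m = statistics.median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row_eff[i] += m
        m = statistics.median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m

        # Sweep the median of every column (array) into the array effects.
        for j in range(n_arrays):
            m = statistics.median([resid[i][j] for i in range(n_probes)])
            for i in range(n_probes):
                resid[i][j] -= m
            col_eff[j] += m
        m = statistics.median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m

    return [overall + c for c in col_eff]


# Hypothetical probeset: 3 probes (rows) measured on 3 arrays (columns).
probeset = [
    [8.0, 9.0, 10.0],
    [9.0, 10.0, 11.0],
    [7.0, 8.0, 9.0],
]
summary = median_polish_summary(probeset)
print(summary)
```

Because medians are used instead of means, a single outlying probe has little influence on the summarized expression value of an array.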

Gene Expression Matrix
Preprocessing steps:
- Data visualization, quality control
- (Background correction)
- Normalization
- (log2 of data)
Result: a gene expression matrix of genes by samples, with the samples grouped into condition 1 arrays and condition 2 arrays
Analysis: next week

Appendix - Workflow Comparison: cDNA Microarray vs. Oligonucleotide Microarray
Source: Staal, F. J. T., et al. "DNA microarrays for comparison of gene expression profiles between diagnosis and relapse in precursor-B acute lymphoblastic leukemia: choice of technique and purification influence the identification of potential diagnostic markers." Leukemia 17.7 (2003): 1324-1332.

Appendix - Non-linear Methods: Lowess
- Dye bias in two-color arrays: the green channel appears consistently brighter than the red channel
- Intensity-based
- Fits simple models to localized subsets of the data
- Needs no global function of any form to fit a model to the data
- Requires large, densely sampled data sets in order to produce good models
