How To Make A Microarray From A Single Cell


Basics of microarrays Petter Mostad 2003

Why microarrays?
- Microarrays work by hybridizing strands of DNA in a sample against complementary DNA in spots on a chip.
- Expression analysis:
  - measures relative amounts of mRNA in a tissue sample
  - tests all genes at the same time
  - alternatives: Northern blot, qPCR
- SNP analysis
- Genomic DNA

The transcriptome
- Genome -> Transcriptome -> Proteome
- In a cell: about 300,000 transcripts, representing genes at different frequencies
- Highly regulated turnover of transcripts: lifetimes range from minutes to weeks, depending on sequence and structure
- Alternative splicing
- Connection between expression profile and protein profile?

Alternatives to hybridization
- Alternative ways to measure the transcription profile: EST, SAGE
- For a small number of genes: qPCR (advantages/disadvantages)

Basic technology
- Several technologies: Affymetrix chips, cDNA chips, oligo chips
- All are based on complementary strands of DNA hybridizing against each other
- Fluorescence indicates the amount of hybridization

cDNA chips
- non-proprietary technology
- clones are made based on expressed RNA
- probes are based on sequences a few hundred bases long
- specialized chips
- two dyes
- cheaper per chip, but expensive to set up
- often noisy data

cDNA: EST libraries, printing, labelling
- RNA sample -> cDNA -> random library
- Clones from the library are sequenced -> ESTs
- Sequence analysis of ESTs -> gene identities; public EST libraries
- cDNA clones are amplified by PCR and put on well plates
- Robot spots PCR products onto glass slides
- UV treatment
- Two parallel samples are labelled with two dyes (Cy3, Cy5)
- Labelling is performed as a reverse transcription (yielding cDNA)

cDNA: hybridization and scanning
- Hybridization of target + probe
- Scan at Cy3 and Cy5 wavelengths

Affymetrix
- patented technology, expensive chips
- based on a system with PM (perfect match) and MM (mismatch) sequences
- sequences synthesized on chip
- about 20 probes per gene, currently randomly placed
- sequences optimized to reduce cross-hybridization
- has had some quality problems and secrecy problems
- seems to be less noisy than cDNA

Oligo-arrays
- probes are about 50 bases long
- avoids the Affymetrix patent
- may work better than cDNA chips

Amplification?
- All technologies require a minimal amount of RNA to work (1 microgram mRNA?)
- Sometimes there is too little (human samples, samples where you want purity of cell types, ...)
- Amplification, using PCR, is an alternative
- It introduces noise in the data (in a semi-systematic way)

Planning of microarray experiments
- Using cDNA or Affymetrix?
- What kind of cDNA chip?
- Reference sample?
- Pooling?
- Dye swap?
- How many repetitions are necessary?
- What kind of data analysis?

Statistical issues connected to cDNA chips
- Experimental design:
  - Array printing
  - What to hybridize
- Low-level analysis:
  - Image processing
  - Visualization
  - Normalization
  - Quality measures
- Data analysis:
  - Ranking differentially expressed genes
  - Assigning significance to ranking
  - Classification (discrimination and clustering)

Experimental design
- Questions:
  - Pooling of samples?
  - Reference sample?
  - Which samples hybridize against which?
  - How many arrays?
- Tips:
  - Two different sample types => compare directly
  - Several types compared to wild type => use wild type as reference
  - Saturated designs, loop designs
  - Complexity => use reference
  - Dye-swap when appropriate
- Deciding factors:
  - Aim of experiment
  - Availability of types of sample material

Image Analysis
- Purposes: extract R and G for each spot; assess quality
- Scanning: avoid spot saturation; do not use several scans
- Finding spot foreground pixels: histogram method; fit a circle; seeded region growth
- Finding background: pixels within bounding box that are not foreground; two concentric circles; valleys; morphological opening
- Subtracting background from foreground:
  - Estimate foreground with average over pixels
  - Estimate background with median over pixels
  - Ignore spots with resulting negative values
- Handling of background has big impact!
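The background-subtraction step above can be sketched as follows. This is only an illustration under stated assumptions: the function name spot_intensity and the pixel values are hypothetical, and spots with a corrected value of exactly zero are also dropped here because their logarithm is undefined.

```python
from statistics import mean, median

def spot_intensity(foreground_pixels, background_pixels):
    """Background-corrected intensity for one spot in one channel.

    Foreground is estimated by the mean, background by the median,
    as on the slide. Returns None when the result is not positive,
    so the spot can be ignored downstream.
    """
    corrected = mean(foreground_pixels) - median(background_pixels)
    return corrected if corrected > 0 else None

# Hypothetical pixel values (assumption, not real scanner output).
bright = spot_intensity([900, 1100, 1000], [100, 120, 5000])
weak = spot_intensity([90, 110, 100], [150, 160, 155])
print(bright, weak)
```

Using the median for the background makes the estimate insensitive to a few very bright stray pixels, such as the value 5000 in the first example.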

Graphical Presentation
- Images of microarrays; overlays
- Plotting $M = \log R - \log G$ versus $A = \tfrac{1}{2}(\log G + \log R)$ (ignore spots with negative R or G)
- Boxplots of M values
- Spatial plots
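A minimal sketch of the M and A computation for a single spot, assuming base-2 logarithms (the slide only says "log") and a hypothetical helper name ma_values:

```python
import math

def ma_values(r, g):
    """Return (M, A) for background-corrected intensities r and g.

    M = log2(r) - log2(g) is the log-ratio,
    A = (log2(g) + log2(r)) / 2 is the average log-intensity.
    Spots with non-positive r or g are skipped (None).
    """
    if r <= 0 or g <= 0:
        return None
    log_r, log_g = math.log2(r), math.log2(g)
    return log_r - log_g, 0.5 * (log_g + log_r)

# Hypothetical intensities: red is four times green, so M is about 2.
m, a = ma_values(800.0, 200.0)
skipped = ma_values(-5.0, 100.0)
```

With base-2 logs, M = 1 corresponds to a two-fold difference between the channels, which is why this base is a common choice.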

Normalization
- Simplest: subtract mean or median of non-regulated genes: $M := M - c$
- Intensity dependent: $M := M - c(A)$
- Print-tip dependent: $M := M - c_i(A)$
- Scale normalization of M
- Use of control spots
- Sample pool titration series
- Spiking
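The two simplest cases can be sketched in a few lines. The function names are hypothetical, and the print-tip version is deliberately simplified: it subtracts one constant per print-tip group rather than an intensity-dependent curve $c_i(A)$, which would need a smoother such as loess.

```python
from collections import defaultdict
from statistics import median

def median_normalize(m_values):
    """Global normalization: M := M - c, with c the median of all M."""
    c = median(m_values)
    return [m - c for m in m_values]

def print_tip_normalize(m_values, tips):
    """Simplified print-tip normalization: M := M - c_i,
    with c_i the median M within print-tip group i."""
    groups = defaultdict(list)
    for m, tip in zip(m_values, tips):
        groups[tip].append(m)
    centers = {tip: median(vals) for tip, vals in groups.items()}
    return [m - centers[tip] for m, tip in zip(m_values, tips)]

# Hypothetical M-values; spots 0-1 come from tip 0, spots 2-3 from tip 1.
global_norm = median_normalize([1.0, 2.0, 3.0])
tip_norm = print_tip_normalize([1.0, 3.0, 10.0, 12.0], [0, 0, 1, 1])
```

Both functions assume that most genes are not differentially expressed, so that the median M should be zero after normalization.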

Dependence on signal strength

Spatial dependence of signal

Variation between regions of the arrays

Quality Measures
- Array quality:
  - Intensities span whole range
  - Saturation avoided
  - Check control spots
  - Background mostly below signal
  - Check slide images for spatial effects
- Spot quality, single spots: check spot parameters (area, perimeter, standard deviation, background variability, etc.)
- Spot quality, repeated spots: reject outlier M-values; use a spot quality measure as a weight

Hypothesis generation versus hypothesis testing
- Hypothesis generation:
  - Methods may suggest that a gene is up- or down-regulated
  - Methods may suggest new relationships between genes
  - Suggestions may not be reproduced by another experiment; all results must be verified by other methods
- Hypothesis testing:
  - Example: testing whether a gene is significantly up-regulated
  - Reproducible conclusions
  - Fewer methods available
  - In general, requires repetitions of experiments, or serious assumptions

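Ranking differentially expressed genes
Assuming repeated comparison of two different sample types (n replicate M-values per gene, with mean $\bar{M}$ and standard deviation $s$):
- Simplest: rank by $\bar{M}$
- Next choice: rank by $t = \bar{M} / (s / \sqrt{n})$
- Penalized t-statistic (Lönnstedt, Speed): $t = \bar{M} / \sqrt{(a + s^2) / n}$
- Penalized t-statistic (Efron): $t = \bar{M} / \left( (a + s) / \sqrt{n} \right)$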
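The effect of the penalty can be seen on a toy example. This is a sketch under assumptions: the replicate values are invented, and the penalty a = 0.1 is picked arbitrarily for illustration (in practice it is estimated from the distribution of s over all genes).

```python
import math
from statistics import mean, stdev

def t_stat(ms):
    """Ordinary t-statistic: mean(M) / (s / sqrt(n))."""
    return mean(ms) / (stdev(ms) / math.sqrt(len(ms)))

def penalized_t_variance(ms, a):
    """Penalty added to the variance: mean(M) / sqrt((a + s^2) / n)."""
    return mean(ms) / math.sqrt((a + stdev(ms) ** 2) / len(ms))

def penalized_t_sd(ms, a):
    """Penalty added to the standard deviation: mean(M) / ((a + s) / sqrt(n))."""
    return mean(ms) / ((a + stdev(ms)) / math.sqrt(len(ms)))

# Hypothetical replicate M-values for two genes.
large_effect = [1.0, 1.2, 0.8, 1.0]      # big change, moderate noise
tiny_effect = [0.10, 0.11, 0.09, 0.10]   # negligible change, tiny noise

a = 0.1  # arbitrary penalty, for illustration only
for name, ms in [("large_effect", large_effect), ("tiny_effect", tiny_effect)]:
    print(name, t_stat(ms), penalized_t_variance(ms, a), penalized_t_sd(ms, a))
```

The ordinary t-statistic ranks the gene with the negligible change first because its standard deviation happens to be very small; both penalized versions reverse that ordering.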

Finding significantly differentially expressed genes
- Problem: multiple testing
- Assuming normally distributed M-values and independence, use a t-distribution probability plot
- Controlling the family-wise error rate: using re-sampling in a repeated experiment with a reference sample (Dudoit)
- Estimating the false discovery rate by using re-sampling: SAM (Tibshirani)
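To make the re-sampling idea concrete, here is a small sketch of a single-step "max-T" adjustment that controls the family-wise error rate. It is not the exact procedure from the cited work: it assumes one-sample data (replicate M-values per gene), uses exhaustive sign flipping as the re-sampling scheme, and all names and values are hypothetical.

```python
import math
from itertools import product
from statistics import mean, stdev

def t_stat(ms):
    """One-sample t-statistic; infinite if the values are identical and nonzero."""
    m, s = mean(ms), stdev(ms)
    if s == 0:
        return math.copysign(math.inf, m) if m else 0.0
    return m / (s / math.sqrt(len(ms)))

def maxt_adjusted_p(genes):
    """Adjusted p-value per gene: the fraction of sign-flip patterns whose
    largest |t| over all genes reaches that gene's observed |t|."""
    names = list(genes)
    n = len(genes[names[0]])
    observed = {g: abs(t_stat(genes[g])) for g in names}
    null_max = []
    for signs in product((1, -1), repeat=n):  # the same flips are applied to every gene
        null_max.append(max(
            abs(t_stat([s * m for s, m in zip(signs, genes[g])])) for g in names
        ))
    return {g: sum(t >= observed[g] for t in null_max) / len(null_max) for g in names}

# Hypothetical replicate M-values from six arrays.
genes = {
    "up": [1.0, 1.2, 0.9, 1.1, 1.0, 0.8],
    "flat": [0.1, -0.2, 0.05, -0.1, 0.15, -0.05],
}
adjusted = maxt_adjusted_p(genes)
print(adjusted)
```

With n arrays there are only 2^n sign patterns, so the smallest attainable adjusted p-value is about 2 / 2^n. This is one way to see why the number of repetitions matters for hypothesis testing.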

Classification
- Identification of different cell types or conditions, or identification of different gene types
- Supervised learning (discriminant analysis; using learning sets) versus unsupervised learning (clustering)
- Clustering methods may be overused
- Simple methods (linear discriminant methods, nearest neighbour, classification trees) often perform as well as more complex methods

Clustering
- Example: data is a time series of transcription profiles; cluster the genes according to behaviour
- Clustering starts with defining similarity between all pairs of genes (e.g., distance in some space)
- Hierarchical clustering. Dendrograms. Linkage methods
- The K-means method
- Example of hypothesis generation: Tavazoie et al. (1999) used clusters of genes to identify probable regulatory sequences upstream of them
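A bare-bones sketch of the K-means method on time-series profiles, assuming Euclidean distance as the similarity measure and a deterministic start from the first k profiles (real implementations usually use random restarts). The profiles are invented.

```python
import math

def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles with plain K-means.

    Returns (labels, centers). Centers start at the first k profiles.
    """
    centers = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each gene goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in profiles
        ]
        # Update step: each center becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical profiles over four time points: two rising, two falling genes.
profiles = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
    [0.1, 1.1, 2.0, 2.9],
    [2.9, 2.1, 1.0, 0.2],
]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The choice of distance matters: Euclidean distance groups genes by absolute level, whereas a correlation-based distance would group genes by the shape of their profile.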

Clustering of genes and samples

Self-organising maps

Principal Components Analysis
- The principal components can be viewed as the axes of a better coordinate system for the data
- Better in the sense that the data is maximally spread out along the first principal components
- The principal components correspond to eigenvectors of the covariance matrix of the data
- The eigenvalues represent the part of the total variance explained by each of the principal components
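The link between principal components and eigenvectors can be illustrated without any libraries. The sketch below assumes a tiny invented data set and uses power iteration, which finds only the first principal component; a real analysis would use a full eigendecomposition or SVD routine.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_principal_component(cov, iterations=200):
    """Power iteration: returns (eigenvalue, unit eigenvector) for the
    largest eigenvalue of the covariance matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cov_v))
    return eigenvalue, v

# Hypothetical data: two variables that rise almost together.
rows = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]]
cov = covariance_matrix(rows)
eigenvalue, direction = first_principal_component(cov)
total_variance = cov[0][0] + cov[1][1]  # trace = sum of all eigenvalues
explained = eigenvalue / total_variance
print(direction, explained)
```

Because the two variables are almost perfectly correlated, the first component points roughly along the diagonal and explains nearly all of the variance.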

Principal component analysis of expression data

Good software: Bioconductor, a collection of packages for R
Reference on statistics: Smyth, Yang, Speed: Statistical Issues in cDNA Microarray Data Analysis