MICROARRAY
Measures gene expression
Global: genome-wide scale, entire genome (~30,000 genes)
Relative expression levels between two groups: treatment and control

Key components
Array: immobilised target
Labelled probes (extracted mRNA > cDNA-*)
Detection system: scanner
Normalisation
Data analysis: ratio generation, fold change

Two distinct platforms
Single channel: each array tests either treatment or control, e.g. Affymetrix
Two channel: each array contains both treatment and control
Both fundamentally operate by the same principle

Overview (2 channel)
Extracted mRNA > labelled probes > target array > detection system

Immobilised sequences
Complementary to a specific probe / gene
Array: binding site / target on the chip
Millions of binding sites for each gene

Hybridisation
Array target + labelled probe (mRNA, labelled cDNA)
Add labelled probe and cover
Hybridise overnight in a humid environment
Wash: remove unbound probe so only specifically bound labelled probe remains
Scan

Hybridisation
The labelled cDNA binds specifically to complementary target sequences (nucleotides)
Increased transcript = more binding = increased signal

Critical steps in the generation & analysis of data
Scanning the array (GenePix)
Excitation: Cy3 532 nm, Cy5 635 nm
Emission: Cy3 550-600 nm, Cy5 655-695 nm
Overlay images
Single channel > only one signal / slide

Automatic spot-finding
Automatic spot-finding algorithms are common
Grids must be aligned with the spots so that the software knows exactly which area of the slide to read for the signal of any specific sequence

Results
Once spots are aligned, software generates quantitative numerical results
Each spot represents a specific gene
For each spot, a signal intensity (pixel count) is generated: cDNA/mRNA binding > signal
Raw data output files: .CEL files (Affymetrix)
Raw data must be processed

Normalisation
First step in analysis
Standardises data from a number of microarrays onto a common scale
Allows meaningful comparisons between slides
Corrects for technical variability

         NO NORMALISATION                      NORMALISED
GENE     T1       C1       ratio  log2 ratio   N C1 (x1.34)  T1       N C1     ratio  log2 ratio
A        1300.00  1000.00  1.30    0.38        1330.00       1300.00  1330.00  0.98   -0.03
B        2600.00  2000.00  1.30    0.38        2660.00       2600.00  2660.00  0.98   -0.03
C        5200.00  4000.00  1.30    0.38        5320.00       5200.00  5320.00  0.98   -0.03
D         650.00   500.00  1.30    0.38         665.00        650.00   665.00  0.98   -0.03
E         800.00   200.00  4.00    2.00         266.00        800.00   266.00  3.01    1.59
F        7800.00  6000.00  1.30    0.38        7980.00       7800.00  7980.00  0.98   -0.03
AV SIG   3058.33  2283.33  1.34    0.42        3036.83       3058.33  3036.83  1.01    0.01

T1 = treatment array, C1 = control array, N C1 = normalised control
Average signal intensity/ratio should be the same across arrays

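A minimal sketch of the scaling in the table above, assuming simple global mean scaling: every control signal is multiplied by the ratio of the two array means. The function names are illustrative and do not come from any microarray package; the signal values are those in the table. The exact factor is about 1.34, so results can differ in the second decimal place from the rounded table values.

```python
# Signal values for treatment array T1 and control array C1 (from the table above).
treatment = {"A": 1300.0, "B": 2600.0, "C": 5200.0, "D": 650.0, "E": 800.0, "F": 7800.0}
control = {"A": 1000.0, "B": 2000.0, "C": 4000.0, "D": 500.0, "E": 200.0, "F": 6000.0}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def scale_to_reference(reference, other):
    """Scale `other` so that its mean signal equals the mean of `reference`.

    Assumption: global mean scaling, one factor for the whole array.
    Returns (factor, scaled copy of `other`).
    """
    factor = mean(reference.values()) / mean(other.values())
    return factor, {gene: signal * factor for gene, signal in other.items()}


factor, control_norm = scale_to_reference(treatment, control)
ratios = {gene: treatment[gene] / control_norm[gene] for gene in treatment}

print(f"scaling factor: {factor:.2f}")
for gene in sorted(ratios):
    print(f"{gene}: normalised control={control_norm[gene]:.2f} ratio={ratios[gene]:.2f}")
```

After scaling, both arrays have the same average signal, and only gene E stands out with a ratio well above 1.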
Ratio generation: log2 transformation

gene  treatment  control  ratio  change     ratio (log base 2)
A     1000       1000     1      no change   0
B     2000       1000     2      UP 2x       1
C      500       1000     0.5    DOWN 2x    -1
D     4000       1000     4      UP 4x       2
E     1000       4000     0.25   DOWN 4x    -2

ratio  ratio (log2)
 2     1
 4     2
 8     3
16     4
32     5

Symmetrical representation of up / down regulated genes

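The worked example above can be reproduced in a few lines. This is a sketch only: the helper name `log2_ratio` is illustrative, and the signals are the ones listed for genes A to E.

```python
import math

# (treatment, control) signals for genes A to E, from the example above.
signals = {
    "A": (1000, 1000),
    "B": (2000, 1000),
    "C": (500, 1000),
    "D": (4000, 1000),
    "E": (1000, 4000),
}


def log2_ratio(treatment, control):
    """log2 of the treatment/control ratio; up and down changes become symmetric."""
    return math.log2(treatment / control)


log_ratios = {gene: log2_ratio(t, c) for gene, (t, c) in signals.items()}

for gene, (t, c) in signals.items():
    print(f"{gene}: ratio={t / c:g} log2 ratio={log_ratios[gene]:g}")
```

On the natural scale a 4x increase is 4 and a 4x decrease is 0.25; on the log2 scale they become +2 and -2, the same distance from zero.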
Fold change calculation
Columns of a normalised microarray data file:
Probe ID: link to gene ID
Signal value (natural or log2 scale)
  natural pixel intensity: divide (treatment / control)
  log2 pixel intensity: subtract (treatment - control)
Present/Absent call: is the gene expressed?
P-value: is the spot signal significantly different from background or a non-specific sequence?

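To illustrate why natural-scale signals are divided while log2-scale signals are subtracted, here is a small sketch. The probe IDs and signal values are hypothetical, and the function names are illustrative only.

```python
import math

# Hypothetical normalised rows: (probe ID, treatment signal, control signal), natural scale.
rows = [
    ("probe_1", 2600.0, 650.0),
    ("probe_2", 500.0, 1000.0),
    ("probe_3", 1200.0, 1200.0),
]


def fold_change_natural(treatment, control):
    """Natural-scale signals: divide treatment by control."""
    return treatment / control


def fold_change_log2(log2_treatment, log2_control):
    """log2-scale signals: subtract control from treatment."""
    return log2_treatment - log2_control


results = {}
for probe_id, t, c in rows:
    ratio = fold_change_natural(t, c)
    log_fc = fold_change_log2(math.log2(t), math.log2(c))
    results[probe_id] = (ratio, log_fc)
    print(f"{probe_id}: ratio={ratio:.2f} log2 fold change={log_fc:.2f}")
```

Both routes describe the same change: raising 2 to the log2 fold change gives back the natural-scale ratio.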
Microarray public repositories
Storage sites for hundreds of microarray experiments
Free public access
Publication > requirement to allow public access
Computational tools: mass analysis, information beyond fold change

CO-EXPRESSION ANALYSIS: WHY?
A cellular response is not mediated by the action of one gene/protein; rather, it requires the coordinated action of many (100s-1000s) proteins
The cell coordinates expression of functionally related genes
Genes involved in a specific developmental stage or a specific stress response are coordinately expressed at the same time

Co-expression analysis: identify genes/proteins that function in a common cellular response
Unknown gene as driver gene: if it is co-expressed with genes known to function in a well-defined process (e.g. pathogen defence), the unknown gene may also function in that process
Well-known gene as driver gene (e.g. drought response): identify other co-expressed genes that function in the pathway

Arabidopsis co-expression tool
http://www.arabidopsis.leeds.ac.uk/act/
Expression correlation analysis: identify co-expressed genes
Read: Frequently asked questions
Pearson correlation coefficient: measures the strength and direction of a linear relationship between the X and Y variables

Uses microarray hybridisation signal intensities to calculate a Pearson correlation coefficient (r value)
Range: +1.0 (perfect positive) to -1.0 (perfect negative)
Figure panels: Gene A, Gene B, Gene C

Scale-invariant measure of expression similarity: determines the strength and direction of the linear relationship between the reference gene (GOI) and all other Arabidopsis genes on the selected array
Intensity independent
Based on trend

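A short sketch of why the Pearson r value is intensity independent. The three expression profiles are hypothetical, and `pearson_r` is a plain textbook implementation written for illustration, not the code used by the co-expression tool.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical signal intensities over four experiments.
gene_a = [100.0, 200.0, 400.0, 300.0]
gene_b = [10 * v for v in gene_a]        # ten times the signal, same trend
gene_c = [400.0, 300.0, 100.0, 200.0]    # opposite trend

print(f"A vs B: r = {pearson_r(gene_a, gene_b):+.2f}")
print(f"A vs C: r = {pearson_r(gene_a, gene_c):+.2f}")
```

Gene B is expressed ten times more strongly than gene A but follows the same trend, so the two correlate perfectly; gene C follows the opposite trend and gives a perfect negative correlation.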
Correlation analysis performed using over 100 microarray experiments
Identifies truly co-expressed genes under common transcriptional regulatory control
A single experiment can activate multiple regulatory networks
Over 100 experiments: co-expression likely indicates co-regulation

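The driver-gene approach can be sketched as ranking every other gene by its r value against the driver across all experiments. Everything below is hypothetical (gene names, six experiments instead of over 100, signal values), and the functions are illustrative rather than the tool's own implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length signal profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical signal intensities, one value per experiment.
profiles = {
    "driver": [120.0, 480.0, 950.0, 200.0, 640.0, 310.0],
    "gene_1": [60.0, 250.0, 470.0, 110.0, 330.0, 150.0],    # follows the driver
    "gene_2": [900.0, 400.0, 150.0, 800.0, 300.0, 650.0],   # opposite trend
    "gene_3": [510.0, 480.0, 505.0, 505.0, 510.0, 490.0],   # no clear relationship
}


def rank_coexpressed(profiles, driver_id):
    """Return (gene, r) pairs for all genes except the driver, highest r first."""
    driver = profiles[driver_id]
    scores = [
        (gene, pearson_r(driver, values))
        for gene, values in profiles.items()
        if gene != driver_id
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = rank_coexpressed(profiles, "driver")
for gene, r in ranking:
    print(f"{gene}: r = {r:+.2f}")
```

Genes at the top of the ranking are candidates for sharing a function or regulatory control with the driver gene; with only a handful of experiments a high r value is weak evidence, which is why the analysis uses a large number of experiments.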
Gene of interest = PR-2
Gene ID = AT3G57260
Need probe ID: use ID exchanger
Arrays: AtGenome1 (8k), ATH1-121501 (22k)
Input = 251625_at

Genevestigator
www.genevestigator.ethz.ch
Use to identify experiments where gene/s are differentially expressed

ARABIDOPSIS
Nottingham Arabidopsis Stock Centre's microarray database (NASCArrays)
http://affymetrix.arabidopsis.info/narrays/experimentbrowse.pl
MULTIPLE ORGANISMS
Array Express
http://www.ebi.ac.uk/microarray-as/ae/
Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/

Some experiments are on multiple sites, some only on one site
Option to download raw and normalised / processed data
Procedure is different for each site
Important information:
  experiment ID
  matching slides (essential)
  platform probe IDs (essential)
  experimental details
  associated publications

NASCArrays
Example: Cycloheximide, NASCARRAYS-189
Search by ID or term
Slides in this Experiment
* Download available .cel and .chp files
* Add all of these slides to the Slide Selection
* Download all of the ATH1 chip data for this experiment

Array Express
Processed data = normalised

GEO Series Matrix File(s)
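A minimal sketch of reading the data table out of a Series Matrix file, assuming the usual layout: metadata lines starting with "!", and a tab-delimited table (header row beginning with ID_REF) between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The function name, the sample IDs and all signal values in the example text are hypothetical; downloaded files are normally gzip-compressed and would need decompressing first.

```python
import csv
import io


def parse_series_matrix(text):
    """Return (sample_ids, {probe_id: [signal or None, ...]}) from Series Matrix text."""
    table_lines = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            break
        elif in_table:
            table_lines.append(line)
    if not table_lines:
        raise ValueError("no data table found")

    # csv.reader also strips the double quotes around IDs.
    reader = csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]

    signals = {}
    for row in reader:
        if not row:
            continue
        values = []
        for cell in row[1:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)  # empty or non-numeric cell
        signals[row[0]] = values
    return sample_ids, signals


# Hypothetical file content, for illustration only.
example = "\n".join([
    '!Series_title\t"hypothetical example"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM0000001"\t"GSM0000002"',
    '"251625_at"\t812.4\t95.1',
    '"probe_x_at"\t40.2\t',
    "!series_matrix_table_end",
])

sample_ids, signals = parse_series_matrix(example)
print(sample_ids)
print(signals["251625_at"])
```

The probe IDs in the first column are what link each row back to a gene ID, which is why the platform probe IDs are essential information to record.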
Presentation of results
Present as figures
Excel graphs: individual genes or a few genes

HEAT MAPS
Illustrate results for many genes over multiple experiments

HEAT MAPS
Download TIGR MultiExperiment Viewer (TMEV)
http://www.tm4.org/mev/