Course on Functional Analysis ::: Gene Set Enrichment Analysis (GSEA)


1 Course on Functional Analysis ::: Madrid, June 31st, Gonzalo Gómez, PhD. Bioinformatics Unit CNIO
2 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
4 ::: Introduction. GSEA, MIT Broad Institute. v 2.0 available since Jan 2007; v available since Feb 16th 2007. Version 2.0 includes BioCarta, Broad Institute, GenMAPP, KEGG annotations and more... Platforms: Affymetrix, Agilent, CodeLink, custom... (Subramanian et al. PNAS )
5 ::: Introduction. ::: How does GSEA work? GSEA applies a Kolmogorov-Smirnov-like test to find asymmetrical distributions of defined blocks of genes within the dataset's whole distribution. Is this particular gene set enriched in my experiment? Gene sets can be genes selected by the researcher, BioCarta pathways, GenMAPP sets, genes sharing a cytoband, genes targeted by common miRNAs... up to you.
6 ::: Introduction. ::: KS test The Kolmogorov-Smirnov test is used to determine whether two underlying one-dimensional probability distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution, in either case based on finite samples. The one-sample KS test compares the empirical distribution function with the cumulative distribution function specified by the null hypothesis. The main applications are testing goodness of fit with the normal and uniform distributions. The two-sample KS test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. [Figure: dataset distribution compared with the Gene set 1 and Gene set 2 distributions; axes: Gene Expression Level, Number of genes.]
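As a rough illustration of the two-sample KS test described on this slide, the sketch below computes the KS statistic D, the largest gap between two empirical cumulative distribution functions. The function name and the expression values are hypothetical, and no p-value is computed.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D = max |F_a(x) - F_b(x)| over all observed values x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The gap between two step functions can only change at observed values,
    # so it is enough to evaluate both empirical CDFs there.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical expression levels: whole dataset vs. the genes of one gene set.
dataset = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
gene_set = [1.5, 1.7, 2.0]
print(ks_two_sample_statistic(dataset, gene_set))
```

A value of D near 0 means the gene set is spread like the whole dataset; a value near 1 means its members are concentrated at one end of the distribution.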
7 ::: Introduction. ::: How does GSEA work? Class A vs. Class B: testing genes independently... t-test, cutoff FDR<0.05. Biological meaning?
8 ::: Introduction. ::: How does GSEA work? [Figure: Class A vs. Class B; Gene Set 1, Gene Set 2, Gene Set 3; t-test cutoff; ES/NES statistic.] Gene set 2 enriched in Class A. Gene set 3 enriched in Class B.
9 ::: Introduction. ES examples :::
10 ::: Introduction. The Enrichment Score ::: NES, p-value, FDR (Benjamini-Hochberg)
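The slide names Benjamini-Hochberg for the FDR. The sketch below shows the generic Benjamini-Hochberg adjustment of a list of p-values, only to make the procedure concrete. The function name and p-values are hypothetical, and GSEA's own FDR computation may differ from this generic procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four gene sets.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```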
12 ::: GSEA software. Download :::
13 ::: GSEA software. Main Window :::
14 ::: GSEA software. Loading data :::!!!
15 ::: GSEA software. Running GSEA :::
16 ::: GSEA software. Leading Edge Analysis :::
17 ::: GSEA software. MSigDB ::: Chip to Chip Mapping :::
19 ::: Data Formats.
20 ::: Data Formats.
21 ::: Data Formats. Expression datasets ::: *.gct
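The slide gives only the file extension. As a minimal sketch, assuming the tab-delimited GCT layout documented by the Broad Institute (a `#1.2` version line, a line with the row and column counts, a header starting with `NAME` and `Description`, then one row per gene), a file could be read as follows. The function name and the sample data are hypothetical.

```python
def parse_gct(text):
    """Parse GCT text into (sample_names, {gene_name: [values]})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines[0].strip() != "#1.2":
        raise ValueError("expected '#1.2' on the first line")
    # Second line: number of data rows, then number of samples.
    n_rows, n_cols = (int(x) for x in lines[1].split()[:2])
    # Third line: NAME, Description, then one column per sample.
    samples = lines[2].split("\t")[2:]
    data = {}
    for ln in lines[3:]:
        fields = ln.split("\t")
        # fields[0] is the gene/probe name, fields[1] its description.
        data[fields[0]] = [float(v) for v in fields[2:]]
    if len(data) != n_rows or len(samples) != n_cols:
        raise ValueError("row/column counts do not match the second line")
    return samples, data


example = (
    "#1.2\n"
    "2\t3\n"
    "NAME\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tna\t1.0\t2.5\t3.0\n"
    "GENE_B\tna\t0.5\t0.1\t4.2\n"
)
samples, data = parse_gct(example)
print(samples, data)
```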
22 ::: Data Formats. Expression datasets ::: *.res
23 ::: Data Formats. Expression datasets ::: *.pcl
24 ::: Data Formats. Expression datasets ::: *.txt
25 ::: Data Formats. Phenotype datasets ::: *.cls For categorical phenotypes (e.g. Tumor vs Control)
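For the categorical case, a minimal sketch of reading a CLS file is shown below. It assumes the three-line layout documented by the Broad Institute: a line with the sample count, the class count and a trailing 1; a `#` line with the class names; and a line with one label per sample. Labels are mapped to class names in order of first appearance, which is also an assumption of this sketch. The function name is hypothetical.

```python
def parse_cls(text):
    """Return the class name of each sample from categorical CLS text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    class_names = lines[1].lstrip("#").split()
    labels = lines[2].split()
    # Distinct labels, in the order in which they first appear.
    seen = []
    for label in labels:
        if label not in seen:
            seen.append(label)
    if len(labels) != n_samples:
        raise ValueError("number of labels differs from the header")
    if len(seen) != n_classes or len(class_names) != n_classes:
        raise ValueError("number of classes differs from the header")
    # Assumption: the first label seen belongs to the first class name, etc.
    mapping = dict(zip(seen, class_names))
    return [mapping[label] for label in labels]


example = "6 2 1\n# Tumor Control\n0 0 0 1 1 1\n"
print(parse_cls(example))
```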
26 ::: Data Formats. Phenotype datasets ::: For continuous phenotypes (e.g. gene correlated to gene set; gene vs. time series). Time series (every 30 minutes). Peak profile wanted.
27 ::: Data Formats. Gene Set Database ::: *.gmx
28 ::: Data Formats. Gene Set Database ::: *.gmt
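A minimal sketch of reading a gene set database in GMT form, assuming the usual tab-delimited layout of one gene set per row (set name, description, then the member genes). The function name and the two example sets are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text into {gene_set_name: [genes]}."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("each row needs at least a name and a description")
        name, genes = fields[0], fields[2:]
        # Drop empty cells left by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


example = (
    "SET_ONE\tfirst hypothetical set\tTP53\tMDM2\tCDKN1A\n"
    "SET_TWO\tsecond hypothetical set\tMYC\tMAX\n"
)
print(parse_gmt(example))
```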
29 ::: Data Formats. Other formats::: *.chip *.grp
30 ::: Data Formats. Ranked list format ::: *.rnk
32 ::: Using GSEA. Loading data :::
33 ::: Using GSEA. Loading data :::
34 ::: Using GSEA. Running GSEA :::
35 ::: Using GSEA. ::: MSigDB. gsea_home
36 ::: Using GSEA. Running GSEA ::: 1. Choose true (default) to have GSEA collapse each probe set in your expression dataset into a single gene vector, which is identified by its HUGO gene symbol. In this case, you are using HUGO gene symbols for the analysis. The gene sets that you use for the analysis must use HUGO gene symbols to identify the genes in the gene sets. 2. Choose false to use your expression dataset "as is." In this case, you are using the probe identifiers that are in your expression dataset for the analysis. The gene sets that you use for the analysis must also use these probe identifiers to identify the genes in the gene sets.
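To make the "collapse" option concrete, the sketch below merges several probe sets that map to the same gene symbol into one gene vector. The slide does not say how probes are combined; taking the per-sample maximum is an assumption of this sketch, and the function name, probe identifiers and mapping are hypothetical.

```python
def collapse_probes(expression, probe_to_symbol):
    """Collapse {probe_id: [values]} into {gene_symbol: [values]}.

    Assumption: probes sharing a symbol are combined by taking the
    maximum value in each sample.
    """
    collapsed = {}
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue  # probes without a gene symbol are dropped
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


# Hypothetical probe-level data for three samples.
expression = {
    "probe_1": [1.0, 5.0, 2.0],
    "probe_2": [3.0, 4.0, 1.0],
    "probe_3": [7.0, 7.5, 8.0],
    "probe_4": [0.1, 0.2, 0.3],
}
probe_to_symbol = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "MYC"}
print(collapse_probes(expression, probe_to_symbol))
```

With the option set to false, this step would simply be skipped and the gene sets would have to use the probe identifiers.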
37 ::: Using GSEA. Running GSEA ::: Phenotype Gene Sets (few samples)
38 ::: Using GSEA. Running GSEA :::
39 ::: Using GSEA. Chip2Chip mapping ::: Chip2Chip translates the gene identifiers in a gene set from HUGO gene symbols to the probe identifiers for a selected DNA chip.
40 ::: Using GSEA. Enrichment statistic ::: To calculate the enrichment score, GSEA first walks down the ranked list of genes, increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is not. The enrichment score is the maximum deviation from zero encountered during that walk. This parameter affects the running-sum statistic used for the analysis.
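The walk described on this slide can be sketched as follows. The step sizes are an assumption of this sketch: hits add a share proportional to |score| raised to `weight` (so `weight=0` gives equal steps), and misses subtract an equal share each. The function name, gene names and scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes and scores are parallel lists, already sorted by score.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    # Step size for each hit; zero for genes outside the set.
    hit_steps = [abs(s) ** weight if hit else 0.0 for s, hit in zip(scores, in_set)]
    total = sum(hit_steps)
    if total == 0:
        raise ValueError("all hit weights are zero")
    running = 0.0
    es = 0.0
    for step, hit in zip(hit_steps, in_set):
        # Increase the running sum on a hit, decrease it on a miss.
        running += step / total if hit else -1.0 / n_miss
        # Keep the largest deviation from zero seen so far (with its sign).
        if abs(running) > abs(es):
            es = running
    return es


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(enrichment_score(genes, scores, {"g1", "g2"}, weight=0))
print(enrichment_score(genes, scores, {"g5", "g6"}, weight=0))
```

A positive score means the set's genes cluster at the top of the ranked list, a negative score that they cluster at the bottom.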
41 ::: Using GSEA. Ranking Metric ::: Categorical phenotypes: Signal2Noise, t-test, Ratio of Classes, Diff of Classes, Log2_Ratio_of_Classes. Continuous phenotypes: Pearson (time series), Cosine, Manhattan, Euclidean.
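As an illustration of one of these metrics, the sketch below ranks genes by a plain signal-to-noise ratio, the difference of class means divided by the sum of class standard deviations. This is a simplified form: any adjustments the GSEA software applies to the standard deviations are not reproduced here, and the function name and expression values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_A - mean_B) / (sd_A + sd_B), using sample standard deviations."""
    noise = stdev(values_a) + stdev(values_b)
    if noise == 0:
        raise ValueError("both classes have zero variance")
    return (mean(values_a) - mean(values_b)) / noise


# Hypothetical expression values: (class A samples, class B samples) per gene.
expression = {
    "GENE_A": ([10.0, 12.0, 14.0], [4.0, 5.0, 6.0]),
    "GENE_B": ([5.0, 6.0, 7.0], [9.0, 10.0, 11.0]),
}
metric = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
# Rank genes from most class-A-like to most class-B-like.
ranked = sorted(metric, key=metric.get, reverse=True)
print(ranked, metric)
```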
42 ::: Using GSEA. Ranking Metric :::
43 ::: Using GSEA. Ranking Metric :::
44 ::: Using GSEA. More parameters ::: real / abs: whether to sort the genes by the real or the absolute value of the ranking metric. A further parameter determines whether to sort the genes in descending (default) or ascending order.
45 ::: Using GSEA. Launching Analysis :::
47 ::: GSEA output. Results Accession ::: By default in gsea_home: C:\Documents and settings\username\gsea_home or /Users/yourhome/gsea_home
49 ::: GSEA results. Index.html ::: Heat map of the top 50 features for each phenotype and a plot showing the correlation between the ranked genes and the phenotypes. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).
50 ::: GSEA results. Enrichment results in html :::
51 ::: GSEA results. Enrichment results in html :::
52 ::: GSEA results. Enrichment results in html ::: How can I decide about my results? FDR < 0.25; NOM p-val < 0.05
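The two thresholds on this slide (0.25 for the FDR, 0.05 for the nominal p-value) can be applied to a results table as sketched below. Reading them as upper bounds is an assumption, as are the function name, the column keys `NAME`, `FDR q-val` and `NOM p-val`, and the example rows. Each criterion is reported separately, since the slide does not say whether both must hold.

```python
def flag_gene_sets(rows, fdr_cutoff=0.25, pval_cutoff=0.05):
    """Report, per gene set, which of the two thresholds it passes."""
    flags = {}
    for row in rows:
        flags[row["NAME"]] = {
            "fdr_ok": row["FDR q-val"] < fdr_cutoff,
            "pval_ok": row["NOM p-val"] < pval_cutoff,
        }
    return flags


# Hypothetical rows from an enrichment report.
rows = [
    {"NAME": "SET_ONE", "NOM p-val": 0.001, "FDR q-val": 0.10},
    {"NAME": "SET_TWO", "NOM p-val": 0.040, "FDR q-val": 0.30},
    {"NAME": "SET_THREE", "NOM p-val": 0.200, "FDR q-val": 0.60},
]
print(flag_gene_sets(rows))
```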
54 ::: GSEA results. Leading Edge Analysis :::
55 ::: GSEA results. Leading Edge Analysis ::: Heat Map, Set-to-Set, Gene in Subsets, Histogram
56 ::: GSEA results. Leading Edge Analysis ::: Heat Map The heat map shows the (clustered) genes in the leading edge subsets. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).
57 ::: GSEA results. Leading Edge Analysis ::: Set-to-Set The graph uses color intensity to show the overlap between subsets: the darker the color, the greater the overlap between the subsets. When you compare a leading edge subset to itself, its members completely overlap, so the corresponding cell is dark green. When you compare two subsets that have no overlapping members, the corresponding cell is white.
58 ::: GSEA results. Leading Edge Analysis ::: Gene in Subsets The graph shows each gene and the number of subsets in which it appears.
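The quantity plotted in this graph can be reproduced with a simple count, as in the sketch below. The function name and the leading edge subsets are hypothetical.

```python
from collections import Counter


def gene_subset_counts(leading_edge_subsets):
    """Count, for each gene, the number of leading edge subsets containing it."""
    counts = Counter()
    for genes in leading_edge_subsets.values():
        # set() ensures a gene is counted once per subset.
        counts.update(set(genes))
    return counts


# Hypothetical leading edge subsets of three gene sets.
subsets = {
    "SET_ONE": ["TP53", "MDM2", "CDKN1A"],
    "SET_TWO": ["TP53", "MYC"],
    "SET_THREE": ["TP53", "MDM2"],
}
print(gene_subset_counts(subsets).most_common())
```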
59 ::: GSEA results. Leading Edge Analysis ::: Histogram The last plot is a histogram, where the Jaccard index is the intersection divided by the union for a pair of leading edge subsets. Number of Occurrences is the number of leading edge subset pairs in a particular bin. In this example, most subset pairs have no overlap (Jaccard = 0).
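A minimal sketch of the histogram described on this slide: the Jaccard index (intersection divided by union) is computed for every pair of leading edge subsets and the pairs are counted per bin. The function names, the choice of ten equal-width bins and the example subsets are assumptions of this sketch.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_histogram(subsets, n_bins=10):
    """Count subset pairs per Jaccard bin; bin 0 starts at Jaccard = 0."""
    counts = [0] * n_bins
    for (_, a), (_, b) in combinations(subsets.items(), 2):
        j = jaccard(a, b)
        # A Jaccard index of exactly 1.0 goes into the last bin.
        counts[min(int(j * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical leading edge subsets.
subsets = {
    "S1": ["A", "B", "C"],
    "S2": ["B", "C", "D"],
    "S3": ["X", "Y"],
}
print(jaccard_histogram(subsets))
```

In this toy example two of the three pairs share no genes, so most of the mass sits in the first bin, as in the slide's example.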
60 ::: GSEA & FatiScan. FatiScan detects significant functions (Gene Ontology, InterPro motifs, Swiss-Prot keywords and KEGG pathways) in lists of genes ordered according to different characteristics.
61 ::: Thanks