1 Course on Functional Analysis ::: Madrid, June 31st, Gonzalo Gómez, PhD. Bioinformatics Unit CNIO
2 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
3 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
4 ::: Introduction. GSEA MIT Broad Institute v 2.0 available since Jan 2007 v available since Feb 16th 2007 Version 2.0 includes Biocarta, Broad Institute, GeneMAPP, KEGG annotations and more... Platforms: Affymetrix, Agilent, CodeLink, custom... (Subramanian et al. PNAS )
5 ::: Introduction. ::: How works GSEA? GSEA applies Kolmogorov-Smirnof test to find assymmetrical distributions for defined blocks of genes in datasets whole distribution. Is this particular Gene Set enriched in my experiment? Genes selected by researcher, Biocarta pathways, GeneMAPP sets, genes sharing cytoband, genes targeted by common mirnas up to you
6 ::: Introduction. ::: K-S test The Kolmogorov Smirnov test is used to determine whether two underlying one-dimensional probability distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution, in either case based on finite samples. The one-sample KS test compares the empirical distribution function with the cumulative distribution functionspecified by the null hypothesis. The main applications are testing goodness of fit with the normal and uniform distributions. The two-sample KS test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. Dataset distribution Gene set 1 distribution Gene set 2 distribution Number of genes Gene Expression Level
7 ::: Introduction. ClassA ClassB ::: How works GSEA? FDR< testing genes independently... ttest cut-off FDR<0.05 Biological meaning?
8 ::: Introduction. ::: How works GSEA? - ClassA ClassB Gene Set 1 Gene Set 2 Gene Set 3 Gene set 3 enriched in Class B ttest cut-off ES/NES statistic Gene set 2 enriched in Class A +
9 ::: Introduction. ES examples :::
10 ::: Introduction. The Enrichment Score ::: NES pval FDR Benjamini-Hochberg
11 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
18 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
19 ::: Data Formats.
20 ::: Data Formats.
21 ::: Data Formats. Expression datasets ::: *.gct
22 ::: Data Formats. Expression datasets ::: *.res
23 ::: Data Formats. Expression datasets ::: *.pcl
24 ::: Data Formats. Expression datasets ::: *.txt
25 ::: Data Formats. Phenotype datasets ::: *.cls For categorical phenotypes (e.g. Tumor vs Control)
26 ::: Data Formats. Phenotype datasets ::: For continuous phenotypes (e.g. Gene correlated to GeneSet) Time serie (each 30 minutes) Peak profile wanted For continuous phenotypes (e.g. Gene vs Time Series)
27 ::: Data Formats. Gene Set Database ::: *.gmx
28 ::: Data Formats. Gene Set Database ::: *.gmt
29 ::: Data Formats. Other formats::: *.chip *.grp
30 ::: Data Formats. Ranked list format ::: *.rnk
31 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
32 ::: Using GSEA. Loading data :::
33 ::: Using GSEA. Loading data :::
34 ::: Using GSEA. Running GSEA :::
35 ::: Using GSEA. ::: MSigDB. gsea_home
36 ::: Using GSEA. Running GSEA ::: 1. Choose true (default) to have GSEA collapse each probe set in your expression dataset into a single gene vector, which is identified by its HUGO gene symbol. In this case, you are using HUGO gene symbols for the analysis. The gene sets that you use for the analysis must use HUGO gene symbols to identify the genes in the gene sets. 2. Choose false to use your expression dataset "as is." In this case, you are using the probe identifiers that are in your expression dataset for the analysis. The gene sets that you use for the analysis must also use these probe identifiers to identify the genes in the gene sets.
39 ::: Using GSEA. Chip2Chip mapping ::: Chip2Chip translates the gene identifiers in a gene sets from HUGO gene symbols to the probe identifiers for a selected DNA chip.
40 ::: Using GSEA. Enrichment statistic ::: To calculate the enrichment score, GSEA first walks down the ranked list of genes increasing a running-sum statistic when a gene is in the gene set and decreasing it when it is not. The enrichment score is the maximum deviation from zero encountered during that walk. This parameter affects the running-sum statistic used for the analysis.
41 ::: Using GSEA. Ranking Metric ::: Signal2Noise ttest Cosine Euclidean Manhatten Pearson (time series) Ratio of Classes Diff of Classes Log2_Ratio_of_Classes Categorical phenotypes Continuous phenotypes
42 ::: Using GSEA. Ranking Metric :::
43 ::: Using GSEA. Ranking Metric :::
44 ::: Using GSEA. More parameters ::: real abs parameter to determine whether to sort the genes in descending (default) or ascending order.
45 ::: Using GSEA. Launching Analysis :::
46 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
47 ::: GSEA output. By default in gsea_home Results Accession ::: C:\Documents and settings\username\gsea_home /Users/yourhome/gsea_home
48 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
49 ::: GSEA results. Index.html ::: Heat map of the top 50 features for each phenotype and a plot showing the correlation between the ranked genes and the phenotypes. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).
50 ::: GSEA results. Enrichment results in html :::
51 ::: GSEA results. Enrichment results in html :::
52 ::: GSEA results. Enrichment results in html ::: How can I decide about my results? FDR 0.25 NOM p-val 0.05
53 ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA Output 6. GSEA Results 7. Leading Edge Analysis
54 ::: GSEA results. Leading Edge Analysis :::
55 ::: GSEA results. Leading Edge Analysis ::: HeatMap Set-to-Set Histogram Gene in Subsets
56 ::: GSEA results. Leading Edge Analysis ::: Heat Map The heat map shows the (clustered) genes in the leading edge subsets. In a heat map, expression values are represented as colors, where the range of colors (red, pink, light blue, dark blue) shows the range of expression values (high, moderate, low, lowest).
57 ::: GSEA results. Leading Edge Analysis ::: Set-to-Set The graph uses color intensity to show the overlap between subsets: the darker the color, the greater the overlap between the subsets.. When you compare a leading edge subset to itself, its members completely overlap so the corresponding cell is dark green. When you compare two subsets that have no overlapping members, the corresponding cell is white.
58 ::: GSEA results. Leading Edge Analysis ::: Gene in Subsets The graph shows each gene and the number of subsets in which it appears.
59 ::: GSEA results. Leading Edge Analysis ::: Histogram The last plot is a histogram, where the Jacquard is the intersection divided by the union for a pair of leading edge subsets. Number of Occurrences is the number of leading edge subset pairs in a particular bin. In this example, most subset pairs have no overlap (Jacquard = 0).
60 ::: GSEA & FatiScan. Detects significant functions with Gene Ontology InterPro motifs, Swissprot KW and KEGG pathways in lists of genes ordered according to differents characteristics.
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Microarray Data Analysis Using Partek Genomic Suite Xiaowen Wang Field Application Specialist Partek Inc. Who is Partek? Founded in 1993 Building tools for statistics & visualization Focused on genomics
Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,
Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,
Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
Combination of Multivariate Analysis and KEGG Data Mapping in GeneSpring 13 Technical Overview Authors Nilanjan Guha, Deepak SA, Arunkumar Padmanaban, Syed Lateef, and Seetaramanjaneyulu Gundimeda Agilent
MANTRA 2.0 TUTORIAL mantra.tigem.it OUTLINE 1. MANTRA Web Tool 2. Analysis a) New Experiment b) New Node c) GSEA 3. Network a) View b) Button Panel 4. Search 5. In Summary 6. Conclusion OUTLINE 1. MANTRA
Methods for network visualization and gene enrichment analysis July 17, 2013 Jeremy Miller Scientist I email@example.com Outline Visualizing networks using R Visualizing networks using outside
Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Universidad Complutense de Madrid ESCUELA DE VERANO 2007 Introducción al análisis de datos en microarrays 1. Introducción 2. Microarrays (tipos,
Chapter 45 Non-Inferiority ests for One Mean Introduction his module computes power and sample size for non-inferiority tests in one-sample designs in which the outcome is distributed as a normal random
Identification of rheumatoid arthritis and osterthritis patients by transcriptome-based rule set generation Bering Limited Report generated on September 19, 2014 Contents 1 Dataset summary 2 1.1 Project
MultiExperiment Viewer Quickstart Guide Table of Contents: I. Preface - 2 II. Installing MeV - 2 III. Opening a Data Set - 2 IV. Filtering - 6 V. Clustering a. HCL - 8 b. K-means - 11 VI. Modules a. T-test
Minería de Datos ANALISIS DE UN SET DE DATOS! Visualization Techniques! Combined Graph! Charts and Pies! Search for specific functions Data Mining on the DAG ü When working with large datasets, annotation
Using CrunchIt (http://bcs.whfreeman.com/crunchit/bps4e) or StatCrunch (www.calvin.edu/go/statcrunch) 1. In general, this package is far easier to use than many statistical packages. Every so often, however,
Gene Expression Analysis of a Down s Syndrome Study Using Partek Genomics Suite 6.6 This tutorial will illustrate how to: Import Affymetrix CEL files and check quality Add attributes describing the sample
HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER
MLECNIK Bernhard & BINDEA Gabriela Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research
0BComparativeMarkerSelection Documentation Description: Author: Computes significance values for features using several metrics, including FDR(BH), Q Value, FWER, Feature-Specific P-Value, and Bonferroni.
Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. firstname.lastname@example.org
TIPS FOR DOING STATISTICS IN EXCEL Before you begin, make sure that you have the DATA ANALYSIS pack running on your machine. It comes with Excel. Here s how to check if you have it, and what to do if you
Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
The microarray block Bioinformatics 13-17 March 006 Microarray data analysis John Gustafsson Mathematical statistics Chalmers Lectures DNA microarray technology overview (KS) of microarray data (JG) How
Extracting Biological Information from Gene Lists Simon Andrews, Laura Biggins, Boo Virk email@example.com firstname.lastname@example.org email@example.com v1.0 Biological material Sample
Spearman s correlation Introduction Before learning about Spearman s correllation it is important to understand Pearson s correlation which is a statistical measure of the strength of a linear relationship
Step-by-Step Guide to Basic Expression Analysis and Normalization Page 1 Introduction This document shows you how to perform a basic analysis and normalization of your data. A full review of this document
1 Biomedicine The background The main interest? Bioinformatics Clinical informatics The tools 2 Outline 3 Outline 4 Working on Network Data Analysis HH RR Infrastructure Training BIOCOMPUTATION & STRUCTURAL
Let s explore SAS Proc T-Test Ana Yankovsky Research Statistical Analyst Screening Programs, AHS Ana.Yankovsky@albertahealthservices.ca Goals of the presentation: 1. Look at the structure of Proc TTEST;
t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. firstname.lastname@example.org www.excelmasterseries.com
Exiqon Array Software Manual Quick guide to data extraction from mircury LNA microrna Arrays March 2010 Table of contents Introduction Overview...................................................... 3 ImaGene
A Streamlined Workflow for Untargeted Metabolomics Employing XCMS plus, a Simultaneous Data Processing and Metabolite Identification Software Package for Rapid Untargeted Metabolite Screening Baljit K.
Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction
Basic Data Analysis Using JMP in Windows Table of Contents: I. Getting Started with JMP II. Entering Data in JMP III. Saving JMP Data file IV. Opening an Existing Data File V. Transforming and Manipulating
Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
1 ED632G: Research/Applied Educational Psychology This tutorial is designed to help ED632G students have a better understanding on how to run a general pre-test vs. posttest or improvement over semesters
How to choose a statistical test Francisco J. Candido dos Reis DGO-FMRP University of São Paulo Choosing the right test One of the most common queries in stats support is Which analysis should I use There
THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i
Chapter 610 Multiple One-Sample or Paired T-Tests Introduction This chapter describes how to estimate power and sample size (number of arrays) for paired and one sample highthroughput studies using the.
Lecture 11 Data storage and LIMS solutions Stéphane LE CROM email@example.com Various steps of a DNA microarray experiment Experimental steps Data analysis Experimental design set up Chips on catalog
HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended
One-Sample t-test Example 1: Mortgage Process Time Problem A faster loan processing time produces higher productivity and greater customer satisfaction. A financial services institution wants to establish
Medical Information Management & Mining You Chen Jan,15, 2013 You.firstname.lastname@example.org 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the
Politecnico di Milano School of Information Engineering Laurea Magistrale in Information Engineering Course Bioinformatics and Computational Biology for Medicine Exercises about DAVID (Database for Annotation,
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
Data Visualization Best Practices Guide 1 Data Visualization Best Practices Why visualize data? Page 3 #1 Choose the right chart type Tell the story in your data Page 4-11 #2 #3 #4 Format style Make your
Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition Online Learning Centre Technology Step-by-Step - Excel Microsoft Excel is a spreadsheet software application
Chapter 2 Probability Topics SPSS T tests Data file used: gss.sav In the lecture about chapter 2, only the One-Sample T test has been explained. In this handout, we also give the SPSS methods to perform
Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from
EXCEL Analysis TookPak [Statistical Analysis] 1 First of all, check to make sure that the Analysis ToolPak is installed. Here is how you do it: a. From the Tools menu, choose Add-Ins b. Make sure Analysis
Microarray Data Mining: Dealing with Challenges Gregory Piatetsky-Shapiro KDnuggets 2005 KDnuggets Dec 8, 2005 DNA and Gene Expression Cell Nucleus Chromosome Gene expression Protein Gene (mrna), single
g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric
1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Chapter 7 Appendix Inference for Distributions with Excel, JMP, Minitab, SPSS, CrunchIt!, R, and TI-83/-84 Calculators Inference for the Mean of a Population Excel t Confidence Interval for Mean Confidence
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:
BIOSTATISTICS QUIZ ANSWERS 1. When you read scientific literature, do you know whether the statistical tests that were used were appropriate and why they were used? a. Always b. Mostly c. Rarely d. Never
Guide for SPSS for Windows Index Table Open an existing data file Open a new data sheet Enter or change data value Name a variable Label variables and data values Enter a categorical data Delete a record
3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.
Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Minitab Guide This packet contains: A Friendly Guide to Minitab An introduction to Minitab; including basic Minitab functions, how to create sets of data, and how to create and edit graphs of different
SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list
Testing Random- Number Generators Raj Jain Washington University Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse574-08/