Time series experiments

Time series experiments
Why is this a separate lecture?
- The price of microarrays is decreasing, so more time series experiments are coming
- The experimental design is often more complex
- Many time points make this a large experiment
- The analysis method is often determined by what you are looking for

Time series experiments
Why consider doing a time series experiment?
- Biology is dynamic
- Genes can change their expression in different ways over time
- You can observe cascades and secondary effects

Time series experiments: Peaks
[Figure: expression level plotted against time, showing peaks]

Time series experiments: Level change

Experimental design (Ex.d.)
Time series designs are often larger and more complex. What to consider when designing a time series study:
- Time points
- Distance between time points
- Replicates
- Control
- Technology

Ex.d.: Time points and the distance between them
How to choose:
- What is already known about the system or genes
- Theoretical and experimental knowledge
- Early or late response
- The direction of the response
- Whether changes in some genes will influence others
- What is your field of interest

Ex.d.: Replicates
How many replicates to use:
- Time series often have many time points but few replicates
- There should be enough replicates to do the analysis you want

Ex.d.: Control
What to use as a control, and how many controls?
- One control point or one control time series
- Differences between treatments
- The other time points as a control

Ex.d.: Control
One control:
- Use one time point or an untreated sample as the control
- OK if no other time effects are expected, such as growth or phase of expression
An untreated time series as control:
- Same time points, technology, sample source and handling, except for the treatment
- Finds genes that change due to treatment over time, and filters out some of the effects due to handling and time

Ex.d.: Control
Treated:   T1   T2   T3   T4
Control:   Tc1  Tc2  Tc3  Tc4
- Vertically: differences due to treatment, with the effect of handling reduced
- Horizontally: differences due to treatment, time and some handling effect

Ex.d.: Control
Differences between treatments: compare two or more time series. They should have the same:
- Time points
- Conditions
- Sample source
- Technology
- Handling

Ex.d.: Control
The other time points as a control (T1, T2, T3, T4):
- Shows changes due to treatment and time, but does not exclude the effect of handling
- Fewer arrays used
- Shows short-term and long-term effects
- OK if no other time effects are expected, such as growth or phase of expression

Ex.d.: Technology
What technology to choose:
- Homemade or commercial
- One channel
- Two channel: reference design is recommended
- What comparisons are to be done
- Budget

Experimental design
Time series designs are often larger and more complex. What to consider when designing a time series study:
- Time points
- Distance between time points
- Replicates
- Control
- Technology

Preprocessing
The same steps as for non-time-series microarray experiments, but not always the same algorithms or the same use of the algorithms:
- QC
- Filtering
- Normalization
- ...

Analysis of time series data
- Clustering
- T test between time points
- PCA
- PLS
- Cyclic
- Profile
- ...

Analysis: Clustering
- Filter the data set before clustering
- Cluster the genes
- Use correlation as a distance measure (as a starting point)
- There is no right number of clusters; a priori knowledge is needed
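As an illustration of clustering genes with correlation as the distance measure, here is a minimal Python sketch. The slides do not name a clustering algorithm; average-linkage hierarchical clustering from SciPy is an assumption made for this example, and the function name and the toy data are hypothetical. The number of clusters is passed in by the user, since it has to come from prior knowledge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_genes_by_correlation(expr, n_clusters):
    """Cluster genes (rows) by the shape of their time profile.

    expr: array of shape (n_genes, n_timepoints), already filtered so that
    flat (zero-variance) genes have been removed.
    Returns one integer cluster label per gene.
    """
    # Correlation distance = 1 - Pearson correlation between two profiles.
    dist = 1.0 - np.corrcoef(expr)
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    upper = np.triu_indices_from(dist, k=1)
    condensed = np.clip(dist[upper], 0.0, 2.0)  # guard against rounding noise
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most n_clusters clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (hypothetical): two rising genes and two falling genes.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.0],
])
labels = cluster_genes_by_correlation(expr, n_clusters=2)
```

Because the distance is based on correlation, genes 1 and 2 end up in the same cluster even though their absolute expression levels differ; only the shape of the profile matters.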

Analysis: Clustering

Analysis: T test
- One time point vs. another time point
- You have 3 time points: T1, T2 and T3
- What should you compare?

Analysis: T test
- Early response: T1 vs. T2
- Late response: T2 vs. T3
[Figure: Venn diagram of the gene lists from (T1 vs. T2) and (T2 vs. T3), marking the genes in common and not in common]

Analysis: T test
The smaller changes: T1 vs. T3
[Figure: diagram with genes g1 and g3 between T1 and T2, g2 and g3 between T2 and T3, and g3 between T1 and T3]

Analysis: T test
- If you have a large time series, it can be time-consuming to do this
- It is hard to select which time points to compare
- Solution: compare all consecutive pairs of time points and then make a profile based on this
- Other types of analysis: PCA, PLS
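The idea of comparing all consecutive pairs of time points and building a profile from the results can be sketched as follows. This is only one possible reading of the slide: the choice of Welch's t test, the 0.05 threshold, the +1/0/-1 encoding and the function name are assumptions for this example, and multiple testing correction is left out for brevity.

```python
import numpy as np
from scipy.stats import ttest_ind


def consecutive_change_profile(data, alpha=0.05):
    """Build an up/down/no-change profile per gene.

    data: array of shape (n_genes, n_timepoints, n_replicates).
    Returns an int array of shape (n_genes, n_timepoints - 1) holding
    +1 (significantly up), -1 (significantly down) or 0 (no significant
    change) for each consecutive pair of time points.
    """
    n_genes, n_time, _ = data.shape
    profile = np.zeros((n_genes, n_time - 1), dtype=int)
    for t in range(n_time - 1):
        before, after = data[:, t, :], data[:, t + 1, :]
        # Welch's t test per gene (rows), across replicates (axis=1).
        t_stat, p_val = ttest_ind(after, before, axis=1, equal_var=False)
        significant = p_val < alpha
        profile[:, t] = np.where(significant, np.sign(t_stat), 0).astype(int)
    return profile


# Toy data (hypothetical): 3 genes, 3 time points, 3 replicates each.
low, high, mid = [1.0, 1.1, 0.9], [5.0, 5.1, 4.9], [2.0, 2.1, 1.9]
data = np.array([
    [low, high, high],   # early response: up between T1 and T2
    [mid, mid, mid],     # no change
    [high, high, low],   # late response: down between T2 and T3
])
profile = consecutive_change_profile(data)
```

Genes with the same profile row (for example "up, then flat") can then be grouped together, which avoids having to pick individual pairs of time points by hand.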

PCA
Projecting the genes or samples into a low-dimensional space that displays most of the variance in the high-dimensional data, i.e. the space spanned by PC1 and PC2
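A minimal sketch of such a projection, computed with a singular value decomposition in NumPy. The function name and the toy matrix are hypothetical; real analyses would normally use an established package, and scaling choices (centring only, as here, or also standardising) are left to the analyst.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the first principal components.

    X: array of shape (n_items, n_features). Use genes as rows to project
    genes, or transpose the matrix to project samples.
    Returns (scores, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)  # centre each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # coordinates on PC1, PC2, ...
    var_ratio = s**2 / np.sum(s**2)                  # share of variance per PC
    return scores, var_ratio[:n_components]


# Toy data (hypothetical): four items that lie on a straight line,
# so PC1 alone captures essentially all of the variance.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
scores, ratio = pca_project(X)
```

The explained variance ratio indicates how much of the structure in the data a PC1 versus PC2 plot actually displays.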

Partial Least Squares (PLS)
- Uses external information to guide the projection
- Instead of projecting the genes into a space that shows the greatest variance, the genes are projected into a space that helps identify genes with particular characteristics, e.g. cyclic genes

Example: Cycling genes
Expectation of cyclic genes:
- One top and one bottom within one cycle
- Time between top and bottom is half a cycle
Sine and cosine are known shapes that have these characteristics:
- Use sine and cosine to search for genes that are correlated with these shapes
- The combination of the two allows us to identify all phase shifts of these shapes
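Why the combination of sine and cosine covers all phase shifts follows from the identity a·sin(wt) + b·cos(wt) = A·sin(wt + phase), with A = sqrt(a² + b²) and phase = atan2(b, a). The sketch below fits both shapes to one gene profile by ordinary least squares. It is an illustration only, not the method used in the lecture's software: it assumes the cycle period is known, and the function name and example values are hypothetical.

```python
import numpy as np


def cyclic_fit(profile, times, period):
    """Fit profile ~ a*sin(w*t) + b*cos(w*t) + c by least squares.

    Returns (r_squared, amplitude, phase), where the fitted curve equals
    amplitude * sin(w*t + phase) + c and w = 2*pi / period.
    """
    w = 2.0 * np.pi / period
    design = np.column_stack(
        [np.sin(w * times), np.cos(w * times), np.ones_like(times)]
    )
    coef, *_ = np.linalg.lstsq(design, profile, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((profile - fitted) ** 2)
    ss_tot = np.sum((profile - profile.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    a, b = coef[0], coef[1]
    return r_squared, np.hypot(a, b), np.arctan2(b, a)


# Toy data (hypothetical): one 24-hour cycle sampled every 2 hours,
# with a gene that cycles with amplitude 3 and a phase shift of 0.7 rad.
times = np.arange(0.0, 24.0, 2.0)
gene = 3.0 * np.sin(2.0 * np.pi * times / 24.0 + 0.7) + 5.0
r2, amplitude, phase = cyclic_fit(gene, times, period=24.0)
```

Genes can then be ranked by how well the fit explains their profile, and the recovered phase tells where in the cycle each gene peaks.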

J-Express: Cell cycle

GSEA with continuous profiles
- The genes are ranked based on their correlation to an interesting (continuous) profile
- After ranking the genes, looking for gene sets that are overrepresented towards the top of the ranked list works the same way as for differential expression

GSEA with continuous profiles
- Use a gene profile from the dataset as the search profile
- Or create a profile: an ascending or descending profile, or a peak profile
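The ranking step that precedes the gene set analysis can be sketched as below. Only the ranking is shown; the enrichment scoring itself is not implemented here. The function name, the gene names and the way the ascending, descending and peak profiles are constructed are assumptions for this example.

```python
import numpy as np


def rank_genes_by_profile(expr, gene_names, search_profile):
    """Rank genes by Pearson correlation to a continuous search profile.

    expr: float array of shape (n_genes, n_timepoints).
    search_profile: sequence of length n_timepoints.
    Returns a list of (gene_name, correlation), most correlated first.
    """
    centred = expr - expr.mean(axis=1, keepdims=True)
    target = np.asarray(search_profile, dtype=float)
    target = target - target.mean()
    denom = np.linalg.norm(centred, axis=1) * np.linalg.norm(target)
    # Flat genes (zero variance) get correlation 0 instead of NaN.
    corr = np.divide(centred @ target, denom,
                     out=np.zeros(len(expr)), where=denom > 0)
    order = np.argsort(-corr)
    return [(gene_names[i], float(corr[i])) for i in order]


# Created search profiles for five time points (hypothetical shapes).
n_time = 5
ascending = np.arange(n_time, dtype=float)        # 0, 1, 2, 3, 4
descending = ascending[::-1]                      # 4, 3, 2, 1, 0
peak = -np.abs(ascending - (n_time - 1) / 2)      # highest in the middle

# Toy expression data (hypothetical gene names).
names = ["g_up", "g_down", "g_peak", "g_flat"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 5.0, 3.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
])
ranked_up = rank_genes_by_profile(expr, names, ascending)
ranked_peak = rank_genes_by_profile(expr, names, peak)
```

To use a gene from the dataset as the search profile instead, pass its row of `expr` as `search_profile`. The resulting ranked list is what a gene set enrichment tool would take as input.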

Summary
- Time series experiments are now affordable and within reach for more groups
- Time series experiments are more complex
- Time series experiments demand more planning

Acknowledgement
Presentation made in collaboration with Ingrid Østensen (former NMC Oslo).