Statistical Analysis for Microarray & Next-Generation Sequencing Studies


1 Statistical Analysis for Microarray & Next-Generation Sequencing Studies: Using the right tools and interpreting the results. Jonathan Gerstenhaber, Field Application Specialist, Partek Inc.

2 Who is Partek? Founded in 1993 Based in St. Louis, MO USA Focused on Genomics Thousands of customers worldwide Building tools for both biologists and bioinformaticians 2 Copyright Partek Inc

3 What is Partek Genomics Suite? Desktop software - no server required Supports multiple assays Supports all assay providers Enables Integrated Genomics Advanced Statistics Rapid Development Focus on Technical Support Competitively priced

4 ONE Software Any Platform Partek Genomics Suite RT-PCR

5 ONE Software for Any Assay Partek Genomics Suite Gene Expression RNA-seq sRNA-seq Exon ChIP-Seq ChIP-chip & methylation miRNA Copy Number DNA-seq

6 Linear Workflows Help Users through Analysis Import QA / QC Analysis Visualization Biological Interpretation Integrated Genomics Additional Menu options

7 Integrated Genomics. Genome: Copy Number + AsCN, Loss of Heterozygosity, Cytogenetics, Association, DNA-seq. Transcriptome: Gene Expression, Exon/Alternative Splicing, DGE & mRNA-seq, TaqMan RT-PCR, SAGE. Regulation: miRNA arrays, sRNA-seq, Tiling arrays, ChIP-seq, MeDIP-seq.

8 Three major goals today 1. How to analyze data in the most powerful way possible 2. What to do when your experimental design is not balanced 3. How to interpret your results to give you the most powerful answers

9 Build complex models to explain experiments. Larger experiments can be analyzed more powerfully when all experimental variables are taken into account. This study of Down syndrome has very complex behaviors that necessitate powerful analytical methods.

10 Evolving models: colon cancer vs. normal. T-test: one-factor comparison, Tumor vs Normal; the data does not appear especially significant. Paired T-test: two factors, comparing Tumor vs Normal while controlling for patient-to-patient differences. 3-Way ANOVA: compares Tumor vs Normal just as a t-test does, controls for patient-to-patient and male-female differences, and allows us to ask new questions: does colon cancer affect men and women the same way?
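The step from the T-test to the paired T-test can be illustrated with a minimal sketch. The expression values below are hypothetical (not taken from the colon cancer study), and the helper names `unpaired_t` and `paired_t` are made up for this example. It only computes the test statistics, to show how controlling for patient-to-patient differences changes the result.

```python
from math import sqrt
from statistics import mean, stdev


def unpaired_t(a, b):
    """Student's t statistic (pooled variance) for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic: a one-sample t-test on the within-patient differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical log2 expression of one gene in four patients.
# Patients differ a lot from each other, but tumor is consistently ~1 unit higher.
normal = [5.0, 7.0, 9.0, 11.0]
tumor = [6.1, 7.9, 10.2, 11.8]

t_unpaired = unpaired_t(tumor, normal)  # patient spread swamps the tumor effect
t_paired = paired_t(tumor, normal)      # patient spread is removed by pairing

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

With these made-up numbers the unpaired statistic is small because the patient-to-patient spread sits in the error term, while the paired statistic is large because that spread is accounted for.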

11 Significant interaction

12 RNA sequencing analysis is just as easy in Partek. The RNA-Seq workflow will take you through import and transcript abundance estimation (fitting the data to known transcripts). The resulting estimates can be analyzed using ANOVA for complex designs or to remove potential batch effects such as lane or flow channel.

13 Batch effects Large experiments can become more complex due to batch or other nuisance variables

14 Remove noise & highlight biology: Trefoil Factor 1. Since the treatments were perfectly balanced with the batches, the batch effect can be completely removed from the data. With a simple 2-way ANOVA, this gene was #228 on the gene list and would not pass multiple test correction for significance. With a 3-way ANOVA including batch, it was #2 on the gene list. [Table: p-values for the factors Treatment, Time, and Treatment*Time under the 2-way ANOVA and the 3-way ANOVA; values not preserved in this transcript.]

15 Batch effect remover. Appreciating that the ANOVA is successfully partitioning all our factors apart doesn't make for better images straight away. If you want to see your data the way ANOVA sees your data, use the Batch Effect Remover.

16 Batch effect remover. The batch effect remover does not make the data any better: analyzing the batch-corrected data yields the same results, and even fold change will not be altered. Batch-corrected data should not be used as input into other types of analysis, as it has already been fit to the ANOVA model. [Figure panels: Original Results; Batch Corrected Results.]

17 Model building: Am I making it better, or bigger? Factors should be more significant than error They should help explain more of the error from the pie

18 Model fitness and significance. ANOVA models, like lines fit in Excel, have a significance and a fitness! This is a less graphical method, best suited to when you have a particular gene of interest that you are looking to optimize. [Table: excerpt from ANOVA report, Analysis of Variance; columns: Source, DF, Sum of Squares, Mean Square, F, p-value; rows: Model, Error, C Total; values not preserved in this transcript.]

19 Contrast vs T-test: power. Contrasts allow specific comparisons to be made between groups in the ANOVA without filtering data and running a T-test. This gives us more degrees of freedom. When small experiments are run, our analysis is limited because we have a poor variance estimate. A contrast allows us to calculate variance from all groups, even when comparing only two of them!
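The degrees-of-freedom argument can be made concrete with a small sketch. It assumes a simple one-way layout with three hypothetical groups of two samples each. The group names, values, and helper functions are invented for illustration and are not Partek's implementation.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log2 expression for one gene, two replicates per group.
groups = {
    "control": [5.0, 5.4],
    "dose_1": [6.0, 6.2],
    "dose_2": [7.1, 7.3],
}


def two_group_t(a, b):
    """Classic t-test: variance is estimated from the two compared groups only."""
    df = len(a) + len(b) - 2
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return t, df


def contrast_t(all_groups, name_a, name_b):
    """Contrast within a one-way ANOVA: error variance comes from every group."""
    n_total = sum(len(v) for v in all_groups.values())
    df_error = n_total - len(all_groups)
    sse = sum((len(v) - 1) * variance(v) for v in all_groups.values())
    mse = sse / df_error
    a, b = all_groups[name_a], all_groups[name_b]
    t = (mean(a) - mean(b)) / sqrt(mse * (1 / len(a) + 1 / len(b)))
    return t, df_error


t_ttest, df_ttest = two_group_t(groups["dose_1"], groups["control"])
t_contrast, df_contrast = contrast_t(groups, "dose_1", "control")

print(f"t-test:   t = {t_ttest:.2f} on {df_ttest} degrees of freedom")
print(f"contrast: t = {t_contrast:.2f} on {df_contrast} degrees of freedom")
```

The comparison is the same in both cases (dose_1 vs control), but the contrast borrows the third group's replicates for its variance estimate, which is where the extra degree of freedom comes from.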

20 Contrast vs T-test: new comparisons T test view ANOVA view

21 Assumptions of ANOVA. 1. Sample groups are independent: build the best models possible to describe sample relations. 2. Variance is equal within different treatment groups: designing balanced experiments with similar numbers of treated and control samples will keep variance similar between groups. 3. Data is normally distributed (bell shaped) within different treatment groups: for array data, this is one major reason we log-transform the data.

22 Imbalance: REML. The default ANOVA in Partek is called Method of Moments (MoM). It is especially fast and works very well on balanced designs. Experiments that are very unbalanced can actually become effectively underpowered. [Figure panels: 2-way MoM ANOVA; 2-way MoM ANOVA (excluding non-paired samples); 2-way REML ANOVA.]

23 Imbalance: REML. REML (restricted maximum likelihood) is designed to handle data that is imbalanced or incomplete. In some of these cases, Method of Moments will be unable to present an unbiased result and Partek will output "?" within the results. Switching to REML will remedy the situation, but will also remove the p-values for random effects.

24 Imbalance: Welch's. When data is balanced between groups (# of controls = # of treated), variances can differ by up to 3x without problems when using an ANOVA. When groups are of very different sizes, unequal variance can become a concern. This is why it is ideal to design balanced experiments. Available from the Stat menu, Welch's ANOVA allows the comparison of multiple groups when variance is unequal. Unfortunately, it is limited to a single factor at a time.
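As a rough illustration of what a Welch-type test does with unequal variances and unequal group sizes, the sketch below computes the two-group special case (Welch's t-test, to which Welch's ANOVA reduces when there are only two groups). The data are hypothetical and `welch_t` is an illustrative helper, not a Partek function.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a's mean
    vb = variance(b) / len(b)  # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical unbalanced design: 3 noisy treated samples, 10 tight controls.
treated = [8.0, 12.0, 10.0]
control = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7, 5.0, 5.0]

t_welch, df_welch = welch_t(treated, control)
df_pooled = len(treated) + len(control) - 2  # what an equal-variance test would use

print(f"Welch t = {t_welch:.2f}, Welch df = {df_welch:.2f}, pooled df = {df_pooled}")
```

Because nearly all of the uncertainty comes from the small, noisy group, the Welch degrees of freedom fall close to that group's own (n - 1), far below the pooled value an equal-variance test would assume.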

25 Parametric tests versus nonparametric. Parametric test / nonparametric counterpart: T-test / Mann-Whitney; ANOVA / Kruskal-Wallis; Repeated Measures ANOVA / Friedman. Parametric tests assume a normal distribution, but yield more powerful results. It is best to normalize data to make it normal. Nonparametric tests can operate even when the data is of unknown distribution, so long as the shape is the same for all samples. With nonparametric tests you need many samples to get significance, and significance is independent of degree of change! This can lead to increased false discovery, especially in small experiments. [Table: # of samples vs. minimum p-value; values not preserved in this transcript.]
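The point that rank-based tests need many samples to reach significance can be checked with a short calculation. The sketch assumes an exact two-sided Mann-Whitney test with equal group sizes and no ties, where the smallest achievable p-value is 2 divided by the number of ways to split the samples into two groups. The function name is hypothetical and the numbers are not a reconstruction of the slide's lost table.

```python
from math import comb


def min_two_sided_p(n_per_group):
    """Smallest possible exact two-sided Mann-Whitney p-value (no ties).

    The most extreme outcome is complete separation of the two groups, which
    is 1 of comb(2n, n) equally likely rank arrangements under the null;
    doubling it gives the two-sided p-value.
    """
    return 2 / comb(2 * n_per_group, n_per_group)


min_p = {n: min_two_sided_p(n) for n in (3, 4, 5, 10)}

for n, p in min_p.items():
    print(f"{n} samples per group: minimum p-value = {p:.2g}")
```

Under these assumptions, three samples per group can never reach p < 0.05 no matter how large the fold change is, which is the sense in which significance is independent of degree of change.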

26 Power analysis to appreciate our experiment. Can give the effective dynamic range of an experiment. The blue line represents the current experiment (colon cancer vs. normal). With 20 samples, we detect 90% of genes that changed 1.8 fold as statistically significant, but only 10% of the genes that changed 1.1 fold.

27 Power analysis for pilot studies. What if I had started with only 2 patients (2 colon cancer + 2 normal samples)? Determine the ideal experiment size to detect genes of a specific fold change. What if I needed to detect genes changed by only 75%, or 1.75 fold? To significantly detect 90% of the genes that changed 1.75 fold, I will need 20 samples.
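A rough feel for how power depends on fold change and sample size can be had from a normal approximation. This is only a sketch: it assumes a two-group comparison on log2 data, a hypothetical per-gene standard deviation (`sd_log2`), and no multiple-test correction. It is not the method used to draw the power curves in the slides, and its numbers should not be expected to match them.

```python
from math import log2, sqrt
from statistics import NormalDist


def approx_power(fold_change, n_per_group, sd_log2=0.5, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test on log2 data."""
    z = NormalDist()
    effect = abs(log2(fold_change))          # difference in log2 means
    se = sd_log2 * sqrt(2 / n_per_group)     # standard error of that difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands beyond either critical value.
    return z.cdf(effect / se - z_crit) + z.cdf(-effect / se - z_crit)


power_table = {
    (fc, n): approx_power(fc, n)
    for fc in (1.1, 1.75)
    for n in (2, 10, 20)
}

for (fc, n), power in sorted(power_table.items()):
    print(f"fold change {fc}, {n} per group: power ~ {power:.2f}")
```

The qualitative behavior is what matters here: small fold changes stay hard to detect even as samples are added, while larger fold changes become detectable with modest group sizes.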

28 What is a P-value and what is FDR? Comparison: Treated vs Control. P-value 0.01: 1% chance that data this extreme would appear if Treatment = Control. P-value 0.99: 99% chance of data this extreme if Treatment = Control. P-value 0.2: 20% chance of data this extreme if Treatment = Control. A lack of low p-values does not indicate that the groups are necessarily equal. What about FDR? FDR is a measure of the false positive potential of a gene list. FDR is not a correction of an individual gene's level of significance; rather, it helps us keep the whole picture in mind. A coin which lands heads up 5 times in a row has a p-value of 0.03. Yet if I were to give you 1000 coins to flip 5 times each, you would expect it to happen over 30 times! Here is where FDR helps: a gene list with an FDR of 0.05 has 5% predicted false positives.
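The coin arithmetic can be written out directly. The 200-gene list at the end is a hypothetical size chosen only to show what an FDR of 0.05 means for a gene list.

```python
# Probability that a fair coin lands heads 5 times in a row (one-sided p-value).
p_five_heads = 0.5 ** 5

# Flip 1000 fair coins 5 times each: how many all-heads runs do we expect by chance?
n_coins = 1000
expected_all_heads = n_coins * p_five_heads

# FDR describes the list, not a single gene: a hypothetical 200-gene list
# with an FDR of 0.05 is predicted to contain about 10 false positives.
fdr = 0.05
list_size = 200
expected_false_positives = fdr * list_size

print(f"p-value for 5 heads in a row: {p_five_heads}")
print(f"expected all-heads coins out of {n_coins}: {expected_all_heads}")
print(f"expected false positives in the list: {expected_false_positives:.0f}")
```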

29 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: colon cancer and normal tissue are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at significant and non-significant p-values. Significant data: significant genes lead to an increased number of genes at low p-values.

30 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: colon cancer and normal tissue are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at significant and non-significant p-values. Significant data: significant genes lead to an increased number of genes at low p-values. Storey's q-value: the non-significant genes tell us how many genes are likely falsely discovered. In this case, 10% of genes at a p-value of [value not preserved in this transcript].

31 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: Down syndrome and normal individuals are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at significant and non-significant p-values. Significant data: significant genes lead to an increased number of genes at low p-values. Storey's q-value: the non-significant genes tell us how many genes are likely falsely discovered.
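The idea that the flat, non-significant part of the p-value histogram tells us how many genes are falsely discovered can be sketched in a few lines. This is a simplified version of Storey's procedure that uses a single fixed tuning value (`lam=0.5`) instead of the smoothed estimate in the published method. The p-values are hypothetical, and the function is an illustration rather than Partek's implementation.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q-values with a fixed lambda.

    pi0 estimates the fraction of truly null genes from the height of the
    histogram above `lam`, where almost only null genes are expected.
    """
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / ((1 - lam) * m))

    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return pi0, q


# Hypothetical p-values: a cluster of small ones plus a flat background.
pvalues = [0.001, 0.002, 0.003, 0.004, 0.01, 0.2, 0.6, 0.7, 0.8, 0.9]
pi0, qvalues = storey_qvalues(pvalues)

print(f"estimated fraction of null genes (pi0): {pi0:.2f}")
for p, q in zip(pvalues, qvalues):
    print(f"p = {p:<6} q = {q:.3f}")
```

Each q-value is the estimated false discovery rate of the list obtained by calling that gene and every gene with a smaller p-value significant.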

32 FDR in ChIP sequencing analysis. Split the entire genome into 100 bp windows. Fit a distribution to the number of reads randomly occurring in each. Use this distribution to estimate background significance and false discovery, and determine an FDR cutoff when detecting binding events. [Figure labels: Storey's q-value; ChIP-Seq FDR; ZTNB (Background Binding); Detected region reads.]

33 FDR and downstream analysis. FDR at the gene level can be useful when looking for biomarkers and when false positives are particularly worrisome, but when looking downstream, excessive filtering after ANOVA can be harmful. [Tables: FDR on Genome and FDR on Chromosome 21; columns: Cutoff Value, FDR, # of Significant p-values; values not preserved in this transcript.]

34 Downstream analysis can filter out false positives. [Table: Positional Enrichment of genes passing FDR 0.05; columns: Cytoband, Enrichment Score, Enrichment p-value; cytobands listed, with band numbers and most values lost in transcription: chr21q, chr21q, chr2p, chr17q.] [Table: Positional Enrichment of genes passing P value 0.01; same columns; cytobands listed: chr21q, chrXq, chr15q, chr4q, chr21q, chr14q, chr21q.] All but one of the Down syndrome critical region genes are located on Chr21q22.

35 Model selection. So far we have used ANOVA to look for genes that are altered by a disease, but we have not answered the question: can these altered genes predict disease state? A tutorial is available from the Tutorials page.

36 Partek model selection: Part 1 Choose methods to find the best genes

37 Partek model selection: Part 2 Choose how to classify samples

38 Partek model selection: Part 3. And detect significance of the model using cross validation. If I had different samples, would I find different lead genes? If I had different samples, would I find a different best model?

39 Quick example of colon cancer prediction. ANOVA was used to rank the genes by significance. On the left, with the 50 genes passing an FDR of 0.02 (2 genes passed 0.01), we see decent separation between groups. While these genes are differential, we are not sure that they are diagnostic. On the right, predictability was used instead of FDR to choose the lead set: 15 genes were chosen, and they are estimated to correctly predict tumorigenesis 86.25% of the time.

40 Partek GS A Complete Solution Any genomic assay Microarray & Next-Gen Seq Any platform Advanced Statistical Analysis Biological Interpretation Integrated Genomics Competitive price Join us online Get your FREE trial today! FREE Data Analysis Webinars

Frequently Asked Questions Next Generation Sequencing

Frequently Asked Questions Next Generation Sequencing Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided

More information

Model Selection. Introduction. Model Selection

Model Selection. Introduction. Model Selection Model Selection Introduction This user guide provides information about the Partek Model Selection tool. Topics covered include using a Down syndrome data set to demonstrate the usage of the Partek Model

More information

Partek Methylation User Guide

Partek Methylation User Guide Partek Methylation User Guide Introduction This user guide will explain the different types of workflow that can be used to analyze methylation datasets. Under the Partek Methylation workflow there are

More information

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)

8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design) Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process

More information

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?... Two-Way ANOVA tests Contents at a glance I. Definition and Applications...2 II. Two-Way ANOVA prerequisites...2 III. How to use the Two-Way ANOVA tool?...3 A. Parametric test, assume variances equal....4

More information

Analyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study

Analyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study Analyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study The data for this study is taken from experiment GSE848 from the Gene Expression

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,

More information

PREDA S4-classes. Francesco Ferrari October 13, 2015

PREDA S4-classes. Francesco Ferrari October 13, 2015 PREDA S4-classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Analysis of Illumina Gene Expression Microarray Data

Analysis of Illumina Gene Expression Microarray Data Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray

More information

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc. New Technologies for Sensitive, Low-Input RNA-Seq Clontech Laboratories, Inc. Outline Introduction Single-Cell-Capable mrna-seq Using SMART Technology SMARTer Ultra Low RNA Kit for the Fluidigm C 1 System

More information

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

More information

Post-hoc comparisons & two-way analysis of variance. Two-way ANOVA, II. Post-hoc testing for main effects. Post-hoc testing 9.

Post-hoc comparisons & two-way analysis of variance. Two-way ANOVA, II. Post-hoc testing for main effects. Post-hoc testing 9. Two-way ANOVA, II Post-hoc comparisons & two-way analysis of variance 9.7 4/9/4 Post-hoc testing As before, you can perform post-hoc tests whenever there s a significant F But don t bother if it s a main

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013

Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine

More information

THE KRUSKAL WALLLIS TEST

THE KRUSKAL WALLLIS TEST THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant

Statistical Analysis. NBAF-B Metabolomics Masterclass. Mark Viant Statistical Analysis NBAF-B Metabolomics Masterclass Mark Viant 1. Introduction 2. Univariate analysis Overview of lecture 3. Unsupervised multivariate analysis Principal components analysis (PCA) Interpreting

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity

More information

Statistical issues in the analysis of microarray data

Statistical issues in the analysis of microarray data Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data

More information

StatCrunch and Nonparametric Statistics

StatCrunch and Nonparametric Statistics StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that

More information

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA - Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA

More information

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

Understanding West Nile Virus Infection

Understanding West Nile Virus Infection Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Basic processing of next-generation sequencing (NGS) data

Basic processing of next-generation sequencing (NGS) data Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance

More information

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Hierarchical Clustering Analysis

Hierarchical Clustering Analysis Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Big Data Visualization for Genomics. Luca Vezzadini Kairos3D

Big Data Visualization for Genomics. Luca Vezzadini Kairos3D Big Data Visualization for Genomics Luca Vezzadini Kairos3D Why GenomeCruzer? The amount of data for DNA sequencing is growing Modern hardware produces billions of values per sample Scientists need to

More information

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,

More information

A Streamlined Workflow for Untargeted Metabolomics

A Streamlined Workflow for Untargeted Metabolomics A Streamlined Workflow for Untargeted Metabolomics Employing XCMS plus, a Simultaneous Data Processing and Metabolite Identification Software Package for Rapid Untargeted Metabolite Screening Baljit K.

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information
