Statistical Analysis for Microarray & Next-Generation Sequencing Studies
- Jodie Oliver
1 Statistical Analysis for Microarray & Next-Generation Sequencing Studies: Using the right tools and interpreting the results. Jonathan Gerstenhaber, Field Application Specialist, Partek Inc.
2 Who is Partek? Founded in 1993. Based in St. Louis, MO, USA. Focused on genomics. Thousands of customers worldwide. Building tools for both biologists and bioinformaticians. Copyright Partek Inc
3 What is Partek Genomics Suite? Desktop software, no server required. Supports multiple assays. Supports all assay providers. Enables integrated genomics. Advanced statistics. Rapid development. Focus on technical support. Competitively priced.
4 ONE Software, Any Platform: Partek Genomics Suite. RT-PCR.
5 ONE Software for Any Assay: Partek Genomics Suite. Gene Expression, RNA-seq, sRNA-seq, Exon, ChIP-seq, ChIP-chip & methylation, miRNA, Copy Number, DNA-seq.
6 Linear Workflows Help Users through Analysis: Import, QA/QC, Analysis, Visualization, Biological Interpretation, Integrated Genomics, additional menu options.
7 Integrated Genomics. Genome: Copy Number + AsCN, Loss of Heterozygosity, Cytogenetics, Association, DNA-seq. Transcriptome: Gene Expression, Exon/Alternative Splicing, DGE & mRNA-seq, TaqMan RT-PCR, SAGE. Regulation: miRNA arrays, sRNA-seq, tiling arrays, ChIP-seq, MeDIP-seq.
8 Three major goals today: 1. How to analyze data in the most powerful way possible. 2. What to do when your experimental design is not balanced. 3. How to interpret your results to give you the most powerful answers.
9 Build complex models to explain experiments. Larger experiments can be analyzed more powerfully when all experimental variables are taken into account. This study of Down syndrome has very complex behaviors that necessitate powerful analytical methods.
10 Evolving models: colon cancer vs. normal. T-test: a one-factor comparison, Tumor vs. Normal; the data do not appear especially significant. Paired t-test: two factors, comparing Tumor vs. Normal while controlling for patient-to-patient differences. 3-way ANOVA: compares Tumor vs. Normal just as a t-test does, controls for patient-to-patient and male-female differences, and allows us to ask new questions: does colon cancer affect men and women the same way?
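To make the step from the t-test to the paired t-test concrete, here is a minimal sketch using only the Python standard library. The expression values are invented for illustration and are not from the colon cancer study; the point is only that when patients differ a lot from one another, pairing removes that spread from the error term.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical log2 expression of one gene in four patients (made-up numbers).
normal = [5.0, 7.0, 9.0, 11.0]
tumor = [5.5, 7.4, 9.6, 11.5]

def unpaired_t(a, b):
    """Two-sample t statistic with pooled variance (ignores patient pairing)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))

def paired_t(a, b):
    """Paired t statistic: test the per-patient differences against zero."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

t_unpaired = unpaired_t(normal, tumor)
t_paired = paired_t(normal, tumor)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

The same tumor-minus-normal shift gives a small unpaired statistic and a large paired one, because the patient-to-patient spread no longer counts as noise.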
11 Significant interaction
12 RNA sequencing analysis is just as easy in Partek. The RNA-Seq workflow will take you through import and transcript abundance estimation (fitting data to known transcripts). The resulting estimates can be analyzed using ANOVA for complex designs or to remove potential batch effects like lane or flow channel.
13 Batch effects. Large experiments can become more complex due to batch or other nuisance variables.
14 Remove noise & highlight biology: Trefoil Factor 1. Since the treatments were perfectly balanced with the batches, the batch effect can be completely removed from the data. With a simple 2-way ANOVA, this gene was #228 on the gene list and would not pass multiple test correction for significance. With a 3-way ANOVA including batch, it was #2 on the gene list. [Table: p-values for the factors Treatment, Time, and Treatment*Time under the 2-way ANOVA and the 3-way ANOVA; values not recoverable.]
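The effect of adding batch to the model can be sketched on a toy balanced design. This is a simplified illustration (treatment only vs. treatment plus batch, rather than the 2-way vs. 3-way comparison on the slide), the numbers are invented, and the means-based decomposition used here is only valid because the design is balanced.

```python
from statistics import mean

# Hypothetical balanced design: 2 treatments x 2 batches, 2 replicates per cell.
data = {
    ("A", 1): [5.0, 5.2], ("A", 2): [9.1, 8.9],
    ("B", 1): [6.0, 6.2], ("B", 2): [10.1, 9.9],
}
rows = [(trt, batch, y) for (trt, batch), ys in data.items() for y in ys]
grand = mean(y for _, _, y in rows)
trt_mean = {t: mean(y for tt, _, y in rows if tt == t) for t in ("A", "B")}
batch_mean = {b: mean(y for _, bb, y in rows if bb == b) for b in (1, 2)}

n = len(rows)
# Sum of squares explained by treatment (1 degree of freedom for 2 levels).
ss_trt = sum((trt_mean[t] - grand) ** 2 for t, _, _ in rows)

# Model 1: treatment only. Batch differences end up in the error term.
ss_err_1 = sum((y - trt_mean[t]) ** 2 for t, _, y in rows)
f_without_batch = (ss_trt / 1) / (ss_err_1 / (n - 2))

# Model 2: treatment + batch (additive). Batch variation is partitioned out.
ss_err_2 = sum((y - trt_mean[t] - batch_mean[b] + grand) ** 2 for t, b, y in rows)
f_with_batch = (ss_trt / 1) / (ss_err_2 / (n - 3))

print(f"F(treatment) without batch: {f_without_batch:.2f}")
print(f"F(treatment) with batch:    {f_with_batch:.2f}")
```

The treatment sum of squares is identical in both models; only the error term shrinks once batch is accounted for, which is why a gene can jump up the list.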
15 Batch effect remover. Appreciating that the ANOVA is successfully partitioning all our factors apart doesn't make for better images straight away. If you want to see your data the way ANOVA sees your data, use the Batch Effect Remover.
16 Batch effect remover. The batch effect remover does not make the data any better: analyzing the batch-corrected data yields the same results, and even fold change will not be altered. Batch-corrected data should not be used as input into other types of analysis, as it has already been fit to the ANOVA model. [Figure panels: Original Results; Batch Corrected Results.]
17 Model building: Am I making it better, or bigger? Factors should be more significant than error. Each added factor should explain more of the error slice in the pie chart.
18 Model fitness and significance. ANOVA models, like lines fit in Excel, have a significance and a fitness! This is a less graphical method, best suited to when you have a particular gene of interest that you are looking to optimize. [Excerpt from ANOVA report: Analysis of Variance table with columns Source, DF, Sum of Squares, Mean Square, F, p-value and rows Model, Error, C Total; values not recoverable.]
19 Contrast vs. t-test: power. Contrasts allow specific comparisons to be made between groups in the ANOVA without filtering data and running a t-test. This gives us more degrees of freedom. When small experiments are run, our analysis is limited because we have a poor variance estimate. A contrast allows us to calculate variance from all groups, even when comparing only two of them!
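The degrees-of-freedom argument can be checked with a small sketch. The four groups and their values below are hypothetical; the contrast here is the simplest kind (one group mean minus another) in a one-way layout, with the error variance pooled across all groups.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical one-way design: four dose groups, three samples each.
groups = {
    "control": [1.0, 1.2, 0.8],
    "low": [1.1, 1.3, 0.9],
    "mid": [1.5, 1.7, 1.3],
    "high": [2.0, 2.2, 1.8],
}

def t_test(a, b):
    """Two-sample t statistic using only the two groups being compared."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb)), df

def contrast(all_groups, name_a, name_b):
    """Same comparison, but the error variance comes from every group."""
    df = sum(len(v) - 1 for v in all_groups.values())
    mse = sum((len(v) - 1) * variance(v) for v in all_groups.values()) / df
    a, b = all_groups[name_a], all_groups[name_b]
    return (mean(b) - mean(a)) / sqrt(mse * (1 / len(a) + 1 / len(b))), df

t_two, df_ttest = t_test(groups["control"], groups["high"])
t_con, df_contrast = contrast(groups, "control", "high")
print(f"t-test:   t = {t_two:.2f} on {df_ttest} degrees of freedom")
print(f"contrast: t = {t_con:.2f} on {df_contrast} degrees of freedom")
```

With the same comparison, the contrast has twice the error degrees of freedom because the "low" and "mid" groups also contribute to the variance estimate.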
20 Contrast vs. t-test: new comparisons. [Figure panels: T-test view; ANOVA view.]
21 Assumptions of ANOVA. 1. Sample groups are independent: build the best models possible to describe sample relations. 2. Variance is equal within different treatment groups: designing balanced experiments, with similar numbers of treated and control samples, makes the ANOVA more tolerant of differences in variance between groups. 3. Data are normally distributed (bell shaped) within different treatment groups: for array data, this is one major reason we log-transform the data.
22 Imbalance: REML. The default ANOVA in Partek is called Method of Moments (MoM). It is especially fast and works very well on balanced designs. Experiments that are very unbalanced can actually become effectively underpowered. [Figure panels: 2-way MoM ANOVA; 2-way MoM ANOVA (excluding non-paired samples); 2-way REML ANOVA.]
23 Imbalance: REML. REML is designed to handle data that is imbalanced or incomplete. In some of these cases, Method of Moments will be unable to present an unbiased result and Partek will output "?" within the results. Switching to REML will remedy the situation, but will also remove the p-values for random effects.
24 Imbalance: Welch's. When data are balanced between groups (# of controls = # of treated), variances can differ by up to 3x without problems when using an ANOVA. When groups are of very different sizes, unequal variance can become a concern. This is why it is ideal to design balanced experiments. Available from the Stat menu, Welch's ANOVA allows the comparison of multiple groups when variance is unequal. Unfortunately, it is limited to a single factor at a time.
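For two groups, Welch's approach reduces to Welch's t-test, which can be sketched in a few lines. The sample values are hypothetical, and this is the textbook two-group formula with the Welch-Satterthwaite degrees of freedom, not Partek's implementation of Welch's ANOVA.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical unbalanced groups with very different spread.
control = [10.0, 10.2, 9.8, 10.1, 9.9]
treated = [12.0, 15.0, 9.0]

t_welch, df_welch = welch_t(control, treated)
print(f"Welch t = {t_welch:.2f}, df = {df_welch:.2f}")
```

Each group keeps its own variance instead of being pooled, and the degrees of freedom drop toward those of the small, noisy group rather than the 6 a pooled t-test would claim.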
25 Parametric tests versus nonparametric. Parametric test and its nonparametric counterpart: t-test and Mann-Whitney; ANOVA and Kruskal-Wallis; repeated measures ANOVA and Friedman. Parametric tests assume a normal distribution, but yield more powerful results. It is best to normalize the data to make it normally distributed. Nonparametric tests can operate even when the data is of unknown distribution, so long as the shape is the same for all samples. [Table: # of samples vs. minimum p-value; values not recoverable.] You need many samples to get significance. Significance is independent of degree of change! This can lead to increased false discovery, especially in small experiments.
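The claim that rank tests need many samples follows from counting. The sketch below computes the smallest two-sided p-value an exact Mann-Whitney test can return when there are no ties; it is an illustration of the principle and may not match how the values in the slide's table were produced.

```python
from math import comb

def min_two_sided_p(n1, n2):
    """Smallest two-sided p-value of an exact Mann-Whitney test (no ties).

    The most extreme outcome is complete separation of the two groups,
    which is 1 of comb(n1 + n2, n1) equally likely rank arrangements
    under the null; the factor 2 covers both directions.
    """
    return min(1.0, 2 / comb(n1 + n2, n1))

for n in (2, 3, 4, 5, 10):
    print(f"{n} vs {n} samples: minimum p = {min_two_sided_p(n, n):.6f}")
```

With 3 samples per group the test cannot go below 0.1 no matter how large the fold change is, which is the sense in which significance is independent of the degree of change.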
26 Power analysis to appreciate our experiment. Power analysis can give the effective dynamic range of an experiment. The blue line represents the current experiment (colon cancer vs. normal). With 20 samples, we detect 90% of genes that changed 1.8-fold as statistically significant, but only 10% of the genes that changed 1.1-fold.
27 Power analysis for pilot studies. What if I had started with only 2 patients (2 colon cancer + 2 normal samples)? Determine the ideal experiment size to detect genes of a specific fold change. What if I needed to detect genes changed by only 75%, or 1.75-fold? To significantly detect 90% of the genes that changed 1.75-fold, I will need 20 samples.
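A rough version of this sample-size question can be sketched with the standard normal-approximation formula for a two-group comparison. This is not Partek's power analysis: the function name, the assumed standard deviation on the log2 scale, and the single-gene alpha of 0.05 (no multiple test correction) are all assumptions for illustration, so the numbers will not reproduce the slide.

```python
from math import ceil, log2
from statistics import NormalDist

def samples_per_group(fold_change, sd_log2, alpha=0.05, power=0.90):
    """Approximate samples per group to detect a given fold change.

    Normal approximation for a two-sided, two-group comparison:
    n = 2 * ((z_alpha + z_power) * sd / delta)^2, delta = |log2(fold change)|.
    """
    z = NormalDist()
    delta = abs(log2(fold_change))
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd_log2 / delta) ** 2)

# sd_log2 = 0.5 is a hypothetical per-gene standard deviation.
for fc in (1.1, 1.5, 1.75, 2.0):
    print(f"{fc}-fold: about {samples_per_group(fc, sd_log2=0.5)} samples per group")
```

The required sample size grows with the inverse square of the log fold change, which is why small changes such as 1.1-fold are so much harder to detect than 1.75-fold changes.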
28 What is a p-value and what is FDR? Comparison: Treated vs. Control. P-value 0.01: a 1% chance of seeing data at least this extreme if Treatment = Control. P-value 0.99: a 99% chance of seeing data at least this extreme if Treatment = Control. P-value 0.2: a 20% chance of seeing data at least this extreme if Treatment = Control. A lack of low p-values does not indicate that the groups are necessarily equal. What about FDR? FDR is a measure of the false positive potential of a gene list. FDR is not a correction of an individual gene's level of significance; rather, it helps us to keep the whole picture in mind. A coin which lands heads up 5 times in a row has a p-value of 0.03. Yet if I were to give you 1000 coins to flip 5 times each, you would expect it to happen over 30 times! Here is where FDR helps. A gene list with an FDR of 0.05 has 5% predicted false positives.
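The coin arithmetic, and one common way of turning p-values into FDR estimates, can be sketched as follows. The Benjamini-Hochberg step-up procedure is used here as a familiar example of FDR control; the slides do not say it is the method applied, and the example p-values are invented.

```python
# Chance that a fair coin lands heads 5 times in a row.
p_five_heads = 0.5 ** 5
# Expected number of such coins among 1000 fair coins, each flipped 5 times.
expected_by_chance = 1000 * p_five_heads
print(p_five_heads, expected_by_chance)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for six genes.
pvals = [0.0001, 0.004, 0.03, 0.04, 0.2, 0.9]
for p, q in zip(pvals, benjamini_hochberg(pvals)):
    print(f"p = {p:<7} adjusted = {q:.4f}")
```

Keeping every gene whose adjusted value is at most 0.05 gives a list in which about 5% of the entries are expected to be false positives.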
29 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: colon cancer and normal tissue are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at every significance level. Significant data: significant genes will lead to an increased number of genes at low p-values.
30 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: colon cancer and normal tissue are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at every significance level. Significant data: significant genes will lead to an increased number of genes at low p-values. Storey's q-value: the non-significant genes tell us the number of genes likely falsely discovered. In this case, 10% of genes at a p-value of …
31 What is FDR: Storey's q-value. On left: random data is compared with a t-test. On right: Down syndrome and normal individuals are compared. A histogram is generated showing how many genes there are at different significance levels in the dataset. Random data: genes appear equally often at every significance level. Significant data: significant genes will lead to an increased number of genes at low p-values. Storey's q-value: the non-significant genes tell us the number of genes likely falsely discovered.
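The histogram idea behind Storey's q-value can be sketched with a single tuning value. The flat right-hand side of the p-value histogram estimates the fraction of unchanged genes (pi0), and that fraction predicts how many genes below a cutoff are false discoveries. The p-values below are synthetic, the function names are hypothetical, and the full method chooses lambda more carefully than the fixed 0.5 assumed here.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the fraction of null genes from the p-values above lam."""
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (len(pvalues) * (1 - lam)))

def fdr_at_cutoff(pvalues, cutoff, pi0):
    """Estimated FDR among genes with p <= cutoff."""
    called = sum(1 for p in pvalues if p <= cutoff)
    expected_false = pi0 * len(pvalues) * cutoff
    return expected_false / called if called else 0.0

# Synthetic data: 800 unchanged genes with evenly spread p-values,
# plus 200 changed genes that all have a very small p-value.
null_p = [(i + 0.5) / 800 for i in range(800)]
signal_p = [0.001] * 200
pvals = null_p + signal_p

pi0 = estimate_pi0(pvals)
fdr = fdr_at_cutoff(pvals, cutoff=0.01, pi0=pi0)
print(f"estimated pi0 = {pi0:.2f}, estimated FDR at p <= 0.01 = {fdr:.3f}")
```

On purely random data the estimate of pi0 is close to 1, so nearly every gene passing a cutoff is predicted to be a false discovery.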
32 FDR in ChIP sequencing analysis. Split the entire genome into 100 bp windows. Fit a distribution to the number of reads randomly occurring in each. Use this distribution to estimate background significance and false discovery, and determine an FDR cutoff when detecting binding events. [Figure labels: Storey's q-value; ChIP-Seq FDR; ZTNB (zero-truncated negative binomial, background binding); detected region reads.]
33 FDR and downstream analysis. FDR at the gene level can be useful when looking for biomarkers and false positives are particularly worrisome, but when looking downstream, excessive filtering after ANOVA can be harmful. [Tables: FDR on genome and FDR on chromosome 21, each listing cutoff value, FDR, and # of significant p-values; values not recoverable.]
34 Downstream analysis can filter out false positives. [Table: positional enrichment of genes passing FDR 0.05, with columns Cytoband, Enrichment Score, Enrichment p-value; surviving rows begin chr21q, chr21q, chr2p, chr17q; values truncated.] [Table: positional enrichment of genes passing p-value 0.01, same columns; surviving rows begin chr21q, chrXq, chr15q, chr4q, chr21q, chr14q, chr21q; values truncated.] All but one of the Down syndrome critical region genes are located on chr21q22.
35 Model selection. So far we have used ANOVA to look for genes that are altered by a disease, but we have not answered the question: can these altered genes predict disease state? A tutorial is available from the Tutorials page.
36 Partek model selection: Part 1. Choose methods to find the best genes.
37 Partek model selection: Part 2. Choose how to classify samples.
38 Partek model selection: Part 3. Then detect the significance of the model using cross-validation. If I had different samples, would I find different lead genes? If I had different samples, would I find a different best model?
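The three parts (pick genes, pick a classifier, cross-validate) can be sketched as a leave-one-out loop. Everything below is a stand-in chosen for brevity: genes are ranked by difference of class means rather than by ANOVA, the classifier is a nearest-centroid rule, the data are invented, and none of it is Partek's model selection tool. The one design point worth noting is that gene selection is repeated inside each fold, so the held-out sample never influences which genes are chosen.

```python
from statistics import mean

def top_genes(samples, labels, k):
    """Rank genes by absolute difference of the two class means; keep k."""
    classes = sorted(set(labels))
    def score(g):
        m = [mean(s[g] for s, l in zip(samples, labels) if l == c) for c in classes]
        return abs(m[0] - m[1])
    return sorted(range(len(samples[0])), key=score, reverse=True)[:k]

def nearest_centroid(train, train_labels, genes, query):
    """Assign the query to the class whose mean profile is closest."""
    best_label, best_dist = None, float("inf")
    for c in sorted(set(train_labels)):
        centroid = [mean(s[g] for s, l in zip(train, train_labels) if l == c) for g in genes]
        dist = sum((query[g] - cg) ** 2 for g, cg in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = c, dist
    return best_label

def loo_accuracy(samples, labels, k):
    """Leave-one-out cross-validation with gene selection inside each fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = top_genes(train, train_labels, k)
        if nearest_centroid(train, train_labels, genes, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Hypothetical expression of 3 genes in 4 tumor and 4 normal samples.
samples = [
    [5.0, 1.0, 3.0], [5.2, 2.0, 3.1], [4.9, 1.5, 2.9], [5.1, 1.8, 3.2],
    [2.0, 1.2, 2.0], [2.1, 1.9, 2.1], [1.9, 1.4, 1.8], [2.2, 1.7, 2.2],
]
labels = ["tumor"] * 4 + ["normal"] * 4
accuracy = loo_accuracy(samples, labels, k=1)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Selecting the genes once on all samples and only then cross-validating the classifier would give an optimistic accuracy, which is the concern behind the question about finding different lead genes with different samples.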
39 Quick example of colon cancer prediction. ANOVA was used to rank the genes by significance. With the 50 genes passing an FDR of 0.02 (2 genes passed 0.01), on the left, we see decent separation between groups. While these genes are differentially expressed, we are not sure that they are diagnostic. Instead of FDR, predictability was used to choose the lead set on the right: 15 genes were chosen, and they are estimated to correctly predict tumorigenesis 86.25% of the time.
40 Partek GS: A Complete Solution. Any genomic assay: microarray & next-gen seq. Any platform. Advanced statistical analysis. Biological interpretation. Integrated genomics. Competitive price. Join us online. Get your FREE trial today! FREE data analysis webinars.
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationSection 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
More informationRT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial
RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial Samuel J. Rulli, Jr., Ph.D. qpcr-applications Scientist Samuel.Rulli@QIAGEN.com Pathway Focused Research from Sample Prep to Data Analysis! -2-
More informationOverview of Next Generation Sequencing platform technologies
Overview of Next Generation Sequencing platform technologies Dr. Bernd Timmermann Next Generation Sequencing Core Facility Max Planck Institute for Molecular Genetics Berlin, Germany Outline 1. Technologies
More informationMicroarray Data Analysis. A step by step analysis using BRB-Array Tools
Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.
More informationParametric and non-parametric statistical methods for the life sciences - Session I
Why nonparametric methods What test to use? Rank Tests Parametric and non-parametric statistical methods for the life sciences - Session I Liesbeth Bruckers Geert Molenberghs Interuniversity Institute
More informationMolecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
More informationGene Enrichment Analysis
a Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 14a: January 21, 2010 Lecturer: Ron Shamir Scribe: Roye Rozov Gene Enrichment Analysis 14.1 Introduction This lecture introduces
More informationDEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationA Statistician s View of Big Data
A Statistician s View of Big Data Max Kuhn, Ph.D (Pfizer Global R&D, Groton, CT) Kjell Johnson, Ph.D (Arbor Analytics, Ann Arbor MI) What Does Big Data Mean? The advantages and issues related to Big Data
More informationGC3 Use cases for the Cloud
GC3: Grid Computing Competence Center GC3 Use cases for the Cloud Some real world examples suited for cloud systems Antonio Messina Trieste, 24.10.2013 Who am I System Architect
More informationCore Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
More informationStatistical analysis of modern sequencing data quality control, modelling and interpretation
Statistical analysis of modern sequencing data quality control, modelling and interpretation Jörg Rahnenführer Technische Universität Dortmund, Fakultät Statistik Email: rahnenfuehrer@statistik.tu-.de
More informationTwo-Sample T-Tests Allowing Unequal Variance (Enter Difference)
Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption
More informationRNA-seq. Quantification and Differential Expression. Genomics: Lecture #12
(2) Quantification and Differential Expression Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #12 Today (2) Gene Expression per Sources of bias,
More informationStructural Health Monitoring Tools (SHMTools)
Structural Health Monitoring Tools (SHMTools) Getting Started LANL/UCSD Engineering Institute LA-CC-14-046 c Copyright 2014, Los Alamos National Security, LLC All rights reserved. May 30, 2014 Contents
More informationPackage empiricalfdr.deseq2
Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationNCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
More informationUNIVERSITY OF NAIROBI
UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER
More informationGene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
More informationSkewed Data and Non-parametric Methods
0 2 4 6 8 10 12 14 Skewed Data and Non-parametric Methods Comparing two groups: t-test assumes data are: 1. Normally distributed, and 2. both samples have the same SD (i.e. one sample is simply shifted
More informationUsing Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
More informationCHAPTER 14 NONPARAMETRIC TESTS
CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences
More informationBBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More information