It is possible to perform tests to see if the sample data are consistent with the assumption that they were sampled from a normal


Department of Biochemistry and Microbiology
BMS 617 Lecture 7: Non-normality and outliers

Normally distributed data
- Many of the statistical tests we will study rely on the assumption that the data were sampled from a normal distribution
- How reasonable is this assumption?
  - The normal distribution is an ideal distribution that likely never exists in reality
  - It includes arbitrarily large values and arbitrarily small (negative) values
- However, simulations show that most tests that rely on the assumption of normality are robust to deviations from the normal distribution

The ideal normal distribution
[Figure: the ideal normal distribution]

Samples from a normal distribution
- Image shows data sampled from a theoretical normal distribution
- Uses a very large sample size
- Close approximation to the theoretical distribution

Tests for normality
- It is possible to perform tests to see if the sample data are consistent with the assumption that they were sampled from a normal distribution
- Unfortunately, this is not what we really want to know
  - We would really like to know if the distribution is close enough to normal for the test we use to be useful

Tests for normality
- A test for normality is a statistical test for which the null hypothesis is "The data were sampled from a normal distribution"
- Common normality tests include
  - D'Agostino-Pearson omnibus K2 normality test
  - Shapiro-Wilk test
  - Kolmogorov-Smirnov test
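As a rough illustration of the three tests listed above, the following sketch runs each of them on one simulated sample. It assumes Python with NumPy and SciPy, which the lecture does not use (the demo is in GraphPad); the seed, sample sizes, and the helper name `normality_p_values` are arbitrary choices made for this example.

```python
import numpy as np
from scipy import stats


def normality_p_values(x):
    """Return the p-value of three common normality tests for one sample."""
    x = np.asarray(x, dtype=float)
    # D'Agostino-Pearson omnibus K2 test: combines skewness and kurtosis.
    _, k2_p = stats.normaltest(x)
    # Shapiro-Wilk test.
    _, sw_p = stats.shapiro(x)
    # Kolmogorov-Smirnov test against a normal distribution whose mean and SD
    # are estimated from the sample. Estimating the parameters from the same
    # data makes this p-value only approximate.
    _, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    return {
        "dagostino_pearson": k2_p,
        "shapiro_wilk": sw_p,
        "kolmogorov_smirnov": ks_p,
    }


rng = np.random.default_rng(617)  # arbitrary seed, for reproducibility only
normal_sample = rng.normal(loc=10.0, scale=2.0, size=200)
skewed_sample = rng.exponential(scale=2.0, size=500)

print("normal sample:", normality_p_values(normal_sample))
print("skewed sample:", normality_p_values(skewed_sample))
```

In each case the null hypothesis is that the data were sampled from a normal distribution, so a small p-value is evidence against normality and a large one is only an absence of such evidence.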

D'Agostino-Pearson omnibus K2 normality test
- The D'Agostino-Pearson omnibus K2 normality test works by computing two values for the data set:
  - The skewness, which measures how far the data is from being symmetric
  - The kurtosis, which measures how sharply peaked the data is
- The test then combines these into a single value that describes how far from normal the data appear to lie
- It computes a p-value for this combined value

Problem with normality tests
- If the p-value for a normality test is small, the interpretation is:
  - If the data were sampled from an ideal normal distribution, it is unlikely the sample would be this skewed and/or kurtotic
- If the p-value for a normality test is large, then the data are not inconsistent with being sampled from a normal distribution
- However
  - If the sample size is large, it is possible to get a small p-value even for small deviations from the normal distribution
    - Data are likely sampled from a distribution that is close to, but not exactly, normal
  - If the sample size is small, it is possible to get a large p-value even if the underlying distribution is far from normal
    - Data do not provide sufficient evidence to reject the null hypothesis
- It is useful to examine the values for skewness and kurtosis as well as the p-value

Skewness and kurtosis

Interpreting skewness and kurtosis
- The real question we would like to answer is "How much skewness and kurtosis are acceptable?"
  - Difficult to answer
- In general, interpret a skewness between -0.5 and 0.5 as being approximately symmetric
  - Between -1.0 and -0.5, or between 0.5 and 1.0, is moderately skewed
  - Less than -1.0 or more than 1.0 is highly skewed
- For kurtosis, values between -2 and 2 are generally accepted as being within limits
  - Outside this range is evidence that the distribution is far from normal

What to do if the data fail a test for normality
If the data fail a test for normality, the following options are available:
- Can the data be transformed to data that come from a normal distribution?
  - For example, if the data are positively skewed, transforming to logs may give normally distributed data
- Are there a small number of outliers that are causing the data to fail a normality test?
  - The next section discusses outliers
- Is the departure from normality small?
  - I.e. are the skewness and kurtosis small? If so, your statistical tests may still be accurate enough
- Use a test that does not assume a normal distribution (a non-parametric test)

Non-parametric tests
- The most common statistical tests assume the data are sampled from a normal distribution
  - T-tests, ANOVA, Pearson correlation, etc.
- Some other tests do not make this assumption
  - Mann-Whitney test, Kruskal-Wallis test, Spearman correlation, etc.
- However, these tests have lower statistical power than their parametric equivalents when the data are normally distributed (much lower for small samples)
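The skewness rule of thumb and the log transformation discussed above can be sketched as follows. This is only an illustration, assuming Python with NumPy and SciPy: the helper `classify_skewness` is hypothetical, the data are simulated from a lognormal distribution, and the kurtosis reported is SciPy's default excess kurtosis (0 for a normal distribution), which is assumed here to be the scale the -2 to 2 guideline refers to.

```python
import numpy as np
from scipy import stats


def classify_skewness(skew):
    """Apply the rule-of-thumb thresholds (0.5 and 1.0) to a skewness value."""
    magnitude = abs(skew)
    if magnitude <= 0.5:
        return "approximately symmetric"
    if magnitude <= 1.0:
        return "moderately skewed"
    return "highly skewed"


rng = np.random.default_rng(7)  # arbitrary seed
raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # positively skewed data
logged = np.log(raw)  # log transform; valid because all values are positive

for label, x in (("raw", raw), ("log-transformed", logged)):
    skew = stats.skew(x)
    excess_kurtosis = stats.kurtosis(x)  # Fisher definition: normal gives 0
    print(f"{label}: skewness={skew:.2f} ({classify_skewness(skew)}), "
          f"excess kurtosis={excess_kurtosis:.2f}")
```

The raw values are strongly right-skewed, while their logarithms are normal by construction, so the transformed sample should fall in the approximately symmetric band.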

Choosing non-parametric tests
- When running a series of similar experiments, all data should be analyzed the same way
  - Use normality tests to choose the statistical test for all experiments together
  - Following common practice is acceptable
  - Ideally, run one experiment just to determine whether the data look like they come from a normal distribution
- For small data sets
  - A test for normality does not tell you much
    - Not likely to get a small p-value anyway
  - Violations of the normality assumption are more egregious
  - Non-parametric tests have very low statistical power

The Mann-Whitney Test
- The Mann-Whitney test is the non-parametric equivalent of the unpaired t-test
- Use it when you want to compare a variable between two groups, but you have reason to believe the data are not sampled from a normally distributed population

How the Mann-Whitney Test works
The Mann-Whitney test works as follows:
- Compute the rank for all values, regardless of which group they come from
  - The smallest value has a rank of 1, the next smallest has a rank of 2, etc.
- Choose one group: for each data point in that group, count the number of data points in the other group which are smaller
  - Sum these values, and call the sum U1
- Similarly compute U2, or use the fact that U1 + U2 = n1 * n2, where n1 and n2 are the sizes of the two groups
- Let U = min(U1, U2)
- The distribution of U under the null hypothesis is known, so software can compute a p-value

Pros and cons of non-parametric tests
- Pros of non-parametric tests:
  - Since non-parametric tests do not rely on the assumption of normally distributed populations, they can be used when that assumption fails or cannot be verified
- Cons of non-parametric tests:
  - If the data really do come from normally distributed populations, the non-parametric tests are less powerful than their parametric counterparts, i.e. they will tend to give higher p-values
    - For small sample sizes, they are much less powerful: two-sided Mann-Whitney p-values are always greater than 0.05 if the total sample size (both groups combined) is 7 or fewer
  - Non-parametric tests typically do not compute confidence intervals
    - These can sometimes be computed, but often require additional assumptions
  - Non-parametric tests are not related to regression models
    - They cannot be extended to account for confounding variables using multiple regression techniques

Choosing between parametric and non-parametric tests
- The choice between parametric and non-parametric tests is not straightforward
- A common, but invalid, approach is to use normality tests to automate the choice
  - The choice is most important for small data sets, for which normality tests are of limited use
  - Using the data set to determine the statistical analysis will underestimate p-values
  - If data fail normality tests, a transformation may be appropriate
- The most "honest" approach is to perform an independent experiment with a large sample to test for normality, and then design the experiment in hand based on the results of this
  - This is almost always impractical
- For well-used experimental designs, an almost-equivalent approach is to follow customary procedure
  - Essentially assuming this has been carried out in some way already

How much difference does it make?
- The central limit theorem ensures that parametric tests work well with non-normal distributions if the sample is large enough
  - How large is large enough? Depends on the distribution!
  - For most distributions, sample sizes in the range of dozens will remove any issues with normality
  - You will still increase your statistical power by using a transformation if appropriate
- Conversely, if the data really come from a normally distributed population and you choose a non-parametric test, you will lose statistical power
  - For large samples, however, the difference is minimal
- Small samples present problems:
  - Non-parametric tests have very little power for small samples
  - Parametric tests can give misleading results for small samples if the population data are non-normal
  - Tests for normality are not helpful for small samples
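The counting procedure for the Mann-Whitney U statistic, and the small-sample limit on its p-value, can be sketched as below. This is an illustrative sketch assuming Python with NumPy and SciPy 1.7 or later (for the `method="exact"` option). The function names and example data are made up for this example, and counting tied pairs as one half is a common convention that the slides do not specify.

```python
from math import comb

import numpy as np
from scipy import stats


def mann_whitney_u(group1, group2):
    """Return (U1, U2) by direct pair counting; tied pairs count as 0.5."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    # For each point in group 1, count the points in group 2 that are smaller.
    greater = np.sum(g1[:, None] > g2[None, :])
    ties = np.sum(g1[:, None] == g2[None, :])
    u1 = float(greater + 0.5 * ties)
    # Use the identity U1 + U2 = n1 * n2 for the second statistic.
    u2 = float(g1.size * g2.size - u1)
    return u1, u2


def smallest_two_sided_p(n1, n2):
    """Smallest exact two-sided p-value attainable with no ties.

    Under the null hypothesis all C(n1 + n2, n1) assignments of ranks to
    group 1 are equally likely, and only 2 of them are the most extreme.
    """
    return 2 / comb(n1 + n2, n1)


a = [1.1, 2.3, 2.9, 3.8]  # made-up measurements, group 1
b = [4.2, 5.0, 5.7, 6.1]  # made-up measurements, group 2
u1, u2 = mann_whitney_u(a, b)
print("U1 =", u1, "U2 =", u2, "U =", min(u1, u2))

result = stats.mannwhitneyu(a, b, alternative="two-sided", method="exact")
print("SciPy exact two-sided p-value:", result.pvalue)

for n1, n2 in ((3, 3), (3, 4), (4, 4)):
    print(f"n1={n1}, n2={n2}: smallest possible p =",
          round(smallest_two_sided_p(n1, n2), 4))
```

With complete separation of two groups of four, the exact two-sided p-value is 2/70. With three and four values (seven in total) the smallest attainable value is 2/35, which is above 0.05, in line with the small-sample remark above.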

Conclusions
- The bottom-line conclusion is that large samples are better than small samples
  - In general, the larger the better
  - Of course, it can be time-consuming and/or expensive to analyze large samples
- If your experimental design is going to use a small sample, you need to be able to justify that the data come from a normally distributed population
  - If this is a common experimental design that is conventionally analyzed this way, that may be good enough
  - For a new methodology, you should really perform an independent experiment with a large sample to test for normality first
    - Use the results of this to guide the data analysis for future experiments

Computationally-intensive non-parametric methods
- The non-parametric methods we examined worked by analyzing the ranks of the data
- Another class of non-parametric tests is the class of computationally-intensive methods
- There are two subclasses:
  - Permutation or randomization tests:
    - Simulate the null distribution by repeatedly randomly reassigning group labels
    - Compare the "real" data to the generated null distribution
  - Bootstrapping techniques:
    - Effectively generate many samples from the population by resampling from the original sample
    - Look at the distribution of summary data from the generated samples
- These techniques still require a reasonable sample size to begin with
  - Big enough to generate enough distinct permutations or bootstraps

Outliers
- Outliers are values in the data that are far from the other values
- They occur for several reasons:
  - Invalid data entry
  - Experimental mistakes
  - Random chance
    - In any distribution, some values are far from the others
    - In a normal distribution, these values are rarer, but still exist
  - Biological diversity
    - If your samples are from patient or animal samples, the outlier may be correct and due to biological diversity
    - May be an interesting finding!
  - Wrong assumptions
    - For example, in a lognormal distribution, some values are far from the others

Why test for outliers
- The presence of erroneous outliers, or assuming the wrong distribution, can introduce spurious results or mask real results
- Trying to detect outliers without a test can be problematic
  - We tend to want to observe patterns in data
  - Anything that appears to be counter to these patterns seems to be an outlier
  - We tend to see too many outliers

Before testing for outliers
Before testing for outliers:
- Check the data entry
  - Errors here can often be fixed
- Were there problems with the experiment?
  - If errors were observed during the experiment, remove data associated with those errors
  - Many experimental protocols have quality control measures
- Is it possible your data are not normally distributed?
  - Most outlier tests assume the (non-outlier) data are normally distributed
- Was there anything different about any of the samples?
  - Was one of the mice phenotypically different, etc.?

Outlier tests
- After addressing the concerns on the previous slide, if you still suspect an outlier you can run an outlier test
- Outlier tests answer the following question:
  - "If the data were sampled from a normal distribution, what is the chance of observing one value as far from the others as is in the observed data?"
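Returning briefly to the permutation tests described under the computationally-intensive methods above, the idea of reassigning group labels can be sketched as follows. This is a minimal illustration assuming Python with NumPy. The function name, the choice of the difference in means as the test statistic, the number of resamples, and the add-one correction are all choices made for this example rather than anything prescribed by the lecture.

```python
import numpy as np


def permutation_test_mean_diff(group1, group2, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an estimated p-value.
    """
    rng = np.random.default_rng(seed)
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    observed = g1.mean() - g2.mean()
    pooled = np.concatenate([g1, g2])

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign group labels by shuffling the pooled values.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:g1.size].mean() - shuffled[g1.size:].mean()
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return observed, p_value


control = [1, 2, 3, 4, 5, 6, 7, 8]          # made-up values
treated = [11, 12, 13, 14, 15, 16, 17, 18]  # made-up values
observed, p_value = permutation_test_mean_diff(control, treated)
print(f"observed difference = {observed:.2f}, permutation p = {p_value:.4f}")
```

Two groups of eight allow 12,870 distinct label assignments, which is why these methods still need a reasonable sample size to begin with: with very few values the generated null distribution has too few distinct points to yield a small p-value.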

Results of an outlier test
- If an outlier test results in a small p-value, then the conclusion is that the outlying value is (probably) not from the same distribution as the other values
  - Justifies excluding it from the analysis
- If the outlier test results in a high p-value, there is no evidence the value came from a different distribution
  - Doesn't prove it did come from the same distribution, just that there is no strong evidence to the contrary

Guidelines on removing outliers
- If you address all the previous concerns, and an outlier test gives strong evidence of an outlier, then it is legitimate to remove it from the analysis
- The rules for eliminating outliers should be established before you generate the data
- You should report the number of outliers removed and the rationale for doing so in any publication using the data

How outlier tests work
- Outlier tests work by computing the difference between the extreme value and some measure of central tendency
- That value is typically divided by a measure of the variability
- The resulting ratio is compared with a table or expected distribution of those values

Grubbs' outlier test
- Grubbs' outlier test calculates the difference between the extreme value and the mean of all values (including the extreme value), and divides by the standard deviation
- The resulting value is then compared to a table of critical values
  - The critical value depends on the sample size
- If the value is larger than the critical value, then the extreme value can be considered an outlier

Demo
- We'll experiment with the GRHL2 Basal-A and Basal-B data sets in GraphPad, checking for outliers and testing for normality.
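The Grubbs' calculation described above can be sketched in a few lines. This sketch assumes Python with NumPy and SciPy rather than the GraphPad workflow used in the demo. Instead of looking the critical value up in a table, it computes the two-sided critical value from the t distribution, which is the standard closed-form expression behind such tables. The function name, the example measurements, and the 0.05 significance level are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier (needs at least 3 values).

    Returns (suspect_value, G, critical_value, is_outlier).
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()           # mean of all values, including the extreme one
    sd = x.std(ddof=1)        # sample standard deviation
    idx = int(np.argmax(np.abs(x - mean)))
    g = abs(x[idx] - mean) / sd

    # Critical value computed from the t distribution in place of a table.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return float(x[idx]), float(g), float(g_crit), bool(g > g_crit)


# Made-up measurements with one suspicious value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 14.5]
suspect, g, g_crit, is_outlier = grubbs_test(data)
print(f"suspect={suspect}, G={g:.3f}, critical={g_crit:.3f}, outlier={is_outlier}")
```

The test assumes the remaining values are normally distributed, so the earlier checks (data entry, experimental problems, a possibly lognormal distribution) still need to be made before acting on the result.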

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

Difference tests (2): nonparametric

Difference tests (2): nonparametric NST 1B Experimental Psychology Statistics practical 3 Difference tests (): nonparametric Rudolf Cardinal & Mike Aitken 10 / 11 February 005; Department of Experimental Psychology University of Cambridge

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Analyzing Data with GraphPad Prism

Analyzing Data with GraphPad Prism 1999 GraphPad Software, Inc. All rights reserved. All Rights Reserved. GraphPad Prism, Prism and InStat are registered trademarks of GraphPad Software, Inc. GraphPad is a trademark of GraphPad Software,

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

Statistics for Sports Medicine

Statistics for Sports Medicine Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

More information

Statistics in Medicine Research Lecture Series CSMC Fall 2014

Statistics in Medicine Research Lecture Series CSMC Fall 2014 Catherine Bresee, MS Senior Biostatistician Biostatistics & Bioinformatics Research Institute Statistics in Medicine Research Lecture Series CSMC Fall 2014 Overview Review concept of statistical power

More information

Version 4.0. Statistics Guide. Statistical analyses for laboratory and clinical researchers. Harvey Motulsky

Version 4.0. Statistics Guide. Statistical analyses for laboratory and clinical researchers. Harvey Motulsky Version 4.0 Statistics Guide Statistical analyses for laboratory and clinical researchers Harvey Motulsky 1999-2005 GraphPad Software, Inc. All rights reserved. Third printing February 2005 GraphPad Prism

More information

The InStat guide to choosing and interpreting statistical tests

The InStat guide to choosing and interpreting statistical tests Version 3.0 The InStat guide to choosing and interpreting statistical tests Harvey Motulsky 1990-2003, GraphPad Software, Inc. All rights reserved. Program design, manual and help screens: Programming:

More information

Data Transforms: Natural Logarithms and Square Roots

Data Transforms: Natural Logarithms and Square Roots Data Transforms: atural Log and Square Roots 1 Data Transforms: atural Logarithms and Square Roots Parametric statistics in general are more powerful than non-parametric statistics as the former are based

More information

How To Understand The Big Data Paradigm

How To Understand The Big Data Paradigm Big Data and Its Empiricist Founda4ons Teresa Scantamburlo The evolu4on of Data Science The mechaniza4on of induc4on The business of data The Big Data paradigm (data + computa4on) Cri4cal analysis Tenta4ve

More information

Nonparametric statistics and model selection

Nonparametric statistics and model selection Chapter 5 Nonparametric statistics and model selection In Chapter, we learned about the t-test and its variations. These were designed to compare sample means, and relied heavily on assumptions of normality.

More information

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other 1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem)

NONPARAMETRIC STATISTICS 1. depend on assumptions about the underlying distribution of the data (or on the Central Limit Theorem) NONPARAMETRIC STATISTICS 1 PREVIOUSLY parametric statistics in estimation and hypothesis testing... construction of confidence intervals computing of p-values classical significance testing depend on assumptions

More information

CHAPTER 14 NONPARAMETRIC TESTS

CHAPTER 14 NONPARAMETRIC TESTS CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences

More information

MEASURES OF LOCATION AND SPREAD

MEASURES OF LOCATION AND SPREAD Paper TU04 An Overview of Non-parametric Tests in SAS : When, Why, and How Paul A. Pappas and Venita DePuy Durham, North Carolina, USA ABSTRACT Most commonly used statistical procedures are based on the

More information

1 Nonparametric Statistics

1 Nonparametric Statistics 1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions

More information

Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES

STATISTICAL SIGNIFICANCE OF RANKING PARADOXES STATISTICAL SIGNIFICANCE OF RANKING PARADOXES Anna E. Bargagliotti and Raymond N. Greenwell Department of Mathematical Sciences and Department of Mathematics University of Memphis and Hofstra University

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Introduction to Minitab and basic commands. Manipulating data in Minitab Describing data; calculating statistics; transformation.

Introduction to Minitab and basic commands. Manipulating data in Minitab Describing data; calculating statistics; transformation. Computer Workshop 1 Part I Introduction to Minitab and basic commands. Manipulating data in Minitab Describing data; calculating statistics; transformation. Outlier testing Problem: 1. Five months of nickel

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics J. Lozano University of Goettingen Department of Genetic Epidemiology Interdisciplinary PhD Program in Applied Statistics & Empirical Methods Graduate Seminar in Applied Statistics

More information

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the

More information

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Name: 1. The basic idea behind hypothesis testing: A. is important only if you want to compare two populations. B. depends on

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U

CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U Previous chapters of this text have explained the procedures used to test hypotheses using interval data (t-tests and ANOVA s) and nominal

More information

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

Error Type, Power, Assumptions. Parametric Tests. Parametric vs. Nonparametric Tests

Error Type, Power, Assumptions. Parametric Tests. Parametric vs. Nonparametric Tests Error Type, Power, Assumptions Parametric vs. Nonparametric tests Type-I & -II Error Power Revisited Meeting the Normality Assumption - Outliers, Winsorizing, Trimming - Data Transformation 1 Parametric

More information

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric

More information

MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST

MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TEST Zahayu Md Yusof, Nurul Hanis Harun, Sharipah Sooad Syed Yahaya & Suhaida Abdullah School of Quantitative Sciences College of Arts and

More information

9/21/15. Research Educa4on Solu4ons A NEW LANGUAGE FOR LEADERSHIP TRANSFORMING PERFORMANCE MANAGEMENT: AN ELI LILLY CASE STUDY

9/21/15. Research Educa4on Solu4ons A NEW LANGUAGE FOR LEADERSHIP TRANSFORMING PERFORMANCE MANAGEMENT: AN ELI LILLY CASE STUDY A NEW LANGUAGE FOR LEADERSHIP TRANSFORMING PERFORMANCE MANAGEMENT: AN ELI LILLY CASE STUDY Research Educa4on Solu4ons Dr. David Rock, Director, NeuroLeadership Ins4tute Mark Ferrara, VP of Talent Management,

More information

Skewed Data and Non-parametric Methods

Skewed Data and Non-parametric Methods 0 2 4 6 8 10 12 14 Skewed Data and Non-parametric Methods Comparing two groups: t-test assumes data are: 1. Normally distributed, and 2. both samples have the same SD (i.e. one sample is simply shifted

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

Inference for two Population Means

Inference for two Population Means Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

NAG C Library Chapter Introduction. g08 Nonparametric Statistics

NAG C Library Chapter Introduction. g08 Nonparametric Statistics g08 Nonparametric Statistics Introduction g08 NAG C Library Chapter Introduction g08 Nonparametric Statistics Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Parametric and Nonparametric

More information
