# MEASURES OF LOCATION AND SPREAD


Paper TU04

An Overview of Non-parametric Tests in SAS: When, Why, and How

Paul A. Pappas and Venita DePuy, Durham, North Carolina, USA

## ABSTRACT

Most commonly used statistical procedures are based on the assumption that distributions are normal. But what if they are not? When should non-parametric statistics be used, and what assumptions do those require? If your data don't meet the assumptions for either approach, what should you do? This paper provides an easy guide to choosing the most appropriate statistical test, whether parametric or non-parametric; how to perform that test in SAS; and how to interpret the results. SAS procedures covered include PROC ANOVA, NPAR1WAY, TTEST, and UNIVARIATE. Discussion includes assumptions and assumption violations, robustness, and exact versus approximate tests.

## INTRODUCTION

Many statistical tests rely heavily on distributional assumptions, such as normality. When these assumptions are not satisfied, commonly used statistical tests often perform poorly, resulting in a greater chance of committing an error. Non-parametric tests are designed to have desirable statistical properties when few assumptions can be made about the underlying distribution of the data. In other words, when the data are obtained from a non-normal distribution or one containing outliers, a non-parametric test is often a more powerful statistical tool than its parametric normal-theory equivalent.

We will explore the use of parametric and non-parametric tests for one- and two-sample location differences, two-sample dispersion differences, and a one-way layout analysis. We will also examine testing for general differences between populations.

## MEASURES OF LOCATION AND SPREAD

Mean and variance are typically used to describe the center and spread of normally distributed data. If the data are not normally distributed or contain outliers, these measures may not be robust enough to accurately describe the data.
The median is a more robust measure of the center of a distribution, in that it is not as heavily influenced by outliers and skewed data. As a result, the median is typically used with non-parametric tests. The spread is less easy to quantify but is often represented by the interquartile range, which is simply the difference between the third and first quartiles.

## DETERMINING NORMALITY (OR LACK THEREOF)

One of the first steps in test selection should be investigating the distribution of the data. PROC UNIVARIATE can be used to determine whether or not your data are normal. This procedure generates a variety of summary statistics, such as the mean and median, as well as numerical representations of properties such as skewness and kurtosis.

If the population from which the data are obtained is normal, the mean and median should be equal or close to equal. The skewness coefficient, which is a measure of symmetry, should be near zero. Positive values for the skewness coefficient indicate that the data are right skewed, and negative values indicate that the data are left skewed. The kurtosis coefficient, which describes how peaked or heavy-tailed the distribution is relative to a normal distribution, should also be near zero (SAS reports kurtosis so that a normal distribution scores zero). Positive values for the kurtosis coefficient indicate that the distribution of the data is steeper than a normal distribution, and negative values indicate that it is flatter than a normal distribution.

The NORMAL option in PROC UNIVARIATE produces a table with tests for normality. In general, if the p-values are less than 0.05, then the data should be considered non-normally distributed. However, it is important to remember that these tests are heavily dependent on sample size. Strikingly non-normal data may have a p-value greater than 0.05 simply because the sample is small. Therefore, graphical representations of the data should always be examined. Low-resolution plots and high-resolution histograms are both available in PROC UNIVARIATE.
The PLOTS option in PROC UNIVARIATE creates low-resolution stem-and-leaf, box, and normal probability plots. The stem-and-leaf plot is used to visualize the overall distribution of the data, and the box plot is a graphical representation of the 5-number summary. The normal probability plot is designed to investigate whether a variable is normally distributed. If the data are normal, then the plot should display a straight diagonal line. Different departures from the straight diagonal line indicate different types of departures from normality.
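As a rough illustration of the checks described above, the following sketch builds a small right-skewed sample and requests the location, spread, and shape statistics along with the normality tests and low-resolution plots. The dataset name `demo`, the variable `score`, and the values are made up for illustration; they are assumptions and do not come from the paper.

```sas
/* Hypothetical, made-up sample with one large value (right skew) */
data demo;
    input score @@;
    datalines;
2.1 2.4 2.5 2.7 3.0 3.1 3.3 3.6 4.2 9.8
;
run;

/* Mean vs. median, interquartile range (QRANGE), skewness, kurtosis */
proc means data=demo n mean median q1 q3 qrange skewness kurtosis;
    var score;
run;

/* Tests for normality plus stem-and-leaf, box, and normal probability plots */
proc univariate data=demo normal plots;
    var score;
run;
```

With a sample like this one, the mean would be expected to sit above the median and the skewness coefficient to be positive, which is the pattern the text associates with right-skewed data.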

The HISTOGRAM statement in PROC UNIVARIATE will produce high-resolution histograms. When the NORMAL option is specified on the HISTOGRAM statement, the histogram will have a line indicating the shape of a normal distribution with the same mean and variance as the sample.

PROC UNIVARIATE is an invaluable tool in visualizing and summarizing data in order to gain an understanding of the underlying populations from which the data are obtained. To produce these results, the following code can be used. Omitting the VAR statement will run the analysis on all the numeric variables in the dataset.

    PROC UNIVARIATE data=datafile normal plots;
      Histogram / normal;
      Var variable1 variable2 ... variablen;
    RUN;

The determination of the normality of the data should result from evaluation of the graphical output in conjunction with the numerical output. In addition, the user might wish to look at subsets of the data; for example, a CLASS statement might be used to stratify by gender.

## DIFFERENCES IN DEPENDENT POPULATIONS

Testing for the difference between two dependent populations, such as before and after measurements on the same subjects, is typically done by testing for a difference in centers of distribution (means or medians). The paired t-test looks for a difference in means, while the non-parametric sign and signed rank tests look for a difference in medians. It is assumed that the spreads and shapes of the two populations are the same. In this situation, the data are paired; two observations are obtained on each of n subjects, and the 2n observations reduce to a single sample of n paired differences.

### PAIRED T-TEST

If the data are normal, the one-sample paired t-test is the best statistical test to implement. Assumptions for the one-sample paired t-test are that the paired differences are independent and identically normally distributed. To perform a one-sample paired t-test in SAS, either PROC UNIVARIATE (applied to the paired differences, as described in the next section) or the following PROC TTEST code can be used:
    PROC TTEST data=datafile;
      Paired variable1*variable2;
    RUN;

If the p-value for the paired t-test is less than the specified alpha level, then there is evidence to suggest that the population means of the two variables differ. In the case of measurements taken before and after a treatment, this would suggest a treatment effect.

### SIGN TEST AND SIGNED RANK TEST

The signed rank test and the sign test are non-parametric equivalents to the one-sample paired t-test. Neither of these tests requires the data to be normally distributed, but both require that the observed differences between the paired observations be mutually independent and that each paired difference comes from a continuous population with a common median. The signed rank test additionally requires each of those populations to be symmetric about the common median. However, the observed paired differences do not necessarily have to be obtained from the same underlying distribution. In general, the signed rank test has more statistical power than the sign test, but if the data contain outliers or are obtained from a heavy-tailed distribution, the sign test may have more statistical power.

PROC UNIVARIATE produces both of these tests as well as a paired t-test. Before PROC UNIVARIATE can be used to carry out these tests, the paired differences must be computed, as in the following code:

    DATA datafile;
      Set datafile;
      Difference=variable1-variable2;
    RUN;

    PROC UNIVARIATE data=datafile;
      Var difference;
    RUN;

If the p-values for the signed rank or sign tests are less than the specified alpha level, then there is evidence to suggest that the population medians of the two variables differ.

## DIFFERENCES BETWEEN TWO INDEPENDENT POPULATIONS

Investigators may be interested in the difference between centers of distributions, the difference in distributional spreads, or simply any differences between two populations. PROC TTEST allows the user to test for differences in means for both equal and unequal variances, and also provides a test for differences in variance.
The non-parametric methods provide different tests, depending on the hypothesis of interest. The Wilcoxon Rank Sum test investigates differences in medians, with the assumption of identical spreads. The more generalized Kolmogorov-Smirnov test looks for differences in center, spread, or both. It should be noted that two-stage calculations should not be used. For instance, using the Wilcoxon Rank Sum test for differences in medians followed by the Ansari-Bradley test for differences in spread does not protect the nominal significance level. When overall differences between populations need to be investigated, consider the Kolmogorov-Smirnov test.

### TWO-SAMPLE T-TEST

If the data consist of two samples from independent, normally distributed populations, the two-sample t-test is the best statistical test to implement. Assumptions for the two-sample t-test and Folded F test are that within each sample the observations are independent and identically normally distributed. Also, the two samples must be independent of each other. To perform these tests in SAS, the following code can be used:

    PROC TTEST data=datafile;
      Class sample;
      Var variable1;
    RUN;

The CLASS statement is necessary in order to distinguish between the two samples. The output from these statements includes the two-sample t-test for both equal and unequal variances, as well as the Folded F test for equality of variances. The equality of variance results should be reviewed first, and used to determine which results from the t-test are applicable. If the p-value for the t-test is less than the specified alpha level, then there is evidence to suggest that the two population means differ.

### WILCOXON RANK SUM TEST

The Wilcoxon Rank Sum test (which is numerically equivalent to the Mann-Whitney U test) is the non-parametric equivalent to the two-sample t-test. This test can also be performed if only rankings (i.e., ordinal data) are available. It tests the null hypothesis that the two distributions are identical against the alternative hypothesis that the two distributions differ only with respect to the median.
Assumptions for this test are that within each sample the observations are independent and identically distributed, and that the shapes and spreads of the distributions are the same. It does not matter which distribution the data are obtained from, as long as the observations within each sample are randomly selected from the same underlying population. Also, the two samples must be independent of each other. The following code will perform the rank sum test in SAS:

    PROC NPAR1WAY data=datafile wilcoxon;
      Class sample;
      Var variable1;
    RUN;

The Wilcoxon Rank Sum test will be performed even without the WILCOXON option; however, this option limits the amount of other output. The EXACT statement is optional and requests that exact tests be performed. Approximate tests perform well when the sample size is sufficiently large. When the sample size is small or the data are sparse, skewed, or heavy-tailed, approximate tests may be unreliable. However, exact tests are computationally intensive and may require computer algorithms to run for hours or days. Thus, if your sample size is sufficiently large, it may not be worth your while to request exact tests.

If the p-value is less than the specified alpha level, then there is evidence to suggest that the two population medians differ. It should be noted that when the variances of the treatment groups are heterogeneous, the Type I error probabilities in the Wilcoxon Rank Sum test can increase by as much as 300%, with no indication that they asymptotically approach the nominal significance level as the sample size increases. Therefore, care should be taken when assuming that variances are, in fact, equal.

### ANSARI-BRADLEY TEST

In some instances, it may be necessary to test for differences in spread while assuming that the centers of two populations are identical. One example is comparing two assay methods to see which is more precise. The Ansari-Bradley test is the non-parametric equivalent to the Folded F test for equality of variances.
Assumptions for this test are that within each sample the observations are independent and identically distributed. Also, the two samples must be independent of each other, with equal medians. To perform an Ansari-Bradley test in SAS, the following code can be used:

    PROC NPAR1WAY data=datafile ab;
      Class sample;
      Var variable1;
    RUN;

The EXACT statement may also be used if sample sizes are small. If the p-value is less than the specified alpha level, then there is evidence to suggest that the spreads of the two populations are not identical.
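The assay comparison mentioned above can be sketched on a small made-up dataset. The dataset name `assay`, the variables `method` and `reading`, and all values are hypothetical; the two groups are constructed to share a similar center while differing in spread, which is the setting the Ansari-Bradley test assumes.

```sas
/* Hypothetical readings from two assay methods, both centered near 10 */
data assay;
    input method $ reading @@;
    datalines;
A 10.1 A 9.9 A 10.0 A 10.2 A 9.8 A 10.0
B 11.5 B 8.4 B 10.0 B 12.1 B 7.9 B 10.1
;
run;

/* Ansari-Bradley test for a difference in spread between the methods */
proc npar1way data=assay ab;
    class method;
    var reading;
    exact ab;   /* optional: exact p-value, reasonable here with 12 observations */
run;
```

Replacing `ab` with `wilcoxon` or `edf` on the PROC statement would request the rank sum or Kolmogorov-Smirnov analysis instead. In keeping with the warning about two-stage calculations, the hypothesis of interest should be chosen before the data are analyzed rather than by running the tests in sequence.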

### KOLMOGOROV-SMIRNOV TEST

In many cases, the Kolmogorov-Smirnov test may be the most appropriate of the non-parametric tests for overall differences between two groups. If it can be assumed that the spreads and shapes of the two distributions are the same, the Wilcoxon Rank Sum test is more powerful than the Kolmogorov-Smirnov test; if the medians and shapes are the same, the Ansari-Bradley test is more powerful.

This test is for the general hypothesis that the two populations differ. The assumptions are that the data are independent and identically distributed within each of the two populations, and that the two samples are independent. To perform this test in SAS, use the following code:

    PROC NPAR1WAY data=datafile edf;
      Class sample;
      Var variable1;
    RUN;

As with the other non-parametric tests, the EXACT statement is available for the Kolmogorov-Smirnov test, but is only recommended for small sample sizes or sparse, skewed, or heavy-tailed data. If the p-value is less than the specified alpha level, then there is evidence to suggest that the two populations are not the same.

## DIFFERENCES IN THREE OR MORE INDEPENDENT POPULATIONS

In this situation the data consist of more than two random samples.

### ONE-WAY ANOVA

If the data are normal, then a one-way ANOVA is the best statistical test to implement. Assumptions for a one-way ANOVA are that within each sample the observations are independent and identically normally distributed. Also, the samples must be independent of each other with equal population variances. To perform a one-way ANOVA in SAS, the following code can be used:

    PROC ANOVA data=datafile;
      Class sample;
      Model variable=sample;
    RUN;

If the p-value is less than the specified alpha level, then there is evidence to suggest that at least one of the population means differs from the others. Further investigation is required to determine which specific population means can be considered statistically significantly different from each other.
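The "further investigation" step is not specified in the paper. As one possible sketch, the MEANS statement in PROC ANOVA can request pairwise comparisons; Tukey's method is used here purely as an assumption, not as the authors' recommendation. The dataset `threegroups`, the variables `sample` and `response`, and the values are made up for illustration.

```sas
/* Hypothetical data: three independent samples of five observations each */
data threegroups;
    input sample $ response @@;
    datalines;
A 23 A 25 A 21 A 24 A 22
B 28 B 30 B 27 B 31 B 29
C 24 C 26 C 23 C 27 C 25
;
run;

/* One-way ANOVA followed by Tukey pairwise comparisons of the group means */
proc anova data=threegroups;
    class sample;
    model response = sample;
    means sample / tukey;
run;
quit;
```

The overall F test addresses whether any mean differs; the Tukey output then indicates which pairs of sample means differ while controlling the experiment-wise error rate.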
### KRUSKAL-WALLIS TEST

The Kruskal-Wallis test is the non-parametric equivalent to the one-way ANOVA. Assumptions for the Kruskal-Wallis test are that within each sample the observations are independent and identically distributed and that the samples are independent of each other. To perform a Kruskal-Wallis test in SAS, the following code can be used:

    PROC NPAR1WAY data=datafile wilcoxon;
      Class sample;
      Var variable1;
    RUN;

This block of code is identical to the code used to produce the Wilcoxon Rank Sum test. In fact, the Kruskal-Wallis test reduces to the rank sum test when there are only two samples. The EXACT statement is optional and requests that exact tests be performed in addition to the large-sample approximation. As before, the exact computation is very intensive and should only be requested if needed.

If the p-value is less than the specified alpha level, then there is evidence to suggest that at least one of the population medians differs from the others. Further investigation is required to determine which specific population medians can be considered statistically significantly different from each other.

## CONCLUSION

Statistical analyses may be invalid if the assumptions behind those tests are violated. Prior to conducting analyses, the distribution of the data should be examined for departures from normality, such as skewness or outliers. If the data are normally distributed, and other assumptions are met, parametric tests are the most powerful. If the data are non-normal but other criteria are met, non-parametric statistics provide valid analyses. When neither set of assumptions has been met, both tests should be implemented to see if they agree. Also, further research should be done to discover whether the appropriate parametric or non-parametric test is the most robust to the specific data issues. This paper discusses the most commonly used non-parametric tests and their parametric equivalents.
A variety of other non-parametric tests are available in SAS, both in PROC NPAR1WAY and in other procedures such as PROC FREQ. Further technical information is available in the SAS OnlineDoc.
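The suggestion to run both analyses and check whether they agree can be sketched as follows for a three-sample layout. The dataset `compare`, the variables `sample` and `response`, and the values (including one deliberately extreme observation in group C) are hypothetical and serve only to illustrate the workflow.

```sas
/* Hypothetical data; the value 60 in group C is an intentional outlier */
data compare;
    input sample $ response @@;
    datalines;
A 23 A 25 A 21 A 24 A 22
B 28 B 30 B 27 B 31 B 29
C 24 C 26 C 23 C 27 C 60
;
run;

/* Parametric analysis: one-way ANOVA */
proc anova data=compare;
    class sample;
    model response = sample;
run;
quit;

/* Non-parametric analysis: with three classes, WILCOXON gives Kruskal-Wallis */
proc npar1way data=compare wilcoxon;
    class sample;
    var response;
run;
```

If the two p-values lead to the same conclusion, the result is less likely to be an artifact of a violated assumption; if they disagree, the data issue behind the disagreement deserves a closer look before either result is reported.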

## REFERENCES

Hollander, M. and Wolfe, D.A. (1973), Nonparametric Statistical Methods, New York: John Wiley & Sons, Inc.

SAS Institute (2003), SAS OnlineDoc, Version 8, Cary, NC: SAS Institute, Inc.

UCLA Academic Technology Services, SAS Class Notes 2.0: Analyzing Data (July 9, 2004).

Zimmerman, D.W. (1998), Invalidation of parametric and nonparametric statistical tests by concurrent violation of two assumptions, Journal of Experimental Education, 67.

## CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors, Paul A. Pappas and Venita DePuy, in Durham, NC.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.


### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### An introduction to using Microsoft Excel for quantitative data analysis

Contents An introduction to using Microsoft Excel for quantitative data analysis 1 Introduction... 1 2 Why use Excel?... 2 3 Quantitative data analysis tools in Excel... 3 4 Entering your data... 6 5 Preparing

### NCSS Statistical Software. One-Sample T-Test

Chapter 205 Introduction This procedure provides several reports for making inference about a population mean based on a single sample. These reports include confidence intervals of the mean or median,

### THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

### Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions

Are Histograms Giving You Fits? New SAS Software for Analyzing Distributions Nathan A. Curtis, SAS Institute Inc., Cary, NC Abstract Exploring and modeling the distribution of a data sample is a key step

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Descriptive and Inferential Statistics

General Sir John Kotelawala Defence University Workshop on Descriptive and Inferential Statistics Faculty of Research and Development 14 th May 2013 1. Introduction to Statistics 1.1 What is Statistics?

### EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions

### STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE

STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE Perhaps Microsoft has taken pains to hide some of the most powerful tools in Excel. These add-ins tools work on top of Excel, extending its power and abilities

### Come scegliere un test statistico

Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

### Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

### Analysis of Questionnaires and Qualitative Data Non-parametric Tests

Analysis of Questionnaires and Qualitative Data Non-parametric Tests JERZY STEFANOWSKI Instytut Informatyki Politechnika Poznańska Lecture SE 2013, Poznań Recalling Basics Measurment Scales Four scales

### Statistics for Sports Medicine

Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

### Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

Babraham Bioinformatics Introduction to Statistics with SPSS (15.0) Version 2.3 (public) Introduction to Statistics with SPSS 2 Table of contents Introduction... 3 Chapter 1: Opening SPSS for the first

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Version 4.0. Statistics Guide. Statistical analyses for laboratory and clinical researchers. Harvey Motulsky

Version 4.0 Statistics Guide Statistical analyses for laboratory and clinical researchers Harvey Motulsky 1999-2005 GraphPad Software, Inc. All rights reserved. Third printing February 2005 GraphPad Prism

### Chapter G08 Nonparametric Statistics

G08 Nonparametric Statistics Chapter G08 Nonparametric Statistics Contents 1 Scope of the Chapter 2 2 Background to the Problems 2 2.1 Parametric and Nonparametric Hypothesis Testing......................

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### t-test Statistics Overview of Statistical Tests Assumptions

t-test Statistics Overview of Statistical Tests Assumption: Testing for Normality The Student s t-distribution Inference about one mean (one sample t-test) Inference about two means (two sample t-test)

### Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

### P(every one of the seven intervals covers the true mean yield at its location) = 3.

1 Let = number of locations at which the computed confidence interval for that location hits the true value of the mean yield at its location has a binomial(7,095) (a) P(every one of the seven intervals

### Week 1. Exploratory Data Analysis

Week 1 Exploratory Data Analysis Practicalities This course ST903 has students from both the MSc in Financial Mathematics and the MSc in Statistics. Two lectures and one seminar/tutorial per week. Exam

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Parametric and non-parametric statistical methods for the life sciences - Session I

Why nonparametric methods What test to use? Rank Tests Parametric and non-parametric statistical methods for the life sciences - Session I Liesbeth Bruckers Geert Molenberghs Interuniversity Institute

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Rank-Based Non-Parametric Tests

Rank-Based Non-Parametric Tests Reminder: Student Instructional Rating Surveys You have until May 8 th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

### A Correlation of. to the. South Carolina Data Analysis and Probability Standards

A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards

### Analysis of Student Retention Rates and Performance Among Fall, 2013 Alternate College Option (ACO) Students

1 Analysis of Student Retention Rates and Performance Among Fall, 2013 Alternate College Option (ACO) Students A Study Commissioned by the Persistence Committee Policy Working Group prepared by Jeffrey

### Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

### 12: Analysis of Variance. Introduction

1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

### SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR