Rank-Based Non-Parametric Tests




Reminder: Student Instructional Rating Surveys
You have until May 8th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs. The survey should be available on any device with a full-featured web browser. Please take the time to fill it out. Your answers:
- Will be anonymous
- Will help me to improve my teaching strategies and the structure of the course
- Will help the department in planning and designing future courses
- Will be used by the university in promotion, tenure, and reappointment decisions

Parametric and Nonparametric Tests
Most of the statistical tests that we have used throughout the semester have relied on certain specific assumptions about the distribution of the involved variables and/or their means, and have been set up to test hypotheses about specific population parameters. Such tests (z-tests, t-tests, ANOVAs, Pearson's correlation) are called parametric tests.

Parametric and Nonparametric Tests
Though these parametric tests are robust to minor violations of their assumptions, they can lead to gross systematic errors when the data strongly violate the underlying assumptions, and they can even be undefined for certain types of data (e.g., nominal or non-numerical data). Certain tests do not rely on specific distributional assumptions or test hypotheses about particular population parameters. These tests are generally called nonparametric tests. The chi-square tests introduced in the last lecture and Spearman's rank correlation coefficient test are examples of nonparametric tests.

Parametric versus Nonparametric Tests
The advantages of parametric tests are that:
- They are more powerful (i.e., you can detect smaller effect sizes with smaller samples) than comparable nonparametric tests when the parametric assumptions are correct (or approximately correct)
- The hypothesis tests are more specific and easier to interpret
The advantages of nonparametric tests are that:
- They can be used when the distribution of the population is completely unknown
- They tend to be more robust to ill-behaved (e.g., non-normal, heteroscedastic, and multi-modal) data
- They are less sensitive to outliers
- They can be used with nominal and ordinal data

Nonparametric Tests
In the last 70 years or so, statisticians have developed many different nonparametric tests. Those most widely used in the behavioral sciences tend to be rank randomization tests. Rank randomization (or rank-permutation) tests are hypothesis tests based on the theoretical distribution of randomly assigned ranks. As a first step, they all require the conversion of raw scores to ordinal ranks. This makes them obvious candidates for ordinal data, though these data usually still need to be modified.

Rank Randomization Tests
Advantages of rank-based tests:
1. Ranks are simpler, and rank-based tests are easier to compute
2. They are largely insensitive to the particular form of the population distributions and to differences between the distributions underlying the scores in different samples
3. They tend to minimize the effects of large sample variances
4. They are insensitive to outlier scores and make it easier to deal with undetermined scores (e.g., time to task completion)
5. The distribution of randomly assigned ranks can be computed exactly

[Tables: the 720 permutations of ranks 1-6 assigned to two samples of three scores each. The first table lists the Sample 1 and Sample 2 rank assignments for each permutation; the second adds the rank sums R1 = Σr and R2 = Σr for the two samples and the statistic Ws = min(R1, R2), whose smallest possible value is 6 (ranks 1, 2, 3 together in one sample). Enumerating the permutations gives the exact distribution of Ws.]


Rank Randomization Tests
Popular rank-based tests:
1. The Mann-Whitney U (or Wilcoxon rank-sum) test: nonparametric analogue to the independent-samples t-test
2. The Wilcoxon signed-rank test: nonparametric analogue to the matched-samples t-test
3. The Kruskal-Wallis test: nonparametric analogue to the one-way ANOVA (independent measures)
4. The Friedman test: nonparametric analogue to the repeated-measures ANOVA

A Note about Computing Ranks
All of the rank-based tests require that you compute ranks based on the total number of scores, from lowest to highest. That is, if you have 3 samples with 5 scores each, the lowest overall score should be assigned the rank 1 and the highest overall score should be assigned the rank 15. In case of ties, each tied score should be assigned the mean of the tied ranks. For example, if the 3rd and 4th lowest scores have the same value, then you should assign them each a rank of 3.5, with the next highest value receiving a rank of 5. If the 7th, 8th, and 9th scores are all tied, then you should assign them each a rank of 8, with the next highest value receiving a rank of 10.
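The ranking rule above can be sketched in Python. This is a minimal illustration under the assumptions of the slide, not a library routine; `rank_with_ties` is a hypothetical helper name (in practice, a routine such as scipy.stats.rankdata implements the same mean-rank convention for ties).

```python
def rank_with_ties(scores):
    """Rank scores from lowest to highest (1-based); tied scores share the mean of their tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of scores tied with the score at sorted position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of the tied 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# The 3rd and 4th lowest scores tie, so each gets (3 + 4) / 2 = 3.5
print(rank_with_ties([10, 20, 30, 30, 40]))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```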

Rank Randomization Tests
Aside from converting the raw scores to ranks, the logical steps are similar to those for the parametric hypothesis tests:
1. State the null and alternative hypotheses about the population
2. Use the null hypothesis to predict the characteristics that the sample ranks should have
3. Use the samples to compute the test statistic
4. Compare the test statistic with the hypothesized prediction

The Mann-Whitney Test
This is a test for independent-measures (between-subjects) research designs with two groups and is thus an alternative to the independent-measures t-test. The basic intuition behind the test is that:
- A real difference between the two treatments should cause the scores in one sample to be generally larger than the scores in the other sample
- If all the scores are ranked, the larger ranks should be concentrated in one sample and the smaller ranks should be concentrated in the other sample

The Mann-Whitney Test
The null and alternative hypotheses are a bit more vague than for the t-test, but still test for some sort of difference in central tendency:
H0: There is no difference between treatments. Therefore, there is no tendency for ranks in one sample to be systematically higher or lower than in the other sample.
H1: There is a difference between treatments. Therefore, the ranks in one sample should be systematically higher or lower than in the other sample.

The Mann-Whitney Test
When comparing two samples, the Mann-Whitney U statistic for sample 1 represents the sum of the number of scores in sample 2 outranked by scores in sample 1.

Treatment A           Treatment B
Raw Scores  Ranks     Raw Scores  Ranks
27          7         71          11
2           1         63          9
9           4         18          6
48          8         68          10
6           2         94          12
15          5         8           3

The smaller of the U values is looked up in the table.

Example

Treatment A                    Treatment B
Raw Scores  Ranks  Points      Raw Scores  Ranks  Points
27          7      2           71          11     6
2           1      0           63          9      6
9           4      1           18          6      4
48          8      2           68          10     6
6           2      0           94          12     6
15          5      1           8           3      2
            Total: 6                       Total: 30

The Mann-Whitney Test: Steps
In practice, the steps for computing the test statistic (U) are:
1. Rank all the observations from smallest to largest
2. Compute the sum of the ranks in each sample, using the following formulas to compute U statistics from the rank sums R:
   U1 = R1 - n1(n1 + 1)/2
   U2 = R2 - n2(n2 + 1)/2
3. U is the smaller of U1 and U2

Example

Treatment A           Treatment B
Raw Scores  Ranks     Raw Scores  Ranks
27          7         71          11
2           1         63          9
9           4         18          6
48          8         68          10
6           2         94          12
15          5         8           3
ΣR1 = 27              ΣR2 = 51

U1 = R1 - n1(n1 + 1)/2 = 27 - 6(7)/2 = 27 - 21 = 6

U = 6, with n1 = 6 and n2 = 6
In this case, U_crit = 5, so we retain the null hypothesis.
Note: Most tables for rank permutation statistics are set up so that you reject the null hypothesis if the computed value is smaller than the tabled value.
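The U computation from the worked example can be sketched in Python. The function name `mann_whitney_u` is illustrative, and this minimal version assumes no tied scores (tied scores would need the mean-rank rule described earlier):

```python
def mann_whitney_u(sample1, sample2):
    """Smaller Mann-Whitney U computed from rank sums; assumes no tied scores."""
    combined = sorted(sample1 + sample2)
    rank = {score: i + 1 for i, score in enumerate(combined)}
    r1 = sum(rank[s] for s in sample1)
    r2 = sum(rank[s] for s in sample2)
    n1, n2 = len(sample1), len(sample2)
    u1 = r1 - n1 * (n1 + 1) / 2  # U1 = R1 - n1(n1 + 1)/2
    u2 = r2 - n2 * (n2 + 1) / 2  # U2 = R2 - n2(n2 + 1)/2
    return min(u1, u2)

treatment_a = [27, 2, 9, 48, 6, 15]    # ranks 7, 1, 4, 8, 2, 5  -> R1 = 27
treatment_b = [71, 63, 18, 68, 94, 8]  # ranks 11, 9, 6, 10, 12, 3 -> R2 = 51
print(mann_whitney_u(treatment_a, treatment_b))  # 6.0
```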

The Mann-Whitney Test: Normal Approximation
When n1 and n2 are sufficiently large (e.g., n1, n2 ≥ 10), the distribution of rank sums becomes roughly normal, and you can use a normal approximation, evaluated against a critical z value, to test for significance:
μ_U = n1·n2/2
σ_U = sqrt(n1·n2(n1 + n2 + 1)/12)
z_U = (U - μ_U)/σ_U
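The normal approximation above translates directly into code. This sketch uses a hypothetical example (U = 20 with two samples of 10), not data from the slides:

```python
from math import sqrt

def mann_whitney_z(u, n1, n2):
    """Normal approximation for the Mann-Whitney U: z = (U - mu_U) / sigma_U."""
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma

# Hypothetical example: U = 20 with n1 = n2 = 10
print(round(mann_whitney_z(20, 10, 10), 3))  # -2.268
```

The resulting z would be compared against the usual critical values (e.g., ±1.96 for a two-tailed test at α = .05).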

Wilcoxon's Signed-Ranks Test
This is a test for repeated-measures (within-subjects) research designs with two treatment conditions and is thus an alternative to the repeated-measures t-test. The basic intuition behind the test is that:
- A real difference between the two treatments should cause the difference scores to be generally positive or negative
- If all the difference scores are ranked and signed (according to whether they represent increases, +, or decreases, -), the ranks should be concentrated in either the positive or the negative set

Wilcoxon's Signed-Ranks Test
Again, the null and alternative hypotheses are a bit more vague than for the repeated-measures t-test, but test for some sort of difference in central tendency:
H0: There is no difference between treatments. Therefore, there is no tendency for the ranks of difference scores to be generally positive or negative.
H1: There is a difference between treatments. Therefore, the ranks of the difference scores should be systematically positive or negative.

Wilcoxon's Signed-Rank Test: Steps
The steps for computing the test statistic (T) are:
1. Compute the difference scores
2. Rank all the difference scores from smallest to largest absolute value and assign them positive or negative signs based on whether they represent an increment or a decrement
3. Compute separate sums for the positively and negatively signed sets
4. T is the smaller of the resulting signed-rank sums
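The four steps can be sketched in Python. This minimal version uses hypothetical difference scores and assumes no zero differences and no tied absolute values (zeros are usually dropped, and ties would need mean ranks):

```python
def wilcoxon_t(diffs):
    """T for the signed-ranks test; assumes no zero differences and no tied |d|."""
    ordered = sorted(diffs, key=abs)  # rank the differences by absolute value
    pos = sum(i + 1 for i, d in enumerate(ordered) if d > 0)  # sum of positive ranks
    neg = sum(i + 1 for i, d in enumerate(ordered) if d < 0)  # sum of negative ranks
    return min(pos, neg)

# Hypothetical difference scores: |d| ranks run 1..7; the negative set gets ranks 2 and 3
print(wilcoxon_t([5, -2, 8, 1, -3, 6, 4]))  # 5
```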

Example (Suddath et al., 1990)

Example
T = 9, N = 15. T_crit is 25, which is greater than our test statistic, so we would reject the null hypothesis.

Wilcoxon's Signed-Ranks Test: Normal Approximation
Again, when n is sufficiently large (e.g., n ≥ 20), the distribution of T becomes roughly normal, and you can use a normal approximation, evaluated against a critical z value, to test for significance:
μ_T = n(n + 1)/4
σ_T = sqrt(n(n + 1)(2n + 1)/24)
z_T = (T - μ_T)/σ_T
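A sketch of this approximation in Python. Note that the earlier example (T = 9, N = 15) is below the usual n ≥ 20 cutoff, so it is used here only to illustrate the arithmetic:

```python
from math import sqrt

def wilcoxon_z(t, n):
    """Normal approximation for the signed-ranks T: z = (T - mu_T) / sigma_T."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mu) / sigma

print(round(wilcoxon_z(9, 15), 3))  # -2.897
```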

The Kruskal-Wallis One-Way ANOVA
This is a test for independent-measures research designs with more than two groups. As its name suggests, it is a nonparametric alternative to the parametric one-way ANOVA. The basic intuition behind the test is analogous to that for the parametric one-way ANOVA:
- A real difference among treatments should cause the variability of scores between groups to be greater than the variability of scores within groups
- If all the scores are ranked, the variability of rank sums between groups should be greater than the variability of rank sums within groups

The Kruskal-Wallis One-Way ANOVA
The null and alternative hypotheses are very similar to those in the parametric one-way ANOVA.
H0: There is no difference between treatments. There is no tendency for ranks in any sample to be systematically higher or lower than in any other condition.
H1: There are differences between treatments. The ranks in at least one condition are systematically higher or lower than in another treatment condition.

Comparison of Conceptual Formulas for the Parametric One-Way ANOVA and the Kruskal-Wallis Test
Parametric ANOVA:
F = [Σ_i n_i(M_i - M_T)² / df_between] / [Σ_i Σ_j (x_ij - M_i)² / df_within]
with df_between = k - 1 and df_within = N - k
Kruskal-Wallis:
H = C · Σ_i n_i(r̄_i - r̄_T)²
where r̄_i is the mean rank in group i, r̄_T is the grand mean rank, and C = 12/(N(N + 1))

The Kruskal-Wallis Test: Steps
The steps for computing the test statistic (H) are:
1. Rank all scores from lowest to highest across all samples
2. Compute R_i, the sum of ranks in each sample
3. Plug into the following formula to solve for H:
H = [12/(N(N + 1))] · Σ_i (R_i²/n_i) - 3(N + 1)
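The three steps can be sketched in Python. The data here are hypothetical (three completely separated groups, so they receive ranks 1-3, 4-6, and 7-9), and this minimal version assumes no tied scores:

```python
def kruskal_h(samples):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1); assumes no tied scores."""
    pooled = sorted(s for sample in samples for s in sample)
    rank = {score: i + 1 for i, score in enumerate(pooled)}
    n_total = len(pooled)
    sum_term = sum(sum(rank[s] for s in sample) ** 2 / len(sample)
                   for sample in samples)
    return 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)

# Hypothetical data: rank sums are 6, 15, 24
groups = [[1.2, 2.5, 3.1], [4.0, 5.5, 6.2], [7.7, 8.1, 9.9]]
print(round(kruskal_h(groups), 1))  # 7.2
```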

The Kruskal-Wallis Test: Normal Approximation As in the other rank-randomization tests, when N is sufficiently large, the distribution of rank-sums becomes approximately normal H is a linear combination (a weighted sum) of squared-rank sums, which means that it can be approximated by the distribution of a sum of squared normal variables For this reason, the significance of H is usually evaluated using a chi-squared distribution with k-1 degrees of freedom.

Friedman s Rank Test This is a test for repeated-measures research designs with more than two groups. It is the non-parametric analogue to a one-way repeated-measures ANOVA Just as the repeated measures ANOVA tests for consistent changes between individuals across treatment groups, Friedman s test looks for consistent rankings between individuals across treatment groups It can be used with any repeated-measures data, but is especially useful for measuring inter-rater agreement for rankings

The Friedman Test The null and alternative hypotheses are identical to those for the Kruskal-Wallis Test. H 0 : There is no difference between treatments. There is no tendency for ranks in any sample to be systematically higher or lower than in any other condition. H 1 : There are differences between treatments. The ranks in at least one condition are systematically higher or lower than in another treatment condition

The Friedman Test: Steps
The steps for computing the test statistic (χ²_R) are:
1. Rank the scores across the treatment conditions separately for each individual
2. Compute R_i, the sum of ranks for each group
3. Plug into the following formula to compute the test statistic:
χ²_R = [12/(nk(k + 1))] · Σ_i R_i² - 3n(k + 1)
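The formula above can be sketched in Python, here applied to the sommelier rankings used in the worked example (the function name `friedman_chi2` is illustrative; the input is assumed to already contain within-subject ranks 1..k):

```python
def friedman_chi2(rank_table):
    """chi2_R = 12/(nk(k+1)) * sum(R_i^2) - 3n(k+1).
    rank_table holds one row of within-subject ranks (1..k) per subject."""
    n = len(rank_table)         # number of subjects (rows)
    k = len(rank_table[0])      # number of conditions (columns)
    col_sums = [sum(row[j] for row in rank_table) for j in range(k)]
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in col_sums) - 3 * n * (k + 1)

# The seven sommeliers' rankings of three wines (column sums R = 14, 11, 17)
ranks = [[1, 2, 3], [2, 1, 3], [2, 1, 3], [3, 2, 1],
         [1, 2, 3], [2, 1, 3], [3, 2, 1]]
print(round(friedman_chi2(ranks), 3))  # 2.571
```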

Friedman Test Example

Sommelier  Wine 1  Wine 2  Wine 3
A          1       2       3
B          2       1       3
C          2       1       3
D          3       2       1
E          1       2       3
F          2       1       3
G          3       2       1
R          14      11      17

χ²_R = [12/(nk(k + 1))] · Σ R_i² - 3n(k + 1)
     = [12/(7·3·4)] · (14² + 11² + 17²) - 3(7)(4)
     = (12/84)(196 + 121 + 289) - 84
     = 86.571 - 84 = 2.571

χ² Distribution
        Upper Tail Probability
df      0.1     0.05    0.025   0.01
1       2.71    3.84    5.02    6.63
2       4.61    5.99    7.38    9.21
3       6.25    7.81    9.35    11.34
4       7.78    9.49    11.14   13.28
5       9.24    11.07   12.83   15.09
6       10.64   12.59   14.45   16.81
7       12.02   14.07   16.01   18.48
8       13.36   15.51   17.53   20.09
9       14.68   16.92   19.02   21.67
10      15.99   18.31   20.48   23.21
11      17.28   19.68   21.92   24.72
12      18.55   21.03   23.34   26.22
13      19.81   22.36   24.74   27.69
14      21.06   23.68   26.12   29.14
15      22.31   25.00   27.49   30.58
16      23.54   26.30   28.85   32.00
17      24.77   27.59   30.19   33.41
18      25.99   28.87   31.53   34.81
19      27.20   30.14   32.85   36.19
20      28.41   31.41   34.17   37.57
30      40.26   43.77   46.98   50.89
40      51.81   55.76   59.34   63.69
50      63.17   67.50   71.42   76.15
60      74.40   79.08   83.30   88.38
70      85.53   90.53   95.02   100.43
80      96.58   101.88  106.63  112.33
90      107.57  113.15  118.14  124.12
100     118.50  124.34  129.56  135.81

Friedman Test Example

Sommelier  Wine 1  Wine 2  Wine 3
A          1       2       3
B          2       1       3
C          2       1       3
D          3       2       1
E          1       2       3
F          2       1       3
G          3       2       1
R          14      11      17

χ²_R = [12/(nk(k + 1))] · Σ R_i² - 3n(k + 1)
     = [12/(7·3·4)] · (14² + 11² + 17²) - 3(7)(4)
     = (12/84)(196 + 121 + 289) - 84
     = 86.571 - 84 = 2.571

χ²_crit = 5.99, so we retain H0. In this case, we would conclude that there is no significant difference in quality between the 3 wines, or that the sommeliers are unable to discern any differences.