1 Nonparametric Statistics


When finding confidence intervals or conducting tests so far, we always described the population with a model that includes a set of parameters, and then made decisions about the values of the parameters with the help of inferential statistics. These methods are referred to as parametric statistics. In nonparametric statistics we do not build such a model, and in particular we do not have to assume that the population is normally distributed. This is very relevant, because in many cases the data cannot be assumed to come from a normal distribution. Another advantage of nonparametric tests is that they are applicable even when the data are not numerical but can be ranked (ordinal data with many categories), like the Likert-type scales often used in psychology.

Many of these methods are based on the ranks of the measurements in the sample and are therefore called rank statistics or rank tests. We will introduce rank tests for one sample, for comparing two samples (independent and paired), and for comparing more than two samples. These are competitors of the one-sample t-test, the two-sample t-test, the paired t-test, and one-way ANOVA.

1.1 The Sign Test

The sign test can be used to test hypotheses about the central tendency of nonnormal distributions. The mean would not be an appropriate measure of the center, because the distribution might be skewed or have outliers. Instead we use the median as the measure of the center of the distribution.

Remember: The median of a distribution is the value such that the probability of falling below it equals 0.5, i.e. 50% of the measurements in the population fall below it. The median of a sample is the value such that 50% of the sample data fall below it.

The sign test for a population median η

1. Hypotheses:
   upper tail: H0: η ≤ η0 vs. Ha: η > η0
   lower tail: H0: η ≥ η0 vs. Ha: η < η0
   two tail:   H0: η = η0 vs. Ha: η ≠ η0
   Choose α.
2. Assumptions: Random sample.
3. Test statistic S0:
   upper tail: S0 = number of measurements greater than η0
   lower tail: S0 = number of measurements less than η0
   two tail:   S0 = max(S1, S2), where S1 = number of measurements greater than η0 and S2 = number of measurements less than η0
   Note: Ignore all measurements that are equal to η0.

4. P-value:
   upper tail: P(x ≥ S0)
   lower tail: P(x ≥ S0)
   two tail:   2P(x ≥ S0)
   where x has a binomial distribution with parameters n = sample size (after dropping ties with η0) and p = 0.5.
5. Decision: If P-value < α reject H0, otherwise do not reject H0.
6. Put into context.

The guiding idea for this test statistic is the following. Assume the median equals η0; then about 50% of the sample data should fall above η0. But if the median is smaller than η0, then fewer than 50% of the data are expected to fall above η0. That is, when testing H0: η ≤ η0, we would expect about 50% or less of the data to fall above η0. In fact the distribution of S0, if H0 is true, is a binomial distribution with parameters n and p = 0.5. The P-value for the test statistic S0 then measures how likely it is that at least this many measurements fall above η0, if H0 is true.

Example 1
The ammonia level (in ppm) is studied near an exit ramp of a San Francisco highway tunnel. The daily ammonia concentrations on eight randomly chosen days during afternoon drive-time are reproduced below:
1.37, 1.41, 1.42, 1.48, 1.50, 1.51, 1.53, 1.55
Do the data provide sufficient evidence that the median daily ammonia concentration exceeds 1.5 ppm? To answer this question conduct a sign test.
1. Hypotheses: H0: η ≤ 1.5 vs. Ha: η > 1.5. Choose α = 0.05.
2. Assumptions: Random sample. The days the measurements were taken were chosen randomly.
3. Test statistic (upper tail test): The measurement 1.50 equals η0 = 1.5 and is ignored, leaving n = 7.
   S0 = number of measurements greater than 1.5 ppm = 3
4. P-value = P(x ≥ S0) = P(x ≥ 3) = 1 − P(x ≤ 2) = 1 − 0.2266 = 0.7734 (n = 7, p = 0.5)
5. Decision: Since the P-value = 0.7734 > α = 0.05, do not reject H0.
6. The data do not provide sufficient evidence that the daily ammonia concentration at this location exceeds 1.5 ppm on the majority of days.
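The steps above can be sketched in a few lines of code. This is an illustrative sketch, not part of the original notes, using only the Python standard library (math.comb for the binomial probabilities); it reproduces Example 1, dropping the observation tied with η0 = 1.5.

```python
from math import comb

def sign_test_upper(data, eta0):
    """Exact upper-tail sign test for the median eta0.

    Measurements equal to eta0 are ignored (tie rule), S0 counts the
    values above eta0, and the P-value is P(X >= S0) for X ~ Bin(n, 0.5).
    """
    kept = [x for x in data if x != eta0]       # ignore ties with eta0
    n = len(kept)
    s0 = sum(1 for x in kept if x > eta0)       # test statistic S0
    p_value = sum(comb(n, k) for k in range(s0, n + 1)) / 2**n
    return s0, n, p_value

# Example 1: ammonia data; 1.50 equals eta0 = 1.5 and is dropped.
s0, n, p = sign_test_upper([1.37, 1.41, 1.42, 1.48, 1.50, 1.51, 1.53, 1.55], 1.5)
print(s0, n, round(p, 4))   # 3 7 0.7734
```

Since 0.7734 is far above α = 0.05, the code confirms the decision not to reject H0.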

If the sample size is large, the calculation of the binomial probabilities becomes very time consuming. Then we can use a test statistic based on the normal approximation of the binomial distribution. Replace steps 3 and 4 above by the following steps:

3. Test statistic:
   z0 = ((S0 − 0.5) − 0.5n) / (0.5√n)
4. P-value:
   upper tail: P(z > z0)
   lower tail: P(z > z0)
   two tail:   2P(z > z0)
Use this test statistic for n ≥ 10.

Example 2
The above experiment is repeated with a larger sample size of 42. Out of the 42 days the ammonia concentration exceeded 1.5 ppm on 24 days.
3. Test statistic: z0 = ((24 − 0.5) − 0.5(42)) / (0.5√42) = 0.77
4. P-value = P(z > 0.77) = 1 − 0.7794 = 0.2206 (z-table)
5. The P-value is greater than 0.05, so do not reject H0.
6. At significance level 0.05 the data do not provide sufficient evidence that at this location the ammonia concentration exceeds 1.5 ppm on more than half of the days.

Comment: Handling Ties
When two measurements are equal, we say there is a tie. Tied measurements are each assigned the mean of the ranks they would have received:
measurements: 1.2  2.4  2.4  3.1  4.7  9.5
rank:          1    2    3    4    5    6
rank(ties):    1   2.5  2.5   4    5    6

1.2 The Wilcoxon Rank Sum Test

Suppose measures of central tendency of two populations shall be compared, but it is not safe to assume that the populations are normally distributed. If the sample sizes are not large it is not appropriate to use the t-test. In this case we use the Wilcoxon Rank Sum Test. Instead of using the original measurements, all sample data are combined and ranked from smallest to largest, each measurement receiving the rank of its position in the combined data set.
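The normal approximation can be checked the same way. A standard-library sketch (not part of the original notes); the normal CDF is obtained from math.erf, and the formula is the continuity-corrected z0 from step 3 above.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def sign_test_normal_upper(s0, n):
    """Upper-tail sign test via the normal approximation:
    z0 = ((S0 - 0.5) - 0.5 n) / (0.5 sqrt(n)); P-value = P(Z > z0)."""
    z0 = ((s0 - 0.5) - 0.5 * n) / (0.5 * sqrt(n))
    return z0, 1 - norm_cdf(z0)

# Example 2: 24 of 42 days above 1.5 ppm.
z0, p = sign_test_normal_upper(24, 42)
print(round(z0, 2), round(p, 4))   # 0.77 0.2202
```

The small difference from the hand computation (0.2206) is only due to rounding z0 to 0.77 before using the z-table.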

Example 3
Six economists who work for the federal government and seven university economists are randomly selected, and each is asked to predict next year's percentage change in the cost of living as compared with this year's figure. The objective is to compare the government economists' predictions to those of the university economists. The data:

Government: 3.1  4.8  2.3  5.6  0.0  2.9
University: 4.4  5.8  3.9  8.7  6.3  10.5  10.8

Experience has shown that such predictions tend to be skewed, and because of the small sample sizes a t-test is not appropriate. To check if the data provide sufficient evidence that the centers of the distributions of predictions are not the same for government and university economists, ranks are used instead of the raw scores. Combine the two samples, sort from smallest to largest, and assign ranks according to position:

Government            University
Prediction  Rank      Prediction  Rank
3.1          4        4.4          6
4.8          7        5.8          9
2.3          2        3.9          5
5.6          8        8.7         11
0.0          1        6.3         10
2.9          3        10.5        12
                      10.8        13

The rank totals for the two groups are good indicators of whether the centers are different. If the medians of the two populations were the same, the average ranks of the two groups should be close. If the mean rank for one group is much smaller than for the other, this indicates that the median of the first group is most likely smaller than that of the second (more of the small numbers came from group 1).

T_G = 4 + 7 + 2 + 8 + 1 + 3 = 25
T_U = 6 + 9 + 5 + 11 + 10 + 12 + 13 = 66

The total of the ranks always equals the total of the numbers from 1 to n = n1 + n2, which is n(n + 1)/2. Since this total is fixed, the total for group 1 can serve as a test statistic, and the average is not required.
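The ranking step can be automated. Below is an illustrative standard-library helper (a sketch, not part of the original notes) that assigns average ranks to ties, as described in the handling-ties comment, and reproduces the rank sums of Example 3.

```python
def ranks(values):
    """Rank values from smallest to largest, assigning tied values
    the mean of the ranks they would occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of 1-based positions i..j
        for k in order[i:j + 1]:
            r[k] = avg
        i = j + 1
    return r

gov = [3.1, 4.8, 2.3, 5.6, 0.0, 2.9]
uni = [4.4, 5.8, 3.9, 8.7, 6.3, 10.5, 10.8]
r = ranks(gov + uni)
t_gov = sum(r[:len(gov)])                # rank sum of sample 1
t_uni = sum(r[len(gov):])
print(t_gov, t_uni)   # 25.0 66.0
```

On the ties example from the notes, ranks([1.2, 2.4, 2.4, 3.1, 4.7, 9.5]) returns [1, 2.5, 2.5, 4, 5, 6], as it should.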

Example 4
In a genetic inheritance study discussed by Margolin [1988], samples of individuals from several ethnic groups were taken. Blood samples were collected from each individual and several variables measured. We shall compare Native Americans and Caucasians with respect to the variable MSCE (mean sister chromatid exchange). Sister chromatid exchange occurs normally in cells during cell division, but when a cell's DNA is damaged by genotoxic agents, the rate of SCE increases. It is thought that SCE is an attempt by the cell to fix the DNA damage caused by genotoxic agents; therefore one would expect a more potent genotoxic agent to generate a higher rate of SCE. The data are as follows:

Native American: 8.50 9.48 8.65 8.16 8.83 7.76 8.63 8.34 9.01 7.23
Caucasian:       8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70 8.43

Combined and sorted:
Value: 7.20 7.23 7.70 7.76 8.10 8.14 8.16 8.20 8.25 8.27 8.32 8.34 8.43 8.50
Race:   C    N    C    N    C    C    N    C    C    C    C    N    C    N
Rank:   1    2    3    4    5    6    7    8    9   10   11   12   13   14

Value: 8.63 8.65 8.83 9.00 9.01 9.48
Race:   N    N    N    C    N    N
Rank:  15   16   17   18   19   20

Then the sum of ranks for the Native American sample is
T_N = 2 + 4 + 7 + 12 + 14 + 15 + 16 + 17 + 19 + 20 = 126
and the sum of ranks for the Caucasian sample is
T_C = 1 + 3 + 5 + 6 + 8 + 9 + 10 + 11 + 13 + 18 = 84
The total of all ranks is 1 + 2 + ... + 19 + 20 = 20(20 + 1)/2 = 210. Since T1 + T2 is fixed for given sample sizes, a small T1 implies a large T2 and vice versa. The rank sum of sample 1 is used as a test statistic for comparing the population distributions or medians. Let D1 and D2 denote the distributions of population 1 and population 2, respectively. We will use the Wilcoxon test to detect location shifts.

First suppose the density curves of the two distributions match each other, so there is no shift (D1 = D2). In this case we would expect to see a random pattern in the ranks of the measurements from the two samples, which results in the mean ranks of the two samples being about the same. The distribution of the rank sum under the assumption that D1 = D2 has been worked out, and computer programs are available to obtain probabilities for this random variable.

Now suppose the density curve of population 2 has the same shape as the one for population 1, but is shifted to the right, denoted D1 < D2. In such a situation it should be expected that measurements from population 1 receive the lower ranks and measurements from population 2 the higher ranks, so the mean rank of sample 1 should be smaller than that of the sample from distribution 2. In the case that D1 > D2, for similar reasons, the mean rank of sample 1 is expected to be large.

In conclusion: the mean rank (or the rank sum) of sample 1 reflects the location shift of the distributions and is suitable as a test statistic. For paper-and-pencil applications, finding the exact P-value is cumbersome; in this case use the normal approximation of the rank sum as the test statistic. The approximation is based on the following result:

Let T1 be the rank sum of sample 1 out of two random samples. If D1 = D2 and the sample sizes are large, then T1 is approximately normally distributed with mean and standard deviation
µ1 = n1(n1 + n2 + 1)/2,   σ1 = √(n1 n2(n1 + n2 + 1)/12).

The Wilcoxon Rank Sum Test

1. Hypotheses: Let D1 and D2 denote the distributions of population 1 and population 2, respectively.
   upper tail: H0: D1 ≤ D2 vs. Ha: D1 > D2
   lower tail: H0: D1 ≥ D2 vs. Ha: D1 < D2
   two tail:   H0: D1 = D2 vs. Ha: D1 ≠ D2
   Choose α.
2. Assumptions: Random samples from continuous distributions (not many ties).
3. Test statistic (for both sample sizes at least 10): T1 = sum of the ranks of sample 1, standardized as
   z0 = (T1 − n1(n1 + n2 + 1)/2) / √(n1 n2(n1 + n2 + 1)/12)
4. P-value:
   upper tail: P(z > z0)
   lower tail: P(z < z0)
   two tail:   2P(z > |z0|)
5. Decision: If P-value < α reject H0, otherwise do not reject H0.
6. Put into context.

Continue Example 3 (economists): Test if the center for the government economists is smaller than the center for the university economists.
1. Hypotheses: H0: D(G) ≥ D(U) vs. Ha: D(G) < D(U). Let α = 0.05.
2. Assumptions: Random samples from continuous distributions; the predicted percentage change is continuous.
3. Test statistic: The sample sizes are too small to use the normal approximation (but we do it anyway for demonstration purposes). T(G) = 25, n_G = 6, n_U = 7.
   z0 = (25 − 6(6 + 7 + 1)/2) / √(6 · 7(6 + 7 + 1)/12) = (25 − 42)/√49 = −2.43
4. P-value: lower tail, P-value = P(z < z0) = P(z < −2.43) = 0.0075
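Steps 3 and 4 can be carried out in code. A standard-library sketch (not part of the original notes) for the economists' data; since these data contain no ties, the rank of a value is simply its 1-based position in the sorted combined sample.

```python
from math import erf, sqrt

def rank_sum_z(sample1, sample2):
    """Wilcoxon rank sum statistic T1 of sample1 and its normal
    approximation z0 = (T1 - mu1) / sigma1 (no ties assumed)."""
    combined = sorted(sample1 + sample2)
    t1 = sum(combined.index(x) + 1 for x in sample1)   # 1-based positions
    n1, n2 = len(sample1), len(sample2)
    mu1 = n1 * (n1 + n2 + 1) / 2
    sigma1 = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return t1, (t1 - mu1) / sigma1

gov = [3.1, 4.8, 2.3, 5.6, 0.0, 2.9]
uni = [4.4, 5.8, 3.9, 8.7, 6.3, 10.5, 10.8]
t1, z0 = rank_sum_z(gov, uni)
p_lower = 0.5 * (1 + erf(z0 / sqrt(2)))   # lower-tail P(Z < z0)
print(t1, round(z0, 2), round(p_lower, 4))   # 25 -2.43 0.0076
```

The tiny difference from the z-table value 0.0075 comes from rounding z0 to two decimals in the hand computation.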

5. Decision: Since P-value < 0.05 = α, reject H0.
6. At significance level 0.05 the data provide sufficient evidence that the median prediction of next year's change in the cost of living by government economists is less than the median prediction by university economists. Government economists are more optimistic about the future than university economists.

Continue Example 4 (genetic inheritance study):
1. Hypotheses: H0: D(N) = D(C) vs. Ha: D(N) ≠ D(C). Let α = 0.05.
2. Assumptions: Random samples from continuous distributions; MSCE is continuous.
3. Test statistic: The sample sizes are large enough to use the normal approximation. T(N) = 126, n_N = n_C = 10.
   z0 = (126 − 10(10 + 10 + 1)/2) / √(10 · 10(10 + 10 + 1)/12) = (126 − 105)/√175 = 1.587
4. P-value: two tail, P-value = 2P(z > |z0|) = 2(1 − 0.9429) = 0.1142
5. Decision: Since P-value > 0.05 = α, do not reject H0.
6. At significance level 0.05 the data do not provide sufficient evidence that there is a difference in the distributions of MSCE for the Native American and Caucasian populations.

Comment: Handling Ties
Tied measurements are assigned the mean of the ranks they would have received (as in the sign test section):
measurements: 1.2  2.4  2.4  3.1  4.7  9.5
rank:          1    2    3    4    5    6
rank(ties):    1   2.5  2.5   4    5    6

1.3 The Signed Rank Test

For comparing paired samples one should use the signed rank test. It is based on the pair differences, like the paired t-test, and on a rank sum, similar to the Wilcoxon rank sum test.

Example 5
Mileage tests are conducted to compare a new versus a conventional spark plug. A sample of cars ranging from subcompacts to full-sized sedans is included in the study. Each car was driven with the new and with the conventional plug, and the mileage was recorded. Based on the data we want to decide if the mileage increases with the new plug, that is, test
H0: D_N ≤ D_C versus Ha: D_N > D_C
The data:

Car   New (A)   Conventional (B)   Difference (A − B)   Signed rank
 1     26.4        24.3                 2.1                12+
 2     10.3         9.8                 0.5                 4+
 3     15.8        16.9                −1.1                 8−
 4     16.5        17.2                −0.7                 5−
 5     32.5        30.5                 2.0                11+
 6      8.3         7.9                 0.4                 3+
 7     22.1        22.4                −0.3                 2−
 8     30.1        28.6                 1.5                 9+
 9     12.9        13.1                −0.2                 1−
10     12.6        11.6                 1.0                 7+
11     27.3        25.5                 1.8                10+
12      9.4         8.6                 0.8                 6+

The differences are ranked according to their ABSOLUTE values, and each rank is labelled + or − according to the sign of the difference. The test statistic is the sum of the ranks labelled +. Here:
T+ = 12 + 4 + 11 + 3 + 9 + 7 + 10 + 6 = 62
T+ and T− always add up to 1 + 2 + 3 + ... + n = n(n + 1)/2; here 12(13)/2 = 78. We have to judge whether T+ indicates a shift in one of the distributions. The argument follows the same line as for the Wilcoxon rank sum test. When D1 = D2, the two rank sums should be about the same; when D1 < D2, we have to expect more of the differences to be negative, and T+ to be small; and when D1 > D2, we should expect more positive differences and T+ to be large.

Again the distribution of T+ has been worked out, and computer programs can compute the P-value for this statistic. For pencil-and-paper applications we will use the normal approximation of that distribution, which is appropriate to use for sample sizes of at least 25. When D1 = D2, the sample size is large, and the distribution of the differences is symmetric, then T+ is approximately normally distributed with mean and standard deviation
µ+ = n(n + 1)/4,   σ+ = √(n(n + 1)(2n + 1)/24).
Symmetry of the pairwise differences follows if, for example, the distributions of the first and second population have the same shape.

The Signed Rank Test

1. Hypotheses: Let D1 and D2 denote the distributions of population 1 and population 2, respectively.
   upper tail: H0: D1 ≤ D2 vs. Ha: D1 > D2
   lower tail: H0: D1 ≥ D2 vs. Ha: D1 < D2
   two tail:   H0: D1 = D2 vs. Ha: D1 ≠ D2
   Choose α.
2. Assumptions: Paired random samples from continuous distributions (not many ties), with a symmetric distribution of the differences. The number of pairs n is at least 10.
3. Test statistic: T+ = sum of the ranks of the positive differences, standardized as
   z0 = (T+ − n(n + 1)/4) / √(n(n + 1)(2n + 1)/24)
4. P-value:
   upper tail: P(z > z0)
   lower tail: P(z < z0)
   two tail:   2P(z > |z0|)
5. Decision: If P-value < α reject H0, otherwise do not reject H0.
6. Put into context.

Continue Example 5:
1. Hypotheses: Let D1 and D2 denote the distributions of mileage using the new plug and using the conventional plug, respectively. Upper tail: H0: D1 ≤ D2 vs. Ha: D1 > D2. Choose α = 0.05.
2. Assumptions: Random sample; mileage is continuous, so we do not have too many ties. It can be assumed that the two distributions have the same shape, and therefore that the differences are symmetrically distributed.
3. Test statistic: The number of pairs is n = 12, and T+ = 62.
   z0 = (62 − 12(13)/4) / √(12(13)(25)/24) = 23/√162.5 = 1.80
4. P-value: upper tail, P(z > 1.80) = 1 − 0.9641 = 0.0359
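The whole calculation for Example 5 can be scripted. A standard-library sketch (not part of the original notes); zero differences and tied absolute differences are not handled, since none occur in this data set.

```python
from math import erf, sqrt

def signed_rank_z(first, second):
    """Signed rank statistic T+ for paired data and its normal approximation
    z0 = (T+ - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24); assumes no zero or
    tied absolute differences."""
    diffs = [a - b for a, b in zip(first, second)]
    by_abs = sorted(diffs, key=abs)                 # rank by absolute value
    t_plus = sum(i + 1 for i, d in enumerate(by_abs) if d > 0)
    n = len(diffs)
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, (t_plus - mu) / sigma

new = [26.4, 10.3, 15.8, 16.5, 32.5, 8.3, 22.1, 30.1, 12.9, 12.6, 27.3, 9.4]
conv = [24.3, 9.8, 16.9, 17.2, 30.5, 7.9, 22.4, 28.6, 13.1, 11.6, 25.5, 8.6]
t_plus, z0 = signed_rank_z(new, conv)
p_upper = 1 - 0.5 * (1 + erf(z0 / sqrt(2)))   # upper-tail P(Z > z0)
print(t_plus, round(z0, 2), round(p_upper, 3))   # 62 1.8 0.036
```

The P-value stays below α = 0.05, so the conclusion to reject H0 is unchanged.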

5. Decision: Since P-value < α = 0.05, reject H0.
6. At significance level 5% the data provide sufficient evidence that the new plug increases the mileage.

1.4 Kruskal-Wallis H Test

We will introduce one more nonparametric test. Here we use ranks for the analysis of more than two populations when the response variable is quantitative; this becomes an alternative to one-way ANOVA. This test is also based on the ranks assigned to all measurements in the combined data set. The following example illustrates the test statistic.

Example 6
Three different methods for memorizing information are compared in an experiment. Eighteen participants are randomly assigned to three groups of 6 individuals. Each group uses a different method of memorization to memorize a list of 50 words. Then all participants are tested to determine how many of the words on the list they can recall. The data:
Method 1: 23, 27, 30, 32, 33, 36
Method 2: 28, 30, 31, 35, 37, 40
Method 3: 16, 18, 22, 24, 27, 37
First we assign a rank to each measurement according to its position in the combined data set. The ranks are:
Method 1: 4, 6.5, 9.5, 12, 13, 15
Method 2: 8, 9.5, 11, 14, 16.5, 18
Method 3: 1, 2, 3, 5, 6.5, 16.5
Now find the mean rank for each sample:
Method 1: R̄1 = 60/6 = 10
Method 2: R̄2 = 77/6 = 12.83
Method 3: R̄3 = 34/6 = 5.67
The mean of all ranks is always R̄ = (total of ranks)/n = (n + 1)/2 = 9.5, because the total of the ranks is n(n + 1)/2. The test statistic compares the sample mean ranks with R̄:
H = (12/(n(n + 1))) Σ ni(R̄i − (n + 1)/2)²
  = (12/(18 · 19)) (6(10 − 9.5)² + 6(12.83 − 9.5)² + 6(5.67 − 9.5)²) = 5.4753
If D1 = D2 = D3, the mean ranks would all be close to R̄ and H would be small; but if at least one of the distributions is (say) shifted to the right, then that mean rank would tend to be larger than R̄, and H would tend to be larger. In this way H is sensitive to differences between the populations under investigation.
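The computation of H can be scripted directly from the formula. A standard-library sketch, not part of the original notes; carrying the mean ranks at full precision gives H = 5.4854, while the hand computation, which rounds the mean ranks to two decimals, gives 5.4753. The conclusion below (0.05 < P-value < 0.1) is the same either way.

```python
def kruskal_h(samples):
    """Kruskal-Wallis H = 12/(n(n+1)) * sum ni (Ri - (n+1)/2)^2,
    with tied values receiving the average of their ranks."""
    combined = sorted(x for s in samples for x in s)
    def rank(x):
        # average of the 1-based positions of x in the sorted combined data
        pos = [i + 1 for i, v in enumerate(combined) if v == x]
        return sum(pos) / len(pos)
    n = len(combined)
    h = 0.0
    for s in samples:
        mean_rank = sum(rank(x) for x in s) / len(s)
        h += len(s) * (mean_rank - (n + 1) / 2) ** 2
    return 12 / (n * (n + 1)) * h

m1 = [23, 27, 30, 32, 33, 36]
m2 = [28, 30, 31, 35, 37, 40]
m3 = [16, 18, 22, 24, 27, 37]
print(round(kruskal_h([m1, m2, m3]), 4))   # 5.4854
```
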
If the population distributions are all equal, the distribution of H is approximately χ² with df = k − 1. Based on this result we get:

The Kruskal-Wallis Test

1. Hypotheses: H0: D1 = D2 = ... = Dk vs. Ha: at least one distribution is shifted to the right or left of the others. Choose α.
2. Assumptions: Random samples of independent measurements; numerical response variable.
3. Test statistic:
   H0 = (12/(n(n + 1))) Σ ni(R̄i − (n + 1)/2)²,   df = k − 1
4. P-value = P(χ² > H0) (Table VII)
5. Decision: If P-value < α reject H0, otherwise do not reject H0.
6. Put into context.

Continue Example 6:
1. Hypotheses: H0: D1 = D2 = D3 vs. Ha: at least one distribution is shifted to the right or left of the others. Choose α = 0.05.
2. Assumptions: We base this on random samples; the number of words recalled is numerical.
3. Test statistic: H0 = 5.4753, df = 2.
4. P-value = P(χ² > 5.4753); from Table VII, 0.05 < P-value < 0.1.
5. Decision: Since P-value > α = 0.05, do not reject H0.
6. The data do not provide sufficient evidence that the distributions of recalled words differ between the three memorization methods.

Multiple Comparisons
If the Kruskal-Wallis test is significant, we conclude that at least one of the distributions is different from the others. Then the same question arises as after a significant ANOVA F-test: where are the differences? To answer this question a multiple comparison should be conducted.

Bonferroni post hoc analysis for the Kruskal-Wallis test
Choose an experimentwise error rate α.

1. When comparing k distributions, let m = k(k − 1)/2 and use the comparisonwise error rate α* = α/m.
2. Use the Wilcoxon rank sum test to compare every pair of treatments at significance level α*. Then all significant results hold simultaneously at the experimentwise error rate α.

Continue Example 6: If we had used α = 0.1 in the example above, we would have decided that the data provide sufficient evidence to conclude that the distributions are not all the same. Let us continue the example for α = 0.1 and conduct a multiple comparison of the three distributions. Here m = 3 and α* = 0.1/3 = 0.0333.
1. We test:
   H0: D1 = D2 vs. Ha: D1 ≠ D2
   H0: D1 = D3 vs. Ha: D1 ≠ D3
   H0: D2 = D3 vs. Ha: D2 ≠ D3
   each at α* = 0.0333.
2. Assumptions: checked already.
3. Test statistics (normal approximation, for demonstration purposes): for each pair n1 = n2 = 6, so
   z0 = (T − 6(6 + 6 + 1)/2) / √(6 · 6(6 + 6 + 1)/12) = (T − 39)/√39

   Ranks for comparing methods 1 and 2:
   Method 1: 1, 2, 4.5, 7, 8, 10      T1 = 32.5    z0 = −1.04
   Method 2: 3, 4.5, 6, 9, 11, 12     T2 = 45.5
   Ranks for comparing methods 1 and 3:
   Method 1: 4, 6.5, 8, 9, 10, 11     T1 = 48.5    z0 = 1.52
   Method 3: 1, 2, 3, 5, 6.5, 12      T3 = 29.5
   Ranks for comparing methods 2 and 3:
   Method 2: 6, 7, 8, 9, 10.5, 12     T2 = 52.5    z0 = 2.16
   Method 3: 1, 2, 3, 4, 5, 10.5      T3 = 25.5

4. P-values (two-tailed):
   Comparison   P-value
   1-2          2(1 − 0.8508) = 0.2984
   1-3          2(1 − 0.9357) = 0.1286
   2-3          2(1 − 0.9846) = 0.0308
5. Decision: Since only the P-value for the comparison of methods 2 and 3 falls below α* = 0.0333, we only reject D2 = D3.
6. At experimentwise significance level 0.1 we conclude that method 2 results in a significantly higher number of recalled words than method 3. Method 1 is not significantly different from either method 2 or method 3.
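The pairwise comparisons can be automated. A standard-library sketch (not part of the original notes) that runs all three Wilcoxon rank sum tests with the normal approximation at the comparisonwise level α* = 0.1/3; tied values within a pair receive average ranks. Only the 2-3 comparison falls below α*.

```python
from math import erf, sqrt
from itertools import combinations

def rank_sum_z(s1, s2):
    """Rank sum T1 of s1 in the combined sample (average ranks for ties)
    and its normal approximation z0 = (T1 - mu1) / sigma1."""
    combined = sorted(s1 + s2)
    def rank(x):
        pos = [i + 1 for i, v in enumerate(combined) if v == x]
        return sum(pos) / len(pos)
    n1, n2 = len(s1), len(s2)
    t1 = sum(rank(x) for x in s1)
    z0 = (t1 - n1 * (n1 + n2 + 1) / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return t1, z0

methods = {1: [23, 27, 30, 32, 33, 36],
           2: [28, 30, 31, 35, 37, 40],
           3: [16, 18, 22, 24, 27, 37]}
alpha_star = 0.1 / 3                     # comparisonwise error rate
for a, b in combinations(methods, 2):
    t1, z0 = rank_sum_z(methods[a], methods[b])
    p = 2 * (1 - 0.5 * (1 + erf(abs(z0) / sqrt(2))))   # two-tailed P-value
    print(a, b, round(p, 3), p < alpha_star)
```

Note that ranking 36 below 37 in the 1-2 comparison gives T1 = 32.5 for method 1, and only the pair (2, 3) is flagged significant.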

Summary of the multiple comparison (methods ordered by mean rank; a line joins methods that are not significantly different):

3   1   2
-----
    -----