MSc Business Administration
Research Methodology: Tools
Applied Data Analysis (with SPSS)

Lecture 11: Nonparametric Methods
May 2014
Prof. Dr. Jürg Schwarz
Lic. phil. Heidi Bruderer Enzler

Contents
Slide 2

Aims of the Lecture (slide 3)
Typical Syntax (slide 4)
Introduction (slide 5)
- Example (slide 5)
Parametric vs. Nonparametric Tests (slide 8)
- Basic Ideas (slide 8)
- Unrelated and Related Samples (slide 10)
- Decision Tree of Nonparametric Tests (a Selection) (slide 12)
Nonparametric Tests with SPSS (slide 13)
- Comparison of Two Unrelated Groups (slide 14)
- Comparison of Two Related Groups (slide 19)
- Comparison of Several Unrelated Groups (slide 24)
- Comparison of Several Related Groups (slide 29)
Appendix (slide 34)
Aims of the Lecture
Slide 3

You will understand the concept of ranking data.
You will understand different concepts of nonparametric tests.
You will understand the key steps in conducting the following nonparametric tests:
- Wilcoxon rank-sum test
- Kruskal-Wallis test
- Wilcoxon signed-rank test
- Friedman test
You can apply nonparametric methods with SPSS. In particular, you will know how to
- interpret the output
- describe the output

Typical Syntax
Slide 4

Wilcoxon rank-sum test (Mann-Whitney U test)

    NPAR TESTS
      /M-W= values BY sample(1 2)
      /STATISTICS=DESCRIPTIVES
      /MISSING ANALYSIS.

- values: dependent variable
- sample(1 2): group variable
- /STATISTICS=DESCRIPTIVES: descriptive statistics

Wilcoxon signed-rank test

    NPAR TESTS
      /WILCOXON=measure1 WITH measure2 (PAIRED)
      /STATISTICS DESCRIPTIVES
      /MISSING ANALYSIS.

- measure1, measure2: dependent variables
- /STATISTICS DESCRIPTIVES: descriptive statistics
Introduction
Example
Slide 5

Medical research: correlation analysis of babies' weight at birth and the increase in weight from day 70 to day 100, n = 20

[Figure: scatter plot of increase in weight [gram] against weight at birth [gram]; annotation: "leverage effect"]

Correlation coefficient r = -.76 (range of r: -1 ≤ r ≤ 1)
The higher the weight at birth, the lower the increase from day 70 to day 100.

Problem: Leverage effect due to premature babies with a weight at birth of less than 3'000 grams. The average normal weight at birth is 3'400 grams.

Solution: Take ranks into account

Baby   Weight at birth   Rank   Increase in weight   Rank
 1     2'740              5     2'550                17
 2     3'180              8     1'790                 9
 3     3'150              7     1'870                10
 4     3'030              6     2'040                13
 5     3'370             12     1'470                 6
 6     2'610              4     2'130                14
 7     3'570             16     2'150                15
 8     2'270              1     3'350                19
 9     2'300              2     3'400                20
10     2'380              3     3'230                18
11     3'260              9       820                 2
12     3'350             11     1'190                 3
13     3'630             19     1'360                 4
14     3'640             20     1'420                 5
15     3'490             14     1'960                11
16     3'290             10     1'670                 7
17     3'540             15       770                 1
18     3'570             17     1'700                 8
19     3'460             13     2'010                12
20     3'570             18     2'490                16

Lowest weight at birth = 2'270: rank 1. Next weight at birth = 2'300: rank 2, etc.
Lowest increase in weight = 770: rank 1. Next increase in weight = 820: rank 2, etc.
Note: babies 7, 18 and 20 have the same weight at birth (3'570 grams). The table numbers them 16, 17 and 18; tied values normally share the mean rank (here 17).
(SPSS will allocate the ranks for you!)

Slide 6
"How-to" in SPSS

Scales: Variables with ordinal or higher measurement level
SPSS: Analyze > Correlate > Bivariate...
Select "Spearman" (Spearman rank correlation coefficient)
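The ranking step and the rank correlation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS algorithm itself: the helper names `average_ranks` and `pearson` are made up for this sketch, tied values receive their mean rank, and the Spearman coefficient is computed as the Pearson correlation of the two rank series.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data from the table on slide 5 (grams)
birth = [2740, 3180, 3150, 3030, 3370, 2610, 3570, 2270, 2300, 2380,
         3260, 3350, 3630, 3640, 3490, 3290, 3540, 3570, 3460, 3570]
gain = [2550, 1790, 1870, 2040, 1470, 2130, 2150, 3350, 3400, 3230,
        820, 1190, 1360, 1420, 1960, 1670, 770, 1700, 2010, 2490]

# Spearman = Pearson correlation computed on the ranks
rho = pearson(average_ranks(birth), average_ranks(gain))
print(round(rho, 2))  # -0.56
```

Because the analysis runs on ranks, the extreme values of the premature babies enter only through their position in the ordering, not through their magnitude.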
Slide 7
Results

[Figure: scatter plot of rank of increase in weight against rank of weight at birth]

Spearman rank correlation coefficient r = -.56 (compare with r = -.76 from above)
The leverage effect due to premature babies has been reduced.

Parametric vs. Nonparametric Tests
Slide 8
Basic Ideas

All of the tests previously used in this course are based upon specific assumptions, especially assumptions about the distribution of variables in the population:
- normality (variable or error term is normally distributed)
- homogeneity of variance (variance of different groups/units is the same)

These tests are referred to as "parametric tests" because the shapes of the population distributions are described with known distributions and their parameters.
Example: Variable X is normally distributed, X ~ N(µ, σ²), with parameters µ and σ².

Tests that do not make such assumptions are referred to as nonparametric tests.
Nonparametric tests are sometimes known as distribution-free tests because they make no assumptions about the population distribution.
Nonparametric tests are used when the parametric assumptions are invalid.
Slide 9
Methods for nonparametric tests

There are 2 main methods for nonparametric tests:

Resampling
- Permutation (Example: Fisher exact test)
- Simulation (Example: Bootstrapping)

Ranking
All nonparametric tests presented in this lecture work on the principle of ranking the data:
- High scores are represented by high ranks, low scores by low ranks.
- The analysis is then carried out on the ranks rather than the original data.

Advantages
Nonparametric tests can be used
- when nothing is known about the distribution in the population or when parametric assumptions are invalid
- with outliers
- with small samples

Disadvantage
By ranking the data, information about the magnitude of differences between scores is lost:
When the parametric assumptions hold, nonparametric tests have less power than the corresponding parametric test at the same sample size. They are more likely to miss an effect that actually exists (β error).

Unrelated and Related Samples
Slide 10

Unrelated (independent) samples
The measurement values of a person in sample 1 and a person in sample 2 are not influenced by one another.
[Figure: example designs; labels: "Clinic 1", "Clinic 2", "Treatment A B C"]

Related (paired, dependent) samples
Each measurement value in sample 1 is influenced by a particular measurement value in sample 2 (and vice versa).
- When multiple measurements are applied to the same subject
  - to examine a development over time (repeated measures)
  - to compare different treatments
- When different persons are tested
  - who belong together ("natural pairings", e.g. couples)
  - who are matched to reduce the effects of a confounding variable (e.g. matching persons with a comparable level of empathy)
[Figure: example designs; labels: "Placebo", "Creativity", "Item 1", "Stamina", "Day 1", "Day 14", "Day 70", "Treatment A B C", "Husband", "Wife", "Anger management training* A B C"; footnote: "*Matched by level of intelligence"]
Slide 11
SPSS examples for unrelated and for related samples

Unrelated: Bone density (Slide 14)
- Sample A: women up to 50
- Sample B: women over 50

Related: Sleeping pills (Slide 29)
- Sample: reaction time of proband 3 after taking four different sleeping pills

Decision Tree of Nonparametric Tests (a Selection)
Slide 12

Groups        Distribution of dependent variable   Samples     Test
one group     normal                               -           t-test
one group     any                                  -           Sign test
two groups    normal                               unrelated   t-test
two groups    normal                               related     t-test
two groups    any                                  unrelated   WRS test
two groups    any                                  related     WSR test
many groups   normal                               unrelated   ANOVA
many groups   normal                               related     Repeated ANOVA
many groups   any                                  unrelated   Kruskal-Wallis
many groups   any                                  related     Friedman

WRS: Wilcoxon rank-sum test
WSR: Wilcoxon signed-rank test
Nonparametric Tests with SPSS
Slide 13

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs
- "2 Independent Samples...": WRS, Wilcoxon rank-sum test (Mann-Whitney U test)
- "K Independent Samples...": Kruskal-Wallis test
- "2 Related Samples...": WSR, Wilcoxon signed-rank test
- "K Related Samples...": Friedman test

Comparison of Two Unrelated Groups
Slide 14
Wilcoxon rank-sum test (Mann-Whitney U test)

Given
- Two independent (unrelated) samples with sample sizes n_1, n_2 (in general n_1 ≠ n_2)
- Small sample size
- Not normally distributed

Question
Do the central tendencies µ_A and µ_B of a characteristic differ between two unpaired samples?
H_0: Central tendencies are equal, µ_A = µ_B
H_A: Central tendencies are not equal, µ_A ≠ µ_B

Example
Medical research about osteoporosis: bone density in g/cm³
Sample A: women up to and including age 50, n = 13
Sample B: women above age 50, n = 11

sample A   163 152 202 105 134 134 139 110 122 146 149 94 158
sample B   125 121 133 95 148 96 117 112 100 84 98
Procedure
Slide 15

1. Sort the values of the two samples by size

   sample A   94 105 110 122 134 134 139 146 149 152 158 163 202
   sample B   84 95 96 98 100 112 117 121 125 133 148

2. Allocate ranks to the values (across both samples combined; tied values share the mean rank)

   ranks A    2 7 8 12 15.5 15.5 17 18 20 21 22 23 24
   ranks B    1 3 4 5 6 9 10 11 13 14 19

3. Calculate the sum of ranks

   rank sum A = 205
   rank sum B = 95

4. Calculate the test statistic based on the sample with the smaller rank sum

   $U = R_s - \frac{n_s (n_s + 1)}{2}$

   where R_s is the rank sum and n_s the sample size of the sample s with the smaller rank sum.

   $U = R_B - \frac{n_B (n_B + 1)}{2} = 95 - \frac{11 (11 + 1)}{2} = 29$

Slide 16

5. Determine the critical value in the table below

   Distribution of Wilcoxon rank-sum test (Mann-Whitney U test), two-sided, α = 5%

   n_1 \ n_2    9   10   11   12   13   14   15   16   17   18   19   20
    8          15   17   19   22   24   26   29   31   34   36   38   41
    9          17   20   23   26   28   31   34   37   39   42   45   48
   10          20   23   26   29   33   36   39   42   45   48   52   55
   11          23   26   30   33   37   40   44   47   51   55   58   62
   12          26   29   33   37   41   45   49   53   57   61   65   69

   For sample sizes 11 and 13, the critical value is 37.

6. Compare the test statistic with the critical value

   H_0 is rejected if U is less than or equal to the critical value. Here U = 29 ≤ 37.
   The value of the test statistic is in the rejection region of H_0:
   The bone densities of samples A and B are significantly different.
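The six steps above can be retraced with a short Python sketch. It is an illustration of the hand calculation, not of SPSS internals: the helper `average_ranks` is a name made up for this sketch, and the critical value 37 is simply copied from the table on slide 16.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

sample_a = [163, 152, 202, 105, 134, 134, 139, 110, 122, 146, 149, 94, 158]
sample_b = [125, 121, 133, 95, 148, 96, 117, 112, 100, 84, 98]

# Steps 1-2: rank all 24 values together
ranks = average_ranks(sample_a + sample_b)

# Step 3: rank sum per sample
rank_sum_a = sum(ranks[:len(sample_a)])
rank_sum_b = sum(ranks[len(sample_a):])

# Step 4: U from the sample with the smaller rank sum (the rule used on the slide)
if rank_sum_a <= rank_sum_b:
    r_s, n_s = rank_sum_a, len(sample_a)
else:
    r_s, n_s = rank_sum_b, len(sample_b)
u = r_s - n_s * (n_s + 1) / 2

# Steps 5-6: compare with the tabulated critical value (two-sided, alpha = 5%)
critical_value = 37
reject_h0 = u <= critical_value

print(rank_sum_a, rank_sum_b, u, reject_h0)  # 205.0 95.0 29.0 True
```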
Wilcoxon rank-sum test (Mann-Whitney U test) with SPSS
Slide 17

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples...

Values of the grouping variable: consult value labels or a frequency table.

    NPAR TESTS
      /M-W= values BY sample(1 2)
      /STATISTICS=DESCRIPTIVES
      /MISSING ANALYSIS.

Slide 18

[SPSS output; annotations:]
- Test statistic U
- Rank sum of smaller sample
- Z-value calculated for the asymptotic test

"Asymp. Sig. (2-tailed)" is based on an approximation to a normal distribution.
For samples with n > 30 use "Asymp. Sig."

Here the sample size is n ≤ 30, therefore:
The bone densities of samples A and B are significantly different (Exact Wilcoxon rank-sum test: U = 29, p = .013).

If the sample size were n > 30:
The bone densities of samples A and B are significantly different (Asymptotic Wilcoxon rank-sum test: Z = -2.463, p = .014).
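The asymptotic values can be reproduced with the usual normal approximation for U. The sketch below assumes the textbook formula with a correction for ties; that SPSS computes Z in exactly this way is an assumption, although the result agrees with the reported Z = -2.463 and p = .014. The variable names are illustrative.

```python
from math import erfc, sqrt

n_a, n_b = 13, 11      # sample sizes of the bone density example
u = 29.0               # test statistic from the hand calculation
n = n_a + n_b
tie_sizes = [2]        # the value 134 occurs twice in the pooled data

# Mean and tie-corrected variance of U under H0
mean_u = n_a * n_b / 2
tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
var_u = n_a * n_b / 12 * ((n + 1) - tie_term)

z = (u - mean_u) / sqrt(var_u)
p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

print(round(z, 3), round(p_two_sided, 3))  # -2.463 0.014
```

With only 24 observations the exact p value (.013) is the one to report; the approximation is shown only to make the origin of "Asymp. Sig." transparent.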
Comparison of Two Related Groups
Slide 19
Wilcoxon signed-rank test

Given
- Two related samples (sometimes called paired samples)
- Small sample sizes
- Not normally distributed

Question
Is there a difference between the central tendencies µ_A and µ_B of a characteristic in two related samples?
H_0: central tendencies are equal, µ_A = µ_B
H_A: central tendencies are not equal, µ_A ≠ µ_B

Example
Medical research about osteoporosis: bone density in g/cm³
Sample: 10 women
Measure 1: bone density before exercise therapy
Measure 2: bone density after exercise therapy

Procedure
1. Calculate the difference between the two related data points
   difference = measure 2 - measure 1

Slide 20

2. Write down the sign of the difference (sign of "measure 2 - measure 1")
3. Assign ranks to the absolute differences. Differences with a value of 0 will not be considered.
4. Sum up positive ranks and negative ranks

woman   measure 1   measure 2   |difference|   sign   rank   positive ranks   negative ranks
 1      202         133         69             -       9                       9
 2      163         125         38             -       7                       7
 3       94         128         34             +       6      6
 4      152         121         31             -       5                       5
 5      134         148         14             +       2      2
 6      139         117         22             -       3.5                     3.5
 7      110         112          2             +       1      1
 8      122         100         22             -       3.5                     3.5
 9      158          85         73             -      10                      10
10      146          84         62             -       8                       8
sum                                                   55      9               46
5. Calculate the test statistic

   $W = |\text{positive rank sum} - \text{negative rank sum}| = |9 - 46| = 37$

Slide 21

6. Determine the critical value in the table below

   Distribution of Wilcoxon signed-rank test

   Number of differences   one-sided   two-sided
   not equal to 0          T_0.95      T_0.975
    5                      15          na
    6                      17          21
    7                      22          24
    8                      26          30
    9                      29          35
   10                      35          39
   11                      40          46
   12                      44          52

7. Compare the value of the test statistic with the critical value

   W = 37 is smaller than the two-sided critical value of 39 (10 differences not equal to 0).
   The value of the test statistic is inside the non-rejection region of H_0.
   The test is (narrowly) not significant: a significant effect of exercise therapy on bone density cannot be shown.

Wilcoxon signed-rank test with SPSS
Slide 22

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples...

    NPAR TESTS
      /WILCOXON=measure1 WITH measure2 (PAIRED)
      /STATISTICS DESCRIPTIVES
      /MISSING ANALYSIS.
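The hand calculation of the signed-rank test can be retraced in Python as follows. This is a sketch under assumptions: `average_ranks` is a helper invented for the illustration, the critical value 39 is copied from the table on slide 21, and the z value uses the standard tie-corrected normal approximation, which happens to reproduce the Z = -1.887 and p = .059 that SPSS reports for this example.

```python
from collections import Counter
from math import erfc, sqrt

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

measure1 = [202, 163, 94, 152, 134, 139, 110, 122, 158, 146]
measure2 = [133, 125, 128, 121, 148, 117, 112, 100, 85, 84]

# Steps 1-3: differences (zeros dropped) and ranks of their absolute values
diffs = [m2 - m1 for m1, m2 in zip(measure1, measure2) if m2 != m1]
abs_diffs = [abs(d) for d in diffs]
ranks = average_ranks(abs_diffs)

# Step 4: positive and negative rank sums
pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
neg = sum(r for d, r in zip(diffs, ranks) if d < 0)

# Steps 5-7: test statistic and comparison with the two-sided critical value
w = abs(pos - neg)
critical_value = 39
reject_h0 = w >= critical_value

# Normal approximation for the smaller rank sum, corrected for ties
n = len(diffs)
tie_term = sum(t ** 3 - t for t in Counter(abs_diffs).values()) / 48
var_t = n * (n + 1) * (2 * n + 1) / 24 - tie_term
z = (min(pos, neg) - n * (n + 1) / 4) / sqrt(var_t)
p_two_sided = erfc(abs(z) / sqrt(2))

print(pos, neg, w, reject_h0)              # 9.0 46.0 37.0 False
print(round(z, 3), round(p_two_sided, 3))  # -1.887 0.059
```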
Slide 23

As the rank sums approximately follow a normal distribution, the smaller rank sum is z-standardized.

The test is (narrowly) not significant: a significant effect of the therapy on bone density cannot be shown (Wilcoxon signed-rank test: Z = -1.887, p = .059).

Comparison of Several Unrelated Groups
Slide 24
Kruskal-Wallis test

Given
- Many independent (unrelated) samples with sample sizes n_1, ..., n_k (in general n_i ≠ n_j for i ≠ j)
- Small sample sizes
- Not normally distributed

Question
Is there a difference between the central tendencies?
H_0: central tendencies are equal
H_A: at least two of the central tendencies are not equal

Example
Test of 3 different sleeping pills (drug1, drug2, drug3) on sleep duration (measured in hours).
The sleeping pills are used in three random samples (n_1 = 3, n_2 = 4, n_3 = 5).

sleep duration
drug1   6.2 6.9 5.1
drug2   7.1 6.2 6.2 7.9
drug3   8.4 8.8 8.6 8.2 7.2
Slide 25
Procedure

1. All values of the samples are sorted according to size

   sleep duration
   drug 1   5.1 6.2 6.9
   drug 2   6.2 6.2 7.1 7.9
   drug 3   7.2 8.2 8.4 8.6 8.8

2. Assign ranks to the values and calculate the rank sum for every sample

            ranks            rank sum
   drug 1   1 3 5             9
   drug 2   3 3 6 8          20
   drug 3   7 9 10 11 12     49

3. Calculate the test statistic K

   $K = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$

   $K = \frac{12}{12(12+1)} \left( \frac{9^2}{3} + \frac{20^2}{4} + \frac{49^2}{5} \right) - 3(12+1) = 7.71$

   Constant = 12, applies to all numbers of samples and to all sample sizes
   N = n_1 + n_2 + ... + n_k
   R_j^2 = squared rank sum of sample j
   k = number of samples
   n_j = sample size of sample j

4. Determine the critical value in the table below

Slide 26

   The test statistic K follows a χ²-distribution with degrees of freedom ν = k - 1 = 3 - 1 = 2

   Quantiles of the χ²-distribution (columns: 1 - α)

   df   0.700   0.750   0.800   0.850   0.900   0.950   0.975   0.990   0.995
    1    1.07    1.32    1.64    2.07    2.71    3.84    5.02    6.63    7.88
    2    2.41    2.77    3.22    3.79    4.61    5.99    7.38    9.21   10.60
    3    3.66    4.11    4.64    5.32    6.25    7.81    9.35   11.34   12.84
    4    4.88    5.39    5.99    6.74    7.78    9.49   11.14   13.28   14.86
    5    6.06    6.63    7.29    8.12    9.24   11.07   12.83   15.09   16.75
    6    7.23    7.84    8.56    9.45   10.64   12.59   14.45   16.81   18.55
    7    8.38    9.04    9.80   10.75   12.02   14.07   16.01   18.48   20.28
    8    9.52   10.22   11.03   12.03   13.36   15.51   17.53   20.09   21.95
    9   10.66   11.39   12.24   13.29   14.68   16.92   19.02   21.67   23.59
   10   11.78   12.55   13.44   14.53   15.99   18.31   20.48   23.21   25.19
   11   12.90   13.70   14.63   15.77   17.28   19.68   21.92   24.73   26.76
   12   14.01   14.85   15.81   16.99   18.55   21.03   23.34   26.22   28.30
   13   15.12   15.98   16.98   18.20   19.81   22.36   24.74   27.69   29.82
   14   16.22   17.12   18.15   19.41   21.06   23.68   26.12   29.14   31.32
   15   17.32   18.25   19.31   20.60   22.31   25.00   27.49   30.58   32.80

   Critical value for α = 5%: χ²_95% = 5.99

5. Compare the value of the test statistic with the critical value

   (χ²_95% = 5.99) < (K = 7.71)
   The null hypothesis is rejected. The rank sums differ significantly.
   The sleeping pills have different effects on sleep duration.
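The Kruskal-Wallis calculation above can be checked with a small Python sketch. It follows the formula on the slide without any correction for ties; `average_ranks` is a helper name invented for the illustration. The p value uses the fact that the upper tail of a χ²-distribution with exactly 2 degrees of freedom is exp(-x/2), so this shortcut only applies to examples with three groups.

```python
from math import exp

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

groups = {
    "drug1": [6.2, 6.9, 5.1],
    "drug2": [7.1, 6.2, 6.2, 7.9],
    "drug3": [8.4, 8.8, 8.6, 8.2, 7.2],
}

# Steps 1-2: rank the pooled values, then sum the ranks per sample
pooled = [v for values in groups.values() for v in values]
ranks = average_ranks(pooled)
rank_sums = {}
start = 0
for name, values in groups.items():
    rank_sums[name] = sum(ranks[start:start + len(values)])
    start += len(values)

# Step 3: test statistic K
n_total = len(pooled)
k_stat = (12 / (n_total * (n_total + 1))
          * sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups)
          - 3 * (n_total + 1))

# Steps 4-5: compare with the tabulated critical value for df = 2
critical_value = 5.99
reject_h0 = k_stat > critical_value
p_value = exp(-k_stat / 2)  # chi-square upper tail, valid for df = 2 only

print(rank_sums)                              # {'drug1': 9.0, 'drug2': 20.0, 'drug3': 49.0}
print(round(k_stat, 2), round(p_value, 3))    # 7.71 0.021
```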
Kruskal-Wallis test with SPSS
Slide 27

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples...

    NPAR TESTS
      /K-W=duration BY drug(1 3)
      /STATISTICS DESCRIPTIVES
      /MISSING ANALYSIS.

Slide 28

Compare "Chi-Square" in the SPSS output with the test statistic K = 7.71.

The sleeping pills have significantly different effects on sleep duration (Kruskal-Wallis test: χ² = 7.708, df = 2, p = .021).
Comparison of Several Related Groups
Slide 29
Friedman test

Given
- Related samples (repeated measures design)
- Small sample size
- Not normally distributed

Question
Is there a difference between the central tendencies?
H_0: central tendencies are equal
H_A: at least two of the central tendencies are not equal

Example
The effect on reaction time of 4 different sleeping pills (drug1, ..., drug4) is measured (in milliseconds)

proband   drug1   drug2   drug3   drug4
1         30      28      16      34
2         14      18      10      22
3         28      28      14      30
4         24      20      18      30
5         38      34      20      44

[Figure: reaction time of proband 3 by type of sleeping pill (drug1, ..., drug4)]

Procedure
1. Within each person, assign ranks to the treatments

Slide 30

proband   drug1   drug2   drug3   drug4
1         3       2       1       4
2         2       3       1       4
3         2.5     2.5     1       4
4         3       2       1       4
5         3       2       1       4

[Figure: ranked reaction time (ranks 1 to 4) of proband 3]

If several values are tied, they are each assigned the average of the ranks concerned.

2. Calculate the rank sum R_j of each column (= of each treatment)

proband   drug1   drug2   drug3   drug4
1         3       2       1       4
2         2       3       1       4
3         2.5     2.5     1       4
4         3       2       1       4
5         3       2       1       4
R_j       13.5    11.5    5.0     20.0

3. Calculate the test statistic V

   $V = \frac{12}{n k (k+1)} \sum_{j=1}^{k} R_j^2 - 3 n (k+1)$

   $V = \frac{12}{5 \cdot 4 (4+1)} \left( 13.5^2 + 11.5^2 + 5.0^2 + 20.0^2 \right) - 3 \cdot 5 (4+1) = 13.74$

   Constant = 12
   k = number of treatment levels = 4 treatment levels
   n = number of probands = 5 probands
4. Determine the critical value in the table below

   The test statistic V follows a χ²-distribution with degrees of freedom ν = k - 1 = 4 - 1 = 3

Slide 31

   Quantiles of the χ²-distribution (columns: 1 - α)

   df   0.700   0.750   0.800   0.850   0.900   0.950   0.975   0.990   0.995
    1    1.07    1.32    1.64    2.07    2.71    3.84    5.02    6.63    7.88
    2    2.41    2.77    3.22    3.79    4.61    5.99    7.38    9.21   10.60
    3    3.66    4.11    4.64    5.32    6.25    7.81    9.35   11.34   12.84
    4    4.88    5.39    5.99    6.74    7.78    9.49   11.14   13.28   14.86
    5    6.06    6.63    7.29    8.12    9.24   11.07   12.83   15.09   16.75
    6    7.23    7.84    8.56    9.45   10.64   12.59   14.45   16.81   18.55
    7    8.38    9.04    9.80   10.75   12.02   14.07   16.01   18.48   20.28
    8    9.52   10.22   11.03   12.03   13.36   15.51   17.53   20.09   21.95
    9   10.66   11.39   12.24   13.29   14.68   16.92   19.02   21.67   23.59
   10   11.78   12.55   13.44   14.53   15.99   18.31   20.48   23.21   25.19
   11   12.90   13.70   14.63   15.77   17.28   19.68   21.92   24.73   26.76
   12   14.01   14.85   15.81   16.99   18.55   21.03   23.34   26.22   28.30
   13   15.12   15.98   16.98   18.20   19.81   22.36   24.74   27.69   29.82
   14   16.22   17.12   18.15   19.41   21.06   23.68   26.12   29.14   31.32
   15   17.32   18.25   19.31   20.60   22.31   25.00   27.49   30.58   32.80

   Critical value for α = 5%: χ²_95% = 7.81

5. Compare the value of the test statistic with the critical value

   (χ²_95% = 7.81) < (V = 13.74)
   The null hypothesis is rejected. The rank sums differ significantly.
   The sleeping pills cause significantly different reaction times.

Friedman test with SPSS
Slide 32

SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > K Related Samples...

    NPAR TESTS
      /FRIEDMAN=drug1 drug2 drug3 drug4
      /STATISTICS DESCRIPTIVES
      /MISSING LISTWISE.
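The Friedman calculation can be retraced with the Python sketch below. The uncorrected statistic follows the formula from the procedure. The tie-corrected variant is an assumption about what SPSS does: it is the standard correction for ties within a proband, and it reproduces the χ² = 14.020 that SPSS reports for these data. `average_ranks` and `chi2_sf_df3` are helper names invented for this illustration; the latter is the closed-form upper tail of a χ²-distribution with exactly 3 degrees of freedom.

```python
from collections import Counter
from math import erfc, exp, pi, sqrt

def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

def chi2_sf_df3(x):
    """Upper tail probability of a chi-square distribution with 3 df."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Reaction times in milliseconds; rows = probands, columns = drug1..drug4
times = [
    [30, 28, 16, 34],
    [14, 18, 10, 22],
    [28, 28, 14, 30],
    [24, 20, 18, 30],
    [38, 34, 20, 44],
]
n, k = len(times), len(times[0])

# Steps 1-2: rank within each proband, then sum the ranks per drug
rank_rows = [average_ranks(row) for row in times]
rank_sums = [sum(row[j] for row in rank_rows) for j in range(k)]

# Step 3: test statistic V as on the slide (no tie correction)
v = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

# Assumed tie correction: divide by 1 - sum(t^3 - t) / (n * k * (k^2 - 1))
tie_term = sum(t ** 3 - t for row in times for t in Counter(row).values())
v_corrected = v / (1 - tie_term / (n * k * (k ** 2 - 1)))

print(rank_sums)                                   # [13.5, 11.5, 5.0, 20.0]
print(round(v, 2), round(v_corrected, 2))          # 13.74 14.02
print(round(chi2_sf_df3(v_corrected), 3))          # 0.003
```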
Slide 33

Compare "Chi-Square" in the SPSS output with the test statistic V = 13.74.
SPSS uses a slightly different algorithm. The difference is consistent with a correction for the tied ranks of proband 3: 13.74 / 0.98 = 14.02.

The sleeping pills have a different impact on reaction times (Friedman test: χ² = 14.020, df = 3, p = .003).

Appendix
Slide 34
Which Groups Differ? Post hoc Tests after a Kruskal-Wallis Test

Post hoc tests can be accessed through the output generated by the newer SPSS dialogs:

SPSS: Analyze > Nonparametric Tests > Independent Samples...

    NPTESTS
      /INDEPENDENT TEST (duration) GROUP (drug)
      /MISSING SCOPE=ANALYSIS USERMISSING=EXCLUDE
      /CRITERIA ALPHA=0.05 CILEVEL=95.

Note: This dialog does not allow defining which groups are compared. All groups coded in drug are compared.

Alternatively, separate Wilcoxon rank-sum tests could be conducted for each combination of two drugs (using an alpha level adjustment, e.g. a Bonferroni correction).
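As a minimal sketch of what a Bonferroni correction means for such separate pairwise tests: the nominal alpha level is divided by the number of comparisons, or equivalently each p value is multiplied by that number and capped at 1. The function names below are made up for this illustration and the group labels are taken from the sleeping pill examples.

```python
from itertools import combinations

def bonferroni_plan(groups, alpha=0.05):
    """Return all pairs of groups and the Bonferroni-adjusted alpha per test."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)

def bonferroni_adjust(p_value, n_comparisons):
    """Adjusted p value: unadjusted p times number of comparisons, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Kruskal-Wallis example: 3 drugs, hence 3 pairwise rank-sum tests
pairs_kw, alpha_kw = bonferroni_plan(["drug1", "drug2", "drug3"])
print(pairs_kw)            # [('drug1', 'drug2'), ('drug1', 'drug3'), ('drug2', 'drug3')]
print(round(alpha_kw, 4))  # 0.0167

# Friedman example: 4 drugs, hence 6 pairwise signed-rank tests
pairs_fr, alpha_fr = bonferroni_plan(["drug1", "drug2", "drug3", "drug4"])
print(len(pairs_fr), round(alpha_fr, 4))  # 6 0.0083
```

Each pairwise test is then judged against the adjusted alpha instead of 0.05, which keeps the overall error rate across all comparisons at or below 5%.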
Slide 35

[SPSS output; annotations:]
- Double-click the table in the output
- Copy output: values needed for reporting the Kruskal-Wallis test
- Select "Pairwise comparisons"

Slide 36

[SPSS output "Pairwise comparisons"; annotation: Copy output]

=> Only drug=1 and drug=3 differ significantly from one another (p = .029).

SPSS does not offer any choice between different alpha level adjustments. By default, a Bonferroni adjustment is carried out. Other adjustments need to be carried out manually.
Which Groups Differ? Post hoc Tests after a Friedman Test
Slide 37

Post hoc tests can be accessed through the output generated by the newer SPSS dialogs:

SPSS: Analyze > Nonparametric Tests > Related Samples...

    NPTESTS
      /RELATED TEST(drug1 drug2 drug3 drug4)
      /MISSING SCOPE=ANALYSIS USERMISSING=EXCLUDE
      /CRITERIA ALPHA=0.05 CILEVEL=95.

Alternatively, separate Wilcoxon signed-rank tests could be conducted for each combination of two drugs (using an alpha level adjustment, e.g. a Bonferroni correction).

Slide 38

[SPSS output; annotations:]
- Double-click the table in the output
- Copy output: values needed for reporting the Friedman test
- Select "Pairwise comparisons"
Slide 39

[SPSS output "Pairwise comparisons"; annotation: Copy output]

=> Only drug3 and drug4 differ significantly from one another (p = .001).

SPSS does not offer any choice between different alpha level adjustments. By default, a Bonferroni adjustment is carried out. Other adjustments need to be carried out manually.