MODIFIED PARAMETRIC BOOTSTRAP: A ROBUST ALTERNATIVE TO CLASSICAL TESTS

Zahayu Md Yusof, Nurul Hanis Harun, Sharipah Sooad Syed Yahaya & Suhaida Abdullah
School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia
zahayu@uum.edu.my, no_niece89@yahoo.com, sharipah@uum.edu.my, suhaida@uum.edu.my

ABSTRACT

Classical parametric tests such as ANOVA are often used to test the equality of central tendency measures, since they provide good control of the Type I error rate and are generally more powerful than other statistical methods. However, ANOVA is known to be adversely affected by non-normality, heteroscedasticity and unbalanced designs, and its Type I error and power rates are substantially affected when these problems occur simultaneously. Continuing to use ANOVA under the influence of these problems will eventually produce unreliable findings. Normality and homogeneity of variances are two assumptions that need to be fulfilled when using classical parametric tests, yet not all data satisfy them. Thus, this study proposes a robust procedure that is insensitive to these assumptions, namely the Parametric Bootstrap (PB) combined with a popular robust estimator, MADn. The p-value produced by the modified PB was then compared with the p-values of ANOVA and the Kruskal-Wallis test. The findings show that the modified PB is able to produce a significant result when testing the equality of central tendency measures.

Field of Research: robust statistics, education
----------------------------------------------------------------------------------------------------------------------------------

1. Introduction

ANOVA is used to determine the equality of means for more than two groups, while the independent-samples t-test is frequently used when researchers want to make inferences about two independent groups using the sample means. However, a characteristic of these procedures is that inference depends on certain assumptions that need to be fulfilled.
There are three main assumptions that need to be fulfilled before making inferences with classical parametric tests, namely: (a) the data are collected from independent groups, (b) the data are normally distributed, and (c) the variances of the groups are equal (homoscedasticity). However, this study focuses only on the assumptions of normality and equality of variances, since these assumptions are rarely met in real data. Violation of the normality and equal-variance assumptions can have a drastic effect on the results of classical parametric tests, especially on the Type I and Type II error rates (Erceg-Hurn & Mirosevich, 2008). A Type I error occurs when a true null hypothesis is rejected, while a Type II error occurs when a false null hypothesis fails to be rejected. The probability that a Type II error will not occur is the power of the test. Failure to meet the normality and equal-variance assumptions can distort the Type I error rate. For example, the actual probability of a Type I error should lie between 0.025 and 0.075 when the significance level (α) is set at 0.05; that is, the probability of a Type I error must stay within bounds around the nominal level when the null hypothesis is true. However, violation of any of these assumptions can inflate the Type I error rate and consequently will

2013, Langkawi, Malaysia. (e-isbn 978-967-11768-2-5). Organized by WorldConferences.net
make the Type I error rate fall outside the significance-level bounds even when the null hypothesis is true (Wilcox & Keselman, 2010).

Classical parametric tests are often used to test the equality of central tendency measures such as the mean, mode and median. However, classical parametric tests have underlying assumptions that need to be fulfilled before the data are analyzed. Many published articles have shown that violating the assumptions of classical parametric tests can give biased results (Wilcox, 2002; Lix & Keselman, 1998; Micceri, 1989). This situation has attracted researchers to search for a test statistic that controls the Type I error much better under non-normal distributions and unequal variances. Thus, robust statistics was introduced as an alternative approach for handling violations of the normality and equal-variance assumptions. According to Hampel (2001), robust statistics is the stability theory of statistical procedures: the procedures remain insensitive to non-normality and unequal variances and hence provide good control of the Type I error rate. Several procedures have been introduced and recommended for analyzing data when the assumptions of normality and equality of variances are violated (Krishnamoorthy, Lu & Mathew, 2007; Md Yusof, Abdullah & Syed Yahaya, 2012; Fan & Hancock, 2012). Among the earlier procedures used by researchers are the Welch test (1951), the James test (1951) and the Box test (1953). These three tests appear to be the most prevalent in controlling the Type I error rate and providing competitive power under variance heterogeneity. However, they can be biased when the data exhibit both unequal variances and non-normal distributions, especially when the group sizes are unequal. In terms of robustness, no single approach is best in all situations.
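The inflation described above is easy to demonstrate. The following short Monte Carlo sketch (our illustration, not from the paper) estimates the empirical Type I error of classical ANOVA when the null hypothesis is true but a small group carries a large variance; the group sizes, variances and replication count are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

# Simulate one-way ANOVA when the null hypothesis is TRUE (all population
# means equal) but variances and group sizes are unequal, with the small
# group paired with the large variance -- a configuration known to make
# the classical F test liberal.
rng = np.random.default_rng(12345)
alpha, reps = 0.05, 2000
rejections = 0
for _ in range(reps):
    a = rng.normal(0.0, 4.0, size=6)   # small group, large variance
    b = rng.normal(0.0, 1.0, size=25)
    c = rng.normal(0.0, 1.0, size=25)
    if stats.f_oneway(a, b, c).pvalue < alpha:
        rejections += 1
empirical_type1 = rejections / reps
print(empirical_type1)  # empirical Type I error; typically above the nominal 0.05
```

With a fixed seed the result is reproducible; configurations like this one are exactly the kind of heteroscedastic, unbalanced setting the paper is concerned with.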
To overcome this problem, many researchers have contributed to the development of alternative approaches such as robust procedures. Robust procedures involve replacing the mean and variance with robust measures of location and scale. For example, some researchers have proposed using trimmed means and Winsorized variances in alternative approaches such as the James test, Welch test and Parametric Bootstrap test (Lix & Keselman, 1998; Keselman et al., 2002; Md Yusof, Abdullah, Syed Yahaya & Othman, 2011), since doing so can improve robustness. These tests tend to provide much better Type I error control when computed with trimmed means and Winsorized variances (Lix & Keselman, 1998). Among the latest procedures for detecting differences between location measures, or for assessing the effects of a treatment variable across groups, is the statistic known as PB, proposed by Krishnamoorthy et al. (2007). An interesting characteristic of this statistic is that no trimming needs to be done on the data when they are skewed. The primary goal of this paper is to investigate the robustness of this statistic towards non-normality and heteroscedasticity by combining it with alternative scale estimators, controlling the Type I error rate while at the same time trying to increase the power of the test.

2. Methods

In this section, we discuss the modified PB method, which combines the PB statistic with one of the scale estimators suggested by Rousseeuw and Croux (1993); the unknown sampling distribution of the modified PB is approximated using the percentile bootstrap method.
2.1 PB Statistic

This study uses the Parametric Bootstrap procedure as the test statistic. Krishnamoorthy et al. (2007) proposed the Parametric Bootstrap test as a relatively new statistic for comparing the equality of central tendency measures, such as the means of independent groups, under unequal group variances. The Parametric Bootstrap test involves generating sample statistics from a parametric model in which the parameters are replaced by their estimates using the bootstrap method. The objective of the Krishnamoorthy et al. (2007) study was to compare the proposed Parametric Bootstrap test with three other tests: the Welch test, the James test and the generalized F (GF) test. Based on the results obtained, the Parametric Bootstrap test tends to provide good Type I error control and to be more powerful than the original Welch test, the James test and the GF test. The Parametric Bootstrap test also provides good control of the Type I error even for small sample sizes and a large number of groups, where the sample test statistic T_N0 is computed using the following formula:

    T_N0 = sum_j (n_j xbar_j^2 / s_j^2) - [sum_j (n_j xbar_j / s_j^2)]^2 / sum_j (n_j / s_j^2)

where n_j, xbar_j and s_j^2 denote the size, sample mean and sample variance of the jth group. However, according to Cribbie et al. (2010), the Welch test using robust estimators (trimmed means and Winsorized variances) provided excellent Type I error control compared to the Parametric Bootstrap test in the presence of non-normal data and unequal variances. In addition, the Welch test using trimmed means and Winsorized variances is consistently more powerful than the original Welch test, the James test and the Parametric Bootstrap test. To further reduce the effect of non-normality, Cribbie et al. (2012) demonstrated that the Parametric Bootstrap test using trimmed means and Winsorized variances provides better Type I error control than the original Welch test and Parametric Bootstrap test when comparing the central tendencies of groups with non-normal distributions and unequal variances.
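Assuming the formula above takes its usual weighted form, the statistic can be computed in a few lines; `pb_statistic` is our name for this illustrative helper, not code from Krishnamoorthy et al. (2007):

```python
import numpy as np

def pb_statistic(groups):
    """T_N0 = sum(n_j*mean_j^2/var_j) - (sum(n_j*mean_j/var_j))^2 / sum(n_j/var_j)."""
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])   # group sample means
    v = np.array([np.var(g, ddof=1) for g in groups])  # unbiased sample variances
    w = n / v                                    # precision weights
    return float(np.sum(w * m**2) - np.sum(w * m)**2 / np.sum(w))

# Two identical groups: no location difference, so the statistic is zero.
print(pb_statistic([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))  # -> 0.0
```

Large values of T_N0 indicate that the group means differ by more than their sampling variability would suggest.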
Apart from that, the Parametric Bootstrap test with trimmed means is a test statistic that has been shown to provide good control of the Type I error even for small sample sizes and many groups (Cribbie et al., 2012). The bootstrap can be carried out by a parametric or a nonparametric approach. According to Lee (1994), Parametric Bootstrap results may be more accurate than their nonparametric versions. Hence, this study does not consider the nonparametric bootstrap.

2.2 Scale Estimator

Let X = (x_1, x_2, ..., x_n) be a random sample from any distribution, and let the sample median be denoted by med_i x_i.

2.2.1 MADn

MADn is the median absolute deviation about the median, given by

    MADn = b * med_i | x_i - med_j x_j |

with b a constant. This scale estimator is very robust, with the best possible breakdown point and a bounded influence function. Huber (1981) identified MADn as the single most useful ancillary estimate of scale due to its high breakdown property. MADn is simple and easy to compute. The constant b is needed to make the estimator consistent for the parameter of interest. For example, if the observations are randomly sampled from a normal distribution, setting b = 1.4826 makes MADn an estimate of σ, the standard deviation. With b = 1, MADn estimates approximately 0.6745σ, and this version is known as MAD.
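A minimal sketch of the MADn estimator (the function name is ours) illustrates its robustness against a gross outlier, where the classical standard deviation breaks down:

```python
import numpy as np

def mad_n(x, b=1.4826):
    """MADn: median absolute deviation about the median.

    b = 1.4826 makes MADn a consistent estimator of the standard
    deviation sigma when the data are normally distributed.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return b * np.median(np.abs(x - med))

x = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
print(mad_n(x))            # -> 1.4826 (barely affected by the 100)
print(np.std(x, ddof=1))   # -> about 43.6 (classical SD blown up by the outlier)
```

The 50% breakdown point means up to half of the observations can be replaced by arbitrary values before MADn can be made arbitrarily large.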
2.3 Bootstrap Method

Since the sampling distribution of PB is intractable, and its asymptotic null distribution may not be of much use for practical sample sizes, the bootstrap method is considered to give a better approximation. Therefore, to assess statistical significance in this study, the percentile bootstrap method (see, e.g., Efron and Tibshirani, 1993) was used. According to Babu, Padmanabhan and Puri (1999), the bootstrap method is known to give a better approximation than normal approximation theory, and the method is attractive especially when the samples are of moderate size. The bootstrap was introduced by Efron (1979) as a computer-based method for estimating the standard error of an estimator θˆ, and it has gained a great deal of popularity in empirical research. The word bootstrap indicates that the observed data are used not only to obtain an estimate of the parameter but also to generate new samples from which many more estimates may be obtained, and hence an idea of the variability of the estimate (Staudte & Sheather, 1990). The basic idea is that, in the absence of any other information about a population, the values in a random sample are the best guide to its distribution, and resampling the sample is the best guide to what can be expected from resampling the population. To obtain the p-value, the percentile bootstrap method proceeds as follows:

(1) Calculate PB based on the available data.
(2) Generate bootstrap samples by randomly sampling with replacement n_j observations from the jth group, yielding Y*_1j, Y*_2j, ..., Y*_njj.
(3) Centre each of the sample points in the bootstrapped groups at their respective estimated medians.
(4) Use the bootstrap sample to compute the PB statistic, denoted by PB*.
(5) Repeat Step 2 to Step 4 B times, yielding PB*_1, PB*_2, ..., PB*_B. B = 599 appears sufficient in most situations when n ≥ 12 (Wilcox, 2005).
(6) Calculate the p-value as (# of PB*_b > PB) / B.

The Type I error and power of the test corresponding to each method will then be determined and compared.

3. Analysis of Real Data

The performance of the modified PB method was demonstrated on real data. Four classes (groups) of Decision Analysis (2nd Semester 2010/2011), conducted by four different lecturers, were chosen at random. The final marks were recorded and tested for equality between the classes. The sample sizes for Classes 1, 2, 3 and 4 were 33, 19, 24 and 20 respectively. The descriptive statistics for each group, the raw data and the results of the tests, in the form of p-values, are given in Tables 1, 2 and 3 respectively.
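The percentile-bootstrap steps of Section 2.3, combined with the MADn estimator of Section 2.2.1, can be sketched as follows. This is a minimal illustration, not the authors' code: the exact way MADn enters the PB statistic is not spelled out in the text, so this "modified PB" form (group medians for location, MADn² for scale) and all function names are our assumptions. The marks are as reconstructed from the flattened Table 2; the group sizes (33, 19, 24, 20) and per-group minima and maxima match Table 1.

```python
import numpy as np
from scipy import stats

def mad_n(x, b=1.4826):
    """MADn scale estimator (Section 2.2.1)."""
    x = np.asarray(x, dtype=float)
    return b * np.median(np.abs(x - np.median(x)))

def modified_pb_statistic(groups):
    """PB statistic with median and MADn^2 in place of mean and variance
    (our assumed form of the 'modified PB')."""
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.median(g) for g in groups])
    s2 = np.array([mad_n(g) ** 2 for g in groups])
    s2 = np.maximum(s2, 1e-12)  # guard: MADn can be 0 in a resample with many ties
    w = n / s2
    return float(np.sum(w * m**2) - np.sum(w * m)**2 / np.sum(w))

def pb_bootstrap_pvalue(groups, B=599, seed=0):
    """Percentile-bootstrap p-value following Steps (1)-(6) of Section 2.3."""
    rng = np.random.default_rng(seed)
    obs = modified_pb_statistic(groups)                 # Step (1)
    # Step (3): centre each group at its median so equal central tendency
    # holds in the bootstrap world.
    centred = [np.asarray(g, dtype=float) - np.median(g) for g in groups]
    exceed = 0
    for _ in range(B):                                  # Steps (2), (4), (5)
        boot = [rng.choice(g, size=len(g), replace=True) for g in centred]
        if modified_pb_statistic(boot) > obs:
            exceed += 1
    return exceed / B                                   # Step (6)

# Final marks of the four Decision Analysis classes (Table 2).
marks = [
    [66,60,80,74,94,71,90,90,78,65,7,69,74,82,71,66,79,56,69,68,
     81,73,74,76,78,74,71,55,48,78,81,88,89],
    [69,69,57,65,86,57,71,71,70,74,65,67,67,90,73,85,56,74,66],
    [96,62,81,75,80,66,60,75,65,85,76,71,61,83,82,65,73,62,92,60,
     90,66,70,65],
    [93,89,85,81,81,73,85,68,73,79,73,77,75,84,73,83,78,79,80,77],
]

print("ANOVA p          =", stats.f_oneway(*marks).pvalue)
print("Kruskal-Wallis p =", stats.kruskal(*marks).pvalue)
print("modified PB p    =", pb_bootstrap_pvalue(marks))
```

The two classical p-values are computed with scipy for comparison; the bootstrap p-value depends on the seed and on our assumed form of the statistic, so it should be read as a qualitative illustration rather than a reproduction of Table 3.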
Table 1: Descriptive statistics for each group

Lecturer | N  | Mean  | Std. Dev. | Std. Error | 95% CI (Lower, Upper) | Min | Max
1        | 33 | 72.07 | 15.646    | 2.724      | (66.53, 77.62)        | 7   | 94
2        | 19 | 70.13 |  9.130    | 2.095      | (65.73, 74.53)        | 56  | 90
3        | 24 | 73.38 | 10.749    | 2.194      | (68.84, 77.91)        | 60  | 96
4        | 20 | 79.21 |  6.105    | 1.365      | (76.35, 82.06)        | 68  | 93
Total    | 96 | 73.50 | 11.980    | 1.223      | (71.07, 75.93)        | 7   | 96

Table 2: Real data (final marks by group)

Group 1 (n = 33): 66 60 80 74 94 71 90 90 78 65 7 69 74 82 71 66 79 56 69 68 81 73 74 76 78 74 71 55 48 78 81 88 89
Group 2 (n = 19): 69 69 57 65 86 57 71 71 70 74 65 67 67 90 73 85 56 74 66
Group 3 (n = 24): 96 62 81 75 80 66 60 75 65 85 76 71 61 83 82 65 73 62 92 60 90 66 70 65
Group 4 (n = 20): 93 89 85 81 81 73 85 68 73 79 73 77 75 84 73 83 78 79 80 77

We employed the Shapiro-Wilk test to determine the normality of the data, since the sample sizes were small. Based on the Shapiro-Wilk test, Groups 2, 3 and 4 yielded p-values of 0.130, 0.101 and 0.867 respectively.

Table 3: Results of the tests using different methods

Method         | p-value
ANOVA          | 0.0870
Kruskal-Wallis | 0.0160
PB with MADn   | 0.0234

For comparison, the data were tested using all three procedures mentioned in this study, namely ANOVA, the Kruskal-Wallis test and the modified PB. As can be observed in Table 3, ANOVA fails to reject the null hypothesis that the performance of all groups is equal. In contrast, the Kruskal-Wallis test and the modified PB method show a significant result (reject the null hypothesis). This indicates that ANOVA fails to detect the difference that exists between the groups, while both the nonparametric method (Kruskal-Wallis) and the robust method (modified PB) show better detection. Even though Kruskal-Wallis shows stronger significance (p = 0.0160) than the modified PB (p = 0.0234), the Kruskal-Wallis test operates only on the ranks and therefore gives only limited information on the data, so misrepresentation of the result could occur.

4.
Conclusion

The goal of this paper was to find alternative procedures for testing the location parameter of skewed distributions while simultaneously controlling the Type I error and power rates. A classical method such as ANOVA is not robust to non-normality and heteroscedasticity. When these problems occur at the same
time, the Type I error rate inflates, causing spurious rejections of the null hypothesis, and the power of the test can fall substantially below its theoretical values, so that real differences go undetected. Realizing the need for a good statistic to address these problems, we integrated the PB statistic of Krishnamoorthy et al. (2007) with a high-breakdown scale estimator of Rousseeuw and Croux (1993); the new method is known as the modified PB method. This paper has shown some improvement in the statistical problem of detecting differences between location parameters. With regard to controlling the Type I error rate, the study reported in this paper leads us to the following conclusion and recommendation: when symmetry is suspect, trimming the observations can be avoided by using the PB with MADn suggested in our study.

Acknowledgement

The authors would like to acknowledge the work that led to this paper, which was partially funded by the RAGS grant provided by the Ministry of Education.

References

Babu, G. J., Padmanabhan, A. R., & Puri, M. L. (1999). Robust one-way ANOVA under possibly non-regular conditions. Biometrical Journal, 41, 321-339.

Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318-335.

Cribbie, R. A., Fiksenbaum, L., Keselman, H. J., & Wilcox, R. R. (2012). Effect of non-normality on test statistics for one-way independent groups designs. British Journal of Mathematical and Statistical Psychology, 65, 56-73.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall.

Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: An easy way to maximize the accuracy and power of your research. American Psychologist, 63, 591-601.

Fan, W., & Hancock, G. R. (2012).
Robust means modeling: An alternative for hypothesis testing of independent means under variance heterogeneity and nonnormality. Journal of Educational and Behavioral Statistics, 37, 137-156.

Hampel, F. R. (2001). Robust statistics: A brief introduction and overview.

Huber, P. J. (1981). Robust Statistics. New York: Wiley.

James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika, 38, 324-329.

Keselman, H. J., Wilcox, R. R., Othman, A. R., & Fradette, K. (2002). Trimming, transforming statistics, and bootstrapping: Circumventing the biasing effects of heteroscedasticity and nonnormality. Journal of Modern Applied Statistical Methods, 1, 288-309.

Krishnamoorthy, K., Lu, F., & Mathew, T. (2007). A parametric bootstrap approach for ANOVA with unequal variances: Fixed and random models. Computational Statistics & Data Analysis, 51, 5731-5742.
Lee, S. M. N. (1994). Optimal choice between parametric and nonparametric bootstrap estimates. Mathematical Proceedings of the Cambridge Philosophical Society, 115, 335-363.

Lix, L. M., & Keselman, H. J. (1998). To trim or not to trim: Tests of location equality under heteroscedasticity and non-normality. Educational and Psychological Measurement.

Md Yusof, Z. M., Abdullah, S., Syed Yahaya, S. S., & Othman, A. R. (2011). Testing the equality of central tendency measures using various trimming strategies. African Journal of Mathematics and Computer Science Research, 4(1), 32-38.

Md Yusof, Z. M., Abdullah, S., & Syed Yahaya, S. S. (2012). Type I error rates of parametric, robust and nonparametric methods for two group cases. World Applied Sciences Journal, 16(12), 1815-1819.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.

Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88, 1273-1283.

Staudte, R. G., & Sheather, S. J. (1990). Robust Estimation and Testing. New York: John Wiley & Sons.

Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38, 330-336.

Wilcox, R. R. (2002). Understanding the practical advantages of modern ANOVA methods. Journal of Clinical Child and Adolescent Psychology, 31, 399-412.

Wilcox, R. R. (2005). Introduction to Robust Estimation and Hypothesis Testing (2nd ed.). San Diego, CA: Academic Press.

Wilcox, R. R., & Keselman, H. J. (2010). Modern robust data analysis methods: Measures of central tendency. 1-43.