Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis


Psychological Methods, 2004, Vol. 9, No. 2. Copyright 2004 by the American Psychological Association.

James H. Steiger
Vanderbilt University

This article presents confidence interval methods for improving on the standard F tests in the balanced, completely between-subjects, fixed-effects analysis of variance. Exact confidence intervals for omnibus effect size measures, such as $\omega^2$ and the root-mean-square standardized effect, provide all the information in the traditional hypothesis test and more. They allow one to test simultaneously whether overall effects are (a) zero (the traditional test), (b) trivial (do not exceed some small value), or (c) nontrivial (definitely exceed some minimal level). For situations in which single-degree-of-freedom contrasts are of primary interest, exact confidence interval methods for contrast effect size measures such as the contrast correlation are also provided.

The analysis of variance (ANOVA) remains one of the most commonly used methods of statistical analysis in the behavioral sciences. Most ANOVAs, especially in exploratory studies, report an omnibus F test of the hypothesis that a main effect, interaction, or simple main effect is precisely zero. In recent years, a number of authors (Cohen, 1994; Rosnow & Rosenthal, 1996; Schmidt, 1996; Schmidt & Hunter, 1997; Serlin & Lapsley, 1993; Steiger & Fouladi, 1997) have sharply questioned the efficacy of tests of this nil hypothesis. Several of these critiques have concentrated on ways that the nil hypothesis test fails to deliver the information that the typical behavioral scientist wants. However, a number of the articles have also suggested, more or less specifically, replacements for or extensions of the null hypothesis test that would deliver much more useful information.
I express my gratitude to Michael W. Browne, Stanley A. Mulaik, the late Jacob Cohen, William W. Rozeboom, Rachel T. Fouladi, Gary H. McClelland, and numerous others who have encouraged this project. Correspondence concerning this article should be addressed to James H. Steiger, Department of Psychology and Human Development, Box 51 Peabody College, Vanderbilt University, Nashville, TN.

The suggestions have developed along several closely related lines, including the following:

1. Eliminate the emphasis on omnibus tests, with attention instead on focused contrasts that answer specific research questions, along with calculation of point estimates and approximate confidence interval estimates for some correlational measures of effect size (e.g., Rosenthal, Rosnow, & Rubin, 2000; Rosnow & Rosenthal, 1996).
2. Calculate exact confidence interval estimates of measures of standardized effect size, using an iterative procedure (e.g., Smithson, 2001; Steiger & Fouladi, 1997).
3. Perform tests of a statistical null hypothesis other than that of no difference or zero effect (e.g., Serlin & Lapsley, 1993).

As proponents of the first suggestion, Rosnow and Rosenthal (1996) discussed several types of correlation coefficients that are useful in assessing experimental effects. Their work is particularly valuable in situations in which the researcher has questions that are best addressed by testing single contrasts. Rosnow and Rosenthal emphasized the use of the Pearson correlation, rather than the squared multiple correlation, partly because of concern that the latter tends to present an overly pessimistic picture of the value of small experimental effects.

The second suggestion, exact interval estimation, has been gathering momentum since around [...]. The movement to replace hypothesis tests with confidence intervals stems from the fundamental realization that, in many if not most situations, confidence intervals provide more of the information that the scientist is truly interested in.
For example, in a two-group experiment, the scientist is more interested in knowing how large the difference between the two groups is (and how precisely it has been determined)
than whether the difference between the groups is exactly zero.

The third suggestion, which might be called tests of close fit, has much in common with the approach widely known to biostatisticians as bioequivalence testing and is based on the idea that the scientist should not be testing perfect adherence to a point hypothesis but should replace it with a relaxed test of a more appropriate hypothesis. Tests of close fit share many of their computational aspects with the exact interval estimation approach in terms of the software routines required to compute probability levels, power, and sample size. They remain within the familiar hypothesis-testing framework, while providing important practical and conceptual gains, especially when the experimenter's goal is to demonstrate that an effect is trivial.

In this article, I present methods that implement, support, and, in some cases, unify and extend major suggestions (1) through (3) discussed above. First I briefly review the history, rationale, and theory behind exact confidence intervals on measures of standardized effect size in ANOVA. I then provide detailed instructions, with examples, and software support for computing these confidence intervals. Next I discuss a general procedure for assessing effects that are represented by one or more contrasts, using correlations. Included is a population rationale, with sampling theory and an exact confidence interval estimation procedure, for one of the correlational measures discussed by Rosnow and Rosenthal (1996). Although the initial emphasis is on confidence interval estimation, I also discuss how the same technology that generates confidence intervals may be used to test hypotheses of minimal effect, thus implementing the good-enough principle discussed by Serlin and Lapsley (1993).
Exact Confidence Intervals on Standardized Effect Size

The notion that hypothesis tests of zero effect should be replaced with exact confidence intervals on measures of effect size has been around for quite some time but was somewhat impractical because of its computational demands until about 10 years ago. A general method for constructing the confidence intervals, which Steiger and Fouladi (1997) referred to as noncentrality interval estimation, is considered elementary by statisticians but seldom is discussed in behavioral statistics texts. In this section, I review some history, then describe the method of noncentrality interval estimation in detail.

Rationale and History

Suppose that, as a researcher, you test a drug that you believe enhances performance. You perform a simple two-group experiment with a double-blind control. In this case, you are engaging in reject-support (RS) hypothesis testing (rejecting the null hypothesis will support your belief). The null and alternative hypotheses might be

$H_0: \mu_1 \le \mu_2; \quad H_1: \mu_1 > \mu_2.$ (1)

The null hypothesis states that the drug is no better than a placebo. The alternative, which the investigator believes, is that the drug enhances performance. Rejecting the null hypothesis, even at a very low alpha such as .001, need not indicate that the drug has a strong effect, because if sample size is very large relative to the sampling variability of the drug effect, even a trivial effect might be declared highly significant. On the other hand, if sample size is too low, even a strong effect might have a low probability of creating a statistically significant result. Statistical power analysis (Cohen, 1988) and sample size estimation have been based on the notion that calculations made before data are gathered can help to create a situation in which neither of the above problems is likely to occur. That is, sample size is chosen so that power will be high, but not too high.
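The two sample-size problems described above can be made concrete with a small calculation. The following is a minimal sketch, not a procedure from the article: it assumes a one-sided two-group z test with a known common standard deviation (a normal approximation), and the function name `one_sided_power` and the effect sizes and sample sizes in the loop are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def one_sided_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-group z test.

    d is a hypothetical standardized mean difference (mu1 - mu2) / sigma.
    Assumption: sigma is treated as known, so the normal distribution is used.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    # Expected value of the z statistic when the true difference is d.
    shift = d * sqrt(n_per_group / 2.0)
    return 1.0 - _STD_NORMAL.cdf(z_crit - shift)


# A trivial effect with a huge sample versus a strong effect with a tiny sample.
for d, n in [(0.05, 20), (0.05, 20000), (0.80, 5), (0.80, 50)]:
    print(f"d = {d:.2f}, n per group = {n:>5}: power = {one_sided_power(d, n):.3f}")
```

Under these assumptions, a standardized difference of 0.05 is detected almost surely with 20,000 observations per group, whereas a difference of 0.80 is missed more often than not with 5 per group.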
There is an alternative situation, accept-support (AS) testing, that attracts far less attention than RS testing in statistics texts and has had far less impact on the popular wisdom of hypothesis testing. In AS testing, the statistical null hypothesis is what the experimenter actually wishes to prove. Accepting the statistical null hypothesis supports the researcher's theory. Suppose, for example, an experiment provides convincing evidence that the above-mentioned drug actually works. The next step might be to provide convincing evidence that it has few, or acceptably low, side effects. In this case two groups are studied, and some measure of side effects is taken. The null hypothesis is that the experimental group's level of side effects is less than or equal to the control group's level. The researcher (or drug company) supporting the research wants not to reject this null hypothesis, because in this case accepting the null hypothesis supports the researcher's point of view, that is, that the drug is no more harmful than its predecessors.

In a similar vein, a company might wish to show that a generic drug does not differ appreciably in bioavailability from its brand name equivalent. This problem of bioequivalence testing is well known to biostatisticians and has resulted in a very substantial literature (e.g., Chow & Liu, 2000). Suppose that Drug A has a well-established bioavailability level $\theta_A$, and an investigator wishes to assess the bioequivalence of Drug B with Drug A. One might engage in AS testing, that is, test the null hypothesis that

$\theta_A = \theta_B$ (2)
and declare the two drugs bioequivalent if this null hypothesis is not rejected. However, the perils of such AS testing are even greater than in RS testing. Specifically, simply running a sloppy, low-power experiment will tend to result in nonrejection of the null hypothesis, even if the drugs differ appreciably in bioavailability. Thus, paradoxically, someone trying to establish the bioequivalence of Drug B with Drug A could virtually guarantee success simply by using too small a sample size. Moreover, with extremely large sample sizes, Drug B might be declared nonequivalent to Drug A even if the difference between them is trivial.

Because of such problems, biostatisticians decided long ago that the test for strict equality is inappropriate for bioavailability studies (Metzler, 1974). Rather, a dual hypothesis test should be performed. Suppose that the Food and Drug Administration has determined that any drug with bioavailability within 20% of $\theta_A$ may be considered bioequivalent and prescribed in its stead. Suppose that $\theta_1$ and $\theta_2$ represent these bioequivalence limits. Then establishing bioequivalence of Drug B with Drug A might amount to rejecting the following hypothesis,

$H_0: \theta_B \le \theta_1 \text{ or } \theta_B \ge \theta_2,$ (3)

against the alternative

$H_a: \theta_1 < \theta_B < \theta_2.$ (4)

In practice, this usually amounts to testing two one-sided hypotheses,

$H_{01}: \theta_B \le \theta_1 \text{ versus } H_{a1}: \theta_B > \theta_1$ (5)

and

$H_{02}: \theta_B \ge \theta_2 \text{ versus } H_{a2}: \theta_B < \theta_2.$ (6)

An alternative approach (Westlake, 1976) is to construct a confidence interval for $\theta_B$. Bioequivalence would be declared if the confidence interval falls entirely within the established bioequivalence limits.

In other contexts, particularly the more exploratory studies performed in psychology, the research goal may be simply to pinpoint the nature of a parameter rather than to decide whether it is within a known fixed range. In that case, reporting the endpoints of a confidence interval (without announcing an associated decision) may be an appropriate conclusion to an analysis.
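The link between the two one-sided tests in Equations 5 and 6 and the confidence interval approach can be illustrated in code. This is a minimal sketch under assumptions that are not in the article: the estimate of $\theta_B$ is taken to be normally distributed with a known standard error, the interval is built at the $1 - 2\alpha$ level, and the function names and numeric inputs are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()


def equivalence_by_interval(estimate, std_error, lower_limit, upper_limit, alpha=0.05):
    """Return (ci_low, ci_high, equivalent) using a 1 - 2*alpha interval.

    Assumption: the estimate is normal with known standard error.
    """
    z_crit = _Z.inv_cdf(1.0 - alpha)
    ci_low = estimate - z_crit * std_error
    ci_high = estimate + z_crit * std_error
    # Equivalence is declared only if the whole interval lies inside the limits.
    return ci_low, ci_high, (lower_limit < ci_low and ci_high < upper_limit)


def two_one_sided_tests(estimate, std_error, lower_limit, upper_limit, alpha=0.05):
    """Return True if both one-sided null hypotheses are rejected at level alpha."""
    # H01: theta_B <= theta_1, rejected for large estimates.
    p_lower = 1.0 - _Z.cdf((estimate - lower_limit) / std_error)
    # H02: theta_B >= theta_2, rejected for small estimates.
    p_upper = _Z.cdf((estimate - upper_limit) / std_error)
    return p_lower < alpha and p_upper < alpha


theta_a = 100.0                                   # hypothetical reference level
theta_1, theta_2 = 0.8 * theta_a, 1.2 * theta_a   # limits within 20% of theta_A
for se in (4.0, 15.0):                            # precise versus sloppy study
    low, high, ok = equivalence_by_interval(100.0, se, theta_1, theta_2)
    agrees = ok == two_one_sided_tests(100.0, se, theta_1, theta_2)
    print(f"SE = {se}: interval = ({low:.2f}, {high:.2f}), equivalent = {ok}, agrees = {agrees}")
```

In this sketch the imprecise study cannot establish bioequivalence, which is the opposite of what happens under AS testing of strict equality.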
In any case, because the hypothesis test may be performed with the confidence interval, it seems that the confidence interval should always be reported. It contains all the information in a hypothesis test result, and more.

In structural equation modeling, which includes factor analysis and multiple regression as special cases, statistical testing prior to 1980 was limited to a chi-square test of perfect fit. In this procedure, the statistical null hypothesis is that the model fits perfectly in the population. This hypothesis test was performed, and a model was judged to fit the data sufficiently well if the null hypothesis was not rejected. There was widespread dissatisfaction with the test, because no model would be expected to fit perfectly, and so large sample sizes usually led to rejection of a model, even if it fit the data quite well. In this arrangement, enhanced precision actually worked against the researcher's interests.

Steiger and Lind (1980) suggested that the traditional null hypothesis test of perfect fit of a structural model be replaced by a confidence interval on the root-mean-square error of approximation (RMSEA), an index of population badness of fit that compensated for the complexity of the model. MacCallum, Browne, and Sugawara (1996) suggested augmenting the confidence interval with a pair of hypothesis tests. They considered a population RMSEA value of .05 to be indicative of a close-fitting model, whereas a value of .08 or more was evidence of marginal to poor fit. Consequently, a test of close fit would test the null hypothesis that the RMSEA is greater than or equal to .05 against the alternative that it is less than .05. Rejection of the null hypothesis indicates close fit. A test of not-close fit tests the null hypothesis that the RMSEA is less than or equal to .08 against the alternative that it is greater than .08. Rejection of the null hypothesis indicates that fit is not close. MacCallum et al.
demonstrated in detail how, with such hypothesis tests, power calculations could be performed and required sample sizes estimated. These two one-sided tests can be performed easily and simultaneously with a single $1 - 2\alpha$ confidence interval recommended by Steiger (1989). Simply construct the confidence interval and see whether its upper end is below .05 (in which case the test of close fit results in rejection at the alpha level) and whether its lower end exceeds .08 (in which case the test of not-close fit results in rejection at the alpha level). The confidence interval provides all the information in both hypothesis tests, and more.

Fleishman (1980) suggested interval estimation as a supplement for the F test in ANOVA. He gave examples of how to compute exact confidence intervals on a number of useful quantities, such as the signal-to-noise ratio, in ANOVA. These confidence intervals offered clear advantages over the traditional hypothesis test. Other authors have noted the existence of exact confidence intervals for the standardized effect size in the simplest special case of ANOVA, the two-sample t test (e.g., Hedges & Olkin, 1985).

The rationale for switching from hypothesis testing to confidence interval estimation is straightforward (Steiger & Fouladi, 1997). Unfortunately, the exact interval estimation procedures of Steiger and Lind (1980), Fleishman (1980), and Hedges and Olkin (1985) are virtually impossible to compute accurately by hand. However, by 1990, microcomputer capabilities had advanced substantially. The RMSEA
confidence interval was implemented in general purpose structural equation modeling software (Mels, 1989; Steiger, 1989) and, by the late 1990s, had achieved widespread use. Steiger (1990) presented general procedures for constructing confidence intervals on measures of effect size in covariance structure analysis, ANOVA, contrast analysis, and multiple regression. Steiger and Fouladi (1992) produced a general computer program, R2, that performed exact confidence interval estimation of the squared multiple correlation in multiple regression. Taylor and Muller (1995, 1996) have presented general procedures for analyzing power and noncentrality in the general linear model, including an analysis of the impact of restriction of published articles to significant results. Steiger and Fouladi (1997) demonstrated general procedures for confidence interval calculations, and Steiger (1999) implemented these in a commercial software package. Smithson (2001) discussed a number of confidence interval procedures in fixed and random regression models and included SPSS macros for calculating confidence intervals for noncentral distributions. Reiser (2001) discussed confidence intervals on functions of Mahalanobis distance.

General Theory of Noncentrality-Based Interval Estimation

In this section, I review the general theoretical principles for constructing exact confidence intervals for effect size, power, and sample size in the balanced fixed-effects between-subjects ANOVA. For a more detailed discussion of these principles, see Steiger and Fouladi (1997). Throughout what follows, I adopt a simple notational device: When several groups or cells are sampled, I use $N_{tot}$ to stand for the total sample size and use n to stand for the number of observations in each group. I begin this section with a brief nontechnical discussion of noncentral distributions.
The t, chi-square, and F distributions are special cases of more general distributions called the noncentral t, noncentral chi-square, and noncentral F. Each of these noncentral distributions has an additional parameter, called the noncentrality parameter. For example, whereas the F distribution has two parameters (the numerator and denominator degrees of freedom), the noncentral F has these two plus a noncentrality parameter (often indicated with the symbol $\lambda$). When the noncentral F distribution has a noncentrality parameter of zero, it is identical to the F distribution, so it includes the F distribution as a special case. Similar facts hold for the t and chi-square distributions. What makes the noncentrality parameter especially important is that it is related very closely to the truth or falsity of the null hypotheses that these distributions are typically used to test. Thus, for example, when the null hypothesis of no difference between two means is correct, the standard t statistic has a distribution that has a noncentrality parameter of zero, whereas if the null hypothesis is false, it has a noncentral t distribution, that is, the noncentrality parameter is nonzero. The more false the null hypothesis, the larger the absolute value of the noncentrality parameter for a given alpha and sample size.

Most confidence intervals in introductory textbooks are derived by simple manipulation of a statement about interval probability of a sampling distribution. This approach cannot be used to generate exact confidence intervals for many quantities of fundamental importance in statistics. As an example, consider the sample squared multiple correlation, whose distribution changes as a function of the population squared multiple correlation.
Confidence intervals for the squared multiple correlation are very informative yet are not discussed in standard texts, because a single simple formula for the direct calculation of such an interval cannot be obtained in a manner analogous to the way one obtains a confidence interval for the population mean.

Steiger and Fouladi (1997) discussed a general method for confidence interval construction that handles many such interesting examples. The method combines two general principles, which they called the confidence interval transformation principle and the inversion confidence interval principle. The former is obvious but seldom discussed formally. The latter is referred to by a variety of names in textbooks and review articles (Casella & Berger, 2002; Steiger & Fouladi, 1997), yet it does not seem to have found its way into the standard behavioral statistics textbooks, primarily because its implementation involves some difficult computations. However, the method is easy to discuss in principle and is no longer impractical. When the two principles are combined, a number of very useful confidence intervals result.

Proposition 1: Confidence interval transformation principle. Let $f(\theta)$ be a monotone function of $\theta$, that is, a function whose slope never changes sign and is never zero. Let $l_1$ and $l_2$ be lower and upper endpoints of a $1 - \alpha$ confidence interval on quantity $\theta$. Then, if the function is increasing, $f(l_1)$ and $f(l_2)$ are lower and upper endpoints, respectively, of a $100(1 - \alpha)\%$ confidence interval on $f(\theta)$. If the function is decreasing, $f(l_2)$ and $f(l_1)$ are lower and upper endpoints.

Here are two elementary examples of this principle.

Example 1: Suppose you read in a textbook how to calculate a confidence interval for the population variance $\sigma^2$. However, you desire a confidence interval for $\sigma$. Because $\sigma$ takes on only nonnegative values, it is a monotonic increasing function of $\sigma^2$ over its domain.
Hence, the confidence interval for $\sigma$ is obtained by taking the square root of the endpoints for the corresponding confidence interval for $\sigma^2$.

Example 2: Suppose one calculates a confidence interval for $z(\rho)$, the Fisher transform of $\rho$, the population correlation coefficient. Taking the inverse Fisher transform of the endpoints of this interval will give a confidence interval for $\rho$.
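Examples 1 and 2 can be sketched in a few lines of code. This is an illustrative sketch rather than code from the article: the helper names are hypothetical, and the Fisher-transform interval uses the usual large-sample standard error $1/\sqrt{n - 3}$, which is an assumption not stated in the text.

```python
import math
from statistics import NormalDist


def transform_interval(lower, upper, func, increasing=True):
    """Map confidence limits through a strictly monotone function (Proposition 1)."""
    a, b = func(lower), func(upper)
    # A decreasing function swaps the roles of the two endpoints.
    return (a, b) if increasing else (b, a)


def correlation_interval(r, n, conf=0.95):
    """Interval for a correlation via the Fisher transform (Example 2).

    Assumption: z(r) is approximately normal with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                       # Fisher transform of the sample r
    half_width = NormalDist().inv_cdf(0.5 + conf / 2.0) / math.sqrt(n - 3)
    # tanh is the inverse Fisher transform and is strictly increasing.
    return transform_interval(z - half_width, z + half_width, math.tanh)


# Example 1: limits for a variance become limits for a standard deviation.
print(transform_interval(4.0, 9.0, math.sqrt))
# Example 2: hypothetical sample correlation of .50 based on 103 observations.
print(correlation_interval(0.50, 103))
```

The same helper handles decreasing transformations, for example converting limits on a variance into limits on a precision $1/\sigma^2$ by passing `increasing=False`.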
The Fisher-transform approach of Example 2 is, in fact, the method used to calculate the standard confidence interval for a correlation. These examples show why Proposition 1 is very useful in practice. A statistical quantity we are very interested in, such as $\rho$, may be a simple function of a quantity, such as $z(\rho)$, that we are not so interested in but for which we can easily obtain a confidence interval. Next, we define the inversion confidence interval principle.

Proposition 2: Inversion confidence interval principle. Let x be the observed value of X, a random variable with a continuous cdf (cumulative distribution function) $F(x, \theta) = \Pr(X \le x \mid \theta)$ for some numerical parameter $\theta$. Let $\alpha_1 + \alpha_2 = \alpha$ with $0 < \alpha < 1$ be fixed values. If $F(x, \theta)$ is strictly decreasing in $\theta$, for fixed values of x, choose $l_1(x)$ and $l_2(x)$ so that $\Pr[X \le x \mid \theta = l_1(x)] = 1 - \alpha_2$ and $\Pr[X \le x \mid \theta = l_2(x)] = \alpha_1$. If $F(x, \theta)$ is strictly increasing in $\theta$, for fixed values of x, choose $l_1(x)$ and $l_2(x)$ so that $\Pr[X \le x \mid \theta = l_1(x)] = \alpha_1$ and $\Pr[X \le x \mid \theta = l_2(x)] = 1 - \alpha_2$. Then the random interval $[l_1(x), l_2(x)]$ is a $100(1 - \alpha)\%$ confidence interval for $\theta$. Upper or lower $100(1 - \alpha)\%$ confidence bounds (or one-sided confidence intervals) may be obtained by setting $\alpha_1$ or $\alpha_2$ to zero.

For a simple graphically based explanation of Proposition 2, consult Steiger and Fouladi (1997, [...]). For a clear, succinct discussion with partial proof, see Casella and Berger (2002, p. 432), who referred to this as pivoting the cdf. In this article, I assume $\alpha_1 = \alpha_2 = \alpha/2$, although such an interval may not be the minimum width for a given $\alpha$.

Proposition 2 implies a simple approach to interval estimation: Suppose you have observed an F statistic with a value x and known degrees of freedom $\nu_1$ and $\nu_2$. Denote the cumulative distribution of the F statistic by $F(x, \lambda)$, where $\lambda$ is the noncentrality parameter. It can be shown that if $\nu_1$, $\nu_2$, and x are held constant at any positive value, then $F(x, \lambda)$ is strictly decreasing in $\lambda$. Accordingly, Proposition 2 can be used. To calculate a $100(1 - \alpha)\%$ confidence interval on the noncentrality parameter of the F distribution, use the following steps.

1. Calculate the cumulative probability p of x in the central F distribution. If p is below $\alpha/2$, then both limits of the confidence interval are zero. If p is below $1 - \alpha/2$, the lower limit of the confidence interval is zero, and the upper limit must be calculated (go to Step 3). Otherwise, calculate both limits of the confidence interval, using Steps 2 and 3.
2. To calculate the lower limit, find the unique value of $\lambda$ that places x at the $1 - \alpha/2$ cumulative probability point of a noncentral F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
3. To calculate the upper limit, find the unique value of $\lambda$ that places x at the $\alpha/2$ cumulative probability point of a noncentral F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

Calculating a confidence interval for $\lambda$ thus requires iterative calculation of the unique value of $\lambda$ that places an observed value of F at a particular percentile of the noncentral F distribution (see Footnote 1).

In what follows, I give a variety of examples of confidence interval calculations. Some will be at the 95% level of confidence, others at the less common 90% level. In a later section, I discuss why, when confidence intervals are used to perform a hypothesis test at the .05 level, a 90% interval may be appropriate in some situations and a 95% interval in others. At that point, I describe how to select confidence intervals at the appropriate level to perform a particular hypothesis test.

Measures of Standardized Effect Size

Now I examine some more ambitious examples. For simplicity of exposition, I assume in this section that either the freeware program NDC (noncentral distribution calculator; see Footnote 1) or other software is available to compute a confidence interval on $\lambda$, the noncentrality parameter of a noncentral F distribution.

Consider the one-way, fixed-effects ANOVA, in which p means are compared for equality, and there are n observations per group. The overall F statistic has a distribution that is a noncentral F, with degrees of freedom $p - 1$ and $p(n - 1) = N_{tot} - p$. The noncentrality parameter $\lambda$ can be expressed in a number of ways.
One formula that appears frequently in textbooks is

$\lambda = n \sum_{j=1}^{p} \left( \frac{\alpha_j}{\sigma} \right)^2.$ (7)

The $\alpha_j$ values in Equation 7 are the effects as commonly defined in ANOVA, that is,

$\alpha_j = \mu_j - \mu.$ (8)

If $\mu_j$ is the mean of the jth group, and $\mu$ is the overall mean, then $\mu$ is, in the case of equal n, simply the arithmetic average of the $\mu_j$. More generally (although in what follows I assume a balanced design unless stated otherwise),

$\mu = \sum_{j=1}^{p} \frac{n_j}{N_{tot}} \mu_j.$ (9)

Footnote 1. NDC (noncentral distribution calculator), a freeware Windows program for calculating percentage points and noncentrality confidence intervals for noncentral F, t, and chi-square distributions, is available for direct download from the author's website (http://[...]
The quantity $\alpha_j / \sigma$ is a standardized effect, that is, the effect expressed in standard deviation units. The quantity $\lambda / n$ is therefore the sum of squared standardized effects. There are numerous ways one might convert the sum of squared standardized effects into an overall measure of effect size. For example, suppose we average these squared standardized effects in order to obtain an overall measure of strength of effects in the design. The arithmetic average of the p squared standardized effects, sometimes called the signal-to-noise ratio (Fleishman, 1980), is as follows:

$f^2 = \frac{1}{p} \sum_{j=1}^{p} \left( \frac{\alpha_j}{\sigma} \right)^2 = \frac{\lambda}{np} = \frac{\lambda}{N_{tot}}.$ (10)

One problem with this measure is that it is the average squared effect and so is not in the proper unit of measurement. A potential solution is to simply take the square root of the signal-to-noise ratio, obtaining

$f = \sqrt{\frac{\lambda}{N_{tot}}} = \sqrt{\frac{1}{p} \sum_{j=1}^{p} \left( \frac{\alpha_j}{\sigma} \right)^2}.$ (11)

In a one-way ANOVA with p groups and equal n, the effects are constrained to sum to zero, so there are actually only $p - 1$ independent effects. Thus, an alternative measure, $\lambda / [(p - 1)n]$, is the average squared independent standardized effect, and the root-mean-square standardized effect (RMSSE) is as follows:

$\Psi = \sqrt{\frac{\lambda}{(p - 1)n}} = \sqrt{\frac{1}{p - 1} \sum_{j=1}^{p} \left( \frac{\alpha_j}{\sigma} \right)^2}.$ (12)

Equations 11 and 12 demonstrate that the relationships between $\Psi$, f, and the noncentrality parameter $\lambda$ are straightforward. In order to obtain a confidence interval for $\Psi$, we proceed as follows. First, we obtain a confidence interval estimate for $\lambda$. Next, we invoke the confidence interval transformation principle to directly transform the endpoints by dividing by $(p - 1)n$. Finally, we take the square root. The result is an exact confidence interval on $\Psi$.

Example 3: Suppose a one-way fixed-effects ANOVA is performed on four groups, each with a sample size of 20, and that an overall F statistic of 5.00 is obtained, with 3 and 76 degrees of freedom, with a probability level of .003. The F test is thus highly significant, and the null hypothesis is rejected at the .01 level.
Some investigators might interpret this result as implying that a powerful experimental effect was found and that this was determined with high precision. In this case, the noncentrality interval estimate provides a more informative and somewhat different account of what has been found. The 95% confidence interval for $\lambda$ ranges from [...] to [...]. To convert this to a confidence interval for $\Psi$, we use Equation 12. The corresponding confidence interval for $\Psi$ ranges from .1764 to [...]. Effects are almost certainly here, but they are on the order of half a standard deviation, what is commonly considered a medium-size effect. Moreover, the size of the effects has not been determined with high precision.

Example 4: Fleishman (1980) described the calculation of confidence intervals on the noncentrality parameter of the noncentral F distribution to obtain, in a manner equivalent to that used in the previous two examples, confidence intervals on $f^2$ and $\eta^2$, the latter of which is defined as

$\eta^2_{A(\text{partialed})} = \frac{\sigma^2_{S_A}}{\sigma^2_{S_A} + \sigma^2_e},$ (13)

where $\sigma^2_{S_A}$ is the variance of means for the levels of a particular effect A, that is,

$\sigma^2_{S_A} = \frac{1}{p} \sum_{j=1}^{p} \alpha_j^2,$ (14)

and $\sigma^2_e$ is the within-cell variance. $\eta^2_{A(\text{partialed})}$ may be thought of as the proportion of the variance remaining (after all other main effects and interactions have been partialed out) that is explained by the effect. (In what follows, for simplicity, I refer to the coefficient simply as $\eta^2$.) There are simple relationships between $f^2$, $\lambda$, and $\eta^2$, specifically,

$f^2 = \frac{\eta^2}{1 - \eta^2} = \frac{\lambda}{N_{tot}}$ (15)

and

$\eta^2 = \frac{f^2}{1 + f^2} = \frac{\lambda}{\lambda + N_{tot}}.$ (16)

Fleishman (1980) cited an example given by Venables (1975) of a five-group ANOVA with n = 11 per cell and an observed F of [...]. In this case the 90% confidence interval for the noncentrality parameter $\lambda$ has endpoints [...] and [...]. Once we obtain the confidence interval for $\lambda$, it is a trivial matter to transform the limits of the interval to confidence limits for $\eta^2$, using Equation 16. For example, the lower limit becomes

$\eta^2_{lower} = \frac{\lambda_{lower}}{\lambda_{lower} + N_{tot}} = [...].$ (17)

In a similar manner, the upper limit of the confidence interval can be calculated as .565.
The confidence interval has determined with 90% confidence that the main effect
accounts for between 26.1% and 56.5% of the variance in the dependent variable.

General Procedures for Effect Size Intervals in Between-Subjects Factorial ANOVA

In a previous example, we saw how easy it is to construct a confidence interval on measures of effect size in one-way ANOVA, provided a confidence interval for $\lambda$ has been computed. In this section, a completely general method is demonstrated for computing confidence intervals for various measures of standardized effect size in completely between-subjects factorial ANOVA designs with equal sample size n per cell.

We begin with a general formula relating the noncentrality parameter $\lambda$ with the RMSSE in any completely between-subjects factorial ANOVA. Let $\theta$ stand for a particular effect, and n the sample size per cell. Then

$\Psi_\theta = \sqrt{\frac{\lambda_\theta}{n_\theta \, df_\theta}}.$ (18)

In Equation 18, $n_\theta$ is equal to n (the number of observations in each cell of the design) multiplied by the product of the numbers of levels in all the factors not represented in the effect currently under consideration; $df_\theta$ is the numerator degrees of freedom parameter for the effect under consideration. There are simple relationships between the RMSSE and other measures of standardized effect size. Specifically, for a general factorial ANOVA,

$f_\theta = \Psi_\theta \sqrt{\frac{df_\theta}{Cells_\theta}} = \sqrt{\frac{\lambda_\theta}{N_{tot}}},$ (19)

where $Cells_\theta$ is, for any main effect, the number of levels of the effect. For any interaction, it is the product of the numbers of levels for all factors involved in the interaction.

Table 1
Key Quantities for Computing Effect Size Intervals in Four-Way Analysis of Variance

Source   Levels   df_theta                        n_theta
A        p        p - 1                           nqrs
B        q        q - 1                           nprs
C        r        r - 1                           npqs
D        s        s - 1                           npqr
AB                (p - 1)(q - 1)                  nrs
AC                (p - 1)(r - 1)                  nqs
AD                (p - 1)(s - 1)                  nqr
BC                (q - 1)(r - 1)                  nps
BD                (q - 1)(s - 1)                  npr
CD                (r - 1)(s - 1)                  npq
ABC               (p - 1)(q - 1)(r - 1)           ns
ABD               (p - 1)(q - 1)(s - 1)           nr
ACD               (p - 1)(r - 1)(s - 1)           nq
BCD               (q - 1)(r - 1)(s - 1)           np
ABCD              (p - 1)(q - 1)(r - 1)(s - 1)    n
Error             pqrs(n - 1)

Note. theta represents a particular effect; n represents the sample size per cell; and p, q, r, and s represent levels of factors A, B, C, and D, respectively.
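The bookkeeping in Table 1 and the conversions in Equations 16, 18, and 19 can be sketched in code. This is an illustrative sketch with hypothetical function names; the design and the value of the noncentrality parameter in the usage lines are invented for illustration and are not taken from the article.

```python
import math


def effect_quantities(levels, effect, n):
    """Key quantities for one effect in a balanced between-subjects factorial design.

    levels: dict mapping factor name to its number of levels, e.g. {"A": 3, "B": 4}
    effect: tuple of the factor names involved in the effect, e.g. ("A", "B")
    n:      number of observations per cell
    """
    inside = [levels[name] for name in effect]
    outside = [k for name, k in levels.items() if name not in effect]
    return {
        "df": math.prod(k - 1 for k in inside),   # numerator degrees of freedom
        "cells": math.prod(inside),               # product of levels in the effect
        "n_theta": n * math.prod(outside),        # n times levels of absent factors
        "n_tot": n * math.prod(levels.values()),  # total sample size
    }


def effect_sizes(lam, q):
    """Convert a noncentrality value into RMSSE, f, f squared, and eta squared."""
    psi = math.sqrt(lam / (q["n_theta"] * q["df"]))   # Equation 18
    f_squared = lam / q["n_tot"]                      # Equation 19, squared
    eta_squared = f_squared / (1.0 + f_squared)       # Equation 16
    return {"psi": psi, "f": math.sqrt(f_squared), "f2": f_squared, "eta2": eta_squared}


# Hypothetical 3 x 4 design with 5 observations per cell, AB interaction.
q = effect_quantities({"A": 3, "B": 4}, ("A", "B"), n=5)
print(q)
# Applying the conversion to each limit of an interval on lambda gives limits
# on each effect size measure; 10.0 is an arbitrary illustrative value.
print(effect_sizes(10.0, q))
```

Because every conversion is increasing in the noncentrality parameter, Proposition 1 allows the same function to be applied separately to the lower and upper limits.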
The relationship between f and $\eta^2$ is given in Equation 16. Some examples of these quantities, for a four-way ANOVA, with p, q, r, and s levels of factors A, B, C, and D, respectively, are given in Table 1. The table may be used also for one-, two-, or three-way ANOVAs simply by eliminating terms involving levels not represented in the design. For example, in a three-way ANOVA, the BC interaction effect has $(q - 1)(r - 1)$ numerator degrees of freedom, and $n_{BC}$ is np, because there is no s in this design. The error degrees of freedom in a three-way ANOVA are $pqr(n - 1)$. In the following two examples, I demonstrate how to compute a 90% confidence interval on various measures of effect, using the information in the table.

Example 5: Suppose that, as a researcher, you perform a three-way 2 × 3 × 7 ANOVA, with n = 6 observations per cell. In this case, we have p = 2, q = 3, and r = 7. Suppose that, for the A main effect, you observe an F statistic of 4.708, which, with 1 and 210 degrees of freedom, has [...]. We first calculate a confidence interval for $\lambda$. The endpoints of this interval are $\lambda_{lower} = [...]$ and $\lambda_{upper} = [...]$. To convert these to confidence intervals on $\Psi$, $f^2$, f, and $\eta^2$, we apply Equations 18, 19, and 16. For the A effect, we have $n_A = (6)(3)(7) = 126$, $df_A = (2 - 1) = 1$, $Cells_A = 2$, and $N_{tot} = 252$. Hence, for $\Psi$ we have, from Equation 18, $\Psi_{lower} = [...]$, $\Psi_{upper} = [...]$. For $f^2$ and f we have, for the lower limits,

$f^2_{lower} = \frac{\lambda_{lower}}{N_{tot}} = \frac{\lambda_{lower}}{252} = [...],$ (20)

$f_{lower} = \sqrt{f^2_{lower}} = [...].$ (21)

For the upper limits, we obtain $f^2_{upper} = [...]$ and $f_{upper} = [...]$. We can also convert the confidence limits for $f^2$ into limits for $\eta^2$, using Equation 16. We have

$\eta^2_{lower} = \frac{f^2_{lower}}{1 + f^2_{lower}} = [...].$ (22)

In a similar manner, we obtain the upper limit as $\eta^2_{upper} = [...]$.

Example 6: Table 1 can also be used for a two-way ANOVA, simply by letting r = 1 and s = 1 and ignoring all
effects involving factors C and D. Suppose, for example, one were to perform a two-way 2 × 7 ANOVA, with $n = 4$ observations per cell, and the F statistic for the AB interaction is observed to be 2.50. The key quantities are $df_{AB} = 6$, $df_{error} = 42$, $n_{AB} = 4$, and $\text{Cells}_{AB} = 14$. The confidence limits for $\lambda_{AB}$ are $\lambda_{lower}$ = … and $\lambda_{upper}$ = …. Consequently, from Equation 18, the confidence limits for the RMSSE are

$$\Psi_{lower} = \sqrt{\frac{\lambda_{lower}}{n_{AB} \, df_{AB}}} = \sqrt{\frac{\ldots}{(4)(6)}} = \ldots, \tag{23}$$

$$\Psi_{upper} = \sqrt{\frac{\ldots}{(4)(6)}} = \ldots \tag{24}$$

The confidence intervals for $f^2$ and $f$ are

$$f^2_{lower} = \frac{\lambda_{lower}}{56} = \ldots, \quad f_{lower} = \ldots, \tag{25}$$

$$f^2_{upper} = \frac{\lambda_{upper}}{56} = \ldots, \quad f_{upper} = \ldots \tag{26}$$

Using Equation 16, we convert the above to the following confidence limits for $\eta^2$:

$$\eta^2_{lower} = \frac{f^2_{lower}}{1 + f^2_{lower}} = \ldots, \tag{27}$$

$$\eta^2_{upper} = \ldots \tag{28}$$

Multiple Regression With Fixed Regressors

One standardized index of the size of effects is the squared multiple correlation coefficient between the independent variable and the scores on the dependent variable. This index, in the population, characterizes the strength of the effect. ANOVA may be conceptualized as a linear regression model with fixed independent variables. In this case, the theory of multiple regression with fixed regressors applies. It is important to realize (e.g., Sampson, 1974) that the theory for fixed regressors, although it shares many similarities with that for random regressors, has important differences, which are especially apparent when considering the nonnull distributions of the variables. The general model is

$$E(\mathbf{y} \mid X) = X\beta, \tag{29}$$

where $\mathbf{y}$ is an $N_{tot} \times 1$ random vector, $X$ is an $N_{tot} \times p$ matrix, and $\beta$ is a $p \times 1$ vector of unknown parameters. This model includes model errors ($\epsilon$) that are assumed to be independently and identically distributed with a normal distribution, zero mean, and variance $\sigma^2$. That is,

$$\mathbf{y} = X\beta + \epsilon, \tag{30}$$

and $\epsilon$ has a multivariate normal distribution with zero mean vector $\mathbf{0}$ and covariance matrix $\sigma^2 I$, with $I$ an identity matrix. It is common to partition $\beta$ into

$$\beta' = [\beta_0 \;\; \beta_1'], \tag{31}$$

where $\beta_0$ is an intercept term.
Correspondingly, $X$ is partitioned as

$$X = [\mathbf{1} \;\; X_1], \tag{32}$$

where $\mathbf{1}$ is a column of ones and $X_1$ contains the original $X$ scores transformed into deviations about their sample means. Consider now a set of observed scores $y$, representing realizations of the random variables in $\mathbf{y}$. If $X_1$ has $p - 1$ columns, then an F statistic for testing the hypothesis that $\beta_1 = \mathbf{0}$ is

$$F = \frac{R^2/(p - 1)}{(1 - R^2)/(N_{tot} - p)}. \tag{33}$$

This statistic has a noncentral F distribution with $p - 1$ and $N_{tot} - p$ degrees of freedom, with a noncentrality parameter given by

$$\lambda = \frac{\beta' X'(I - P_{\mathbf{1}}) X \beta}{\sigma^2} = \frac{\beta' X' Q_{\mathbf{1}} X \beta}{\sigma^2}. \tag{34}$$

For any matrix $A$ of full column rank, $P_A$ is the column space projection operator $A(A'A)^{-1}A'$ and $Q_A$ the complementary projector $I - P_A$.

We now turn to an application of this theory in the context of ANOVA. Consider the simple case of a one-way fixed-effects ANOVA with $n$ observations in each of $p$ independent groups. It is well-known that this model can be written in the form of Equation 29, where $X$ is a design matrix with $N_{tot} = np$ rows and $p$ columns, and $\beta$ contains $p$ ANOVA parameters. We are not interested in $R^2$ per se. Rather, we are interested in the corresponding quantity in an infinite population of observations in which treatment groups are represented equally. There are several alternative ways of conceptualizing such a quantity. Formally, we can define $\rho^2$ as the probability limit of $R^2$, that is,

$$\rho^2 = \operatorname*{plim}_{n \to \infty} R^2. \tag{35}$$

This is the constant that $R^2$ converges to as the sample size increases without bound. It can be proven (see Appendix A) that, with this definition of $\rho^2$, the noncentrality parameter is equivalent to
$$\lambda = N_{tot} \frac{\rho^2}{1 - \rho^2}, \tag{36}$$

and so

$$\rho^2 = \frac{\lambda}{\lambda + N_{tot}}. \tag{37}$$

Consequently, a confidence interval for $\lambda$ may be converted easily into a confidence interval on $\rho^2$ or $\rho$, because $\rho$ is nonnegative. $\rho^2$ represents the coefficient of determination for predicting scores on the dependent variable from only a knowledge of the population means of the groups in an infinite population in which all treatment groups are equally represented.

Example 7: Suppose that $X$ is set up as in Equation 38 to represent a full rank design matrix for a one-way ANOVA, with three groups, and $n = 3$, and that the scores in $y$ are 1, 2, 3, 4, 5, 6, 7, 8, 9. In this parameterization, $\beta_0$ corresponds to $\mu_3$, $\beta_1$ corresponds to $\mu_1 - \mu_3$, and $\beta_2$ corresponds to $\mu_2 - \mu_3$. The group means are 2, 5, 8, and the group variances are all 1.

$$\begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \\ y_{12} \\ y_{22} \\ y_{32} \\ y_{13} \\ y_{23} \\ y_{33} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \epsilon \tag{38}$$

In this case, it is easy to show using any standard multiple regression program that the sample squared multiple correlation for predicting $y$ from $X$ is .90 and that the F statistic for testing the null hypothesis that $\rho^2 = 0$ is

$$F_{2,6} = \frac{R^2/2}{(1 - R^2)/6} = \frac{.9/2}{.1/6} = 27.0. \tag{39}$$

This F statistic is identical to the one obtained by performing a one-way fixed-effects ANOVA on the data. The 90% confidence interval for $\lambda$ has endpoints of … and …. The lower endpoint for the confidence interval on $\rho^2$, the coefficient of determination, is thus

$$\rho^2_{lower} = \frac{\lambda_{lower}}{\lambda_{lower} + N_{tot}} = \ldots, \tag{40}$$

and the upper endpoint is

$$\rho^2_{upper} = \frac{\lambda_{upper}}{\lambda_{upper} + N_{tot}} = \ldots \tag{41}$$

With one-way ANOVA and equal $n$ per group, this confidence interval is identical to the one for $\eta^2$ discussed earlier. Note also that the sample $R^2$ is positively biased with small sample sizes and will consequently be much closer to the upper end of the confidence interval than the lower.

One of several alternative methods for parameterizing the linear model in Equation 29 is to use what is sometimes called effect coding. In this case, the entries in $X$ correspond to the contrast weights applied to group means in the ANOVA null hypothesis. For example, the hypothesis of no treatment effects in a one-way ANOVA with three groups corresponds to two contrasts simultaneously being zero, that is, $\mu_1 - \mu_3 = 0$ and $\mu_2 - \mu_3 = 0$.
The contrast weights for the two hypotheses are thus 1, 0, −1 and 0, 1, −1. Thus, omnibus effect size in ANOVA can be expressed as the multiple correlation between a set of contrast weights and the dependent variable.

There has been a fair amount of discussion in the applied literature (Ozer, 1985; Rosenthal, 1991; Steiger & Ward, 1987) about whether the coefficient of determination is overly pessimistic in describing the strength of effects. Those who prefer $\rho$ may convert a confidence interval on $\rho^2$ to a confidence interval on $\rho$ simply by taking the square root of the endpoints of the former.

Confidence Intervals on Single-Contrast Measures of Effect Size

Rosenthal et al. (2000) argued convincingly for the importance of replacing the omnibus hypothesis in ANOVA with hypotheses that focus on substantive research questions. Often such hypotheses involve single contrasts of the form $\psi = \sum_{j=1}^{p} c_j \mu_j$, with $c_j$ the contrast weights and the null hypothesis being that $\psi = 0$. Rosenthal et al. discussed several different correlational measures for assessing the status of hypotheses on a single contrast. In this section, I discuss methods for exact confidence interval estimation of measures of effect size for a single contrast, including the population equivalent of the correlation measure $r_{contrast}$ discussed by Rosenthal et al.

Exact Confidence Intervals for Standardized Contrast Effect Size

Consider a contrast hypothesis on $p$ means, of the form

$$H_0: \psi = \sum_{j=1}^{p} c_j \mu_j = 0. \tag{42}$$

With equal sample sizes of $n$ per group, this hypothesis may be tested with a t statistic of the form

$$t = \frac{\sqrt{n} \, \hat{\psi}}{\sqrt{MS_{within} \sum_{j=1}^{p} c_j^2}}, \tag{43}$$
with

$$\hat{\psi} = \sum_{j=1}^{p} c_j \bar{Y}_j, \tag{44}$$

where $\bar{Y}_j$ represents the sample mean of the $j$th group. The standardized effect size $E_s$ is the size of the contrast in standard deviation units, that is,

$$E_s = \frac{\psi}{\sigma}. \tag{45}$$

The test statistic has a noncentral t distribution with $p(n - 1)$ degrees of freedom and a noncentrality parameter of

$$\delta = \sqrt{\frac{n}{\sum_{j=1}^{p} c_j^2}} \, E_s = L E_s. \tag{46}$$

To estimate $E_s$, one obtains a confidence interval for $\delta$, using the method discussed by Steiger and Fouladi (1997), and transforms the endpoints of the confidence interval by dividing by $L$ (i.e., the radical expression in Equation 46), as shown in the example below.

Example 8: The data in Table 2 represent four independent groups of three observations each. Suppose one wished to test the following null hypothesis:

$$\psi = \frac{\mu_1 + \mu_4}{2} - \frac{\mu_2 + \mu_3}{2} = 0. \tag{47}$$

This hypothesis tests whether the average of the means of the first and fourth groups is equal to the average of the means of the other two groups. Suppose we observe $t(8)$ = …. The traditional 95% confidence interval for $\psi$ ranges from … to …. Because mean square error is 1 in this example, we would expect a confidence interval for $E_s$ to be similar. Actually, it is somewhat narrower. The 95% confidence interval for $\delta$ ranges from … to …. The sum of squared contrast weights is 1, so $L = \sqrt{3}$, and the endpoints of the confidence interval are divided by $\sqrt{3}$ to obtain 95% confidence limits of … and .041 for $E_s$.

Table 2
Sample Data for a One-Way Analysis of Variance (columns: Group 1, Group 2, Group 3, Group 4)

Exact Confidence Intervals for $\rho_{contrast}$

Rosenthal et al. (2000) discussed the sample statistic $r^2_{contrast}$, which is the squared partial correlation between the contrast weight vector discussed in the previous section and the scores in $y$, with all other sources of systematic between-groups variation partialed out. Consider the data discussed in the preceding example. These weights happen to be the rescaled orthogonal polynomial weights for testing quadratic trend.
The remaining sources of between-groups variation may be predicted from any orthogonal complement of the quadratic trend contrast weights. Consequently, if we construct the vectors with columns of repeated linear and cubic contrast weights, the partial correlation between $y$ and the quadratic contrast weights with the linear and cubic weights partialed out is $r_{contrast}$, which may also be computed directly from the standard F statistic for the contrast as

$$r^2_{contrast} = \frac{F_{contrast}}{F_{contrast} + df_{within}}. \tag{48}$$

Rosenthal et al. (2000) did not discuss sampling theory for $r_{contrast}$. However, a population equivalent, $\rho_{contrast}$, may be defined, and it may be shown (see Appendix B) that, with $p$ groups in the analysis,

$$F_{contrast} = \frac{r^2_{contrast}}{(1 - r^2_{contrast})/(N_{tot} - p)} \tag{49}$$

has a noncentral F distribution with 1 and $N_{tot} - p$ degrees of freedom and noncentrality parameter

$$\lambda = N_{tot} \frac{\rho^2_{contrast}}{1 - \rho^2_{contrast}}. \tag{50}$$

Consequently, one may construct a confidence interval for $\rho^2_{contrast}$ by computing a confidence interval for $\lambda$ and transforming the endpoints, using the result of Equation 37.

Example 9: Consider again the data in Table 2. We can compute the F statistics corresponding to linear, quadratic, and cubic trend and, for each trend, compute confidence intervals for $\rho^2_{contrast}$ and/or $\rho_{contrast}$. For example, consider the test for linear trend. The F statistic is 16, with 1 and 8 degrees of freedom, and the 95% confidence interval for the noncentrality parameter has endpoints of … and …. Consequently, from Equation 37, a 95% confidence interval for $\rho^2_{contrast}$ has endpoints of

$$\rho^2_{lower} = \ldots, \quad \rho^2_{upper} = \ldots \tag{51}$$

The confidence interval for $\rho_{contrast}$ (defined as the square
root of $\rho^2_{contrast}$, thus excluding negative values as in Rosenthal et al., 2000) ranges from .905 to .988. Table 3 shows the results of computing contrast correlations and the associated confidence intervals for linear, quadratic, and cubic trend. Some brief comments are in order. Note, first, that although the $r_{contrast}$ values for quadratic and cubic trends are appealingly high, the corresponding confidence intervals are quite wide and include zero. On the other hand, the confidence interval for the linear trend is very narrow.

The Relationship Between Confidence Intervals and Hypothesis Tests

Choosing the Appropriate Interval

Confidence intervals on measures of effect size convey all the information in a hypothesis test, and more. If one selects an appropriate confidence interval, a hypothesis test may be performed simply by inspection. If the confidence interval excludes the null hypothesized value, then the null hypothesis is rejected. In such applications, I recommend using the traditional two-sided confidence interval, rather than a one-sided interval (or confidence bound), regardless of whether the hypothesis test is one-sided or two-sided.

When a two-sided confidence interval is used to perform the hypothesis test, the confidence level must be matched appropriately both to the type of hypothesis test and to the Type I error rate. Recall that the endpoints of the two-sided confidence interval for a parameter $\theta$ at the $100(1 - \alpha)\%$ confidence level are the values of $\theta$ that place the observed statistic $\hat{\theta}$ at the $\alpha/2$ or $1 - \alpha/2$ cumulative probability point. Suppose the upper and lower limits of the $100(1 - \alpha)\%$ confidence interval are $\theta_U$ and $\theta_L$, respectively. Then $\hat{\theta}$ is the rejection point at the $\alpha/2$ significance level for one-sided hypothesis tests that $\theta$ is, first, greater than or equal to $\theta_U$ and, second, less than or equal to $\theta_L$. The observed statistic $\hat{\theta}$ is also equal to (a) the upper rejection point for a two-sided test that $\theta = \theta_L$ at the alpha level and (b) the lower rejection point for the two-sided test that $\theta = \theta_U$ at the alpha level.
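The inversion principle just described can be sketched for the noncentrality parameter of the F distribution, using only the Python standard library. This is an illustrative sketch, not the author's software: the function names are hypothetical, the noncentral F distribution function is computed as a Poisson-weighted sum of regularized incomplete beta functions (a standard representation), and the limits are found by simple bisection. The sample call uses the F statistic of Example 7 (F = 27.0 with 2 and 6 degrees of freedom, $N_{tot} = 9$) and converts the limits with Equation 37.

```python
from math import exp, lgamma, log, log1p


def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(f, df1, df2, lam):
    """P(F <= f) for a noncentral F with noncentrality lam."""
    x = df1 * f / (df1 * f + df2)
    if lam <= 0.0:
        return betainc(df1 / 2.0, df2 / 2.0, x)
    half = lam / 2.0
    total, j = 0.0, 0
    while True:
        w = exp(-half + j * log(half) - lgamma(j + 1.0))  # Poisson weight
        total += w * betainc(df1 / 2.0 + j, df2 / 2.0, x)
        if j > half and w < 1e-16:  # remaining tail is negligible
            return total
        j += 1


def _solve_lambda(f, df1, df2, target):
    """Noncentrality that puts f at cumulative probability `target` (0 if none exists)."""
    if ncf_cdf(f, df1, df2, 0.0) <= target:
        return 0.0
    lo, hi = 0.0, 1.0
    while ncf_cdf(f, df1, df2, hi) > target:  # the CDF decreases as lam grows
        lo, hi = hi, hi * 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if ncf_cdf(f, df1, df2, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def lambda_interval(f, df1, df2, conf=0.90):
    """Two-sided confidence limits for the noncentrality parameter."""
    alpha = 1.0 - conf
    return (_solve_lambda(f, df1, df2, 1.0 - alpha / 2.0),
            _solve_lambda(f, df1, df2, alpha / 2.0))


lam_lo, lam_hi = lambda_interval(27.0, 2, 6, conf=0.90)
n_tot = 9
print("lambda limits:", lam_lo, lam_hi)
print("rho^2 limits:", lam_lo / (lam_lo + n_tot), lam_hi / (lam_hi + n_tot))
```

The lower limit is the value of $\lambda$ at which the observed F sits at the $1 - \alpha/2$ cumulative probability point, and the upper limit is the value at which it sits at the $\alpha/2$ point. When even $\lambda = 0$ cannot place the observed F that high, the corresponding limit is set to zero.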
Consequently, the endpoints of the confidence interval represent two values of $\theta$ that the observed statistic would barely reject in a two-sided test with significance level alpha. These endpoints are also appropriate for testing one-sided hypotheses at the $\alpha/2$ significance level.

Table 3
Confidence Intervals (CIs) for Contrast Correlations (columns: Statistic, Linear, Quadratic, Cubic; rows: F, $r^2_{contrast}$, CI, $r_{contrast}$, CI)

The preceding paragraph implies a general rule of thumb for using confidence intervals to test a statistical hypothesis while maintaining a Type I error rate at alpha:

1. When testing a two-sided hypothesis at the alpha level, use a $100(1 - \alpha)\%$ confidence interval.
2. When testing a one-sided hypothesis at the alpha level, use a $100(1 - 2\alpha)\%$ confidence interval.

Example 10: Consider a test of the hypothesis that $\Psi = 0$, that is, that the RMSSE (as defined in Equation 12) in an ANOVA is zero. This hypothesis test is one-sided, because the RMSSE cannot be negative. To use a two-sided confidence interval to test this hypothesis at the $\alpha = .05$ significance level, one should examine the $100(1 - 2\alpha)\% = 90\%$ confidence interval for $\Psi$. If the confidence interval excludes zero, the null hypothesis will be rejected. This hypothesis test is equivalent to the standard ANOVA F test.

Example 11: Consider the test that the standardized effect size $E_s$ in Equation 45 is precisely zero. This hypothesis test is two-sided, because $E_s$ can be either positive or negative. Consequently, to use a confidence interval to test this hypothesis at the $\alpha = .05$ level, a $100(1 - \alpha)\% = 95\%$ two-sided confidence interval should be used, and the null hypothesis rejected only if both ends of the confidence interval are above zero or if both are below zero.

Example 12: Consider a situation in which one wishes to establish that the standardized effect size $E_s$ in Equation 45 is small, and that smallness is defined as an absolute value less than 0.20. To establish smallness, one must reject a hypothesis that $E_s$ is not small.
Because $E_s$ can be either positive or negative, $E_s$ can be not small in two directions. The hypothesis that $E_s$ is not small can therefore be tested with two simultaneous one-sided hypothesis tests,

$$H_{01}: E_s \le -0.20 \quad \text{versus} \quad H_{a1}: E_s > -0.20 \tag{52}$$

and

$$H_{02}: E_s \ge 0.20 \quad \text{versus} \quad H_{a2}: E_s < 0.20. \tag{53}$$

These two hypotheses can both be tested simultaneously at the .05 level by constructing a 90% confidence interval and observing whether the lower end of the interval is above −0.20 (to test the first one-sided hypothesis) and the upper end of the interval is below 0.20. What this amounts to is observing whether the entire interval is between −0.20 and 0.20. If so, the hypothesis that $E_s$ is not small is rejected, and smallness is indicated.
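The rule of thumb and the test of smallness by inspection can be written down directly. The following is a minimal sketch with hypothetical function names; the interval limits in the sample calls are invented for illustration and are not results from the article, and the cutoff of 0.20 is passed in as a parameter rather than built in.

```python
def interval_level(alpha, one_sided):
    """Confidence level of the two-sided interval that tests at level alpha.

    One-sided hypotheses use 100(1 - 2*alpha)%, two-sided use 100(1 - alpha)%.
    """
    return 1.0 - 2.0 * alpha if one_sided else 1.0 - alpha


def is_small(ci_lower, ci_upper, bound):
    """True when the whole interval lies strictly between -bound and +bound."""
    return -bound < ci_lower and ci_upper < bound


# Smallness is tested with two one-sided tests, so the 90% interval is used.
level = interval_level(0.05, one_sided=True)
print(f"use a {level:.0%} confidence interval")

# Hypothetical 90% limits for E_s; these numbers are illustrative only.
print(is_small(-0.12, 0.15, bound=0.20))  # entire interval inside the bounds
print(is_small(-0.12, 0.31, bound=0.20))  # upper limit exceeds the bound
```

Keeping the level selection separate from the inspection step mirrors the text: the same two-sided interval serves every test, and only its confidence level changes with the type of hypothesis.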
Tests of Minimal Effect

Rationale and Method

In many situations, the null hypothesis of zero effect is inappropriate or can be misleading. For example, in RS testing with extremely large sample sizes, a null hypothesis may be rejected consistently, with a very low probability level, even when the population effect is small. Conversely, in AS testing, the nil hypothesis of zero effect is often unreasonable, and the hypothesis the experimenter probably wants to test is that the effect is trivial.

Tests of minimal effect are a partial solution to the problems caused by inappropriate testing of a nil hypothesis when the goal is to show that an effect is small. For example, if some minimal reasonable effect size can be specified, rejection of the hypothesis that the effect is less than or equal to this value is of practical importance whether or not the sample size is very large. In the traditional AS situation, in which the experimenter is trying to show that an effect is trivial, the hypothesis that the effect is greater than or equal to a minimal reasonable value can be tested. Serlin and Lapsley (1993) discussed this latter notion in detail and gave numerical examples. In such cases, large sample size will work for, rather than against, the experimenter, because if the effect size is truly below a level that is of practical import, larger samples will yield greater power to demonstrate that fact by rejecting the null hypothesis that the effect is at or above a point of triviality.

The confidence intervals described in the preceding section can be used to test hypotheses of minimal effect: One simply observes whether the appropriately constructed confidence interval contains the target minimal reasonable value. For example, suppose you decide that an RMSSE of 0.25 constitutes a minimal reasonable effect. In other words, effects below that level may be ignored. Effects that are definitely above that level are nontrivial.
If you wish to demonstrate that effects are trivial, you might test the hypotheses

$$H_0: \Psi \ge 0.25; \quad H_1: \Psi < 0.25. \tag{54}$$

On the other hand, if you wish to demonstrate that effects are definitely not trivial, you might test the hypotheses

$$H_0: \Psi \le 0.25; \quad H_1: \Psi > 0.25. \tag{55}$$

In each case, rejecting the null hypothesis will support the goal in performing the test, and the problems inherent in AS testing can be avoided.

A simple approach to simultaneously testing the two hypotheses discussed above is to examine the $100(1 - 2\alpha)\%$ confidence interval for $\Psi$ and see if it excludes 0.25. If the entire confidence interval is above the point of triviality (i.e., 0.25), then the effect may be judged nontrivial. If the entire confidence interval is below the point of triviality, then the effect has been shown to be trivial. There is a strong similarity between using the effect size confidence interval in this way and the long tradition of bioequivalence testing.

Example 13: Suppose you have $p = 6$ groups and $n = 75$ per group. You observe an F statistic of $F(5, 444) = 2.28$, with $p = .046$, so the nil hypothesis of zero effects is rejected at the .05 significance level. However, on substantive grounds, you have decided that a value of $\Psi$ less than 0.25 can be ignored. To demonstrate triviality, you would attempt to reject the null hypothesis that $\Psi$ is greater than or equal to 0.25. There are two approaches to performing the test. The first approach requires only a single calculation from the noncentral F distribution. Consider the cutoff value of 0.25. Using the result of Equation 12, one may convert this to a value for $\lambda$ via the formula

$$\lambda = (p - 1)n\Psi^2 = (6 - 1)(75)(.25^2) = 23.4375.$$

The observed F statistic of 2.28 has a one-sided probability value of .0256 in the noncentral F distribution with $\lambda = 23.4375$, and 5 and 444 degrees of freedom, so the null hypothesis is rejected at the .05 level, and the overall effects are declared trivial. An alternative approach uses the confidence interval. Note that, because the test is one-sided, we use the 90% confidence interval.
The endpoints of the interval for $\lambda$ are … and …. Using the result of Equation 12, we convert this confidence interval into a confidence interval for $\Psi$ by dividing the above endpoints by $(p - 1)n = 375$, then taking the square root. The resulting endpoints for the confidence interval for $\Psi$ are … and …. This confidence interval excludes 0.25, so we can reject the hypothesis that effects are nontrivial, that is, $\Psi \ge 0.25$, at the .05 significance level. The advantage of using the confidence interval is that it provides us with an approximate indication of the precision of the estimation process while still allowing us to perform the hypothesis test.

Significant technical and theoretical issues surround the use of confidence intervals in this manner.

1. The choice of a numerical point of triviality for a measure of omnibus effect size should not be treated as a mechanical selection from a small menu of approved choices. Rather, it should be considered carefully on the basis of the specific experimental design and the substantive aspects of the variables being measured and manipulated. Whereas .25 might be considered trivial in one experiment, it might be considered very important in another.
2. The power of both hypothesis tests must be analyzed a priori to assess whether sample size is adequate. With low precision (i.e., a wide confidence interval), one might still have high power to demonstrate nontrivial effects if effects are large. However, it is virtually impossible to demonstrate triviality if precision is low, because the triviality point will be close to zero, and a wide confidence interval will not fit between zero and the triviality point.

Full consideration of the technical aspects of estimating the point of triviality, and precision of a parameter estimate and the resulting confidence interval, is beyond the scope (and length restrictions) of this article. However, in the next section, I discuss several theoretical issues that the sophisticated user should keep in mind.

Conclusions and Discussion

This article demonstrates that the F statistic in ANOVA contains information about standardized effect size, and its precision of estimation, that has not been made available in typical social science reports and is not reported by traditional software packages. Yet this information can readily be calculated, using a few basic techniques. The fact is, simply reporting an F statistic, and a probability level attached to a hypothesis of nil effect, is so suboptimal that its continuance can no longer be justified, at least in a social science tradition that prides itself on empiricism. A number of the field's most influential commentators on social statistics have emphasized this and urged that, as researchers, we revise our approach to reporting the results of significance tests (e.g., see articles in Harlow, Mulaik, & Steiger, 1997).

Null hypothesis testing is the source of much controversy. I have tried to promote an eclectic, integrated point of view that resists the temptation to downgrade either the hypothesis testing or the interval estimation approaches and emphasizes how they complement each other.
Reviewers and other readers of the article have provided much food for thought and have raised several substantive criticisms that enriched my point of view considerably. In the following sections, I discuss some of the limitations of the procedures in this article, deal explicitly with several of the more common objections to my major suggestions, and then summarize my point of view and present some conclusions.

Statistical Limitations and Extensions of the Present Procedures

The procedures discussed in this article provide exact distributional results under standard ANOVA assumptions (independence, normality, and equal variances) and are easily calculated with modern software. However, they are restricted to (a) completely between-subjects fixed-effect ANOVA with (b) equal $n$ per cell. The present article does not present procedures for dealing with the complications that result from unbalanced designs and/or repeated measures, nor does it discuss extensions to random effects or mixed ANOVA models or to multivariate analyses.

In some cases, procedures for these other situations are already available. Consider, for example, the case of one-way random effects ANOVA. The treatment effects are random variables with a variance of $\sigma^2_A$, and $\Psi$ may be redefined as $\sigma_A/\sigma$. A $100(1 - \alpha)\%$ confidence interval for $\Psi$ may therefore be obtained in the equal $n$ case by taking the square root of the well-known (Glass & Hopkins, 1996, p. 54) confidence interval for $\sigma^2_A/\sigma^2$. One obtains, with $p$ groups,

$$\Psi_{lower} = \sqrt{\max\left[n^{-1}\left(\frac{F_{obs}}{F^*_{\alpha/2}} - 1\right), 0\right]}, \quad \Psi_{upper} = \sqrt{\max\left[n^{-1}\left(\frac{F_{obs}}{F^*_{1-\alpha/2}} - 1\right), 0\right]}. \tag{56}$$

$F_{obs}$ is the observed value of the F statistic, and $F^*$ is the percentage point from the F distribution with $p - 1$ and $p(n - 1)$ degrees of freedom. This approach can be generalized to more complicated designs. Burdick and Graybill (1992) discussed general methods for obtaining exact confidence intervals for $\Psi$ and related quantities in random effects models, both in the equal $n$ and unbalanced cases.
Computational procedures for the unbalanced case are much more complicated than for the case of equal $n$.

However, on close inspection, some extensions yield challenging complications that require careful analysis. Some examples are as follows.

1. In the unbalanced, fixed-effects case, the noncentrality parameter is defined as follows:

$$\lambda = \sum_{j=1}^{p} n_j \left(\frac{\alpha_j}{\sigma}\right)^2. \tag{57}$$

Note that with $\lambda$ defined as in Equation 9, the quantity $f^2$ as defined in Equation 10 represents the ratio of between-groups to within-group variance in a population with probability of membership in the treatment groups proportional to the sample sizes in the ANOVA. There are situations in which this quantity is of interest (such as when the sampling plan reflects the relative size of natural subpopulations) and others in which it might not be. Cohen (1988) discussed this point in detail.

2. In repeated measures ANOVA designs, the noncentrality parameter unfortunately confounds effects of treatments with the correlation among observations. For example, in a one-way within-subjects design, if the data possess compound symmetry, the noncentrality parameter is

$$\lambda = \frac{n}{1 - \rho} \sum_{j=1}^{p} \left(\frac{\alpha_j}{\sigma}\right)^2. \tag{58}$$

The RMSSE, $\Psi$, as defined in Equation 12, though still an appropriate measure of effect size, cannot be estimated directly using the exact techniques discussed in this article, unless $\rho$ is known. For a detailed discussion of this issue in the context of point estimation in meta-analysis, see Dunlap, Cortina, Vaslow, and Burke (1996).

3. In multivariate analysis, the noncentrality parameter includes information about the variances and correlations of the dependent variables. For example, when two populations are compared on $k$ dependent variables, using Hotelling's $T^2$ with two independent samples of size $n_1$ and $n_2$, the standard F statistic has $k$ and $n_1 + n_2 - k - 1$ degrees of freedom and has a noncentral F distribution with a noncentrality parameter that is a simple function of the squared population Mahalanobis distance $\Delta^2$:

$$\lambda = \frac{n_1 n_2}{n_1 + n_2} \Delta^2. \tag{59}$$

The latter, computed as

$$\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2), \tag{60}$$

with $\mu_1$ and $\mu_2$ the population mean vectors, and $\Sigma$ the common covariance matrix, may be described as a sum of squared orthogonalized and standardized mean differences. Consequently, a natural analogue of Equation 12 that takes into account the number of dependent variables is

$$\Psi = \sqrt{\frac{\Delta^2}{k}}. \tag{61}$$

A confidence interval on $\Delta^2$ may be calculated easily (Reiser, 2001) from a confidence interval on $\lambda$, using the results of Equation 59. This interval may, in turn, be transformed into a confidence interval on $\Psi$ using Equation 61.

We see that in one of the contexts discussed above (repeated measures), dependencies between measures are an annoying confound that must be removed from consideration. In another (the case of two populations), they are an essential ingredient for proper evaluation of effect size.
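The contrast between the repeated measures and multivariate cases can be made concrete with a short sketch. It rests on stated assumptions rather than on anything the article provides as code: the function names are hypothetical, Equation 12 is taken as $\Psi = \sqrt{\sum_j (\alpha_j/\sigma)^2 / (p - 1)}$, Equation 58 is read as $\lambda = n \sum_j (\alpha_j/\sigma)^2 / (1 - \rho)$, Equation 59 as $\lambda = n_1 n_2 \Delta^2 / (n_1 + n_2)$, and Equation 61 as $\Psi = \sqrt{\Delta^2 / k}$. All numeric inputs are invented for illustration.

```python
from math import sqrt


def rmsse_within(lam, n, p, rho):
    """RMSSE implied by a within-subjects noncentrality value under compound symmetry.

    Rearranges the assumed Equation 58: sum of squared standardized effects
    equals lam * (1 - rho) / n, then divides by (p - 1) as in Equation 12.
    """
    return sqrt(lam * (1.0 - rho) / ((p - 1) * n))


def mahalanobis_limits(lam_limits, n1, n2, k):
    """Map limits on lambda to limits on Delta^2 and on the multivariate RMSSE analogue."""
    scale = (n1 + n2) / (n1 * n2)                    # inverts the assumed Equation 59
    d2 = tuple(lam * scale for lam in lam_limits)
    psi = tuple(sqrt(v / k) for v in d2)             # assumed Equation 61
    return d2, psi


# The same noncentrality value implies different effect sizes for different rho,
# which is why rho must be known before lambda limits can be converted.
for rho in (0.0, 0.5, 0.8):
    print(rho, rmsse_within(20.0, n=10, p=4, rho=rho))

# Hypothetical lambda limits of 2 and 18 with two samples of 10 and k = 2 variables.
print(mahalanobis_limits((2.0, 18.0), n1=10, n2=10, k=2))
```

In the within-subjects function the correlation enters as a required argument, so the conversion cannot be done without it; in the two-sample function no such nuisance quantity appears, because the covariance structure is already absorbed into $\Delta^2$.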
In some of the problematic cases discussed above, and in situations where the standard ANOVA statistical assumptions are inappropriate, resampling methods such as bootstrapping can be used to obtain appropriate confidence intervals.

The width of a confidence interval often is described as indicating precision of measurement. However, as Steiger and Fouladi (1997) pointed out, this relationship is less than perfect and is seriously compromised in some situations for several reasons. The width of a confidence interval is itself a random variable and is subject to sampling variations. Moreover, the confidence intervals are truncated at zero to avoid improper estimates. In extreme cases, a confidence interval might actually have 0 as both endpoints. This zero-width confidence interval obviously does not imply that effect size was determined with perfect precision.

Focused Contrasts or Omnibus Hypotheses?

In an early version of this article, I concentrated almost exclusively on omnibus measures of effect size. Several reviewers have objected that confidence intervals on measures of standardized effect size such as $\eta^2$ and the RMSSE were, to paraphrase, an elegant solution to the wrong problem. These writers have echoed the view of Rosenthal et al. (2000), who stated that omnibus questions seldom address questions of real interest to researchers, and are typically less powerful than focused procedures (p. 1).

I share an enthusiasm for focused contrasts and recommend them in lieu of an omnibus test whenever researchers have clear ideas about linear contrasts. Moreover, I believe that not enough researchers have been trained to look carefully for ways to phrase their ideas as contrasts. However, I think that dismissing the improvements to the use of the F statistic suggested in this article ignores several important realities. First, much research in the social sciences is exploratory, and an omnibus F test in such circumstances may be the prelude to subsequent examination of unplanned contrasts.
In such cases, an overall measure of the strength of effect sizes, and the precision with which they have been determined, may alert the researcher in advance to a lack of overall precision in the experimental design. Second, when one is comparing several studies that have reported overall F tests, comparing confidence intervals on standardized effect size measures can be very useful in resolving apparent disparities in experimental outcomes.

As it turns out, the procedure for computing a confidence interval on $\rho^2_{contrast}$ is closely related, conceptually and computationally, to the procedure for computing a confidence interval on $\rho^2$, one standardized measure of omnibus effect size. The latter index examines the squared multiple correlation between observed data and a set of contrast weights, whereas the former examines the squared correlation between the data and one set of contrast weights with the variation predicted by the complementary contrasts partialed out. Thus, the
same technology that I find useful for omnibus tests may be applied directly to contrasts. I believe that reporting an exact confidence interval on $\rho_{contrast}$ is substantially more informative than simply reporting the raw coefficient. And, to be clear, I fully support concentration on focused contrasts in lieu of omnibus tests whenever the experimenter has firm questions that suit the contrast analysis framework.

Some Recent Objections to Standardized Measures of Effect Size

Revised hypothesis-testing strategies for ANOVA require specification of target values of a standardized measure of effect size. The confidence interval approach is more relaxed but strongly tends to lead the experimenter to consider which overall effect sizes qualify as trivial and which are nontrivial in a particular application.

Although many writers have emphasized the value of standardized measures of effect size in power analysis and sample size estimation, standardized effect size measures do have some shortcomings. As a nonlinear combination of several sources of variation in an experiment, they reduce several values into one and are of necessity less precise than similar indices computed on a focused contrast. Moreover, ANOVA effects as used in the calculation of the noncentrality parameter in the omnibus test may or may not correspond to experimental effects as commonly conceptualized (see, e.g., Steiger & Fouladi, 1997), and focused contrasts can get at such experimental effects much more effectively than an omnibus procedure.

Recently, Lenth (2001) suggested dispensing with standardized measures of effect size altogether in the context of power analysis and sample size estimation. His main justification was that combining information about raw effects (i.e., mean differences) and variation ignored a possible confounding impact of reliability of measurement.
Reconciling the Interval Estimation and Minimal Effect-Testing Approaches

As stated at the outset, this article discusses two major approaches that might be used to replace the traditional F test in ANOVA. The noncentrality interval estimation approach emphasizes estimation of some function of overall effect size, along with an indication of the precision of the measurement. The dual hypothesis testing approach replaces the hypothesis of nil effect with two hypotheses, one that the effect is trivial, the other that it is nontrivial.

The approach I personally favor is confidence interval estimation on some standardized measure of overall effect size. This approach may be viewed as replacing hypothesis testing entirely, yet it can be used to perform both kinds of hypothesis tests required by the dual hypothesis-testing framework. Specifically, one simply examines, simultaneously, whether the confidence interval excludes a trivial effect value on the left or right. If, for example, the confidence interval lies entirely above the cutoff point for a trivial effect, one rejects the hypothesis of triviality. If the confidence interval lies entirely below the cutoff point, one rejects the hypothesis of nontriviality. Moreover, the confidence interval approach, being an exact procedure, also provides all the information available in the standard F test. For example, the F test results in rejection at the .05 level if and only if the 90% confidence interval for $\lambda$ excludes zero.

The hypothesis-testing approach offers advantages as well. For one, it keeps the analysis within the comfortably familiar bounds of hypothesis testing. For another, it is computationally easier: one may perform the hypothesis test without extensive iteration, and so it may be performed with a wider range of available free software.
Another advantage of the hypothesis-testing approach is that, by simultaneously analyzing power for both a test of triviality and a test of nontriviality, the user can be relatively certain that the confidence interval, if calculated, will have enough precision to determine whether effects are trivial or not.

Standardized Effects and Coefficients of Determination: A Caution

Any statistical technique offers opportunity for abuse and misuse, especially if the technique is used mechanically and without taking into account the special circumstances surrounding a particular set of data. Abelson (1995) discussed in detail how important it is to remain open-minded when judging the importance of effect sizes. In some cases, effects that seem small may be quite important. This should be kept in mind before effects that are nonzero, but seemingly trivial, are dismissed. Abelson's comments are similar to Cohen's (1988) in his chapter on special issues in power analysis.

Casting a Vote for Change

A fundamental contribution to behavioral statistics by Cohen (1962) was to demonstrate that many studies lack sufficient statistical power. The initial emphasis on power analysis spearheaded by Cohen (1962) has now given way to a more sophisticated emphasis on precision of estimation. Confidence intervals on standardized measures of effect size allow one to assess how precisely effects have been measured and simultaneously assess whether the experiment has ruled out (a) the notion that effects are trivial and (b) the notion that they are nontrivial. The procedures are straightforward and offer obvious benefits. It is time for a change. Yet there are numerous obstacles to change in
behavioral statistics practice. A significant obstacle is the dominant influence a few commercial statistical packages such as SPSS and SAS have on practice in the field. The way psychology has operated in the past, procedures are unlikely to be used until they have been implemented in a widely used statistics package, and commercial statistics packages tend to be conservative toward new approaches. In the final analysis, the impetus for change may have to come from journal editors and practitioners, some of whom have resisted change for a variety of reasons discussed by Thompson (1999).

Fortunately, the Internet makes it possible to distribute innovative software to practitioners very easily at virtually zero cost. There is no longer any reason to report a squared multiple correlation, an ANOVA F statistic, or a focused contrast t test without providing information about confidence intervals on standardized effects. Each reader of this article can cast votes for change by obtaining the freeware I (and other authors) have made available, and then, when reviewing articles that report omnibus tests and focused contrasts without associated intervals, taking two simple steps: (a) performing their own calculation of confidence intervals on standardized effect size and (b) requesting that the author include this information in the published article.

References

Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Burdick, R. K., & Graybill, F. A. (1992). Confidence intervals on variance components. New York: Dekker.
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.
Chow, S.-C., & Liu, J.-P. (2000). Design and analysis of bioavailability and bioequivalence studies. New York: Dekker.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Erlbaum.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49.
Dunlap, W. P., Cortina, J. M., Vaslow, J. M., & Burke, M. J. (1996). Meta-analysis of experiments with matched groups or repeated measures designs. Psychological Methods, 1.
Fleishman, A. E. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Needham Heights, MA: Allyn & Bacon.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. American Statistician, 55.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1.
Mels, G. (1989). A general system for path analysis with latent variables. Unpublished master's thesis, University of South Africa, Pretoria, South Africa.
Metzler, C. M. (1974). Bioavailability: A problem in equivalence. Biometrics, 30.
Ozer, D. J. (1985). Correlation and the coefficient of determination. Psychological Bulletin, 97.
Reiser, B. (2001). Confidence intervals for the Mahalanobis distance. Communications in Statistics: Simulation and Computation, 30.
Rosenthal, R. (1991). Effect sizes: Pearson's correlation, its display via the BESD, and alternative indices. American Psychologist, 46.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. New York: Cambridge University Press.
Rosnow, R. L., & Rosenthal, R. (1996). Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1.
Sampson, A. R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69.
Schmidt, F. L. (1996). Statistical significance testing and cumulative research in psychology: Implications for the training of researchers. Psychological Methods, 1.
Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? Mahwah, NJ: Erlbaum.
Searle, S. R. (1987). Linear models for unbalanced data. New York: Wiley.
Serlin, R. A., & Lapsley, D. K. (1993). Rational appraisal of psychological research and the good-enough principle. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues. Hillsdale, NJ: Erlbaum.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61.
Steiger, J. H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: Systat.
Steiger, J. H. (1990, October). Noncentrality interval estimation and the evaluation of statistical models. Paper presented at the meeting of the Society of Multivariate Experimental Psychology, Kingston, RI.
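Steiger, J. H. (1999). STATISTICA power analysis. Tulsa, OK: StatSoft.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221-257). Mahwah, NJ: Erlbaum.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of factors. Paper presented at the meeting of the Psychometric Society, Iowa City, IA.
Steiger, J. H., & Ward, L. M. (1987). Factor analysis and the coefficient of determination. Psychological Bulletin, 99.
Taylor, D. J., & Muller, K. E. (1995). Computing confidence bounds for power and sample size of the general linear univariate model. The American Statistician, 49.
Taylor, D. J., & Muller, K. E. (1996). Bias in linear model power and sample size calculation due to estimating noncentrality. Communications in Statistics: Theory and Methods, 25.
Thompson, B. (1999). Why encouraging effect size reporting is not working: The etiology of researcher resistance to changing practices. The Journal of Psychology, 133.
Venables, W. (1975). Calculation of confidence intervals for noncentrality parameters. Journal of the Royal Statistical Society, Series B, 37.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics, 32.

Appendix A

The Relationship Between $\eta^2$ and $\lambda$ in One-Way ANOVA

Consider any estimator $\hat{\theta}$ of a parameter $\theta$. The probability limit of $\hat{\theta}$, denoted $\operatorname{plim}(\hat{\theta})$, is equal to a value $c$ if and only if, for any error tolerance $\epsilon > 0$, we have

$$\lim_{n \to \infty} \Pr\left(|\hat{\theta} - c| < \epsilon\right) = 1.$$

The notion of a probability limit is closely related to that of consistency, in that $\hat{\theta}$ is a consistent estimator for $\theta$ if and only if $\operatorname{plim}_{n \to \infty}(\hat{\theta}) = \theta$. In what follows, for brevity of notation, I simply write $\operatorname{plim}(X)$ rather than $\operatorname{plim}_{n \to \infty}(X)$. I use a number of well-known results. In particular, if $\operatorname{plim}(X)$ and $\operatorname{plim}(Y)$ exist, then

$$\operatorname{plim}(X + Y) = \operatorname{plim}(X) + \operatorname{plim}(Y), \quad \operatorname{plim}(X/Y) = \operatorname{plim}(X)/\operatorname{plim}(Y), \quad \operatorname{plim}(XY) = \operatorname{plim}(X)\operatorname{plim}(Y).$$

Moreover, the plim of a sample moment is equal to the corresponding population moment. We define $\eta^2$ as $\operatorname{plim}(R^2)$, that is, the value that $R^2$ converges to in an infinite population.

Define, for the sample means,

$$s^2_{\bar{x}} = \frac{1}{p - 1} \sum_{j=1}^{p} (\bar{x}_j - \bar{x}_{\cdot})^2. \tag{A1}$$

The corresponding population quantity is

$$\sigma^2_{\mu} = \frac{1}{p - 1} \sum_{j=1}^{p} (\mu_j - \bar{\mu})^2 = \frac{\beta' X' Q_1 X \beta}{n(p - 1)}, \tag{A2}$$

where $\beta$, $X$, and $Q_1$ are as described in Equations 29 through 38. In a balanced, one-way ANOVA, with $p$ groups and $n$ observations per group, $SS_{\text{treatments}} = n(p - 1)s^2_{\bar{x}}$. Then

$$\operatorname{plim}(R^2) = \operatorname{plim}\left(\frac{SS_{\text{treatments}}}{SS_{\text{treatments}} + SS_{\text{error}}}\right) = \frac{\operatorname{plim}\left(n(p - 1)s^2_{\bar{x}}\right)}{\operatorname{plim}\left(n(p - 1)s^2_{\bar{x}}\right) + \operatorname{plim}\left(p(n - 1)MS_{\text{error}}\right)}$$

$$= \frac{\operatorname{plim}(s^2_{\bar{x}})}{\operatorname{plim}(s^2_{\bar{x}}) + \lim_{n \to \infty}\left[\frac{p(n - 1)}{(p - 1)n}\right]\operatorname{plim}(MS_{\text{error}})} = \frac{\sigma^2_{\mu}}{\sigma^2_{\mu} + \frac{p}{p - 1}\sigma^2} = \frac{(p - 1)\sigma^2_{\mu}}{(p - 1)\sigma^2_{\mu} + p\sigma^2}. \tag{A3}$$

Combining Equations A1 through A3, we obtain

$$\eta^2 = \frac{\beta' X' Q_1 X \beta / n}{\beta' X' Q_1 X \beta / n + p\sigma^2} \tag{A4}$$

and

$$\frac{\eta^2}{1 - \eta^2} = \frac{\beta' X' Q_1 X \beta}{N_{\text{tot}}\,\sigma^2} = \frac{\lambda}{N_{\text{tot}}}, \tag{A5}$$

where $\lambda$ is as defined in Equation 34.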
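The relationship derived in this appendix can be checked numerically. The sketch below is illustrative only: it computes the population proportion of variance directly from a set of group means and then again from the noncentrality parameter, and confirms that the two routes agree. The function names, the example means, and the symbols eta-squared and lambda are assumptions (the Greek notation is not recoverable from this transcription); the noncentrality parameter is taken to be n times the sum of squared deviations of the group means from their average, divided by the error variance.

```python
def omnibus_quantities(means, sigma, n):
    """Population quantities for a balanced one-way design (illustrative)."""
    p = len(means)
    grand_mean = sum(means) / p
    ss_means = sum((m - grand_mean) ** 2 for m in means)
    var_means = ss_means / (p - 1)  # population analogue of the variance of the sample means
    # Proportion of variance: (p - 1) var_means / ((p - 1) var_means + p sigma^2).
    eta_sq = (p - 1) * var_means / ((p - 1) * var_means + p * sigma ** 2)
    lam = n * ss_means / sigma ** 2  # assumed form of the noncentrality parameter
    return eta_sq, lam, n * p


def eta_sq_from_lambda(lam, n_total):
    """Proportion of variance implied by a noncentrality parameter."""
    return lam / (lam + n_total)


def lambda_from_eta_sq(eta_sq, n_total):
    """Noncentrality parameter implied by a proportion of variance."""
    return n_total * eta_sq / (1.0 - eta_sq)


# Hypothetical example: four groups of 10 with an error standard deviation of 4.
eta_sq, lam, n_total = omnibus_quantities([10.0, 12.0, 14.0, 16.0], sigma=4.0, n=10)
print(eta_sq, lam, eta_sq_from_lambda(lam, n_total))
```

For these hypothetical means the sum of squared deviations is 20, so the noncentrality parameter is 12.5 and both routes give a proportion of variance of 20/84, roughly .238. The two helper functions are inverses of each other, which is what allows confidence limits on the noncentrality parameter to be carried over to the proportion-of-variance scale.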
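Appendix B

The Distribution of the F Statistic for $r_{\text{contrast}}$

Assume the general linear model as described in Equations 29 and 30. For any full column rank matrix $A$, define $P_A = A(A'A)^{-1}A'$, and $Q_A = I - P_A$, with $I$ a conformable identity matrix. Define $\mathbf{1}$ to be a column of 1s. Partition $X$ as $X = [\mathbf{1} \;\; x_1 \;\; X_2]$. $x_1$ contains replications of the contrast weights for the contrast being evaluated, so that the $i$th value in $x_1$ is the contrast weight for the group that $y_i$ is in, and $X_2$ contains a set of columns that are the orthogonal complement of the contrast weights in $x_1$. Thus, for example, if $x_1$ contains contrast weights for evaluating linear trend, $X_2$ would contain quadratic and cubic contrast weights (or some full rank transformation of them). The regression weight vector is partitioned accordingly as

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}. \tag{B1}$$

Define $\mu$ as a vector of the population means of the $p$ groups, and $c$ as the linear weights for the contrast of interest. In this case, the contrast of interest is

$$\psi = c'\mu, \tag{B2}$$

and because $x_1$ contains $n$ replications of the elements of $c$, and $E(y)$ contains $n$ replications of the elements of $\mu$, we have

$$\psi = \frac{1}{n}\,\beta_1\, x_1' x_1 \tag{B3}$$

and

$$c'c = x_1' x_1 / n. \tag{B4}$$

I first demonstrate that an F statistic may be constructed for $r^2_{\text{contrast}}$. Rosenthal et al. (2000) defined $r^2_{\text{contrast}}$ as the squared partial correlation between $y$ and $x_1$ with $X_2$ partialed out. This sample statistic can be computed as the following ratio of quadratic forms in $y$:

$$r^2_{\text{contrast}} = \frac{y' P_{x_1} y}{y'(I - P_{X_2} - P_{\mathbf{1}})y}. \tag{B5}$$

Consider the statistic

$$F_{\text{contrast}} = \frac{r^2_{\text{contrast}}}{(1 - r^2_{\text{contrast}})/(N_{\text{tot}} - p)} = \frac{y' P_{x_1} y}{y'(I - P_{x_1} - P_{X_2} - P_{\mathbf{1}})y/(N_{\text{tot}} - p)}. \tag{B6}$$

This statistic is a ratio of two quadratic forms, in the general form

$$\frac{y'Ay/a}{y'By/b}, \tag{B7}$$

where $a = 1$, and $b = N_{\text{tot}} - p$. From Searle (1987), $F_{\text{contrast}}$ has a noncentral F distribution with $a$ and $b$ degrees of freedom and noncentrality parameter

$$\lambda_{\text{contrast}} = \beta' X' A X \beta / \sigma^2 \tag{B8}$$

if $A$ is idempotent, $B$ is idempotent, $AB = 0$, and $a$ and $b$ are the ranks of $A$ and $B$, respectively. These four properties are easily established by substitution and the fact that $x_1$, $X_2$, and $\mathbf{1}$ are pairwise orthogonal. The orthogonality implies that the noncentral F distribution has a noncentrality parameter equal to

$$\lambda_{\text{contrast}} = \frac{\beta' X' P_{x_1} X \beta}{\sigma^2} = \frac{\beta_1^2\, x_1' x_1}{\sigma^2}, \tag{B9}$$

and the ranks of $A$ and $B$ are 1 and $N_{\text{tot}} - p$, respectively.

Next, we derive the relationship between $\lambda_{\text{contrast}}$ and the population equivalent of $r^2_{\text{contrast}}$. We may write $r^2_{\text{contrast}}$ as follows:

$$r^2_{\text{contrast}} = \frac{F_{\text{contrast}}}{F_{\text{contrast}} + p(n - 1)} = \frac{n\hat{\psi}^2/(MS_{\text{error}}\, c'c)}{n\hat{\psi}^2/(MS_{\text{error}}\, c'c) + p(n - 1)} = \frac{\hat{\psi}^2}{\hat{\psi}^2 + c'c\, MS_{\text{error}}\, p(n - 1)/n}.$$

We define $\rho^2_{\text{contrast}}$ as

$$\operatorname{plim}(r^2_{\text{contrast}}) = \frac{\operatorname{plim}(\hat{\psi}^2)}{\operatorname{plim}(\hat{\psi}^2) + c'c \lim_{n \to \infty}\left[\frac{p(n - 1)}{n}\right]\operatorname{plim}(MS_{\text{error}})}. \tag{B10}$$

Hence,

$$\rho^2_{\text{contrast}} = \frac{\psi^2}{\psi^2 + p\,\sigma^2\, c'c}. \tag{B11}$$

It follows that

$$\frac{\rho^2_{\text{contrast}}}{1 - \rho^2_{\text{contrast}}} = \frac{\psi^2}{p\,\sigma^2\, c'c}, \tag{B12}$$

which, after substitution of Equations B3 and B4, becomes

$$\frac{\rho^2_{\text{contrast}}}{1 - \rho^2_{\text{contrast}}} = \frac{(\beta_1\, x_1' x_1/n)^2}{p\,\sigma^2\, x_1' x_1/n} = \frac{\beta_1^2\, x_1' x_1}{n p\,\sigma^2} = \frac{\beta_1^2\, x_1' x_1}{\sigma^2}\,\frac{1}{N_{\text{tot}}}. \tag{B13}$$

Recalling the result of Equation B9 for $\lambda_{\text{contrast}}$, we have thus shown that

$$\lambda_{\text{contrast}} = \frac{\beta_1^2\, x_1' x_1}{\sigma^2} = N_{\text{tot}}\,\frac{\rho^2_{\text{contrast}}}{1 - \rho^2_{\text{contrast}}}. \tag{B14}$$

Received February 3, 2001
Revision received December 1, 2003
Accepted January 8, 2004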
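As a numerical check on the Appendix B result, the following sketch computes the contrast noncentrality parameter from the regression form (the squared weight on the contrast column times that column's sum of squares, divided by the error variance) and compares it with N_tot times rho-squared over one minus rho-squared. It is an illustration under stated assumptions rather than the author's code: the function names, the example means and linear-trend weights, and the symbols psi, rho-squared, and lambda are all hypothetical choices, since the Greek notation is not recoverable from this transcription.

```python
def contrast_quantities(means, weights, sigma, n):
    """Population quantities for one contrast in a balanced design (illustrative)."""
    p = len(means)
    psi = sum(c * m for c, m in zip(weights, means))  # contrast value c'mu
    cc = sum(c * c for c in weights)                  # c'c
    x1_sq = n * cc                                    # x1'x1: n replications of each weight
    beta1 = n * psi / x1_sq                           # from psi = beta1 * x1'x1 / n
    lam = beta1 ** 2 * x1_sq / sigma ** 2             # noncentrality in regression form
    rho_sq = psi ** 2 / (psi ** 2 + p * cc * sigma ** 2)  # population squared contrast correlation
    return psi, lam, rho_sq


def r_sq_contrast_from_f(f_contrast, n_total, p):
    """Sample squared contrast correlation recovered from the contrast F statistic."""
    return f_contrast / (f_contrast + n_total - p)


# Hypothetical example: linear trend across four groups of 10, error SD of 4.
psi, lam, rho_sq = contrast_quantities(
    [10.0, 12.0, 14.0, 16.0], [-3.0, -1.0, 1.0, 3.0], sigma=4.0, n=10)
n_total = 40
print(psi, lam, n_total * rho_sq / (1.0 - rho_sq))
```

With these hypothetical values the contrast equals 20 and both expressions for the noncentrality parameter equal 12.5. Because the example means are exactly linear, the linear contrast carries the entire between-groups effect, so its noncentrality parameter matches the omnibus value for the same means.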
More informationPredicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products
Predicate Encrytion Suorting Disjunctions, Polynomial Equations, and Inner Products Jonathan Katz Amit Sahai Brent Waters Abstract Predicate encrytion is a new aradigm for ublickey encrytion that generalizes
More informationStorage Basics Architecting the Storage Supplemental Handout
Storage Basics Architecting the Storage Sulemental Handout INTRODUCTION With digital data growing at an exonential rate it has become a requirement for the modern business to store data and analyze it
More informationService Network Design with Asset Management: Formulations and Comparative Analyzes
Service Network Design with Asset Management: Formulations and Comarative Analyzes Jardar Andersen Teodor Gabriel Crainic Marielle Christiansen October 2007 CIRRELT200740 Service Network Design with
More informationINFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA 1
RESEARCH NOTE INFERRING APP DEMAND FROM PUBLICLY AVAILABLE DATA 1 Rajiv Garg McCombs School of Business, The University of Texas at Austin, Austin, TX 78712 U.S.A. {Rajiv.Garg@mccombs.utexas.edu} Rahul
More informationProbabilistic models for mechanical properties of prestressing strands
Probabilistic models for mechanical roerties of restressing strands Luciano Jacinto a, Manuel Pia b, Luís Neves c, Luís Oliveira Santos b a Instituto Suerior de Engenharia de Lisboa, Rua Conselheiro Emídio
More informationAn Associative Memory Readout in ESN for Neural Action Potential Detection
g An Associative Memory Readout in ESN for Neural Action Potential Detection Nicolas J. Dedual, Mustafa C. Ozturk, Justin C. Sanchez and José C. Princie Abstract This aer describes how Echo State Networks
More informationEconomics 241B Hypothesis Testing: Large Sample Inference
Economics 241B Hyothesis Testing: Large Samle Inference Statistical inference in largesamle theory is base on test statistics whose istributions are nown uner the truth of the null hyothesis. Derivation
More informationTRANSCENDENTAL NUMBERS
TRANSCENDENTAL NUMBERS JEREMY BOOHER. Introduction The Greeks tried unsuccessfully to square the circle with a comass and straightedge. In the 9th century, Lindemann showed that this is imossible by demonstrating
More informationLocal Connectivity Tests to Identify Wormholes in Wireless Networks
Local Connectivity Tests to Identify Wormholes in Wireless Networks Xiaomeng Ban Comuter Science Stony Brook University xban@cs.sunysb.edu Rik Sarkar Comuter Science Freie Universität Berlin sarkar@inf.fuberlin.de
More informationMath 5330 Spring Notes Prime Numbers
Math 5330 Sring 206 Notes Prime Numbers The study of rime numbers is as old as mathematics itself. This set of notes has a bunch of facts about rimes, or related to rimes. Much of this stuff is old dating
More informationFourier Analysis of Stochastic Processes
Fourier Analysis of Stochastic Processes. Time series Given a discrete time rocess ( n ) nz, with n :! R or n :! C 8n Z, we de ne time series a realization of the rocess, that is to say a series (x n )
More informationAlpha Channel Estimation in High Resolution Images and Image Sequences
In IEEE Comuter Society Conference on Comuter Vision and Pattern Recognition (CVPR 2001), Volume I, ages 1063 68, auai Hawaii, 11th 13th Dec 2001 Alha Channel Estimation in High Resolution Images and Image
More informationInt. J. Advanced Networking and Applications Volume: 6 Issue: 4 Pages: 23862392 (2015) ISSN: 09750290
2386 Survey: Biological Insired Comuting in the Network Security V Venkata Ramana Associate Professor, Deartment of CSE, CBIT, Proddatur, Y.S.R (dist), A.P516360 Email: ramanacsecbit@gmail.com Y.Subba
More informationAsymmetric Information, Transaction Cost, and. Externalities in Competitive Insurance Markets *
Asymmetric Information, Transaction Cost, and Externalities in Cometitive Insurance Markets * Jerry W. iu Deartment of Finance, University of Notre Dame, Notre Dame, IN 465565646 wliu@nd.edu Mark J. Browne
More informationSynopsys RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Development FRANCE
RURAL ELECTRICATION PLANNING SOFTWARE (LAPER) Rainer Fronius Marc Gratton Electricité de France Research and Develoment FRANCE Synosys There is no doubt left about the benefit of electrication and subsequently
More informationOntheJob Search, Work Effort and Hyperbolic Discounting
OntheJob Search, Work Effort and Hyerbolic Discounting Thomas van Huizen March 2010  Preliminary draft  ABSTRACT This aer assesses theoretically and examines emirically the effects of time references
More informationThe Online Freezetag Problem
The Online Freezetag Problem Mikael Hammar, Bengt J. Nilsson, and Mia Persson Atus Technologies AB, IDEON, SE3 70 Lund, Sweden mikael.hammar@atus.com School of Technology and Society, Malmö University,
More informationCFRI 3,4. Zhengwei Wang PBC School of Finance, Tsinghua University, Beijing, China and SEBA, Beijing Normal University, Beijing, China
The current issue and full text archive of this journal is available at www.emeraldinsight.com/20441398.htm CFRI 3,4 322 constraints and cororate caital structure: a model Wuxiang Zhu School of Economics
More informationDiscrete Stochastic Approximation with Application to Resource Allocation
Discrete Stochastic Aroximation with Alication to Resource Allocation Stacy D. Hill An otimization roblem involves fi nding the best value of an obective function or fi gure of merit the value that otimizes
More informationAssignment 9; Due Friday, March 17
Assignment 9; Due Friday, March 17 24.4b: A icture of this set is shown below. Note that the set only contains oints on the lines; internal oints are missing. Below are choices for U and V. Notice that
More informationCABRS CELLULAR AUTOMATON BASED MRI BRAIN SEGMENTATION
XI Conference "Medical Informatics & Technologies"  2006 Rafał Henryk KARTASZYŃSKI *, Paweł MIKOŁAJCZAK ** MRI brain segmentation, CT tissue segmentation, Cellular Automaton, image rocessing, medical
More informationElectronic Commerce Research and Applications
Electronic Commerce Research and Alications 12 (2013) 246 259 Contents lists available at SciVerse ScienceDirect Electronic Commerce Research and Alications journal homeage: www.elsevier.com/locate/ecra
More informationStochastic Derivation of an Integral Equation for Probability Generating Functions
Journal of Informatics and Mathematical Sciences Volume 5 (2013), Number 3,. 157 163 RGN Publications htt://www.rgnublications.com Stochastic Derivation of an Integral Equation for Probability Generating
More informationDrinking water systems are vulnerable to
34 UNIVERSITIES COUNCIL ON WATER RESOURCES ISSUE 129 PAGES 344 OCTOBER 24 Use of Systems Analysis to Assess and Minimize Water Security Risks James Uber Regan Murray and Robert Janke U. S. Environmental
More informationFailure Behavior Analysis for Reliable Distributed Embedded Systems
Failure Behavior Analysis for Reliable Distributed Embedded Systems Mario Tra, Bernd Schürmann, Torsten Tetteroo {tra schuerma tetteroo}@informatik.unikl.de Deartment of Comuter Science, University of
More informationBranchandPrice for Service Network Design with Asset Management Constraints
BranchandPrice for Servicee Network Design with Asset Management Constraints Jardar Andersen Roar Grønhaug Mariellee Christiansen Teodor Gabriel Crainic December 2007 CIRRELT200755 BranchandPrice
More information