Fixed-Effect Versus Random-Effects Models


CHAPTER 13
Fixed-Effect Versus Random-Effects Models

Introduction
Definition of a summary effect
Estimating the summary effect
Extreme effect size in a large study or a small study
Confidence interval
The null hypothesis
Which model should we use?
Model should not be based on the test for heterogeneity
Concluding remarks

Introduction to Meta-Analysis. Michael Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein. 2009, John Wiley & Sons, Ltd. ISBN:

INTRODUCTION

In Chapter 11 and Chapter 12 we introduced the fixed-effect and random-effects models. Here, we highlight the conceptual and practical differences between them. Consider the forest plots in Figures 13.1 and 13.2. They include the same six studies, but the first uses a fixed-effect analysis and the second a random-effects analysis. These plots provide a context for the discussion that follows.

DEFINITION OF A SUMMARY EFFECT

Both plots show a summary effect on the bottom line, but the meaning of this summary effect is different in the two models. In the fixed-effect analysis we assume that the true effect size is the same in all studies, and the summary effect is our estimate of this common effect size. In the random-effects analysis we assume that the true effect size varies from one study to the next, and that the studies in our analysis represent a random sample of effect sizes that could
have been observed. The summary effect is our estimate of the mean of these effects.

Figure 13.1 Fixed-effect model forest plot showing relative weights. [Impact of Intervention (Fixed effect); standardized mean difference (g) and 95% confidence interval. Relative weights: Carroll 12%, Grant 13%, Peck 8%, Donat 39%, Stewart 10%, Young 18%; Summary 100%.]

Figure 13.2 Random-effects model forest plot showing relative weights. [Impact of Intervention (Random effects); standardized mean difference (g) with 95% confidence and prediction intervals. Relative weights: Carroll 16%, Grant 16%, Peck 13%, Donat 23%, Stewart 14%, Young 18%; Summary 100%.]

ESTIMATING THE SUMMARY EFFECT

Under the fixed-effect model we assume that the true effect size for all studies is identical, and the only reason the effect size varies between studies is sampling error (error in estimating the effect size). Therefore, when assigning
weights to the different studies we can largely ignore the information in the smaller studies, since we have better information about the same effect size in the larger studies. By contrast, under the random-effects model the goal is not to estimate one true effect, but to estimate the mean of a distribution of effects. Since each study provides information about a different effect size, we want to be sure that all these effect sizes are represented in the summary estimate. This means that we cannot discount a small study by giving it a very small weight (the way we would in a fixed-effect analysis). The estimate provided by that study may be imprecise, but it is information about an effect that no other study has estimated. By the same logic we cannot give too much weight to a very large study (the way we might in a fixed-effect analysis). Our goal is to estimate the mean effect in a range of studies, and we do not want that overall estimate to be overly influenced by any one of them.

In these graphs, the weight assigned to each study is reflected in the size of the box (specifically, the area) for that study. Under the fixed-effect model there is a wide range of weights (as reflected in the size of the boxes), whereas under the random-effects model the weights fall in a relatively narrow range. For example, compare the weight assigned to the largest study (Donat) with that assigned to the smallest study (Peck) under the two models. Under the fixed-effect model Donat is given about five times as much weight as Peck. Under the random-effects model Donat is given only 1.8 times as much weight as Peck.

EXTREME EFFECT SIZE IN A LARGE STUDY OR A SMALL STUDY

How will the selection of a model influence the overall effect size? In this example Donat is the largest study, and also happens to have the highest effect size.
Under the fixed-effect model Donat was assigned a large share (39%) of the total weight and pulled the mean effect upward. By contrast, under the random-effects model Donat was assigned a relatively modest share of the weight (23%). It therefore had less pull on the weighted mean. Similarly, Carroll is one of the smaller studies and happens to have the smallest effect size. Under the fixed-effect model Carroll was assigned a relatively small proportion of the total weight (12%), and had little influence on the summary effect. By contrast, under the random-effects model Carroll carried a somewhat higher proportion of the total weight (16%) and was able to pull the weighted mean toward the left.

The operating premise, as illustrated in these examples, is that whenever τ² is nonzero, the relative weights assigned under random effects will be more balanced than those assigned under fixed effects. As we move from fixed effect to random effects, extreme studies will lose influence if they are large, and will gain influence if they are small.
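This re-balancing follows directly from the inverse-variance weighting rule. The sketch below is illustrative only: the within-study variances and the between-studies variance (tau2 = 0.05) are hypothetical values chosen to mimic a mix of small and large studies, not the data behind the figures.

```python
# Relative study weights under the two models, via inverse-variance weighting.
# Variances and tau2 are hypothetical, for illustration only.

def relative_weights(variances, tau2=0.0):
    """Relative weight of each study: 1/(V_i + tau2), normalized to sum to 1.
    tau2 = 0 gives the fixed-effect weights; tau2 > 0 gives the
    random-effects weights."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical within-study variances; the smallest variance (0.01) plays the
# role of the largest study, the largest variance the smallest study.
variances = [0.08, 0.06, 0.12, 0.01, 0.10, 0.05]

w_fixed = relative_weights(variances)              # fixed-effect weights
w_random = relative_weights(variances, tau2=0.05)  # random-effects weights

# Adding the same tau2 to every denominator compresses the spread of weights,
# so the ratio of the largest weight to the smallest weight shrinks.
print(max(w_fixed) / min(w_fixed))    # wide spread: big studies dominate
print(max(w_random) / min(w_random))  # narrower spread: weights more balanced
```

Because τ² enters every study's denominator equally, it matters most for studies whose within-study variance is small, which is why the largest studies lose the most relative weight.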
CONFIDENCE INTERVAL

Under the fixed-effect model the only source of uncertainty is the within-study (sampling or estimation) error. Under the random-effects model there is this same source of uncertainty plus an additional source (the between-studies variance). It follows that the variance, standard error, and confidence interval for the summary effect will always be larger (or wider) under the random-effects model than under the fixed-effect model (unless T² is zero, in which case the two models are the same). In this example, the standard error of the summary effect is accordingly smaller under the fixed-effect model than under the random-effects model.

Figure 13.3 Very large studies under fixed-effect model. [Studies A, B, C, D; effect size and 95% confidence interval, with summary.]

Figure 13.4 Very large studies under random-effects model. [Studies A, B, C, D; effect size and 95% confidence interval, with summary.]
Consider what would happen if we had five studies, and each study had an infinitely large sample size. Under either model the confidence interval for the effect size in each study would have a width approaching zero, since we know the effect size in that study with perfect precision. Under the fixed-effect model the summary effect would also have a confidence interval with a width of zero, since we know the common effect precisely (Figure 13.3). By contrast, under the random-effects model the width of the confidence interval would not approach zero (Figure 13.4). While we know the effect in each study precisely, these effects have been sampled from a universe of possible effect sizes, and provide only an estimate of the mean effect. Just as the error within a study will approach zero only as the sample size approaches infinity, so too the error of these studies as an estimate of the mean effect will approach zero only as the number of studies approaches infinity.

More generally, it is instructive to consider what factors influence the standard error of the summary effect under the two models. The following formulas are based on a meta-analysis of means from k one-group studies, but the conceptual argument applies to all meta-analyses. The within-study variance of each mean depends on the standard deviation (denoted σ) of participants' scores and the sample size of each study (n). For simplicity we assume that all of the studies have the same sample size and the same standard deviation (see Box 13.1 for details). Under the fixed-effect model the standard error of the summary effect is given by

SE(M) = √(σ² / (k n)).    (13.1)

It follows that with a large enough sample size the standard error will approach zero, and this is true whether the sample size is concentrated in one or two studies, or dispersed across any number of studies.
Under the random-effects model the standard error of the summary effect is given by

SE(M) = √(σ² / (k n) + τ² / k).    (13.2)

The first term is identical to that for the fixed-effect model and, again, with a large enough sample size, this term will approach zero. By contrast, the second term (which reflects the between-studies variance) will only approach zero as the number of studies approaches infinity. These formulas do not apply exactly in practice, but the conceptual argument does. Namely, increasing the sample size within studies is not sufficient to reduce the standard error beyond a certain point (where that point is determined by τ² and k). If there is only a small number of studies, then the standard error could still be substantial even if the total n is in the tens of thousands or higher.
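A quick numerical check of (13.1) and (13.2) makes this limiting behavior concrete. The values of σ, k, and τ² below are arbitrary; the point is only that the fixed-effect standard error vanishes as n grows, while the random-effects standard error levels off at √(τ²/k).

```python
import math

def se_fixed(sigma, n, k):
    """Equation (13.1): standard error of the summary mean, fixed-effect model."""
    return math.sqrt(sigma**2 / (k * n))

def se_random(sigma, n, k, tau2):
    """Equation (13.2): the same within-study term plus tau2 / k."""
    return math.sqrt(sigma**2 / (k * n) + tau2 / k)

sigma, k, tau2 = 10.0, 5, 0.5  # arbitrary illustrative values

# As n grows, se_fixed shrinks toward zero, but se_random stalls.
for n in (10, 1_000, 100_000):
    print(n, se_fixed(sigma, n, k), se_random(sigma, n, k, tau2))

# No matter how large n becomes, se_random cannot fall below this floor,
# which only shrinks as the number of studies k increases.
floor = math.sqrt(tau2 / k)
print(floor)
```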
BOX 13.1 FACTORS THAT INFLUENCE THE STANDARD ERROR OF THE SUMMARY EFFECT.

To illustrate the concepts with some simple formulas, let us consider a meta-analysis of studies with the very simplest design, such that each study comprises a single sample of n observations with standard deviation σ. We combine estimates of the mean in a meta-analysis. The variance of each estimate is

V(Yi) = σ² / n,

so the (inverse-variance) weight in a fixed-effect meta-analysis is

Wi = 1 / (σ² / n) = n / σ²,

and the variance of the summary effect under the fixed-effect model is

V(M) = 1 / Σ Wi = 1 / (k n / σ²) = σ² / (k n).

Therefore, under the fixed-effect model the (true) standard error of the summary mean is given by

SE(M) = √(σ² / (k n)).

Under the random-effects model the weight awarded to each study is

Wi = 1 / (σ² / n + τ²),

and the (true) standard error of the summary mean turns out to be

SE(M) = √(σ² / (k n) + τ² / k).
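As a numerical sanity check on the algebra in Box 13.1, the closed-form variances can be compared against the generic inverse-variance formula V(M) = 1 / ΣWi. The values of n, k, σ, and τ² below are arbitrary.

```python
import math

n, k, sigma, tau2 = 50, 8, 4.0, 1.5  # arbitrary illustrative values
v_within = sigma**2 / n              # V(Yi) from Box 13.1

# Generic inverse-variance computation: V(M) = 1 / (sum of the k weights).
v_fixed_generic = 1.0 / sum(1.0 / v_within for _ in range(k))
v_random_generic = 1.0 / sum(1.0 / (v_within + tau2) for _ in range(k))

# Closed forms derived in Box 13.1.
v_fixed_closed = sigma**2 / (k * n)
v_random_closed = sigma**2 / (k * n) + tau2 / k

assert math.isclose(v_fixed_generic, v_fixed_closed)
assert math.isclose(v_random_generic, v_random_closed)
```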
THE NULL HYPOTHESIS

Often, after computing a summary effect, researchers perform a test of the null hypothesis. Under the fixed-effect model the null hypothesis being tested is that there is zero effect in every study. Under the random-effects model the null hypothesis being tested is that the mean effect is zero. Although some may treat these hypotheses as interchangeable, they are in fact different, and it is imperative to choose the test that is appropriate to the inference a researcher wishes to make.

WHICH MODEL SHOULD WE USE?

The selection of a computational model should be based on our expectation about whether or not the studies share a common effect size and on our goals in performing the analysis.

Fixed effect

It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations. For example, suppose that a pharmaceutical company will use a thousand patients to compare a drug versus placebo. Because the staff can work with only 100 patients at a time, the company will run a series of ten trials with 100 patients in each. The studies are identical in the sense that any variables which can have an impact on the outcome are the same across the ten studies. Specifically, the studies draw patients from a common pool, using the same researchers, dose, measure, and so on (we assume that there is no concern about practice effects for the researchers, nor about the different starting times of the various cohorts). All the studies are expected to share a common effect, and so the first condition is met. The goal of the analysis is to see if the drug works in the population from which the patients were drawn (and not to extrapolate to other populations), and so the second condition is met as well.
In this example the fixed-effect model is a plausible fit for the data and meets the goal of the researchers. It should be clear, however, that this situation is relatively rare. The vast majority of cases will more closely resemble those discussed immediately below.

Random effects

By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted on
the results, and therefore we should not assume a common effect size. In these cases the random-effects model is more easily justified than the fixed-effect model. Additionally, the goal of this analysis is usually to generalize to a range of scenarios. Therefore, if one did make the argument that all the studies used an identical, narrowly defined population, then it would not be possible to extrapolate from this population to others, and the utility of the analysis would be severely limited.

A caveat

There is one caveat to the above. If the number of studies is very small, then the estimate of the between-studies variance (τ²) will have poor precision. While the random-effects model is still the appropriate model, we lack the information needed to apply it correctly. In this case the reviewer may choose among several options, each of them problematic.

One option is to report the separate effects and not report a summary effect. The hope is that the reader will understand that we cannot draw conclusions about the effect size and its confidence interval. The problem is that some readers will revert to vote counting (see Chapter 28) and possibly reach an erroneous conclusion.

Another option is to perform a fixed-effect analysis. This approach would yield a descriptive analysis of the included studies, but would not allow us to make inferences about a wider population. The problem with this approach is that (a) we do want to make inferences about a wider population and (b) readers will make these inferences even if they are not warranted.

A third option is to take a Bayesian approach, where the estimate of τ² is based on data from outside of the current set of studies. This is probably the best option, but the problem is that relatively few researchers have expertise in Bayesian meta-analysis. Additionally, some researchers have a philosophical objection to this approach.
For a more general discussion of this issue see 'When does it make sense to perform a meta-analysis?' in Chapter 40.

MODEL SHOULD NOT BE BASED ON THE TEST FOR HETEROGENEITY

In the next chapter we will introduce a test of the null hypothesis that the between-studies variance is zero. This test is based on the amount of between-studies variance observed, relative to the amount we would expect if the studies actually shared a common effect size. Some have adopted the practice of starting with a fixed-effect model and then switching to a random-effects model if the test of homogeneity is statistically significant. This practice should be strongly discouraged, because the decision to use the random-effects model should be based on our understanding of whether or not all studies share a common effect size, and not on the outcome of a statistical test (especially since the test for heterogeneity often suffers from low power).
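For readers who want to see the mechanics behind this argument, here is a minimal sketch using the standard DerSimonian-Laird moment estimator of the between-studies variance (a common estimator built on the Q statistic; the notation and the hypothetical effect sizes and variances are ours, not this book's data). It also shows that when the variance estimate is zero, the random-effects summary coincides with the fixed-effect summary.

```python
def dl_tau2(effects, variances):
    """DerSimonian-Laird moment estimator of the between-studies variance,
    truncated at zero."""
    w = [1.0 / v for v in variances]
    m_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - m_fixed) ** 2 for wi, yi in zip(w, effects))  # Q statistic
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

def summary_effect(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean; tau2 = 0 gives the fixed-effect summary."""
    w = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

# Identical observed effects: Q = 0, the tau2 estimate truncates to zero,
# and the random-effects summary equals the fixed-effect summary.
effects, variances = [0.4, 0.4, 0.4], [0.05, 0.10, 0.02]
t2 = dl_tau2(effects, variances)
assert t2 == 0.0
assert summary_effect(effects, variances, t2) == summary_effect(effects, variances)
```

With widely scattered effects the estimate comes out positive, the weights flatten, and the two summaries diverge, which is exactly why letting a significance test pick the model is fragile: a nonsignificant Q (perhaps due to low power) can hide a τ² that materially changes the weights.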
If the study effect sizes are seen as having been sampled from a distribution of effect sizes, then the random-effects model, which reflects this idea, is the logical one to use. If the between-studies variance is substantial (and statistically significant) then the fixed-effect model is inappropriate. However, even if the between-studies variance does not meet the criterion for statistical significance (which may be due simply to low power), we should still take account of this variance when assigning weights. If T² turns out to be zero, then the random-effects analysis reduces to the fixed-effect analysis, and so there is no cost to using this model. On the other hand, if one has elected to use the fixed-effect model a priori but the test of homogeneity is statistically significant, then it would be important to revisit the assumptions that led to the selection of a fixed-effect model.

CONCLUDING REMARKS

Our discussion of differences between the fixed-effect model and the random-effects model focused largely on the computation of a summary effect and the confidence intervals for the summary effect. We did not address the implications of the dispersion itself. Under the fixed-effect model we assume that all dispersion in observed effects is due to sampling error, but under the random-effects model we allow that some of that dispersion reflects real differences in effect size across studies. In the chapters that follow we discuss methods to quantify that dispersion and to consider its substantive implications.

Although throughout this book we define a fixed-effect meta-analysis as assuming that every study has a common true effect size, some have argued that the fixed-effect method is valid without making this assumption. The point estimate of the effect in a fixed-effect meta-analysis is simply a weighted average and does not strictly require the assumption that all studies estimate the same thing.
For simplicity and clarity we adopt a definition of a fixed-effect meta-analysis that does assume homogeneity of effect.

SUMMARY POINTS

- A fixed-effect meta-analysis estimates a single effect that is assumed to be common to every study, while a random-effects meta-analysis estimates the mean of a distribution of effects.
- Study weights are more balanced under the random-effects model than under the fixed-effect model. Large studies are assigned less relative weight and small studies are assigned more relative weight as compared with the fixed-effect model.
- The standard error of the summary effect and (it follows) the confidence intervals for the summary effect are wider under the random-effects model than under the fixed-effect model.
- The selection of a model must be based on the question of which model fits the distribution of effect sizes, and takes account of the relevant source(s) of error. When studies are gathered from the published literature, the random-effects model is generally a more plausible match.
- The strategy of starting with a fixed-effect model and then moving to a random-effects model if the test for heterogeneity is significant is a mistake, and should be strongly discouraged.