How many subjects? The importance of sample size calculations
Office of Research Protections Brown Bag Series
KB Boomer, Ph.D.
Director, boomer@stat.psu.edu
March 30, 2006

Consider a study in which
- A researcher conducts an experiment comparing two new methods with a well established method.
- Eight subjects are randomly assigned to one of the three methods.
- The data analysis reveals a p-value of 0.08 for the effect of method.
- What now?

An insignificant effect: two possibilities
- There may truly be no effect.
- There may truly be an effect.
- If this experiment is repeated, what is the probability of detecting a significant effect, given that it truly exists? POWER

- What is the power of this experiment?
- Based on these results, how many subjects are required to increase the probability of detecting a significant effect in the future?

Overview
- Definition of power and related terms
- Estimating parameters needed in power calculations
- Overview of available software
Power is a critical design component
- Level of significance: alpha
  - Power increases as alpha increases
- Effect size: What is a meaningful change in the response?
  - Power increases as effect size increases
- How many subjects are required?
  - Power increases as sample size increases

Power as defined by hypothesis testing

                        Decision
  State of Nature       Fail to reject Null    Reject Null
  Null is True          Correct                Type I Error (α)
  Alternative is True   Type II Error (β)      Correct (power)

  Typical values: α = 0.05, 1 − β = 0.8 to 0.9

Relationship between Power and Alpha
[Figure: β and the critical value]
- As alpha increases, power increases

Relationship between Power and Effect Size
- Effect size quantifies what we are hoping to detect => change in treatment means, group proportions, etc.
  - If $\bar{x}_A = 80$ and $\bar{x}_B = 100$, then the difference = 20
  - Standardize to remove units: (difference) divided by standard deviation
- Researcher determines effect size
  - What change would be of scientific interest?
  - Statistical significance doesn't imply practical significance
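The relationships above (power rises with alpha, with effect size, and with sample size) can be checked numerically. The following is a minimal sketch, assuming a one-sided z-test with known standard deviation, which is a simplification not taken from the slides. The function name `one_sided_z_power` and all input values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_power(mu0, mu1, sigma, n, alpha=0.05):
    """Critical value, beta, and power for H0: mean = mu0 vs H1: mean = mu1 (mu1 > mu0).

    Assumes a one-sided z-test with known sigma (an illustrative simplification).
    """
    se = sigma / sqrt(n)                           # standard error of the sample mean
    crit = NormalDist(mu0, se).inv_cdf(1 - alpha)  # reject H0 when the sample mean exceeds this
    beta = NormalDist(mu1, se).cdf(crit)           # Type II error: H1 true, fail to reject
    return crit, beta, 1 - beta


# Hypothetical baseline: null mean 0, alternative mean 1, sigma 1.5, n = 8.
for alpha in (0.01, 0.05, 0.10):
    power = one_sided_z_power(0, 1, 1.5, 8, alpha)[2]
    print(f"alpha={alpha:.2f}  power={power:.3f}")

for mu1 in (0.5, 1.0, 1.5):
    power = one_sided_z_power(0, mu1, 1.5, 8)[2]
    print(f"standardized effect size={mu1 / 1.5:.2f}  power={power:.3f}")

for n in (8, 16, 32):
    power = one_sided_z_power(0, 1, 1.5, n)[2]
    print(f"n={n}  power={power:.3f}")
```

Each loop changes one input while holding the others fixed, so the printed power values should increase down each group.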
Relationship between Power and Effect Size
[Figure: β]
- As the standardized difference between the null and alternative means increases, power increases

Relationship between Power and Sample Size
- Sample size, n, is used to estimate the reliability of our statistics:
  $s^2 = \frac{\sum_i (y_i - \bar{y})^2}{n - 1}$
- When creating a confidence interval, we use a standard error:
  $SE_{\bar{X}} = \sqrt{s^2 / n}$
- As the sample size increases, these measures of variability decrease => more confident in our results

Relationship between Power and Sample Size
[Figure: panels for σ = 0.9 and σ = 1.5]
- As variance decreases, beta (green) shrinks and power increases

What questions will a power analysis answer?
- We can estimate power, effect size, or sample size; given any two, the third can be calculated
  1. Experiment detected ES = 0.45 with n = 20 subjects; what is the power?
  2. If we have 20 subjects and a power of 0.85, what ES can we detect?
  3. An ES of 0.45 would be of scientific interest and we desire a power of 0.85; how many subjects are required?
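The three questions above can be sketched in a few lines. This is only an illustration under stated assumptions: a two-sided test at alpha = 0.05 on a single mean, a normal approximation in place of the t distribution, and the negligible far tail ignored. The slides do not specify the test, and the function names are hypothetical. Dedicated software will give somewhat different (usually slightly more conservative) answers.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power(es, n, alpha=0.05):
    """Approximate power for standardized effect size `es` with `n` subjects."""
    return Z.cdf(es * sqrt(n) - Z.inv_cdf(1 - alpha / 2))


def detectable_es(n, target_power, alpha=0.05):
    """Smallest standardized effect size detectable with `n` subjects at `target_power`."""
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target_power)) / sqrt(n)


def required_n(es, target_power, alpha=0.05):
    """Number of subjects needed to detect `es` with `target_power`, rounded up."""
    return ceil(((Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(target_power)) / es) ** 2)


print(round(power(0.45, 20), 2))          # question 1 -> 0.52
print(round(detectable_es(20, 0.85), 2))  # question 2 -> 0.67
print(required_n(0.45, 0.85))             # question 3 -> 45
```

All three functions rearrange the same relationship, which is why fixing any two of power, effect size, and sample size determines the third.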
The next step is to estimate values needed to conduct a power analysis.

Calculating power
- Power calculations are part art, part science
- Catch-22: Formulae need means, population variance. But if we knew these values, we wouldn't need to do the study!
- Remember that sample size calculations require estimates and assumptions. While these need to be close to the true values, they do not need to be perfect.

Estimating the effect size
- Consider an ANOVA with three methods (treatment).
  $ES = \frac{\bar{x}_{\max} - \bar{x}_{\min}}{\sigma}$
  1. Use means from previous studies
  2. Estimate the mean of each new method.
  3. Estimate what will be the largest mean and the smallest mean.
  4. Estimate what percent change in the means will be of interest. For example, will a 15% difference be scientifically significant?

Estimate the population standard deviation
  1. Use values from previous studies, from the control method
  2. From previous work, estimate the magnitude of the variance
  3. Consider what the maximum and minimum variance values may be, and use the average of these two values
  4. Consider the possible range of data values, and estimate
     $\sigma = \frac{\text{range of values}}{4}$
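As a small worked example of the two estimates above, the sketch below computes σ from a plausible data range and then the effect size from the largest and smallest expected means. All numbers are hypothetical (a control mean of 80, a largest mean 15% higher, and data expected to fall between 60 and 120); the function names are illustrative only.

```python
def sd_from_range(low, high):
    """Rule-of-thumb standard deviation: range of plausible values divided by 4."""
    return (high - low) / 4


def effect_size(means, sigma):
    """ES = (largest mean - smallest mean) / sigma."""
    return (max(means) - min(means)) / sigma


# Hypothetical inputs: control mean 80, new-method means 86 and 92
# (92 is a 15% increase over the control mean).
expected_means = [80, 86, 92]
sigma = sd_from_range(60, 120)            # (120 - 60) / 4 = 15.0
es = effect_size(expected_means, sigma)   # (92 - 80) / 15 = 0.8

print(sigma, es)
```

Rerunning this with a pessimistic and an optimistic range shows how sensitive the effect size is to the assumed standard deviation.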
Standard Effect Sizes
- Cohen has suggested standard effect sizes

              ANOVA   Correlation   Regression
  Small       0.10    0.10          0.02
  Medium      0.25    0.30          0.15
  Large       0.40    0.50          0.35

- Potentially misleading
  - Does not incorporate knowledge about the specific study parameters
  - Cohen urges caution; recommends using only when study specific values are not available

Conducting the analysis: choice of software
- Minitab: Basic tests (t-tests, one proportion, one way ANOVA)
- GPower: Basic tests, two way ANOVA, multiple regression, chi-square
- SAS: Basic tests, two and higher order ANOVA
- PASS: Most extensive: repeated measure, random effects ANOVA, logistic regression, multiple regression, survival analysis

Two sample t-test in Minitab
- ES entered as differences and standard deviation
- Enter multiple values

  2-Sample t Test
  Alpha = 0.05  Assumed standard deviation = 2

  Difference   Sample Size   Target Power   Actual Power
  2.5          12            0.80           0.83
  2.5          13            0.85           0.86
  2.5          15            0.90           0.91
  3.0           9            0.80           0.85
  3.0          10            0.85           0.89
  3.0          11            0.90           0.92

  The sample size is for each group.

G-Power
- A priori, post-hoc, and compromise power
- More options

G-Power
- A priori: before conducting the experiment
- Post-hoc: after data analysis; lower than a priori power. Test shows insignificant result; what was the power?
- Compromise: when N is really large or really small (Cohen). Based on the researcher's view of whether a Type I or a Type II error is more serious.
- Accuracy versus speed: The speed option is fast but inaccurate, and the accuracy option is very accurate (up to five significant digits at least). The accuracy option may take a little longer to compute, but usually only a couple of seconds.
- Download from: www.psycho.uni-duesseldorf.de/aap/projects/gpower/
One way ANOVA in G-Power
- Click Tests -> F-test (ANOVA)
- Calculate effect size in another window
- Nice option: graphs

One way ANOVA in G-Power
- Cohen's effects

G-Power: Calculate effects

G-Power: Specify graphs
G-Power: Power curves
[Figure: power curves; y-axis: Power (0.65 to 1.00), x-axis: Total Sample Size (20 to 70)]

Power Analyses in SAS
- Several procedures
  - UnifyPow Macro -> Proc Power (v9.1): Many procedures
  - Proc GLMPower: Calculates interactions in two-way and higher ANOVA models.

SAS Proc GLMPower code

    /* Expected cell means for each gender x condition x level combination */
    data one;
      input gender $ condition $ level mean @@;
      datalines;
    m A 1 6.5  m A 2 4.8  m B 1 7.0  m B 2 5.5
    f A 1 4.5  f A 2 3.6  f B 1 4.8  f B 2 5.5
    ;
    run;

    proc glmpower data=one;
      class gender condition level;
      /* Enter a valid GLM model */
      model mean = gender condition level gender*level;
      /* ntotal = . asks SAS to solve for the total sample size */
      power alpha  = 0.05
            stddev = 2
            power  = 0.80 0.90
            ntotal = .;
    run;

SAS Proc GLMPower Output

  The GLMPOWER Procedure
  Dependent Variable: mean
  Alpha: 0.05
  Error Standard Deviation: 2

  Computed N Total
  Source         Nominal Power   Test DF   Error DF   Actual Power   N Total
  gender         0.8             1          67        0.806           72
  gender         0.9             1          91        0.905           96
  condition      0.8             1         171        0.800          176
  condition      0.9             1         235        0.906          240
  level          0.8             1         171        0.800          176
  level          0.9             1         235        0.906          240
  gender*level   0.8             1         227        0.812          232
  gender*level   0.9             1         299        0.903          304
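To see roughly where the main-effect sample sizes in the output above come from, the cell means can be averaged into marginal means and fed into a two-group normal approximation. This is a rough cross-check in Python under stated assumptions, not a reproduction of what Proc GLMPower does: it replaces the F-test with a normal approximation, treats each main effect as a balanced two-group comparison, and does not handle the gender*level interaction. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist, mean

# Cell means from the SAS data step: (gender, condition, level) -> expected mean
cells = {
    ("m", "A", 1): 6.5, ("m", "A", 2): 4.8, ("m", "B", 1): 7.0, ("m", "B", 2): 5.5,
    ("f", "A", 1): 4.5, ("f", "A", 2): 3.6, ("f", "B", 1): 4.8, ("f", "B", 2): 5.5,
}
SIGMA = 2.0  # error standard deviation from the power statement


def marginal_difference(factor_index):
    """Difference between the two marginal means of one factor (0=gender, 1=condition, 2=level)."""
    groups = {}
    for key, value in cells.items():
        groups.setdefault(key[factor_index], []).append(value)
    level_means = [mean(values) for values in groups.values()]
    return max(level_means) - min(level_means)


def approx_total_n(difference, sigma, target_power, alpha=0.05):
    """Total N for a balanced two-group comparison, normal approximation (an assumption)."""
    z = NormalDist()
    d = difference / sigma  # standardized difference
    per_group = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(target_power)) / d) ** 2
    return 2 * ceil(per_group)


for name, index in (("gender", 0), ("condition", 1), ("level", 2)):
    diff = marginal_difference(index)
    print(name, round(diff, 2), approx_total_n(diff, SIGMA, 0.80))
```

At a power of 0.80 this gives a total of 70 for gender and 174 for condition and level, slightly below the 72 and 176 reported by SAS, which is the direction one would expect when a normal approximation stands in for the F-test.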
In Summary
- Proper planning of a study, including a solid power analysis, is an essential step of a good research study
- Run the analyses several times, with varying input parameters
- Remember that you need good estimates, not perfect ones
- One advantage of being a statistician is that we only need to be right 95% of the time

References
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. New Jersey: Lawrence Erlbaum Associates.
- Faul, F. and Erdfelder, E. (1992). GPOWER: A priori, post-hoc and compromise power analysis for MS-DOS [Computer program]. Bonn, FRG: Bonn University, Dept. of Psychology.

SCC Workshops (Fall 2006)
www.stat.psu.edu/~scc

Workshop Name
- SAS Data Management
- SAS Introduction to Procedures
- Overview of Minitab, SPSS (Regression, ANOVA, ANCOVA)
- EDA, Proc Summary
- GLM vs. Mixed
- Categorical Data Analysis
- Power Analysis

Dates
- 1) September 12th  2) October 10
- 1) September 19th & 21st  2) October 17th & 19th
- 1) September 12th & 14th  2) October 10th & 12th
- October 6th
- October 6th
- October 7th
- October 14th

More information on our web site after 8/1/06