Conducting a Power Analysis to Estimate Sample Size & Optimize Research Proposals Module 9 David Nussbaum, Ph.D. May 25, 2010
OUTLINE
1. What is Power; Why is it Important; How is it Used?
2. Power: A Priori and A Posteriori Versions
3. Logic of Null Hypothesis Significance Testing
4. Limitations of NHST
5. Significance Levels
6. Proper Interpretation of Significance Levels and Significant Results
7. Effect Sizes
8. Using G*Power to Calculate Necessary Sample Size for Different Statistics and Designs
What is Power; Why is it Important; How is it Used?
Analogy: the use of telescopes to observe stars.
We can see many stars with the naked eye and do not need a telescope.
For smaller and more distant stars, we need a telescope.
For very distant and small stars, we need more powerful telescopes.
What is Power; Why is it Important; How is it Used?
For more detailed views of distant stars and small planets, we need more powerful telescopes.
The costs and benefits of observing or missing an object can vary (e.g., a potential collision with Earth vs. a star in another galaxy).
Thus the required power of the telescope depends on the size of the object to be observed, how important it is to observe it, etc.
What is Power; Why is it Important; How is it Used?
In statistics, power reflects how likely it is that a researcher will be able to observe a significant effect in a particular study, given that the effect actually exists.
More powerful designs are afforded by:
Greater effect size (effect size is explained shortly).
Larger samples (in a sense, analogous to a thicker lens).
Less stringent significance levels (less concern with mistakenly declaring an effect significant when no true effect exists).
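The three levers listed above can be illustrated with a minimal sketch. It assumes a two-sided comparison of two equal-sized groups and uses a normal (z) approximation rather than the exact t distribution; the function name `approx_power` and all example values are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z test.

    d           -- standardized mean difference (assumed true effect size)
    n_per_group -- participants in each group (equal groups assumed)
    alpha       -- significance level
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value under H0
    shift = d * sqrt(n_per_group / 2)           # expected z when the true effect is d
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)  # rejection in the upper tail
    lower = STD_NORMAL.cdf(-z_crit - shift)     # rejection in the lower tail
    return upper + lower


# Greater effect size -> more power (n and alpha held fixed)
for d in (0.2, 0.5, 0.8):
    print(f"d={d}: power={approx_power(d, 64):.2f}")

# Larger samples -> more power (d and alpha held fixed)
for n in (20, 64, 128):
    print(f"n={n}: power={approx_power(0.5, n):.2f}")

# Less stringent alpha -> more power (d and n held fixed)
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha}: power={approx_power(0.5, 64, alpha):.2f}")
```

Each loop changes one input while holding the other two constant, so the printed power values rise within each loop.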
What is Power; Why is it Important; How is it Used?
There are two types of Power Analysis: A Priori and A Posteriori.
What is Power; Why is it Important; How is it Used?
A Priori: Before doing the study, one wants to know how many observations are needed to observe a significant effect, assuming that the groups really do differ or that the treatment really is effective. This is the principal use of power analysis.
This is important for both economic and ethical reasons. Granting agencies will not fund studying more participants than necessary. There is also the ethical issue of testing more participants than necessary to show an effect.
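As a rough illustration of an a priori calculation, the sketch below estimates the number of participants per group for a two-group comparison. It assumes a two-sided test, equal group sizes, and the normal approximation n = 2((z_alpha + z_power) / d)^2; the function name `n_per_group` and the default values (alpha = 0.05, power = 0.80) are assumptions for illustration, not values prescribed by the slides.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two-sided test, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d={d}: about {n_per_group(d)} participants per group")
```

Exact calculations based on the t distribution give slightly larger numbers than this approximation, so the result is best read as a lower bound for planning.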
What is Power; Why is it Important; How is it Used?
A Posteriori: After a study has been concluded, especially if only non-significant effects have been found, one wants to know how likely it was that a significant effect would be found, given the observed effect size, the sample size, and the predetermined critical significance level (also known as α, or alpha).
This information helps one interpret a non-significant result: if power was low and a non-significant result was found, the theory may be sound but there were too few observations to reject the Null Hypothesis. This leads to a discussion of the logic of NHST.
Logic of Null Hypothesis Significance Testing
p(D | H0) states that we evaluate the likelihood of the data (D) under the assumption (H0) that there is no relationship between the variables in question.
The typical case of NHST involves what we have described and what Kline (2005, pg. 36) calls the Reject-Support Context.
Thus a correct interpretation of a significant effect at the 0.05 level is: fewer than 5% of test statistics are further away from the mean of the sampling distribution under H0 than the one for the observed results (Kline, 2005).
LOGIC OF NULL HYPOTHESIS SIGNIFICANCE TESTING
1. Empirical data cannot logically prove a hypothesis, only support or challenge it, because one cannot control for and exclude all other possible explanations.
2. However, one can logically reject a Null Hypothesis, a statement that says that two or more variables are unrelated.
LOGIC OF NULL HYPOTHESIS SIGNIFICANCE TESTING
3. There are two types of Null Hypothesis:
A. No association exists between two or more variables.
B. No difference exists between two or more groups.
4. Formally, the Null Hypothesis is expressed as:
A. r = 0, or
B. μ1 = μ2
LOGIC OF NULL HYPOTHESIS SIGNIFICANCE TESTING
5. Formally, the Alternative Hypothesis is expressed as:
A. r ≠ 0, or
B. μ1 ≠ μ2
6. Data are then gathered, and statistics of association (e.g., r) or of between-group differences (e.g., t, F) are calculated to infer, under specific assumptions, the likelihood of the observed data given the Null Hypothesis, or formally: p(D | H0)
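The idea of evaluating the data under the Null Hypothesis can be sketched with a small simulation. The example assumes two groups of 30 drawn from the same normal population (so the Null Hypothesis is true by construction) and a hypothetical observed mean difference of 0.6 standard deviations; all names and values are made up for illustration.

```python
import random

random.seed(2010)  # fixed seed so the sketch is repeatable

N_PER_GROUP = 30
OBSERVED_DIFF = 0.6   # hypothetical observed difference between group means
N_SIMULATIONS = 5000


def null_difference(n):
    """Mean difference between two groups drawn from the SAME population (H0 true)."""
    group_1 = [random.gauss(0, 1) for _ in range(n)]
    group_2 = [random.gauss(0, 1) for _ in range(n)]
    return sum(group_1) / n - sum(group_2) / n


# Simulated sampling distribution of the mean difference under H0
null_diffs = [null_difference(N_PER_GROUP) for _ in range(N_SIMULATIONS)]

# Share of null results at least as far from zero as the observed one
p_value = sum(abs(diff) >= OBSERVED_DIFF for diff in null_diffs) / N_SIMULATIONS
print(f"Estimated p(D | H0) = {p_value:.3f}")
```

The printed proportion estimates how often a difference at least this large arises when the groups truly do not differ, which is the quantity that a significance level is compared against.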
Limitations of NHST: Significance Levels
NHST results are often misinterpreted. Significance levels do not tell us:
1. The probability that the result is due to sampling variability (referred to by others as sampling error), i.e., that the result happened by chance.
2. The probability that the Null Hypothesis (H0) is true, given the data. (The significance level is the probability of the data given H0.)
3. The likelihood that the decision to reject H0 is wrong. In an individual study, the decision is either correct or incorrect.
Limitations of NHST: Significance Levels
4. Nor do they tell us that rejecting H0 confirms the research hypothesis behind H1.
Further, and perhaps most important: p levels do NOT relate to the strength or magnitude of the association or differences under study. Thus they are especially vulnerable to yielding highly significant results with small effect sizes, because very large samples make even trivial empirical differences unlikely under the null!
Effect Sizes and Their Importance
Effect Size refers to:
The magnitude of the impact of the independent variable on the dependent variable (in experimental or between-groups designs), or
The degree of association or covariation between variables in non-experimental or correlational designs.
Effect Sizes and Their Importance
These have been referred to as:
1. Standardized Group Difference Indices
2. Relationship Indices
Effect Sizes and Their Importance
Both Standardized Group Difference Indices and Relationship Indices are metric-free effect sizes (Kline, 2005, pg. 97) that can be used to compare results across different studies and/or variables measured in different units.
Effect Sizes and Their Importance
Standardized Group Difference effect size indices are of the form:
d = (M1 - M2) / σ*
where:
d = the standardized mean difference
M1 - M2 = the contrast or difference between the means
σ* = the standardizer
Effect Sizes and Their Importance: Roles of Between Group Differences and Variability
Study   M1 - M2   σ*       d
1       75.00     100.00   0.75
2       11.25     15.00    0.75
3       75.00     500.00   0.15
4       75.00     50.00    1.50
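The d values for the four studies above can be reproduced directly from the mean differences and standardizers. This is a minimal sketch; the function name `standardized_difference` is illustrative only.

```python
# (study, M1 - M2, standardizer) taken from the four studies above
studies = [
    (1, 75.00, 100.00),
    (2, 11.25, 15.00),
    (3, 75.00, 500.00),
    (4, 75.00, 50.00),
]


def standardized_difference(mean_diff, standardizer):
    """d = (M1 - M2) / sigma*"""
    return mean_diff / standardizer


d_values = {study: standardized_difference(diff, sd) for study, diff, sd in studies}
for study, d in d_values.items():
    print(f"Study {study}: d = {d:.2f}")
```

Studies 1 and 2 have very different raw mean differences but the same d, while Studies 1, 3, and 4 share the same raw difference and differ in d only because their variability differs.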
Effect Sizes and Their Importance: Roles of Between Group Differences and Variability
For Analyses of Variance, a measure of effect size has been developed to reflect the observed proportion of explained variance. This is called the Estimated Eta Squared, symbolized as η̂²:
η̂² = σ²effect / σ²total
This estimates the proportion of total variance in the sample that is explained by the effect when all factors are fixed.
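A minimal sketch of the calculation for a one-way design follows. It assumes the effect and total variability are estimated in the sample by sums of squares (SS_effect / SS_total), which is one common way to compute estimated eta squared; the function name and the scores are made up for illustration.

```python
def eta_squared(groups):
    """Estimated eta squared for a one-way design: SS_effect / SS_total."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    # Total variability: squared deviations of every score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # Effect variability: squared deviations of group means, weighted by group size
    ss_effect = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_effect / ss_total


# Hypothetical scores for three groups (made up for illustration)
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"eta squared = {eta_squared(example_groups):.4f}")
```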
Effect Sizes and Their Importance: Conventions Regarding η̂²
For η̂² values:
< 0.06 are considered small effect sizes
0.06 to 0.14 are considered medium effect sizes
> 0.14 are considered large effect sizes
Note: η̂² can take advantage of chance associations in a sample and is consequently considered a biased estimator with small sample sizes.
Effect Sizes and Their Importance: Correlations
The conventional Pearson Correlation Coefficient (r) is a measure of association between two variables. One statistic reflecting the strength of association between the two variables is the r² statistic, which is obtained simply by squaring the calculated r to give the proportion of variance in one variable accounted for (or "explained") by the other variable.
Effect Sizes and Their Importance: Correlations
Thus an r value of 0.50 results in an r² value of 0.25, indicating that one of the variables accounts for 25% of the variability in the other variable. Some contend that this understates the value of some important correlational findings and prefer the unsquared r.
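The relationship between r and r² can be checked on a small data set. The sketch below computes Pearson's r from its definition and then squares it; the function name `pearson_r` and the five paired scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross_products = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross_products / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


# Hypothetical paired scores (made up for illustration)
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 1, 4, 3, 5]

r = pearson_r(x_scores, y_scores)
r_squared = r ** 2
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```

Because r is at most 1 in absolute value, r² is never larger than the absolute value of r, which is why squaring makes a correlation look smaller.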
Caveat: All Effect Size statistics involved in the Power Calculations that we have covered and will now illustrate refer only to group effects and do not have specific implications for individual applications. Thus they relate primarily to planning a study rather than stipulating that anticipated study results will be applicable to individual cases. Different statistics exist for Case Level purposes.
Test Statistics: Sample Size & Effect Size
Many test statistics can be expressed as the product of a function of the sample size and an effect size index:
Test Statistic = f(N) × Effect Size Index
Thus one can obtain a highly significant observed statistic (i.e., one unlikely under H0) simply by having a very large data set.
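One concrete instance of this product, sketched under stated assumptions: for two equal-sized groups and a z test, the statistic is z = sqrt(n / 2) × d, where n is the number of participants per group and d is the standardized mean difference. The effect size of 0.02 and the sample sizes below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(d, n_per_group):
    """Test statistic = f(N) x effect size, with f(N) = sqrt(n / 2)."""
    return sqrt(n_per_group / 2) * d


def two_sided_p(z):
    """Two-sided p value for a z statistic under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


TRIVIAL_D = 0.02  # hypothetical, very small standardized difference
results = {n: two_sided_p(z_statistic(TRIVIAL_D, n)) for n in (50, 5_000, 100_000)}
for n, p in results.items():
    print(f"n per group = {n}: z = {z_statistic(TRIVIAL_D, n):.2f}, p = {p:.4g}")
```

The effect size never changes across the three rows; only f(N) grows, and that alone moves the result from clearly non-significant to highly significant.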
Test Statistics: Sample Size & Effect Size
The corollary is that some stronger associations may not produce significant results under NHST with smaller samples.
We now turn to the use of shareware/software called G*Power to illustrate how (easily) Power Analyses can be used to help determine and justify the numbers of participants in proposed studies.