Chris Slaughter, DrPH. GI Research Conference June 19, 2008


1 Chris Slaughter, DrPH Assistant Professor, Department of Biostatistics Vanderbilt University School of Medicine GI Research Conference June 19, 2008

2 Outline Factors that Impact Power Conclusions and Advice

3 First question asked (and the last answered) How many subjects do I need in my study? If I enroll # subjects in a treatment and a control group, how likely am I to detect a significant difference between the two groups? The answer depends on: scientific goals; study design; analysis method; practical limitations (budget, time)

4 Expectations What to expect from a sample size calculation Estimate of the approximate number of subjects for a given study design Conduct early at design phase when changes still possible Opportunity to plan data analysis before collecting any data What not to expect High accuracy if inputs (informed guesses) are not accurate A quick answer Post-hoc power analysis

5 Hypothesis Testing Hypothesis: usually a statement to be judged, of the form population value = specified constant. Null hypothesis (H0): usually a hypothesis of no effect. H0 is often a straw man, something you hope to disprove. H0: µ1 − µ2 = 0. Alternative hypothesis (H1): H1: µ1 − µ2 ≠ 0. Sample size calculations require you to specify the alternative hypothesis too; e.g. H1: µ1 − µ2 = 10

6 Errors in Hypothesis Testing Type I error (α): probability of rejecting your null hypothesis when it is true. Declaring that a significant association exists between X and Y when, in truth, X and Y are not related. α = 0.05 or 0.01 usually. Type II error (β): probability of failing to reject your null hypothesis when it is false. Not finding a significant association between X and Y when, in truth, X and Y are related. Power = 1 − β. β = 0.20 or 0.10 usually

7 More Effect size: how large a difference you expect to see between groups (e.g. a treatment and a control group). Difference in means, difference in proportions, odds ratios, relative risk. What is a clinically relevant difference? Precision: absence of random error. Variable has nearly the same value when measured multiple times. High precision leads to decreased variability and higher power. Accuracy: degree to which a variable measures what it is supposed to measure. Increases validity of conclusions

8 Effect Size and Precision

9 Power increases when: you allow a larger type I error (α; there is a tradeoff between type I and II errors); a larger effect is observed; variability decreases; n increases (and n1/n2 = 1). Required sample size (n) decreases when: you allow a larger type I error; a larger effect is observed; variability decreases; you allow a larger type II error (lower power)
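The directions listed on this slide can be checked numerically. The sketch below is only an illustration: it uses a normal approximation to a two-sided, two-sample comparison of means, and the function name `approx_power` and all scenario values are assumptions made for this example, not figures from the talk.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sigma, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in means; sigma: common standard deviation.
    Normal approximation only; the very small chance of rejecting in
    the wrong direction is ignored.
    """
    std_normal = NormalDist()
    se = sigma * sqrt(1.0 / n1 + 1.0 / n2)          # SE of the difference in means
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return std_normal.cdf(abs(delta) / se - z_crit)


# Hypothetical baseline: difference of 10, SD of 10, 15 subjects per group.
baseline = approx_power(delta=10, sigma=10, n1=15, n2=15)

scenarios = {
    "baseline": baseline,
    "larger type I error (0.10)": approx_power(10, 10, 15, 15, alpha=0.10),
    "larger effect (12)": approx_power(12, 10, 15, 15),
    "less variability (SD 8)": approx_power(10, 8, 15, 15),
    "more subjects (25 per group)": approx_power(10, 10, 25, 25),
    "unbalanced (20 vs 10)": approx_power(10, 10, 20, 10),
}

for label, value in scenarios.items():
    print(f"{label:30s} power = {value:.3f}")
```

Every scenario except the unbalanced one should report higher power than the baseline; the unbalanced split uses the same total of 30 subjects but gives lower power, which is the point of the n1/n2 = 1 remark.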

10 Power versus

11 Types of outcomes and predictors Specific power calculation will depend on the analysis method. Continuous outcome, binary predictor: percent of time below pH 4 in a treatment and a control group; 2-sample t-test, Wilcoxon rank sum test. Binary outcome, binary predictor: any improvement (yes/no) in the steroid group compared to the steroid plus dilation group; dichotomize percent of time below pH 4; χ² test, test of proportions, odds ratio. Continuous outcome, continuous predictor: correlation, linear regression. Lots of other analysis options...

12 Calculation Methods Software: PS, web, others biostat.mc.vanderbilt.edu/powersamplesize #1 and #4 on google search Formulas Repeated measures Unusual designs Simulation Study-specific

13 Example: Percent of time below pH 4 Continuous outcome, binary predictor. Treatment group spends 40% of time below pH 4. Control group spends 50% of time below pH 4. Standard deviation of 10%
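A rough version of this calculation can be sketched with the usual normal-approximation formula n = 2(z_{1−α/2} + z_{1−β})² σ² / δ² per group. The slide does not state α or power; the two-sided α = 0.05 and 90% power used below are assumptions, and the function name `n_per_group_means` is made up for this sketch. Dedicated software such as PS bases the calculation on the t distribution, so its answer can differ slightly.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sigma, alpha=0.05, power=0.90):
    """Subjects per group to compare two means (normal approximation).

    delta: difference in means to detect; sigma: common standard deviation.
    alpha and power defaults are assumptions, not values from the slides.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided type I error
    z_beta = std_normal.inv_cdf(power)               # power = 1 - beta
    return ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Slide inputs: 40% vs 50% of time below pH 4, standard deviation 10%.
n_continuous = n_per_group_means(delta=10, sigma=10)
print(n_continuous)  # 22 under the assumed alpha and power
```

Halving the detectable difference to 5 roughly quadruples the requirement, since n scales with (σ/δ)².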

14 Example: Dichotomize percent of time below pH 4 Binary outcome, binary predictor. Abnormal if more than half of time is spent below pH 4. Treatment group: 16% abnormal. Control group: 50% abnormal. Standard deviation determined by above percentages
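For two proportions, one common textbook approximation uses a pooled variance under the null hypothesis and separate variances under the alternative. The sketch below applies it to the slide's 16% and 50%. As before, two-sided α = 0.05 and 90% power are assumptions, and `n_per_group_props` is a name invented for this example. Software packages use a variety of methods (exact tests, continuity corrections), so the result should land close to, but not necessarily on, the number they report.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_props(p1, p2, alpha=0.05, power=0.90):
    """Subjects per group to compare two proportions (normal approximation).

    Pooled variance under the null, unpooled under the alternative.
    alpha and power defaults are assumptions, not values from the slides.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    null_sd = sqrt(2.0 * p_bar * (1.0 - p_bar))          # variability under H0
    alt_sd = sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))     # variability under H1
    return ceil(((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p2)) ** 2)


# Slide inputs: 16% abnormal on treatment vs 50% abnormal on control.
n_binary = n_per_group_props(0.16, 0.50)
print(n_binary)
```

Note that no separate standard deviation is supplied: for a binary outcome the variability is fixed by the proportions themselves, p(1 − p).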

15 Comparison of binary and continuous outcomes Same data assumptions. Treatment: 40% of time below pH 4 (σ = 10%) would give 16% abnormal. Control: 50% of time below pH 4 (σ = 10%) would give 50% abnormal. Number of subjects needed (each group): continuous outcome, 22 subjects; binary outcome, 38 subjects. Need an estimate of the variability for continuous outcomes. For binary outcomes, variability is largest for p = 50%
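The link between the continuous and binary versions of the example can be verified directly. Assuming, as the "same data assumptions" wording suggests, that percent of time below pH 4 is normally distributed in each group, the proportion classified abnormal is the area above the 50% cutoff. The variable names are chosen for this sketch only.

```python
from statistics import NormalDist

CUTOFF = 50.0  # abnormal if more than half of time is spent below pH 4

# Normality of the outcome in each group is an assumption of this sketch.
treatment = NormalDist(mu=40.0, sigma=10.0)
control = NormalDist(mu=50.0, sigma=10.0)

p_abnormal_treatment = 1.0 - treatment.cdf(CUTOFF)  # area above the cutoff
p_abnormal_control = 1.0 - control.cdf(CUTOFF)

print(f"treatment: {p_abnormal_treatment:.1%} abnormal")
print(f"control:   {p_abnormal_control:.1%} abnormal")
```

The treatment cutoff sits one standard deviation above the mean, which leaves about 16% in the upper tail; the control cutoff sits at the mean, giving 50%. The data are the same in both analyses, but dichotomizing discards information, which is why the binary analysis needs more subjects.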

16 and Goal: Plan a study so that the margin of error is sufficiently small The margin of error is defined to be half of the confidence interval width Basing the sample size calculations on the margin of error can lead to a study that gives scientifically relevant results even if the results are not statistically significant.

17 Example Infection rate in a population is 50% and a reduction to 40% is believed to be clinically significant. Enroll enough subjects so that the margin of error is 5%. Consider these two possible outcomes: 1 The new treatment is found to decrease infections by 6% (95% CI: [−11%, −1%]). P-value < 0.05 ("significant") 2 The new treatment decreases infections by only 4% (95% CI: [−9%, 1%]). P-value > 0.05 ("not significant")
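To see what "enough subjects" might mean here, one can solve the margin-of-error equation for n. The sketch below assumes two equal-sized groups, a simple Wald interval for a difference in proportions, and planning values of 50% and 40%; none of these choices is spelled out on the slide, and `n_for_margin` is a name invented for this illustration.

```python
from math import ceil
from statistics import NormalDist


def n_for_margin(p1, p2, margin, conf_level=0.95):
    """Subjects per group so that the Wald CI for p1 - p2 has the given half-width.

    Solves margin = z * sqrt(p1(1-p1)/n + p2(1-p2)/n) for n.
    Equal group sizes and the Wald interval are assumptions of this sketch.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil(z ** 2 * variance_sum / margin ** 2)


n_precision = n_for_margin(p1=0.50, p2=0.40, margin=0.05)
print(n_precision)  # 753 per group under these assumptions

# With a 5-point margin, each outcome on the slide is the estimate +/- 5 points.
for estimate in (-6, -4):
    print(f"estimate {estimate}%: CI [{estimate - 5}%, {estimate + 5}%]")
```

Both intervals are equally narrow, so both outcomes are informative: the second rules out any clinically meaningful worsening even though its P-value exceeds 0.05.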

18 Advantages Advantages of planning for precision rather than power 1 Many studies are powered to detect a miracle and nothing less; if a miracle doesn't happen, the study provides no information. Planning on the basis of precision will allow the resulting study to be interpreted if the P-value is large, because the confidence interval will not be so wide as to include both clinically significant improvement and clinically significant worsening. See Borenstein M: J Clin Epi 1994; 47:

19 Using Correlation (r) to Compute Continuous outcomes, continuous predictors. Without knowledge of population variances, etc., r can be useful for planning studies. Choose n so that the margin for error (half-width of C.L.) for r is acceptable. Precision of r in estimating ρ is generally worst when the population correlation is 0. This margin for error is shown in the following figure

20 Using Correlation (r) to Compute Margin for error (length of longer side of asymmetric 0.95 confidence interval) for r in estimating ρ, when ρ = 0 (solid line) and ρ = 0.5 (dotted line). Calculations are based on Fisher's z transformation of r.
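The quantity plotted in the figure can be approximated with a few lines of code. The sketch below uses Fisher's z transformation, atanh(r), with standard error 1/sqrt(n − 3), and evaluates the interval at r equal to the planning value of ρ. That last step, the function name `r_margin`, and the sample sizes tried are assumptions made for this illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def r_margin(rho, n, conf_level=0.95):
    """Longer side of the confidence interval for a correlation.

    Builds the interval on Fisher's z scale (atanh), back-transforms with
    tanh, and returns the larger distance from rho to either limit.
    Evaluating at r = rho is a planning assumption of this sketch.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    half_width_z = z / sqrt(n - 3)  # standard error of atanh(r) is 1/sqrt(n - 3)
    lower = tanh(atanh(rho) - half_width_z)
    upper = tanh(atanh(rho) + half_width_z)
    return max(rho - lower, upper - rho)


for n in (20, 50, 100, 200):
    print(f"n = {n:3d}  margin at rho=0: {r_margin(0.0, n):.3f}"
          f"  at rho=0.5: {r_margin(0.5, n):.3f}")
```

At every sample size the margin at ρ = 0 should exceed the margin at ρ = 0.5, consistent with the statement that precision is generally worst when the population correlation is 0.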

21 Other considerations Other factors that can impact required sample size Dropouts (missing data) Correlation: Paired observations or repeated measures Multiple testing and interim analyses Equivalence testing Better analysis options
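Of the factors above, dropout is the simplest to adjust for: divide the number of subjects needed for analysis by the expected retention fraction. The 15% dropout rate and the function name `inflate_for_dropout` below are hypothetical, chosen only to illustrate the arithmetic; this simple adjustment also assumes dropouts contribute no usable data.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Subjects to enroll so that about n_required remain after dropout.

    dropout_rate is the expected fraction lost (0 <= dropout_rate < 1).
    """
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n_required / (1.0 - dropout_rate))


# Hypothetical: 22 analyzable subjects per group, 15% expected dropout.
n_enroll = inflate_for_dropout(22, 0.15)
print(n_enroll)  # 26
```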

22 Bad Ideas Do not... Use retrospective power calculations Calculate standardized effect sizes (Cohen) Standardize measure: small, medium, and large effects Ignores important parts of study planning, science

23 Good Ideas Do... Use power calculations prospectively to plan future studies Put science before statistics Design your study to meet scientific goals Clinically important effect sizes Statistics help identify a plan that is effective in meeting scientific goals not the other way around Conduct pilot studies Useful for estimating variance Use continuous variables when possible
