Sample size and power analysis

"You should have at least 80% power of your study" (Anonymous)

Sample size conversation

Two scenarios

A researcher conducted a study comparing the effect of an intervention vs. placebo on reducing body weight, and found a 5 lb reduction in the intervention group with P=0.01.

Another researcher conducted a similar study comparing the effect of the same intervention vs. the same placebo on reducing body weight, and found the same 5 lb reduction in the intervention group, but could not claim that the intervention was effective because P=0.35.

What do you think the crying researcher did differently from the smiling one?

Source:

Question (1) to Statistician

Question: How can I make my P-value smaller?

Enroll as many as you can.

Answer (1) to Researcher

You almost always need to estimate a required sample size, or estimate the analytical power for a given sample size, when you are planning a study. The only exception may be a pilot study (a smaller study to show feasibility, or to collect data to plan a larger study). Through this process, you can avoid wasting your efforts and resources conducting studies that are hopeless to begin with.

Question (2) to Statistician

Question: Can I keep enrolling participants into my study until I observe P<0.05?

Answer (2) to Researcher

Absolutely NOT.

Question (3) to Statistician

"If only I had a cent for every time I was asked 'How many participants do I need for my study?'"

Answer (3) to Researcher

"The purpose of sample size formulae is not to give an exact number but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection, and to give an estimate to distinguish whether tens, hundreds, or thousands of participants are required."

It does not seem an easy question, like "How much money should I take on my holidays?"

Williamson et al. (2000) JRSSA 163(1): 5-13

Statistics lecture

"There is no such thing as a sample size problem. Sample size is but one aspect of study design. When you are asked to help determine the sample size, a lot of questions must be asked and answered before you get to that one. You may often end up never discussing sample size because there are other matters that override it in importance."

Russell Lenth (2001)

Sample size depends not only on the desired power but also on the true variability in the population and on the specification of a practically significant effect size.

Question (3) to Statistician

Question: How to play with these terms?
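Returning to Question (2): a small simulation can show why enrolling until P<0.05 is not allowed. The sketch below is illustrative only and is not part of the original slides. It assumes a one-sample z-test with known SD = 1, a true effect of zero, and a hypothetical schedule of one look after every 10 participants up to 100.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_max, looks, n_sims=2000, alpha=0.05, seed=1):
    """Share of null-effect studies that reach P < alpha at any planned look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # true mean is 0, so H0 is true
            if n in looks:
                z = total / n ** 0.5  # z statistic for mean = 0 with SD = 1
                if abs(z) > z_crit:
                    hits += 1  # the study would stop and claim an effect
                    break
    return hits / n_sims


# One analysis at the planned sample size vs. peeking every 10 participants.
fixed = false_positive_rate(100, looks={100})
peeking = false_positive_rate(100, looks=set(range(10, 101, 10)))
print(f"one look at n=100:          {fixed:.3f}")
print(f"look every 10 participants: {peeking:.3f}")
```

With a single planned analysis the false positive rate stays near the nominal 5%, while repeated looks push it well above that, even though no effect exists.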

Session Start: 1 hour session

Ingredients: significance level (α), power (1-β), effect size (Δ)

Type I error: The probability of erroneously rejecting H0. (Conclude that there is an effect, when in fact there is no effect.)

Type II error: The probability of erroneously failing to reject H0. (Conclude that there is no effect, when in fact there is an effect.)

Power: The chance of correctly identifying H1. (Conclude that there is an effect, when in fact there is an effect.)

Effect: Significant difference in body weight between intervention and placebo groups

Significance level (α)

First type of error: Conclude that there is an effect, when in fact there is no effect.

α, Significance level

The level of your test is the probability that you will falsely conclude that the program has an effect, when in fact it does not. So with a level of 5%, if the program truly has no effect, there is only a 5% chance that you will wrongly conclude that it does.

For policy purposes, you want to be very confident in the answer you give: the level will be set fairly low.

Common levels of α: 5%, 10%, 1%.

Purpose of power analysis

1-β, Power

Power analyses need to be conducted to ensure an adequate sample size to detect a meaningful effect of your intervention.

Interpretation

A power of 80% tells us that, in 80% of the experiments of this sample size conducted in this population, if there is indeed an effect in the population, we will be able to say in our sample that there is an effect with the level of confidence desired.

The larger the sample, the larger the power.

Common power used: 80%, 90%

Visual concept of power

[Figure: null distribution and clinically relevant alternative distribution, with the rejection region marked and the power shaded in yellow]
- Null distribution: difference = 0.
- Clinically relevant alternative: difference = 10%.
- Rejection region: any value >= 6.5 (0 + 3.3*1.96). For a 5% significance level, the one-tail area is 2.5% (Z α/2 = 1.96).
- Power = chance of being in the rejection region if the alternative is true = area to the right of this line (in yellow).

Power here: P(Z > (6.5 - 10)/3.3) = P(Z > -1.06) = 85%

Is power analysis always needed?

Needed when:
- Designing a study
- Applying for a grant

Less needed when:
- Secondary data analysis
- Pilot study to assess effect

A priori power analysis

You want to find how many cases you will need to have a specified amount of power, given:
- a specified effect size
- the criterion of significance to be employed

A posteriori power analysis

You want to find out what the power would be for:
- a specified effect size
- sample size
- the criterion of significance to be employed
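The numbers in the visual concept of power (standard error 3.3, alternative difference 10, rejection cutoff about 6.5, power about 85%) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. It assumes the test statistic is normally distributed, and the variable names are illustrative.

```python
from statistics import NormalDist

se = 3.3            # standard error of the difference (from the slide)
alternative = 10.0  # clinically relevant difference under the alternative

# Cutoff for a 5% two-sided test: 0 + 3.3 * 1.96, roughly 6.5
z_crit = NormalDist().inv_cdf(0.975)
cutoff = 0 + se * z_crit

# Power = area of the alternative distribution to the right of the cutoff
power = 1 - NormalDist(mu=alternative, sigma=se).cdf(cutoff)

print(f"cutoff = {cutoff:.2f}")
print(f"power  = {power:.1%}")
```

The cutoff is computed from the unrounded critical value, so the power comes out slightly above the 85% obtained with the rounded cutoff of 6.5.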

Effect size, Δ

Effect sizes
- A descriptive metric that characterizes the standardized difference (in SD units) between the mean of a control group and the mean of a treatment group (intervention)
- Can also be calculated from correlational data derived from pre-experimental designs or from repeated measures designs

Sources of finding effect size

On the basis of previous research
- Meta-analysis: Reviewing the previous literature and calculating the previously observed effect size (in the same and/or similar situations)

Pilot study
- When no prior studies exist from which one can extrapolate an ES, it is often appropriate to conduct a small study with participants in order to get an initial estimate of the effect size

On the basis of theoretical importance
- Deciding whether a small, medium or large effect is required.

Unstandardized effect size

[Figure: USA vs. GB comparison, D = 0.01 s]

Smallest size that would be clinically meaningful.

Standardized effect size

The standard deviation captures the variability in the outcome. The more variability, the higher the standard deviation is.

The standardized effect size is the effect size divided by the standard deviation of the outcome:

ES = effect size / standard deviation

Zero effect size

[Figure: overlapping distributions of the control group and the intervention group, ES = 0.00]

ES = 0.00 means that the average treatment participant outperformed 50% of the control participants.
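The "outperformed X% of the control participants" reading of a standardized effect size can be computed directly. The sketch below assumes both groups are normally distributed with the same standard deviation. Under that assumption, the share of controls below the average treated participant is the standard normal CDF evaluated at the effect size. The function name is illustrative.

```python
from statistics import NormalDist


def percent_of_controls_outperformed(es):
    """Percent of the control group scoring below the average treated participant.

    Assumes normal outcomes with equal SD in both groups; `es` is the
    standardized effect size (mean difference divided by the SD).
    """
    return 100 * NormalDist().cdf(es)


for es in (0.00, 0.40, 0.85):
    pct = percent_of_controls_outperformed(es)
    print(f"ES = {es:.2f}: {pct:.1f}% of control participants")
```

A zero effect size corresponds to 50%, and the moderate (0.40) and large (0.85) effect sizes correspond to roughly 65% and 80%.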

Moderate effect size

[Figure: control group and treatment group distributions, ES = 0.40]

ES = 0.40 means that the average treatment participant outperformed 65% of the control participants.

Large effect size

[Figure: control group and intervention condition distributions, ES = 0.85]

ES = 0.85 means that the average treatment participant outperformed 80% of the control participants.

Other considerations:
- Attrition rate
- Study design
- Measurement of outcome

Attrition rate
- If the study is longitudinal or an intervention study, the sample size needs to be adjusted by the attrition rate
- Get attrition estimates from pilot studies or from the literature on studies in the same population
- Default estimate would be 20%
- Do the power calculation and then adjust the sample size
- Final N = (N from power estimate)/(1 - attrition rate)
- Example: 20% attrition rate. Power analysis yields a total sample size of 100. Targeted N = ???

Study design
- Different designs have different power distributions and considerations
- A regression-type design is different from a 2 x 2 ANOVA
- Longitudinal vs. cross-sectional designs
- For some designs it is harder to find power programs than for others:
  - Longitudinal
  - Nested/clustered designs
  - Dichotomous and categorical outcomes
- Keep in mind the aim of the study and not just the design

Measurement of outcome
- Level of measurement of the outcome can have some influence on power estimates
- Differences in means
  - Ex: Intervention study looking at differences in depression using CESD
- Differences in proportions
  - Ex: Intervention study looking at differences in depression dx
- Power is done for the primary outcome
- If there are several important outcomes, conduct power analysis for all and select the sample size so that power is at least .80 for all outcomes
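The attrition adjustment, Final N = (N from power estimate)/(1 - attrition rate), is simple enough to script. The helper below is a sketch with an illustrative name. It takes the attrition rate as a whole-number percentage, an assumption made here so that the rounding up is exact.

```python
def adjust_for_attrition(n_required, attrition_percent):
    """Inflate a power-based sample size to allow for expected dropout.

    n_required: total N from the power calculation
    attrition_percent: expected attrition as a whole-number percentage (0-99)
    """
    if not 0 <= attrition_percent < 100:
        raise ValueError("attrition_percent must be between 0 and 99")
    retained_percent = 100 - attrition_percent
    # Integer ceiling of n_required / (1 - attrition rate)
    return (n_required * 100 + retained_percent - 1) // retained_percent


# Slide example: power analysis yields N = 100, attrition rate is 20%.
targeted_n = adjust_for_attrition(100, 20)
print(targeted_n)  # 125
```

For the example in the slides, 100/(1 - 0.20) gives a targeted N of 125, so that about 100 participants remain after 20% drop out.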

Factors needed for sample sizes
- Power
- Size of the effect
  - Study design
  - Measurement of outcome
- Significance level desired
- Attrition

Inter-relationship

[Figure: inter-relationship of sample size (n), power (1-β), effect size (Δ) and significance level (α)]

Standard case

[Figure: sampling distributions P(T) of the test statistic T if H0 is true and if HA is true, alpha = 0.05, POWER = 1-β]

Significance level

[Figure: the same sampling distributions with the significance level increased (alpha = 0.1) and decreased (alpha = 0.01), POWER = 1-β shown in each panel]
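The inter-relationship among sample size, significance level, effect size and power can also be explored numerically. The sketch below uses a normal approximation for a two-sided comparison of two group means. The baseline values (50 per group, difference of 1, SD of 2.2) are illustrative assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def power_two_group(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power for a two-sided comparison of two means.

    Normal approximation with a common SD `sigma`; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / se - z_crit)


base = power_two_group(50, delta=1.0, sigma=2.2)
more_n = power_two_group(100, delta=1.0, sigma=2.2)
higher_alpha = power_two_group(50, delta=1.0, sigma=2.2, alpha=0.10)
lower_alpha = power_two_group(50, delta=1.0, sigma=2.2, alpha=0.01)
bigger_effect = power_two_group(50, delta=1.5, sigma=2.2)
more_variation = power_two_group(50, delta=1.0, sigma=3.0)

for label, value in [
    ("baseline", base),
    ("larger n", more_n),
    ("alpha = 0.10", higher_alpha),
    ("alpha = 0.01", lower_alpha),
    ("larger effect", bigger_effect),
    ("larger SD", more_variation),
]:
    print(f"{label:14s} power = {value:.2f}")
```

Power rises with a larger sample, a larger significance level or a larger effect, and falls with a stricter significance level or more variable outcomes.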

Increased n

[Figure: sampling distributions P(T) if H0 is true and if HA is true, alpha = 0.05, with increased n, POWER = 1-β]

Increased effect size

[Figure: sampling distributions P(T) if H0 is true and if HA is true, alpha = 0.05, with increased effect size, POWER = 1-β]

Key points

The power of a statistical test is influenced by the:
- Sample size (n): as n increases, power increases
- Significance level (α): as α increases, power increases
- Difference (effect) to be detected (Δ): as Δ increases, power increases
- Variation in the outcome (σ²): as σ² increases, power decreases

Key points

What we need, and where we get it:
- Significance level: This is often conventionally set at 5%. The lower it is, the larger the sample size needed for a given power.
- The mean and the variability of the outcome in the comparison group: From previous surveys conducted in similar settings. The larger the variability is, the larger the sample for a given power.
- The effect size that we want to detect: What is the smallest effect that should prompt a policy response? The smaller the effect size, the larger a sample size we need for a given power.

Reporting power and sample size

CONSORT 22-point checklist

Areas covered: scientific rationale, patient population, sample size, study designs & methods, patient flow, statistical analysis & results, interpretation.

PAPER SECTION, Item, Description

TITLE & ABSTRACT
1. How participants were allocated to interventions (e.g., "random allocation", "randomized", or "randomly assigned").

INTRODUCTION
2. Background: Scientific background and explanation of rationale.

METHODS
3. Participants: Eligibility criteria for participants and the settings and locations where the data were collected.
4. Interventions: Precise details of the interventions intended for each group and how and when they were actually administered.
5. Objectives: Specific objectives and hypotheses.
6. Outcomes: Clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of measurements (e.g., multiple observations, training of assessors).
7. Sample size: How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules.
8. Randomization -- Sequence generation: Method used to generate the random allocation sequence, including details of any restriction (e.g., blocking, stratification).
9. Randomization -- Allocation concealment: Method used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
10. Randomization -- Implementation: Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
11. Blinding (masking): Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated.
12. Statistical methods: Statistical methods used to compare groups for primary outcome(s); methods for additional analyses, such as subgroup analyses and adjusted analyses.

RESULTS
13. Participant flow: Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe protocol deviations from study as planned, together with reasons.
14. Recruitment: Dates defining the periods of recruitment and follow-up.
15. Baseline data: Baseline demographic and clinical characteristics of each group.
16. Numbers analyzed: Number of participants (denominator) in each group included in each analysis and whether the analysis was by "intention-to-treat". State the results in absolute numbers when feasible (e.g., 10/20, not 50%).
17. Outcomes and estimation: For each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g., 95% confidence interval).
18. Ancillary analyses: Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those pre-specified and those exploratory.
19. Adverse events: All important adverse events or side effects in each intervention group.

DISCUSSION
20. Interpretation: Interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision and the dangers associated with multiplicity of analyses and outcomes.
21. Generalizability: Generalizability (external validity) of the trial findings.
22. Overall evidence: General interpretation of the results in the context of current evidence.

14/06/2012

Reporting power

Reality vs. scientific validity

[Figure: the sample size n sits between reality (resources) on one side and scientific validity (sample size formulae) on the other]

Resources
- Number of available participants
- Laboratory resources
  - Diagnostic tests, training program etc. if needed
- Time you have available
  - Set by funding agency
  - Set by your career trajectory
- Funds and personnel

Example and software

Example

Estimation of sample size comparing 2 group means (independent sample t-test): comparing post-trial values

A clinician wants to conduct an RCT to assess the effect of an intervention to reduce HbA1c level among patients with type 2 diabetes. Pilot data suggest that the mean HbA1c level among patients without this intervention is 8.7%, with a standard deviation of 2.2%. We believe that the intervention will decrease patients' HbA1c level by 1%. A total of 154 patients (77 patients in each group) are needed to achieve 80% power at a two-sided 5% significance level.
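The HbA1c figure of 77 patients per group can be approximated by hand. The sketch below is not the calculation used in the slides, which rely on dedicated software. It uses the familiar normal-approximation formula n = 2(z_alpha/2 + z_beta)^2 * sigma^2 / delta^2 per group, plus a commonly used approximate correction of z_alpha/2^2 / 4 to allow for the t-test. The function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, t_correction=True):
    """Approximate per-group sample size for comparing two independent means.

    delta: difference in means to detect; sigma: common standard deviation.
    Equal group sizes and a two-sided test are assumed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    if t_correction:
        n += z_alpha ** 2 / 4  # approximate allowance for using t rather than z
    return math.ceil(n)


# HbA1c example: detect a 1% decrease, SD 2.2%, 80% power, two-sided 5% level.
per_group = n_per_group(delta=1.0, sigma=2.2)
print(per_group, "per group,", 2 * per_group, "in total")
```

The plain normal approximation (`t_correction=False`) gives 76 per group; with the correction the result is 77 per group, 154 in total, which agrees with the example.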

Example

- Select the appropriate statistical test, based on the types of outcome measures.
- Determine the minimum effect size.
- For continuous outcomes, estimate the standard deviation. For dichotomous outcomes, estimate the baseline risk or incidence/prevalence of the event.
- Set limits for Type I (α) and Type II (β) error.
- Specify your null hypothesis and alternative hypothesis (1-tailed or 2-tailed).

Parameters needed for sample size computation
- α, Significance level = 5% (2-sided)
- 1-β, Power = 80%
- Δ, Effect size = 1
- σ, Variability = 2.2
- m, sample size ratio between the two groups = 1

Software
- G*Power
- PS
- WinPepi
- Russ Lenth
- Epi Info
- PASS

G*Power 3

Determine the effect size
- Click on Determine
- Select n1=n2 for equal sample size
- Calculate and transfer to main window

Determine sample size
- 154 patients, 77 in each group
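As a cross-check on the 154 patients (77 in each group), the calculation can be run in the other direction: fix the sample size and ask what power it gives. This is a rough sketch based on a normal approximation, so the values will differ slightly from software that uses the t distribution. The function name and the list of sample sizes are illustrative.

```python
from statistics import NormalDist


def approx_power(n_per_group, delta=1.0, sigma=2.2, alpha=0.05):
    """Approximate power for a two-sided comparison of two means (equal n)."""
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))


# Power for a range of per-group sample sizes, using the example's parameters.
power_by_n = {n: approx_power(n) for n in (40, 60, 77, 100, 120)}
for n, p in power_by_n.items():
    print(f"n per group = {n:3d}  total = {2 * n:3d}  power = {p:.2f}")
```

At 77 per group the approximate power is about 80%, and the table shows how power climbs as the sample size grows.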

Determine power

Sample sizes vs. power

Achieved 80% power for 154 patients

Conclusion

Crying researcher understood 80% (CI 70%-90%) what he needs for a smaller p-value:
- Better understanding of study design
- Good knowledge of outcome measure?
- Good statistical approach
- Optimum sample size
- Greater power to detect a true difference!

If in doubt: Call Biostatisticians!!!!

FCEB (Flinders Centre for Epidemiology and Biostatistics)
Discipline of General Practice
Level 3 Health Sciences Building
Flinders Medical Centre

Don't miss FCEB Launch!!!! 3:00 PM today, Rooms , Health Science Lecture Theatre Complex