Statistics for Clinicians. 7: Sample size


J. Paediatr. Child Health (2002) 38, 300–304

JB CARLIN (1,3) and LW DOYLE (2,3,4)
(1) Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute; Departments of (2) Obstetrics and Gynaecology and (3) Paediatrics, University of Melbourne; and (4) Division of Newborn Services, Royal Women's Hospital, Melbourne, Australia

Studies that are too small or too large both present problems in clinical investigation. It is intuitive to most readers that a study based on a small number of patients has little chance of producing clear-cut conclusions about potentially important differences within or between groups. Even if a difference is observed, the uncertainty around the result will be so wide that the study can be interpreted to support almost any clinical interpretation.

Consider a randomized trial for the comparison of two drugs, Drug A and Drug B, in the treatment of a certain disorder. (As an aside, both drugs need not be active; one could be a placebo, as is common in controlled clinical trials.) The results of such a trial should be expressed in terms of a primary outcome measure, and the type of this outcome variable will determine the statistical analysis. The simplest possibilities are continuous and dichotomous outcomes. Previous articles in this series have discussed the standard statistical approaches to such analyses. In the fourth [1] and fifth [2] articles, we explained how the difference between two treatment groups can be presented with a confidence interval (CI), usually a 95% CI, that reflects the amount of sampling variability or uncertainty associated with the estimated difference.

A range of results is shown schematically in Fig. 1. In the first scenario, the results favour Drug A, but the difference is not statistically significant (the CI for the difference crosses zero) and it would be difficult to argue that these findings on their own provide convincing support for Drug A's superiority. In the second scenario, the results favour Drug B, but again the difference is not statistically significant.
In the third and fifth scenarios, the results favour Drug A, and the differences are statistically significant (the CIs do not cross zero). However, the finding of statistical significance reflects quite different results in each case. In scenario 3, the size of the difference is larger than in scenario 1, but the width of its 95% CI is the same. In scenario 5, the size of the difference is the same as in scenario 1, but the width of the 95% CI is narrower. Scenarios 1 and 5 produced the same results, in terms of the actual (average) difference between the groups, but the evidence from scenario 5 is more convincing. Why? The answer is that scenario 5 involved a larger study.

In the fourth and sixth scenarios, the results favour Drug B, and although the differences are statistically significant in both cases, again the results tell a rather different story. Both scenarios produced the same point estimate for the difference, but the evidence from scenario 6 (which must have had a larger sample size) is more definitive: the CI from scenario 4 suggests that the average advantage of Drug B could be quite small, as the upper end of the CI comes close to 0.

The statistical analysis illustrated by these scenarios can be understood in terms either of precision of estimation of the difference between groups (using the CI), or of hypothesis testing: is the observed difference large enough (relative to sampling variability) for us to conclude that it may reflect a non-zero true difference? We will see in the rest of this article that approaches to sample size estimation can be framed using either perspective. The hypothesis-testing perspective is the more familiar one, and it involves the concept of power (how likely is it that my study will reject the null hypothesis?). The precision perspective is simpler, however, and is the only approach that is available when the study has a fundamentally descriptive purpose, rather than comparison between two groups.
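
The contrast between scenarios 1 and 5 can be illustrated numerically. The sketch below is not taken from the article: it assumes a continuous outcome with a known common standard deviation and equal group sizes, and the numbers (a difference of 2 units, SD of 10, 30 versus 300 patients per group) are hypothetical values chosen only to show the effect of sample size on the width of the CI.

```python
from statistics import NormalDist


def ci_for_difference(diff, sigma, n_per_group, level=0.95):
    """Normal-theory CI for a difference in means (known common SD, equal groups)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    se = sigma * (2 / n_per_group) ** 0.5       # SE of the difference in means
    return diff - z * se, diff + z * se


# Hypothetical illustration: same observed difference, different study sizes.
small_study = ci_for_difference(diff=2.0, sigma=10.0, n_per_group=30)
large_study = ci_for_difference(diff=2.0, sigma=10.0, n_per_group=300)

print("small study CI:", small_study)  # interval includes zero (like scenario 1)
print("large study CI:", large_study)  # interval excludes zero (like scenario 5)
```

With the same point estimate, only the larger study yields an interval that excludes zero, which is the sense in which its evidence is more convincing.
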
Fig. 1 Possible outcomes (scenarios) from a clinical trial comparing two drugs, Drug A and Drug B.

Correspondence: Associate Professor LW Doyle, Department of Obstetrics and Gynaecology, University of Melbourne, Parkville, Victoria 3010, Australia. Fax: +61 3 9347 1761; email: lwd@unimelb.edu.au
Accepted for publication 24 January 2002.

In the fourth article in the series [1] we explained that statistical significance may reflect either a large observed difference between groups or a smaller difference accompanied by less variation within the groups (small standard error of the difference), as shown by the 'signal-to-noise ratio':

$$\text{Test statistic} = \frac{\text{Signal}}{(\text{Variation within group(s)})/\sqrt{n}} \propto \sqrt{n} \times \frac{\text{Signal}}{\text{Variation within group(s)}},$$

where $n$ is the sample size (and $\propto$ means 'is proportional to'). A higher value of the test statistic (higher signal-to-noise ratio) can always be obtained by increasing the value of $n$, other things being equal. It is clear that the more variation within groups (the more 'noise'), the larger the sample size will need to be in order to produce the same result. Also, the smaller the difference that is being studied ('signal'), the larger the sample size will need to be.

The reason that a wide range of conclusions is possible from scenarios 1 and 2 is that the observed difference between the two groups was small relative to the standard error of the difference. This could correspond to an observed difference of a magnitude that would be clinically important, and if this were so the investigator would regret not having performed a larger study in order to limit the play of chance that has rendered the current results inconclusive.

Studies that are too large are also problematic for at least two reasons. First, since the standard error of any observed difference falls with increasing sample size, a small difference that is clinically unimportant will eventually become statistically significant if the sample size is large enough. Second, studies that are too large are inefficient if essentially the same answer could have been obtained on fewer subjects. Both of these reasons can be thought of as problems of excessive precision. On the other hand, large sample sizes might be necessary to detect uncommon but clinically important outcomes, such as the rare side-effects of a drug.

Studies that are either too small or too large can both be unethical.
If a study is too small to answer the research question adequately, it is unethical because it requires subjects to undergo test procedures without the prospect that these will lead to solid conclusions. If a study is too large and a definitive answer could be obtained with smaller numbers, it is unethical to continue to expose trial patients to known inferior treatments. Clearly, studies that are too large can also be wasteful, because clinical trials and other types of investigations can be very expensive.

How do we decide whether the sample size of a proposed study is large enough? The basic principle to remember, as discussed, is that statistical noise decreases as the sample size increases, as embodied in the fact that the standard error (SE) of any estimated quantity is proportional to the inverse of the square root of the sample size:

$$\text{SE} \propto \frac{1}{\sqrt{n}}.$$

In other words, the more data, the more precision there is in the results of the study (the narrower the CI). In planning a study we should aim to reduce the uncertainty associated with our results to an acceptable level. There can be problems, however, in deciding what is acceptable, as this is a substantive clinical issue and cannot be determined statistically.

PRECISION-BASED SAMPLE SIZE CALCULATION

The simplest application of the 'reduce uncertainty' principle is in the estimation of confidence intervals for quantities such as proportions or means, as might arise in descriptive studies; for example, what is the prevalence of a certain disease? What is the average value of a certain variable? The aim of this sort of study is not to test a hypothesis but to provide an estimate of a population parameter of interest. The precision of estimation is usually quantified by a 95% CI, but other confidence levels, such as 99%, are sometimes specified. We can calculate the sample size to ensure that the width of the CI is likely to be less than a certain amount, which we will denote by w.
This quantity is chosen to represent the maximum uncertainty that the investigator regards as acceptable for the conclusions of the study. The appropriate formula is easy to obtain by inverting the formula for the SE, on which the CI is based. Recalling from article six [3] that, for a proportion π, the SE of estimation is

$$\text{SE} = \sqrt{\frac{\pi(1-\pi)}{n}},$$

and that a 95% CI is obtained as estimate ± 1.96 SE, we would like to have:

$$2 \times 1.96 \times \sqrt{\frac{\pi(1-\pi)}{n}} < w.$$

Simple algebraic manipulation shows that this implies

$$n > \frac{4 \times 1.96^2\,\pi(1-\pi)}{w^2}. \qquad (1)$$

As an example, suppose we want to know how many voters to survey in order to estimate the proportion (or percentage) that prefers one party to within a 1% margin of uncertainty (i.e. w = 0.02, assuming a 95% CI is intended). To calculate the sample size using formula 1, we apparently need to know the very thing we are designing the study to estimate, that is, the proportion, π. This is a general feature of sample size estimation: we invariably need to plug in reasonable guesses or estimates for some of the main unknown parameters that are involved in the study. In this example, a reasonable estimate for a two-party poll is 50%, or π = 0.5, which is also the most conservative estimate since the value of π(1 − π) is greatest when π = 0.5, but is smaller as π approaches either 1 or 0. The result is as follows:

$$n > \frac{4 \times 1.96^2 \times 0.5(1-0.5)}{0.02^2} = 9604,$$

implying that at least 9604 voters would need to be surveyed. (As an aside, you may be aware that opinion polls are invariably based on smaller samples than this, explaining why when newspapers quote error margins for such polls they are never as tight as ±1%. This calculation is only an approximation for real opinion polls, as these do not use simple random samples, but the approximation is often reasonable.)

There is a similar formula for determining the sample size that will allow the mean of a continuous outcome variable to be estimated with a specified precision.
We follow the same logic as above, recalling (article three [4]) that the SE of a mean is $\sigma/\sqrt{n}$, where σ is the standard deviation of the outcome measure in the population. In this case the SE is independent of the parameter we wish to estimate, the mean µ. Again denoting the maximum acceptable 95% CI width (degree of precision) as w, the same argument as before leads to the formula:

$$n > 4 \times 1.96^2 \left(\frac{\sigma}{w}\right)^2. \qquad (2)$$
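
For readers who want the intermediate steps, the algebra behind formula (2) can be sketched as follows. This only spells out the 'same argument as before', assuming a 95% CI of the form estimate ± 1.96 SE with SE equal to σ divided by the square root of n:

```latex
\begin{aligned}
2 \times 1.96 \times \frac{\sigma}{\sqrt{n}} &< w
  && \text{(full CI width must be below } w\text{)} \\
\sqrt{n} &> \frac{2 \times 1.96 \times \sigma}{w}
  && \text{(rearrange; all quantities are positive)} \\
n &> 4 \times 1.96^{2} \left(\frac{\sigma}{w}\right)^{2}
  && \text{(square both sides)}
\end{aligned}
```

Replacing the SE of a mean in the first line by the SE of a proportion leads to formula (1) by the same steps.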

As an example, suppose we want to know how many 5-year-old boys to measure such that we can estimate their mean height to within 1 cm either side of the population mean, with 95% confidence (i.e. w = 2). To calculate the sample size, we need an estimate of the standard deviation, σ, of height in 5-year-old boys, which might be 7 cm. The result is then:

$$n > 4 \times 1.96^2 \left(\frac{7}{2}\right)^2 = 188.2.$$

Therefore at least 189 boys of 5 years of age would need to be measured.

As an aside, we should strictly only use formula 2 when the population SD, σ, is known, which is usually not the case. When σ is unknown, the CI will be obtained using the sample value of the SD, s, instead of σ. The resulting loss of precision is recognised by basing the limits of the CI on the appropriate value from the t-distribution (with n − 1 degrees of freedom), rather than the normal (z-) distribution. A corresponding adjustment should be made to the sample size calculation, but in practice this is not a major problem, as the absolute increment in the sample size computed with t instead of z is no more than three for any sample size greater than eight. Remember that studies with sample sizes as small as this are not likely to produce statistical significance unless the true effect is very large; even when they do, the results are often regarded with suspicion, because of the likelihood that chance has played a major role.

To sum up, the essential elements for the simple precision-based sample size calculations presented here are:
1. a focussed research question, expressed as an aim to estimate a proportion or mean of a particular outcome;
2. rough estimates of either the proportion itself (π) for a dichotomous variable or the amount of variation (σ) for a continuous variable; and
3. specification of a maximum acceptable level of uncertainty, meaning the width of a 95% CI, in the results (w).
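
The two precision-based formulae can be checked with a few lines of code. The following is a minimal sketch rather than a validated tool: the function names are illustrative, it uses the rounded value 1.96 as in the text, and it ignores the t-distribution adjustment mentioned above.

```python
import math

Z_95 = 1.96  # rounded normal quantile used in the article for a 95% CI


def n_for_proportion(pi_guess, width, z=Z_95):
    """Formula (1): n must exceed 4 z^2 pi (1 - pi) / w^2."""
    return 4 * z**2 * pi_guess * (1 - pi_guess) / width**2


def n_for_mean(sigma_guess, width, z=Z_95):
    """Formula (2): n must exceed 4 z^2 (sigma / w)^2."""
    return 4 * z**2 * (sigma_guess / width) ** 2


def round_up(n):
    """Round up to a whole subject, ignoring tiny floating-point noise."""
    return math.ceil(n - 1e-9)


# Voter survey: pi guessed at 0.5, total CI width w = 0.02 (i.e. +/- 1%).
voters = n_for_proportion(pi_guess=0.5, width=0.02)
# Height of 5-year-old boys: sigma guessed at 7 cm, total CI width w = 2 cm.
boys = n_for_mean(sigma_guess=7, width=2)

print(round_up(voters))  # 9604
print(round_up(boys))    # 189
```

Changing `pi_guess` away from 0.5 shows directly why 0.5 is the conservative choice: any other value gives a smaller required sample.
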

SAMPLE SIZE BASED ON HYPOTHESIS TESTING AND POWER

The dominant approach to sample size estimation for research that aims to make explicit comparisons between groups is based on hypothesis testing rather than precision of estimation. To explain this approach we need to return to the topic of hypothesis testing and discuss the concepts of Type I (or 'α') and Type II (or 'β') error. Our previous discussion of hypothesis testing (fourth article [1]) focussed on the calculation of the P value, and its interpretation after completing the study as an 'index of surprise'. The smaller the P value, the less plausible the null hypothesis. More formally, it is said that the null hypothesis can be rejected if the P value falls below a critical significance level such as 0.05 or 0.01.

Table 1 Possible conclusions from a study in relation to the conceptual dichotomy underlying the hypothesis test, giving rise to four possible results when a study has been completed

Study: a statistically              Truth: a real difference exists
significant difference is found     Yes                  No
Yes                                 Correct              Type I (α) error
No                                  Type II (β) error    Correct

Sample size calculations can be based on these ideas by figuring out the probability that the planned study will produce a P value that falls below a chosen significance level. If the null hypothesis is true, the chance of this happening is exactly the significance level: 5% of studies where there is no real difference will produce a P value that is less than 0.05. Similarly, 1% of such studies will produce P < 0.01. This outcome, producing a significant P value when there is no real difference, is called a Type I error, and we have just seen that the probability of this happening is equal to the significance level.

What is the probability of producing a significant P value when the null hypothesis is false? This is called the power of the study, but unfortunately it is a slippery concept because it depends on what particular alternative hypothesis is true, that is, on the size of the true difference.
A Type II error is said to occur if a study fails to produce a statistically significant difference when the null hypothesis is false, so the chance of a Type II error is one minus the power of the study.

To recap, whenever a clinical trial or other type of study comparing two groups is completed, the study will (formally) conclude either that the observed difference between the groups is statistically significant or that it is not. The truth may be that the two groups are actually different or that they are not. Thus, there are the four possible outcomes shown in Table 1. The logic of hypothesis testing says we should consider both possible worlds: the world of the null hypothesis, in which we run the risk of Type I error, and the world of the alternative hypothesis, in which we run the risk of Type II error. Increasing the sample size will reduce the risks of either type of error. In planning a study, the conventional approach is to fix the Type I error rate at either 0.05 or 0.01, and then to determine a sample size that will limit the Type II error rate to an acceptable level, usually either 0.1 or 0.2 (corresponding to power of 90% or 80%, respectively).

The concepts of Type I and Type II error are difficult and cause a great deal of confusion, especially when applied after a study has been completed. Once your study is done, it may be the case that one of these errors has occurred, but you will never know, except occasionally when later studies succeed in making quite clear what the true difference really is! A Type II error is often suspected when a reasonably large and potentially important difference has been obtained but the result is not statistically significant. This might, for example, be the case with scenarios 1 or 2 in Fig. 1, where Drug A might be superior, or Drug B might be superior, respectively. It must be remembered that the absence of a statistically significant difference in a study does not mean that there is no true difference. This situation arises most frequently in small studies and is prevalent in the medical literature.
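
The two 'possible worlds' can be made concrete with a small simulation. The sketch below is an illustration only, not the article's method: it assumes normally distributed outcomes with a known SD, uses a two-sided z-test, and the settings (50 patients per group, SD of 1, true difference of 0.5) are hypothetical. Because the results are simulated, the printed rates vary slightly with the seed.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_diff, sigma, n_per_group,
                   alpha=0.05, n_studies=2000, seed=1):
    """Fraction of simulated two-group studies giving P < alpha (two-sided z-test)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means
    rejections = 0
    for _ in range(n_studies):
        group_a = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        z = (fmean(group_b) - fmean(group_a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies


# World of the null hypothesis: every rejection is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0, sigma=1.0, n_per_group=50)
# World of the alternative hypothesis: the rejection rate is the power.
power = rejection_rate(true_diff=0.5, sigma=1.0, n_per_group=50)

print("Type I error rate:", type_i_rate)   # close to alpha = 0.05
print("Power:", power)                     # roughly 0.7 for these settings
print("Type II error rate:", 1 - power)
```

Rerunning with a larger `n_per_group` leaves the Type I error rate near the chosen significance level while the power rises, which is the basis of the conventional planning approach described above.
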
Returning to methods for determining sample size, whenever we (clinician or statistician) estimate the power of a proposed study, we make this estimate for an assumed alternative hypothesis. This involves deciding on what is often called 'the difference that we would like to detect'. Although the language of 'detecting' effects is often used, a more useful way of thinking about the 'effect size' or difference value to use in your power calculation may be in terms of a difference that you are prepared to miss. If you decide that the study only needs to have high power for a large difference between groups, you are saying that if the true difference is smaller, then you won't be disappointed if the study produces an inconclusive finding. Arriving at a defensible choice of effect size, difference to be detected or minimum difference you can afford to miss is usually the most difficult part of a sample size calculation. To some clinicians' surprise, it is a task that statisticians cannot perform for you (although they may sometimes be able to help!).

The choice must depend on the outcome of the study, and if there are multiple outcomes it will usually be necessary to decide which is the most important and base the sample size on that one. For example, in a clinical trial of a therapy such as dexamethasone to reduce ventilator dependence in tiny babies, a small increase in survival rates might be considered clinically more important than the observation that a much higher proportion were able to be weaned off assisted ventilation.

One way to think of an appropriate effect size for a new treatment is often in terms of a 'clinically important difference': what is the smallest difference below which clinical management would not change, but above which the new treatment might be introduced? (Of course, other considerations, such as availability, might limit whether the drug could be introduced into clinical practice, regardless of the results of clinical trials.) In principle, the clinically important difference should be determined by considering the gains in health outcomes from the treatment, balanced by its costs, side-effects, and other negative aspects, such as inconvenience. The more costly the treatment and the more detrimental its side-effects, the greater the gains in health outcomes would need to be. In thinking about this issue, it is important to remember that the difference being considered is an average difference across all patients, not the difference that you might expect (or like!) to see in a patient for whom the treatment appears to work.

It should not be surprising that calculation of the sample size required to provide a specified power involves not only specification of the 'signal' of interest (minimum difference not to be missed), but also an assumption that determines the 'noise' in the test statistic.
If the comparison is based on the means of continuous outcome variables, the noise is determined by the standard deviation (σ) of the outcome in the population. If the comparison is based on proportions (dichotomous outcomes), the noise is determined by the proportions themselves, so rough estimates of the event rates in both arms of the study will be needed (not just an assumption about the difference between them).

The calculation itself can be represented diagrammatically as in Fig. 2. You may recall from the third article in the series [4] that the sampling distribution of a summary statistic from a sample obtained randomly from its underlying population generally follows a normal probability density. Under the null hypothesis ($H_0$) of no difference between groups, its expected value (mean) is represented in the figure by $\delta_0$ (if the variable is continuous, this would be the difference in population means; for a dichotomous outcome, it might be the difference in proportions). Under the alternative hypothesis ($H_1$) of a true difference between the two groups, the mean is $\delta_1$. When we are planning our study, we can figure out the value on the x-axis of the figure that marks out a region of large values of the test statistic that has a probability α under the null hypothesis: this is called the critical value. The Type I (α) error is the probability to the right of the critical value under the $H_0$ distribution. The Type II (β) error is then calculated under the alternative hypothesis, as the probability that the test statistic will fall to the left of the critical value (and so be found to be 'non-significant'). For the most commonly used simple comparisons between means and proportions it is relatively easy to work out how large $n$ should be in order to restrict β to a desired level, given the conventional α = 0.05 or 0.01. The formulae are, however, sufficiently complex that, as with most statistical calculations these days, one would generally use specialized software to compute them. The two most commonly used formulae are presented in Appendix I.

Fig. 2 Probability density functions for the test statistic under both the null hypothesis ($H_0$: δ = $\delta_0$) and the alternative hypothesis ($H_1$: δ = $\delta_1$). Usually $\delta_0$ = 0, representing a null hypothesis of no difference (whether measured as difference in means, difference in proportions or something else) between two groups. The critical value (c.v.), the α (Type I) and the β (Type II) error probabilities are shown.

When the formulae in Appendix I are examined, and indeed in the verbal description above, we can see that there are several important ingredients required for a power-based calculation of sample size for the difference between two groups:
1. Estimate of variation in the outcome ('noise'). For a continuous variable this will be an estimate of the standard deviation, σ, and for a dichotomous variable it will be an estimate of the proportion or rate π in the baseline comparison group. This estimate will often be made from previous data or a small pilot study.
2. The effect size or smallest clinically important difference (δ): as discussed, this should be determined by the investigator's aims and outcome measure (and previous studies are generally of little relevance).
3. The Type I error rate (typically 0.05), translated to the corresponding value on the standard normal curve, which assuming a two-tailed test is usually written $Z_{\alpha/2}$.
4. The level of power desired (typically 80% or 90%), translated to a (one-sided) normal deviate $Z_\beta$.

As an example of a sample size calculation for a comparison between two groups based on a dichotomous outcome, suppose we plan to conduct a trial in which the event rate in the control group is expected to be 50%, and we hypothesize that it will increase to 60% in the active group. We are therefore assuming that the minimum between-group difference of clinical importance (δ) is 0.1, $\pi_0$ is 0.5, $\pi_1$ is 0.6, $Z_{\alpha/2}$ is 1.96, and $Z_\beta$ is 0.84 for 80% power. Solving the formula in Appendix I gives $n$ = 386.9, or 387 in one group, and a total sample size of 774.
It is worth re-emphasising that this result is driven almost entirely by the assumption about δ. In proceeding with this trial, the investigator would need to be comfortable with the fact that if the true effect of the new therapy were to produce an increase in event rate of less than 0.1, then the trial would not be likely to produce a statistically significant result. They must be willing to accept this risk, that is, in effect to bank on the true effect being at least as large as 0.1.
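
The proportions example can be reproduced with a short script. This is a sketch under stated assumptions: `n_per_group_proportions` follows the Appendix I formula for comparing proportions with the rounded values 1.96 and 0.84 used in the text, while `approx_power` is an illustrative helper (not given in the article) that inverts the same formula and ignores the negligible chance of rejecting in the wrong direction.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(pi0, pi1, z_alpha2=1.96, z_beta=0.84):
    """Per-group sample size from the Appendix I formula for two proportions."""
    pi_bar = (pi0 + pi1) / 2
    delta = abs(pi1 - pi0)
    null_term = z_alpha2 * math.sqrt(2 * pi_bar * (1 - pi_bar))
    alt_term = z_beta * math.sqrt(pi0 * (1 - pi0) + pi1 * (1 - pi1))
    return ((null_term + alt_term) / delta) ** 2


def approx_power(pi0, pi1, n, z_alpha2=1.96):
    """Approximate power for n per group, obtained by solving the formula for Z_beta."""
    pi_bar = (pi0 + pi1) / 2
    delta = abs(pi1 - pi0)
    null_term = z_alpha2 * math.sqrt(2 * pi_bar * (1 - pi_bar))
    alt_sd = math.sqrt(pi0 * (1 - pi0) + pi1 * (1 - pi1))
    return NormalDist().cdf((delta * math.sqrt(n) - null_term) / alt_sd)


n = n_per_group_proportions(pi0=0.5, pi1=0.6)
print(round(n, 1), math.ceil(n))  # 386.9 387

power_planned = approx_power(0.5, 0.6, n=387)   # about 0.80, as designed
power_smaller = approx_power(0.5, 0.55, n=387)  # true increase of only 0.05
print(power_planned, power_smaller)
```

The second power value is far below 80%, which illustrates the risk described above: with 387 per group, a true increase of less than 0.1 is unlikely to produce a statistically significant result. Using unrounded normal quantiles instead of 1.96 and 0.84 gives a slightly larger sample size.
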

As an example of a sample size calculation for a comparison between two groups using a continuous outcome, suppose we are planning a trial in which the standard deviation of the primary outcome (clinically most important variable) in each group is 5, and we hypothesize an increase in the mean of this variable of 2.5 units, or 0.5 SD. We are thus setting the value of δ at 2.5, or 0.5 SD units. If again we use the conventional 2-sided α = 0.05 and power of 80%, then $Z_{\alpha/2}$ is 1.96 and $Z_\beta$ is 0.84. Solving the formula in Appendix I gives $n$ = 62.7, or 63 in each group, and a total sample size of 126. From the shortened version of the formula ('Sixteen S-squared over D-squared') [5], $n$ = 64, and the total sample size is 128.

In the next article in the series we discuss some nonparametric tests used for making comparisons where the assumptions required for a t-test cannot be made.

REFERENCES

1. Carlin JB, Doyle LW. Statistics for clinicians 4: Basic concepts of statistical reasoning: hypothesis tests and the t-test. J. Paediatr. Child Health 2001; 37: 72–7.
2. Carlin JB, Doyle LW. Statistics for clinicians 5: Comparing proportions using the chi-squared test. J. Paediatr. Child Health 2001; 37: 392–4.
3. Carlin JB, Doyle LW. Statistics for clinicians 6: Comparison of means and proportions using confidence intervals. J. Paediatr. Child Health 2001; 37: 583–6.
4. Carlin JB, Doyle LW. Statistics for clinicians 3: Basic concepts of statistical reasoning: Standard errors and confidence intervals. J. Paediatr. Child Health 2000; 36: 502–5.
5. Lehr R. Sixteen S-squared over D-squared: a relation for crude sample size estimates. Stat. Med. 1992; 11: 1099–102.
6. Armitage P, Berry G. Statistical Methods in Medical Research, 3rd ed. Blackwell Science, Oxford, 1994, 195–206.

APPENDIX I. Sample size formulae for a difference between two groups

Comparing proportions

There is actually a rather large number of different formulae for this calculation, all giving slightly different answers. We give one of the simplest and most widely accepted [6]. Assuming equal sample sizes ($n$) in each group (the formula can be modified to allow unequal groups), the number in each group is:

$$n = \left[\frac{Z_{\alpha/2}\sqrt{2\bar{\pi}(1-\bar{\pi})} + Z_\beta\sqrt{\pi_0(1-\pi_0) + \pi_1(1-\pi_1)}}{\delta}\right]^2,$$

where δ is the effect size or smallest group difference of interest between the population proportions in the two groups, $\pi_0$ and $\pi_1$ are the true proportions and $\bar{\pi}$ is their average: $\bar{\pi} = (\pi_0 + \pi_1)/2$. As discussed above, to use this formula the investigator must come up with an estimate of one of the true rates (usually the expected rate in the control group, $\pi_0$), as well as the difference of interest, δ. If event rates cannot be reasonably estimated then the most conservative approach is to assume a value for $\bar{\pi}$ of 0.5, since this will lead to the largest possible sample size. The other values in the formula, $Z_{\alpha/2}$ and $Z_\beta$, are taken from the standard normal distribution and are determined by the desired Type I (α) and Type II (β) error rates. The most common choices are $Z_{\alpha/2}$ = 1.96 (Type I error 0.05, 2-sided test) and $Z_\beta$ = 0.84 (Type II error 0.2; power 80%). Note that the total sample size for the study will be $2n$.

Comparing means

The standard formula for sample size when comparing means between two groups, using a normal distribution assumption, is relatively simple:

$$n = 2\left[\frac{(Z_{\alpha/2} + Z_\beta)\,\sigma}{\delta}\right]^2,$$

where δ is the effect size (difference between true means) of interest, and σ is the standard deviation within each population. Again we assume that equal group sizes are required, and we also assume that the standard deviation in each population is the same. As for the confidence interval calculation, this formula ignores additional uncertainty due to the need to estimate σ (which implies that the t-distribution rather than the normal should be used); the more complex calculation is available in various software packages and produces sample sizes with one or two additional subjects per group.
For the typical values $Z_{\alpha/2}$ = 1.96 and $Z_\beta$ = 0.84, the formula above simplifies approximately to:

$$n = \frac{16\,\sigma^2}{\delta^2},$$

or 'Sixteen S-squared over D-squared' [5]. Again, note that the total sample size for the study will be $2n$.

Finally, it is worth noting that the sample size formula for comparing means only depends on the ratio of δ to σ, or in other words on the difference in means expressed in units of standard deviations. This quantity is sometimes called the 'standardized effect size' (indeed, confusingly, in some contexts the word 'standardized' gets dropped). Framing a sample size calculation in terms of standardized effect size allows the same calculation to be used for multiple outcomes, and is sometimes used where the scale of measurement (e.g. a psychometric assessment) has little intrinsic interpretability.
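
The comparison-of-means formula and its shortened version can be checked against the worked example (σ = 5, δ = 2.5). The sketch below is illustrative only: the function names are not from the article, the rounded values 1.96 and 0.84 are used as in the text, and the t-distribution correction is ignored.

```python
import math


def n_per_group_means(delta, sigma, z_alpha2=1.96, z_beta=0.84):
    """Per-group sample size: n = 2 * ((Z_alpha/2 + Z_beta) * sigma / delta)^2."""
    return 2 * ((z_alpha2 + z_beta) * sigma / delta) ** 2


def n_per_group_lehr(delta, sigma):
    """Shortened rule: n = 16 * sigma^2 / delta^2 (alpha = 0.05 two-sided, power 80%)."""
    return 16 * sigma**2 / delta**2


full = n_per_group_means(delta=2.5, sigma=5)
print(round(full, 1), math.ceil(full))    # 62.7 63
print(n_per_group_lehr(delta=2.5, sigma=5))  # 64.0

# The multiplier behind the shortened rule: 2 * (1.96 + 0.84)^2 is about 15.68,
# which is rounded up to 16.
print(round(2 * (1.96 + 0.84) ** 2, 2))   # 15.68

# Only the ratio delta / sigma matters: 0.5 SD units gives the same answer.
standardized = n_per_group_means(delta=0.5, sigma=1)
```

Because `standardized` equals `full` up to floating-point error, the same calculation can be reused for any outcome once the difference is expressed in SD units.
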