SAMPLING NTI Bulletin 2006,42/3&4, 55-62

Sample size determination in health studies

VK Chadha*

* Sr. Epidemiologist, National Tuberculosis Institute, Bangalore

Summary

One of the most important factors to consider in the design of an intervention trial is the choice of an appropriate study size. Studies that are too small may fail to detect important effects on the outcomes of interest, or may estimate those effects too imprecisely. Studies that are larger than necessary are a waste of resources. Statistical methods are available for estimation of appropriate sample size depending upon the type of outcome measure, expected disease rates or size of effects, study design and the requirements of confidence interval/precision or power. These concepts, along with the methods of estimating sample size in varied situations, are presented in this article.

Key words: Sample size, proportion, mean, confidence interval, precision, null hypothesis, power, Type I and Type II errors.

Introduction

One of the most important factors to consider in the design of an intervention trial is the choice of an appropriate study size. Studies that are too small may fail to detect important effects on the outcomes of interest, or may estimate those effects too imprecisely. Studies that are larger than necessary are a waste of resources. Before calculating sample size one has to decide on the following:

- Study design
- Types of outcome measure
- Guess at likely result
- Required level of significance
- Required precision / power

The procedures for sample size estimation provide a rough estimate of the required study size, as they are often based on approximate estimates of expected disease rates and subjective decisions about the size of effects. However, a rough estimate of the necessary size of a study is generally all that is needed. One should be familiar with the following concepts before embarking on the estimation of sample size.

Types of outcome measures

The statistical methods for sample size determination depend on which type of outcome is expected. The 3 most common types of outcomes in surveys / studies / trials are:

i. Proportions: For example, in a trial of a new measles vaccine, an outcome measure of interest may be the proportion of vaccinated subjects who develop high levels of antibodies.
ii. Means: For example, in a trial of an antimalarial intervention, it may be of interest to compare the mean packed cell volume (PCV) at the end of the malaria season among those in the intervention group and those in the comparison group.
iii. Rates: For example, in a trial of multi-drug therapy for leprosy, the incidence rates of relapse following treatment may be compared in the different study groups under consideration.

For a quantitative variable, it is only the rough estimates of the proportion, means or rates that are required for estimating sample size.

Sampling error

An estimate of an outcome measure calculated in an intervention study is subject to sampling error, because it is based on a sample of individuals and not on the whole population of interest. The term does not mean that the sampling procedure was applied incorrectly, but that when sampling is used to decide which individuals are in which group, there will be an element of random variation in the results. Sampling error is reduced when the study size is increased and vice versa.

Confidence Interval

The methods of statistical inference allow the investigator to draw conclusions about the true value of the outcome measure on the basis of the information in the sample. In general, the observed value of the outcome measure gives the best estimate of the true value. In addition, it is useful to have some indication of the precision of this estimate, and this is done by attaching a confidence interval to the estimate. The confidence interval is a range of plausible values for the true value of the outcome measure. It is conventional to quote the 95 percent confidence interval (also called 95 percent confidence limits). This is calculated in such a way that there is a 95 percent probability that it includes the true value.

If the outcome measure is a proportion estimated from the sample data as $\hat{p}$, the 95 percent confidence interval presented here is $\hat{p} \pm 1.96 \times SE$, where SE denotes the standard error of the estimate. Similarly, if the outcome measure is a mean, 95 percent of the values derived from different samples are expected to fall within 1.96 standard errors of the true mean.

One of the factors influencing the width of the confidence interval is the sample size. The larger the sample size, the narrower is the confidence interval. The multiplying factor 1.96 is used when calculating the 95 percent confidence interval.
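As a minimal sketch of the calculation described above, the following Python snippet computes a confidence interval for a proportion as the estimate plus or minus the multiplying factor times SE. The standard error sqrt(p(1 - p)/n) for a simple random sample is a standard result that this article does not state explicitly, and the function names and the figures in the usage lines are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def multiplying_factor(confidence: float) -> float:
    """Two-sided Normal multiplying factor, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Return (estimate, lower, upper) for a proportion from a simple random sample.

    Assumes the Normal approximation, with SE = sqrt(p * (1 - p) / n).
    """
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = multiplying_factor(confidence)
    return p_hat, p_hat - z * se, p_hat + z * se


# Hypothetical data: the same observed proportion (20%) at two study sizes.
small = proportion_ci(40, 200)
large = proportion_ci(400, 2000)
print(small)
print(large)
```

With the same observed proportion, the tenfold larger sample gives a clearly narrower interval, which is the relationship between sample size and precision that the text describes.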
In some circumstances, confidence intervals other than 95 percent limits may be required, and the values of the multiplying factor are as under:

Confidence interval (%)    Multiplying factor
90                         1.64
95                         1.96
99                         2.58
99.9                       3.29

Confidence intervals and their corresponding multiplying factors, based on the Normal distribution

Precision of effect measures

The narrower the confidence interval, the greater the precision of the estimate.

Significance tests & P value

In some instances, before calculating a confidence interval to indicate a range of plausible values of the outcome measure of interest, it may be appropriate to test a specific hypothesis about the outcome measure. In the context of an intervention trial, this will often be the hypothesis that there is no true difference between the outcomes in the groups under comparison (the null hypothesis). The objective is thus to assess whether any observed difference in outcomes between the study groups may have occurred just by chance due to sampling error. The methods for testing the null hypothesis are known as significance tests. The sample data are used to calculate a quantity (called a statistic) which gives a measure of the difference between the groups with respect to the outcome(s) of interest. Once the statistic has been calculated, its value is referred to an appropriate set of statistical tables, in order to determine the p value (probability value) or significance of the results.

For example, suppose a difference in mean PCV of 1.5 percent is observed at the end of the malaria season between two groups of individuals, one of which was supplied with mosquito-nets. A p-value of 0.03 would indicate that, if nets had no true effect on PCV levels (i.e. if the null hypothesis was true), there would only be a 3 percent chance of obtaining an observed difference of 1.5 percent or greater.

The smaller the p-value, the less plausible the null hypothesis seems as an explanation of the observed data. For example, a p-value of 0.001 means that the null hypothesis is highly implausible, and this can be interpreted as very strong evidence of a real difference between the groups. On the other hand, a p-value of 0.20 means that a difference of the observed magnitude could quite easily have occurred by chance, even if there was no real difference between the groups. Conventionally, p-values of 0.05 and below have been regarded as sufficiently low to be taken as reasonable evidence against the null hypothesis, and have been referred to as indicating a statistically significant difference.

While a small p-value can be interpreted as evidence for a real difference between the groups, a larger non-significant p-value must not be interpreted as indicating that there is no difference. It merely indicates that there is insufficient evidence to reject the null hypothesis, so that there may be no true difference between the groups.

Power of study

The concept of power comes into play when the focus of the study is to find out whether a significant difference exists between the two groups. Because of the variations resulting from sampling error, we cannot always be certain of obtaining a significant result of a study, even if there is a real difference. It is necessary to consider the probability of obtaining a statistically significant result in a trial, and this probability is called the power of the study.
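To make the idea of power as a probability concrete, here is a hedged sketch that approximates the power of a two-sided comparison of two group means. It assumes equal group sizes, a common known standard deviation and the Normal approximation; none of these assumptions, nor the function name or the SD of 5 used in the usage lines, come from the article.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two group means.

    Assumes equal group sizes, a common known SD and the Normal approximation;
    the negligible chance of a significant result in the wrong direction is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return NormalDist().cdf(abs(delta) / se_diff - z_alpha)


# Hypothetical inputs: true difference in mean PCV of 1.5, assumed SD of 5.
for n in (50, 100, 200, 400):
    print(n, round(power_two_means(1.5, 5.0, n), 2))
```

Under these assumed inputs the power is about 0.56 with 100 subjects per group and about 0.85 with 200 per group, so the same true difference is far more likely to be declared significant in the larger study.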
In other words, if a true difference exists, the power of the study indicates the probability of finding a statistically significant difference between the two groups. Thus a power of 80 percent to detect a difference of a specified size means that if the study were to be conducted repeatedly, a statistically significant result would be obtained four times out of five if the true difference was really of the specified size. When designing a study, the objective is to ensure that the study size is large enough to give high power if the true effect of the intervention is large enough to be of practical importance.

The power of a study depends on:

1. The value of the true difference between the study groups; in other words, the true effect of the intervention (effect size). The greater the effect, the higher the power to detect the effect as statistically significant for a study of a given size.
2. The study size. The larger the study size, the higher the power.
3. The probability level at which a difference will be regarded as statistically significant.

The power also depends on whether a one-sided or two-sided significance test is to be performed and on the underlying variability of the data.

One sided and Two sided tests

If it is accepted that the null hypothesis is false, that means the alternate hypothesis is true. For example, if the claim is about superiority of a new drug, this is a one-sided alternative. If the claim is not of superiority or inferiority but only that they are different, the alternate hypothesis is two-sided. This means that when the p-value is computed, it measures the probability (if the null hypothesis is true) of observing a difference as great as that actually observed in either direction (i.e. positive or negative). It is usual to assume that tests are two-sided.

Wrongly rejecting a true null hypothesis is called type I error. The probability of this error is referred to as the P value, as already discussed. The maximum P value allowed in a problem is called the level of significance (α). In a diagnostic set up, this is the probability of declaring a person sick when he is actually not. In a court set up, this corresponds to convicting an innocent. In a clinical trial set up, the P value is the probability that the drug is declared effective or better when it is actually not.

Diagnosis                Disease actually present
                         No                  Yes
Disease present          Misdiagnosis        ✓
Disease absent           ✓                   Missed diagnosis

Judgment                 Assumption of innocence
                         True                False
Pronounced guilty        Serious error       ✓
Pronounced not guilty    ✓                   Error

Statistical decision     Null hypothesis
                         True                False
Rejected                 Type I error        ✓
Not rejected             ✓                   Type II error

This wrong conclusion can allow an ineffective drug to be marketed as being effective. This clearly is unacceptable and needs to be guarded against. For this reason, the P value is kept at a low level (<5%). When the P value is small, it is safe to conclude that the groups are different. This threshold, 0.05, is the level of significance.

The second type of error is failure to reject the null hypothesis when it is actually false. This corresponds to a missed diagnosis, as also to pronouncing a criminal not guilty. The probability of this error is denoted by β. In a clinical trial set up, this is equivalent to declaring a drug ineffective when it is actually effective. The complementary probability of type II error is the statistical power (1-β).
Thus the power of a statistical test is the probability of correctly rejecting a null hypothesis when it is false.

Choice of criterion

The choice of which of the above two criteria (precision or power) should be used in any particular instance depends on the objectives of the study. If it is known unambiguously that the intervention has some effect, it makes little sense to test the null hypothesis, and the objective may be to estimate the magnitude of the effect, and to do this with acceptable precision.

In studies of new interventions, it is often not known whether there will be any impact at all on the outcomes of interest. In these circumstances, it may be sufficient to ensure that there will be a good chance of obtaining a significant result if there is indeed an effect of some specified magnitude. It should be emphasized, however, that if this course is adopted, the estimates obtained may be very imprecise. Usually, it is more important to estimate the effect of the intervention and to specify a confidence interval around the estimate to indicate the likely range, than to test a specific hypothesis. Therefore, in many situations it may be more appropriate to choose the sample size by setting the width of the confidence interval, rather than to rely on power calculations.
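The warning that a study sized for power may estimate the effect imprecisely can be illustrated with a short sketch. It uses the hypothesis-testing formula for two proportions that appears later in this article as formula (4), read here with the average of the two proportions in the first square-root term, and then computes the confidence interval half-width that such a study would achieve. The function names and the proportions 40% and 32% are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf  # Normal quantile function


def n_for_power(p1, p2, alpha=0.05, power=0.80):
    """Per-group size to detect p1 vs p2 (two-sided test, Normal approximation)."""
    p_bar = (p1 + p2) / 2
    a = Z(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
    b = Z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (a + b) ** 2 / (p1 - p2) ** 2


def ci_half_width(p1, p2, n, confidence=0.95):
    """Half-width of the confidence interval for p1 - p2 with n per group."""
    z = Z(1 - (1 - confidence) / 2)
    return z * sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)


p1, p2 = 0.40, 0.32  # anticipated proportions (illustrative)
n = n_for_power(p1, p2)
print(round(n), round(ci_half_width(p1, p2, n), 3))
```

Under these assumptions, roughly 560 subjects per group give 80% power, yet the resulting interval around an 8-point difference extends between 5 and 6 percentage points on either side. A tighter interval would need a considerably larger study, which is why the criterion has to be chosen to match the study objective.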

Allowance for losses

Losses to follow up occur in most longitudinal studies. Individuals may be lost because of various reasons like refusals, migration, death from a cause unrelated to the outcome of interest, etc. Such losses may produce bias, as the individuals who are lost often differ in important respects from those who remain in the study. The losses also reduce the size of the sample available for analysis, and this decreases the precision or power of the study. For these reasons, it is important to make every attempt to reduce the number of losses to a minimum. However, it is rarely possible to avoid losses completely. The reduced power or precision resulting from losses may be avoided by increasing the initial sample size in order to compensate for the expected number of losses. A 20% allowance is generally considered appropriate.

Practical constraints

Resources in terms of staff, vehicles, laboratory capacity, time or money may limit the potential size of a study, and it is often necessary to compromise between the estimated study size and what can be managed with the available resources. Trying to do a study that is beyond the capacity of the available resources is likely to be unfruitful, as data quality is likely to suffer and the results may be subject to serious bias, or the study may even collapse completely, thus wasting the effort and money that has already been put into it. If the calculations indicate that a study of manageable size will yield power and/or precision that is unacceptably low, it is probably better not to conduct the study at all.

Consequences of studies that are too small

Suppose first that the intervention under study has little or no effect on the outcome of interest. The difference observed in the study is therefore likely to be non-significant. However, the width of the confidence interval for the effect measure, for example relative risk, will depend upon sample size.
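The dependence of the interval width on sample size can be sketched for the relative risk. The snippet below uses the common log-based (Katz) interval, a standard method that this article does not itself specify, and assumes equal group sizes and observed risks equal to the stated values. The function name and the 10% risk are hypothetical.

```python
from math import exp, log, sqrt


def relative_risk_ci(p1: float, p2: float, n_per_group: int, z: float = 1.96):
    """95% CI for the relative risk p1/p2 using the log (Katz) method.

    Assumes equal group sizes and that the observed risks equal p1 and p2.
    """
    rr = p1 / p2
    se_log_rr = sqrt((1 - p1) / (n_per_group * p1) + (1 - p2) / (n_per_group * p2))
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)


# Hypothetical trial with no true effect: 10% risk in both groups.
for n in (100, 1000, 5000):
    rr, lower, upper = relative_risk_ci(0.10, 0.10, n)
    print(n, round(lower, 2), round(upper, 2))
```

In this sketch every interval contains the null value of 1, but with 100 subjects per group it also contains relative risks above 2, whereas with 5000 per group it stays close to 1.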
If the sample is small, the confidence interval will be very wide and, even though it will probably include the null value, it will extend to include large values of the effect measure. In other words, the study will have failed to establish that the intervention has no appreciable effect. In case the intervention does have an appreciable effect, a study that is too small will have low power, i.e. it will have little chance of giving a statistically significant difference.

FORMULAE FOR SAMPLE SIZE ESTIMATION

I. Estimating a population proportion: With specified absolute precision

Required information for estimating the sample size is as under:

- Anticipated population proportion: P; a rough estimate of P is sufficient.
- Desired confidence level
- Absolute precision (d): the percentage points of error that can be tolerated on each side of the figure obtained. For example, if the anticipated prevalence of TB infection in a population is 10%, we would be satisfied if our sample gives a figure of 8-12%. In this case, d = 2% = 0.02.

The sample size (n) is estimated using the following formula:

$$ n = \frac{Z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}} \tag{1} $$

P = anticipated proportion; d = absolute precision required on either side of the proportion. P and d are expressed as fractions. Z is a constant; its value for a two-sided test is 1.96 for 95% confidence, 1.645 for 90% confidence and 2.576 for 99% confidence.

Example: For a survey aimed at estimating vaccination coverage, P is usually anticipated at 0.5, d = 0.1 and the level of significance = 5%. The estimated sample size using the above formula is 96 for random sampling.

The estimated sample size is applicable only in case of simple random sampling (SRS). If another sampling method is used, a larger sample size is likely to be needed because of the design effect. For a cluster sampling strategy, the sample size estimated as above is multiplied by the design effect, which is defined as the ratio of the variance obtained in the cluster survey to the variance for the same sample size adopting SRS.

$$ \text{Design effect} = \frac{\text{Sample design variance}}{\text{Variance using SRS}} $$

In most sample surveys adopting a cluster-sampling strategy, a design effect of 2 is taken. This means twice as many individuals would have to be studied to obtain the same precision as with SRS. Thus, the estimated sample size in the above example would be 96 x 2 = 192. However, the design effect should ideally be estimated from previously available data or from pilot studies.

II. Estimating a population proportion: With specified relative precision

- Anticipated population proportion: P
- Confidence level
- Relative precision: ε. The sample result should fall within ε% of the true value.

The sample size (n) is estimated using the following formula:

$$ n = \frac{Z_{1-\alpha/2}^{2}\,(1-P)}{\varepsilon^{2} P} \tag{2} $$

Example: For estimating the prevalence of tuberculous infection among children of 0-9 years of age in a locality, how many should be included in the sample so that the prevalence may be obtained within 10% of the true value with 95% confidence?

- The anticipated P is 5-10%; P = 0.05 (the lower limit is to be taken to have a larger sample and better precision)
- Confidence level is 95%
- Relative precision (ε) = 10% = 0.1

With P = 0.05, formula (2) gives $n = (1.96)^{2} \times 0.95 / [(0.1)^{2} \times 0.05] \approx 7299$. The figure of 3457 is obtained when the upper limit, P = 0.10, is used instead. For cluster sampling, the sample size will be n x D (design effect).

III. Estimating the difference between two population proportions with specified absolute precision (two-sample situations), n equal in the two groups

- Anticipated population proportions = P_1 & P_2
- Confidence level = 95%
- Absolute precision required on either side of the true value of the difference between proportions = d

The sample size (n) in each group is estimated using the following formula:

$$ n = \frac{Z_{1-\alpha/2}^{2}\,[P_1(1-P_1) + P_2(1-P_2)]}{d^{2}} \tag{3} $$

P_1, P_2 = anticipated values of the proportions in the two populations.

Example: What sample size should be selected from each of two groups of people to estimate a risk difference to within 3 percentage points of the true difference at 95% confidence, when the anticipated P_1 and P_2 are 40% and 32% respectively?

$$ n = \frac{(1.96)^{2}\,[(0.4 \times 0.6) + (0.32 \times 0.68)]}{(0.03)^{2}} = 1953 $$

IV. Hypothesis testing for two population proportions

- Anticipated values of the population proportions: P_1 & P_2
- Level of significance
- Power of the test: 100 (1-β) %

The sample size (n) in each group is estimated using the following formula:

$$ n = \frac{\left\{ Z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right\}^{2}}{(P_1 - P_2)^{2}} \tag{4} $$

where $\bar{P} = (P_1 + P_2)/2$. Values of $Z_{1-\beta}$: for 90% power = 1.28; for 80% power = 0.84.

V. Estimating a population mean: With specified absolute precision

- Variance: σ², known or can be estimated from a pilot study
- Absolute precision: d

The sample size (n) is estimated using the following formula:

$$ n = \frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}} \tag{5} $$

σ = standard deviation estimated from a pilot study.

VI. Estimating a population mean: With specified relative precision

- Population mean: µ
- Variance: σ²
- Relative precision: ε

The sample size (n) is estimated using the following formula:

$$ n = \frac{Z_{1-\alpha/2}^{2}\,\sigma^{2}}{\varepsilon^{2}\mu^{2}} \tag{6} $$

VII. Estimating the difference between means of two populations with specified precision

- Difference in means = µ_1 - µ_2
- Standard deviations (SD) = s_1, s_2
- Absolute precision: d

First calculate the pooled variance and then estimate the sample size using formula (7):

$$ \text{Pooled variance } (\sigma^{2}) = \frac{(n_1 - 1)s_1^{2} + (n_2 - 1)s_2^{2}}{(n_1 - 1) + (n_2 - 1)} $$

$$ n = \frac{Z_{1-\alpha/2}^{2}\,[2\sigma^{2}]}{d^{2}} \tag{7} $$

Example: Suppose you want to know the sample sizes (equal groups) required for detecting a mean difference of 0.3 mg/ml between well-nourished and under-nourished children. This difference is considered to be clinically important. Suppose the mean and SD of Hemoglobin (Hb) levels available from an earlier study in a random sample of well-nourished and under-nourished groups were as follows:

Well-nourished (group 1): n_1 = 100, mean_1 = 10.1 and SD_1 = 0.9
Under-nourished (group 2): n_2 = 70, mean_2 = 9.7 and SD_2 = 1.1

The SDs do not differ too much and we can pool them. Thus,

$$ \text{Pooled variance } (\sigma^{2}) = \frac{99 \times 0.9^{2} + 69 \times 1.1^{2}}{99 + 69} = 0.97 $$

$$ n = \frac{(1.96)^{2}\,[2(0.97)]}{(0.3)^{2}} \approx 83 $$
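The precision-based formulae (1), (2), (3), (5), (6) and (7) above can be sketched in Python and checked against the worked examples. The function names are illustrative assumptions, the multiplying factor is taken from the Normal distribution rather than hard-coded as 1.96, and results are rounded to the nearest integer only because that is how the examples in the text appear to be rounded.

```python
from statistics import NormalDist


def z_value(confidence: float = 0.95) -> float:
    """Two-sided Normal multiplying factor (about 1.96 for 95% confidence)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_proportion_absolute(p, d, confidence=0.95):
    """Formula (1): one proportion, absolute precision d."""
    return z_value(confidence) ** 2 * p * (1 - p) / d ** 2


def n_proportion_relative(p, eps, confidence=0.95):
    """Formula (2): one proportion, relative precision eps."""
    return z_value(confidence) ** 2 * (1 - p) / (eps ** 2 * p)


def n_difference_proportions(p1, p2, d, confidence=0.95):
    """Formula (3): difference of two proportions, n per group."""
    return z_value(confidence) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2


def n_mean_absolute(sd, d, confidence=0.95):
    """Formula (5): one mean, absolute precision d."""
    return z_value(confidence) ** 2 * sd ** 2 / d ** 2


def n_mean_relative(sd, mean, eps, confidence=0.95):
    """Formula (6): one mean, relative precision eps."""
    return z_value(confidence) ** 2 * sd ** 2 / (eps ** 2 * mean ** 2)


def pooled_variance(n1, s1, n2, s2):
    """Pooled variance of two groups with sizes n1, n2 and SDs s1, s2."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))


def n_difference_means(variance, d, confidence=0.95):
    """Formula (7): difference of two means, n per group."""
    return z_value(confidence) ** 2 * 2 * variance / d ** 2


# Worked examples from the text.
n_srs = n_proportion_absolute(0.5, 0.1)
print(round(n_srs), round(n_srs * 2))                     # SRS, then design effect 2
print(round(n_proportion_relative(0.05, 0.1)))            # lower limit of P
print(round(n_proportion_relative(0.10, 0.1)))            # upper limit of P
print(round(n_difference_proportions(0.40, 0.32, 0.03)))  # example for formula (3)
var = pooled_variance(100, 0.9, 70, 1.1)
print(round(var, 2), round(n_difference_means(var, 0.3)))  # example for formula (7)
```

Returning unrounded values keeps the choice of rounding with the caller; rounding up is the more cautious convention when the figure is used for planning.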

VIII. Hypothesis testing for two population means

- Anticipated values of the population means: µ_1 & µ_2
- Standard deviations: s_1, s_2
- Level of significance
- Power of the test: 100 (1-β) %

The sample size (n) in each group is estimated using the following formula:

$$ \text{Pooled variance } (\sigma^{2}) = \frac{s_1^{2} + s_2^{2}}{2} $$

$$ n = \frac{2\sigma^{2}\,(Z_{1-\alpha/2} + Z_{1-\beta})^{2}}{(\mu_1 - \mu_2)^{2}} \tag{8} $$

STUDIES WITH MULTIPLE OUTCOMES

In most studies, several different outcomes are measured. For example, in a study of the efficacy of insecticide-treated mosquito-nets on childhood malaria, there may be interest in the effect of the intervention on deaths, deaths attributable to malaria, episodes of clinical malaria over a period of time, spleen sizes at the end of the malaria season, and packed cell volumes at the end of the season. The investigator should first focus attention on a few key outcomes and calculate the required study size for each of them. The outcome that results in the largest study size would be used to determine the size.

SAMPLE SIZE ESTIMATION FOR OTHER SITUATIONS

For more complex study designs, for example two groups of unequal size, comparison of more than two groups, incidence studies, and interventions allocated to communities, the methods of sample size estimation are more complex and may be referred to in a standard statistical text book. There are computer programs available that perform sample size calculations. In particular, this facility is available in the package Epi Info, though it does not cover the full range of possibilities.

SAMPLE SIZE IN QUALITATIVE RESEARCH

Sample size in qualitative research, e.g. Knowledge, Attitude and Practice (KAP) studies, will depend upon the expected response, e.g. the proportion of doctors using an intermittent regimen. Based on the expected response, the usual method for estimating sample size can be employed. However, in general, assessment of KAP cannot be performed on the basis of a single parameter. If we use an approach based on proportions, then we need to calculate the sample size for each parameter separately. In such situations, usually a score is assigned to the correct response to an item.
Thus, a total score for all the correct responses of each individual member is obtained. The total score can then be treated as a continuous or a dichotomous response for analysis. Therefore, the usual approach for sample size estimation, viz. estimating proportion(s) or mean(s), could be employed.
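As a closing illustration, the sketch below combines the hypothesis-testing formula for two means, formula (8), with the rule that the key outcome needing the largest study determines the study size. The reading of formula (8) used here (pooled variance taken as the simple average of the two variances), the multiplicative way of applying a 20% allowance for losses, the function names and all numerical inputs are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist().inv_cdf  # Normal quantile function


def n_two_means(mu1, mu2, s1, s2, alpha=0.05, power=0.80):
    """Formula (8) as read here: per-group size for testing mu1 against mu2."""
    pooled_var = (s1 ** 2 + s2 ** 2) / 2  # simple average, assuming equal groups
    return 2 * pooled_var * (Z(1 - alpha / 2) + Z(power)) ** 2 / (mu1 - mu2) ** 2


def allow_for_losses(n, allowance=0.20):
    """Inflate n by a fixed allowance for losses (assumed multiplicative)."""
    return ceil(n * (1 + allowance))


# Hypothetical key outcomes of one trial: name -> required size per group.
required = {
    "haemoglobin": n_two_means(10.1, 9.7, 0.9, 1.1),
    "packed cell volume": n_two_means(35.0, 33.5, 5.0, 5.0),
}
study_size = max(required.values())  # the largest requirement drives the study size
print({name: ceil(n) for name, n in required.items()})
print(ceil(study_size), allow_for_losses(study_size))
```

Sizing the study on the most demanding key outcome means the remaining outcomes are measured with at least the power planned for them.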