This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2009, The Johns Hopkins University and Saifuddin Ahmed. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided "AS IS"; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.
Methods in Sample Surveys 140.640, 3rd Quarter, 2009
Sample Size and Power Estimation
Saifuddin Ahmed, PhD
Biostatistics Department, School of Hygiene and Public Health, Johns Hopkins University
Sample size and power
"When statisticians are not making their livings producing confidence intervals and p-values, they are often producing power calculations." (Newson)
"In the planning of a sample survey, a stage is always reached at which a decision must be made about the size of the sample. The decision is important. Too large a sample implies a waste of resources, and too small a sample diminishes the utility of the results." (Cochran, 1977)
Sample size estimation: why?
It provides validity to clinical trials and intervention studies, and in fact to any research study, even presidential election polls.
It assures that the intended study will have the desired power to correctly detect a (clinically meaningful) difference in the entity under study if such a difference truly exists.
Sample size estimation has ONLY two objectives:
1. Measure with a given precision: precision analysis
2. Assure that the difference is correctly detected: power analysis
First objective: measure with a given precision
Whenever we propose to estimate population parameters, such as the population mean, proportion, or total, we need to estimate them with a specified level of precision. We would like to specify a sample size that is sufficiently large to ensure a high probability that errors of estimation stay within desired limits.
Stated mathematically: we want a sample size that ensures we can estimate a value, say p, from a sample, corresponding to the population parameter P. Since we cannot guarantee that p will be exactly equal to P, we allow some error. The error is limited to a certain extent; that is, it should not exceed some specified limit, say d.
We may express this as $p - P = \pm d$, i.e., the difference between the estimated p and the true P is not greater than d (the allowable error, or margin of error). But do we have any confidence that we will get a p whose error is within $\pm d$? In other words, we want a confidence level, say 95%, for our error bound d. That is, $1 - \alpha = 95\%$; it is common practice to set the α-error at 5%.
In probability terms, $\Pr\{-d \le p - P \le d\} \ge 1 - \alpha$. In English, we want the error of our estimated proportion, $p - P$, to lie between $-d$ and $+d$, and we want this to happen with probability $1 - \alpha$.
From our basic statistics course, we know that we can construct a confidence interval for P by $p \pm z_{1-\alpha/2}\,\mathrm{se}(p)$. Here $z_{1-\alpha/2}$ denotes a value on the abscissa of the standard normal distribution (assuming the sampling distribution of p is approximately normal), and $\mathrm{se}(p) = \sigma_p$ is the standard error. Setting $p \pm d = p \pm z_{1-\alpha/2}\,\sigma_p$, we relate d to the probability level: $d = z_{1-\alpha/2}\,\sigma_p = z_{1-\alpha/2}\sqrt{p(1-p)/n}$.
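If we square both sides, $d^2 = z_{1-\alpha/2}^2\,\dfrac{p(1-p)}{n}$. Solving for n gives $n = \dfrac{z_{1-\alpha/2}^2\,p(1-p)}{d^2} = \dfrac{z_{1-\alpha/2}^2\,pq}{d^2}$, where $q = 1 - p$.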
For the above example ($p = 0.4$, $d = 0.10$): $n = \dfrac{(1.96)^2 \times 0.4 \times 0.6}{(0.10)^2} = 92.2 \approx 93$.
Note that the sample size requirement is highest when p = 0.5. When no information about p is available, it is common practice to take p = 0.5 for a conservative estimate of the sample size. As an example, with p = 0.5, d = 0.05 (5% margin of error), and α-error = 0.05: $n = \dfrac{(1.96)^2 \times 0.5 \times 0.5}{(0.05)^2} = 384.16 \approx 385 \approx 400$.
. di 1.96^2*.5*(1-.5)/(.05^2)
384.16
. di (invnormal(.05/2))^2*.5*(1-.5)/(.05^2)
384.14588
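The precision formula can be checked with a short Python sketch. This is a minimal illustration, not course code. The function name `n_precision_proportion` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` (the exact z ≈ 1.959964 rather than the rounded 1.96).

```python
from math import ceil
from statistics import NormalDist


def n_precision_proportion(p: float, d: float, alpha: float = 0.05) -> int:
    """Sample size to estimate a proportion p within +/- d (absolute margin).

    Implements n = z^2 * p * (1 - p) / d^2 with z = z_{1 - alpha/2},
    rounded up to the next whole unit.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.959964 for alpha = 0.05
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


print(n_precision_proportion(0.4, 0.10))  # 93
print(n_precision_proportion(0.5, 0.05))  # 385 (conservative choice p = 0.5)
```

With d = 0.05, every other value of p gives a smaller n than p = 0.5. That is why 0.5 is the conservative default.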
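Stata: setting power to 0.5 ($z_\beta = 0$) makes sampsi reproduce the precision-based sample size.
. sampsi .5 .55, p(.5) onesample
Estimated sample size for one-sample comparison of proportion to hypothesized value
Test Ho: p = 0.5000, where p is the proportion in the population
Assumptions:
  alpha = 0.0500 (two-sided)
  power = 0.5000
  alternative p = 0.5500
Estimated required sample size:
  n = 385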
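Sample Size Estimation for Relative Differences
If d is a relative difference (a fraction of p), the absolute margin is $d\,p$, so $n = \dfrac{t^2\,p(1-p)}{(d\,p)^2} = \dfrac{t^2\,(1-p)}{d^2\,p}$.
Consider a 10% change relative to 0.40 in the above example. Then $d = 0.4 \times 0.10 = 0.04$; that is, p varies between 0.36 and 0.44. Now, $n = \dfrac{(1.96)^2 \times 0.4 \times 0.6}{(0.10 \times 0.4)^2} = 576.24 \approx 577$.
Note that the sample size is very sensitive to d.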
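Here is a sketch of the relative-margin version, under the same assumptions (Python standard library only; `n_relative_precision` is a hypothetical name). The relative margin `rel` is first converted to the absolute margin d = rel × p, and then the usual formula is applied.

```python
from math import ceil
from statistics import NormalDist


def n_relative_precision(p: float, rel: float, alpha: float = 0.05) -> int:
    """Sample size when the margin of error is a fraction `rel` of p.

    The absolute margin is d = rel * p, so
    n = z^2 * p * (1 - p) / (rel * p)^2 = z^2 * (1 - p) / (rel^2 * p).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * (1 - p) / (rel ** 2 * p))


print(n_relative_precision(0.4, 0.10))  # 577, i.e. d = 0.04
print(n_relative_precision(0.4, 0.05))  # 2305: halving rel roughly quadruples n
```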
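Change the variance: Sample Size for Continuous Data
For a continuous variable, replace $p(1-p)$ with the variance $\sigma^2$: $n = \dfrac{t^2\,\sigma^2}{d^2}$.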
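Here is the continuous-data formula in the same style, assuming σ is known in advance. The name `n_precision_mean` and the inputs σ = 10 and d = 2 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_precision_mean(sigma: float, d: float, alpha: float = 0.05) -> int:
    """Sample size to estimate a mean within +/- d when the SD is sigma: n = z^2 sigma^2 / d^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / d) ** 2)


# Hypothetical inputs: SD of 10 units, margin of error of 2 units.
print(n_precision_mean(10, 2))  # 97
```

Halving d quadruples n: `n_precision_mean(10, 1)` gives 385.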
Sources of variance information:
Published studies (concerns: geographical, contextual, and time issues affecting external validity)
Previous studies
Pilot studies
Study design and sample size
Sample size estimation depends on the study design, because the variance of an estimate depends on the study design. The variance formula we just used is based on simple random sampling (SRS). In practice, a pure SRS strategy is rarely used, so be aware of the study design.
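Sample Size Under SRS Without Replacement
We know that under SRSWOR, $V(\bar{y}) = \dfrac{\sigma^2}{n}\cdot\dfrac{N-n}{N-1}$.
So, under SRSWOR, $d = t_\alpha\sqrt{\dfrac{N-n}{N-1}\cdot\dfrac{P(1-P)}{n}}$, which gives $n = \dfrac{N P(1-P)}{(N-1)D + P(1-P)}$, where $D = d^2/t_\alpha^2$.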
For continuous data, $n = \dfrac{N\sigma^2}{(N-1)D + \sigma^2}$.
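Both SRSWOR formulas can be sketched with one hypothetical helper, `n_srswor`. It takes the unit-level variance: P(1 - P) for a proportion, or σ² for a continuous variable. As an assumption, the normal critical value stands in for t_α, which is reasonable at these sample sizes.

```python
from math import ceil
from statistics import NormalDist


def n_srswor(variance: float, d: float, N: int, alpha: float = 0.05) -> int:
    """Sample size under simple random sampling without replacement.

    n = N * variance / ((N - 1) * D + variance), with D = d^2 / t^2.
    Pass variance = P * (1 - P) for a proportion, or sigma^2 for a mean.
    """
    t = NormalDist().inv_cdf(1 - alpha / 2)  # normal value used in place of t_alpha
    D = d ** 2 / t ** 2
    return ceil(N * variance / ((N - 1) * D + variance))


print(n_srswor(0.4 * 0.6, 0.10, N=200))         # 64
print(n_srswor(0.4 * 0.6, 0.10, N=10_000_000))  # 93, the with-replacement answer
print(n_srswor(10 ** 2, 2, N=500))              # hypothetical mean: sigma = 10, d = 2
```

With a very large N, the finite population correction has no practical effect. The result then matches the with-replacement answer for p = 0.4, d = 0.10.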
Alternative specification (in two stages): first compute $n'$, the sample size estimated under simple random sampling with replacement (SRSWR). When sampling is without replacement, adjust $n'$ by $n = \dfrac{n'}{1 + n'/N}$ (n is adjusted for the finite population correction factor, $1 - n/N$).
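Example: for the exercise study, 93 samples are needed. Say the population size is N = 200. Under SRSWOR: $n = \dfrac{n'}{1 + n'/N} = \dfrac{93}{1 + 93/200} = \dfrac{93}{1.465} = 63.48 \approx 64$.
A smaller sample size is needed when the population size is small, but the opposite is not true: a large population does not call for a larger sample, because n only approaches $n'$ as N grows.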
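The two-stage adjustment is a one-liner. This sketch (hypothetical name `fpc_adjust`) also shows how the adjusted n approaches the SRSWR value as N grows, without ever exceeding it.

```python
def fpc_adjust(n0: float, N: float) -> float:
    """Two-stage adjustment: n = n0 / (1 + n0 / N), where n0 is the SRSWR sample size."""
    return n0 / (1 + n0 / N)


print(round(fpc_adjust(93, 200), 2))  # 63.48

# As N grows, the adjusted n rises toward n0 = 93 but never exceeds it.
for N in (200, 1_000, 10_000, 1_000_000):
    print(N, round(fpc_adjust(93, N), 2))
```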
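Derivation (alternative two-stage formula): under SRSWR, $V = S^2/n'$, so $n' = S^2/V$. Under SRSWOR, $V = \left(\dfrac{1}{n} - \dfrac{1}{N}\right)S^2$. Requiring the same precision V in both designs gives $\dfrac{S^2}{n'} = \dfrac{S^2}{n} - \dfrac{S^2}{N}$, so $\dfrac{1}{n} = \dfrac{1}{n'} + \dfrac{1}{N} > \dfrac{1}{n'}$, and hence $n = \dfrac{n'}{1 + n'/N} < n'$.
Remember the relationship between n and $n'$: $n < n'$, and $n \to n'$ as $N \to \infty$.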
Sample Size Based on Coefficient of Variation
In the above, the sample size is derived from an absolute measure of variation, σ. The coefficient of variation (cv) is a relative measure, in which the units of measurement cancel out because we divide by the mean. The coefficient of variation is therefore useful for comparing variables.
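The coefficient of variation is defined as $C = S_y/\bar{Y}$ and is estimated by $c = s_y/\bar{y}$.
The coefficient of variation (CV) of the mean is $CV(\bar{y}) = \dfrac{SE(\bar{y})}{\bar{y}} = \dfrac{s_y/\sqrt{n}}{\bar{y}} = \dfrac{c}{\sqrt{n}}$. So, $n = \dfrac{c^2}{CV(\bar{y})^2}$.
For a proportion, $c^2 = (1-P)/P$, so $CV(p) = \sqrt{\dfrac{1-P}{nP}}$ and $n = \dfrac{1-P}{P\,CV(p)^2}$.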
Caution about using the coefficient of variation (CV)
If the mean of a variable is close to zero, the CV estimate is large and unstable.
Next, consider the CV for binomial variables. For binary variables, the choice between P and Q = 1 - P does not affect the P(1 - P) estimate, but the CV differs. So the choice of P affects the sample size when the CV method is used.
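The caution about binary variables is easy to see numerically. This sketch assumes a hypothetical target CV of 10% for the estimated proportion and uses n = (1 - P)/(P × CV²). The function name `n_from_cv` is illustrative.

```python
from math import ceil


def n_from_cv(p: float, target_cv: float) -> int:
    """Sample size so that the estimated proportion p has coefficient of variation target_cv.

    For a binary variable the unit CV squared is (1 - p) / p,
    so n = (1 - p) / (p * target_cv^2).
    """
    n = (1 - p) / (p * target_cv ** 2)
    return ceil(round(n, 9))  # round first so floating-point noise cannot push n up by one


# Same binary variable, same p(1 - p) = 0.16, counted two ways (hypothetical 10% target CV):
print(n_from_cv(0.2, 0.10))  # 400 when the outcome counted has P = 0.2
print(n_from_cv(0.8, 0.10))  # 25 when the complementary outcome (P = 0.8) is counted
```

Both codings have the same P(1 - P) = 0.16, yet the required n differs sixteen-fold.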
Cost considerations for sample size
How many samples can you afford to interview, given the budget constraints?
C(n) = cost of taking n samples
c0 = fixed cost
c1 = cost for each sample interview
Then, C(n) = c0 + c1 × n.
Example:
C(n) = $10,000: your budget for survey implementation
c0 = $3,000: costs for interviewer training, questionnaire prints, etc.
c1 = $8.00: cost for each sample interview
10000 = 3000 + 8 × n, so n = 875.
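The budget constraint C(n) = c0 + c1 × n can be inverted to give n. This is a minimal sketch with a hypothetical helper, `affordable_n`. It rounds down so that the budget is never exceeded.

```python
from math import floor


def affordable_n(budget: float, fixed_cost: float, cost_per_interview: float) -> int:
    """Largest n with fixed_cost + cost_per_interview * n <= budget, from C(n) = c0 + c1 * n."""
    if budget < fixed_cost:
        raise ValueError("budget does not cover the fixed cost")
    return floor((budget - fixed_cost) / cost_per_interview)


print(affordable_n(10_000, 3_000, 8.00))  # 875
```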
Objective 2: Issues of Power Calculation
POWER: the power of a test is the probability of rejecting the null hypothesis when it is false.
TRICKS to REMEMBER:
R, T: Reject the null hypothesis if it is True: Type I error (alpha error) {one stick in R, T}
A, F: Accept the null hypothesis if it is False: Type II error (beta error) {two sticks in A, F}
POWER = 1 - Type II error = 1 - β
Power: reject the null hypothesis if it is false.
Another way: a Type I error is a false positive (YES instead of NO); a Type II error is a false negative (NO instead of YES).
[Figure: three panels, each showing the sampling distribution under Ho (left curve) and Ha (right curve) with alpha = .05 for a different sample size N; the area to the right of the critical line is the power, which increases with N.]
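To make the definition of power concrete, the following sketch computes the approximate power of a two-sided, one-sample z test of a proportion using the normal approximation. `power_one_sample_prop` is a hypothetical name. The inputs p0 = .5 and p1 = .55 echo the sampsi examples in this section. Power rises with n: it is about 0.50 at n = 385 and about 0.80 at n = 783.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sample_prop(p0: float, p1: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z test of Ho: p = p0 when the true proportion is p1.

    Normal approximation; the tiny chance of rejecting in the wrong tail is ignored.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) * sqrt(n) - z * sqrt(p0 * (1 - p0))
    return nd.cdf(shift / sqrt(p1 * (1 - p1)))


for n in (385, 783, 1500):
    print(n, round(power_one_sample_prop(0.50, 0.55, n), 3))
```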
We take power into consideration when we test hypotheses. Example: consider the following study questions:
a) What proportion of pregnant women received antenatal care? There is no hypothesis.
b) Did 80% of women receive antenatal care? There is a hypothesis: to test whether the estimated value is greater than, less than, or equal to a pre-specified value.
c) Are women in the project (intervention) area more likely to utilize antenatal care, compared with the control area? There is a hypothesis: to test whether P1 is greater than P2.
In terms of hypotheses:
Null hypothesis: Ho: P1 = P2, i.e., P1 - P2 = 0
Alternative hypothesis: Ha: P1 > P2 (one-sided); Ha: P1 ≠ P2 (two-sided), i.e., P1 - P2 ≠ 0
Issues: one-sided vs. two-sided tests
One-sided: the sample size will be smaller.
Two-sided: the sample size will be larger.
Always prefer "two-sided"; it is almost mandatory in clinical trials. Why? Uncertainty in prior (a priori) knowledge about the direction of the difference.
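How to incorporate "power" in sample size calculations?
1. Proportions: $n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2 \cdot 2\bar{p}(1-\bar{p})}{d^2}$, where $\bar{p} = (p_1 + p_2)/2$ and $d = p_1 - p_2$. Note: n is for each group.
Alternative (arcsine): $n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2}{2\left(\arcsin\sqrt{p_1} - \arcsin\sqrt{p_2}\right)^2}$. Why? The arcsine transformation provides a normal approximation for proportions, with variance approximately $1/(4n)$ regardless of p.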
For continuous variables: $n = \dfrac{2(z_{1-\alpha/2} + z_\beta)^2\,s^2}{d^2}$ (per group).
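Both alternatives can be sketched in Python. The function names are hypothetical. Angles are in radians, and the continuous example uses assumed values s = 10 and d = 2.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def z_sum(alpha: float = 0.05, power: float = 0.80) -> float:
    """z_{1-alpha/2} + z_beta for a two-sided test with the given power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_arcsine(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n using the arcsine transformation (angles in radians).

    Var(arcsin(sqrt(p_hat))) is about 1/(4n) whatever p is, so the difference
    between two groups of size n has variance 1/(2n).
    """
    h = asin(sqrt(p1)) - asin(sqrt(p2))
    return ceil(z_sum(alpha, power) ** 2 / (2 * h ** 2))


def n_two_means(s: float, d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n = 2 (z_{1-alpha/2} + z_beta)^2 s^2 / d^2 for two means with common SD s."""
    return ceil(2 * z_sum(alpha, power) ** 2 * s ** 2 / d ** 2)


print(n_arcsine(0.50, 0.55))  # 1565 per group
print(n_two_means(10, 2))     # 393 per group (hypothetical s = 10, d = 2)
```

For p1 = .50 and p2 = .55, the arcsine result is close to the pooled-variance formula, as expected when the proportions are near 0.5.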
Values of $z_{1-\alpha/2}$ and $z_\beta$ corresponding to specified values of significance level and power

Significance level    1%       5%       10%
Two-sided             2.576    1.960    1.645
One-sided             2.326    1.645    1.282

Power                 80%      90%      95%      99%
z_beta                0.84     1.282    1.645    2.326
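The table values can be regenerated from the standard normal quantile function. This sketch uses Python's `statistics.NormalDist`, and the helper names are hypothetical. The exact z for 80% power is 0.8416, which the table rounds to 0.84.

```python
from statistics import NormalDist

nd = NormalDist()


def z_two_sided(level: float) -> float:
    """Critical value z_{1 - alpha/2} for a two-sided test at significance level alpha."""
    return nd.inv_cdf(1 - level / 2)


def z_one_sided(level: float) -> float:
    """Critical value z_{1 - alpha} for a one-sided test."""
    return nd.inv_cdf(1 - level)


def z_power(power: float) -> float:
    """z_beta as used in the sample size formulas: the standard normal quantile at the target power."""
    return nd.inv_cdf(power)


for level in (0.01, 0.05, 0.10):
    print(f"{level:.0%}: two-sided {z_two_sided(level):.3f}, one-sided {z_one_sided(level):.3f}")
for power in (0.80, 0.90, 0.95, 0.99):
    print(f"power {power:.0%}: {z_power(power):.3f}")
```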
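How to incorporate "power" in sample size calculations?
a) Proportions: $n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2\,\mathrm{var}(\text{difference})}{(p_1 - p_2)^2}$, where $\mathrm{var}(\text{difference}) = \mathrm{var}(p_1 - p_2)$.
How do we estimate the variance of the difference? $\sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$.
Under the assumption of independence, $\mathrm{cov}(p_1, p_2) = \sigma_{12} = 0$.
If we also assume that $\mathrm{var}(p_1) = \mathrm{var}(p_2) = \mathrm{var}(p)$, i.e., the groups have a common variance $\sigma^2$, then $\sigma_d^2 = \sigma^2 + \sigma^2 = 2\sigma^2$.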
So, under the assumption of common variance, $n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2 \cdot 2\bar{p}\bar{q}}{(p_1 - p_2)^2}$, where $\bar{p} = (p_1 + p_2)/2$ and $\bar{q} = 1 - \bar{p}$.
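Here is a sketch of the common-variance formula (hypothetical name `n_two_prop_common_var`). It uses exact normal quantiles rather than 1.96 and 0.84.

```python
from math import ceil
from statistics import NormalDist


def n_two_prop_common_var(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n = (z_{1-alpha/2} + z_beta)^2 * 2 * pbar * qbar / (p1 - p2)^2, pbar = (p1 + p2)/2."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    pbar = (p1 + p2) / 2
    return ceil(z_sum ** 2 * 2 * pbar * (1 - pbar) / (p1 - p2) ** 2)


print(n_two_prop_common_var(0.50, 0.55))  # 1566 per group
```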
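The sample size formula for testing two proportions under independence, without the assumption of common variance, is then:
$n = \dfrac{(z_{1-\alpha/2} + z_\beta)^2\,[p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^2}$
Note that Fleiss (1981) suggested a more precise formula:
$n = \dfrac{\left\{z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + z_\beta\sqrt{p_1 q_1 + p_2 q_2}\right\}^2}{(p_1 - p_2)^2}$, where $\bar{p} = (p_1 + p_2)/2$ and $\bar{q} = 1 - \bar{p}$.
When $n_1$ and $n_2$ are not equal and are related by a ratio, say $r = n_2/n_1$, the formula is:
$n_1 = \dfrac{\left\{z_{1-\alpha/2}\sqrt{(r+1)\bar{p}\bar{q}} + z_\beta\sqrt{r p_1 q_1 + p_2 q_2}\right\}^2}{r(p_1 - p_2)^2}$, where $\bar{p} = (p_1 + r p_2)/(r + 1)$.
The final formula (using the normal approximation with continuity correction; without the correction, the power is lower than expected) is:
$n_1' = \dfrac{n_1}{4}\left\{1 + \sqrt{1 + \dfrac{2(r+1)}{n_1 r\,|p_1 - p_2|}}\right\}^2$
Stata has implemented this formula in the SAMPSI command.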
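The Fleiss formula with allocation ratio r and continuity correction can be sketched as below. This is an independent Python sketch (hypothetical name `n_fleiss`), not Stata's code. With exact normal quantiles, it reproduces the sampsi results shown in this section: 1565 without and 1605 with the correction for .5 vs .55, and 165 for .25 vs .40. It has not been checked against sampsi for other inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_fleiss(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80,
             r: float = 1.0, continuity: bool = True) -> int:
    """Size of group 1 (group 2 gets r * n1) for a two-sided test of Ho: p1 = p2.

    Uncorrected Fleiss formula, optionally followed by the Fleiss continuity correction.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)
    diff = abs(p1 - p2)
    pbar = (p1 + r * p2) / (1 + r)
    num = (z_a * sqrt((r + 1) * pbar * (1 - pbar))
           + z_b * sqrt(r * p1 * (1 - p1) + p2 * (1 - p2)))
    n1 = num ** 2 / (r * diff ** 2)
    if continuity:
        n1 = n1 / 4 * (1 + sqrt(1 + 2 * (r + 1) / (n1 * r * diff))) ** 2
    return ceil(n1)


print(n_fleiss(0.50, 0.55, continuity=False))  # 1565
print(n_fleiss(0.50, 0.55))                    # 1605
print(n_fleiss(0.25, 0.40))                    # 165
```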
Stata implementation
NO hypothesis:
. sampsi .5 .55, p(.5) onesample
Estimated required sample size: n = 385
The study has a hypothesis, comparing with a hypothesized value:
. sampsi .5 .55, p(.8) onesample
Estimated sample size for one-sample comparison of proportion to hypothesized value
Estimated required sample size: n = 783
Adding 80% power multiplies the sample size by about $\left((1.96 + 0.84)/1.96\right)^2$:
. di 783/385
2.033766
. di (1.96+.84)^2/1.96^2
2.040816
The study has a hypothesis, comparing two groups:
. sampsi .5 .55, p(.8)
Estimated sample size for two-sample comparison of proportions
Estimated required sample sizes: n1 = 1605, n2 = 1605
Stata implementation
. sampsi .5 .55, p(.8) nocontinuity
Estimated sample size for two-sample comparison of proportions
Test Ho: p1 = p2, where p1 is the proportion in population 1 and p2 is the proportion in population 2
Assumptions:
  alpha = 0.0500 (two-sided)
  power = 0.8000
  p1 = 0.5000
  p2 = 0.5500
  n2/n1 = 1.00
Estimated required sample sizes:
  n1 = 1565
  n2 = 1565
. di 783*2
1566
In each group, the sample size is roughly doubled compared with the one-sample comparison.
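The one-sample results can be reproduced with the usual normal-approximation formula n = [z_{1-α/2}√(p0 q0) + z_β√(pa qa)]² / (pa - p0)². We assume this is what the onesample option computes. The sketch below (hypothetical name `n_one_sample_prop`) matches the 783 and 385 shown above. With power = 0.5 (z_β = 0), it collapses to the precision formula.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_sample_prop(p0: float, pa: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """n for a two-sided one-sample test of Ho: p = p0 against the alternative p = pa."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)  # equals 0 when power = 0.5
    num = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(pa * (1 - pa))
    return ceil((num / (pa - p0)) ** 2)


print(n_one_sample_prop(0.50, 0.55))             # 783
print(n_one_sample_prop(0.50, 0.55, power=0.5))  # 385, the precision-only answer
```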
Power graph in Stata
[Figure: "Sample Size and Power for P1=.5 and P2=.55", plotting power (y-axis, .80 to .95) against sample size n (x-axis, about 1500 to 3500).]
*Calculate and plot sample size by power from .80 to .99
***************************************************************************
args p1 p2 type
clear
set obs 20
gen n = .
gen power = .
local i 0
while `i' < _N {
    local i = `i' + 1
    local j = .79 + `i'/100
    quietly sampsi `p1' `p2', p(`j') `type'
    replace n = r(N_1) in `i'
    replace power = r(power) in `i'
}
noisily list power n
graph twoway line power n, title("Sample Size and Power for P1=`p1' and P2=`p2' `type'")
*****************************************************************************
Save the above commands as a do-file (e.g., sample_graph.do). Execute it with the two proportions as arguments, optionally followed by a sampsi option such as onesample or nocontinuity, e.g.: run sample_graph .5 .55
Sample size determination when expressed as relative risk
In epidemiological studies, the hypothesis is often expressed as a relative risk or odds ratio, e.g., H0: R = 1. A sample size formula given in Donner (1983) for relative risk ($R \ne 1.0$) is:
$n = \dfrac{\left\{z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + z_\beta\sqrt{P_C\left[1 + R - P_C(1 + R^2)\right]}\right\}^2}{\left[P_C(1 - R)\right]^2}$, where $\bar{P} = P_C(1 + R)/2$ and $R = P_E/P_C$.
This is nothing but the Fleiss formula:
$n = \dfrac{\left\{z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + z_\beta\sqrt{p_E(1-p_E) + p_C(1-p_C)}\right\}^2}{(p_C - p_E)^2}$, where $\bar{p} = (p_E + p_C)/2$.
Note that $R = P_E/P_C$, so $P_E = R P_C$.
Solution: replace every $P_E$ with $R P_C$ and apply the Fleiss formula.
How Donner's formula was derived:
$\bar{P} = (P_E + P_C)/2 = (RP_C + P_C)/2 = P_C(1 + R)/2$
$P_E(1-P_E) + P_C(1-P_C) = RP_C(1 - RP_C) + P_C(1 - P_C) = RP_C - R^2P_C^2 + P_C - P_C^2 = P_C\left[1 + R - P_C(1 + R^2)\right]$
and $(P_C - P_E)^2 = (P_C - RP_C)^2 = \left[P_C(1 - R)\right]^2$.
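The claim that Donner's formula is the Fleiss formula with P_E = R·P_C can be checked numerically. In this sketch, both helper names are hypothetical, α = 0.05 and power = 0.80 are fixed, and the control risk 0.25 with R = 1.6 is an illustrative input.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
Z_A = nd.inv_cdf(0.975)  # two-sided alpha = 0.05
Z_B = nd.inv_cdf(0.80)   # power = 0.80


def n_donner(pc: float, R: float) -> float:
    """Donner's relative-risk formula (no continuity correction); R = P_E / P_C."""
    pbar = pc * (1 + R) / 2
    num = Z_A * sqrt(2 * pbar * (1 - pbar)) + Z_B * sqrt(pc * (1 + R - pc * (1 + R ** 2)))
    return num ** 2 / (pc * (1 - R)) ** 2


def n_fleiss_uncorrected(pe: float, pc: float) -> float:
    """Fleiss formula for two proportions (no continuity correction)."""
    pbar = (pe + pc) / 2
    num = Z_A * sqrt(2 * pbar * (1 - pbar)) + Z_B * sqrt(pe * (1 - pe) + pc * (1 - pc))
    return num ** 2 / (pc - pe) ** 2


# Hypothetical inputs: control risk 0.25, relative risk 1.6, so P_E = 0.40.
print(round(n_donner(0.25, 1.6), 2), round(n_fleiss_uncorrected(1.6 * 0.25, 0.25), 2))
```

Both printed values agree, as the algebra above says they must.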
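Sample size for odds-ratio (OR) estimates:
$OR = \dfrac{P_1 Q_2}{P_2 Q_1} = \dfrac{P_1(1 - P_2)}{P_2(1 - P_1)}$. Solving for $P_1$: $OR \cdot P_2(1 - P_1) = P_1 Q_2$, so $P_1 = \dfrac{OR \cdot P_2}{Q_2 + OR \cdot P_2} = \dfrac{OR \cdot P_2}{1 + P_2(OR - 1)}$.
It is convenient to do this in two stages:
1. Estimate $P_1$ from the odds ratio (OR).
2. Apply the proportion method (of Fleiss).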
An example: suppose we want to detect an OR of 2, using a 1:1 ratio of cases to controls, in a population with an expected exposure proportion in non-cases of 0.25, while requiring α = 0.05 and power = 0.8. How do we estimate the sample size?
EpiTable calculates m1 = m2 = 165 (total sample size 330).
So, with P2 = 0.25, P1 = (2 × 0.25)/(2 × 0.25 + 0.75) = 0.4.
In Stata:
. sampsi .25 .40, p(.8)
Estimated required sample sizes: n1 = 165, n2 = 165
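Stage 1 of the two-stage odds-ratio method is shown below as a small sketch (hypothetical name `p1_from_odds_ratio`).

```python
def p1_from_odds_ratio(odds_ratio: float, p2: float) -> float:
    """Exposure proportion among cases implied by an odds ratio and the control exposure p2.

    P1 = OR * P2 / (Q2 + OR * P2), with Q2 = 1 - P2.
    """
    return odds_ratio * p2 / ((1 - p2) + odds_ratio * p2)


p1 = p1_from_odds_ratio(2, 0.25)
print(round(p1, 4))  # 0.4

# Round trip: the odds ratio implied by (p1, p2) should be 2 again.
print(round(p1 * (1 - 0.25) / (0.25 * (1 - p1)), 6))  # 2.0
```

Stage 2 then feeds P1 = 0.4 and P2 = 0.25 into the Fleiss proportion formula, which is what the sampsi call above does.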
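SAMPLE SIZE determination for Logistic Regression Models
Consider a logistic regression, $\log\dfrac{p}{1-p} = \mathrm{logit}(p) = \alpha + \beta x$. We want to estimate the sample size needed to achieve a certain power for testing the null hypothesis Ho: β = 0. Recall that null hypothesis testing depends on the variance of $\hat{\beta}$. In logistic regression, the effect size is expressed as the log odds ratio ($\eta$). Hsieh (1989) suggested the following formula for a one-sided test:
$n = \dfrac{\left[z_{1-\alpha} + z_\beta \exp(-\eta^2/4)\right]^2 (1 + 2\hat{p}\,\delta)}{\hat{p}\,\eta^2}$, where $\delta = \dfrac{1 + (1 + \eta^2)\exp(5\eta^2/4)}{1 + \exp(-\eta^2/4)}$ and $\hat{p}$ is the anticipated event proportion.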
Say you want to examine an effect size of odds ratio 1.5, i.e., a log odds ratio of log(1.5) = 0.405465 ≈ 0.41. See the implementation of the formula in Stata:
. clear
. set obs 1
obs was 0, now 1
. *Enter "odds-ratio"
. gen p = 0.10
. gen or = 1.5
. gen beta = log(or)
. di beta
.405465
. gen delta = (1+(1+beta^2)*exp(5*beta^2/4))/(1+exp(-beta^2/4))
. di delta
1.2399909
. di "n = " (1.645+1.282*exp(-beta^2/4))^2*(1+2*p*delta)/(p*beta^2)
n = 627.6987
So, n = 628 ≈ 630.
Sample Size for Multiple Logistic Regression
Multiple logistic regression requires a larger n to detect effects. Let R denote the multiple correlation between the independent variable of interest, X, and the other covariates. Then the sample size is $n' = n/(1 - R^2)$. Say R = 0.25; then $n' = 630/(1 - 0.25^2) = 672$.
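Here are Hsieh's formula and the multiple-correlation adjustment, sketched in Python. The helper names are hypothetical. The inputs follow the Stata session: OR = 1.5, event proportion 0.10, one-sided α = 0.05, and power 0.90 (since z_β = 1.282). Exact normal quantiles are used, so the unrounded n (about 627.4) differs slightly from the Stata display but still rounds up to 628.

```python
from math import ceil, exp, log
from statistics import NormalDist


def n_hsieh(odds_ratio: float, p: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Hsieh (1989) sample size for a simple logistic regression, one-sided test.

    eta = log(odds_ratio); p is the anticipated event proportion.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha)  # one-sided critical value
    z_b = nd.inv_cdf(power)
    eta2 = log(odds_ratio) ** 2
    delta = (1 + (1 + eta2) * exp(5 * eta2 / 4)) / (1 + exp(-eta2 / 4))
    return (z_a + z_b * exp(-eta2 / 4)) ** 2 * (1 + 2 * p * delta) / (p * eta2)


def adjust_for_covariates(n: float, multiple_r: float) -> float:
    """Inflate n for a covariate correlated (multiple correlation R) with the others: n / (1 - R^2)."""
    return n / (1 - multiple_r ** 2)


print(ceil(n_hsieh(1.5, 0.10)))                  # 628
print(ceil(adjust_for_covariates(630, 0.25)))    # 672
```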
Stata's add-on programs for sample size estimation:
STPOWER: survival studies
SAMPSI_REG: linear regression
SAMPCLUS: cluster sampling
ART: randomized trials with survival time or binary outcome
XSAMPSI: cross-over trials
SAMPLESIZE: graphical results
MVSAMPSI: multivariate regression
STUDYSI: comparative study with binary or time-to-event outcome
SSKAPP: kappa statistics measure of interrater agreement
CACLSI: log-rank/binomial test
Additional topics to be covered:
Sample allocation in stratified sampling
Sample size corrected for design effect (deff)
Optimal sample size per cluster
Sample size for clusters
Sample size and power for pre-post surveys in program evaluation