Methods in Sample Surveys 140.640 3rd Quarter, 2009


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2009, The Johns Hopkins University and Saifuddin Ahmed. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided "AS IS"; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Methods in Sample Surveys 140.640, 3rd Quarter, 2009
Sample Size and Power Estimation
Saifuddin Ahmed, PHD
Biostatistics Department, School of Hygiene and Public Health, Johns Hopkins University

Sample size and Power
"When statisticians are not making their lives producing confidence intervals and p-values, they are often producing power calculations." Newson, 00

"In planning of a sample survey, a stage is always reached at which a decision must be made about the size of the sample. The decision is important. Too large a sample implies a waste of resources, and too small a sample diminishes the utility of the results." Cochran, 1977

Sample size estimation: Why?
- Provides validity of clinical trials and intervention studies; in fact of any research study, even presidential election polls.
- Assures that the intended study will have a desired power for correctly detecting a (clinically meaningful) difference in the entity under study, if such a difference truly exists.

Sample size estimation: ONLY two objectives
- Measure with a precision: precision analysis
- Assure that the difference is correctly detected: power analysis

First objective: measure with a precision
Whenever we propose to estimate population parameters, such as a population mean, proportion, or total, we need to estimate with a specified level of precision. We like to specify a sample size that is sufficiently large to ensure a high probability that errors of estimation can be limited within desired limits.

Stated mathematically: we want a sample size to ensure that we can estimate a value, say p, from a sample which corresponds to the population parameter, P. Since we may not guarantee that p will be exactly equal to P, we allow some error. The error is limited to a certain extent, that is, this error should not exceed some specified limit, say d.

We may express this as: $p - P = \pm d$, i.e., the difference between the estimated p and the true P is not greater than d (allowable error: margin of error).
But do we have any confidence that we can get a p that is not farther from P than ±d? In other words, we want some confidence limits, say 95%, on our error estimate d. That is, $1 - \alpha = 95\%$. It is a common practice to take α-error = 5%.

In probability terms, that is, $\Pr\{-d \le p - P \le d\} \ge 1 - \alpha$.
In English, we want our estimated proportion p to lie within ±d of the true proportion P, and we like to place our confidence that this will occur with probability $1 - \alpha$.

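From our basic statistical course, we know that we can construct a confidence interval for p by:
$$p \pm z_{1-\alpha/2}\,\mathrm{se}(p)$$
where $z_{1-\alpha/2}$ denotes a value on the abscissa of a standard normal distribution (from an assumption that the sample elements are normally distributed) and $\mathrm{se}(p) = \sigma_p$ is the standard error.
Hence, we relate $p \pm d$ in probabilities such that:
$$p \pm d = p \pm z_{1-\alpha/2}\,\sigma_p \quad\Rightarrow\quad d = z_{1-\alpha/2}\,\sigma_p = z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
If we square both sides,
$$d^2 = z_{1-\alpha/2}^2\,\sigma_p^2 = z_{1-\alpha/2}^2\,\frac{p(1-p)}{n} \quad\Rightarrow\quad n = \frac{z_{1-\alpha/2}^2\,p(1-p)}{d^2}$$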

For the above example:
$$n = \frac{(1.96)^2 \times 0.4 \times 0.6}{(0.10)^2} = 92.2 \approx 93$$

Note that the sample size requirement is highest when p = 0.5. It is a common practice to take p = 0.5 when no information is available about p, for a conservative estimation of sample size.
As an example, p = 0.5, d = 0.05 (5% margin of error), and α-error = 0.05:
$$n = \frac{(1.96)^2 \times 0.5 \times 0.5}{(0.05)^2} = 384.16 \approx 385 \approx 400$$
In Stata:
  . di 1.96^2*.5*(1-.5)/(.05^2)
  384.16
  . di (invnorm(.05/2))^2*.5*(1-.5)/(.05^2)
  384.14588
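As a hedged illustration, the precision formula $n = z^2 p(1-p)/d^2$ can also be sketched in Python (the slides themselves use Stata). The helper name `n_for_proportion` is hypothetical, not part of any package, and it takes the z-value from the standard library's `NormalDist` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, d, alpha=0.05):
    """Sample size to estimate a proportion p within +/- d (absolute),
    using n = z^2 * p * (1 - p) / d^2 with a two-sided z at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2 * p * (1 - p) / d ** 2

n_raw = n_for_proportion(0.5, 0.05)      # conservative choice p = 0.5
print(round(n_raw, 5), math.ceil(n_raw))  # raw value, then rounded up
```

With p = 0.5 and d = 0.05 this should reproduce the 384.14588 shown by the second Stata command, rounded up to 385.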

Stata
  . sampsi .5 .55, p(.5) onesample
  Estimated sample size for one-sample comparison of proportion to hypothesized value
  Test Ho: p = 0.5000, where p is the proportion in the population
  Assumptions:
    alpha = 0.0500 (two-sided)
    power = 0.5000
    alternative p = 0.5500
  Estimated required sample size: n = 385

Sample Size Estimation for Relative Differences
If d is a relative difference,
$$n = \frac{t^2\,p(1-p)}{(d \cdot p)^2} = \frac{t^2\,(1-p)}{d^2\,p}$$
Consider that a 10% change is relative to 0.40 in the above example. Then d = 0.4 × 0.10 = 0.04, that is, p varies between 0.36 and 0.44. Now,
$$n = \frac{(1.96)^2 \times 0.4 \times 0.6}{(0.10 \times 0.4)^2} = 576.24 \approx 577$$
Note: the sample size calculation is very sensitive to d.
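The sensitivity to d can be checked with a short Python sketch that places the absolute and relative versions side by side. The function name `n_relative` is hypothetical, and the rounded z = 1.96 is used so the numbers match the slide arithmetic.

```python
import math

def n_relative(p, rel_d, z=1.96):
    """n = z^2 * (1 - p) / (rel_d^2 * p), where rel_d is a fraction of p."""
    return z ** 2 * (1 - p) / (rel_d ** 2 * p)

n_abs = 1.96 ** 2 * 0.4 * 0.6 / 0.10 ** 2  # absolute d = 0.10
n_rel = n_relative(0.4, 0.10)              # relative d = 10% of p = 0.4
print(math.ceil(n_abs), math.ceil(n_rel))
```

Reading the same "10%" as relative rather than absolute shrinks d from 0.10 to 0.04 and raises the requirement from 93 to 577.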

Sample Size for Continuous Data
Change the variance:
$$n = \frac{t^2\,\sigma^2}{d^2}$$

Sources of variance information:
- Published studies (concerns: geographical, contextual, and time issues, i.e., external validity)
- Previous studies
- Pilot studies

Study design and sample size
- Sample size estimation depends on the study design, because the variance of an estimate depends on the study design.
- The variance formula we just used is based on simple random sampling (SRS).
- In practice, an SRS strategy is rarely used.
- Be aware of the study design.

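Sample Size Under SRS Without Replacement
We know that under SRSWOR,
$$V(\bar{y}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$
So, under SRSWOR:
$$d = t_\alpha\sqrt{\frac{P(1-P)}{n}\cdot\frac{N-n}{N-1}} \quad\Rightarrow\quad n = \frac{N\,P(1-P)}{(N-1)\,D + P(1-P)}, \quad\text{where } D = d^2/t_\alpha^2$$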

For continuous data,
$$n = \frac{N\,\sigma^2}{(N-1)\,D + \sigma^2}$$
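A minimal Python sketch of the SRSWOR formula follows, assuming the form $n = N v / ((N-1)D + v)$ with $D = d^2/t^2$, where v is P(1-P) for a proportion or σ² for continuous data. The function name `n_srswor` and the illustrative inputs (P = 0.4, d = 0.10, N = 200) are assumptions chosen to match the running example.

```python
def n_srswor(variance, d, N, t=1.96):
    """Sample size under simple random sampling without replacement.
    variance: P*(1-P) for a proportion, or sigma^2 for continuous data."""
    D = d ** 2 / t ** 2
    return N * variance / ((N - 1) * D + variance)

small_pop = n_srswor(0.4 * 0.6, 0.10, N=200)
huge_pop = n_srswor(0.4 * 0.6, 0.10, N=10 ** 9)
print(round(small_pop, 2), round(huge_pop, 2))
```

For a very large N the result approaches the infinite-population value $t^2 P(1-P)/d^2$, so the finite-population formula only matters when n is a noticeable fraction of N.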

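Alternative Specification (in two stages):
First, the sample size, say $n'$, is estimated under simple random sampling with replacement (SRSWR). When sampling is without replacement, we adjust $n'$ by
$$n = \frac{n'}{1 + n'/N}, \quad\text{or equivalently}\quad n = \frac{N\,n'}{N + n'}$$
(n is adjusted for the finite population correction factor, $1 - n/N$).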

Example: For the exercise study, 93 samples are needed. Say the population size is 200. Under SRSWOR:
$$n = \frac{n'}{1 + n'/N} = \frac{93}{1 + 93/200} = \frac{93}{1 + 0.465} = \frac{93}{1.465} = 63.48 \approx 64$$
A smaller sample size is needed when the population size is small, but the opposite is not true.
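The two-stage adjustment is easy to sketch in Python. The helper name `fpc_adjust` is hypothetical; the inputs are the ones used in the example (n' = 93, N = 200).

```python
import math

def fpc_adjust(n0, N):
    """Adjust an SRSWR sample size n0 for a finite population of size N."""
    return n0 / (1 + n0 / N)

n = fpc_adjust(93, 200)
print(round(n, 2), math.ceil(n))

# With a very large population the correction is negligible.
print(round(fpc_adjust(93, 10 ** 7), 2))
```

This illustrates the closing remark: a small population reduces the required sample, but a large population does not increase it beyond n'.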

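Derivation (alternative two-stage formula):
Equate the SRSWOR variance at sample size n with the SRSWR variance at sample size $n'$:
$$\frac{S^2}{n}\left(1 - \frac{n}{N}\right) = \frac{S^2}{n'} \;\Rightarrow\; \frac{1}{n} - \frac{1}{N} = \frac{1}{n'} \;\Rightarrow\; \frac{1}{n} = \frac{N + n'}{N\,n'} \;\Rightarrow\; n = \frac{N\,n'}{N + n'} = \frac{n'}{1 + n'/N}$$
Remember the relationship between $\sigma^2$ and $S^2$: $S^2 = \frac{N}{N-1}\,\sigma^2$.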

Sample Size Based on Coefficient of Variation
In the above, the sample size is derived from an absolute measure of variation, $\sigma^2$. The coefficient of variation (cv) is a relative measure, in which the units of measurement are canceled by dividing by the mean. The coefficient of variation is useful for comparison of variables.

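Coefficient of variation is defined as $C_y = S_y/\bar{Y}$, and is estimated by $c_y = s_y/\bar{y}$.
Coefficient of variation (CV) of the mean is
$$CV(\bar{y}) = \frac{SE(\bar{y})}{\bar{y}} = \frac{s_y/\sqrt{n}}{\bar{y}}$$
So,
$$n = \frac{s_y^2}{CV^2\,\bar{y}^2}$$
For a proportion,
$$CV(p) = \sqrt{\frac{1-p}{n\,p}} \quad\Rightarrow\quad n = \frac{1-p}{p\,CV^2}$$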

Caution about using coefficient of variation (CV)
If the mean of a variable is close to zero, the CV estimate is large and unstable.
Next, consider CV for binomial variables. For binary variables, the choice of P and Q = 1 - P does not affect the P(1-P) estimate, but the CV differs. So, the choice of P affects sample size when the CV method is used.
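The caution about binary variables can be made concrete with a small Python sketch. It assumes the proportion form $n = (1-p)/(p \cdot CV^2)$, which follows from $CV(p) = \sqrt{(1-p)/(np)}$; the function name `n_from_cv` and the target CV of 5% are illustrative assumptions.

```python
def n_from_cv(p, cv):
    """Sample size so that the coefficient of variation of an estimated
    proportion p equals cv, assuming CV(p) = sqrt((1 - p) / (n * p))."""
    return (1 - p) / (p * cv ** 2)

# Coding the same binary variable as P = 0.1 or as Q = 0.9 gives the
# same variance P*(1-P), but very different CV-based sample sizes.
n_p = n_from_cv(0.1, 0.05)
n_q = n_from_cv(0.9, 0.05)
print(round(n_p), round(n_q, 1))
```

The two codings describe the same data, yet the CV-based requirement differs by a factor of 81, which is why the choice of P matters when this method is used.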

Cost considerations for sample size
How many samples can you afford to interview, given the budget constraints?
  C(n) = cost of taking n samples
  c_o = fixed cost
  c_1 = cost for each sample interview
Then C(n) = c_o + c_1 × n.
Example:
  C(n) = $10000: your budget for survey implementation
  c_o = $3000: costs for interviewer training, questionnaire prints, etc.
  c_1 = $8.00: cost for each sample interview
  10000 = 3000 + 8 × n
So, n = 875.
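The budget constraint can be inverted directly. The following Python sketch assumes the linear cost model C(n) = c_o + c_1 × n stated above; the function name `affordable_n` is hypothetical.

```python
import math

def affordable_n(budget, fixed_cost, cost_per_interview):
    """Largest n satisfying fixed_cost + cost_per_interview * n <= budget."""
    return math.floor((budget - fixed_cost) / cost_per_interview)

print(affordable_n(10000, 3000, 8.00))
```

Rounding down (rather than up, as in the precision formulas) keeps the survey within budget.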

Objective 2: Issues of Power Calculation
POWER: The power of a test is the probability of rejecting the null hypothesis if it is incorrect.
TRICKS to REMEMBER:
  R, T: Reject the null hypothesis if it is true: Type I error (alpha error) {one stick in R, T}
  A, F: Accept the null hypothesis if it is false: Type II error (beta error) {two sticks in A, F}
POWER = 1 - type II error
Power: reject the null hypothesis if it is false.
Another way: False Positive (YES instead of NO)? False Negative (NO instead of YES)?

[Figure: three panels, one per sample size N, each at alpha = .05. Ho = left curve, Ha = right curve; the area to the right of the line is the power.]

We take power into consideration when we test hypotheses.
Example: Consider the following study questions:
a) What proportion of pregnant women received antenatal care?
   There is no hypothesis.
b) Did 80% of women receive antenatal care?
   There is a hypothesis: to test that the estimated value is greater than, less than, or equal to a pre-specified value.
c) Are women in the project (intervention) area more likely to utilize antenatal care, compared to the control area?
   There is a hypothesis: to test that P1 is greater than P2.
In terms of hypotheses:
  Null hypothesis: Ho: P1 = P2, i.e., P1 - P2 = 0
  Alternative hypothesis: Ha: P1 > P2 (one-sided), or Ha: P1 ≠ P2 (two-sided), i.e., P1 - P2 ≠ 0

Issues: One-sided vs. two-sided tests.
- One-sided: sample size will be smaller.
- Two-sided: sample size will be larger.
Always prefer "two-sided"; it is almost mandatory in clinical trials. Why? Uncertainty in knowledge (a priori).

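How to incorporate "power" in sample size calculations?
Proportions:
$$n = \frac{(t_\alpha + t_\beta)^2 \cdot 2\,\bar{p}(1-\bar{p})}{d^2}, \quad\text{where } \bar{p} = (p_1 + p_2)/2$$
Note: n is for each group.
Alternative:
$$n = \frac{(Z_\alpha + Z_\beta)^2}{2\left(\arcsin\sqrt{p_1} - \arcsin\sqrt{p_2}\right)^2}$$
Why? The arcsine transformation provides a normal approximation to proportion quantities.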

For continuous variables (n for each group, with common standard deviation s):
$$n = 2\left[\frac{Z_{\alpha/2} + Z_\beta}{d/s}\right]^2$$

Values of $Z_{1-\alpha/2}$ and $Z_\beta$ corresponding to specified values of significance level and power

  Significance level    1%       5%       10%
  Two-sided             2.576    1.960    1.645
  One-sided             2.326    1.645    1.282

  Power                 80%      90%      95%      99%
  Z_beta                0.84     1.282    1.645    2.326
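The tabulated values can be regenerated rather than looked up. This Python sketch uses only the standard library's `NormalDist.inv_cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse CDF of the standard normal

levels = [0.01, 0.05, 0.10]
two_sided = [round(z(1 - a / 2), 3) for a in levels]
one_sided = [round(z(1 - a), 3) for a in levels]

powers = [0.80, 0.90, 0.95, 0.99]
z_beta = [round(z(pw), 3) for pw in powers]

print("two-sided:", two_sided)
print("one-sided:", one_sided)
print("z_beta   :", z_beta)
```

To three decimals the 80% power value is 0.842, which the table rounds to 0.84.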

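How to incorporate "power" in sample size calculations?
a) Proportions:
$$n = \frac{(z_{\alpha/2} + z_\beta)^2 \times \text{variance of difference } [\mathrm{var}(p_1 - p_2)]}{(p_1 - p_2)^2}$$
How to estimate the variance of the difference?
$$\sigma_d^2 = \sigma_{p_1 - p_2}^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$$
Under the assumption of independence, $\mathrm{cov}(p_1, p_2) = \sigma_{12} = 0$.
If we also assume that $\mathrm{var}(p_1) = \mathrm{var}(p_2) = \mathrm{var}(p)$, i.e., a common variance,
$$\sigma_d^2 = \sigma^2 + \sigma^2 = 2\sigma^2$$
So, under the assumption of common variance,
$$n = \frac{(z_{\alpha/2} + z_\beta)^2\,[\text{variance of difference}]}{(p_1 - p_2)^2} = \frac{(z_{\alpha/2} + z_\beta)^2 \cdot 2\,\bar{p}\,\bar{q}}{(p_1 - p_2)^2}, \quad\text{where } \bar{q} = 1 - \bar{p}$$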

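The sample size formula for testing two proportions under independence, without the assumption of common variance, is then:
$$n = \frac{(z_{\alpha/2} + z_\beta)^2\,[p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^2}$$
Note that Fleiss (1981) suggested a more precise formula:
$$n = \frac{\left\{ z_{\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_\beta\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right\}^2}{(p_1 - p_2)^2}, \quad\text{where } \bar{p} = (p_1 + p_2)/2$$
When $n_1$ and $n_2$ are not equal and are related by a ratio, say $r = n_2/n_1$, the formula is:
$$n = \frac{\left\{ z_{\alpha/2}\sqrt{(r+1)\,\bar{p}(1-\bar{p})} + z_\beta\sqrt{r\,p_1(1-p_1) + p_2(1-p_2)} \right\}^2}{r\,(p_1 - p_2)^2}, \quad\text{where } \bar{p} = (p_1 + r\,p_2)/(r+1)$$
The final formula (using the normal approximation with continuity correction for proportions; without the correction, the power is considered lower than expected) is:
$$n_1 = \frac{n}{4}\left[1 + \sqrt{1 + \frac{2(r+1)}{n\,r\,|p_1 - p_2|}}\right]^2, \qquad n_2 = r\,n_1$$
Stata has implemented this formula in the SAMPSI command.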
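As a hedged cross-check, the Fleiss formula with allocation ratio r and the continuity correction can be sketched in Python. The function name `fleiss_n` and its keyword arguments are hypothetical; the sketch assumes r = n2/n1 and rounds up at the end, which is how the Stata figures in these slides appear to be reported.

```python
import math
from statistics import NormalDist

def fleiss_n(p1, p2, alpha=0.05, power=0.80, r=1.0, continuity=True):
    """Group-1 sample size n1 for comparing two proportions (n2 = r * n1),
    normal approximation with optional continuity correction."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + r * p2) / (r + 1)
    diff = abs(p1 - p2)
    n = (z_a * math.sqrt((r + 1) * p_bar * (1 - p_bar))
         + z_b * math.sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (r * diff ** 2)
    if continuity:
        n = n / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (n * r * diff))) ** 2
    return n

print(math.ceil(fleiss_n(0.5, 0.55, continuity=False)))
print(math.ceil(fleiss_n(0.5, 0.55)))
```

For p1 = 0.5, p2 = 0.55, alpha = 0.05 and power = 0.8 the two calls should give 1565 and 1605 per group, the uncorrected and continuity-corrected sizes.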

Stata implementation
NO hypothesis:
  . sampsi .5 .55, p(.5) onesample
  Estimated required sample size: n = 385
Study has a hypothesis, but comparing with a hypothesized value:
  . sampsi .5 .55, p(.8) onesample
  Estimated sample size for one-sample comparison of proportion to hypothesized value
  Estimated required sample size: n = 783
  . di 783/385
  2.0337662
  . di (1.96+.84)^2/1.96^2
  2.0408163
Study has a hypothesis, and comparing between two groups:
  . sampsi .5 .55, p(.8)
  Estimated sample size for two-sample comparison of proportions
  n1 = 1605
  n2 = 1605
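The one-sample figures can be reproduced with a short Python sketch. The slides do not state the one-sample formula, so the sketch assumes the usual normal-approximation form $n = \left[\left(z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{\beta}\sqrt{p_a q_a}\right)/(p_a - p_0)\right]^2$; the function name `one_sample_n` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_n(p0, pa, alpha=0.05, power=0.80):
    """Sample size for testing Ho: p = p0 against the alternative p = pa
    (two-sided, normal approximation; an assumed standard formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(pa * (1 - pa))
    return (num / (pa - p0)) ** 2

n_precision = math.ceil(one_sample_n(0.5, 0.55, power=0.5))  # z_beta = 0
n_power80 = math.ceil(one_sample_n(0.5, 0.55, power=0.8))
print(n_precision, n_power80, round(n_power80 / n_precision, 3))
```

At power 0.5 the z for power is zero, so the calculation reduces to the precision formula (385); asking for 80% power roughly doubles the requirement (783).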

Stata implementation
  . sampsi .5 .55, p(.8) nocontinuity
  Estimated sample size for two-sample comparison of proportions
  Test Ho: p1 = p2, where p1 is the proportion in population 1 and p2 is the proportion in population 2
  Assumptions:
    alpha = 0.0500 (two-sided)
    power = 0.8000
    p1 = 0.5000
    p2 = 0.5500
    n2/n1 = 1.00
  Estimated required sample sizes:
    n1 = 1565
    n2 = 1565
  . di 783*2
  1566
In each group, the sample size is doubled.

Power graph in Stata
[Figure: "Sample Size and Power for P1=.5 and P2=.55"; y-axis: power (.8 to 1), x-axis: n (1500 to 3500).]

  *Calculate and plot sample size by power from .8 to .99
  ***************************************************************************
  * arguments: the two proportions and an optional sampsi option
  args p1 p2 type
  clear
  set obs 20
  gen n = .
  gen power = .
  local i 0
  while `i' < _N {
      local i = `i' + 1
      local j = .79 + `i'/100
      quietly sampsi `p1' `p2', p(`j') `type'
      replace n = r(N_1) in `i'
      replace power = r(power) in `i'
  }
  noisily list power n
  graph twoway line power n, t("Sample Size and Power for P1=`p1' and P2=`p2' `type'")
  *****************************************************************************
Save the above commands as a do file (e.g., sample_graph.do). Execute the file with the two proportions, and optionally a sampsi option, as arguments, e.g.:
  run sample_graph .5 .55
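The same power-by-sample-size table can be sketched in Python without Stata. This is an illustrative re-implementation, not the `sampsi` command itself: the helper `n_per_group` is hypothetical and assumes two equal groups with the continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, power, alpha=0.05):
    """Per-group n for two equal groups, continuity-corrected
    normal approximation, rounded up."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    n = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n)

# Power from .80 to .99 in steps of .01, as in the do-file loop.
rows = [((79 + i) / 100, n_per_group(0.5, 0.55, (79 + i) / 100))
        for i in range(1, 21)]
for power, n in rows:
    print(f"{power:.2f}  {n}")
```

The rows could be passed to any plotting library to draw power against n; the listing alone already shows how steeply the requirement rises as power approaches 1.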

Sample size determination when expressed in relative risk
In epidemiological studies, the hypothesis is often expressed in terms of relative risk or odds ratio, e.g., H0: R = 1. A sample size formula given in Donner (1983) for Relative Risk (. 0) is:
$$n = \frac{\left\{ Z_\alpha\sqrt{2\bar{P}_R(1-\bar{P}_R)} + Z_\beta\sqrt{P_C\{1 + R - P_C(1 + R^2)\}} \right\}^2}{[P_C(1 - R)]^2}$$
where $\bar{P}_R = [P_C(1 + R)]/2$ and $R = P_E/P_C$.

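Nothing but the Fleiss formula:
$$n = \frac{\left\{ Z_\alpha\sqrt{2\bar{P}(1-\bar{P})} + Z_\beta\sqrt{P_E(1-P_E) + P_C(1-P_C)} \right\}^2}{(P_E - P_C)^2}, \quad\text{where } \bar{P} = (P_E + P_C)/2$$
Note: $R = P_E/P_C \Rightarrow P_E = R\,P_C$.
Solution: replace all $P_E$ with $R\,P_C$ and apply the Fleiss formula.
How Donner's formula was derived:
$$\bar{P} = (P_E + P_C)/2 = (R P_C + P_C)/2 = [P_C(R + 1)]/2 = [P_C(1 + R)]/2$$
$$P_E(1-P_E) + P_C(1-P_C) = R P_C(1 - R P_C) + P_C(1 - P_C) = R P_C - R^2 P_C^2 + P_C - P_C^2 = P_C(R - R^2 P_C + 1 - P_C) = P_C\{1 + R - P_C(1 + R^2)\}$$
and
$$(P_C - P_E)^2 = (P_C - R P_C)^2 = [P_C(1 - R)]^2$$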
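The claimed equivalence can be checked numerically. In this Python sketch both function names are hypothetical, the rounded z-values 1.96 and 0.84 are used for simplicity, and the inputs (P_C = 0.25, R = 1.6, hence P_E = 0.4) are illustrative assumptions.

```python
import math

def donner_n(p_c, R, z_a=1.96, z_b=0.84):
    """Relative-risk form: everything expressed through P_C and R."""
    p_bar = p_c * (1 + R) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p_c * (1 + R - p_c * (1 + R ** 2))))
    return num ** 2 / (p_c * (1 - R)) ** 2

def fleiss_n(p_e, p_c, z_a=1.96, z_b=0.84):
    """Two-proportion form without continuity correction."""
    p_bar = (p_e + p_c) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p_e * (1 - p_e) + p_c * (1 - p_c)))
    return num ** 2 / (p_e - p_c) ** 2

p_c, R = 0.25, 1.6
print(donner_n(p_c, R), fleiss_n(R * p_c, p_c))
```

The two values agree up to floating-point rounding, which is what the substitution $P_E = R\,P_C$ implies.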

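Sample size for odds-ratio (OR) estimates:
$$OR = \frac{P_1 Q_2}{P_2 Q_1} = \frac{P_1(1 - P_2)}{P_2(1 - P_1)}$$
Solving for $P_1$:
$$P_1 Q_2 = OR \cdot P_2(1 - P_1) \;\Rightarrow\; P_1(Q_2 + OR \cdot P_2) = OR \cdot P_2 \;\Rightarrow\; P_1 = \frac{OR \cdot P_2}{OR \cdot P_2 + Q_2}$$
Convenient to do in two stages:
1. Estimate $P_1$ from the odds ratio (OR).
2. Apply the proportion method (of Fleiss).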

An example
Suppose we want to detect an OR of 2 using a ratio of 1:1 cases to controls in a population with an expected exposure proportion in non-cases of 0.25, while requiring α = 0.05 and power = 0.8. How to estimate the sample size?
EpiTable calculates m1 = m2 = 165 (total sample size = 330).
So, P2 = .25 and P1 = (2 × .25)/(2 × .25 + .75) = 0.4.
In Stata:
  . sampsi .25 .40, p(.8)
  Estimated required sample sizes:
    n1 = 165
    n2 = 165
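The two-stage recipe can be sketched in Python: first convert the odds ratio to P1, then apply the two-proportion formula. Both function names are hypothetical, and the second assumes equal groups with the continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

def p1_from_or(odds_ratio, p2):
    """Stage 1: exposure proportion in cases implied by the OR and P2."""
    return odds_ratio * p2 / (odds_ratio * p2 + (1 - p2))

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Stage 2: per-group n, equal groups, continuity-corrected."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    return math.ceil(n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2)

p1 = p1_from_or(2, 0.25)
print(round(p1, 3), n_per_group(p1, 0.25))
```

With OR = 2 and P2 = 0.25 this gives P1 = 0.4 and 165 per group, matching the figures quoted in the example.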

SAMPLE SIZE determination for Logistic Regression Models
Consider a logistic regression,
$$\log\frac{p}{1-p} = \mathrm{logit}(p) = \alpha + \beta x$$
We want to estimate the sample size needed to achieve a certain power for testing the null hypothesis Ho: β = 0. Recall that null hypothesis testing depends on the variance of β. In logistic regression, the effect size is expressed as the log odds ratio (η).
Hsieh (1989) suggested the following formula for a one-sided test:
$$n = \frac{\left[ z_\alpha + z_\beta \exp(-\eta^2/4) \right]^2 (1 + 2\hat{p}\,\delta)}{\hat{p}\,\eta^2}$$
where
$$\delta = \frac{1 + (1 + \eta^2)\exp(5\eta^2/4)}{1 + \exp(-\eta^2/4)}$$

Say you want to examine an effect size of an odds ratio of 1.5, i.e., a log odds ratio of log(1.5) = .405465 ≈ 0.4.
See the implementation of the formula in Stata:
  . clear
  . set obs 1
  obs was 0, now 1
  . *Enter "odds-ratio"
  . gen p = 0.10
  . gen or = 1.5
  . gen beta = log(or)
  . di beta
  .405465
  . gen delta = (1+(1+beta^2)*exp(5*beta^2/4))/(1+exp(-beta^2/4))
  . di delta
  1.2399909
  . di "n = " (1.645+1.282*exp(-beta^2/4))^2*(1+2*p*delta)/(p*beta^2)
  n = 627.61987
So, n = 628 ≈ 630.

Sample Size for Multiple Logistic Regression
Multiple logistic regression requires a larger n to detect effects. Let R denote the multiple correlation between the independent variable of interest, X, and the other covariates. Then the sample size is $n' = n/(1 - R^2)$.
Say, if R = 0.25, then n' = 630/(1 - 0.25^2) = 672.
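The same calculation can be sketched in Python, mirroring the Stata session line by line. The function name `hsieh_n` is hypothetical; the z-values 1.645 and 1.282 (one-sided 5% level, 90% power) and p = 0.10 are taken from the Stata commands above.

```python
import math

def hsieh_n(odds_ratio, p, z_alpha=1.645, z_beta=1.282):
    """Sample size for simple logistic regression following the
    formula attributed to Hsieh (1989) in the slides (one-sided test)."""
    beta = math.log(odds_ratio)
    delta = ((1 + (1 + beta ** 2) * math.exp(5 * beta ** 2 / 4))
             / (1 + math.exp(-beta ** 2 / 4)))
    return ((z_alpha + z_beta * math.exp(-beta ** 2 / 4)) ** 2
            * (1 + 2 * p * delta) / (p * beta ** 2))

n = hsieh_n(1.5, 0.10)
print(round(n, 2), math.ceil(n))

# Inflation for multiple logistic regression with R = 0.25
print(round(630 / (1 - 0.25 ** 2)))
```

The first line should agree with the Stata result (about 627.62, rounded up to 628), and the second reproduces the adjusted size of 672.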

Stata's add-on programs for sample size estimation
- STPOWER: Survival studies
- Sampsi_reg: Linear regression
- Sampclus: Cluster sampling
- ART: Randomized trials with survival time or binary outcome
- XSAMPSI: Cross-over trials
- Samplesize: Graphical results
- MVSAMPSI: Multivariate regression
- STUDYSI: Comparative study with binary or time-to-event outcome
- SSKAPP: Kappa statistics measure of interrater agreement
- CACLSI: Log-rank/binomial test

Additional topics to be covered
- Sample allocation in stratified sampling
- Sample size corrected for design effect (deff)
- Optimal sample size per cluster
- Sample size for clusters
- Sample size and power for pre-post surveys in program evaluation