Construction and use of sample weights * by Ibrahim S. Yansaneh **




UNITED NATIONS SECRETARIAT
ESA/STAT/AC.93/5
Statistics Division
03 November 2003
Expert Group Meeting to Review the Draft Handbook on Designing of Household Sample Surveys
3-5 December 2003
English only

D R A F T

Construction and use of sample weights *
by Ibrahim S. Yansaneh **

* ** This document is being issued without formal editing. The views expressed in this paper are those of the author and do not imply the expression of any opinion on the part of the United Nations Secretariat.

Table of contents

Chapter Five: Construction and use of sample weights
5.1. The need for sampling weights
5.2. The development of sampling weights
5.2.1. Adjustments of sample weights for unknown eligibility
5.2.2. Adjustments of sample weights for duplicates
5.3. Weighting for unequal probabilities of selection
5.3.1. A case study in construction of weights: the Viet Nam National Health Survey
5.3.2. Self weighting samples
5.4. The adjustment of sample weights for non-response
5.4.1. Reducing non-response bias in household surveys
5.4.2. Compensating for non-response bias
5.4.3. Non-response adjustment of sample weights
5.5. The adjustment of sample weights for non-coverage
5.5.1. Sources of non-coverage in household surveys
5.5.2. Compensating for non-coverage in household surveys
5.6. Increase in variance due to weighting
5.7. Concluding remarks
References and further reading

Abstract

This chapter provides a brief overview of the various stages in the construction and adjustment of sample weights to be used in the analysis of survey data. In particular, the adjustment of sample weights to compensate for non-coverage and non-response is described. In addition, the chapter discusses how sample weights are used in the development of estimates of characteristics of interest. The important ideas presented are illustrated using real examples of current surveys conducted in developing countries, or ones that mimic real survey situations.

Key Words. Base weight; non-response adjustment; post-stratification; domain estimation

Chapter Five: Construction and use of sample weights

5.1. The need for sampling weights

1. Sampling weights are needed to correct for imperfections in the sample that might lead to bias and other departures between the sample and the reference population. Such imperfections include the selection of units with unequal probabilities, non-coverage of the population, and non-response. In other words, the purposes of weighting are:
a. To compensate for unequal probabilities of selection.
b. To compensate for (unit) non-response.
c. To adjust the weighted sample distribution for key variables of interest (for example, age, race, and sex) to make it conform to a known population distribution.

2. We shall discuss in detail the procedures underlying each of these scenarios in the sections that follow. Once the imperfections in the sample are compensated for, weights can then be used in the estimation of population characteristics of interest and also in the estimation of the sampling errors of the survey estimates generated. (Give an example to illustrate what happens when weights are not used)

5.2. The development of sampling weights

3. The development of sampling weights usually starts with the construction of the base weight for each sampled unit, to correct for their unequal probabilities of selection. In general, the base weight of a sampled unit is the reciprocal of its probability of selection into the sample. In mathematical notation, if a unit is included in the sample with probability $p_i$, then its base weight, denoted by $w_i$, is given by $w_i = 1/p_i$.

4. For example, a sampled unit selected with probability 1/50 represents 50 units in the population from which the sample was drawn. Thus sample weights act as inflation factors to represent the number of units in the survey population that are accounted for by the sample unit to which the weight is assigned. The sum of the sample weights provides an unbiased estimate of the total number of individuals in the target population.

5. For multi-stage designs, the base weights must reflect the probabilities of selection at each stage.
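The base-weight rule, together with the multi-stage requirement just stated, can be sketched in a few lines of Python. This is only an illustration and not part of the original draft: the function names and the two-stage probabilities (1/10 and 1/5) are hypothetical, chosen so that the overall probability again equals 1/50.

```python
from math import prod


def overall_selection_probability(stage_probabilities):
    """Multiply the selection probabilities across all sampling stages."""
    return prod(stage_probabilities)


def base_weight(stage_probabilities):
    """Base weight = reciprocal of the overall selection probability."""
    p = overall_selection_probability(stage_probabilities)
    if not 0 < p <= 1:
        raise ValueError("selection probability must lie in (0, 1]")
    return 1.0 / p


# Single-stage unit selected with probability 1/50 (paragraph 4).
single_stage = base_weight([1 / 50])

# Hypothetical two-stage case: PSU selected with probability 1/10,
# household selected within the PSU with probability 1/5.
two_stage = base_weight([1 / 10, 1 / 5])

# Summing the base weights of a hypothetical sample of 5 units, each drawn
# with probability 1/50, estimates the size of the population sampled.
estimated_population = sum(base_weight([1 / 50]) for _ in range(5))

print(single_stage, two_stage, estimated_population)
```

Keeping the stage probabilities as a list makes it easy to extend the same function to designs with more than two stages.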
For instance, in the case of a two-stage design in which the i-th PSU is selected with probability $p_i$ at the first stage, and the j-th household is selected within a selected PSU with probability $p_{j(i)}$ at the second stage, the overall probability of selection of every household in the sample is given by

$$p_{ij} = p_i \times p_{j(i)}$$

and the overall base weight of the household is obtained as before, by taking the reciprocal of its overall probability of selection. Correspondingly, if the base weight for the j-th household is $w_{j,b}$, the weight attributable to compensation for non-response is $w_{j,nr}$, and the weight attributable to compensation for non-coverage is $w_{j,nc}$, then the overall weight of the household is given by:

$$w_j = w_{j,b} \times w_{j,nr} \times w_{j,nc}$$

5.2.1. Adjustments of sample weights for unknown eligibility

(Discuss weighting for unknown eligibility)

5.2.2. Adjustments of sample weights for duplicates

6. If it is known a priori that some units have duplicates on the frame, then the increased probability of selection of such units can be compensated for by assigning to them weighting factors that are reciprocals of the number of duplicate listings on the frame, if such units end up in the sample. Often, however, duplicates are discovered only after the sample is selected, and the probabilities of selection of such sampled units need to be adjusted to account for the duplication. This adjustment is implemented as follows. Suppose the i-th sampled unit has a probability of selection denoted by $p_{i1}$, and suppose there are $k-1$ additional records on the sampling frame that are identified by this sampled unit as duplicates, with selection probabilities $p_{i2}, \ldots, p_{ik}$. Then the adjusted probability of selection of the sampled unit in question is given by

$$p_i = 1 - (1 - p_{i1})(1 - p_{i2}) \cdots (1 - p_{ik})$$

The sampled unit is then weighted accordingly, that is, by $1/p_i$.

7. We now illustrate the procedures for constructing sample weights under the scenarios outlined above, with specific examples.

5.3. Weighting for unequal probabilities of selection

8. An epsem sample of 5 households is selected from 250. One adult is selected at random in each sampled household. The monthly income ($y_{ij}$) and the level of education ($z_{ij} = 1$ if secondary or higher; 0 otherwise) of the j-th sampled adult in the i-th household are recorded. Let $M_i$ denote the number of adults in household $i$.
Then, the overall probability of selection of a sampled adult is given by:

$$p_{ij} = p_i \times p_{j(i)} = \frac{5}{250} \times \frac{1}{M_i} = \frac{1}{50} \times \frac{1}{M_i}$$

Therefore, the weight of a sampled adult is given by:

$$w_i = \frac{1}{p_{ij}} = 50 \times M_i$$

Example

9. To illustrate the estimation procedure, let us assume a first-stage sample of 5 households with data obtained from the single sampled adult for each household as given in the table below:

| Sampled household $i$ | $M_i$ | $w_i$ | $y_{ij}$ | $z_{ij}$ | $w_i y_{ij}$ | $w_i z_{ij}$ | $w_i z_{ij} y_{ij}$ |
|---|---|---|---|---|---|---|---|
| 1 | 3 | 150 | 70 | 1 | 10,500 | 150 | 10,500 |
| 2 | 1 | 50 | 30 | 0 | 1,500 | 0 | 0 |
| 3 | 3 | 150 | 90 | 1 | 13,500 | 150 | 13,500 |
| 4 | 5 | 250 | 50 | 1 | 12,500 | 250 | 12,500 |
| 5 | 4 | 200 | 60 | 0 | 12,000 | 0 | 0 |
| TOTAL | 16 | 800 | 300 | 3 | 50,000 | 550 | 36,500 |

10. Estimates of various characteristics can then be obtained from the above table as follows:

a. The estimate of mean monthly income is

$$\bar{y}_w = \frac{\sum w_i y_{ij}}{\sum w_i} = \frac{50{,}000}{800} = 62.5$$

If weights were not used, this estimate would be 60 (= 300/5).

b. The estimate of the proportion of people with secondary or higher education is

$$\hat{p} = \frac{\sum w_i z_{ij}}{\sum w_i} = \frac{550}{800} = 0.6875 \text{ or } 68.75\%$$

If weights were not used, this estimate would be 3/5 = 0.60 or 60%.

c. The estimate of the total number of people with secondary or higher education is

$$\hat{t} = \sum w_i z_{ij} = 550$$

d. The estimate of the mean monthly income of adults with secondary or higher education is

$$\bar{y}_w = \frac{\sum w_i z_{ij} y_{ij}}{\sum w_i z_{ij}} = \frac{36{,}500}{550} = 66.36$$

11. Note that for estimating totals, sampled elements need to be weighted by the reciprocal of their selection probabilities. For estimating means and proportions, the weights need only be proportional to the reciprocals of the selection probabilities. Thus, in the preceding example, the weights $w_i$ are proportional to $M_i$ ($w_i = 50 \times M_i$). If we use $M_i$ as the weights, then the estimate of the proportion with secondary or higher education is

$$\hat{p} = \frac{\sum M_i z_{ij}}{\sum M_i} = \frac{3 \times 1 + 1 \times 0 + 3 \times 1 + 5 \times 1 + 4 \times 0}{3 + 1 + 3 + 5 + 4} = \frac{11}{16} = 0.6875 \text{ or } 68.75\%$$

as before. However, the estimate of the total number of adults with secondary or higher education requires the full weights $w_i = 50 \times M_i$:

$$\hat{t} = 50 \times \sum M_i z_{ij} = 50 \times 11 = 550$$

5.3.1. A case study in construction of weights: the Viet Nam National Health Survey

12. We now proceed to illustrate the construction of the sampling weights for an actual survey, the National Health Survey conducted in Viet Nam in 2001. (Insert a case study of weight construction for the VNHS)

5.3.2. Self weighting samples

13. When the weights of all sampled units are the same, the sample is referred to as self-weighting. Samples are rarely self-weighting at the national level for several reasons. First, sampling units are selected with unequal probabilities of selection. Indeed, even though the PSUs are often selected with probability proportional to size, and households selected at an appropriate rate within PSUs to yield a self-weighting design, this may be nullified by the selection of one person for interview in each sampled household. Second, the selected sample often has deficiencies including non-response and under-coverage (see sections 5.4 and 5.5). Third, the need for precise estimates for domains and special subpopulations often requires oversampling these domains (see section 5.5).

5.4. The adjustment of sample weights for non-response

14. It is rarely the case that all desired information is obtained from all sampled units in surveys. For instance, some households may provide no data at all while other households may provide only partial data, that is, data on some but not all questions in the survey. The former type of non-response is called unit or total non-response, while the latter is called item non-response.
If there are any systematic differences between the respondents and non-respondents, then naïve estimates based solely on the respondents will be biased. It is important to keep survey non-response as low as possible, in order to reduce the possibility that the survey estimates could be biased in some way by failing to include (or including a disproportionately small percentage of) a particular portion of the population. For example, persons who live in urban areas and have relatively high incomes might be less likely to participate in a multipurpose survey that includes income modules. Failure to include a large segment of this portion of the population could affect national estimates of average household income, educational attainment, literacy, etc.

5.4.1. Reducing non-response bias in household surveys

15. The size of the non-response bias for a sample mean, for instance, is a function of two factors:
- The proportion of the population that does not respond.
- The size of the difference in population means between respondent and non-respondent groups.

16. Reducing the bias due to non-response therefore requires that either the non-response rate be small, or that there are small differences between responding and non-responding households and persons. With proper record keeping of every sampled unit that is selected for the survey, it is possible to estimate directly from the survey data the non-response rate for the entire sample and for sub-domains of interest. Furthermore, special carefully designed studies can be carried out to evaluate the differences between respondents and non-respondents (Groves and Couper, 1998).

17. For panel surveys (in which data are collected from the same panel of sampled units repeatedly over time) the survey designer has access to more data for studying and adjusting for the effects of potential non-response bias than in one-time or cross-sectional surveys. Here, non-response may arise from units being lost over the course of the survey, or refusing to participate in the survey after a while due to respondent fatigue or other reasons, and so on. Data collected on previous panel waves can then be used to learn more about differences between respondents and non-respondents, and to serve as the basis for the kind of adjustments described below. More details on various techniques used for compensating for non-response in survey research are provided in Brick and Kalton (1996), Lepkowski (1988), and references cited therein.

5.4.2. Compensating for non-response bias

18. A number of techniques can be employed to reduce the potential for non-response bias in household surveys. The standard method of compensating for partial or item non-response is imputation, which is not discussed in this volume. A good introduction to imputation methods for large complex datasets is provided by Yansaneh et al. (1998). For unit or total non-response, there are three basic procedures for compensation:
a. Non-response adjustment of the weights.
b. Drawing a larger sample than needed and creating a reserve sample from which replacements are selected in case of non-response.
c. Substitution, the process of replacing a non-responding household with another household that was not sampled and that is in close proximity to the non-responding household with respect to the characteristic of interest.

19. It is advisable that unit non-response in household surveys always be handled by adjusting the sample weights to account for non-responding households. In many surveys in developing countries, substitution is frequently used. However, this procedure increases the probabilities of selection for the potential substitutes, because non-sampled households close to non-responding sampled households have a higher probability of selection than those close to responding sampled households. Furthermore, attempts to substitute for non-responding households are time-consuming, prone to errors and bias, and very difficult to check or monitor. For example, a substitution may be made using a convenient household rather than the household specifically designated to serve as the substitute or replacement for a non-responding household, thereby introducing bias.

5.4.3. Non-response adjustment of sample weights

20. The procedure of adjusting sample weights for non-response is the preferred practice in major household surveys throughout the world. Essentially, the adjustment transfers the base weights of all eligible non-responding sampled units to the responding units, and is implemented in the following steps:
Step 1: Apply the initial weights (for unequal selection probabilities and other adjustments discussed in Section 5.2, if applicable);
Step 2: Partition the sample into subgroups and compute weighted response rates for each subgroup;
Step 3: Use the reciprocal of the subgroup response rates for non-response adjustments; and
Step 4: Calculate the non-response adjusted weight for the i-th unit as $w_i = w_{1i} \times w_{2i}$, where $w_{1i}$ is the initial weight and $w_{2i}$ is the non-response adjustment weight.
Note that the weighted response rate can be defined as the ratio of the weighted number of interviews completed with eligible sampled cases to the weighted number of eligible sampled cases.

21. We now illustrate the ideas presented in this section with an example.

Example

22. A stratified multi-stage sample of 1000 households is selected from two regions (North and South) of a country. Households in the North are sampled at a rate of 1/100 and those in the South at a rate of 1/200. Response rates in urban areas are lower than those in rural areas. Let $n_h$ denote the number of households sampled in stratum $h$, let $r_h$ denote the number of eligible households that responded to the survey, and let $t_h$ denote the number of responding households with access to primary health care. Then, the non-response adjusted weight for the households in stratum $h$ is given by $w_h = w_{1h} \times w_{2h}$, where $w_{2h} = n_h / r_h$. Assume that the stratum-level data are as given in the following table:

| Stratum | $n_h$ | $r_h$ | $t_h$ | $w_{1h}$ | $w_{2h}$ | $w_h$ | $w_h r_h$ | $w_h t_h$ |
|---|---|---|---|---|---|---|---|---|
| North-Urban | 100 | 80 | 70 | 100 | 1.25 | 125 | 10,000 | 8,750 |
| North-Rural | 300 | 120 | 100 | 100 | 2.50 | 250 | 30,000 | 25,000 |
| South-Urban | 200 | 170 | 150 | 200 | 1.18 | 236 | 40,120 | 35,400 |
| South-Rural | 400 | 360 | 180 | 200 | 1.11 | 222 | 79,920 | 39,960 |
| Total | 1,000 | 730 | 500 | | | | 160,040 | 109,110 |

23. Therefore, the estimated proportion of households with access to primary health care is:

$$\hat{p} = \frac{\sum w_h t_h}{\sum w_h r_h} = \frac{109{,}110}{160{,}040} = 0.682 \text{ or } 68.2\%$$

The estimated number of households with access to primary health care is

$$\hat{t} = \sum w_h t_h = 109{,}110 \quad (= 68.2\% \text{ of } 160{,}040)$$

Note that the unweighted estimated proportion of households with access to primary health care, using only the respondent data, is

$$\hat{p}_{uw} = \frac{\sum t_h}{\sum r_h} = \frac{500}{730} = 0.685 \text{ or } 68.5\%,$$

and the estimated proportion using the initial weights without non-response adjustment is

$$\hat{p}_1 = \frac{\sum w_{1h} t_h}{\sum w_{1h} r_h} = \frac{83{,}000}{126{,}000} = 0.659 \text{ or } 65.9\%.$$

24. Note also that this example is provided for the purpose of illustrating how initial weights are adjusted to compensate for non-response. The results show considerable disparity between the estimated proportion using only the initial weights and that using non-response adjusted weights, but the difference between the unweighted proportion and the non-response-adjusted proportion appears to be negligible.
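The adjustment steps of paragraph 20 and the three estimates compared in paragraphs 23 and 24 can be checked with a short script. The following is a minimal sketch in Python and is not part of the original draft: the function and variable names are illustrative assumptions, and the adjustment factor $n_h / r_h$ is left unrounded, so the intermediate totals differ slightly from the table (which rounds $w_{2h}$ to two decimals).

```python
# Stratum data copied from the table in paragraph 22:
# (stratum, n_h sampled, r_h responding, t_h with access, w_1h initial weight)
strata = [
    ("North-Urban", 100, 80, 70, 100),
    ("North-Rural", 300, 120, 100, 100),
    ("South-Urban", 200, 170, 150, 200),
    ("South-Rural", 400, 360, 180, 200),
]


def weighted_proportion(rows, weight_of):
    """Ratio estimate: sum(w * t_h) / sum(w * r_h) over the strata."""
    numerator = sum(weight_of(row) * row[3] for row in rows)
    denominator = sum(weight_of(row) * row[2] for row in rows)
    return numerator / denominator


def adjusted_weight(row):
    """w_h = w_1h * (n_h / r_h), without the rounding used in the table."""
    _, n_h, r_h, _, w_1h = row
    return w_1h * n_h / r_h


# No weights at all: every responding household counts once.
p_unweighted = weighted_proportion(strata, lambda row: 1)
# Initial (selection) weights only, ignoring non-response.
p_initial = weighted_proportion(strata, lambda row: row[4])
# Initial weights multiplied by the non-response adjustment.
p_adjusted = weighted_proportion(strata, adjusted_weight)

print(f"unweighted:            {p_unweighted:.3f}")
print(f"initial weights only:  {p_initial:.3f}")
print(f"non-response adjusted: {p_adjusted:.3f}")
```

To three decimal places the script gives the same proportions as the hand calculation (0.685, 0.659 and 0.682), which suggests that the rounding of $w_{2h}$ in the table has no material effect on the estimates.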

25. After non-response adjustments of the weights, further adjustments can be made to the weights as appropriate. In the next section, we consider adjustment of the weights to account for non-coverage.

5.5. The adjustment of sample weights for non-coverage

26. Non-coverage refers to the failure of the sampling frame to cover all of the target population, so that some sampling units have no probability of selection into the sample selected for the household survey. This is just one of many possible deficiencies of sampling frames used to select samples for surveys in developing countries. See Yansaneh (2003), and references cited therein, for a detailed discussion of sampling frame problems and some possible solutions.

5.5.1. Sources of non-coverage in household surveys

27. Most household surveys in developing countries are based on stratified multi-stage area probability designs. The first-stage units, or primary sampling units (PSUs), are usually geographic area units. At the second stage, a list of households or dwelling units is created, from which the sample of households is selected. At the last stage, a list of household members or residents is created, from which the sample of persons is selected. Thus non-coverage may occur at three levels: the PSU level, the household level, and the person level.

28. Since PSUs are generally based on enumeration areas identified and used in a preceding population and housing census, they are expected to cover the entire geographic extent of the target population. Thus, the size of PSU non-coverage is generally small. For household surveys in developing countries, PSU non-coverage is not as serious as non-coverage at subsequent stages of the design. However, non-coverage of PSUs does occur in most surveys. For instance, a survey may be designed to provide estimates for the entire population in a country, or a region of a country, but some PSUs may be excluded on purpose at the design stage, because some regions of a country are inaccessible due to civil war or unrest, a natural disaster, or other reasons. Also, remote areas with very few households or persons are sometimes removed from the sampling frames for household surveys because they are too costly to cover, and they represent a small proportion of the population and so have very little effect on the population figures. In reporting results for such a survey, the exclusion of these areas must be explicitly stated. The impression should not be created that survey results apply to the entire country or region, when in fact a portion of the population is not covered. The non-coverage properties of the survey must be fully reported in the survey report.

29. Non-coverage becomes a more serious problem at the household level. Most surveys consider households to be the collection of persons who are usually related in some way, and who usually reside in a dwelling or housing unit. There are important definitional issues to resolve, such as: who is a usual resident, and what is a dwelling unit? How are multi-unit structures (such as apartment buildings) and dwelling units with multiple households handled? It may be easy to identify the dwelling unit, but complex social structures may make it difficult to identify the households within the dwelling unit. There is thus a lot of potential for misinterpretation or inconsistent interpretation of these concepts by different interviewers, or in different countries or cultures. In any event, strict operational instructions are needed to guide interviewers on whom to consider a household member or what to consider a dwelling unit.

30. Other factors that contribute to non-coverage include the inadvertent omission of dwelling units from listings prepared during field operations, or of sub-populations of interest (for example, young children or the elderly), and omissions due to errors in measurement, non-inclusion of absent household members, and omissions due to misunderstanding of survey concepts. There is also a temporal dimension to the problem, that is, dwelling units may be unoccupied or under construction at the time of listing, but become occupied at the time of data collection. For household surveys in developing countries, the non-coverage problem is exacerbated by the fact that most censuses in developing countries, the unique basis for constructing sampling frames, do not provide detailed addresses of sampling units at the household and person levels. Frequently, out-of-date or inaccurate administrative household listings are used, and individuals within a household are deliberately or accidentally omitted from a household listing of residents. More details on sources of non-coverage are provided in Lepkowski (2003) and references cited therein.

5.5.2. Compensating for non-coverage in household surveys

31. Non-coverage is a major concern for household surveys conducted in developing countries. Evidence of the impact of non-coverage can be seen from the fact that sample estimates of population counts based on most developing-country surveys fall well short of population estimates from other sources. There are several procedures for handling the problem of non-coverage in household surveys (Lepkowski, 2003). These include:
a. Improved field procedures such as the use of multiple frames and improved listing procedures.
b. Compensating for the non-coverage through a statistical adjustment of the weights.

32. In this chapter, we shall concentrate on the second procedure. If reliable control totals are available for the entire population and for specified subgroups of the population, one could attempt to adjust the weights of the sample units in such a way as to make the sum of weights match the control totals within the specified subgroups. The subgroups are called post-strata, and the statistical adjustment procedure is called post-stratification. This procedure simultaneously compensates for non-response and non-coverage. It adjusts the weighted sampling distribution for certain variables so as to conform to a known population distribution. See Lehtonen and Pahkinen (1995) for some practical examples of how to analyze survey data with post-stratification. A simple example is provided below, just to aid understanding of the procedure.

Example

33. In the preceding example, suppose that the number of households is known to be 45,025 in the North and 115,800 in the South. Suppose further that the weighted sample totals are respectively 40,000 and 120,040.

Step 1: Compute the post-stratification factors.

For the North region, we have:

$$w_{3h} = \frac{45{,}025}{40{,}000} = 1.126$$

and for the South region, we have:

$$w_{3h} = \frac{115{,}800}{120{,}040} = 0.965$$

Step 2: Compute the final, adjusted weight:

$$w_f = w_h \times w_{3h}$$

34. The numerical results are summarized in the following table:

| Stratum | $r_h$ | $t_h$ | $w_h$ | $w_f$ | $w_f r_h$ | $w_f t_h$ |
|---|---|---|---|---|---|---|
| North-Urban | 80 | 70 | 125 | 140.75 | 11,256 | 9,849 |
| North-Rural | 120 | 100 | 250 | 281.40 | 33,768 | 28,140 |
| South-Urban | 170 | 150 | 236 | 227.77 | 38,709 | 34,155 |
| South-Rural | 360 | 180 | 222 | 214.20 | 77,112 | 38,556 |
| Total | 730 | 500 | | | 160,845 | 110,700 |

Therefore, the estimated proportion of households with access to primary health care is:

$$\hat{p}_f = \frac{\sum w_f t_h}{\sum w_f r_h} = \frac{110{,}700}{160{,}845} = 0.688 \text{ or } 68.8\%$$

35. Note that with the weights adjusted by post-stratification, the weighted sample counts for the North and South regions are respectively 45,024 (11,256 + 33,768) and 115,821 (38,709 + 77,112), which closely match the control totals given above.

5.6. Increase in variance due to weighting

( )

5.7. Concluding remarks

36. Sample weights have now come to be regarded as an integral part of the analysis of household survey data in developing countries, as in the rest of the world. Most survey programmes now advocate the use of weights even in the rare situations involving self-weighting samples (in which case the weights would be 1). In the past, tremendous efforts were expended by survey designers on the virtually unattainable goal of achieving self-weighting samples and making weights unnecessary in the analysis of survey data. The conventional wisdom was that the use of weights made the analyses too complicated, and that there was very little, if any, computing infrastructure for weighted analysis of survey data. However, advances in computer technology in the past decade have invalidated this argument. Computer hardware and software are now affordable and available in many developing countries. In addition, many specialized computer software packages are now available specifically for the analysis of survey data. These are reviewed and compared in chapter 6. More details can be obtained from the references cited in chapter 6.

37. As discussed, the use of weights reduces biases due to imperfections in the sample related to non-coverage and non-response. Non-response and non-coverage are different types of error due to the failure of a designed survey to obtain information from some units in the target population. For household surveys in developing countries, non-coverage is a more serious problem than non-response. The chapter provides examples of procedures for developing and statistically adjusting the basic weights to compensate for these unavoidable problems of household surveys, and for using the adjusted weights in the estimation of parameters of interest. The advent of high-speed computers and affordable or free statistical software should make the use of weights a routine aspect of the analysis of household survey data even in developing countries.
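As a small illustration of how routine such weighted analysis can be, the post-stratification example of section 5.5.2 can be reproduced in a few lines of Python. This is a hedged sketch rather than the author's procedure: the names `post_stratification_factors`, `strata` and `control_totals` are hypothetical, and the factors are kept unrounded, so the results differ slightly from the hand-computed table.

```python
# Responding-household data from the non-response example:
# (stratum, region, r_h, t_h, w_h = non-response adjusted weight as tabulated)
strata = [
    ("North-Urban", "North", 80, 70, 125),
    ("North-Rural", "North", 120, 100, 250),
    ("South-Urban", "South", 170, 150, 236),
    ("South-Rural", "South", 360, 180, 222),
]

# Known numbers of households per region (the control totals).
control_totals = {"North": 45_025, "South": 115_800}


def post_stratification_factors(rows, controls):
    """Step 1: control total divided by weighted sample total, per post-stratum."""
    weighted_totals = {}
    for _, region, r_h, _, w_h in rows:
        weighted_totals[region] = weighted_totals.get(region, 0) + w_h * r_h
    return {region: controls[region] / total
            for region, total in weighted_totals.items()}


factors = post_stratification_factors(strata, control_totals)

# Step 2: final weight = non-response adjusted weight * post-stratification factor.
final_weights = {name: w_h * factors[region]
                 for name, region, _, _, w_h in strata}

households = sum(final_weights[name] * r_h for name, _, r_h, _, _ in strata)
with_access = sum(final_weights[name] * t_h for name, _, _, t_h, _ in strata)
p_final = with_access / households

print({region: round(f, 3) for region, f in factors.items()})
print(f"estimated proportion with access: {p_final:.3f}")
```

Because the factors are not rounded here, the final weights reproduce the control totals exactly (45,025 and 115,800 households) rather than the approximate 45,024 and 115,821 of the hand calculation; the estimated proportion still rounds to 0.688.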

References and further reading

Brick, J.M., and Kalton, G. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, Vol. 5, pp. 215-238.
Cochran, W.G. (1977). Sampling Techniques, 3rd edition. New York: John Wiley & Sons.
Groves, R.M., Dillman, D.A., Eltinge, J.L., and Little, R.J.A. (2002). Survey Non-response. New York: John Wiley & Sons.
Groves, R.M., and Couper, M.P. (1998). Non-response in Household Interview Surveys. New York: John Wiley & Sons.
Kalton, G., and Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, Vol. 12, pp. 1-16.
Kalton, G. (1983). Introduction to Survey Sampling. Sage Publications Series, No. 35.
Kish, L. (1965). Survey Sampling. New York: Wiley.
Kish, L., and Hess, I. (1950). On noncoverage of sample dwellings. Journal of the American Statistical Association, Vol. 53, pp. 509-524.
Lehtonen, R., and Pahkinen, E.J. (1995). Practical Methods for Design and Analysis of Complex Surveys. New York: Wiley.
Lepkowski, J.M. (2003). Non-observation Error in Household Surveys in Developing Countries. Technical Report on Surveys in Developing and Transition Countries, United Nations.
Lessler, J., and Kalsbeek, W. (1992). Nonsampling Error in Surveys. New York: John Wiley & Sons.
Levy, P.S., and Lemeshow, S. (1999). Sampling of Populations: Methods and Applications, 3rd edition. New York: John Wiley & Sons.
Lohr, S. (1999). Sampling: Design and Analysis. Pacific Grove: Duxbury Press.
Yansaneh, I.S. (2003). An Overview of Sample Design Issues for Household Surveys in Developing Countries. Technical Report on Surveys in Developing and Transition Countries, United Nations.