Sampling and Household Listing Manual

Demographic and Health Surveys Methodology

This document is part of the Demographic and Health Survey's DHS Toolkit of methodology for the MEASURE DHS Phase III project, implemented from 2008-2013. This publication was produced for review by the United States Agency for International Development (USAID). It was prepared by MEASURE DHS/ICF International.
Demographic and Health Survey Sampling and Household Listing Manual
ICF International, Calverton, Maryland USA
September 2012
MEASURE DHS is a five-year project to assist institutions in collecting and analyzing data needed to plan, monitor, and evaluate population, health, and nutrition programs. MEASURE DHS is funded by the U.S. Agency for International Development (USAID). The project is implemented by ICF International in Calverton, Maryland, in partnership with the Johns Hopkins Bloomberg School of Public Health/Center for Communication Programs, the Program for Appropriate Technology in Health (PATH), Futures Institute, Camris International, and Blue Raster.

The main objectives of the MEASURE DHS program are to: 1) provide improved information through appropriate data collection, analysis, and evaluation; 2) improve coordination and partnerships in data collection at the international and country levels; 3) increase host-country institutionalization of data collection capacity; 4) improve data collection and analysis tools and methodologies; and 5) improve the dissemination and utilization of data.

For information about the Demographic and Health Surveys (DHS) program, write to DHS, ICF International, 11785 Beltsville Drive, Suite 300, Calverton, MD 20705, U.S.A. (Telephone: 301-572-0200; fax: 301-572-0999; e-mail: info@measuredhs.com; Internet: http://www.measuredhs.com).

Recommended citation: ICF International. 2012. Demographic and Health Survey Sampling and Household Listing Manual. MEASURE DHS, Calverton, Maryland, U.S.A.: ICF International.
TABLE OF CONTENTS

TABLES AND FIGURES
1 DEMOGRAPHIC AND HEALTH SURVEYS SAMPLING POLICY
  1.1 General principles
    1.1.1 Existing sampling frame
    1.1.2 Full coverage
    1.1.3 Probability sampling
    1.1.4 Suitable sample size
    1.1.5 Simple design
    1.1.6 Household listing and pre-selection of households
    1.1.7 Good sample documentation
    1.1.8 Confidentiality
    1.1.9 Exactness of survey implementation
  1.2 Survey objectives and target population
  1.3 Survey domain
  1.4 Sampling frame
    1.4.1 Conventional sampling frame
    1.4.2 Alternative sampling frames
    1.4.3 Evaluation of the sampling frame
  1.5 Stratification
  1.6 Sample size
    1.6.1 Sample size and sampling errors
    1.6.2 Sample size determination
  1.7 Sample allocation
  1.8 Two-stage cluster sampling procedure
  1.9 Sample take per cluster
    1.9.1 Optimum sample take
    1.9.2 Variable sample take for self-weighting
  1.10 Household listing
  1.11 Household selection in the central office
  1.12 Household interviews
  1.13 Sampling weight calculation
    1.13.1 Why we need to weight the survey data
    1.13.2 Design weights and sampling weights
    1.13.3 How to calculate the design weights
    1.13.4 Correction of unit non-response and calculation of sampling weights
    1.13.5 Normalization of sampling weights
    1.13.6 Standard weights for HIV testing
    1.13.7 De-normalization of standard weights for pooled data
  1.14 Calibration of sampling weights in case of bias
  1.15 Data quality and sampling error reporting
  1.16 Sample documentation
  1.17 Confidentiality
2 HOUSEHOLD LISTING OPERATION
  2.1 Introduction
  2.2 Definition of terms
  2.3 Responsibilities of the listing staff
  2.4 Locating the cluster
  2.5 Preparing location and sketch maps
  2.6 Collecting a GPS waypoint for each cluster
  2.7 Listing of households
  2.8 Segmentation of large clusters
  2.9 Quality control
  2.10 Prepare the household listing forms for household selection
  Appendix 2.1 Example listing forms
  Appendix 2.2 Symbols for mapping and listing
  Appendix 2.3 Examples of completed mapping and listing forms
3 SELECTED SAMPLING TECHNIQUES
  3.1 Simple random sampling
  3.2 Equal probability systematic sampling
    3.2.1 Sampling theory
    3.2.2 Excel templates for systematic sampling
  3.3 Probability proportional to size sampling
    3.3.1 Sampling theory
    3.3.2 Operational description and examples
  3.4 Complex sampling procedures
4 SURVEY ERRORS
  4.1 Errors of coverage and non-response
    4.1.1 Coverage errors
    4.1.2 Deliberate restrictions of coverage
    4.1.3 Non-response
    4.1.4 Response rates
  4.2 Sampling errors
5 SAMPLE DOCUMENTATION
  5.1 Introduction
  5.2 Sample design document
    5.2.1 Introduction
    5.2.2 Sampling frame
    5.2.3 Structure of the sample and the sampling procedure
    5.2.4 Selection probability and sampling weight
  5.3 Sample file
  5.4 Results of Survey implementation
  5.5 Sampling errors
  5.6 Sampling parameters in DHS data files
Glossary of terms
References
TABLES AND FIGURES

Table 1.1 Sample size determination for estimating current use of a modern contraceptive method among currently married women
Table 1.2 Sample size determination for estimating the prevalence of full vaccination coverage among children aged 12-23 months
Table 1.3 Sample allocation: Proportional allocation
Table 1.4 Sample allocation: Power allocation
Table 1.5 Optimal sample take for currently married women 15-49 currently using any contraceptive method based on intracluster correlation ρ and survey cost ratio c1/c2 from past surveys
Table 5.1 Distribution of EAs and average size of EA by region and by type of residence
Table 5.2 Distribution of households by region and by type of residence
Table 5.3 Sample allocation of clusters and households by region and by type of residence
Table 5.4 Expected number of interviews by region and by type of residence
Table 5.5 An example sample file
Table 5.6 Example table for the results of survey implementation
Table 5.7 Example appendix table for the results of the women's survey implementation
Table 5.8 Example appendix table for the results of the men's survey implementation
Table 5.9 Example table for sampling errors

Figure 3.1 Simple household selection with a sub-sample
Figure 3.2 Selection of runs with a sub-sample
Figure 3.3 Simple self-weighting selection without sample size control
Figure 3.4 Self-weighting selection with runs and without sample size control
Figure 3.5 Self-weighting selection with sample size control
Figure 3.6 Self-weighting selection with runs and with sample size control
Figure 3.7 Manual household selection in the field
Figure 3.8 Part of an Excel template for stratified sampling
Figure 3.9 Part of an example for a province crossed urban-rural stratified PPS sampling
Figure 3.10 Part of an example sample file from a stratified PPS sampling
1 DEMOGRAPHIC AND HEALTH SURVEYS SAMPLING POLICY

1.1 General principles

Scientific sample surveys are cost-efficient and reliable ways to collect population-level information such as social, demographic and health data. The MEASURE DHS project is a worldwide project implemented across various countries and at multiple points in time within a country. In order to achieve comparability, consistency and the best quality in survey results, sampling activities in the Demographic and Health Surveys (DHS) should be guided by a number of general principles. This manual presents general guidelines on sampling for DHS surveys, although modifications may be required for country-specific situations. The key principles of DHS sampling include:

- Use of an existing sampling frame
- Full coverage of the target population
- Probability sampling
- Using a suitable sample size
- Using the simplest design possible
- Conducting a household listing and pre-selection of households
- Providing good sample documentation
- Maintaining confidentiality of individual's information
- Implementing the sample exactly as designed

1.1.1 Existing sampling frame

A probability sample can only be drawn from an existing sampling frame, which is a complete list of statistical units covering the target population. Since the construction of a new sampling frame is likely to be too expensive, DHS surveys should use an adequate pre-existing sampling frame which is officially recognized. This is possible for most of the countries where there has been a population census in recent years. Census frames are generally the best available sampling frame in terms of coverage, cartographic materials and organization. However, an evaluation of the quality and the accessibility of the frame should be considered during the development of the survey design, and a detailed study of the sampling frame is necessary before drawing the sample. In the absence of a census frame, a DHS survey can use an alternative sampling frame, such as a complete list of villages or communities in the country with all necessary identification information including a measure of population size (e.g.,
number of households), or a master sample which is large enough to support the DHS design.

1.1.2 Full coverage

A DHS survey should cover 100 percent of the target population in the country. The target population for the DHS survey is all women age 15-49 and children under five years of age living in residential households. Most surveys also include all men age 15-59.¹ The target population may vary from country to country or from survey to survey, but the general sampling principles are the same. In some cases, exclusion of some areas may be necessary because of extreme inaccessibility, violence or instability, but these issues need to be considered at the very beginning of the survey, before the sample is drawn.

¹ The age range varies from survey to survey and may be 15-49, 15-54, 15-59 or 15-64.
1.1.3 Probability sampling

A scientific probability sampling methodology must be used in DHS surveys. A probability sample is defined as one in which the units are selected randomly with known and nonzero probabilities. This is the only way to obtain unbiased estimation and to be able to evaluate the sampling errors. The term probability sampling excludes purposive sampling, quota sampling, and other uncontrolled non-probability methods because they cannot provide evaluation of precision and/or confidence of survey findings.

1.1.4 Suitable sample size

Sample size is a key parameter for DHS surveys because it is directly related to survey budget, data quality and survey precision. Theoretically, the larger the sample size, the better the survey precision, but this is not always true in practice. Survey budget is not the only important factor in determining the sample size. Desired precision, the number of domains, capability of the implementing organization, data quality concerns and cost effectiveness are essential constraints in determining the total sample size. Thus a suitable sample size is also a key parameter to guarantee data quality.

1.1.5 Simple design

In large-scale surveys, non-sampling errors (coverage errors, errors committed in survey implementation and data processing, etc.) are usually the most important sources of error and are expensive to control and difficult to evaluate quantitatively. It is therefore important to minimize them in survey implementation. In order to facilitate accurate implementation of the survey, the sampling design for DHS should be as simple and straightforward as possible. Macro's experience from 25 years of DHS surveys shows that a two-stage household-based sample design is relatively easy to implement and that quality can be maintained.

1.1.6 Household listing and pre-selection of households

The DHS standard procedure recommends that households be pre-selected in the central office prior to the start of fieldwork rather than by teams in the field, who may be under pressure to bias the selection.
The interviewers are asked to interview only the pre-selected households. In order to prevent bias, no changes or replacements are allowed in the field. To perform pre-selection of households, a complete list of all residential households in each of the selected sample clusters is necessary. This list is usually obtained from a household listing operation conducted before the main survey. In some surveys, the household listing operation may be combined with the main survey to form a single field operation, and households can be selected in the field from a complete listing. Combining the household listing and survey data collection in one field operation is less expensive; however, it provides an incentive to leave households off the household list to reduce workload, thus reducing the representativeness of the survey results. Close supervision is needed during the field work to prevent this problem. Separate listing and data collection operations are therefore required. Having interviewers select households in the field without a complete listing is not acceptable for DHS surveys.

1.1.7 Good sample documentation

DHS surveys are usually year-long projects conducted by different people specialized in different aspects of survey implementation, so good sample documentation is necessary to guarantee the exact implementation of the project. The sample documentation should include a sample design
document and the list of primary sampling units. The sample design document should explain in detail the methodology, the sampling procedure, the sample size, the sample allocation, the survey domains and the stratification. This should also form the basis for an appendix to the DHS final report describing the sample design. The sample list should include all identification information for all of the selected sample points, along with their probability of selection.

1.1.8 Confidentiality

Confidentiality is a major concern in DHS, especially when human bio-markers are collected, such as blood samples for HIV testing. The DHS surveys are anonymous surveys which do not allow any potential identification of any single household or individual in the data file. Confidentiality is also a key factor affecting the response rate to sensitive questions regarding sexual activity and partners. In particular, in surveys that include HIV testing, DHS policy requires that PSU and household codes be scrambled in the final data to further anonymize the data and that the original sample list be destroyed.

1.1.9 Exactness of survey implementation

Exactness of sample implementation is the last element in achieving good sampling precision. No matter how carefully a survey is designed and how complete the materials for conducting sampling activities are, if the implementation of the sampling activities by sampling staff (office staff responsible for selecting sample units, field workers responsible for the mapping and household listing, and interviewers responsible for data collection) is not performed exactly as designed, serious bias and misleading results may occur.

In the sections that follow, DHS policies related to sample design and implementation are described.
1.2 Survey objectives and target population

The main objective of DHS surveys is to collect up-to-date information on basic demographic and health indicators, including housing characteristics, fertility, childhood mortality, contraceptive knowledge and use, maternal and child health, nutritional status of mothers and children, knowledge, attitudes and behavior toward HIV/AIDS and other sexually transmitted infections (STI), and women's status. The target population for DHS is defined as all women of reproductive age (15-49 years old) and their young children under five years of age living in ordinary residential households. However, in some countries, the coverage may be restricted to ever-married women. The main indicator topics include:

- Total fertility and age specific fertility rates
- Age at first sex, first birth, and first marriage
- Knowledge and use of contraception
- Unmet need for family planning
- Birth spacing
- Antenatal care
- Place of delivery
- Assistance from skilled personnel during delivery
- Knowledge of HIV/AIDS and other STIs
- Higher-risk sexual behavior
- Condom use
- Childhood vaccination coverage
- Treatment of diarrhea, fever, and cough
- Infant and under-five mortality rates
- Nutritional status

Since the target population can be easily found in residential households, DHS is a household-based survey.

1.3 Survey domain

In DHS surveys, an important objective is to compare the survey results for different characteristics such as urban and rural residence, different administrative or geographic regions, or different educational levels of respondents. A survey domain or study domain is a sub-population for which separate estimation of the main indicators is required.

There are two kinds of survey domains: design domains and analysis domains. A design domain consists of a sub-population which can be identified in the sampling frame and therefore can be handled independently in the sample size and sampling procedures, usually consisting of geographic areas or administrative units. For example, urban and rural differences are very frequently requested; therefore, urban and rural areas are usually separate design domains for Demographic and Health Surveys. An analysis domain is a sub-population which cannot be identified in the sampling frame, such as domains specified by individual characteristics. These may include women with secondary or higher education, pregnant women, children 12-23 months, and children having diarrhea in the two weeks preceding the survey.

In order for survey estimates to be reliable at the domain level, it is necessary to ensure that the number of cases in each survey domain is sufficient, especially when desired levels of precision are required for particular domains. For a design domain, adequate sample size is achieved by allocating the target population at the survey design stage into the requested design domains, and then calculating the sample size for the specific design domains by taking the precision required into account. On the other hand, for an analysis domain, it is difficult to guarantee a specified precision because it is difficult to control the sample size at the design stage.
However, if prior estimates of the average number of target individuals per household are available, then it is possible to control the precision for an analysis domain. For example, if survey estimates are required for the nutritional status of children under age 5 and estimates of the number of children under age 5 per household are available, it is then possible to calculate a sample size to give a certain level of precision.

DHS reports also produce some indicators for second level domains such as vaccination coverage of children age 12-23 months within a region, where region is the first level domain, and children 12-23 months is the second level domain. Attention must be paid to the precision required for a second level domain because the second level domain usually includes a very small sub-population.

If domain-level estimates are required, it is better to avoid a large number of domains because otherwise a very large sample size will be needed. The number of domains and the desired level of precision for each must be taken into account in the budget calculation and assessment of the implementation capabilities of the implementing organization. The total sample size needed is the sum of sample sizes needed in all exclusive (first level) domains.

1.4 Sampling frame

A sampling frame is a complete list of all sampling units that entirely covers the target population. The existence of a sampling frame allows a probability selection of sampling units. For a multi-stage survey, a sampling frame should exist for each stage of selection. The sampling unit for the first stage of selection is called the Primary Sampling Unit (PSU); the sampling unit for the second stage of selection is called the Secondary Sampling Unit (SSU), and so on. In most cases, DHS
surveys are two-stage surveys. Note that each stage of sample selection will involve sampling errors, so it is better to avoid more than two stages if additional stages of selection are not necessary.

The availability of a suitable sampling frame is a major determinant of the feasibility of conducting a DHS survey. This issue should be addressed in the earliest stages of planning for a survey. A sampling frame for a DHS survey could be an existing sampling frame, an existing master sample, or a sample of a previously executed survey of sufficiently large sample size, which allows for the selection of subsamples of desired size for the DHS survey.

1.4.1 Conventional sampling frame

The best frame is the list of Enumeration Areas (EAs) from a recently completed population census. An EA is usually a geographic area which groups a number of households together for convenient counting purposes for the census. A complete list of EAs which covers the survey area entirely is the ideal frame for DHS surveys. In most cases, a list of EAs from a recent census is available. This list should be thoroughly evaluated before it is used. The sampling frame used for DHS should be as up-to-date as possible. It should cover the whole survey area, without omission or overlap. Basic cartographic materials should exist for each area unit or at least for groups of units with clearly defined boundaries. Each area unit should have a unique identification code or a series of codes that, when combined, can serve as a unique identification code. Each unit should have at least one measure of size (population and/or number of households). If other characteristics of the area units (e.g., socioeconomic level) exist, they should be evaluated and retained as they may be used for stratification.

A pre-existing master sample (which is a random sample from the census frame) can be accepted only where there is confidence in the master sample design, including detailed sampling design parameters such as sampling method, stratification, and inclusion probability for the selected primary sampling units.
The task for the DHS survey is then to design a sub-sampling procedure which produces a sample in line with DHS requirements. This will not always be possible. However, the larger the master sample is in relation to the desired DHS sub-sample, the more flexibility there will be for developing a sub-sampling design.

A key question with a pre-existing sample is whether the listing of dwellings/households is still current or whether it needs to be updated. If updating is required, use of a pre-existing sample may not be economical. The potential advantages of using a pre-existing sample are: 1) economy, and 2) increased analytic power through comparative analysis of two or more surveys. The disadvantages are: 1) the problem of adapting the sample to DHS requirements, and 2) the problem of repeated interviews with the same household or person in different surveys, resulting in respondent fatigue or contamination. One way to avoid this last problem is to keep just the primary sampling units from the pre-existing sample and reselect the households for the DHS survey.

1.4.2 Alternative sampling frames

When neither a census frame nor a master sample is available, alternative frames should be considered. Examples of such frames are:

- A list of electoral zones with estimated number of qualified voters for each zone
- A gridded high resolution satellite map with estimated number of structures for each grid
- A list of administrative units such as villages with estimated population for each unit

A main concern when using alternative frames is coverage: does the frame completely cover the target population? Checking the quality of an alternative frame is usually more difficult because of a lack of information either from the frame itself or from administrative sources.
Another problem is the size of the primary sampling unit. Since the alternative frame is not specifically created for a population census or household based survey, the size of the PSUs of such frames may be too large or too small for a DHS survey. A third problem is identifying the boundaries of the sampling units due to the lack of cartographic materials.

In the first two examples of alternative sampling frames, the standard DHS two-stage sampling procedure can be applied by treating the electoral zones or the grids of the satellite map as the PSUs. In the third case, only a list of administrative units larger than villages (e.g., sub-districts, wards or communes) may be available; for example, a complete list of all communes in a country may be easier to get than a complete list of villages. It is then necessary to use a selection procedure that includes more than two stages. In the first stage, select a number of communes; in each of the selected communes, construct a complete list of all villages located in the commune; select one village per commune as a DHS cluster, then proceed with the subsequent household listing and selection as in a standard DHS. This procedure works best when the number of communes is large and the commune size is small. A list of administrative units that are small in number but large in size is not suitable for a DHS sampling frame because this situation will result in large sampling errors, as explained later in Section 1.9.

1.4.3 Evaluation of the sampling frame

No matter what kind of sampling frame will be used, it is always necessary to check the quality of the frame before selecting the sample. The following need to be checked when using a conventional sampling frame:

- Coverage
- Distribution
- Identification and coding
- Measure of size
- Consistency

There are several easy but useful ways to check the quality of a sampling frame.
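Two of the checks discussed in this section, screening each EA for an extreme average household size and for an extreme sex ratio, lend themselves to a short script. The following is a minimal Python sketch, not a DHS tool: the function name `flag_suspect_eas`, the field names, and the threshold ranges are all assumptions chosen for illustration, and appropriate thresholds would depend on the country and the census.

```python
def flag_suspect_eas(eas, hh_size_range=(2.0, 10.0), sex_ratio_range=(0.7, 1.4)):
    """Return {ea_id: [reasons]} for EAs whose frame values look implausible.

    eas: iterable of dicts with keys ea_id, population, households,
         males, females (hypothetical field names).
    The default ranges are illustrative assumptions, not DHS standards.
    """
    flagged = {}
    for ea in eas:
        reasons = []
        # Average household size: extreme values suggest a wrong measure of size.
        if ea["households"] > 0:
            avg_size = ea["population"] / ea["households"]
            if not hh_size_range[0] <= avg_size <= hh_size_range[1]:
                reasons.append(f"average household size {avg_size:.1f}")
        else:
            reasons.append("no households recorded")
        # Sex ratio (males per female): extreme values suggest a
        # non-residential EA such as a barracks or a boarding school.
        if ea["females"] > 0:
            sex_ratio = ea["males"] / ea["females"]
            if not sex_ratio_range[0] <= sex_ratio <= sex_ratio_range[1]:
                reasons.append(f"sex ratio {sex_ratio:.2f}")
        else:
            reasons.append("no females recorded")
        if reasons:
            flagged[ea["ea_id"]] = reasons
    return flagged


# Made-up frame records, for illustration only.
example_frame = [
    {"ea_id": "EA-001", "population": 500, "households": 100, "males": 245, "females": 255},
    {"ea_id": "EA-002", "population": 900, "households": 30, "males": 440, "females": 460},
    {"ea_id": "EA-003", "population": 400, "households": 90, "males": 380, "females": 20},
]
suspect = flag_suspect_eas(example_frame)
```

Flagged EAs are candidates for manual review against census documentation rather than automatic exclusion.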
For example, for a census frame, compare the total population of the sampling frame, and its distribution among urban and rural areas and among different regions/administrative units, with the corresponding figures from the census report. Any important differences may indicate coverage problems. If the frame provides information on population and households for each EA, then the average number of household members can be calculated, and a check for extreme values can help to find incorrect measures of size of the PSUs. If information on population by sex is available for each EA, then a sex ratio can be calculated for each EA, and a check for extreme values can help to identify non-residential EAs. If the EAs are associated with an identification (ID) code, then check the ID codes to identify miscoded or misplaced EAs.

A sampling frame with full coverage and of good quality is the first element for a DHS survey; therefore, efforts should be made to guarantee a good start for the project. For a nationally representative survey, geographic coverage of the survey should include the entire national territory unless there are strong reasons for excluding certain areas. If areas must be excluded, they should constitute a coherent domain. A survey from which a number of scattered zones have been excluded is difficult to interpret and to use.

1.5 Stratification

Stratification is the process by which the survey population is divided into subgroups or strata that are as homogeneous as possible using certain criteria. Explicit stratification is the actual sorting and separating of the units into specified strata. Within each stratum, the sample is designed and
selected independently. It is also possible to systematically sample units from an ordered list (with a fixed sampling interval between selected units) to achieve the effect of stratification. For example, in DHS surveys, it is not unusual for the PSUs within the explicit strata to be sorted geographically. This is called implicit stratification.

The principal objective of stratification is to reduce sampling errors. In a stratified sample, the sampling errors depend on the population variance existing within the strata but not between the strata. For this reason, it pays to create strata with low internal variability (or high homogeneity). Another major reason for stratification is that, where marked differences exist between subgroups of the population (e.g., urban vs. rural areas), stratification allows for a flexible sample design that can be different for each subgroup.

Stratification should be introduced only at the first stage of sampling. At the dwelling/household selection stage, systematic sampling is used for convenience; however, no attempt should be made to reorder the dwelling/household list before selection in the hope of increasing the implicit stratification effect. Such efforts generally have a negligible effect.

Stratification can be single-level or multi-level. In single-level stratification, the population is divided into strata according to certain criteria. In multi-level stratification, the population is divided into first-level strata according to certain criteria, and then the first-level strata are subdivided into second-level strata, and so on. A typical two-level stratification involves first stratifying the population by region at the first level and then by urban-rural within each region. A DHS survey usually employs multi-level stratification.

Strata should not be confused with survey domains. A survey domain is a population subgroup for which separate survey estimates are desired (e.g., urban areas/rural areas). A stratum is a subgroup of homogeneous units (e.g., subdivisions of an administrative region) in which the sample may be designed differently and is selected separately.
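The combination of explicit and implicit stratification described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and the field names `stratum`, `geo_code` and `psu_id` are hypothetical, and the selection is equal probability, whereas the manual's table of contents lists probability proportional to size sampling as a separate technique (Section 3.3) that this sketch does not implement.

```python
import random


def systematic_sample(units, n, rng=None):
    """Equal-probability systematic sample of n units from an ordered list.

    A random start is drawn in [0, interval) and units are then taken
    at a fixed interval of len(units) / n.
    """
    rng = rng or random.Random()
    total = len(units)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of units")
    interval = total / n
    start = rng.random() * interval
    # min() guards against a floating-point index equal to len(units).
    return [units[min(int(start + k * interval), total - 1)] for k in range(n)]


def select_psus(frame, n_by_stratum, rng=None):
    """Select PSUs independently within each explicit stratum.

    Sorting by geo_code before the systematic draw spreads the sample
    geographically within the stratum (implicit stratification).
    """
    selected = {}
    for stratum, n in n_by_stratum.items():
        units = sorted(
            (u for u in frame if u["stratum"] == stratum),
            key=lambda u: u["geo_code"],
        )
        selected[stratum] = systematic_sample(units, n, rng)
    return selected
```

Because the list is ordered before selection, each draw falls in a different block of consecutive geographic codes, which is what produces the stratification effect without defining additional explicit strata.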
Survey domains and strata can be the same but they need not be. For example, survey domains could be the first-level strata in a multi-level stratification. On the other hand, a survey domain could consist of one or several lower-level strata.

DHS surveys typically use explicit stratification by separating urban and rural residence within each region. Where data are available, explicit stratification could also be done on the basis of socioeconomic zones or more directly relevant characteristics such as the level of female literacy or the presence of health facilities in the areas. These kinds of information could be obtained from administrative sources. Within each explicit stratum, the units can then be ordered according to location, thus providing further implicit geographic stratification.

1.6 Sample size

1.6.1 Sample size and sampling errors

The estimates from a sample survey are affected by two types of errors: sampling errors and non-sampling errors. Sampling errors are the representative errors due to sampling of a small number of eligible units from the target population instead of including every eligible unit in the survey. Sampling errors are related to the sample size and the variability among the sampling units. Sampling errors can be statistically evaluated after the survey. Non-sampling errors result from problems during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Non-sampling errors are related to the capacity of the implementing organization, and experience shows that (1) non-sampling errors are always the most important source of error in a survey, and (2) it is difficult to evaluate the magnitude of non-sampling errors once a survey is complete. Theoretically, with the same survey methodology and under the same survey conditions,
the larger the sample size, the better the survey precision. However, this relationship does not always hold true in practice, because non-sampling errors tend to increase with survey scale and sample size. The challenge in deciding on the sample size for a survey is to balance the demands of analysis and precision with the capacity of the implementing organization and the constraints of funding.

A common measure of precision for estimating an indicator is its relative standard error (RSE), which is defined as its standard error (SE) divided by the estimated value of the indicator. The standard error of an estimator is the representative error due to sampling. The relative standard error describes the amount of sampling error relative to the indicator level and is independent of the scale of the indicator to be estimated; therefore, a unique RSE can be applied to a reference indicator for all domains. If a unique RSE is desired for all domains, the domain sample size depends on the variability and the size of the domain. The total sample size is the sum of the sample sizes over all domains for which a desired precision is required. The following are some concepts related to sample size calculation.

1. The standard error of an estimator when estimating a proportion with simple random sampling without replacement² is given by:

$$SE = \sqrt{(1 - f)\,\frac{P(1 - P)}{n}\,\frac{N}{N - 1}}$$

where n is the sample size (number of completed interviews), P is the proportion, N is the target population size, and f = n/N is the sampling fraction. When N is large and n is relatively small, the above quantity can be approximated by:

$$SE \approx \sqrt{\frac{P(1 - P)}{n}}$$

Therefore the RSE of the estimator is given by:

$$RSE(P) \approx \sqrt{\frac{P(1 - P)}{n}} \Big/ P = \sqrt{\frac{1/P - 1}{n}}$$

2. For a required precision with a relative standard error α, the net sample size (number of completed interviews) needed for simple random sampling is given by:

$$n = \frac{1/P - 1}{\alpha^2}$$

3. Since simple random sampling is not feasible for a DHS, the sample size for a complex survey with clustering such as the DHS can be calculated by inflating the above calculated sample size by using a design effect (Deft).
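Before turning to the design effect, the simple random sampling formulas in points 1 and 2 can be checked numerically. The following is a minimal Python sketch; the function names `se_srs`, `rse_srs` and `n_for_rse` are hypothetical and are not part of any DHS template.

```python
import math


def se_srs(p, n, population=None):
    """Standard error of a proportion under simple random sampling.

    With `population` (N) given, applies the without-replacement form
    sqrt((1 - f) * P(1 - P)/n * N/(N - 1)) with f = n/N; otherwise uses
    the large-N approximation sqrt(P(1 - P)/n).
    """
    variance = p * (1 - p) / n
    if population is not None:
        f = n / population
        variance *= (1 - f) * population / (population - 1)
    return math.sqrt(variance)


def rse_srs(p, n):
    """Approximate relative standard error: sqrt((1/P - 1) / n)."""
    return math.sqrt((1 / p - 1) / n)


def n_for_rse(p, alpha):
    """Net sample size (completed interviews) for a target RSE alpha."""
    return (1 / p - 1) / alpha ** 2


# Illustrative values: an indicator near 20% with a target RSE of 10%.
n_needed = n_for_rse(0.20, 0.10)       # about 400 completed interviews
check = se_srs(0.20, n_needed) / 0.20  # recovers the target RSE, about 0.10
```

The result of `n_for_rse` is a number of completed interviews under simple random sampling only; it does not yet account for clustering or non-response.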
Deft is a measure of the efficiency of cluster sampling compared to direct simple random sampling of individuals, defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A Deft value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. The net sample size needed for cluster sampling with the same relative standard error is given by:

$$n = Deft^2 \times \frac{1/P - 1}{\alpha^2}$$

Footnote 2: A simple random sample would be a random selection of individuals or households directly from the target population. This is not feasible for DHS surveys because a list of all eligible individuals or households is not available.

4. The formula for calculating the final sample size in terms of the number of households while taking non-response into account (the formula used in the templates for sample size calculation as shown in Table 1.1) is given by:

$$n = Deft^2 \times \frac{1/P - 1}{\alpha^2} \Big/ (R_i \times R_h \times d)$$

where
$n$ is the sample size in households;
$Deft$ is the design effect (a default value of 1.5 is used for Deft if not specified);
$P$ is the estimated proportion;
$\alpha$ is the desired relative standard error;
$R_i$ is the individual response rate;
$R_h$ is the household gross response rate; and
$d$ is the number of eligible individuals per household.

The household gross response rate is the number of households interviewed over the number selected. DHS reports typically report the net household response rate, which is the number of households interviewed over the number of valid households found in the field (i.e., excluding vacant and destroyed dwellings).

5. If the target population is small (such as in a sub-national survey), a finite population correction of the above calculated sample size should be applied. The final sample size $n$ is calculated by

$$n = \frac{n_0}{1 + n_0 / N}$$

where $n_0$ is the initial sample size calculated in point number 4, and $N$ is the target population size.

6. The relationship between the RSE and the sample size shows that, if one reduces a desired RSE to half, then the sample size needed will increase 4 times. For example, the sample size for an RSE of 5% is 4 times larger than the sample size for an RSE of 10% (see Tables 1.1 and 1.2 in the next section). This means that it is very expensive to reduce the RSE by increasing the sample size. Therefore, when designing the sample size, the efficiency of the design must be considered, that is, the balance between the gain in precision and the increase in sample size (or survey cost).

7. The width of the confidence interval is determined by the RSE. With a confidence level of 95%, 2*P*RSE is the half-length of the confidence interval for P. For example, for RSE=0.10 and P=0.20, the half-length of the confidence interval is 0.04, which means the confidence interval for P is (0.16, 0.24). (DHS reports +/-2*SE instead of +/-1.96*SE as the 95% confidence interval for conservative purposes.)
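The formulas in points 2 to 7 can be sketched in a few lines of Python. This is only an illustrative sketch, not the DHS Excel template: the function names are hypothetical, and the rounding convention (round the net sample size to the nearest integer, then round the household count up) is an assumption chosen because it agrees with the figures quoted for Table 1.1.

```python
import math

def net_sample_size(p, rse, deft=1.5):
    """Completed interviews needed: Deft^2 * (1/P - 1) / alpha^2 (points 2 and 3)."""
    return deft ** 2 * (1.0 / p - 1.0) / rse ** 2

def household_sample_size(net_n, indiv_rr, hh_gross_rr, per_hh):
    """Households to select, allowing for non-response (point 4)."""
    return net_n / (indiv_rr * hh_gross_rr * per_hh)

def finite_population_correction(n0, population):
    """Corrected sample size n0 / (1 + n0/N) for a small target population (point 5)."""
    return n0 / (1.0 + n0 / population)

def confidence_limits(p, rse):
    """Approximate 95% limits P +/- 2*P*RSE (point 7)."""
    half = 2.0 * p * rse
    return p - half, p + half

# Inputs quoted in the text for Table 1.1: P=0.20, Deft=1.40, RSE=0.10
net = round(net_sample_size(0.20, 0.10, deft=1.40))
households = math.ceil(household_sample_size(net, 0.96, 0.92, 1.05))
lower, upper = confidence_limits(0.20, 0.10)
print(net, households, round(lower, 2), round(upper, 2))
```

Halving the RSE argument quadruples the value returned by `net_sample_size`, which is the cost relationship described in point 6.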
1.6.2 Sample size determination

The total sample size for a DHS survey with a number of survey domains (design domains) is the sum of the sample sizes over all domains. An appropriate sample size for a survey domain is the minimum number of persons (e.g., women age 15-49, currently married women 15-49, children under age five) that achieves the desired survey precision for core indicators at the domain level. If funding is tight and fixed, the sample size is the maximum number of persons that the funding can cover. Precision at the national level is usually not a problem. In almost all cases, sample size is decided to guarantee precision at domain level with appropriate allocation of the sample. So apart from survey costs, the total sample size depends on the desired precision at domain level and the number of domains. If a reasonable precision is required at domain level, experience from the MEASURE DHS program shows that a minimum of 800 completed interviews with women is necessary for some of the woman-based indicators for high fertility countries (e.g., total fertility rate, contraceptive prevalence rate, childhood mortality rates); for low fertility countries, the minimum domain sample size can reach 1,000 completed interviews or more.

Table 1.1 below illustrates the calculation of sample size for a domain according to different levels of desired RSE for estimating the indicator "proportion of currently married women who are current users of a modern contraceptive method".
Table 1.1 Sample size determination for estimating current use of a modern contraceptive method among currently married women

Estimated proportion (p):                    0.20
Total target population:                     (left blank)
Estimated design effect (Deft):              1.40
Number of target individuals per household:  1.05
Individual response rate:                    0.96
Household gross response rate:               0.92

Desired   Net sample size   Sample size    Expected   95% confidence limits
RSE       (individuals)     (households)   SE         Lower     Upper
0.20      196               212            0.040      0.120     0.280
0.19      217               234            0.038      0.124     0.276
0.18      242               261            0.036      0.128     0.272
0.17      271               293            0.034      0.132     0.268
0.16      306               330            0.032      0.136     0.264
0.15      348               376            0.030      0.140     0.260
0.14      400               432            0.028      0.144     0.256
0.13      464               501            0.026      0.148     0.252
0.12      544               587            0.024      0.152     0.248
0.11      648               699            0.022      0.156     0.244
0.10      784               846            0.020      0.160     0.240
0.05      3136              3382           0.010      0.180     0.220

Note: The confidence limits are calculated as P±2*SE.
Assuming the domain size is large enough that the finite population correction is negligible, Table 1.1 gives the required gross sample size in terms of number of households with estimated parameters from a DHS survey. The target population is currently married women age 15-49; the estimated parameters are: the proportion of currently married women who are current users of any modern contraceptive method, the design effect (Deft), the number of target individuals (number of currently married women 15-49) per household, and the individual and household response rates. For example, with an estimated prevalence of 20%, if we require an RSE of 10%, we should select 846 households in this particular domain. With a gross household response rate (the number of households completed over the total number selected) of 92% and an individual response rate of 96%, we expect to obtain 784 completed interviews of currently married women age 15-49. The estimated quantities at the top of the table used as input to the calculation can usually be obtained from previous surveys or from administrative records.

The total sample size for a survey with several domains is the sum of the sample sizes obtained in the above table for each domain. If the same required precision and the same indicator level apply to all domains, then the total sample size is the sample size calculated for one domain multiplied by the number of domains. With this example, the total sample size for a survey having six domains with approximately the same level of modern contraceptive use among currently married women and the same precision request for each domain would be 5,076 households. The "Sample size determination" template located in the Appendix can be used to determine required sample sizes.
Table 1.2 Sample size determination for estimating the prevalence of full vaccination coverage among children aged 12-23 months

Estimated proportion (p):                    0.29
Total target population:                     (left blank)
Estimated design effect (Deft):              1.22
Number of target individuals per household:  0.11
Individual response rate:                    0.96
Household gross response rate:               0.92

Desired   Net sample size   Sample size    Expected   95% confidence limits
RSE       (individuals)     (households)   SE         Lower     Upper
0.20      91                937            0.058      0.174     0.406
0.19      101               1040           0.055      0.180     0.400
0.18      112               1153           0.052      0.185     0.395
0.17      126               1297           0.049      0.191     0.389
0.16      142               1462           0.046      0.197     0.383
0.15      162               1668           0.043      0.203     0.377
0.14      186               1915           0.041      0.209     0.371
0.13      216               2224           0.038      0.215     0.365
0.12      253               2605           0.035      0.220     0.360
0.11      301               3099           0.032      0.226     0.354
0.10      364               3747           0.029      0.232     0.348
0.05      1458              15008          0.014      0.261     0.319

Note: The default value of Deft is set to be 1.5. Specify if different. The confidence limits are calculated as P±2*SE. If response rate is not provided, the sample size calculated is net sample size.
Table 1.2 shows a similar example for the indicator "proportion of children aged 12-23 months who are fully immunized". In this case, the target population is children aged 12-23 months. The estimated number of target individuals per household is much smaller than the number of currently married women per household given in Table 1.1. So for the same sample size calculated in Table 1.1, we can only get an RSE of above 20% at domain level. With an RSE of 10%, we need to select 3,747 households in this particular domain, which seems unrealistic if we have several domains for the survey. This example shows that for a multi-indicator survey, the sample size required can be very different from indicator to indicator. So the choice of the reference indicator upon which the sample size is calculated is an important issue. The reference indicator which is used for sample size determination should have demographic importance, moderate value and moderate population coverage, i.e., apply to a sizable proportion of the population. With the same sample size calculated in Table 1.1 for a survey having six domains, the RSE for the whole sample for estimating full immunization among children 12-23 months is between 8% and 9%.

The domain sample sizes often need to be balanced between domains due to budget constraints. In practice it is often the case that the total sample size is fixed according to funding available and implementation capacity, and then the sample is allocated to each domain and to each stratum within the domain. In the case of very tight budget constraints, we may equally allocate the total sample to the domains. In some cases, we may want to oversample a specific domain to conduct some in-depth analysis for a certain rare phenomenon. The method (and the tables) presented in the following section may be used to allocate the sample at the domain level because the domains are usually first-level strata. Regardless of the method used for allocation, the calculation of domain sample size can give us an idea about the precision we may achieve in each domain with a given sample size.
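The reverse calculation mentioned above (what precision a given number of households can deliver) can be sketched by inverting the household sample size formula. The function name is hypothetical, and applying the domain-level Deft to the pooled six-domain sample is a simplifying assumption made only for this illustration.

```python
import math

def achieved_rse(households, p, deft, indiv_rr, hh_gross_rr, per_hh):
    """RSE expected from a given number of selected households.

    Inverts n = Deft^2 * (1/P - 1) / (alpha^2 * R_i * R_h * d) for alpha.
    """
    net_n = households * indiv_rr * hh_gross_rr * per_hh  # expected completed cases
    return deft * math.sqrt((1.0 / p - 1.0) / net_n)

# Parameters quoted for Table 1.2 (full vaccination, children 12-23 months)
params = dict(p=0.29, deft=1.22, indiv_rr=0.96, hh_gross_rr=0.92, per_hh=0.11)

rse_one_domain = achieved_rse(846, **params)    # household sample from Table 1.1
rse_six_domains = achieved_rse(5076, **params)  # six domains pooled
print(round(rse_one_domain, 3), round(rse_six_domains, 3))
```

Under these assumptions the single-domain RSE comes out above 20% and the pooled RSE falls between 8% and 9%, in line with the statements in the text.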
1.7 Sample allocation

In cases where the total sample size or domain sample size has been fixed, we need to appropriately allocate the sample to different domains (or different strata within a domain). This allocation is aimed at strengthening the sampling efficiency at the national level or domain level and reducing sampling errors. Assuming a constant cost across domains/strata, the optimum allocation of the sample depends on the size of the domain/stratum $N_h$ and the variability of the indicator to be estimated $S_{xh}$. For a given total sample size $n$, the optimum allocation for variable $x$ is given by:

$$n_h = n \, \frac{N_h S_{xh}}{\sum_{h=1}^{H} N_h S_{xh}}$$

The optimum allocation is only optimal for the indicator on which the allocation is based; that allocation may not be appropriate for other indicators. For a multipurpose survey, if the domains/strata are not too different in size, a safe allocation that is good for all indicators is a proportional allocation, with sample size proportional to the domain/stratum size:

$$n_h = n \, \frac{N_h}{\sum_{h=1}^{H} N_h} = n \, \frac{N_h}{N}$$
This allocation introduces a constant sampling fraction across domains/strata with:

$$f_h = \frac{n_h}{N_h} = \frac{n}{N}$$

Because DHS surveys are multipurpose surveys, a proportional allocation of sample is recommended if the domains/strata are not too different in size. However, if the domain/stratum sizes are very different, the smaller domains/strata may receive a very small sample size. If a desired precision is required at domain/stratum level, by assuming equal relative variations across strata, a power allocation (Bankier, 1988) with an appropriate power value $\alpha$ ($0 \le \alpha \le 1$) may be used to guarantee sufficient sample size in small domains/strata:

$$n_h = n \, \frac{M_h^{\alpha}}{\sum_{h=1}^{H} M_h^{\alpha}}$$

A power allocation is an allocation proportional to the power of a size measure $M$. A power value of 1 gives proportional allocation; a power value of 0 gives equal size allocation; a power value between 0 and 1 gives an allocation between proportional allocation and equal size allocation. Proportional allocation is good for national level indicators, but may not meet the precision request at domain level, while an equal size allocation is good for comparison across domains, but may affect the precision at national level. A power allocation with power values between 0 and 1 is a tradeoff between the national level precision and the domain level precision. Since the sample size is usually large at the national level, the national level precision is not a concern.

In Table 1.3 below, we give an example of a proportional sample allocation of 15,000 individuals to 11 domains and to their urban-rural areas. The minimum domain sample size is 384 for domain 2, which is too small for estimating the total fertility rate (TFR) and childhood mortality rates. The largest sample size is for domain 11, which may be unnecessarily large. The actual total sample size given in the total row may be slightly different from the desired sample size because of rounding.
Table 1.3 Sample allocation: Proportional allocation

Total sample size => 15000
Power value domain =>
Power value urban =>

Serial   Domain/stratum   Domain/stratum      Proportion   Sample allocation
num      name/ID          size (proportion)   urban        Urban    Rural    Domain
1        Domain 1         0.072               0.352        382      701      1083
2        Domain 2         0.026               0.317        122      262      384
3        Domain 3         0.070               0.568        597      454      1051
4        Domain 4         0.142               0.275        586      1544     2130
5        Domain 5         0.060               0.323        292      611      903
6        Domain 6         0.046               0.135        92       593      685
7        Domain 7         0.048               0.194        141      586      727
8        Domain 8         0.094               0.251        354      1055     1409
9        Domain 9         0.164               0.288        709      1749     2458
10       Domain 10        0.091               0.191        262      1104     1366
11       Domain 11        0.187               1.000        2803     0        2803
         Total            1.000               0.423        6339     8660     14999

The template's "Specific allocation" columns (Urban, Rural) are empty in this example.

If we impose a condition such that the sample size should not be smaller than 1,000 in each domain, after trying various power values, we find that a power value of 0.25 is appropriate, as shown in Table 1.4. In this case, we would have a minimum sample size of 1,022 for domain 2. Since domain 11 has only urban areas, the power allocation among the domains brought down the urban percentage in the sample. In order for urban areas to be properly represented, oversampling is applied in the urban areas of the other domains. With a power value of 0.65, the urban proportion in the sample is close to the proportion of the target population.
Table 1.4 Sample allocation: Power allocation

Total sample size => 15000
Power value domain => 0.25
Power value urban => 0.65

Serial   Domain/stratum   Domain/stratum      Proportion   Sample allocation
num      name/ID          size (proportion)   urban        Urban    Rural    Domain
1        Domain 1         0.072               0.352        533      791      1324
2        Domain 2         0.026               0.317        386      636      1022
3        Domain 3         0.070               0.568        716      599      1315
4        Domain 4         0.142               0.275        546      1023     1569
5        Domain 5         0.060               0.323        484      782      1266
6        Domain 6         0.046               0.135        271      910      1181
7        Domain 7         0.048               0.194        341      858      1199
8        Domain 8         0.094               0.251        466      949      1415
9        Domain 9         0.164               0.288        581      1045     1626
10       Domain 10        0.091               0.191        395      1009     1404
11       Domain 11        0.187               1.000        1680     0        1680
         Total            1.000               0.423        6399     8602     15001

The template's "Specific allocation" columns (Urban, Rural) are empty in this example.

In Table 1.4, the small domains are oversampled compared with a proportional allocation. Oversampling some small domains is frequently practiced if domain level precision is required.
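The domain-level power allocation can be sketched as follows. The function name is hypothetical and the code is not the DHS allocation template; it uses the rounded domain proportions printed in Tables 1.3 and 1.4 as the size measure, so its results can differ by a few units from the published allocations. The urban-rural split within domains is not reproduced here.

```python
def power_allocation(total, sizes, power):
    """Allocate `total` across domains in proportion to size ** power.

    power = 1 gives proportional allocation, power = 0 gives equal allocation.
    """
    weights = [m ** power for m in sizes]
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]

# Domain size proportions as printed in Tables 1.3 and 1.4 (rounded values)
domain_share = [0.072, 0.026, 0.070, 0.142, 0.060, 0.046,
                0.048, 0.094, 0.164, 0.091, 0.187]

proportional = power_allocation(15000, domain_share, 1.0)
power_025 = power_allocation(15000, domain_share, 0.25)
equal = power_allocation(15000, domain_share, 0.0)

for name, alloc in [("proportional", proportional), ("power 0.25", power_025)]:
    print(name, "smallest domain:", round(min(alloc)), "largest:", round(max(alloc)))
```

Trying several power values in a loop and keeping the largest one whose smallest domain still meets the required minimum is one simple way to mimic the trial-and-error search described in the text.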
However, oversampling a small domain too much will harm the precision at national level. To prevent this, it is recommended to regroup the small domains to form domains of moderate size, especially when there is a very unequal population distribution among geographic domains; this is sometimes not possible, however, due to political considerations.

The above discussion also applies to sample size allocation to strata within a domain where the domain sample size is fixed. A proportional allocation with sample size proportional to stratum size is good for all indicators and provides the best precision for the domain as a whole.

1.8 Two-stage cluster sampling procedure

The MEASURE DHS program utilizes a convenient and practical sample selection procedure for household-based surveys, developed on the basis of experience from past surveys: a two-stage cluster sampling procedure. A cluster is a group of adjacent households which serves as the PSU for field work efficiency. Interviewing a certain number of households in the same cluster can greatly reduce the amount of travel and time needed during data collection. In most cases, a cluster is an EA with a measure of size equal to the number of households or the population in the EA, provided by the population census.

At the first stage, a stratified sample of EAs is selected with probability proportional to size (PPS): in each stratum, a sample of a predetermined number of EAs is selected independently with probability proportional to the EA's measure of size. In the selected EAs, a listing procedure is performed such that all dwellings/households are listed. This procedure is important for correcting errors existing in the sampling frame, and it provides a sampling frame for household selection.

At the second stage, after a complete household listing is conducted in each of the selected EAs, a fixed (or variable) number of households is selected by equal probability systematic sampling in the selected EAs.
In each selected household, a household questionnaire is completed to identify women age 15-49, men age 15-59 (15-54 or 15-49 in some surveys) and children under age five. Every eligible woman will be interviewed with an individual questionnaire, and every eligible man will be interviewed with an individual men's questionnaire in those households selected for the men's interview.

The advantages of this two-stage cluster sampling procedure can be summarized as follows:

1) It guarantees a representative sample of the target population when a list of all target individuals is not available, which prohibits a direct sampling of target individuals;
2) A household listing procedure after the selection of the first stage and before the main survey provides a sampling frame for household selection in the central office;
3) The use of residential households as the second-stage sampling unit guarantees the best coverage of the target population; and
4) It reduces unnecessary sampling errors by avoiding more than two stages of selection (which usually uses a large PSU in the first stage of selection).

See more details in Sections 1.10 and 1.11 on household listing and selection, Chapter 2 on household listing, and Sections 3.2 and 3.3 of Chapter 3 on systematic sampling and sampling with probability proportional to size (PPS).
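A minimal sketch of the two stages in a single stratum is given below. It is not the DHS selection template described in Chapter 3: the function names, the simulated frame, and the use of a fractional sampling interval are assumptions made for illustration, and for simplicity the number of households listed in each EA is taken to equal the frame count.

```python
import random

def pps_systematic(sizes, n_select, rng):
    """Stage 1: systematic selection with probability proportional to size.

    Returns the indices of the selected units. A unit larger than the
    sampling interval could be hit more than once.
    """
    interval = sum(sizes) / n_select
    start = rng.random() * interval
    selected, cumulative, idx = [], 0, 0
    for i in range(n_select):
        point = start + i * interval
        # advance to the unit whose cumulative size range contains the point
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

def systematic_households(n_listed, take, rng):
    """Stage 2: equal-probability systematic selection of household serial numbers."""
    take = min(take, n_listed)
    interval = n_listed / take
    start = rng.random() * interval
    return [min(int(start + i * interval) + 1, n_listed) for i in range(take)]

rng = random.Random(2012)  # fixed seed so the sketch is reproducible
ea_households = [rng.randint(80, 400) for _ in range(200)]  # hypothetical frame
selected_eas = pps_systematic(ea_households, 20, rng)
sample = {ea: systematic_households(ea_households[ea], 25, rng)
          for ea in selected_eas}
print(len(selected_eas), "EAs,", sum(len(v) for v in sample.values()), "households")
```

In a real survey the first stage is run independently in every stratum, and the second stage uses the count from the household listing rather than the frame count.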
1.9 Sample take per cluster

Once the total sample size is determined and allocated to different survey domains/strata, it should be decided how many individuals (sample take) should be interviewed per sample cluster, and the domain/stratum sample size should then be converted to a number of clusters. Since the survey cost can be very different across the survey domains/strata, the sample take can have a big influence on the total survey budget. With a fixed sample size, a small sample take is good for survey precision because of the reduction of the design effect, but is expensive because more clusters are needed. The number of clusters affects the survey budget more than the overall sample size due to the travel between clusters during data collection, which represents an important part of field costs in rural areas. The MEASURE DHS program proposes a sample take of about 25-30 women per rural cluster. In urban areas, the cost advantage of a large take is generally smaller, and MEASURE DHS recommends a take of about 20-25 women per urban cluster. Since in most DHS surveys the number of eligible women age 15-49 is very close to one per household, the sample take of individuals is equivalent to the sample take of households; therefore, in the following sections we refer to the sample take (or cluster take) as the number of sample households per cluster.

1.9.1 Optimum sample take

The optimum number of households to be selected per cluster depends on the variable under consideration, the intracluster correlation $\rho$, and the survey cost ratio $c_1/c_2$, where $c_1$ represents the cost per cluster, including mainly the cost associated with travelling between the clusters for survey implementation (household listing and interview), while $c_2$ represents the cost per individual interview (the interviewing cost) and other costs of doing fieldwork within a cluster. A larger sample take per cluster and fewer clusters reduces survey field costs if the cost ratio is high, but it could also reduce the survey precision if the intracluster correlation is strong.
The MEASURE DHS Program has accumulated information on sampling errors for selected variables for many surveys throughout the world. Using this information, Aliaga and Ren (2006) conducted a research study to determine the optimum sample take per cluster. The results of the study have informed current practice in DHS surveys. If the average cluster size is around 250 households, a sample take of 20-30 households per cluster is within the acceptable range in most surveys. The research also supports the practice of setting a larger sample take in rural clusters than in urban clusters. Usually, the cost ratio in urban areas is smaller than that in rural areas. This would lead to a smaller sample take in an urban cluster than in a rural cluster. In sum, this research indicates that for the most important survey indicators, a sample take of 20 to 25 households is appropriate in urban clusters and a sample take of 25 to 30 households is appropriate in rural clusters.

Based on values of $c_1/c_2$ and $\rho$ obtained from eight surveys, Table 1.5 below shows optimal sample takes for the indicator "proportion of currently married women 15-49 currently using any contraceptive method". This indicator has a moderate intracluster correlation relative to other important survey indicators.
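The manual does not state the formula behind the optimal takes. One standard result from cluster sampling theory, assumed here for illustration, is that the cost-optimal take is $\sqrt{(c_1/c_2)(1-\rho)/\rho}$. The sketch below applies it to the eight pairs of cost ratio and intracluster correlation listed in Table 1.5; the function name is hypothetical.

```python
import math

def optimal_take(cost_ratio, rho):
    """Cost-optimal cluster take, sqrt((c1/c2) * (1 - rho) / rho).

    Textbook formula assumed for this sketch; not quoted from the manual.
    """
    return math.sqrt(cost_ratio * (1.0 - rho) / rho)

# (cost ratio c1/c2, intracluster correlation rho) for Countries 1-8 in Table 1.5
surveys = [(10, 0.025), (10, 0.037), (12, 0.067), (12, 0.052),
           (15, 0.084), (27, 0.031), (48, 0.058), (52, 0.023)]

takes = [round(optimal_take(c, r)) for c, r in surveys]
print(takes)
```

Rounded to whole households, these values agree with the optimal sample take column of Table 1.5, which suggests (without confirming) that the table was derived in this way. The formula also shows the trade-off described above: the take grows with the cost ratio and shrinks as the intracluster correlation increases.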
Table 1.5 Optimal sample take for currently married women 15-49 currently using any contraceptive method based on intracluster correlation $\rho$ and survey cost ratio $c_1/c_2$ from past surveys

Country     Survey cost ratio $c_1/c_2$   Intracluster correlation $\rho$   Optimal sample take
Country 1   10                            0.025                             20
Country 2   10                            0.037                             16
Country 3   12                            0.067                             13
Country 4   12                            0.052                             15
Country 5   15                            0.084                             13
Country 6   27                            0.031                             29
Country 7   48                            0.058                             28
Country 8   52                            0.023                             47
Average     23                            0.047                             23

1.9.2 Variable sample take for self-weighting

A fixed sample take per cluster is easy for survey management and implementation, but it requires sampling weights that vary within a stratum. Different sampling weights result in larger sampling errors compared with a similar sample of constant weight within a sampling stratum, i.e., a self-weighting sample. A self-weighting sample consists of a sample of individuals in which each individual has the same probability of being selected, and therefore a constant sampling weight is used. In some cases a self-weighting sample is preferred for various reasons: it is equally representative for every individual of the target population, and it reduces sampling errors.

Since the sample for DHS surveys is usually the result of a two-stage cluster sampling design, it is necessary to coordinate the sample take for each of the selected clusters. In an overall self-weighting sample, every individual in the target population has an equal probability of selection, which results in a proportional allocation. However, proportional allocation is not feasible when sampling domains are very different in size. Self-weighting at domain/stratum level, by contrast, is easy to achieve.
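As a preview of the derivation that follows in this section, stratum-level self-weighting amounts to scaling the designed average take in each cluster by the ratio of households listed to households in the sampling frame, subject to the cut-off of 10 to 50 households per cluster mentioned later in the section. The sketch below uses hypothetical function names and made-up cluster counts; it is not the self-weighting template of Chapter 3.

```python
import math

def raw_take(avg_take, frame_count, listed_count):
    """Take that keeps the overall selection probability constant in the stratum:
    m_hk = m_h * (households listed) / (households in the frame)."""
    return avg_take * listed_count / frame_count

def clipped_take(avg_take, frame_count, listed_count, lower=10, upper=50):
    """Raw take rounded to a whole number and held within the cut-off range.
    Clusters where the cut-off applies are no longer exactly self-weighting."""
    take = round(raw_take(avg_take, frame_count, listed_count))
    return min(max(take, lower), upper)

def overall_probability(n_clusters, stratum_total, frame_count, listed_count, take):
    """PPS cluster probability times within-cluster household probability."""
    cluster_prob = n_clusters * frame_count / stratum_total
    return cluster_prob * take / listed_count

# Made-up stratum: 20 clusters selected from 50,000 frame households, average take 25
p_a = overall_probability(20, 50000, 200, 240, raw_take(25, 200, 240))
p_b = overall_probability(20, 50000, 150, 120, raw_take(25, 150, 120))
print(math.isclose(p_a, p_b), clipped_take(25, 200, 240), clipped_take(25, 200, 50))
```

Both clusters end up with the same household selection probability (n_h m_h / X_h), even though one grew and the other shrank since the census.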
Let $n$ be the total number of clusters selected for a DHS survey; let $n_h$ be the number of clusters allocated to the $h$-th stratum; let $X_h$ be the total number of households in stratum $h$; and let $x_{hk}$ be the number of households in cluster $k$ of stratum $h$, given by the sampling frame. Then the selection probability of cluster $k$ in stratum $h$ is given by:

$$\pi_{hk} = n_h \, \frac{x_{hk}}{X_h}$$

Let $x^*_{hk}$ be the number of households listed in the cluster in the household listing operation, and let $m_h$ be the number of households to be selected from the cluster for a fixed sample take. Then the overall selection probability of a household in the cluster is given by:

$$f_{hk} = \pi_{hk} \, \frac{m_h}{x^*_{hk}} = \frac{n_h m_h}{X_h} \cdot \frac{x_{hk}}{x^*_{hk}}$$

If $x_{hk} = x^*_{hk}$ exactly for all $k$ in stratum $h$, then it is easy to see that self-weighting is achieved in stratum $h$ by a constant sample take $m_h$ in all clusters, since

$$f_h = \frac{n_h m_h}{X_h}$$

is a constant in stratum $h$. In practice, it is not possible that $x_{hk} = x^*_{hk}$ for all $h$ and $k$, especially when the last population census is no longer new. Therefore there is a need for sample coordination in order to achieve self-weighting. Let $f_h$ and $m_h$ be the calculated sampling fraction and average sample take in stratum $h$ according to the sample allocation, with $m_h = f_h X_h / n_h$; the number of households needed to achieve self-weighting in cluster $k$ of stratum $h$ is given by

$$m_{hk} = \frac{f_h X_h}{n_h} \cdot \frac{x^*_{hk}}{x_{hk}} = m_h \, \frac{x^*_{hk}}{x_{hk}}$$

which is a function of the ratio of the number of households listed over the number of households given in the sampling frame for every cluster: take more if more are listed or take fewer if fewer are listed. The above formula also shows that the sampling fraction is not a necessary parameter for sample take calculation. Using the designed average sample take is a more direct method because the sampling fraction is an abstract number. This formula is used in the self-weighting household selection templates presented in Chapter 3, Section 3.2. The relationship between the sample take and the cluster selection probability is given by

$$m_{hk} = \frac{f_h \, x^*_{hk}}{\pi_{hk}}$$

For practical considerations, the sample take calculated above needs to be adjusted if it is too small or too large. Usually, we apply a cut-off to control the sample take within the range of a minimum of 10 households and a maximum of 50 households per cluster. For the clusters where the cut-off is applied, the sample is no longer self-weighting.

The advantages and disadvantages of a self-weighting sample can be summarized as:

Advantages:

1) Equally representative for every individual within a sampling stratum.
2) Reduced sampling errors.

Disadvantages:
1) Difficult for survey management (for example, to distribute the work-load) because of the variable sample take by cluster.
2) Difficult to control the expected sample size because of possible cut-offs, especially when the upper limit cut-offs are employed.
3) The self-weighting is not exact because of the rounding of the sample takes, and this will bring bias into the survey estimation.
4) Self-weighting at the national level will break down the specific sample allocation at the domain/stratum level and bring the sample allocation back to a proportional allocation.

It is possible to overcome the second and the third disadvantages through a recursive calculation of sample take by re-distributing the cut-offs to the rest of the clusters in the stratum or control area, and by using a randomized sample take which allows non-integer numbers as sample size. Excel templates for both the traditional procedure and the revised procedure are available.

1.10 Household listing

The household listing operation is a fundamental operation in DHS surveys. After the EAs are selected for the survey, a complete listing of dwelling units/households in the selected EAs is conducted prior to the selection of households. The listing operation consists of visiting each of the selected clusters, collecting geographic coordinates of the cluster, drawing a location map of the cluster as well as a sketch map of the structures in the cluster, and recording on listing forms a description of every structure together with the names of the heads of the households in the structures and other characteristics. Mapping and listing of households represents a significant field cost, but it is essential to guarantee the exactness of sample implementation.

The listing operation is an important procedure for reducing non-sampling errors in the survey, especially when the sampling frame is outdated. The listing operation provides a complete list of occupied residential households in the EA. This information is necessary for an equal probability random selection of households in the second stage. With the household listing prior to the main survey, it is possible to pre-select the sample households in advance, and the interviewers are asked to interview only the pre-selected households without replacement of non-responding households. With the sketch map and the household listing of the cluster produced in the household listing operation, the sampled households can be easily relocated by interviewers later.
The fieldwork procedure for DHS surveys is designed to be replicable and therefore allows easy supervision; all these elements are designed to prevent serious bias during data collection.

It is sometimes suggested that listing could be avoided by making segments so small that they are equal to the required sample take per cluster. One could then use a take-all rule at the last stage of sampling. Such small segments, however, will generally be difficult to delineate. In planned urban areas, this difficulty may be reduced (one could adopt blocks, or even single buildings, as segments), but urban units of this kind are likely to be homogeneous, containing similar households, and therefore less than ideal as sampling clusters.

It is also not acceptable to attempt to avoid listing altogether by having interviewers create clusters as they go along, or by selecting the sample households at fixed intervals during a random walk up to a predetermined quota. Such methods are not acceptable because first, they do not guarantee a nonzero probability to every potential respondent; second, the procedure is not replicable, which complicates the field work supervision; and third, it can end up with a sample of easy units because of the lack of effort to make call-backs to households or individuals who were not available at the first attempt to interview.

Listing costs can be reduced by using segmentation to decrease the size of the area which has to be listed; however, segmentation generates its own costs, and skill in map making and map interpretation is required. Segmentation becomes progressively more difficult as segments become smaller because there are not enough natural boundaries to delineate very small segments. Moreover, concentration of the sample into smaller segments increases the sampling error. Since neighbors' characteristics are correlated, a smaller segment captures less of the variety existing in the population; this leads to less efficient sampling. There is a point beyond which it is not useful to attempt further segmentation.
As a general rule the average segment size should not be less than 500 in population (approximately 100 households) in both urban and rural areas. However, segmentation has less economic effect in urban areas because the urban EAs are in general small geographic areas.

It is quite probable that some traditional tools in the household listing process will be modified in the future by using more sophisticated technology such as the global positioning system (GPS) in order to collect more precise location information for the selected EAs. With this new tool we can produce more precise distribution maps of the structures with less supervision than in the traditional approach. The main feature is that every selected EA and every selected structure/dwelling can be located with high precision and thus relocated later, if desirable. In addition, GPS information is used more and more in DHS data analysis and presentation. At present, though, the recommended protocol for collecting GIS information in DHS surveys is to collect one coordinate for every selected cluster. See Chapter 2 for more details of the household listing operation.

1.11 Household selection in the central office

After the household listing operation, once the central office receives the completed listing materials for a cluster, they must first create a serial number for each of the occupied residential households, beginning with 1 and continuing to the total number of occupied residential households listed in the cluster. An occupied residential household designates those households occupied at the time of the listing, even if the occupant refused to cooperate at the time of listing, and those households where the occupants were absent at the time of listing but neighbors confirmed that they would not be absent for a long period and would be at home during the period of the main survey. Only occupied residential households should be numbered. This serial number is an ID number for the households. The household selection procedure will be performed based on this serial number.
Whether or not a household is considered occupied at the time of the listing is very important because this fact will be related to the proportion of vacant households in the main survey.

The MEASURE DHS program has used several methods (see footnote 3) for selecting households within clusters, including:

1) Systematic selection: From a random starting point, select every nth household (see Chapter 3 Section 3.2 for more details).
2) Systematic selection with runs: From a random starting point, select a group of sequential households called a run. Several runs may be used within a cluster. Runs are selected with systematic selection. Selecting households in runs can greatly reduce the amount of travel within a cluster during data collection, especially in rural clusters where households can be far apart.

Footnote 3: The MEASURE DHS program has developed various Excel templates for household selection in the central office: systematic selection, systematic selection with runs, and self-weighting selection with or without control of sample size and with or without runs. Once the household listing is completed, it is possible to just copy the number of households listed in a cluster into the spreadsheet and the spreadsheet will show the selected household numbers automatically. See Chapter 3 Section 3.2.2 for details.

The advantages of household selection in the central office can be summarized as:

1) It allows for a check of coverage of the household listing results before the main survey and for the review and possible relisting of problematic clusters in advance.
2) Sampled households are pre-determined, which prevents potential bias introduced by allowing the interviewers to select in the field which households are to be interviewed.
3) The feld work procedure s exactly replcable whch provdes the possblty of easy and close supervson of the feld work. 4) It s easer to control the work load for each ntervewng team. However, n cases when travellng between clusters represents a substantal cost, t s possble to forego the step of selectng households n the central offce. In such cases, the household lstng operaton and the man survey can be combned nto a sngle feld operaton. No essental changes are needed n the household lstng procedure or household numberng, but makng a detaled sketch map for the cluster may not be necessary because the lstng team and the ntervewng team are the same, and the household ntervew wll begn mmedately after the lstng, so dentfyng the exact selected households durng a separate vst s no longer a problem. The household selecton must be done n the feld manually f portable computers are not avalable. Some manual selecton procedures have been developed for ths purpose. Household lstng and ntervewng are two very dfferent jobs, so n surveys where lstng, selecton and ntervewng takes place n the same vst by the same staff, t may be necessary to conduct more extensve tranng of feld teams before the feld work begns and to supervse the teams more closely durng the feldwork. See Chapter 3 Secton 3.2.2 for more detals for manual household selecton. 1.12 Household ntervews The household ntervew procedure s out of the scope of ths manual snce t s explaned n detal n the ntervewer s manual. Ths secton wll brefly dscuss the man statstcal ponts of the household ntervew. After the household selecton, ntervewers wll be recruted and traned for the household and ndvdual ntervews. The tranng of the ntervewer s an ntensve tranng lastng at least four weeks for a standard DHS survey, and longer f the survey ncludes many bomarkers. 
Prior to the training, a pretest of the questionnaire will be conducted in a small number of clusters not selected for the main survey to assess the quality of the questionnaires and the understanding of the translations by interviewers and respondents. Problems and potential errors observed in the pretest will be addressed and resolved prior to fieldwork training. Once training is complete, teams of interviewers will be assigned a list of clusters, with a set work load per team, and deployed to the field. Upon arrival in a new area, the interviewer team must first contact the local authorities for help to identify the correct cluster and to solicit cooperation during the field work. A team leader or supervisor is assigned for each interviewing team. The supervisor is responsible for cluster identification and should guarantee that the correct cluster will be interviewed. After checking the listing materials and verifying with the local authorities, the supervisor will distribute the sampled households among the interviewers. After locating a selected household, the interviewer will begin with a brief household interview, listing household members and visitors, and identifying among them all eligible women and men for the individual interview. Eligible individuals are defined as those who are in the specified age group (15-49) and are either usual members of the selected household or slept in the household the night before the interviewer's visit. Conscious omission of eligible individuals by an interviewer, through misreporting their age as outside the eligible age group, is a real concern. Measures to eliminate this problem should be undertaken. For example, the field editor should check the consistency of each completed questionnaire and, if anything suspicious is identified, should return to the household for further verification of key items such as the number of household members, number of eligible individuals and number of children under age five.
In the event of failure to contact a household or an eligible person in the first visit, the interviewer is required to make at least two repeat visits, or call backs, on different days and at different times of the day before the interview is abandoned. The process of making call backs requires the teams to stay in a cluster for at least two to three days. Some countries propose large interviewing teams in order to try to cover an entire cluster in one day. This process is not acceptable for a DHS survey, even when the designed sample size can bear a large non-response rate, because non-response biases the survey results. A quick survey usually ends up with poor data quality. Both theory and practice show that call backs and efforts to get difficult units to respond to the survey are the best way to remove bias and reduce the non-sampling errors to a minimum. For more details, refer to the DHS Survey Organization Manual and the Interviewer's Manual.

1.13 Sampling weight calculation

1.13.1 Why we need to weight the survey data

A DHS sample is a representative sample randomly selected from the target population. Each interviewed unit (household and individual) represents a certain number of similar units in the target population. In order for any statistical inferences drawn from the survey data to be valid, this representativeness of the sample must be taken into account. In general terms, sampling weights are used to make the sample more like the target population. All analyses should use the sampling weights calculated for each interviewed household and for each interviewed individual. A sampling weight is an inflation factor which extrapolates the sample to the target population. For example, if equal probability sampling (or a self-weighting sample) is applied in a domain with a sampling fraction of 1/500, this means that each sampled individual represents 500 similar individuals in the target population. Therefore, if we observed one particular individual having secondary education, we would conclude that there are 500 individuals in the target population having secondary education, corresponding to this particular individual.
The total number of individuals with secondary education in the target population would be 500 times the total number of interviewed individuals having secondary education observed in the sample. This explanation also applies to unequal probability sampling. It is very important that sampling weights are properly calculated and applied in data analysis; otherwise, serious bias may be introduced, leading to incorrect conclusions. Although all DHS indicators are means, proportions, rates or ratios, a nationwide self-weighting sample is usually not feasible because of the study domains (as explained in Section 1.9), so sampling weights are always necessary. Even when a survey is designed to be nationally self-weighting, it is necessary to correct for the different response patterns across domains/strata (see Section 1.13.4 for more details). Therefore, even surveys with self-weighting sample designs require the use of sampling weights. Though the effect of sampling weights on survey indicators may be small, it is necessary to use sampling weights for the following reasons:

1) For valid statistical inference.
2) For correcting or reducing bias; weighting can reduce bias introduced by non-response or other non-sampling errors.
3) For keeping the weighted sample distribution close to the target population distribution, especially when oversampling is applied in certain domains/strata.

1.13.2 Design weights and sampling weights

The MEASURE DHS program calculates both design weights and sampling weights (or survey weights) for both households and individuals. The design weight of a sampling unit (household or individual) is the inverse of the overall probability with which the unit was selected in the sample. The sampling weight of a sampling unit is the design weight corrected for non-response or other calibrations. Since the DHS protocol involves no selection of eligible individuals within a sampled household (except for the domestic violence module, in which one eligible woman is selected from a sampled household), all eligible individuals from the same household share the same design weight, which is the same as the household's design weight. Therefore, the design weight is the basic weight for DHS surveys. All other weights are calculated based on the design weight.

In calculating the sampling weight, it is possible to correct for both unit non-response (a sampling unit is not interviewed at all) and item non-response (the sampling unit does not provide an answer to a specific question). The policy of the MEASURE DHS program is to correct for unit non-response at the stratum level (see Section 1.13.4) and leave the correction of item non-response to data users because it is variable specific. Correction of unit non-response at the cluster level would increase the variability of sampling weights and therefore increase sampling errors. Because the correction for unit non-response is the same for an entire cluster and because household selection within a cluster is an equal probability selection, all the households in the same cluster share the same design weight and sampling weight, and the same is true for all individuals in the same cluster. This means that the DHS weights (both design weights and sampling weights) are cluster weights.

1.13.3 How to calculate the design weights

Assuming that a DHS survey sample is drawn with two-stage, stratified cluster sampling, design weights will be calculated based on the separate sampling probabilities for each sampling stage and for each cluster.
We use the following notations:

$P_{1hi}$: first-stage sampling probability of the $i$-th cluster in stratum $h$
$P_{2hi}$: second-stage sampling probability within the $i$-th cluster (household selection)

Let $n_h$ be the number of clusters selected in stratum $h$; let $M_{hi}$ be the measure of size of the $i$-th cluster used in the first-stage selection (usually the number of households residing in the cluster according to the sampling frame); let $\sum_i M_{hi}$ be the total measure of size in stratum $h$. The probability of selecting the $i$-th cluster in the sample is calculated as follows:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}}$$

Let $b_{hi}$ be the proportion of households in the selected cluster compared to the total number of households in EA $i$ in stratum $h$ if the EA is segmented; otherwise $b_{hi} = 1$. Then the probability of selecting cluster $i$ in the sample is:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}} \times b_{hi}$$

Let $L_{hi}$ be the number of households listed in the household listing operation in cluster $i$ in stratum $h$; let $t_{hi}$ be the number of households selected in the cluster. The second-stage selection probability for each household in the cluster is calculated as follows:

$$P_{2hi} = \frac{t_{hi}}{L_{hi}}$$
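As a minimal sketch of the two selection probabilities defined above, the following Python functions compute $P_{1hi}$ and $P_{2hi}$ for a single cluster. The function names and all numbers in the example are hypothetical, chosen only for illustration; they are not taken from any DHS survey or DHS template.

```python
def first_stage_probability(n_h, M_hi, M_h_total, b_hi=1.0):
    """P_1hi = n_h * M_hi / sum_i(M_hi) * b_hi.

    b_hi is the segment's share of the EA's households;
    it stays at 1 when the EA is not segmented.
    """
    return n_h * M_hi / M_h_total * b_hi


def second_stage_probability(t_hi, L_hi):
    """P_2hi = t_hi / L_hi: households selected over households listed."""
    return t_hi / L_hi


# Hypothetical stratum: 25 clusters selected; total measure of size 50,000 households.
p1 = first_stage_probability(n_h=25, M_hi=200, M_h_total=50_000)

# Same cluster, but assume the EA was segmented and the segment holds half of it.
p1_segmented = first_stage_probability(n_h=25, M_hi=200, M_h_total=50_000, b_hi=0.5)

# Hypothetical listing result: 220 households listed, 22 selected.
p2 = second_stage_probability(t_hi=22, L_hi=220)
```

Keeping the two stages as separate functions mirrors the advice on documentation: each design parameter ($n_h$, $M_{hi}$, $b_{hi}$, $t_{hi}$, $L_{hi}$) has to be recorded per cluster for the probabilities to be reproducible.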
The overall selection probability of each household in cluster $i$ of stratum $h$ is therefore the product of the selection probabilities of the two stages:

$$P_{hi} = P_{1hi} \times P_{2hi}$$

The design weight for each household in cluster $i$ of stratum $h$ is the inverse of its overall selection probability:

$$d_{hi} = 1 / P_{hi}$$

The calculation of the design weight is not complicated; however, difficulties often result from not having all the design parameters involved in the above calculation because they are not well documented, especially when the sampling frame is a master sample. See Chapter 5 for more details on sample documentation.

1.13.4 Correction of unit non-response and calculation of sampling weights

The design weight calculated above is based on sample design parameters. If there is no non-response at the cluster level, at the household level, or at the individual level, the design weight is enough for all analyses, for both household indicators and individual indicators. However, non-response is inevitable in all surveys, and different units have different response behaviors. The experience of the MEASURE DHS program shows that urban households are less likely to respond to the survey than their counterparts in rural areas, households in developed regions are less likely to respond to the survey than their counterparts in less-developed regions, rich households are less likely to respond to the survey than poor households, individuals with higher levels of education are less likely to respond to the survey than those with lower levels of education, men are less likely to respond to the survey than women, and so forth. The idea of correcting for unit non-response is to calculate a response rate for each homogeneous response group, then inflate the design weight by dividing it by the response rate for each response group. The construction of homogeneous response groups depends on the knowledge of the response behavior of the sampling units.
DHS surveys always use the sampling stratum as the response group because the stratification is usually achieved by regrouping homogeneous sampling units in a single stratum. It is possible to use a cluster as a response group, but the disadvantage is that the response rates may vary too much at the cluster level, which will increase the variability of the sampling weight, which in turn increases the sampling variance. Furthermore, correction of non-response at the cluster level will interfere with self-weighting if a self-weighting sample has been designed. Assuming that the response groups coincide with the sampling strata, the following steps explain how to calculate the sampling weight by first calculating the various response rates for unit non-response. Please note that the response rates calculated here are different from the response rates calculated in Appendix A of DHS survey final reports. In Appendix A, household and individual response rates are calculated as ratios of the number of interviewed units over the number of eligible units because the aim is just to show the results of survey implementation. Here we use weighted ratios because the aim is to correct the design weight to compensate for non-response; therefore the design weight should be involved. Because a non-responding unit with a large design weight will have a larger impact on survey estimates than a non-responding unit with a small design weight, a weighted response rate for correction of non-response is better than an unweighted response rate.
1. Cluster level response rate

Let $n_h$ be the number of clusters selected in stratum $h$; let $n_h^*$ be the number of clusters interviewed. The cluster level response rate in stratum $h$ is therefore

$$R_{ch} = n_h^* / n_h$$

2. Household level response rate

Let $m_{hi}$ be the number of households found (see Chapter 2, Section 2.10 for definition) in cluster $i$ of stratum $h$; let $m_{hi}^*$ be the number of households interviewed in the cluster. The household response rate in stratum $h$ is calculated by

$$R_{hh} = \sum_i d_{hi} m_{hi}^* \Big/ \sum_i d_{hi} m_{hi}$$

where $d_{hi}$ is the design weight of cluster $i$ in stratum $h$; the summation is over all clusters in stratum $h$.

3. Individual response rate

Let $k_{hi}$ be the number of eligible individuals found in cluster $i$ of stratum $h$; let $k_{hi}^*$ be the number of individuals interviewed. The individual response rate in stratum $h$ is calculated as

$$R_{ph} = \sum_i d_{hi} k_{hi}^* \Big/ \sum_i d_{hi} k_{hi}$$

where $d_{hi}$ is the design weight of cluster $i$ in stratum $h$; the summation is over all clusters in stratum $h$.

The household sampling weight of cluster $i$ in stratum $h$ is calculated by dividing the household design weight by the product of the cluster response rate and the household response rate, for each of the sampling strata:

$$D_{hi} = d_{hi} / (R_{ch} \times R_{hh}), \quad \text{for cluster } i \text{ of stratum } h.$$

The individual sampling weight of cluster $i$ in stratum $h$ is calculated by dividing the household sampling weight by the individual response rate, or equivalently, by dividing the household design weight by the product of the cluster response rate, the household response rate and the individual response rate, for each of the sampling strata:

$$W_{hi} = D_{hi} / R_{ph} = d_{hi} / (R_{ch} \times R_{hh} \times R_{ph}), \quad \text{for cluster } i \text{ of stratum } h.$$

It is easy to see that the difference between the household sampling weights and the individual sampling weights is introduced by individual non-response. The sampling weights for households selected for the men's survey and for men can be calculated similarly.
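The three response rates and the two sampling weights above can be sketched in a few lines of Python for a single stratum. This is only an illustration under stated assumptions: the helper name `weighted_response_rate` and every number below are hypothetical, and the lists hold one value per interviewed cluster of the stratum.

```python
def weighted_response_rate(design_weights, found, interviewed):
    """Design-weighted response rate for one stratum.

    Each argument holds one value per cluster:
    sum(d_hi * interviewed_hi) / sum(d_hi * found_hi).
    """
    numerator = sum(d * c for d, c in zip(design_weights, interviewed))
    denominator = sum(d * f for d, f in zip(design_weights, found))
    return numerator / denominator


# Hypothetical stratum: 3 clusters selected and all 3 interviewed.
n_selected, n_interviewed = 3, 3
d = [100.0, 200.0, 150.0]   # design weights d_hi
m = [20, 20, 20]            # households found m_hi
m_star = [18, 20, 19]       # households interviewed m*_hi
k = [25, 22, 30]            # eligible individuals found k_hi
k_star = [24, 20, 30]       # individuals interviewed k*_hi

R_c = n_interviewed / n_selected                 # cluster response rate
R_hh = weighted_response_rate(d, m, m_star)      # household response rate
R_p = weighted_response_rate(d, k, k_star)       # individual response rate

D = [d_i / (R_c * R_hh) for d_i in d]            # household sampling weights D_hi
W = [D_i / R_p for D_i in D]                     # individual sampling weights W_hi
```

With all clusters interviewed, the weighted number of interviewed households under $D_{hi}$ matches the design-weighted number of households found, which is a convenient sanity check on the correction.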
We need a separate household sampling weight for the men's survey in cases where the men's survey is conducted in a sub-sample of households selected for the women's survey, and we suppose that the response behavior of households in the men's survey sub-sample may be different from the overall household response rate.

If no normalization is requested, we can stop here. The above calculated household sampling weight and individual sampling weight can be used to produce any indicators at the household level and the individual level, respectively. As we mentioned earlier in Section 1.13.1, a sampling weight is an inflation or extrapolation factor. The weighted sum of households interviewed

$$T = \sum_h \sum_i D_{hi} m_{hi}^*$$

is an unbiased estimate of the total number of ordinary residential households of the country, where $m_{hi}^*$ is the number of households interviewed in the $i$-th cluster of stratum $h$, and the summation is over all clusters and strata in the total sample. Similarly, the weighted sum of all interviewed women

$$W = \sum_h \sum_i W_{hi} k_{hi}^*$$

is an unbiased estimate of the total number of women in the target population (women age 15-49) of the country, where $k_{hi}^*$ is the number of women interviewed in the $i$-th cluster of stratum $h$, and the summation is over all clusters and strata in the total sample.

1.13.5 Normalization of sampling weights

Normalization of sampling weights is not necessary for survey data analysis. In order to prevent large numbers for the number of weighted cases in the tables in DHS survey final reports, it is the MEASURE DHS tradition to calculate normalized standard weights for both households and individuals. With the normalized standard weight, the number of unweighted cases coincides with the number of weighted cases at the national level for both total households and total individuals. The normalized standard weight of a sampling unit is calculated based on its sampling weight, by multiplying the sampling weight by a single constant at the national level. The constant, or normalization factor, is the total number of completed cases divided by the total number of weighted cases (based on the sampling weight). This number is equal to the estimated total sampling fraction because the total number of weighted cases with the sampling weight is an estimate of the total target population. Therefore the standard weights in the DHS data files are relative weights. Relative weights can be used to estimate means, proportions, rates and ratios because the normalization factor is cancelled out when used in both numerator and denominator, so it has no effect on the calculated indicator values.
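The cancellation argument can be checked numerically. The sketch below assumes three hypothetical clusters with made-up household sampling weights and counts; `normalization_factor` is an illustrative helper name, not part of any DHS tool.

```python
def normalization_factor(sampling_weights, completed):
    """Completed cases divided by weighted cases, over the total sample."""
    weighted_cases = sum(w * c for w, c in zip(sampling_weights, completed))
    return sum(completed) / weighted_cases


# Hypothetical national sample of three clusters.
D = [120.0, 250.0, 80.0]   # household sampling weights per cluster
m_star = [18, 20, 22]      # households interviewed per cluster
y = [9, 5, 11]             # interviewed households with some attribute

F = normalization_factor(D, m_star)
standard = [D_i * F for D_i in D]   # normalized standard weights

# Weighted number of cases equals the unweighted number at the national level.
weighted_cases = sum(s * c for s, c in zip(standard, m_star))

# A proportion is unchanged because F appears in numerator and denominator.
prop_sampling = sum(w * v for w, v in zip(D, y)) / sum(w * c for w, c in zip(D, m_star))
prop_standard = sum(s * v for s, v in zip(standard, y)) / weighted_cases
```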
This point also explains why the normalization must be done at the national level and not the regional level: at the regional level, the normalization factor cannot be cancelled out, and bias will be introduced in the calculated indicator values. Because the normalized standard weights have no scale, they are not valid for estimating totals. Also, the normalized weight is not valid for pooled data, even for data pooled for women and men in the same survey, because the normalization factor is country and sex specific.

1. Normalized household standard weight⁴

The normalization factor for calculating the household standard weight is calculated as

$$FH = \sum_h \sum_i m_{hi}^* \Big/ \sum_h \sum_i D_{hi} m_{hi}^*$$

The household standard weight for cluster $i$ in stratum $h$ is calculated by

$$HV005_{hi} = D_{hi} \times FH = D_{hi} \times \sum_h \sum_i m_{hi}^* \Big/ \sum_h \sum_i D_{hi} m_{hi}^*$$

where HV005 is the household standard weight variable in the DHS Recode data files. It is easy to see that the weighted sum of households interviewed using the standard weight equals the unweighted sum of households interviewed for the total sample. This condition will not be met at the domain level or for sub-populations. At the domain level, the weighted sum of households interviewed may be larger or smaller than the unweighted sum of households interviewed, depending on whether the domain is undersampled or oversampled.

⁴ The MEASURE DHS program has developed Excel templates for facilitating standard weight calculations. If all design parameters and the survey results (number of households found and interviewed, number of eligible women found and interviewed, number of eligible men found and interviewed, number of eligible women and men found and tested, by cluster) are provided in the input page, the standard weights will be calculated automatically in different pages.

2. Normalized women's standard weight

The normalization factor for calculating the women's standard weight is calculated as

$$FW = \sum_h \sum_i k_{hi}^* \Big/ \sum_h \sum_i W_{hi} k_{hi}^*$$

The women's standard weight for cluster $i$ in stratum $h$ is calculated by

$$V005_{hi} = W_{hi} \times FW = W_{hi} \times \sum_h \sum_i k_{hi}^* \Big/ \sum_h \sum_i W_{hi} k_{hi}^*$$

where V005 is the women's standard weight variable in the DHS Recode data files. The standard weights for households selected for the men's survey and for men can be calculated in a similar way.

1.13.6 Standard weights for HIV testing

The sampling weights for HIV testing are calculated separately for women and men, but they are calculated using the same methodology. The only difference is in the calculation of the normalization factors, if a normalized weight is requested. In order to calculate the weighted HIV prevalence for women and men together using a normalized weight, the standard weight for HIV testing must be normalized for women and men together. In most DHS surveys, HIV testing is conducted in the same subsample of households selected for the men's survey, and every woman or man in the household who is eligible for the individual interview is eligible for HIV testing. Once the household sampling weight for the men's survey is calculated using the procedures stated in Section 1.13.5, the sampling weights for HIV testing for women and men may be calculated separately by correcting the household sampling weight for the non-response rates of women and men for HIV testing, respectively.
For simplicity, let $MD_{hi}$ be the household sampling weight in cluster $i$ of stratum $h$ for the men's survey sub-sample. The response rates to HIV testing for women and men are calculated respectively by

$$WR_h = \sum_i MD_{hi} \, WHIV_{hi}^* \Big/ \sum_i MD_{hi} \, WHIV_{hi}$$

$$MR_h = \sum_i MD_{hi} \, MHIV_{hi}^* \Big/ \sum_i MD_{hi} \, MHIV_{hi}$$

where $WHIV_{hi}$ is the number of women eligible for HIV testing, and $WHIV_{hi}^*$ is the number of women tested with a valid test result, in cluster $i$ of stratum $h$; $MHIV_{hi}$ and $MHIV_{hi}^*$ are the number of men eligible and the number of men tested with a valid test result, respectively, in cluster $i$ of stratum $h$. The sampling weights for HIV testing for women and men, respectively, are calculated by

$$HIV_{hi}^W = MD_{hi} / WR_h, \qquad HIV_{hi}^M = MD_{hi} / MR_h$$
In cluster $i$ of stratum $h$, the normalized standard weights for HIV testing for women and men, respectively, are calculated by

$$HIV05_{hi}^W = HIV_{hi}^W \times \frac{\sum_h \sum_i (WHIV_{hi}^* + MHIV_{hi}^*)}{\sum_h \sum_i (HIV_{hi}^W \, WHIV_{hi}^* + HIV_{hi}^M \, MHIV_{hi}^*)}$$

$$HIV05_{hi}^M = HIV_{hi}^M \times \frac{\sum_h \sum_i (WHIV_{hi}^* + MHIV_{hi}^*)}{\sum_h \sum_i (HIV_{hi}^W \, WHIV_{hi}^* + HIV_{hi}^M \, MHIV_{hi}^*)}$$

where the double summations are over all clusters and strata in the total sample.

1.13.7 De-normalization of standard weights for pooled data

For all of the DHS data, the weight variables HV005 (household standard weight), V005 (women's standard weight) and MV005 (men's standard weight) are relative weights which are normalized so that the total number of weighted cases is equal to the total number of unweighted cases, for the three kinds of units. In some situations, such as analyses involving data from more than one survey, data users may need the un-normalized sampling weight for analyzing pooled data. As mentioned in Section 1.13.5, since normalization is country specific and sex specific, it is necessary to de-normalize the standard weights provided in the DHS Recode data files for analyzing pooled data.

The normalization procedure consists of multiplying the sampling weight by a normalization factor for the total sample. The normalization factor is the estimated total sampling fraction: the number of completed cases divided by the number of weighted cases using the sampling weight, for each kind of sampling unit. The weighted number of cases with the sampling weight is an estimate of the total target population. Therefore, in order to de-normalize a normalized weight, simply divide the normalized weight by the total sampling fraction. The estimated total sampling fraction is usually not provided in the DHS data file or in the final report. In order to calculate the total sampling fraction, it is necessary to know the total target population at the time of the survey. The total target population at the time of the survey is easy to get from various sources.
The country's statistical office, the United Nations Population Division's (UNPD) World Population Prospects⁵, and the United Nations Population Fund (UNFPA) are three sources that may be easy to access.

⁵ http://esa.un.org/unpd/wpp/index.htm

As mentioned above, if pooled data analysis is required, the standard weight variables HV005, V005 and MV005 must be rescaled or de-normalized. The de-normalization procedure is the inverse of the normalization procedure: that is, multiply the standard weight by the target population and divide by the number of completed cases, for each survey. The de-normalized weights for households, women and men (HV005*, V005*, and MV005*, respectively) can be calculated using the following formulas:

HV005* = HV005 × (total number of residential households in the country) / (total number of households interviewed in the survey)

V005* = V005 × (total female population 15-49 in the country) / (total number of women 15-49 interviewed in the survey)

MV005* = MV005 × (total male population 15-49 (15-59) in the country) / (total number of men 15-49 (15-59) interviewed in the survey)
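The de-normalization formulas above amount to one multiplication per survey. The sketch below is a hedged illustration: `denormalize` is a hypothetical helper, the population and sample figures are invented, and the standard weights are assumed to be supplied on the relative scale described in Section 1.13.5 (weighted cases equal to unweighted cases).

```python
def denormalize(standard_weight, target_population, completed_cases):
    """Standard weight x target population / completed cases, for one survey."""
    return standard_weight * target_population / completed_cases


# Hypothetical survey: 4 interviewed women, 2,000 women age 15-49 in the country.
v005 = [0.5, 1.5, 1.0, 1.0]   # relative weights; they sum to the 4 completed cases
target_population = 2_000
completed_cases = len(v005)

v005_star = [denormalize(w, target_population, completed_cases) for w in v005]

# The de-normalized weights now add up to the target population.
estimated_population = sum(v005_star)

# A respondent with relative weight 1 in a survey of 10,000 completed cases
# drawn from 5,000,000 women represents 500 women.
represented = denormalize(1.0, 5_000_000, 10_000)
```

Applying the function separately to each survey before pooling keeps every survey on its own population scale, which is the point of de-normalizing.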
If normalized weights are preferred, the above re-scaled weights can be re-normalized by multiplying by the total number of completed women's and men's interviews combined, dividing by the total number of weighted cases combined, and applying the above re-scaled weights to the pooled data.

Note that the normalization of sampling weights is done for the total sample for households, women and men separately. If the aim is to tabulate indicators for a certain sub-population from pooled data, for example, vaccination coverage for children 12-23 months, the de-normalization has nothing to do with the total population of children 12-23 months because there is no standard weight calculated for children 12-23 months in DHS surveys. If the indicator is tabulated at the household level using the household weight, the household standard weights must be de-normalized for all of the surveys included in the analysis as explained above; likewise, if the indicator is tabulated at the individual level using the women's (or child's mother's) weight, the women's standard weights must be de-normalized for each of the surveys.

1.14 Calibration of sampling weights in case of bias

Generalized calibration (Deville and Särndal, 1992; Deville et al, 1993) has now become a popular and powerful framework in survey data analysis for statistical offices in many countries. It allows for the utilization of different sources of auxiliary information to improve estimates from sample surveys. Calibration can reduce sampling errors, can correct bias caused by non-response and other non-sampling errors, and can reduce the influence of extreme values. Calibration is a weight tuning procedure such that the tuned sampling weight can produce estimates without error for known population characteristics. The precision of an estimator using a calibrated weight is equivalent to that of a regression estimator, but it is much easier to calculate with the help of calibration software such as CALMAR, a SAS macro procedure developed by the French Institute of Statistics and Economic Studies (INSEE), and the SPSS procedure developed by Statistics Belgium.
DHS surveys employ calibration of sampling weights only in cases where serious bias is observed in the collected data, and there is reliable auxiliary information available for the calibration. Let $X$ be a multivariate auxiliary variable with $p$ components such that the population totals of each of the component variables are known beforehand from the recent population census, that is,

$$t_x = \sum_U X_i = (t_{x_1}, t_{x_2}, \ldots, t_{x_p})^\tau$$

is known. Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^\tau$ be the observations of the auxiliary variables from the survey for the respondent sampling unit $i$. Let $D_i$ be the sampling weight for unit $i$. The calibration procedure consists of modifying the sampling weight slightly from $D_i$ to $W_i$ such that a given distance measure between the sampling weights $D_i$ and the calibrated weights $W_i$,

$$\sum_s g(W_i, D_i),$$

is minimized under the constraints

$$\sum_s W_i x_i = t_x$$

where $g$ is a distance function which measures the distance between $D_i$ and $W_i$. The constraints imposed are that the known auxiliary variable totals are estimated without error with the calibrated weights. If the variable of interest is well correlated with the auxiliary variables, then we expect that the precision can be greatly improved for estimating the variable of interest. The calibration theory states that the calibrated weights have the following formula

$$W_i = D_i \, F(q_i x_i^\tau \lambda(s))$$
where $F(\cdot)$ is called the calibration function, which is the inverse of the derivative of the distance function $g$; $q_i$ is a calibration weight which is usually set to 1 in the absence of prior knowledge; $\lambda(s)$ is a constant vector depending on the particular sample $s$, which is to be solved. When $F(q_i x_i^\tau \lambda(s)) = 1 + q_i x_i^\tau \lambda(s)$, which corresponds to one of the five proposed calibration functions in Deville et al, 1993, it is easy to solve; $\lambda(s)$ is given by

$$\lambda(s) = T_s^{-1} (t_x - \hat{t}_{\pi x}) \quad \text{with} \quad T_s = \sum_s D_i q_i x_i x_i^\tau$$

For a given variable of interest $y$, the calibrated estimator of the population total is equivalent to the generalized regression estimator

$$\hat{t}_y = \sum_s W_i y_i = \hat{t}_{\pi y} + \hat{B}_s^\tau (t_x - \hat{t}_{\pi x})$$

where

$$\hat{B}_s = T_s^{-1} \sum_s q_i D_i x_i y_i$$

is the sample estimate of the regression coefficient; $\hat{t}_{\pi y}$ and $\hat{t}_{\pi x}$ are the simple estimators using the sampling weight:

$$\hat{t}_{\pi y} = \sum_s D_i y_i, \qquad \hat{t}_{\pi x} = \sum_s D_i x_i$$

A mean estimate of the variable of interest $y$ can be calculated by

$$\hat{\bar{Y}} = \sum_s W_i y_i \Big/ \sum_s W_i$$

The calibration estimator can be equivalently formulated with known proportions of one or more auxiliary variables. The calibration can be conducted at the individual level, which will result in an individual specific weight, or it can be conducted at the cluster level with aggregated data, which will result in a cluster weight. For more details see the related references given at the end of this document.

1.15 Data quality and sampling error reporting

Data quality is always a major concern for all MEASURE DHS projects. Though numerous efforts are made in implementing DHS surveys to maximize the quality of the data collected, non-sampling errors are always the main concern for data quality. Data quality of a survey directly affects the reliability of the statistics produced. Many countries have laws that require reports of survey findings to include an evaluation of data quality and reliability. Data quality can be measured by total survey error, including bias introduced by various sampling and non-sampling errors.
DHS survey final reports usually include tables in an appendix for data quality evaluation purposes, including: age distributions of household population by sex; age distributions of eligible and interviewed women and men; completeness of reporting on date of birth, age at death, age/date at first union, education and anthropometric measures, etc. The MEASURE DHS program also conducts some in-depth studies on data quality for specific topics, which are provided in published reports.

Apart from the data quality tables, DHS survey final reports provide sampling errors for selected indicators in Appendix B. Sampling errors are important reliability measures which tell the user the degree of error associated with a particular estimated indicator value, the number of cases involved in the calculation of the indicator, the efficiency or clustering effects of the sample design compared to simple random sampling, and the range for the true value of an indicator at a certain confidence level. The reader is referred to Chapter 4, Section 4.2 for more details on sampling errors and their calculation.

DHS survey final reports also provide an appendix on the sample design of the survey. The sample design document reports the survey methodology used for the survey, including the aim of the survey, the target population, the sample size, the reporting domains, the stratification and sample allocation, sample selection procedure, sampling weight calculation, correction for non-response, calibration of sampling weights, and the results of survey implementation. See Chapter 5, Section 5.2 for more details on sample design.

1.16 Sample documentation

The task of a sampling statistician does not end with the selection of the sample. The preservation of sampling documentation is an essential requisite for sampling weight calculation, for sampling error computation, for data quality evaluation, for linkage with other data sources, and for various kinds of checks and supplementary studies. Special efforts are needed at the time of the sample design, at the end of the fieldwork, and at the completion of the data file if the task of sample documentation is to be carried out effectively. If preservation of documentation is delayed, considerable effort will be required to reconstitute the missing information when it is needed.

The sample documentation must comply with the survey confidentiality requirements. When HIV testing is conducted in a DHS or AIS (AIDS Indicator Survey), the confidentiality guidelines require the complete destruction of all intermediate documents which can potentially be used to identify any single household or individual who participated in the testing. This requirement reinforces the importance of timely sample documentation. See Chapter 5 for detailed requirements in sample documentation.

1.17 Confidentiality

The final data files for DHS surveys are made available to interested researchers.
Therefore, the confdentalty of prvate nformaton collected from ndvdual respondents s a major concern, especally when senstve nformaton such as sexual actvty and HIV status are collected. Protectng the confdentalty of the ndvdual respondent s not only an ethcal oblgaton, but t also promotes more accurate data because respondents are more lkely to provde truthful responses f they feel confdent ther nformaton wll be kept prvate. DHS surveys follow strct rules mposed at varous steps durng the survey mplementaton to prevent the drect or ndrect dsclosure of the dentty of ndvdual respondents. The prncpal peces of nformaton that can ndrectly dentfy an ndvdual respondent are cluster number, household number, the cluster selecton probablty and the samplng weghts. The cluster number s an mportant dentfer for samplng error calculatons; the household number s mportant for household level and ndvdual level data management and tabulaton; the cluster selecton probablty s useful for cluster level modelng; and samplng weghts are necessary for all analyss. So these varables must be present n the fnal data fle. The household number n the fnal DHS data fle s not nformatve, and samplng weghts are not nformatve after correcton of non-response and normalzaton. The cluster selecton probablty s potentally nformatve only f lower level dentfcaton nformaton such as dstrct and localty are present, and DHS survey fnal data fles do not provde geographc nformaton below the level of regon or survey doman, especally when HIV testng s conducted. Thus the only concern s the dsclosure of the cluster. For DHS or AIS surveys wth HIV testng, the fnal data fles provde scrambled cluster and household numbers for further nsurance aganst dsclosure. 31
2 HOUSEHOLD LISTING OPERATION

2.1 Introduction

DHS surveys are nationwide sample surveys designed to provide information on the levels of fertility, infant and child mortality, use of family planning, knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI), and on other family welfare and health indicators. The surveys generally interview women age 15-49 and men age 15-59 (15-49 or 15-54 in some surveys). The women and men to be interviewed live in ordinary residential households which are randomly selected from a set of sample points consisting of clusters of households. Prior to interviewing, all households located in the selected clusters will be listed. The listing of households for each cluster will be used in selecting the final sample of households to be included in the DHS survey.

The listing operation consists of visiting each cluster, recording on listing forms a description of every structure together with the names of the heads of the households found in the structure, and drawing a location map of the cluster as well as a detailed sketch map of all structures located in the cluster. These materials will guide the interviewers to find the pre-selected households for interviewing and will allow field work supervisors to perform quality control during data collection.

The following sections present the general guidelines for conducting a household listing operation. Modifications may be needed to adapt to country specific situations.

2.2 Definition of terms

Following are brief definitions of the terms used in this document.

A census Enumeration Area (EA) is a geographical statistical unit created for a census and containing a certain number of households. An EA is usually a city block in urban areas and a village, a part of a village or a group of small villages in rural areas, with its location and boundaries well defined and recorded on census maps.

A cluster is the smallest geographical survey statistical unit for DHS surveys. It consists of a number of adjacent households in a geographical area. For DHS surveys, a cluster corresponds either to an EA or to a segment of a large EA.

A base map is a reference map that describes the geographical location and boundaries of an EA.

A structure is a free-standing building or other construction that can have one or more dwelling units for residential or commercial use. Residential structures can have one or more dwelling units (for example: single house, apartment structure).

A dwelling unit is a room or a group of rooms normally intended as a residence for one household (for example: a single house, an apartment, a group of rooms in a house); a dwelling unit can also have more than one household.

A household consists of a person or a group of related or unrelated persons, who live together in the same dwelling unit, who acknowledge one adult male or female 15 years old or older as the head of the household, who share the same housekeeping arrangements, and who are considered as one unit. In some cases one may find a group of people living together in the same house, but each person has separate eating arrangements; they should be counted as separate one-person households. Collective living arrangements such as army camps, boarding schools, or prisons will not be considered as households. Examples of households are:

- a man with his wife or his wives, with or without children
- a man with his wife or his wives, his children and his parents
- a man with his wife or his wives and his married children, living together for some social or economic reasons (the group recognizes one person as household head)
- a widowed or divorced man or woman, with or without children

The head of household is the person who is acknowledged as such by members of the household and who is usually responsible for the upkeep and maintenance of the household.

A location map is a map produced in the household listing operation which indicates the main access to a cluster, including main roads and main landmarks in the cluster. Sometimes it may be useful to include some important landmarks in the neighboring cluster.

A sketch map is a map produced in the household listing operation, with the location or marks of all structures found in the listing operation, which helps the interviewer to relocate the selected households. A sketch map also contains the cluster identification information, location information, access information, principal physical features and landmarks such as mountains, rivers, roads and electric poles.

2.3 Responsibilities of the listing staff

Persons recruited to participate in the household listing operation will work in teams consisting of two enumerators. A coordinator will monitor the entire operation.

The responsibilities of the coordinator are to:
1) obtain base maps for all the clusters included in the survey;
2) arrange for the reproduction of all listing materials (listing manuals, mapping and listing forms); the map information forms and the household listing forms must be prepared in sufficient numbers to cover all of the clusters to be visited;
3) assign teams to clusters;
4) monitor the reception of the completed listing forms at the central office; and
5) verify that the quality of work is acceptable.
If GPS coordinates are being collected during the listing operation, the coordinator must also:
6) obtain one GPS receiver per listing team, plus two backup receivers, and tag each GPS receiver with a number;
7) ensure that all GPS receivers have the correct settings (see Section 2.6 below) and distribute a receiver to each field team;
8) obtain and copy all GPS training materials for listing staff; and
9) train all listing staff to record GPS waypoints in the GPS units as well as on Form DHS/1.

The responsibilities of the enumerators are to:
1) identify the boundaries of the cluster;
2) draw a location map showing the location of the cluster;
3) draw a detailed sketch map of the cluster showing the locations of all structures located in the cluster;
4) list all the households in the cluster in a systematic manner;
5) communicate to the coordinator problems encountered in the field and follow his or her instructions; and
6) transfer the completed listing forms to the coordinator or to the central office.

If GPS coordinates are being collected during the listing operation, enumerators must also:
7) capture and record the GPS waypoint of the center of the cluster; and
8) complete the portion of Form DHS/1 designated for GPS information for each cluster.

The two enumerators in each team should work together at the same time in the same area. They will first identify the cluster boundaries together. Then one enumerator prepares the location map and the sketch map while the other does the household listing.

The materials needed for the household listing operation are:
- Manual for Household Listing
- Base map of the area containing the cluster
- Map Information Form (Form DHS/1)
- Household Listing Form (Form DHS/2)
- Segmentation Form (Form DHS/3)

If GPS coordinates are to be recorded during the listing operation, the following additional materials are needed:
- GPS receivers, batteries and cables
- GPS training manuals and handouts

2.4 Locating the cluster

The coordinator will provide the listing team with a base map containing the cluster assigned to the team. The listing team will typically make two tours of the cluster: the first to identify the cluster boundaries and to create the location map, and the second to create the listing and draw the sketch map. Upon arrival in a cluster, the team should first contact the local authorities for help in identifying the boundaries and to get general information on the cluster, for example, the rough number of residential households in the cluster.
In most cases, the cluster boundaries follow easily recognizable natural features such as streams or rivers, and construction features such as roads or railroads. In some cases, the boundaries may not be marked with visible features (especially in rural areas); attention should then be paid to locating the cluster boundaries as precisely as possible according to the detailed description of the cluster and its base map.

Before doing the listing, the team should tour the cluster to determine an efficient route of travel for listing all of the structures. The cluster should be divided into parts if possible. A part can be a block of structures. The listing team will make a location map of the cluster indicating the boundaries of the parts, as well as the relative location of landmarks, public structures (e.g., schools, religious structures, public offices and markets) and main roads. This location map will serve as a guide for the interviewing team when they begin data collection.

2.5 Preparing location and sketch maps

The coordinator will designate one enumerator of the team as the mapper. The second enumerator will be the lister. Although the two have separate tasks to perform, they must move together and work in close cooperation; the mapper prepares the maps, and the lister collects information on the structures (and corresponding households) indicated on the sketch map.

The mapping of the cluster and the listing of the households should be done in a systematic manner so that there are no omissions or duplications. If the cluster consists of a number of blocks, then the team should finish each block before going to the next adjacent block. Within each block, start at one corner of the block and move clockwise around it. In rural areas where structures are frequently found in small groups, the team should work in one group of structures at a time, and in each group they can start at the centre (choosing any landmark, such as a school, to be the centre) and move around it clockwise.

In the first tour of the cluster, the mapper will prepare a location map of the cluster on the Map Information Form (Form DHS/1). First, fill in the identification box for the cluster on the first page. All information needed for filling in the identification box is provided by the coordinator. In the space provided on the second page, draw a map showing the location of the cluster and include instructions on how to get to the cluster. Include all information useful for finding the cluster and its boundaries directly on the map and, if necessary, in the space reserved for observations.
In the second tour of the cluster, using the third page of the Map Information Form, the mapper will draw a sketch map of all structures found in the cluster, including vacant structures and structures under construction. It is important that the mapper and lister work together and coordinate their activities, since the structure numbers that the mapper indicates on the sketch map must correspond to the serial numbers assigned by the lister on the listing form for the same structures.

On the sketch map, mark the starting point with a large X. Place a small square at the spot where each structure in the cluster is located. For any non-residential structure, identify its use (for example, a store or factory). Number all structures in sequential order beginning with "1". Whenever there is a break in the numbering of structures (for example, when moving from one block to another), use an arrow to indicate how the numbers proceed from one set of structures to another. Although it may be difficult to pinpoint the exact location of the structure on the map, even an approximate location is useful for finding the structure in the future. Add to the sketch map all landmarks (such as a park), public structures (such as a school or church), and streets or roads. Sometimes it is useful to add to the sketch map landmarks that are found outside the cluster boundaries, if they are helpful in identifying other structures inside the cluster.

Use the marker or chalk provided to write on the entrance to the structure the number that has been assigned to the structure. Remember that this is the serial number of the structure as assigned on the household listing form, which is the same as the number indicated on the sketch map. In order to distinguish the number from other numbers that may already exist on the door of the structure, write DHS in front of the number; for example, for structure number 5, write DHS/5, and similarly on the door of structure number 44 write DHS/44.

A structure is called a multi-unit structure if it contains more than one household. Otherwise it is called a single-unit structure. All households found in a structure or multi-unit structure must be numbered from 1 to m within the structure (see footnote 6). The structure number plus the household number form a unique identification number for a household among all of the households in the cluster. For example, household number 3 in structure number 44 would be uniquely identified with ID number DHS/44-3. It is very useful to write the household ID number at the entrance of the household to later assist the interviewer in identifying the household for interview.

2.6 Collecting a GPS waypoint for each cluster

A GPS waypoint is a latitude and longitude reading that represents a location. For some surveys, GPS data for EAs are available from the census. However, if the data are not available, or are of questionable quality, one GPS waypoint for each cluster should be recorded during the listing phase of the survey. These waypoints are recorded using a GPS unit (a Garmin ETREX unit is used in this guide) and data collection forms. If GPS units other than the Garmin ETREX are used, this guide will still be useful; however, some of the instructions may not apply due to differences in design and menus. The Garmin ETREX owner's manual may be useful to consult on the basics of the GPS unit.

Take one reading for each cluster.

The GPS waypoints will be captured by the mapper while he or she is mapping the clusters. One GPS waypoint must be taken for each cluster, and in the case of large clusters which are being segmented, one point should be taken for each segment selected for listing. In DHS surveys, clusters are usually census EAs, sometimes villages in rural areas or city blocks in urban areas. Collecting only one waypoint for the cluster greatly reduces the chance of compromising the confidentiality of the respondents and at the same time is sufficient to allow for the integration of multiple datasets for further analysis. The DHS cluster waypoint should always be taken at the geographic center of the cluster or segment. If the cluster is segmented, the point should be taken for the segment chosen by the Mapping and Listing Coordinator to be included in the survey.
Save the waypoint and record the latitude, longitude, and altitude.

The latitude, longitude, and altitude readings for a location are stored in two places: in the GPS unit's memory and on the DHS/1 paper form. GPS units can be broken or lost, and experience has shown that a hardcopy backup is essential. In addition, the paper form provides a backup should the data in the GPS unit be changed, deleted, or misidentified (i.e., the operator names the cluster incorrectly in the unit).

Each position saved in the GPS unit is called a waypoint, and each waypoint has a unique name. If possible, the waypoint ID should be the same as the DHS cluster number. If this is not possible, the waypoint ID should be unique to the cluster and recorded on Form DHS/1 (do not record the same waypoint ID for two different clusters). When a waypoint is saved, the GPS unit assigns it a default name. The mapper must edit the default name and change it to the 6-digit DHS cluster ID number. For example, the waypoint for DHS cluster 101 would be named 000101. Cluster 1101 would be named 001101.

After saving the waypoint, the mapper will use the identification box of the Map Information Form (Form DHS/1) to record the latitude, longitude, and altitude for the cluster and segment on paper. First, the mapper will write down the latitude and longitude coordinates in decimal degree format and the altitude in meters in the Identification Box on the Location Map Cluster Form (DHS/1). Second, the mapper will draw a circle, in the middle of the cluster/segment, at the location where he/she captured the waypoint.

After the listing is complete, the GPS units must be collected as soon as possible and returned to the sampling office by the Mapping and Listing Coordinator. The waypoints will then be downloaded and examined for problems by the designated sampling staff. The Sampling Coordinator should designate one member of the Data Processing Team to receive and process the GPS waypoint file and then give the file to the survey manager.
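As an illustration of the naming rule above, the following minimal Python sketch builds the 6-digit waypoint name from a cluster number and checks that a reading is plausible in decimal degrees. The function names `waypoint_name` and `reading_is_plausible` are hypothetical and are not part of any DHS tool; the range check is an added assumption, not a DHS requirement.

```python
def waypoint_name(cluster_number: int) -> str:
    """Return the 6-digit, zero-padded waypoint name for a DHS cluster number."""
    if not 0 < cluster_number <= 999_999:
        raise ValueError("cluster number must be positive and fit in 6 digits")
    return f"{cluster_number:06d}"


def reading_is_plausible(latitude: float, longitude: float) -> bool:
    """Check that a reading lies in the valid range for decimal degrees.

    Assumed sanity check for data entry; it cannot detect a wrong location.
    """
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


if __name__ == "__main__":
    print(waypoint_name(101))   # name for DHS cluster 101
    print(waypoint_name(1101))  # name for DHS cluster 1101
```

A check of this kind could be run when the paper forms are keyed in, so that a reading copied in another format (for example degrees and minutes) is caught before the waypoint file is processed.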
6 This number is different from the household number later given to all of the households listed in the whole cluster just prior to household selection.
In most situations, the Mapping and Listing Coordinator will be responsible for providing the listing teams with a GPS unit prior to the listing. Before these units are distributed they should be set up for use by the listers. For DHS surveys, the only acceptable format is Decimal Degrees, regardless of what geographic standards may be in use for other purposes. To set the format, enter the SETUP menu and, in the UNITS sub-menu, select the item POSITION FRMT and press the ENTER button. Select hddd.ddddd Decimal Degrees, which is the first item. Once hddd.ddddd is highlighted, press the ENTER button. It is important that all the GPS units be set up in the same way so that the waypoints returned at the end of the survey are all in the same format. For more details on how to properly prepare the GPS units for waypoint collection, please refer to the DHS Manual for GPS Data Collection.

2.7 Listing of households

The lister will use the Household Listing Form (Form DHS/2) to record all households found in the cluster. Begin by entering the identification information for the cluster. The first two columns are reserved for office use only; leave them blank. Complete the rest of the form as follows:

Column (1) [Serial Number of Structure]: For each structure, record the same structure serial number that the mapper enters on the sketch map. All the structures recorded on the sketch map (except the landmarks) must be recorded on the listing form and numbered.

Column (2) [Address/Description of Structure]: Record the street address of the structure. Where structures do not have visible street addresses (especially in rural areas), give a description of the structure and any details that help in locating it (for example, in front of the school, next to the store, etc.).

Column (3) [Residence Y/N]: Indicate whether the structure is used for residential purposes (eating and sleeping) by writing Y for Yes. In cases where a structure is used for commercial or other purposes, write N for No. Structures used both for residential and commercial purposes (for example, a combination of store and home) should be classified as residential (i.e., mark Y in Column 3). Make sure to list any household unit found in a non-residential structure (for example, a guard living inside a factory or in a church). Also do not forget to list vacant structures and structures under construction, and in Column (6) give some explanation (for example: vacant, under construction, etc.). All structures seen in the cluster should be recorded on the sketch map of the cluster and in the listing.

Column (4) [Serial Number of Household in Structure]: This is the serial number assigned to each household found in the structure; there can be more than one household in a structure. The first household in the structure will always have number 1. If there is a second household in the structure, then this household should be recorded on the next line, a 2 is recorded in Column (4), and Columns (1) to (3) repeat the structure number and address or are left blank.

Column (5) [Name of Head of Household]: Write the name of the head of the household. There can only be one head per household. If no one is home or the household refuses to cooperate, ask neighbors for the name of the head of the household. If a name cannot be determined, leave this column blank. Note that it is not the name of the landlord or owner of the structure that is needed, but the name of the head of the household that lives there.

Column (6) [Observations/Occupied or not]: This space is provided for any special remarks that might help the coordinator decide whether to include a household in the household selection or not, and might also help the interviewing team locate the structure or identify the household during the main survey fieldwork.

If the structure is an apartment block or block of flats, assign one serial number to the entire structure (only one square with one number appears on the sketch map), but complete Columns (2) through (6) for each apartment in the structure individually. Each apartment should have its own address, which is the apartment number within the structure.

The listing team should be careful to locate hidden structures. In some areas, structures may have been built so haphazardly that they are easily missed. In rural areas, structures may be hidden by tall grasses and trees. If there is a pathway leading from the listed structure, check to see if the pathway goes to another structure. Talking with people living in the area may help in identifying the hidden structures.

2.8 Segmentation of large clusters

A certain number of the selected EAs may be very large in population size. A complete listing of EAs that are very large may not be feasible for the survey. These EAs should be subdivided into several smaller segments, only one of which will be included in the survey and listed. In this case, the DHS cluster corresponds to a segment of an EA.

When the team arrives in a large EA that may need segmentation, it should first tour the EA and make a quick count to get the estimated number of households residing in the EA. There is no standard threshold for the size of an EA that needs to be segmented, or for segment size. But for efficiency and accuracy considerations, DHS recommends that if the EA size is bigger than 300 households, then the team should communicate to the coordinator the cluster number, the estimated number of households and the suggested number of segments to be created. The final decision to segment an EA, and the number of segments to be created, can only be taken by the coordinator.
Ideally, for ease of operation, an EA would only need to be divided into 2 segments, with an ideal segment size of 150-200 households in each segment. In order to minimize errors, dividing an EA into a large number of segments (more than 3) should be avoided unless it is really necessary. In dividing an EA into segments, the ideal would be to have segments of approximately equal size, but it is also important to adopt segment boundaries that are easily identifiable.

In the first tour of the cluster, draw a location map of the entire cluster. Using identifiable boundaries such as roads, streams, and electric power lines, divide the EA into the designated number of roughly equal-sized segments. On the location map of the EA, show clearly the boundaries of the segments created. Number the segments sequentially. Estimate the relative size of each segment in the following manner: quickly count the number of dwellings in each segment, add up the total number of dwellings in the EA and calculate the proportion of the dwellings in the whole EA that are located in each segment.

Example 2.1: A cluster of 620 dwellings has been divided into 3 segments and the results are as follows:

Segment 1: 220 dwellings, or 220/620 = 35 percent
Segment 2: 190 dwellings, or 190/620 = 31 percent
Segment 3: 210 dwellings, or 210/620 = 34 percent
Total: 620 dwellings, or 620/620 = 100 percent

On Form DHS/3 (Segmentation Form) write the size of the segments in the appropriate columns (number and percent) and calculate the cumulative size of all of the segments in terms of a percentage. The cumulative size of the last segment on the list must be equal to 100.
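The segment-size calculation of Example 2.1, together with the rule used in this section to pick one segment by comparing a random number with the cumulative percentages, can be sketched in Python as follows. This is a minimal illustration, not the DHS spreadsheet: the function names are hypothetical, and computing the cumulative percent from cumulative dwelling counts (so that the last row is always exactly 100 despite rounding) is an assumption of this sketch.

```python
def segment_table(dwellings):
    """Build rows of (segment number, dwellings, percent, cumulative percent).

    `dwellings` is the quick count of dwellings per segment, in segment order.
    """
    total = sum(dwellings)
    rows = []
    running = 0
    for number, count in enumerate(dwellings, start=1):
        running += count
        percent = round(100 * count / total)
        # Assumption: round the cumulative share rather than summing rounded
        # percents, so the last cumulative value is exactly 100.
        cumulative = round(100 * running / total)
        rows.append((number, count, percent, cumulative))
    return rows


def select_segment(rows, random_number):
    """Return the first segment whose cumulative percent >= random_number."""
    for number, _count, _percent, cumulative in rows:
        if cumulative >= random_number:
            return number
    raise ValueError("random number must not exceed 100")


if __name__ == "__main__":
    table = segment_table([220, 190, 210])  # dwelling counts from Example 2.1
    for row in table:
        print(row)
    print("Segment selected:", select_segment(table, 67))
```

Because a segment is chosen with probability proportional to its estimated share of the EA, the share of the selected segment has to be kept with the listing results for the later weight calculation.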
Segment number   Number of dwellings   Percent   Cumulative percent
1                220                   35        35
2                190                   31        66
3                210                   34        100

For each large EA to be segmented, a random number between 0 and 100 will be selected in the central office and included in the file. Compare this random number with the cumulative size. Select the first segment for which the cumulative size is greater than or equal to the random number.

Random number: 67
Segment selected: Segment number 3

Proceed with the household listing operation in segment number 3 as described in the above sections (see Appendix 2.3 for an example of how to complete the segmentation form). Draw a detailed sketch map of the selected segment and list all the households found in the selected segment.

2.9 Quality control

To ensure that the work done by each listing team is acceptable, quality checks should be performed. The coordinator should tour the regions during the household listing operation and assess the quality of the finished clusters. The coordinator should select a finished cluster and do an independent listing of 10 percent of the cluster. If important errors are found, the whole cluster should be relisted. If the problem is related to systematic errors, and it is not possible to make corrections on the listing forms, then all of the listed clusters should be relisted.

2.10 Prepare the household listing forms for household selection

Once the central office receives the completed listing materials for a cluster, they must first assign a serial number to all of the households in the cluster in the second column of Form DHS/2. Only occupied residential households (including households that refused to cooperate at the time of listing and households where the occupants were absent at the time of listing but would return shortly and would be at home during the period of household interview) will be numbered. This is a continuous serial number from 1 to the total number of occupied residential households listed in the cluster. Leave the cell in the second column blank if the household is not occupied, or if the structure is not a residential structure. Fill in the second column only if the structure on that row is an occupied household. Make sure that the numbering of all occupied households follows sequentially from the previous occupied household on the list, with no gaps or repetitions in the numbering. See the example of a completed listing form in Appendix 2.3.

After assigning the serial numbers to all households listed in the cluster, copy the total number of households listed to the column "Number of households listed" in the Excel file prepared for household selection. Make sure this number is recorded in the correct row for the cluster number. In the column "Segmentation information", record the percentage of the entire EA population that is included in the selected segment. The segmentation information is important for correctly calculating the sampling weights.

After the total number of households listed in the cluster has been entered in the Excel file, the spreadsheet automatically generates the household numbers of those households selected to be interviewed. Copy the numbers of the selected households to the first column of Form DHS/2, corresponding to the serial number of the households in the listing form. These are the households that must be interviewed. It is recommended to use a different colored pen on the listing forms to indicate the households selected for interviewing. It is also very helpful to use color on the cluster's sketch map to mark the structures where the selected households are located.

In many surveys, a sub-sample of households will be selected for the men's survey. The household selection spreadsheet uses shaded columns to indicate which households are selected for the men's survey. Put a mark in the first column on Form DHS/2 next to the number of the selected household to indicate the households selected for the men's survey, or use a different colored pen for the households selected for both the men's and women's surveys.

Make a copy of the whole package of files (sketch maps and the listing forms with household selection). Give the original to the interviewing team for the household interview and keep the other copy in the central office.
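The serial numbering described in Section 2.10 can be sketched in Python as follows. This is only an illustration of the rule: the record layout (dictionaries with the keys "residential" and "occupied") and the function name are hypothetical and do not correspond to any DHS file format. Households that refused or were temporarily absent at listing are assumed to be recorded as occupied.

```python
def assign_household_serials(listing_rows):
    """Return one entry per listing row: a serial number or None (blank cell).

    Only occupied residential households receive a number; numbering is
    continuous from 1, with no gaps or repetitions.
    """
    serials = []
    next_serial = 1
    for row in listing_rows:
        if row["residential"] and row["occupied"]:
            serials.append(next_serial)
            next_serial += 1
        else:
            serials.append(None)  # cell in the second column is left blank
    return serials


if __name__ == "__main__":
    rows = [
        {"residential": True, "occupied": True},    # occupied household
        {"residential": False, "occupied": False},  # store
        {"residential": True, "occupied": False},   # vacant dwelling
        {"residential": True, "occupied": True},    # occupied household
    ]
    serials = assign_household_serials(rows)
    print(serials)
    # The largest serial is the "Number of households listed" for the cluster.
    print(max(s for s in serials if s is not None))
```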
Appendix 2.1 Example listing forms

Form DHS/1, Map Information Form, page 1 of 3

Identification (label and code):
  Locality
  DHS Cluster Number
  Urban/Rural (Urban=1/Rural=2)
  EA Number
  District
  Region
  Name of Mapper
  Name of Lister
  GPS Unit Tracking Number
  Waypoint name (entered in GPS unit)
  Latitude (North/South): N / S
  Longitude (East/West): E / W
  Altitude / Elevation (Meters)

Observations:
  Road access
  Other useful information

Form DHS/1, Map Information Form, page 2 of 3

Locality, District, DHS Cluster:
Location map

Form DHS/1, Map Information Form, page 3 of 3

Locality, District, DHS Cluster:
Sketch map of cluster
Form DHS/3, Segmentation Form

Identification (label and code):
  Locality
  DHS Cluster Number
  Urban/Rural (Urban=1/Rural=2)
  EA Number
  District
  Region
  Name of Mapper
  Name of Lister

Number of segments:

Segment number   Number of households   Percent   Cumulative percent
1
2
3
4
5

Random number:
Segment selected:
Appendix 2.2 Symbols for mapping and listing

[Symbol chart: the drawn symbols are not reproduced here; only their labels are listed.]

- Orientation to the North
- Boundaries of the cluster
- Paved road
- Unpaved (dirt) road
- Footpath
- River, creek, etc.
- Bridge
- Lake, pond, etc.
- Mountains, hills
- Water point (wells, fountain, etc.)
- Market
- School
- Administrative structure
- Church, temple
- Mosque
- Cemetery
- Residential structure
- Non-residential structure
- Vacant structure
- Hospital, clinic, etc.
- Electric pole
- Tree or bush
Appendix 2.3 Examples of completed mapping and listing forms
3 SELECTED SAMPLING TECHNIQUES

In this section, some of the most commonly used sampling techniques and their application are presented. The presentation focuses mainly on practical rather than theoretical aspects. However, the chapter does touch on some basic theoretical properties of the techniques used in DHS surveys. We focus on sampling without replacement rather than sampling with replacement, since the latter is less efficient for samples of a fixed size: the same sampling unit may be selected several times, and each repeat reduces the amount of information carried by the sample. Readers who are interested in the theoretical aspects of the selected sampling techniques should refer to the textbooks on survey sampling theory listed in the references.

3.1 Simple random sampling

We begin with simple random sampling without replacement (SRSWOR), since this is a fundamental sampling procedure that is used as the standard against which the efficiency of other sampling procedures is compared. Simple random sampling without replacement is a selection procedure where every unit has an equal chance of being selected. Selection can be performed through successive draws without replacement from a well-mixed container containing all sampling units, or by using certain computerized algorithms to select from a list of all sampling units.

Let $N$ be the total number of sampling units and let $n$ be the total sample size, $n < N$. The probability of selection for every $i$-th unit is given by:

$$P_i = \frac{n}{N}$$

The design weight (assuming no non-response) is given by:

$$D_i = 1 / P_i = \frac{N}{n}$$

The probability for any particular $n$ different units to be selected together in a sample $s$ is given by:

$$P_s = 1 \Big/ \binom{N}{n}$$

where $\binom{N}{n}$ is the total number of combinations of $n$ elements out of $N$.

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sample mean, which is the same as the unweighted sample mean,

$$\bar{y} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

is an unbiased estimator of the population mean $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} y_i$, with its sampling variance given by

$$V_{srs}(\bar{y}) = \frac{1-f}{n} S_y^2$$

where $S_y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2$ is the finite population variance of the variable $y$ and $f = n/N$ is the sampling fraction. An unbiased estimation of this variance can be made using

$$\upsilon_{srs}(\bar{y}) = \frac{1-f}{n} s_y^2$$

where $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the sample variance. When $n$ and $N$ are large, the standardized variable $(\bar{y} - \bar{Y}) / SE(\bar{y})$ approximately follows a Student t distribution with $n-1$ degrees of freedom, where $SE(\bar{y})$ is the square root of $\upsilon_{srs}(\bar{y})$. Therefore the confidence limits of the population mean $\bar{Y}$ can be constructed from the sample observations, allowing for 95% confidence that the true value of $\bar{Y}$ lies within the range from $\bar{y} - 1.96 \cdot SE(\bar{y})$ to $\bar{y} + 1.96 \cdot SE(\bar{y})$. DHS reports use $\bar{y} \pm 2 \cdot SE(\bar{y})$ as a conservative estimate of the 95% confidence limits.

Given a complete list of all sampling units in a computerized file, the easiest way to draw a simple random sample of size $n$ is to first generate a uniformly distributed random number between 0 and 1 for each of the sampling units. Next, sort the file on the generated random numbers in ascending order; the first $n$ units, associated with the $n$ smallest random numbers, are the selected units. This procedure provides an SRSWOR sample of size $n$. It is easy to implement, but requires sorting of the sampling frame. Since sorting is time consuming, the following algorithm (Tillé, 2001) may be used with the sampling frame without sorting:

Definition of terms and the initial step
    k: the number of units of the frame file already examined; j: the number of units already selected
    k = 0
    j = 0
    repeat while j < n
        generate a uniformly distributed random number u in [0, 1)
        if u < (n - j) / (N - k) then
            unit k + 1 is selected; j = j + 1
        else
            unit k + 1 is not selected
        k = k + 1

3.2 Equal probability systematic sampling

3.2.1 Sampling theory

Systematic sampling (SYS) is the selection of sampling units at a fixed interval from a list, starting from a randomly determined point. Selection is systematic because selection of the first sampling unit determines the selection of the remaining sampling units.
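Before the two procedures are compared, it may help to see them side by side. The following is a minimal Python sketch, not part of the DHS templates, of the two SRSWOR procedures of Section 3.1 (sorting on random numbers, and the sequential algorithm without sorting) and of a fixed-interval systematic selection. The function names, the fixed seed and the restriction of the systematic sketch to a whole-number interval are assumptions made for illustration only.

```python
import random


def srswor_by_sorting(N, n, rng):
    """SRSWOR: attach a uniform random number to each unit, sort, keep the first n."""
    keyed = [(rng.random(), unit) for unit in range(1, N + 1)]
    keyed.sort()
    return sorted(unit for _, unit in keyed[:n])


def srswor_sequential(N, n, rng):
    """SRSWOR without sorting: one pass over the frame, as in Section 3.1."""
    selected = []
    j = 0  # number of units selected so far
    for k in range(N):  # k units have already been examined
        if j == n:
            break
        u = rng.random()  # uniform on [0, 1)
        if u < (n - j) / (N - k):
            selected.append(k + 1)  # unit k + 1 is selected
            j += 1
    return selected


def systematic_fixed_interval(N, n, start):
    """Systematic sample: the random start fixes every later selection."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    interval = N // n
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")
    return [start + j * interval for j in range(n)]


rng = random.Random(2012)  # fixed seed only so that the sketch is repeatable
print(srswor_by_sorting(100, 10, rng))
print(srswor_sequential(100, 10, rng))
print(systematic_fixed_interval(100, 10, start=rng.randint(1, 10)))
```

The sequential algorithm always returns exactly n units: once the number of units still needed equals the number of units left in the frame, the selection condition holds with certainty.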
Compared with SRSWOR, systematic sampling has the following advantages:

1) It is easier to perform;
2) It allows easy verification of the selection;
3) If the sampling frame is in some order, it provides a stratification effect with respect to the variables on which the frame is sorted, and with a proportional allocation. This stratification is called implicit stratification.
4) Implicit stratification prevents the unexpected concentration of sample points in certain areas that is possible with SRSWOR.

Because of these advantages, especially (3) and (4), systematic selection is more often used than simple random sampling.

Systematic sampling is normally carried out as follows. Assume a whole-number interval $I = N/n$, where $N$ is the number of units in the frame list and $n$ is the number of units to be selected. The procedure begins with an integer random number $S$ between 1 and $I$. The units to be selected are $S, S+I, S+2I, \ldots, S+(n-1)I$. When $I$ is not a whole number, rounding it to the nearest whole number may introduce appreciable errors; in that case it is suggested that the decimal interval method be used. Selection with a decimal interval may be carried out as follows:

1) Calculate the interval $I$ rounded to two decimal places.
2) Generate a random number $R$ between 0 and 1 with two decimal places.
3) Compute the sequence of sampling numbers: R*I, R*I + I, R*I + 2*I, ..., R*I + (n - 1)*I
4) Round up the sampling numbers calculated above to the next highest whole numbers; these are the selected unit numbers.

Example 3.2.1: Let N=100, n=14, so that I=7.14; let the generated random number be R=0.96. The sampling numbers and the corresponding selected unit numbers are as follows:

Sampling number: 6.85 13.99 21.13 28.27 35.41 42.55 49.69 56.83 63.97 71.11 78.25 85.39 92.53 99.67
Selected unit: 7 14 22 29 36 43 50 57 64 72 79 86 93 100

In this example, the decimal interval method gives a selection interval which is sometimes 7 and sometimes 8. The household selection templates are all programmed with decimal sampling intervals.

A sample design often requires numerous systematic samples, as is the case when a systematic sample of households is needed within each selected cluster. In this situation a separate random start $R$ should be determined independently for each cluster.

With SYS, the probability of selection for any unit $i$ is given by

$$P_i = \frac{1}{I} = \frac{n}{N}$$

The design weight (assuming no non-response) is given by

$$D_i = 1 / P_i = \frac{N}{n}$$

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sample mean, which is the same as the unweighted sample mean,

$$\bar{y} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

is an unbiased estimator of the population mean $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} y_i$. For simplicity, assuming an integer sampling interval $I$, the sampling variance of the sample mean is given by

$$V_{sys}(\bar{y}) = \frac{1 - 1/N}{n} S_y^2 \left[ 1 + (n-1) \rho_w \right]$$

where $S_y^2 = \frac{1}{N-1} \sum_{k=1}^{N} (y_k - \bar{Y})^2$ is the population variance and $\rho_w$ is the correlation coefficient between pairs of units in the same systematic sample. When $\rho_w$ is negative, SYS is more precise than SRSWOR; when $\rho_w$ is positive, SYS is less precise than SRSWOR. Unlike the case of SRSWOR, the variance estimate

$$\upsilon_{sys}(\bar{y}) = \frac{1-f}{n} s_y^2$$

is not an unbiased estimate of the sampling variance, where $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the sample variance. However, $\upsilon_{sys}(\bar{y})$ is a special case of the Hartley-Rao (1962) estimator recommended for unequal probability systematic sampling. Using $\upsilon_{sys}(\bar{y})$ is equivalent to treating the systematic sample as if it had been drawn by SRSWOR, and it is therefore called an estimator with simple random sampling approximation.

Theoretically, with SYS there is no unbiased estimator for the variance of the sample mean, since systematic sampling is equivalent to randomly selecting one sample among the $I$ possible samples. This is a major drawback of SYS. However, when the sampling units in the frame file do not present any linear trend or periodic changes in the variable of interest, or when the units are randomly ordered, $\upsilon_{sys}(\bar{y})$ is a good approximation of the sampling variance $V_{sys}(\bar{y})$. When there is a linear trend in the variable of interest, assuming the selection of the $k$-th systematic sample, the following estimator (Wolter, 1984; Wolter 1985), where the summation is over non-overlapping pairs of successive units, is a better approximation of $V_{sys}(\bar{y})$:

$$\upsilon^{*}_{sys}(\bar{y}) = \frac{1-f}{n} \cdot \frac{1}{n} \sum_{j=1}^{[n/2]} \left( y_{k+(2j-1)I} - y_{k+(2j-2)I} \right)^2$$

However, when confidence limits are required, $\upsilon_{sys}(\bar{y})$ is preferred because of its high coverage rates of the true population mean. It should be noted that the properties of $\upsilon^{*}_{sys}(\bar{y})$ are different from those of the collapsed strata estimator for stratified sampling with one unit per stratum, because the successive observations in a SYS sample are correlated with probability one, while the collapsed strata estimator for stratified sampling is based on a set of completely independent observations. When $n$ and $N$ are large, the sample mean has the same asymptotic properties as the simple random sample mean; therefore confidence intervals can be constructed in a similar way to those for a simple random sample.

3.2.2 Excel templates for systematic sampling

The MEASURE DHS program has developed Excel templates that can be used for equal probability systematic sampling of households. The templates can be used to perform simple selection, selection with runs, self-weighting selection without sample size control and self-weighting selection with sample size control.

Figure 3.1 below shows a portion of the simple selection procedure with a sample take of 20 households per cluster. The darker shaded areas require data input. The area to the left of the column labeled "Num HHs Listed" is reserved for cluster IDs. Numbers for the selected households are shown to the right of the column labeled "Random (0-1)". Figure 3.2 below shows a portion of the selection procedure with runs of 4 households. Both selections incorporate a selection of a sub-sample.

Figure 3.3 shows a simple self-weighting selection with an average sample take of 20 households, without sample size control, but with minimum and maximum sample takes of 10 and 30 households respectively. Figure 3.4 shows a self-weighting selection with runs, with an average sample take of 20 households per cluster, without sample size control, but with minimum and maximum sample takes of 10 and 30 households respectively. Both of these selections incorporate a sub-sample of 10 households per cluster. Note that the selection procedure with runs is circular, meaning that when the selection interval is not an integer, and when the run is not a divisor of the total number of households listed, the last selected household number may be smaller than the first selected household number.

Figures 3.5 and 3.6 show self-weighting selections with sample size control; the control area is the sampling stratum. The disadvantage of the self-weighting selection with sample size control is that the household selection can be done only once the household listing results have been entered for the entire control area. This condition may represent a constraint in some situations.

Figure 3.7 shows a manual selection carried out in the field that can be performed easily using a simple calculator. If household selection at the central office is not feasible, the interviewer can perform the household selection in the field. The numbers in red represent information that is entered and the calculated terms. This procedure requires a traditional household listing operation where households are numbered and listed on household listing forms. Using the total number of households listed and the number of households to be selected, the interviewer first calculates the selection interval, then uses the random number $R$ associated with the selected cluster to calculate the first sampling number, or term $t_1$, and enters it in the cell for $t_1$. For each subsequent sampling number or term, the interviewer adds the sampling interval to the previous one. After calculating the sampling numbers, the interviewer rounds them up to whole numbers in the next column, as in the decimal interval method; these are the selected household numbers. The interviewer is asked to copy the address and the name of the head of household of the selected households from the household listing form. The household selection form is subject to review by the field work supervisor.
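The calculation that the interviewer performs by hand is the decimal interval method of Section 3.2.1. The following is a minimal Python sketch of steps 1) to 4) of that method, offered as an illustration rather than as the logic of the Excel templates. The function name `systematic_decimal`, the use of the `decimal` module for exact arithmetic and the guard that caps a selection at N are assumptions of this sketch.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP


def systematic_decimal(N, n, R):
    """Select n of N listed units with the decimal interval method.

    R is the random number in (0, 1], passed as a string or Decimal so that
    the arithmetic is exact and rounding up behaves as it does on paper.
    """
    R = Decimal(str(R))
    if not (0 < R <= 1):
        raise ValueError("R must be greater than 0 and at most 1")
    # Step 1: interval I = N / n, rounded to two decimal places
    interval = (Decimal(N) / Decimal(n)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    selected = []
    for j in range(n):
        # Step 3: sampling number R*I + j*I
        sampling_number = R * interval + j * interval
        # Step 4: round up to the next whole number
        unit = int(sampling_number.to_integral_value(rounding=ROUND_CEILING))
        # Guard added for this sketch: a rounded interval can push the last
        # sampling number slightly past N
        selected.append(min(unit, N))
    return selected


# Example 3.2.1: N = 100, n = 14, R = 0.96
print(systematic_decimal(100, 14, "0.96"))
# First cluster of Figure 3.1: 138 households listed, 20 selected, R = 0.03800
print(systematic_decimal(138, 20, "0.03800"))
```

Run on the inputs of Example 3.2.1, the sketch returns the selected unit numbers listed there (7, 14, 22, ..., 100); run on the first cluster of Figure 3.1, it returns the household numbers shown in that row.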
Figure 3.1 Simple household selection with a sub-sample

HOUSEHOLD SELECTION
Run size: 1
Sub-sample take per cluster: 10

Cluster num | Num HHs listed | Num selected | Select interval | Random (0-1) | Selected household numbers (columns 1 to 20)
1 | 138 | 20 | 6.90 | 0.03800 | 1 8 15 21 28 35 42 49 56 63 70 77 84 90 97 104 111 118 125 132
2 | 151 | 20 | 7.55 | 0.65268 | 5 13 21 28 36 43 51 58 66 73 81 88 96 104 111 119 126 134 141 149
3 | 182 | 20 | 9.10 | 0.97489 | 9 18 28 37 46 55 64 73 82 91 100 109 119 128 137 146 155 164 173 182
4 | 129 | 20 | 6.45 | 0.41931 | 3 10 16 23 29 35 42 48 55 61 68 74 81 87 94 100 106 113 119 126
5 | 180 | 20 | 9.00 | 0.53756 | 5 14 23 32 41 50 59 68 77 86 95 104 113 122 131 140 149 158 167 176
6 | 173 | 20 | 8.65 | 0.70405 | 7 15 24 33 41 50 58 67 76 84 93 102 110 119 128 136 145 154 162 171
7 | 140 | 20 | 7.00 | 0.51868 | 4 11 18 25 32 39 46 53 60 67 74 81 88 95 102 109 116 123 130 137
8 | 69 | 20 | 3.45 | 0.25579 | 1 5 8 12 15 19 22 26 29 32 36 39 43 46 50 53 57 60 63 67
9 | 176 | 20 | 8.80 | 0.96775 | 9 18 27 35 44 53 62 71 79 88 97 106 115 123 132 141 150 159 167 176
10 | 90 | 20 | 4.50 | 0.40192 | 2 7 11 16 20 25 29 34 38 43 47 52 56 61 65 70 74 79 83 88
11 | 131 | 20 | 6.55 | 0.32702 | 3 9 16 22 29 35 42 48 55 62 68 75 81 88 94 101 107 114 121 127
12 | 92 | 20 | 4.60 | 0.76363 | 4 9 13 18 22 27 32 36 41 45 50 55 59 64 68 73 78 82 87 91
13 | 126 | 20 | 6.30 | 0.41681 | 3 9 16 22 28 35 41 47 54 60 66 72 79 85 91 98 104 110 117 123
14 | 199 | 20 | 9.95 | 0.84599 | 9 19 29 39 49 59 69 79 89 98 108 118 128 138 148 158 168 178 188 198
15 | 225 | 20 | 11.25 | 0.91906 | 11 22 33 45 56 67 78 90 101 112 123 135 146 157 168 180 191 202 213 225
16 | 205 | 20 | 10.25 | 0.12089 | 2 12 22 32 43 53 63 73 84 94 104 114 125 135 145 155 166 176 186 196
17 | 148 | 20 | 7.40 | 0.88941 | 7 14 22 29 37 44 51 59 66 74 81 88 96 103 111 118 125 133 140 148
18 | 146 | 20 | 7.30 | 0.25095 | 2 10 17 24 32 39 46 53 61 68 75 83 90 97 105 112 119 126 134 141
19 | 139 | 20 | 6.95 | 0.14534 | 2 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 134
20 | 201 | 20 | 10.05 | 0.84172 | 9 19 29 39 49 59 69 79 89 99 109 120 130 140 150 160 170 180 190 200
Figure 3.2 Selection of runs with a sub-sample

HOUSEHOLD SELECTION
Run size: 4
Sub-sample take per cluster: 10

Cluster num | Num HHs listed | Num selected | Select interval | Random (0-1) | Selected household numbers (columns 1 to 20)
1 | 138 | 20 | 6.90 | 0.77576 | 21 22 23 24 49 50 51 52 77 78 79 80 105 106 107 108 129 130 131 132
2 | 151 | 20 | 7.55 | 0.05693 | 1 2 3 4 29 30 31 32 61 62 63 64 93 94 95 96 121 122 123 124
3 | 182 | 20 | 9.10 | 0.10590 | 1 2 3 4 41 42 43 44 77 78 79 80 113 114 115 116 149 150 151 152
4 | 129 | 20 | 6.45 | 0.64741 | 17 18 19 20 41 42 43 44 69 70 71 72 93 94 95 96 117 118 119 120
5 | 180 | 20 | 9.00 | 0.60810 | 21 22 23 24 57 58 59 60 93 94 95 96 129 130 131 132 165 166 167 168
6 | 173 | 20 | 8.65 | 0.96364 | 33 34 35 36 65 66 67 68 101 102 103 104 137 138 139 140 169 170 171 172
7 | 140 | 20 | 7.00 | 0.11160 | 1 2 3 4 29 30 31 32 57 58 59 60 85 86 87 88 113 114 115 116
8 | 69 | 20 | 3.45 | 0.15540 | 1 2 3 4 13 14 15 16 29 30 31 32 41 42 43 44 57 58 59 60
9 | 176 | 20 | 8.80 | 0.00870 | 1 2 3 4 33 34 35 36 69 70 71 72 105 106 107 108 141 142 143 144
10 | 90 | 20 | 4.50 | 0.32205 | 5 6 7 8 21 22 23 24 41 42 43 44 57 58 59 60 77 78 79 80
11 | 131 | 20 | 6.55 | 0.69849 | 17 18 19 20 45 46 47 48 69 70 71 72 97 98 99 100 121 122 123 124
12 | 92 | 20 | 4.60 | 0.51119 | 9 10 11 12 25 26 27 28 45 46 47 48 65 66 67 68 81 82 83 84
13 | 126 | 20 | 6.30 | 0.31826 | 9 10 11 12 33 34 35 36 57 58 59 60 81 82 83 84 109 110 111 112
14 | 199 | 20 | 9.95 | 0.69129 | 25 26 27 28 65 66 67 68 105 106 107 108 145 146 147 148 185 186 187 188
15 | 225 | 20 | 11.25 | 0.67523 | 29 30 31 32 73 74 75 76 121 122 123 124 165 166 167 168 209 210 211 212
16 | 205 | 20 | 10.25 | 0.30267 | 13 14 15 16 53 54 55 56 93 94 95 96 133 134 135 136 177 178 179 180
17 | 148 | 20 | 7.40 | 0.53373 | 13 14 15 16 45 46 47 48 73 74 75 76 105 106 107 108 133 134 135 136
18 | 146 | 20 | 7.30 | 0.32483 | 9 10 11 12 37 38 39 40 65 66 67 68 97 98 99 100 125 126 127 128
19 | 139 | 20 | 6.95 | 0.69275 | 17 18 19 20 45 46 47 48 73 74 75 76 101 102 103 104 129 130 131 132
20 | 201 | 20 | 10.05 | 0.34629 | 13 14 15 16 53 54 55 56 93 94 95 96 133 134 135 136 173 174 175 176
Figure 3.3 Simple self-weighting selection without sample size control

HOUSEHOLD SELECTION
Average sample take: 20
Ave. take for sub-sample: 10
Min sample take: 10
Max sample take: 30
Run size: 1
Col name for PSU proba:
Col name for EA proba: b
Col name for Num HH in base: c

A dash marks an empty cell.

Cluster num | EA proba | HH in base | Overall proba | Segment info | HH listed | Sample take | Selection interval | Random (0-1) | Selected household numbers (columns 1 to 30)
1 | 0.089851 | 456 | 0.003941 | - | 345 | 15 | 23.00 | 0.13710 | 4 27 50 73 96 119 142 165 188 211 234 257 280 303 326
2 | 0.037832 | 192 | 0.003941 | - | 103 | 11 | 9.36 | 0.94140 | 9 19 28 37 47 56 65 75 84 94 103
3 | 0.026009 | 132 | 0.003941 | - | 127 | 19 | 6.68 | 0.74823 | 6 12 19 26 32 39 46 52 59 66 72 79 86 92 99 106 112 119 126
4 | 0.029753 | 151 | 0.003941 | - | 127 | 17 | 7.47 | 0.47966 | 4 12 19 26 34 41 49 56 64 71 79 86 94 101 109 116 124
5 | 0.019507 | 99 | 0.003941 | - | 98 | 20 | 4.90 | 0.35329 | 2 7 12 17 22 27 32 37 41 46 51 56 61 66 71 76 81 86 90 95
6 | 0.026601 | 135 | 0.003941 | - | 132 | 20 | 6.60 | 0.35072 | 3 9 16 23 29 36 42 49 56 62 69 75 82 89 95 102 108 115 122 128
7 | 0.034679 | 176 | 0.003941 | - | 218 | 25 | 8.72 | 0.14457 | 2 10 19 28 37 45 54 63 72 80 89 98 106 115 124 133 141 150 159 167 176 185 194 202 211
8 | 0.033103 | 168 | 0.003941 | - | 92 | 11 | 8.36 | 0.74902 | 7 15 23 32 40 49 57 65 74 82 90
9 | 0.088471 | 449 | 0.003941 | 0.46 | 247 | 24 | 10.29 | 0.65592 | 7 18 28 38 48 59 69 79 90 100 110 120 131 141 151 162 172 182 193 203 213 223 234 244
10 | 0.101279 | 514 | 0.003941 | 0.55 | 245 | 17 | 14.41 | 0.15798 | 3 17 32 46 60 75 89 104 118 132 147 161 176 190 205 219 233
11 | 0.019507 | 99 | 0.003941 | - | 122 | 25 | 4.88 | 0.85734 | 5 10 14 19 24 29 34 39 44 49 53 58 63 68 73 78 83 88 93 97 102 107 112 117 122
12 | 0.009939 | 76 | 0.002615 | - | 40 | 11 | 3.64 | 0.18437 | 1 5 8 12 16 19 23 27 30 34 38
13 | 0.012424 | 95 | 0.002615 | - | 160 | 30 | 5.33 | 0.66882 | 4 9 15 20 25 31 36 41 47 52 57 63 68 73 79 84 89 95 100 105 111 116 121 127 132 137 143 148 153 159
14 | 0.008893 | 68 | 0.002615 | - | 69 | 20 | 3.45 | 0.26100 | 1 5 8 12 15 19 22 26 29 32 36 39 43 46 50 53 57 60 64 67
15 | 0.018439 | 141 | 0.002615 | - | 133 | 19 | 7.00 | 0.69656 | 5 12 19 26 33 40 47 54 61 68 75 82 89 96 103 110 117 124 131
16 | 0.013731 | 105 | 0.002615 | - | 120 | 23 | 5.22 | 0.51406 | 3 8 14 19 24 29 34 40 45 50 55 61 66 71 76 81 87 92 97 102 108 113 118
17 | 0.018178 | 139 | 0.002615 | - | 165 | 24 | 6.88 | 0.00231 | 1 7 14 21 28 35 42 49 56 62 69 76 83 90 97 104 111 117 124 131 138 145 152 159
18 | 0.008239 | 63 | 0.002615 | - | 90 | 29 | 3.10 | 0.87493 | 3 6 9 13 16 19 22 25 28 31 34 37 40 44 47 50 53 56 59 62 65 68 71 75 78 81 84 87 90
19 | 0.016608 | 127 | 0.002615 | - | 98 | 15 | 6.53 | 0.87072 | 6 13 19 26 32 39 45 52 58 65 72 78 85 91 98
20 | 0.009416 | 72 | 0.002615 | - | 75 | 21 | 3.57 | 0.29377 | 2 5 9 12 16 19 23 27 30 34 37 41 44 48 52 55 59 62 66 69 73
Figure 3.4 Self-weighting selection with runs and without sample size control

HOUSEHOLD SELECTION
Average sample take: 20
Ave. take for sub-sample: 10
Min sample take: 10
Max sample take: 30
Run size: 5
Col name for PSU proba:
Col name for EA proba: b
Col name for Num HH in base: c

A dash marks an empty cell.

Cluster num | EA proba | HH in base | Overall proba | Segment info | HH listed | Sample take | Selection interval | Random (0-1) | Selected household numbers (columns 1 to 30)
1 | 0.089851 | 456 | 0.003941 | - | 345 | 15 | 23.00 | 0.06005 | 6 7 8 9 10 121 122 123 124 125 236 237 238 239 240
2 | 0.037832 | 192 | 0.003941 | - | 103 | 11 | 9.36 | 0.51677 | 21 22 23 24 25 71 72 73 74 75 13
3 | 0.026009 | 132 | 0.003941 | - | 127 | 19 | 6.68 | 0.83579 | 26 27 28 29 30 61 62 63 64 65 91 92 93 94 95 126 127 1 2
4 | 0.029753 | 151 | 0.003941 | - | 127 | 17 | 7.47 | 0.85753 | 31 32 33 34 35 66 67 68 69 70 106 107 108 109 110 14 15
5 | 0.019507 | 99 | 0.003941 | - | 98 | 20 | 4.90 | 0.55776 | 11 12 13 14 15 36 37 38 39 40 61 62 63 64 65 86 87 88 89 90
6 | 0.026601 | 135 | 0.003941 | - | 132 | 20 | 6.60 | 0.29854 | 6 7 8 9 10 41 42 43 44 45 76 77 78 79 80 106 107 108 109 110
7 | 0.034679 | 176 | 0.003941 | - | 218 | 25 | 8.72 | 0.82987 | 36 37 38 39 40 76 77 78 79 80 121 122 123 124 125 166 167 168 169 170 211 212 213 214 215
8 | 0.033103 | 168 | 0.003941 | - | 92 | 11 | 8.36 | 0.77278 | 31 32 33 34 35 71 72 73 74 75 24
9 | 0.088471 | 449 | 0.003941 | 0.46 | 247 | 24 | 10.29 | 0.85756 | 41 42 43 44 45 96 97 98 99 100 146 147 148 149 150 196 197 198 199 200 246 247 1 2
10 | 0.101279 | 514 | 0.003941 | 0.55 | 245 | 17 | 14.41 | 0.99471 | 71 72 73 74 75 141 142 143 144 145 216 217 218 219 220 41 42
11 | 0.019507 | 99 | 0.003941 | - | 122 | 25 | 4.88 | 0.14094 | 1 2 3 4 5 26 27 28 29 30 51 52 53 54 55 76 77 78 79 80 101 102 103 104 105
12 | 0.009939 | 76 | 0.002615 | - | 40 | 11 | 3.64 | 0.11903 | 1 2 3 4 5 21 22 23 24 25 36
13 | 0.012424 | 95 | 0.002615 | - | 160 | 30 | 5.33 | 0.03859 | 1 2 3 4 5 26 27 28 29 30 51 52 53 54 55 81 82 83 84 85 106 107 108 109 110 131 132 133 134 135
14 | 0.008893 | 68 | 0.002615 | - | 69 | 20 | 3.45 | 0.28596 | 1 2 3 4 5 21 22 23 24 25 36 37 38 39 40 56 57 58 59 60
15 | 0.018439 | 141 | 0.002615 | - | 133 | 19 | 7.00 | 0.05647 | 1 2 3 4 5 36 37 38 39 40 71 72 73 74 75 106 107 108 109
16 | 0.013731 | 105 | 0.002615 | - | 120 | 23 | 5.22 | 0.26691 | 6 7 8 9 10 31 32 33 34 35 56 57 58 59 60 86 87 88 89 90 111 112 113
17 | 0.018178 | 139 | 0.002615 | - | 165 | 24 | 6.88 | 0.00029 | 1 2 3 4 5 31 32 33 34 35 66 67 68 69 70 101 102 103 104 105 136 137 138 139
18 | 0.008239 | 63 | 0.002615 | - | 90 | 29 | 3.10 | 0.57751 | 6 7 8 9 10 21 22 23 24 25 36 37 38 39 40 56 57 58 59 60 71 72 73 74 75 86 87 88 89
19 | 0.016608 | 127 | 0.002615 | - | 98 | 15 | 6.53 | 0.89801 | 26 27 28 29 30 61 62 63 64 65 91 92 93 94 95
20 | 0.009416 | 72 | 0.002615 | - | 75 | 21 | 3.57 | 0.71086 | 11 12 13 14 15 31 32 33 34 35 46 47 48 49 50 66 67 68 69 70 6
Figure 3.5 Self-weighting selection with sample size control

HOUSEHOLD SELECTION
Main sample: 620 households expected, 619 selected
Sub-sample 1: 310 households expected, 302 selected
Households selected by selection column, main sample (columns 1 to 30): 31 31 31 31 31 31 31 31 31 31 30 29 27 26 26 25 22 22 21 16 14 12 10 8 7 6 4 2 1 1
Households selected by selection column, sub-sample (columns 1 to 15): 31 31 31 31 31 29 26 25 22 16 12 8 6 2 1

A dash marks an empty cell.

Cluster num | EA probability | HH in base | Stratum | Segment info | Num HHs listed | Num of HHs selected | Overall probability | Random (0, 1) | Selected household numbers (columns 1 to 30)
1 | 0.089851 | 456 | 1 | - | 345 | 16 | 0.004276 | 0.99090 | 22 43 65 87 108 130 151 173 194 216 237 259 281 302 324 345
2 | 0.037832 | 192 | 1 | - | 103 | 11 | 0.004275 | 0.75239 | 8 17 26 36 45 54 64 73 82 92 101
3 | 0.026009 | 132 | 1 | - | 127 | 21 | 0.004274 | 0.86458 | 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 103 109 115 121 127
4 | 0.029753 | 151 | 1 | - | 127 | 18 | 0.004276 | 0.99072 | 7 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 1
5 | 0.019507 | 99 | 1 | - | 98 | 22 | 0.004276 | 0.04095 | 1 5 10 14 18 23 27 32 36 41 45 50 54 59 63 67 72 76 81 85 90 94
6 | 0.026601 | 135 | 1 | - | 132 | 21 | 0.004274 | 0.85131 | 6 12 18 25 31 37 44 50 56 62 69 75 81 88 94 100 106 113 119 125 132
7 | 0.034679 | 176 | 1 | - | 218 | 27 | 0.004274 | 0.04537 | 1 9 17 25 33 41 49 57 65 73 82 90 98 106 114 122 130 138 146 154 162 170 178 186 195 203 211
8 | 0.033103 | 168 | 1 | - | 92 | 12 | 0.004275 | 0.60119 | 5 13 20 28 36 43 51 59 66 74 82 89
9 | 0.088471 | 449 | 1 | 0.46 | 247 | 26 | 0.004274 | 0.60089 | 6 16 25 35 44 54 63 73 82 92 101 111 120 130 139 149 158 168 177 187 196 206 215 225 234 244
10 | 0.101279 | 514 | 1 | 0.55 | 245 | 19 | 0.004274 | 0.15320 | 2 15 28 41 54 67 80 93 106 118 131 144 157 170 183 196 209 222 234
11 | 0.019507 | 99 | 1 | - | 122 | 26 | 0.004276 | 0.83106 | 4 9 14 18 23 28 33 37 42 47 51 56 61 65 70 75 79 84 89 94 98 103 108 112 117 122
12 | 0.009939 | 76 | 2 | - | 40 | 10 | 0.002572 | 0.47381 | 2 6 10 14 18 22 26 30 34 38
13 | 0.012424 | 95 | 2 | - | 160 | 30 | 0.002329 | 0.92044 | 5 11 16 21 27 32 37 43 48 53 59 64 69 75 80 85 91 96 101 107 112 117 123 128 133 139 144 149 155 160
14 | 0.008893 | 68 | 2 | - | 69 | 20 | 0.002573 | 0.88266 | 4 7 10 14 17 21 24 28 31 35 38 41 45 48 52 55 59 62 66 69
15 | 0.018439 | 141 | 2 | - | 133 | 19 | 0.002572 | 0.02506 | 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127
16 | 0.013731 | 105 | 2 | - | 120 | 23 | 0.002573 | 0.19580 | 2 7 12 17 22 28 33 38 43 49 54 59 64 69 75 80 85 90 95 101 106 111 116
17 | 0.018178 | 139 | 2 | - | 165 | 23 | 0.002573 | 0.84644 | 7 14 21 28 35 42 50 57 64 71 78 85 93 100 107 114 121 128 136 143 150 157 164
18 | 0.008239 | 63 | 2 | - | 90 | 28 | 0.002573 | 0.53018 | 2 5 9 12 15 18 21 25 28 31 34 38 41 44 47 50 54 57 60 63 66 70 73 76 79 82 86 89
19 | 0.016608 | 127 | 2 | - | 98 | 15 | 0.002572 | 0.54311 | 4 11 17 24 30 37 43 50 56 63 69 76 82 89 95
20 | 0.009416 | 72 | 2 | - | 75 | 20 | 0.002574 | 0.52813 | 2 6 10 14 17 21 25 29 32 36 40 44 47 51 55 59 62 66 70 74
Figure 3.6 Self-weighting selection with runs and with sample size control

HOUSEHOLD SELECTION
Main sample: 620 households expected, 624 selected
Sub-sample 1: 310 households expected, 319 selected
Households selected by selection column, main sample (columns 1 to 30): 31 31 31 31 31 31 31 31 31 31 31 30 27 27 26 25 23 22 20 16 15 13 9 8 7 6 5 2 1 1
Households selected by selection column, sub-sample (columns 1 to 15): 31 31 31 31 31 31 27 26 23 20 15 9 7 5 1

A dash marks an empty cell.

Cluster num | EA probability | HH in base | Stratum | Segment info | Num HHs listed | Num of HHs selected | Overall probability | Random (0, 1) | Selected household numbers (columns 1 to 30)
1 | 0.089851 | 456 | 1 | - | 345 | 17 | 0.004276 | 0.37154 | 36 37 38 39 40 136 137 138 139 140 241 242 243 244 245 341 342
2 | 0.037832 | 192 | 1 | - | 103 | 12 | 0.004275 | 0.17906 | 6 7 8 9 10 51 52 53 54 55 91 92
3 | 0.026009 | 132 | 1 | - | 127 | 21 | 0.004274 | 0.37346 | 11 12 13 14 15 41 42 43 44 45 71 72 73 74 75 101 102 103 104 105 4
4 | 0.029753 | 151 | 1 | - | 127 | 18 | 0.004276 | 0.97771 | 31 32 33 34 35 66 67 68 69 70 106 107 108 109 110 14 15 16
5 | 0.019507 | 99 | 1 | - | 98 | 22 | 0.004276 | 0.25939 | 6 7 8 9 10 26 27 28 29 30 51 52 53 54 55 71 72 73 74 75 91 92
6 | 0.026601 | 135 | 1 | - | 132 | 22 | 0.004274 | 0.17919 | 6 7 8 9 10 36 37 38 39 40 66 67 68 69 70 96 97 98 99 100 126 127
7 | 0.034679 | 176 | 1 | - | 218 | 27 | 0.004274 | 0.85089 | 31 32 33 34 35 71 72 73 74 75 116 117 118 119 120 156 157 158 159 160 196 197 198 199 200 18 19
8 | 0.033103 | 168 | 1 | - | 92 | 12 | 0.004275 | 0.50068 | 16 17 18 19 20 56 57 58 59 60 4 5
9 | 0.088471 | 449 | 1 | 0.46 | 247 | 26 | 0.004274 | 0.31150 | 11 12 13 14 15 61 62 63 64 65 106 107 108 109 110 156 157 158 159 160 201 202 203 204 205 4
10 | 0.101279 | 514 | 1 | 0.55 | 245 | 19 | 0.004274 | 0.58240 | 36 37 38 39 40 101 102 103 104 105 166 167 168 169 170 231 232 233 234
11 | 0.019507 | 99 | 1 | - | 122 | 27 | 0.004276 | 0.58464 | 11 12 13 14 15 36 37 38 39 40 56 57 58 59 60 81 82 83 84 85 101 102 103 104 105 4 5
12 | 0.009939 | 76 | 2 | - | 40 | 11 | 0.002572 | 0.00559 | 1 2 3 4 5 16 17 18 19 20 36
13 | 0.012424 | 95 | 2 | - | 160 | 30 | 0.002329 | 0.28430 | 6 7 8 9 10 31 32 33 34 35 61 62 63 64 65 86 87 88 89 90 111 112 113 114 115 141 142 143 144 145
14 | 0.008893 | 68 | 2 | - | 69 | 20 | 0.002573 | 0.16263 | 1 2 3 4 5 21 22 23 24 25 36 37 38 39 40 51 52 53 54 55
15 | 0.018439 | 141 | 2 | - | 133 | 18 | 0.002572 | 0.83721 | 31 32 33 34 35 66 67 68 69 70 101 102 103 104 105 8 9 10
16 | 0.013731 | 105 | 2 | - | 120 | 22 | 0.002573 | 0.78642 | 21 22 23 24 25 46 47 48 49 50 76 77 78 79 80 101 102 103 104 105 11 12
17 | 0.018178 | 139 | 2 | - | 165 | 23 | 0.002573 | 0.44707 | 16 17 18 19 20 51 52 53 54 55 86 87 88 89 90 121 122 123 124 125 156 157 158
18 | 0.008239 | 63 | 2 | - | 90 | 28 | 0.002573 | 0.80425 | 11 12 13 14 15 26 27 28 29 30 46 47 48 49 50 61 62 63 64 65 76 77 78 79 80 1 2 3
19 | 0.016608 | 127 | 2 | - | 98 | 15 | 0.002572 | 0.18038 | 6 7 8 9 10 36 37 38 39 40 71 72 73 74 75
20 | 0.009416 | 72 | 2 | - | 75 | 21 | 0.002574 | 0.01884 | 1 2 3 4 5 16 17 18 19 20 36 37 38 39 40 51 52 53 54 55 71
Figure 3.7 Manual household selection in the field
3.3 Probability proportional to size sampling

3.3.1 Sampling theory

In order to increase sampling efficiency, a sampling procedure can attribute different selection probabilities to different sampling units. In general, a large sampling unit will contribute more to the sampling variance if equal probability selection is used. If large sampling units are selected with larger chances, the sampling variance may be greatly reduced. In the extreme, a good strategy is to select very large sampling units with certainty, that is, with a probability of one. Assuming that each sampling unit has some kind of known measure of size which is positively correlated with the variable of interest, a Probability Proportional to the measure of Size (PPS) selection has the same four advantages as SYS sampling. This procedure assigns each sampling unit a specific chance of being selected in the sample before the sampling begins, and the chance is proportional to its measure of size.

Let $M_i$ be the measure of size of unit $i$; let $\sum_{i=1}^{N} M_i$ be the total measure of size; let $n$ be the design sample size. A PPS sampling procedure will select unit $i$ with a probability $\pi_i$ such that

$$\pi_i = \frac{n M_i}{\sum_{i=1}^{N} M_i}$$

The design weight (assuming no non-response) is given by

$$D_i = 1 / \pi_i = \frac{\sum_{i=1}^{N} M_i}{n M_i}$$

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sum of the observations

$$\hat{y}_{PPS} = \sum_{i=1}^{n} D_i y_i = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

is an unbiased estimator of the population total $Y = \sum_{i=1}^{N} y_i$. The variance of this estimator is given by

$$V(\hat{y}_{PPS}) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Yates-Grundy, 1953)}$$

where $\pi_{ij}$ is the joint probability of selecting units $i$ and $j$ together in a sample. If all the joint probabilities $\pi_{ij} > 0$, then the above variance can be estimated without bias by:

$$\hat{V}(\hat{y}_{PPS}) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Yates-Grundy, 1953)}$$

However, the above estimator is not calculable in practice because the joint probabilities $\pi_{ij}$ are usually unknown.
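Although the variance estimator above cannot be computed without the joint probabilities, the point estimate of the total needs only the first-order probabilities. The following is a minimal Python sketch of the selection probabilities, design weights and weighted total defined above. The function names, the four-unit frame and the observed values are hypothetical and are used only for illustration.

```python
def pps_probabilities(sizes, n):
    """First-order selection probabilities pi_i = n * M_i / sum of all M_i."""
    total = sum(sizes)
    return [n * m / total for m in sizes]


def design_weights(probs):
    """Design weights D_i = 1 / pi_i (assuming no non-response)."""
    return [1 / p for p in probs]


def estimate_total(y_sample, probs_sample):
    """Weighted sum of the observations: sum of y_i / pi_i over the sample."""
    return sum(y / p for y, p in zip(y_sample, probs_sample))


# Hypothetical frame of four units with measures of size M_i, and n = 2
sizes = [10, 20, 30, 40]
pi = pps_probabilities(sizes, n=2)

# Hypothetical sample: the second and fourth units, with observed values y
sampled_positions = [1, 3]  # zero-based positions in the frame
y_observed = [50, 90]
estimate = estimate_total(y_observed, [pi[i] for i in sampled_positions])

print(pi)
print(design_weights(pi))
print(estimate)
```

If the variable of interest were exactly proportional to the measure of size, every term y_i / pi_i would be the same and every possible sample would return the population total, which is why a size measure that is strongly correlated with the variable of interest reduces the sampling variance.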
Hartley and Rao (1962) provided an approximation of the above estimator which involves only the first order selection probabilities $\pi_i$:

$$\hat{V}_{HR}(\hat{y}_{PPS}) = \frac{1}{2(n-1)} \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} \left[ 1 - (\pi_i + \pi_j) + \frac{1}{n} \sum_{k=1}^{N} \pi_k^2 \right] \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Hartley-Rao, 1962)}$$

But the Hartley-Rao estimator requires knowledge of the selection probability of all sampling units in the population (through $\sum_{k=1}^{N} \pi_k^2$), which is usually not calculated in the sample selection. The general documentation just keeps the selection probability for the selected units. By replacing $\sum_{k=1}^{N} \pi_k^2$ by its sample estimation $\sum_{i=1}^{n} \pi_i^2 / \pi_i = \sum_{i=1}^{n} \pi_i$, the Hartley-Rao estimator can be further simplified (Ren, 2003):

$$\hat{V}_{R}(\hat{y}_{PPS}) = \frac{n}{n-1} \sum_{i=1}^{n} (1 - \pi_i) \left( \frac{y_i}{\pi_i} - \frac{\hat{y}_{PPS}}{n} \right)^2$$

In the case of equal probability sampling, both $\hat{V}_{HR}(\hat{y}_{PPS})$ and $\hat{V}_{R}(\hat{y}_{PPS})$ reduce to the variance estimator with simple random sampling approximation. Suppose that $\pi_i < 1$ for all $i$; both the Yates-Grundy and the Hartley-Rao estimators may produce negative variance estimates, while $\hat{V}_{R}(\hat{y}_{PPS})$ is always positive.

Wolter (1984; 1985) conducted an extensive study on variance estimation for systematic sampling, including the successive difference estimator similar to $\upsilon^{*}_{sys}(\bar{y})$. He recommends the use of the Hartley-Rao estimator if the population does not present any trends in the measure of size variable and the variable of interest, especially when a confidence interval is required.

The above results for population total estimation can be adapted to mean estimation:

$$\bar{y}_{PPS} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{y_i}{\pi_i} \Big/ \sum_{i=1}^{n} \frac{1}{\pi_i}$$

$\bar{y}_{PPS}$ is an approximately unbiased estimator of the population mean, with approximate variance given by:

$$V(\bar{y}_{PPS}) = \frac{1}{2N^2} \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i - \bar{Y}}{\pi_i} - \frac{y_j - \bar{Y}}{\pi_j} \right)^2$$

If the units are not specially ordered according to the variable of interest in the sampling frame, the approximate sampling variance of the estimator can be estimated by

$$\hat{V}_{R}(\bar{y}_{PPS}) = \frac{1}{\left( \sum_{i=1}^{n} D_i \right)^2} \cdot \frac{n}{n-1} \sum_{i=1}^{n} (1 - \pi_i) \left( \frac{y_i}{\pi_i} - \frac{\hat{y}_{PPS}}{n} \right)^2$$

The above estimator reduces to the simple random sampling approximation $\upsilon_{sys}(\bar{y})$ in the case of equal probability systematic sampling.

3.3.2 Operational description and examples

There are many ways to draw a PPS sample, but the easiest way is PPS systematic sampling, summarized in the following steps:

1) List the sampling units with their measure of size $M_i$.
2) Calculate the cumulative measure of size $C_k = \sum_{i=1}^{k} M_i$ for each unit $k$, and check that the last entry $C_N$ equals the total measure of size $\sum_{i=1}^{N} M_i$.
3) Let $n$ be the number of units to be selected. Compute the sampling interval $I = \sum_{i=1}^{N} M_i / n$.
4) Generate a random number $R$ between 0 and 1.
5) Compute the sampling numbers R*I, R*I+I, R*I+2*I, ..., R*I+(n-1)*I.
6) For each sampling number R*I+(j-1)*I, the $j$-th sampled unit is unit $k$ if $C_k$ is the first cumulative size bigger than the sampling number R*I+(j-1)*I.
7) Calculate the selection probability of each selected unit $j$: $n M_j / \sum_{i=1}^{N} M_i$.

The following example demonstrates how manual selection is done.

Example 3.3.1: Let N=20, n=5 and $\sum_{i=1}^{20} M_i = 4004$; therefore the sampling interval is I = 4004/5 = 800.8, or about 801. Let the random start, that is, the first sampling number R*I, be 305. The sampling numbers and the selected unit numbers are as follows (sampling numbers are shown rounded to whole numbers; only the rows of the selected units carry entries in the last three columns):

ID number i | Size measure M_i | Cumulative C_k | Sampling number | j-th selected unit | Selection probability
1 | 139 | 139
2 | 101 | 240
3 | 184 | 424 | 305 | 1 | 0.22977
4 | 184 | 608
5 | 104 | 712
6 | 259 | 971
7 | 219 | 1190 | 1106 | 2 | 0.273477
8 | 192 | 1382
9 | 224 | 1606
10 | 197 | 1803
11 | 150 | 1953 | 1907 | 3 | 0.187313
12 | 257 | 2210
13 | 270 | 2480
14 | 195 | 2675
15 | 296 | 2971 | 2707 | 4 | 0.36963
16 | 178 | 3149
17 | 256 | 3405
18 | 227 | 3632 | 3508 | 5 | 0.283467
19 | 247 | 3879
20 | 125 | 4004

PPS sampling has the same advantages as equal probability systematic sampling, but with this procedure a unit may be selected more than once if the unit's measure of size is bigger than the sampling interval. These large units are said to have been selected with certainty, or to be self-representing units. A unit selected more than once should be segmented to form a number of smaller units corresponding to the number of times the unit is selected. The selection probabilities should be recalculated using the sizes of the segmented units. With this strategy, the total sample size is kept the same as designed and the selection probabilities of the non-certainty units do not need to be adjusted.
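The seven steps of PPS systematic selection can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the logic of the DHS Excel template: the function name `pps_systematic` is hypothetical, the random start R*I is passed in directly as `start`, and the count of repeated selections is added only to flag units larger than the sampling interval. It is run on the size measures of Example 3.3.1 with the first sampling number 305.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def pps_systematic(sizes, n, start):
    """PPS systematic selection from a list of measures of size.

    start is the first sampling number R*I, with 0 <= start < I.
    Returns a list of (unit number, selection probability), units numbered from 1.
    """
    total = sum(sizes)
    interval = total / n                  # step 3: I = total size / n
    if not 0 <= start < interval:
        raise ValueError("start must satisfy 0 <= start < interval")
    cumulative = list(accumulate(sizes))  # step 2: C_k
    result = []
    for j in range(n):
        sampling_number = start + j * interval          # step 5
        # step 6: first unit whose cumulative size exceeds the sampling number
        k = bisect_right(cumulative, sampling_number)
        result.append((k + 1, n * sizes[k] / total))    # step 7
    return result


# Size measures of the 20 units in Example 3.3.1
sizes = [139, 101, 184, 184, 104, 259, 219, 192, 224, 197,
         150, 257, 270, 195, 296, 178, 256, 227, 247, 125]

selection = pps_systematic(sizes, n=5, start=305)
for unit, prob in selection:
    print(unit, round(prob, 6))

# Units that appear more than once are larger than the sampling interval
repeats = {u: c for u, c in Counter(u for u, _ in selection).items() if c > 1}
print(repeats)
```

With these inputs the sketch selects units 3, 7, 11, 15 and 18, with the selection probabilities given in the example, and reports no unit selected more than once.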
Another way to deal with large units consists of examining the list of units before sampling begins. Computation of the interval will reveal whether there are any units of size greater than $I$. The simplest solution to prevent repetition during sampling might be to split each such unit into two or more approximately equal subunits of size less than $I$. The split would be made first on paper only. The measure of size of the original unit is divided equally among the subunits before sampling proceeds. Later the split is materialized, either by drawing a line on the map of the unit, or by identifying a suitable dividing line during the first field visit to the unit.

If a substantial number of the units chosen to serve as PSUs are larger than the interval $I$, then the choice of such units to serve as PSUs was clearly incorrect. One solution to this problem is to set aside, before sampling, all PSUs with a measure of size larger than a threshold (not necessarily greater than or equal to $I$), to give them special treatment, and to call them self-representing units. They are not, therefore, sampling units but strata by definition. A new type of sampling unit has to be designated to serve as PSU within these areas. For the purpose of sampling error computation, it is important to realize that the term self-representing PSU is misleading. The self-representing units are in fact strata, while the new, smaller units or sub-units within them are the true PSUs. This treatment requires re-calculating the sample allocation, and then proceeding with sample selection independently in each stratum.

An Excel template for stratified PPS or equal probability systematic sampling has been developed. Figure 3.8 below shows a portion of a blank template. Figure 3.9 shows an example of stratified PPS sampling with the strata being the urban and rural areas within each province.
Figure 3.8 Part of an Excel template for stratified sampling

[Figure: blank Excel template titled "Stratified systematic sampling with probability proportional to size". Input cells: random number (0, 1); column names of the domain/region, urban/rural and PSU size variables; stratum number, stratum size and sample size for each stratum; total number of strata; total sample size; number of different PSUs selected. Columns of the pasted frame file: serial number, domain/region name/code, urban/rural, PSU size, stratum number, selection probability, number of times selected, stratum size, stratum sample size, and measure of size of the stratum.]
Figure 3.9 Part of an example for a province crossed urban-rural stratified PPS sampling

In Figure 3.9 above, the number of times an EA is selected is indicated in the column labeled "# of times selected". Use the filter to locate the selected units and copy them to a new file. Figure 3.10 below gives an example of a portion of a prepared sample file. This is an example; it does not reflect any actual clusters selected for a DHS. The first column gives the cluster number, which is assigned by the statistician. The clusters are sorted in the original order as in the sampling frame. The last six columns are the sampling parameters calculated by the program, including: EA selection probability ("Selection Proba"), number of EAs by stratum ("Stratum size"), number of EAs selected by stratum ("Stratum sam-size"), total measure of size by stratum, that is, the total number of households ("Measure size-strat"), stratum number, and number of times the unit has been selected. These are important sampling parameters which must be present in a sample file.
Figure 3.10 Part of an example sample file from a stratified PPS sampling

3.4 Complex sampling procedures

The sampling procedures used in DHS surveys are usually complex, involving multi-stage selection, clustering and stratification, with a combination of PPS sampling in the first stage and equal probability systematic sampling in the second stage. Multi-stage selection is employed due to the lack of a sampling frame at the individual level; clustering is used for implementation efficiency and stratification for the reduction of sampling errors. The DHS sampling procedure has been discussed in some detail in Section 1.8; here we give the basic theoretical properties of the estimator, the variance and variance estimation for a two-stage cluster sampling.

Consider a two-stage stratified cluster sampling, with $n_h$ PSUs selected in stratum h in the first stage with PPS sampling, and for each of the selected PSUs, an equal probability systematic sample of m SSUs is selected. Let $y_{hj1}, y_{hj2}, \ldots, y_{hjm}$ be observations from the j-th PSU in stratum h. An unbiased estimator of the population total is given by
$$\hat{Y}_{PPS} = \sum_h \sum_j \frac{\hat{Y}_{hj}}{\pi_{Phj}}, \quad \text{with} \quad \hat{Y}_{hj} = \frac{M_{hj}}{m} \sum_{i=1}^{m} y_{hji}$$

where $\pi_{Phj}$ is the selection probability of the j-th PSU in stratum h and $M_{hj}$ is the number of SSUs in the j-th PSU in stratum h. The variance of this estimator is the sum of two parts:

$$V(\hat{Y}_{PPS}) = V_P + V_S, \quad V_P = \frac{1}{2} \sum_h \sum_k \sum_{j \neq k} \left( \pi_{Phk} \pi_{Phj} - \pi_{Phkj} \right) \left( \frac{Y_{hk}}{\pi_{Phk}} - \frac{Y_{hj}}{\pi_{Phj}} \right)^2$$

where $\pi_{Phkj}$ is the joint selection probability of PSUs k and j in stratum h. The first part ($V_P$) represents the sampling variance of the selection of a PSU; the summation is over all strata for different PSUs j and k within the same stratum. The second part ($V_S$) represents the sampling variance of the selection of an SSU; the summation is over all strata and PSUs. Estimators for the first part and second part are obtained from the results in previous sections. For the first part,

$$\hat{V}_P = \frac{1}{2} \sum_h \sum_k \sum_{j \neq k} \frac{\pi_{Phk} \pi_{Phj} - \pi_{Phkj}}{\pi_{Phkj}} \left( \frac{\hat{Y}_{hk}}{\pi_{Phk}} - \frac{\hat{Y}_{hj}}{\pi_{Phj}} \right)^2$$

and $\hat{V}_S$ denotes the corresponding estimator of the second part, computed from the sample variance of the SSUs within the selected PSUs. Since $\hat{V}_P$ is not an unbiased estimate of $V_P$ and usually overestimates $V_P$, and since $V_S$ is usually small compared to $V_P$, the second part is usually dropped in the variance estimation. This gives an approximate variance estimation given by

$$\hat{V}(\hat{Y}_{PPS}) = \frac{1}{2} \sum_h \sum_k \sum_{j \neq k} \frac{\pi_{Phk} \pi_{Phj} - \pi_{Phkj}}{\pi_{Phkj}} \left( \frac{\hat{Y}_{hk}}{\pi_{Phk}} - \frac{\hat{Y}_{hj}}{\pi_{Phj}} \right)^2$$

The above estimator can be simplified as in Section 3.3.1:

$$\hat{V}_R(\hat{Y}_{PPS}) = \sum_h \frac{n_h}{n_h - 1} \sum_j \left( 1 - \pi_{Phj} \right) \left( \frac{\hat{Y}_{hj}}{\pi_{Phj}} - \frac{\hat{Y}_h}{n_h} \right)^2$$

which is reduced to the Woodruff (1971) estimator if $\pi_{Phj} = f_h$ for all h:

$$\hat{V}_W(\hat{Y}_{PPS}) = \sum_h \frac{n_h (1 - f_h)}{n_h - 1} \sum_j \left( \frac{\hat{Y}_{hj}}{\pi_{Phj}} - \frac{\hat{Y}_h}{n_h} \right)^2$$

where $\hat{Y}_h = \sum_j \hat{Y}_{hj} / \pi_{Phj}$ is the sample estimate of the population total of stratum h.

The above estimator can be expanded to estimate a mean or a ratio by using Woodruff's (1971) linearization approach. Let $\hat{R} = \hat{Y}_{PPS} / \hat{X}_{PPS}$, where $\hat{Y}_{PPS}$ represents the total weighted sample value for variable y, and $\hat{X}_{PPS}$ represents the total weighted sample value for variable x or the total number of weighted cases in the group or subgroup under consideration. The approximate variance of $\hat{R}$ can be computed using Woodruff's formula:

$$\hat{V}_W(\hat{R}) = \frac{1}{\hat{X}_{PPS}^2} \sum_h \frac{n_h (1 - f_h)}{n_h - 1} \left( \sum_{j=1}^{n_h} z_{hj}^2 - \frac{z_h^2}{n_h} \right)$$

in which

$$z_{hj} = \frac{\hat{Y}_{hj} - \hat{R} \hat{X}_{hj}}{\pi_{Phj}}, \quad \text{and} \quad z_h = \hat{Y}_h - \hat{R} \hat{X}_h$$

The above estimator is widely used in commercial statistical software such as SAS, SPSS and Stata. Repeated replication methods such as Bootstrap and Jackknife (Efron, 1982; Efron 1993) can also be used to estimate the variance of $\hat{R}$, as explained in Section 4.2 for estimating sampling errors for complex demographic rates. It should be noted that the DHS survey sampling error calculation procedure has traditionally used the Taylor linearization method (Woodruff, 1971) to calculate the sampling variance for means and ratios because the linearization method is faster computationally than the replication methods.
4 SURVEY ERRORS

The estimates from a sample survey are affected by two types of errors: non-sampling errors and sampling errors. Non-sampling errors are the results of problems occurring during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts are made during the implementation of a DHS to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in a DHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

Sampling errors are addressed in some detail in Section 1.6. The following sections of this chapter concentrate on non-sampling errors, including the nature and the sources of errors and the strategies to control them. As mentioned in Section 1.6, non-sampling errors are usually the main source of errors in a sample survey, and they are difficult to evaluate statistically after the survey is complete. Therefore it is best to minimize this type of error throughout the whole survey implementation process.

4.1 Errors of coverage and non-response

A coverage error occurs when a sampling unit is mistakenly excluded from or included in the survey during survey implementation. Over-coverage occurs when a non-eligible or a non-sampled sampling unit is deliberately or mistakenly included in the sample; under-coverage occurs when a sampled eligible sampling unit is deliberately or mistakenly excluded from the sample.
Non-response, on the other hand, relates to a failed attempt to interview a sampled sampling unit. This section deals with problems in the definition and estimation of such error rates.

4.1.1 Coverage errors

In DHS surveys, errors of over-coverage (inclusion of units that do not belong in the sample) do not occur as often as under-coverage errors (errors due to exclusion of units that belong in the sample). A typical source of over-coverage occurs when vacant households or non-residential households are sampled for interview. This may occur if a household's occupancy status has changed between the time of the household listing and the household interview. Therefore, it is recommended that the time gap between the household listing and the main data collection should be reasonably small.

For under-coverage, several sources of error may be identified. The first source of under-coverage error arises in the listing stage when the listing staff covers less than the designated area. A second source occurs when an age limit is used to determine eligibility for individual interview: field staff may misreport an individual's age to push them out of the eligible age range. A third source arises when surveys collect information only from de facto individuals (i.e., those who slept in the household the night before the survey). There may be deliberate omissions of eligible individuals by consciously misreporting their residency status as non de facto, which disqualifies an individual from being eligible for interview. A fourth source arises when a series of questions in the questionnaire is only asked of a certain group. For example, questions related to pregnancy, delivery and child health are only asked for children born since a particular date, so there may be omissions of children due to mis-recording of dates of birth as before the cutoff date; or questions regarding knowledge, attitudes and practices related to HIV are only asked if the respondent is recorded as knowing HIV or AIDS, so there may be omissions of respondents due to mis-recording of their knowledge of HIV/AIDS.

All four types of coverage errors may involve deliberate bias by fieldworkers seeking to reduce their workload. Intentional errors can be controlled by intensive training and close supervision. Errors due to an outdated area frame can be reduced by scheduling the household listing operation before the main survey. Errors due to age distortion can be reduced by close supervision and routine quality control. Errors due to residency status can be reduced by changing the data collection strategy to interview all individuals within the age range regardless of their de facto status. For example, in DHS surveys, the interviewers are now instructed to interview all women age 15-49 regardless of whether they slept in the household the night before the survey. By requiring the interviewing of all women, the incentive for misreporting residency status has been eliminated. However, the de facto character of the surveys is maintained at the data analysis stage. Using different fieldworkers to conduct the household schedule and individual interviews will also help in eliminating age distortion, misreporting of residency status and mis-recording of dates and other key information. Active monitoring of fieldwork through fieldwork supervision visits and the early use of field check tabulations on collected data can also limit the scope and scale of under-coverage.

Coverage errors can be investigated after the survey fieldwork by a variety of methods. The sample can be extrapolated to the total population, and data from the last census can be extrapolated to the survey date for comparison. This check should be done separately for households and individuals. Age distortions can be investigated by studying the discontinuity in trends across the eligibility boundaries, for example, by looking at the ratio of women age 14 to those age 15, and of those age 49 to those age 50.
While it is tempting to introduce comparisons with males as a control, it should be noted that in most societies more males are educated than females, so more precise knowledge of their own age may reduce heaping at ages 15 and 50 among males compared with females.

4.1.2 Deliberate restrictions of coverage

In many surveys, whether in developed or developing countries, certain parts of the national territory are deliberately excluded from the survey for reasons of difficulty of access. Two distinct cases arise:

- Exclusion of clearly identified areas from the sampling frame: in this case, it is usual to state the coverage limitation in the survey report, which then becomes a report on the remainder of the country. Such exclusions are not regarded as coverage or response errors but simply as part of the definition of the survey domain.
- Ad hoc exclusions decided during or just prior to fieldwork: in many surveys it is not uncommon for the survey organization to abandon the attempt to conduct fieldwork in certain sampled clusters, whether due to floods, civil disturbance, or other practical constraints. Here the exclusions usually occur after sample selection. If such excluded areas form a meaningful domain, it may be acceptable to deal with the problem by redefining the survey domain. More commonly, however, the excluded areas will not form a meaningful domain and will have to be accepted as constituting errors. This type of exclusion should be classified as non-response rather than coverage error.

4.1.3 Non-response

The response rate provides information on the survey coverage problems and is an important survey parameter. At first sight, the concept of non-response seems simple and clear: it occurs when a sampled unit, household or individual, refuses to be interviewed; the non-response rate is the number of non-interviewed units divided by the number of units selected. Taking into account the distinction between coverage error and non-response indicated earlier, this can be modified by saying that the information desired is the percentage of attempted interviews that failed.

In practice, there are two features found in some sample designs which complicate this simple issue. First, in many surveys the final units for interview are identified through a progressive sifting process. For example, in a typical DHS survey, survey personnel list and select dwellings, interview the household currently in the dwelling, then interview any women age 15-49 in that household. If failure occurs at one of the earlier steps, the information which would enable us to classify the effects at the final level (i.e., the individual level) is lacking. For example, if the interviewer cannot find the selected dwelling, it is not known whether it contains a woman eligible for interview; if the household does not contain any eligible women, then the failure has no effect on the interview response rate.

To deal with this problem, take the women's survey as an example, and assume that there are only two steps in the sifting process, namely households and women. The tradition of DHS surveys is to compute the response rates for the household survey and the women's interview separately because of the way that sample weights are calculated. There are six quantities of potential interest in computing response rates:

A. Households selected
B. Households found or eligible (excluding vacant, destroyed, etc.)
C. Households interviewed
D. Women selected
E. Women found or eligible (all de facto women 15-49 found)
F. Women interviewed

Since the survey primarily concerns women, the relevant response rate is F/D (i.e., women interviewed divided by women selected). However, the quantity D is unknown because of the non-responding households.
It is of interest to know the total number of eligible women in all selected households, but only the number of women found in the households interviewed (E) is known. Therefore D must be estimated by taking the household non-response into account. Assuming that the number of eligible women per household is the same among non-responding households as it is among interviewed households, the number of women selected can be estimated as:

$$D = \frac{E}{C / B}$$

where C/B is the effective household response rate. The reason to use the effective household response rate is that the non-eligible (vacant, destroyed or other) households, A-B, are considered as over-coverage, assuming that the same over-coverage exists in the household listing. These assumptions may not be very convincing, but the effect of any departure from them on the estimate of D is likely to be very small. On this basis the overall response rate for the women's survey, R = F/D, becomes:

$$R = \frac{F}{D} = \frac{F}{E} \times \frac{C}{B}$$

This response rate is the product of the response rates observed at each of the two stages, households and women. This basic principle provides a solution for the problem of not knowing the total number of women sampled. Where two or more steps of sifting are involved, the overall response rate can be estimated by multiplying together the response rates observed at each step. In doing so, the assumption is made that the response/non-response outcomes at the different steps occur independently.

DHS surveys do not allow the replacement of non-responding households because of the potential bias which may result from the replaced households being easier to contact. However, when a sampled household in a selected dwelling moves away between the listing and the interview, the MEASURE DHS program recommends interviewing the new household (if any) that has moved in by the time of the main survey. This is not considered a replacement; in fact it reflects the fact that the sampling unit is defined as the dwelling structure rather than its occupants. The design calls for the listing and selection of dwellings, and then for the interview of the household found in the dwelling at the time of the survey. Since in many areas there is no address system, the initial listing operation has to identify the dwellings in terms of the names of the occupying households, but these merely serve as addresses. The fact that, in some cases, a new household moves in between the time of listing and interview does not mean that replacement of a sampling unit has occurred. Thus, such cases do not require any special treatment.

Moreover, just as a new household moving in does not constitute a replacement, so the case of a household moving out after the listing without another moving in, creating a vacant household, does not constitute non-response. The eligible household sample is defined as the set of households existing at the time of interviewing in the dwellings selected from the dwelling list.

4.1.4 Response rates

As seen in the previous section, the women's overall response rate is the product of the observed household and women's response rates; therefore, it is meaningful to calculate these two response rates separately. As mentioned in Section 1.13, non-response brings bias. Therefore, the different response rates reflect the data quality.
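The two-step reasoning of Section 4.1.3 can be checked with a short Python sketch. The counts used below are invented for illustration only, and the function names are not part of any DHS tool; the sketch simply evaluates D = E / (C/B) and R = (F/E) x (C/B).

```python
def estimated_women_selected(b_found, c_interviewed, e_women_found):
    """Estimate D, the number of women selected, as E divided by C/B."""
    effective_household_rate = c_interviewed / b_found
    return e_women_found / effective_household_rate


def overall_women_response_rate(b_found, c_interviewed,
                                e_women_found, f_women_interviewed):
    """Overall rate R = (F/E) * (C/B): product of the two stage rates."""
    women_rate = f_women_interviewed / e_women_found
    household_rate = c_interviewed / b_found
    return women_rate * household_rate


# Hypothetical counts: B = 1000 households found, C = 950 interviewed,
# E = 1140 eligible women found, F = 1083 women interviewed.
d_hat = estimated_women_selected(1000, 950, 1140)
rate = overall_women_response_rate(1000, 950, 1140, 1083)
print(round(d_hat), round(rate, 4))
```

In this made-up case both stage rates equal 0.95, so the overall rate is about 0.9025, which is the same value obtained by dividing F by the estimated D.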
A separate response rate is useful in sample size design and field work improvement. In order to categorize in detail the non-responding households and individuals, the MEASURE DHS program standardized the response codes to be entered on the questionnaires and field records, and expressed the formulae for response rates in terms of these codes. In DHS surveys, the following response categories are used at the household level:

1H  Completed
2H  No household member at home or no competent respondent at home
3H  Entire household absent for extended period
4H  Postponed
5H  Refused
6H  Dwelling vacant or address not a dwelling
7H  Dwelling destroyed
8H  Dwelling not found
9H  Other

Note that "household" above refers to the household found in the dwelling at the time of the interview, not necessarily the household named at the time of the listing operation. The DHS survey final reports provide the household response rate calculated by:
$$R_H = \frac{1H}{1H + 2H + 4H + 5H + 8H}$$

The reason to include 8H in the denominator is that a household that is not found at the time of the fieldwork may not be a vacant household. It may be that the household was not found because of some error that occurred during the survey implementation. Note also that this response rate is different from the weighted response rate calculated in Section 1.13. In Section 1.13 the aim is to calculate the sampling weight, while here the response rate is used as a data quality indicator. It is also worth noting that the above calculated response rate is a net response rate. For the purpose of sample size determination, one should use the gross response rate, which is the number of households interviewed over the number selected:

$$R_{HG} = \frac{1H}{1H + 2H + 3H + 4H + 5H + 6H + 7H + 8H + 9H}$$

If the net response rate is used to calculate sample size, the survey may not obtain the designed number of interviews because some of the sampled households will always end up being non-eligible, especially when there is a long time lag between household listing and the main field work.

At the individual level the following response categories are used:

1I  Completed
2I  Not at home
3I  Postponed
4I  Refused
5I  Partly completed
6I  Incapacitated
7I  Other

The individual response rate is thus:

$$R_I = \frac{1I}{1I + 2I + 3I + 4I + 5I + 6I + 7I}$$

The category "no eligible woman in the household" is not included in the list since it is irrelevant to the response rate, appearing neither in the numerator nor the denominator. The same is true for non-de facto women. Although an individual questionnaire is administered to non-de facto women who live in the household to reduce under-coverage errors as mentioned in Section 4.1.1, these interviews are not counted in the numerator or the denominator of the response rate because non-de facto women are not eligible according to the definition of eligibility.

Whenever the "other" code is used, the interviewers should specify the reason for non-response.
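As an illustration of the formulae above, the following Python sketch computes the net and gross household response rates and the individual response rate from counts by result code. The dictionaries of counts are hypothetical, and the function names are chosen for this example only.

```python
def household_response_rates(counts):
    """counts maps household result codes 1..9 (1H..9H) to numbers of households.

    Returns (net_rate, gross_rate):
      net   = 1H / (1H + 2H + 4H + 5H + 8H)
      gross = 1H / (1H + 2H + ... + 9H)
    """
    completed = counts.get(1, 0)
    net_denominator = sum(counts.get(code, 0) for code in (1, 2, 4, 5, 8))
    gross_denominator = sum(counts.get(code, 0) for code in range(1, 10))
    return completed / net_denominator, completed / gross_denominator


def individual_response_rate(counts):
    """counts maps individual result codes 1..7 (1I..7I) to numbers of women."""
    return counts.get(1, 0) / sum(counts.get(code, 0) for code in range(1, 8))


# Hypothetical field results, for illustration only.
households = {1: 900, 2: 20, 3: 15, 4: 5, 5: 15, 6: 30, 7: 5, 8: 10, 9: 0}
women = {1: 950, 2: 20, 3: 5, 4: 15, 5: 5, 6: 3, 7: 2}

net, gross = household_response_rates(households)
print(round(net, 3), round(gross, 3), round(individual_response_rate(women), 3))
```

With these invented counts the net rate (900/950) is higher than the gross rate (900/1000), which shows why the gross rate is the safer input for sample size determination.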
At the household level, the analyst should review a printout of the "other" codes and recode as many as possible into the existing categories. Similarly, all "other" codes for the individual interview should be examined and recoded. Any questionnaire in which the household or the woman was deemed ineligible should be clearly marked as ineligible and removed from the data file. An ineligible household may be one in a dwelling unit that does not lie within the sample area or a neighboring household that was interviewed incorrectly as a replacement household. An ineligible woman may be one who was reported as 16 years old in the household questionnaire, but later turned out to be 14 (in which case her age in the household questionnaire should be corrected appropriately).

The overall response rate is obtained by multiplying the household and the individual level response rates:

$$R = R_H \times R_I$$

However, if there has been a deliberate exclusion of certain areas such as clusters which were not interviewed (see Section 1.13 on cluster level non-response), the overall response rate must also take the cluster response rate into account. In summary, the final overall estimated response rate is obtained from the formula:

$$R = R_H \times R_I \times R_C$$

where $R_C = n^* / n$ is the ratio of the number of clusters interviewed over the number selected.

Such response rates should be computed and published separately for the main geographic domains of the sample as well as the whole survey domain. If the sample is self-weighting within domain but has different weights across domains, the response rates should be computed and published for each differently weighted domain.

4.2 Sampling errors

We introduced the concept of sampling errors in Section 1.6 for sample size determination. In this section, we focus on the calculation of the sampling errors. Sampling errors are usually reported for selected indicators in Appendix B of the DHS final report. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design (DHS reports +/-2*SE instead of +/-1.96*SE as the 95% confidence interval, as explained in Section 1.6.1).
If the sample of respondents were selected as a simple random sample, it would have been possible to use straightforward formulae to calculate sampling errors. However, DHS survey samples are the result of a multi-stage stratified design, so it is necessary to use more complex formulae. There is a variety of computer software which can be used to calculate sampling errors, such as the Integrated System for Survey Analysis (ISSA) sampling errors module and the ICF developed SAS macro, as well as software such as Wesvar, Cenvar, and Sudaan. These packages use the Taylor Linearization Method (Woodruff, 1971) of variance estimation for survey estimates that are means or proportions. This same method is widely used in commercial statistical software such as SAS, SPSS and Stata. The Jackknife Repeated Replication Method (Efron, 1982, 1993) is used for variance estimation of more complex statistics such as fertility and mortality rates.

The Taylor Linearization Method treats any percentage or average as a ratio estimate, r = y/x, where y represents the total weighted sample value for variable y, and x represents the total weighted sample value for variable x or the total number of weighted cases in the group or subgroup under consideration. The variance of r is computed using the formula given below, with the standard error being the square root of the variance:

$$SE^2(r) = \mathrm{var}(r) = \frac{1}{x^2} \sum_{h=1}^{H} \left[ \frac{n_h (1 - f_h)}{n_h - 1} \left( \sum_{j=1}^{n_h} z_{hj}^2 - \frac{z_h^2}{n_h} \right) \right]$$

in which

$$z_{hj} = y_{hj} - r x_{hj}, \quad \text{and} \quad z_h = y_h - r x_h$$

where
h represents the sampling stratum, which varies from 1 to H,
$n_h$ is the total number of clusters selected in the h-th stratum,
$y_{hj}$ is the sum of weighted values of variable y in the j-th cluster in the h-th stratum,
$x_{hj}$ is the sum of weighted values of variable x in the j-th cluster in the h-th stratum,
$f_h$ is the sampling fraction in stratum h, which can be ignored when it is small,
x is the sum of weighted values of variable x over the total sample.

The Jackknife Repeated Replication Method derives estimates of complex rates from each of several replications of the parent sample, and calculates standard errors for these estimates using simple formulae. Each replication considers all but one cluster in the calculation of the estimates. Pseudo-independent replications are thus created. The variance of a rate r is calculated as follows:

$$SE^2(r) = \mathrm{Var}(r) = \frac{1}{k(k-1)} \sum_{i=1}^{k} (r_i - r)^2$$

in which

$$r_i = k r - (k - 1) r_{(i)}$$

where r is the estimate computed from the full sample of k clusters, $r_{(i)}$ is the estimate computed from the reduced sample of k-1 clusters (with the i-th cluster excluded), and k is the total number of clusters.

In addition to the standard error, the procedure computes the design effect (DEFT) for estimates which are means, proportions or ratios. For complex demographic rates, the procedure computes an approximation of DEFT. DEFT is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. The procedure also computes the relative error and confidence limits for the estimates.

Sampling errors are usually reported for the total sample, for the urban and rural areas, and for each of the survey domains.
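The two variance formulae in this section can be illustrated with a minimal Python sketch. It is not the ISSA module or the ICF SAS macro; the data layout (one tuple of stratum, weighted cluster total of y, and weighted cluster total of x per cluster) and the function names are assumptions made for the example, and the sampling fractions are ignored unless supplied.

```python
from collections import defaultdict
from math import sqrt


def taylor_se_ratio(clusters, fractions=None):
    """Taylor linearization standard error of r = y/x.

    clusters:  list of (stratum, y_hj, x_hj) weighted cluster totals
    fractions: optional dict stratum -> f_h; treated as 0 when omitted
    Each stratum needs at least two clusters.
    """
    y = sum(c[1] for c in clusters)
    x = sum(c[2] for c in clusters)
    r = y / x
    z_by_stratum = defaultdict(list)
    for h, y_hj, x_hj in clusters:
        z_by_stratum[h].append(y_hj - r * x_hj)        # z_hj
    variance = 0.0
    for h, z in z_by_stratum.items():
        n_h = len(z)
        f_h = 0.0 if fractions is None else fractions[h]
        z_h = sum(z)
        sum_sq = sum(v * v for v in z)
        variance += n_h * (1 - f_h) / (n_h - 1) * (sum_sq - z_h ** 2 / n_h)
    return r, sqrt(variance / x ** 2)


def ratio(clusters):
    """Ratio estimate r = y/x from cluster totals."""
    return sum(c[1] for c in clusters) / sum(c[2] for c in clusters)


def jackknife_se(clusters, estimator):
    """Delete-one-cluster jackknife standard error of estimator(clusters)."""
    k = len(clusters)
    r = estimator(clusters)
    pseudo_values = []
    for i in range(k):
        r_without_i = estimator(clusters[:i] + clusters[i + 1:])   # r_(i)
        pseudo_values.append(k * r - (k - 1) * r_without_i)        # r_i
    variance = sum((r_i - r) ** 2 for r_i in pseudo_values) / (k * (k - 1))
    return r, sqrt(variance)


# Hypothetical data: one stratum, four clusters with equal weighted sizes.
data = [(1, 2.0, 10.0), (1, 4.0, 10.0), (1, 6.0, 10.0), (1, 8.0, 10.0)]
print(taylor_se_ratio(data))
print(jackknife_se(data, ratio))
```

For this small made-up data set the two methods agree, because with equal cluster sizes the ratio is a linear function of the cluster totals; for complex rates such as fertility or mortality, only the jackknife function would be usable, with an estimator that recomputes the rate from the retained clusters.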
5 SAMPLE DOCUMENTATION

5.1 Introduction

Sample documentation is an important part of a DHS survey. The documentation should include all useful information for data analysis, for data quality assessment, for sample design of subsequent surveys, and for data users. Basic sample documentation should be included in DHS survey final reports. Good sample documentation should include the following aspects from different stages of the survey implementation:

1) Target population
2) Expected sample size
3) Main indicators
4) Report domains
5) Sampling frame
6) Primary and secondary sampling units
7) Stratification
8) Sample allocation
9) Sampling procedure
10) Selection probability
11) Household listing results
12) Sampling weights
13) Results of survey implementation
14) Sampling errors

Points 1 to 10 and point 12 are usually addressed in a Sample Design Document from the very beginning of the survey. For point 11, the number of households listed, the number of households selected, and segmentation information for each of the selected clusters should be provided. A full description of the sample design should be included in Appendix A of the DHS final reports. For point 13, the number of eligible sampling units selected, the number interviewed and the household and individual response rates should be presented. Sampling errors (point 14) are presented in Appendix B of DHS final reports for selected indicators.

5.2 Sample design document

A sample design document is an important document which records the purpose of the survey, the target population, the source of the sampling frame, the statistical methodology, the sample size and the sample allocation, and other related topics. This section gives an example of a sample design document to show the details which should be included.

5.2.1 Introduction

The Country Demographic and Health Survey 2012 (XDHS 2012) will be the fourth DHS following those implemented in 1995, 2000 and 2005. A nationally representative sample of 18,450 households will be selected.
All women 15-49 who are usual residents of a selected household or who slept in a selected household the night before the survey are eligible for the survey. The survey will result in about 17,900 interviews of women 15-49. As with the prior surveys, the main objectives of the XDHS 2012 survey are to provide up-to-date information on fertility and childhood mortality levels; fertility preferences; awareness, approval and use of family planning methods; maternal and child health; knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI).
Apart from the women's survey, a men's survey will also be conducted at the same time in a sub-sample consisting of one household in every three selected for the women's survey. All men 15-59 who are usual residents of a selected household or who slept in a selected household the night before the survey are eligible for the men's survey. The survey will collect information on their basic demographic and social status; on their knowledge and use of family planning methods; and on their knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections. The survey will result in about 5,000 interviews of men 15-49. In this sub-sample, all women 15-49 and all children under 5 years of age will be weighed, measured and tested for anemia in order to study their nutritional status.

The survey is designed to produce representative estimates for most of the indicators for the country as a whole, for the urban and the rural areas separately, for the capital city of the country, and for each of the ten geographical regions.

5.2.2 Sampling frame

The sampling frame used for XDHS 2012 is the Country Population and Housing Census conducted in 2006 (XPHC 2006), provided by the Central Statistical Office (CSO). CSO has made available an electronic file consisting of 81,654 Enumeration Areas (EAs) created for the 2006 census in 9 of its 10 regions. An EA is a geographic area consisting of a convenient number of dwelling units which served as a counting unit for the census. The frame file contains information about the location, the type of residence and the number of residential households for each of the 81,654 EAs. Sketch maps which delineate the geographic boundaries are also available for each EA. It should be pointed out that this file does not include Region 10 because the census conducted in Region 10 used a different methodology due to difficulty of access. Therefore, the sampling frame for Region 10 is in a different file and uses a different format.
It is also worth noting that the sampling frame excluded some special EAs which have disputed boundaries; this kind of EA represents only 0.1% of the total population.

The census cartographic work for Region 10 was conducted using two different methods. In two of its six districts, namely Districts 2 and 4, traditional cartographic work similar to that in the other regions of the country was carried out, while in the other four districts, the cartographic work was carried out by using satellite photos without physical visits to the area. The census data could not be used to update the cartographic work in Region 10 because of coding problems. So in Region 10, a sampling frame with a format similar to that of the other regions is available only for the two districts where traditional cartographic work had been carried out. However, the number of households in the sampling frame for these two districts is based on the number of households estimated during the cartographic work preceding the census and not the actual number of households counted in the census. Due to security concerns, as in the XDHS 2000 and XDHS 2005, it has been decided that the XDHS 2012 will be conducted only in these two districts. These two districts together have 1,246 EAs, and they represent 53% of the regional total population. Taking into account the special EAs which are excluded from the census frame, the sampling frame used for the XDHS 2012 covers 98.4% of the country's total population.

Country is divided into 10 geographical regions; each region is sub-divided into districts, and each district into wards. Table 5.1 shows the distribution of the EAs and the mean number of households per EA by region and by type of residence. The sampling frame includes 82,900 EAs, of which 17,346 are in urban areas and 65,554 are in rural areas. The average size of an EA in terms of number of households is 170 in an urban EA and 182 in a rural EA, for an overall average size of 180 households per EA. Table 5.2 shows the distribution of households by region and by type of residence.
The distribution is very skewed, since 83.4% of the country's households are concentrated in 3 regions, namely Region 3, Region 4 and Region 6, while the five small regions Region 2, Region 5, Region 7, Region 8 and Region 9 together represent only 3.8% of the country's total households.

Table 5.1 Distribution of EAs and average size of EA by region and by type of residence

| Region | Number of EAs, urban | Number of EAs, rural | Number of EAs, total | Average EA size, urban | Average EA size, rural | Average EA size, total |
|---|---|---|---|---|---|---|
| Region 1 | 1,541 | 4,139 | 5,680 | 153 | 177 | 171 |
| Region 2 | 260 | 828 | 1,088 | 177 | 233 | 219 |
| Region 3 | 3,391 | 18,016 | 21,407 | 183 | 182 | 182 |
| Region 4 | 5,030 | 25,800 | 30,830 | 172 | 179 | 178 |
| Region 5 | 188 | 786 | 974 | 140 | 152 | 150 |
| Region 6 | 2,124 | 14,490 | 16,614 | 166 | 184 | 182 |
| Region 7 | 133 | 347 | 480 | 145 | 129 | 134 |
| Region 8 | 172 | 98 | 270 | 163 | 180 | 169 |
| Capital City | 3,865 | | 3,865 | 167 | | 167 |
| Region 9 | 318 | 128 | 446 | 163 | 169 | 165 |
| Region 10* | 324 | 922 | 1,246 | 154 | 267 | 237 |
| Country | 17,346 | 65,554 | 82,900 | 170 | 182 | 180 |

Source: XPHC 2006. * Region 10 has only two districts included.

Table 5.2 Distribution of households by region and by type of residence

| Region | Households, urban | Households, rural | Households, total | Proportion urban | Proportion of country total |
|---|---|---|---|---|---|
| Region 1 | 235,530 | 734,357 | 969,887 | 0.243 | 0.065 |
| Region 2 | 45,910 | 192,554 | 238,464 | 0.193 | 0.016 |
| Region 3 | 619,796 | 3,284,512 | 3,904,308 | 0.159 | 0.262 |
| Region 4 | 864,303 | 4,630,702 | 5,495,005 | 0.157 | 0.369 |
| Region 5 | 26,314 | 119,446 | 145,760 | 0.181 | 0.010 |
| Region 6 | 353,554 | 2,667,787 | 3,021,341 | 0.117 | 0.203 |
| Region 7 | 19,275 | 44,879 | 64,154 | 0.300 | 0.004 |
| Region 8 | 27,975 | 17,651 | 45,626 | 0.613 | 0.003 |
| Capital City | 646,216 | 0 | 646,216 | 1.000 | 0.043 |
| Region 9 | 51,991 | 21,643 | 73,634 | 0.706 | 0.005 |
| Region 10* | 49,844 | 245,922 | 295,766 | 0.169 | 0.020 |
| Country | 2,940,708 | 11,959,453 | 14,900,161 | 0.197 | 1.000 |

Source: XPHC 2006. * Region 10 has only two districts included.

5.2.3 Structure of the sample and the sampling procedure

The sample for the XDHS 2012 will be a stratified sample selected in two stages from the 2006 census frame. Stratification was achieved by separating each region into urban and rural areas. In total, 19 sampling strata have been created since the region of Capital has only urban areas. Samples will be selected independently in each sampling stratum, by two-stage selection. Implicit stratification and proportional allocation are achieved at each of the lower administrative levels by sorting the
sampling frame according to administrative units in different levels and by using a probability proportional to size selection at the first stage of sampling.

In the first stage, 615 EAs have been selected with probability proportional to EA size and with independent selection in each sampling stratum, with the sample allocation given in Table 5.3 below. Taking into account the time passed since the last population census, a household listing operation will be carried out in all of the selected EAs before the main survey. The household listing operation consists of visiting each of the 615 selected EAs; drawing a location map and a detailed sketch map; and recording on the household listing forms all residential households found in the EA with the address and the name of the head of the household. The resulting list of households will serve as the sampling frame for the selection of households in the second stage.

Some of the selected EAs may be found to be large in size in the household listing operation. In order to minimize the task of household listing, the selected EAs containing an estimated number of households greater than 300 will be segmented. Only one segment will be selected for the survey, with probability proportional to the segment size. The methodology and the detailed household listing procedure are addressed in the Household Listing Manual (see Chapter 2).

At the second stage, a fixed number of 30 households will be selected from each EA. Table 5.3 shows the sample distribution of clusters and households by region and by type of residence. Among the 615 EAs selected, 185 are in urban areas and 430 are in rural areas. The total number of households to be selected is 18,450; among them, 5,550 will be in urban areas and 12,900 will be in rural areas.

In the sampling frame, the household distribution by region varies from 0.3 percent for Region 8 to 36.9 percent for Region 4 (see Table 5.2 in Section 5.2.2).
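The two selection stages described in this section can be sketched in Python as follows. This is a minimal illustration and not the program used for the XDHS 2012: the function names `pps_systematic` and `systematic_sample`, the toy EA sizes and the random seed are assumptions introduced here. Systematic selection along the sorted frame is assumed for the first stage, as one common way of combining probability proportional to size with implicit stratification, and equal probability systematic selection is assumed for the households.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Systematic PPS selection of n units from a frame kept in sorted order.

    sizes: measure of size of each EA (number of households in the frame).
    Returns 0-based positions of the selected EAs. An EA larger than the
    sampling interval can be hit more than once.
    """
    cum = list(accumulate(sizes))        # cumulated sizes along the frame
    interval = cum[-1] / n               # sampling interval
    start = rng.random() * interval      # random start in [0, interval)
    selected, idx = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the EA whose cumulated size first exceeds the point
        while idx < len(cum) - 1 and cum[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_listed, n_take, rng):
    """Equal probability systematic selection of households from a listing.

    Returns 0-based positions in the household listing. If the listing has
    no more than n_take households, all of them are taken.
    """
    if n_listed <= n_take:
        return list(range(n_listed))
    interval = n_listed / n_take
    start = rng.random() * interval
    return [min(int(start + k * interval), n_listed - 1) for k in range(n_take)]


rng = random.Random(2012)                # fixed seed, for reproducibility only
ea_sizes = [150, 210, 180, 95, 300, 170, 220, 160, 140, 190]  # toy stratum
selected_eas = pps_systematic(ea_sizes, 3, rng)
households = systematic_sample(187, 30, rng)   # 30 households out of 187 listed
```

The same `pps_systematic` function, called with the estimated segment sizes and `n=1`, would select one segment of a segmented EA with probability proportional to segment size.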
To allocate the approximately 17,900 women's interviews to the different regions, a proportional allocation would provide the best precision for national level indicators, but not for regional level indicators. The small regions such as Region 7, Region 8 and Region 9 would receive a sample size which is too small to achieve the degree of precision desired for regional level estimates. In order for the precision of estimates to be acceptable across regions, experience shows that a minimum of 800 women's interviews are needed so that reliable estimates for most of the DHS indicators can be obtained. The final sample allocation reflects a power allocation, which lies between the proportional allocation and the equal size allocation. So that the survey precision in the urban areas is comparable with that in the rural areas, urban areas are slightly over-sampled.

The allocations of clusters and households by region and by type of residence are functions of the estimated average number of women age 15-49 per household and of the household and individual response rates. Estimates for these parameters are obtained from the XDHS 2005 survey. According to the results of XDHS 2005, the average number of women age 15-49 per household is 1.20 in urban areas and 1.00 in rural areas. The number of men age 15-49 per household is 1.05 in urban areas and 0.95 in rural areas. The household response rates are 92 percent in urban areas and 94 percent in rural areas; the women's response rates are 94 percent and 96 percent in the urban and rural areas, respectively; the men's response rates are 85 percent and 90 percent in the urban and rural areas, respectively.
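A power allocation can be sketched as follows, with the sample size of a region taken proportional to its number of households raised to a power between 0 and 1. This is an illustration under stated assumptions: the exponent `alpha`, the function name `power_allocation` and the use of the household totals of Table 5.2 as the measure of size are choices made here, not parameters reported for the XDHS 2012, so the output is not expected to reproduce Table 5.3.

```python
def power_allocation(sizes, total_sample, alpha):
    """Allocate total_sample across domains proportionally to size ** alpha.

    alpha = 1 gives the proportional allocation, alpha = 0 the equal size
    allocation; values in between give a compromise between the two.
    """
    powered = {name: size ** alpha for name, size in sizes.items()}
    scale = total_sample / sum(powered.values())
    return {name: scale * value for name, value in powered.items()}


# Total number of households per region, taken from Table 5.2
households = {
    "Region 1": 969_887, "Region 2": 238_464, "Region 3": 3_904_308,
    "Region 4": 5_495_005, "Region 5": 145_760, "Region 6": 3_021_341,
    "Region 7": 64_154, "Region 8": 45_626, "Capital City": 646_216,
    "Region 9": 73_634, "Region 10": 295_766,
}

proportional = power_allocation(households, 18_450, alpha=1.0)
equal = power_allocation(households, 18_450, alpha=0.0)
compromise = power_allocation(households, 18_450, alpha=0.25)  # hypothetical alpha

for region in households:
    print(f"{region:13s} {proportional[region]:8.0f} "
          f"{compromise[region]:8.0f} {equal[region]:8.0f}")
```

Lowering `alpha` moves households from the large regions to the small ones, which is the trade-off between national and regional precision described above.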
Table 5.3 Sample allocation of clusters and households by region and by type of residence

| Region | Clusters, urban | Clusters, rural | Clusters, region | Households, urban | Households, rural | Households, region |
|---|---|---|---|---|---|---|
| Region 1 | 13 | 47 | 60 | 390 | 1,410 | 1,800 |
| Region 2 | 10 | 38 | 48 | 300 | 1,140 | 1,440 |
| Region 3 | 10 | 62 | 72 | 300 | 1,860 | 2,160 |
| Region 4 | 13 | 62 | 75 | 390 | 1,860 | 2,250 |
| Region 5 | 6 | 42 | 48 | 180 | 1,260 | 1,440 |
| Region 6 | 7 | 65 | 72 | 210 | 1,950 | 2,160 |
| Region 7 | 9 | 37 | 46 | 270 | 1,110 | 1,380 |
| Region 8 | 25 | 17 | 42 | 750 | 510 | 1,260 |
| Capital City | 54 | na | 54 | 1,620 | na | 1,620 |
| Region 9 | 27 | 15 | 42 | 810 | 450 | 1,260 |
| Region 10 | 11 | 45 | 56 | 330 | 1,350 | 1,680 |
| Country | 185 | 430 | 615 | 5,550 | 12,900 | 18,450 |

Table 5.4 Expected number of interviews by region and by type of residence

| Statistical region | Women interviewed, urban | Women interviewed, rural | Women interviewed, region | Men interviewed, urban | Men interviewed, rural | Men interviewed, region |
|---|---|---|---|---|---|---|
| Region 1 | 434 | 1,280 | 1,714 | 98 | 358 | 456 |
| Region 2 | 333 | 1,035 | 1,368 | 76 | 290 | 366 |
| Region 3 | 333 | 1,689 | 2,022 | 76 | 472 | 548 |
| Region 4 | 434 | 1,689 | 2,123 | 98 | 472 | 570 |
| Region 5 | 200 | 1,144 | 1,344 | 45 | 320 | 365 |
| Region 6 | 233 | 1,771 | 2,004 | 53 | 495 | 548 |
| Region 7 | 299 | 1,008 | 1,307 | 69 | 282 | 351 |
| Region 8 | 834 | 463 | 1,297 | 189 | 130 | 319 |
| Capital City | 1,800 | na | 1,800 | 408 | na | 408 |
| Region 9 | 901 | 409 | 1,310 | 205 | 114 | 319 |
| Region 10 | 367 | 1,226 | 1,593 | 83 | 342 | 426 |
| Country | 6,168 | 11,714 | 17,882 | 1,400 | 3,275 | 4,676 |

Note: The men's survey will be carried out in one household in every three selected for the women's survey.

5.2.4 Selection probability and sampling weight

Due to the non-proportional allocation of the sample to the different regions and to their urban and rural areas, sampling weights will be required for any analysis using XDHS 2012 data to ensure that the survey results are representative at national and regional levels. Since the XDHS 2012 sample is a two-stage stratified cluster sample, sampling weights will be calculated based on the separate sampling probabilities for each sampling stage and for each cluster. We use the following notations:
$P_{1hi}$: first-stage sampling probability of the $i$-th cluster in stratum $h$

$P_{2hi}$: second-stage sampling probability within the $i$-th cluster (household selection)

Let $n_h$ be the number of clusters selected in stratum $h$, $M_{hi}$ the number of households according to the sampling frame in the $i$-th cluster, and $\sum_i M_{hi}$ the total number of households in the stratum. The probability of selecting the $i$-th cluster in the XDHS 2012 sample is calculated as follows:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}}$$

A different formula must be used to calculate the probability of selecting a cluster that has been segmented. Let $b_{hi}$ be the proportion of households in the selected segment compared to the total number of households in EA $i$ in stratum $h$ if the EA is segmented; otherwise $b_{hi} = 1$. Then the probability of selecting cluster $i$ in the sample is:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}} \times b_{hi}$$

Let $L_{hi}$ be the number of households listed in the household listing operation in cluster $i$ in stratum $h$, and let $t_{hi}$ be the number of households selected in the cluster. The second-stage selection probability for each household in the cluster is calculated as follows:

$$P_{2hi} = \frac{t_{hi}}{L_{hi}}$$

The overall selection probability of each household in cluster $i$ of stratum $h$ is therefore the product of the two selection probabilities:

$$P_{hi} = P_{1hi} \times P_{2hi}$$

The design weight for each household in cluster $i$ of stratum $h$ is the inverse of its overall selection probability:

$$W_{hi} = 1 / P_{hi}$$

A spreadsheet containing all sampling parameters and selection probabilities is prepared to facilitate the calculation of sampling weights. Sampling weights will be adjusted for household non-response as well as for individual non-response, for the women's and men's surveys respectively. The differences between the household weights and the individual weights are introduced by individual non-response. The final weights are normalized so that the total number of unweighted cases will equal the total number of weighted cases at the national level, for both household weights and individual weights.
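The design weight calculation and the final normalization can be sketched in Python as follows. The stratum figures used in the example (13 clusters, 235,530 households) are those of the urban stratum of Region 1 in Tables 5.3 and 5.2; the EA itself (160 households in the frame, 175 households listed) is hypothetical, as are the function names. Non-response adjustment is left out of this sketch.

```python
def first_stage_prob(n_h, m_hi, m_h_total, b_hi=1.0):
    """First-stage probability: n_h * M_hi / sum_i(M_hi) * b_hi.

    b_hi is the share of the EA's households in the selected segment,
    and equals 1 when the EA has not been segmented.
    """
    return n_h * m_hi / m_h_total * b_hi


def second_stage_prob(t_hi, l_hi):
    """Second-stage probability: households selected over households listed."""
    return t_hi / l_hi


def design_weight(n_h, m_hi, m_h_total, t_hi, l_hi, b_hi=1.0):
    """Design weight: inverse of the overall selection probability."""
    p1 = first_stage_prob(n_h, m_hi, m_h_total, b_hi)
    p2 = second_stage_prob(t_hi, l_hi)
    return 1.0 / (p1 * p2)


def normalize(weights):
    """Rescale case weights so that their sum equals the number of cases."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


# Hypothetical EA in the urban stratum of Region 1, not segmented
w = design_weight(n_h=13, m_hi=160, m_h_total=235_530, t_hi=30, l_hi=175)
print(round(w, 2))

# Toy list of case weights, normalized so that the weighted total is 3 cases
normalized = normalize([w, 573.7, 120.0])
```

Each household selected in this hypothetical EA would represent roughly 660 households of the stratum before any non-response adjustment.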
5.3 Sample file

A sample file including all sampling parameters is very important for survey management and for sampling weight calculation. Once the sample points are selected, an Excel file should be prepared which includes the cluster number and cluster ID information, and all sampling parameters such as the domain, stratum and EA selection probability. The cluster number is a unique serial number
from 1 to the total number of clusters selected. It is important for communication and for field work supervision. The cluster number is the official cluster ID once assigned. It is also useful to include in the sample file the EA size, the total size of the stratum, the number of EAs in the stratum and the number of EAs selected in the stratum. These pieces of information allow for reconstruction of the selection probability if needed, for example for checking purposes and for replacement clusters. If a selected cluster is not accessible due to security problems and a replacement cluster is selected, then it is easy to calculate the selection probability for the replacement cluster from the sampling parameters.

Table 5.5 below shows a part of an example sample file. The columns with the lighter colored headings represent the sampling information provided by the sampling statistician. The columns with the darker colored headings represent the EA identification information from the sampling frame. This file should be updated after the household listing operation by adding the number of households listed, the segmentation information, and the number of households selected. These 3 pieces of information are necessary for developing the design weight for each cluster.
Table 5.5 An example sample file
5.4 Results of survey implementation

Once the field work for the survey has been completed and the data entry is finished, some tables for the results of the survey implementation should be produced to evaluate the survey coverage and the departures from the survey design. These tables typically include a summary table and individual tables for the household, women's and men's surveys, respectively. A summary table is usually presented in Chapter 1 of the DHS final report, including the number of clusters selected and interviewed, the number of households selected and interviewed, the number of women selected and interviewed, and the number of men selected and interviewed. The detailed tables for the household, women's and men's surveys are usually presented in Appendix A of the DHS final report along with the sample design document. These tables reflect both the survey coverage and the data quality, and provide various response rates and the number of eligible individuals per household, which are useful information for the sample design of subsequent surveys. The following tables are example tables that should be included in the final report.

Table 5.6 Example table for the results of survey implementation
Table 5.7 Example appendix table for the results of the women's survey implementation
Table 5.8 Example appendix table for the results of the men's survey implementation

5.5 Sampling errors

Sampling errors are important data quality parameters which give a measure of the precision of the survey estimates. The DHS survey final reports present sampling errors in Appendix B for selected indicators. The sampling error tables present the estimated indicator value, the standard error, the number of unweighted and weighted cases, the design effect, the relative standard error and the confidence limits. The design effect can be used in sample size calculation for subsequent survey designs. Section 4.2 deals with the details of the calculation of sampling errors; here we give an example of the national level sampling error table.
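The derived columns of a sampling error table are related to the estimate and its standard error as in the sketch below. It follows the definitions given in the glossary of this manual (design effect as a ratio of standard errors, relative standard error, and confidence limits of plus or minus two standard errors). The function name, the input values and the assumption that the standard error under simple random sampling (`se_srs`) is already available are illustrative only.

```python
def sampling_error_row(estimate, se, se_srs):
    """Derive the design effect, relative standard error and confidence limits.

    estimate: estimated value of the indicator
    se:       standard error under the actual sample design
    se_srs:   standard error a simple random sample of the same size would give
    """
    return {
        "estimate": estimate,
        "se": se,
        "deft": se / se_srs,          # design effect (Deft)
        "rse": se / estimate,         # relative standard error
        "lower": estimate - 2 * se,   # lower confidence limit
        "upper": estimate + 2 * se,   # upper confidence limit
    }


# Hypothetical indicator: a proportion of 0.25 with a standard error of 0.01
row = sampling_error_row(estimate=0.25, se=0.01, se_srs=0.008)
for name, value in row.items():
    print(f"{name:9s} {value:.3f}")
```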
Table 5.9 Example table for sampling errors

5.6 Sampling parameters in DHS data files

Some important sampling parameters should be included in the DHS final data set, such as domain, stratum, EA selection probability, and sampling weights. DHS survey final data files usually present geographic identifiers only down to domain or region level; district level identifiers are usually not presented due to confidentiality constraints. As for the sampling stratum identifier, DHS final data files should provide the true sampling stratum, which is important for many statistical analyses such as the sampling error calculation. However, in case of small strata having only a few clusters selected,
confidentiality constraints do not allow DHS data files to present the true sampling stratum identifier. In these cases, a higher level stratification identifier is included instead, which should be close to the true stratification and will not introduce substantial bias.

The standard sampling parameters included in the DHS Recode data files include:

1) Cluster indicator variable
2) Stratification variable
3) Sampling weight variables
4) Survey domain variables
5) First level geographical/administrative unit variable (region or province or department, etc.)
Glossary of terms

Analysis domain: A sub-population which cannot be identified in the sampling frame, such as domains specified by individual characteristics. See also Design domain.

Base map: A reference map that describes the geographic location and boundaries of an EA.

Cluster: The smallest geographic survey statistical unit for DHS surveys. It consists of a number of adjacent households in a geographic area. For DHS surveys, a cluster corresponds either to an EA or a segment of a large EA.

Collective living quarters: Living quarters such as army camps, boarding schools, or prisons where persons live individually. Collective living quarters are not considered as ordinary households and are excluded from DHS samples.

Confidence interval: A range within which the true value of an estimate likely lies. Usually reported as: with 95% confidence, the true value of Y will lie within the range of y - 1.96 * SE(y) and y + 1.96 * SE(y). Typically, DHS reports use y ± 2 * SE(y) for a conservative estimate of 95% confidence limits.

Degrees of freedom: The number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic.

Design domain: A sub-population which can be identified in the sampling frame and therefore can be handled independently in the sample size and sampling procedures, usually consisting of geographic areas or administrative units. See also Analysis domain.

Design Effect (Deft): A measure of efficiency of a complex sampling procedure compared to simple random sampling, defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used.

Design weight: The inverse of the overall probability with which a sampling unit (household or individual) was selected in the sample. See also Sampling weight.

Desired precision: The level of accuracy of the results desired, often expressed as relative standard error or coefficient of variation.

Dwelling unit: A room or a group of rooms normally intended as a residence for one household (for example: a single house, an apartment, a group of rooms in a house); a dwelling unit can have more than one household.
Enumeration Area (EA): A geographic statistical unit which is created as a counting unit for a census and contains a certain number of households.

Explicit stratification: The actual division of the sampling units into specified parts known as strata. See also Implicit stratification.

Gross response rate: The number of households or individuals interviewed over the number selected.

Head of household: A person who is acknowledged as such by members of the household and who is usually responsible for the upkeep and maintenance of the household.

Household: A person or a group of related or unrelated persons, who live together in the same dwelling unit, who acknowledge one adult male or female 15 years old or older as the head of the household, who share the same housekeeping arrangements, and are considered as one unit.

Household listing: A complete listing of dwelling units/households in the selected EAs prepared prior to the selection of households.

Household selection: Random selection of the households from the household listing, typically by systematic selection.

Implicit stratification: The systematic sampling or probability proportional to size sampling of sampling units from an ordered list to achieve the effect of stratification. See also Explicit stratification.

Item non-response: A sampling unit does not provide an answer for a specific question. See also Unit non-response.

Location map: A map produced in the household listing operation which indicates the main access to a cluster, including main roads and main landmarks in the cluster.

Master sample: A random sample of large size drawn from the census frame and prepared for use in a number of surveys, from which sub-samples can be selected for specific surveys.

Measure of size: A measurement reflecting the size of the sampling unit, typically the number of households or the total population of the sampling unit, available for each and every primary sampling unit in the country.

Non-sampling errors: Non-sampling errors result from problems during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors.
Normalized standard weights: Sampling weight normalized by a constant factor such that the unweighted number of cases is the same as the weighted number of cases at the national level. Normalized standard weights are calculated for total households, total women and total men.

Primary Sampling Unit (PSU): The sampling unit for the first stage of selection in a multi-stage sampling procedure; in DHS, typically an EA or a segment of an EA.

Probability sample: A sample in which the units are selected randomly with known and nonzero probabilities.

Relative standard error (RSE): The amount of sampling error relative to the indicator level, independent of the scale of the indicator, calculated by dividing the standard error by the estimated value of the indicator.

Sample take: The number of households or individuals to be interviewed per sample cluster.

Sampling errors: Sampling errors are the representative errors due to sampling of a small number of eligible units from the target population instead of including every eligible unit in the survey.

Sampling frame: A complete list of all sampling units that entirely covers the target population.

Sampling unit: The unit of selection at each stage of the sampling process. In a typical DHS with two-stage cluster sampling, the sampling unit at the first stage (Primary sampling unit) would be the EA, and the sampling unit at the second stage (Secondary sampling unit) would be the household.

Sampling weight: The design weight corrected for non-response or other calibrations.

Secondary Sampling Unit (SSU): The sampling unit for the second stage of selection; in a typical DHS two-stage sample this is a household.

Self-weighting sample: A sample of individuals in which each individual has the same probability of being selected, and therefore a constant sampling weight is used. Also known as an equal probability sample.

Simple random sample (SRS): A random selection of individuals or households drawn directly from the target population with each individual or household having equal probability of being selected.

Sketch map: A map produced in the household listing operation, with the location of all structures found in the listing operation, which helps the interviewer locate the selected households. A sketch map also contains the cluster identification information, location information, access information, and principal physical features and landmarks such as mountains, rivers, roads and electric poles.

SRSWOR: Simple random sample without replacement.
Standard error (SE): The standard deviation of the sampling distribution of a statistic, or representative error due to sampling. See also Sampling errors.

Stratification: The process by which the survey population is divided into subgroups or strata that are as homogeneous as possible based on certain criteria. The principal objective of stratification is to reduce sampling errors.

Structure: A free-standing building or other construction that can have one or more units for residential or commercial use. Residential structures can have one or more dwelling units (for example: single house, apartment structure).

Student's t distribution: A family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

Survey domain/study domain: A sub-population for which separate estimation of the main indicators is required.

Systematic selection (SYS): Selection of units starting from a random point and selecting every n-th unit.

Target population: The population of interest in the survey, typically, in DHS, women age 15-49 and children under five years of age living in residential households. Most surveys also include men age 15-59.

Two-stage cluster sampling: At the first stage, a stratified sample of EAs is selected in each stratum, typically in DHS with probability proportional to size (PPS). At the second stage, a fixed (or variable) number of households is selected, typically in DHS by equal probability systematic sampling.

Uniformly distributed random number: A random number which comes from a uniform distribution, that is, all possible values in the interval within which the random number is selected have equal probability of selection.

Unit non-response: A sampling unit (cluster, household, individual) is not interviewed at all. See also Item non-response.

Variance: A measure of how far a set of numbers is spread out around their mean.

Weight: An inflation factor which extrapolates the sample to the target population. See also Design weight and Sampling weight.