Demographic and Health Surveys Methodology
- Sharyl Stevens
Sampling and Household Listing Manual

This document is part of the Demographic and Health Survey's DHS Toolkit of methodology for the MEASURE DHS Phase III project, implemented from [...]. This publication was produced for review by the United States Agency for International Development (USAID). It was prepared by MEASURE DHS/ICF International.
Demographic and Health Survey
Sampling and Household Listing Manual

ICF International
Calverton, Maryland, USA

September 2012
MEASURE DHS is a five-year project to assist institutions in collecting and analyzing data needed to plan, monitor, and evaluate population, health, and nutrition programs. MEASURE DHS is funded by the U.S. Agency for International Development (USAID). The project is implemented by ICF International in Calverton, Maryland, in partnership with the Johns Hopkins Bloomberg School of Public Health/Center for Communication Programs, the Program for Appropriate Technology in Health (PATH), Futures Institute, Camris International, and Blue Raster.

The main objectives of the MEASURE DHS program are to:
1) provide improved information through appropriate data collection, analysis, and evaluation;
2) improve coordination and partnerships in data collection at the international and country levels;
3) increase host-country institutionalization of data collection capacity;
4) improve data collection and analysis tools and methodologies; and
5) improve the dissemination and utilization of data.

For information about the Demographic and Health Surveys (DHS) program, write to DHS, ICF International, Beltsville Drive, Suite 300, Calverton, MD 20705, U.S.A.

Recommended citation: ICF International. 2012. Demographic and Health Survey Sampling and Household Listing Manual. MEASURE DHS, Calverton, Maryland, U.S.A.: ICF International.
TABLE OF CONTENTS

TABLES AND FIGURES

1 DEMOGRAPHIC AND HEALTH SURVEYS SAMPLING POLICY
  General principles
  Existing sampling frame
  Full coverage
  Probability sampling
  Suitable sample size
  Simple design
  Household listing and pre-selection of households
  Good sample documentation
  Confidentiality
  Exactness of survey implementation
  Survey objectives and target population
  Survey domain
  Sampling frame
  Conventional sampling frame
  Alternative sampling frames
  Evaluation of the sampling frame
  Stratification
  Sample size
  Sample size and sampling errors
  Sample size determination
  Sample allocation
  Two-stage cluster sampling procedure
  Sample take per cluster
  Optimum sample take
  Variable sample take for self-weighting
  Household listing
  Household selection in the central office
  Household interviews
  Sampling weight calculation
  Why we need to weight the survey data
  Design weights and sampling weights
  How to calculate the design weights
  Correction of unit non-response and calculation of sampling weights
  Normalization of sampling weights
  Standard weights for HIV testing
  De-normalization of standard weights for pooled data
  Calibration of sampling weights in case of bias
  Data quality and sampling error reporting
  Sample documentation
  Confidentiality

2 HOUSEHOLD LISTING OPERATION
  Introduction
  Definition of terms
  Responsibilities of the listing staff
  Locating the cluster
  Preparing location and sketch maps
  Collecting a GPS waypoint for each cluster
  Listing of households
  Segmentation of large clusters
  Quality control
  Prepare the household listing forms for household selection
  Appendix 2.1 Example listing forms
  Appendix 2.2 Symbols for mapping and listing
  Appendix 2.3 Examples of completed mapping and listing forms

3 SELECTED SAMPLING TECHNIQUES
  Simple random sampling
  Equal probability systematic sampling
  Sampling theory
  Excel templates for systematic sampling
  Probability proportional to size sampling
  Sampling theory
  Operational description and examples
  Complex sampling procedures

4 SURVEY ERRORS
  Errors of coverage and non-response
  Coverage errors
  Deliberate restrictions of coverage
  Non-response
  4.1.4 Response rates
  Sampling errors

5 SAMPLE DOCUMENTATION
  Introduction
  Sample design document
  Introduction
  Sampling frame
  Structure of the sample and the sampling procedure
  Selection probability and sampling weight
  Sample file
  Results of survey implementation
  Sampling errors
  Sampling parameters in DHS data files

Glossary of terms
References
TABLES AND FIGURES

Table 1.1 Sample size determination for estimating current use of a modern contraceptive method among currently married women
Table 1.2 Sample size determination for estimating the prevalence of full vaccination coverage among children aged [...] months
Table 1.3 Sample allocation: Proportional allocation
Table 1.4 Sample allocation: Power allocation
Table 1.5 Optimal sample take for currently married women currently using any contraceptive method based on intracluster correlation ρ and survey cost ratio c1/c2 from past surveys
Table 5.1 Distribution of EAs and average size of EA by region and by type of residence
Table 5.2 Distribution of households by region and by type of residence
Table 5.3 Sample allocation of clusters and households by region and by type of residence
Table 5.4 Expected number of interviews by region and by type of residence
Table 5.5 An example sample file
Table 5.6 Example table for the results of survey implementation
Table 5.7 Example appendix table for the results of the women's survey implementation
Table 5.8 Example appendix table for the results of the men's survey implementation
Table 5.9 Example table for sampling errors

Figure 3.1 Simple household selection with a sub-sample
Figure 3.2 Selection of runs with a sub-sample
Figure 3.3 Simple self-weighting selection without sample size control
Figure 3.4 Self-weighting selection with runs and without sample size control
Figure 3.5 Self-weighting selection with sample size control
Figure 3.6 Self-weighting selection with runs and with sample size control
Figure 3.7 Manual household selection in the field
Figure 3.8 Part of an Excel template for stratified sampling
Figure 3.9 Part of an example for a province crossed urban-rural stratified PPS sampling
Figure 3.10 Part of an example sample file from a stratified PPS sampling
1 DEMOGRAPHIC AND HEALTH SURVEYS SAMPLING POLICY

1.1 General principles

Scientific sample surveys are cost-efficient and reliable ways to collect population-level information such as social, demographic and health data. The MEASURE DHS project is a worldwide project implemented across various countries and at multiple points in time within a country. In order to achieve comparability, consistency and the best quality in survey results, sampling activities in the Demographic and Health Surveys (DHS) should be guided by a number of general principles. This manual presents general guidelines on sampling for DHS surveys, although modifications may be required for country-specific situations. The key principles of DHS sampling include:

- Use of an existing sampling frame
- Full coverage of the target population
- Probability sampling
- Using a suitable sample size
- Using the simplest design possible
- Conducting a household listing and pre-selection of households
- Providing good sample documentation
- Maintaining confidentiality of individuals' information
- Implementing the sample exactly as designed

1.1.1 Existing sampling frame

A probability sample can only be drawn from an existing sampling frame, which is a complete list of statistical units covering the target population. Since the construction of a new sampling frame is likely to be too expensive, DHS surveys should use an adequate pre-existing sampling frame that is officially recognized. This is possible for most countries where there has been a population census in recent years. Census frames are generally the best available sampling frame in terms of coverage, cartographic materials and organization. However, the quality and accessibility of the frame should be evaluated during the development of the survey design, and a detailed study of the sampling frame is necessary before drawing the sample. In the absence of a census frame, a DHS survey can use an alternative sampling frame, such as a complete list of villages or communities in the country with all necessary identification information including a measure of population size (e.g.,
number of households), or a master sample that is large enough to support the DHS design.

1.1.2 Full coverage

A DHS survey should cover 100 percent of the target population in the country. The target population for the DHS survey is all women age 15-49 and children under five years of age living in residential households. Most surveys also include all men age [...] (see footnote 1). The target population may vary from country to country or from survey to survey, but the general sampling principles are the same. In some cases, exclusion of some areas may be necessary because of extreme inaccessibility, violence or instability, but these issues need to be considered at the very beginning of the survey, before the sample is drawn.

Footnote 1: The age range varies from survey to survey and may be 15-49, 15-54, or [...].
1.1.3 Probability sampling

A scientific probability sampling methodology must be used in DHS surveys. A probability sample is defined as one in which the units are selected randomly with known and nonzero probabilities. This is the only way to obtain unbiased estimation and to be able to evaluate the sampling errors. The term probability sampling excludes purposive sampling, quota sampling, and other uncontrolled non-probability methods because they cannot provide evaluation of precision and/or confidence of survey findings.

1.1.4 Suitable sample size

Sample size is a key parameter for DHS surveys because it is directly related to survey budget, data quality and survey precision. Theoretically, the larger the sample size, the better the survey precision, but this is not always true in practice. Survey budget is not the only important factor in determining the sample size. Desired precision, the number of domains, capability of the implementing organization, data quality concerns and cost effectiveness are essential constraints in determining the total sample size. Thus a suitable sample size is also a key parameter to guarantee data quality.

1.1.5 Simple design

In large-scale surveys, non-sampling errors (coverage errors, errors committed in survey implementation and data processing, etc.) are usually the most important sources of error and are expensive to control and difficult to evaluate quantitatively. It is therefore important to minimize them in survey implementation. In order to facilitate accurate implementation of the survey, the sampling design for DHS should be as simple and straightforward as possible. Macro's experience from 25 years of DHS surveys shows that a two-stage household-based sample design is relatively easy to implement and that quality can be maintained.

1.1.6 Household listing and pre-selection of households

The DHS standard procedure recommends that households be pre-selected in the central office prior to the start of fieldwork rather than by teams in the field who may have pressures to bias the selection. The interviewers are asked to interview only the pre-selected households.
In order to prevent bias, no changes or replacements are allowed in the field. To perform pre-selection of households, a complete list of all residential households in each of the selected sample clusters is necessary. This list is usually obtained from a household listing operation conducted before the main survey. In some surveys, the household listing operation may be combined with the main survey to form a single field operation, and households can be selected in the field from a complete listing. Combining the household listing and survey data collection in one field operation is less expensive; however, it provides an incentive to leave households off the household list to reduce workload, thus reducing the representativeness of the survey results. Close supervision is needed during the field work to prevent this problem. For this reason, separate listing and data collection operations are required. Selection of households by interviewers in the field without a complete listing is not acceptable for DHS surveys.

1.1.7 Good sample documentation

DHS surveys are usually year-long projects conducted by different people specialized in different aspects of survey implementation, so good sample documentation is necessary to guarantee the exact implementation of the project. The sample documentation should include a sample design
document and the list of primary sampling units. The sample design document should explain in detail the methodology, the sampling procedure, the sample size, the sample allocation, the survey domains and the stratification. This should also form the basis for an appendix to the DHS final report describing the sample design. The sample list should include all identification information for all of the selected sample points, along with their probability of selection.

1.1.8 Confidentiality

Confidentiality is a major concern in DHS, especially when human bio-markers are collected, such as blood samples for HIV testing. The DHS surveys are anonymous surveys that do not allow any potential identification of any single household or individual in the data file. Confidentiality is also a key factor affecting the response rate to sensitive questions regarding sexual activity and partners. In particular, in surveys that include HIV testing, DHS policy requires that PSU and household codes be scrambled in the final data to further anonymize the data and that the original sample list be destroyed.

1.1.9 Exactness of survey implementation

Exactness of sample implementation is the last element in achieving good sampling precision. No matter how carefully a survey is designed and how complete the materials for conducting sampling activities are, if the sampling activities are not performed exactly as designed by the sampling staff (office staff responsible for selecting sample units, field workers responsible for the mapping and household listing, and interviewers responsible for data collection), serious bias and misleading results may occur.

In the sections that follow, DHS policies related to sample design and implementation are described.
1.2 Survey objectives and target population

The main objective of DHS surveys is to collect up-to-date information on basic demographic and health indicators, including housing characteristics, fertility, childhood mortality, contraceptive knowledge and use, maternal and child health, nutritional status of mothers and children, knowledge, attitudes and behavior toward HIV/AIDS and other sexually transmitted infections (STI), and women's status. The target population for DHS is defined as all women of reproductive age (15-49 years old) and their young children under five years of age living in ordinary residential households. However, in some countries, the coverage may be restricted to ever-married women. The main indicator topics include:

- Total fertility and age specific fertility rates
- Age at first sex, first birth, and first marriage
- Knowledge and use of contraception
- Unmet need for family planning
- Birth spacing
- Antenatal care
- Place of delivery
- Assistance from skilled personnel during delivery
- Knowledge of HIV/AIDS and other STIs
- Higher-risk sexual behavior
- Condom use
- Childhood vaccination coverage
- Treatment of diarrhea, fever, and cough
- Infant and under-five mortality rates
- Nutritional status

Since the target population can be easily found in residential households, DHS is a household-based survey.

1.3 Survey domain

In DHS surveys, an important objective is to compare the survey results for different characteristics such as urban and rural residence, different administrative or geographic regions, or different educational levels of respondents. A survey domain or study domain is a sub-population for which separate estimation of the main indicators is required. There are two kinds of survey domains: design domains and analysis domains.

A design domain consists of a sub-population that can be identified in the sampling frame and therefore can be handled independently in the sample size and sampling procedures, usually consisting of geographic areas or administrative units. For example, urban and rural differences are very frequently requested; therefore, urban and rural areas are usually separate design domains for Demographic and Health Surveys.

An analysis domain is a sub-population that cannot be identified in the sampling frame, such as domains specified by individual characteristics. These may include women with secondary or higher education, pregnant women, children [...] months, and children having diarrhea in the two weeks preceding the survey.

In order for survey estimates to be reliable at the domain level, it is necessary to ensure that the number of cases in each survey domain is sufficient, especially when desired levels of precision are required for particular domains. For a design domain, adequate sample size is achieved by allocating the target population at the survey design stage into the requested design domains, and then calculating the sample size for the specific design domains by taking the precision required into account. On the other hand, for an analysis domain, it is difficult to guarantee a specified precision because it is difficult to control the sample size at the design stage.
However, if prior estimates of the average number of target individuals per household are available, then it is possible to control the precision for an analysis domain. For example, if survey estimates are required for the nutritional status of children under age 5 and estimates of the number of children under age 5 per household are available, it is then possible to calculate a sample size to give a certain level of precision.

DHS reports also produce some indicators for second-level domains, such as vaccination coverage of children age [...] months within a region, where region is the first-level domain and children [...] months is the second-level domain. Attention must be paid to the precision required for a second-level domain because the second-level domain usually includes a very small sub-population.

If domain-level estimates are required, it is better to avoid a large number of domains because otherwise a very large sample size will be needed. The number of domains and the desired level of precision for each must be taken into account in the budget calculation and in the assessment of the implementation capabilities of the implementing organization. The total sample size needed is the sum of the sample sizes needed in all exclusive (first-level) domains.

1.4 Sampling frame

A sampling frame is a complete list of all sampling units that entirely covers the target population. The existence of a sampling frame allows a probability selection of sampling units. For a multi-stage survey, a sampling frame should exist for each stage of selection. The sampling unit for the first stage of selection is called the Primary Sampling Unit (PSU); the sampling unit for the second stage of selection is called the Secondary Sampling Unit (SSU), and so on. In most cases, DHS
surveys are two-stage surveys. Note that each stage of sample selection will involve sampling errors, so it is better to avoid more than two stages if additional stages of selection are not necessary.

The availability of a suitable sampling frame is a major determinant of the feasibility of conducting a DHS survey. This issue should be addressed in the earliest stages of planning for a survey. A sampling frame for a DHS survey could be an existing sampling frame, an existing master sample, or a sample of a previously executed survey of sufficiently large sample size, which allows for the selection of subsamples of desired size for the DHS survey.

1.4.1 Conventional sampling frame

The best frame is the list of Enumeration Areas (EAs) from a recently completed population census. An EA is usually a geographic area that groups a number of households together for convenient counting purposes for the census. A complete list of EAs that covers the survey area entirely is the ideal frame for DHS surveys. In most cases, a list of EAs from a recent census is available. This list should be thoroughly evaluated before it is used. The sampling frame used for DHS should be as up-to-date as possible. It should cover the whole survey area, without omission or overlap. Basic cartographic materials should exist for each area unit or at least for groups of units with clearly defined boundaries. Each area unit should have a unique identification code or a series of codes that, when combined, can serve as a unique identification code. Each unit should have at least one measure of size estimate (population and/or number of households). If other characteristics of the area units (e.g., socioeconomic level) exist, they should be evaluated and retained as they may be used for stratification.

A pre-existing master sample (which is a random sample from the census frame) can be accepted only where there is confidence in the master sample design, including detailed sampling design parameters such as sampling method, stratification, and inclusion probability for the selected primary sampling units.
The task for the DHS survey is then to design a sub-sampling procedure that produces a sample in line with DHS requirements. This will not always be possible. However, the larger the master sample is in relation to the desired DHS sub-sample, the more flexibility there will be for developing a sub-sampling design.

A key question with a pre-existing sample is whether the listing of dwellings/households is still current or whether it needs to be updated. If updating is required, use of a pre-existing sample may not be economical. The potential advantages of using a pre-existing sample are: 1) economy, and 2) increased analytic power through comparative analysis of two or more surveys. The disadvantages are: 1) the problem of adapting the sample to DHS requirements, and 2) the problem of repeated interviews with the same household or person in different surveys, resulting in respondent fatigue or contamination. One way to avoid this last problem is to keep just the primary sampling units from the pre-existing sample and reselect the households for the DHS survey.

1.4.2 Alternative sampling frames

When neither a census frame nor a master sample is available, alternative frames should be considered. Examples of such frames are:

- A list of electoral zones with estimated number of qualified voters for each zone
- A gridded high resolution satellite map with estimated number of structures for each grid
- A list of administrative units such as villages with estimated population for each unit

A main concern when using alternative frames is coverage: does the frame completely cover the target population? Checking the quality of an alternative frame is usually more difficult because of a lack of information either from the frame itself or from administrative sources.
Another problem is the size of the primary sampling unit. Since the alternative frame is not specifically created for a population census or household-based survey, the PSUs of such frames may be too large or too small for a DHS survey. A third problem is identifying the boundaries of the sampling units due to the lack of cartographic materials.

In the first two examples of alternative sampling frames, the standard DHS two-stage sampling procedure can be applied by treating the electoral zones or the grids of the satellite map as the PSUs. In the third case, a list of administrative units larger than villages (e.g., sub-districts, wards or communes) may be all that is available; for example, a complete list of all communes in a country may be easier to get than a complete list of villages. It is then necessary to use a selection procedure that includes more than two stages. In the first stage, select a number of communes; in each of the selected communes, construct a complete list of all villages located in the commune; select one village per commune as a DHS cluster; then proceed with the subsequent household listing and selection as in a standard DHS. This procedure works best when the number of communes is large and the commune size is small. A list of administrative units that are small in number but large in size is not suitable for a DHS sampling frame because this situation will result in large sampling errors, as explained later in this manual.

1.4.3 Evaluation of the sampling frame

No matter what kind of sampling frame will be used, it is always necessary to check the quality of the frame before selecting the sample. The following need to be checked when using a conventional sampling frame:

- Coverage
- Distribution
- Identification and coding
- Measure of size
- Consistency

There are several easy but useful ways to check the quality of a sampling frame.
For example, for a census frame, compare the total population of the sampling frame, and its distribution among urban and rural areas and among different regions/administrative units, with the corresponding figures from the census report. Any important differences may indicate coverage problems. If the frame provides information on population and households for each EA, then the average number of household members can be calculated, and a check for extreme values can help to find incorrect measures of size of the PSUs. If information on population by sex is available for each EA, then a sex ratio can be calculated for each EA, and a check for extreme values can help to identify non-residential EAs. If the EAs are associated with an identification (ID) code, then check the ID codes to identify miscoded or misplaced EAs.

A sampling frame with full coverage and of good quality is the first element for a DHS survey; therefore, efforts should be made to guarantee a good start for the project. For a nationally representative survey, geographic coverage of the survey should include the entire national territory unless there are strong reasons for excluding certain areas. If areas must be excluded, they should constitute a coherent domain. A survey from which a number of scattered zones have been excluded is difficult to interpret and to use.

1.5 Stratification

Stratification is the process by which the survey population is divided into subgroups or strata that are as homogeneous as possible using certain criteria. Explicit stratification is the actual sorting and separating of the units into specified strata. Within each stratum, the sample is designed and
selected independently. It is also possible to systematically sample units from an ordered list (with a fixed sampling interval between selected units) to achieve the effect of stratification. For example, in DHS surveys, it is not unusual for the PSUs within the explicit strata to be sorted geographically. This is called implicit stratification.

The principal objective of stratification is to reduce sampling errors. In a stratified sample, the sampling errors depend on the population variance existing within the strata but not between the strata. For this reason, it pays to create strata with low internal variability (or high homogeneity). Another major reason for stratification is that, where marked differences exist between subgroups of the population (e.g., urban vs. rural areas), stratification allows for a flexible sample design that can be different for each subgroup.

Stratification should be introduced only at the first stage of sampling. At the dwelling/household selection stage, systematic sampling is used for convenience; however, no attempt should be made to reorder the dwelling/household list before selection in the hope of increasing the implicit stratification effect. Such efforts generally have a negligible effect.

Stratification can be single-level or multi-level. In single-level stratification, the population is divided into strata according to certain criteria. In multi-level stratification, the population is divided into first-level strata according to certain criteria, and then the first-level strata are subdivided into second-level strata, and so on. A typical two-level stratification involves first stratifying the population by region at the first level and then by urban-rural within each region. A DHS survey usually employs multi-level stratification.

Strata should not be confused with survey domains. A survey domain is a population subgroup for which separate survey estimates are desired (e.g., urban areas/rural areas). A stratum is a subgroup of homogeneous units (e.g., subdivisions of an administrative region) in which the sample may be designed differently and is selected separately.
Survey domains and strata can be the same but they need not be. For example, survey domains could be the first-level strata in a multi-level stratification. On the other hand, a survey domain could consist of one or several lower-level strata.

DHS surveys typically use explicit stratification by separating urban and rural residence within each region. Where data are available, explicit stratification could also be done on the basis of socioeconomic zones or more directly relevant characteristics such as the level of female literacy or the presence of health facilities in the areas. These kinds of information could be obtained from administrative sources. Within each explicit stratum, the units can then be ordered according to location, thus providing further implicit geographic stratification.

1.6 Sample size

1.6.1 Sample size and sampling errors

The estimates from a sample survey are affected by two types of errors: sampling errors and non-sampling errors. Sampling errors are the representative errors due to sampling of a small number of eligible units from the target population instead of including every eligible unit in the survey. Sampling errors are related to the sample size and the variability among the sampling units. Sampling errors can be statistically evaluated after the survey. Non-sampling errors result from problems during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Non-sampling errors are related to the capacity of the implementing organization, and experience shows that (1) non-sampling errors are always the most important source of error in a survey, and (2) it is difficult to evaluate the magnitude of non-sampling errors once a survey is complete. Theoretically, with the same survey methodology and under the same survey conditions,
18 the larger the sample sze, the better the survey precson. However, ths relatonshp does not always hold true n practce, because non-samplng errors tend to ncrease wth survey scale and sample sze. The challenge n decdng on the sample sze for a survey s to balance the demands of analyss and precson wth the capacty of the mplementng organzaton and the constrants of fundng. A common measure of precson for estmatng an ndcator s ts relatve standard error (RSE) whch s defned as ts standard error (SE) dvded by the estmated value of the ndcator. The standard error of an estmator s the representatve error due to samplng. The relatve standard error descrbes the amount of samplng error relatve to the ndcator level and s ndependent of the scale of the ndcator to be estmated; therefore, a unque RSE can be appled to a reference ndcator for all domans. If a unque RSE s desred for all domans, the doman sample sze depends on the varablty and the sze of the doman. The total sample sze s the sum of the sample szes over all domans for whch desred precson are requred. The followng are some concepts related to sample sze calculaton. 1. The standard error of an estmator when estmatng a proporton wth a smple random samplng wthout replacement 2 s gven by: 1 - f N SE = SQRT P(1 P) n N 1 where n s the sample sze (number of completed ntervews), P s the proporton, N s the target populaton sze, and f=n/n s the samplng fracton. When N s large and n s relatvely small, the above quantty can be approxmated by: Therefore the RSE of the estmator s gven by: P(1 P) SE SQRT n P(1 P) RSE( P) SQRT / P n 1 / P 1 = SQRT n 2. For a requred precson wth a relatve standard error α, the net sample sze (number of completed ntervews) needed for a smple random samplng s gven by: (1 / P 1) n = 2 α 3. Snce a smple random samplng s not feasble for a DHS, the sample sze for a complex survey wth clusterng such as the DHS can be calculated by nflatng the above calculated sample sze by usng a desgn effect (Deft). 
Deft is a measure of the efficiency of cluster sampling compared to direct simple random sampling of individuals, defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A Deft value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. The net sample size needed for a cluster sample with the same relative standard error is given by:

$$ n = Deft^2 \times \frac{(1/P - 1)}{\alpha^2} $$

[2] A simple random sample would be a random selection of individuals or households directly from the target population. This is not feasible for DHS surveys because a list of all eligible individuals or households is not available.

4. The formula for calculating the final sample size in terms of the number of households while taking non-response into account (the formula used in the templates for sample size calculation as shown in Table 1.1) is given by:

$$ n = Deft^2 \times \frac{(1/P - 1)}{\alpha^2} \Big/ (R_i \times R_h \times d) $$

where
n is the sample size in households;
Deft is the design effect (a default value of 1.5 is used for Deft if not specified);
P is the estimated proportion;
α is the desired relative standard error;
R_i is the individual response rate;
R_h is the household gross response rate; and
d is the number of eligible individuals per household.

The household gross response rate is the number of households interviewed over the number selected. DHS reports typically report the net household response rate, which is the number of households interviewed over the number of valid households found in the field (i.e., excluding vacant and destroyed dwellings).

5. If the target population is small (such as in a sub-national survey), a finite population correction of the above calculated sample size should be applied. The final sample size n is calculated by

$$ n = \frac{n_0}{1 + n_0 / N} $$

where $n_0$ is the initial sample size calculated in point number 4, and N is the target population size.

6. The relationship between the RSE and the sample size shows that, if one reduces a desired RSE to half, then the sample size needed will increase 4 times. For example, the sample size for a RSE of 5% is 4 times as large as the sample size for a RSE of 10% (see Tables 1.1 and 1.2 in the next section). This means that it is very expensive to reduce the RSE by increasing the sample size. Therefore, when designing the sample size, the efficiency of the design must be considered, that is, the balance between the gain in precision and the increase in sample size (or survey cost).

7. The width of the confidence interval is determined by the RSE. With a confidence level of 95%, 2*P*RSE is the half-length of the confidence interval for P. For example, for RSE=0.10 and P=0.20, the half-length of the confidence interval is 0.04, which means the confidence interval for P is (0.16, 0.24). (DHS reports +/-2*SE instead of +/-1.96*SE as the 95% confidence interval for conservative purposes.)
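The formulas in points 3 to 7 can be sketched in a few lines of Python. This is only an illustration, not the DHS sample size template: the function names are hypothetical, and rounding the household count up to the next whole household is an assumption made here. The inputs are the ones quoted for Table 1.1 in Section 1.6.2.

```python
import math

def net_sample_size(p, rse, deft=1.5):
    """Net number of completed interviews: Deft^2 * (1/P - 1) / alpha^2."""
    return deft ** 2 * (1.0 / p - 1.0) / rse ** 2

def household_sample_size(p, rse, deft=1.5, ind_rr=1.0, hh_rr=1.0, per_hh=1.0):
    """Households to select: net size divided by R_i * R_h * d."""
    return net_sample_size(p, rse, deft) / (ind_rr * hh_rr * per_hh)

def finite_population_correction(n0, population):
    """n = n0 / (1 + n0 / N) for a small target population of size N."""
    return n0 / (1.0 + n0 / population)

def confidence_limits(p, rse):
    """Limits P +/- 2*SE, where SE = P * RSE."""
    half_length = 2.0 * p * rse
    return p - half_length, p + half_length

# Inputs quoted for Table 1.1: P = 0.20, Deft = 1.40, d = 1.05,
# individual response rate 0.96, household gross response rate 0.92.
net = net_sample_size(p=0.20, rse=0.10, deft=1.40)
households = household_sample_size(0.20, 0.10, deft=1.40,
                                   ind_rr=0.96, hh_rr=0.92, per_hh=1.05)
# Rounding up to whole households is an assumption of this sketch.
print(round(net), math.ceil(households))  # 784 846
```

Calling `net_sample_size` with `rse=0.05` instead of `0.10` returns four times the value, which is the relationship described in point 6.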
1.6.2 Sample size determination

The total sample size for a DHS survey with a number of survey domains (design domains) is the sum of the sample sizes over all domains. An appropriate sample size for a survey domain is the minimum number of persons (e.g., women age 15-49, currently married women 15-49, children under age five) that achieves the desired survey precision for core indicators at the domain level. If funding is tight and fixed, the sample size is the maximum number of persons that the funding can cover.

Precision at the national level is usually not a problem. In almost all cases, sample size is decided to guarantee precision at domain level with appropriate allocation of the sample. So apart from survey costs, the total sample size depends on the desired precision at domain level and the number of domains. If a reasonable precision is required at domain level, experience from the MEASURE DHS program shows that a minimum number of 800 completed interviews with women is necessary for some of the woman-based indicators for high fertility countries (e.g., total fertility rate, contraceptive prevalence rate, childhood mortality rates); for low fertility countries, the minimum domain sample size can reach 1,000 completed interviews or more.

Table 1.1 below illustrates the calculation of sample size for a domain according to different levels of desired RSE for estimating the indicator "the proportion of currently married women who are current users of a modern contraceptive method".

Table 1.1 Sample size determination for estimating current use of a modern contraceptive method among currently married women

Input parameters:
Estimated proportion (p): 0.20
Total target population: (blank)
Estimated design effect (Deft): 1.40
Number of target individuals per household: 1.05
Individual response rate: 0.96
Household gross response rate: 0.92

Columns: Desired RSE | Net sample size (individual) | Sample size (household) | Expected SE | 95% confidence limits (Lower, Upper)
[Row values are missing from this copy.]

Note: The confidence limits are calculated as P±2*SE.
Assuming the domain size is large enough such that the finite population correction is negligible, Table 1.1 gives the required gross sample size in terms of number of households with estimated parameters from a DHS survey. The target population is currently married women age 15-49; the estimated parameters are: the proportion of currently married women who are current users of any modern contraceptive method, the design effect (Deft), the number of target individuals (number of currently married women 15-49) per household, and the individual and household response rates. For example, with an estimated prevalence of 20%, if we require a RSE of 10%, we should select 846 households in this particular domain. With a gross household response rate (the number of households completed over the total number selected) of 92% and an individual response rate of 96%, we expect to obtain 784 completed interviews of currently married women age 15-49. The estimated quantities at the top of the table used as input to the calculation can usually be obtained from previous surveys or from administrative records.

The total sample size for a survey with several domains is the sum of the sample sizes obtained in the above table for each domain. If the same required precision and the same indicator level apply to all domains, then the total sample size is the sample size calculated for one domain multiplied by the number of domains. With this example, the total sample size for a survey having six domains with approximately the same level of modern contraceptive use among currently married women and the same precision request for each domain would be 5076 households. The Sample size determination template located in the Appendix can be used to determine required sample sizes.
Table 1.2 Sample size determination for estimating the prevalence of full vaccination coverage among children aged [...] months

Input parameters:
Estimated proportion (p): 0.29
Total target population: (blank)
Estimated design effect (Deft): 1.22
Number of target individuals per household: 0.11
Individual response rate: 0.96
Household gross response rate: 0.92

Columns: Desired RSE | Net sample size (individual) | Sample size (household) | Expected SE | 95% confidence limits (Lower, Upper)
[Row values are missing from this copy.]

Note: The default value of Deft is set to be 1.5. Specify if different. The confidence limits are calculated as P±2*SE. If the response rate is not provided, the sample size calculated is the net sample size.
Table 1.2 shows a similar example for the indicator "proportion of children aged [...] months who are fully immunized". In this case, the target population is children aged [...] months. The estimated number of target individuals per household is much smaller than the number of currently married women per household given in Table 1.1. So for the same sample size calculated in Table 1.1, we can only get a RSE of above 20% at domain level. With a RSE of 10%, we need to select 3746 households in this particular domain, which seems unrealistic if we have several domains for the survey.

This example shows that for a multi-indicator survey, the sample size required can be very different from indicator to indicator. So the choice of the reference indicator upon which the sample size is calculated is an important issue. The reference indicator which is used for sample size determination should have demographic importance, moderate value and moderate population coverage, i.e., apply to a sizable proportion of the population. With the same sample size calculated in Table 1.1 for a survey having six domains, the RSE for the whole sample for estimating full immunization among children [...] months is between 8% and 9%.

The domain sample sizes often need to be balanced between domains due to budget constraints. In practice it is often the case that the total sample size is fixed according to funding available and implementation capacity, and then the sample is allocated to each domain and to each stratum within the domain. In the case of very tight budget constraints, we may equally allocate the total sample to the domains. In some cases, we may want to oversample a specific domain to conduct some in-depth analysis for a certain rare phenomenon. The method (and the tables) presented in the following section may be used to allocate the sample at the domain level because the domains are usually first-level strata. Regardless of the method used for allocation, the calculation of domain sample size can give us an idea about the precision we may achieve in each domain with a given sample size.
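To see what precision a given number of households buys, the sample size formula of Section 1.6.1 can be inverted. The sketch below is an illustration under that assumption; the function name is hypothetical. It uses the input parameters quoted for Table 1.2 together with the household counts from the Table 1.1 example (846 per domain, 5076 for six domains).

```python
import math

def expected_rse(households, p, deft, ind_rr, hh_rr, per_hh):
    """Relative standard error expected from a given number of selected
    households, obtained by solving the sample size formula for alpha."""
    net = households * ind_rr * hh_rr * per_hh   # expected completed cases
    return deft * math.sqrt((1.0 / p - 1.0) / net)

# Table 1.2 inputs: P = 0.29, Deft = 1.22, d = 0.11, R_i = 0.96, R_h = 0.92.
one_domain = expected_rse(846, 0.29, 1.22, 0.96, 0.92, 0.11)
six_domains = expected_rse(5076, 0.29, 1.22, 0.96, 0.92, 0.11)
print(f"{one_domain:.3f} {six_domains:.3f}")
```

The first value comes out above 0.20 and the second between 0.08 and 0.09, in line with the statements in the text.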
1.7 Sample allocation

In cases where the total sample size or domain sample size has been fixed, we need to appropriately allocate the sample to different domains (or different strata within a domain). This allocation is aimed at strengthening the sampling efficiency at the national level or domain level and reducing sampling errors. Assuming a constant cost across domains/strata, the optimum allocation of the sample depends on the size of the domain/stratum $N_h$ and the variability $S_{xh}$ of the indicator to be estimated. For a given total sample size n, the optimum allocation for variable x is given by:

$$ n_h = n \, \frac{N_h S_{xh}}{\sum_{h=1}^{H} N_h S_{xh}} $$

The optimum allocation is only optimal for the indicator on which the allocation is based; that allocation may not be appropriate for other indicators. For a multipurpose survey, if the domains/strata are not too different in size, a safe allocation that is good for all indicators is a proportional allocation, with sample size proportional to the domain/stratum size:

$$ n_h = n \, \frac{N_h}{\sum_{h=1}^{H} N_h} = n \, \frac{N_h}{N} $$
This allocation introduces a constant sampling fraction across domains/strata with:

$$ f_h = \frac{n_h}{N_h} = \frac{n}{N} $$

Because DHS surveys are multipurpose surveys, a proportional allocation of sample is recommended if the domains/strata are not too different in size. However, if the domain/stratum sizes are very different, the smaller domains/strata may receive a very small sample size. If a desired precision is required at domain/stratum level, by assuming equal relative variations across strata, a power allocation (Bankier, 1988) with an appropriate power value α (0 ≤ α ≤ 1) may be used to guarantee sufficient sample size in small domains/strata:

$$ n_h = n \, \frac{M_h^{\alpha}}{\sum_{h=1}^{H} M_h^{\alpha}} $$

A power allocation is an allocation proportional to the power of a size measure M. A power value of 1 gives proportional allocation; a power value of 0 gives equal size allocation; a power value between 0 and 1 gives an allocation between proportional allocation and equal size allocation. Proportional allocation is good for national level indicators, but may not meet the precision request at domain level, while an equal size allocation is good for comparison across domains, but may affect the precision at national level. A power allocation with power values between 0 and 1 is a tradeoff between the national level precision and the domain level precision. Since the sample size is usually large at the national level, the national level precision is not a concern.

In Table 1.3 below, we give an example of a proportional sample allocation of 15,000 individuals to 11 domains and to their urban-rural areas. The minimum domain sample size is 384 for domain 2, which is too small for estimating the total fertility rate (TFR) and childhood mortality rates. The largest sample size is for domain 11, which may be unnecessarily large. The actual total sample size given in the total row may be slightly different from the desired sample size because of rounding.
Table 1.3 Sample allocation: Proportional allocation

Total sample size => (blank)   Power value domain => (blank)   Power value urban => (blank)
Columns: Serial Num | Domain/stratum Name/ID | Domain/Stratum size | Proportion urban | Sample Allocation (Urban, Rural, Domain) | Specific Allocation (Urban, Rural)
Rows: Domain 1 to Domain 11, Total
[Row values are missing from this copy.]

If we impose a condition such that the sample size should not be smaller than 1000 in each domain, after trying various power values, we find that a power value of 0.25 is appropriate, as shown in Table 1.4. In this case, we would have a minimum sample size of 1,022 for domain 2. Since domain 11 has only urban areas, the power allocation among the domains brought down the urban percentage in the sample. In order for urban areas to be properly represented, oversampling is applied in the urban areas of the other domains. With a power value of 0.65, the urban proportion in the sample is close to the proportion of the target population.

Table 1.4 Sample allocation: Power allocation

Total sample size => (blank)   Power value domain => 0.25   Power value urban => 0.65
Columns: Serial Num | Domain/stratum Name/ID | Domain/Stratum size | Proportion urban | Sample Allocation (Urban, Rural, Domain) | Specific Allocation (Urban, Rural)
Rows: Domain 1 to Domain 11, Total
[Row values are missing from this copy.]

In Table 1.4, the small domains are oversampled compared with a proportional allocation. Oversampling some small domains is frequently practiced if domain level precision is required.
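The allocation rules of this section can be sketched as follows. This is a minimal illustration, not the DHS allocation template: the function names, the search step of 0.05 and the domain sizes are all hypothetical, and the search simply mimics "trying various power values" until every domain reaches a minimum size.

```python
def optimum_allocation(n, sizes, std_devs):
    """n_h = n * N_h * S_xh / sum(N_h * S_xh)."""
    weights = [size * sd for size, sd in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

def power_allocation(n, sizes, power):
    """n_h = n * M_h**power / sum(M_h**power).
    power = 1 is proportional allocation, power = 0 is equal allocation."""
    weights = [size ** power for size in sizes]
    total = sum(weights)
    return [n * w / total for w in weights]

def largest_power_meeting_minimum(n, sizes, minimum, step=0.05):
    """Largest power value on a grid for which every domain receives at
    least `minimum`; returns None if even equal allocation falls short."""
    steps = int(round(1.0 / step))
    for i in range(steps, -1, -1):
        power = i * step
        if min(power_allocation(n, sizes, power)) >= minimum:
            return power
    return None

# Hypothetical domain sizes (not the values of Tables 1.3 and 1.4).
domain_sizes = [40_000, 150_000, 300_000, 600_000, 2_000_000]
power = largest_power_meeting_minimum(15_000, domain_sizes, minimum=1_000)
allocation = [round(x) for x in power_allocation(15_000, domain_sizes, power)]
print(power, allocation)
```

Because each domain allocation is rounded separately, the rounded values may not sum exactly to the requested total, as noted for the total row of Table 1.3.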
However, oversampling a small domain too much will harm the precision at national level. To prevent this, it is recommended to regroup the small domains to form domains of moderate size, especially when there is a very unequal population distribution among geographic domains; however, this is sometimes not possible due to political considerations.

The above discussion also applies to sample size allocation to strata within a domain where the domain sample size is fixed. A proportional allocation with sample size proportional to stratum size is good for all indicators and provides the best precision for the domain as a whole.

1.8 Two-stage cluster sampling procedure

The MEASURE DHS program utilizes a convenient and practical sample selection procedure for household based surveys developed on the basis of experience from past surveys: a two-stage cluster sampling procedure. A cluster is a group of adjacent households which serves as the PSU for field work efficiency. Interviewing a certain number of households in the same cluster can greatly reduce the amount of travel and time needed during data collection. In most cases, a cluster is an EA with a measure of size equal to the number of households or the population in the EA, provided by the population census.

At the first stage, a stratified sample of EAs is selected with probability proportional to size (PPS): in each stratum, a sample of a predetermined number of EAs is selected independently with probability proportional to the EA's measure of size. In the selected EAs, a listing procedure is performed such that all dwellings/households are listed. This procedure is important for correcting errors existing in the sampling frame, and it provides a sampling frame for household selection.

At the second stage, after a complete household listing is conducted in each of the selected EAs, a fixed (or variable) number of households is selected by equal probability systematic sampling in the selected EAs.
In each selected household, a household questionnaire is completed to identify women age 15-49, men age [...] (15-54 or [...] in some surveys) and children under age five. Every eligible woman will be interviewed with an individual questionnaire, and every eligible man will be interviewed with an individual men's questionnaire in those households selected for the men's interview.

The advantages of this two-stage cluster sampling procedure can be summarized as follows:

1) It guarantees a representative sample of the target population when a list of all target individuals is not available, which prohibits a direct sampling of target individuals;
2) A household listing procedure after the selection of the first stage and before the main survey provides a sampling frame for household selection in the central office;
3) The use of residential households as the second-stage sampling unit guarantees the best coverage of the target population; and
4) It reduces unnecessary sampling errors by avoiding more than two stages of selection (which usually uses a large PSU in the first stage of selection).

See more details in Sections 1.10 and 1.11 on household listing and selection, Chapter 2 on household listing, and Sections 3.2 and 3.3 of Chapter 3 on systematic sampling and sampling with probability proportional to size (PPS).
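The two selection stages described above can be sketched as follows. This is a generic textbook version of systematic PPS selection and of equal probability systematic selection with a fractional interval, offered as an assumption about how the steps might be coded; it is not the procedure of Chapter 3 or the DHS Excel templates, and all names and sizes are hypothetical.

```python
import random

def pps_systematic(sizes, n, rng):
    """Stage 1: select n units with probability proportional to size.
    Returns indices into `sizes`; a unit larger than the sampling
    interval can be selected more than once."""
    interval = sum(sizes) / n
    start = rng.random() * interval
    selected, cumulative, idx = [], 0.0, 0
    for k in range(n):
        point = start + k * interval
        # Advance until the selection point falls inside unit idx.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

def systematic_households(n_listed, take, rng):
    """Stage 2: equal probability systematic selection of `take` serial
    numbers from 1..n_listed (assumes take <= n_listed)."""
    interval = n_listed / take
    start = rng.random() * interval
    return [min(n_listed, int(start + k * interval) + 1) for k in range(take)]

rng = random.Random(2012)                          # fixed seed for repeatability
ea_sizes = [120, 340, 95, 210, 180, 400, 150, 260]  # hypothetical households per EA
sampled_eas = pps_systematic(ea_sizes, n=3, rng=rng)
sampled_households = systematic_households(n_listed=187, take=25, rng=rng)
print(sampled_eas, sampled_households)
```

The household serial numbers are the ones assigned after the listing operation, so the second function only needs the number of households listed in the cluster.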
1.9 Sample take per cluster

Once the total sample size is determined and allocated to different survey domains/strata, it should be decided how many individuals (sample take) should be interviewed per sample cluster, and the domain/stratum sample size should then be converted to a number of clusters. Since the survey cost can be very different across the survey domains/strata, the sample take can have a big influence on the total survey budget. With a fixed sample size, a small sample take is good for survey precision because of the reduction of the design effect, but is expensive because more clusters are needed. The number of clusters affects the survey budget more than the overall sample size due to the travel between clusters during data collection, which represents an important part of field costs in rural areas. The MEASURE DHS program proposes a sample take of about [...] women per rural cluster. In urban areas, the cost advantage of a large take is generally smaller, and MEASURE DHS recommends a take of about [...] women per urban cluster. Since in most DHS surveys the number of eligible women age 15-49 is very close to one per household, the sample take of individuals is equivalent to the sample take of households; therefore, in the following sections we refer to the sample take (or cluster take) as the number of sample households per cluster.

1.9.1 Optimum sample take

The optimum number of households to be selected per cluster depends on the variable under consideration, the intracluster correlation ρ, and the survey cost ratio $c_1/c_2$, where $c_1$ represents the cost per cluster, including mainly the cost associated with travelling between the clusters for survey implementation (household listing and interview), while $c_2$ represents the cost per individual interview (the interviewing cost) and other costs of doing fieldwork within a cluster. A larger sample take per cluster and fewer clusters reduces survey field costs if the cost ratio is high, but it could also reduce the survey precision if the intracluster correlation is strong.
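The manual does not state a formula at this point. As an assumption, the sketch below uses the standard textbook result for two-stage sampling under a simple linear cost model, $m_{opt} = \sqrt{(c_1/c_2)(1-\rho)/\rho}$, together with the usual approximation $Deft^2 \approx 1 + (m - 1)\rho$. The function names and input values are hypothetical.

```python
import math

def optimum_take(cost_ratio, rho):
    """Textbook optimum cluster take: sqrt((c1/c2) * (1 - rho) / rho)."""
    return math.sqrt(cost_ratio * (1.0 - rho) / rho)

def approximate_deft(take, rho):
    """Approximate Deft for a take of `take` units per cluster,
    from Deft^2 = 1 + (take - 1) * rho."""
    return math.sqrt(1.0 + (take - 1.0) * rho)

# Hypothetical inputs: cost ratio c1/c2 = 10, intracluster correlation 0.05.
take = optimum_take(cost_ratio=10.0, rho=0.05)
print(round(take, 1), round(approximate_deft(take, 0.05), 2))
```

A higher cost ratio pushes the optimum take up and a stronger intracluster correlation pushes it down, which matches the qualitative statement in the paragraph above.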
The MEASURE DHS Program has accumulated information on sampling errors for selected variables for many surveys throughout the world. Using this information, Aliaga and Ren (2006) conducted a research study to determine the optimum sample take per cluster. The results of the study have informed current practice in DHS surveys. If the average cluster size is around 250 households, a sample take of [...] households per cluster is within the acceptable range in most surveys. The research also supports the practice of setting a larger sample take in rural clusters than in urban clusters. Usually, the cost ratio in urban areas is smaller than that in rural areas. This would lead to a smaller sample take in an urban cluster than in a rural cluster. In sum, this research indicates that for the most important survey indicators, a sample take of 20 to 25 households is appropriate in urban clusters and a sample take of 25 to 30 households is appropriate in rural clusters.

Based on values of $c_1/c_2$ and ρ obtained from eight surveys, Table 1.5 below shows optimal sample takes for the indicator "proportion of currently married women currently using any contraceptive method". This indicator has a moderate intracluster correlation relative to other important survey indicators.
Table 1.5 Optimal sample take for currently married women currently using any contraceptive method based on intracluster correlation ρ and survey cost ratio $c_1/c_2$ from past surveys

Columns: Country | Survey cost ratio $c_1/c_2$ | Intracluster correlation ρ | Optimal sample take
Rows: eight countries, Average
[Row values are missing from this copy.]

1.9.2 Variable sample take for self-weighting

A fixed sample take per cluster is easy for survey management and implementation, but it requires sampling weights that vary within a stratum. Different sampling weights result in larger sampling errors compared with a similar sample of constant weight within a sampling stratum, i.e., a self-weighting sample. A self-weighting sample consists of a sample of individuals in which each individual has the same probability of being selected, and therefore a constant sampling weight is used. In some cases a self-weighting sample is preferred for various reasons: it is equally representative for every individual of the target population; it reduces sampling errors.

Since the sample for DHS surveys is usually the result of a two-stage cluster sampling design, it is necessary to coordinate the sample take for each of the selected clusters. In an overall self-weighting sample, every individual in the target population has an equal probability of selection, which results in a proportional allocation. However, proportional allocation is not feasible when sampling domains are very different in size. Self-weighting at domain/stratum level, by contrast, is easy to achieve.
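The coordination that the rest of this section derives, a cluster take of $m_{hk} = m_h \, x^*_{hk} / x_{hk}$ with a cut-off between 10 and 50 households, can be sketched as follows. The function names and the stratum and cluster figures are hypothetical; the sketch is meant only to show that, before rounding and cut-offs, every household in the stratum ends up with the same overall selection probability.

```python
def cluster_probability(n_clusters, frame_households, stratum_households):
    """pi_hk = n_h * x_hk / X_h (PPS selection of the cluster)."""
    return n_clusters * frame_households / stratum_households

def self_weighting_take(avg_take, listed, frame):
    """m_hk = m_h * x*_hk / x_hk: take more if more households were listed
    than the frame reported, fewer if fewer were listed."""
    return avg_take * listed / frame

def apply_cutoff(take, minimum=10, maximum=50):
    """Keep the take within the practical range; clusters where the
    cut-off applies are no longer exactly self-weighting."""
    return max(minimum, min(maximum, take))

# Hypothetical stratum: X_h = 50,000 households, n_h = 20 clusters, m_h = 25.
stratum_households, n_clusters, avg_take = 50_000, 20, 25
clusters = [(200, 260), (300, 240), (150, 150)]   # (frame count, listed count)

overall = []
for frame, listed in clusters:
    pi = cluster_probability(n_clusters, frame, stratum_households)
    take = self_weighting_take(avg_take, listed, frame)
    overall.append(pi * take / listed)            # household selection probability
    print(frame, listed, apply_cutoff(round(take)))
```

Each entry of `overall` equals $n_h m_h / X_h$, here 0.01, whatever the ratio of listed to frame households in the cluster.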
Let n be the total number of clusters selected for a DHS survey; let $n_h$ be the number of clusters allocated to the h-th stratum; let $X_h$ be the total number of households in stratum h; let $x_{hk}$ be the number of households in cluster k of stratum h, given by the sampling frame; then the selection probability of cluster k in stratum h is given by:

$$ \pi_{hk} = \frac{n_h \, x_{hk}}{X_h} $$

Let $x^*_{hk}$ be the number of households listed in the cluster in the household listing operation, and let $m_h$ be the number of households to be selected from the cluster for a fixed sample take; then the overall selection probability of a household in the cluster is given by:

$$ f_{hk} = \pi_{hk} \, \frac{m_h}{x^*_{hk}} = \frac{n_h \, x_{hk}}{X_h} \, \frac{m_h}{x^*_{hk}} $$

If $x_{hk} = x^*_{hk}$ exactly for all k in stratum h, then it is easy to see that self-weighting is achieved in stratum h by a constant sample take $m_h$ in all clusters, since

$$ f_h = \frac{n_h \, m_h}{X_h} $$

is a constant in stratum h. In practice, it is not possible that $x_{hk} = x^*_{hk}$ for all h and k, especially when the last population census is no longer new. Therefore there is a need for sample coordination in order to achieve self-weighting. Let $f_h$ and $m_h$ be the calculated sampling fraction and average sample take in stratum h according to the sample allocation, with $m_h = f_h X_h / n_h$; the number of households needed to achieve self-weighting in cluster k of stratum h is given by

$$ m_{hk} = \frac{f_h X_h}{n_h} \, \frac{x^*_{hk}}{x_{hk}} = m_h \, \frac{x^*_{hk}}{x_{hk}} $$

which is a function of the ratio of the number of households listed over the number of households given in the sampling frame for every cluster: take more if more are listed or take fewer if fewer are listed. The above formula also shows that the sampling fraction is not a necessary parameter for sample take calculation. Using the designed average sample take is a more direct method because the sampling fraction is an abstract number. This formula is used in the self-weighting household selection templates presented in Chapter 3, Section 3.2. The relationship between the sample take and the cluster selection probability is given by

$$ m_{hk} = f_h \, \frac{x^*_{hk}}{\pi_{hk}} $$

For practical considerations, the sample take calculated above needs to be adjusted if it is too small or too large. Usually, we apply a cut-off to control the sample take within the range of a minimum of 10 households and a maximum of 50 households per cluster. For the clusters where the cut-off is applied, the sample is no longer self-weighting.

The advantages and disadvantages of a self-weighting sample can be summarized as:

Advantages:

1) Equally representative for every individual within a sampling stratum.
2) Reduced sampling errors.

Disadvantages:
1) Difficult for survey management (for example, to distribute the work-load) because of the variable sample take by cluster.
2) Difficult to control the expected sample size because of possible cut-offs, especially when the upper limit cut-offs are employed.
3) The self-weighting is not exact because of the rounding of the sample takes, and this will bring bias into the survey estimation.
4) Self-weighting at the national level will break down the specific sample allocation at the domain/stratum level and bring the sample allocation back to a proportional allocation.

It is possible to overcome the second and the third disadvantages through a recursive calculation of sample take by re-distributing the cut-offs to the rest of the clusters in the stratum or control area, and by using a randomized sample take which allows non-integer numbers as sample size. Excel templates for both the traditional procedure and the revised procedure are available.

1.10 Household listing

The household listing operation is a fundamental operation in DHS surveys. After the EAs are selected for the survey, a complete listing of dwelling units/households in the selected EAs is conducted prior to the selection of households. The listing operation consists of visiting each of the selected clusters, collecting geographic coordinates of the cluster, drawing a location map of the cluster as well as a sketch map of the structures in the cluster, and recording on listing forms a description of every structure together with the names of the heads of the households in the structures and other characteristics. Mapping and listing of households represents a significant field cost, but it is essential to guarantee the exactness of sample implementation.

The listing operation is an important procedure for reducing non-sampling errors in the survey, especially when the sampling frame is outdated. The listing operation provides a complete list of occupied residential households in the EA. This information is necessary for an equal probability random selection of households in the second stage. With the household listing prior to the main survey, it is possible to pre-select the sample households in advance, and the interviewers are asked to interview only the pre-selected households without replacement of non-responding households. With the sketch map and the household listing of the cluster produced in the household listing operation, the sampled households can be easily relocated by interviewers later.
The fieldwork procedure for DHS surveys is designed to be replicable and therefore allows easy supervision; all these elements are designed to prevent serious bias during data collection.

It is sometimes suggested that listing could be avoided by making segments so small that they are equal to the required sample take per cluster. One could then use a take-all rule at the last stage of sampling. Such small segments, however, will generally be difficult to delineate. In planned urban areas, this difficulty may be reduced (one could adopt blocks, or even single buildings, as segments), but urban units of this kind are likely to be homogeneous, containing similar households, and therefore less than ideal as sampling clusters.

It is also not acceptable to attempt to avoid listing altogether by having interviewers create clusters as they go along, or by selecting the sample households at fixed intervals during a random walk up to a predetermined quota. Such methods are not acceptable because first, they do not guarantee a nonzero probability to every potential respondent; second, the procedure is not replicable, which complicates the field work supervision; and third, it can end up with a sample of easy units because of the lack of effort to make call backs to households or individuals who were not available at the first attempt to interview.

Listing costs can be reduced by using segmentation to decrease the size of the area which has to be listed; however, segmentation generates its own costs, and skill in map making and map interpretation is required. Segmentation becomes progressively more difficult as segments become smaller because there are not enough natural boundaries to delineate very small segments. Moreover, concentration of the sample into smaller segments increases the sampling error. Since neighbors' characteristics are correlated, a smaller segment captures less of the variety existing in the population; this leads to less efficient sampling. There is a point beyond which it is not useful to attempt further segmentation.
As a general rule the average segment size should not be less than [...] in population (approximately 100 households) in both urban and rural areas. However, segmentation has less economic effect in urban areas because urban EAs are in general small geographic areas.

It is quite probable that some traditional tools in the household listing process will be modified in the future by using more sophisticated technology such as the Global Positioning System (GPS) in order to collect more precise location information for the selected EAs. With this new tool we can produce more precise distribution maps of the structures with less supervision than in the traditional approach. The main feature is that every selected EA and every selected structure/dwelling can be located with high precision and thus relocated later, if desirable. In addition, GPS information is used more and more in DHS data analysis and presentation. At present, though, the recommended protocol for collecting GIS information in DHS surveys is to collect one coordinate for every selected cluster.

See Chapter 2 for more details of the household listing operation.

1.11 Household selection in the central office

After the household listing operation, once the central office receives the completed listing materials for a cluster, they must first create a serial number for each of the occupied residential households, beginning with 1 and continuing to the total number of occupied residential households listed in the cluster. Occupied residential households are those households occupied at the time of the listing, even if the occupant refused to cooperate at the time of listing, and those households where the occupants were absent at the time of listing but neighbors confirmed that they would not be absent for a long period and would be at home during the period of the main survey. Only occupied residential households should be numbered. This serial number is an ID number for the households. The household selection procedure will be performed based on this serial number.
Whether or not a household is considered occupied at the time of the listing is very important because this fact will be related to the proportion of vacant households in the main survey.

The MEASURE DHS program has used several methods [3] for selecting households within clusters, including:

1) Systematic selection: From a random starting point, select every nth household (see Chapter 3 Section 3.2 for more details).
2) Systematic selection with runs: From a random starting point, select a group of sequential households called a run. Several runs may be used within a cluster. Runs are selected with systematic selection. Selecting households in runs can greatly reduce the amount of travel within a cluster during data collection, especially in rural clusters where households can be far apart.

[3] The MEASURE DHS program has developed various Excel templates for household selection in the central office: systematic selection, systematic selection with runs, and self-weighting selection with and without control of sample size and with or without runs. Once the household listing is completed, it is possible to just copy the number of households listed in a cluster into the spreadsheet and the spreadsheet will show the selected household numbers automatically. See Chapter 3 Section [...] for details.

The advantages of household selection in the central office can be summarized as:

1) It allows for a check of coverage of the household listing results before the main survey and for the review and possible relisting of problematic clusters in advance.
2) Sampled households are pre-determined, which prevents potential bias introduced by allowing the interviewers to select in the field which households are to be interviewed.
3) The field work procedure is exactly replicable, which provides the possibility of easy and close supervision of the field work.
4) It is easier to control the work load for each interviewing team.

However, in cases when travelling between clusters represents a substantial cost, it is possible to forego the step of selecting households in the central office. In such cases, the household listing operation and the main survey can be combined into a single field operation. No essential changes are needed in the household listing procedure or household numbering, but making a detailed sketch map for the cluster may not be necessary because the listing team and the interviewing team are the same, and the household interview will begin immediately after the listing, so identifying the exact selected households during a separate visit is no longer a problem. The household selection must be done in the field manually if portable computers are not available. Some manual selection procedures have been developed for this purpose. Household listing and interviewing are two very different jobs, so in surveys where listing, selection and interviewing take place in the same visit by the same staff, it may be necessary to conduct more extensive training of field teams before the field work begins and to supervise the teams more closely during the fieldwork. See Chapter 3 Section [...] for more details on manual household selection.

1.12 Household interviews

The household interview procedure is out of the scope of this manual since it is explained in detail in the interviewer's manual. This section will briefly discuss the main statistical points of the household interview.

After the household selection, interviewers will be recruited and trained for the household and individual interviews. The training of the interviewers is intensive, lasting at least four weeks for a standard DHS survey, and longer if the survey includes many biomarkers.
Prior to the training, a pretest of the questionnaire will be conducted in a small number of clusters not selected for the main survey to assess the quality of the questionnaires and the understanding of the translations by interviewers and respondents. Problems and potential errors observed in the pretest will be addressed and resolved prior to fieldwork training.

Once training is complete, teams of interviewers will be assigned a list of clusters, with a certain work load per team, and deployed to the field. Upon arrival in a new area, the interviewer team must first contact the local authorities for help to identify the correct cluster and to solicit cooperation during the field work. A team leader or supervisor is assigned for each interviewing team. The supervisor is responsible for cluster identification and should guarantee that the correct cluster will be interviewed. After checking the listing materials and verifying with the local authorities, the supervisor will distribute the sampled households among the interviewers.

After locating a selected household, the interviewer will begin with a brief household interview, listing household members and visitors, and identifying among them all eligible women and men for the individual interview. Eligible individuals are defined as those who are in the specified age group (15-49), and are either usual members of the selected household or who slept in the household the night before the interviewer's visit. Conscious omission of eligible individuals on the part of an interviewer by mis-reporting their age as outside of the eligible age group is a real concern. Measures to eliminate this problem should be undertaken. For example, the field editor should check the consistency of each completed questionnaire and, if anything suspicious is identified, should return to the household for further verification of key items such as the number of household members, number of eligible individuals and number of children under age five.
In the event of failure to contact a household or an eligible person on the first visit, the interviewer is required to make at least two repeat visits, or call backs, on different days and at
different times of the day before the interview is abandoned. The process of making call backs requires the teams to stay in a cluster for at least two to three days. Some countries propose large interviewing teams in order to try to cover an entire cluster in one day. This process is not acceptable for a DHS survey, even when the designed sample size can bear a large non-response rate, because non-response biases the survey results. A quick survey usually ends up with poor data quality. Both theory and practice show that call backs and efforts to get difficult units to respond to the survey are the best way to remove bias and reduce non-sampling errors to a minimum. For more details, refer to the DHS Survey Organization Manual and the Interviewer's Manual.

Sampling weight calculation

Why we need to weight the survey data

A DHS sample is a representative sample randomly selected from the target population. Each interviewed unit (household and individual) represents a certain number of similar units in the target population. In order for any statistical inferences drawn from the survey data to be valid, this representativeness of the sample must be taken into account. In general terms, sampling weights are used to make the sample more like the target population. All analyses should use the sampling weights calculated for each interviewed household and for each interviewed individual.

A sampling weight is an inflation factor which extrapolates the sample to the target population. For example, if equal probability sampling (or a self-weighting sample) is applied in a domain with a sampling fraction of 1/500, each sampled individual represents 500 similar individuals in the target population. Therefore, if we observed one particular individual having secondary education, we would conclude that there are 500 individuals in the target population having secondary education, corresponding to this particular individual.
The total number of individuals with secondary education in the target population would be 500 times the total number of interviewed individuals having secondary education observed in the sample. This explanation also applies to unequal probability sampling.

It is very important that sampling weights are properly calculated and applied in data analysis; otherwise, serious bias may be introduced, leading to incorrect conclusions. Although all of the DHS indicators are means, proportions, rates or ratios, a nationwide self-weighting sample is not usually feasible because of the study domains, as explained in Section 1.9, so sampling weights are always necessary. Even when a survey is designed to be nationally self-weighting, it is necessary to correct for the different response patterns across domains/strata (see the section on correction of unit non-response and calculation of sampling weights for more details). Therefore, even surveys with self-weighting sample designs require the use of sampling weights. Though the effect of sampling weights on survey indicators may be small, it is necessary to use sampling weights for the following reasons:

1) For valid statistical inference.
2) For correcting or reducing bias; weighting can reduce bias introduced by non-response or other non-sampling errors.
3) For keeping the weighted sample distribution close to the target population distribution, especially when oversampling is applied in certain domains/strata.

Design weights and sampling weights

The MEASURE DHS program calculates both design weights and sampling weights (or survey weights) for both households and individuals. The design weight of a sampling unit (household or
individual) is the inverse of the overall probability with which the unit was selected in the sample. The sampling weight of a sampling unit is the design weight corrected for non-response or other calibrations. Since the DHS protocol involves no selection of eligible individuals within a sampled household (except for the domestic violence module, in which one eligible woman is selected from a sampled household), all eligible individuals from the same household share the same design weight, which is the same as the household's design weight. Therefore, the design weight is the basic weight for DHS surveys. All other weights are calculated based on the design weight.

In calculating the sampling weight, it is possible to correct for both unit non-response (a sampling unit is not interviewed at all) and item non-response (the sampling unit does not provide an answer to a specific question). The policy of the MEASURE DHS program is to correct for unit non-response at the stratum level (see the section on correction of unit non-response and calculation of sampling weights) and leave the correction of item non-response to data users because it is variable specific. Correction of unit non-response at the cluster level would increase the variability of sampling weights and therefore increase sampling errors. Because the correction for unit non-response is the same for an entire cluster and because household selection within a cluster is an equal probability selection, all the households in the same cluster share the same design weight and sampling weight, and the same is true for all individuals in the same cluster. This means that the DHS weights (both design weights and sampling weights) are cluster weights.

How to calculate the design weights

Assuming that a DHS survey sample is drawn with two-stage, stratified cluster sampling, design weights are calculated based on the separate sampling probabilities for each sampling stage and for each cluster.
We use the following notations:

$P_{1hi}$: first-stage sampling probability of the i-th cluster in stratum h
$P_{2hi}$: second-stage sampling probability within the i-th cluster (household selection)

Let $n_h$ be the number of clusters selected in stratum h; let $M_{hi}$ be the measure of size of the cluster used in the first stage's selection (usually the measure of size is the number of households residing in the cluster according to the sampling frame); let $\sum M_{hi}$ be the total measure of size in stratum h. The probability of selecting the i-th cluster in the sample is calculated as follows:

$$\frac{n_h M_{hi}}{\sum M_{hi}}$$

Let $b_{hi}$ be the proportion of households in the selected cluster compared to the total number of households in EA i in stratum h if the EA is segmented; otherwise $b_{hi} = 1$. Then the probability of selecting cluster i in the sample is:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum M_{hi}} \times b_{hi}$$

Let $L_{hi}$ be the number of households listed in the household listing operation in cluster i in stratum h; let $t_{hi}$ be the number of households selected in the cluster. The second-stage selection probability for each household in the cluster is calculated as follows:

$$P_{2hi} = \frac{t_{hi}}{L_{hi}}$$
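The two stage-specific probabilities defined above can be computed directly from the design parameters. The following is a minimal sketch with hypothetical function names and illustrative numbers (20 clusters selected in a stratum of 50,000 households); none of the values come from an actual survey.

```python
def first_stage_probability(n_h, m_hi, m_h_total, b_hi=1.0):
    """P_1hi = n_h * M_hi / sum(M_hi) * b_hi.

    n_h: clusters selected in the stratum
    m_hi: measure of size of cluster i (households in the frame)
    m_h_total: total measure of size of the stratum
    b_hi: share of the EA's households in the selected segment (1 if unsegmented)
    """
    return n_h * m_hi / m_h_total * b_hi


def second_stage_probability(t_hi, l_hi):
    """P_2hi = t_hi / L_hi (households selected over households listed)."""
    return t_hi / l_hi


# Hypothetical cluster: 250 households in the frame, 260 found at listing,
# 25 selected for interview.
p1 = first_stage_probability(n_h=20, m_hi=250, m_h_total=50_000)
p2 = second_stage_probability(t_hi=25, l_hi=260)

# Same EA, but segmented so that the selected segment holds half its households.
p1_segmented = first_stage_probability(n_h=20, m_hi=250, m_h_total=50_000, b_hi=0.5)

print(p1, p2, p1_segmented)
```

Note that the first stage uses the frame count (250) while the second stage uses the listing count (260); keeping the two apart is one reason the design parameters need to be documented cluster by cluster.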
The overall selection probability of each household in cluster i of stratum h is therefore the product of the selection probabilities of the two stages:

$$P_{hi} = P_{1hi} \times P_{2hi}$$

The design weight for each household in cluster i of stratum h is the inverse of its overall selection probability:

$$d_{hi} = 1 / P_{hi}$$

The calculation of the design weight is not complicated; however, difficulties often arise because not all of the design parameters involved in the above calculation are available, since they are not well documented, especially when the sampling frame is a master sample. See Chapter 5 for more details on sample documentation.

Correction of unit non-response and calculation of sampling weights

The design weight calculated above is based on sample design parameters. If there is no non-response at the cluster level, at the household level, or at the individual level, the design weight is enough for all analyses, for both household indicators and individual indicators. However, non-response is inevitable in all surveys, and different units have different response behaviors. The experience of the MEASURE DHS program shows that urban households are less likely to respond to the survey than their counterparts in rural areas, households in developed regions are less likely to respond than their counterparts in less-developed regions, rich households are less likely to respond than poor households, individuals with higher levels of education are less likely to respond than those with lower levels of education, men are less likely to respond than women, and so forth.

The idea of correcting for unit non-response is to calculate a response rate for each homogeneous response group, then inflate the design weight by dividing it by the response rate for each response group. The construction of homogeneous response groups depends on knowledge of the response behavior of the sampling units.
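As a worked illustration of the two ideas above, consider a hypothetical cluster with first-stage probability 0.1 and 25 households selected out of 260 listed, in a response group whose response rate is assumed to be 0.8. All numbers are illustrative only.

$$P_{hi} = 0.1 \times \frac{25}{260} \approx 0.009615, \qquad d_{hi} = \frac{1}{P_{hi}} = \frac{260}{0.1 \times 25} = 104$$

$$\text{corrected weight} = \frac{d_{hi}}{0.8} = \frac{104}{0.8} = 130$$

Each interviewed household in this cluster would then stand for 130 households in the target population instead of 104, which compensates for the households that did not respond.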
DHS surveys always use the sampling stratum as the response group because the stratification is usually achieved by regrouping homogeneous sampling units in a single stratum. It is possible to use a cluster as a response group, but the disadvantage is that the response rates may vary too much at the cluster level, which increases the variability of the sampling weight, which in turn increases the sampling variance. Furthermore, correction of non-response at the cluster level will interfere with self-weighting if a self-weighting sample has been designed.

Assuming that the response groups coincide with the sampling strata, the following steps explain how to calculate the sampling weight by first calculating the various response rates for unit non-response. Please note that the response rates calculated here are different from the response rates calculated in Appendix A of DHS survey final reports. In Appendix A, household and individual response rates are calculated as ratios of the number of interviewed units to the number of eligible units because the aim is just to show the results of survey implementation. Here we use weighted ratios because the aim is to correct the design weight to compensate for non-response; therefore the design weight should be involved. Because a non-responding unit with a large design weight will have a larger impact on survey estimates than a non-responding unit with a small design weight, a weighted response rate is better than an unweighted response rate for the correction of non-response.
1. Cluster level response rate

Let $n_h$ be the number of clusters selected in stratum h; let $n_h^*$ be the number of clusters interviewed. The cluster level response rate in stratum h is therefore

$$R_{ch} = n_h^* / n_h$$

2. Household level response rate

Let $m_{hi}$ be the number of households found (see Chapter 2, Section 2.10 for the definition) in cluster i of stratum h; let $m_{hi}^*$ be the number of households interviewed in the cluster. The household response rate in stratum h is calculated by

$$R_{hh} = \sum d_{hi} m_{hi}^* \Big/ \sum d_{hi} m_{hi}$$

where $d_{hi}$ is the design weight of cluster i in stratum h; the summation is over all clusters in stratum h.

3. Individual response rate

Let $k_{hi}$ be the number of eligible individuals found in cluster i of stratum h; let $k_{hi}^*$ be the number of individuals interviewed. The individual response rate in stratum h is calculated as

$$R_{ph} = \sum d_{hi} k_{hi}^* \Big/ \sum d_{hi} k_{hi}$$

where $d_{hi}$ is the design weight of cluster i in stratum h; the summation is over all clusters in stratum h.

The household sampling weight of cluster i in stratum h is calculated by dividing the household design weight by the product of the cluster response rate and the household response rate, for each of the sampling strata:

$$D_{hi} = d_{hi} / (R_{ch} \times R_{hh}), \text{ for cluster } i \text{ of stratum } h.$$

The individual sampling weight of cluster i in stratum h is calculated by dividing the household sampling weight by the individual response rate, or equivalently, by dividing the household design weight by the product of the cluster response rate, the household response rate and the individual response rate, for each of the sampling strata:

$$W_{hi} = D_{hi} / R_{ph} = d_{hi} / (R_{ch} \times R_{hh} \times R_{ph}), \text{ for cluster } i \text{ of stratum } h.$$

It is easy to see that the difference between the household sampling weights and the individual sampling weights is introduced by individual non-response. The sampling weights for households selected for the men's survey and for men can be calculated similarly.
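The three response rates and the two sampling weights above can be computed stratum by stratum. The sketch below assumes a hypothetical input layout (one dictionary per interviewed cluster, with keys `d`, `m`, `m_int`, `k`, `k_int`) and illustrative numbers; it is not the layout of the DHS Excel templates.

```python
def stratum_sampling_weights(clusters, n_selected):
    """Compute response rates and sampling weights for one stratum.

    clusters: one dict per *interviewed* cluster with
        d      design weight of the cluster
        m      households found,            m_int  households interviewed
        k      eligible individuals found,  k_int  individuals interviewed
    n_selected: number of clusters selected in the stratum.
    Returns (R_ch, R_hh, R_ph, [(D_hi, W_hi), ...]).
    """
    r_ch = len(clusters) / n_selected
    # Response rates are weighted by the design weight, as in the text.
    r_hh = (sum(c["d"] * c["m_int"] for c in clusters)
            / sum(c["d"] * c["m"] for c in clusters))
    r_ph = (sum(c["d"] * c["k_int"] for c in clusters)
            / sum(c["d"] * c["k"] for c in clusters))
    weights = []
    for c in clusters:
        household_weight = c["d"] / (r_ch * r_hh)      # D_hi
        individual_weight = household_weight / r_ph    # W_hi
        weights.append((household_weight, individual_weight))
    return r_ch, r_hh, r_ph, weights


# Hypothetical stratum with two clusters, both interviewed.
clusters = [
    {"d": 100.0, "m": 25, "m_int": 24, "k": 30, "k_int": 27},
    {"d": 200.0, "m": 25, "m_int": 20, "k": 28, "k_int": 28},
]
r_ch, r_hh, r_ph, weights = stratum_sampling_weights(clusters, n_selected=2)

unweighted_hh_rate = sum(c["m_int"] for c in clusters) / sum(c["m"] for c in clusters)
print(r_ch, round(r_hh, 4), round(unweighted_hh_rate, 4), round(r_ph, 4))
```

In this example the weighted household response rate is lower than the unweighted one because the cluster with the larger design weight also has more non-response, which is the situation the weighted rate is meant to capture.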
We need a separate household sampling weight for the men's survey in cases where the men's survey is conducted in a sub-sample of households selected for the women's survey, because the response behavior of households in the men's survey sub-sample may differ from the overall household response behavior.

If no normalization is requested, we can stop here. The household sampling weight and individual sampling weight calculated above can be used to produce any indicators at the household level
and the individual level, respectively. As mentioned earlier in the section on why we need to weight the survey data, a sampling weight is an inflation or extrapolation factor. The weighted sum of households interviewed

$$T = \sum\sum D_{hi} m_{hi}^*$$

is an unbiased estimate of the total number of ordinary residential households in the country, where $m_{hi}^*$ is the number of households interviewed in the i-th cluster of stratum h, and the summation is over all clusters and strata in the total sample. Similarly, the weighted sum of all interviewed women

$$W = \sum\sum W_{hi} k_{hi}^*$$

is an unbiased estimate of the total number of women in the target population (women age 15-49) of the country, where $k_{hi}^*$ is the number of women interviewed in the i-th cluster of stratum h, and the summation is over all clusters and strata in the total sample.

Normalization of sampling weights

Normalization of sampling weights is not necessary for survey data analysis. In order to prevent large numbers of weighted cases in the tables in DHS survey final reports, it is the MEASURE DHS tradition to calculate normalized standard weights for both households and individuals. With the normalized standard weight, the number of unweighted cases coincides with the number of weighted cases at the national level for both total households and total individuals. The normalized standard weight of a sampling unit is calculated by multiplying its sampling weight by a single constant at the national level. The constant, or normalization factor, is the total number of completed cases divided by the total number of weighted cases (based on the sampling weight). This number is equal to the estimated total sampling fraction because the total number of weighted cases with the sampling weight is an estimate of the total target population. Therefore the standard weights in the DHS data files are relative weights. Relative weights can be used to estimate means, proportions, rates and ratios because the normalization factor cancels out when it appears in both the numerator and the denominator, so it has no effect on the calculated indicator values.
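The normalization step and the cancellation argument above can be checked numerically. This is a minimal sketch with hypothetical cluster-level inputs (sampling weights, completed cases and cases with some characteristic per cluster); the function names are illustrative, not part of any DHS tool.

```python
def normalize(sampling_weights, completed):
    """Return (normalization factor, normalized standard weights).

    factor = total completed cases / total weighted cases (sampling weight).
    """
    total_completed = sum(completed)
    total_weighted = sum(w * c for w, c in zip(sampling_weights, completed))
    factor = total_completed / total_weighted
    return factor, [w * factor for w in sampling_weights]


def weighted_proportion(weights, completed, positives):
    """Weighted share of completed cases that have some characteristic."""
    numerator = sum(w * p for w, p in zip(weights, positives))
    denominator = sum(w * c for w, c in zip(weights, completed))
    return numerator / denominator


# Hypothetical sample of three clusters.
sampling_weights = [120.0, 300.0, 80.0]   # cluster sampling weights
completed = [24, 20, 25]                  # completed cases per cluster
positives = [6, 10, 5]                    # cases with the characteristic

factor, standard_weights = normalize(sampling_weights, completed)

weighted_cases = sum(w * c for w, c in zip(standard_weights, completed))
p_sampling = weighted_proportion(sampling_weights, completed, positives)
p_standard = weighted_proportion(standard_weights, completed, positives)
print(weighted_cases, p_sampling, p_standard)
```

With the standard weights, the weighted number of cases matches the 69 unweighted cases, and the proportion is the same under either set of weights, up to floating-point rounding.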
This point also explains why the normalization must be done at the national level and not the regional level: at the regional level, the normalization factor does not cancel out, and bias will be introduced in the calculated indicator values. Because the normalized standard weights have no scale, they are not valid for estimating totals. Also, the normalized weight is not valid for pooled data, even for data pooled for women and men in the same survey, because the normalization factor is country and sex specific.

1. Normalized household standard weight [4]

The normalization factor for calculating the household standard weight is calculated as

$$FH = \sum\sum m_{hi}^* \Big/ \sum\sum D_{hi} m_{hi}^*$$

The household standard weight for cluster i in stratum h is calculated by

$$HV005_{hi} = D_{hi} \times FH = D_{hi} \times \sum\sum m_{hi}^* \Big/ \sum\sum D_{hi} m_{hi}^*$$

where HV005 is the household standard weight variable in the DHS Recode data files. It is easy to see that the weighted sum of households interviewed using the standard weight equals the unweighted sum of households interviewed for the total sample. This condition will not be met at the domain level or for sub-populations. At the domain level, the weighted sum of households interviewed may be larger or smaller than the unweighted sum of households interviewed, depending on whether the domain is undersampled or oversampled.

[4] The MEASURE DHS program has developed Excel templates for facilitating standard weight calculations. If all design parameters and the survey results (number of households found and interviewed, number of eligible women found and interviewed, number of eligible men found and interviewed, number of eligible women and men found and tested, by cluster) are provided in the input page, the standard weights will be calculated automatically in different pages.

2. Normalized women's standard weight

The normalization factor for calculating the women's standard weight is calculated as

$$FW = \sum\sum k_{hi}^* \Big/ \sum\sum W_{hi} k_{hi}^*$$

The women's standard weight for cluster i in stratum h is calculated by

$$V005_{hi} = W_{hi} \times FW = W_{hi} \times \sum\sum k_{hi}^* \Big/ \sum\sum W_{hi} k_{hi}^*$$

where V005 is the women's standard weight variable in the DHS Recode data files.

The standard weights for households selected for the men's survey and for men can be calculated in a similar way.

Standard weights for HIV testing

The sampling weights for HIV testing are calculated separately for women and men, but they are calculated using the same methodology. The only difference is in the calculation of the normalization factors, if a normalized weight is requested. In order to calculate the weighted HIV prevalence for women and men together using a normalized weight, the standard weight for HIV testing must be normalized for women and men together.

In most DHS surveys, HIV testing is conducted in the same subsample of households selected for the men's survey, and every woman or man in the household who is eligible for the individual interview is eligible for HIV testing. Once the household sampling weight for the men's survey is calculated using the procedures stated in the section on correction of unit non-response and calculation of sampling weights, the sampling weights for HIV testing for women and men may be calculated separately by correcting the household sampling weight for the non-response rates of women and men for HIV testing, respectively.
For simplicity, let $MD_{hi}$ be the household sampling weight in cluster i of stratum h for the men's survey sub-sample. The response rates to HIV testing for women and men are calculated respectively by

$$WR_h = \sum MD_{hi} \, WHIV_{hi}^* \Big/ \sum MD_{hi} \, WHIV_{hi}$$

$$MR_h = \sum MD_{hi} \, MHIV_{hi}^* \Big/ \sum MD_{hi} \, MHIV_{hi}$$

where $WHIV_{hi}$ is the number of women eligible for HIV testing and $WHIV_{hi}^*$ is the number of women tested with a valid test result, in cluster i of stratum h; $MHIV_{hi}$ and $MHIV_{hi}^*$ are the number of men eligible and the number of men tested with a valid test result, respectively, in cluster i of stratum h. The sampling weights for HIV testing for women and men, respectively, are calculated by

$$HIV_{hi}^W = MD_{hi} / WR_h, \qquad HIV_{hi}^M = MD_{hi} / MR_h$$
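The HIV testing response rates and sampling weights above follow the same pattern as the household and individual corrections. A minimal sketch for one stratum follows, assuming a hypothetical input layout (keys `md`, `w_elig`, `w_tested`, `m_elig`, `m_tested`) and illustrative numbers.

```python
def hiv_sampling_weights(clusters):
    """Compute WR_h, MR_h and the HIV testing sampling weights for one stratum.

    clusters: one dict per cluster with
        md        household sampling weight for the men's survey sub-sample
        w_elig    women eligible for testing,  w_tested  women with a valid result
        m_elig    men eligible for testing,    m_tested  men with a valid result
    Returns (WR_h, MR_h, [(HIV_W, HIV_M), ...]).
    """
    wr = (sum(c["md"] * c["w_tested"] for c in clusters)
          / sum(c["md"] * c["w_elig"] for c in clusters))
    mr = (sum(c["md"] * c["m_tested"] for c in clusters)
          / sum(c["md"] * c["m_elig"] for c in clusters))
    weights = [(c["md"] / wr, c["md"] / mr) for c in clusters]
    return wr, mr, weights


# Hypothetical stratum with two clusters.
clusters = [
    {"md": 150.0, "w_elig": 12, "w_tested": 11, "m_elig": 10, "m_tested": 8},
    {"md": 250.0, "w_elig": 15, "w_tested": 12, "m_elig": 9, "m_tested": 9},
]
wr, mr, hiv_weights = hiv_sampling_weights(clusters)
print(round(wr, 4), round(mr, 4), hiv_weights)
```

A useful sanity check is that the women tested, weighted by their HIV testing weight, add up to the weighted number of eligible women in the stratum, and likewise for men.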
In cluster i of stratum h, the normalized standard weights for HIV testing for women and men, respectively, are calculated by

$$HIV05_{hi}^W = HIV_{hi}^W \times \frac{\sum\sum \left( WHIV_{hi}^* + MHIV_{hi}^* \right)}{\sum\sum \left( HIV_{hi}^W \, WHIV_{hi}^* + HIV_{hi}^M \, MHIV_{hi}^* \right)}$$

$$HIV05_{hi}^M = HIV_{hi}^M \times \frac{\sum\sum \left( WHIV_{hi}^* + MHIV_{hi}^* \right)}{\sum\sum \left( HIV_{hi}^W \, WHIV_{hi}^* + HIV_{hi}^M \, MHIV_{hi}^* \right)}$$

where the double summations are over all clusters and strata in the total sample.

De-normalization of standard weights for pooled data

For all of the DHS data, the weight variables HV005 (household standard weight), V005 (women's standard weight) and MV005 (men's standard weight) are relative weights which are normalized so that the total number of weighted cases is equal to the total number of unweighted cases, for the three kinds of units. In some situations, such as analyses involving data from more than one survey, data users may need the un-normalized sampling weight for analyzing pooled data. As mentioned in the section on normalization of sampling weights, since normalization is country specific and sex specific, it is necessary to de-normalize the standard weights provided in the DHS Recode data files for analyzing pooled data.

The normalization procedure consists of multiplying the sampling weight by a normalization factor for the total sample. The normalization factor is the estimated total sampling fraction: the number of completed cases divided by the number of weighted cases using the sampling weight, for each kind of sampling unit. The weighted number of cases with the sampling weight is an estimate of the total target population. Therefore, in order to de-normalize a normalized weight, simply divide the normalized weight by the total sampling fraction. The estimated total sampling fraction is usually not provided in the DHS data file or in the final report. In order to calculate the total sampling fraction, it is necessary to know the total target population at the time of the survey. The total target population at the time of the survey is easy to get from various sources.
The country's statistical office, the United Nations Population Division's (UNPD) World Population Prospects, and the United Nations Population Fund (UNFPA) are three sources that may be easy to access.

As mentioned above, if pooled data analysis is required, the standard weight variables HV005, V005 and MV005 must be rescaled or de-normalized. The de-normalization procedure is the inverse of the normalization procedure: that is, multiply the standard weight by the target population and divide by the number of completed cases, for each survey. The de-normalized weights for households, women and men (HV005*, V005*, and MV005*, respectively) can be calculated using the following formulas:

HV005* = HV005 × (total number of residential households in the country) / (total number of households interviewed in the survey)

V005* = V005 × (total female population in the target age group (15-49) in the country) / (total number of women interviewed in the survey)

MV005* = MV005 × (total male population (15-59) in the country) / (total number of men (15-59) interviewed in the survey)
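The de-normalization formulas above amount to one multiplication and one division per survey. The sketch below uses a hypothetical helper and made-up population figures for two surveys. It also assumes the standard weight is already on its natural scale; in distributed recode files the weight variables are commonly stored with six implied decimal places, in which case they would first need to be divided by 1,000,000.

```python
def denormalize(standard_weight, target_population, completed_cases):
    """Rescale a normalized standard weight to the sampling-weight scale.

    standard_weight: HV005, V005 or MV005 on its natural scale (mean near 1)
    target_population: total target population at the time of the survey
    completed_cases: total completed interviews of that unit type in the survey
    """
    return standard_weight * target_population / completed_cases


# Hypothetical surveys to be pooled (illustrative figures only).
survey_a = {"women_population": 5_000_000, "women_interviewed": 10_000}
survey_b = {"women_population": 1_000_000, "women_interviewed": 8_000}

v005_star_a = denormalize(1.2, survey_a["women_population"], survey_a["women_interviewed"])
v005_star_b = denormalize(0.9, survey_b["women_population"], survey_b["women_interviewed"])
print(v005_star_a, v005_star_b)
```

After rescaling, a respondent from the first survey counts for roughly 600 women and one from the second for roughly 112, so the pooled estimates reflect the relative sizes of the two target populations.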
If normalized weights are preferred, the re-scaled weights above can be re-normalized for the pooled data by multiplying them by the total number of completed women's and men's interviews combined and dividing by the total number of weighted cases combined (based on the re-scaled weights).

Note that the normalization of sampling weights is done for the total sample for households, women and men separately. If the aim is to tabulate indicators for a certain sub-population from pooled data, for example, vaccination coverage for children in a specified age range (in months), the de-normalization has nothing to do with the total population of children in that age range because there is no standard weight calculated for such children in DHS surveys. If the indicator is tabulated at the household level using the household weight, the household standard weights must be de-normalized for all of the surveys included in the analysis as explained above; likewise, if the indicator is tabulated at the individual level using the women's (or child's mother's) weight, the women's standard weights must be de-normalized for each of the surveys.

Calibration of sampling weights in case of bias

Generalized calibration (Deville and Särndal, 1992; Deville et al, 1993) has become a popular and powerful framework in survey data analysis for statistical offices in many countries. It allows different sources of auxiliary information to be used to improve estimates from sample surveys. Calibration can reduce sampling errors, can correct bias caused by non-response and other non-sampling errors, and can reduce the influence of extreme values. Calibration is a weight tuning procedure such that the tuned sampling weight produces estimates without error for known population characteristics. The precision of an estimator using a calibrated weight is equivalent to that of a regression estimator, but it is much easier to calculate with the help of calibration software such as CALMAR, a SAS macro developed by the French National Institute of Statistics and Economic Studies (INSEE), and the SPSS procedure developed by Statistics Belgium.
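To make the idea of weight tuning concrete, the sketch below applies linear calibration, one common choice in the Deville and Särndal framework, in which each sampling weight D_i is multiplied by 1 + q_i x_i'λ and λ is chosen so that the weighted auxiliary totals match the known totals exactly. It is an illustration in plain Python with hypothetical data (an intercept and an urban indicator as auxiliary variables), not a substitute for CALMAR or other production software.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_calibration(D, X, t_x, q=None):
    """Return calibrated weights W_i = D_i * (1 + q_i * x_i' lambda).

    D: sampling weights; X: auxiliary values per unit; t_x: known population totals.
    """
    n, p = len(D), len(t_x)
    q = q or [1.0] * n
    # Simple (sampling-weight) estimates of the auxiliary totals.
    t_pi = [sum(D[i] * X[i][j] for i in range(n)) for j in range(p)]
    # T_s = sum over the sample of D_i q_i x_i x_i'.
    T = [[sum(D[i] * q[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    lam = solve(T, [t_x[j] - t_pi[j] for j in range(p)])
    return [D[i] * (1 + q[i] * sum(X[i][j] * lam[j] for j in range(p)))
            for i in range(n)]


# Hypothetical sample: x_i = (1, urban indicator); known totals: 520 units, 280 urban.
D = [100.0, 100.0, 150.0, 150.0]
X = [[1, 1], [1, 0], [1, 1], [1, 0]]
t_x = [520.0, 280.0]

W = linear_calibration(D, X, t_x)
print([round(w, 6) for w in W])
```

In this example the urban units are weighted up and the rural units slightly down, so that the calibrated weights reproduce both known totals while staying close to the original sampling weights.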
DHS surveys employ calibration of sampling weights only in cases where serious bias is observed in the collected data and reliable auxiliary information is available for the calibration.

Let X be a multivariate auxiliary variable with p components such that the population totals of each of the component variables are known beforehand from a recent population census, that is,

$$t_x = \sum_U x_i = (t_{x_1}, t_{x_2}, \ldots, t_{x_p})^\tau$$

is known. Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^\tau$ be the observations of the auxiliary variables from the survey for respondent sampling unit i. Let $D_i$ be the sampling weight for unit i. The calibration procedure consists of modifying the sampling weight slightly from $D_i$ to $W_i$ such that a given distance measure between the sampling weights $D_i$ and the calibrated weights $W_i$,

$$\sum_s g(W_i, D_i),$$

is minimized under the constraints

$$\sum_s W_i x_i = t_x$$

where g is a distance function which measures the distance between $D_i$ and $W_i$. The constraints imposed are that the known auxiliary variable totals are estimated without error with the calibrated weights. If the variable of interest is well correlated with the auxiliary variables, then we expect that the precision can be greatly improved for estimating the variable of interest. The calibration theory states that the calibrated weights have the following form:

$$W_i = D_i \, F\!\left(q_i x_i^\tau \lambda(s)\right)$$
where $F(\cdot)$ is called the calibration function, which is the inverse (reciprocal) function of the derivative of the distance function g; $q_i$ is a calibration weight which is usually set to 1 in the absence of prior knowledge; $\lambda(s)$ is a constant vector, depending on the particular sample s, which is to be solved for. When

$$F\!\left(q_i x_i^\tau \lambda(s)\right) = 1 + q_i x_i^\tau \lambda(s),$$

which corresponds to one of the five proposed calibration functions in Deville et al, 1993, the solution is straightforward and $\lambda(s)$ is given by

$$\lambda(s) = T_s^{-1} \left( t_x - \hat{t}_{\pi x} \right) \quad \text{with} \quad T_s = \sum_s D_i q_i x_i x_i^\tau$$

For a given variable of interest y, the calibrated estimator of the population total is equivalent to the generalized regression estimator

$$\hat{t}_y = \sum_s W_i y_i = \hat{t}_{\pi y} + \left( t_x - \hat{t}_{\pi x} \right)^\tau \hat{B}_s$$

where

$$\hat{B}_s = T_s^{-1} \sum_s q_i D_i x_i y_i$$

is the sample estimate of the regression coefficient; $\hat{t}_{\pi y}$ and $\hat{t}_{\pi x}$ are the simple estimators using the sampling weight:

$$\hat{t}_{\pi y} = \sum_s D_i y_i, \qquad \hat{t}_{\pi x} = \sum_s D_i x_i$$

A mean estimate of the variable of interest y can be calculated by

$$\hat{\bar{Y}} = \sum_s W_i y_i \Big/ \sum_s W_i$$

The calibration estimator can be equivalently formulated with known proportions of one or more auxiliary variables. The calibration can be conducted at the individual level, which will result in an individual specific weight, or it can be conducted at the cluster level with aggregated data, which will result in a cluster weight. For more details see the related references given at the end of this document.

Data quality and sampling error reporting

Data quality is always a major concern for all MEASURE DHS projects. Though numerous efforts are made in implementing DHS surveys to maximize the quality of the data collected, non-sampling errors remain the main concern for data quality. The data quality of a survey directly affects the reliability of the statistics produced. Many countries have laws that require reports of survey findings to include an evaluation of data quality and reliability. Data quality can be measured by total survey error, including bias introduced by various sampling and non-sampling errors.
DHS survey final reports usually include tables in an appendix for data quality evaluation purposes, including: age distributions of the household population by sex; age distributions of eligible and interviewed women and men; and completeness of reporting on date of birth, age at death, age/date at first union, education and anthropometric measures, etc. The MEASURE DHS program also conducts some in-depth studies on data quality for specific topics, which are provided in published reports.

Apart from the data quality tables, DHS survey final reports provide sampling errors for selected indicators in Appendix B. Sampling errors are important reliability measures which tell the user the degree of error associated with a particular estimated indicator value, the number of cases involved in the calculation of the indicator, the efficiency or clustering effects of the sample design compared to simple random sampling, and the range for the true value of an indicator at a certain
confidence level. The reader is referred to Chapter 4, Section 4.2 for more details on sampling errors and their calculation.

DHS survey final reports also provide an appendix on the sample design of the survey. The sample design document reports the survey methodology used for the survey, including the aim of the survey, the target population, the sample size, the reporting domains, the stratification and sample allocation, the sample selection procedure, sampling weight calculation, correction for non-response, calibration of sampling weights, and the results of survey implementation. See Chapter 5, Section 5.2 for more details on sample design.

Sample documentation

The task of a sampling statistician does not end with the selection of the sample. The preservation of sampling documentation is an essential requisite for sampling weight calculation, for sampling error computation, for data quality evaluation, for linkage with other data sources, and for various kinds of checks and supplementary studies. Special efforts are needed at the time of the sample design, at the end of the fieldwork, and at the completion of the data file if the task of sample documentation is to be carried out effectively. If preservation of documentation is delayed, considerable effort will be required to reconstitute the missing information when it is needed.

The sample documentation must comply with the survey confidentiality requirements. When HIV testing is conducted in a DHS or AIS (AIDS Indicator Survey), the confidentiality guidelines require the complete destruction of all intermediate documents which could potentially be used to identify any single household or individual who participated in the testing. This requirement reinforces the importance of timely sample documentation. See Chapter 5 for detailed requirements on sample documentation.

Confidentiality

The final data files for DHS surveys are made available to interested researchers.
Therefore, the confidentiality of private information collected from individual respondents is a major concern, especially when sensitive information such as sexual activity and HIV status is collected. Protecting the confidentiality of the individual respondent is not only an ethical obligation, but it also promotes more accurate data because respondents are more likely to provide truthful responses if they feel confident their information will be kept private. DHS surveys follow strict rules imposed at various steps during the survey implementation to prevent the direct or indirect disclosure of the identity of individual respondents.

The principal pieces of information that can indirectly identify an individual respondent are the cluster number, the household number, the cluster selection probability and the sampling weights. The cluster number is an important identifier for sampling error calculations; the household number is important for household level and individual level data management and tabulation; the cluster selection probability is useful for cluster level modeling; and sampling weights are necessary for all analysis. So these variables must be present in the final data file. The household number in the final DHS data file is not informative, and sampling weights are not informative after correction for non-response and normalization. The cluster selection probability is potentially informative only if lower level identification information such as district and locality is present, and DHS survey final data files do not provide geographic information below the level of region or survey domain, especially when HIV testing is conducted. Thus the only concern is the disclosure of the cluster. For DHS or AIS surveys with HIV testing, the final data files provide scrambled cluster and household numbers for further insurance against disclosure.
2 HOUSEHOLD LISTING OPERATION

2.1 Introduction

DHS surveys are nationwide sample surveys designed to provide information on the levels of fertility, infant and child mortality, use of family planning, knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI), and on other family welfare and health indicators. The surveys generally interview women and men in specified age groups (15-49 for men in some surveys). The women and men to be interviewed live in ordinary residential households which are randomly selected from a set of sample points consisting of clusters of households. Prior to interviewing, all households located in the selected clusters will be listed. The listing of households for each cluster will be used in selecting the final sample of households to be included in the DHS survey.

The listing operation consists of visiting each cluster, recording on listing forms a description of every structure together with the names of the heads of the households found in the structure, and drawing a location map of the cluster as well as a detailed sketch map of all structures in the cluster. These materials will guide the interviewers in finding the pre-selected households for interviewing and will allow field work supervisors to perform quality control during data collection.

The following sections present the general guidelines for conducting a household listing operation. Modifications may be needed to adapt to country specific situations.

2.2 Definition of terms

Following are brief definitions of the terms used in this document.

A census Enumeration Area (EA) is a geographical statistical unit created for a census and containing a certain number of households. An EA is usually a city block in urban areas and a village, a part of a village or a group of small villages in rural areas, with its location and boundaries well defined and recorded on census maps.

A cluster is the smallest geographical survey statistical unit for DHS surveys. It consists of a number of adjacent households in a geographical area.
For DHS surveys, a cluster corresponds either to an EA or a segment of a large EA.

A base map is a reference map that describes the geographical location and boundaries of an EA.

A structure is a free-standing building or other construction that can have one or more dwelling units for residential or commercial use. Residential structures can have one or more dwelling units (for example: single house, apartment structure).

A dwelling unit is a room or a group of rooms normally intended as a residence for one household (for example: a single house, an apartment, a group of rooms in a house); a dwelling unit can also have more than one household.

A household consists of a person or a group of related or unrelated persons, who live together in the same dwelling unit, who acknowledge one adult male or female 15 years old or older as the head of the household, who share the same housekeeping arrangements, and are considered as one unit. In some cases one may find a group of people living together in the same house, but each person has separate eating arrangements; they should be counted as separate one-person households. Collective living arrangements such as army camps, boarding schools, or prisons will not be considered as households. Examples of households are:
43 a man wth hs wfe or hs wves wth or wthout chldren a man wth hs wfe or hs wves, hs chldren and hs parents a man wth hs wfe or hs wves, hs marred chldren lvng together for some socal or economc reasons (the group recognze one person as household head) a wdowed or dvorced man or woman wth or wthout chldren The head of household s the person who s acknowledged as such by members of the household and who s usually responsble for the upkeep and mantenance of the household. A locaton map s a map produced n the household lstng operaton whch ndcates the man access to a cluster, ncludng man roads and man landmarks n the cluster. Sometmes t may be useful even to nclude some mportant landmarks n the neghborng cluster. A sketch map s a map produced n household lstng operaton, wth locaton or marks of all structures found n the lstng operaton whch helps the ntervewer to relocate the selected households. A sketch map also contans the cluster dentfcaton nformaton, locaton nformaton, access nformaton, prncpal physcal features and landmarks such as mountans, rvers, roads and electrc poles. 2.3 Responsbltes of the lstng staff Persons recruted to partcpate n the household lstng operaton wll work n teams consstng of two enumerators. A coordnator wll montor the entre operaton. The responsbltes of the coordnator are to: 1) obtan base maps for all the clusters ncluded n the survey; 2) arrange for the reproducton of all lstng materals (lstng manuals, mappng and lstng forms); the map nformaton forms and the household lstng forms must be prepared n suffcent numbers to cover all of the clusters to be vsted. 3) assgn teams to clusters; 4) montor the recepton of the completed lstng forms at the central offce; and 5) verfy that the qualty of work s acceptable. 
If GPS coordinates are being collected during the listing operation, the coordinator must also:

6) obtain one GPS receiver per listing team, plus two backup receivers, and tag each GPS receiver with a number;
7) ensure that all GPS receivers have the correct settings (see Section 2.6 below) and distribute a receiver to each field team;
8) obtain and copy all GPS training materials for listing staff; and
9) train all listing staff to record GPS waypoints in the GPS units as well as on Form DHS/1.
The responsibilities of the enumerators are to:

1) identify the boundaries of the cluster;
2) draw a location map showing the location of the cluster;
3) draw a detailed sketch map of the cluster showing the locations of all structures located in the cluster;
4) list all the households in the cluster in a systematic manner;
5) communicate to the coordinator problems encountered in the field and follow the coordinator's instructions; and
6) transfer the completed listing forms to the coordinator or to the central office.

If GPS coordinates are being collected during the listing operation, enumerators must also:

7) capture and record the GPS waypoint of the center of the cluster; and
8) complete the portion of Form DHS/1 designated for GPS information for each cluster.

The two enumerators in each team should work together at the same time in the same area. They will first identify the cluster boundaries together. Then one enumerator prepares the location map and the sketch map while the other does the household listing.

The materials needed for the household listing operation are:

- Manual for Household Listing
- Base map of the area containing the cluster
- Map Information Form (Form DHS/1)
- Household Listing Form (Form DHS/2)
- Segmentation Form (Form DHS/3)

If GPS coordinates are to be recorded during the listing operation, the following additional materials are needed:

- GPS receivers, batteries and cables
- GPS training manuals and handouts

2.4 Locating the cluster

The coordinator will provide the listing team with a base map containing the cluster assigned to the team. The listing team will typically make two tours of the cluster: the first to identify the cluster boundaries and to create the location map, and the second to create the listing and draw the sketch map. Upon arrival in a cluster, the team should first contact the local authorities for help in identifying the boundaries and to get general information on the cluster, for example, the rough number of residential households in the cluster.
In most cases, the cluster boundaries follow easily recognizable natural features such as streams or rivers, and construction features such as roads or railroads. In some cases, the boundaries may not be marked with visible features (especially in rural areas); attention should then be paid to locating the cluster boundaries as precisely as possible according to the detailed description of the cluster and its base map.

Before doing the listing, the team should tour the cluster to determine an efficient route of travel for listing all of the structures. The cluster should be divided into parts if possible. A part can be a block of structures. The listing team will make a location map of the cluster indicating the boundaries of the parts, as well as the relative location of landmarks, public structures (e.g., schools, religious structures, public offices and markets) and main roads. This location map will serve as a guide for the interviewing team when they begin data collection.

2.5 Preparing location and sketch maps

The coordinator will designate one enumerator of the team as the mapper. The second enumerator will be the lister. Although the two have separate tasks to perform, they must move together and work in close cooperation; the mapper prepares the maps, and the lister collects information on the structures (and corresponding households) indicated on the sketch map.

The mapping of the cluster and the listing of the households should be done in a systematic manner so that there are no omissions or duplications. If the cluster consists of a number of blocks, then the team should finish each block before going to the next adjacent block. Within each block, start at one corner of the block and move clockwise around it. In rural areas where structures are frequently found in small groups, the team should work in one group of structures at a time; in each group they can start at the centre (choosing any landmark, such as a school, to be the centre) and move around it clockwise.

In the first tour of the cluster, the mapper will prepare a location map of the cluster on the Map Information Form (Form DHS/1). First, fill in the identification box for the cluster on the first page. All information needed for filling in the identification box is provided by the coordinator. In the space provided on the second page, draw a map showing the location of the cluster and include instructions on how to get to the cluster. Include all information useful for finding the cluster and its boundaries directly on the map and, if necessary, in the space reserved for observations.
In the second tour of the cluster, using the third page of the Map Information Form, the mapper will draw a sketch map of all structures found in the cluster, including vacant structures and structures under construction. It is important that the mapper and lister work together and coordinate their activities, since the structure numbers that the mapper indicates on the sketch map must correspond to the serial numbers assigned by the lister on the listing form for the same structures.

On the sketch map, mark the starting point with a large X. Place a small square at the spot where each structure in the cluster is located. For any non-residential structure, identify its use (for example, a store or factory). Number all structures in sequential order beginning with "1". Whenever there is a break in the numbering of structures (for example, when moving from one block to another), use an arrow to indicate how the numbers proceed from one set of structures to another. Although it may be difficult to pinpoint the exact location of the structure on the map, even an approximate location is useful for finding the structure in the future. Add to the sketch map all landmarks (such as a park), public structures (such as a school or church), and streets or roads. Sometimes it is useful to add to the sketch map landmarks that are found outside the cluster boundaries, if they are helpful in identifying other structures inside the cluster.

Use the marker or chalk provided to write on the entrance to the structure the number that has been assigned to the structure. Remember that this is the serial number of the structure as assigned on the household listing form, which is the same as the number indicated on the sketch map. In order to distinguish the number from other numbers that may already exist on the door of the structure, write DHS in front of the number. For example, for structure number 5 write DHS/5; similarly, on the door of structure number 44 write DHS/44.

A structure is called a multi-unit structure if it contains more than one household.
Otherwise it is called a single-unit structure. All households found in a structure or multi-unit structure must be numbered from 1 to m within the structure (see footnote 6). The structure number plus the household number form a unique identification number for a household among all of the households in the cluster. For example, household number 3 in structure number 44 would be uniquely identified with ID number DHS/44-3. It is very useful to write the household ID number at the entrance of the household to help the interviewer later identify the household for interview.

2.6 Collecting a GPS waypoint for each cluster

A GPS waypoint is a latitude and longitude reading that represents a location. For some surveys, GPS data for EAs are available from the census. However, if the data are not available, or are of questionable quality, one GPS waypoint for each cluster should be recorded during the listing phase of the survey. These waypoints are recorded using a GPS unit (a Garmin ETREX unit is used in this guide) and data collection forms. If GPS units other than the Garmin ETREX are used, this guide will still be useful; however, some of the instructions may not apply due to differences in design and menus. The Garmin ETREX owner's manual may be useful to consult on the basics of the GPS unit.

Take one reading for each cluster. The GPS waypoints will be captured by the mapper while mapping the clusters. One GPS waypoint must be taken for each cluster, and in the case of large clusters which are being segmented, one point should be taken for each segment selected for listing. In DHS surveys, clusters are usually census EAs, sometimes villages in rural areas or city blocks in urban areas. Collecting only one waypoint for the cluster greatly reduces the chance of compromising the confidentiality of the respondents and at the same time is sufficient to allow for the integration of multiple datasets for further analysis. The DHS cluster waypoint should always be taken at the geographic center of the cluster or segment. If the cluster is segmented, the point should be taken for the segment chosen by the Mapping and Listing Coordinator to be included in the survey.
Save the waypoint and record the latitude, longitude, and altitude. The latitude, longitude, and altitude reading for a location are stored in two places: in the GPS unit's memory and on the DHS/1 paper form. GPS units can be broken or lost, and experience has shown that a hardcopy backup is essential. In addition, the paper form provides a backup should the data in the GPS unit be changed, deleted, or misidentified (i.e., the operator names the cluster incorrectly in the unit).

Each position saved in the GPS unit is called a waypoint, and each waypoint has a unique name. If possible, the waypoint ID should be the same as the DHS cluster number. If it is not possible, the waypoint ID should be unique to the cluster and recorded on Form DHS/1 (do not record the same waypoint ID for two different clusters). When a waypoint is saved, the GPS unit assigns it a default name. The mapper must edit the default name and change it to the 6-digit DHS cluster ID number. For example, the waypoints for DHS clusters 101 and 1101 would each be renamed to the corresponding 6-digit cluster ID.

After saving the waypoint, the mapper will use the identification box of the Map Information Form (Form DHS/1) to record the latitude, longitude, and altitude for the cluster and segment on paper. First, the mapper will write down the latitude and longitude coordinates in decimal degree format and the altitude in meters in the identification box. Second, the mapper will draw a circle on the map, in the middle of the cluster or segment, at the location where the waypoint was captured.

After the listing is complete, the GPS units must be collected as soon as possible and returned to the sampling office by the Mapping and Listing Coordinator. The waypoints will then be downloaded and examined for problems by the designated sampling staff. The Sampling Coordinator should designate one member of the Data Processing Team to receive and process the GPS waypoint file and then give the file to the survey manager.
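Part of the examination of downloaded waypoints can be automated. The following is a minimal sketch only: the record layout (the keys `cluster`, `waypoint_id`, `lat`, `lon`), the sign convention (negative values for South and West) and the function name `check_waypoints` are assumptions made for illustration, not part of the DHS tools. It flags coordinates outside the valid decimal-degree ranges and waypoint IDs that were recorded for more than one cluster.

```python
from collections import defaultdict


def check_waypoints(records):
    """Return a list of (cluster, problem) tuples for suspicious waypoints.

    Each record is a dict with the hypothetical keys 'cluster',
    'waypoint_id', 'lat' and 'lon' (decimal degrees; negative values
    stand for South and West, which is an assumption of this sketch).
    """
    problems = []
    clusters_by_id = defaultdict(set)
    for rec in records:
        clusters_by_id[rec["waypoint_id"]].add(rec["cluster"])
        # Decimal-degree coordinates must fall inside the valid ranges.
        if not -90.0 <= rec["lat"] <= 90.0:
            problems.append((rec["cluster"], "latitude out of range"))
        if not -180.0 <= rec["lon"] <= 180.0:
            problems.append((rec["cluster"], "longitude out of range"))
    # The same waypoint ID must not be recorded for two different clusters.
    for waypoint_id, clusters in clusters_by_id.items():
        if len(clusters) > 1:
            for cluster in sorted(clusters):
                problems.append(
                    (cluster, f"waypoint ID {waypoint_id} used for several clusters")
                )
    return problems
```

Checks of this kind do not replace the comparison against the paper Form DHS/1; they only narrow down which clusters need a closer look.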
Footnote 6: This number is different from the household number later given to all of the households listed in the whole cluster just prior to household selection.
In most situations, the Mapping and Listing Coordinator will be responsible for providing the listing teams with a GPS unit prior to the listing. Before these units are distributed they should be set up for use by the listers. For DHS surveys, the only acceptable position format is Decimal Degrees, regardless of what geographic standards may be in use for other purposes. To set the format, enter the SETUP menu and in the UNITS sub-menu, select the item POSITION FRMT and press the ENTER button. Select hddd.ddddd Decimal Degrees, which is the first item. Once hddd.ddddd is highlighted, press the ENTER button. It is important that all the GPS units be set up in the same way so that the waypoints returned at the end of the survey are all in the same format.

For more details on how to properly prepare the GPS units for waypoint collection, please refer to the DHS Manual for GPS Data Collection.

2.7 Listing of households

The lister will use the Household Listing Form (Form DHS/2) to record all households found in the cluster. Begin by entering the identification information for the cluster. The first two columns are reserved for office use only; leave them blank. Complete the rest of the form as follows:

Column (1) [Serial Number of Structure]: For each structure, record the same structure serial number that the mapper enters on the sketch map. All the structures recorded on the sketch map (except the landmarks) must be recorded on the listing form and numbered.

Column (2) [Address/Description of Structure]: Record the street address of the structure. Where structures do not have visible street addresses (especially in rural areas), give a description of the structure and any details that help in locating it (for example, in front of the school, next to the store, etc.).

Column (3) [Residence Y/N]: Indicate whether the structure is used for residential purposes (eating and sleeping) by writing Y for Yes. In cases where a structure is used for commercial or other purposes, write N for No.
Structures used both for residential and commercial purposes (for example, a combination of store and home) should be classified as residential (i.e., mark Y in Column 3). Make sure to list any household unit found in a non-residential structure (for example, a guard living inside a factory or in a church). Also do not forget to list vacant structures and structures under construction, and in Column (6) give some explanation (for example: vacant, under construction, etc.). All structures seen in the cluster should be recorded on the sketch map of the cluster and in the listing.

Column (4) [Serial Number of Household in Structure]: This is the serial number assigned to each household found in the structure; there can be more than one household in a structure. The first household in the structure will always have number 1. If there is a second household in the structure, then this household should be recorded on the next line, a 2 is recorded in Column (4), and Columns (1) to (3) repeat the structure number and address or are left blank.

Column (5) [Name of Head of Household]: Write the name of the head of the household. There can only be one head per household. If no one is home or the household refuses to cooperate, ask neighbors for the name of the head of the household. If a name cannot be determined, leave this column blank. Note that it is not the name of the landlord or owner of the structure that is needed, but the name of the head of the household that lives there.

Column (6) [Observations/Occupied or not]: This space is provided for any special remarks that might help the coordinator decide whether to include a household in the household selection or not, and might also help the interviewing team locate the structure or identify the household during the main survey fieldwork.

If the structure is an apartment block or block of flats, assign one serial number to the entire structure (only one square with one number appears on the sketch map), but complete Columns (2) through (6) for each apartment in the structure individually. Each apartment should have its own address, which is the apartment number within the structure.

The listing team should be careful to locate hidden structures. In some areas, structures may have been built so haphazardly that they are easily missed. In rural areas, structures may be hidden by tall grasses and trees. If there is a pathway leading from the listed structure, check to see if the pathway goes to another structure. Talking with people living in the area may help in identifying the hidden structures.

2.8 Segmentation of large clusters

A certain number of the selected EAs may be very large in population size. A complete listing of EAs that are very large may not be feasible for the survey. These EAs should be subdivided into several smaller segments, only one of which will be included in the survey and listed. In this case, the DHS cluster corresponds to a segment of an EA.

When the team arrives in a large EA that may need segmentation, it should first tour the EA and make a quick count to get the estimated number of households residing in the EA. There is no standard threshold for the size of an EA that needs to be segmented, or for segment size. But for efficiency and accuracy considerations, DHS recommends that if the EA size is bigger than 300 households, then the team should communicate to the coordinator the cluster number, the estimated number of households and the suggested number of segments to be created. The final decision to segment an EA, and the number of segments to be created, can only be taken by the coordinator.
Ideally, for ease of operation, an EA would only need to be divided into 2 segments, each with an ideal number of households. To minimize errors, dividing an EA into a large number of segments (more than 3) should be avoided unless it is really necessary. In dividing an EA into segments, the ideal would be to have segments of approximately equal size, but it is also important to adopt segment boundaries that are easily identifiable.

In the first tour of the cluster, draw a location map of the entire cluster. Using identifiable boundaries such as roads, streams, and electric power lines, divide the EA into the designated number of roughly equal-sized segments. On the location map of the EA, show clearly the boundaries of the segments created. Number the segments sequentially. Estimate the relative size of each segment in the following manner: quickly count the number of dwellings in each segment, add up the total number of dwellings in the EA and calculate the proportion of the dwellings in the whole EA that are located in each segment.

Example 2.1: A cluster of 620 dwellings has been divided into 3 segments and the results are as follows:

Segment 1: 220 dwellings, or 220/620 = 35 percent
Segment 2: 190 dwellings, or 190/620 = 31 percent
Segment 3: 210 dwellings, or 210/620 = 34 percent
Total: 620 dwellings, or 620/620 = 100 percent

On Form DHS/3 (Segmentation Form) write the size of the segments in the appropriate columns (number and percent) and calculate the cumulative size of all of the segments in terms of a percentage. The cumulative size of the last segment on the list must be equal to 100 percent.

Segment number | Number of dwellings | Percent | Cumulative percent
1              | 220                 | 35      | 35
2              | 190                 | 31      | 66
3              | 210                 | 34      | 100

For each large EA to be segmented, a random number between 0 and 100 will be selected in the central office and included in the file. Compare this random number with the cumulative size. Select the first segment for which the cumulative size is greater than or equal to the random number.

Random number: 67
Segment selected: Segment number 3

Proceed with the household listing operation in segment number 3 as described in the above sections (see Appendix 2.3 for an example of how to complete the segmentation form). Draw a detailed sketch map of the selected segment and list all the households found in the selected segment.

2.9 Quality control

To ensure that the work done by each listing team is acceptable, quality checks should be performed. The coordinator should tour the regions during the household listing operation and assess the quality of the finished clusters. The coordinator should select a finished cluster and do an independent listing of 10 percent of the cluster. If important errors are found, the whole cluster should be relisted. If the problem is related to systematic errors, and it is not possible to correct the listing forms, then all of the listed clusters should be relisted.

2.10 Prepare the household listing forms for household selection

Once the central office receives the completed listing materials for a cluster, they must first assign a serial number to all of the households in the cluster in the second column of Form DHS/2. Only occupied residential households will be numbered; these include households that refused to cooperate at the time of listing and households whose occupants were absent at the time of listing but would return shortly and be at home during the period of household interviews. This is a continuous serial number from 1 to the total number of occupied residential households listed in the cluster. Leave the cell in the second column blank if the household is not occupied, or if the structure is not a residential structure.
Fill in the second column only if the row corresponds to an occupied household. Make sure that the numbering of all occupied households follows sequentially from the previous occupied household on the list, with no gaps or repetitions in the numbering. See the example of a completed listing form in Appendix 2.3.

After assigning the serial numbers to all households listed in the cluster, copy the total number of households listed to the column "Number of households listed" in the Excel file prepared for household selection. Make sure this number is recorded in the correct row for the cluster number. In the column "Segmentation information", record the percentage of the entire EA population that is included in the selected segment. The segmentation information is important for correctly calculating the sampling weights.

After the total number of households listed in the cluster has been entered in the Excel file, the spreadsheet automatically generates the household numbers of the households selected to be interviewed. Copy the numbers of the selected households to the first column of Form DHS/2, next to the corresponding serial numbers of the households in the listing form. These are the households that must be interviewed. It is recommended to use a different colored pen on the listing forms to indicate the households selected for interviewing. It is also very helpful to use color on the cluster's sketch map to mark the structures where the selected households are located.

In many surveys, a sub-sample of households will be selected for the men's survey. The household selection spreadsheet uses shaded columns to indicate which households are selected for the men's survey. Put a mark in the first column of Form DHS/2 next to the number of the selected household to indicate the households selected for the men's survey, or use a different colored pen for the households selected for both the men's and women's surveys.

Make a copy of the whole package of files (sketch maps and the listing forms with household selection). Give the original to the interviewing team for the household interviews and keep the copy in the central office.
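The numbering step described in this section can be sketched in a few lines. This is a minimal illustration, not the procedure used in the DHS spreadsheet: the row layout (a dict with the hypothetical key `occupied`, true when the row is an occupied residential household) and the function name `number_households` are assumptions.

```python
def number_households(rows):
    """Assign continuous serial numbers to occupied households.

    `rows` is the listing in the order recorded on Form DHS/2; each row is
    a dict with the hypothetical key 'occupied'. Rows that are not occupied
    households get None, which stands for a blank cell in the second column.
    Returns the list of serial numbers and the total number of households.
    """
    serials = []
    next_number = 1
    for row in rows:
        if row["occupied"]:
            serials.append(next_number)  # no gaps, no repetitions
            next_number += 1
        else:
            serials.append(None)  # vacant or non-residential: leave blank
    return serials, next_number - 1


listing = [
    {"head": "Household A", "occupied": True},
    {"head": "", "occupied": False},  # e.g. a vacant structure
    {"head": "Household B", "occupied": True},
]
serial_numbers, total_listed = number_households(listing)
```

The second return value is the figure that would be copied into the household selection file as the number of households listed.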
Appendix 2.1 Example listing forms

Form DHS/1, Map Information Form (page 1 of 3)

Identification (Label / Code):
- Locality
- DHS Cluster Number
- Urban/Rural (Urban=1/Rural=2)
- EA Number
- District
- Region
- Name of Mapper
- Name of Lister
- GPS Unit Tracking Number
- Waypoint name (entered in GPS unit)
- Latitude (North/South): N / S
- Longitude (East/West): E / W
- Altitude / Elevation (Meters)

Observations:
- Road access
- Other useful information

Form DHS/1, Map Information Form (page 2 of 3)

Locality, District, DHS Cluster. Space for the location map.

Form DHS/1, Map Information Form (page 3 of 3)

Locality, District, DHS Cluster. Space for the sketch map of the cluster.
Form DHS/3, Segmentation Form

Identification (Label / Code):
- Locality
- DHS Cluster Number
- Urban/Rural (Urban=1/Rural=2)
- EA Number
- District
- Region
- Name of Mapper
- Name of Lister

Number of segments:

Segment number | Number of households | Percent | Cumulative percent

Random number:
Segment selected:
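The calculation recorded on the Segmentation Form can be sketched as follows, using the figures of Example 2.1 in Section 2.8. This is an illustrative sketch under stated assumptions: the function name `select_segment` is hypothetical, and the percentages are kept unrounded here, whereas the paper form works with rounded percentages.

```python
def select_segment(dwelling_counts, random_number):
    """Select one segment with probability proportional to its size.

    `dwelling_counts` holds the quick-count number of dwellings in each
    segment, in segment order; `random_number` is the number between 0 and
    100 supplied by the central office. Returns the selected segment number
    (1-based) and the rows of the segmentation table.
    """
    total = sum(dwelling_counts)
    running = 0
    table = []
    selected = None
    for number, count in enumerate(dwelling_counts, start=1):
        running += count
        percent = 100.0 * count / total
        cumulative = 100.0 * running / total  # exactly 100.0 for the last segment
        table.append((number, count, percent, cumulative))
        # First segment whose cumulative size reaches the random number.
        if selected is None and cumulative >= random_number:
            selected = number
    return selected, table


# Example 2.1: segments of 220, 190 and 210 dwellings, random number 67.
segment, rows = select_segment([220, 190, 210], 67)
```

With these inputs the cumulative sizes are about 35, 66 and 100 percent, so the third segment is the first one whose cumulative size reaches 67, in agreement with the example.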
Appendix 2.2 Symbols for mapping and listing

The appendix defines a map symbol for each of the following features:

- Orientation to the North
- Boundaries of the cluster
- Paved road
- Unpaved (dirt) road
- Footpath
- River, creek, etc.
- Bridge
- Lake, pond, etc.
- Mountains, hills
- Water point (wells, fountain, etc.)
- Market
- School
- Administrative structure
- Church, temple
- Mosque
- Cemetery
- Residential structure
- Non-residential structure
- Vacant structure
- Hospital, clinic, etc.
- Electric pole
- Tree or bush
Appendix 2.3 Examples of completed mapping and listing forms
3 SELECTED SAMPLING TECHNIQUES

In this section, some of the most commonly used sampling techniques and their application are presented. The presentation will focus mainly on practical rather than theoretical aspects. However, the chapter does touch on some basic theoretical properties of the techniques used in the DHS surveys. We focus on without-replacement sampling rather than with-replacement sampling procedures, since the latter represents a reduction of efficiency for samples of a fixed size due to the potential that some sampling units may be repeated. When this occurs, the amount of information carried in a fixed size sample is reduced because the same sampling unit is selected several times. Readers who are interested in the theoretical aspects of the selected sampling techniques should refer to the textbooks dealing with survey sampling theory listed in the references.

3.1 Simple random sampling

We begin with simple random sampling without replacement (SRSWOR) since this is a fundamental sampling procedure that is used as the standard to which the efficiency of other sampling procedures is compared. Simple random sampling without replacement is a selection procedure where every unit has an equal chance of being selected. Selection can be performed through successive draws without replacement from a well-mixed container containing all sampling units, or by using certain computerized algorithms to select from a list of all sampling units.

Let $N$ be the total number of sampling units and let $n$ be the total sample size, $n < N$. The probability of selection for every $i$-th unit is given by:

$$P_i = \frac{n}{N}$$

The design weight (assuming no non-response) is given by:

$$D_i = 1/P_i = \frac{N}{n}$$

The probability for any particular $n$ different units selected together in a sample $s$ is given by:

$$P_s = 1 \Big/ \binom{N}{n}$$

where $\binom{N}{n}$ is the total number of combinations of $n$ elements out of $N$.

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sample mean, which is the same as the unweighted sample mean,

$$\bar{y} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

is an unbiased estimator of the population mean, $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$, with its sampling variance given by

$$V_{srs}(\bar{y}) = \frac{1-f}{n} S_y^2$$

where $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})^2$ is the finite population variance of the variable $y$ and $f = n/N$ is the sampling fraction. An unbiased estimation of this variance can be made using

$$\upsilon_{srs}(\bar{y}) = \frac{1-f}{n} s_y^2$$

where $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the sample variance. When $n$ and $N$ are large, the standardized variable $(\bar{y} - \bar{Y})/SE(\bar{y})$ follows a student-t distribution with $n-1$ degrees of freedom, where $SE(\bar{y})$ is the square root of $\upsilon_{srs}(\bar{y})$. Therefore the confidence limits of the population mean $\bar{Y}$ can be constructed based on sample observations, allowing for 95% confidence that the true value of $\bar{Y}$ will lie within the range of $\bar{y} - 1.96 \cdot SE(\bar{y})$ and $\bar{y} + 1.96 \cdot SE(\bar{y})$. DHS reports use $\bar{y} \pm 2 \cdot SE(\bar{y})$ for a conservative estimate of 95% confidence limits.

Given a complete list of all sampling units in a computerized file, the easiest way to draw a simple random sample of size $n$ is to first generate a uniformly distributed random number between 0 and 1 for each of the sampling units. Next, sort the file based on the generated random numbers in ascending order; the first $n$ units, associated with the $n$ smallest random numbers, are the selected units. This procedure provides a SRSWOR sample of size $n$.

This procedure is easy to implement, but requires sorting of the sampling frame. Since sorting is time consuming, the following algorithm (Tillé, 2001) may be used with the sampling frame without sorting:

    Definition of terms and the initial step
        k: the k-th unit of the frame file; j: the j-th selected unit
        k = 0
        j = 0
    repeat while j < n
        generate a uniformly distributed random number u in [0,1)
        if u < (n - j)/(N - k) then
            unit k + 1 is selected; j = j + 1
        else
            unit k + 1 is not selected
        k = k + 1

3.2 Equal probability systematic sampling

3.2.1 Sampling theory

Systematic sampling (SYS) is the selection of sampling units at a fixed interval from a list, starting from a randomly determined point. Selection is systematic because selection of the first sampling unit determines the selection of the remaining sampling units.
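The point that the first selected unit determines all the others can be made concrete with a short sketch. It assumes a whole-number interval I = N/n and a random start between 1 and I; the function names `systematic_sample` and `all_systematic_samples` are hypothetical and chosen only for this illustration.

```python
import random


def systematic_sample(N, n, start):
    """Return the unit numbers start, start + I, ..., start + (n - 1) * I.

    Assumes N is an exact multiple of n, so that I = N / n is a whole number,
    and that the start lies between 1 and I.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes a whole-number interval")
    interval = N // n
    if not 1 <= start <= interval:
        raise ValueError("the start must lie between 1 and the interval")
    return [start + k * interval for k in range(n)]


def all_systematic_samples(N, n):
    """List every sample that systematic selection can produce."""
    interval = N // n
    return [systematic_sample(N, n, s) for s in range(1, interval + 1)]


# With N = 12 and n = 4 the interval is 3, so only three samples are possible.
possible = all_systematic_samples(12, 4)
chosen = systematic_sample(12, 4, random.randint(1, 3))
```

Only I distinct samples exist, one for each possible start, whereas simple random sampling can produce any of the combinations of n units out of N.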
Compared with SRSWOR, systematic sampling has the following advantages:

1) It is easier to perform;
2) It allows easy verification of the selection;
3) If the sampling frame is in some order, it provides a stratification effect with respect to the variables on which the frame is sorted, and with a proportional allocation. This stratification is called implicit stratification.
4) Implicit stratification prevents unexpected concentration of sample points in certain areas, such as is possible with SRSWOR.

Because of these advantages, especially (3) and (4), systematic selection is more often used than simple random sampling.

Systematic sampling is normally carried out as follows. Assume a whole-number interval $I = N/n$, where $N$ is the number of units in the frame list and $n$ is the number of units to be selected. The procedure begins with an integer random number $S$ between 1 and $I$. The units to be selected are $S, S+I, S+2I, \ldots, S+(n-1)I$. When $I$ is not a whole number, rounding it to the nearest whole number may introduce appreciable errors, so it is suggested that the decimal interval method be used. Selection with a decimal interval may be carried out as follows:

1) Calculate the interval $I$ rounded to two decimal places.
2) Generate a random number $R$ between 0 and 1 with two decimal places.
3) Compute the sequence of sampling numbers: $R \cdot I,\; R \cdot I + I,\; R \cdot I + 2I,\; \ldots,\; R \cdot I + (n-1)I$
4) Round up the above calculated sampling numbers to the next highest whole numbers; these are the selected unit numbers.

Example 3.2.1: Let N=100, n=14, so that I=7.14; let the generated random number be R=0.96. The sampling numbers and the corresponding selected unit numbers are as follows:

Sampling number: 6.85, 13.99, 21.13, 28.27, 35.41, 42.55, 49.69, 56.83, 63.97, 71.11, 78.25, 85.39, 92.53, 99.67
Selected unit:   7, 14, 22, 29, 36, 43, 50, 57, 64, 72, 79, 86, 93, 100

In this example, the decimal interval method gives a selection interval which is sometimes 7 and sometimes 8. The household selection templates are all programmed with decimal sampling intervals.

Often a sample design requires numerous systematic samples, as is the case when a systematic sample of households is needed within each selected cluster. In this situation a separate random start $R$ should be determined independently for each cluster.

With SYS, the probability of selection for any unit is given by

$$P_i = \frac{1}{I} = \frac{n}{N}$$

The design weight (assuming no non-response) is given by

$$D_i = 1/P_i = \frac{N}{n}$$

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sample mean, which is the same as the unweighted sample mean,

$$\bar{y} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

is an unbiased estimator of the population mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$. For simplicity, assuming an integer sampling interval $I$, the sampling variance of the sample mean is given by

$$V_{sys}(\bar{y}) = \frac{1 - 1/N}{n} S_y^2 \left[ 1 + (n-1)\rho_w \right]$$

where $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})^2$ is the population variance and $\rho_w$ is the correlation coefficient between pairs of units in the same systematic sample. When $\rho_w$ is negative, SYS is more precise than SRSWOR; when $\rho_w$ is positive, SYS is less precise than SRSWOR. Unlike the case of SRSWOR, the variance estimate

$$\upsilon_{sys}(\bar{y}) = \frac{1-f}{n} s_y^2,$$

where $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the sample variance, is not an unbiased estimate of the sampling variance. However, $\upsilon_{sys}(\bar{y})$ is the special case of the recommended Hartley-Rao (1962) estimator in the case of unequal probability systematic sampling. $\upsilon_{sys}(\bar{y})$ is equivalent to treating the systematic sample as if it were drawn by SRSWOR, and is therefore called an estimator with simple random sampling approximation.

Theoretically, with SYS there is no unbiased estimator for the variance of the sample mean, since systematic sampling is equivalent to randomly selecting one sample among the $I$ possible samples. This is a major drawback of SYS. However, when the sampling units in the frame file do not present any linear trend in the variable of interest, nor periodic changes, or the units are randomly ordered, $\upsilon_{sys}(\bar{y})$ is a good approximation of the sampling variance $V_{sys}(\bar{y})$. When there is a linear trend in the variable of interest, assuming the selection of the $k$-th systematic sample, the following estimator (Wolter, 1984; Wolter, 1985), in which the summation is over non-overlapping pairs of successive sample units, is a better approximation of $V_{sys}(\bar{y})$:

$$\upsilon^{*}_{sys}(\bar{y}) = \frac{1-f}{n} \cdot \frac{1}{n} \sum_{j=1}^{[n/2]} \left( y_{k+(2j-1)I} - y_{k+(2j-2)I} \right)^2$$

However, when confidence limits are required, $\upsilon_{sys}(\bar{y})$ is preferred because of its high coverage rates of the true population mean. It should be noted that the properties of $\upsilon^{*}_{sys}(\bar{y})$ are different from those of the collapsed strata estimator for stratified sampling with one unit per stratum, because the successive observations in a SYS sample are probability-one correlated, while the collapsed strata estimator for stratified sampling has a set of completely independent observations. When $n$ and $N$ are large, the sample mean has the same asymptotic properties as the simple random sample mean; therefore confidence intervals can be constructed in a way similar to those for a simple random sample.

3.2.2 Excel templates for systematic sampling

The MEASURE DHS program has developed Excel templates that can be used for equal probability systematic sampling of households. The templates can be used to perform simple selection, selection with runs, self-weighting selection without sample size control and self-weighting selection with sample size control. Figure 3.1 below shows a portion of the simple selection procedure with a sample take of 20 households per cluster. The darker shaded areas require data input. The area to the left of the column labeled "Num HH listed" is reserved for cluster IDs. Numbers for the selected households are shown to the right of the column labeled "Random (0-1)". Figure 3.2 below shows a portion of the selection procedure with runs of 4 households. Both selections incorporate a selection of a sub-sample.

Figure 3.3 shows a simple self-weighting selection with an average sample take of 20 households, without sample size control, but with minimum and maximum sample takes of 10 and 30 households respectively. Figure 3.4 shows a self-weighting selection, with runs, with an average sample take of 20 households per cluster, without sample size control, but with minimum and maximum sample takes of 10 and 30 households respectively; both of the selections incorporate a sub-sample of 10 households per cluster. Note that the selection procedure with runs is circular, meaning that when the selection interval is not an integer, and when the run is not a divisor of the total number of households listed, the last selected household number may be smaller than the first selected household number.

Figures 3.5 and 3.6 show self-weighting selections with sample size control; the control area is the sampling stratum. The disadvantage of the self-weighting selection with sample size control is that the selection procedure will do the household selection only if the household listing results are entered for the entire control area. This condition may represent a constraint in some situations.

Figure 3.7 shows a manual selection carried out in the field that can be performed easily using a simple calculator. If household selection at the central office is not feasible, the interviewer can perform the household selection in the field. The numbers in red represent information that is entered and the calculated terms. This procedure requires a traditional household listing operation where households are numbered and listed on household listing forms.
Using the total number of households listed and the number of households to be selected, the interviewer first calculates the selection interval, then uses the random number, R, associated with the selected cluster to calculate the first sampling number or term $t_1$, and enters it in the cell for $t_1$. For the subsequent sampling numbers or terms, the interviewer adds the sampling interval to the previous sampling number or term. After calculating the sampling numbers, the interviewer rounds them up to integers in the next column, as in the decimal interval method; these are the selected household numbers. The interviewer is asked to copy the address and the name of the head of household of the selected households from the household listing form. The household selection form is subject to review by the field work supervisor.
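The decimal interval procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the logic of the DHS Excel templates: the function name `decimal_interval_selection` and its parameter names are hypothetical, and the sketch assumes that the random start satisfies 0 < R <= 1 so that the first selected number is at least 1. It reproduces the inputs of Example 3.2.1 (N=100, n=14, R=0.96).

```python
import math
from decimal import Decimal


def decimal_interval_selection(num_listed, num_to_select, random_start):
    """Return selected unit numbers (1-based) using a decimal interval.

    Hypothetical helper for illustration only. `random_start` is the
    random number R, assumed here to satisfy 0 < R <= 1.
    """
    if not (Decimal(0) < random_start <= Decimal(1)):
        raise ValueError("random_start must satisfy 0 < R <= 1")

    # Step 1: interval I = N / n, rounded to two decimal places.
    interval = (Decimal(num_listed) / Decimal(num_to_select)).quantize(
        Decimal("0.01")
    )

    selected = []
    for j in range(num_to_select):
        # Step 3: sampling number R*I + j*I.
        sampling_number = random_start * interval + j * interval
        # Step 4: round up to the next whole number.
        selected.append(math.ceil(sampling_number))
    return selected


# Inputs of Example 3.2.1: N = 100, n = 14, R = 0.96 (so I = 7.14).
units = decimal_interval_selection(100, 14, Decimal("0.96"))
print(units)
```

`Decimal` is used instead of binary floating point so that two-decimal values such as 7.14 and 0.96 are represented exactly before rounding up. A fuller version would also need to handle the case where rounding the interval upward pushes the last sampling number beyond the number of units listed.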
Figure 3.1 Simple household selection with a sub-sample
[Excel template "HOUSEHOLD SELECTION": run size 1; input for sub-sample take per cluster; columns: Cluster ID, Cluster num, Num HHs listed, Num selected, Select interval, Random (0-1)]

Figure 3.2 Selection of runs with a sub-sample
[Excel template "HOUSEHOLD SELECTION": run size 4; input for sub-sample take per cluster; columns: Cluster ID, Cluster num, Num HHs listed, Num selected, Select interval, Random (0-1)]

Figure 3.3 Simple self-weighting selection without sample size control
[Excel template "Household selection": average sample take 20; average take for sub-sample 10; min sample take 10; max sample take 30; run size; column-name inputs for PSU proba, EA proba and Num HH in base; columns: Cluster num, EA proba, HH in base, Overall proba, Segment info, HH listed, Sample take, Selection interval, Random (0-1)]

Figure 3.4 Self-weighting selection with runs and without sample size control
[Excel template "Household selection": same inputs and columns as Figure 3.3]

Figure 3.5 Self-weighting selection with sample size control
[Excel template "Household selection": number of households expected (main sample 620 shown) and number of households selected, for main sample and subsample; columns: Stratum num, Cluster, EA probability, HH in base, Segment info, Num HHs listed, Num of HHs selected, Overall probability, Random (0, 1)]

Figure 3.6 Self-weighting selection with runs and with sample size control
[Excel template "Household selection": same inputs and columns as Figure 3.5]

Figure 3.7 Manual household selection in the field
[Image of the manual household selection form]
3.3 Probability proportional to size sampling

Sampling theory

In order to increase sampling efficiency, a sampling procedure can attribute different selection probabilities to different sampling units. In general, a large sampling unit will contribute more to the sampling variance if equal probability selection is used. If large sampling units are selected with larger chances, sampling variance may be greatly reduced. At the extreme, a good strategy is to select very large sampling units with certainty, that is, with a probability of one. Assuming that each sampling unit has some kind of known measure of size which is positively correlated with the variable of interest, a Probability Proportional to the measure of Size (PPS) selection has the same four advantages as SYS sampling. This procedure assigns each sampling unit a specific chance to be selected in the sample before the sampling begins, and the chance is proportional to its measure of size.

Let $M_i$ be the measure of size of unit $i$; let $\sum_{i=1}^{N} M_i$ be the total measure of size; let $n$ be the design sample size. A PPS sampling procedure will select unit $i$ with a probability $\pi_i$ such that

$$\pi_i = \frac{n M_i}{\sum_{i=1}^{N} M_i}$$

The design weight (assuming no non-response) is given by

$$D_i = \frac{1}{\pi_i} = \frac{\sum_{i=1}^{N} M_i}{n M_i}$$

Let $y_1, y_2, \ldots, y_n$ be the observations made from the selected units on a variable of interest. Then the weighted sum of the observations

$$\hat{y}_{PPS} = \sum_{i=1}^{n} D_i y_i = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$$

is an unbiased estimator of the population total $Y = \sum_{i=1}^{N} y_i$. The variance of this estimator is given by

$$V(\hat{y}_{PPS}) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j \neq i,\, j=1}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Yates-Grundy, 1953)}$$

where $\pi_{ij}$ is the joint probability of selecting units $i$ and $j$ together in a sample. If all the joint probabilities $\pi_{ij} > 0$, then the above variance can be estimated without bias by:

$$\hat{V}(\hat{y}_{PPS}) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j \neq i,\, j=1}^{n} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Yates-Grundy, 1953)}$$

However, the above estimator is not calculable because the joint probabilities $\pi_{ij}$ are usually unknown.
Hartley and Rao (1962) provided an approximation of the above estimator which involves only the first order selection probabilities $\pi_i$:

$$\hat{V}_{HR}(\hat{y}_{PPS}) = \frac{1}{2(n-1)}\sum_{i=1}^{n}\sum_{j \neq i,\, j=1}^{n} \left[ 1 - (\pi_i + \pi_j) + \frac{1}{n}\sum_{k=1}^{N} \pi_k^2 \right] \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \quad \text{(Hartley-Rao, 1962)}$$

But the Hartley-Rao estimator requires knowledge of the selection probability of all sampling units in the population (through $\sum_{k=1}^{N} \pi_k^2$), which is usually not calculated in the sample selection. The general documentation just keeps the selection probability for the selected units. By replacing $\sum_{k=1}^{N} \pi_k^2$ by its sample estimate $\sum_{i=1}^{n} \pi_i^2 / \pi_i = \sum_{i=1}^{n} \pi_i$, the Hartley-Rao estimator can be further simplified (Ren, 2003):

$$\hat{V}_{R}(\hat{y}_{PPS}) = \frac{n}{n-1}\sum_{i=1}^{n} (1 - \pi_i) \left( \frac{y_i}{\pi_i} - \frac{\hat{y}_{PPS}}{n} \right)^2$$

In the case of equal probability sampling, both $\hat{V}_{HR}(\hat{y}_{PPS})$ and $\hat{V}_{R}(\hat{y}_{PPS})$ reduce to the variance estimator with simple random sampling approximation. Provided that $\pi_i < 1$ for all $i$, $\hat{V}_{R}(\hat{y}_{PPS})$ is always positive, whereas both the Yates-Grundy and Hartley-Rao estimators may produce negative variance estimates.

Wolter (1984; 1985) conducted an extensive study on variance estimation for systematic sampling, including the successive difference estimator similar to $\upsilon^{*}_{sys}(\bar{y})$. He recommends the use of the Hartley-Rao estimator if the population does not present any trends in the measure of size variable and the variable of interest, especially when a confidence interval is required.

The above results for population total estimation can be adapted to mean estimation:

$$\bar{y}_{PPS} = \sum_{i=1}^{n} D_i y_i \Big/ \sum_{i=1}^{n} D_i = \sum_{i=1}^{n} \frac{y_i}{\pi_i} \Big/ \sum_{i=1}^{n} \frac{1}{\pi_i}$$

$\bar{y}_{PPS}$ is an approximately unbiased estimator for the population mean, with approximate variance given by:

$$V(\bar{y}_{PPS}) = \frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j \neq i,\, j=1}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i - \bar{Y}}{\pi_i} - \frac{y_j - \bar{Y}}{\pi_j} \right)^2$$

If the units are not specially ordered according to the variable of interest in the sampling frame, the approximate sample variance of the estimator can be estimated by

$$\hat{V}_{R}(\bar{y}_{PPS}) = \frac{1}{\left( \sum_{i=1}^{n} D_i \right)^2}\cdot\frac{n}{n-1}\sum_{i=1}^{n} (1 - \pi_i) \left( \frac{y_i}{\pi_i} - \frac{\hat{y}_{PPS}}{n} \right)^2$$

The above estimator reduces to the simple random sampling approximation $\upsilon_{sys}(\bar{y})$ in the case of equal probability systematic sampling.

Operational description and examples

There are many ways to draw a PPS sample, but the easiest way is PPS systematic sampling, summarized in the following:

1) List the sampling units with their measure of size $M_i$
2) Calculate the cumulative measure of size $C_k = \sum_{i=1}^{k} M_i$ for each unit $k$, and check that the last entry $C_N$ equals the total measure of size $\sum_{i=1}^{N} M_i$
3) Let $n$ be the number of units to be selected. Compute the sampling interval $I = \sum_{i=1}^{N} M_i / n$
4) Generate a random number R between 0 and 1
5) Compute the sampling numbers R*I, R*I+I, R*I+2*I, ..., R*I+(n-1)*I
6) For each sampling number R*I+(j-1)*I, the j-th sampled unit is unit k if $C_k$ is the first cumulative size bigger than the sampling number R*I+(j-1)*I
7) Calculate the selection probability of each selected unit $i$: $\pi_i = n M_i / \sum_{i=1}^{N} M_i$

The following example demonstrates how manual selection is done.

Example 3.3.1: Let N=20, n=5, $\sum_{i=1}^{20} M_i$ = [value not recovered]; therefore the sampling interval I = 801; let the generated random number be R = 305. The sampling numbers and the selected unit numbers are as follows:

[Table not recovered. Columns: ID number; Size measure $M_i$; Cumulative $C_k$; Sampling number; j-th selected unit; Selection probability]

PPS sampling has the same advantages as equal probability systematic sampling, but with this procedure a unit may be selected more than once if the unit's measure of size is bigger than the sampling interval. These large units are said to have been selected with certainty, or are self-representing units. A unit selected more than once should be segmented to form a number of smaller units corresponding to the number of times the unit is selected. The selection probabilities should be recalculated using the sizes of the segmented units. With this strategy, the total sample size is kept the same as designed and the selection probabilities of the non-certainty units do not need to be adjusted.
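The seven steps of PPS systematic sampling listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DHS template: the function name `pps_systematic` is hypothetical, the size measures in the usage example are invented for illustration (they are not the values of Example 3.3.1), and the random start is assumed to satisfy 0 <= R < 1 so that every sampling number stays below the total measure of size.

```python
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, r):
    """Draw a PPS systematic sample.

    Hypothetical helper for illustration only.
    sizes: measures of size M_i, in frame order.
    n:     number of units to select.
    r:     random start R, assumed here to satisfy 0 <= R < 1.
    Returns a list of (unit number (1-based), selection probability).
    """
    if not 0 <= r < 1:
        raise ValueError("r must satisfy 0 <= R < 1")

    total = sum(sizes)
    cumulative = list(accumulate(sizes))  # Step 2: C_k
    interval = total / n                  # Step 3: I

    selections = []
    for j in range(n):
        # Step 5: sampling number R*I + j*I.
        sampling_number = r * interval + j * interval
        # Step 6: first unit whose cumulative size is strictly bigger.
        k = bisect_right(cumulative, sampling_number)
        # Step 7: n * M_i / total. A value above 1 signals a unit larger
        # than the interval, which may be selected more than once.
        probability = n * sizes[k] / total
        selections.append((k + 1, probability))
    return selections


# Invented size measures (not the data of Example 3.3.1).
sizes = [10, 20, 30, 40, 50, 50]
sample = pps_systematic(sizes, n=4, r=0.5)
for unit, probability in sample:
    print(unit, probability)
```

In this invented frame the interval is 50, so the two units of size 50 have a selection probability of one. A unit with a size measure larger than the interval would appear more than once in the returned list, which is the situation discussed in the paragraph above.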
Another way to deal with large units consists of examining the list of units before sampling begins. Computation of the interval will reveal whether there are any units of size greater than I. The simplest solution to prevent repetition during sampling might be to split each such unit into two or more approximately equal subunits of size less than I. The split would be made first on paper only. The measure of size for the original unit is divided equally among the subunits before sampling proceeds. Later the split is materialized, either by drawing a line on the map of the unit, or by identifying a suitable dividing line during the first field visit to the unit.

If a substantial number of the units chosen to serve as PSUs are larger than the interval I, then the choice of such units to serve as PSUs was clearly incorrect. One solution to this problem is to set aside, before sampling, all PSUs with a measure of size larger than a threshold (not necessarily greater than or equal to I), to give them special treatment, and to call them self-representing units. They are not, therefore, sampling units but strata by definition. A new type of sampling unit has to be designated to serve as PSU within these areas. For the purpose of sampling error computation, it is important to realize that the term "self-representing PSU" is misleading. The self-representing units are in fact strata, while the new, smaller units or sub-units within them are the true PSUs. This treatment requires re-calculating the sample allocation, and then proceeding with sample selection independently in each stratum.

An Excel template for stratified PPS or equal probability systematic sampling has been developed. Figure 3.8 below shows a portion of a blank template. Figure 3.9 shows an example of stratified PPS sampling with the strata being the urban and rural areas within each province.
Figure 3.8 Part of an Excel template for stratified sampling
[Excel template "Stratified systematic sampling with probability proportional to size": per-stratum inputs Random (0, 1), Stratum num, Stratum size, Sample size; column-name inputs for Dom/Region, urban/rural and PSU size; total number of strata; total sample size; number of different PSUs selected; columns: Serial numb, Dom/Region name/code, Urban/rural, PSU size, Stratum number, Selection probability, # of times selected, Stratum size, Stratum sample size, Measure size: stratum; the frame file is pasted below the header]
Figure 3.9 Part of an example for a province crossed urban-rural stratified PPS sampling

In Figure 3.9 above, the number of times an EA is selected is indicated in the column labeled "# of times selected". Use the filter to locate the selected units and copy them to a new file. Figure 3.10 below gives an example of a portion of a prepared sample file. This is an example; it does not reflect any actual clusters selected for a DHS. The first column gives the cluster number, which is assigned by the statistician. The clusters are sorted in the original order as in the sampling frame. The last six columns are the sampling parameters calculated by the program, including: EA selection probability ("Selection Proba"), number of EAs by stratum ("Stratum size"), number of EAs selected by stratum ("Stratum sam-size"), total measure of size by stratum, that is, total number of households ("Measure size-strat"), stratum number, and number of times the unit has been selected. These are important sampling parameters which must be present in a sample file.
Figure 3.10 Part of an example sample file from a stratified PPS sampling

3.4 Complex sampling procedures

The sampling procedures used in DHS surveys are usually complex, involving multi-stage selection, clustering and stratification, with a combination of PPS sampling in the first stage and equal probability systematic sampling in the second stage. Multi-stage selection is employed due to the lack of a sampling frame at the individual level; clustering is used for implementation efficiency, and stratification for the reduction of sampling errors. The DHS sampling procedure has been discussed in some detail in Section 1.8; here we give the basic theoretical properties of the estimator, the variance and variance estimation for a two-stage cluster sampling.

Consider a two-stage stratified cluster sampling, with $n_h$ PSUs selected in stratum $h$ in the first stage with PPS sampling, and for each of the selected PSUs, an equal probability systematic sample of $m$ SSUs is selected. Let $y_{hj1}, y_{hj2}, \ldots, y_{hjm}$ be observations from the $j$-th PSU in stratum $h$. An unbiased estimator of the population total is given by

$$\hat{Y}_{PPS} = \sum_{h}\sum_{j} \frac{\hat{Y}_{hj}}{\pi_{Phj}}, \quad \text{with} \quad \hat{Y}_{hj} = \frac{M_{hj}}{m}\sum_{i=1}^{m} y_{hji}$$

where $\pi_{Phj}$ is the selection probability of the $j$-th PSU in stratum $h$ and $M_{hj}$ is the number of SSUs in the $j$-th PSU in stratum $h$. The variance of this estimator is given by

$$V(\hat{Y}_{PPS}) = \underbrace{\frac{1}{2}\sum_{h}\sum_{k}\sum_{j \neq k} (\pi_{Phk}\pi_{Phj} - \pi_{Phkj}) \left( \frac{Y_{hk}}{\pi_{Phk}} - \frac{Y_{hj}}{\pi_{Phj}} \right)^2}_{(V_P)} + \underbrace{\sum_{h}\sum_{j} \frac{M_{hj}^2}{\pi_{Phj}\, m}\, S_h^2 \left( 1 + (m-1)\rho_{hw} \right)}_{(V_S)}$$

The first part ($V_P$) represents the sampling variance of the selection of PSUs; the summation is over all strata and over different PSUs $j$ and $k$ within the same stratum. The second part ($V_S$) represents the sampling variance of the selection of SSUs; the summation is over all strata and PSUs. Estimators for the first part and second part are obtained from the results in previous sections:

$$\hat{V}_P = \frac{1}{2}\sum_{h}\sum_{k}\sum_{j \neq k} \frac{\pi_{Phk}\pi_{Phj} - \pi_{Phkj}}{\pi_{Phkj}} \left( \frac{\hat{Y}_{hk}}{\pi_{Phk}} - \frac{\hat{Y}_{hj}}{\pi_{Phj}} \right)^2, \qquad \hat{V}_S = \sum_{h}\sum_{j} \frac{M_{hj}^2}{\pi_{Phj}}\cdot\frac{1-f}{m}\, s_h^2$$

Since $\hat{V}_P$ is not an unbiased estimate of $V_P$ and usually overestimates $V_P$, and since $V_S$ is usually small compared to $V_P$, the second part is usually dropped in the variance estimation. This gives an approximate variance estimate:

$$\hat{V}(\hat{Y}_{PPS}) = \frac{1}{2}\sum_{h}\sum_{k}\sum_{j \neq k} \frac{\pi_{Phk}\pi_{Phj} - \pi_{Phkj}}{\pi_{Phkj}} \left( \frac{\hat{Y}_{hk}}{\pi_{Phk}} - \frac{\hat{Y}_{hj}}{\pi_{Phj}} \right)^2$$

The above estimator can be simplified in the same way as $\hat{V}_R(\hat{y}_{PPS})$ in Section 3.3:

$$\hat{V}_R(\hat{Y}_{PPS}) = \sum_{h} \frac{n_h}{n_h - 1}\sum_{j} (1 - \pi_{Phj}) \left( \frac{\hat{Y}_{hj}}{\pi_{Phj}} - \frac{\hat{Y}_h}{n_h} \right)^2$$

which is reduced to the Woodruff (1971) estimator if $\pi_{Phj} = f_h$ for all $h$:

$$\hat{V}_W(\hat{Y}_{PPS}) = \sum_{h} \frac{n_h (1 - f_h)}{n_h - 1}\sum_{j} \left( \frac{\hat{Y}_{hj}}{\pi_{Phj}} - \frac{\hat{Y}_h}{n_h} \right)^2$$

where $\hat{Y}_h = \sum_{j} \hat{Y}_{hj} / \pi_{Phj}$ is the sample estimate of the population total of stratum $h$.

The above estimator can be expanded to estimate a mean or a ratio by using Woodruff's (1971) linearization approach: let $\hat{R} = \hat{Y}_{PPS} / \hat{X}_{PPS}$, where $\hat{Y}_{PPS}$ represents the total weighted sample value for variable $y$, and $\hat{X}_{PPS}$ represents the total weighted sample value for variable $x$ or the total number of weighted cases in the group or subgroup under consideration. The approximate variance of $\hat{R}$ can be computed using Woodruff's formula:

$$\hat{V}_W(\hat{R}) = \frac{1}{\hat{X}_{PPS}^2}\sum_{h} \frac{n_h (1 - f_h)}{n_h - 1}\sum_{j=1}^{n_h} \left( z_{hj} - \frac{z_h}{n_h} \right)^2$$

in which $z_{hj} = (\hat{Y}_{hj} - \hat{R}\hat{X}_{hj}) / \pi_{Phj}$, and $z_h = \hat{Y}_h - \hat{R}\hat{X}_h$.

The above estimator is widely used in commercial statistical software such as SAS, SPSS and Stata. Repeated replication methods such as Bootstrap and Jackknife (Efron, 1982; Efron 1993) can also be used to estimate the variance of $\hat{R}$, as explained in Section 4.2 for estimating sampling errors for complex demographic rates. It should be noted that the DHS survey sampling error calculation procedure has traditionally used the Taylor linearization method (Woodruff, 1971) to calculate the sampling variance for means and ratios because the linearization method is faster computationally than the replication methods.
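Woodruff's linearization formula for the variance of a ratio can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the DHS sampling error program: the function name `woodruff_ratio_variance` and the input layout are hypothetical, each entry of `y` and `x` is assumed to be an already weighted cluster total (that is, a PSU total divided by its selection probability), each stratum is assumed to contain at least two clusters, and the numbers in the usage example are invented.

```python
def woodruff_ratio_variance(strata):
    """Estimate R = Y / X and its linearized variance.

    Hypothetical helper for illustration only.
    strata: list of dicts with keys
        'y': weighted cluster totals of the numerator variable,
        'x': weighted cluster totals of the denominator variable,
        'f': sampling fraction f_h of the stratum.
    """
    y_total = sum(sum(s["y"]) for s in strata)
    x_total = sum(sum(s["x"]) for s in strata)
    ratio = y_total / x_total

    accumulated = 0.0
    for s in strata:
        n_h = len(s["y"])
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        # z_hj: linearized cluster values; their sum is z_h.
        z = [y - ratio * x for y, x in zip(s["y"], s["x"])]
        z_mean = sum(z) / n_h  # z_h / n_h
        sum_sq = sum((value - z_mean) ** 2 for value in z)
        accumulated += n_h * (1.0 - s["f"]) / (n_h - 1) * sum_sq

    return ratio, accumulated / x_total ** 2


# Invented data: two strata with two clusters each, negligible f_h.
strata = [
    {"y": [2.0, 4.0], "x": [4.0, 4.0], "f": 0.0},
    {"y": [3.0, 3.0], "x": [6.0, 6.0], "f": 0.0},
]
ratio, variance = woodruff_ratio_variance(strata)
standard_error = variance ** 0.5
print(ratio, variance, standard_error)
```

With these invented inputs only the first stratum contributes to the variance, because the two clusters of the second stratum have identical linearized values.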
4 SURVEY ERRORS

The estimates from a sample survey are affected by two types of errors: non-sampling errors and sampling errors. Non-sampling errors are the results of problems occurring during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts are made during the implementation of a DHS to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in a DHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

Sampling errors are addressed in some detail in Section 1.6. The following sections of this chapter concentrate on non-sampling errors, including the nature and the sources of errors and the strategies to control them. As mentioned in Section 1.6, non-sampling errors are usually the main source of errors in a sample survey, and they are difficult to evaluate statistically after the survey is complete. Therefore it is best to minimize this type of error throughout the whole survey implementation process.

4.1 Errors of coverage and non-response

A coverage error occurs when a sampling unit is mistakenly excluded from or included in the survey during survey implementation. Over-coverage occurs when a non-eligible or a non-sampled sampling unit is deliberately or mistakenly included in the sample; under-coverage occurs when a sampled eligible sampling unit is deliberately or mistakenly excluded from the sample.
Non-response, on the other hand, relates to a failed attempt to interview a sampled sampling unit. This section deals with problems in the definition and estimation of such error rates.

Coverage errors

In DHS surveys, errors of over-coverage (inclusion of units that do not belong in the sample) do not occur as often as under-coverage errors (errors due to exclusion of units that belong in the sample). A typical source of over-coverage occurs when vacant households or non-residential households are sampled for interview. This may occur if a household's occupancy status has changed between the time of the household listing and the household interview. Therefore, it is recommended that the time gap between the household listing and the main data collection should be reasonably small.

For under-coverage, several sources of error may be identified. The first source of under-coverage error arises in the listing stage, when the listing staff covers less than the designated area. A second source occurs when an age limit is used to determine eligibility for individual interview: field staff may misreport an individual's age to push them out of the eligible age range. A third source arises when surveys collect information only from de facto individuals (i.e., those who slept in the household the night before the survey). There may be deliberate omissions of eligible individuals by consciously misreporting their residency status as non de facto, which thereby disqualifies an individual from being eligible for interview. A fourth source arises when a series of questions in the questionnaire are only asked of a certain group. For example, questions related to pregnancy, delivery and child health are only asked for children born since a particular date, so there may be omissions of children due to mis-recording of dates of birth as before the cutoff date; or questions regarding knowledge, attitudes and practices related to HIV are only asked if the respondent is recorded as knowing HIV or AIDS, so there may be omissions of respondents due to mis-recording of their knowledge of HIV/AIDS.

All four types of coverage errors may involve deliberate bias by fieldworkers seeking to reduce their workload. Intentional errors can be controlled by intensive training and close supervision. Errors due to an outdated area frame can be reduced by scheduling the household listing operation before the main survey. Errors due to age distortion can be reduced by close supervision and routine quality control. Errors due to residency status can be reduced by changing the data collection strategy to interview all individuals within the age range regardless of their de facto status. For example, in DHS surveys, the interviewers are now instructed to interview all women age 15-49 regardless of whether they slept in the household the night before the survey. By requiring the interviewing of all women, the incentive for misreporting residency status has been eliminated. However, the de facto character of the surveys is maintained at the data analysis stage.

Using different fieldworkers to conduct the household schedule and individual interviews will also help in eliminating age distortion, misreporting of residency status and mis-recording of dates and other key information. Active monitoring of fieldwork through fieldwork supervision visits and the early use of field check tabulations on collected data can also limit the scope and scale of under-coverage.

Coverage errors can be investigated after the survey fieldwork by a variety of methods. The sample can be extrapolated to the total population, and data from the last census can be extrapolated to the survey date for comparison. This check should be done separately for households and individuals. Age distortions can be investigated by studying the discontinuity in trends across the eligibility boundaries, for example, by looking at the ratio of women age 14 to those age 15, and of those age 49 to those age 50.
While it is tempting to introduce comparisons with males as a control, it should be noted that in most societies more males are educated than females, so more precise knowledge of their own age may reduce heaping at ages 15 and 50 among males compared with females.

Deliberate restrictions of coverage

In many surveys, whether in developed or developing countries, certain parts of the national territory are deliberately excluded from the survey for reasons of difficulty of access. Two distinct cases arise:

Exclusion of clearly identified areas from the sampling frame: in this case, it is usual to state the coverage limitation in the survey report, which then becomes a report on the remainder of the country. Such exclusions are not regarded as coverage or response errors but simply as part of the definition of the survey domain.

Ad hoc exclusions decided during or just prior to fieldwork: in many surveys it is not uncommon for the survey organization to abandon the attempt to conduct fieldwork in certain sampled clusters, whether due to floods, civil disturbance, or other practical constraints. Here the exclusions usually occur after sample selection. If such excluded areas form a meaningful domain, it may be acceptable to deal with the problem by redefining the survey domain. More commonly, however, the excluded areas will not form a meaningful domain and will have to be accepted as constituting errors. This type of exclusion should be classified as non-response rather than coverage error.

Non-response

The response rate provides information on the survey coverage problems and is an important survey parameter. At first sight, the concept of non-response seems simple and clear: it occurs when a sampled unit, household or individual, refuses to be interviewed; the non-response rate is the proportion of the number of non-interviewed units over the number of units selected. Taking into account the distinction between coverage error and non-response indicated earlier, this can be modified by saying that the information desired is the percentage of attempted interviews that failed.

In practice, there are two features found in some sample designs which complicate this simple issue. First, in many surveys the final units for interview are identified through a progressive sifting process. For example, in a typical DHS survey, survey personnel list and select dwellings, interview the household currently in the dwelling, then interview any women age 15-49 in that household. If failure occurs at one of the earlier steps, the information which would enable us to classify the effects at the final level (i.e., the individual level) is lacking. For example, if the interviewer cannot find the selected dwelling, it is not known whether it contains a woman eligible for interview; if the household does not contain any eligible women, then the failure has no effect on the interview response rate.

To deal with this problem, take the women's survey as an example, and assume that there are only two steps in the sifting process, namely households and women. The tradition of DHS surveys is to compute the response rates for the household survey and the women's interview separately because of the way that sample weights are calculated. There are six quantities of potential interest in computing response rates:

A. Households selected
B. Households found or eligible (excluding vacant, destroyed, etc.)
C. Households interviewed
D. Women selected
E. Women found or eligible (all de facto women found)
F. Women interviewed

Since the survey primarily concerns women, the relevant response rate is F/D (i.e., women interviewed divided by women selected). However, the quantity D is unknown because of the non-responding households.
It is of interest to know the total number of eligible women in all selected households, but only the number of women found in the households interviewed (E) is known. Therefore D must be estimated by taking the household non-response into account. Assuming that the number of eligible women per household is the same among non-responding households as it is among interviewed households, the number of women selected can be estimated as:

$$D = \frac{E}{C/B}$$

where C/B is the effective household response rate. The reason to use the effective household response rate is that the non-eligible (vacant, destroyed or other) households A-B are considered as over-coverage, assuming that the same over-coverage exists in the household listing. These assumptions may not be very convincing, but the effect of any departure from them on the estimate of D is likely to be very small. On this basis the overall response rate for the women's survey, R=F/D, becomes:

$$R = \frac{F}{D} = \frac{F}{E}\cdot\frac{C}{B}$$

This response rate is the product of the response rates observed at each of the two stages, households and women. This basic principle provides a solution for the problem of not knowing the total number of women sampled. Where two or more steps of sifting are involved, the overall response rate can be estimated by multiplying together the response rates observed at each step. In doing so, the assumption is made that the response/non-response outcomes at the different steps occur independently.

DHS surveys do not allow the replacement of non-responding households because of the potential bias which may result from the replaced households being easier to contact. However, when a sampled household in a selected dwelling moves away between the listing and the interview, the MEASURE DHS program recommends interviewing the new household (if any) that has moved in by the time of the main survey. This is not considered a replacement; in fact it reflects the fact that the sampling unit is defined as the dwelling structure rather than its occupants. The design calls for the listing and selection of dwellings, and then for the interview of the household found in the dwelling at the time of the survey. Since in many areas there is no address system, the initial listing operation has to identify the dwellings in terms of the names of the occupying households, but these merely serve as addresses. The fact that, in some cases, a new household moves in between the time of listing and interview does not mean that replacement of a sampling unit has occurred. Thus, such cases do not require any special treatment.

Moreover, just as a new household moving in does not constitute a replacement, so the case of a household moving out after the listing without another moving in, creating a vacant household, does not constitute non-response. The eligible household sample is defined as the set of households existing at the time of interviewing in the dwellings selected from the dwelling list.

Response rates

As seen in the previous section, the women's overall response rate is the product of the observed household and women's response rates; therefore, it is meaningful to calculate these two response rates separately. As we mentioned in Section 1.13, non-response brings bias. Therefore, the different response rates reflect the data quality.
A separate response rate is useful in sample size design and field work improvement. In order to categorize in detail the non-responding households and individuals, the MEASURE DHS program standardized the response codes to be entered on the questionnaires and field records, and expressed the formulae for response rates in terms of these codes. In DHS surveys, the following response categories are used at the household level:

1H  Completed
2H  No household member at home or no competent respondent at home
3H  Entire household absent for extended period
4H  Postponed
5H  Refused
6H  Dwelling vacant or address not a dwelling
7H  Dwelling destroyed
8H  Dwelling not found
9H  Other

Note that "household" above refers to the household found in the dwelling at the time of the interview, not necessarily the household named at the time of the listing operation. The DHS survey final reports provide the household response rate calculated by:

$$R_H = \frac{1H}{1H + 2H + 4H + 5H + 8H}$$

The reason to include 8H in the denominator is that a household that is not found at the time of the fieldwork may not be a vacant household. It may be that the household was not found because of some error that occurred during the survey implementation. Note also that this response rate is different from the weighted response rate calculated in Section 1.13. In Section 1.13 the aim is to calculate the sampling weight, while here the response rate is used as a data quality indicator. It is also worth noting that the above calculated response rate is a net response rate. For the purpose of sample size determination, one should use the gross response rate, which is the number of households interviewed over the number selected:

$$R_{HG} = \frac{1H}{1H + 2H + 3H + 4H + 5H + 6H + 7H + 8H + 9H}$$

If the net response rate is used to calculate sample size, the survey may not obtain the designed number of interviews because some of the sampled households will always end up being non-eligible, especially when there is a long time lag between household listing and the main field work.

At the individual level the following response categories are used:

1I  Completed
2I  Not at home
3I  Postponed
4I  Refused
5I  Partly completed
6I  Incapacitated
7I  Other

The individual response rate is thus:

$$R_I = \frac{1I}{1I + 2I + 3I + 4I + 5I + 6I + 7I}$$

The category "no eligible woman in the household" is not included in the list since it is irrelevant to the response rate, appearing neither in the numerator nor the denominator. The same is true for non de facto women. Although an individual questionnaire is administered to non-de facto women who live in the household to reduce under-coverage errors, as mentioned in Section 4.1.1, these interviews are not counted in the numerator or the denominator of the response rate because non-de facto women are not eligible according to the definition of eligibility.

Whenever the "other" code is used, the interviewers should specify the reason for non-response.
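The response rate formulas above can be checked with a small Python sketch. The function names and the dictionary layout are hypothetical, and the counts are invented for illustration; only the result codes (1H to 9H, 1I to 7I) and the formulas come from the text. The last lines multiply the net household rate by the individual rate, following the earlier observation that the women's overall response rate is the product of the rates observed at each stage.

```python
def household_response_rates(counts):
    """Return (net, gross) household response rates.

    Hypothetical helper for illustration only.
    counts: dict mapping result codes '1H'..'9H' to household counts.
    """
    completed = counts.get("1H", 0)
    # Net rate: only codes 1H, 2H, 4H, 5H and 8H enter the denominator.
    net_denominator = sum(
        counts.get(code, 0) for code in ("1H", "2H", "4H", "5H", "8H")
    )
    # Gross rate: all selected households (codes 1H to 9H).
    gross_denominator = sum(counts.get(f"{i}H", 0) for i in range(1, 10))
    return completed / net_denominator, completed / gross_denominator


def individual_response_rate(counts):
    """Return the individual response rate from codes '1I'..'7I'."""
    completed = counts.get("1I", 0)
    denominator = sum(counts.get(f"{i}I", 0) for i in range(1, 8))
    return completed / denominator


# Invented counts, for illustration only.
household_counts = {
    "1H": 900, "2H": 40, "3H": 50, "4H": 10, "5H": 30,
    "6H": 60, "7H": 10, "8H": 20, "9H": 5,
}
individual_counts = {
    "1I": 950, "2I": 20, "3I": 5, "4I": 15, "5I": 5, "6I": 3, "7I": 2,
}

net_rate, gross_rate = household_response_rates(household_counts)
woman_rate = individual_response_rate(individual_counts)
overall_rate = net_rate * woman_rate
print(net_rate, gross_rate, woman_rate, overall_rate)
```

In this invented example the gross rate is noticeably lower than the net rate, which illustrates why the gross rate is the one to use when determining sample size.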
At the household level, the analyst should review a printout of the "other" codes and recode as many as possible into the existing categories. Similarly, all "other" codes for the individual interview should be examined and recoded. Any questionnaire in which the household or the woman was deemed ineligible should be clearly marked as ineligible and removed from the data file. An ineligible household may be one in a dwelling unit that does not lie within the sample area or a neighboring household that was interviewed incorrectly as a replacement household. An ineligible woman may be one who was
reported as 16 years old in the household questionnaire, but later turned out to be 14 (in which case her age in the household questionnaire should be corrected appropriately).

The overall response rate is obtained by multiplying the household and the individual level response rates:

$$R = R_H \times R_I$$

However, if there has been a deliberate exclusion of certain areas such as clusters which were not interviewed (see Section 1.13 on cluster level non-response), the overall response rate must also take the cluster response rate into account. In summary, the final overall estimated response rate is obtained from the formula:

$$R = R_H \times R_I \times R_C$$

where $R_C = n^* / n$ is the ratio of the number of clusters interviewed over the number selected. Such response rates should be computed and published separately for the main geographic domains of the sample as well as the whole survey domain. If the sample is self-weighting within domain but has different weights across domains, the response rates should be computed and published for each differently weighted domain.

4.2 Sampling errors

We introduced the concept of sampling errors in Section 1.6 for sample size determination. In this section, we focus on the calculation of the sampling errors. Sampling errors are usually reported for selected indicators in Appendix B of the DHS final report. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design. (DHS reports ±2·SE instead of ±1.96·SE as the 95% confidence interval, as explained in Section 1.6.1.)
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulae to calculate sampling errors. However, DHS survey samples are the result of a multi-stage stratified design, so it is necessary to use more complex formulae. A variety of computer software can be used to calculate sampling errors, such as the Integrated System for Survey Analysis (ISSA) sampling errors module and the ICF-developed SAS macro, as well as software such as Wesvar, Cenvar, and Sudaan. These packages use the Taylor Linearization Method (Woodruff, 1971) of variance estimation for survey estimates that are means or proportions. This same method is widely used in commercial statistical software such as SAS, SPSS and STATA. The Jackknife Repeated Replication Method (Efron, 1982, 1993) is used for variance estimation of more complex statistics such as fertility and mortality rates.

The Taylor Linearization Method treats any percentage or average as a ratio estimate, r = y/x, where y represents the total weighted sample value for variable y, and x represents the total weighted sample value for variable x or the total number of weighted cases in the group or subgroup under consideration. The variance of r is computed using the formula given below, with the standard error being the square root of the variance:
$$SE^2(r) = \mathrm{var}(r) = \frac{1}{x^2} \sum_{h=1}^{H} \left[ \frac{n_h (1 - f_h)}{n_h - 1} \left( \sum_{j=1}^{n_h} z_{hj}^2 - \frac{z_h^2}{n_h} \right) \right]$$

in which

$$z_{hj} = y_{hj} - r\,x_{hj}, \quad \text{and} \quad z_h = y_h - r\,x_h$$

where
h represents the sampling stratum, which varies from 1 to H,
$n_h$ is the total number of clusters selected in the h-th stratum,
$y_{hj}$ is the sum of weighted values of variable y in the j-th cluster in the h-th stratum,
$x_{hj}$ is the sum of weighted values of variable x in the j-th cluster in the h-th stratum,
$f_h$ is the sampling fraction in stratum h (it can be ignored when it is small),
x is the sum of weighted values of variable x over the total sample.

The Jackknife Repeated Replication Method derives estimates of complex rates from each of several replications of the parent sample, and calculates standard errors for these estimates using simple formulae. Each replication considers all but one cluster in the calculation of the estimates. Pseudo-independent replications are thus created. The variance of a rate r is calculated as follows:

$$SE^2(r) = \mathrm{var}(r) = \frac{1}{k(k-1)} \sum_{i=1}^{k} (r_i - r)^2$$

in which

$$r_i = k\,r - (k-1)\,r_{(i)}$$

where r is the estimate computed from the full sample of k clusters, $r_{(i)}$ is the estimate computed from the reduced sample of k-1 clusters (with the i-th cluster excluded), and k is the total number of clusters.

In addition to the standard error, the procedure computes the design effect (DEFT) for estimates which are means, proportions or ratios. For complex demographic rates, the procedure computes an approximation of DEFT. DEFT is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. The procedure also computes the relative error and confidence limits for the estimates. Sampling errors are usually reported for the total sample, for the urban and rural areas, and for each of the survey domains.
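The two variance formulae of Section 4.2 can be sketched in Python as follows. This is a minimal illustration and not the ISSA module or the ICF SAS macro: the function names and the input layout (one `(stratum, y_hj, x_hj)` tuple of weighted totals per cluster) are assumptions made for the example, and the jackknife is applied here to a simple ratio rather than to a fertility or mortality rate.

```python
import math
from collections import defaultdict


def taylor_se_ratio(clusters, sampling_fractions=None):
    """Return (r, SE(r)) for the ratio r = y/x by Taylor linearization.

    `clusters` is a list of (stratum, y_hj, x_hj) weighted cluster totals.
    `sampling_fractions` optionally maps stratum -> f_h (default 0).
    """
    clusters = list(clusters)
    x_total = sum(x for _, _, x in clusters)
    r = sum(y for _, y, _ in clusters) / x_total
    z_by_stratum = defaultdict(list)
    for h, y, x in clusters:
        z_by_stratum[h].append(y - r * x)          # z_hj = y_hj - r * x_hj
    variance = 0.0
    for h, z in z_by_stratum.items():
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two clusters")
        f_h = (sampling_fractions or {}).get(h, 0.0)
        z_h = sum(z)                               # z_h = y_h - r * x_h
        variance += n_h * (1 - f_h) / (n_h - 1) * (
            sum(v * v for v in z) - z_h ** 2 / n_h)
    return r, math.sqrt(variance / x_total ** 2)


def jackknife_se(clusters, estimator):
    """Return (r, SE(r)) by deleting one cluster at a time."""
    clusters = list(clusters)
    k = len(clusters)
    r = estimator(clusters)
    # Pseudo-values r_i = k*r - (k-1)*r_(i), with cluster i left out.
    pseudo = [k * r - (k - 1) * estimator(clusters[:i] + clusters[i + 1:])
              for i in range(k)]
    variance = sum((r_i - r) ** 2 for r_i in pseudo) / (k * (k - 1))
    return r, math.sqrt(variance)


def ratio(cl):
    """Ratio of weighted totals, used as the jackknife estimator."""
    return sum(y for _, y, _ in cl) / sum(x for _, _, x in cl)


# Hypothetical weighted cluster totals in two strata, for illustration only.
data = [("A", 10, 20), ("A", 14, 20), ("B", 6, 20), ("B", 10, 20)]
r_taylor, se_taylor = taylor_se_ratio(data)
r_jack, se_jack = jackknife_se(data, ratio)
```

Dividing either standard error by the standard error expected under simple random sampling for the same number of cases would give DEFT as defined above; that step is left out because the simple random sampling formula depends on the type of indicator.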
5 SAMPLE DOCUMENTATION

5.1 Introduction

Sample documentation is an important part of a DHS survey. The documentation should include all useful information for data analysis, for data quality assessment, for sample design of subsequent surveys, and for data users. Basic sample documentation should be included in DHS survey final reports. Good sample documentation should include the following aspects from different stages of the survey implementation:

1) Target population
2) Expected sample size
3) Main indicators
4) Report domains
5) Sampling frame
6) Primary and secondary sampling units
7) Stratification
8) Sample allocation
9) Sampling procedure
10) Selection probability
11) Household listing results
12) Sampling weights
13) Results of survey implementation
14) Sampling errors

Points 1 to 10 and point 12 are usually addressed in a Sample Design Document from the very beginning of the survey. For point 11, the number of households listed, the number of households selected, and segmentation information for each of the selected clusters should be provided. A full description of the sample design should be included in Appendix A of the DHS final reports. For point 13, the number of eligible sampling units selected, the number interviewed, and the household and individual response rates should be presented. Sampling errors (point 14) are presented in Appendix B of DHS final reports for selected indicators.

5.2 Sample design document

A sample design document is an important document which records the purpose of the survey, the target population, the source of the sampling frame, the statistical methodology, the sample size and the sample allocation, and other related topics. This section gives an example of a sample design document to show the details which should be included.

5.2.1 Introduction

The Country Demographic and Health Survey 2012 (XDHS 2012) will be the fourth DHS, following those implemented in 1995, 2000 and 2005. A nationally representative sample of 18,450 households will be selected.
All women who are usual residents of a selected household or who slept in a selected household the night before the survey are eligible for the survey. The survey will result in about 17,900 interviews of women.

As with the prior surveys, the main objectives of the XDHS 2012 survey are to provide up-to-date information on fertility and childhood mortality levels; fertility preferences; awareness, approval and use of family planning methods; maternal and child health; and knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI).
Apart from the women's survey, a men's survey will also be conducted at the same time in a sub-sample consisting of one household in every three selected for the women's survey. All men who are usual residents of a selected household or who slept in a selected household the night before the survey are eligible for the men's survey. The survey will collect information on their basic demographic and social status; on their knowledge and use of family planning methods; and on their knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections. The survey will result in about 5,000 interviews of men. In this sub-sample, all women age 15-49 and all children under 5 years of age will be weighed, measured and tested for anemia in order to study their nutritional status.

The survey is designed to produce representative estimates for most of the indicators for the country as a whole, for the urban and the rural areas separately, for the capital city of the country, and for each of the ten geographical regions.

5.2.2 Sampling frame

The sampling frame used for the XDHS 2012 is the Country Population and Housing Census conducted in 2006 (XPHC 2006), provided by the Central Statistical Office (CSO). CSO has made available an electronic file consisting of 81,654 Enumeration Areas (EAs) created for the 2006 census in 9 of its 10 regions. An EA is a geographic area consisting of a convenient number of dwelling units which served as a counting unit for the census. The frame file contains information about the location, the type of residence and the number of residential households for each of the 81,654 EAs. Sketch maps which delineate the geographic boundaries are also available for each EA. It should be pointed out that this file does not include Region 10 because the census conducted in Region 10 used a different methodology due to difficulty of access. Therefore, the sampling frame for Region 10 is in a different file and uses a different format.
It is also worth noting that the sampling frame excluded some special EAs which have disputed boundaries; this kind of EA represents only 0.1% of the total population.

The census cartographic work for Region 10 was conducted using two different methods. In two of its six districts, namely Districts 2 and 4, traditional cartographic work similar to that in the other regions of the country was carried out, while in the other four districts the cartographic work was carried out using satellite photos without physical visits to the area. The census data could not be used to update the cartographic work in Region 10 because of coding problems. So in Region 10, a sampling frame with a format similar to that of the other regions is available only for the two districts where traditional cartographic work had been carried out. However, the number of households in the sampling frame for these two districts is based on the number of households estimated during the cartographic work preceding the census and not the actual number of households counted in the census. Due to security concerns, as in the XDHS 2000 and XDHS 2005, it has been decided that the XDHS 2012 will be conducted only in these two districts. These two districts together have 1,246 EAs, and they represent 53% of the regional total population. Taking into account the special EAs which are excluded from the census frame, the sampling frame used for the XDHS 2012 covers 98.4% of the country's total population.

The country is divided into 10 geographical regions; each region is sub-divided into districts, and each district into wards. Table 5.1 shows the distribution of the EAs and the mean number of households per EA by region and by type of residence. The sampling frame includes 82,900 EAs, of which 17,346 are in urban areas and 65,554 are in rural areas. The average size of an EA in terms of number of households is 170 in an urban EA and 182 in a rural EA, for an overall average size of 180 households per EA. Table 5.2 shows the distribution of households by region and by type of residence.
The distribution is very skewed, since 83.4% of the country's households are concentrated in 3 regions, namely Region 3, Region 4 and Region 6, while the five small regions
Region 2, Region 5, Region 7, Region 8 and Region 9 together represent only 3.8% of the country's total households.

Table 5.1 Distribution of EAs and average size of EA by region and by type of residence

Region          Number of EAs                 Average EA size
                Urban     Rural     Total     Urban   Rural   Total
Region 1        1,541     4,139
Region 2
Region 3        3,391    18,016
Region 4        5,030    25,800
Region 5
Region 6        2,124    14,490
Region 7
Region 8
Capital City    3,865
Region 9
Region 10*
Country        17,346    65,554    82,900     170     182     180

Source: XPHC 2006; * Region 10 has only two districts included.

Table 5.2 Distribution of households by region and by type of residence

Region          Number of households                  % Urban   % of Country
                Urban        Rural         Total
Region 1
Region 2
Region 3        619,796     3,284,512
Region 4        864,303     4,630,702
Region 5
Region 6        353,554     2,667,787
Region 7         19,275        44,879
Region 8         27,975        17,651
Capital City
Region 9         51,991        21,643
Region 10*
Country       2,940,708    11,959,453

Source: XPHC 2006; * Region 10 has only two districts included.

5.2.3 Structure of the sample and the sampling procedure

The sample for the XDHS 2012 will be a stratified sample selected in two stages from the 2006 census frame. Stratification was achieved by separating each region into urban and rural areas. In total, 19 sampling strata have been created since the region of Capital has only urban areas. Samples will be selected independently in each sampling stratum, by two-stage selection. Implicit stratification and proportional allocation are achieved at each of the lower administrative levels by sorting the
sampling frame according to administrative units at different levels and by using a probability proportional to size selection at the first stage of sampling.

In the first stage, 615 EAs have been selected with probability proportional to EA size and with independent selection in each sampling stratum, with the sample allocation given in Table 5.3 below. Taking into account the time passed since the last population census, a household listing operation will be carried out in all of the selected EAs before the main survey. The household listing operation consists of visiting each of the 615 selected EAs; drawing a location map and a detailed sketch map; and recording on the household listing forms all residential households found in the EA with the address and the name of the head of the household. The resulting list of households will serve as the sampling frame for the selection of households in the second stage. Some of the selected EAs may be found to be large in size in the household listing operation. In order to minimize the task of household listing, the selected EAs containing an estimated number of households greater than 300 will be segmented. Only one segment will be selected for the survey, with probability proportional to the segment size. The methodology and the detailed household listing procedure are addressed in the Household Listing Manual (see Chapter 2).

At the second stage, a fixed number of 30 households will be selected from each EA. Table 5.3 shows the sample distribution of clusters and households by region and by type of residence. Among the 615 EAs selected, 185 are in urban areas and 430 are in rural areas. The total number of households to be selected is 18,450; among them, 5,550 will be in urban areas and 12,900 will be in rural areas.

In the sampling frame, the household distribution by region varies from 0.3 percent for Region 8 to 36.9 percent for Region 4 (see Table 5.2 in Section 5.2.2).
To allocate the approximately 17,900 women's interviews to the different regions, a proportional allocation would provide the best precision for national level indicators, but not for regional level indicators. The small regions such as Region 7, Region 8 and Region 9 would receive a sample size which is too small to achieve the degree of precision desired for regional level estimates. In order for the precision of estimates to be acceptable across regions, experience shows that a minimum of 800 women's interviews are needed so that reliable estimates for most of the DHS indicators can be obtained. The final sample allocation reflects a power allocation, which lies between the proportional allocation and the equal size allocation. So that the survey precision in the urban areas is comparable with that in the rural areas, urban areas are slightly over-sampled.

The allocations of clusters and households by region and by type of residence are functions of the estimated average number of women age 15-49 per household and the household and individual response rates. Estimates for these parameters are obtained from the XDHS 2005 survey. According to the results of XDHS 2005, the average number of women age 15-49 per household is 1.20 in urban areas and 1.00 in rural areas. The number of eligible men per household is 1.05 in urban areas and 0.95 in rural areas. The household response rates are 92 percent in urban areas and 94 percent in rural areas; the women's response rates are 94 percent and 96 percent in the urban and rural areas, respectively; the men's response rates are 85 percent and 90 percent in the urban and rural areas, respectively.
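The idea of a power allocation lying between the proportional and the equal size allocation can be sketched as below. This is a hedged illustration only: the function name, the exponent `alpha` and the region sizes are hypothetical, and the simple form "share proportional to size raised to a power" is an assumption for the example, not the allocation formula actually used for the XDHS 2012.

```python
def power_allocation(sizes, total_sample, alpha):
    """Split `total_sample` across domains in proportion to size ** alpha.

    alpha = 1 reproduces the proportional allocation, alpha = 0 the equal
    size allocation; values in between give a compromise (assumed form).
    """
    powered = {name: size ** alpha for name, size in sizes.items()}
    scale = total_sample / sum(powered.values())
    return {name: value * scale for name, value in powered.items()}


# Hypothetical household counts for two regions, for illustration only.
region_sizes = {"Large region": 400, "Small region": 100}

proportional = power_allocation(region_sizes, 300, alpha=1.0)
equal = power_allocation(region_sizes, 300, alpha=0.0)
compromise = power_allocation(region_sizes, 300, alpha=0.5)
```

With these made-up sizes the small region receives 60 units under proportional allocation, 150 under equal allocation and 100 under the compromise, which shows how lowering the exponent protects the precision of small domains at some cost to national precision. In practice the result would still need to be checked against the minimum of about 800 women's interviews per region mentioned above and converted into whole clusters.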
Table 5.3 Sample allocation of clusters and households by region and by type of residence

Region          Allocation of clusters     Allocation of households
                Urban   Rural   Total      Urban    Rural    Total
Region 1                                            1,410    1,800
Region 2                                            1,140    1,440
Region 3                                            1,860    2,160
Region 4                                            1,860    2,250
Region 5                                            1,260    1,440
Region 6                                            1,950    2,160
Region 7                                            1,110    1,380
Region 8                                                     1,260
Capital City    54      na      54         1,620    na       1,620
Region 9                                                     1,260
Region 10                                           1,350    1,680
Country         185     430     615        5,550    12,900   18,450

Table 5.4 Expected number of interviews by region and by type of residence

Region          Women interviewed           Men interviewed
                Urban    Rural    Total     Urban    Rural    Total
Region 1                 1,280
Region 2                 1,035
Region 3                 1,689
Region 4                 1,689
Region 5                 1,144
Region 6                 1,771
Region 7                 1,008
Region 8
Capital City    1,800    na                          na       408
Region 9
Region 10                1,226
Country         6,168    11,714   17,882    1,400    3,275    4,676

The men's survey will be carried out in one household in every three selected for the women's survey.

5.2.4 Selection probability and sampling weight

Due to the non-proportional allocation of the sample to the different regions and to their urban and rural areas, sampling weights will be required for any analysis using XDHS 2012 data to ensure the survey results are representative at national and regional levels. Since the XDHS 2012 sample is a two-stage stratified cluster sample, sampling weights will be calculated based on the separate sampling probabilities for each sampling stage and for each cluster. We use the following notations:
$P_{1hi}$: first-stage sampling probability of the i-th cluster in stratum h
$P_{2hi}$: second-stage sampling probability within the i-th cluster (household selection)

Let $n_h$ be the number of clusters selected in stratum h, $M_{hi}$ the number of households according to the sampling frame in the i-th cluster, and $\sum_i M_{hi}$ the total number of households in the stratum. The probability of selecting the i-th cluster in the XDHS 2012 sample is calculated as follows:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}}$$

A different formula must be used to calculate the probability of selecting a cluster that has been segmented. Let $b_{hi}$ be the proportion of households in the selected segment compared to the total number of households in EA i in stratum h if the EA is segmented; otherwise $b_{hi} = 1$. Then the probability of selecting cluster i in the sample is:

$$P_{1hi} = \frac{n_h M_{hi}}{\sum_i M_{hi}} \times b_{hi}$$

Let $L_{hi}$ be the number of households listed in the household listing operation in cluster i in stratum h, and let $t_{hi}$ be the number of households selected in the cluster. The second-stage selection probability for each household in the cluster is calculated as follows:

$$P_{2hi} = \frac{t_{hi}}{L_{hi}}$$

The overall selection probability of each household in cluster i of stratum h is therefore the product of the two selection probabilities:

$$P_{hi} = P_{1hi} \times P_{2hi}$$

The design weight for each household in cluster i of stratum h is the inverse of its overall selection probability:

$$W_{hi} = 1 / P_{hi}$$

A spreadsheet containing all sampling parameters and selection probabilities is prepared to facilitate the calculation of sampling weights. Sampling weights will be adjusted for household non-response as well as for individual non-response, for the women's and men's surveys respectively. The differences between the household weights and the individual weights are introduced by individual non-response. The final weights are normalized so that the total number of unweighted cases will equal the total number of weighted cases at the national level, for both household weights and individual weights.
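The weight calculation described in this section can be sketched in Python as follows. It is a minimal illustration under stated assumptions: the function names and the numeric inputs are hypothetical, and the non-response adjustment that comes between the design weight and the final normalized weight is left out.

```python
def design_weight(n_h, m_hi, m_h_total, t_hi, l_hi, b_hi=1.0):
    """Return the household design weight W_hi = 1 / (P_1hi * P_2hi).

    n_h        number of clusters selected in stratum h
    m_hi       frame households in cluster i of stratum h
    m_h_total  total frame households in stratum h
    t_hi       households selected in the cluster
    l_hi       households listed in the cluster
    b_hi       segment share of the EA (1.0 when the EA is not segmented)
    """
    p1 = n_h * m_hi / m_h_total * b_hi   # first-stage probability
    p2 = t_hi / l_hi                     # second-stage probability
    return 1.0 / (p1 * p2)


def normalize(case_weights):
    """Rescale weights so that they sum to the unweighted number of cases."""
    factor = len(case_weights) / sum(case_weights)
    return [w * factor for w in case_weights]


# Hypothetical sampling parameters, for illustration only.
w_plain = design_weight(n_h=20, m_hi=180, m_h_total=360_000, t_hi=30, l_hi=200)
w_segmented = design_weight(n_h=20, m_hi=180, m_h_total=360_000,
                            t_hi=30, l_hi=200, b_hi=0.5)
normalized = normalize([w_plain, w_plain, w_segmented])
```

Keeping the frame size, listing count and segment share as separate inputs mirrors the spreadsheet of sampling parameters mentioned above, so each probability can be checked on its own.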
5.3 Sample file

A sample file including all sampling parameters is very important for survey management and for sampling weight calculation. Once the sample points are selected, an Excel file should be prepared which should include the cluster number and cluster ID information, and all sampling parameters such as the domain, stratum and EA selection probability. The cluster number is a unique serial number
from 1 to the total number of clusters selected. It is important for communication and for field work supervision. The cluster number is the official cluster ID once assigned. It is also useful to include in the sample file the EA size, the total size of the stratum, the number of EAs in the stratum and the number of EAs selected in the stratum. These pieces of information allow for reconstruction of the selection probability, if needed, for example, for checking purposes and for replacement clusters. If a selected cluster is not accessible due to security problems and a replacement cluster is selected, then from the sampling parameters it is easy to calculate the selection probability for the replacement cluster.

Table 5.5 below shows a part of an example sample file. The columns with the lighter colored headings represent the sampling information provided by the sampling statistician. The columns with the darker colored headings represent the EA identification information from the sampling frame. This file should be updated after the household listing operation by adding the number of households listed, the segmentation information, and the number of households selected. These 3 pieces of information are necessary for developing the design weight for each cluster.
Table 5.5 An example sample file
5.4 Results of survey implementation

Once the field work for the survey has been completed and the data entry is finished, some tables for the results of the survey implementation should be produced to evaluate the survey coverage and the departures from the survey design. These tables typically include a summary table and individual tables for the household, women's and men's surveys, respectively. A summary table is usually presented in Chapter 1 of the DHS final report, including the number of clusters selected and interviewed, the number of households selected and interviewed, the number of women selected and interviewed, and the number of men selected and interviewed. The detailed tables for the household, women's and men's surveys are usually presented in Appendix A of the DHS final report along with the sample design document. These tables reflect both the survey coverage and the data quality, and they provide various response rates and the number of eligible individuals per household, which are useful information for the sample design of subsequent surveys. The following tables are example tables that should be included in the final report.

Table 5.6 Example table for the results of survey implementation
Table 5.7 Example appendix table for the results of the women's survey implementation
Table 5.8 Example appendix table for the results of the men's survey implementation

5.5 Sampling errors

Sampling errors are important data quality parameters which give a measure of the precision of the survey estimates. The DHS survey final reports present sampling errors in Appendix B for selected indicators. The sampling error tables present the estimated indicator value, the standard error, the number of unweighted and weighted cases, the design effect, the relative standard error and the confidence limits. The design effect can be used in sample size calculation for subsequent survey designs. Section 4.2 deals with the details of the calculation of sampling errors; here we give an example of the national level sampling error table.
Table 5.9 Example table for sampling errors

5.6 Sampling parameters in DHS data files

Some important sampling parameters should be included in the DHS final data set, such as domain, stratum, EA selection probability, and sampling weights. DHS survey final data files usually present geographic identifiers only down to domain or region level; district level identifiers are usually not presented due to confidentiality constraints. As for the sampling stratum identifier, DHS final data files should provide the true sampling stratum, which is important for many statistical analyses such as the sampling error calculation. However, in the case of small strata having only a few clusters selected,
confidentiality constraints do not allow DHS data files to present the true sampling stratum identifier. In these cases, a higher level stratification identifier is included instead, which should be close to the true stratification and will not introduce substantial bias. The standard sampling parameters included in the DHS Recode data files include:

1) Cluster indicator variable
2) Stratification variable
3) Sampling weight variables
4) Survey domain variables
5) First level geographical/administrative unit variable (region or province or department, etc.)
Glossary of terms

Analysis domain: A sub-population which cannot be identified in the sampling frame, such as domains specified by individual characteristics. See also Design domain.

Base map: A reference map that describes the geographic location and boundaries of an EA.

Cluster: The smallest geographic survey statistical unit for DHS surveys. It consists of a number of adjacent households in a geographic area. For DHS surveys, a cluster corresponds either to an EA or a segment of a large EA.

Collective living quarters: Living quarters such as army camps, boarding schools, or prisons where persons live individually. Collective living quarters are not considered as ordinary households and are excluded from DHS samples.

Confidence interval: A range within which the true value of an estimate likely lies. Usually reported as: with 95% confidence, the true value of Y will lie within the range of y − 1.96·SE(y) and y + 1.96·SE(y). Typically, DHS reports use y ± 2·SE(y) for a conservative estimate of the 95% confidence limits.

Degrees of freedom: The number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic.

Design domain: A sub-population which can be identified in the sampling frame and therefore can be handled independently in the sample size and sampling procedures, usually consisting of geographic areas or administrative units. See also Analysis domain.

Design Effect (Deft): A measure of efficiency of a complex sampling procedure compared to simple random sampling, defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used.

Design weight: The inverse of the overall probability with which a sampling unit (household or individual) was selected in the sample. See also Sampling weight.

Desired precision: The level of accuracy of the results desired, often expressed as Relative standard error or coefficient of variation.

Dwelling unit: A room or a group of rooms normally intended as a residence for one household (for example: a single house, an apartment, a group of rooms in a house); a dwelling unit can have more than one household.
Enumeration Area (EA): A geographic statistical unit which is created as a counting unit for a census and contains a certain number of households.

Explicit stratification: The actual division of the sampling units into specified parts known as strata. See also Implicit stratification.

Gross response rate: The number of households or individuals interviewed over the number selected.

Head of household: A person who is acknowledged as such by members of the household and who is usually responsible for the upkeep and maintenance of the household.

Household: A person or a group of related or unrelated persons, who live together in the same dwelling unit, who acknowledge one adult male or female 15 years old or older as the head of the household, who share the same housekeeping arrangements, and are considered as one unit.

Household listing: A complete listing of dwelling units/households in the selected EAs prepared prior to the selection of households.

Household selection: Random selection of the households from the household listing, typically by systematic selection.

Implicit stratification: The systematic sampling or probability proportional to size sampling of sampling units from an ordered list to achieve the effect of Stratification. See also Explicit stratification.

Item non-response: A sampling unit does not provide an answer for a specific question. See also Unit non-response.

Location map: A map produced in the household listing operation which indicates the main access to a cluster, including main roads and main landmarks in the cluster.

Master sample: A random sample of large size drawn from the census frame and prepared for use in a number of surveys, from which sub-samples can be selected for specific surveys.

Measure of size: A measurement reflecting the size of the sampling unit, typically the number of households or the total population of the sampling unit, available for each and every primary sampling unit in the country.

Non-sampling errors: Non-sampling errors result from problems during data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors.
Normalized standard weights: Sampling weight normalized by a constant factor such that the unweighted number of cases is the same as the weighted number of cases at the national level. Normalized standard weights are calculated for total households, total women and total men.

Primary Sampling Unit (PSU): The sampling unit for the first stage of selection in a multi-stage sampling procedure; in DHS, typically an EA or a segment of an EA.

Probability sample: A sample in which the units are selected randomly with known and nonzero probabilities.

Relative standard error (RSE): The amount of sampling error relative to the indicator level, independent of the scale of the indicator, calculated by dividing the standard error by the estimated value of the indicator.

Sample take: The number of households or individuals to be interviewed per sample cluster.

Sampling errors: Sampling errors are the representative errors due to sampling of a small number of eligible units from the target population instead of including every eligible unit in the survey.

Sampling frame: A complete list of all sampling units that entirely covers the target population.

Sampling unit: The unit of selection at each stage of the sampling process. In a typical DHS with two-stage cluster sampling, the sampling unit at the first stage (Primary sampling unit) would be the EA, and the sampling unit at the second stage (Secondary sampling unit) would be the household.

Sampling weight: The design weight corrected for non-response or other calibrations.

Secondary Sampling Unit (SSU): The sampling unit for the second stage of selection; in a typical DHS two-stage sample this is a household.

Self-weighting sample: A sample of individuals in which each individual has the same probability of being selected, and therefore a constant sampling weight is used. Also known as an equal probability sample.

Simple random sample (SRS): A random selection of individuals or households drawn directly from the target population with each individual or household having equal probability of being selected.

Sketch map: A map produced in the household listing operation, with the location of all structures found in the listing operation, which helps the interviewer locate the selected households. A sketch map also contains the cluster identification information, location information, access information, and principal physical features and landmarks such as mountains, rivers, roads and electric poles.

SRSWOR: Simple random sample without replacement.
Standard error (SE): The standard deviation of the sampling distribution of a statistic, or representative error due to sampling. See also Sampling errors.

Stratification: The process by which the survey population is divided into subgroups or strata that are as homogeneous as possible based on certain criteria. The principal objective of stratification is to reduce sampling errors.

Structure: A free-standing building or other construction that can have one or more units for residential or commercial use. Residential structures can have one or more dwelling units (for example: single house, apartment structure).

Student's t distribution: A family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

Survey domain/Study domain: A sub-population for which separate estimation of the main indicators is required.

Systematic selection (SYS): Selection of units starting from a random point and selecting every n-th unit.

Target population: The population of interest in the survey, typically, in DHS, women age 15-49 and children under five years of age living in residential households. Most surveys also include men.

Two-stage cluster sampling: At the first stage, a stratified sample of EAs is selected in each stratum, typically in DHS with probability proportional to size (PPS). At the second stage, a fixed (or variable) number of households is selected, typically in DHS by equal probability systematic sampling.

Uniformly distributed random number: A random number which comes from a uniform distribution, that is, all possible values in the interval within which the random number is selected have equal probability of selection.

Unit non-response: A sampling unit (cluster, household, individual) is not interviewed at all. See also Item non-response.

Variance: A measure of how far a set of numbers is spread out around their mean.

Weight: An inflation factor which extrapolates the sample to the target population. See also Design weight and Sampling weight.
