Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

LIST-ASSISTED SAMPLING: THE EFFECT OF TELEPHONE SYSTEM CHANGES ON DESIGN¹

Clyde Tucker, Bureau of Labor Statistics
James Lepkowski, University of Michigan
Linda Piekarski, Survey Sampling, Inc.

Abstract

List-assisted RDD designs became popular in the late 1980s and early 1990s. Work done by the Bureau of Labor Statistics and the University of Michigan resulted in the development of the underlying theory for these designs as well as the evaluation of various alternative sampling plans to optimize the method. Robert Casady and James Lepkowski document this work in an article in the June 1993 issue of Survey Methodology. Recent research to reevaluate these designs in light of the significant changes in the telephone system over the last decade is presented in this paper. The paper provides background on the development of list-assisted designs, and recent changes in the U.S. telephone system are reviewed. Using 1999 data from Survey Sampling, Inc., an analysis of the current state of the telephone system is presented, and a reoptimization of the earlier designs is undertaken. Results from the earlier work are compared to findings from the 1999 data.

1. Introduction

The Mitofsky-Waksberg random digit dialing (RDD) method (Mitofsky 1970 and Waksberg 1978) was a major innovation in the design of telephone sample surveys. A two-stage sampling procedure, the method was widely used because of the simplicity of implementation and reduced cost through more efficient screening of telephone numbers. The method selects clusters of numbers (100-banks defined by area code, prefix, and first two digits of the suffix) with probabilities proportional to the number of working residential numbers in the cluster, despite the fact that this number is not known at the time of selection. Numbers are selected from clusters identified in the first stage as having at least one working residential number. The method is thus a two-stage probability proportional to size (PPS) equal probability selection of working residential numbers.
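The two-stage selection just described can be sketched as follows. This is a simplified illustration, not the procedure as fielded: the function name, the `is_residential` callable (which stands in for actually dialing a number), and the fixed target of `k` additional residential numbers per retained bank are assumptions made for the example. The sketch selects banks with replacement and assumes the frame contains at least one residential number.

```python
import random


def mitofsky_waksberg(banks, is_residential, n_clusters, k, rng):
    """Simplified two-stage Mitofsky-Waksberg selection (illustrative only).

    banks          -- list of 8-digit 100-bank identifiers (hypothetical frame)
    is_residential -- callable(number) -> bool, a stand-in for dialing
    n_clusters     -- number of 100-banks to retain at the first stage
    k              -- additional residential numbers sought per retained bank
    rng            -- random.Random instance
    """
    sample = []
    retained = 0
    while retained < n_clusters:
        # Stage 1: dial one random number in a random bank. The bank is kept
        # only if that number is residential, so the chance of keeping it is
        # proportional to its (unknown) count of residential numbers.
        bank = rng.choice(banks)
        first = rng.randrange(100)
        if not is_residential(f"{bank}{first:02d}"):
            continue
        retained += 1
        cluster = [f"{bank}{first:02d}"]
        # Stage 2: dial the bank's other numbers in random order until k more
        # residential numbers are found or the bank is exhausted.
        others = [s for s in range(100) if s != first]
        rng.shuffle(others)
        for suffix in others:
            if len(cluster) == k + 1:
                break
            number = f"{bank}{suffix:02d}"
            if is_residential(number):
                cluster.append(number)
        sample.extend(cluster)
    return sample
```

Because every retained bank contributes the same number of residential numbers, the product of the two stage probabilities is constant, which is why the design yields an equal probability sample without knowing the bank sizes in advance.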
Further, for ongoing survey operations, a set of 100-banks with at least one working residential number could be used to generate subsamples across several successive studies.

¹ Presented at the Annual Conference of the American Association for Public Opinion Research, Montreal, Canada, May 18, [...]. The findings and opinions expressed are those of the authors and do not necessarily reflect those of the Bureau of Labor Statistics or Survey Sampling, Inc.

Although widely used for a number of years, the Mitofsky-Waksberg method had several disadvantages that led to a search for other methods. While the design was conceptually simple, some of its features were cumbersome to administer. Further, because it is a two-stage sample design, variances were larger than those from a simple random or stratified random sample of the same size.

To overcome these problems, and retain equal probability sampling in the design, list-assisted methods began to be used in the late 1980s and early 1990s. These methods utilized a frame of listed telephone numbers constructed from telephone directories used by commercial mailing firms. The listed telephone number frame itself was not suitable for direct sampling of telephone numbers because a substantial share of telephone households do not appear in the frame. However, by sampling numbers from 100-banks that contained listed telephone numbers, efficiencies obtained in the second stage of the Mitofsky-Waksberg method could nearly be achieved. Sample selection could be simple random, or stratified random, selection of telephone numbers from across 100-banks containing one or more listed telephone numbers. The loss in precision due to cluster sampling was eliminated, and samples generated from list-assisted methods proved less cumbersome to implement than the two-stage cluster method of Mitofsky-Waksberg.

List-assisted methods were examined by Casady and Lepkowski (1993), who laid out the statistical theory underlying the design and presented empirical analysis on the properties of stratified list-assisted design options.
Brick and colleagues (1995) showed that the potential bias resulting from the loss of residential telephones in 100-banks without listed numbers was small. Government agencies, academic survey organizations, and private survey firms subsequently adopted list-assisted designs.

The empirical examination of stratified designs in Casady and Lepkowski's work used estimates of parameters from the underlying structure of the 1990 telephone system in the US. Potential gains in efficiency under list-assisted designs depended on the distribution of residential numbers across different types of 100-banks in the system.
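A minimal sketch of the basic (truncated) list-assisted selection may help fix ideas: the listed-number frame is used only to identify 100-banks with at least one listing, and numbers are then drawn with equal probability from all numbers in those banks, listed or not. The function names and the ten-digit string representation of telephone numbers are assumptions made for the illustration.

```python
import random


def bank_of(number):
    """Return the 100-bank of a 10-digit number: area code, prefix, and the
    first two digits of the suffix (the first 8 digits)."""
    return number[:8]


def sample_list_assisted(listed_numbers, n, rng):
    """Draw n distinct numbers with equal probability from all numbers in
    100-banks that contain at least one listed number."""
    banks = sorted({bank_of(num) for num in listed_numbers})
    if n > 100 * len(banks):
        raise ValueError("sample size exceeds the number of eligible numbers")
    sample = set()
    while len(sample) < n:
        # Every bank holds exactly 100 numbers, so a uniform bank followed by
        # a uniform two-digit suffix is uniform over all eligible numbers.
        bank = rng.choice(banks)
        sample.add(f"{bank}{rng.randrange(100):02d}")
    return sorted(sample)


if __name__ == "__main__":
    listed = ["3135551234", "3135551290", "3135559901"]  # made-up listings
    print(sample_list_assisted(listed, 5, random.Random(1)))
```

Unlisted residential numbers in banks that contain a listing remain eligible, while residential numbers in banks with no listing are excluded. That exclusion is the source of the coverage loss examined by Brick and colleagues (1995).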
The telephone system, however, has changed in dramatic ways since Casady and Lepkowski completed their work almost ten years ago. For example, the number of area codes, and thus the total number of telephone numbers in the system, has almost doubled in the last ten years. There are today 90% more available telephone numbers. On the other hand, the number of households has increased by only a little over 10%. As a result, the proportion of all telephone numbers assigned to a residential unit has dropped from over 0.20 to no more than 0.15, and perhaps lower. The number of active prefixes, those with one or more listed telephone numbers, has increased since 1990, but these prefixes now are a smaller percentage of all prefixes. The proportion of unlisted numbers is now approaching 30%, and is much larger in some urban areas. There is also evidence that telephone companies are now less systematic in the assignment of residential numbers across 100-banks. While the number of residences has grown by only 10%, the number of 100-banks with residential numbers has increased by over 50%. The increase in the unlisted rate and the unusually large number of residential 100-banks have resulted in a decline in the proportion of listed residences in 100-banks, from 1990 percentages in the low and middle 50s to percentages in the upper 30s today.

In addition, there has been substantial growth in the number of households with multiple lines. Second lines dedicated to computers, fax machines, and home businesses have made it more difficult to distinguish noncontacts from nonworking numbers. Finally, there has been an increase in the assignment of whole prefixes to a single firm. The identification of business numbers, and the separation of those numbers from residential ones, has become more problematic.

Given all of these changes, it is time to reconsider the Casady and Lepkowski designs that were optimized for a telephone system with different underlying parameters than the one we have today. This paper first reviews the basic features of the Casady-Lepkowski approach.
It then compares features of the current telephone system and the one existing at the time Casady and Lepkowski did their original work. The set of designs Casady and Lepkowski optimized using 1990 data are then optimized for the current telephone system. Finally, the current efficiencies of these designs will be contrasted to their efficiencies in the past.

2. The List-Assisted Method

The list-assisted method assumes that the entire frame of telephone numbers is available, and stratified on the basis of several auxiliary variables to improve the efficiency of the samples selected. Chief among these auxiliary variables is whether the particular telephone number is in a 100-bank with at least one listed residential number. Two strata are created: one with telephone numbers in 100-banks with one or more listed numbers and a second for all remaining numbers. Further stratification of the remaining number stratum could be achieved by knowing characteristics of the prefixes and sets of 100-banks comprising a 1000-bank (a set of 1000 consecutive telephone numbers with the same area code, prefix, and first digit of the four-digit suffix). Several alternative stratified designs can then be examined, each optimally allocated for efficiency with respect to a given set of user needs. The optimal allocation minimizing the sampling variance of an estimate for a fixed expected cost (C*) assigns to stratum $i$ a number of sampled telephone numbers $m_i$ proportional to

\[
m_i \propto z_i \sigma_i \left[ \frac{1 + (1 - h_i)\lambda_i}{h_i \left( 1 + (\gamma - 1) h_i \right)} \right]^{1/2}
\]

The allocation depends on the proportion of the population in each stratum ($z_i$), the within-stratum variances ($\sigma_i^2$), the within-stratum hit rates ($h_i$), the proportion of the variance in the estimate of a characteristic accounted for by between stratum differences ($\lambda_i$), and the ratio of the total cost of data collection to the cost of just identifying the residential numbers ($\gamma$). To assess the relative efficiency of these optimally allocated designs, the sampling variance of the mean under optimal allocation was compared to that which would have been obtained under a simple random selection of all telephone numbers.
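This proportional reduction in the variance relative to simple random sampling is approximately

\[
1 - \frac{\bar{h} \left( \sum_{i=1}^{H} z_i \sigma_i \left[ \frac{\left( 1 + (1 - h_i)\lambda_i \right) \left( 1 + (\gamma - 1) h_i \right)}{h_i} \right]^{1/2} \right)^{2}}{\sigma^2 \left( 1 + (\gamma - 1)\bar{h} \right)}
\]

where $H$ is the number of strata, $\bar{h}$ is the overall hit rate on the frame, and $\sigma^2$ is the overall population variance.

Casady and Lepkowski examined several two- and three-stratum list-assisted designs. For the two-stratum designs, 100-banks of numbers were assigned to a high or low density stratum according to whether or not the bank contained at least one listed residential number. In three-stratum designs, the low density stratum was divided into those with moderate to low residential hit rates (Low density) and those expected to have very few residential numbers (Very low density).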
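The allocation rule and the variance reduction above can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: the function names are hypothetical, the budget is expressed in units of the cost of screening one number, the within-stratum standard deviations default to a common value of 1, and the between-stratum terms λ default to 0. The example inputs are the rounded 1999 frame shares and hit rates for the three strata as quoted in Section 4.

```python
from math import sqrt


def population_shares(P, h):
    """Return (z, h_bar): z_i = P_i * h_i / h_bar is the share of residential
    numbers in stratum i, and h_bar is the overall hit rate on the frame."""
    h_bar = sum(p * x for p, x in zip(P, h))
    return [p * x / h_bar for p, x in zip(P, h)], h_bar


def optimal_allocation(P, h, gamma, budget, sigma=None, lam=None):
    """Numbers to select per stratum so that total expected cost equals budget.

    Selecting one number in stratum i is assumed to cost 1 + (gamma - 1) * h_i
    in units of the screening cost.
    """
    n = len(P)
    sigma = sigma or [1.0] * n   # assumption: equal within-stratum std. dev.
    lam = lam or [0.0] * n       # assumption: no between-stratum differences
    z, _ = population_shares(P, h)
    cost = [1 + (gamma - 1) * x for x in h]
    a = [z[i] * sigma[i] * sqrt((1 + (1 - h[i]) * lam[i]) / (h[i] * cost[i]))
         for i in range(n)]
    scale = budget / sum(a[i] * cost[i] for i in range(n))
    return [scale * a_i for a_i in a]


def variance_reduction(P, h, gamma, sigma=None, lam=None, sigma_total=1.0):
    """Proportional reduction in variance relative to simple random sampling
    of all telephone numbers, at equal expected cost."""
    n = len(P)
    sigma = sigma or [1.0] * n
    lam = lam or [0.0] * n
    z, h_bar = population_shares(P, h)
    s = sum(z[i] * sigma[i]
            * sqrt((1 + (1 - h[i]) * lam[i]) * (1 + (gamma - 1) * h[i]) / h[i])
            for i in range(n))
    return 1 - h_bar * s ** 2 / (sigma_total ** 2 * (1 + (gamma - 1) * h_bar))


if __name__ == "__main__":
    P = [0.30, 0.22, 0.48]      # share of frame: high, low, very low density
    h = [0.49, 0.017, 0.003]    # hit rates quoted in Section 4
    for gamma in (2, 10, 20):
        print(gamma, round(variance_reduction(P, h, gamma), 3),
              [round(m) for m in optimal_allocation(P, h, gamma, 10000)])
```

With a single stratum the reduction is zero, as it should be, and for the example inputs the reduction shrinks as γ grows, which is consistent with the pattern the paper reports for long interviews.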
3. Study Design

Casady and Lepkowski developed their designs using counts in 100-banks purchased in 1990 from Donnelly Marketing, Inc. These data were merged with auxiliary information from the BellCore Research telephone frame of all telephone numbers. The current research uses 1999 data on 100-banks supplied by Survey Sampling, Inc., containing all auxiliary information. Both data sets were stratified using the auxiliary variables already discussed, facilitating the comparison of designs in the telephone system over the past decade. The relative efficiencies of 1990 and 1999 based designs are then compared.

4. Results

4.1. Changes in the Telephone System

Table 1 illustrates recent changes in the telephone system. The percentage of 100-banks that contained one or more listed numbers declined from 38% in 1990 to 30% in 1999. There has been a corresponding decline in the density of listed numbers within these listed 100-banks as well. Figure 1 shows the distribution of the number of listed numbers in listed 100-banks over several years since [...]. The distributions for later years through 2000 have shifted to the left, indicating a decrease in the proportion of telephone numbers in listed 100-banks that are listed numbers. Table 2 contrasts the distributions of telephone numbers between 1990 and 1999 across types of numbers. As will be noted subsequently, there is a substantial shift over time to prefixes with no listed numbers. This trend reflects changes in the phone system introducing more numbers for number portability and nonresidential purposes.

Table 1. Distribution of 100-banks by number of unique listed numbers, 1990 and 1999

                        1990                       1999
Number of [...]     All         Listed         All         Listed
Total               4,350,164   1,656,627      7,715,800   2,316,[...]

Figure 1. Incidence of listed phones in working blocks (Survey Sampling, Inc.). [Plot: number of blocks, 0 to 90,000, against number of listed phones, with curves for earlier and later years.]
Table 2. Distribution of 100-banks by telephone sampling strata, DMIQ, 1990, and Survey Sampling Database, 1999

[Surviving values are given in source order, 1990 before 1999 where both survive; [...] marks lost digits or entries.]

Description (Stratum)ᵃ                                      Values
Total                                                       4,350,[...]; [...],715,[...]
100-banks with 1+ listed numbers (1)                        1,656,[...]; [...],316,[...]
    Urban; Rural; Suburban                                  468,429; 1,090,[...]; [...]
100-banks with no listed numbers                            2,693,[...]; [...],399,[...]
  Area code-prefix with no listed numbers (3)               855,[...]; [...],949,[...]
    Exchange class: No listings                             1,986,300
      Exactly one prefix in the exchange                    42,300
      Two or more prefixes in the exchange                  1,944,[...]
    All other exchange classes                              [...]
      Exactly one prefix in the exchange                    [...]
      Two or more prefixes in the exchange                  [...]
  Area code-prefix with 1+ listed numbers                   1,838,[...]; [...],450,[...]
    Exactly one prefix in the exchange                      1,196,[...]; [...]
      1000-bank with no listed numbers (3)                  1,048,[...]; [...]
      1000-bank with 1+ listed numbers (2)                  [...]
    Two or more prefixes in the exchange                    641,[...]; [...],526,[...]
      1000-bank with no listed numbers (2)                  429,[...]; [...]
      1000-bank with 1+ listed numbers (2)                  [...]

ᵃ Denotes the stratum to which the 100-banks are assigned: (1) listed or high density 100-banks, (2) unlisted and low density 100-banks, and (3) unlisted and very low density 100-banks.

4.2. The Initial Stratification Scheme

The stratification scheme used in the current analysis matches that used by Casady and Lepkowski. Initially, the frame was divided into the three strata pictured in Table 3. The high-density stratum, containing 30% of all the 100-banks, includes 100-banks with one or more listed numbers. This stratum has almost 97% of the residential numbers, and a residential hit rate of approximately 49%, as estimated from recent screening results from the University of Michigan Survey of Consumer Attitudes. The second stratum consists of unlisted 100-banks that are in area code and prefix combinations with one or more listed numbers and are in either 1000-banks with a listing or in exchanges with two or more prefixes (urban areas). This stratum makes up about 22% of the frame and contains 2.5% of the residential numbers. Based on information originally presented in Tucker, Casady, and Lepkowski (1992), the low density stratum has an estimated hit rate of 1.7%.
The third stratum contains the remaining 48% of the 100-banks. These are either in area code and prefix combinations with no listed numbers or in exchanges with only one prefix (rural area) and a 1000-bank with no listed numbers. This stratum has about 1% of the residential numbers and a hit rate of 0.3%. Table 4 provides a comparison of the parameters in the current three-stratum design to those used by Casady and Lepkowski. There is a decline in the proportion of 100-banks in the high-density stratum over the decade, but a large gain in the proportion in the very low density stratum. On the other hand, the proportion of all residential numbers is a little higher in the high-density stratum compared
to ten years ago. Given the decline in the densities within listed 100-banks, it is not surprising that the hit rate in the high-density stratum is somewhat lower now. In fact, the hit rates have dropped across all three strata.

Table 3. Three stratum design

Stratum and component                                        % tel. nos.   Hit rate   % of pop.
High Density: listed 100-banks                               [...]         49%        96.5%
Low Density: unlisted 100-banks, 1+ listed in AC/prefix
    1 prefix, listed 1000-bank                               2.1%          1.3%       0.2%
    2+ prefix, unlisted 1000-bank                            12.4%         1.2%       1.0%
    2+ prefix, listed 1000-bank                              7.4%          2.7%       1.3%
Very Low Density: unlisted 100-banks
    1+ listed in AC/prefix; 1 prefix, unlisted 1000-bank     9.9%          0.4%       0.2%
    0 listed in AC/prefix; 1 prefix in exchange              0.9%          0.2%       0.01%
    0 listed in AC/prefix; 2+ prefix in exchange             37.3%         0.3%       0.7%

Table 4. Three stratum design: 1999 v. 1990

Stratum/density    Prop. Frame   Prop. Popn.   Hit rate   Prop. Empty
1: High            [...]         [...]         [...]      [...]
2: Low             [...]         [...]         [...]      [...]
3: Very low        [...]         [...]         [...]      [...]

4.3. Analysis

Five designs, four list-assisted designs and the Mitofsky-Waksberg sample design, are examined (as in the work of Casady and Lepkowski). There is a design based on all three strata. A two-stratum design uses the high-density stratum and all remaining numbers (a collapsing of the low and very low-density strata). The two-stratum design acknowledges the fact that little efficiency is gained by separating the second and third strata, and that implementation would be simplified if only two strata were used. Two truncated designs were formed by eliminating the very low-density stratum from the three-stratum design, and by eliminating the low and very low-density collapsed stratum from the two-stratum design. These truncated designs do not attempt to cover the small number of households missed by eliminating the least productive stratum from the two and three stratum designs. Table 5 shows the reduction in variance (compared to simple random sampling) for the five designs for a fixed total cost across the designs.
The results are presented when the ratio between total data collection costs and the costs of identifying or screening to find residences is two, 10, and 20, respectively. The proportion of the population covered under each design also is shown.

Table 5. Comparison of five designs: 1999 v. 1990

Columns: Design; Prop. Reduction in Variance (γ = 2, γ = 10, γ = 20); Prop. not covered; Hit rate in nonempty [...]
Rows, for each year: the two-stratum and three-stratum designs, their truncated versions, and Mitofsky-Waksberg.
[Table entries lost.]

For 1990 and 1999, the reduction in variance is quite large when the cost ratio is two, reflecting the larger relative importance of telephone household screening costs in short interview surveys. The reduction in variance is greater for all of the list-assisted designs compared to the Mitofsky-Waksberg procedure. Further, in all cases, the reduction in variance is greater now than a decade ago. This finding is due to the decline in the residential hit rates from over 20% to under 15% for the base simple random sampling over the time period. The truncated two-stratum design is the most efficient, followed closely by the truncated three-stratum design. In the latter case, only 1% of the population is not covered.

The cost ratio is an important factor to consider. The largest gains in precision are, as noted, for the lowest cost ratio. By the time the ratio becomes as large as 20 (a much longer interview period), none of the designs does substantially better than simple random sampling.

Table 6 provides the relative efficiencies of each list-assisted design compared to Mitofsky-Waksberg for 1990 and 1999. For both time periods, all of the list-assisted designs are more efficient than the Mitofsky-Waksberg design. In some cases the relative efficiencies have increased over time, and in other cases they have decreased. Again, the truncated designs perform better than those that cover the whole population. Thus, for users willing to disregard a small loss in coverage, the truncated designs are quite attractive, especially when the cost ratio is low.

Table 6. Efficiency compared to Mitofsky-Waksberg (1999 v. 1990)

Columns: Design; Relative Efficiency (%) at γ = 2, γ = 10, γ = 20
Rows, for each year: the two-stratum and three-stratum designs and their truncated versions.
[Only two entries survive, 35.8 and 27.6, one in each year's block; their columns cannot be determined.]

5. Discussion

It is unclear what other changes will occur in the telephone system in the coming years, and how telephone sampling might be affected. While the total size of the system should not grow as rapidly in the coming years as it has in the last decade, the allocation of numbers to different types of 100-banks shown in Table 2 may continue to change. For instance, a small but growing number of residences have only cellular service, and these numbers have rarely been included in current telephone survey designs.
The designs discussed in this paper could incorporate them, but should residences with both regular and cellular service be included and then considered to have multiple lines? Furthermore, as long as the billing algorithm remains the same, cellular users will be reluctant to pay to do survey interviews.

Assuming change in the telephone system slows down, the listing rates (and, presumably, the unlisted numbers, too) within 100-banks should increase. With the increasing densities would come an increase in efficiency for all designs. Of course, this assumes that numbers will be assigned as they have been in the past.

Other factors, however, will continue to affect the efficiencies of RDD designs. The continued increase in computer usage could make the identification of residential numbers even more difficult. Noncontact rates have climbed even for personal visit surveys, and this problem will be more severe for telephone surveys. This situation is compounded by the rising use of technologies such as Caller ID to screen calls. The public's increasing reluctance to participate in surveys could result in higher refusal rates, even if they answer their phones. Thus, the continued feasibility of conducting telephone surveys may depend less and less on the ease of locating a residential number and more and more on the respondent's willingness to cooperate.

References

Brick, J.M., Waksberg, J., Kulp, D., and Starer, A. (1995). Bias in list-assisted telephone samples. Public Opinion Quarterly, 59.

Casady, R.J. and Lepkowski, J.M. (1993). Stratified telephone survey designs. Survey Methodology, 19.

Mitofsky, W. (1970). Sampling of telephone households. Unpublished CBS News memorandum.

Tucker, C., Casady, R.J., and Lepkowski, J.M. (1992). Sample allocation for stratified telephone sample designs. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Waksberg, J. (1978). Sampling methods for random digit dialing. Journal of the American Statistical Association, 73.