Estimation of Attrition Biases in SIPP

Eric V. Slud (1) and Leroy Bailey (2)
(1) Census Bureau, SRD, and Univ. of Maryland College Park
(2) Census Bureau, Statistical Research Division

Abstract

This paper studies estimators for the bias in estimated cross-sectional survey item totals due to attrition nonresponse weighting within a longitudinal survey. Adjustments for between-sample longitudinal nonresponse are made either by adjustment cells or by logistic regression. The bias estimators studied were first proposed by Bailey (2004) in connection with the Census Bureau's Survey of Income and Program Participation (SIPP), but are generalized here to include longitudinal survey items, and formulas and estimators for their design-based variances are given in terms of joint inclusion probabilities. In practice, variance estimates for the bias estimators are obtained, in the SIPP setting where PSU samples are drawn as balanced half-sample replicates, using either balanced replication methods or a formula of Ernst, Huggins and Grill (1986). The methods are illustrated using cross-sectional item data from the SIPP 1996 panel.

Keywords: Adjustment cells, Balanced replicates, Logistic regression, Longitudinal survey, Nonresponse, Variance estimation, VPLX.

This report is released to inform interested parties of research and encourage discussion. The views expressed on statistical and methodological issues are those of the authors and not necessarily those of the Census Bureau.

1. Introduction

One of the persistent problems arising in large longitudinal surveys is to compensate through weighting schemes for nonresponse errors due to attrition. This problem is perennial for the Census Bureau's Survey of Income and Program Participation (SIPP), which has for many years been one of the largest national longitudinal surveys, measuring many variables related to family, employment status, insurance, amounts and sources of income, and indicators of participation in various government programs.
SIPP's design follows panels of the order of 30,000 to 50,000 sampled households over a succession of staggered waves every four months for total durations typically of 3 years (SIPP User's Guide 2001). Attrition over the life of a panel is of the order of 30%: of the 94,444 persons responding in Wave 1 of SIPP 96, 14% were lost by Wave 4, and 30% by the end of the panel, Wave 12.

While some aspects of the quality of reported data from a survey like SIPP can be judged only by comparing with related data from other sources, the problem of ascertaining and compensating for the biases due to attrition is largely a matter to be explored through internal consideration of the longitudinal data from the same survey. This is because the precise demographic and survey-design features and incentives associated with attrition are likely to be very different for another survey.

At first sight, the possibility of assessing nonresponse biases by a purely internal statistical examination of survey data seems counterintuitive. That is especially so in a survey like SIPP, where the precise variables being measured, such as marriage and divorce, poverty, or changes in employment status for one or more jobs between successive instances of followup questioning, will directly affect the chance of finding a subject at home and willing to respond to the survey's followup. However, by examining adjusted estimates of totals of wave-1 cross-sectional variables, we separate the modeling of the propensity to respond from later changes in surveyed items. Bailey (2004) studied the bias due to each of a number of attrition adjustment methods in SIPP by calculating for specific items the differences between the totals estimated at wave 1 versus the totals of the same wave-1 items estimated by nonresponse adjustment of the totals obtained from the wave-t subjects (who responded in wave 1).
The approach of Bailey (2004), which we elaborate formally in Section 2, was restricted to single-wave (cross-sectional) questionnaire items and was purely descriptive in the sense that standard errors for the difference between the wave-1 and adjusted wave-t totals were not provided. The main methods of attrition nonresponse adjustment he considered were Horvitz-Thompson estimators (Särndal et al. 1992, pp. 42ff.) with weights derived either from a moderate number (of the order of 100, within a survey of more than 30,000 households) of adjustment cells or from a logistic regression model, specified in terms of Wave-1 items.

In this paper, we broaden Bailey's (2004) approach to provide formulas and estimators of the standard errors for the estimators of bias in estimating wave-1 cross-sectional item totals using adjusted wave-4 or wave-12 data. The estimating formulas given here for biases and standard errors generalize immediately to cover the case of bias estimation for longitudinal item totals based on early-wave versus adjusted later-wave data. Exact variances of the bias estimators, and unbiased estimators for them, require knowledge of joint inclusion probabilities which are generally not available when, as in SIPP, the wave-1 inclusion weights take into account wave-1 nonresponse and are raked and second-stage adjusted. However, the SIPP 1990 redesign incorporated a feature of two balanced replicate samples within each sampled PSU. Using a theoretical formula of Ernst, Huggins and Grill (1986), we provide and implement closed-form and balanced repeated replicate (BRR) estimators of variance for the bias estimators we study.

The paper is organized as follows. Section 2 gives formulas from sample survey theory for estimating attrition nonresponse biases. Section 3 gives variance formulas, in terms of joint inclusion probabilities, of designed replicates, and of balanced repeated replicates as implemented in the Census Bureau's VPLX software. Section 4 illustrates and interprets the bias estimators and their estimated standard errors for the same eleven SIPP 96 (cross-sectional) items studied in Bailey's (2004) report, and the alternative estimators for standard errors are compared. Finally, tentative conclusions are drawn about the magnitudes and significance of SIPP biases due to attrition, and about important directions for further research.

2. Bias Estimation Formulas

Suppose that a sample $S$ of subjects is drawn from sampling frame $U$, with single inclusion probabilities $\{\pi_i\}_{i \in U}$ and joint inclusion probabilities $\{\pi_{ij}\}_{i,j \in U}$.
Suppose that baseline observations $(x_i, y_i^B, r_i)$ are recorded for all $i \in S$, where $y_i^B$ is an item of interest, $x_i$ is a vector of auxiliary data, and $r_i$ is the indicator of followup response. Suppose also that $y_i^F$ denotes a followup measurement which is potentially defined for all population members but which is actually recorded only for $i \in S$ such that $r_i = 1$. As is often done in survey theory, we treat estimated totals in a design-based framework except that the response indicators $r_i$ are treated as random variables conditioned on $S$, i.e., on all first-stage data. It can be assumed that $\pi_i$, $\pi_{ij}$ are known, but that $E(r_i) = P(r_i = 1)$ and the corresponding conditional probabilities $p_i = P(r_i = 1 \mid S)$ are not known. The target parameter is

$$\vartheta = t_{y^F} - t_{y^B} = \sum_{i \in U} (y_i^F - y_i^B),$$

where here and in what follows we adopt the standard notations that for any attribute $z_i$, $i \in U$, the frame-population total is $t_z = \sum_{i \in U} z_i$, and the corresponding Horvitz-Thompson estimator is $\hat t_z = \sum_{i \in S} z_i / \pi_i$.

In this setup, the reciprocal probabilities $\pi_i^{-1}$ are assumed to be weights which adjust correctly for first-stage (Wave-1) nonresponse. The quantities $y_i^B$ are the first-stage measured survey variables of interest, usually first-stage cross-sectional survey data. In actual practice, $y_i^F$ would be the corresponding variables which could be observed at a specified later stage (Wave) of the survey. The target parameter would then be the between-wave change in population total of the $y$ attribute. However, to avoid confounding the true between-wave change with the errors we introduce due to sampling variability and imprecise modelling or estimation of the conditional probabilities $p_i$, we follow Bailey (2004) in the artificial choice $y_i^F = y_i^B$ for all $i$, making the target parameter $\vartheta = 0$ known, and allowing us to evaluate the effectiveness of adjustments made for later-wave nonresponse.

Now if the conditional probabilities $p_i = E(r_i \mid S) = P(r_i = 1 \mid S)$ were known, then the general unbiased estimator of $\vartheta = t_{y^F} - t_{y^B}$ would be $\sum_{i \in S} (r_i\, y_i^F / p_i - y_i^B) / \pi_i$.
The actual estimator $\hat\vartheta$ that we use depends on an estimator $\hat p_i$ for $p_i$ under a parametric model (two specific examples of which will be discussed in detail below), within which the random variables $r_i$, $i \in U$, are assumed to be conditionally independent given $S$, with

$$p_i = P(r_i = 1 \mid S) = g(x_i, \beta), \quad i \in S \qquad (1)$$

where $x_i$ is the vector of auxiliary variables observed for each $i \in S$, and $\beta$ is a parameter of fixed dimension (much smaller than the frame population size $|U|$). Under model (1), with $\hat p_i = g(x_i, \hat\beta)$ generally derived from a weighted estimating equation in terms of the data $(r_i,\ i \in S)$, we obtain

$$\text{Wave-change Estimator} = \sum_{i \in S} \frac{1}{\pi_i} \left( \frac{r_i\, y_i^F}{\hat p_i} - y_i^B \right) \qquad (2)$$

Specializing further to the case of interest in this paper, we fix $y_i^F \equiv y_i^B$, in which case the estimator (2) can be viewed as the difference between the later-wave nonresponse-adjusted estimator of $t_{y^B}$ and the first-wave Horvitz-Thompson estimator of the same quantity. Therefore we refer to it simply as the

$$\text{Bias Estimator} = \sum_{i \in S} \frac{y_i^B}{\pi_i} \left( \frac{r_i}{\hat p_i} - 1 \right) \qquad (3)$$

We work with two different models of the form (1).

Adjustment Cell Model. Let the frame $U$ be partitioned into cells $C_j$, $j = 1, \ldots, K$, with $K$ fixed, and for the parameter $\beta \in (0, 1)^K$, assume $p_i = \Pr(r_i = 1 \mid S) = \beta_j$ whenever $i \in C_j$, where $i \in U$, $j = 1, \ldots, K$. Then the form of estimator for $p_i = \beta_j$ for $i \in C_j$, which is unbiased for all inclusion probabilities $\pi_i$, is the Cell Adjustment-Factor given by

$$\hat\beta_j = \frac{\sum_{i \in S \cap C_j} r_i\, \pi_i^{-1}}{\sum_{i \in S \cap C_j} \pi_i^{-1}} \qquad (4)$$

Logistic Regression Model. Alternatively, for $\beta \in \mathbb{R}^d$ a coefficient parameter vector of the same dimension $d$ as the covariate $x_i$, we could assume that $p_i$ depends on individual covariates through the formula $p_i = (1 + e^{-x_i'\beta})^{-1}$ and then estimate $\hat p_i = (1 + e^{-x_i'\hat\beta})^{-1}$, where $\hat\beta$ is the unique solution of the weighted score equation

$$\sum_{i \in S} x_i\, \frac{1}{\pi_i} \left( r_i - \frac{e^{x_i'\hat\beta}}{1 + e^{x_i'\hat\beta}} \right) = 0 \qquad (5)$$

3. Estimating the Variance of Bias Estimators

Our next objective is to find general large-sample approximate expressions and unbiased estimators for the variance of the Wave-Change Estimator (2), and its specialization to the Bias Estimator (3). Under the parametric model (1), with consistent estimators $\hat\beta$ for $\beta$ and estimators $\hat p_i = g(x_i, \hat\beta)$ close to $p_i$ (uniformly over $i \in S$ with high probability, when the sample size $|S|$ is large), a principal method for deriving variances is first to linearize the estimators of interest, i.e., to approximate the centered estimators by linear expressions in centered (Horvitz-Thompson) estimators $\hat t_z - t_z$ of survey-item totals $t_z$. We now proceed to do this for the estimator (2), denoted $\hat\vartheta$, under the two models considered in Section 2.

First, in the adjustment cell setting, define the cell-$j$ attribute $\xi_j$ as the indicator which for individual $i$ is $\xi_{j,i} = I_{[i \in C_j]}$.
Then the adjustment-cell version of $\hat\vartheta - \vartheta$, denoted $\hat\vartheta_A - \vartheta$ and defined from (2) with (4) substituted, is easily seen to have the form

$$\hat\vartheta_A - \vartheta = \sum_{j=1}^{K} \left( \hat t_{r \xi_j y^F}\, \hat t_{\xi_j} / \hat t_{r \xi_j} - t_{r \xi_j y^F}\, t_{\xi_j} / t_{r \xi_j} - \hat t_{\xi_j y^B} + t_{\xi_j y^B} \right) \qquad (6)$$

$$\approx \sum_{j=1}^{K} \left( \frac{\hat t_{\xi_j}}{\hat t_{r \xi_j}} \left( \hat t_{r \xi_j y^F} - t_{r \xi_j y^F} \right) + \frac{\hat t_{r \xi_j y^F}}{\hat t_{r \xi_j}} \left( \hat t_{\xi_j} - t_{\xi_j} \right) - \frac{\hat t_{r \xi_j y^F}\, \hat t_{\xi_j}}{\hat t_{r \xi_j}^{2}} \left( \hat t_{r \xi_j} - t_{r \xi_j} \right) - \hat t_{\xi_j y^B} + t_{\xi_j y^B} \right)$$

where $\approx$ means that the expressions on the left- and right-hand sides differ by an amount which is negligible in probability as $|S|$ gets large, with the population size $|U|$ much larger still. But this last expression is approximately equal in the same sense to the difference $\hat t_{z^A} - t_{z^A}$, where the attribute $z_i^A$ is defined whenever $i \in C_j$ by

$$z_i^A = \frac{\hat t_{\xi_j}}{\hat t_{r \xi_j}}\, r_i\, y_i^F + \frac{\hat t_{r \xi_j y^F}}{\hat t_{r \xi_j}} - \frac{\hat t_{r \xi_j y^F}\, \hat t_{\xi_j}}{\hat t_{r \xi_j}^{2}}\, r_i - y_i^B \qquad (7)$$

Next, under the logistic-regression model for later-wave nonresponse, we find by first-order Taylor expansion of (5) in $\hat\beta$ around the value $\beta = \beta_U$ satisfying

$$\sum_{i \in U} x_i \left( r_i - \frac{e^{x_i'\beta_U}}{1 + e^{x_i'\beta_U}} \right) = 0$$

that

$$\hat\beta - \beta_U \approx \hat I^{-1}\, T$$

where

$$\hat I = \sum_{i \in S} \frac{x_i^{\otimes 2}}{\pi_i}\, \frac{e^{x_i'\hat\beta}}{(1 + e^{x_i'\hat\beta})^{2}} \quad \text{and} \quad T = \sum_{i \in S} \frac{x_i}{\pi_i} \left( r_i - \frac{e^{x_i'\beta_U}}{1 + e^{x_i'\beta_U}} \right)$$

and for any vector $v$, we define the notation $v^{\otimes 2} = v v'$. Then the centered wave-change estimator $\hat\vartheta_L - \vartheta$, with $\hat\vartheta_L$ given by (2) after substituting $\hat p_i = (1 + e^{-x_i'\hat\beta})^{-1}$ for $\hat\beta$ satisfying (5), is

$$\hat\vartheta_L - \vartheta \approx \hat t_{r y^F (1 + e^{-x'\beta_U})} - t_{r y^F (1 + e^{-x'\beta_U})} - \hat t_{y^B} + t_{y^B} - \hat t_{r x y^F e^{-x'\hat\beta}}'\, \hat I^{-1}\, T$$

which has asymptotically the same variance as the centered total-estimator $\hat t_{z^L} - t_{z^L}$ defined in terms of the attribute

$$z_i^L = r_i\, y_i^F \left( 1 + e^{-x_i'\hat\beta} \right) - y_i^B - \hat t_{r x y^F e^{-x'\hat\beta}}'\, \hat I^{-1}\, x_i\, (r_i - \hat p_i) \qquad (8)$$

Thus the variance $V(\hat\vartheta)$ of $\hat\vartheta$ could be found approximately by the Horvitz-Thompson variance estimator

$$\hat V(\hat\vartheta) = \sum_{i,k \in S} \frac{\pi_{ik} - \pi_i \pi_k}{\pi_{ik}}\, \frac{z_i}{\pi_i}\, \frac{z_k}{\pi_k} \qquad (9)$$

respectively for $z \equiv z^A$ given by (7) under the adjustment-cell model and for $z \equiv z^L$ given by (8) under the logistic regression response model.

Joint inclusion probabilities are not available at person-level for many large surveys like SIPP, because even the single inclusion probabilities $\pi_i$ are constructed by an elaborate raking and trimming process. Therefore, the variance formulas (9) are not directly applicable. However, the linearization idea just described, which is now a standard method described in textbooks like that of Särndal et al. (1992), was shown by Woodruff and Causey (1976) to be applicable within any replicate-based variance estimation method. That is, they showed that in large samples, replicate-based variance estimators of totals of the artificial attributes which make the linearized estimators take a Horvitz-Thompson form would provide accurate approximations to the variances of the estimators being linearized. We next show how the SIPP design allows replicate-based estimation of $V(\hat\vartheta)$ in terms of the pseudo-attributes $z = z^A$ or $z^L$ respectively under the adjustment-cell or logistic regression response models.

We begin by describing the 1990 redesign of SIPP in terms similar to those of Rottach (2004), building on the standard SIPP documentation (2001). (A more detailed description along the same lines can also be found in Slud (2006).) SIPP's complex multistage sampling design incorporated paired replicate samples within Primary Sampling Units (PSUs).
At the national level, the survey is based upon strata (geographic units nested within County and MCD) consisting in 1996 of 112 self-representing (SR) strata, subdivided into 372 artificial PSUs all of which were sampled, together with 105 non-self-representing (NSR) strata in each of which exactly two PSUs were sampled according to the Durbin method with a fixed system of PSU-level probability weights. Within each PSU, a systematic sample of (a fixed number of) persons was drawn, apparently using a single randomized starting index, although the sampled individuals are alternately indexed into two classes which Rottach (2004) calls half-samples but which here will be referred to as half-PSUs. The variance estimation techniques applied to SIPP, beginning with Ernst, Huggins and Grill (1986) and Fay (1989), treat these half-PSUs as though they were two independently drawn samples, e.g., as though two independently randomized starting indices had been used.

Let $s$ index strata, $i$ index PSUs within a stratum, and $h = 1, 2$ denote the half-PSU. Where needed, let $S_s$ denote the set of sampled PSUs within stratum $s$. Let $j$ be the index for individuals within a specified combination $(s, i, h)$, and $S_{s,i,h}$ denote the set of individuals sampled within the half-PSU $(s, i, h)$. For NSR stratum $s$, let $\pi_1^{(s)}$, $\pi_2^{(s)}$ respectively denote the inclusion probabilities for the sampled PSUs in $S_s$ respectively labelled $i = 1$ and $2$, and let $\pi_{12}^{(s)}$ denote the joint inclusion probability for both sampled PSUs. (There may be more than two PSUs in SR strata, but all of their inclusion probabilities are 1.) Now denote the individual inverse weight or single inclusion probability by $\pi_{(s,i,h,j)}$ to conform with Section 2, since individuals now have the quadruple indices $(s, i, h, j)$. Then the within-PSU weights for sampled individuals in sampled half-PSUs $(s, i, h)$ are $\pi_i^{(s)} / \pi_{(s,i,h,j)}$.
In terms of these notations, the estimator of Ernst, Huggins and Grill (1986), which they show to be slightly upwardly biased for the variance of the total-estimator $\hat t_z$ for an attribute $z_{(s,i,h,j)}$, is

$$\hat V_{EHG} = \sum_{s \in NSR} b_{s,0} \left( \frac{Z_{s,1}}{\pi_1^{(s)}} - \frac{Z_{s,2}}{\pi_2^{(s)}} \right)^{2} + \sum_{s} b_{s,1} \sum_{i \in S_s} \frac{(Z_{s,i,1} - Z_{s,i,2})^{2}}{(\pi_i^{(s)})^{2}} \qquad (10)$$

where weighted half-PSU and PSU-aggregated totals are given by

$$Z_{s,i,h} = \sum_{j \in S_{s,i,h}} \frac{\pi_i^{(s)}\, z_{(s,i,h,j)}}{\pi_{(s,i,h,j)}}, \qquad Z_{s,i} = \sum_{h=1}^{2} Z_{s,i,h}$$

and

$$b_{s,0} = \frac{\pi_1^{(s)}\, \pi_2^{(s)}}{\pi_{12}^{(s)}} - 1, \qquad b_{s,1} = \max\{1 - b_{s,0},\ 0\}$$

Note that $b_{s,0} \ge 0$ always for the Durbin method, and $b_{s,0} = 0$, $b_{s,1} = 1$ whenever $s \in SR$.

In settings like SIPP, where replicate samples are available within PSUs, the Census Bureau often uses a variance estimation method of Fay (1989) implemented in Fay's VPLX software. This method generates a set of distinct multiplicative replicate weight-factors $f_{s,i,h,r}$, where $r = 1, \ldots, R$ denotes a replicate index ($R = 160$ for SIPP 96). The variance $V(\hat\tau)$ for a possibly nonlinear estimator $\hat\tau$ is then estimated as follows. For each replicate $r$, the estimator is recalculated with first-order inclusion weights $\pi_{(s,i,h,j)}^{-1}$ replaced by $f_{s,i,h,r} / \pi_{(s,i,h,j)}$ and the result denoted $\hat\tau^{(r)}$. In particular, if the estimator of interest was $\hat t_z$, then the $r$th replicate is

$$\hat\tau^{(r)} = \sum_{s} \sum_{i \in S_s} \sum_{h=1}^{2} f_{s,i,h,r}\, \frac{Z_{s,i,h}}{\pi_i^{(s)}} \qquad (11)$$

The Fay-method (1989) balanced repeated replicate (BRR) estimator for the variance of $\hat\tau$ is

$$\hat V_{Fay} = \frac{4}{R} \sum_{r=1}^{R} \left( \hat\tau^{(r)} - \frac{1}{R} \sum_{m=1}^{R} \hat\tau^{(m)} \right)^{2} \qquad (12)$$

As shown by Fay (1989) and described in greater detail by Rottach (2004) and Slud (2006), if the estimator of interest is a Horvitz-Thompson total ($\hat\tau = \hat t_z$), then the Fay method variance estimator approximates $\hat V_{EHG}$ and is algebraically identical to it if the number $R$ of replicates is large enough. (That sufficiently large number, as justified in Slud 2006, is 3 times the number of NSR strata minus twice the number of NSR strata with $b_{s,1} = 0$, plus the number of SR PSUs, or 667 in SIPP 96. The number $R$ of replicates used in SIPP 96 was 160.) In a small simulation study in Slud (2006), $\hat V_{Fay}$ and $\hat V_{EHG}$ were checked to be uniformly close to one another in cases where $\hat\tau = \hat t_z$ and, in all models except those with greatest imbalance between half-PSUs, also to be close to the empirical variance.

Since joint person-level inclusion probabilities are not available for SIPP, we have two available replicate-based methods for estimating the variances of the nonlinear wave-change or bias estimators $\hat\vartheta_A$, $\hat\vartheta_L$:

(1) The Fay method variance $\hat V_{Fay}$ for the wave-change estimator $\hat\tau$ given directly by (2).

(2) The Ernst, Huggins and Grill estimator $\hat V_{EHG}$ applied to the linearized attributes $z^A$ or $z^L$ given respectively by (7) or (8).

A third method which initially seems to be also feasible, the Fay method applied to the linearized total estimators based on attributes $z^A$ or $z^L$, turns out in both adjustment models to give algebraically the same value as the ordinary Fay method (1), because of the identities $\hat\vartheta_A = \sum_{i \in S} z_i^A / \pi_i$ and $\hat\vartheta_L = \sum_{i \in S} z_i^L / \pi_i$ for each set of replicate weights.
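As an informal illustration of method (1), the following sketch recomputes a cell-adjusted bias estimator of the form (3)-(4) under each set of replicate weights and then applies the Fay-method formula (12). It is only a toy under stated assumptions, not the VPLX implementation: the function names, the eight-person data set, and the replicate factors (1.5 and 0.5 by half-PSU, with signs taken from columns of a 4x4 Hadamard matrix) are all hypothetical choices made for the example.

```python
from collections import defaultdict


def cell_bias_estimate(y, r, cell, w):
    """Bias estimator (3) with cell adjustment factors (4).

    y: wave-1 item values, r: 0/1 later-wave response indicators,
    cell: adjustment-cell labels, w: weights 1/pi_i (all hypothetical inputs).
    """
    weighted_resp = defaultdict(float)  # sum of r_i / pi_i per cell
    weighted_all = defaultdict(float)   # sum of 1 / pi_i per cell
    for ri, ci, wi in zip(r, cell, w):
        weighted_all[ci] += wi
        weighted_resp[ci] += wi * ri
    beta = {c: weighted_resp[c] / weighted_all[c] for c in weighted_all}
    # Respondents are inflated by 1 / beta_j; nonrespondents contribute -y_i / pi_i.
    return sum(
        wi * yi * ((1.0 / beta[ci] if ri else 0.0) - 1.0)
        for yi, ri, ci, wi in zip(y, r, cell, w)
    )


def fay_brr_variance(estimator, base_weights, replicate_factors):
    """Formula (12): recompute the estimator under each replicate weight set."""
    replicates = [
        estimator([f * w for f, w in zip(factors, base_weights)])
        for factors in replicate_factors
    ]
    n_rep = len(replicates)
    mean = sum(replicates) / n_rep
    return 4.0 / n_rep * sum((t - mean) ** 2 for t in replicates)


# Toy data: two strata, one PSU each, two half-PSUs per PSU, two persons per half-PSU.
stratum = [0, 0, 0, 0, 1, 1, 1, 1]
half = [1, 1, 2, 2, 1, 1, 2, 2]
y = [1, 0, 1, 1, 0, 1, 0, 0]
r = [1, 1, 0, 1, 1, 0, 1, 1]
cell = ["a", "b", "a", "b", "a", "b", "a", "b"]
w = [100.0] * 8

# Assumed replicate factors: 1.5 / 0.5 by half-PSU, one Hadamard column per stratum.
hadamard_cols = [[1, -1, 1, -1], [1, 1, -1, -1]]
factors = [
    [1.0 + 0.5 * hadamard_cols[s][rep] * (1 if h == 1 else -1)
     for s, h in zip(stratum, half)]
    for rep in range(4)
]

bias = cell_bias_estimate(y, r, cell, w)
bias_var = fay_brr_variance(lambda wts: cell_bias_estimate(y, r, cell, wts), w, factors)
total_var = fay_brr_variance(lambda wts: sum(wi * yi for wi, yi in zip(wts, y)), w, factors)
print(bias, bias_var ** 0.5, total_var)
```

For the linear total in the last call, the replicate variance in this fully balanced toy design reduces to the sum over strata of squared differences between the two weighted half-PSU totals, which is the kind of agreement with a closed-form expression discussed above for Horvitz-Thompson totals. For the nonlinear bias estimator no such closed form is used: the estimator is simply recomputed, cell factors included, for every replicate.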
The two estimators $\hat V_{Fay}$ and $\hat V_{EHG}$ as in (1)-(2) above will be calculated and compared in the next Section for various choices of attribute $y^B = y^F$ in SIPP 96. For other types of survey estimators, not the bias estimators studied here, linearized and BRR variances have previously been compared by Sae-Ung et al. (2004) within the SIPP context.

4. Results for SIPP 96 Data

We summarize the SIPP 96 computed results on wave-1 minus wave-4 or wave-12 weighted totals, along with estimated standard errors. We consider totals and differences based only on the 94,444 individuals in SIPP 96 who had positive wave-1 weights, which were the second-stage weights (bwgt) recoded by Julie Tsay and used in the report of Bailey (2004). Throughout, we follow Bailey (2004) in studying the following 11 SIPP cross-sectional items: indicators that the individual living in a Household receives (i) Food Stamps (Foodst), or (ii) Aid to Families with Dependent Children (AFDC); indicators that the individual receives (iii) Medicaid (Mdcd), or (iv) Social Security (SocSec); and indicators that the individual (v) has health insurance (Heins), (vi) is in poverty (Pov), (vii) is employed (Emp), (viii) is unemployed (UnEmp), (ix) is not in the labor force (NILF), (x) is married (MAR), or (xi) is divorced (DIV).

The definition of later-wave response used in most longitudinal SIPP studies and also in the present paper is: participation in all waves cumulatively up to the later wave of interest. The adjustment-cell nonresponse model considered here is standard for SIPP (Tupek 2002), consisting of 149 cells defined in terms of region and Wave-1 measures of education, income level, employment status, race, ethnicity, asset types, and numbers of imputed items. All cells contained sufficiently many individuals for Waves 4 and 12, and weights within a reasonable range, so that pooling was not necessary.

The logistic regression response model used the Wave-1 predictor indicator variables for: Poverty, White Not Hispanic, Black, Renter, College educated, household reference person, and two pairwise interactions (Renter × College, Black × College).
The model is the same one used in Bailey (2004), with the addition of a Poverty indicator. Poverty was included in the model because it was highly significant in exploratory model fitting, because poverty is used in defining the standard SIPP set of adjustment cells, and most of all because we want to note the effect on population-wide bias estimates for a variable which is also a survey item whose total is regularly reported.

Table 1 presents the estimators for Wave 1 item totals and also the biases in those totals resulting from adjustment between waves 1 and 4, with either the Adjustment-Cell (BiasC) or Logistic Regression (BiasL) method. The Wav1 column shows the frame population totals $\hat t_y$ for the indicated cross-sectional items measured in Wave 1. (Recall that the Wave 1 sample includes only responders, with Wave 1 nonresponse taken into account in the inclusion probabilities $\pi_i$.) Fay-method BRR variance estimators for these Bias estimators were used to create Standard Errors, and the studentized estimators (Bias/SE) are given in the final two columns. The layout for Table 2 is completely analogous.

Table 1: Wave 1 SIPP 96 item totals and Wave 4 vs. 1 bias estimators, given in thousands. Standard errors have been used in forming Studentized bias deviates Dev C, Dev L. Here C indicates Wave 4 adjustment by Cells, L by Logistic. The asterisk by Poverty recalls its use in the Logistic regression model.

Item       Wav1      BiasC     BiasL     Dev C    Dev L
Foodst     27268     -85.8     494.8     -0.51     2.77
AFDC       14030     -55.3     338.4     -0.36     2.10
Mdcd       28173     153.1     778.6      1.13     5.29
SocSec     37087     699.3     413.6      5.62     2.88
Heins     194591    1629.7    1233.5      7.16     5.22
Pov*       41796    -770.3      29.5     -4.29     3.50
Emp       191201     189.4     216.7      1.44     1.32
UnEmp       6406    -336.7    -375.9     -6.02    -6.50
NILF       66647     147.3     163.5      1.18     1.01
MAR       114367    1253.2      95.2      6.46     0.52
DIV        18463    -206.0    -357.4     -2.03    -3.70

In Table 1, estimated bias tends to be small, no more than 2% of the Wave 1 total, except that UnEmp bias is 5% to 6% downward in both the C and L columns, and the logistic-model biases for AFDC and Mdcd are roughly 3%. The Poverty bias for the logistic model is particularly small. The biases for Wave 12 versus Wave 1 adjustments, summarized in Table 2, are generally larger than for Wave 4. Items AFDC, SocSec, UnEmp, NILF all have relative Wave 12 adjustment biases more than 3% by both methods, with several greater than 10%. The studentized bias-estimator ratios Dev C and Dev L, which should be compared with standard normal percentage points, are significant in Wave 4 (Table 1) even in some instances where relative biases are small, but in Wave 12 (Table 2) virtually all the studentized biases are highly significant.

The patterns in relative and studentized bias are similar for the two Tables, but not quite for the two adjustment methods.
At least it is generally true in Table 1 that the biases for both adjustment methods have the same sign whenever both are significantly large; but there are several items for which one method but not the other shows a significantly large bias in adjustment. The pattern is not easy to interpret, because it evidently depends strongly on the exact choice of the adjustment-cell and logistic regression models used. We can see this most clearly in the tiny logistic-model bias for Pov in Table 1, since Poverty was explicitly chosen as a predictor variable, and the logistic model with an intercept ensures that the regressor-weighted summed deviations between response indicators and their model predictors are 0 over the whole population.

Table 2: Wave 1 SIPP 96 item totals and Wave 12 vs. 1 bias estimators, given in thousands. Standard errors were used to form Studentized deviates Dev C, Dev L. Cell-based adjustment using Wave 12 responses is indicated C, Logistic model-based adjustment L.

Item       Wav1      BiasC     BiasL     Dev C    Dev L
Foodst     27268     -1179       261     -3.46     0.63
AFDC       14030     -1452      -459     -4.98    -1.38
Mdcd       28173      -397      1169     -1.34     3.26
SocSec     37087      4142      3844     16.38    13.53
Heins     194591      3792      2527      8.30     5.03
Pov        41796     -1528       245     -4.06     3.14
Emp       191201     -1449     -2242     -5.08    -6.14
UnEmp       6406      -744      -811     -5.66    -6.28
NILF       66647      2193      3058      7.93     8.35
MAR       114367      5287      2551     13.85     6.61
DIV        18463      -381      -689     -1.82    -3.48

We should mention that the adjusted estimated totals $\sum_{i \in S} r_i\, y_i^F / (\hat p_i\, \pi_i)$ of Wave 1 items for Wave 4 and Wave 12 responders in Tables 1 and 2 differ slightly from the totals reported in these Proceedings by Bailey (2006). The differences are due to modifications of $\pi_i$ made in Bailey (2006) to reflect second-stage adjustments, i.e., the raking of summed weights over certain subdomains to demographic (updated census) totals and trimming of resultant weights. Such second-stage adjustments are in fact made in SIPP production estimates of cross-sectional population attributes in later waves.
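The remark above about the small logistic-model bias for an item that is also a predictor can be illustrated with a toy computation. The sketch below is hypothetical throughout (made-up data for ten persons, a model with only an intercept and one poverty indicator, and function names chosen for the example). It solves a weighted score equation of the form (5) by Newton's method and then evaluates a bias estimator of the form (3). With only an intercept and one indicator the model is saturated, so the bias for that indicator vanishes up to rounding; with further covariates, as in the model used in this paper, the score equations still hold but the bias for a predictor item need not vanish exactly.

```python
import math


def fit_weighted_logistic(x1, r, w, iterations=25):
    """Newton iterations for a weighted score equation of the form (5),
    with an intercept and a single covariate x1 (hypothetical toy setup)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, ri, wi in zip(x1, r, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            u0 += wi * (ri - p)        # score component for the intercept
            u1 += wi * xi * (ri - p)   # score component for x1
            v = wi * p * (1.0 - p)     # weighted information contributions
            i00 += v
            i01 += v * xi
            i11 += v * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


def logistic_bias_estimate(y, x1, r, w, b0, b1):
    """Bias estimator of the form (3) with p_hat = 1 / (1 + exp(-(b0 + b1 * x1)))."""
    total = 0.0
    for yi, xi, ri, wi in zip(y, x1, r, w):
        p_hat = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += wi * yi * (ri / p_hat - 1.0)
    return total


# Hypothetical wave-1 data: poverty indicator (the predictor), employment
# indicator (not in the model), later-wave response, and weights 1/pi_i.
pov = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
emp = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
r = [1, 0, 1, 0, 1, 1, 1, 0, 1, 1]
w = [90.0, 110.0, 100.0, 100.0, 100.0, 120.0, 80.0, 100.0, 100.0, 100.0]

b0, b1 = fit_weighted_logistic(pov, r, w)
bias_pov = logistic_bias_estimate(pov, pov, r, w, b0, b1)
bias_emp = logistic_bias_estimate(emp, pov, r, w, b0, b1)
print(bias_pov, bias_emp)
```

In this toy setting the bias for the predictor item is numerically zero while the bias for the item outside the model is not, which is the sense in which a whole-population bias can be made small simply by the choice of model.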
We calculated the EHG estimators $\hat V_{EHG}^{1/2}$ of standard errors for Bias estimators (3), corresponding to all of the (Fay-method BRR) standard errors $\hat V_{Fay}^{1/2}$ used in Tables 1 and 2. The results, which we display in Table 3 for Wave 4 and in Table 4 for Wave 12, were very interesting. Within Table 3, the Adjustment-Cell standard errors (EHG.C, Fay.C) are generally close, mostly within one or two percent, except for a 7% discrepancy in Pov and 19% in UnEmp. However, the differences were remarkably greater between the Fay and EHG standard errors of the bias in the logistic-regression-based adjustments, never less than 15-20%, and the ratios in the Heins and Emp cases were greater than 2 and 3 respectively. Overall, the Fay-method standard errors (Fay.L) are systematically smaller than those (EHG.L) estimated from linearized attributes $z^L$ by the Ernst, Huggins, and Grill (1986) formula (10). The pattern of discrepancy between the two standard error estimators is very similar in Table 4, involving Wave 12 adjustment biases, to that in Table 3.

Table 3: Standard errors calculated for Wave 4 versus Wave 1 item adjustment biases, respectively via the linearized EHG versus Fay-BRR methods (EHG vs. Fay prefix), for the Cell-based versus Logistic regression adjustment methods (C vs. L suffix). All standard errors given in thousands.

Item      EHG.C    Fay.C    EHG.L    Fay.L
Foodst    160.6    167.6    260.3    203.0
AFDC      151.1    155.3    190.0    165.9
Mdcd      138.8    135.6    210.5    160.4
SocSec    127.0    124.3    162.2    143.5
Heins     227.0    227.5    574.8    264.5
Pov       192.7    179.5    300.7    224.2
Emp       139.7    131.9    567.9    171.4
UnEmp      69.5     56.0     75.2     58.5
NILF      126.3    124.7    230.6    166.9
MAR       191.7    194.1    356.6    188.7
DIV       101.7    101.7    116.0     96.6

Table 4: Standard errors calculated for Wave 12 versus Wave 1 item adjustment biases, respectively via the linearized EHG versus Fay-BRR methods (EHG vs. Fay prefix), for the Cell-based versus Logistic regression adjustment methods (C vs. L suffix). All standard errors given in thousands.

Item      EHG.C    Fay.C    EHG.L    Fay.L
Foodst    324.6    341.2    566.4    459.1
AFDC      303.0    291.5    408.1    351.6
Mdcd      300.6    295.7    517.7    411.8
SocSec    266.2    252.8    370.6    284.3
Heins     469.5    456.3    142.7    584.3
Pov       384.2    376.6    653.4    472.2
Emp       283.2    285.4   1446.5    378.6
UnEmp     138.2    131.4    147.1    130.8
NILF      268.5    276.6    571.0    375.0
MAR       383.8    418.0    901.9    391.0
DIV       200.1    209.3    222.4    197.4

5. Conclusions and Further Research Directions

Our results suggest that the quality of longitudinal nonresponse adjustment in SIPP 96 is particularly problematic for late-wave adjustments, with adjustments to Heins and UnEmp particularly biased in Wave 4, and Heins, SocSec, UnEmp, NILF, and MAR especially biased in Wave 12.
However, the assessed magnitudes of bias are highly dependent on the specific adjustment method used. Individual item biases can likely be made small, like that for Poverty in the logistic method in Table 1, when considered only for the whole population. This artificial effect of model choice disappears when bias summaries are made over selected subdomains. For this reason, we propose in future work to devise combined or composite bias measures over subdomains and survey response variables, and to search for effective model-based adjustment methods by studying the behavior of such bias measures on actual SIPP data.

As part of our development of variance estimators for this bias evaluation study, we compared variance estimators based on a formula of Ernst, Huggins, and Grill (1986) using linearized estimators, with the BRR method of Fay (1989) which can be used directly on estimators which are not linear combinations of Horvitz-Thompson weighted totals. Since asymptotic statistical theory based on linearization is ultimately the mathematical justification for both methods (Woodruff and Causey 1976, Krewski and Rao 1981), it is very interesting and slightly disturbing to find that the results from the two methods disagree meaningfully for many of the SIPP 96 survey variables and disagree drastically for a few of them. We propose to study further the reasons for these differences.

Acknowledgments

We gratefully acknowledge the help of Mahdi Sundukchi and John Boies of DSMD and Julie Tsay of SRD in file acquisition. Rick Valliant made some helpful comments about an earlier writeup on SIPP variance estimation. Most helpful of all, Reid Rottach, formerly of DSMD, gave the first author several extremely clear and patient explanations of his 2004 memo and its relation to the VPLX data-structure in the SIPP 96 context.

References

Bailey, L. (2004), Weighting alternatives to compensate for longitudinal nonresponse in the Survey of Income and Program Participation, Census Bureau internal report, Nov. 16, 2004.

Bailey, L. (2006), Assessing the effectiveness of weighting cell adjustments for longitudinal nonresponse, ASA Proc. Survey Res. Meth., Seattle, WA.

Ernst, L., Huggins, V., and Grill, D. (1986), Two new variance estimation techniques, ASA Proc. Survey Res. Meth., Washington DC, pp. 400-405.

Fay, R. (1989), Theory and application of replicate weighting for variance calculations, ASA Proc. Survey Res. Meth., Washington DC, pp. 212-217.

Krewski, D. and Rao, J.N.K. (1981), Inference from stratified samples: properties of the linearization, jackknife and balanced repeated replication methods, Ann. Statist. 9, 1010-1019.

Rottach, R. (2004), Internal-use replicate weighting system for SIPP (VAR-1), Census Bureau Memorandum, DSMD, March 22, 2004.

Sae-Ung, S., Hall, D., and Sissel, C. (2004), Evaluation of variance estimation methods based on a SPD longitudinal file, ASA Proc. Survey Res. Meth., Toronto, Canada, pp. 4296-4303.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, New York.

Slud, E. (2006), Variance estimation and approximation in the setting of SIPP 96, Census Bureau Report, Statist. Res. Div.

Survey of Income and Program Participation User's Guide (Supplement to the Technical Documentation) (2001), Third Edition, Washington DC.

Tupek, A. (2002), SIPP 96: specifications for the longitudinal weighting of sample people, Internal Census Bureau memorandum.

Woodruff, R. and Causey, B. (1976), Computerized method for approximating the variance of a complicated estimate, Jour. Amer. Statist. Assoc. 71, 315-321.