Misspecification Effects in the Analysis of Longitudinal Survey Data

Marcel de Toledo Vieira
Departamento de Estatística, Universidade Federal de Juiz de Fora, Brasil
marcel.vieira@ufjf.edu.br

M. Fátima Salgueiro
ISCTE Business School and UNIDE, Lisbon University Institute, Portugal
fatima.salgueiro@iscte.pt

Peter W. F. Smith
S3RI and University of Southampton, United Kingdom
p.w.smith@soton.ac.uk

Abstract

Misspecification effects (meffs) measure the inflation of the sampling variance of an estimator as a result of the use of complex sampling schemes. Many longitudinal social survey designs employ multi-stage sampling, leading to some clustering of the sample and to meffs greater than one. For a model for panel data we consider methods for estimating parameters which allow for complex schemes. An empirical study using longitudinal data from the British Household Panel Survey is conducted, and a simulation study is performed.

Keywords: parametric models; longitudinal data; sampling impacts.

1 Introduction

Standard inferential methods are often not valid when analysing data obtained using a complex sampling scheme. Interest in fitting models to longitudinal complex survey data has been growing in the last decade. Skinner and Vieira (2007) presented evidence that the variance-inflating impacts of clustering may be higher for longitudinal analyses than for the corresponding cross-sectional analyses. We further investigate the impact of weighting, stratification and clustering in the regression analysis of longitudinal survey data, comparing it with the impact on cross-sectional analyses.

In Section 2 we introduce the longitudinal survey data under analysis. Section 3 presents the model, point and variance estimation procedures, and describes measures of misspecification effects (meffs). The motivating application and empirical results are presented in Section 4 and a simulation study is performed in Section 5. Section 6 contains a discussion.

2 Data and Sampling Design

The empirical evidence presented in this paper is based on data from the British Household Panel Survey (BHPS), a household panel survey of individuals in private domiciles in Great Britain. The BHPS follows longitudinally a sample of individuals selected in 1991 by a complex stratified two-stage sampling scheme, with clustering by area. Our analyses are based on a subsample of 55 men and women aged 16 or more, who were original sample members, who gave a full interview in waves twelve to fifteen, and who were employed throughout the period. The following variables are
considered: gender; age category; number of children in the household; qualification; social class; marital status; health status; hours normally worked per week; and logarithm of the household income.

In our sample, the relative frequency for both gender categories is approximately 50%. The distribution of the age category variable is negatively skewed, as the frequencies for the older categories are larger. Most of the respondents were either married or living as a couple in 2002. Approximately 80% of the respondents considered themselves to be in either good or excellent health. Furthermore, over 75% of the individuals worked at least 30 hours per week. About 55% of the individuals had a high level of education, and only 16.3% of them occupied a partly skilled or unskilled position in their last job. Almost 6% of the respondents had no children in the household where they live. Moreover, the average household income of the sample members was approximately GBP 3365 in the month before the interview was made.

3 Model, Estimation Procedures and Meffs

Regression models have found a wide range of useful applications with longitudinal survey data (e.g., Diggle et al., 2002; Vieira and Skinner, 2008; Vieira, 2009). Let $y_{it}$ denote the response of interest for individual $i$ at time $t$. Let $y_i = (y_{i1}, \ldots, y_{iT})'$ be the vector of repeated measures. We consider linear models of the following form to represent the expectation of $y_i$ given the values of covariates:

$$E(y_i) = x_i \beta, \qquad (1)$$
where $x_i = (x_{i1}', \ldots, x_{iT}')'$, $x_{it}$ is a $1 \times q$ vector of specified values of covariates for individual $i$ at wave $t$, $\beta$ is the $q \times 1$ vector of regression coefficients, and the expectation is with respect to the model. Following the pseudo-likelihood approach (Skinner, 1989; Skinner and Vieira, 2007), the most general estimator of $\beta$ we consider is

$$\hat\beta = \Big( \sum_{i \in s} w_i x_i' V^{-1} x_i \Big)^{-1} \sum_{i \in s} w_i x_i' V^{-1} y_i, \qquad (2)$$

where $w_i$ is a longitudinal survey weight and $V$ is a $T \times T$ estimated working variance matrix of $y_i$ (Diggle et al., 2002), taken as the exchangeable variance matrix with diagonal elements $\hat\sigma^2$ and off-diagonal elements $\hat\rho \hat\sigma^2$. Further discussion on the estimation of $\beta$ and $\rho$ is presented in Skinner and Vieira (2007).

Under (1), $\hat\beta$ is approximately unbiased with respect to the model and the survey design and may still be expected to combine both within and between individual information in a reasonably efficient manner, even if the working model for the error structure does not hold exactly (Skinner and Vieira, 2007). Without the weight terms and survey sampling considerations, the form of $\hat\beta$ given by (2) is motivated by the generalized estimating equations (GEE) approach of Liang and Zeger (1986); we refer to that version as the unweighted estimator.
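
The estimator in (2) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names, the array layout (`X` of shape `(n, T, q)`, `y` of shape `(n, T)`) and the choice to pass the working correlation `rho` as a known value are all assumptions made for the example. Because $V = \sigma^2 R$ enters both factors of (2), $\sigma^2$ cancels and only the exchangeable correlation matrix $R$ is needed.

```python
import numpy as np


def exchangeable_inverse(T, rho):
    """Inverse of the T x T exchangeable correlation matrix R.

    R has ones on the diagonal and rho elsewhere; sigma^2 is omitted
    because it cancels in the point estimator (2).
    """
    R = (1.0 - rho) * np.eye(T) + rho * np.ones((T, T))
    return np.linalg.inv(R)


def weighted_gee_beta(X, y, w, rho):
    """Weighted estimator (2) with an exchangeable working matrix.

    X   : (n, T, q) covariates, one T x q block per individual (assumed layout)
    y   : (n, T) repeated measures
    w   : (n,) longitudinal survey weights
    rho : working correlation, treated here as given
    """
    n, T, q = X.shape
    V_inv = exchangeable_inverse(T, rho)
    # A = sum_i w_i x_i' V^{-1} x_i  (q x q)
    A = np.einsum("i,itq,ts,isr->qr", w, X, V_inv, X)
    # b = sum_i w_i x_i' V^{-1} y_i  (q,)
    b = np.einsum("i,itq,ts,is->q", w, X, V_inv, y)
    return np.linalg.solve(A, b)
```

Setting `w` to a vector of ones gives the unweighted version, and with `rho = 0` as well the result coincides with ordinary least squares on the pooled observations.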

The following estimator of the covariance matrix of $\hat\beta$ allows for a stratified multistage sampling scheme and is based upon the classical method of linearization (Skinner, 1989; Skinner and Vieira, 2007):

$$v(\hat\beta) = \Big[ \sum_{i \in s} w_i x_i' V^{-1} x_i \Big]^{-1} \Big[ \sum_h \frac{n_h}{n_h - 1} \sum_a (z_{ha} - \bar z_h)(z_{ha} - \bar z_h)' \Big] \Big[ \sum_{i \in s} w_i x_i' V^{-1} x_i \Big]^{-1},$$

where $h$ denotes stratum, $a$ denotes primary sampling unit (PSU), $n_h$ is the number of PSUs in stratum $h$, $z_{ha} = \sum_i w_i x_i' V^{-1} e_i$ (summing over the individuals $i$ in PSU $a$ of stratum $h$), $\bar z_h = \sum_a z_{ha} / n_h$ and $e_i = y_i - x_i \hat\beta$. If the weights, the sampling scheme and the difference between $n_h/(n_h - 1)$ and 1 are ignored, this estimator reduces to the robust variance estimator presented by Liang and Zeger (1986).

We consider three further alternatives for estimating the covariance matrix of $\hat\beta$: (i) $v_h(\hat\beta)$, which takes $h = 1$ and therefore ignores stratification; (ii) $v_a(\hat\beta)$, which takes $a = 1$ and therefore ignores clustering; and (iii) $v_{ha}(\hat\beta)$, which takes $h = 1$ and $a = 1$ and therefore ignores both stratification and clustering. We also perform variance estimation for the unweighted estimator.

We are concerned with the potential bias of $v_h(\hat\beta)$, $v_a(\hat\beta)$, and $v_{ha}(\hat\beta)$ when in fact the design is complex. Skinner (1989) proposed the misspecification effect (meff), which is designed to measure the effects of incorrect specification of both the sampling scheme and the considered model.
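
A rough sketch of the linearization variance estimator and of a meff ratio built from it is given below. It assumes the same hypothetical array layout as before (`X` of shape `(n, T, q)`), integer label arrays `strata` and `psu`, and an already computed `beta`; the function names are illustrative only. The convention used here to ignore stratification is to pass a constant `strata` array, and to ignore clustering each individual is given its own PSU label; these are assumptions of the sketch rather than a statement of how the original computations were coded.

```python
import numpy as np


def linearization_variance(X, y, w, V_inv, beta, strata, psu):
    """Linearization estimator of cov(beta-hat) for stratified multistage sampling.

    X: (n, T, q), y: (n, T), w: (n,), V_inv: (T, T) inverse working matrix,
    beta: (q,), strata/psu: (n,) labels identifying stratum h and PSU a.
    """
    n, T, q = X.shape
    A = np.einsum("i,itq,ts,isr->qr", w, X, V_inv, X)
    resid = y - X @ beta  # e_i = y_i - x_i beta, shape (n, T)
    # Per-individual contribution w_i x_i' V^{-1} e_i, shape (n, q)
    u = w[:, None] * np.einsum("itq,ts,is->iq", X, V_inv, resid)

    middle = np.zeros((q, q))
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # z_ha: PSU totals within stratum h, shape (n_h, q)
        z = np.stack([u[in_h & (psu == a)].sum(axis=0) for a in psus])
        d = z - z.mean(axis=0)
        middle += n_h / (n_h - 1) * d.T @ d

    A_inv = np.linalg.inv(A)
    return A_inv @ middle @ A_inv


def meff(v_full, v_alt):
    """Element-wise meff: design-based variance over the alternative variance."""
    return np.diag(v_full) / np.diag(v_alt)
```

For example, `meff(v, linearization_variance(X, y, w, V_inv, beta, np.zeros(n, dtype=int), np.arange(n)))` compares the full estimator `v` with one that ignores both stratification and clustering.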
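
The effect of the complex sampling scheme on the variance estimators that ignore stratification ($v_h$), clustering ($v_a$), or both ($v_{ha}$) can be evaluated by examining the distribution of the meffs. We consider

$$\mathrm{meff}_h[\hat\beta_k, v_h(\hat\beta_k)] = v(\hat\beta_k) / v_h(\hat\beta_k);$$
$$\mathrm{meff}_a[\hat\beta_k, v_a(\hat\beta_k)] = v(\hat\beta_k) / v_a(\hat\beta_k); \text{ and}$$
$$\mathrm{meff}_{ha}[\hat\beta_k, v_{ha}(\hat\beta_k)] = v(\hat\beta_k) / v_{ha}(\hat\beta_k),$$

where $\hat\beta_k$ denotes the $k$th element of $\hat\beta$. The $\mathrm{meff}_h$, $\mathrm{meff}_a$, and $\mathrm{meff}_{ha}$ measure the impact of stratification, clustering, and both stratification and clustering, respectively. We also calculate all the considered versions of the meff measure for the unweighted estimator. Furthermore, $\mathrm{meff}_g$, the ratio of $v(\hat\beta_k)$ to the variance estimate obtained for the unweighted estimator, is calculated in order to assess the bias caused by ignoring all the sampling scheme features.

4 Application

The paper is motivated by a regression analysis of four waves of BHPS data, which considers the logarithm of the household income as the dependent variable. We first estimate meffs for the linearization estimator, considering $\hat\beta$, as discussed in Section 3. Using data from just the first wave and setting $x_i = 1$, the estimated meff for this cross-sectional mean is given in Table 1 as about 1.3. In order to evaluate the impact of the longitudinal aspect of the data, we estimated a series of each type of the meffs discussed above, using data for waves 12 to 15.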

TABLE 1. Meff estimates for longitudinal means

Meff                              Wave 12   Waves 12 and 13   Waves 12 to 14   Waves 12 to 15
Weighted estimator
  meff_h (stratification)         0.971     0.965             0.965            0.963
  meff_a (clustering)             1.490     1.653             1.699            1.695
  meff_ha (both)                  1.8       1.431             1.474            1.458
Unweighted estimator
  meff_h (stratification)         0.969     0.963             0.961            0.960
  meff_a (clustering)             1.57      1.795             1.830            1.870
  meff_ha (both)                  1.343     1.504             1.575            1.653
meff_g                            1.494     1.598             1.778            1.706

Although these estimated meffs are subject to sampling error, there is a tendency for meff_a, meff_ha, and meff_g to increase with the number of waves. It therefore seems that it becomes more important to allow for clustering, and for the complex sampling design in general, when the number of waves in the analysis increases. Furthermore, stratification effects appear to be constant as the number of waves increases.

When we included educational level as a covariate, we also noticed some evidence for meff_a, meff_ha, and meff_g to increase with the number of waves. The model has been further elaborated by adding time, gender, age category, marital status, number of children in the household, social class, health status, and number of hours normally worked as covariates. Once more, we observed some evidence of
a tendency for those meffs to diverge from one as the number of waves increases, at least for the coefficients of some of the covariates. We also confirmed the observation of Skinner and Vieira (2007) that meffs for regression coefficients tend not to be greater than meffs for the means of the dependent variable.

5 Simulation Study

As the results reported in Section 4 are subject to sampling error, we have conducted a simulation study to evaluate the behaviour of the meff measures. Each of the $d = 1, \ldots, D$ replicate samples is based on the BHPS data subset described above, which is considered as the target population. We evaluated the properties of variance estimators for unweighted point estimators and assessed only different impacts of clustering. We studied the meff when the number of waves in the analysis is increased. Note that we did not assess the impact of either stratification or unequal probability sampling.

Let $y_{ait}$ be the value of the study variable for unit $i = 1, \ldots, n_d$ in PSU $a = 1, \ldots, m_d$ at wave $t$ of the survey, where $n_d$ and $m_d$ are the sample size and the number of PSUs for the replicate sample $d$. To generate the values of $y_{ait}$ for the simulation study, we used the following uniform correlation model, which allows for the impact of clustering:

$$y_{ait} = x_{ait} \beta + \eta_a + u_{ai} + v_{ait}, \qquad (3)$$
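with $\eta_a \sim N(0, \sigma_\eta^2)$, $u_{ai} \sim N(0, \sigma_u^2)$, and $v_{ait} \sim N(0, \sigma_v^2)$. We consider the logarithm of the household income as the dependent variable and the remaining variables listed in Section 2 as covariates. We have held the values of the covariates fixed. The adopted values for $\beta$, $\sigma_\eta^2$, $\sigma_u^2$, and $\sigma_v^2$ have been obtained by maximum likelihood estimation considering the target population. In particular, we have considered different realistic choices for $\sigma_\eta^2$, namely $\sigma_\eta^2 = 0.06$ (the actual value estimated from fitting (3)), $\sigma_\eta^2 = 0.1$, and $\sigma_\eta^2 = 0.18$, to enable the evaluation of the effects of different impacts of clustering on the considered variance estimation procedures.

Let

$$\hat E(\widehat{\mathrm{meff}}) = \frac{1}{D} \sum_{d=1}^{D} \widehat{\mathrm{meff}}^{(d)}$$

be the mean of our parameter of interest estimated over repeated simulation,

$$\mathrm{var}(\widehat{\mathrm{meff}}) = \frac{1}{D - 1} \sum_{d=1}^{D} \big[ \widehat{\mathrm{meff}}^{(d)} - \hat E(\widehat{\mathrm{meff}}) \big]^2$$

be a simulation estimator of $\mathrm{VAR}(\widehat{\mathrm{meff}})$, the population variance of the misspecification effect measure, and

$$\mathrm{se}[\hat E(\widehat{\mathrm{meff}})] = \sqrt{\mathrm{var}(\widehat{\mathrm{meff}}) / D}$$

be the simulation standard error of $\hat E(\widehat{\mathrm{meff}})$.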
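
The data-generating step in model (3) and the three simulation summaries can be sketched as below. This is an illustrative outline under stated assumptions: the fixed part `xb` (the values $x_{ait}\beta$ arranged as an `(n, T)` array), the integer PSU labels running from 0 to m-1, and the function names are all hypothetical, and the variance components are passed as variances rather than standard deviations.

```python
import numpy as np


def simulate_y(xb, psu, sigma2_eta, sigma2_u, sigma2_v, rng):
    """Draw one replicate of y from the uniform correlation model (3).

    xb  : (n, T) fixed part x*beta for each individual and wave (assumed layout)
    psu : (n,) integer PSU labels 0..m-1, one per individual
    """
    n, T = xb.shape
    m = psu.max() + 1
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), size=m)   # PSU-level effect
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n)       # individual-level effect
    v = rng.normal(0.0, np.sqrt(sigma2_v), size=(n, T))  # wave-level error
    return xb + eta[psu][:, None] + u[:, None] + v


def monte_carlo_summary(meff_reps):
    """Mean, variance and standard error of the mean over D replicates."""
    meff_reps = np.asarray(meff_reps, dtype=float)
    D = meff_reps.size
    mean = meff_reps.mean()
    var = meff_reps.var(ddof=1)  # divisor D - 1
    se = np.sqrt(var / D)
    return mean, var, se
```

In a full study, each replicate `d` would draw `y` with `simulate_y`, compute the meff of interest, and append it to a list that is finally passed to `monte_carlo_summary`.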

For the models that have been fitted to each generated replicate sample, we have set $x_i = 1$ and therefore we have still studied only the behaviour of the meff for longitudinal means. Let $n_a$ be the sample size for PSU $a$ in the target population and $n_{ad}$ be the sample size for PSU $a$ in the replicate sample $d$. Table 2 presents results for three scenarios: (i) $m_d = 00$, $n_{ad} = n_a$, and $\sigma = 0.35$; (ii) $m_d = 00$, $n_{ad} = n_a$, and $\sigma = 0.70$; and (iii) $m_d = 00$, $n_{ad} = n_a$, and $\sigma = 1.35$. Note that $m = 34$ in the target population.

TABLE 2. $\hat E(\widehat{\mathrm{meff}})$ and $\mathrm{se}[\hat E(\widehat{\mathrm{meff}})]$ (in brackets), for three scenarios. D = 1000.

sigma_eta^2   Wave 12           Waves 12 and 13   Waves 12 to 14    Waves 12 to 15
0.06          1.1901 (0.0044)   1.077 (0.0046)    1.115 (0.0047)    1.143 (0.0047)
0.1           1.766 (0.0054)    1.3014 (0.0057)   1.3106 (0.0058)   1.3157 (0.0058)
0.18          1.364 (0.0066)    1.3933 (0.0069)   1.4061 (0.0070)   1.4118 (0.0070)

The simulation results also give evidence that there is a tendency for the meff to increase as the number of waves in the analysis increases, at least for longitudinal means. This tendency seems to be stronger for larger clustering impacts. Meffs increase when the clustering impacts are increased, as expected from the survey sampling literature
(Vieira, 2009). Simulation standard errors of $\hat E(\widehat{\mathrm{meff}})$ appear to increase when the number of waves and the clustering impacts are increased.

6 Discussion

We have presented evidence that clustering impacts may be stronger for longitudinal studies than for cross-sectional studies, and that meffs for the regression coefficients may increase with the number of waves considered in the analysis. The main implication of these findings is that standard errors in the analysis of longitudinal survey data may be misleading if the initial sample was clustered and if this clustering is ignored. We have also observed that meffs for regression coefficients tend not to be greater than meffs for the means of the dependent variable.

Acknowledgments: The research of the first author was supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) grant CEX-APQ-00467-008. The research of the second author was supported by the Fundação para a Ciência e a Tecnologia grant PTDC/GES/7784/006.

References

Diggle, P. J., Heagerty, P., Liang, K. and Zeger, S. L. (2002). Analysis of Longitudinal Data. 2nd Ed. Oxford: Oxford University Press.

Liang, K. and Zeger, S. L. (1986). Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73(1), 13-22.

Salgueiro, M. F. R. F., Smith, P. W. F. and Vieira, M. D. T. (2010). A Multi-Process Second-Order Latent Growth Curve Model for Subjective Well-Being. Submitted to Multivariate Behavioral Research.

Skinner, C. J. (1989). Domain means, regression and multivariate analysis. In Skinner, C. J., Holt, D. and Smith, T. M. F. (eds). Analysis of Complex Surveys. Chichester: Wiley, pp. 59-87.

Skinner, C. J. and Holmes, D. (2003). Random Effects Models for Longitudinal Survey Data. In Chambers, R. L. and Skinner, C. J. (eds). Analysis of Survey Data. Chichester: Wiley.

Skinner, C. and Vieira, M. D. T. (2007). Variance estimation in the analysis of clustered longitudinal survey data. Survey Methodology, 33(1), 3-12.

Vieira, M. D. T. (2009). Analysis of Longitudinal Survey Data. 1. ed. Saarbrücken: VDM Verlag Dr. Müller.

Vieira, M. D. T. and Skinner, C. J. (2008). Estimating Models for Panel Survey Data under Complex Sampling. Journal of Official Statistics, 24, 343-364.