Misspecification Effects in the Analysis of Longitudinal Survey Data




Marcel de Toledo Vieira
Departamento de Estatística, Universidade Federal de Juiz de Fora, Brasil
marcel.vieira@ufjf.edu.br

M. Fátima Salgueiro
ISCTE Business School and UNIDE, Lisbon University Institute, Portugal
fatima.salgueiro@iscte.pt

Peter W. F. Smith
S3RI and University of Southampton, United Kingdom
p.w.smith@soton.ac.uk

Abstract
Misspecification effects (meffs) measure the inflation of the sampling variance of an estimator as a result of the use of complex sampling schemes. Many longitudinal social survey designs employ multi-stage sampling, leading to some clustering of the sample and to meffs greater than one. For a model for panel data we consider methods for estimating parameters which allow for complex schemes. An empirical study using longitudinal data from the British Household Panel Survey is conducted, and a simulation study is performed.

Keywords: parametric models; longitudinal data; sampling impacts.

1 Introduction

Standard inferential methods are often not valid when analysing data obtained using a complex sampling scheme. The interest in fitting models to longitudinal complex survey data has been growing in the last decade. Skinner and Vieira (2007) presented evidence that the variance-inflating impacts of clustering may be higher for longitudinal analyses than for the corresponding cross-sectional analyses. We further investigate the impact of weighting, stratification and clustering in the regression analysis of longitudinal survey data, comparing it with the impact on cross-sectional analyses.

In Section 2 we introduce the longitudinal survey data under analysis. Section 3 presents the model, point and variance estimation procedures, and describes measures of misspecification effects (meffs). The motivating application and empirical results are presented in Section 4 and a simulation study is performed in Section 5. Section 6 contains a discussion.

2 Data and Sampling Design

The empirical evidence presented in this paper is based on data from the British Household Panel Survey (BHPS), a household panel survey of individuals in private domiciles in Great Britain. The BHPS follows longitudinally a sample of individuals selected in 1991 by a complex stratified two-stage sampling scheme, with clustering by area. Our analyses are based on a subsample of 55 men and women aged 16 or more, who were original sample members, who gave a full interview in waves twelve to fifteen, and who were employed throughout the period. The following variables are

considered: gender; age category; number of children in the household; qualification; social class; marital status; health status; hours normally worked per week; and logarithm of the household income.

In our sample, the relative frequency for both gender categories is approximately 50%. The distribution of the age category variable is negatively skewed, as the frequencies for the older categories are larger. Most of the respondents were either married or living as a couple in 2002. Approximately 80% of the respondents considered themselves in either good or excellent health condition. Furthermore, over 75% of the individuals worked at least 30 hours per week. About 55% of the individuals had a high level of education, and only 16.3% of them occupied a partly skilled or unskilled position in their last job. Almost 6% of the respondents had no children in the household where they live. Moreover, the average household income of the sample members was approximately GBP 3365 in the month before the interview was made.

3 Model, Estimation Procedures and Meffs

Regression models have found a wide range of useful applications with longitudinal survey data (e.g., Diggle et al., 2002; Vieira and Skinner, 2008; Vieira, 2009). Let $y_{it}$ denote the response of interest for individual $i$ at time $t$. Let $y_i = (y_{i1}, \ldots, y_{iT})'$ be the vector of repeated measures. We consider linear models of the following form to represent the expectation of $y_i$ given the values of covariates:

$$E(y_i) = x_i \beta, \tag{1}$$

where $x_i = (x_{i1}', \ldots, x_{iT}')'$, $x_{it}$ is a $1 \times q$ vector of specified values of covariates for individual $i$ at wave $t$, $\beta$ is the $q \times 1$ vector of regression coefficients, and the expectation is with respect to the model. Following the pseudo-likelihood approach (Skinner, 1989; Skinner and Vieira, 2007), the most general estimator of $\beta$ we consider is

$$\hat\beta = \Big( \sum_{i \in s} w_i\, x_i' V^{-1} x_i \Big)^{-1} \sum_{i \in s} w_i\, x_i' V^{-1} y_i, \tag{2}$$

where $w_i$ is a longitudinal survey weight and $V$ is a $T \times T$ estimated working variance matrix of $y_i$ (Diggle et al., 2002), taken as the exchangeable variance matrix with diagonal elements $\hat\sigma^2$ and off-diagonal elements $\hat\rho \hat\sigma^2$. Further discussion on the estimation of $\beta$ and $\rho$ is presented in Skinner and Vieira (2007).

Under (1), $\hat\beta$ is approximately unbiased with respect to the model and the survey design and may still be expected to combine both within and between individual information in a reasonably efficient manner, even if the working model for the error structure does not hold exactly (Skinner and Vieira, 2007). Without the weight terms and survey sampling considerations, the form of $\hat\beta$ given by (2) is motivated by the generalized estimating equations (GEE) approach of Liang and Zeger (1986); we denote this unweighted estimator by $\hat\beta_{GEE}$.
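The estimator in equation (2) can be illustrated with a short NumPy sketch. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `exchangeable_cov` and `weighted_estimator`, the array layout (covariates stored as an `(n, T, q)` array), and the toy data are all hypothetical, and the working-variance parameters are taken as given rather than estimated.

```python
import numpy as np

def exchangeable_cov(T, sigma2, rho):
    """T x T exchangeable matrix: sigma2 on the diagonal, rho * sigma2 off it."""
    return sigma2 * ((1.0 - rho) * np.eye(T) + rho * np.ones((T, T)))

def weighted_estimator(X, Y, w, V):
    """Weighted estimator of equation (2).

    X: (n, T, q) covariates, Y: (n, T) responses,
    w: (n,) survey weights, V: (T, T) working variance matrix.
    (Array layout is an assumption made for this sketch.)
    """
    V_inv = np.linalg.inv(V)
    # A = sum_i w_i x_i' V^{-1} x_i, a q x q matrix
    A = np.einsum("i,itq,ts,isr->qr", w, X, V_inv, X)
    # b = sum_i w_i x_i' V^{-1} y_i, a vector of length q
    b = np.einsum("i,itq,ts,is->q", w, X, V_inv, Y)
    return np.linalg.solve(A, b)

# Toy data (hypothetical): noise-free responses, so beta is recovered exactly.
rng = np.random.default_rng(0)
n, T, q = 50, 4, 2
X = rng.normal(size=(n, T, q))
beta_true = np.array([1.0, -0.5])
Y = X @ beta_true
w = rng.uniform(0.5, 2.0, size=n)
V = exchangeable_cov(T, sigma2=1.0, rho=0.3)
beta_hat = weighted_estimator(X, Y, w, V)
```

Setting all weights to one gives the unweighted GEE-type form mentioned in the text.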

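The following estimator of the covariance matrix of $\hat\beta$ allows for a stratified multistage sampling scheme and is based upon the classical method of linearization (Skinner, 1989; Skinner and Vieira, 2007):

$$v(\hat\beta) = \Big[ \sum_{i \in s} w_i\, x_i' V^{-1} x_i \Big]^{-1} \Big[ \sum_h \frac{n_h}{n_h - 1} \sum_a (z_{ha} - \bar z_h)(z_{ha} - \bar z_h)' \Big] \Big[ \sum_{i \in s} w_i\, x_i' V^{-1} x_i \Big]^{-1},$$

where $h$ denotes stratum, $a$ denotes primary sampling unit (PSU), $n_h$ is the number of PSUs in stratum $h$, $z_{ha} = \sum_i w_i\, x_i' V^{-1} e_i$ (the sum being over sampled individuals $i$ in PSU $a$ of stratum $h$), $\bar z_h = \sum_a z_{ha} / n_h$ and $e_i = y_i - x_i \hat\beta$. If the weights, the sampling scheme and the difference between $n_h/(n_h - 1)$ and 1 are ignored, this estimator reduces to the robust variance estimator presented by Liang and Zeger (1986).

We consider three further alternatives for estimating the covariance matrix of $\hat\beta$: (i) $v_{str}(\hat\beta)$, which treats the sample as a single stratum and therefore ignores stratification; (ii) $v_{clu}(\hat\beta)$, which treats each sampled individual as a separate PSU and therefore ignores clustering; and (iii) $v_{both}(\hat\beta)$, which applies both simplifications and therefore ignores both stratification and clustering. We also perform variance estimation for $\hat\beta_{GEE}$.

We are concerned with the potential bias of $v_{str}(\hat\beta)$, $v_{clu}(\hat\beta)$, and $v_{both}(\hat\beta)$ when in fact the design is complex. Skinner (1989) has proposed the misspecification effect (meff), which is designed to measure the effects of incorrect specification of both the sampling scheme and the considered model.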
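As a rough illustration of the linearization estimator described above, the sketch below computes the sandwich form with a between-PSU middle term accumulated stratum by stratum. It is a minimal sketch under assumptions: the function name `linearization_variance`, the `(n, T, q)` array layout, and the integer `strata` and `psu` label arrays are hypothetical choices made for the example, and the point estimate is passed in rather than computed.

```python
import numpy as np

def linearization_variance(X, Y, w, V, beta_hat, strata, psu):
    """Linearization (sandwich) variance allowing for strata and PSUs.

    X: (n, T, q), Y: (n, T), w: (n,), V: (T, T), beta_hat: (q,),
    strata, psu: (n,) label arrays (layout assumed for this sketch).
    """
    V_inv = np.linalg.inv(V)
    resid = Y - X @ beta_hat                        # e_i = y_i - x_i beta_hat
    A = np.einsum("i,itq,ts,isr->qr", w, X, V_inv, X)
    # Per-individual contributions w_i x_i' V^{-1} e_i, shape (n, q)
    u = np.einsum("i,itq,ts,is->iq", w, X, V_inv, resid)
    q = X.shape[2]
    middle = np.zeros((q, q))
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # z_ha: PSU totals of the contributions within stratum h
        z = np.stack([u[in_h & (psu == a)].sum(axis=0) for a in psus])
        d = z - z.mean(axis=0)                      # z_ha - zbar_h
        middle += n_h / (n_h - 1) * (d.T @ d)
    A_inv = np.linalg.inv(A)
    return A_inv @ middle @ A_inv

# Toy check (hypothetical data): mean of one wave, one stratum, one person per PSU.
rng = np.random.default_rng(1)
y = rng.normal(size=30)
n = len(y)
v_mean = linearization_variance(
    np.ones((n, 1, 1)), y[:, None], np.ones(n), np.eye(1),
    np.array([y.mean()]), np.zeros(n, dtype=int), np.arange(n),
)
```

In this degenerate case the estimator reduces to the usual variance of a sample mean, which is a convenient sanity check. Passing a single stratum label, or a distinct PSU label per individual, gives the simplified variants that ignore stratification or clustering.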

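The effect of the complex sampling scheme on the simplified variance estimators can be evaluated by examining the distribution of the meffs. Writing $v_{str}$, $v_{clu}$ and $v_{both}$ for the variance estimators that ignore stratification, clustering, and both, respectively, we consider

$$\mathrm{meff}_{str}[\hat\beta_k, v_{str}(\hat\beta_k)] = v(\hat\beta_k) / v_{str}(\hat\beta_k);$$
$$\mathrm{meff}_{clu}[\hat\beta_k, v_{clu}(\hat\beta_k)] = v(\hat\beta_k) / v_{clu}(\hat\beta_k); \text{ and}$$
$$\mathrm{meff}_{both}[\hat\beta_k, v_{both}(\hat\beta_k)] = v(\hat\beta_k) / v_{both}(\hat\beta_k),$$

where $\hat\beta_k$ denotes the $k$th element of $\hat\beta$. These three meffs measure the impact of stratification, clustering, and both stratification and clustering, respectively. We also calculate all the considered versions of the meff measure for the unweighted estimator $\hat\beta_{GEE}$. Furthermore, a further ratio, $\mathrm{meff}_g$, is calculated in order to assess the bias caused by ignoring all the sampling scheme features.

4 Application

The paper is motivated by a regression analysis of four waves of BHPS data, which considers the logarithm of the household income as the dependent variable. We first estimate meffs for the linearization estimator, considering $\hat\beta$, as discussed in Section 3. Using data from just the first wave and setting $x_{it} = 1$, the estimated meff for this cross-sectional mean is given in Table 1 as about 1.3. In order to evaluate the impact of the longitudinal aspect of the data, we estimated a series of each type of the meffs discussed above, using data for waves 12 to 15.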
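The meff ratios defined in Section 3 can be sketched for the simplest case used in this section: an unweighted mean with $x_{it} = 1$, a single wave, and a single stratum. The following is a minimal sketch under those assumptions only; the function names `var_of_mean` and `meff_of_mean` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def var_of_mean(y, groups):
    """Linearization variance of the unweighted mean, with `groups` as PSUs
    in a single stratum (special case: x = 1, one wave, unit weights)."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n = len(y)
    e = y - y.mean()                                   # residuals from the mean
    ids = np.unique(groups)
    z = np.array([e[groups == g].sum() for g in ids])  # PSU totals of residuals
    m = len(ids)
    return m / (m - 1) * np.sum((z - z.mean()) ** 2) / n**2

def meff_of_mean(y, psu):
    """Ratio of the variance allowing for clustering to the variance that
    treats every individual as a separate PSU (clustering ignored)."""
    v_design = var_of_mean(y, psu)
    v_naive = var_of_mean(y, np.arange(len(y)))
    return v_design / v_naive

# Toy data (hypothetical): 10 PSUs of 5 people with a shared PSU-level shift.
rng = np.random.default_rng(2)
psu = np.repeat(np.arange(10), 5)
y = rng.normal(size=10)[psu] + rng.normal(size=50)
ratio = meff_of_mean(y, psu)
```

A ratio above one indicates that the naive variance estimator understates the sampling variance; repeating the calculation as waves are added would give a series of the kind reported in Table 1.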

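TABLE 1. Meff estimates for longitudinal means

Meff                                                             Waves 12   12 and 13   12 to 14   12 to 15
$\mathrm{meff}_{str}[\hat\beta, v_{str}(\hat\beta)]$              0.971      0.965       0.965      0.963
$\mathrm{meff}_{clu}[\hat\beta, v_{clu}(\hat\beta)]$              1.490      1.653       1.699      1.695
$\mathrm{meff}_{both}[\hat\beta, v_{both}(\hat\beta)]$            1.8        1.431       1.474      1.458
$\mathrm{meff}_{str}[\hat\beta_{GEE}, v_{str}(\hat\beta_{GEE})]$   0.969      0.963       0.961      0.960
$\mathrm{meff}_{clu}[\hat\beta_{GEE}, v_{clu}(\hat\beta_{GEE})]$   1.57       1.795       1.830      1.870
$\mathrm{meff}_{both}[\hat\beta_{GEE}, v_{both}(\hat\beta_{GEE})]$ 1.343      1.504       1.575      1.653
$\mathrm{meff}_g$                                                 1.494      1.598       1.778      1.706

Subscripts str, clu and both denote variance estimators that ignore stratification, clustering, and both, respectively; $\hat\beta_{GEE}$ is the unweighted estimator.

Although these estimated meffs are subject to sampling error, there is a tendency for $\mathrm{meff}_{clu}$, $\mathrm{meff}_{both}$, and $\mathrm{meff}_g$ to increase with the number of waves. It therefore seems that it becomes more important to allow for clustering, and for the complex sampling design in general, when the number of waves in the analysis increases. Furthermore, stratification effects appear to be constant as the number of waves increases.

When we included educational level as a covariate, we also noticed some evidence for $\mathrm{meff}_{clu}$, $\mathrm{meff}_{both}$, and $\mathrm{meff}_g$ to increase with the number of waves. The model has been further elaborated by adding time, gender, age category, marital status, number of children in the household, social class, health status, and number of hours normally worked as covariates. Once more, we observed some evidence of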

a tendency for those meffs to diverge from one as the number of waves increases, at least for the coefficients of some of the covariates. We also confirmed the observation of Skinner and Vieira (2007) that meffs for regression coefficients tend not to be greater than meffs for the means of the dependent variable.

5 Simulation Study

As results reported in Section 4 are subject to sampling error, we have conducted a simulation study to evaluate the behaviour of the meff measures. Each of the $d = 1, \ldots, D$ replicate samples is based on the BHPS data subset described above, which is considered as the target population. We evaluated the properties of variance estimators for unweighted point estimators and assessed only different impacts of clustering. We studied the meff when the number of waves in the analysis is increased. Note that we did not assess the impact of either stratification or unequal probability sampling.

Let $y_{ait}$ be the value of the study variable for unit $i = 1, \ldots, n_d$, in PSU $a = 1, \ldots, m_d$, and at wave $t$ of the survey, where $n_d$ and $m_d$ are the sample size and the number of PSUs for the replicate sample $d$. For generating the values of $y_{ait}$ for the simulation study, we used the following uniform correlation model, which allows for the impact of clustering:

$$y_{ait} = x_{ait} \beta + \eta_a + u_{ai} + v_{ait}, \tag{3}$$

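with $\eta_a \sim N(0, \sigma_\eta^2)$, $u_{ai} \sim N(0, \sigma_u^2)$, and $v_{ait} \sim N(0, \sigma_v^2)$. We consider the logarithm of the household income as the dependent variable and the remaining variables listed in Section 2 as covariates. We have held the values of the covariates fixed. The adopted values for $\beta$, $\sigma_\eta^2$, $\sigma_u^2$, and $\sigma_v^2$ have been obtained by maximum likelihood estimation considering the target population. In particular, we have considered different realistic choices for $\sigma_\eta^2$: $\sigma_\eta^2 = 0.06$ (the actual value estimated from fitting (3)), $\sigma_\eta^2 = 0.1$, and $\sigma_\eta^2 = 0.18$, to enable the evaluation of the effects of different impacts of clustering on the considered variance estimation procedures.

Let

$$\hat E(\widehat{\mathrm{meff}}) = \frac{1}{D} \sum_{d=1}^{D} \widehat{\mathrm{meff}}^{(d)}$$

be the mean of our parameter of interest estimated over repeated simulation,

$$\mathrm{var}(\widehat{\mathrm{meff}}) = \frac{1}{D - 1} \sum_{d=1}^{D} \big[ \widehat{\mathrm{meff}}^{(d)} - \hat E(\widehat{\mathrm{meff}}) \big]^2$$

be a simulation estimator of $\mathrm{VAR}(\widehat{\mathrm{meff}})$, the population variance of the misspecification effect measure, and

$$\mathrm{se}[\hat E(\widehat{\mathrm{meff}})] = \sqrt{\mathrm{var}(\widehat{\mathrm{meff}}) / D}$$

the simulation standard error of $\hat E(\widehat{\mathrm{meff}})$.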
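The data-generating model (3) and the simulation summaries above can be sketched as follows. This is a minimal sketch under assumptions, not the authors' simulation code: the function names `simulate_panel` and `mc_summary` are hypothetical, the linear predictor is reduced to a constant `mu` (the $x_{it} = 1$ case), the variance components in the demo are illustrative values, and the replicate statistic is the sample mean standing in for an estimated meff.

```python
import numpy as np

def simulate_panel(rng, psu, T, mu, s2_eta, s2_u, s2_v):
    """Draw an (n, T) response array from y = mu + eta_a + u_i + v_it,
    i.e. model (3) with the linear predictor replaced by a constant mu."""
    psu = np.asarray(psu)
    n = len(psu)
    ids, idx = np.unique(psu, return_inverse=True)
    eta = rng.normal(0.0, np.sqrt(s2_eta), size=len(ids))[idx]  # PSU effect
    u = rng.normal(0.0, np.sqrt(s2_u), size=n)                  # individual effect
    v = rng.normal(0.0, np.sqrt(s2_v), size=(n, T))             # wave-level error
    return mu + eta[:, None] + u[:, None] + v

def mc_summary(values):
    """Simulation mean, variance (divisor D - 1) and standard error of the mean."""
    values = np.asarray(values, dtype=float)
    D = len(values)
    mean = values.mean()
    var = np.sum((values - mean) ** 2) / (D - 1)
    se = np.sqrt(var / D)
    return mean, var, se

# Demo with hypothetical settings: 20 PSUs of 5 people, 4 waves, 200 replicates.
rng = np.random.default_rng(3)
psu = np.repeat(np.arange(20), 5)
stats = [
    simulate_panel(rng, psu, T=4, mu=0.0, s2_eta=0.06, s2_u=0.1, s2_v=0.2).mean()
    for _ in range(200)
]
mean_hat, var_hat, se_hat = mc_summary(stats)
```

In a fuller version, each replicate would be passed through the point and variance estimators of Section 3 and the resulting meff, rather than the sample mean, would be summarised.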

For the models that have been fitted to each generated replicate sample, we have set $x_{it} = 1$, and therefore we have still studied only the behaviour of the meff for longitudinal means. Let $n_a$ be the sample size for PSU $a$ in the target population and $n_{ad}$ be the sample size for PSU $a$ in the replicate sample $d$. Table 2 presents results for three scenarios: (i) ($m_d = 00$, $n_{ad} = n_a$, and $\sigma = 0.35$); (ii) ($m_d = 00$, $n_{ad} = n_a$, and $\sigma = 0.70$); and (iii) ($m_d = 00$, $n_{ad} = n_a$, and $\sigma = 1.35$). Note that $m = 34$ in the target population.

TABLE 2. $\hat E(\widehat{\mathrm{meff}})$ and $\mathrm{se}[\hat E(\widehat{\mathrm{meff}})]$ (in brackets), for three scenarios.

$\sigma_\eta^2$   Waves 12          12 and 13         12 to 14          12 to 15
0.06              1.1901 (0.0044)   1.077 (0.0046)    1.115 (0.0047)    1.143 (0.0047)
0.1               1.766 (0.0054)    1.3014 (0.0057)   1.3106 (0.0058)   1.3157 (0.0058)
0.18              1.364 (0.0066)    1.3933 (0.0069)   1.4061 (0.0070)   1.4118 (0.0070)

D = 1000

The simulation results also give evidence that there is a tendency for the meff to increase as the number of waves in the analysis increases, at least for longitudinal means. This tendency seems to be stronger for larger clustering impacts. Meffs increase when the clustering impacts are increased, as expected from the survey sampling literature

(Vieira, 2009). Simulation standard errors of $\hat E(\widehat{\mathrm{meff}})$ appear to increase when the number of waves and the clustering impacts are increased.

6 Discussion

We have presented evidence that clustering impacts may be stronger for longitudinal studies than for cross-sectional studies, and that meffs for the regression coefficients may increase with the number of waves considered in the analysis. The main implication of these findings is that standard errors in the analysis of longitudinal survey data may be misleading if the initial sample was clustered and if this clustering is ignored. We have also observed that meffs for regression coefficients tend not to be greater than meffs for the means of the dependent variable.

Acknowledgments: The research of the first author was supported by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) grant CEX-APQ-00467-008. The research of the second author was supported by the Fundação para a Ciência e a Tecnologia grant PTDC/GES/7784/006.

References

Diggle, P. J., Heagerty, P., Liang, K. and Zeger, S. L. (2002) Analysis of Longitudinal Data. 2nd ed. Oxford: Oxford University Press.

Liang, K. and Zeger, S. L. (1986) Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73 (1), 13-22.

Salgueiro, M. F. R. F., Smith, P. W. F. and Vieira, M. D. T. (2010) A Multi-Process Second-Order Latent Growth Curve Model for Subjective Well-Being. Submitted to Multivariate Behavioral Research.

Skinner, C. J. (1989) Domain means, regression and multivariate analysis. In Skinner, C. J., Holt, D. and Smith, T. M. F. (eds) Analysis of Complex Surveys. Chichester: Wiley, pp. 59-87.

Skinner, C. J. and Holmes, D. (2003) Random Effects Models for Longitudinal Survey Data. In Chambers, R. L. and Skinner, C. J. (eds) Analysis of Survey Data. Chichester: Wiley.

Skinner, C. and Vieira, M. D. T. (2007) Variance estimation in the analysis of clustered longitudinal survey data. Survey Methodology, 33 (1), 3-12.

Vieira, M. D. T. (2009) Analysis of Longitudinal Survey Data. 1st ed. Saarbrücken: VDM Verlag Dr. Müller.

Vieira, M. D. T. and Skinner, C. J. (2008) Estimating Models for Panel Survey Data under Complex Sampling. Journal of Official Statistics, 24, 343-364.