Multilevel Analysis and Complex Surveys. Alan Hubbard UC Berkeley - Division of Biostatistics

1 Multilevel Analysis and Complex Surveys Alan Hubbard UC Berkeley - Division of Biostatistics 1

2 Outline
Multilevel data analysis:
  Estimating specific parameters of the data-generating distribution (GEE).
  Estimating the whole (latent variable) distribution (multilevel mixed models and MLE).
Complex surveys (estimation and inference).
Estimating multilevel mixed models with complex survey data.

3 Schedule
Beginning Time   Ending Time   Topic
8:00             9:15          Introduction/Overview, GEE
9:15             9:45          GEE Exercise
9:45             11:30         Multilevel Models
11:30                          MLM Exercise
1:00             2:00          Complex Survey
2:00             2:30          Survey Exercise
2:30             3:30          Combined
3:30             4:00          Combined Exercise
4:15             5:00          Causality Issues (Michael Oakes, Ecological Effects, etc.)

4 Multilevel Analysis and Complex Surveys Part 1: Parameters and inference from mixed models (MLE) and estimating equation (GEE) approaches Alan Hubbard UC Berkeley - Division of Biostatistics 4

5 Models For Multilevel Data: References
Analysis of Longitudinal Data, by Diggle, Liang and Zeger.
Applied Longitudinal Analysis, by Fitzmaurice, Laird and Ware.
Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models, by Skrondal, A. and Rabe-Hesketh, S.
To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health (with commentary and reply). Epidemiology, 21 (2010).

6 Generalized Estimation Equation (GEE) Approach to Clustered Data Alan Hubbard UC Berkeley - Division of Biostatistics 6

7 Clustered Data Regressions
Ignore clustering: ordinary regressions, assuming that outcomes are conditionally independent.
Multilevel (mixed effects) models: explicit model of sources of random variability at cluster level, $E(Y_{ijk} \mid X_{ijk}, \alpha_i, \alpha_{ij})$, $\alpha_i \sim N(0, \sigma^2_\alpha)$, ...
Generalized Estimating Equation (GEE) approach: only specify relatively simple parameters (e.g., $E(Y_{ijk} \mid X_{ijk})$).

8 Issues with Clustered Data (Estimation) Covariates of Interest and identifiability: higher level (ecological) vs. individual level covariates. Targeting contributions of both. Defining effect of interest (e.g., direct effect of ecological covariates apart from individual level covariates). Causal inference challenges with clustered data (can one ever measure impact of composite variables vs. contextual variables?). Much work on mechanical implementation, less on what are the appropriate parameters of interest and necessary (but sometimes dubious) identifiability assumptions (Oakes). 8

9 Issues with Clustered Data (Correlation)
Dealing with correlated data: general repeated measures issues.
Model based inference (inference based on proposed data-generating distribution).
Empirical inference: use form of estimating equation to get simple robust empirical variance of the sampling distribution:
$\hat\theta = \theta + \frac{1}{n} \sum_{i=1}^{n} IC(O_i; \theta, \gamma) + o_p(1/\sqrt{n})$, $var(\hat\theta) \approx var(IC)/n$
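As a minimal sketch of the empirical variance idea, assume the estimator is a simple mean over independent units (for example, one summary value per cluster), so that the influence curve is just $O_i - \theta$. The data values and the function name `ic_variance` are hypothetical and chosen only for illustration.

```python
import math

def ic_variance(ic_values):
    """Estimate var(theta_hat) as the sample variance of the IC divided by n."""
    n = len(ic_values)
    mean_ic = sum(ic_values) / n
    var_ic = sum((v - mean_ic) ** 2 for v in ic_values) / (n - 1)
    return var_ic / n

# Hypothetical data: one value per independent unit (e.g., a cluster-level mean).
obs = [2.0, 4.0, 4.0, 6.0, 9.0]
theta_hat = sum(obs) / len(obs)        # estimator: the sample mean
ic = [o - theta_hat for o in obs]      # influence curve of the mean: O_i - theta
se = math.sqrt(ic_variance(ic))        # empirical standard error
print(theta_hat, se)
```

For more complex estimators only the form of the influence curve changes; the variance step stays the same.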

10 Example: observations within subjects: The Effect of Drug and Alcohol Use on Teenage Sexual Activity Minnis & Padian (2001) conducted a longitudinal study of teenagers in San Rafael, California to investigate the association between drug and alcohol use and sexual activity on the same day. Participants were asked to keep track of their activities over approximately one month and binary indicator variables were created to show whether drug/alcohol use and/or sexual activity were reported for each 24 hour period. 10

11 Example of Binary Outcome: Sex, Drugs and Teenagers A longitudinal study of the effects of drug-use on sexual activity. Let X ij, the only explanatory variable of interest for now, indicate whether or not subject i reported drug-use (1=yes, 0=no) on day j. Let Y ij denote whether subject had sex (1=yes, 0=no), i.e., Y ij is a binary outcome and thus its expectation can be modeled via the logit transform. 11

12 Data
eid   today    drgalcoh   sx24hrs
      Jun 98   yes        no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   yes        no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   yes        no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         yes
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         yes
      Jun 98   no         no
      Jun 98   no         yes
      Jul 98   no         yes
      Jul 98   no         no
      Jul 98   no         no
      Jul 98   no         no
      Jul 98   no         no
      Jun 98   no         no
      Jun 98   no         no
      Jun 98   no         no

13 Sexual Activity and drug/alcohol use among teenagers revisited
Main Variables:
sx24hrs - sex in last 24 hrs. (0=no, 1=yes)
drgalcoh - drug or alcohol use in last 24 hrs.
tues-sun - dummy variables designating day of week

14 Random Effects Models
Uses a random effect to model the relative similarity of observations made on the same statistical unit (e.g., person).
Assumes $Y_{ij}$ and $Y_{ik}$, $j \neq k$, are independent given some realized value of a random effect ($\beta_{0i}$) and the covariates: $Y_{ij} \perp Y_{ik} \mid X_{ij}, \beta_{0i}$
The model assumes these random effects are randomly drawn from a known distribution.

15 Random Effects Model for Teenage Sex and Drug-Use
$logit[P(Y_{ij}=1 \mid \beta_{0i}, X_{ij}=x_{ij})] = \log \frac{P(Y_{ij}=1 \mid \beta_{0i}, X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid \beta_{0i}, X_{ij}=x_{ij})} = \beta_0^{RE} + \beta_{0i} + \beta_1^{RE} x_{ij}$
Assume that the repeated observations for the ith teenager are independent of one another given $\beta_{0i}$ and $X_{ij}$.
Must assume a parametric distribution for the $\beta_{0i}$, usually $\beta_{0i} \sim N(0, \tau^2)$.
$\exp(\beta_1^{RE})$ is the odds ratio for having sex when subject i reports drug-use relative to when the same subject does not report drug-use.

16 Motivation for This Approach Natural for modeling heterogeneity across individuals in their regression coefficients. This heterogeneity can be represented by a probability distribution Most useful when object is to make inferences about individuals rather than population averages. 16

17 Motivation for This Approach Also useful to estimate the contributions to variability from different sources (e.g., within and among individuals). Can be extended to hierarchy of units (multilevel modeling), such as repeated longitudinal measures of a person, within a household, within a community... 17

18 Some available software for random effects models
Linear Models:
  Proc Mixed in SAS
  xtreg in STATA (only simple random effects models)
  xtmixed in STATA 10
  lme in R
Logistic and Poisson Models:
  xtlogit and xtpoisson in STATA for simple random effects
  xtmelogit and xtmepoisson for general mixed models in STATA version 10
  gllamm for general mixed models is a STATA add-on

19 Random effects using xtlogit in STATA
. xtlogit sx24hrs drgalcoh, or i(eid) re
Random-effects logit                 Number of obs    = 1708
Group variable (i): eid              Number of groups = 109
Random effects u_i ~ Gaussian        Obs per group: min = 1
                                                    avg = 15.7
                                                    max = 33
                                     Wald chi2(1)     = 5.48
Log likelihood =                     Prob > chi2      =
sx24hrs    OR    Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh   ($\exp(\beta_1^{RE})$)
/lnsig2u
sigma_u    ($\tau$)
rho
Likelihood ratio test of rho=0: chibar2(01) =      Prob >= chibar2 =

20 Estimation of Marginal Models (GEE)
Estimate marginal mean model.
Marginal model is a population, not individual, model.
The marginal $E[Y_{ij} \mid X_{ij} = x_{ij}]$ is defined as the mean value of an observation $Y_{ij}$ in the theoretical experiment where one randomly draws an observation from a population where everyone has $X_{ij} = x_{ij}$.

21 Marginal Models (GEE)
For instance, let $Y_{ij}$ be cholesterol and $X_{ij}$ = yes if the person smokes, no otherwise.
In a marginal model, $E[Y_{ij} \mid X_{ij} = \text{yes}]$ is the mean of a randomly drawn $Y_{ij}$ from the subpopulation where everyone smokes.

22 Parameter Interpretation in a marginal model
Parameters in an equivalent random effects and GEE model have subtly different interpretations.
Coefficients in a random effects model represent expected differences (odds ratios, relative risks, etc.) within an individual, given a change in their X from one value to another.
Coefficients in a marginal model represent expected differences (odds ratios, relative risks, etc.) within a population, given a change in everyone's X from one value to another.

23 Parameter Interpretation in a GEE model, cont.
In linear models, the random effects and marginal regression parameters are the same; in log-linear models with only a random intercept, the slope coefficients agree and only the intercept shifts.
In logistic regression, they are different (more later).
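To make the logistic case concrete, here is a small numerical sketch, assuming a random-intercept logistic model with a normal random effect. The parameter values (`beta0`, `beta1`, `tau`) are illustrative assumptions, not estimates from the teenage sex study. The marginal probabilities are obtained by averaging the subject-specific probabilities over the random-intercept distribution with a simple midpoint rule.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(linear_predictor, tau, steps=4000):
    """Average expit(lp + b) over b ~ N(0, tau^2), midpoint rule on [-8*tau, 8*tau]."""
    lo, hi = -8.0 * tau, 8.0 * tau
    width = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        b = lo + (s + 0.5) * width
        density = math.exp(-0.5 * (b / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        total += expit(linear_predictor + b) * density * width
    return total

def odds(p):
    return p / (1.0 - p)

beta0, beta1, tau = -2.0, 1.0, 2.0       # illustrative values (assumptions)
conditional_or = math.exp(beta1)         # subject-specific (random effects) odds ratio
p0 = marginal_prob(beta0, tau)           # population proportion when x = 0
p1 = marginal_prob(beta0 + beta1, tau)   # population proportion when x = 1
marginal_or = odds(p1) / odds(p0)        # population-averaged odds ratio
print(conditional_or, marginal_or)
```

Under these assumptions the population-averaged odds ratio is closer to 1 than the subject-specific one, and the gap grows with `tau`.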

25 Marginal Models (GEE) GEE software typically allows several different working correlation models (e.g., exchangeable, auto-regressive, unstructured, etc.). These correlation models are used to build weight matrices, which are used in a weighted regression. When deriving inferences for the coefficients, though, it calculates robust standard errors. 25

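26 Examples of Correlation Models
$V = \sigma^2 \begin{pmatrix} R_0 & 0 & \cdots & 0 \\ 0 & R_0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_0 \end{pmatrix}$
Each individual is independent of all others.
Correlation within individuals across longitudinal observations has the same structure.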

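27 Structure for $R_0$
General structure:
$R_0 = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \rho_{23} & \cdots & \rho_{2n} \\ \rho_{13} & \rho_{23} & 1 & \cdots & \rho_{3n} \\ \vdots & & & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \rho_{3n} & \cdots & 1 \end{pmatrix}$
A lot of unknown parameters.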

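28 Correlation Models (contd): Uniform correlation (compound symmetry or exchangeable)
$R_0 = \begin{pmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \rho & \rho & 1 & \cdots & \rho \\ \vdots & & & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{pmatrix}$
Arises from the random effects model $Y_{ij} = \alpha + \alpha_i + \beta x_{ij} + e_{ij}$
Errors $e_{ij}$ uncorrelated, and independent of $\alpha_i$ and $x_{ij}$.
$\rho = \frac{Var(\alpha_i)}{Var(\alpha_i) + Var(e_{ij})}$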
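The link between the variance components and the exchangeable correlation can be sketched as follows. The variance values are hypothetical, and the helper names `icc` and `exchangeable` are introduced only for this illustration.

```python
def icc(var_alpha, var_e):
    """Correlation implied by a random-intercept model: Var(alpha)/(Var(alpha)+Var(e))."""
    return var_alpha / (var_alpha + var_e)

def exchangeable(n, rho):
    """n x n compound-symmetry correlation matrix: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

rho = icc(var_alpha=3.0, var_e=1.0)   # hypothetical variance components
R0 = exchangeable(3, rho)
for row in R0:
    print(row)
```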

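29 Correlation Models (contd): Time-decaying Correlations (Auto-regressive)
$R_0 = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & & & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}$
Auto-regressive: $e_{ij} = \rho e_{i,j-1} + \eta_{ij}$
Not great for unequally spaced longitudinal data.
Exponential correlation model generalizes this to $corr(y_{ij}, y_{ik}) = \rho^{|t_j - t_k|}$ rather than $\rho^{|j-k|}$.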
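A short sketch of the two time-decaying structures, assuming hypothetical measurement times and $\rho = 0.5$. The function names `ar1` and `exponential_corr` are illustrative only.

```python
def ar1(n, rho):
    """AR(1) correlation: rho ** |j - k|, which depends on observation order only."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def exponential_corr(times, rho):
    """Exponential correlation: rho ** |t_j - t_k|, which depends on actual times."""
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]

R_ar1 = ar1(4, 0.5)
R_exp = exponential_corr([0.0, 1.0, 3.0], 0.5)   # unequally spaced visits (hypothetical)
print(R_ar1[0])
print(R_exp[0])
```

With equally spaced integer times the two functions give the same matrix; they differ only when the spacing is irregular.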

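30 Examples of var-cov. models
Description (Abbrev.): Var-Cov. Matrix
Compound Symmetry (CS): $\begin{pmatrix} \sigma^2+\sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma^2+\sigma_0^2 & \sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma_0^2 & \sigma^2+\sigma_0^2 & \sigma_0^2 \\ \sigma_0^2 & \sigma_0^2 & \sigma_0^2 & \sigma^2+\sigma_0^2 \end{pmatrix}$
Unstructured (UN): $\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{pmatrix}$
Autoregressive (AR(1)): $\begin{pmatrix} \sigma^2 & \rho\sigma^2 & \rho^2\sigma^2 & \rho^3\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 & \rho^2\sigma^2 \\ \rho^2\sigma^2 & \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\ \rho^3\sigma^2 & \rho^2\sigma^2 & \rho\sigma^2 & \sigma^2 \end{pmatrix}$
Banded Diagonal (UN(1)): $\begin{pmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{pmatrix}$
Spatial Power (SP(POW)(c)): $\begin{pmatrix} \sigma^2 & \sigma^2\rho^{d_{12}} & \sigma^2\rho^{d_{13}} & \sigma^2\rho^{d_{14}} \\ \sigma^2\rho^{d_{12}} & \sigma^2 & \sigma^2\rho^{d_{23}} & \sigma^2\rho^{d_{24}} \\ \sigma^2\rho^{d_{13}} & \sigma^2\rho^{d_{23}} & \sigma^2 & \sigma^2\rho^{d_{34}} \\ \sigma^2\rho^{d_{14}} & \sigma^2\rho^{d_{24}} & \sigma^2\rho^{d_{34}} & \sigma^2 \end{pmatrix}$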

31 The GEE Algorithm
Algorithm is similar to the one used for the non-repeated measures problems (e.g., OLS for continuous data, logistic regression for binary and Poisson regression for counts).
Let $R(\alpha)$ be an $n_i \times n_i$ "working" correlation matrix that is fully characterized by a vector of parameters, $\alpha$.
$V_i$ is again the variance-covariance of the observations, which will be a function of the mean ($E(Y_i \mid X_i)$), a scale parameter, $\phi$, and $R(\alpha)$.

32 Standard Errors of Coefficients GEE will normally return two estimates of the variance of the coefficient estimates, 1) naive and 2) robust. Naive assumes that the chosen model for R(α), such as compound symmetry, is correct. Robust is a more nonparametric estimate that does not assume your guess for R(α) is correct. However, its variance estimates can be more variable. 32
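The difference between the two variance estimates can be sketched for the simplest possible case: an intercept-only linear model fit under working independence, so the coefficient is just the pooled mean. This is an illustrative sketch with hypothetical data and no finite-sample correction, so it is not meant to reproduce the exact numbers a package such as STATA would print.

```python
import math

def naive_and_robust_var(clusters):
    """Variance of the pooled mean under working independence.

    naive : assumes all observations are independent (sigma^2 / N).
    robust: sandwich form, summing residuals within each cluster before squaring.
    """
    all_obs = [y for c in clusters for y in c]
    n_total = len(all_obs)
    mean = sum(all_obs) / n_total
    resid_ss = sum((y - mean) ** 2 for y in all_obs)
    naive_var = (resid_ss / (n_total - 1)) / n_total
    cluster_sums = [sum(y - mean for y in c) for c in clusters]
    robust_var = sum(s ** 2 for s in cluster_sums) / n_total ** 2
    return mean, naive_var, robust_var

# Hypothetical data: two subjects whose repeated measures are strongly clustered.
clusters = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
mean, naive_var, robust_var = naive_and_robust_var(clusters)
print(mean, math.sqrt(naive_var), math.sqrt(robust_var))
```

Because residuals share a sign within each subject here, the robust variance is larger than the naive one, which is the typical pattern for a covariate that is constant within cluster.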

33 GEE Marginal Model for Teenage Sex and Drug-Use
$logit[P(Y_{ij}=1 \mid X_{ij}=x_{ij})] = \log \frac{\mu_{ij}}{1-\mu_{ij}} = \log \frac{P(Y_{ij}=1 \mid X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid X_{ij}=x_{ij})} = \beta_0^M + \beta_1^M x_{ij}$
$var(Y_{ij}) = \mu_{ij}(1-\mu_{ij})$*, $corr(Y_{ij}, Y_{ik}) = \rho$ (i.e., assume compound symmetry).
$\exp(\beta_1^M)$ is a ratio of population frequencies, i.e., it is a population averaged parameter. It is the odds ratio of the probabilities (proportions) of teenagers who would engage in sexual activity in populations reporting drug use vs. populations not reporting drug-use.
* Semi-robust inference: can you tell why?

34 Sexual Activity and drug/alcohol use among teenagers revisited
Main Variables:
sx24hrs - sex in last 24 hrs. (0=no, 1=yes)
drgalcoh - drug or alcohol use in last 24 hrs.
tues-sun - dummy variables designating day of week

35 Results using xtgee in STATA
Robust SE:
. xtgee sx24hrs drgalcoh, eform i(id) family(binomial) cor(ind) robust
GEE population-averaged model        Number of obs    = 1708
Group variable: id                   Number of groups = 109
Link: logit                          Obs per group: min = 1
Family: binomial                                    avg = 15.7
Correlation: independent                            max = 33
(standard errors adjusted for clustering on id)
                        Semi-robust
sx24hrs    Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh   ($\exp(\beta_1^M)$)
Non-robust (naive) SE:
. xtgee sx24hrs drgalcoh, eform i(eid) family(binomial) cor(ind)
sx24hrs    Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh

36 xtgee Options
family(?), link(?) -- specify the outcome distribution and link function, e.g., linear regression with a continuous outcome (as compared to, say, binary outcomes; more later)
corr(ind) -- specifies that we will assume independence for our working correlation structure (some other possibilities include exchangeable and autoregressive structures)
i(?) -- identifies which variable identifies the individual (or cluster)
robust (abbreviated ro) -- specifies that we wish robust estimates of variability

37 Model 2: same marginal model, different working correlation
$logit[P(Y_{ij}=1 \mid X_{ij}=x_{ij})] = \log \frac{\mu_{ij}}{1-\mu_{ij}} = \log \frac{P(Y_{ij}=1 \mid X_{ij}=x_{ij})}{P(Y_{ij}=0 \mid X_{ij}=x_{ij})} = \beta_0^M + \beta_1^M x_{ij}$
$x_{ij}$ = 0 if drug/alcohol use is no, 1 if yes
$y_{ij}$ = 0 if no sex in last 24 hours, 1 if yes
$corr(y_{ij}, y_{ij'}) = \rho$, $j \neq j'$ (compound symmetry or exchangeable correlation structure)

38 Results of Model 2 using STATA
Robust SE:
. xtgee sx24hrs drgalcoh, eform i(id) family(binomial) cor(exc) robust
GEE population-averaged model        Number of obs    = 1708
Group variable: id                   Number of groups = 109
Link: logit                          Obs per group: min = 1
Family: binomial                                    avg = 15.7
Correlation: exchangeable                           max = 33
(standard errors adjusted for clustering on id)
                        Semi-robust
sx24hrs    Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh
Non-robust (naive) SE:
. xtgee sx24hrs drgalcoh, eform i(eid) family(binomial) cor(exc)
sx24hrs    Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh

39 Estimated Working Correlation
. xtcorr
[Output: estimated within-subject working correlation matrix, rows r1, r2, ... by columns c1 to c9.]

40 Model 3: adjusting for day of week
$logit[P(Y_{ij}=1 \mid x_{ij}, day_{ij})] = \beta_0 + \beta_1 x_{ij} + \gamma_1 z_{1ij} + \gamma_2 z_{2ij} + \cdots + \gamma_6 z_{6ij}$
$x_{ij}$ = 1 if drug/alcohol use is yes, 0 if no
$z_{1ij}$ = 1 if interview day is Tuesday, 0 if not
$z_{2ij}$ = 1 if interview day is Wed., 0 if not
...
$z_{6ij}$ = 1 if interview day is Sunday, 0 if not
$y_{ij}$ = 1 if sex in last 24 hours, 0 if no
$corr(y_{ij}, y_{ij'}) = \rho$, $j \neq j'$ (compound symmetry or exchangeable correlation structure)

41 Results of Model 3 using STATA
. xtgee sx24hrs drgalcoh tues wed thur fri sat sun, eform i(id) family(binomial) cor(exc) robust
GEE population-averaged model        Number of obs    = 1708
Group variable: id                   Number of groups = 109
Link: logit                          Obs per group: min = 1
Family: binomial                                    avg = 15.7
Correlation: exchangeable                           max = 33
                                     Wald chi2(7)     =
Scale parameter: 1                   Prob > chi2      =
(standard errors adjusted for clustering on id)
                        Semi-robust
sx24hrs    Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
drgalcoh
tues
wed
thur
fri
sat
sun

42 Model for drug/alcohol use vs. day of week
$logit[P(X_{ij}=1 \mid day_{ij})] = \gamma_0^* + \gamma_1^* z_{1ij} + \gamma_2^* z_{2ij} + \cdots + \gamma_6^* z_{6ij}$
$X_{ij}$ = 1 if drug/alcohol use is yes, 0 if no
$z_{1ij}$ = 1 if interview day is Tuesday, 0 if not
$z_{2ij}$ = 1 if interview day is Wed., 0 if not
...
$z_{6ij}$ = 1 if interview day is Sunday, 0 if not
$corr(x_{ij}, x_{ij'}) = \rho$, $j \neq j'$ (compound symmetry or exchangeable correlation structure)

43 Results of drug/alcohol use Model using STATA
. xtgee drgalcoh tues wed thur fri sat sun, eform i(id) family(binomial) cor(exc) robust
GEE population-averaged model        Number of obs    = 1708
Group variable: id                   Number of groups = 109
Link: logit                          Obs per group: min = 1
Family: binomial                                    avg = 15.7
Correlation: exchangeable                           max = 33
                                     Wald chi2(6)     =
Scale parameter: 1                   Prob > chi2      =
(standard errors adjusted for clustering on id)
                        Semi-robust
drgalcoh   Odds Ratio   Std. Err.    z    P>|z|    [95% Conf. Interval]
tues
wed
thur
fri
sat
sun

44 Covariate and Cluster size issues We examine a simple example to look at how estimation and inference with clustered data are impacted by various changes in the data distribution. Cluster constant (e.g., county level) versus cluster varying (e.g., individual-level) covariates. Balanced versus unbalanced data (number of subunits within clusters). 44

45 Longitudinal Data on HIV+ patients
Deeks, et al. (1999) report the results from a longitudinal study of HIV-infected adults undergoing Highly Active Anti-Retroviral Therapy (HAART) at San Francisco General Hospital (SFGH).
Patients were included in this analysis if they received at least 16 weeks of continuous therapy with an anti-retroviral regimen.
The following data were obtained during the initial review: date of birth, sex and length of previous exposure to each individual anti-retroviral agent.

46 Once patients were identified, their medical records were reviewed every 3-4 months until November
Plasma HIV RNA assays were performed using a branched DNA (bDNA) assay.
Repeated and irregular measurements of CD4 and viral load (time-structured repeated measures).
Data not always matched in time.
Goal is to find how CD4 varies with viral load and how this pattern varies in the population.

47 Sample of HIV+ Data 47

48 CD4 versus Time
[Figure: CD4 versus etime for 30 evenly spaced subjects ranked by slope (CD4 vs. T).]

49 HIV+ (CD4 Count) Data some simple analyses using only 2 observations per person Purpose is to illustrate the effects on estimates and inference of both different working correlation matrices and robust vs. naive inference: Consider two scenarios: baseline (time-independent) covariate, time-dependent covariate. 49

50 Association of Baseline Covariate (Age) on CD4 count
Binary age ($X_{ij}$) = 0 (<40) or 1 (>40)
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_i] = \beta_0 + \beta_1 x_i$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D

51 Association of Baseline Covariate (Age) on CD4 count
Model A
. xtgee cd4 binage, i(id) cor(ind)
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
binage
_cons
Model B
. xtgee cd4 binage, i(id) cor(ind) robust
(standard errors adjusted for clustering on id)
                 Semi-robust
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
binage
_cons

52 Association of Baseline Covariate (Age) on CD4 count
Model C
. xtgee cd4 binage, i(id) cor(exc)
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
binage
_cons
Model D
. xtgee cd4 binage, i(id) cor(exc) robust
(standard errors adjusted for clustering on id)
                 Semi-robust
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
binage
_cons

53 Summary of Results of Association of Baseline Covariate (Age) on CD4 count
$\beta_0$ (SE)     Naive     Robust
Unweighted OLS     (9.9)     (12.6)
Weighted LS        (13.3)    (12.6)
$\beta_1$ (SE)     Naive     Robust
Unweighted OLS     (14.2)    (19.3)
Weighted LS        (19.2)    (19.3)

54 Association of Time (within cluster) Varying Covariate (Viral Load) on CD4 count
Binary VL: $X_{ij}$ = 0 (<2000) or 1 (>2000); all subjects included have one low and one high VL.
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{ij}$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D

55 Association of Within-Cluster-Varying Covariate (VL) on CD4 count
Model A
. xtgee cd4 medvl, i(id) cor(ind)
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
medvl
_cons
Model B
. xtgee cd4 medvl, i(id) cor(ind) robust
(standard errors adjusted for clustering on id)
                 Semi-robust
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
medvl
_cons

56 Association of Within-Cluster-Varying Covariate (VL) on CD4 count
Model C
. xtgee cd4 medvl, i(id) cor(exc)
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
medvl
_cons
Model D
. xtgee cd4 medvl, i(id) cor(exc) robust
(standard errors adjusted for clustering on id)
                 Semi-robust
cd4      Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
medvl
_cons

57 Robust Equivalent to Paired T-test
Paired t-test:
. keep id cd4 medvl etime
. sort cd4 medvl
. reshape wide cd4 etime, i(id) j(medvl)
. ttest cd40= cd41
Paired t test
Variable   Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
cd40
cd41
diff
Ho: mean(cd40 - cd41) = mean(diff) = 0
Ha: mean(diff) < 0      Ha: mean(diff) != 0      Ha: mean(diff) > 0
t =                     t =                      t =
P < t =                 P > |t| =                P > t =

58 Summary of Results of Association of Time Varying Covariate (VL) on CD4 count
$\beta_0$ (SE)        Naive          Robust
Unweighted OLS        377.4 (21.4)   377.4 (22.9)
Weighted LS           377.4 (21.4)   377.4 (22.9)
$\beta_1$ (SE)        Naive          Robust
Unweighted OLS        -98.3 (30.3)   -98.3 (16.5)
Weighted LS           -98.3 (16.4)   -98.3 (16.5)
t-test (difference)                  -98.3 (16.5)

59 Multiple and varying observations per person
CD4 (Y) vs. continuous (log) Viral Load (X)
$E[Y_{ij} \mid X_{i1} = x_{i1}, X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{i1} + \beta_2 (x_{ij} - x_{i1})$
$\beta_2$ represents the expected change in Y given a change in $X_{ij}$ relative to the baseline value ($X_{i1}$): the longitudinal effect.
$\beta_1$ represents the expected difference in average Y across two sub-populations that differ by their baseline values, $X_{i1}$: the cross-sectional effect.
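The two regressors in this model can be built from long-format data as sketched below. The names `logvlbase`, `logvlchange`, `id` and `etime` follow the variable names used in the slides; the input column `logvl`, the helper name and the data values are assumptions made for this illustration, with the baseline taken as the earliest observation per person.

```python
def add_baseline_and_change(rows):
    """Add 'logvlbase' (first observed logvl per id) and
    'logvlchange' (logvl minus that baseline) to each row."""
    ordered = sorted(rows, key=lambda r: (r["id"], r["etime"]))
    baseline = {}
    out = []
    for r in ordered:
        baseline.setdefault(r["id"], r["logvl"])   # first row per id is the baseline
        base = baseline[r["id"]]
        out.append({**r, "logvlbase": base, "logvlchange": r["logvl"] - base})
    return out

# Hypothetical long-format records (one row per person-visit).
rows = [
    {"id": 1, "etime": 90, "logvl": 3.5},
    {"id": 1, "etime": 0, "logvl": 4.0},
    {"id": 2, "etime": 0, "logvl": 5.0},
]
long_data = add_baseline_and_change(rows)
for r in long_data:
    print(r)
```

Regressing CD4 on `logvlbase` and `logvlchange` then separates the cross-sectional coefficient from the longitudinal one.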

60 Association of Within-Cluster-Varying Covariate (VL) on CD4 count: multiple observations per person
Model A
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind)
GEE population-averaged model        Number of obs    = 7053
Group variable: id                   Number of groups = 406
Link: identity                       Obs per group: min = 1
Family: Gaussian                                    avg = 17.4
Correlation: independent                            max =
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons

61 Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
Model B
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind) robust
GEE population-averaged model        Number of obs    = 7053
Group variable: id                   Number of groups = 406
Link: identity                       Obs per group: min = 1
Family: Gaussian                                    avg = 17.4
Correlation: independent                            max = 58
                                     Wald chi2(2)     =
(standard errors adjusted for clustering on id)
                      Semi-robust
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons

62 Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
Model C
. xtgee cd4 logvlbase logvlchange, i(id) cor(exc)
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons
Model D
(standard errors adjusted for clustering on id)
                      Semi-robust
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons

63 Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
$\beta_0$ (SE)     Naive          Robust
Unweighted OLS     618.9 (11.6)   618.9 (35.2)
Weighted LS        509.1 (31.2)   509.1 (32.9)
$\beta_1$ (SE)     Naive          Robust
Unweighted OLS     -83.7 (3.0)    -83.7 (8.3)
Weighted LS        -52.7 (7.4)    -52.7 (7.7)
$\beta_2$ (SE)     Naive          Robust
Unweighted OLS     -99.2 (2.4)    -99.2 (6.8)
Weighted LS        -54.7 (2.2)    -54.7 (3.2)

64 MultiLevel Models Alan Hubbard UC Berkeley - Division of Biostatistics 64

65 Many Names for Variants of the Same Statistical Model
Hierarchical Linear Models (HLMs)
Random Coefficient Models
Mixed Models (most general)
Multilevel Models (MLMs)
Nested modeling

66 Typical Data Structures for MLMs
Distinctive feature is the hierarchical nature of statistical units, e.g.:
  Neighborhoods, people within neighborhoods, measurements made over time on the people.
  Classrooms, students within classrooms, measurements made over time on the students.
Different sources of variation:
  Between classrooms
  Between students within classrooms
  Within students

67 Motivation for using MLMs (mixed models)
Procedure estimates the fixed effects of interest.
Dissects the sources of variation.
Accounts for residual correlation among statistically dependent units when deriving inference.
Permits one to specify a rich set of correlation models and allows for heteroskedasticity.

68 Motivation for using MLMs (mixed models)
It allows different subjects to have different responses to a treatment, risk variable, etc., and thus has intuitive appeal.
Rarely interesting, but it can also provide post-estimation estimates of the random effects.
You get the entire data-generating distribution:
  Use the virtues of having a likelihood.
  Can simulate data from the resulting parameter estimates.
  In contrast with other approaches that only target a specific aspect of the data-generating distribution.

69 What's Being Mixed?
A mixed model has two types of effects, fixed and random.
A fixed effect means that all levels of the variable are contained in the data and the effect is universal to all in the target population.
A random effect means that the levels (effects) of the variable comprise random samples of the levels (effects) in the target population.
Consider a risk factor effect. Fixed, Random, Both?

70 The Simplest Example
The Model: $Y_{ij} = \mu + \alpha_i + e_{ij}$
$E(\alpha_i) = 0$, $E(e_{ij}) = 0$, $E[\alpha_i e_{ij}] = 0$.
$Var(\alpha_i) = \sigma^2_\alpha$. $Var(e_{ij}) = \sigma^2_e$.
More specifically, $\alpha_i \sim N(0, \sigma^2_\alpha)$, $e_{ij} \sim N(0, \sigma^2_e)$.
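For intuition about how this model splits variability into between-unit and within-unit parts, here is a sketch that uses the classical one-way ANOVA (method-of-moments) estimators for a balanced design. This is a swapped-in, simpler technique, not the likelihood-based (ML or REML) fitting used by the mixed model software discussed in these slides, and the data are hypothetical.

```python
def variance_components(groups):
    """Method-of-moments estimates for Y_ij = mu + alpha_i + e_ij, balanced groups."""
    k = len(groups)                      # number of units i
    n = len(groups[0])                   # observations per unit (balanced)
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)          # between mean square
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    var_e = msw                          # estimate of sigma^2_e
    var_alpha = max((msb - msw) / n, 0.0)  # estimate of sigma^2_alpha, truncated at 0
    return grand, var_alpha, var_e

groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]   # hypothetical data
mu_hat, var_alpha, var_e = variance_components(groups)
rho_hat = var_alpha / (var_alpha + var_e)   # implied within-unit correlation
print(mu_hat, var_alpha, var_e, rho_hat)
```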

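71 Likelihood
Let $\phi(\cdot\,; \sigma^2)$ denote the $N(0, \sigma^2)$ density. Given $\alpha_i$:
$f(Y_{ij} \mid \alpha_i) = \phi(Y_{ij} - \alpha_i - \mu;\, \sigma^2_e)$, $f(Y_i \mid \alpha_i) = \prod_{j=1}^{n_i} \phi(Y_{ij} - \alpha_i - \mu;\, \sigma^2_e)$
Likelihood of observed data (for one unit) is:
$f(Y_i) = \int f(Y_i \mid \alpha) f(\alpha)\, d\alpha = \int \prod_{j=1}^{n_i} \phi(Y_{ij} - \alpha - \mu;\, \sigma^2_e)\, \phi(\alpha;\, \sigma^2_\alpha)\, d\alpha$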

72 Estimation of fixed effects using mixed models
Random effects models imply certain variance-covariance structures.
For instance, a simple random effects model results in equal correlation (exchangeable or compound symmetry) among all observations measured on the same subject.
We know that if the variance-covariance matrix (V) is known, then the most efficient estimate of the coefficients is weighted least squares:
$\hat\beta = (X^T W X)^{-1} X^T W Y$, where $W = V^{-1}$.
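For an intercept-only model with exchangeable correlation, the weighted least squares formula reduces to a weighted average of cluster means, which makes the role of the weights easy to see. The sketch below assumes a known correlation `rho` and hypothetical, unbalanced data; the function name is illustrative.

```python
def gls_mean_exchangeable(clusters, rho):
    """Weighted least squares estimate of a common mean when each V_i is
    exchangeable with correlation rho (the common sigma^2 cancels)."""
    num = 0.0
    den = 0.0
    for y in clusters:
        n_i = len(y)
        w_i = n_i / (1.0 + (n_i - 1) * rho)   # 1' V_i^{-1} 1 up to the factor 1/sigma^2
        num += w_i * (sum(y) / n_i)
        den += w_i
    return num / den

# Hypothetical unbalanced data: one subject with 1 observation, one with 4.
clusters = [[10.0], [2.0, 2.0, 2.0, 2.0]]
ols = gls_mean_exchangeable(clusters, 0.0)   # rho = 0: plain pooled mean (OLS)
wls = gls_mean_exchangeable(clusters, 0.5)   # rho = 0.5: large cluster is down-weighted
print(ols, wls)
```

With positive correlation, extra observations from the same subject carry less new information, so the weighted estimate moves toward an average of cluster means. This is one reason weighted and unweighted estimates can differ when cluster sizes vary.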

73 Estimation of coefficients using mixed models, cont.
The Mixed Model procedure works by:
  converting the random effects model into its implied variance-covariance matrix, V;
  starting with the independence model (OLS), getting residuals and then estimating V based on this model;
  creating the weight matrix as $\hat W = \hat V^{-1}$;
  doing weighted least squares and getting residuals;
  repeating until convergence.
The SEs the procedure returns come from: $\widehat{var}(\hat\beta) = (X^T \hat W X)^{-1} = (X^T \hat V^{-1} X)^{-1}$

74 Model Based Inference
When deriving the inference on coefficients, the estimating procedure assumes that the variance-covariance model of the outcome implied by the model IS CORRECT (i.e., its $SE(\hat\beta)$ is always naive, not robust).

75 Virtues of MultiLevel Models Diez-Roux 75

76 Provides Road Map for Accounting for Systematic and Random Variation at Various Levels (individuals, counties, states, ...) Diez-Roux

77 Stage 2 Diez-Roux 77

78 Put it together, just a mixed model Diez-Roux 78

79 Random Intercepts and Random Associations
The Model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) x_{ij} + e_{ij}$
$E(\beta_{0i}) = 0$, $E(\beta_{1i}) = 0$, $E(e_{ij}) = 0$.
$Var(\beta_{0i}) = \sigma^2_0$, $Var(\beta_{1i}) = \sigma^2_1$, $Var(e_{ij}) = \sigma^2$
$cov(\beta_{0i}, \beta_{1i}) = \sigma_{12}$, $cov(\beta_{0i}, e_{ij}) = 0$, $cov(\beta_{1i}, e_{ij}) = 0$.
What are the fixed and random effects in this model?

80 Simple Example (Individual is Cluster) Orthodontic study (Potthoff and Roy; 1964) 16 boys and 11 girls between the ages of 8 and 14 years Response variable is the distance (in millimeters) between the pituitary and the pterygomaxillary fissure. 80

81 Dental Data obsno child age distance gender

82 Dental Data
[Figure: distance versus age (yrs).]

83 Mixed Model I for Dental Data
Model: $Y_{ij} = (\beta_0 + \beta_{0i}) + \beta_1 x_{ij} + e_{ij}$
where $x_{ij}$ is the jth age of the ith child and $Y_{ij}$ is the distance.
$\beta_{0i}$ i.i.d. $N(0, \sigma^2_0)$, $e_{ij}$ i.i.d. $N(0, \sigma^2)$

84 Why is this model called MultiLevel?
Can write the model in two steps (as two levels): $Y_{ij} = \beta^*_{0i} + \beta_1 x_{ij} + e_{ij}$
then model for individual coefficients: $\beta^*_{0i} = \beta_0 + \beta_{0i}$
Implies that each individual has their own random intercept, $\beta^*_{0i}$, drawn from a population with mean intercept $\beta_0$, but all subjects have the same slope, $\beta_1$.
Can also have $\beta^*_{0i}$ be a function of, say, baseline covariates, z: $\beta^*_{0i}(z) = \beta_0 + \beta_{0i} + \alpha_1 z_1 + \alpha_2 z_2$

85 Mixed (MultiLevel) Model II for Dental Data
The Model (called a random coefficients model): $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) x_{ij} + e_{ij}$
$E(\beta_{0i}) = 0$, $E(\beta_{1i}) = 0$, $E(e_{ij}) = 0$.
$Var(\beta_{0i}) = \sigma^2_0$, $Var(\beta_{1i}) = \sigma^2_1$, $Var(e_{ij}) = \sigma^2$
$cov(\beta_{0i}, \beta_{1i}) = \sigma_{12}$, $cov(\beta_{0i}, e_{ij}) = cov(\beta_{1i}, e_{ij}) = 0$.

86 STATA for Model II: xtmixed
. xtmixed distance age || child: age, cov(uns)
Mixed-effects REML regression        Number of obs    = 108
Group variable: child                Number of groups = 27
                                     Obs per group: min = 4
                                                    avg = 4.0
                                                    max = 4
                                     Wald chi2(1)     =
Log restricted-likelihood =          Prob > chi2      =
distance   Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
age
_cons

87 STATA for Model II: xtmixed Variance Components Estimates
Random-effects Parameters   Estimate   Std. Err.   [95% Conf. Interval]
child: Unstructured
  sd(age)            ($\sigma_1$)
  sd(_cons)          ($\sigma_0$)
  corr(age,_cons)    ($\sigma_{12}$, reported as a correlation)
  sd(residual)
LR test vs. linear regression: chi2(3) =      Prob > chi2 =
Note: LR test is conservative and provided only for reference

88 STATA for Model II: xtmixed random coefficient estimates
. predict b*, reffects
. list b*
b1 ($\beta_{1i}$)   b2 ($\beta_{0i}$)

89 Summary of Results of Association of Baseline Covariate (Age) on CD4 count
$\beta_0$ (SE)     Naive     Robust
Unweighted OLS     (9.9)     (12.6)
Weighted LS        (13.3)    (12.6)
$\beta_1$ (SE)     Naive     Robust
Unweighted OLS     (14.2)    (19.3)
Weighted LS        (19.2)    (19.3)

90 Re-visit Model of CD4 vs. Baseline Age
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{ij} + e_{ij}$
. xtreg cd4 binage, i(id) re
Random-effects GLS regression        Number of obs    = 594
Group variable (i): id               Number of groups = 297
R-sq: within  =                      Obs per group: min = 2
      between =                                     avg = 2.0
      overall =                                     max = 2
Random effects u_i ~ Gaussian        Wald chi2(1)     = 1.59
corr(u_i, X) = 0 (assumed)           Prob > chi2      =
cd4       Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
binage    (estimate of $\beta_1$)
_cons     (estimate of $\beta_0$)
sigma_u   (estimate of $\sigma_0$, reported as a standard deviation)
sigma_e   (estimate of $\sigma$, reported as a standard deviation)
rho       (fraction of variance due to u_i)

91 Association of Time-Varying Covariate (Viral Load) on CD4 count
Binary VL: $X_{ij}$ = 0 (<2000) or 1 (>2000); all subjects included have one low and one high VL.
Fit simple linear model: $E[Y_{ij} \mid X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{ij}$
Compare results of Models A-D:
                 Naive   Robust
Unweighted OLS   A       B
Weighted LS      C       D

92 Summary of Results of Association of Time Varying Covariate (VL) on CD4 count (note: different data than last lecture)
$\beta_0$ (SE)        Naive          Robust
Unweighted OLS        355.1 (21.7)   355.1 (23.6)
Weighted LS           355.1 (21.7)   377.4 (22.9)
$\beta_1$ (SE)        Naive          Robust
Unweighted OLS        -79.3 (30.7)   -79.3 (17.1)
Weighted LS           -79.3 (17.0)   -79.3 (17.1)
t-test (difference)                  -79.3 (17.1)

93 Random Effects Model of CD4 vs. log 10 (viral load)
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{ij} + e_{ij}$
. xtreg cd4 medvl, i(id) re
Random-effects GLS regression        Number of obs    = 174
Group variable (i): id               Number of groups = 87
R-sq: within  =                      Obs per group: min = 2
      between =                                     avg = 2.0
      overall =                                     max = 2
Random effects u_i ~ Gaussian        Wald chi2(1)     =
corr(u_i, X) = 0 (assumed)           Prob > chi2      =
cd4       Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
medvl
_cons
sigma_u   (estimate of $\sigma_0$, reported as a standard deviation)
sigma_e   (estimate of $\sigma$, reported as a standard deviation)
rho       (fraction of variance due to u_i)

94 Multiple and varying observations per person
CD4 (Y) vs. continuous (log) Viral Load (X)
$E[Y_{ij} \mid X_{i1} = x_{i1}, X_{ij} = x_{ij}] = \beta_0 + \beta_1 x_{i1} + \beta_2 (x_{ij} - x_{i1})$
$\beta_2$ represents the expected change in Y given a change in $X_{ij}$ relative to the baseline value ($X_{i1}$): the longitudinal effect.
$\beta_1$ represents the expected difference in average Y across two sub-populations that differ by their baseline values, $X_{i1}$: the cross-sectional effect.

95 Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
$\beta_0$ (SE)     Naive          Robust
Unweighted OLS     618.9 (11.6)   618.9 (35.2)
Weighted LS        509.1 (31.2)   509.1 (32.9)
$\beta_1$ (SE)     Naive          Robust
Unweighted OLS     -83.7 (3.0)    -83.7 (8.3)
Weighted LS        -52.7 (7.4)    -52.7 (7.7)
$\beta_2$ (SE)     Naive          Robust
Unweighted OLS     -99.2 (2.4)    -99.2 (6.8)
Weighted LS        -54.7 (2.2)    -54.7 (3.2)

96 Random Effects Model of CD4 vs. log 10 (viral load)
Fit simple random effects model: $Y_{ij} = \beta_0 + \beta_{0i} + \beta_1 X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$
Random-effects GLS regression        Number of obs    = 7053
Group variable (i): id               Number of groups = 406
R-sq: within  =                      Obs per group: min = 1
      between =                                     avg = 17.4
      overall =                                     max = 58
Random effects u_i ~ Gaussian        Wald chi2(2)     =
corr(u_i, X) = 0 (assumed)           Prob > chi2      =
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons
sigma_u
sigma_e
rho           (fraction of variance due to u_i)

97 Random Coefficients Model of CD4 vs. log 10 (viral load)
Fit random coef. model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$
Fixed Effects (Coefficient) Estimates
. xtmixed cd4 logvlbase logvlchange || id: logvlbase
Mixed-effects REML regression        Number of obs    = 7053
Group variable: id                   Number of groups = 406
                                     Obs per group: min = 1
                                                    avg = 17.4
                                                    max = 58
                                     Wald chi2(2)     =
Log restricted-likelihood =          Prob > chi2      =
cd4           Coef.   Std. Err.    z    P>|z|    [95% Conf. Interval]
logvlbase
logvlchange
_cons

98 Random Coefficients Model of CD4 vs. log10(viral load)
Fit random coef. model: $Y_{ij} = (\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) X_{i1} + \beta_2 (X_{ij} - X_{i1}) + e_{ij}$
Variance Components Estimates
Random-effects Parameters   Estimate   Std. Err.   [95% Conf. Interval]
id: Independent
  sd(logvlb~e)   (SD corresponding to $\mathrm{var}(\beta_{1i})$)
  sd(_cons)      (SD corresponding to $\mathrm{var}(\beta_{0i})$)
  sd(Residual)   (SD corresponding to $\mathrm{var}(e_{ij})$)
LR test vs. linear regression: chi2(2) =      Prob > chi2 =

99 Random Effects Model for Teenage Sex and Drug-Use
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij})} = \beta^*_{0i} + \beta_1 X_{ij}, \quad \beta^*_{0i} = \beta_0 + \beta_{0i}, \quad \beta_{0i} \sim N(0, \tau^2)$
Assume that the repeated observations for the ith teenager are independent of one another given $\beta_{0i}$ and $X_{ij}$.
Must assume a parametric distribution for the $\beta_{0i}$, usually $\beta_{0i} \sim N(0, \tau^2)$.
$\exp(\beta_1)$ is the odds ratio for having sex when subject i reports drug-use relative to when the same subject does not report drug-use.
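
Because $\exp(\beta_1)$ is a within-subject odds ratio, it generally differs from the population-averaged odds ratio obtained after integrating out $\beta_{0i}$. A minimal sketch of that comparison, assuming a normal random intercept and using a plain trapezoid rule for the integral; the function names and parameter values are hypothetical:

```python
import math


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(beta0, beta1, x, tau, n_grid=2001, width=8.0):
    """P(Y=1 | X=x) averaged over b ~ N(0, tau^2), by the trapezoid rule."""
    if tau == 0:
        return expit(beta0 + beta1 * x)
    lo = -width * tau
    h = 2.0 * width * tau / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        density = math.exp(-0.5 * (b / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        end_weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += end_weight * expit(beta0 + b + beta1 * x) * density * h
    return total


def marginal_odds_ratio(beta0, beta1, tau):
    """Population-averaged OR comparing X=1 with X=0."""
    p1 = marginal_prob(beta0, beta1, 1, tau)
    p0 = marginal_prob(beta0, beta1, 0, tau)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical parameters: subject-specific OR is exp(1.0).
print("subject-specific OR:", math.exp(1.0))
print("population-averaged OR:", marginal_odds_ratio(beta0=-1.0, beta1=1.0, tau=2.0))
```

With tau = 0 the two odds ratios coincide; as tau grows, the population-averaged odds ratio moves toward 1.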

100 Random effects for teenage sex vs drug use
. xtlogit sx24hrs drgalcoh, or i(eid) re
Random-effects logit                 Number of obs      = 1708
Group variable (i): eid              Number of groups   = 109
Random effects u_i ~ Gaussian        Obs per group: min = 1
                                                    avg = 15.7
                                                    max = 33
                                     Wald chi2(1)       = 5.48
Log likelihood =                     Prob > chi2        =
sx24hrs    OR   Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh   ($\beta_1$)
/lnsig2u
sigma_u    ($\tau$)
rho
Likelihood ratio test of rho=0: chibar2(01) =      Prob >= chibar2 =

101 Random Effects Model for Diarrhea Study in Children
$\log \frac{P(Y_{ijk} = 1)}{1 - P(Y_{ijk} = 1)} = \beta_0 + \beta_{0i} + \beta_{0ij}$
Measurements made on children (k) within households (j) within villages (i).
Want to know the greatest sources of variation: households, $\mathrm{var}(\beta_{0ij})$, or villages, $\mathrm{var}(\beta_{0i})$.
Assumes children in the same household have the same probability of diarrhea.
Use gllamm in Stata (also xtmelogit).

102 Random Effects Model for Diarrhea Study in Children
gllamm diarrhea, i(hhid vilid) nip(5) family(binomial)
number of level 1 units = 4736
number of level 2 units = 18
number of level 3 units = 4
Condition Number =
gllamm model log likelihood =
diarrhea   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
_cons

103 Random Effects Model for Diarrhea Study in Children
Variances and covariances of random effects
***level 2 (hhid)    var(1): ( )    [$\mathrm{var}(\beta_{0ij})$]
***level 3 (vilid)   var(1): ( )    [$\mathrm{var}(\beta_{0i})$]
Cluster correlation coefficients (based on latent response model):
$\rho_{house} = \frac{\mathrm{var}(\beta_{0ij}) + \mathrm{var}(\beta_{0i})}{\mathrm{var}(\beta_{0ij}) + \mathrm{var}(\beta_{0i}) + \pi^2/3}$
$\rho_{village} = \frac{\mathrm{var}(\beta_{0i})}{\mathrm{var}(\beta_{0ij}) + \mathrm{var}(\beta_{0i}) + \pi^2/3}$
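
The two cluster correlations can be computed directly from the estimated variance components. A minimal sketch, assuming the usual latent response formulation for a logistic model, in which the level-1 residual variance is $\pi^2/3$; the function name and the variance values are hypothetical, not the gllamm estimates:

```python
import math


def latent_cluster_correlations(var_household: float, var_village: float) -> dict:
    """Latent-response correlations for a three-level random intercept logit model.

    var_household: var(beta_0ij), household-level random intercept variance.
    var_village:   var(beta_0i), village-level random intercept variance.
    """
    residual = math.pi ** 2 / 3.0  # variance of the standard logistic distribution
    total = var_household + var_village + residual
    return {
        # two children in the same household (and hence the same village)
        "house": (var_household + var_village) / total,
        # two children in the same village but different households
        "village": var_village / total,
    }


# Made-up variance components, for illustration only.
print(latent_cluster_correlations(var_household=1.0, var_village=0.5))
```

The household correlation is never smaller than the village correlation, because children in the same household share both random intercepts.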

104 Bangladesh Fertility Study

105 Bangladesh Fertility Study Slides from: Roberto G. Gutierrez, Director of Statistics, StataCorp LP North American Stata Users Group Meeting, Boston

111 Tower of London Task Study Slides from: Roberto G. Gutierrez, Director of Statistics, StataCorp LP North American Stata Users Group Meeting, Boston

114 References Slides from: Roberto G. Gutierrez, Director of Statistics, StataCorp LP North American Stata Users Group Meeting, Boston

115 Using conditional logistic regression for estimating the within-unit OR in logistic regression models

116 Treat Individual as a stratification variable for Teen Sex and Drugs
For the teen sex and drugs example, we can represent the data on each individual, i, as a simple 2x2 table:
              Sex: yes   Sex: no
Drugs: yes    a_i        b_i
Drugs: no     c_i        d_i
Total: n_i
Can get the OR for every subject: $\widehat{OR}_i = \frac{a_i d_i}{b_i c_i}$
Because our model,
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij} = x_{ij})} = \beta^*_0 + \beta_{0i} + \beta^*_1 x_{ij}$,
assumes every person has the same OR, we can average the estimated ORs to get the estimate.

117 Mantel-Haenszel Average of Stratified ORs
Then the MH estimate is:
$\widehat{OR}_{MH} = \exp(\hat{\beta}^*_1) = \frac{\sum_{i=1}^m w_i \widehat{OR}_i}{\sum_{i=1}^m w_i} = \frac{\sum_{i=1}^m (a_i d_i)/n_i}{\sum_{i=1}^m (b_i c_i)/n_i}$, with weights $w_i = b_i c_i / n_i$.
Note that for any subject who has identical exposure (drug use) or outcomes (sex) for all observations, the OR is undefined and that person does not contribute to the estimate (their 2x2 table is dropped).
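
The MH formula is a ratio of two sums over subjects, so it is simple to compute from the per-subject 2x2 tables. A minimal sketch; the function name and the toy tables are hypothetical:

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across per-subject 2x2 tables.

    tables: iterable of (a, b, c, d) with
      a = drugs yes / sex yes, b = drugs yes / sex no,
      c = drugs no  / sex yes, d = drugs no  / sex no.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # empty table, nothing to add
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("no informative tables: sum of b*c/n is zero")
    return numerator / denominator


# Three made-up subjects; the third never reports drug use, so a = b = 0.
tables = [(3, 1, 2, 4), (2, 2, 1, 5), (0, 0, 3, 7)]
print(mantel_haenszel_or(tables))
```

The third subject adds zero to both sums, which is the sense in which subjects with constant exposure or constant outcome do not contribute to the estimate.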

118 Conditional Logistic Regression
To illustrate, use the teenage sex and drugs example. Assume just two observations for a person: one observation had the outcome ($Y_{i1} = 1$) with drugs ($X_{i1} = 1$), and the other had neither ($Y_{i2} = 0$, $X_{i2} = 0$). Then the conditional likelihood contribution for this subject is:
$CondLik_i = \frac{P(Y_{i1} = 1 \mid X_{i1} = 1) P(Y_{i2} = 0 \mid X_{i2} = 0)}{P(Y_{i1} = 1 \mid X_{i1} = 1) P(Y_{i2} = 0 \mid X_{i2} = 0) + P(Y_{i1} = 1 \mid X_{i1} = 0) P(Y_{i2} = 0 \mid X_{i2} = 1)}$
After plugging in the model for $Y_{ij}$:
$\mathrm{logit}[P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})] = \log \frac{P(Y_{ij} = 1 \mid \beta_{0i}, X_{ij} = x_{ij})}{P(Y_{ij} = 0 \mid \beta_{0i}, X_{ij} = x_{ij})} = \beta^*_0 + \beta_{0i} + \beta^*_1 x_{ij}$
and doing some algebra, one gets:
$CondLik_i = \frac{1}{1 + \exp(\beta^*_1 (X_{i2} - X_{i1}))}$
Notice, the individual level intercept (whether random or not) drops out.
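
The claim that the intercept drops out can be checked numerically by evaluating the ratio directly for several intercept values and comparing it with the closed form. A minimal sketch for the two-observation case on this slide; function names and numeric values are hypothetical:

```python
import math


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pair_condlik_direct(beta1, x1, x2, intercept):
    """Direct ratio for a subject with Y1 = 1, Y2 = 0 and covariates x1, x2.

    intercept stands for beta*_0 + beta_0i, the subject's own intercept.
    """
    def p(x):
        return expit(intercept + beta1 * x)

    observed = p(x1) * (1.0 - p(x2))  # outcome occurs with x1, not with x2
    swapped = p(x2) * (1.0 - p(x1))   # covariates exchanged between the two observations
    return observed / (observed + swapped)


def pair_condlik_closed(beta1, x1, x2):
    """Closed form after the algebra: no intercept appears."""
    return 1.0 / (1.0 + math.exp(beta1 * (x2 - x1)))


for intercept in (-3.0, 0.0, 2.0):
    print(intercept, pair_condlik_direct(0.7, 1, 0, intercept), pair_condlik_closed(0.7, 1, 0))
```

Every row prints the same pair of values, whatever intercept is used.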

119 Conditional Logistic Regression
What it means is that the estimate of the within-subject OR no longer depends on assumptions on the distribution of the random effect.
Can only use this to estimate the association of time-varying covariates.
Subjects with identical outcomes will be dropped from analysis.
For those covariates that do not change in a subject, they will not contribute to estimation of the OR for that covariate.

120 Conditional Logistic Regression
More generally, you might want to estimate the within-subject OR for several variables simultaneously and/or the OR for a unit change in a continuous variable.
Can still do so by using the conditional likelihood, a method used to estimate ORs for matched case-control studies.
The conditional likelihood (in the example of a cohort) is the probability of observing that the cases have the covariates they have and the controls have their observed covariates, given the distribution of covariates observed over all the repeated measurements.
To define the likelihood, one normalizes the probability of observing the outcomes conditional on the covariates by the summed probabilities over all possible combinations of covariates and outcomes.
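
The normalization described above can be written out by brute force for one subject: enumerate every way of assigning the observed number of positive outcomes to that subject's observations. A minimal sketch, assuming a linear predictor with no intercept (it cancels); the function name and the data are hypothetical, and real software uses a recursive algorithm rather than full enumeration:

```python
import math
from itertools import combinations


def conditional_log_likelihood(beta, y, X):
    """One subject's conditional log-likelihood contribution.

    beta: list of coefficients.
    y:    list of 0/1 outcomes for the subject's observations.
    X:    list of covariate vectors, one per observation.
    """
    n = len(y)
    k = sum(y)  # number of positive outcomes, conditioned on

    def linear_sum(indices):
        return sum(sum(b * x for b, x in zip(beta, X[j])) for j in indices)

    observed = linear_sum([j for j in range(n) if y[j] == 1])
    # Sum over every set of k observations that could have been the positives.
    denominator = sum(math.exp(linear_sum(c)) for c in combinations(range(n), k))
    return observed - math.log(denominator)


# Subject with three observations, one binary and one continuous covariate (made up).
print(conditional_log_likelihood([0.7, -0.2], [1, 0, 1], [[1, 2.0], [0, 1.5], [1, 3.0]]))
# A subject whose outcomes are all 0 contributes exactly 0 to the log-likelihood.
print(conditional_log_likelihood([0.7, -0.2], [0, 0], [[1, 2.0], [0, 1.5]]))
```

Subjects with all-positive or all-negative outcomes have only one possible assignment, so their contribution does not depend on beta.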

121 Teenage Sex and Drug-Use Using M-H summary OR
. cs sx24hrs drgalcoh, by(eid) or
eid   OR   [95% Conf. Interval]   M-H Weight
(per-subject rows, each with a Cornfield interval)
M-H combined

122 Conditional Logistic Estimate
. clogit sx24hrs drgalcoh, or group(eid)
note: multiple positive outcomes within groups encountered.
note: 23 groups (161 obs) dropped due to all positive or all negative outcomes.
Iteration 0: log likelihood =
Iteration 1: log likelihood =
Iteration 2: log likelihood =
Conditional (fixed-effects) logistic regression   Number of obs = 1547
                                                  LR chi2(1)    = 2.93
                                                  Prob > chi2   =
Log likelihood =                                  Pseudo R2     =
sx24hrs    Odds Ratio   Std. Err.   z   P>|z|   [95% Conf. Interval]
drgalcoh

123 Pitfalls of Latent Variable Models in General (including MultiLevel Mixed Models)

124 General Contrast of Mixed and GEE Models General Mixed Effects Model Specific Mixed Model (logistic)

125 Mixed Models General Likelihood of Observed Data based on Mixed Model. Specific from example

126 Latent Variable Models Nonparametric Nonidentifiability Point 1 for Mixed Models

127 Parameter returned by GEE: Population Average Models Parameter specific estimating function Estimating Function

128 Special Case: MultiLevel model coefficients have interpretations in both latent variable and observed data-generating worlds

129 Usually, different interpretations Logistic Case

130 Often difference in interpretations (Pop. Ave. vs. Mixed) (near) meaningless

131 It gets worse: What if Model is mis-specified? Parameter of Interest as Projection

132 Interpretation of coefficients in mis-specified multilevel mixed models can get wacky True model

133 Summary of MultiLevel Mixed vs. GEE (Population Ave.) Models

134 New Golden Rules of Estimation with Latent Variable Models?

135 Should avoid relying solely on unverifiable assumptions for inferences

136 Multilevel Analysis and Complex Surveys Part 2: Estimation and Inference from Complex Surveys Alan Hubbard UC Berkeley - Division of Biostatistics

137 Foundation
Finite population U = {1,...,N}.
Sample s, a subset of U.
V = (Y, X): observations of outcome and covariates on each unit.
Values in the fixed finite population are $v_U = (v_1, \dots, v_N)$, and the process by which one draws these will be called the observation process.
Parameters can be defined with regard to a finite population or a superpopulation; that is, there is some data-generating model of interest, $P_V$, and we want to estimate parameters of it, $\theta(P_V)$.

138 Sampling Mechanism
$\delta = (\delta_t, t = 1, \dots, N)$ with $\delta_t = 1$ if $t \in s$, 0 otherwise.
$g(\delta \mid V = v, Z)$ is the sampling mechanism, where Z are so-called design variables, so $z_U = (z_1, \dots, z_N)$.
Thus, the observed data O is generated by a combination of the mechanisms $P_{V,Z}$ and g: $O = (\delta * v_U * z_U)$ defines the joint distribution of observed data, $P_0(O)$, $O = (V, Z \mid \delta = 1)$.
Special cases:
$g(\delta \mid V = v, Z) = g(\delta \mid Z)$ (noninformative sampling).
Parameters of interest from the distribution of $V \mid Z = z$ (disaggregated analysis).
Parameters of interest from the distribution $P_V$ (aggregated): $P_V(v; \theta) = \sum_z P(V = v \mid Z = z) \, p(Z = z)$

139 Types of Inference
Design-based inference
Model-based inference

140 Full Likelihood
Consider one more source of missingness (say R = 1 for a respondent), e.g., nonresponse.
Full likelihood is then: $f(r \mid \gamma, z, v) \, f(\delta \mid z, v) \, f(v \mid z) \, f(z)$
Missingness is caused by the mechanisms for both δ and r.
Different assumptions imply different conditional independences.

141 Pseudo-Likelihood Estimating Equation Approaches
Most practical applications of survey data do not contain enough information to define the entire likelihood of the joint missingness/sampling mechanisms and the distribution of interest.
In addition, the parameter of interest can often be identified from just some of the design elements, without the entire joint likelihood being identifiable.
Thus, most survey analyses rely on pseudo-likelihood estimation.

142 Simple Example
Assume an exponential distribution: $f(y; \theta) = \theta e^{-\theta y}$, $E(Y) = 1/\theta$.
If the entire population was a simple random draw from this distribution (and you observed everyone's value), then the score equation based on the likelihood would be:
$s(\theta; V) = \sum_{t=1}^N (Y_t - 1/\theta) = 0$
with obvious solution $\hat{\theta} = 1/\mathrm{ave}(Y)$.

143 Pseudo-Likelihood, Continued
However, let's now assume all we have is the usual subset of the population U defined by s, and the probability that an observation was sampled, given its observed values (for now, no Z): $\pi_t = P(\delta_t = 1 \mid Y_t)$.
An unbiased estimating function for the population average (and thus the parameter of f(y;θ)) is:
$s(\theta; O, \pi) = \sum_{t \in s} \frac{1}{\pi_t} (Y_t - 1/\theta) = \sum_{t=1}^N \frac{\delta_t}{\pi_t} (Y_t - 1/\theta)$
Thus, just treat it as a general missingness problem and use inverse weighting.
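
Setting the inverse-weighted estimating function to zero gives a closed-form estimate: the reciprocal of the weighted mean of the sampled Y values. A minimal sketch, assuming the sampling probabilities are known; function names and the toy numbers are hypothetical:

```python
def weighted_score(theta, y_sampled, pi_sampled):
    """Inverse-probability-weighted estimating function evaluated at theta."""
    return sum((y - 1.0 / theta) / p for y, p in zip(y_sampled, pi_sampled))


def ipw_rate_estimate(y_sampled, pi_sampled):
    """Root of weighted_score: 1 / (inverse-probability-weighted mean of Y)."""
    weights = [1.0 / p for p in pi_sampled]
    weighted_mean = sum(w * y for w, y in zip(weights, y_sampled)) / sum(weights)
    return 1.0 / weighted_mean


# Three sampled units with known sampling probabilities (made-up values).
y = [0.2, 1.0, 0.6]
pi = [0.5, 0.25, 0.5]
theta_hat = ipw_rate_estimate(y, pi)
print(theta_hat, weighted_score(theta_hat, y, pi))
```

When all sampling probabilities are equal, the weights cancel and the estimate reduces to 1/ave(Y) over the sample.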

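144 It works (in this case)!
Consistent estimating equation:
$E\left[\frac{\delta}{\pi} (Y - 1/\theta)\right] = E_Y\left[(Y - 1/\theta) \, E\left(\frac{\delta}{\pi} \mid Y\right)\right]$
Note $\pi = E(\delta \mid Y)$, so get $E_Y(Y - 1/\theta) = 0$,
which of course results in the estimator, when solved:
$\hat{\mu}_s = 1/\hat{\theta}_s = \frac{\sum_{t \in s} \pi_t^{-1} Y_t}{\sum_{t \in s} \pi_t^{-1}}$
So, a re-weighted score equation provides a consistent, pseudo-likelihood estimate.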
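
A small simulation can illustrate the consistency argument: when the chance of being sampled depends on Y, the unweighted estimate is biased while the inverse-weighted one is close to the truth. A minimal sketch under an assumed sampling rule, $\pi(y) = 0.9 - 0.8 e^{-y}$, which is invented for illustration; the function name, population size, and seed are likewise arbitrary:

```python
import math
import random


def simulate_ipw(n_population=100_000, theta=2.0, seed=0):
    """Draw Y ~ Exponential(rate=theta), sample informatively, return two estimates.

    Returns (naive, weighted): the unweighted 1/mean(Y) over the sample and
    the inverse-probability-weighted version.
    """
    rng = random.Random(seed)
    n_sampled = 0
    sum_y = 0.0
    sum_w = 0.0
    sum_wy = 0.0
    for _ in range(n_population):
        y = rng.expovariate(theta)
        pi = 0.9 - 0.8 * math.exp(-y)  # assumed rule: larger Y is more likely sampled
        if rng.random() < pi:
            n_sampled += 1
            sum_y += y
            sum_w += 1.0 / pi
            sum_wy += y / pi
    naive = n_sampled / sum_y
    weighted = sum_w / sum_wy
    return naive, weighted


naive, weighted = simulate_ipw()
print("true theta: 2.0  naive:", naive, " weighted:", weighted)
```

Under this rule the sample over-represents large Y, so the naive estimate of θ is too small, while the weighted estimate recovers a value near 2.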

145 Inference From Estimating Equation
Design-based approach: uncertainty in the parameter estimate comes from repeated samplings (of the type done) from a fixed target population ($Y_U$ fixed).
Model-based: from repeated draws from the underlying data-generating distribution ($Y_U$ random).
Both: variance comes from both the underlying data-generating mechanism and the sampling mechanism; if the finite population is large, the model-based portion contributes almost nothing.
$\mathrm{var}(\hat{\theta}_s) = \mathrm{var}(E(\hat{\theta}_s \mid U)) + E(\mathrm{var}(\hat{\theta}_s \mid U))$
The first term is the model source; the second term is the design source.

146 Design-based Inference, cont.
All from the sampling mechanism.
Need a simple empirical estimate derived from the estimating equation.
Often called the sandwich estimator.
Can be generally derived as the variance-covariance of the influence curve of the estimator:
$\hat{\theta} - \theta \approx \frac{1}{n} \sum_{t \in s} IC(O_t; \gamma, \theta)$, so $\mathrm{var}(\hat{\theta}) \approx \frac{\mathrm{var}(IC(O_t; \gamma, \theta))}{n}$
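
For the weighted mean, the influence curve has a simple plug-in form, which gives a sandwich-type variance estimate. A minimal sketch, assuming the sampled units can be treated as independent draws (no stratification, clustering, or finite population correction); the function name and the data are hypothetical:

```python
def weighted_mean_with_ic_variance(y, w):
    """Weighted mean and its influence-curve-based variance estimate.

    y: sampled outcomes; w: sampling weights (1 / pi_t).
    Estimated influence curve: IC_t = w_t * (y_t - mu_hat) / mean(w).
    Variance estimate: sum(IC_t^2) / n^2, i.e. the empirical variance of the
    IC divided by n (the IC has sample mean zero by construction).
    """
    n = len(y)
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    w_bar = sum(w) / n
    ic = [wi * (yi - mu_hat) / w_bar for wi, yi in zip(w, y)]
    variance = sum(v * v for v in ic) / n ** 2
    return mu_hat, variance


# Made-up sample with unequal weights.
mu_hat, var_hat = weighted_mean_with_ic_variance([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 2.0, 2.0])
print(mu_hat, var_hat ** 0.5)
```

With equal weights this reduces to the usual variance of a sample mean (with divisor n rather than n - 1).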

147 Design-based Inference
In this general framework, the things to account for in inference:
stratified design (not a simple random sample)
finite population correction (sometimes samples are not from an infinite population)
clustered (correlated) data

148 Multilevel Analysis and Complex Surveys Part 3: Putting MultiLevel Models and Complex Survey Data Together Alan Hubbard UC Berkeley - Division of Biostatistics Slides are from Sophia Rabe-Hesketh, UC Berkeley, School of Education and Division of Biostatistics