ESALQ Course on Models for Longitudinal and Incomplete Data
Geert Verbeke, Biostatistical Centre, K.U.Leuven, Belgium
Geert Molenberghs, Center for Statistics, Universiteit Hasselt, Belgium
ESALQ Course, Diepenbeek, November 2006
Contents

0 Related References

Part I: Continuous Longitudinal Data
  1 Introduction
  2 A Model for Longitudinal Data
  3 Estimation and Inference in the Marginal Model
  4 Inference for the Random Effects

Part II: Marginal Models for Non-Gaussian Longitudinal Data
  5 The Toenail Data
  6 The Analgesic Trial
  7 Generalized Linear Models
  8 Parametric Modeling Families
  9 Generalized Estimating Equations
  10 A Family of GEE Methods

Part III: Generalized Linear Mixed Models for Non-Gaussian Longitudinal Data
  11 Generalized Linear Mixed Models (GLMM)
  12 Fitting GLMMs in SAS

Part IV: Marginal Versus Random-effects Models
  13 Marginal Versus Random-effects Models

Part V: Non-linear Models
  Non-Linear Mixed Models

Part VI: Incomplete Data
  Setting The Scene
  Proper Analysis of Incomplete Data
  Weighted Generalized Estimating Equations
  Multiple Imputation

Part VII: Topics in Methods and Sensitivity Analysis for Incomplete Data
  An MNAR Selection Model and Local Influence
  Interval of Ignorance
  Concluding Remarks

Part VIII: Case Study: Age-related Macular Degeneration
  The Dataset
  Weighted Generalized Estimating Equations
  Multiple Imputation
Chapter 0: Related References

Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman & Hall.
Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.
Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman & Hall.
Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall.
Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer.
Demidenko, E. (2004). Mixed Models: Theory and Applications. New York: John Wiley & Sons.
Diggle, P.J., Heagerty, P.J., Liang, K.Y., and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd ed.). Oxford: Oxford University Press.
Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd ed.). New York: Springer.
Fitzmaurice, G.M., Laird, N.M., and Ware, J.H. (2004). Applied Longitudinal Analysis. New York: John Wiley & Sons.
Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.
Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.
Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman & Hall.
Hedeker, D. and Gibbons, R.D. (2006). Longitudinal Data Analysis. New York: John Wiley & Sons.
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman & Hall.
Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.
Leyland, A.H. and Goldstein, H. (2001). Multilevel Modelling of Health Statistics. Chichester: John Wiley & Sons.
Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., and Schabenberger, O. (2005). SAS for Mixed Models (2nd ed.). Cary: SAS Press.
Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd ed.). London: Chapman & Hall.
Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data. New York: Springer.
Pinheiro, J.C. and Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer.
Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.
Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.
Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models in Practice: A SAS-Oriented Approach. Lecture Notes in Statistics 126. New York: Springer.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer.
Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.
Weiss, R.E. (2005). Modeling Longitudinal Data. New York: Springer.
Wu, H. and Zhang, J.-T. (2006). Nonparametric Regression Methods for Longitudinal Data Analysis. New York: John Wiley & Sons.
11 Part I Continuous Longitudinal Data ESALQ Course on Models for Longitudinal and Incomplete Data 6
12 Chapter 1 Introduction Repeated Measures / Longitudinal data Examples ESALQ Course on Models for Longitudinal and Incomplete Data 7
13 1.1 Repeated Measures / Longitudinal Data Repeated measures are obtained when a response is measured repeatedly on a set of units Units: Subjects, patients, participants,... Animals, plants,... Clusters: families, towns, branches of a company, Special case: Longitudinal data ESALQ Course on Models for Longitudinal and Incomplete Data 8
1.2 Rat Data

- Research question (Dentistry, K.U.Leuven): how does craniofacial growth depend on testosterone production?
- Randomized experiment in which 50 male Wistar rats are randomized to: Control (15 rats), Low dose of Decapeptyl (18 rats), High dose of Decapeptyl (17 rats)
15 Treatment starts at the age of 45 days; measurements taken every 10 days, from day 50 on. The responses are distances (pixels) between well defined points on x-ray pictures of the skull of each rat: ESALQ Course on Models for Longitudinal and Incomplete Data 10
16 Measurements with respect to the roof, base and height of the skull. Here, we consider only one response, reflecting the height of the skull. Individual profiles: ESALQ Course on Models for Longitudinal and Incomplete Data 11
17 Complication: Dropout due to anaesthesia (56%): #Observations Age (days) Control Low High Total Remarks: Much variability between rats, much less variability within rats Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for known reason. Measurements taken at fixed time points ESALQ Course on Models for Longitudinal and Incomplete Data 12
18 1.3 Prostate Data References: Carter et al (1992, Cancer Research). Carter et al (1992, Journal of the American Medical Association). Morrell et al (1995, Journal of the American Statistical Association). Pearson et al (1994, Statistics in Medicine). Prostate disease is one of the most common and most costly medical problems in the United States Important to look for markers which can detect the disease at an early stage Prostate-Specific Antigen is an enzyme produced by both normal and cancerous prostate cells ESALQ Course on Models for Longitudinal and Incomplete Data 13
19 PSA level is related to the volume of prostate tissue. Problem: Patients with Benign Prostatic Hyperplasia also have an increased PSA level Overlap in PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer. Research question (hypothesis based on clinical practice): Can longitudinal PSA profiles be used to detect prostate cancer in an early stage? ESALQ Course on Models for Longitudinal and Incomplete Data 14
20 A retrospective case-control study based on frozen serum samples: 16 control patients 20 BPH cases 14 local cancer cases 4 metastatic cancer cases Complication: No perfect match for age at diagnosis and years of follow-up possible Hence, analyses will have to correct for these age differences between the diagnostic groups. ESALQ Course on Models for Longitudinal and Incomplete Data 15
21 Individual profiles: ESALQ Course on Models for Longitudinal and Incomplete Data 16
22 Remarks: Much variability between subjects Little variability within subjects Highly unbalanced data ESALQ Course on Models for Longitudinal and Incomplete Data 17
23 Chapter 2 A Model for Longitudinal Data Introduction The 2-stage model formulation Example: Rat data The general linear mixed-effects model Hierarchical versus marginal model Example: Rat data Example: Bivariate observations ESALQ Course on Models for Longitudinal and Incomplete Data 18
24 2.1 Introduction In practice: often unbalanced data: unequal number of measurements per subject measurements not taken at fixed time points Therefore, multivariate regression techniques are often not applicable Often, subject-specific longitudinal profiles can be well approximated by linear regression functions This leads to a 2-stage model formulation: Stage 1: Linear regression model for each subject separately Stage 2: Explain variability in the subject-specific regression coefficients using known covariates ESALQ Course on Models for Longitudinal and Incomplete Data 19
2.2 A 2-stage Model Formulation

2.2.1 Stage 1

- Response Y_ij for the ith subject, measured at time t_ij, i = 1, ..., N, j = 1, ..., n_i
- Response vector Y_i for the ith subject: Y_i = (Y_i1, Y_i2, ..., Y_in_i)'
- Stage 1 model: Y_i = Z_i β_i + ε_i
- Z_i is an (n_i x q) matrix of known covariates
- β_i is a q-dimensional vector of subject-specific regression coefficients
- ε_i ~ N(0, Σ_i), often Σ_i = σ² I_{n_i}
- Note that the above model describes the observed variability within subjects
2.2.2 Stage 2

- Between-subject variability can now be studied by relating the β_i to known covariates
- Stage 2 model: β_i = K_i β + b_i
- K_i is a (q x p) matrix of known covariates
- β is a p-dimensional vector of unknown regression parameters
- b_i ~ N(0, D)
28 2.3 Example: The Rat Data Individual profiles: ESALQ Course on Models for Longitudinal and Incomplete Data 23
- Transformation of the time scale to linearize the profiles: t_ij = ln[1 + (Age_ij - 45)/10]
- Note that t = 0 corresponds to the start of the treatment (moment of randomization)
- Stage 1 model: Y_ij = β_1i + β_2i t_ij + ε_ij, j = 1, ..., n_i
- Matrix notation: Y_i = Z_i β_i + ε_i with
$$Z_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}$$
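A minimal SAS sketch of this transformation, assuming a data set rat with the age in days stored in a variable age (data set and variable names are assumptions, not the course's own code):

data rat;
  set rat;
  /* transformed time scale: t = 0 at the start of treatment (age 45 days) */
  t = log(1 + (age - 45)/10);
run;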
- In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
- Stage 2 model: β_1i = β_0 + b_1i,  β_2i = β_1 L_i + β_2 H_i + β_3 C_i + b_2i
- L_i, H_i, and C_i are indicator variables: L_i = 1 if low dose, 0 otherwise; H_i = 1 if high dose, 0 otherwise; C_i = 1 if control, 0 otherwise
31 Parameter interpretation: β 0 : average response at the start of the treatment (independent of treatment) β 1, β 2,andβ 3 : average time effect for each treatment group ESALQ Course on Models for Longitudinal and Incomplete Data 26
2.4 The General Linear Mixed-effects Model

- A 2-stage approach can be performed explicitly in the analysis (see the sketch below)
- However, this is just another example of the use of summary statistics: Y_i is summarized by β_i, and the summary statistics β_i are analysed in the second stage
- The associated drawbacks can be avoided by combining the two stages into one model:
$$Y_i = Z_i\beta_i + \varepsilon_i, \quad \beta_i = K_i\beta + b_i
\;\Longrightarrow\;
Y_i = \underbrace{Z_i K_i}_{X_i}\,\beta + Z_i b_i + \varepsilon_i = X_i\beta + Z_i b_i + \varepsilon_i$$
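To make the explicit two-stage analysis concrete, a hedged SAS sketch for the rat data, assuming a data set rat with variables response, t, treat, and rat (all names are assumptions): stage 1 fits a straight line per rat, stage 2 relates the estimated slopes to treatment.

/* Stage 1: separate OLS fit per rat (treat is constant within rat) */
proc sort data=rat;
  by rat treat;
run;
proc reg data=rat outest=stage1 noprint;
  by rat treat;
  model response = t;
run;
/* Stage 2: regress the subject-specific slopes (variable t in OUTEST=) on treatment */
proc glm data=stage1;
  class treat;
  model t = treat / solution;
run;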
General linear mixed-effects model:
$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i), \quad b_1, \ldots, b_N, \varepsilon_1, \ldots, \varepsilon_N \text{ independent}$$
Terminology:
- Fixed effects: β
- Random effects: b_i
- Variance components: the elements in D and Σ_i
2.5 Hierarchical versus Marginal Model

- The general linear mixed model is given by:
$$Y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i), \quad b_1, \ldots, b_N, \varepsilon_1, \ldots, \varepsilon_N \text{ independent}$$
- It can be rewritten as: Y_i | b_i ~ N(X_i β + Z_i b_i, Σ_i), b_i ~ N(0, D)
- It is therefore also called a hierarchical model: a model for Y_i given b_i, together with a model for b_i
- Marginally, Y_i is distributed as: Y_i ~ N(X_i β, Z_i D Z_i' + Σ_i)
- Hence, very specific assumptions are made about the dependence of mean and covariance on the covariates X_i and Z_i: implied mean X_i β; implied covariance V_i = Z_i D Z_i' + Σ_i
- Note that the hierarchical model implies the marginal one, NOT vice versa
2.6 Example: The Rat Data

- Stage 1 model: Y_ij = β_1i + β_2i t_ij + ε_ij, j = 1, ..., n_i
- Stage 2 model: β_1i = β_0 + b_1i,  β_2i = β_1 L_i + β_2 H_i + β_3 C_i + b_2i
- Combined:
$$Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})\,t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + (\beta_1 + b_{2i})\,t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + (\beta_2 + b_{2i})\,t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + (\beta_3 + b_{2i})\,t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}$$
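A hedged PROC MIXED sketch of this combined model, reusing the assumed variable names from the data-step sketch above (response, t, treat as a CLASS variable, rat as identifier); it mirrors the style of the prostate program in Section 3.2 but is not the course's own code:

proc mixed data=rat method=reml;
  class treat rat;
  /* common intercept, treatment-specific slopes for t */
  model response = treat*t / solution;
  /* random intercept and random slope per rat */
  random intercept t / type=un subject=rat;
run;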
- Implied marginal mean structure: linear average evolution in each group; equal average intercepts; different average slopes
- Implied marginal covariance structure (Σ_i = σ² I_{n_i}):
$$\mathrm{Cov}(Y_i(t_1), Y_i(t_2)) = (1 \;\; t_1)\, D \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \sigma^2 \delta_{\{t_1, t_2\}}
= d_{22}\, t_1 t_2 + d_{12}(t_1 + t_2) + d_{11} + \sigma^2 \delta_{\{t_1, t_2\}}$$
- Note that the model implicitly assumes that the variance function is quadratic over time, with positive curvature d_22
- A model which assumes that all variability in subject-specific slopes can be ascribed to treatment differences can be obtained by omitting the random slopes b_2i from the above model:
$$Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i)\,t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + \beta_1 t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + \beta_2 t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + \beta_3 t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}$$
- This is the so-called random-intercepts model
- The same marginal mean structure is obtained as under the model with random slopes
- Implied marginal covariance structure (Σ_i = σ² I_{n_i}):
$$\mathrm{Cov}(Y_i(t_1), Y_i(t_2)) = (1)\, D\, (1) + \sigma^2 \delta_{\{t_1, t_2\}} = d_{11} + \sigma^2 \delta_{\{t_1, t_2\}}$$
- Hence, the implied covariance matrix has a compound-symmetry structure: constant variance d_11 + σ² and constant correlation ρ_I = d_11/(d_11 + σ²) between any two repeated measurements within the same rat
2.7 Example: Bivariate Observations

- Balanced data, two measurements per subject (n_i = 2), two models:
- Model 1: random intercepts + heterogeneous errors:
$$V = \begin{pmatrix} 1 \\ 1 \end{pmatrix} (d) \, (1 \;\; 1) + \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
= \begin{pmatrix} d + \sigma_1^2 & d \\ d & d + \sigma_2^2 \end{pmatrix}$$
- Model 2: uncorrelated intercepts and slopes + measurement error:
$$V = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}
= \begin{pmatrix} d_1 + \sigma^2 & d_1 \\ d_1 & d_1 + d_2 + \sigma^2 \end{pmatrix}$$
41 Different hierarchical models can produce the same marginal model Hence, a good fit of the marginal model cannot be interpreted as evidence for any of the hierarchical models. A satisfactory treatment of the hierarchical model is only possible within a Bayesian context. ESALQ Course on Models for Longitudinal and Incomplete Data 36
42 Chapter 3 Estimation and Inference in the Marginal Model ML and REML estimation Fitting linear mixed models in SAS Negative variance components Inference ESALQ Course on Models for Longitudinal and Incomplete Data 37
3.1 ML and REML Estimation

- Recall that the general linear mixed model equals Y_i = X_i β + Z_i b_i + ε_i, with b_i ~ N(0, D) and ε_i ~ N(0, Σ_i), independent
- The implied marginal model equals Y_i ~ N(X_i β, Z_i D Z_i' + Σ_i)
- Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects
- Notation: β: vector of fixed effects (as before); α: vector of all variance components in D and Σ_i; θ = (β', α')': vector of all parameters in the marginal model
- Marginal likelihood function:
$$L_{ML}(\theta) = \prod_{i=1}^{N} \left\{ (2\pi)^{-n_i/2} \, |V_i(\alpha)|^{-1/2} \exp\left( -\tfrac{1}{2}\,(Y_i - X_i\beta)' V_i^{-1}(\alpha) (Y_i - X_i\beta) \right) \right\}$$
- If α were known, the MLE of β would equal
$$\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i y_i,$$
where W_i equals V_i^{-1}.
- In most cases, α is not known and needs to be replaced by an estimate α̂
- Two frequently used estimation methods for α: maximum likelihood; restricted maximum likelihood (a SAS sketch follows below)
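In SAS, the two methods differ only in the METHOD= option of the PROC MIXED statement; a sketch reusing the prostate program of the next section (REML is the default):

/* maximum likelihood */
proc mixed data=prostate method=ml;
  class id group timeclss;
  model lnpsa = group age time group*time age*time time2 group*time2 age*time2 / solution;
  random intercept time time2 / type=un subject=id;
run;
/* restricted maximum likelihood: replace METHOD=ML by METHOD=REML (or omit the option) */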
3.2 Fitting Linear Mixed Models in SAS

A model for the prostate data:
$$\ln(\mathrm{PSA}_{ij} + 1) = \beta_1 \mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_6 \mathrm{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\, t_{ij}
+ (\beta_{11} \mathrm{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i)\, t_{ij}^2
+ b_{1i} + b_{2i} t_{ij} + b_{3i} t_{ij}^2 + \varepsilon_{ij}.$$

SAS program (Σ_i = σ² I_{n_i}):

proc mixed data=prostate method=reml;
  class id group timeclss;
  model lnpsa = group age time group*time age*time time2 group*time2 age*time2 / solution;
  random intercept time time2 / type=un subject=id;
run;
47 ML and REML estimates for fixed effects: Effect Parameter MLE (s.e.) REMLE (s.e.) Age effect β (0.013) (0.014) Intercepts: Control β (0.919) (0.976) BPH β (1.026) (1.090) L/R cancer β (0.997) (1.059) Met. cancer β (1.022) (1.086) Age time effect β (0.020) (0.021) Time effects: Control β (1.359) (1.473) BPH β (1.511) (1.638) L/R cancer β (1.469) (1.593) Met. cancer β (1.499) (1.626) Age time 2 effect β (0.008) (0.009) Time 2 effects: Control β (0.549) (0.610) BPH β (0.604) (0.672) L/R cancer β (0.590) (0.656) Met. cancer β (0.598) (0.666) ESALQ Course on Models for Longitudinal and Incomplete Data 42
48 ML and REML estimates for variance components: Effect Parameter MLE (s.e.) REMLE (s.e.) Covariance of b i : var(b 1i ) d (0.083) (0.098) var(b 2i ) d (0.187) (0.230) var(b 3i ) d (0.032) (0.041) cov(b 1i,b 2i ) d 12 = d (0.113) (0.136) cov(b 2i,b 3i ) d 23 = d (0.076) (0.095) cov(b 3i,b 1i ) d 13 = d (0.043) (0.053) Residual variance: var(ε ij ) σ (0.002) (0.002) Log-likelihood ESALQ Course on Models for Longitudinal and Incomplete Data 43
49 Fitted average profiles at median age at diagnosis: ESALQ Course on Models for Longitudinal and Incomplete Data 44
3.3 Negative Variance Components

Reconsider the model for the rat data:
$$Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})\,t_{ij} + \varepsilon_{ij}$$

REML estimates obtained from the SAS procedure MIXED:

Effect                 Parameter    REMLE (s.e.)
Intercept              β0           (0.325)
Time effects:
  Low dose             β1           (0.228)
  High dose            β2           (0.231)
  Control              β3           (0.285)
Covariance of b_i:
  var(b_1i)            d11          (1.123)
  var(b_2i)            d22          ( )
  cov(b_1i, b_2i)      d12 = d21    (0.381)
Residual variance:
  var(ε_ij)            σ²           (0.145)
REML log-likelihood
- This suggests that the REML likelihood could be further increased by allowing negative estimates for d22
- In SAS, this can be done by adding the option NOBOUND to the PROC MIXED statement (see the sketch below)
- Results:

Parameter restrictions for α:           d_ii >= 0, σ² >= 0     d_ii in R, σ² in R
Effect                 Parameter        REMLE (s.e.)           REMLE (s.e.)
Intercept              β0               (0.325)                (0.313)
Time effects:
  Low dose             β1               (0.228)                (0.198)
  High dose            β2               (0.231)                (0.198)
  Control              β3               (0.285)                (0.254)
Covariance of b_i:
  var(b_1i)            d11              (1.123)                (1.019)
  var(b_2i)            d22              ( )                    (0.169)
  cov(b_1i, b_2i)      d12 = d21        (0.381)                (0.357)
Residual variance:
  var(ε_ij)            σ²               (0.145)                (0.165)
REML log-likelihood
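A hedged sketch of the corresponding unconstrained fit, using the same (assumed) variable names as in the earlier rat-data sketch; only the NOBOUND option is added:

proc mixed data=rat method=reml nobound;
  class treat rat;
  model response = treat*t / solution;
  random intercept t / type=un subject=rat;
run;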
52 Note that the REML log-likelihood value has been further increased and a negative estimate for d 22 is obtained. Brown & Prescott (1999, p. 237) : ESALQ Course on Models for Longitudinal and Incomplete Data 47
- Meaning of a negative variance component?
- Fitted variance function:
$$\widehat{\mathrm{Var}}(Y_i(t)) = (1 \;\; t)\, \widehat{D} \begin{pmatrix} 1 \\ t \end{pmatrix} + \widehat{\sigma}^2
= \widehat{d}_{22}\, t^2 + 2\,\widehat{d}_{12}\, t + \widehat{d}_{11} + \widehat{\sigma}^2
= -0.287\, t^2 + \cdots$$
- The suggested negative curvature in the variance function is supported by the sample variance function (figure)
54 This again shows that the hierarchical and marginal models are not equivalent: The marginal model allows negative variance components, as long as the marginal covariances V i = Z i DZ i + σ 2 I ni are positive definite The hierarchical interpretation of the model does not allow negative variance components ESALQ Course on Models for Longitudinal and Incomplete Data 49
3.4 Inference for the Marginal Model

- Estimate for β:
$$\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i y_i,$$
with α replaced by its ML or REML estimate
- Conditional on α, $\widehat{\beta}(\alpha)$ is multivariate normal with mean β and covariance
$$\mathrm{Var}(\widehat{\beta}) = \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} \left( \sum_{i=1}^{N} X_i' W_i \,\mathrm{Var}(Y_i)\, W_i X_i \right) \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1}
= \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1}$$
- In practice one again replaces α by its ML or REML estimate
56 Inference for β: Wald tests t-andf -tests LR tests (not with REML) Inference for α: Wald tests LR tests (even with REML) Caution 1: Marginal vs. hierarchical model Caution 2: Boundary problems ESALQ Course on Models for Longitudinal and Incomplete Data 51
57 Chapter 4 Inference for the Random Effects Empirical Bayes inference Example: Prostate data Shrinkage Example: Prostate data ESALQ Course on Models for Longitudinal and Incomplete Data 52
58 4.1 Empirical Bayes Inference Random effects b i reflect how the evolution for the ith subject deviates from the expected evolution X i β. Estimation of the b i helpful for detecting outlying profiles This is only meaningful under the hierarchical model interpretation: Y i b i N(X i β + Z i b i, Σ i ) b i N(0,D) Since the b i are random, it is most natural to use Bayesian methods Terminology: prior distribution N(0,D) for b i ESALQ Course on Models for Longitudinal and Incomplete Data 53
- Posterior density:
$$f(b_i \mid y_i) \equiv f(b_i \mid Y_i = y_i) = \frac{f(y_i \mid b_i)\, f(b_i)}{\int f(y_i \mid b_i)\, f(b_i)\, db_i}
\propto \cdots \propto \exp\left\{ -\tfrac{1}{2}\left( b_i - D Z_i' W_i (y_i - X_i\beta) \right)' \Lambda_i^{-1} \left( b_i - D Z_i' W_i (y_i - X_i\beta) \right) \right\}$$
for some positive definite matrix Λ_i.
- Posterior distribution: b_i | y_i ~ N( D Z_i' W_i (y_i - X_i β), Λ_i )
- Posterior mean as estimate for b_i:
$$\widehat{b}_i(\theta) = E[\,b_i \mid Y_i = y_i\,] = \int b_i\, f(b_i \mid y_i)\, db_i = D Z_i' W_i(\alpha)\, (y_i - X_i\widehat{\beta})$$
- The parameters in θ are replaced by their ML or REML estimates, obtained from fitting the marginal model
- $\widehat{b}_i = \widehat{b}_i(\widehat{\theta})$ is called the Empirical Bayes estimate of b_i
- Approximate t- and F-tests account for the variability introduced by replacing θ by $\widehat{\theta}$, similar to the tests for fixed effects
61 4.2 Example: Prostate Data We reconsider our linear mixed model: ln(psa ij +1) = β 1 Age i + β 2 C i + β 3 B i + β 4 L i + β 5 M i +(β 6 Age i + β 7 C i + β 8 B i + β 9 L i + β 10 M i ) t ij +(β 11 Age i + β 12 C i + β 13 B i + β 14 L i + β 15 M i ) t 2 ij + b 1i + b 2i t ij + b 3i t 2 ij + ε ij. In SAS the estimates can be obtained from adding the option solution to the random statement: random intercept time time2 / type=un subject=id solution; ods listing exclude solutionr; ods output solutionr=out; ESALQ Course on Models for Longitudinal and Incomplete Data 56
62 The ODS statements are used to write the EB estimates into a SAS output data set, and to prevent SAS from printing them in the output window. In practice, histograms and scatterplots of certain components of b i are used to detect model deviations or subjects with exceptional evolutions over time ESALQ Course on Models for Longitudinal and Incomplete Data 57
64 Strong negative correlations in agreement with correlation matrix corresponding to fitted D: Dcorr = Histograms and scatterplots show outliers Subjects #22, #28, #39, and #45, have highest four slopes for time 2 and smallest four slopes for time, i.e., with the strongest (quadratic) growth. Subjects #22, #28 and #39 have been further examined and have been shown to be metastatic cancer cases which were misclassified as local cancer cases. Subject #45 is the metastatic cancer case with the strongest growth ESALQ Course on Models for Longitudinal and Incomplete Data 59
4.3 Shrinkage Estimators

- Consider the prediction of the evolution of the ith subject:
$$\widehat{Y}_i \equiv X_i\widehat{\beta} + Z_i\widehat{b}_i
= X_i\widehat{\beta} + Z_i D Z_i' V_i^{-1} (y_i - X_i\widehat{\beta})
= \left( I_{n_i} - Z_i D Z_i' V_i^{-1} \right) X_i\widehat{\beta} + Z_i D Z_i' V_i^{-1} y_i
= \Sigma_i V_i^{-1} X_i\widehat{\beta} + \left( I_{n_i} - \Sigma_i V_i^{-1} \right) y_i$$
- Hence, $\widehat{Y}_i$ is a weighted average of the population-averaged profile $X_i\widehat{\beta}$ and the observed data $y_i$, with weights $\Sigma_i V_i^{-1}$ and $I_{n_i} - \Sigma_i V_i^{-1}$, respectively
- Note that $X_i\widehat{\beta}$ gets much weight if the residual variability is large in comparison to the total variability
- This phenomenon is usually called shrinkage: the observed data are shrunk towards the prior average profile $X_i\widehat{\beta}$
- This is also reflected in the fact that, for any linear combination λ'b_i of the random effects, var(λ'$\widehat{b}_i$) ≤ var(λ'b_i)
67 4.4 Example: Prostate Data Comparison of predicted, average, and observed profiles for the subjects #15 and #28, obtained under the reduced model: ESALQ Course on Models for Longitudinal and Incomplete Data 62
68 Illustration of the shrinkage effect : Var( b i ) = , D = ESALQ Course on Models for Longitudinal and Incomplete Data 63
69 Part II Marginal Models for Non-Gaussian Longitudinal Data ESALQ Course on Models for Longitudinal and Incomplete Data 64
70 Chapter 5 The Toenail Data Toenail Dermatophyte Onychomycosis: Common toenail infection, difficult to treat, affecting more than 2% of population. Classical treatments with antifungal compounds need to be administered until the whole nail has grown out healthy. New compounds have been developed which reduce treatment to 3 months Randomized, double-blind, parallel group, multicenter study for the comparison of two such new compounds (A and B) for oral treatment. ESALQ Course on Models for Longitudinal and Incomplete Data 65
71 Research question: Severity relative to treatment of TDO? patients randomized, 36 centers 48 weeks of total follow up (12 months) 12 weeks of treatment (3 months) measurements at months 0, 1, 2, 3, 6, 9, 12. ESALQ Course on Models for Longitudinal and Incomplete Data 66
72 Frequencies at each visit (both treatments): ESALQ Course on Models for Longitudinal and Incomplete Data 67
73 Chapter 6 The Analgesic Trial single-arm trial with 530 patients recruited (491 selected for analysis) analgesic treatment for pain caused by chronic nonmalignant disease treatment was to be administered for 12 months we will focus on Global Satisfaction Assessment (GSA) GSA scale goes from 1=very good to 5=very bad GSA was rated by each subject 4 times during the trial, at months 3, 6, 9, and 12. ESALQ Course on Models for Longitudinal and Incomplete Data 68
74 Research questions: Evolution over time Relation with baseline covariates: age, sex, duration of the pain, type of pain, disease progression, Pain Control Assessment (PCA),... Investigation of dropout Frequencies: GSA Month 3 Month 6 Month 9 Month % % % % % % % % % % % % % % % % % % % 3 1.4% Tot ESALQ Course on Models for Longitudinal and Incomplete Data 69
75 Missingness: Measurement occasion Month 3 Month 6 Month 9 Month 12 Number % Completers O O O O Dropouts O O O M O O M M O M M M Non-monotone missingness O O M O O M O O O M O M O M M O M O O O M O O M M O M O M O M M ESALQ Course on Models for Longitudinal and Incomplete Data 70
76 Chapter 7 Generalized Linear Models The model Maximum likelihood estimation Examples McCullagh and Nelder (1989) ESALQ Course on Models for Longitudinal and Incomplete Data 71
7.1 The Generalized Linear Model

- Suppose a sample Y_1, ..., Y_N of independent observations is available
- All Y_i have densities f(y_i | θ_i, φ) which belong to the exponential family:
$$f(y \mid \theta_i, \phi) = \exp\left\{ \phi^{-1} \left[ y\theta_i - \psi(\theta_i) \right] + c(y, \phi) \right\}$$
- θ_i: the natural parameter
- Linear predictor: θ_i = x_i'β
- φ is the scale parameter (overdispersion parameter)
- ψ(·) is a function generating mean and variance: E(Y) = ψ'(θ), Var(Y) = φ ψ''(θ)
- Note that, in general, the mean μ and the variance are related:
$$\mathrm{Var}(Y) = \phi\, \psi''\!\left[ (\psi')^{-1}(\mu) \right] = \phi\, v(\mu)$$
- The function v(μ) is called the variance function
- The function (ψ')^{-1}, which expresses θ as a function of μ, is called the link function; ψ' is the inverse link function
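These identities follow from differentiating $\int f(y \mid \theta, \phi)\, dy = 1$ with respect to θ; a sketch of this standard argument (not spelled out in the original slides):
$$0 = \frac{\partial}{\partial\theta} \int f(y \mid \theta, \phi)\, dy = \int \phi^{-1}\left[ y - \psi'(\theta) \right] f(y \mid \theta, \phi)\, dy \;\Longrightarrow\; E(Y) = \psi'(\theta),$$
$$0 = \frac{\partial^2}{\partial\theta^2} \int f(y \mid \theta, \phi)\, dy = \int \left\{ \phi^{-2}\left[ y - \psi'(\theta) \right]^2 - \phi^{-1}\psi''(\theta) \right\} f(y \mid \theta, \phi)\, dy \;\Longrightarrow\; \mathrm{Var}(Y) = \phi\, \psi''(\theta).$$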
7.2 Examples

7.2.1 The Normal Model

- Model: Y ~ N(μ, σ²)
- Density function:
$$f(y \mid \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(y - \mu)^2 \right\}
= \exp\left\{ \frac{1}{\sigma^2}\left( y\mu - \frac{\mu^2}{2} \right) - \frac{\ln(2\pi\sigma^2)}{2} - \frac{y^2}{2\sigma^2} \right\}$$
- Exponential family: θ = μ, φ = σ², ψ(θ) = θ²/2, and
$$c(y, \phi) = -\frac{\ln(2\pi\phi)}{2} - \frac{y^2}{2\phi}$$
- Mean and variance function: μ = θ, v(μ) = 1
- Note that, under this normal model, the mean and variance are not related: φ v(μ) = σ²
- The link function is here the identity function: θ = μ
7.2.2 The Bernoulli Model

- Model: Y ~ Bernoulli(π)
- Density function:
$$f(y \mid \theta, \phi) = \pi^y (1 - \pi)^{1-y} = \exp\{ y \ln\pi + (1 - y)\ln(1 - \pi) \}
= \exp\left\{ y \ln\left( \frac{\pi}{1 - \pi} \right) + \ln(1 - \pi) \right\}$$
- Exponential family: θ = ln(π/(1-π)), φ = 1, ψ(θ) = -ln(1-π) = ln(1 + exp θ), c(y, φ) = 0
- Mean and variance function:
$$\mu = \frac{\exp\theta}{1 + \exp\theta} = \pi, \qquad v(\mu) = \frac{\exp\theta}{(1 + \exp\theta)^2} = \pi(1 - \pi)$$
- Note that, under this model, the mean and variance are related: φ v(μ) = μ(1 - μ)
- The link function here is the logit link: θ = ln(μ/(1-μ))
83 7.3 Generalized Linear Models (GLM) Suppose a sample Y 1,...,Y N of independent observations is available All Y i have densities f(y i θ i,φ) which belong to the exponential family In GLM s, it is believed that the differences between the θ i can be explained through a linear function of known covariates: θ i = x i β x i is a vector of p known covariates β is the corresponding vector of unknown regression parameters, to be estimated from the data. ESALQ Course on Models for Longitudinal and Incomplete Data 78
7.4 Maximum Likelihood Estimation

- Log-likelihood:
$$\ell(\beta, \phi) = \frac{1}{\phi} \sum_i \left[ y_i\theta_i - \psi(\theta_i) \right] + \sum_i c(y_i, \phi)$$
- The score equations:
$$S(\beta) = \sum_i \frac{\partial\mu_i}{\partial\beta}\, v_i^{-1} (y_i - \mu_i) = 0$$
- Note that the estimation of β depends on the density only through the means μ_i and the variance functions v_i = v(μ_i)
85 The score equations need to be solved numerically: iterative (re-)weighted least squares Newton-Raphson Fisher scoring Inference for β is based on classical maximum likelihood theory: asymptotic Wald tests likelihood ratio tests score tests ESALQ Course on Models for Longitudinal and Incomplete Data 80
86 7.5 Illustration: The Analgesic Trial Early dropout (did the subject drop out after the first or the second visit)? Binary response PROC GENMOD can fit GLMs in general PROC LOGISTIC can fit models for binary (and ordered) responses SAS code for logit link: proc genmod data=earlydrp; model earlydrp = pca0 weight psychiat physfct / dist=b; run; proc logistic data=earlydrp descending; model earlydrp = pca0 weight psychiat physfct; run; ESALQ Course on Models for Longitudinal and Incomplete Data 81
87 SAS code for probit link: proc genmod data=earlydrp; model earlydrp = pca0 weight psychiat physfct / dist=b link=probit; run; proc logistic data=earlydrp descending; model earlydrp = pca0 weight psychiat physfct / link=probit; run; Selected output: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept PCA WEIGHT PSYCHIAT PHYSFCT Scale NOTE: The scale parameter was held fixed. ESALQ Course on Models for Longitudinal and Incomplete Data 82
88 Chapter 8 Parametric Modeling Families Continuous outcomes Longitudinal generalized linear models Notation ESALQ Course on Models for Longitudinal and Incomplete Data 83
8.1 Continuous Outcomes

- Marginal models: E(Y_ij | x_ij) = x_ij'β
- Random-effects models: E(Y_ij | b_i, x_ij) = x_ij'β + z_ij'b_i
- Transition models: E(Y_ij | Y_{i,j-1}, ..., Y_i1, x_ij) = x_ij'β + α Y_{i,j-1}
8.2 Longitudinal Generalized Linear Models

- Normal case: easy transfer between models
- Non-normal data can also be measured repeatedly (over time)
- Lack of a key distribution such as the normal implies: a lot of modeling options; introduction of non-linearity; no easy transfer between model families

                      cross-sectional     longitudinal
normal outcome        linear model        LMM
non-normal outcome    GLM                 ?
91 8.3 Marginal Models Fully specified models & likelihood estimation E.g., multivariate probit model, Bahadur model, odds ratio model Specification and/or fitting can be cumbersome Non-likelihood alternatives Historically: Empirical generalized least squares (EGLS; Koch et al 1977; SAS CATMOD) Generalized estimating equations Pseudo-likelihood ESALQ Course on Models for Longitudinal and Incomplete Data 86
92 Chapter 9 Generalized Estimating Equations General idea Asymptotic properties Working correlation Special case and application SAS code and output ESALQ Course on Models for Longitudinal and Incomplete Data 87
9.1 General Idea

- Univariate GLM, score function of the form (scalar Y_i):
$$S(\beta) = \sum_{i=1}^{N} \frac{\partial\mu_i}{\partial\beta}\, v_i^{-1} (y_i - \mu_i) = 0, \quad \text{with } v_i = \mathrm{Var}(Y_i)$$
- In a longitudinal setting, Y = (Y_1, ..., Y_N):
$$S(\beta) = \sum_i \sum_j \frac{\partial\mu_{ij}}{\partial\beta}\, v_{ij}^{-1} (y_{ij} - \mu_{ij})
= \sum_{i=1}^{N} D_i' \left[ V_i(\alpha) \right]^{-1} (y_i - \mu_i) = 0$$
where
- D_i is the (n_i x p) matrix with jth row ∂μ_ij/∂β'
- y_i and μ_i are n_i-vectors with elements y_ij and μ_ij
- Is V_i (n_i x n_i) diagonal or more complex?
- V_i = Var(Y_i) is more complex, since it involves a set of nuisance parameters α, determining the covariance structure of Y_i:
$$V_i(\beta, \alpha) = \phi\, A_i^{1/2}(\beta)\, R_i(\alpha)\, A_i^{1/2}(\beta)$$
in which
$$A_i^{1/2}(\beta) = \mathrm{diag}\left( \sqrt{v_{i1}(\mu_{i1}(\beta))}, \ldots, \sqrt{v_{in_i}(\mu_{in_i}(\beta))} \right)$$
and R_i(α) is the correlation matrix of Y_i, parameterized by α.
- Same form as for the full likelihood procedure, but the specification is restricted to the first moment only
- Liang and Zeger (1986)
9.2 Large Sample Properties

- As N → ∞,
$$\sqrt{N}(\widehat{\beta} - \beta) \to N(0, I_0^{-1}), \qquad \text{where } I_0 = \sum_{i=1}^{N} D_i' \left[ V_i(\alpha) \right]^{-1} D_i$$
- (Unrealistic) conditions: α is known; the parametric form for V_i(α) is known
- This is the naive, purely model-based variance estimator
- Solution: working correlation matrix
9.3 Unknown Covariance Structure

Keep the score equations
$$S(\beta) = \sum_{i=1}^{N} D_i' \left[ V_i(\alpha) \right]^{-1} (y_i - \mu_i) = 0$$
BUT:
- suppose V_i(·) is not the true variance of Y_i but only a plausible guess, a so-called working correlation matrix
- specify correlations and not covariances, because the variances follow from the mean structure
- the score equations are solved as before
- The asymptotic normality results change to
$$\sqrt{N}(\widehat{\beta} - \beta) \to N(0, I_0^{-1} I_1 I_0^{-1}),$$
$$I_0 = \sum_{i=1}^{N} D_i' \left[ V_i(\alpha) \right]^{-1} D_i, \qquad
I_1 = \sum_{i=1}^{N} D_i' \left[ V_i(\alpha) \right]^{-1} \mathrm{Var}(Y_i) \left[ V_i(\alpha) \right]^{-1} D_i.$$
- This is the robust, empirically corrected, sandwich variance estimator: I_0 is the bread; I_1 is the filling (ham or cheese)
- Correct guess ⇒ likelihood variance
- The estimators $\widehat{\beta}$ are consistent even if the working correlation matrix is incorrect
- An estimate is found by replacing the unknown variance matrix Var(Y_i) by $(Y_i - \widehat{\mu}_i)(Y_i - \widehat{\mu}_i)'$
- Even if this estimator is bad for Var(Y_i), it leads to a good estimate of I_1, provided that: replication in the data is sufficiently large; the same model for μ_i is fitted to groups of subjects; observation times do not vary too much between subjects
- A bad choice of working correlation matrix can affect the efficiency of $\widehat{\beta}$
- Care is needed with incomplete data (see Part VI)
9.4 The Working Correlation Matrix

$$V_i = V_i(\beta, \alpha, \phi) = \phi\, A_i^{1/2}(\beta)\, R_i(\alpha)\, A_i^{1/2}(\beta)$$
- Variance function: A_i is (n_i x n_i) diagonal with elements v(μ_ij), the known GLM variance function
- Working correlation: R_i(α) possibly depends on a different set of parameters α
- Overdispersion parameter: φ, assumed 1 or estimated from the data
- The unknown quantities are expressed in terms of the Pearson residuals
$$e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij})}}.$$
Note that e_ij depends on β.
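For instance, for the binary outcomes used later (toenail data) the variance function is v(μ) = μ(1-μ), so the Pearson residual specializes to (an illustration, not in the original slides):
$$e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{\mu_{ij}(1 - \mu_{ij})}}.$$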
9.5 Estimation of the Working Correlation

Liang and Zeger (1986) proposed moment-based estimates for the working correlation:

Structure       Corr(Y_ij, Y_ik)           Estimate
Independence    0                          —
Exchangeable    α                          $\widehat{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i(n_i-1)} \sum_{j \neq k} e_{ij} e_{ik}$
AR(1)           Corr(Y_ij, Y_{i,j+1}) = α  $\widehat{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i-1} \sum_{j \leq n_i-1} e_{ij} e_{i,j+1}$
Unstructured    α_jk                       $\widehat{\alpha}_{jk} = \frac{1}{N} \sum_{i=1}^{N} e_{ij} e_{ik}$

Dispersion parameter: $\widehat{\phi} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i} \sum_{j=1}^{n_i} e_{ij}^2$.
9.6 Fitting GEE

The standard procedure, implemented in the SAS procedure GENMOD:
1. Compute initial estimates for β, using a univariate GLM (i.e., assuming independence).
2. Compute the Pearson residuals e_ij, compute estimates for α and φ, and compute R_i(α) and V_i(β, α) = φ A_i^{1/2}(β) R_i(α) A_i^{1/2}(β).
3. Update the estimate for β:
$$\beta^{(t+1)} = \beta^{(t)} + \left( \sum_{i=1}^{N} D_i' V_i^{-1} D_i \right)^{-1} \sum_{i=1}^{N} D_i' V_i^{-1} (y_i - \mu_i).$$
4. Iterate until convergence.

Estimates of precision by means of $I_0^{-1}$ and/or $I_0^{-1} I_1 I_0^{-1}$.
9.7 Special Case: Linear Mixed Models

- Estimate for β:
$$\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i Y_i,$$
with α replaced by its ML or REML estimate
- Conditional on α, $\widehat{\beta}$ has mean
$$E\left[ \widehat{\beta}(\alpha) \right] = \left( \sum_{i=1}^{N} X_i' W_i X_i \right)^{-1} \sum_{i=1}^{N} X_i' W_i X_i\, \beta = \beta,$$
provided that E(Y_i) = X_i β
- Hence, in order for $\widehat{\beta}$ to be unbiased, it is sufficient that the mean of the response is correctly specified
- Conditional on α, $\widehat{\beta}$ has covariance
$$\mathrm{Var}(\widehat{\beta}) = \left( \sum_i X_i' W_i X_i \right)^{-1} \left( \sum_i X_i' W_i\, \mathrm{Var}(Y_i)\, W_i X_i \right) \left( \sum_i X_i' W_i X_i \right)^{-1} = \left( \sum_i X_i' W_i X_i \right)^{-1}$$
- Note that this model-based version assumes that the covariance matrix Var(Y_i) is correctly modelled as V_i = Z_i D Z_i' + Σ_i
- An empirically corrected version is:
$$\mathrm{Var}(\widehat{\beta}) = \underbrace{\left( \sum_i X_i' W_i X_i \right)^{-1}}_{\text{BREAD}} \; \underbrace{\left( \sum_i X_i' W_i\, \mathrm{Var}(Y_i)\, W_i X_i \right)}_{\text{MEAT}} \; \underbrace{\left( \sum_i X_i' W_i X_i \right)^{-1}}_{\text{BREAD}}$$
104 Chapter 10 A Family of GEE Methods Classical approach Prentice s two sets of GEE Linearization-based version GEE2 Alternating logistic regressions ESALQ Course on Models for Longitudinal and Incomplete Data 99
10.1 Prentice's GEE

$$\sum_{i=1}^{N} D_i' V_i^{-1} (Y_i - \mu_i) = 0, \qquad \sum_{i=1}^{N} E_i' W_i^{-1} (Z_i - \delta_i) = 0,$$
where
$$Z_{ijk} = \frac{(Y_{ij} - \mu_{ij})(Y_{ik} - \mu_{ik})}{\sqrt{\mu_{ij}(1 - \mu_{ij})\,\mu_{ik}(1 - \mu_{ik})}}, \qquad \delta_{ijk} = E(Z_{ijk})$$
The joint asymptotic distribution of $\sqrt{N}(\widehat{\beta} - \beta)$ and $\sqrt{N}(\widehat{\alpha} - \alpha)$ is normal, with variance-covariance matrix consistently estimated by
$$N \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix} \begin{pmatrix} A & B' \\ 0 & C \end{pmatrix}$$
where
$$A = \left( \sum_{i=1}^{N} D_i' V_i^{-1} D_i \right)^{-1}, \qquad
B = \left( \sum_{i=1}^{N} E_i' W_i^{-1} E_i \right)^{-1} \left( \sum_{i=1}^{N} E_i' W_i^{-1} \frac{\partial Z_i}{\partial\beta} \right) \left( \sum_{i=1}^{N} D_i' V_i^{-1} D_i \right)^{-1}, \qquad
C = \left( \sum_{i=1}^{N} E_i' W_i^{-1} E_i \right)^{-1},$$
$$\Lambda_{11} = \sum_{i=1}^{N} D_i' V_i^{-1}\, \mathrm{Cov}(Y_i)\, V_i^{-1} D_i, \qquad
\Lambda_{12} = \sum_{i=1}^{N} D_i' V_i^{-1}\, \mathrm{Cov}(Y_i, Z_i)\, W_i^{-1} E_i, \qquad
\Lambda_{21} = \Lambda_{12}', \qquad
\Lambda_{22} = \sum_{i=1}^{N} E_i' W_i^{-1}\, \mathrm{Cov}(Z_i)\, W_i^{-1} E_i,$$
and

Statistic        Estimator
Var(Y_i)         (Y_i - μ_i)(Y_i - μ_i)'
Cov(Y_i, Z_i)    (Y_i - μ_i)(Z_i - δ_i)'
Var(Z_i)         (Z_i - δ_i)(Z_i - δ_i)'
10.2 GEE Based on Linearization

Model formulation:
- The previous versions of GEE are formulated directly in terms of the binary outcomes
- This approach is based on a linearization:
$$y_i = \mu_i + \varepsilon_i, \qquad \eta_i = g(\mu_i), \qquad \eta_i = X_i\beta, \qquad \mathrm{Var}(y_i) = \mathrm{Var}(\varepsilon_i) = \Sigma_i$$
- η_i is a vector of linear predictors and g(·) is the (vector) link function
Estimation (Nelder and Wedderburn 1972):
- Solve iteratively:
$$\sum_{i=1}^{N} X_i' W_i X_i\, \beta = \sum_{i=1}^{N} X_i' W_i\, y_i^*,$$
where
$$W_i = F_i' \Sigma_i^{-1} F_i, \qquad y_i^* = \widehat{\eta}_i + (y_i - \widehat{\mu}_i) F_i^{-1}, \qquad F_i = \frac{\partial\mu_i}{\partial\eta_i}, \qquad \Sigma_i = \mathrm{Var}(\varepsilon_i), \qquad \mu_i = E(y_i).$$
- Remarks: $y_i^*$ is called the working variable or pseudo data; this is the basis for the SAS macro and procedure GLIMMIX; for linear models, F_i = I_{n_i} and standard linear regression follows
The variance structure:
$$\Sigma_i = \phi\, A_i^{1/2}(\beta)\, R_i(\alpha)\, A_i^{1/2}(\beta)$$
- φ is a scale (overdispersion) parameter
- A_i = v(μ_i), expressing the mean-variance relation (this is a function of β)
- R_i(α) describes the correlation structure: if independence is assumed, then R_i(α) = I_{n_i}; other structures, such as compound symmetry, AR(1), ..., can be assumed as well
110 10.3 GEE2 Model: Marginal mean structure Pairwise association: Odds ratios Correlations Working assumptions: Third and fourth moments Estimation: Second-order estimating equations Likelihood (assuming 3rd and 4th moments are correctly specified) ESALQ Course on Models for Longitudinal and Incomplete Data 105
10.4 Alternating Logistic Regression

- Diggle, Heagerty, Liang, and Zeger (2002) and Molenberghs and Verbeke (2005)
- When marginal odds ratios are used to model the association, α can be estimated using ALR, which is: almost as efficient as GEE2; almost as easy (computationally) as GEE1
- With μ_ijk as before and α_ijk = ln(ψ_ijk) the marginal log odds ratio:
$$\mathrm{logit}\, \Pr(Y_{ij} = 1 \mid x_{ij}) = x_{ij}'\beta$$
$$\mathrm{logit}\, \Pr(Y_{ij} = 1 \mid Y_{ik} = y_{ik}) = \alpha_{ijk}\, y_{ik} + \ln\left( \frac{\mu_{ij} - \mu_{ijk}}{1 - \mu_{ij} - \mu_{ik} + \mu_{ijk}} \right)$$
112 α ijk can be modelled in terms of predictors the second term is treated as an offset the estimating equations for β and α are solved in turn, and the alternating between both sets is repeated until convergence. this is needed because the offset clearly depends on β. ESALQ Course on Models for Longitudinal and Incomplete Data 107
10.5 Application to the Toenail Data

The model. Consider the model:
$$Y_{ij} \sim \mathrm{Bernoulli}(\mu_{ij}), \qquad \log\left( \frac{\mu_{ij}}{1 - \mu_{ij}} \right) = \beta_0 + \beta_1 T_i + \beta_2 t_{ij} + \beta_3 T_i\, t_{ij}$$
- Y_ij: severe infection (yes/no) at occasion j for patient i
- t_ij: measurement time for occasion j
- T_i: treatment group
114 Standard GEE SAS Code: proc genmod data=test descending; class idnum timeclss; model onyresp = treatn time treatn*time / dist=binomial; repeated subject=idnum / withinsubject=timeclss type=exch covb corrw modelse; run; SAS statements: The REPEATED statements defines the GEE character of the model. type= : working correlation specification (UN, AR(1), EXCH, IND,...) modelse : model-based s.e. s on top of default empirically corrected s.e. s corrw : printout of working correlation matrix withinsubject= : specification of the ordering within subjects ESALQ Course on Models for Longitudinal and Incomplete Data 109
115 Selected output: Regression parameters: Analysis Of Initial Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Intercept treatn time treatn*time Scale Estimates from fitting the model, ignoring the correlation structure, i.e., from fitting a classical GLM to the data, using proc GENMOD. The reported log-likelihood also corresponds to this model, and therefore should not be interpreted. The reported estimates are used as starting values in the iterative estimation procedure for fitting the GEE s. ESALQ Course on Models for Longitudinal and Incomplete Data 110
116 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > Z Intercept treatn time <.0001 treatn*time Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > Z Intercept <.0001 treatn time <.0001 treatn*time The working correlation: Exchangeable Working Correlation Correlation ESALQ Course on Models for Longitudinal and Incomplete Data 111
Alternating logistic regression: replace the working-correlation option type=exch by logor=exch in the REPEATED statement (a sketch follows below). Note that α now is a genuine parameter.
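A hedged sketch of the corresponding call, adapting the earlier PROC GENMOD program (only the REPEATED statement changes):

proc genmod data=test descending;
  class idnum timeclss;
  model onyresp = treatn time treatn*time / dist=binomial;
  /* logor=exch requests an exchangeable marginal log odds ratio (ALR) */
  repeated subject=idnum / withinsubject=timeclss logor=exch;
run;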
118 Selected output: Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > Z Intercept treatn time <.0001 treatn*time Alpha <.0001 Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > Z Intercept treatn time <.0001 treatn*time ESALQ Course on Models for Longitudinal and Incomplete Data 113
119 Linearization Based Method GLIMMIX macro: %glimmix(data=test, procopt=%str(method=ml empirical), stmts=%str( class idnum timeclss; model onyresp = treatn time treatn*time / solution; repeated timeclss / subject=idnum type=cs rcorr; ), error=binomial, link=logit); GLIMMIX procedure: proc glimmix data=test method=rspl empirical; class idnum; model onyresp (event= 1 ) = treatn time treatn*time / dist=binary solution; random _residual_ / subject=idnum type=cs; run; ESALQ Course on Models for Longitudinal and Incomplete Data 114
120 Both produce the same results The GLIMMIX macro is a MIXED core, with GLM-type surrounding statements The GLIMMIX procedure does not call MIXED, it has its own engine PROC GLIMMIX combines elements of MIXED and of GENMOD RANDOM residual is the PROC GLIMMIX way to specify residual correlation ESALQ Course on Models for Longitudinal and Incomplete Data 115
121 Results of Models Fitted to Toenail Data Effect Par. IND EXCH UN GEE1 Int. β (0.109;0.171) (0.134;0.173) (0.166;0.173) T i β (0.157;0.251) 0.012(0.187;0.261) 0.072(0.235;0.246) t ij β (0.025;0.030) (0.021;0.031) (0.028;0.029) T i t ij β (0.039;0.055) (0.036;0.057) (0.047;0.052) ALR Int. β (0.157;0.169) T i β (0.222;0.243) t ij β (0.023;0.030) T i t ij β (0.039;0.052) Ass. α 3.222( ;0.291) Linearization based method Int. β (0.112;0.171) (0.142;0.174) (0.171;0.172) T i β (0.160;0.251) 0.011(0.196;0.262) 0.036(0.242;0.242) t ij β (0.025;0.030) (0.022;0.031) (0.038;0.034) T i t ij β (0.040:0.055) (0.038;0.057) (0.058;0.058) estimate (model-based s.e.; empirical s.e.) ESALQ Course on Models for Longitudinal and Incomplete Data 116
122 Discussion GEE1: All empirical standard errors are correct, but the efficiency is higher for the more complex working correlation structure, as seen in p-values for T i t ij effect: Structure p-value IND EXCH UN Thus, opting for reasonably adequate correlation assumptions still pays off, in spite of the fact that all are consistent and asymptotically normal Similar conclusions for linearization-based method ESALQ Course on Models for Longitudinal and Incomplete Data 117
123 Model-based s.e. and empirically corrected s.e. in reasonable agreement for UN Typically, the model-based standard errors are much too small as they are based on the assumption that all observations in the data set are independent, hereby overestimating the amount of available information, hence also overestimating the precision of the estimates. ALR: similar inferences but now also α part of the inferences ESALQ Course on Models for Longitudinal and Incomplete Data 118
124 Part III Generalized Linear Mixed Models for Non-Gaussian Longitudinal Data ESALQ Course on Models for Longitudinal and Incomplete Data 119
125 Chapter 11 Generalized Linear Mixed Models (GLMM) Introduction: LMM Revisited Generalized Linear Mixed Models (GLMM) Fitting Algorithms Example ESALQ Course on Models for Longitudinal and Incomplete Data 120
126 11.1 Introduction: LMM Revisited We re-consider the linear mixed model: Y i b i N(X i β + Z i b i, Σ i ), b i N(0,D) The implied marginal model equals Y i N(X i β,z i DZ i +Σ i ) Hence, even under conditional independence, i.e., all Σ i equal to σ 2 I ni,amarginal association structure is implied through the random effects. The same ideas can now be applied in the context of GLM s to model association between discrete repeated measures. ESALQ Course on Models for Longitudinal and Incomplete Data 121
11.2 Generalized Linear Mixed Models (GLMM)

- Given a vector b_i of random effects for cluster i, it is assumed that all responses Y_ij are independent, with density
$$f(y_{ij} \mid \theta_{ij}, \phi) = \exp\left\{ \phi^{-1} \left[ y_{ij}\theta_{ij} - \psi(\theta_{ij}) \right] + c(y_{ij}, \phi) \right\}$$
- θ_ij is now modelled as θ_ij = x_ij'β + z_ij'b_i
- As before, it is assumed that b_i ~ N(0, D)
- Let f_ij(y_ij | b_i, β, φ) denote the conditional density of Y_ij given b_i; the conditional density of Y_i then equals
$$f_i(y_i \mid b_i, \beta, \phi) = \prod_{j=1}^{n_i} f_{ij}(y_{ij} \mid b_i, \beta, \phi)$$
- The marginal distribution of Y_i is given by
$$f_i(y_i \mid \beta, D, \phi) = \int f_i(y_i \mid b_i, \beta, \phi)\, f(b_i \mid D)\, db_i
= \int \prod_{j=1}^{n_i} f_{ij}(y_{ij} \mid b_i, \beta, \phi)\, f(b_i \mid D)\, db_i,$$
where f(b_i | D) is the density of the N(0, D) distribution
- The likelihood function for β, D, and φ now equals
$$L(\beta, D, \phi) = \prod_{i=1}^{N} f_i(y_i \mid \beta, D, \phi)
= \prod_{i=1}^{N} \int \prod_{j=1}^{n_i} f_{ij}(y_{ij} \mid b_i, \beta, \phi)\, f(b_i \mid D)\, db_i$$
129 Under the normal linear model, the integral can be worked out analytically. In general, approximations are required: Approximation of integrand Approximation of data Approximation of integral Predictions of random effects can be based on the posterior distribution f(b i Y i = y i ) Empirical Bayes (EB) estimate : Posterior mode, with unknown parameters replaced by their MLE ESALQ Course on Models for Longitudinal and Incomplete Data 124
11.3 Laplace Approximation of the Integrand

- Integrals in L(β, D, φ) can be written in the form $I = \int e^{Q(b)}\, db$
- A second-order Taylor expansion of Q(b) around its mode $\widehat{b}$ yields
$$Q(b) \approx Q(\widehat{b}) + \tfrac{1}{2}(b - \widehat{b})'\, Q''(\widehat{b})\, (b - \widehat{b})$$
- The quadratic term leads to a re-scaled normal density; hence,
$$I \approx (2\pi)^{q/2}\, \left| -Q''(\widehat{b}) \right|^{-1/2} e^{Q(\widehat{b})}$$
- Exact approximation in the case of normal kernels
- Good approximation in the case of many repeated measures per subject
11.4 Approximation of the Data

General idea:
- Re-write the GLMM as:
$$Y_{ij} = \mu_{ij} + \varepsilon_{ij} = h(x_{ij}'\beta + z_{ij}'b_i) + \varepsilon_{ij},$$
with variance for the errors equal to Var(Y_ij | b_i) = φ v(μ_ij)
- Linear Taylor expansion for μ_ij: penalized quasi-likelihood (PQL): around the current $\widehat{\beta}$ and $\widehat{b}_i$; marginal quasi-likelihood (MQL): around the current $\widehat{\beta}$ and b_i = 0
Penalized quasi-likelihood (PQL): linear Taylor expansion around the current $\widehat{\beta}$ and $\widehat{b}_i$:
$$Y_{ij} \approx h(x_{ij}'\widehat{\beta} + z_{ij}'\widehat{b}_i)
+ h'(x_{ij}'\widehat{\beta} + z_{ij}'\widehat{b}_i)\, x_{ij}'(\beta - \widehat{\beta})
+ h'(x_{ij}'\widehat{\beta} + z_{ij}'\widehat{b}_i)\, z_{ij}'(b_i - \widehat{b}_i) + \varepsilon_{ij}
\equiv \widehat{\mu}_{ij} + v(\widehat{\mu}_{ij})\, x_{ij}'(\beta - \widehat{\beta}) + v(\widehat{\mu}_{ij})\, z_{ij}'(b_i - \widehat{b}_i) + \varepsilon_{ij}$$
In vector notation: $Y_i \approx \widehat{\mu}_i + \widehat{V}_i X_i(\beta - \widehat{\beta}) + \widehat{V}_i Z_i(b_i - \widehat{b}_i) + \varepsilon_i$.
Re-ordering terms yields:
$$Y_i^* \equiv \widehat{V}_i^{-1}(Y_i - \widehat{\mu}_i) + X_i\widehat{\beta} + Z_i\widehat{b}_i \approx X_i\beta + Z_i b_i + \varepsilon_i^*$$
Model fitting proceeds by iterating between updating the pseudo responses $Y_i^*$ and fitting the above linear mixed model to them.
Marginal quasi-likelihood (MQL): linear Taylor expansion around the current $\widehat{\beta}$ and b_i = 0:
$$Y_{ij} \approx h(x_{ij}'\widehat{\beta})
+ h'(x_{ij}'\widehat{\beta})\, x_{ij}'(\beta - \widehat{\beta})
+ h'(x_{ij}'\widehat{\beta})\, z_{ij}'b_i + \varepsilon_{ij}
\equiv \widehat{\mu}_{ij} + v(\widehat{\mu}_{ij})\, x_{ij}'(\beta - \widehat{\beta}) + v(\widehat{\mu}_{ij})\, z_{ij}'b_i + \varepsilon_{ij}$$
In vector notation: $Y_i \approx \widehat{\mu}_i + \widehat{V}_i X_i(\beta - \widehat{\beta}) + \widehat{V}_i Z_i b_i + \varepsilon_i$.
Re-ordering terms yields:
$$Y_i^* \equiv \widehat{V}_i^{-1}(Y_i - \widehat{\mu}_i) + X_i\widehat{\beta} \approx X_i\beta + Z_i b_i + \varepsilon_i^*$$
Model fitting proceeds by iterating between updating the pseudo responses $Y_i^*$ and fitting the above linear mixed model to them.
134 PQL versus MQL MQL only performs reasonably well if random-effects variance is (very) small Both perform bad for binary outcomes with few repeated measurements per cluster With increasing number of measurements per subject: MQL remains biased PQL consistent Improvements possible with higher-order Taylor expansions ESALQ Course on Models for Longitudinal and Incomplete Data 129
11.5 Approximation of the Integral

- The likelihood contribution of every subject is of the form $\int f(z)\, \phi(z)\, dz$, where φ(z) is the density of the (multivariate) normal distribution
- Gaussian quadrature methods replace the integral by a weighted sum:
$$\int f(z)\, \phi(z)\, dz \approx \sum_{q=1}^{Q} w_q\, f(z_q)$$
- Q is the order of the approximation; the higher Q, the more accurate the approximation will be
- The nodes (or quadrature points) z_q are solutions to the Qth-order Hermite polynomial
- The w_q are well-chosen weights
- The nodes z_q and weights w_q are reported in tables; alternatively, an algorithm is available for calculating all z_q and w_q for any value of Q
- With classical Gaussian quadrature, the nodes and weights are fixed, independent of f(z)φ(z)
- With adaptive Gaussian quadrature, the nodes and weights are adapted to the support of f(z)φ(z)
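As a small worked case (not from the original slides): for Q = 2, the rule that is exact for all polynomials up to degree 2Q - 1 = 3 with respect to the standard normal density has nodes ±1 and weights 1/2, so that
$$\int f(z)\, \phi(z)\, dz \approx \tfrac{1}{2} f(-1) + \tfrac{1}{2} f(1),$$
which reproduces the mean and variance of the standard normal exactly ($\tfrac{1}{2}(-1) + \tfrac{1}{2}(1) = 0$ and $\tfrac{1}{2}(1) + \tfrac{1}{2}(1) = 1$).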
137 Graphically (Q =10): ESALQ Course on Models for Longitudinal and Incomplete Data 132
- Typically, adaptive Gaussian quadrature needs (much) fewer quadrature points than classical Gaussian quadrature
- On the other hand, adaptive Gaussian quadrature is much more time consuming
- Adaptive Gaussian quadrature of order one is equivalent to the Laplace approximation
11.6 Example: Toenail Data

- Y_ij is the binary severity indicator for subject i at visit j
- Model:
$$Y_{ij} \mid b_i \sim \mathrm{Bernoulli}(\pi_{ij}), \qquad \log\left( \frac{\pi_{ij}}{1 - \pi_{ij}} \right) = \beta_0 + b_i + \beta_1 T_i + \beta_2 t_{ij} + \beta_3 T_i\, t_{ij}$$
- Notation: T_i: treatment indicator for subject i; t_ij: time point at which the jth measurement is taken for the ith subject
- Adaptive as well as non-adaptive Gaussian quadrature, for various Q
140 Results: Gaussian quadrature Q =3 Q =5 Q =10 Q =20 Q =50 β (0.31) (0.39) (0.32) (0.69) (0.43) β (0.38) 0.19 (0.36) 0.47 (0.36) (0.80) (0.57) β (0.03) (0.04) (0.05) (0.05) (0.05) β (0.05) (0.07) (0.07) (0.07) (0.07) σ 2.26 (0.12) 3.09 (0.21) 4.53 (0.39) 3.86 (0.33) 4.04 (0.39) 2l Adaptive Gaussian quadrature Q =3 Q =5 Q =10 Q =20 Q =50 β (0.59) (0.40) (0.45) (0.43) (0.44) β (0.64) (0.54) (0.59) (0.59) (0.59) β (0.05) (0.04) (0.05) (0.05) (0.05) β (0.07) (0.07) (0.07) (0.07) (0.07) σ 4.51 (0.62) 3.70 (0.34) 4.07 (0.43) 4.01 (0.38) 4.02 (0.38) 2l ESALQ Course on Models for Longitudinal and Incomplete Data 135
141 Conclusions: (Log-)likelihoods are not comparable Different Q can lead to considerable differences in estimates and standard errors For example, using non-adaptive quadrature, with Q =3, we found no difference in time effect between both treatment groups (t = 0.09/0.05,p=0.0833). Using adaptive quadrature, with Q =50, we find a significant interaction between the time effect and the treatment (t = 0.16/0.07,p=0.0255). Assuming that Q =50is sufficient, the final results are well approximated with smaller Q under adaptive quadrature, but not under non-adaptive quadrature. ESALQ Course on Models for Longitudinal and Incomplete Data 136
142 Comparison of fitting algorithms: Adaptive Gaussian Quadrature, Q =50 MQL and PQL Summary of results: Parameter QUAD PQL MQL Intercept group A 1.63 (0.44) 0.72 (0.24) 0.56 (0.17) Intercept group B 1.75 (0.45) 0.72 (0.24) 0.53 (0.17) Slope group A 0.40 (0.05) 0.29 (0.03) 0.17 (0.02) Slope group B 0.57 (0.06) 0.40 (0.04) 0.26 (0.03) Var. random intercepts (τ 2 ) (3.02) 4.71 (0.60) 2.49 (0.29) Severe differences between QUAD (gold standard?) and MQL/PQL. MQL/PQL may yield (very) biased results, especially for binary data. ESALQ Course on Models for Longitudinal and Incomplete Data 137
143 Chapter 12 Fitting GLMM s in SAS Proc GLIMMIX for PQL and MQL Proc NLMIXED for Gaussian quadrature ESALQ Course on Models for Longitudinal and Incomplete Data 138
144 12.1 Procedure GLIMMIX for PQL and MQL Re-consider logistic model with random intercepts for toenail data SAS code (PQL): proc glimmix data=test method=rspl ; class idnum; model onyresp (event= 1 ) = treatn time treatn*time / dist=binary solution; random intercept / subject=idnum; run; MQL obtained with option method=rmpl Inclusion of random slopes: random intercept time / subject=idnum type=un; ESALQ Course on Models for Longitudinal and Incomplete Data 139
145 12.2 Procedure NLMIXED for Gaussian Quadrature Re-consider logistic model with random intercepts for toenail data SAS program (non-adaptive, Q =3): proc nlmixed data=test noad qpoints=3; parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5 sigma=3.9; teta = beta0 + b + beta1*treatn + beta2*time + beta3*timetr; expteta = exp(teta); p = expteta/(1+expteta); model onyresp ~ binary(p); random b ~ normal(0,sigma**2) subject=idnum; run; Adaptive Gaussian quadrature obtained by omitting option noad ESALQ Course on Models for Longitudinal and Incomplete Data 140
146 If the option qpoints= is omitted, the optimal value of Q is searched for automatically. Good starting values are needed! The inclusion of random slopes can be specified as follows:

proc nlmixed data=test noad qpoints=3;
   parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5 d11=3.9 d12=0 d22=0.1;
   teta = beta0 + b1 + beta1*treatn + beta2*time + b2*time + beta3*timetr;
   expteta = exp(teta);
   p = expteta/(1+expteta);
   model onyresp ~ binary(p);
   random b1 b2 ~ normal([0, 0], [d11, d12, d22]) subject=idnum;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 141
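For completeness, a sketch of the random-intercepts program with adaptive quadrature and automatic selection of Q; the noad and qpoints= options are simply left out, everything else is as before:

proc nlmixed data=test;
   parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5 sigma=3.9;
   teta = beta0 + b + beta1*treatn + beta2*time + beta3*timetr;
   expteta = exp(teta);
   p = expteta/(1+expteta);
   model onyresp ~ binary(p);
   random b ~ normal(0,sigma**2) subject=idnum;
run;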
147 Part IV Marginal Versus Random-effects Models ESALQ Course on Models for Longitudinal and Incomplete Data 142
148 Chapter 13 Marginal Versus Random-effects Models Interpretation of GLMM parameters Marginalization of GLMM Example: Toenail data ESALQ Course on Models for Longitudinal and Incomplete Data 143
149 13.1 Interpretation of GLMM Parameters: Toenail Data We compare our GLMM results for the toenail data with those from fitting GEEs (unstructured working correlation): GLMM GEE Parameter Estimate (s.e.) Estimate (s.e.) Intercept group A (0.4356) (0.1656) Intercept group B (0.4478) (0.1671) Slope group A (0.0460) (0.0277) Slope group B (0.0601) (0.0380) ESALQ Course on Models for Longitudinal and Incomplete Data 144
150 The strong differences can be explained as follows: Consider the following GLMM: Y_ij | b_i ~ Bernoulli(π_ij), log[π_ij/(1 − π_ij)] = β_0 + b_i + β_1 t_ij The conditional means E(Y_ij | b_i), as functions of t_ij, are given by E(Y_ij | b_i) = exp(β_0 + b_i + β_1 t_ij) / [1 + exp(β_0 + b_i + β_1 t_ij)] ESALQ Course on Models for Longitudinal and Incomplete Data 145
151 The marginal average evolution is now obtained from averaging over the random effects: E(Y_ij) = E[E(Y_ij | b_i)] = E{ exp(β_0 + b_i + β_1 t_ij) / [1 + exp(β_0 + b_i + β_1 t_ij)] } ≠ exp(β_0 + β_1 t_ij) / [1 + exp(β_0 + β_1 t_ij)] ESALQ Course on Models for Longitudinal and Incomplete Data 146
152 Hence, the parameter vector β in the GEE model needs to be interpreted completely differently from the parameter vector β in the GLMM: GEE: marginal interpretation GLMM: conditional interpretation, conditionally upon the level of the random effects In general, the model for the marginal average is not of the same parametric form as the conditional average in the GLMM. For logistic mixed models with normally distributed random intercepts, it can be shown that the marginal model can be well approximated by again a logistic model, but with parameters approximately satisfying β^RE / β^M = √(c² σ² + 1) > 1, with σ² = variance of the random intercepts and c = 16√3/(15π) ESALQ Course on Models for Longitudinal and Incomplete Data 147
153 For the toenail application, σ was estimated as , such that the ratio √(c²σ̂² + 1) equals . The ratios between the GLMM and GEE estimates are: GLMM GEE Parameter Estimate (s.e.) Estimate (s.e.) Ratio Intercept group A (0.4356) (0.1656) Intercept group B (0.4478) (0.1671) Slope group A (0.0460) (0.0277) Slope group B (0.0601) (0.0380) Note that this problem does not occur in linear mixed models: Conditional mean: E(Y_i | b_i) = X_i β + Z_i b_i Specifically: E(Y_i | b_i = 0) = X_i β Marginal mean: E(Y_i) = X_i β ESALQ Course on Models for Longitudinal and Incomplete Data 148
154 The problem arises from the fact that, in general, E[g(Y)] ≠ g[E(Y)] So, whenever the random effects enter the conditional mean in a non-linear way, the regression parameters in the marginal model need to be interpreted differently from the regression parameters in the mixed model. In practice, the marginal mean can be derived from the GLMM output by integrating out the random effects. This can be done numerically via Gaussian quadrature, or based on sampling methods. ESALQ Course on Models for Longitudinal and Incomplete Data 149
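A minimal sketch of the sampling-based approach for a random-intercepts logistic model; the parameter values below are hypothetical placeholders that would be replaced by the GLMM estimates:

data marginal;
   b0 = -1.6; b1 = -0.4; sigma = 4.0;   /* hypothetical GLMM estimates */
   do t = 0 to 12 by 0.25;              /* grid of time points */
      pmarg = 0;
      do s = 1 to 10000;                /* sample random intercepts b ~ N(0,sigma**2) */
         b = sigma*rannor(123);
         pmarg = pmarg + exp(b0 + b + b1*t)/(1 + exp(b0 + b + b1*t));
      end;
      pmarg = pmarg/10000;              /* Monte Carlo estimate of the marginal P(Y=1) at time t */
      output;
   end;
   keep t pmarg;
run;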
155 13.2 Marginalization of GLMM: Toenail Data As an example, we plot the average evolutions based on the GLMM output obtained in the toenail example: P (Y ij =1) = E E exp( b i t ij ), 1+exp( b i t ij ) exp( b i t ij ), 1+exp( b i t ij ) ESALQ Course on Models for Longitudinal and Incomplete Data 150
156 Average evolutions obtained from the GEE analyses: P (Y ij =1) = exp( t ij ) 1+exp( t ij ) exp( t ij ) 1+exp( t ij ) ESALQ Course on Models for Longitudinal and Incomplete Data 151
157 In a GLMM context, rather than plotting the marginal averages, one can also plot the profile for an average subject, i.e., a subject with random effect b i = 0: P (Y ij =1 b i =0) = exp( t ij ) 1+exp( t ij ) exp( t ij ) 1+exp( t ij ) ESALQ Course on Models for Longitudinal and Incomplete Data 152
158 13.3 Example: Toenail Data Revisited Overview of all analyses on toenail data: Parameter QUAD PQL MQL GEE Intercept group A 1.63 (0.44) 0.72 (0.24) 0.56 (0.17) 0.72 (0.17) Intercept group B 1.75 (0.45) 0.72 (0.24) 0.53 (0.17) 0.65 (0.17) Slope group A 0.40 (0.05) 0.29 (0.03) 0.17 (0.02) 0.14 (0.03) Slope group B 0.57 (0.06) 0.40 (0.04) 0.26 (0.03) 0.25 (0.04) Var. random intercepts (τ 2 ) (3.02) 4.71 (0.60) 2.49 (0.29) Conclusion: GEE < MQL < PQL < QUAD ESALQ Course on Models for Longitudinal and Incomplete Data 153
159 Part V Non-linear Models ESALQ Course on Models for Longitudinal and Incomplete Data 154
160 Chapter 14 Non-Linear Mixed Models From linear to non-linear models Orange Tree Example Song Bird Example ESALQ Course on Models for Longitudinal and Incomplete Data 155
161 14.1 From Linear to Non-linear Models We have studied: linear models and generalized linear models, for Gaussian data and for non-Gaussian data, for cross-sectional (univariate) data and for correlated data (clustered data, multivariate data, longitudinal data). In all cases, a certain amount of linearity is preserved: E.g., in generalized linear models, linearity operates at the level of the linear predictor, describing a transformation of the mean (logit, log, probit,...). This implies that all predictor functions are linear in the parameters: β_0 + β_1 x_1i + ... + β_p x_pi ESALQ Course on Models for Longitudinal and Incomplete Data 156
162 This may still be considered a limiting factor, and non-linearity in the parametric functions may be desirable. Examples: Certain growth curves, especially when they include both a growth spurt as well as asymptotic behavior towards the end of growth Dose-response modeling Pharmacokinetic and pharmacodynamic models ESALQ Course on Models for Longitudinal and Incomplete Data 157
163 14.2 LMM and GLMM In linear mixed models, the mean is modeled as a linear function of regression parameters and random effects: E(Y_ij | b_i) = x_ij'β + z_ij'b_i In generalized linear mixed models, apart from a link function, the mean is again modeled as a linear function of regression parameters and random effects: E(Y_ij | b_i) = h(x_ij'β + z_ij'b_i) In some applications, models are needed in which the mean is no longer modeled as a function of a linear predictor x_ij'β + z_ij'b_i. These are called non-linear mixed models. ESALQ Course on Models for Longitudinal and Incomplete Data 158
164 14.3 Non-linear Mixed Models In non-linear mixed models, it is assumed that the conditional distribution of Y_ij, given b_i, belongs to the exponential family (Normal, Binomial, Poisson,...). The mean is modeled as: E(Y_ij | b_i) = h(x_ij, β, z_ij, b_i) As before, the random effects are assumed to be normally distributed, with mean 0 and covariance D. Let f_ij(y_ij | b_i, β, φ) be the conditional density of Y_ij given b_i, and let f(b_i | D) be the density of the N(0, D) distribution. ESALQ Course on Models for Longitudinal and Incomplete Data 159
165 Under conditional independence, the marginal likelihood equals L(β, D, φ) = ∏_{i=1}^N f_i(y_i | β, D, φ) = ∏_{i=1}^N ∫ ∏_{j=1}^{n_i} f_ij(y_ij | b_i, β, φ) f(b_i | D) db_i The above likelihood is of the same form as the likelihood obtained earlier for a generalized linear mixed model. As before, numerical quadrature is used to approximate the integral. Non-linear mixed models can also be fitted within the SAS procedure NLMIXED. ESALQ Course on Models for Longitudinal and Incomplete Data 160
166 14.4 Example: Orange Trees We consider an experiment in which the trunk circumference (in mm) is measured for 5 orange trees, on 7 different occasions. Data: Response Day Tree 1 Tree 2 Tree 3 Tree 4 Tree ESALQ Course on Models for Longitudinal and Incomplete Data 161
167 The following non-linear mixed model has been proposed: Y_ij = (β_1 + b_i) / (1 + exp[−(t_ij − β_2)/β_3]) + ε_ij, with b_i ~ N(0, σ_b²) and ε_ij ~ N(0, σ²) ESALQ Course on Models for Longitudinal and Incomplete Data 162
168 In SAS PROC NLMIXED, the model can be fitted as follows:

proc nlmixed data=tree;
   parms beta1=190 beta2=700 beta3=350 sigmab=10 sigma=10;
   num = b;
   ex = exp(-(day-beta2)/beta3);
   den = 1 + ex;
   model y ~ normal(num/den,sigma**2);
   random b ~ normal(beta1,sigmab**2) subject=tree;
run;

An equivalent program is given by

proc nlmixed data=tree;
   parms beta1=190 beta2=700 beta3=350 sigmab=10 sigma=10;
   num = b + beta1;
   ex = exp(-(day-beta2)/beta3);
   den = 1 + ex;
   model y ~ normal(num/den,sigma**2);
   random b ~ normal(0,sigmab**2) subject=tree;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 163
169 Selected output: Specifications Data Set Dependent Variable Distribution for Dependent Variable Random Effects Distribution for Random Effects Subject Variable Optimization Technique Integration Method WORK.TREE y Normal b Normal tree Dual Quasi-Newton Adaptive Gaussian Quadrature Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > t beta beta <.0001 beta sigmab sigma ESALQ Course on Models for Longitudinal and Incomplete Data 164
170 Note that the number of quadrature points was selected adaptively to be equal to only one. Refitting the model with 50 quadrature points yielded identical results. Empirical Bayes estimates and subject-specific predictions can be obtained as follows:

proc nlmixed data=tree;
   parms beta1=190 beta2=700 beta3=350 sigmab=10 sigma=10;
   num = b + beta1;
   ex = exp(-(day-beta2)/beta3);
   den = 1 + ex;
   ratio = num/den;
   model y ~ normal(ratio,sigma**2);
   random b ~ normal(0,sigmab**2) subject=tree out=eb;
   predict ratio out=ratio;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 165
171 We can now compare the observed data to the subject-specific predictions ŷ_ij = (β̂_1 + b̂_i) / (1 + exp[−(t_ij − β̂_2)/β̂_3]) ESALQ Course on Models for Longitudinal and Incomplete Data 166
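One way to produce such a comparison from the out=ratio dataset created above is sketched below, using PROC SGPLOT; the PREDICT statement stores the subject-specific predictions in the variable Pred, and any other plotting tool would do equally well:

proc sort data=ratio;
   by tree day;
run;

proc sgplot data=ratio;
   scatter x=day y=y / group=tree;      /* observed trunk circumferences  */
   series  x=day y=pred / group=tree;   /* subject-specific fitted curves */
   yaxis label="Trunk circumference (mm)";
run;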
172 14.5 Example: Song Birds At the University of Antwerp, a novel in-vivo MRI approach to discern the functional characteristics of specific neuronal populations in a strongly connected brain circuitry has been established. Of particular interest: the song control system (SCS) in songbird brain. The high vocal center (HVC), one of the major nuclei in this circuit, contains interneurons and two distinct types of neurons projecting respectively to the nucleus robustus archistriatalis, RA or to area X. ESALQ Course on Models for Longitudinal and Incomplete Data 167
173 Schematically, ESALQ Course on Models for Longitudinal and Incomplete Data 168
174 The MRI Data T1-weighted multi slice SE images were acquired (Van Meir et al 2003). After acquisition of a set of control images, MnCl 2 was injected in the cannulated HVC. Two sets of 5 coronal slices (one through HVC and RA, one through area X) were acquired every 15 min for up to 6 7 hours after injection. This results in 30 data sets of 10 slices of each bird (5 controls and 5 testosterone treated). The change of relative SI of each time point is calculated from the mean signal intensity determined within the defined regions of interest (area X and RA) and in an adjacent control area. ESALQ Course on Models for Longitudinal and Incomplete Data 169
175 Initial Study The effect of testosterone (T) on the dynamics of Mn2+ accumulation in RA and area X of female starlings has been established. This has been done with dynamic ME-MRI in individual birds injected with manganese in their HVC. This was done in a 2-stage approach: the individual SI data, determined as the mean intensity of a region of interest, were submitted to a sigmoid curve fit; a two-way repeated-measures ANOVA on the resulting curve parameters revealed that neurons projecting to RA and those projecting to area X were affected differently by testosterone treatment. However, this approach could be less reliable if the fit parameters were mutually dependent. ESALQ Course on Models for Longitudinal and Incomplete Data 170
176 ESALQ Course on Models for Longitudinal and Incomplete Data 171
177 Thus, an integrated non-linear modeling approach is necessary: the MRI signal intensities (SI) need to be fitted with a non-linear mixed model, assuming that the SI follow a pre-specified distribution, depending on a covariate indicating the time after MnCl2 injection, and parameterized through fixed and random (bird-specific) effects. An initial model needs to be proposed and then subsequently simplified. ESALQ Course on Models for Longitudinal and Incomplete Data 172
178 A Model for RA in the Second Period Let RA_ij be the measurement at time j for bird i. The following initial non-linear model is assumed: RA_ij = [(φ_0 + φ_1 G_i + f_i) T_ij^(η_0 + η_1 G_i + n_i)] / [(τ_0 + τ_1 G_i + t_i)^(η_0 + η_1 G_i + n_i) + T_ij^(η_0 + η_1 G_i + n_i)] + γ_0 + γ_1 G_i + ε_ij The following conventions are used: G_i is an indicator for group membership (0 for control, 1 for active) T_ij is a covariate indicating the measurement time The fixed effects parameters: ESALQ Course on Models for Longitudinal and Incomplete Data 173
179 the intercept parameters φ_0, η_0, τ_0, and γ_0; the group effect parameters φ_1, η_1, τ_1, and γ_1. If the latter are simultaneously zero, there is no difference between both groups. If at least one of them is (significantly) different from zero, then the model indicates a difference between both groups. There are three bird-specific or random effects, f_i, n_i, and t_i, following a trivariate zero-mean normal distribution with general covariance matrix D. The residual error terms ε_ij are assumed to be mutually independent and independent from the random effects, and to follow a N(0, σ²) distribution. Thus, the general form of the model has 8 fixed-effects parameters and 7 variance components (3 variances in D, 3 covariances in D, and σ²). ESALQ Course on Models for Longitudinal and Incomplete Data 174
180 SAS program:

data hulp2;
   set m.vincent03;
   if time <= 0 then delete;
   if periode = 1 then delete;
run;

proc nlmixed data=hulp2 qpoints=3;
   parms phim=0.64 phimdiff=0 eta=1.88 etadiff=0 tau=3.68 taudiff=0
         gamma=-0.01 gdiff=0 d11=0.01 sigma2=0.01 d22=0.01 d33=0.01;
   teller = (phim + phimdiff * groep + vm) * (time ** (eta + etadiff * groep + n));
   noemer = ((tau + t + taudiff * groep) ** (eta + etadiff * groep + n))
            + (time ** (eta + etadiff * groep + n));
   gemid = teller/noemer + gamma + gdiff * groep;
   model si_ra ~ normal(gemid,sigma2);
   random vm t n ~ normal([0, 0, 0],[d11, 0, d22, 0, 0, d33]) subject=vogel out=m.eb;
run;

Gaussian quadrature is used. ESALQ Course on Models for Longitudinal and Incomplete Data 175
181 The covariances in D are set equal to zero, due to computational difficulty. Fitting the model produces the following output. First, the bookkeeping information is provided: Specifications Data Set Dependent Variable Distribution for Dependent Variable Random Effects Distribution for Random Effects Subject Variable Optimization Technique Integration Method WORK.HULP2 SI_RA Normal vm t n Normal vogel Dual Quasi-Newton Adaptive Gaussian Quadrature Dimensions Observations Used 262 Observations Not Used 58 Total Observations 320 Subjects 10 Max Obs Per Subject 28 Parameters 12 Quadrature Points 3 ESALQ Course on Models for Longitudinal and Incomplete Data 176
182 Next, starting values and the corresponding likelihood are provided: Parameters phim phimdiff eta etadiff tau taudiff gamma gdiff d sigma2 d22 d33 NegLogLike The iteration history tells convergence is not straightforward: Iteration History Iter Calls NegLogLike Diff MaxGrad Slope ESALQ Course on Models for Longitudinal and Incomplete Data 177
NOTE: GCONV convergence criterion satisfied. ESALQ Course on Models for Longitudinal and Incomplete Data 178
184 The fit statistics can be used for model comparison, as always: Fit Statistics -2 Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) Finally, parameter estimates are provided: Parameter Estimates Standard Parameter Estimate Error DF t Value Pr > t Alpha Lower Upper Gradient phim phimdiff eta < E-6 etadiff tau < taudiff gamma gdiff d sigma < d d ESALQ Course on Models for Longitudinal and Incomplete Data 179
185 Next, a backward selection is conducted, using likelihood ratio tests. First, the variance d_33 of n_i was removed. The corresponding test statistic has a χ²_{0:1} null distribution (p = 0.4387). Next, the fixed-effect parameters γ_0, γ_1, τ_1, and φ_1 are removed. The final model is: RA_ij = [(φ_0 + f_i) T_ij^(η_0 + η_1 G_i)] / [(τ_0 + t_i)^(η_0 + η_1 G_i) + T_ij^(η_0 + η_1 G_i)] + ε_ij ESALQ Course on Models for Longitudinal and Incomplete Data 180
186 Overview of parameter estimates and standard errors for initial and final model: Estimate (s.e.) Effect Parameter Initial Final φ (0.0655) (0.0478) φ (0.0923) η (0.1394) (0.0802) η (0.1865) (0.1060) τ (0.2346) (0.1761) τ (0.3250) γ (0.0044) γ (0.0059) Var(f i ) d (0.0092) (0.0101) Var(t i ) d (0.1169) (0.1338) Var(n i ) d (0.0349) Var(ε ij ) σ ( ) ( ) ESALQ Course on Models for Longitudinal and Incomplete Data 181
187 AREA.X at the Second Period The same initial model will be fitted to AREA_ij. In this case, a fully general model can be fitted, including also the covariances between the random effects. Code to do so:

proc nlmixed data=hulp2 qpoints=3;
   parms phim= phimdiff= eta= etadiff= tau= taudiff= gamma= gdiff=
         sigma2= d11= d22= d33= d12=0 d13=0 d23=0;
   teller = (phim + phimdiff * groep + vm) * (time ** (eta + etadiff * groep + n));
   noemer = ((tau + taudiff * groep + t) ** (eta + etadiff * groep + n))
            + (time ** (eta + etadiff * groep + n));
   gemid = teller/noemer + gamma + gdiff * groep;
   model si_area_x ~ normal(gemid,sigma2);
   random vm t n ~ normal([0, 0, 0],[d11, d12, d22, d13, d23, d33]) subject=vogel out=m.eb2;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 182
188 Model simplification: First, the random n_i effect is removed (implying the removal of d_11, d_12, and d_13), using a likelihood ratio test statistic with value 4.08 and null distribution χ²_{2:3}. The corresponding p = . Removal of the random t_i effect is not possible, since the likelihood ratio equals on 1:2 degrees of freedom (p < 0.0001). Removal of the covariance between the random t_i and f_i effects is not possible either (G² = 4.35 on 1 d.f., p = 0.0371). Next, the following fixed-effect parameters were removed: γ_0, γ_1, η_1 and τ_1. The fixed effect φ_1 was found to be highly significant and therefore could not be removed from the model (G² = on 1 d.f., p = 0.0011). This indicates a significant difference between the two groups. The resulting final model is: AREA_ij = [(φ_0 + φ_1 G_i + f_i) T_ij^η_0] / [(τ_0 + t_i)^η_0 + T_ij^η_0] + ε_ij. ESALQ Course on Models for Longitudinal and Incomplete Data 183
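The p-values for these boundary tests follow directly from the stated chi-squared mixtures; a small sketch of the computation for the 4.08 statistic on the χ²_{2:3} mixture (the other mixtures are handled analogously):

data pvalue;
   g2 = 4.08;                                               /* likelihood ratio statistic */
   p  = 0.5*(1 - probchi(g2,2)) + 0.5*(1 - probchi(g2,3));  /* 50:50 mixture of chi2(2) and chi2(3) */
run;

proc print data=pvalue;
run;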
189 Estimate (s.e.) Effect Parameter Initial Final φ (0.0333) (0.0261) φ (0.0458) (0.0312) η (0.5390) (0.1498) η (0.5631) τ (0.5662) (0.3262) τ (0.6226) γ (0.0032) γ (0.0048) Var(f i ) d (0.0020) (0.0022) Var(t i ) d (0.2365) (0.2881) Var(n i ) d (0.1858) Cov(f i,t i ) d (0.0205) (0.0229) Cov(f i,n i ) d (0.0159) Cov(t i,n i ) d (0.1615) Var(ε ij ) σ ( ) ( ) ESALQ Course on Models for Longitudinal and Incomplete Data 184
190 Individual curves, showing data points as well as empirical Bayes predictions for the bird-specific curves, show that the sigmoidal curves describe the data quite well. [Panels of SI.area.X versus Time, one per bird.] ESALQ Course on Models for Longitudinal and Incomplete Data 185
191 We can also explore all individual as well as marginal average fitted curves per group. The marginal effect was obtained using the sampling-based method. It is clear that Area.X is higher for most treated birds (group 1) compared to the untreated birds (group 0). [Plot of SI.area.X versus Time: individual curves for Group 0 and Group 1, together with the marginal average curve per group.] ESALQ Course on Models for Longitudinal and Incomplete Data 186
192 Part VI Incomplete Data ESALQ Course on Models for Longitudinal and Incomplete Data 187
193 Chapter 15 Setting The Scene Orthodontic growth data Notation Taxonomies ESALQ Course on Models for Longitudinal and Incomplete Data 188
194 15.1 Growth Data Taken from Potthoff and Roy, Biometrika (1964) Research question: Is dental growth related to gender? The distance from the center of the pituitary to the maxillary fissure was recorded at ages 8, 10, 12, and 14, for 11 girls and 16 boys ESALQ Course on Models for Longitudinal and Incomplete Data 189
195 Individual profiles: Much variability between girls / boys Considerable variability within girls / boys Fixed number of measurements per subject Measurements taken at fixed time points ESALQ Course on Models for Longitudinal and Incomplete Data 190
196 15.2 Incomplete Longitudinal Data ESALQ Course on Models for Longitudinal and Incomplete Data 191
197 15.3 Scientific Question In terms of entire longitudinal profile In terms of last planned measurement In terms of last observed measurement ESALQ Course on Models for Longitudinal and Incomplete Data 192
198 15.4 Notation Subject i at occasion (time) j = 1,...,n_i Measurement Y_ij Missingness indicator R_ij = 1 if Y_ij is observed, 0 otherwise. Group the Y_ij into a vector Y_i = (Y_i1,...,Y_in_i)' = (Y_i^o, Y_i^m): Y_i^o contains the Y_ij for which R_ij = 1, Y_i^m contains the Y_ij for which R_ij = 0. Group the R_ij into a vector R_i = (R_i1,...,R_in_i)' D_i: time of dropout: D_i = 1 + Σ_{j=1}^{n_i} R_ij ESALQ Course on Models for Longitudinal and Incomplete Data 193
199 15.5 Framework f(Y_i, D_i | θ, ψ) Selection Models: f(Y_i | θ) f(D_i | Y_i^o, Y_i^m, ψ), with MCAR: f(D_i | ψ), MAR: f(D_i | Y_i^o, ψ), MNAR: f(D_i | Y_i^o, Y_i^m, ψ) Pattern-Mixture Models: f(Y_i | D_i, θ) f(D_i | ψ) Shared-Parameter Models: f(Y_i | b_i, θ) f(D_i | b_i, ψ) ESALQ Course on Models for Longitudinal and Incomplete Data 194
200 f(y i,d i θ, ψ) Selection Models: f(y i θ) f(d i Y o i, Y m i, ψ) MCAR MAR MNAR CC? direct likelihood! joint model!? LOCF? expectation-maximization (EM). sensitivity analysis?! imputation? multiple imputation (MI).. (weighted) GEE! ESALQ Course on Models for Longitudinal and Incomplete Data 195
201 Chapter 16 Proper Analysis of Incomplete Data Simple methods Bias for LOCF and CC Direct likelihood inference Weighted generalized estimating equations ESALQ Course on Models for Longitudinal and Incomplete Data 196
202 16.1 Incomplete Longitudinal Data ESALQ Course on Models for Longitudinal and Incomplete Data 197
203 Data and Modeling Strategies ESALQ Course on Models for Longitudinal and Incomplete Data 198
204 Modeling Strategies ESALQ Course on Models for Longitudinal and Incomplete Data 199
205 16.2 Simple Methods Complete case analysis: delete incomplete subjects. Standard statistical software; loss of information; impact on precision and power; missingness must be MCAR to avoid bias. Last observation carried forward: impute missing values. Standard statistical software; increase of information; constant profile after dropout: unrealistic; usually bias. ESALQ Course on Models for Longitudinal and Incomplete Data 200
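For illustration, a sketch of how both data sets are typically constructed in a data step, assuming a hypothetical wide dataset wide with outcomes y1-y4 per subject:

* Complete case data set: keep only subjects with all four measurements;
data cc;
   set wide;
   if nmiss(of y1-y4) = 0;
run;

* LOCF data set: carry the last observed value forward;
data locf;
   set wide;
   array y{4} y1-y4;
   do j = 2 to 4;
      if y{j} = . then y{j} = y{j-1};
   end;
   drop j;
run;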
206 Quantifying the Bias Dropouts t ij =0 Probability p 0 Treatment indicator T i =0, 1 E(Y ij )=β 0 + β 1 T i + β 2 t ij + β 3 T i t ij Completers t ij =0, 1 Probability 1 p 0 = p 1 Treatment indicator T i =0, 1 E(Y ij )=γ 0 +γ 1 T i +γ 2 t ij +γ 3 T i t ij CC LOCF MCAR 0 (p 1 p 0 )β 2 (1 p 1 )β 3 σ[(1 p 1 )(β 0 + β 1 γ 0 γ 1 ) p 1 (γ 0 + γ 1 + γ 2 + γ 3 )+(1 p 1 )(β 0 + β 1 ) MAR (1 p 0 )(β 0 γ 0 )] p 0 (γ 0 + γ 2 ) (1 p 0 )β 0 γ 1 γ 3 σ[(1 p 1 )(β 0 + β 1 γ 0 γ 1 ) (1 p 0 )(β 0 γ 0 )] ESALQ Course on Models for Longitudinal and Incomplete Data 201
207 16.3 Direct Likelihood Maximization MAR: f(Y_i^o | θ) f(D_i | Y_i^o, ψ): ignorability Mechanism is MAR, θ and ψ distinct, interest in θ, use observed information matrix ⇒ likelihood inference is valid Outcome type Modeling strategy Software Gaussian Linear mixed model SAS proc MIXED Non-Gaussian Generalized linear mixed model SAS proc GLIMMIX, NLMIXED ESALQ Course on Models for Longitudinal and Incomplete Data 202
208 Original, Complete Orthodontic Growth Data Mean Covar # par 1 unstructured unstructured 18 2 slopes unstructured 14 3 = slopes unstructured 13 7 slopes CS 6 ESALQ Course on Models for Longitudinal and Incomplete Data 203
209 Trimmed Growth Data: Simple Methods Method Model Mean Covar # par Complete case 7a = slopes CS 5 LOCF 2a quadratic unstructured 16 Unconditional mean 7a = slopes CS 5 Conditional mean 1 unstructured unstructured 18 distorting ESALQ Course on Models for Longitudinal and Incomplete Data 204
210 Trimmed Growth Data: Direct Likelihood Mean Covar # par 7 slopes CS 6 ESALQ Course on Models for Longitudinal and Incomplete Data 205
211 Growth Data: Comparison of Analyses Data: complete cases; LOCF imputed data; all available data. Model: unstructured group by time mean; unstructured covariance matrix. Analysis methods: direct likelihood (ML and REML); MANOVA; ANOVA per time point. ESALQ Course on Models for Longitudinal and Incomplete Data 206
212 Principle Method Boys at Age 8 Boys at Age 10 Original Direct likelihood, ML (0.56) (0.49) Direct likelihood, REML MANOVA (0.58) (0.51) ANOVA per time point (0.61) (0.53) Direct Lik. Direct likelihood, ML (0.56) (0.68) Direct likelihood, REML (0.58) (0.71) MANOVA (0.48) (0.66) ANOVA per time point (0.61) (0.74) CC Direct likelihood, ML (0.45) (0.62) Direct likelihood, REML MANOVA (0.48) (0.66) ANOVA per time point (0.51) (0.74) LOCF Direct likelihood, ML (0.56) (0.65) Direct likelihood, REML MANOVA (0.58) (0.68) ANOVA per time point (0.61) (0.72) ESALQ Course on Models for Longitudinal and Incomplete Data 207
213 Behind the Scenes R completers, N − R incompleters. (Y_i1, Y_i2)' ~ N( (μ_1, μ_2)', [σ_11, σ_12; σ_12, σ_22] ) Conditional density: Y_i2 | y_i1 ~ N(β_0 + β_1 y_i1, σ_22.1) μ_1, frequentist & likelihood: μ̂_1 = (1/N) Σ_{i=1}^N y_i1 μ_2, frequentist: μ̂_2 = (1/R) Σ_{i=1}^R y_i2 μ_2, likelihood: μ̂_2 = (1/N) { Σ_{i=1}^R y_i2 + Σ_{i=R+1}^N [ ȳ_2 + β̂_1 (y_i1 − ȳ_1) ] } ESALQ Course on Models for Longitudinal and Incomplete Data 208
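A sketch of how the likelihood (MAR) estimate of μ_2 can be computed step by step, assuming a hypothetical dataset twodrop with y1 always observed and y2 missing for the incompleters:

* Regression of Y2 on Y1 among the completers (records with missing y2 are ignored);
proc reg data=twodrop outest=est noprint;
   model y2 = y1;
run;

* Completer means of y1 and y2;
proc means data=twodrop noprint;
   where y2 ne .;
   var y1 y2;
   output out=cmeans mean=ybar1 ybar2;
run;

* Likelihood estimate of mu2: observed y2 where available, the regression
* prediction ybar2 + beta1*(y1 - ybar1) where y2 is missing;
data muhat;
   if _n_ = 1 then do;
      set est(keep=y1 rename=(y1=beta1));
      set cmeans(keep=ybar1 ybar2);
   end;
   set twodrop end=last;
   retain sum 0 n 0;
   if y2 ne . then sum = sum + y2;
   else sum = sum + ybar2 + beta1*(y1 - ybar1);
   n + 1;
   if last then do;
      mu2hat = sum / n;
      output;
   end;
   keep mu2hat;
run;

This reproduces, record by record, the expression for the likelihood estimator of μ_2 above.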
214 Growth Data: Further Comparison of Analyses Principle Method Boys at Age 8 Boys at Age 10 Original Direct likelihood, ML (0.56) (0.49) Direct Lik. Direct likelihood, ML (0.56) (0.68) CC Direct likelihood, ML (0.45) (0.62) LOCF Direct likelihood, ML (0.56) (0.65) Data Mean Covariance Boys at Age 8 Boys at Age 10 Complete Unstructured Unstructured Unstructured CS Unstructured Independence Incomplete Unstructured Unstructured Unstructured CS Unstructured Independence ESALQ Course on Models for Longitudinal and Incomplete Data 209
215 ESALQ Course on Models for Longitudinal and Incomplete Data 210
216 Growth Data: SAS Code for Model 1 IDNR AGE SEX MEASURE SAS code:

proc mixed data = growth method = ml;
   class sex idnr age;
   model measure = sex age*sex / s;
   repeated age / type = un subject = idnr;
run;

Subjects in terms of IDNR blocks; age ensures proper ordering of observations within subjects! ESALQ Course on Models for Longitudinal and Incomplete Data 211
217 Growth Data: SAS Code for Model 2 IDNR AGE SEX MEASURE SAS code:

data help;
   set growth;
   agecat = age;
run;

proc mixed data = help method = ml;
   class sex idnr agecat;
   model measure = sex age*sex / s;
   repeated agecat / type = un subject = idnr;
run;

The time ordering variable needs to be categorical ESALQ Course on Models for Longitudinal and Incomplete Data 212
218 Chapter 17 Weighted Generalized Estimating Equations General Principle Analysis of the analgesic trial ESALQ Course on Models for Longitudinal and Incomplete Data 213
219 17.1 General Principle MAR and non-ignorable! Standard GEE inference correct only under MCAR Under MAR: weighted GEE Robins, Rotnitzky & Zhao (JASA, 1995) Fitzmaurice, Molenberghs & Lipsitz (JRSSB, 1995) Decompose dropout time D i =(R i1,...,r in )=(1,...,1, 0,...,0) ESALQ Course on Models for Longitudinal and Incomplete Data 214
220 Weigh a contribution by the inverse dropout probability: ν_{i,d_i} ≡ P[D_i = d_i] = ∏_{k=2}^{d_i−1} (1 − P[R_ik = 0 | R_i2 = ... = R_{i,k−1} = 1]) × P[R_{i,d_i} = 0 | R_i2 = ... = R_{i,d_i−1} = 1]^{I{d_i ≤ T}} Adjust the estimating equations: Σ_{i=1}^N (1/ν_{i,d_i}) (∂μ_i/∂β') V_i^{−1} (y_i − μ_i) = 0 ESALQ Course on Models for Longitudinal and Incomplete Data 215
221 17.2 Computing the Weights Predicted values from (PROC GENMOD) output The weights are now defined at the individual measurement level: At the first occasion, the weight is w = 1 At occasions other than the last, the weight is the already accumulated weight, multiplied by 1 − the predicted probability At the last occasion within a sequence where dropout occurs, the weight is multiplied by the predicted probability At the end of the process, the weight is inverted ESALQ Course on Models for Longitudinal and Incomplete Data 216
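A sketch of this weight construction in a data step, in the spirit of the %dropwgt macro used later; the variable names patid, time, pred (predicted dropout probability) and drop (equal to 1 at the occasion at which dropout occurs) are hypothetical:

proc sort data=pred;
   by patid time;
run;

data weights;
   set pred;
   by patid;
   retain w;
   if first.patid then w = 1;                 /* weight starts at 1 */
   else if drop = 0 then w = w * (1 - pred);  /* subject still observed at this occasion */
   else w = w * pred;                         /* occasion at which dropout occurs */
   if last.patid then do;
      wi = 1 / w;                             /* invert the accumulated weight */
      output;
   end;
   keep patid wi;
run;

The subject-level weight wi is then merged back onto the longitudinal records before the weighted GEE call.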
222 17.3 Analysis of the Analgesic Trial A logistic regression for the dropout indicator: logit[P(D_i = j | D_i ≥ j)] = ψ_0 + ψ_11 I(GSA_{i,j−1} = 1) + ψ_12 I(GSA_{i,j−1} = 2) + ψ_13 I(GSA_{i,j−1} = 3) + ψ_14 I(GSA_{i,j−1} = 4) + ψ_2 PCA0_i + ψ_3 PF_i + ψ_4 GD_i with GSA_{i,j−1} the 5-point outcome at the previous time, I(·) an indicator function, PCA0_i pain control assessment at baseline, PF_i physical functioning at baseline, and GD_i genetic disorder at baseline ESALQ Course on Models for Longitudinal and Incomplete Data 217
223 Effect Par. Estimate (s.e.) Intercept ψ (0.49) Previous GSA= 1 ψ (0.41) Previous GSA= 2 ψ (0.38) Previous GSA= 3 ψ (0.37) Previous GSA= 4 ψ (0.38) Basel. PCA ψ (0.10) Phys. func. ψ (0.004) Genetic disfunc. ψ (0.24) There is some evidence for MAR: P (D i = j D i j) depends on previous GSA. Furthermore: baseline PCA, physical functioning and genetic/congenital disorder. ESALQ Course on Models for Longitudinal and Incomplete Data 218
224 GEE and WGEE: logit[P(Y_ij = 1 | t_j, PCA0_i)] = β_1 + β_2 t_j + β_3 t_j² + β_4 PCA0_i Effect Par. GEE WGEE Intercept β_1 (0.47) 2.17 (0.69) Time β_2 (0.33) (0.44) Time² β_3 (0.07) 0.12 (0.09) Basel. PCA β_4 (0.10) (0.13) A hint of potentially important differences between both analyses ESALQ Course on Models for Longitudinal and Incomplete Data 219
225 Working correlation matrices: R UN, GEE = R UN, WGEE = ESALQ Course on Models for Longitudinal and Incomplete Data 220
226 17.4 Analgesic Trial: Steps for WGEE in SAS 1. Preparatory data manipulation: %dropout(...) 2. Logistic regression for weight model:

proc genmod data=gsac;
   class prevgsa;
   model dropout = prevgsa pca0 physfct gendis / pred dist=b;
   ods output obstats=pred;
run;

3. Conversion of predicted values to weights: ... %dropwgt(...) ESALQ Course on Models for Longitudinal and Incomplete Data 221
227 4. Weighted GEE analysis:

proc genmod data=repbin.gsaw;
   scwgt wi;
   class patid timecls;
   model gsabin = time time*time pca0 / dist=b;
   repeated subject=patid / type=un corrw within=timecls;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 222
228 Chapter 18 Multiple Imputation General idea Use of MI in practice Analysis of the growth data ESALQ Course on Models for Longitudinal and Incomplete Data 223
229 18.1 General Principles Valid under MAR An alternative to direct likelihood and WGEE Three steps: 1. The missing values are filled in M times, yielding M complete data sets 2. The M complete data sets are analyzed by using standard procedures 3. The results from the M analyses are combined into a single inference Rubin (1987), Rubin and Schenker (1986), Little and Rubin (1987) ESALQ Course on Models for Longitudinal and Incomplete Data 224
230 18.2 Use of MI in Practice Many analyses of the same incomplete set of data A combination of missing outcomes and missing covariates As an alternative to WGEE: MI can be combined with classical GEE MI in SAS: Imputation Task: PROC MI Analysis Task: PROC MYFAVORITE Inference Task: PROC MIANALYZE ESALQ Course on Models for Longitudinal and Incomplete Data 225
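Schematically, the three tasks line up as follows; a sketch with a hypothetical wide dataset mydata with outcomes y1-y4, and with a simple linear regression standing in for PROC MYFAVORITE:

* 1. Imputation task: create M = 10 completed data sets;
proc mi data=mydata nimpute=10 seed=4567 out=mydata_mi;
   var y1 y2 y3 y4;
run;

* 2. Analysis task: analyze each completed data set;
proc reg data=mydata_mi outest=parms covout noprint;
   by _imputation_;
   model y4 = y1;
run;

* 3. Inference task: combine the 10 sets of results;
proc mianalyze data=parms;
   modeleffects Intercept y1;
run;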
231 MI Analysis of the Orthodontic Growth Data The same Model 1 as before Focus on boys at ages 8 and 10 Results Boys at Age 8 Boys at Age 10 Original Data (0.58) (0.51) Multiple Imputation (0.66) (0.81) Between-imputation variability for age 10 measurement Confidence interval for Boys at age 10: [21.08,24.29] ESALQ Course on Models for Longitudinal and Incomplete Data 226
232 Part VII Topics in Methods and Sensitivity Analysis for Incomplete Data ESALQ Course on Models for Longitudinal and Incomplete Data 227
233 Chapter 19 An MNAR Selection Model and Local Influence The Diggle and Kenward selection model Mastitis in dairy cattle An informal sensitivity analysis Local influence to conduct sensitivity analysis ESALQ Course on Models for Longitudinal and Incomplete Data 228
234 19.1 A Full Selection Model MNAR: ∫ f(Y_i | θ) f(D_i | Y_i, ψ) dY_i^m f(Y_i | θ): linear mixed model Y_i = X_i β + Z_i b_i + ε_i f(D_i | Y_i, ψ): logistic regressions for dropout logit[P(D_i = j | D_i ≥ j, Y_{i,j−1}, Y_ij)] = ψ_0 + ψ_1 Y_{i,j−1} + ψ_2 Y_ij Diggle and Kenward (JRSSC 1994) ESALQ Course on Models for Longitudinal and Incomplete Data 229
235 19.2 Mastitis in Dairy Cattle Infectious disease of the udder Leads to a reduction in milk yield High yielding cows more susceptible? But this cannot be measured directly because of the effect of the disease: evidence is missing since infected cows have no reported milk yield ESALQ Course on Models for Longitudinal and Incomplete Data 230
236 Model for milk yield: (Y_i1, Y_i2)' ~ N( (μ, μ + Δ)', [σ_1², ρσ_1σ_2; ρσ_1σ_2, σ_2²] ) Model for mastitis: logit[P(R_i = 1 | Y_i1, Y_i2)] = ψ_0 + ψ_1 Y_i1 + ψ_2 Y_i2 = Y_i1 2.54 Y_i2 = Y_i1 2.54 (Y_i2 − Y_i1) LR test for H_0: ψ_2 = 0: G² = 5.11 ESALQ Course on Models for Longitudinal and Incomplete Data 231
237 19.3 Criticism Sensitivity Analysis "..., estimating the unestimable can be accomplished only by making modelling assumptions, ... The consequences of model misspecification will (...) be more severe in the non-random case." (Laird 1994) Change distributional assumptions (Kenward 1998) Local and global influence methods Pattern-mixture models Several plausible models or ranges of inferences Semi-parametric framework (Scharfstein et al 1999) ESALQ Course on Models for Longitudinal and Incomplete Data 232
238 19.4 Kenward's Sensitivity Analysis Deletion of #4 and #5: G² for ψ_2 : Cows #4 and #5 have unusually large increments Kenward conjectures: #4 and #5 ill during the first year Kenward (SiM 1998) ESALQ Course on Models for Longitudinal and Incomplete Data 233
239 19.5 Local Influence Verbeke, Thijs, Lesaffre, Kenward (Bcs 2001) Perturbed MAR dropout model: logit[P(D_i = 1 | Y_i1, Y_i2)] = ψ_0 + ψ_1 Y_i1 + ω_i Y_i2 Likelihood displacement: LD(ω) = 2 [ L_{ω=0}(θ̂, ψ̂) − L_{ω=0}(θ̂_ω, ψ̂_ω) ] ≥ 0 ESALQ Course on Models for Longitudinal and Incomplete Data 234
240 19.5 Local Influence Verbeke, Thijs, Lesaffre, Kenward (Bcs 2001) Perturbed MAR dropout model: logit[P(D_i = 1 | Y_i1, Y_i2)] = ψ_0 + ψ_1 Y_i1 + ω_i Y_i2 or ψ_0 + ψ_1 Y_i1 + ω_i (Y_i2 − Y_i1) Likelihood displacement: LD(ω) = 2 [ L_{ω=0}(θ̂, ψ̂) − L_{ω=0}(θ̂_ω, ψ̂_ω) ] ≥ 0 ESALQ Course on Models for Longitudinal and Incomplete Data 235
241 19.6 Application to Mastitis Data Removing #4, #5 and #66 G 2 = h max : different signs for (#4,#5) and #66 ESALQ Course on Models for Longitudinal and Incomplete Data 236
242 Chapter 20 Interval of Ignorance The Slovenian Public Opinion Survey MAR and MNAR analyses Informal sensitivity analysis Interval of ignorance & interval of uncertainty ESALQ Course on Models for Longitudinal and Incomplete Data 237
243 20.1 The Slovenian Plebiscite Rubin, Stern, and Vehovar (1995) Slovenian Public Opinion (SPO) Survey Four weeks prior to the decisive plebiscite Three questions: 1. Are you in favor of Slovenian independence? 2. Are you in favor of Slovenia's secession from Yugoslavia? 3. Will you attend the plebiscite? Political decision: ABSENCE ⇒ NO Primary Estimand: θ: Proportion in favor of independence ESALQ Course on Models for Longitudinal and Incomplete Data 238
244 Slovenian Public Opinion Survey Data: Independence Secession Attendance Yes No Yes Yes No No Yes No Yes No ESALQ Course on Models for Longitudinal and Incomplete Data 239
245 20.2 Slovenian Public Opinion: 1st Analysis Pessimistic: All who can say NO will say NO θ = =0.694 Optimistic: All who can say YES will say YES θ = = =0.904 Resulting Interval: θ [0.694; 0.904] ESALQ Course on Models for Longitudinal and Incomplete Data 240
246 Resulting Interval: θ [0.694; 0.904] Complete cases: All who answered on 3 questions θ = =0.928? Available cases: All who answered on both questions θ = =0.929? ESALQ Course on Models for Longitudinal and Incomplete Data 241
247 20.3 Slovenian Public Opinion: 2nd Analysis Missing at Random: Non-response is allowed to depend on observed, but not on unobserved outcomes: Based on two questions: Based on three questions: θ =0.892 θ =0.883 Missing Not at Random (NI): Non-response is allowed to depend on unobserved measurements: θ =0.782 ESALQ Course on Models for Longitudinal and Incomplete Data 242
248 20.4 Slovenian Public Opinion Survey Estimator θ Pessimistic bound Optimistic bound Complete cases 0.928? Available cases 0.929? MAR (2 questions) MAR (3 questions) MNAR ESALQ Course on Models for Longitudinal and Incomplete Data 243
249 20.5 Slovenian Plebiscite: The Truth? θ =0.885 Estimator θ Pessimistic bound Optimistic bound Complete cases 0.928? Available cases 0.929? MAR (2 questions) MAR (3 questions) MNAR ESALQ Course on Models for Longitudinal and Incomplete Data 244
250 20.6 Did the MNAR model behave badly? Consider a family of MNAR models Baker, Rosenberger, and DerSimonian (1992) Counts Y_{r1 r2, jk}, where j, k = 1, 2 indicates YES/NO and r_1, r_2 = 0, 1 indicates MISSING/OBSERVED ESALQ Course on Models for Longitudinal and Incomplete Data 245
251 Model Formulation E(Y_{11jk}) = m_jk, E(Y_{10jk}) = m_jk β_jk, E(Y_{01jk}) = m_jk α_jk, E(Y_{00jk}) = m_jk α_jk β_jk γ_jk Interpretation: α_jk : models non-response on the independence question; β_jk : models non-response on the attendance question; γ_jk : interaction between both non-response indicators (cannot depend on j or k) ESALQ Course on Models for Longitudinal and Incomplete Data 246
252 Identifiable Models Model Structure d.f. loglik θ C.I. BRD1 (α, β) [0.878;0.906] BRD2 (α, β j ) [0.869;0.900] BRD3 (α k,β) [0.866;0.897] BRD4 (α, β k ) [0.674;0.856] BRD5 (α j,β) [0.806;0.882] BRD6 (α j,β j ) [0.788;0.849] BRD7 (α k,β k ) [0.697;0.832] BRD8 (α j,β k ) [0.657;0.826] BRD9 (α k,β j ) [0.851;0.884] ESALQ Course on Models for Longitudinal and Incomplete Data 247
253 An Interval of MNAR Estimates θ =0.885 Estimator [Pessimistic; optimistic] [0.694;0.904] Complete cases Available cases MAR (2 questions) MAR (3 questions) MNAR MNAR interval [0.741;0.892] θ ESALQ Course on Models for Longitudinal and Incomplete Data 248
254 20.7 A More Formal Look Statistical Imprecision Statistical Uncertainty Statistical Ignorance ESALQ Course on Models for Longitudinal and Incomplete Data 249
255 Statistical Imprecision: Due to finite sampling Fundamental concept of mathematical statistics Consistency, efficiency, precision, testing,... Disappears as sample size increases Statistical Ignorance: Due to incomplete observations Received less attention Can invalidate conclusions Does not disappear with increasing sample size Kenward, Goetghebeur, and Molenberghs (StatMod 2001) ESALQ Course on Models for Longitudinal and Incomplete Data 250
256 20.8 Slovenian Public Opinion: 3rd Analysis Model Structure d.f. loglik θ C.I. BRD1 (α, β) [0.878;0.906] BRD2 (α, β j ) [0.869;0.900] BRD3 (α k,β) [0.866;0.897] BRD4 (α, β k ) [0.674;0.856] BRD5 (α j,β) [0.806;0.882] BRD6 (α j,β j ) [0.788;0.849] BRD7 (α k,β k ) [0.697;0.832] BRD8 (α j,β k ) [0.657;0.826] BRD9 (α k,β j ) [0.851;0.884] Model 10 (α k,β jk ) [0.762;0.893] [0.744;0.907] Model 11 (α jk,β j ) [0.766;0.883] [0.715;0.920] Model 12 (α jk,β jk ) [0.694;0.904] ESALQ Course on Models for Longitudinal and Incomplete Data 251
257 20.9 Every MNAR Model Has Got a MAR Bodyguard Fit an MNAR model to a set of incomplete data Change the conditional distribution of the unobserved outcomes, given the observed ones, to comply with MAR The resulting new model will have exactly the same fit as the original MNAR model The missing data mechanism has changed This implies that definitively testing for MAR versus MNAR is not possible ESALQ Course on Models for Longitudinal and Incomplete Data 252
258 20.10 Slovenian Public Opinion: 4th Analysis Model Structure d.f. loglik θ C.I. θ MAR BRD1 (α, β) [0.878;0.906] BRD2 (α, β j ) [0.869;0.900] BRD3 (α k ,β) [0.866;0.897] BRD4 (α, β k ) [0.674;0.856] BRD5 (α j ,β) [0.806;0.882] BRD6 (α j ,β j ) [0.788;0.849] BRD7 (α k ,β k ) [0.697;0.832] BRD8 (α j ,β k ) [0.657;0.826] BRD9 (α k ,β j ) [0.851;0.884] Model 10 (α k ,β jk ) [0.762;0.893] [0.744;0.907] Model 11 (α jk ,β j ) [0.766;0.883] [0.715;0.920] Model 12 (α jk ,β jk ) [0.694;0.904] ESALQ Course on Models for Longitudinal and Incomplete Data 253
259 θ =0.885 Estimator θ [Pessimistic; optimistic] [0.694;0.904] MAR (3 questions) MNAR MNAR interval [0.753;0.891] Model 10 [0.762;0.893] Model 11 [0.766;0.883] Model 12 [0.694;0.904] ESALQ Course on Models for Longitudinal and Incomplete Data 254
260 Chapter 21 Concluding Remarks MCAR/simple: CC biased; LOCF inefficient; not simpler than MAR methods. MAR: direct likelihood and weighted GEE; easy to conduct; Gaussian & non-Gaussian. MNAR: variety of methods; strong, untestable assumptions; most useful in sensitivity analysis. ESALQ Course on Models for Longitudinal and Incomplete Data 255
261 Part VIII Case Study: Age-related Macular Degeneration ESALQ Course on Models for Longitudinal and Incomplete Data 256
262 Chapter 22 The Dataset Visual Acuity Age-related Macular Degeneration Missingness ESALQ Course on Models for Longitudinal and Incomplete Data 257
263 22.1 Visual Acuity I T I S A P L E A S U R E T O B E W I T H Y O U I N P I R A C I C A B A C A P I T A L O F B R A Z IL ESALQ Course on Models for Longitudinal and Incomplete Data 258
264 22.2 Age-related Macular Degeneration Trial Pharmacological Therapy for Macular Degeneration Study Group (1997) An ocular pressure disease which makes patients progressively lose vision 240 patients enrolled in a multi-center trial (190 completers) Treatment: Interferon-α (6 million units) versus placebo Visits: baseline and follow-up at 4, 12, 24, and 52 weeks Continuous outcome: visual acuity: # letters correctly read on a vision chart Binary outcome: visual acuity versus baseline: ≥ 0 or < 0 ESALQ Course on Models for Longitudinal and Incomplete Data 259
265 Missingness: Measurement occasion 4 wks 12 wks 24 wks 52 wks Number % Completers O O O O Dropouts O O O M O O M M O M M M M M M M Non-monotone missingness O O M O O M M O M O O O M O M M ESALQ Course on Models for Longitudinal and Incomplete Data 260
266 Chapter 23 Weighted Generalized Estimating Equations Model for the weights Incorporating the weights within GEE ESALQ Course on Models for Longitudinal and Incomplete Data 261
267 23.1 Analysis of the ARMD Trial Model for the weights: logit[P(D_i = j | D_i ≥ j)] = ψ_0 + ψ_1 y_{i,j−1} + ψ_2 T_i + ψ_31 L_1i + ψ_32 L_2i + ψ_33 L_3i + ψ_41 I(t_j = 2) + ψ_42 I(t_j = 3) with y_{i,j−1} the binary outcome at the previous time t_{i,j−1} = t_{j−1} (since time is common to all subjects), T_i = 1 for interferon-α and T_i = 0 for placebo, L_ki = 1 if the patient's eye lesion is of level k = 1,...,4 (since one dummy variable is redundant, only three are used), and I(·) an indicator function ESALQ Course on Models for Longitudinal and Incomplete Data 262
268 Results for the weights model: Effect Parameter Estimate (s.e.) Intercept ψ (0.49) Previous outcome ψ (0.38) Treatment ψ (0.37) Lesion level 1 ψ (0.49) Lesion level 2 ψ (0.52) Lesion level 3 ψ (0.72) Time 2 ψ (0.49) Time 3 ψ (0.44) ESALQ Course on Models for Longitudinal and Incomplete Data 263
269 GEE: logit[P(Y_ij = 1 | T_i, t_j)] = β_j1 + β_j2 T_i with T_i = 0 for placebo and T_i = 1 for interferon-α; t_j (j = 1,...,4) refers to the four follow-up measurements Classical GEE and linearization-based GEE Comparison between CC, LOCF, and GEE analyses SAS code: ...

proc genmod data=armdhlp descending;
   class trt prev lesion time;
   model dropout = prev trt lesion time / pred dist=b;
   ods output obstats=pred;
   ods listing exclude obstats;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 264
270 ...

proc genmod data=armdwgee;
   title 'data as is - WGEE';
   weight wi;
   class time treat subject;
   model bindif = time treat*time / noint dist=binomial;
   repeated subject=subject / withinsubject=time type=exch modelse;
run;

proc glimmix data=armdwgee empirical;
   title 'data as is - WGEE - linearized version - empirical';
   weight wi;
   nloptions maxiter=50 technique=newrap;
   class time treat subject;
   model bindif = time treat*time / noint solution dist=binary;
   random _residual_ / subject=subject type=cs;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 265
271 Results: Effect Par. CC LOCF Observed data Unweighted WGEE Standard GEE Int.4 β (0.24;0.24) -0.87(0.20;0.21) -0.87(0.21;0.21) -0.98(0.10;0.44) Int.12 β (0.24;0.24) -0.97(0.21;0.21) -1.01(0.21;0.21) -1.78(0.15;0.38) Int.24 β (0.25;0.25) -1.05(0.21;0.21) -1.07(0.22;0.22) -1.11(0.15;0.33) Int.52 β (0.29;0.29) -1.51(0.24;0.24) -1.71(0.29;0.29) -1.72(0.25;0.39) Tr.4 β (0.32;0.32) 0.22(0.28;0.28) 0.22(0.28;0.28) 0.80(0.15;0.67) Tr.12 β (0.31;0.31) 0.55(0.28;0.28) 0.61(0.29;0.29) 1.87(0.19;0.61) Tr.24 β (0.33;0.33) 0.42(0.29;0.29) 0.44(0.30;0.30) 0.73(0.20;0.52) Tr.52 β (0.38;0.38) 0.34(0.32;0.32) 0.44(0.37;0.37) 0.74(0.31;0.52) Corr. ρ ESALQ Course on Models for Longitudinal and Incomplete Data 266
272 Effect Par. CC LOCF Observed data Unweighted WGEE Linearization-based GEE Int.4 β (0.24;0.24) -0.87(0.21;0.21) -0.87(0.21;0.21) -0.98(0.18;0.44) Int.12 β (0.24;0.24) -0.97(0.21;0.21) -1.01(0.22;0.21) -1.78(0.26;0.42) Int.24 β (0.25;0.25) -1.05(0.21;0.21) -1.07(0.23;0.22) -1.19(0.25;0.38) Int.52 β (0.29;0.29) -1.51(0.24;0.24) -1.71(0.29;0.29) -1.81(0.39;0.48) Tr.4 β (0.32;0.32) 0.22(0.28;0.28) 0.22(0.29;0.29) 0.80(0.26;0.67) Tr.12 β (0.31;0.31) 0.55(0.28;0.28) 0.61(0.28;0.29) 1.85(0.32;0.64) Tr.24 β (0.33;0.33) 0.42(0.29;0.29) 0.44(0.30;0.30) 0.98(0.33;0.60) Tr.52 β (0.38;0.38) 0.34(0.32;0.32) 0.44(0.37;0.37) 0.97(0.49;0.65) σ τ Corr. ρ ESALQ Course on Models for Longitudinal and Incomplete Data 267
273 Chapter 24 Multiple Imputation Settings and Models Results for GEE Results for GLMM ESALQ Course on Models for Longitudinal and Incomplete Data 268
274 24.1 MI Analysis of the ARMD Trial M = 10 imputations GEE: logit[P(Y_ij = 1 | T_i, t_j)] = β_j1 + β_j2 T_i GLMM: logit[P(Y_ij = 1 | T_i, t_j, b_i)] = β_j1 + b_i + β_j2 T_i, b_i ~ N(0, τ²) T_i = 0 for placebo and T_i = 1 for interferon-α t_j (j = 1,...,4) refers to the four follow-up measurements Imputation based on the continuous outcome ESALQ Course on Models for Longitudinal and Incomplete Data 269
275 Results: Effect Par. GEE GLMM Int.4 β (0.20) -1.46(0.36) Int.12 β (0.22) -1.75(0.38) Int.24 β (0.23) -1.83(0.38) Int.52 β (0.27) -2.69(0.45) Trt.4 β (0.28) 0.32(0.48) Trt.12 β (0.29) 0.99(0.49) Trt.24 β (0.30) 0.67(0.51) Trt.52 β (0.35) 0.52(0.56) R.I. s.d. τ 2.20(0.26) R.I. var. τ (1.13) ESALQ Course on Models for Longitudinal and Incomplete Data 270
276 24.2 SAS Code for MI 1. Preparatory data analysis so that there is one line per subject 2. The imputation task:

proc mi data=armd13 seed= out=armd13a simple nimpute=10 round=0.1;
   var lesion diff4 diff12 diff24 diff52;
   by treat;
run;

Note that the imputation task is conducted on the continuous outcome diff, indicating the difference in number of letters versus baseline 3. Then, data manipulation takes place to define the binary indicators and to create a longitudinal version of the dataset ESALQ Course on Models for Longitudinal and Incomplete Data 271
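A sketch of that data manipulation; the dichotomization and the treatment coding are assumptions for illustration, and the variables subject, treat and lesion are taken to be present in the imputed dataset:

data armd13b;
   set armd13a;
   array diffs{4} diff4 diff12 diff24 diff52;
   do j = 1 to 4;
      time   = j;
      bindif = (diffs{j} <= 0);        /* hypothetical dichotomization: no improvement versus baseline */
      time1 = (j=1); time2 = (j=2);    /* occasion dummies used in the analysis tasks below */
      time3 = (j=3); time4 = (j=4);
      trttime1 = treat*time1;          /* treatment-by-occasion dummies (treatment coding assumed 0/1) */
      trttime2 = treat*time2;
      trttime3 = treat*time3;
      trttime4 = treat*time4;
      output;
   end;
   drop j;
run;

proc sort data=armd13b out=armd13c;
   by _imputation_ subject time;
run;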
277 4. The analysis task (GEE):

proc genmod data=armd13c;
   class time subject;
   by _imputation_;
   model bindif = time1 time2 time3 time4 trttime1 trttime2 trttime3 trttime4
         / noint dist=binomial covb;
   repeated subject=subject / withinsubject=time type=exch modelse;
   ods output ParameterEstimates=gmparms parminfo=gmpinfo CovB=gmcovb;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 272
278 5. The analysis task (GLMM):

proc nlmixed data=armd13c qpoints=20 maxiter=100 technique=newrap cov ecov;
   by _imputation_;
   eta = beta11*time1+beta12*time2+beta13*time3+beta14*time4+b
         +beta21*trttime1+beta22*trttime2+beta23*trttime3+beta24*trttime4;
   p = exp(eta)/(1+exp(eta));
   model bindif ~ binary(p);
   random b ~ normal(0,tau*tau) subject=subject;
   estimate 'tau2' tau*tau;
   ods output ParameterEstimates=nlparms CovMatParmEst=nlcovb
              AdditionalEstimates=nlparmsa CovMatAddEst=nlcovba;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 273
279 6. The inference task (GEE):

proc mianalyze parms=gmparms covb=gmcovb parminfo=gmpinfo wcov bcov tcov;
   modeleffects time1 time2 time3 time4 trttime1 trttime2 trttime3 trttime4;
run;

7. The inference task (GLMM):

proc mianalyze parms=nlparms covb=nlcovb wcov bcov tcov;
   modeleffects beta11 beta12 beta13 beta14 beta21 beta22 beta23 beta24;
run;

ESALQ Course on Models for Longitudinal and Incomplete Data 274
An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions
Assignments Analysis of Longitudinal data: a multilevel approach
Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans
SAS Syntax and Output for Data Manipulation:
Psyc 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (in preparation) chapter 5. We will be examining
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
Factorial experimental designs and generalized linear models
Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya
10 Dichotomous or binary responses
10 Dichotomous or binary responses 10.1 Introduction Dichotomous or binary responses are widespread. Examples include being dead or alive, agreeing or disagreeing with a statement, and succeeding or failing
Elements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
Power and sample size in multilevel modeling
Snijders, Tom A.B. Power and Sample Size in Multilevel Linear Models. In: B.S. Everitt and D.C. Howell (eds.), Encyclopedia of Statistics in Behavioral Science. Volume 3, 1570 1573. Chicester (etc.): Wiley,
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Logistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
PATTERN MIXTURE MODELS FOR MISSING DATA. Mike Kenward. London School of Hygiene and Tropical Medicine. Talk at the University of Turku,
PATTERN MIXTURE MODELS FOR MISSING DATA Mike Kenward London School of Hygiene and Tropical Medicine Talk at the University of Turku, April 10th 2012 1 / 90 CONTENTS 1 Examples 2 Modelling Incomplete Data
Missing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
Logistic Regression (a type of Generalized Linear Model)
Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge
Using PROC MIXED in Hierarchical Linear Models: Examples from two- and three-level school-effect analysis, and meta-analysis research
Using PROC MIXED in Hierarchical Linear Models: Examples from two- and three-level school-effect analysis, and meta-analysis research Sawako Suzuki, DePaul University, Chicago Ching-Fan Sheu, DePaul University,
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
Multivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Life Table Analysis using Weighted Survey Data
Life Table Analysis using Weighted Survey Data James G. Booth and Thomas A. Hirschl June 2005 Abstract Formulas for constructing valid pointwise confidence bands for survival distributions, estimated using
Introduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
Sun Li Centre for Academic Computing [email protected]
Sun Li Centre for Academic Computing [email protected] Elementary Data Analysis Group Comparison & One-way ANOVA Non-parametric Tests Correlations General Linear Regression Logistic Models Binary Logistic
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Models for Longitudinal and Clustered Data
Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations
An Introduction to Modeling Longitudinal Data
An Introduction to Modeling Longitudinal Data Session I: Basic Concepts and Looking at Data Robert Weiss Department of Biostatistics UCLA School of Public Health [email protected] August 2010 Robert Weiss
Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Logit Models for Binary Data
Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response
BIO 226: APPLIED LONGITUDINAL ANALYSIS COURSE SYLLABUS. Spring 2015
BIO 226: APPLIED LONGITUDINAL ANALYSIS COURSE SYLLABUS Spring 2015 Instructor: Teaching Assistants: Dr. Brent Coull HSPH Building II, Room 413 Phone: (617) 432-2376 E-mail: [email protected] Office
Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection
Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics
An Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.
An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao
Introducing the Multilevel Model for Change
Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Multilevel Modeling - A Brief Introduction 2 3 4 5 Introduction In this lecture, we introduce the multilevel model for change.
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Statistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
Ordinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA
Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Chapter 1. Longitudinal Data Analysis. 1.1 Introduction
Chapter 1 Longitudinal Data Analysis 1.1 Introduction One of the most common medical research designs is a pre-post study in which a single baseline health status measurement is obtained, an intervention
2. Making example missing-value datasets: MCAR, MAR, and MNAR
Lecture 20 1. Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data
EDMS 769L: Statistical Analysis of Longitudinal Data 1809 PAC, Th 4:15-7:00pm 2009 Spring Semester
Instructor Dr. Jeffrey Harring 1230E Benjamin Building Phone: (301) 405-3630 Email: [email protected] Office Hours Tuesday 2:00-3:00pm, or by appointment Course Objectives, Description and Prerequisites
15 Ordinal longitudinal data analysis
15 Ordinal longitudinal data analysis Jeroen K. Vermunt and Jacques A. Hagenaars Tilburg University Introduction Growth data and longitudinal data in general are often of an ordinal nature. For example,
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
