Longitudinal Data Analysis
Acknowledgement: Professor Garrett Fitzmaurice
INSTRUCTOR: Rino Bellocco
Department of Statistics & Quantitative Methods, University of Milano-Bicocca
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
1
ORGANIZATION OF LECTURES
Introduction and Overview
Longitudinal Data: Basic Concepts
Linear Models for Longitudinal Data
Modelling the Mean: Analysis of Response Profiles
Modelling the Mean: Parametric Curves
Modelling the Covariance
Linear Mixed Effects Models
Extensions to Discrete Data: GEE
2
LONGITUDINAL DATA ANALYSIS LECTURE 1 3
INTRODUCTION Longitudinal Studies: Studies in which individuals are measured repeatedly through time. This course will cover the analysis and interpretation of results from longitudinal studies. Emphasis will be on model development, use of statistical software, and interpretation of results. Theoretical basis for results mentioned but not developed. No calculus or matrix algebra is assumed. 4
Features of Longitudinal Data Defining feature: repeated observations on individuals, allowing the direct study of change over time. Primary goal of a longitudinal study is to characterize the change in response and factors that influence change. With repeated measures on individuals, one can capture within-individual change. Note that measurements in a longitudinal study are commensurate, i.e., the same variable is measured repeatedly. 5
Longitudinal data require somewhat more sophisticated statistical techniques because the repeated observations are usually (positively) correlated. Sequential nature of the measures implies that certain types of correlation structures are likely to arise. Correlation must be accounted for in order to obtain valid inferences. 6
Example 1 Treatment of Lead-Exposed Children (TLC) Trial Exposure to lead during infancy is associated with substantial deficits in tests of cognitive ability Chelation treatment of children with high lead levels usually requires injections and hospitalization A new agent, Succimer, can be given orally Randomized trial examining changes in blood lead level during course of treatment 100 children randomized to placebo or Succimer Measures of blood lead level at baseline, 1, 4 and 6 weeks 7
Table 1: Blood lead levels (µg/dl) at baseline, week 1, week 4, and week 6 for 8 randomly selected children.

ID   Group(a)  Baseline  Week 1  Week 4  Week 6
046  P         30.8      26.9    25.8    23.8
149  A         26.5      14.8    19.5    21.0
096  A         25.8      23.0    19.1    23.2
064  P         24.7      24.5    22.0    22.5
050  A         20.4       2.8     3.2     9.4
210  A         20.4       5.4     4.5    11.9
082  P         28.6      20.8    19.2    18.4
121  P         33.7      31.6    28.5    25.1

(a) P = Placebo; A = Succimer.
8
Table 2: Mean blood lead levels (and standard deviations) at baseline, week 1, week 4, and week 6.

Group     Baseline    Week 1      Week 4      Week 6
Succimer  26.5 (5.0)  13.5 (7.7)  15.5 (7.8)  20.8 (9.2)
Placebo   26.3 (5.0)  24.7 (5.5)  24.1 (5.7)  23.2 (6.2)
9
Figure 1: Plot of mean blood lead levels (µg/dl) at baseline, week 1, week 4, and week 6 in the succimer and placebo groups.
10
Example 2 Six Cities Study of Air Pollution and Health Longitudinal study designed to characterize lung function growth in children and adolescents. Most children were enrolled between the ages of six and seven and measurements were obtained annually until graduation from high school. Focus on a randomly selected subset of the 300 female participants living in Topeka, Kansas. Response variable: volume of air exhaled in the first second of a spirometry manoeuvre, FEV1. 11
Table 3: Data on age, height, and FEV1 for a randomly selected girl from the Topeka data set.

Subject ID  Age    Height  Time   FEV1
159          6.58  1.13     0.00  1.36
159          7.65  1.19     1.06  1.42
159         12.74  1.49     6.15  2.13
159         13.77  1.53     7.19  2.38
159         14.69  1.55     8.11  2.85
159         15.82  1.56     9.23  3.17
159         16.67  1.57    10.08  2.52
159         17.63  1.57    11.04  3.11

Note: Time represents time since entry to study.
12
Figure 2: Timeplot of log(FEV1/height) versus age for 50 randomly selected girls from the Topeka data set.
13
Example 3 Influence of Menarche on Changes in Body Fat Prospective study on body fat accretion in a cohort of 162 girls from the MIT Growth and Development Study. At start of study, all the girls were pre-menarcheal and non-obese All girls were followed over time according to a schedule of annual measurements until four years after menarche. The final measurement was scheduled on the fourth anniversary of their reported date of menarche. At each examination, a measure of body fatness was obtained based on bioelectric impedance analysis. 14
Figure 3: Timeplot of percent body fat against age (in years). 16
Consider an analysis of the changes in percent body fat before and after menarche. For the purposes of these analyses time is coded as time since menarche and can be positive or negative. Note: measurement protocol is the same for all girls. Study design is almost balanced if timing of measurement is defined as time since baseline measurement. It is inherently unbalanced when timing of measurements is defined as time since a girl experienced menarche. 17
Figure 4: Timeplot of percent body fat against time, relative to age of menarche (in years).
18
LONGITUDINAL DATA: BASIC CONCEPTS Defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time. Longitudinal studies allow direct study of change over time. The primary goal is to characterize the change in response over time and the factors that influence change. With repeated measures on individuals, we can capture within-individual change. 19
A longitudinal study can estimate change with great precision because each individual acts as his/her own control. By comparing each individual's responses at two or more occasions, a longitudinal analysis can remove extraneous, but unavoidable, sources of variability among individuals. This eliminates major sources of variability or noise from the estimation of within-individual change. 20
The assessment of within-subject change can only be achieved within a longitudinal study design In a cross-sectional study, we can only obtain estimates of between-individual differences in the response A cross-sectional study may allow comparison among sub-populations that happen to differ in age, but it does not provide any information about how individuals change In a cross-sectional study the effect of ageing is potentially confounded or mixed-up with possible cohort effects 21
Terminology Individuals/Subjects: Participants in a longitudinal study are referred to as individuals or subjects. Occasions: In a longitudinal study individuals are measured repeatedly at different occasions or times. The number of repeated observations, and their timing, can vary widely from one longitudinal study to another. When the number and the timing of the repeated measurements are the same for all individuals, the study design is said to be balanced over time. 22
Notation Let Y ij denote the response variable for the i th individual (i = 1,..., N) at the j th occasion (j = 1,..., n). If the repeated measures are assumed to be equally-separated in time, this notation will be sufficient. Later, we refine notation to handle the case where repeated measures are unequally-separated and unbalanced over time. We can represent the n observations on the N individuals in a two-dimensional array, with rows corresponding to individuals and columns corresponding to the responses at each occasion. 23
Table 4: Tabular representation of longitudinal data, with n repeated observations on N individuals.

            Occasion
Individual  1     2     3     ...  n
1           y_11  y_12  y_13  ...  y_1n
2           y_21  y_22  y_23  ...  y_2n
...         ...   ...   ...   ...  ...
N           y_N1  y_N2  y_N3  ...  y_Nn
24
Vector Notation: We can group the n repeated measures on the same individual into an n × 1 (column) response vector Y_i with elements Y_i1, Y_i2, ..., Y_in. Alternatively, we can denote the response vector Y_i as Y_i = (Y_i1, Y_i2, ..., Y_in)'. 25
Covariance and Correlation An aspect of longitudinal data that complicates their statistical analysis is that repeated measures on the same individual are usually positively correlated. This violates the fundamental assumption of independence that is the cornerstone of many statistical techniques. What are the potential consequences of not accounting for correlation among longitudinal data in the analysis? An additional, although often overlooked, aspect of longitudinal data that complicates their statistical analysis is heterogeneous variability. 26
That is, the variability of the outcome at the end of the study is often discernibly different than the variability at the start of the study. This violates the assumption of homoscedasticity that is the basis for standard linear regression techniques. Thus, there are two aspects of longitudinal data that complicate their statistical analysis: (i) repeated measures on the same individual are usually positively correlated, and (ii) variability is often heterogeneous across measurement occasions. 27
Before we can give a formal definition of correlation we need to introduce the notion of expectation. We denote the expectation or mean of Y ij by µ j = E(Y ij ), where E( ) can be thought of as a long-run average. The mean, µ j, provides a measure of the location of the center of the distribution of Y ij. 28
The variance provides a measure of the spread or dispersion of the values of Y_ij around its respective mean:

σ_j² = E[Y_ij − E(Y_ij)]² = E(Y_ij − µ_j)².

The positive square-root of the variance, σ_j, is known as the standard deviation. The covariance between two variables, say Y_ij and Y_ik,

σ_jk = E[(Y_ij − µ_j)(Y_ik − µ_k)],

is a measure of the linear dependence between Y_ij and Y_ik. When the covariance is zero, there is no linear dependence between the responses at the two occasions. 29
The correlation between Y_ij and Y_ik is denoted by

ρ_jk = E[(Y_ij − µ_j)(Y_ik − µ_k)] / (σ_j σ_k),

where σ_j and σ_k are the standard deviations of Y_ij and Y_ik. The correlation, unlike the covariance, is a measure of dependence that is free of the scales of measurement of Y_ij and Y_ik. By definition, correlation must take values between −1 and 1. A correlation of −1 or 1 is obtained when there is a perfect linear relationship between the two variables. 30
For the vector of repeated measures, Y_i = (Y_i1, Y_i2, ..., Y_in)', we define the variance-covariance matrix, Cov(Y_i):

Cov(Y_i) = [ Var(Y_i1)        Cov(Y_i1, Y_i2)  ...  Cov(Y_i1, Y_in) ]
           [ Cov(Y_i2, Y_i1)  Var(Y_i2)        ...  Cov(Y_i2, Y_in) ]
           [ ...              ...              ...  ...             ]
           [ Cov(Y_in, Y_i1)  Cov(Y_in, Y_i2)  ...  Var(Y_in)       ]

         = [ σ_1²  σ_12  ...  σ_1n ]
           [ σ_21  σ_2²  ...  σ_2n ]
           [ ...   ...   ...  ...  ]
           [ σ_n1  σ_n2  ...  σ_n² ],

where Cov(Y_ij, Y_ik) = σ_jk = σ_kj = Cov(Y_ik, Y_ij). 31
We can also define the correlation matrix, Corr(Y_i):

Corr(Y_i) = [ 1     ρ_12  ...  ρ_1n ]
            [ ρ_21  1     ...  ρ_2n ]
            [ ...   ...   ...  ...  ]
            [ ρ_n1  ρ_n2  ...  1    ].

This matrix is also symmetric in the sense that Corr(Y_ij, Y_ik) = ρ_jk = ρ_kj = Corr(Y_ik, Y_ij). 32
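The covariance and correlation matrices defined above can be estimated directly from a subjects-by-occasions data array. A minimal Python/numpy sketch, using made-up blood-lead-style values (not the actual TLC data):

```python
import numpy as np

# Hypothetical data: rows = individuals, columns = occasions (n = 3).
Y = np.array([
    [26.5, 14.8, 19.5],
    [25.8, 23.0, 19.1],
    [24.7, 24.5, 22.0],
    [30.8, 26.9, 25.8],
])

# Sample covariance matrix Cov(Y_i): treat occasions as the variables.
cov = np.cov(Y, rowvar=False)        # n x n; cov[j, k] estimates sigma_jk

# Sample correlation matrix Corr(Y_i).
corr = np.corrcoef(Y, rowvar=False)  # n x n; corr[j, k] estimates rho_jk
```

Both matrices are symmetric, and the correlation matrix has ones on the diagonal, exactly as in the displays above.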
Example: Treatment of Lead-Exposed Children Trial We restrict attention to the data from placebo group Data consist of 4 repeated measurements of blood lead levels obtained at baseline (or week 0), weeks 1, 4, and 6. The inter-dependence (or time-dependence) among the four repeated measures of blood lead level can be examined by constructing a scatter-plot of each pair of repeated measures. Examination of the correlations confirms that they are all positive and tend to decrease with increasing time separation. 33
Figure 5: Pairwise scatter-plots of blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group.
34
Table 5: Estimated covariance matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group of the TLC trial.

25.2  22.8  24.2  18.4
22.8  29.8  27.0  20.5
24.2  27.0  33.0  26.6
18.4  20.5  26.6  38.7
35
Table 6: Estimated correlation matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group of the TLC trial.

1.00  0.83  0.84  0.59
0.83  1.00  0.86  0.60
0.84  0.86  1.00  0.74
0.59  0.60  0.74  1.00
36
Some Observations about Correlation in Longitudinal Data Empirical observations about the nature of the correlation among repeated measures in longitudinal studies: (i) correlations are positive, (ii) correlations decrease with increasing time separation, (iii) correlations between repeated measures rarely ever approach zero, and (iv) correlation between a pair of repeated measures taken very closely together in time rarely approaches one. 37
Consequences of Ignoring Correlation Potential implications of ignoring the correlation among the repeated measures. Consider only the first two repeated measures from the TLC trial, taken at baseline (or week 0) and week 1. Suppose it is of interest to determine whether there has been a change in the mean response over time. A natural estimate of the change in the mean response is:

δ̂ = µ̂_2 − µ̂_1, where µ̂_j = (1/N) Σ_{i=1}^{N} Y_ij.
38
For the data from the TLC trial, the estimate of the change in the mean response over time in the succimer group is −13.0 (i.e., 13.5 − 26.5). An expression for the variance of δ̂ is given by

Var(δ̂) = Var[(1/N) Σ_{i=1}^{N} (Y_i2 − Y_i1)] = (1/N)(σ_1² + σ_2² − 2σ_12).

We can substitute estimates of the variances and covariance into this expression to obtain an estimate of the variance of δ̂:

Var(δ̂) = (1/50)[25.2 + 58.9 − 2(15.5)] = 1.06.
39
Consequences of Ignoring Correlation If we ignored the correlation and proceeded with an analysis assuming the observations are independent, we would have obtained the following (incorrect) estimate of the variance of δ̂:

(1/50)[25.2 + 58.9] = 1.68,

which is approximately 1.6 times larger. In general, ignoring the correlation leads to incorrect inferences. 40
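The arithmetic above is easy to reproduce. A short Python sketch using the quoted estimates (variances 25.2 and 58.9, covariance 15.5, N = 50):

```python
# Estimated variances and covariance for baseline and week 1 (succimer group):
var1, var2, cov12 = 25.2, 58.9, 15.5
N = 50

# Correct variance of the estimated mean change, accounting for the covariance:
var_correct = (var1 + var2 - 2 * cov12) / N   # (sigma1^2 + sigma2^2 - 2*sigma12)/N

# Incorrect variance obtained by (wrongly) assuming independence:
var_naive = (var1 + var2) / N

ratio = var_naive / var_correct               # roughly 1.6
```

The naive variance overstates precision-loss here by ignoring the large positive covariance, which is what inflates the ratio to about 1.6.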
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 1, Chapter 2 (Sections 2.1, 2.2, 2.3, 2.4) 41
APPLIED LONGITUDINAL ANALYSIS LECTURE 2 42
LINEAR MODELS FOR LONGITUDINAL DATA Notation: Previously, we assumed a sample of N subjects is measured repeatedly at n occasions. Either by design or happenstance, subjects may not have the same number of repeated measures or be measured at the same set of occasions. We assume there are n_i repeated measurements on the i-th subject and that each Y_ij is observed at time t_ij. 43
We can group the response variables for the i-th subject into an n_i × 1 vector:

Y_i = (Y_i1, ..., Y_in_i)', i = 1, ..., N.

Associated with Y_ij there is a p × 1 vector of covariates:

X_ij = (X_ij1, ..., X_ijp)', i = 1, ..., N; j = 1, ..., n_i.
44
We can group the vectors of covariates into an n_i × p matrix:

X_i = [ X_i11    X_i12    ...  X_i1p    ]
      [ X_i21    X_i22    ...  X_i2p    ]
      [ ...      ...      ...  ...      ]
      [ X_in_i1  X_in_i2  ...  X_in_ip  ],  i = 1, ..., N.

X_i is simply an ordered collection of the values of the p covariates for the i-th subject at the n_i occasions. 45
Throughout this course we consider linear regression models for changes in the mean response over time:

Y_ij = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp + e_ij, j = 1, ..., n_i;

where β_1, ..., β_p are unknown regression coefficients. The e_ij are random errors, with mean zero, and represent deviations of the Y_ij's from their means,

E(Y_ij | X_ij) = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp.

Typically, although not always, X_ij1 = 1 for all i and j,

E(Y_ij | X_ij) = β_1 + β_2 X_ij2 + ... + β_p X_ijp,

and then β_1 is the intercept term in the model. 46
Treatment of Lead-Exposed Children Trial For illustrative purposes, consider model that assumes mean blood lead level changes linearly over time, but at a rate that differs by group. Assume two treatment groups have different intercepts and slopes: Y ij = β 1 X ij1 + β 2 X ij2 + β 3 X ij3 + β 4 X ij4 + e ij, where X ij1 = 1 for all i and all j; X ij2 = t j, the week in which the blood lead level was obtained; X ij3 = 1 if the i th subject is assigned to the succimer group and X ij3 = 0 otherwise. X ij4 = t j if the i th subject is assigned to the succimer group and X ij4 = 0 otherwise. Alternatively, X ij4 = X ij2 X ij3. 47
Thus, for children in the placebo group

E(Y_ij | X_ij) = β_1 + β_2 t_j,

where β_1 represents the mean blood lead level at baseline (week 0) and β_2 is the constant rate of change in the mean blood lead level per week. Similarly, for children in the succimer group

E(Y_ij | X_ij) = (β_1 + β_3) + (β_2 + β_4) t_j,

where β_2 + β_4 is the constant rate of change in the mean blood lead level per week. The hypothesis that the treatments are equally effective in reducing blood lead levels translates into the hypothesis that β_4 = 0. 48
To reinforce notation, consider the responses and covariates at the 4 occasions for any individual. For example, the responses at the 4 occasions for ID = 046:

Y_i = (30.8, 26.9, 25.8, 23.8)'.

The values of the covariates at the 4 occasions for ID = 046:

X_i = [ 1  0  0  0 ]
      [ 1  1  0  0 ]
      [ 1  4  0  0 ]
      [ 1  6  0  0 ].

This individual was assigned to treatment with placebo. 49
On the other hand, the responses at the 4 occasions for ID = 149:

Y_i = (26.5, 14.8, 19.5, 21.0)'.

The values of the covariates at the 4 occasions for ID = 149:

X_i = [ 1  0  1  0 ]
      [ 1  1  1  1 ]
      [ 1  4  1  4 ]
      [ 1  6  1  6 ].

This individual was assigned to treatment with succimer. 50
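The two covariate matrices above differ only through the group indicator. A small Python/numpy sketch (the function name is a hypothetical helper, not from any package) that builds X_i for either treatment arm from the measurement weeks:

```python
import numpy as np

weeks = np.array([0.0, 1.0, 4.0, 6.0])  # t_j: baseline, week 1, week 4, week 6

def design_matrix(succimer: bool) -> np.ndarray:
    """Rows: the 4 occasions; columns: intercept, week, group, group x week."""
    group = 1.0 if succimer else 0.0
    return np.column_stack([
        np.ones_like(weeks),          # X_ij1: intercept
        weeks,                        # X_ij2: time in weeks
        np.full_like(weeks, group),   # X_ij3: succimer indicator
        group * weeks,                # X_ij4: interaction X_ij2 * X_ij3
    ])

X_placebo = design_matrix(False)   # matches X_i for ID = 046
X_succimer = design_matrix(True)   # matches X_i for ID = 149
```

With group = 0, both the indicator and interaction columns are zero, which is why the placebo matrix has two all-zero columns.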
Distributional Assumptions So far, only assumptions made concern patterns of change in the mean response over time and their relation to covariates. E(Y ij X ij ) = β 1 X ij1 + β 2 X ij2 + + β p X ijp. Next we consider distributional assumptions concerning the random errors, e ij. Note: Y ij is assumed to be comprised of two components, a systematic component, β 1 X ij1 + β 2 X ij2 + + β p X ijp, and a random component, e ij. Assumptions about shape of distribution of e ij translate into assumptions about distribution of Y ij given X ij. 51
We assume Y i1,..., Y ini have a multivariate normal distribution, with mean response E(Y ij X ij ) = µ ij = β 1 X ij1 + β 2 X ij2 + + β p X ijp, and covariance matrix, Σ i = Cov(Y i ). Multivariate normal distribution can be considered multivariate analogue of the univariate normal distribution. The multivariate normal distribution is completely specified by the means, µ i1,..., µ ini, and the covariance matrix, Σ i. 52
Estimation: Maximum Likelihood A very general approach to estimation is the method of maximum likelihood (ML). Basic idea: use as estimates of β 1,..., β p and Σ i the values that are most probable (or most likely ) for the data that have actually been observed. ML estimates of β 1,..., β p and Σ i are those values that maximize the joint probability of the response variables evaluated at their observed values (the likelihood function). 53
The ML estimator of β_1, ..., β_p is the so-called generalized least squares (GLS) estimator, β̂_1(Σ_i), ..., β̂_p(Σ_i),

β̂ = [ Σ_{i=1}^{N} X_i' Σ_i⁻¹ X_i ]⁻¹ Σ_{i=1}^{N} X_i' Σ_i⁻¹ y_i,

and depends on the covariance among the repeated measures, Σ_i. This is a generalization of the ordinary least squares (OLS) estimator used in standard linear regression. In general, there is no simple expression for the ML estimator of Σ_i; instead, the ML estimate of Σ_i requires iterative techniques. Once the ML estimate of Σ_i has been obtained, we simply substitute the estimate, say Σ̂_i, into the GLS estimator of β_1, ..., β_p to obtain the ML estimates, β̂_1(Σ̂_i), ..., β̂_p(Σ̂_i). 54
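The GLS formula itself can be sketched directly in Python/numpy. This illustrates only the estimator for a known, common Σ; it is not the iterative ML fit of Σ_i that statistical software performs:

```python
import numpy as np

def gls_estimate(X_list, y_list, Sigma):
    """GLS: beta_hat = [sum_i X_i' S^-1 X_i]^-1 sum_i X_i' S^-1 y_i,
    here with a single covariance matrix Sigma shared by all subjects."""
    Sinv = np.linalg.inv(Sigma)
    A = sum(X.T @ Sinv @ X for X in X_list)
    b = sum(X.T @ Sinv @ y for X, y in zip(X_list, y_list))
    return np.linalg.solve(A, b)

# Toy check: two subjects, three occasions, responses exactly linear in time.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # intercept and time
y1 = np.array([2.0, 5.0, 8.0])                      # 2 + 3 * time
y2 = np.array([2.0, 5.0, 8.0])
beta = gls_estimate([X, X], [y1, y2], np.eye(3))    # Sigma = I reduces GLS to OLS
```

With Σ equal to the identity the estimator reduces to OLS, which is one way to see that GLS generalizes ordinary least squares.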
Restricted Maximum Likelihood (REML) Estimation ML estimation of Σ_i is known to be biased in small samples. The bias arises because the ML estimate does not take into account the fact that β_1, ..., β_p are also estimated from the data. Instead we can use a variant of ML estimation, known as restricted maximum likelihood (REML) estimation. When REML estimation is used to estimate Σ_i, β_1, ..., β_p are estimated by the usual GLS estimator, β̂_1(Σ̂_i), ..., β̂_p(Σ̂_i), except that Σ̂_i is now the REML estimate of Σ_i. 55
Caution: While REML log-likelihood can be used to compare nested models for the covariance, it should not be used to compare nested regression models for the mean. Instead, nested models for the mean should be compared using likelihood ratio tests based on the ML log-likelihood. 56
Modelling Longitudinal Data Longitudinal data present two aspects of the data that require modelling: (i) mean response over time (ii) covariance Models for longitudinal data must jointly specify models for the mean and covariance. Modelling the Mean Two main approaches can be distinguished: (1) analysis of response profiles (2) parametric or semi-parametric curves. 57
Modelling the Covariance Three broad approaches can be distinguished: (1) unstructured or arbitrary pattern of covariance (2) covariance pattern models (3) random effects covariance structure 58
MODELLING THE MEAN: ANALYSIS OF RESPONSE PROFILES Basic idea: Compare groups in terms of mean response profiles over time. Useful for balanced longitudinal designs and when there is a single categorical covariate (perhaps denoting different treatment or exposure groups). Analysis of response profiles can be extended to handle more than a single group factor. Analysis of response profiles can also handle missing data. 59
Figure 6: Mean blood lead levels at baseline, week 1, week 4, and week 6 in the succimer and placebo groups.
60
Hypotheses concerning response profiles Given a sequence of n repeated measures on a number of distinct groups of individuals, three main questions: 1. Are the mean response profiles similar in the groups, in the sense that the mean response profiles are parallel? This is a question that concerns the group × time interaction effect; see Figure 7(a). 2. Assuming mean response profiles are parallel, are the means constant over time, in the sense that the mean response profiles are flat? This is a question that concerns the time effect; see Figure 7(b). 61
3. Assuming that the population mean response profiles are parallel, are they also at the same level in the sense that the mean response profiles for the groups coincide? This is a question that concerns the group effect; see Figure 7(c). For many longitudinal studies, main interest is in question 1. 62
Figure 7: Graphical representation of the null hypotheses of (a) no group × time interaction effect, (b) no time effect, and (c) no group effect.
63
Table 7: Mean response profile over time in G groups.

       Measurement Occasion
Group  1       2       ...  n
1      µ_1(1)  µ_2(1)  ...  µ_n(1)
2      µ_1(2)  µ_2(2)  ...  µ_n(2)
...
g      µ_1(g)  µ_2(g)  ...  µ_n(g)
...
G      µ_1(G)  µ_2(G)  ...  µ_n(G)
64
Consider the two-group case: G = 2. Define Δ_j = µ_j(1) − µ_j(2), j = 1, ..., n. With G = 2, the first hypothesis in an analysis of response profiles can be expressed as: No group × time interaction effect:

H_0: Δ_1 = Δ_2 = ... = Δ_n (n − 1 degrees of freedom).

With G ≥ 2, the test of the null hypothesis of no group × time interaction effect has (G − 1) × (n − 1) degrees of freedom. 65
Focus of analysis is on a global test of the null hypothesis that mean response profiles are parallel. This question concerns the group × time interaction effect. In testing this hypothesis, both group and time are regarded as categorical covariates (analogous to two-way ANOVA). Analysis of response profiles can be specified as a regression model with indicator variables for group and time. However, correlation and variability among repeated measures on the same individuals must be properly accounted for. 66
Dummy or Indicator Variable Coding: Consider a factor with k levels: Define X 1 = 1 if measurement or subject belongs to level 1, and 0 otherwise. Define X 2 = 1 if measurement or subject belongs to level 2, and 0 otherwise. Define X 3,..., X k similarly. Note: In model with intercept term, we only require k 1 indicator variables. The choice of omitted indicator variable (e.g., X 1 ) determines what level of the factor is the reference level (e.g., first level). 67
Level  X_2  X_3  X_4  ...  X_k
1      0    0    0    ...  0
2      1    0    0    ...  0
3      0    1    0    ...  0
4      0    0    1    ...  0
...
k      0    0    0    ...  1
68
For example, this leads to a simple way of expressing the mean response in a regression model (with intercept β_1):

Y_ij = β_1 + β_2 X_2ij + ... + β_k X_kij + ε_ij

or (equivalently)

µ_j = E(Y_ij) = β_1 + β_2 X_2ij + ... + β_k X_kij
69
Note:
Mean response for level 1: µ_1 = β_1
Mean response for level 2: µ_2 = β_1 + β_2
...
Mean response for level k: µ_k = β_1 + β_k

Equivalently:
Level 2 versus Level 1: µ_2 − µ_1 = β_2
Level 3 versus Level 1: µ_3 − µ_1 = β_3
...
Level k versus Level 1: µ_k − µ_1 = β_k
70
Choice of Reference Level The usual choice of reference group: (i) A natural baseline or comparison group, and/or (ii) group with largest sample size In longitudinal data setting, the baseline or first measurement occasion is a natural reference group for time. 71
In summary, analysis of response profiles can be specified as a regression model with indicator variables for group and time. The global test of the null hypothesis of parallel profiles translates into a hypothesis concerning regression coefficients for the group time interaction being equal to zero. Beyond testing the null hypothesis of parallel profiles, the estimated regression coefficients have meaningful interpretations. 72
Treatment of Lead-Exposed Children Trial In the TLC Trial there are two groups (placebo and succimer) and four measurement occasions (week 0, 1, 4, 6). Let X 1 = 1 for all children at all occasions. Creating indicator variables for group and time: Group: Let X 2 = 1 if child randomized to succimer, X 2 = 0 otherwise. Time: Let X 3 = 1 if measurement at week 1, X 3 = 0 otherwise Let X 4 = 1 if measurement at week 4, X 4 = 0 otherwise Let X 5 = 1 if measurement at week 6, X 5 = 0 otherwise 73
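The indicator coding above can be sketched as a small Python function that returns one design-matrix row for a given child-occasion (a hypothetical helper for illustration, not part of any particular software package):

```python
def profile_row(succimer: bool, week: int):
    """One design-matrix row for the response-profile model (weeks 0, 1, 4, 6).
    Columns: intercept, group (X2), week indicators (X3, X4, X5),
    and the three group x week interaction terms."""
    x2 = 1.0 if succimer else 0.0
    x3 = 1.0 if week == 1 else 0.0
    x4 = 1.0 if week == 4 else 0.0
    x5 = 1.0 if week == 6 else 0.0
    return [1.0, x2, x3, x4, x5, x2 * x3, x2 * x4, x2 * x5]
```

For example, a succimer child at week 4 gets the row [1, 1, 0, 1, 0, 0, 1, 0], while any child at baseline (week 0, the reference occasion) gets only the intercept and group columns.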
Analysis of response profiles model can be expressed as:

Y = β_1 + β_2 X_2 + β_3 X_3 + β_4 X_4 + β_5 X_5 + β_6 X_2 X_3 + β_7 X_2 X_4 + β_8 X_2 X_5 + e

Test of the group × time interaction: H_0: β_6 = β_7 = β_8 = 0. The analysis must also account for the correlation among repeated measures on the same child. The analysis of response profiles estimates separate variances for each occasion (4 variances) and six pairwise correlations. 74
Table 8: Estimated covariance matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for the children from the TLC trial.

25.2  19.1  19.4  17.2
19.1  44.3  35.3  27.5
19.4  35.3  48.9  31.4
17.2  27.5  31.4  43.6
75
Note the discernible increase in the variability in blood lead levels from pre- to post-randomization. This increase in variability from baseline is probably due to: (1) given the treatment group assignment, there may be natural heterogeneity in the individual response trajectories over time, (2) the trial had an inclusion criterion that blood lead levels at baseline were in the range of 20-44 micrograms/dl. 76
Table 9: Tests of fixed effects based on a profile analysis of the blood lead level data at baseline, weeks 1, 4, and 6.

EFFECT      DF  CHI-SQUARE  P-VALUE
GROUP        1       29.32  <0.0001
WEEK         3      184.68  <0.0001
GROUP*WEEK   3      113.04  <0.0001
77
The test of the group × time interaction is based on a (multivariate) Wald test (comparison of estimates to their standard errors). In the TLC trial, the question of main interest concerns the comparison of groups in terms of patterns of change from baseline. This question translates into a test of the group × time interaction. The test of the group × time interaction yields a Wald statistic of 113 with 3 degrees of freedom (p < 0.001). Because this is a global test, it indicates that the groups differ but does not tell us how they differ. 78
Recall, the analysis of response profiles model can be expressed as:

Y = β_1 + β_2 X_2 + β_3 X_3 + β_4 X_4 + β_5 X_5 + β_6 X_2 X_3 + β_7 X_2 X_4 + β_8 X_2 X_5 + e

Test of the group × time interaction: H_0: β_6 = β_7 = β_8 = 0. The 3 single-df contrasts for the group × time interaction have direct interpretations in terms of group comparisons of changes from baseline. They indicate that children treated with succimer have a greater decrease in mean blood lead levels from baseline at all occasions when compared to children treated with placebo (see Table 10). 79
Table 10: Estimated regression coefficients and standard errors based on a profile analysis of the blood lead level data.

PARAMETER   GROUP  WEEK  ESTIMATE  SE    Z
INTERCEPT                 26.27    0.71   36.99
GROUP       A              0.27    1.00    0.27
WEEK               1      -1.61    0.79   -2.03
WEEK               4      -2.21    0.84   -2.63
WEEK               6      -3.03    0.83   -3.65
GROUP*WEEK  A      1     -11.43    1.12  -10.20
GROUP*WEEK  A      4      -9.08    1.19   -7.64
GROUP*WEEK  A      6      -3.84    1.17   -3.27
80
Strengths and Weaknesses of Analysis of Response Profiles Strengths: Allows arbitrary patterns in the mean response over time and arbitrary patterns in the covariance. Analysis has a certain robustness since potential risks of bias due to misspecification of models for mean and covariance are minimal. Drawbacks: Requirement that the longitudinal design be balanced. Analysis cannot incorporate mistimed measurements. 81
Analysis ignores the time-ordering (time trends) of the repeated measures in a longitudinal study; occasion is regarded as a qualitative factor. Produces omnibus tests of effects that may have low power to detect group differences in specific trends in the mean response over time (e.g., linear trends in the mean response). The number of estimated parameters, G × n mean parameters and n(n + 1)/2 covariance parameters, grows rapidly with the number of measurement occasions. 82
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 3 (Sections 3.1, 3.2, 3.4, 3.5) Chapter 4 (Sections 4.1, 4.2, 4.4, 4.5) Chapter 5 (Sections 5.1-5.4, 5.8, 5.9) 83
APPLIED LONGITUDINAL ANALYSIS LECTURE 3 84
MODELLING THE MEAN: PARAMETRIC CURVES Fitting parametric or semi-parametric curves to longitudinal data can be justified on substantive and statistical grounds. Substantively, in many studies true underlying mean response process changes over time in a relatively smooth, monotonically increasing/decreasing pattern. Fitting parsimonious models for mean response results in statistical tests of covariate effects (e.g., treatment time interactions) with greater power than in profile analysis. 85
Polynomial Trends in Time Describe the patterns of change in the mean response over time in terms of simple polynomial trends. The means are modelled as an explicit function of time. This approach can handle highly unbalanced designs in a relatively seamless way. For example, mistimed measurements are easily incorporated in the model for the mean response. 86
Linear Trends over Time Simplest possible curve for describing changes in the mean response over time is a straight line. The slope has a direct interpretation in terms of a constant change in the mean response for a single unit change in time. Consider a two-group study comparing treatment and control, where changes in the mean response are approximately linear:

E(Y_ij) = β_1 + β_2 Time_ij + β_3 Group_i + β_4 (Time_ij × Group_i),

where Group_i = 1 if the i-th individual is assigned to treatment, and Group_i = 0 otherwise; and Time_ij denotes the measurement time for the j-th measurement on the i-th individual. 87
Model for the mean for subjects in the control group:

E(Y_ij) = β_1 + β_2 Time_ij,

while for subjects in the treatment group,

E(Y_ij) = (β_1 + β_3) + (β_2 + β_4) Time_ij.

Thus, each group's mean response is assumed to change linearly over time (see Figure 8). 88
Figure 8: Graphical representation of model with linear trends for two groups.
89
Quadratic Trends over Time When changes in the mean response over time are not linear, higher-order polynomial trends can be considered. For example, if the means are monotonically increasing or decreasing over the course of the study, but in a curvilinear way, a model with quadratic trends can be considered. In a quadratic trend model the rate of change in the mean response is not constant but depends on time. Rate of change must be expressed in terms of two parameters. 90
Consider two-group study example: E (Y ij ) = β 1 + β 2 Time ij + β 3 Time ij² + β 4 Group i + β 5 (Time ij × Group i ) + β 6 (Time ij² × Group i ). Model for subjects in control group: E (Y ij ) = β 1 + β 2 Time ij + β 3 Time ij²; while model for subjects in treatment group: E (Y ij ) = (β 1 + β 4 ) + (β 2 + β 5 ) Time ij + (β 3 + β 6 ) Time ij². 91
Figure 9: Graphical representation of model with quadratic trends for two groups. 92
Note: mean response changes at different rate, depending upon Time ij. Rate of change in control group is β 2 + 2β 3 Time ij (derivation of this instantaneous rate of change requires familiarity with calculus). Thus, early in the study when Time ij = 1, rate of change is β 2 + 2β 3 ; while later in the study, say Time ij = 4, rate of change is β 2 + 8β 3. Regression coefficients, (β 2 + β 5 ) and (β 3 + β 6 ), have similar interpretations for treatment group. 93
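The instantaneous rate of change, β 2 + 2β 3 Time, can be evaluated at any time point; a small illustration with hypothetical coefficient values:

```python
def rate_of_change(t, b2, b3):
    """Instantaneous rate of change of b1 + b2*t + b3*t**2, i.e. b2 + 2*b3*t."""
    return b2 + 2 * b3 * t

# Illustrative (hypothetical) coefficients: b2 = -1.0, b3 = 0.25.
print(rate_of_change(1, -1.0, 0.25))  # b2 + 2*b3 = -0.5
print(rate_of_change(4, -1.0, 0.25))  # b2 + 8*b3 = 1.0
```

This makes concrete the point that a quadratic trend requires two parameters to describe the rate of change: the slope depends on when it is evaluated.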
Centering To avoid problems of collinearity it is advisable to center Time ij on its mean value prior to the analysis: replace Time ij by its deviation from the mean of (Time 1, Time 2, ..., Time n ). Note: centering of Time ij at individual-specific values (e.g., the mean of the n i measurement times for the i th individual) should be avoided, as the interpretation of the intercept then becomes meaningless. 94
Linear Splines If simplest possible curve is a straight line, then one way to extend the curve is to have sequence of joined line segments that produces a piecewise linear pattern. Linear spline models provide flexible way to accommodate many non-linear trends that cannot be approximated by simple polynomials in time. Basic idea: Divide time axis into series of segments and consider piecewise-linear trends, having different slopes but joined at fixed times. Locations where lines are tied together are known as knots. Resulting piecewise-linear curve is called a spline. Piecewise-linear model often called broken-stick model. 95
Figure 10: Graphical representation of model with linear splines for two groups, with common knot. 96
The simplest possible spline model has only one knot. For the two-group example, the linear spline model with knot at t is: E (Y ij ) = β 1 + β 2 Time ij + β 3 (Time ij − t) + + β 4 Group i + β 5 (Time ij × Group i ) + β 6 (Time ij − t) + × Group i, where (x) + is defined as a function that equals x when x is positive and is equal to zero otherwise. Thus, (Time ij − t) + is equal to (Time ij − t) when Time ij > t and is equal to zero when Time ij ≤ t. Note: (Time − t) + can be created as max(Time − t, 0). 97
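The truncated line basis (x)+ is simple to construct; a minimal sketch of the two-group spline mean, with hypothetical coefficient values:

```python
def pos_part(x):
    """(x)+ : equal to x when x > 0 and zero otherwise, i.e. max(x, 0)."""
    return max(x, 0)

def spline_mean(time, group, beta, knot):
    """Two-group linear spline mean with a single knot (hypothetical betas)."""
    b1, b2, b3, b4, b5, b6 = beta
    tp = pos_part(time - knot)
    return (b1 + b2 * time + b3 * tp
            + b4 * group + b5 * time * group + b6 * tp * group)

beta = (1.0, 2.0, 3.0, 0.0, 0.0, 0.0)
print(spline_mean(0.5, 0, beta, knot=1))  # before the knot: 1 + 2*0.5 = 2.0
print(spline_mean(2.0, 0, beta, knot=1))  # after: 1 + 2*2 + 3*(2 - 1) = 8.0
```

Because (x)+ is zero up to the knot and linear beyond it, the fitted curve is continuous at the knot while the slope changes by b3 (in the control group).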
Model for subjects in control group: E (Y ij ) = β 1 + β 2 Time ij + β 3 (Time ij t ) +. When expressed in terms of mean response prior/after t : E (Y ij ) = β 1 + β 2 Time ij, Time ij t ; E (Y ij ) = (β 1 β 3 t ) + (β 2 + β 3 )Time ij, Time ij > t. Slope prior to t is β 2 and following t is (β 2 + β 3 ). 98
Model for subjects in treatment group: E (Y ij ) = (β 1 + β 4 ) + (β 2 + β 5 ) Time ij + (β 3 + β 6 )(Time ij − t) +. When expressed in terms of mean response prior to/after t: E (Y ij ) = (β 1 + β 4 ) + (β 2 + β 5 ) Time ij, for Time ij ≤ t; E (Y ij ) = [(β 1 + β 4 ) − (β 3 + β 6 ) t] + (β 2 + β 3 + β 5 + β 6 ) Time ij, for Time ij > t. 99
Case Study 1: Vlagtwedde-Vlaardingen Study Epidemiologic study on risk factors for chronic obstructive lung disease. Sample participated in follow-up surveys approximately every 3 years for up to 19 years. Pulmonary function was determined by spirometry: FEV 1. We focus on subset of 133 residents aged 36 or older at entry into study and whose smoking status did not change during follow-up. Each participant was either a current or former smoker. 100
Figure 11: Mean FEV 1 (liters) at baseline (year 0), year 3, year 6, year 9, year 12, year 15, and year 19 in the current and former smoking exposure groups. 101
First we consider a linear trend in the mean response over time, with intercepts and slopes that differ for the two smoking exposure groups. We assume an unstructured covariance matrix. Based on the REML estimates of the regression coefficients in Table 11, the mean response for former smokers is E (Y ij ) = 3.507 − 0.033 Time ij, while for current smokers, E (Y ij ) = (3.507 − 0.262) − (0.033 + 0.005) Time ij = 3.245 − 0.038 Time ij. 102
Table 11: Estimated regression coefficients for linear trend model for FEV 1 data from the Vlagtwedde-Vlaardingen study.

Variable             Smoking Group   Estimate   SE        Z
Intercept                             3.5073    0.1004    34.94
Smoke i              Current         −0.2617    0.1151    −2.27
Time ij                              −0.0332    0.0031   −10.84
Smoke i × Time ij    Current         −0.0050    0.0035    −1.42
103
Thus, both groups have a significant decline in mean FEV 1 over time. But there is no discernible difference between the two smoking exposure groups in the constant rate of change. That is, the Smoke i × Time ij interaction (i.e., the comparison of the two slopes) is not significant, with Z = −1.42, p > 0.15. But is the rate of change constant over time? Adequacy of the linear trend model can be assessed by including higher-order polynomial trends. 104
For example, we can consider a model that allows quadratic trends for changes in FEV 1 over time. Recall that the linear trend model is nested within the quadratic trend model. The maximized log-likelihoods for models with linear and quadratic trends are presented in Table 12. The LRT statistic, based on ML (not REML) estimation, can be compared to a chi-squared distribution with 2 degrees of freedom (6, the number of parameters in the quadratic trend model, minus 4, the number of parameters in the linear trend model). 105
Table 12: Maximized (ML) log-likelihoods for models with linear and quadratic trends for FEV 1 data from the Vlagtwedde-Vlaardingen study.

Model                   −2 (ML) Log-Likelihood
Quadratic Trend Model   237.2
Linear Trend Model      238.5

−2 Log-Likelihood Ratio: G² = 1.3, 2 df (p > 0.50)
106
The LRT comparing the quadratic and linear trend models produces G² = 1.3, with 2 degrees of freedom (p > 0.50). Thus, when compared to the quadratic trend model, the linear trend model appears to be adequate. Finally, for illustrative purposes, we can make a comparison with a cubic trend model. This produces an LRT statistic of G² = 4.4, with 4 degrees of freedom (p > 0.35), indicating again that the linear trend model is adequate. 107
Case Study 2: Treatment of Lead-Exposed Children Recall data from the TLC trial: Children randomized to placebo or Succimer. Measures of blood lead level at baseline, 1, 4 and 6 weeks. The sequence of means over time in each group is displayed in Figure 12. 108
Figure 12: Mean blood lead levels (mcg/dl) at baseline, week 1, week 4, and week 6 in the succimer and placebo groups. 109
Given that there are non-linearities in the trends over time, higher-order polynomial models (e.g., a quadratic trend model) could be fit to the data. Alternatively, we can accommodate the non-linearity with a piecewise linear model with common knot at week 1, E (Y ij ) = β 1 + β 2 Week ij + β 3 (Week ij − 1) + + β 4 (Group i × Week ij ) + β 5 Group i × (Week ij − 1) +, where Group i = 1 if assigned to succimer, and Group i = 0 otherwise. 110
In this piecewise linear model, means for subjects in placebo group are E (Y ij ) = β 1 + β 2 Week ij + β 3 (Week ij − 1) +, while in the succimer group E (Y ij ) = β 1 + (β 2 + β 4 ) Week ij + (β 3 + β 5 ) (Week ij − 1) +. Because of randomization, we assume a common mean at baseline (no Group main effect). 111
Table 13: Estimated regression coefficients and standard errors based on a piecewise linear model, with knot at week 1.

Variable                  Group       Estimate    SE        Z
Intercept                              26.3422    0.4991    52.78
Week ij                                −1.6296    0.7818    −2.08
(Week ij − 1)+                          1.4305    0.8777     1.63
Group × Week ij           Succimer    −11.2500    1.0924   −10.30
Group × (Week ij − 1)+    Succimer     12.5822    1.2278    10.25
112
When expressed in terms of mean response prior to/after week 1, estimated means in the placebo group are µ̂ ij = β̂ 1 + β̂ 2 Week ij, for Week ij ≤ 1; µ̂ ij = (β̂ 1 − β̂ 3 ) + (β̂ 2 + β̂ 3 ) Week ij, for Week ij > 1. Thus, in the placebo group, the slope prior to week 1 is β̂ 2 = −1.63 and following week 1 is (β̂ 2 + β̂ 3 ) = −1.63 + 1.43 = −0.20. 113
Similarly, when expressed in terms of the mean response prior to and after week 1, the estimated means for subjects in the succimer group are given by µ̂ ij = β̂ 1 + (β̂ 2 + β̂ 4 ) Week ij, for Week ij ≤ 1; µ̂ ij = [β̂ 1 − (β̂ 3 + β̂ 5 )] + (β̂ 2 + β̂ 3 + β̂ 4 + β̂ 5 ) Week ij, for Week ij > 1. 114
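Using the Table 13 estimates (with negative signs restored where the text extraction dropped them: the pre-week-1 placebo slope is −1.63), the four implied slopes follow by simple arithmetic; the succimer values are computed here for illustration:

```python
# Table 13 estimates; minus signs restored per the surrounding text.
b2, b3, b4, b5 = -1.6296, 1.4305, -11.2500, 12.5822

placebo = (b2, b2 + b3)                      # slopes before/after week 1
succimer = (b2 + b4, b2 + b3 + b4 + b5)

print(round(placebo[0], 2), round(placebo[1], 2))    # -1.63 -0.2
print(round(succimer[0], 2), round(succimer[1], 2))  # -12.88 1.13
```

The succimer slopes match the pattern in the observed means (Table 14): a steep drop in blood lead from baseline to week 1, followed by a slow rebound.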
Estimates of mean blood lead levels for placebo and succimer groups are presented in Table 14. The estimated means appear to adequately fit the observed mean response profiles for the two treatment groups. Note that the piecewise linear and quadratic trend models (with common intercept for the two groups) are not nested. Because they have the same number of parameters, their log-likelihoods (LL) can be directly compared: the piecewise linear model fits better than the quadratic trend model (−2 ML LL = 2436.2 for the piecewise linear model versus −2 ML LL = 2551.7 for the quadratic trend model). 115
Table 14: Estimated mean blood lead levels for placebo and succimer groups from linear spline model (knot at week 1). Observed means in parentheses.

Group      Week 0        Week 1        Week 4        Week 6
Succimer   26.3 (26.5)   13.5 (13.5)   16.7 (15.5)   19.1 (20.8)
Placebo    26.3 (26.3)   24.7 (24.7)   24.1 (24.1)   23.7 (23.2)
116
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 6 117
APPLIED LONGITUDINAL ANALYSIS LECTURE 4 118
MODELLING THE COVARIANCE Longitudinal data present two aspects that require modelling: the mean response over time and the covariance. Although these two aspects can be modelled separately, they are interrelated. Choices of models for the mean response and covariance are interdependent. A model for the covariance must be chosen on the basis of some assumed model for the mean response. Covariance between any pair of residuals, say [Y ij − µ ij (β)] and [Y ik − µ ik (β)], depends on the model for the mean, i.e., depends on β. 119
Modelling the Covariance Three broad approaches can be distinguished: (1) unstructured or arbitrary pattern of covariance (2) covariance pattern models (3) random effects covariance structure 120
Unstructured Covariance Appropriate when design is balanced and number of measurement occasions is relatively small. No explicit structure is assumed other than homogeneity of covariance across different individuals, Cov(Y i ) = Σ i = Σ. Chief advantage: no assumptions made about the patterns of variances and covariances. 121
With n measurement occasions, the unstructured covariance matrix has n(n + 1)/2 parameters: the n variances and n(n − 1)/2 pairwise covariances (or correlations),

Cov(Y i ) =
  [ σ₁²   σ₁₂   ...  σ₁ₙ ]
  [ σ₂₁   σ₂²   ...  σ₂ₙ ]
  [ ...              ... ]
  [ σₙ₁   σₙ₂   ...  σₙ² ]
122
Potential drawbacks: Number of covariance parameters grows rapidly with the number of measurement occasions: For n = 3 the number of covariance parameters is 6; for n = 5 it is 15; for n = 10 it is 55. When the number of covariance parameters is large relative to sample size, estimation is likely to be very unstable. Use of an unstructured covariance is appealing only when N is large relative to n(n + 1)/2. Unstructured covariance is problematic when there are mistimed measurements. 123
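The parameter counts above follow directly from n(n + 1)/2; a one-line check:

```python
def n_cov_params(n):
    """Parameters in an n-by-n unstructured covariance: n variances plus
    n*(n-1)/2 covariances, i.e. n*(n+1)/2 in total."""
    return n * (n + 1) // 2

for n in (3, 5, 10):
    print(n, n_cov_params(n))  # 3 6, 5 15, 10 55
```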
Covariance Pattern Models When attempting to impose some structure on the covariance, a subtle balance needs to be struck. With too little structure there may be too many parameters to be estimated with limited amount of data. With too much structure, potential risk of model misspecification and misleading inferences concerning β. Classic tradeoff between bias and variance. Covariance pattern models have their basis in models for serial correlation originally developed for time series data. 124
Compound Symmetry Assumes variance is constant across occasions, say σ², and Corr(Y ij, Y ik ) = ρ for all j ≠ k.

Cov(Y i ) = σ² ×
  [ 1   ρ   ρ   ...  ρ ]
  [ ρ   1   ρ   ...  ρ ]
  [ ρ   ρ   1   ...  ρ ]
  [ ...            ... ]
  [ ρ   ρ   ρ   ...  1 ]

Parsimonious: two parameters regardless of number of measurement occasions. Strong assumptions about variance and correlation are usually not valid with longitudinal data. 125
Toeplitz Assumes variance is constant across occasions, say σ², and Corr(Y ij, Y i,j+k ) = ρₖ for all j and k (a separate correlation for each lag k).

Cov(Y i ) = σ² ×
  [ 1      ρ₁     ρ₂     ...  ρₙ₋₁ ]
  [ ρ₁     1      ρ₁     ...  ρₙ₋₂ ]
  [ ρ₂     ρ₁     1      ...  ρₙ₋₃ ]
  [ ...                       ...  ]
  [ ρₙ₋₁   ρₙ₋₂   ρₙ₋₃   ...  1    ]

Assumes correlation among responses at adjacent measurement occasions is constant, ρ₁. 126
Toeplitz only appropriate when measurements are made at equal (or approximately equal) intervals of time. Toeplitz covariance has n parameters (1 variance parameter, and n 1 correlation parameters). A special case of the Toeplitz covariance is the (first-order) autoregressive covariance. 127
Autoregressive Assumes variance is constant across occasions, say σ², and Corr(Y ij, Y i,j+k ) = ρᵏ for all j and k, with ρ ≥ 0.

Cov(Y i ) = σ² ×
  [ 1      ρ      ρ²     ...  ρⁿ⁻¹ ]
  [ ρ      1      ρ      ...  ρⁿ⁻² ]
  [ ρ²     ρ      1      ...  ρⁿ⁻³ ]
  [ ...                       ...  ]
  [ ρⁿ⁻¹   ρⁿ⁻²   ρⁿ⁻³   ...  1    ]

Parsimonious: only 2 parameters, regardless of number of measurement occasions. Only appropriate when the measurements are made at equal (or approximately equal) intervals of time. 128
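The AR(1) matrix is easy to construct from σ² and ρ; a minimal sketch:

```python
def ar1_cov(n, sigma2, rho):
    """n-by-n first-order autoregressive covariance: sigma2 * rho**|j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(n)] for j in range(n)]

cov = ar1_cov(4, 1.0, 0.5)
print(cov[0])  # correlation decays geometrically with lag: [1.0, 0.5, 0.25, 0.125]
```

Note the matrix depends only on the lag |j − k|, which is why equal spacing of the occasions matters: the model cannot distinguish a lag of one occasion spanning 1 week from one spanning 3 weeks.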
Compound symmetry, Toeplitz and autoregressive covariances assume variances are constant across time. This assumption can be relaxed by considering versions of these models with heterogeneous variances, Var(Y ij ) = σⱼ². A heterogeneous autoregressive covariance pattern:

Cov(Y i ) =
  [ σ₁²        ρσ₁σ₂      ρ²σ₁σ₃     ...  ρⁿ⁻¹σ₁σₙ ]
  [ ρσ₁σ₂      σ₂²        ρσ₂σ₃      ...  ρⁿ⁻²σ₂σₙ ]
  [ ρ²σ₁σ₃     ρσ₂σ₃      σ₃²        ...  ρⁿ⁻³σ₃σₙ ]
  [ ...                                   ...      ]
  [ ρⁿ⁻¹σ₁σₙ   ρⁿ⁻²σ₂σₙ   ρⁿ⁻³σ₃σₙ   ...  σₙ²      ]

and has n + 1 parameters (n variance parameters and 1 correlation parameter). 129
Banded Assumes correlation is zero beyond some specified interval. For example, a banded covariance pattern with a band size of 3 assumes that Corr(Y ij, Y i,j+k ) = 0 for k ≥ 3. It is possible to apply a banded pattern to any of the covariance pattern models considered so far. 130
A banded Toeplitz covariance pattern with a band size of 2 is given by

Cov(Y i ) = σ² ×
  [ 1    ρ₁   0    ...  0 ]
  [ ρ₁   1    ρ₁   ...  0 ]
  [ 0    ρ₁   1    ...  0 ]
  [ ...               ... ]
  [ 0    0    0    ...  1 ],

where ρ₂ = ρ₃ = ... = ρₙ₋₁ = 0. Banding makes a very strong assumption about how quickly the correlation decays to zero with increasing time separation. 131
Exponential When measurement occasions are not equally-spaced over time, the autoregressive model can be generalized as follows. Let {t i1,..., t in } denote the observation times for the i th individual and assume that the variance is constant across all measurement occasions, say σ², and Corr(Y ij, Y ik ) = ρ^|t ij − t ik |, for ρ ≥ 0. Correlation between any pair of repeated measures decreases exponentially with the time separation between them. 132
Referred to as the exponential covariance because it can be re-expressed as Cov(Y ij, Y ik ) = σ² ρ^|t ij − t ik | = σ² exp(−θ |t ij − t ik |), where θ = −log(ρ), or ρ = exp(−θ) for θ ≥ 0. The exponential covariance model is invariant under linear transformation of the time scale. If we replace t ij by (a + b t ij ) (e.g., if we replace time measured in weeks by time measured in days), the same form for the covariance matrix holds. 133
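A sketch of the exponential covariance for unequally spaced times, including the equivalence of the ρ and θ parameterizations:

```python
import math

def exp_cov(times, sigma2, rho):
    """Exponential covariance for possibly unequally spaced times:
    sigma2 * rho**|t_j - t_k|, equivalently sigma2 * exp(-theta*|t_j - t_k|)
    with theta = -log(rho), for 0 <= rho <= 1."""
    return [[sigma2 * rho ** abs(tj - tk) for tk in times] for tj in times]

# Occasions at 0, 4, 6, 8, 12 (the strength-data measurement days):
cov = exp_cov([0, 4, 6, 8, 12], 1.0, 0.9)
# The two parameterizations agree:
theta = -math.log(0.9)
assert math.isclose(cov[0][1], math.exp(-theta * 4))
```

Unlike AR(1), the decay here is driven by the actual time gaps, so a gap of 4 days produces more decay than a gap of 2 days even though both are "adjacent" occasions.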
Choice among Covariance Pattern Models Choice of models for covariance and mean are interdependent. Choice of model for covariance should be based on a maximal model for the mean that minimizes any potential misspecification. With balanced designs and a very small number of discrete covariates, choose saturated model for the mean response. Saturated model: includes main effects of time (regarded as a within-subject factor) and all other main effects, in addition to their two- and higher-way interactions. 134
Maximal model should be in a certain sense the most elaborate model for the mean response that we would consider from a subject-matter point of view. Once maximal model has been chosen, residual variation and covariation can be used to select appropriate model for covariance. For nested covariance pattern models, a likelihood ratio test statistic can be constructed that compares full and reduced models. 135
Recall: two models are said to be nested when the reduced model is a special case of the full model. For example, the compound symmetry model is nested within the Toeplitz model, since if the former holds the latter must necessarily hold, with ρ₁ = ρ₂ = ... = ρₙ₋₁. The likelihood ratio test is obtained by taking twice the difference in the respective maximized REML log-likelihoods, G² = 2(ℓ̂ full − ℓ̂ red ), and comparing the statistic to a chi-squared distribution with df equal to the difference between the number of covariance parameters in the full and reduced models. 136
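In practice the LRT statistic is computed from the −2 log-likelihoods reported in the output; a small helper, illustrated with the autoregressive-versus-unstructured comparison for the strength data:

```python
def lrt_from_neg2ll(neg2ll_reduced, neg2ll_full, df):
    """G^2 = 2*(l_full - l_red), computed from the reported -2 log-likelihoods
    (the reduced, more constrained model has the larger -2 log-likelihood)."""
    return neg2ll_reduced - neg2ll_full, df

# Autoregressive (2 covariance parameters) vs. unstructured (15, for n = 5):
g2, df = lrt_from_neg2ll(621.1, 597.3, 15 - 2)
print(round(g2, 1), df)  # 23.8 13
```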
To compare non-nested models, an alternative approach is the Akaike Information Criterion (AIC). According to the AIC, given a set of competing models for the covariance, one should select the model that minimizes AIC = −2(maximized log-likelihood) + 2(number of parameters) = −2(ℓ̂ − c), where ℓ̂ is the maximized REML log-likelihood and c is the number of covariance parameters. 137
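Computing the AIC from the reported −2 (REML) log-likelihoods is a one-liner; the values below reproduce the entries of Table 19:

```python
def aic_from_neg2ll(neg2ll, c):
    """AIC = -2*(loglik - c) = (-2 loglik) + 2c; smaller is better."""
    return neg2ll + 2 * c

# As in Table 19: unstructured has c = 15 covariance parameters,
# autoregressive and exponential have c = 2 each.
for name, neg2ll, c in [("unstructured", 597.3, 15),
                        ("autoregressive", 621.1, 2),
                        ("exponential", 618.5, 2)]:
    print(name, round(aic_from_neg2ll(neg2ll, c), 1))
# exponential has the smallest AIC, 622.5
```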
Exercise Therapy Trial subjects were assigned to one of two weightlifting programs to increase muscle strength. treatment 1: number of repetitions of the exercises was increased as subjects became stronger. treatment 2, number of repetitions was held constant but amount of weight was increased as subjects became stronger. Measurements of body strength were taken at baseline and on days 2, 4, 6, 8, 10, and 12. We focus only on measures of strength obtained at baseline (or day 0) and on days 4, 6, 8, and 12. 138
Before considering models for the covariance, it is necessary to choose a maximal model for the mean response. We chose maximal model to be the saturated model for the mean. First, we consider an unstructured covariance matrix. Note that the variance is larger by the end of the study when compared to the variance at baseline. Correlations decrease as the time separation between the repeated measures increases. 139
Table 15: Estimated unstructured covariance matrix for the strength data at baseline, day 4, day 6, day 8, and day 12.

Day       0         4         6         8         12
0         9.668    10.175     8.974     9.812     9.407
4        10.175    12.550    11.091    12.580    11.928
6         8.974    11.091    10.642    11.686    11.101
8         9.812    12.580    11.686    13.990    13.121
12        9.407    11.928    11.101    13.121    13.944
140
Table 16: Estimated unstructured correlation matrix for the strength data at baseline, day 4, day 6, day 8, and day 12.

Day       0         4         6         8         12
0        1.0000    0.9237    0.8847    0.8437    0.8102
4        0.9237    1.0000    0.9597    0.9494    0.9017
6        0.8847    0.9597    1.0000    0.9577    0.9113
8        0.8437    0.9494    0.9577    1.0000    0.9394
12       0.8102    0.9017    0.9113    0.9394    1.0000
141
Despite the apparent increase in variance over time, we consider an autoregressive model for the correlation. It results in the following estimates of the variance and correlation parameters: σ̂² = 11.87 and ρ̂ = 0.94. This model is not very appropriate as the data are unequally spaced over time. Next consider the exponential model for the covariance, where Corr(Y ij, Y ik ) = ρ^|t ij − t ik |, with t i = (0, 4, 6, 8, 12) for all subjects. Results: σ̂² = 11.87 and ρ̂ = 0.98. 142
Table 17: Estimated autoregressive correlation matrix for the strength data at baseline, day 4, day 6, day 8, and day 12.

Day       0         4         6         8         12
0        1.0000    0.9402    0.8839    0.8311    0.7813
4        0.9402    1.0000    0.9402    0.8839    0.8311
6        0.8839    0.9402    1.0000    0.9402    0.8839
8        0.8311    0.8839    0.9402    1.0000    0.9402
12       0.7813    0.8311    0.8839    0.9402    1.0000
143
Table 18: Estimated exponential correlation matrix for the strength data at baseline, day 4, day 6, day 8, and day 12.

Day       0         4         6         8         12
0        1.0000    0.9169    0.8780    0.8408    0.7709
4        0.9169    1.0000    0.9576    0.9169    0.8408
6        0.8780    0.9576    1.0000    0.9576    0.8780
8        0.8408    0.9169    0.9576    1.0000    0.9169
12       0.7709    0.8408    0.8780    0.9169    1.0000
144
There is a hierarchy among the models: autoregressive and exponential are both nested within unstructured. The autoregressive and exponential models are not nested with each other but have the same number of parameters. Any comparison between these two models can be made directly in terms of their maximized log-likelihoods. The LRT comparing autoregressive and unstructured covariance gives G² = 621.1 − 597.3 = 23.8, with 13 (= 15 − 2) degrees of freedom (p < 0.05). There is evidence that the autoregressive model does not provide an adequate fit to the covariance. 145
The LRT comparing exponential and unstructured covariance yields G² = 618.5 − 597.3 = 21.2, and when compared to a chi-squared distribution with 13 degrees of freedom, p > 0.05. The exponential covariance provides an adequate fit to the data. Also, in terms of AIC, the exponential model minimizes this criterion. 146
Table 19: Comparison of the maximized (REML) log-likelihoods and AIC for the covariance pattern models for the strength data from the exercise therapy trial.

Covariance Pattern Model   −2 (REML) Log-Likelihood   AIC
Unstructured               597.3                      627.3
Autoregressive             621.1                      625.1
Exponential                618.5                      622.5
147
Strengths/Weaknesses of Covariance Pattern Models Covariance pattern models attempt to characterize the covariance with a relatively small number of parameters. However, many models (e.g., autoregressive, Toeplitz, and banded) appropriate only when repeated measurements are obtained at equal intervals and cannot handle irregularly timed measurements. While there is a large selection of models for correlations, choice of models for variances is limited. They are not well-suited for modelling data from inherently unbalanced longitudinal designs. 148
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 7 (Sections 7.1-7.4, 7.7) 149
APPLIED LONGITUDINAL ANALYSIS LECTURE 5 150
LINEAR MIXED EFFECTS MODEL So far, we have discussed models for longitudinal data where changes in the mean response can be expressed as E(Y ij | X ij ) = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp, and where the primary goal is to make inferences about the population regression parameters, β 1,..., β p. Specification of this regression model is completed by making additional assumptions about the structure of Cov(Y i ) = Σ i. Next, we consider an alternative, but closely related, approach for analyzing longitudinal data using linear mixed effects models. 151
Basic idea: some subset of the regression parameters vary randomly from one individual to another, thereby accounting for sources of natural heterogeneity in the population. Individuals in population are assumed to have their own subject-specific mean response trajectories over time. Distinctive feature: mean response modelled as a combination of population characteristics, β 1,..., β p, assumed to be shared by all individuals, and subject-specific effects that are unique to a particular individual. 152
The former are referred to as fixed effects, while the latter are referred to as random effects. The term mixed is used in this context to denote that the model contains both fixed and random effects. Note: It is not only possible to estimate parameters that describe how the mean response changes in the population of interest, but it is also possible to predict how individual-specific response trajectories change over time. 153
Example: Random Intercept Model Simplest possible case is the linear model with a randomly varying subject effect (or random intercept). Each subject is assumed to have an underlying level of response which persists over time, incorporated in the linear mixed effects model by regarding this subject effect as random, Y ij = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp + b i + e ij, where b i is the random subject effect and the e ij are regarded as measurement or sampling errors. Response for the i th subject at the j th occasion is assumed to differ from the population mean by a subject effect, b i, and a within-subject measurement error, e ij. 154
Subject-effect and measurement error are assumed to be random, with mean zero, Var(b i ) = σ_b² and Var(e ij ) = σ²; b i and e ij are assumed to be independent of one another. The model describes the conditional mean response trajectory over time for a specific individual, E(Y ij | b i ) = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp + b i, in addition to the marginal mean response profile in the population (i.e., averaged over all individuals in the population), E(Y ij ) = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp. 155
Y ij = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp + b i + e ij = (β 1 + b i ) + β 2 X ij2 + ... + β p X ijp + e ij, where X ij1 = 1 for all i and j, and β 1 is then the fixed effect intercept term in the model. The intercept for the i th individual is β 1 + b i and varies randomly from one individual to another. Because the mean of the random effect b i is assumed to be zero, b i represents the deviation of the i th individual's intercept (β 1 + b i ) from the population intercept, β 1. 156
Figure 13: Graphical representation of the marginal and conditional mean responses over time. 157
Figure 13 provides graphical representation of model. Marginal mean response over time in the population changes linearly with time (denoted by the solid line). Conditional mean responses for two specific individuals, subjects A and B, deviate from the population trend (denoted by the broken lines). Individual A responds higher than the population average and thus has a positive b i. Individual B responds lower than the population average and has a negative b i. Inclusion of measurement errors, e ij, allows response at any occasion to vary randomly above/below subject-specific trajectories (see Figure 14). 158
Figure 14: Graphical representation of the marginal and conditional mean responses over time, plus measurement errors. 159
Consider the marginal covariance among the repeated measurements on the same individual. When averaged over the individual-specific effects, the marginal mean of Y ij is given by E(Y ij ) = µ ij = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp. The marginal covariance among the Y ij is defined in terms of deviations of Y ij from the marginal mean, µ ij. 160
For the model with randomly varying intercepts, the marginal covariance has the following compound symmetry pattern

Cov(Y i ) =
  [ σ_b² + σ²   σ_b²         σ_b²         ...  σ_b²       ]
  [ σ_b²        σ_b² + σ²    σ_b²         ...  σ_b²       ]
  [ σ_b²        σ_b²         σ_b² + σ²    ...  σ_b²       ]
  [ ...                                        ...        ]
  [ σ_b²        σ_b²         σ_b²         ...  σ_b² + σ² ],

with Corr(Y ij, Y ik ) = σ_b² / (σ_b² + σ²) for j ≠ k. This emphasizes an important aspect of mixed effects models: the introduction of a random subject effect, b i, can be seen to induce correlation among the repeated measurements. 161
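The induced compound symmetry structure can be verified numerically; a minimal sketch with hypothetical variance components σ_b² = 4 and σ² = 1:

```python
def random_intercept_cov(n, sigma2_b, sigma2_e):
    """Marginal covariance induced by a random intercept: compound symmetry,
    sigma_b^2 everywhere plus sigma^2 added on the diagonal."""
    return [[sigma2_b + (sigma2_e if j == k else 0.0) for k in range(n)]
            for j in range(n)]

def within_subject_corr(sigma2_b, sigma2_e):
    """Corr(Y_ij, Y_ik) = sigma_b^2 / (sigma_b^2 + sigma^2), for j != k."""
    return sigma2_b / (sigma2_b + sigma2_e)

cov = random_intercept_cov(3, 4.0, 1.0)
print(cov[0])                         # [5.0, 4.0, 4.0]
print(within_subject_corr(4.0, 1.0))  # 0.8
```

The larger the between-subject variance σ_b² relative to the measurement error σ², the stronger the induced within-subject correlation.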
Linear Mixed Effects Model Next, we consider generalizations by allowing additional regression coefficients to vary randomly. By allowing a subset of the regression coefficients to vary randomly, a very flexible, and yet quite parsimonious, class of random effects covariance structures becomes available. In the following we assume there are N individuals on whom we have collected n i repeated observations, with the response variable Y ij measured at time t ij. 162
Any of the components of β 1,..., β p can be allowed to vary randomly. The collection of random effects, denoted b i, are assumed to have a multivariate normal distribution with mean zero and covariance matrix G. That is, E(b i ) = 0 and Cov(b i ) = G. Ordinarily, the errors e ij and e ik are assumed to be uncorrelated, with equal variance, σ 2 ; the e ij s can be thought of as sampling or measurement errors. 163
Example: Random Intercept and Slope Model Consider model with intercepts and slopes that vary randomly among individuals, Y ij = β 1 + β 2 t ij + b i1 + b i2 t ij + e ij, j = 1,..., n i. This model posits that individuals vary not only in their baseline level of response (when t i1 = 0), but also in terms of their changes in the mean response over time. 164
The effects of covariates (e.g. due to treatments, exposures) can be incorporated by allowing mean of intercepts and slopes to depend on covariates. For example, consider two-group study comparing a treatment and a control group: Y ij = β 1 + β 2 t ij + β 3 Group i + β 4 t ij Group i + b i1 + b i2 t ij + e ij, where Group i = 1 if the i th individual assigned to treatment, and Group i = 0 otherwise. 165
Figure 15: Graphical representation of the marginal and conditional mean responses over time, plus measurement errors. 166
Next, consider the covariance among the responses. Let Var(b i1 ) = g 11, Var(b i2 ) = g 22, and Cov(b i1, b i2 ) = g 12. If we also assume that Cov(e ij, e ik ) = 0 and Var(e ij ) = σ², then Var(Y ij ) = Var(b i1 + b i2 t ij + e ij ) = Var(b i1 ) + 2 t ij Cov(b i1, b i2 ) + t ij ² Var(b i2 ) + Var(e ij ) = g 11 + 2 t ij g 12 + t ij ² g 22 + σ². Similarly, it can be shown that Cov(Y ij, Y ik ) = g 11 + (t ij + t ik ) g 12 + t ij t ik g 22. Thus, in this model for longitudinal data the covariance matrix, Cov(Y i ), can be expressed as a function of time, t ij. 167
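These variance and covariance formulas are straightforward to evaluate; a sketch with hypothetical values of g 11, g 12, g 22 and σ²:

```python
def var_y(t, g11, g12, g22, sigma2):
    """Var(Y_ij) = g11 + 2*t*g12 + t**2*g22 + sigma2 (a function of time)."""
    return g11 + 2 * t * g12 + t ** 2 * g22 + sigma2

def cov_y(tj, tk, g11, g12, g22):
    """Cov(Y_ij, Y_ik) = g11 + (tj + tk)*g12 + tj*tk*g22, for j != k."""
    return g11 + (tj + tk) * g12 + tj * tk * g22

# Hypothetical components: g11 = 1.0, g12 = 0.5, g22 = 0.25, sigma2 = 1.0.
print(var_y(0, 1.0, 0.5, 0.25, 1.0))  # at t = 0: g11 + sigma2 = 2.0
print(var_y(2, 1.0, 0.5, 0.25, 1.0))  # 1 + 2 + 1 + 1 = 5.0
print(cov_y(1, 2, 1.0, 0.5, 0.25))    # 1 + 1.5 + 0.5 = 3.0
```

Unlike the covariance pattern models of Lecture 4, this structure is a smooth function of the actual measurement times, so it handles unbalanced designs naturally.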
Random Effects Covariance Structure Next, we consider the form of the induced random effects covariance structure in the more general case where Y i = X i β + Z i b i + e i. Recall that we can distinguish the conditional mean of Y i, given b i, E(Y i | b i ) = X i β + Z i b i, from the marginal or population-averaged mean of Y i, E(Y i ) = X i β. Similarly, we can distinguish conditional and marginal covariances. 168
The conditional covariance of Y i, given b i, is Cov(Y i | b i ) = Cov(e i ) = R i, while the marginal covariance of Y i, averaged over the distribution of b i, is given by Cov(Y i ) = Cov(Z i b i ) + Cov(e i ) = Z i Cov(b i ) Z i ′ + Cov(e i ) = Z i G Z i ′ + R i. Even when R i = Cov(e i ) = σ² I ni, a diagonal matrix (with all pairwise correlations equal to zero), Cov(Y i ) = Z i G Z i ′ + σ² I ni is emphatically not a diagonal matrix. 169
Cov (Y i ) will, in general, have non-zero off-diagonal elements, thereby accounting for the correlation among repeated observations on the same individuals. Introduction of random effects, b i, induces correlation among the components of Y i. Cov (Y i ) has been described in terms of a set of covariance parameters, some defining G and some defining R i. Linear mixed effects model allows for the explicit analysis of between-subject (G) and within-subject (R i ) sources of variation in the responses. Finally, note the marginal covariance of Y i is a function of the times of measurement. 170
Prediction of Random Effects In many applications, inference is focused on the fixed effects, β 1,..., β p. However, we can also estimate (or predict) individual-specific response trajectories over time. We can obtain predictions of the subject-specific effects, b i (or of the subject-specific response trajectories). The predictor is known as the best linear unbiased predictor (or BLUP). 171
Interestingly, the predicted response trajectory for the i th subject is a weighted combination of the estimated population-averaged mean response profile, µ̂ i1,..., µ̂ ini (where µ̂ ij = β̂ 1 X ij1 + β̂ 2 X ij2 + ... + β̂ p X ijp ), and the i th subject's observed response profile, Y i1,..., Y ini. This helps to explain why it is often said that the empirical BLUP shrinks the i th subject's predicted response profile towards the population-averaged mean. 172
For example, consider the random intercept model Y ij = β 1 X ij1 + β 2 X ij2 + ... + β p X ijp + b i + e ij, where Var(b i ) = σ_b² and Var(e ij ) = σ². It can be shown that the BLUP for b i is b̂ i = w × (1/n i ) Σⱼ (Y ij − µ̂ ij ) + (1 − w) × 0, where the sum is over j = 1,..., n i and w = n i σ_b² / (n i σ_b² + σ²). That is, a weighted average of zero (the mean of b i ) and the mean residual for the i th subject. Note: less shrinkage (toward zero) when n i is large and when σ_b² is large relative to σ². 173
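The shrinkage behaviour of the BLUP is easy to verify numerically; a sketch with hypothetical variance components:

```python
def blup_weight(n_i, sigma2_b, sigma2_e):
    """Shrinkage weight w = n_i*sigma_b^2 / (n_i*sigma_b^2 + sigma^2)."""
    return n_i * sigma2_b / (n_i * sigma2_b + sigma2_e)

def blup_intercept(residuals, sigma2_b, sigma2_e):
    """BLUP of b_i: w * (mean residual) + (1 - w) * 0."""
    n_i = len(residuals)
    return blup_weight(n_i, sigma2_b, sigma2_e) * sum(residuals) / n_i

# More occasions, or larger sigma_b^2 relative to sigma^2, means less shrinkage:
print(blup_weight(4, 1.0, 1.0))              # 0.8
print(blup_weight(100, 1.0, 1.0))            # ~0.99
print(blup_intercept([2.0, 2.0], 1.0, 1.0))  # w = 2/3, so ~1.33
```

With only a few noisy observations, the predicted b i stays close to zero (the population value); with many observations it approaches the subject's own mean residual.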
Example: Influence of Menarche on Changes in Body Fat Prospective study on body fat accretion in a cohort of 162 girls from the MIT Growth and Development Study. At start of study, all the girls were pre-menarcheal and non-obese All girls were followed over time according to a schedule of annual measurements until four years after menarche. The final measurement was scheduled on the fourth anniversary of their reported date of menarche. At each examination, a measure of body fatness was obtained based on bioelectric impedance analysis. 174
Figure 16: Timeplot of percent body fat against age (in years). 177
Consider an analysis of the changes in percent body fat before and after menarche. For the purposes of these analyses, time is coded as time since menarche and can be positive or negative. Note: the measurement protocol is the same for all girls. The study design is balanced if the timing of measurement is defined as time since the baseline measurement; it is inherently unbalanced when the timing of measurements is defined as time since a girl experienced menarche. 178
Figure 17: Timeplot of percent body fat against time relative to age of menarche (in years). 179
Consider the hypothesis that percent body fat increases linearly with age, but with different slopes before and after menarche. We assume that each girl has a piecewise linear spline growth curve with a knot at the time of menarche. Each girl's growth curve can be described by an intercept and two slopes: one slope for changes in response before menarche, another slope for changes in response after menarche. Note: the knot is not at a fixed age for all subjects. Let t_ij denote the time of the j-th measurement on the i-th subject before or after menarche (i.e., t_ij = 0 at menarche). 180
Figure 18: Graphical representation of piecewise linear trajectory. 181
We consider the following linear mixed effects model:
E(Y_ij | b_i) = β_1 + β_2 t_ij + β_3 (t_ij)_+ + b_i1 + b_i2 t_ij + b_i3 (t_ij)_+,
where (t_ij)_+ = t_ij if t_ij > 0 and (t_ij)_+ = 0 if t_ij ≤ 0.
(β_1 + b_i1) is the intercept for the i-th subject and is the true percent body fat at menarche (when t_ij = 0). Similarly, (β_2 + b_i2) is the i-th subject's slope, or rate of change in percent body fat, during the pre-menarcheal period. Finally, the i-th subject's slope during the post-menarcheal period is given by [(β_2 + β_3) + (b_i2 + b_i3)]. Note: the goal of the analysis is to assess whether the population slopes differ before and after menarche, i.e., H_0: β_3 = 0. 182
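A small Python sketch of the truncated-line basis (t)_+ and the population mean trajectory it implies; the coefficients used here are rounded versions of the fixed-effects estimates reported in Table 20, so the exact values are illustrative:

```python
def pos_part(t):
    """(t)_+ = t if t > 0 else 0: the truncated-line basis with knot at menarche."""
    return t if t > 0.0 else 0.0

def mean_trajectory(t, beta1, beta2, beta3):
    """Population mean percent body fat at time t (years relative to menarche)
    under the piecewise linear model beta1 + beta2*t + beta3*(t)_+."""
    return beta1 + beta2 * t + beta3 * pos_part(t)

# Rounded estimates: intercept 21.4, pre-menarche slope 0.42, slope change 2.05.
before = mean_trajectory(-1.0, 21.4, 0.42, 2.05)  # slope 0.42 applies
after = mean_trajectory(1.0, 21.4, 0.42, 2.05)    # slope 0.42 + 2.05 applies
```

Before the knot the curve changes by β_2 per year; after the knot it changes by β_2 + β_3 per year, which is exactly the contrast tested by H_0: β_3 = 0.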
Table 20: Estimated regression coefficients (fixed effects) and standard errors for the percent body fat data.

PARAMETER    ESTIMATE       SE        Z
Intercept     21.3614   0.5646    37.84
time           0.4171   0.1572     2.65
(time)_+       2.0471   0.2280     8.98
183
Table 21: Estimated covariance of the random effects and standard errors for the percent body fat data.

PARAMETER                 ESTIMATE       SE        Z
Var(b_i1) = g_11           45.9413   5.7393     8.00
Var(b_i2) = g_22            1.6311   0.4331     3.77
Var(b_i3) = g_33            2.7497   0.9635     2.85
Cov(b_i1, b_i2) = g_12      2.5263   1.2185     2.07
Cov(b_i1, b_i3) = g_13     -6.1096   1.8730    -3.26
Cov(b_i2, b_i3) = g_23     -1.7505   0.5980    -2.93
Var(e_ij) = σ²              9.4732   0.5443    17.40
184
Results: Based on the magnitude of β_3, relative to its standard error, the slopes before and after menarche differ. The estimate of the population mean pre-menarcheal slope is 0.42, which is statistically significant at the 0.05 level. The estimated slope is rather shallow and indicates that the annual rate of body fat accretion is less than 0.5%. The estimate of the population mean post-menarcheal slope is 2.46 (with SE = 0.12), which is also significant at the 0.05 level. This indicates that the annual rate of growth in body fat is approximately 2.5%, almost 6 times higher than pre-menarche. 185
The estimated variance of b_i2 is 1.63, indicating substantial variability from girl to girl in rates of fat accretion during the pre-menarcheal period. Approximately 95% of girls have annual changes in percent body fat between -2.09% and 2.92% (i.e., 0.42 ± 1.96 √1.63). The estimated variance of the slopes during the post-menarcheal period, Var(b_i2 + b_i3) = Var(b_i2) + Var(b_i3) + 2 Cov(b_i2, b_i3), is 0.88 (or 1.63 + 2.75 - 2 × 1.75), indicating less variability in the slopes after menarche. Approximately 95% of girls have annual changes in percent body fat between 0.62% and 4.30% (i.e., 2.46 ± 1.96 √0.88). 186
Results indicate that more than 95% of girls are expected to have increases in body fat during the post-menarcheal period. Substantially fewer (approximately 63%) are expected to have increases in body fat during the pre-menarcheal period. There is strong positive correlation (approximately 0.8) between annual measurements of percent body fat. Strength of correlation declines over time, but does not decay to zero even when measurements are taken 8 years apart. 187
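The quoted percentages follow from the normal distributions implied by the estimated slope means and variances; a short Python check, not part of the original slides, using the standard library's NormalDist:

```python
import math
from statistics import NormalDist

# Subject-specific slopes implied by the fitted model (values from the slides):
# pre-menarche ~ N(0.42, 1.63), post-menarche ~ N(2.46, 0.88),
# where the second number in each pair is a variance.
sd_pre, sd_post = math.sqrt(1.63), math.sqrt(0.88)
pre = NormalDist(mu=0.42, sigma=sd_pre)
post = NormalDist(mu=2.46, sigma=sd_post)

# 95% range quoted for the pre-menarcheal slopes:
lo_pre, hi_pre = 0.42 - 1.96 * sd_pre, 0.42 + 1.96 * sd_pre  # about (-2.09, 2.92)

# Fractions of girls expected to gain body fat (slope > 0) in each period:
frac_pre = 1 - pre.cdf(0.0)    # about 0.63
frac_post = 1 - post.cdf(0.0)  # well above 0.95
```

The pre-menarche fraction is Φ(0.42/√1.63) ≈ 0.63, and the post-menarche fraction Φ(2.46/√0.88) exceeds 0.99, consistent with the statements above.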
Table 22: Marginal correlations (off-diagonals) among repeated measures of percent body fat between 4 years pre- and post-menarche, with estimated variances along the main diagonal.

        -4     -3     -2     -1      0      1      2      3      4
-4    61.3   0.82   0.78   0.71   0.61   0.60   0.57   0.52   0.47
-3    0.82   54.9   0.81   0.76   0.70   0.68   0.64   0.60   0.54
-2    0.78   0.81   51.8   0.80   0.76   0.74   0.71   0.66   0.60
-1    0.71   0.76   0.80   52.0   0.81   0.79   0.76   0.71   0.64
 0    0.61   0.70   0.76   0.81   55.4   0.81   0.78   0.73   0.66
 1    0.60   0.68   0.74   0.79   0.81   49.1   0.79   0.76   0.70
 2    0.57   0.64   0.71   0.76   0.78   0.79   44.6   0.77   0.74
 3    0.52   0.60   0.66   0.71   0.73   0.76   0.77   41.8   0.76
 4    0.47   0.54   0.60   0.64   0.66   0.70   0.74   0.76   40.8
188
The mixed effects model can be used to obtain estimates of each girl s growth trajectory over time. Figure 19 displays estimated population mean growth curve and predicted (empirical BLUP) growth curves for two girls. Note: two girls differ in the number of measurements obtained (6 and 10 respectively). A noticeable feature of the predicted growth curves is that there is more shrinkage towards the population mean curve when fewer data points are available. 189
Figure 19: Population average curve and empirical BLUPs for two randomly selected girls. 190
The predicted growth curve for the girl with only 6 data points is pulled closer to the population mean curve. The predicted growth curve for the girl with 10 observations follows her data more closely. This becomes more apparent when the empirical BLUPs are compared to ordinary least squares (OLS) estimates based only on the observations from each girl (see Figure 20). 191
Figure 20: Population average curve, empirical BLUPs, and OLS predictions for two randomly selected girls. 192
Additional Questions: 1. Is there non-linearity of the curve after menarche? Can add a (t_ij²)_+ term to the model as a fixed effect (β) and/or as a random effect (b_i). 2. Does parental obesity affect percent body fat? Does parental obesity affect the slope post-menarche? Can allow the mean percent body fat at menarche and the slopes to depend on parental obesity, i.e., add more β parameters (fixed effects). 193
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 8 Cnaan A, Slasor P and Laird NM. (1997) Tutorial in Biostatistics: Using the general linear mixed model to analyze unbalanced repeated measures and longitudinal data. Statistics in Medicine, 16:2349 2380. Naumova EN, Must A and Laird NM. (2001) Evaluating the impact of critical periods in longitudinal studies of growth using piecewise mixed effects models. International Journal of Epidemiology, 30:1332 1341. 194
APPLIED LONGITUDINAL ANALYSIS LECTURE 6 195
INTRODUCTION TO GENERALIZED LINEAR MODELS FOR LONGITUDINAL DATA So far, the focus has been on methods for longitudinal data where the response variable is continuous. The main emphasis was on linear models for a vector of responses assumed to have a multivariate normal distribution. Today, we consider non-Gaussian responses and generalized linear models. Since binary data are the most common case, we focus heavily on methods for repeated binary data. 196
Generalized linear models are a class of regression models; they include the standard linear regression model but also many other important regression models used in biomedical research:
- Linear regression for continuous data
- Logistic regression for binary data
- Loglinear models for count data
In today's lecture, we will review the generalized linear model and show how generalized linear models can be applied in the longitudinal data setting. 197
Motivating Example Six Cities Study of Respiratory Illness in Children (Laird, Beck and Ware, 1984) A non-randomized longitudinal study of the health effects of air pollution. Subset of data from Steubenville, Ohio Outcome: Binary indicator of respiratory illness in child Measurement schedule: Four annual measurements at ages 7, 8, 9, and 10. Interested in the influence of maternal smoking on children s respiratory illness. Sample size: 537 children 198
MOTIVATING EXAMPLE Clinical trial of the anti-epileptic drug progabide (Thall and Vail, Biometrics, 1990) Randomized, placebo-controlled study of the treatment of epileptic seizures with progabide. Patients were randomized to progabide or to placebo, each in addition to standard chemotherapy. Outcome variable: count of the number of seizures. Measurement schedule: baseline measurement during the 8 weeks prior to randomization; four measurements during consecutive two-week intervals. Sample size: 28 epileptics on placebo; 31 epileptics on progabide. 200
Summary In longitudinal studies the outcome variable can be: - continuous - binary - count The data set can be incomplete. Subjects may be measured at different occasions. Over the next three days of lectures we will develop statistical methods that can handle all of these cases. 202
Generalized Linear Model In generalized linear models: (i) the distribution of the response is assumed to belong to a single family of distributions known as the exponential family, e.g., the normal, Bernoulli, binomial, and Poisson distributions; (ii) a transformation of the mean response is then linearly related to the covariates, via an appropriate link function. 203
Mean and Variance of Exponential Family Distributions Exponential family distributions share some common statistical properties. The variance of Y_i can be expressed as Var(Y_i) = φ v(µ_i), where the scale parameter φ > 0. The variance function, v(µ_i), describes how the variance of the response is functionally related to µ_i, the mean of Y_i. 204
Link Function The link function applies a transformation to the mean and then links the covariates to the transformed mean,
g(µ_i) = β_1 X_i1 + β_2 X_i2 + ... + β_p X_ip = Σ_{j=1}^{p} β_j X_ij,
where the link function g(·) is a known function, e.g., log(µ_i). This implies that it is the transformed mean response that changes linearly with changes in the values of the covariates. 205
Canonical link and variance functions for the normal, Bernoulli, and Poisson distributions.

Distribution   Variance Function, v(µ)   Canonical Link
Normal         v(µ) = 1                  Identity: µ = η
Bernoulli      v(µ) = µ(1 - µ)           Logit: log[µ/(1 - µ)] = η
Poisson        v(µ) = µ                  Log: log(µ) = η

where η = β_1 X_1 + β_2 X_2 + ... + β_p X_p. 206
Common Examples Normal distribution: If we assume that g(·) is the identity function, then
g(µ_i) = µ_i = β_1 X_i1 + β_2 X_i2 + ... + β_p X_ip = Σ_{j=1}^{p} β_j X_ij
gives the standard linear regression model. We can, however, choose other link functions when they seem appropriate to the application. 207
Bernoulli distribution: For the Bernoulli distribution, 0 < µ_i < 1, so we would prefer a link function that transforms the interval (0, 1) onto the entire real line (-∞, ∞). There are several possibilities:
logit: η_i = log[µ_i / (1 - µ_i)]
probit: η_i = Φ⁻¹(µ_i), where Φ(·) is the standard normal cumulative distribution function.
Another possibility is the complementary log-log link function: η_i = log[-log(1 - µ_i)]. 208
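The three Bernoulli links can be written down directly; an illustrative Python sketch (the probit uses the standard library's NormalDist for Φ⁻¹):

```python
import math
from statistics import NormalDist

def logit(mu):
    """Logit link: log-odds of mu."""
    return math.log(mu / (1.0 - mu))

def probit(mu):
    """Probit link: inverse standard normal CDF of mu."""
    return NormalDist().inv_cdf(mu)

def cloglog(mu):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - mu))

# All three map (0, 1) onto the whole real line; logit and probit are
# symmetric about mu = 0.5 (where both equal 0), cloglog is not.
eta = logit(0.5)
```

All three are monotone increasing, so large η corresponds to large µ under any of them; the choice mainly affects tail behavior and interpretation of β.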
GENERALIZED LINEAR MODELS FOR LONGITUDINAL DATA Next, we consider a general approach for analyzing longitudinal responses. Marginal models can be considered extensions of generalized linear models to correlated data. The main focus will be on discrete response data, e.g., count data or binary responses. 209
MARGINAL MODELS The basic premise of marginal models is to make inferences about population averages. The term marginal is used here to emphasize that the mean response modelled is conditional only on covariates and not on other responses or random effects. A feature of marginal models is that the models for the mean and the covariance are specified separately. 210
Notation Assume a sample of N subjects is measured repeatedly over time. We let Y_ij denote the response variable for the i-th subject on the j-th measurement occasion. Y_ij can be continuous, binary, or a count. Associated with each response, Y_ij, there is a p × 1 vector of covariates, X_ij. 211
The focus of marginal models is on inferences about population averages. The marginal expectation, µ ij = E (Y ij ), of each response is modelled as a function of covariates. Specifically, with marginal models we make the following three assumptions: 212
1. the marginal expectation of the response, µ_ij, depends on the covariates, X_ij, through a known link function
g(µ_ij) = η_ij = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp
2. the marginal variance of Y_ij depends on the marginal mean according to
Var(Y_ij) = v(µ_ij) φ,
where v(µ_ij) is a known variance function and φ is a scale parameter that may need to be estimated.
3. the within-subject association among the responses is a function of the means and perhaps of additional parameters, say α, that may also need to be estimated. 213
Illustrative Examples of Marginal Models: Example 1. Continuous responses:
1. µ_ij = η_ij = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp (i.e., linear regression)
2. Var(Y_ij) = φ_j (i.e., heterogeneous variance)
3. Corr(Y_ij, Y_ik) = α^|k-j| (0 ≤ α ≤ 1) (i.e., autoregressive correlation) 214
Example 2. Binary responses:
1. logit(µ_ij) = η_ij = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp (i.e., logistic regression)
2. Var(Y_ij) = µ_ij (1 - µ_ij) (i.e., Bernoulli variance)
3. Corr(Y_ij, Y_ik) = α_jk (i.e., unstructured correlation) 215
Example 3. Count data:
1. log(µ_ij) = η_ij = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp (i.e., Poisson regression)
2. Var(Y_ij) = µ_ij φ (i.e., extra-Poisson variance)
3. Corr(Y_ij, Y_ik) = α (i.e., compound symmetry correlation) 216
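The within-subject correlation structures named in these examples are simple to construct explicitly; a small Python sketch (illustrative, not from the slides):

```python
def exchangeable(alpha, n):
    """Compound symmetry: Corr(Y_ij, Y_ik) = alpha for all j != k."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def ar1(alpha, n):
    """Autoregressive: Corr(Y_ij, Y_ik) = alpha**|k - j|,
    so correlation decays with the separation between occasions."""
    return [[alpha ** abs(k - j) for k in range(n)] for j in range(n)]

# For alpha = 0.5 and n = 3 occasions, AR correlations are 1, 0.5, 0.25:
R = ar1(0.5, 3)
```

An unstructured model would instead estimate a separate α_jk for every pair (j, k); the structured forms above trade flexibility for far fewer parameters.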
Interpretation of Marginal Model Parameters: The regression parameters, β, have population-averaged interpretations: - describe the effect of covariates on the marginal expectations or average responses - contrast the means in sub-populations that share common covariate values Of note, nature or magnitude of the correlation does not alter the interpretation of β. 217
Summary The focus of marginal models is inferences about population averages. The regression parameters, β, have population-averaged interpretations (where averaging is over all individuals within subgroups of the population):
- describe the effect of covariates on the average responses
- contrast the means in sub-populations that share common covariate values
⇒ Marginal models are most useful for population-level inferences. Marginal models should not be used to make inferences about individuals ("ecological fallacy"). 218
Statistical Inference for Marginal Models Maximum Likelihood (ML): Unfortunately, with discrete response data there is no analogue of the multivariate normal distribution. In the absence of a convenient likelihood function for discrete data, there is no unified likelihood-based approach for marginal models. Recall: In linear models for normal responses, specifying the means and the covariance matrix fully determines the likelihood. 219
This is not the case with discrete response data. To specify the likelihood for multivariate discrete data, additional assumptions about higher-order moments are required. Even when additional assumptions are made, the likelihood is often intractable. 220
To illustrate some of the problems, consider a discrete response, Y_ij, having C categories. The joint distribution of Y_i = (Y_i1, Y_i2, ..., Y_in) is multinomial, with a C^n × 1 joint probability vector. For example, when C = 5 and n = 10, the joint probability vector has 5^10, or almost 10,000,000, elements. Problem: computations grow exponentially with n and ML quickly becomes impractical. Solution: we will consider an alternative approach to estimation - generalized estimating equations (GEE). 221
Generalized Estimating Equations Avoid making distributional assumptions about Y i altogether. Potential advantages: Empirical researcher does not have to be concerned that the distribution of Y i closely approximates some multivariate distribution. It circumvents the need to specify models for the three-way, four-way and higher-way associations among the responses. It leads to a method of estimation, known as generalized estimating equations (GEE). 222
The GEE approach has become an extremely popular method for analyzing longitudinal data. It provides a flexible approach for modelling the mean and the pairwise within-subject association structure. It can handle inherently unbalanced designs and missing data with ease. GEE approach is computationally straightforward and has been implemented in existing, widely-available statistical software. 223
Recall, marginal models make the following three assumptions:
1. the marginal expectation of the response, µ_ij, depends on the covariates, X_ij, through a known link function
g(µ_ij) = η_ij = X_ij'β = β_1 X_ij1 + β_2 X_ij2 + ... + β_p X_ijp
2. the marginal variance of Y_ij depends on the marginal mean according to
Var(Y_ij) = v(µ_ij) φ,
where v(µ_ij) is a known variance function and φ is a scale parameter that may need to be estimated.
3. the within-subject association among the responses is a function of the means and perhaps of additional parameters, say α, that may also need to be estimated. 224
For example, when the vector of parameters α represents the pairwise correlations among the responses, the covariances among the responses depend on µ_ij(β), φ, and α:
V_i = A_i^(1/2) Corr(Y_i) A_i^(1/2),
where A_i is a diagonal matrix with Var(Y_ij) = φ v(µ_ij) along the diagonal (and A_i^(1/2) is a diagonal matrix with the standard deviations, √[φ v(µ_ij)], along the diagonal), and Corr(Y_i) is the correlation matrix. 225
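A minimal Python sketch of this construction for a pair of Bernoulli responses (the means and the working correlation value below are hypothetical, chosen only to illustrate the arithmetic):

```python
import math

def working_covariance(variances, corr):
    """V = A^(1/2) Corr(Y) A^(1/2): element (j, k) is sd_j * corr_jk * sd_k,
    where sd_j = sqrt(phi * v(mu_j)) are the model-based standard deviations."""
    sd = [math.sqrt(v) for v in variances]
    n = len(sd)
    return [[sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]

# Two Bernoulli responses with means 0.2 and 0.5, so v(mu) = mu(1 - mu)
# and phi = 1; working correlation alpha = 0.3:
var = [0.2 * 0.8, 0.5 * 0.5]
V = working_covariance(var, [[1.0, 0.3], [0.3, 1.0]])
```

The diagonal of V reproduces the marginal variances µ(1 - µ), while the off-diagonal entry 0.3 × √0.16 × √0.25 = 0.06 is the working covariance implied by α = 0.3.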
The GEE estimator of β solves the following generalized estimating equations:
Σ_{i=1}^{N} D_i' V_i^(-1) (y_i - µ_i) = 0,
where V_i is the so-called "working" covariance matrix. By "working" covariance matrix we mean that V_i approximates the true underlying covariance matrix for Y_i. That is, V_i ≈ Cov(Y_i), recognizing that V_i ≠ Cov(Y_i) unless the models for the variances and the within-subject associations are correct. D_i = ∂µ_i/∂β is the so-called "derivative" matrix (of µ_i with respect to the components of β). 226
D_i is a matrix that transforms from the original units of Y_i (and µ_i) to the units of g(µ_ij). The n_i × p derivative matrix D_i is given by

        | ∂µ_i1/∂β_1     ∂µ_i1/∂β_2     ...   ∂µ_i1/∂β_p    |
D_i  =  |    ...             ...        ...      ...        |
        | ∂µ_in_i/∂β_1   ∂µ_in_i/∂β_2   ...   ∂µ_in_i/∂β_p  |

and is only a function of β (since the µ_ij depend only on β). 227
Therefore the generalized estimating equations depend on both β and α. Because of this, an iterative two-stage estimation procedure is required:
1. Given current estimates of α and φ, an estimate of β is obtained as the solution to the generalized estimating equations.
2. Given a current estimate of β, estimates of α and φ are obtained based on the standardized residuals,
r_ij = (Y_ij - µ̂_ij) / [v(µ̂_ij)]^(1/2). 228
For example, φ can be estimated by
φ̂ = (1/(Nn)) Σ_{i=1}^{N} Σ_{j=1}^{n} r_ij².
Alternatively, to take account of the p regression parameters estimated, φ can be estimated by
φ̂ = (1/(Nn - p)) Σ_{i=1}^{N} Σ_{j=1}^{n} r_ij². 229
The correlation parameters, α, can be estimated in a similar way, depending on the working assumption about the correlation. For example, unstructured correlations, α_jk = Corr(Y_ij, Y_ik), can be estimated by
α̂_jk = (1/(N φ̂)) Σ_{i=1}^{N} r_ij r_ik.
Finally, in the two-stage estimation procedure we iterate between steps 1) and 2) until convergence has been achieved. 230
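The moment estimators in step 2 can be sketched in a few lines of Python (illustrative only; the residuals below are a toy input, and no degrees-of-freedom correction for p is applied):

```python
def scale_and_corr(resid):
    """Moment estimates of phi and the unstructured alpha_jk from the
    standardized residuals r_ij of N subjects at n common occasions:
    phi-hat = sum of r_ij**2 / (N*n), alpha-hat_jk = sum_i r_ij r_ik / (N*phi-hat)."""
    N, n = len(resid), len(resid[0])
    phi = sum(r ** 2 for row in resid for r in row) / (N * n)
    alpha = [[sum(resid[i][j] * resid[i][k] for i in range(N)) / (N * phi)
              for k in range(n)] for j in range(n)]
    return phi, alpha

# Toy residuals for N = 2 subjects at n = 2 occasions; each subject's
# residuals move together, so the estimated correlation is 1:
phi, alpha = scale_and_corr([[1.0, 1.0], [-1.0, -1.0]])
```

In the full two-stage procedure these estimates would be plugged back into V_i and the estimating equations re-solved for β until convergence.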
Properties of GEE estimators: β̂, the solution to the generalized estimating equations, has the following properties:
1. β̂ is a consistent estimator of β (with high probability, it is close to β)
2. In large samples, β̂ has a multivariate normal distribution
3. Cov(β̂) = B^(-1) M B^(-1), 231
where
B = Σ_{i=1}^{N} D_i' V_i^(-1) D_i,
M = Σ_{i=1}^{N} D_i' V_i^(-1) Cov(Y_i) V_i^(-1) D_i.
Note that B and M can be estimated by replacing α, φ, and β by their estimates, and replacing Cov(Y_i) by (Y_i - µ̂_i)(Y_i - µ̂_i)'. That is, we can use the empirical or so-called "sandwich" variance estimator. 232
In summary, the GEE estimators have the following attractive properties:
1. In many cases β̂ is almost efficient when compared to the MLE. For example, the GEE have the same form as the likelihood equations for multivariate normal models and also for certain models for discrete data.
2. β̂ is consistent even if the covariance of Y_i has been misspecified.
3. Standard errors for β̂ can be obtained using the empirical or so-called "sandwich" estimator. 233
Example: Six Cities Study of Respiratory Illness in Children. A non-randomized longitudinal study of the health effects of air pollution. Subset of data from one of the participating cities: Steubenville, Ohio Outcome variable: Binary indicator of respiratory infections in child. Measurements on the children were taken annually at ages 7, 8, 9, and 10. Interested in changes in the rates of respiratory illness and the influence of maternal smoking. 234
Assume the marginal probability of infection follows a logistic model,
logit(µ_ij) = β_1 + β_2 age_ij + β_3 smoke_i + β_4 (age_ij × smoke_i),
where age_ij = child's age - 9, and smoke_i = 1 if the child's mother smokes, 0 otherwise. Also, we assume that Var(Y_ij) = µ_ij (1 - µ_ij). We need to make assumptions about the within-subject association, e.g., Corr(Y_ij, Y_ik) = α_jk ("unstructured"). 235
Stata Commands for GEE. infile id smoke age y using wheeze.dat. replace age=age-9. generate sm_age=smoke*age. tsset id age. xtgee y smoke age sm_age,family(binomial) link(logit) corr(unstr) robust Iteration 1: tolerance =.00925475 Iteration 2: tolerance =.00012794 Iteration 3: tolerance = 9.376e-07 GEE population-averaged model Number of obs = 2148 Group and time vars: id age Number of groups = 537 Link: logit Obs per group: min = 4 Family: binomial avg = 4.0 Correlation: unstructured max = 4 Wald chi2(3) = 8.80 Scale parameter: 1 Prob > chi2 = 0.0321 (Std. Err. adjusted for clustering on id) 236
-------------------------------------------------------------------- Semi-robust y Coef. Std. Err. z P> z [95% Conf. Interval] --------+----------------------------------------------------------- smoke.301627.1886556 1.60 0.110 -.0681311.6713851 age -.1418336.0585654-2.42 0.015 -.2566197 -.0270475 sm_age.068452.0892638 0.77 0.443 -.1065019.2434058 _cons -1.908367.1192416-16.00 0.000-2.142076-1.674658 --------------------------------------------------------------------. xtcorr Estimated within-id correlation matrix R: c1 c2 c3 c4 r1 1.0000 r2 0.3501 1.0000 r3 0.3084 0.4694 1.0000 r4 0.3036 0.3185 0.3780 1.0000. xtgee y smoke age,family(binomial) link(logit) corr(unstr) robust 237
GEE population-averaged model Number of obs = 2148 Group and time vars: id age Number of groups = 537 Link: logit Obs per group: min = 4 Family: binomial avg = 4.0 Correlation: unstructured max = 4 Wald chi2(2) = 8.90 Scale parameter: 1 Prob > chi2 = 0.0117 (Std. Err. adjusted for clustering on id) -------------------------------------------------------------------- Semi-robust y Coef. Std. Err. z P> z [95% Conf. Interval] --------+----------------------------------------------------------- smoke.2534879.1783504 1.42 0.155 -.0960725.6030483 age -.1148972.0442797-2.59 0.009 -.2016838 -.0281106 _cons -1.888564.1140663-16.56 0.000-2.11213-1.664998 --------------------------------------------------------------------. predict yhat 238
(option mu assumed; exp(xb)/(1+exp(xb)))

. list smoke age yhat if (smoke==0)

      +--------------------------+
      | smoke   age        yhat  |
      |--------------------------|
   1. |     0    -2    .1599273  |
   2. |     0    -1    .1450869  |
   3. |     0     0    .1314083  |
   4. |     0     1      .11884  |

. list smoke age yhat if (smoke==1)

      +--------------------------+
      | smoke   age        yhat  |
      |--------------------------|
1401. |     1    -2    .1969794  |
1402. |     1    -1    .1794352  |
1403. |     1     0    .1631362  |
1404. |     1     1    .1480506  |
239
Thus, there is evidence that the rates of respiratory infection decline with age, but the rates do not appear to depend on whether a child's mother smokes. We can therefore fit the model excluding the age × smoke interaction. For a child whose mother does not smoke, the predicted probability of infection at ages 7, 8, 9, and 10 is 0.160, 0.145, 0.131, and 0.119, respectively. For a child whose mother does smoke, the predicted probability of infection at ages 7, 8, 9, and 10 is 0.197, 0.179, 0.163, and 0.148, respectively. 240
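As a check, these probabilities can be reproduced by inverting the logit at the GEE coefficients from the no-interaction Stata fit shown above (a short Python sketch; age is centered at 9, as in the Stata session):

```python
import math

def expit(eta):
    """Inverse logit: exp(eta)/(1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def p_infection(smoke, age_centered):
    """Predicted infection probability from the fitted GEE model without
    the age x smoke interaction (coefficients from the Stata output)."""
    eta = -1.888564 + 0.2534879 * smoke - 0.1148972 * age_centered
    return expit(eta)

# Probabilities at ages 7, 8, 9, 10 (i.e., centered ages -2, -1, 0, 1):
nonsmoker = [round(p_infection(0, a - 9), 3) for a in (7, 8, 9, 10)]
smoker = [round(p_infection(1, a - 9), 3) for a in (7, 8, 9, 10)]
```

These reproduce the yhat values listed by Stata above, rounded to three decimal places.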
Summary The GEE approach has become an extremely popular method for analyzing longitudinal data. It provides a flexible approach for modelling the mean and the pairwise within-subject association structure. It can handle inherently unbalanced designs and missing data with ease. GEE approach is computationally straightforward and has been implemented in existing, widely-available statistical software. 241
Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 10 (Sections 10.1, 10.2, 10.3) Chapter 11 (Sections 11.1, 11.2, 11.3, 11.4) 242