Analysis of Correlated Data
Patrick J Heagerty, PhD
Department of Biostatistics, University of Washington
Heagerty, 2006
Course Outline
- Examples of longitudinal data
- Correlation and weighting
- Exploratory data analysis: between- and within-person variation; correlation / covariance
- Regression analysis: models for the mean; models for the correlation
- Linear mixed model
- GEE
Course Outline (continued)
- Missing data
- Binary and count data: models for the mean; models for the correlation
- Generalized linear mixed model
- GEE
- Transition models
- Time-dependent covariates
- Multilevel models
- Design considerations
Bio & Notes
Patrick J Heagerty, Professor, University of Washington
Collaborative roles: RWJ, VA ERIC, NIAMS MCRC, K
Books:
- Diggle, Heagerty, Liang & Zeger (2002) Analysis of Longitudinal Data, Oxford
- van Belle, Fisher, Heagerty & Lumley (2004) Biostatistics, Wiley (introductory chapter on LDA)
Course Notes & Slides
- UW Biostat 57 = PhD applied core sequence, Winter 1999, ...
- UM Epi 766 = Longitudinal Data Analysis / Epi, Summer (Summer 2004 with VA/UW Biostat/Epi)
- Second Seattle Symposium (with S Zeger), Fall
- RAND short course, Fall
- NICHD short course, Fall 2003
Longitudinal Data Analysis
INTRODUCTION to EXAMPLES AND ISSUES
Outline
- Examples of longitudinal data: between- and within-person variation; correlation / covariance
- Scientific motivation: opportunities; issues
- Time scales: cross-sectional contrasts; longitudinal contrasts
- Correlation and weighting: impact of correlation; weighted estimation
Continuous Longitudinal Data
Example 1: Cystic Fibrosis and Lung Function
- There is a large registry of cystic fibrosis patient data
- Annual measurements include standard pulmonary function measures: FVC, FEV1
- primary outcome: FEV1 percent predicted
- covariates: age, gender, genotype
Q: Does change in lung function differ by gender and/or genotype?
(Figure: FEV1 percent predicted versus age, individual data)
Continuous Longitudinal Data
Example 2: Beta-carotene and vitamin E
- a phase II study to ascertain the pharmacokinetics of beta-carotene supplementation and the subsequent impact on vitamin E levels
- primary outcome: plasma measures taken monthly for 3 months prior to, 9 months during, and 3 months after supplementation
- covariates: dose (0, 15, 30, 45, 60 mg/day) and time
Q: What is the time course? Dose-response? Relationship between beta-carotene and vitamin E?
(Figure: beta-carotene (solid) and vitamin E (dashed) versus months since randomization; eight subjects at dose = 45 mg/day)
(Figure: the same subjects on the log(beta-carotene) scale)
Categorical Longitudinal Data
Example 3: Maternal Stress and Child Morbidity
- daily indicators of stress (maternal) and illness (child)
- primary outcome: illness, utilization
- covariates: employment, stress
Q: association between employment, stress and morbidity?
(Figure: percent reporting illness and percent reporting stress by day, separately by maternal employment status)
(Figure: daily stress and illness (Y/N) series over 28 days for twelve individual subjects)
Continuous Longitudinal Data
Example 4: PANSS Data
- PANSS is a standard symptom assessment for schizophrenic patients
- This study compares different doses of a new agent to a standard agent and to placebo
- primary outcome: PANSS
- covariates: treatment, time
Q: What is the best treatment?
(Figure: PANSS score versus time, panels by week of last visit, illustrating dropout)
Longitudinal Data
In longitudinal studies of health, we typically observe two distinct kinds of outcomes:
- times of clinical or other key events
- repeated values of markers of the health status of participants
In general terms, the scientific question is how explanatory variables affect times to clinical events and markers of the level of, or change in, health status over time.
The relationship between the event times and the markers can also be of interest: use markers as predictors of, or surrogates for, the clinical event time.
Longitudinal Studies
Benefits of longitudinal studies:
1. Incident events are recorded
   - Measure the new occurrence of disease
   - Timing of disease onset can be correlated with recent changes in patient exposure and/or with chronic exposure
2. Prospective ascertainment of exposure
   - Participants can have their exposure status recorded at multiple follow-up visits
   - This can alleviate recall bias
   - Temporal order of exposures and outcomes is observed
3. Measurement of individual change in outcomes
   - A key strength of a longitudinal study is the ability to measure change in outcomes and/or exposure at the individual level
   - Longitudinal studies provide the opportunity to observe individual patterns of change
4. Separation of time effects: cohort, period, age
   - When studying change over time there are many time scales to consider:
     cohort is the time of birth, such as 1945 or 1963
     period is the current time, such as 2004
     age is (period - cohort)
   - A longitudinal study with times t_1, t_2, ..., t_n can characterize multiple time scales, such as age and cohort effects, using covariates derived from the calendar time and birth year: the age of subject i at time t_j is age_ij = (t_j - birth_i); and cohort is cohort_ij = birth_i
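The covariate bookkeeping above can be sketched in a few lines; the subject labels and years below are hypothetical, chosen only to illustrate age_ij = t_j - birth_i and cohort_ij = birth_i:

```python
# Hypothetical subjects and calendar times (illustrative values only).
birth = {"A": 1945, "B": 1963}      # cohort_i = birth year
times = [2000, 2002, 2004]          # calendar times t_j (the "period" scale)

rows = []
for i, b in birth.items():
    for t in times:
        # age_ij = t_j - birth_i ; cohort_ij = birth_i
        rows.append({"id": i, "period": t, "cohort": b, "age": t - b})

print(rows[0])  # {'id': 'A', 'period': 2000, 'cohort': 1945, 'age': 55}
```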
5. Control for cohort effects
   - In a cross-sectional study, the comparison of subgroups of different ages combines the effects of aging and the effects of different cohorts
   - That is, a comparison of outcomes measured in 2003 among 58-year-old subjects and among 40-year-old subjects reflects both the fact that the groups differ by 18 years (aging) and the fact that the subjects were born in different eras
   - In a longitudinal study the cohort under study is fixed, and thus changes in time are not confounded by cohort differences
A nice overview of LDA opportunities in respiratory epidemiology is presented in Weiss and Ware (1996); Lebowitz (1996) discusses age, period, and cohort effects.
Longitudinal Studies
The benefits of a longitudinal design are not without cost; several challenges are posed.
Challenges of longitudinal studies:
1. Participant follow-up
   - Risk of bias due to incomplete follow-up, or drop-out, of study participants
   - If subjects who are followed to the planned end of study differ from subjects who discontinue follow-up, then a naive analysis may provide summaries that are not representative of the original target population
2. Analysis of correlated data
   - Statistical analysis of longitudinal data requires methods that can properly account for the intra-subject correlation of response measurements
   - If such correlation is ignored, then inferences such as statistical tests or confidence intervals can be grossly invalid
3. Time-varying covariates
   - Although longitudinal designs offer the opportunity to associate changes in exposure with changes in the outcome of interest, the direction of causality can be complicated by feedback between the outcome and the exposure
   - Example: MSCM, with stress and illness
   - Although scientific interest generally lies in the effect of exposure on health, reciprocal influence between exposure and outcome poses analytical difficulty when trying to separate the effect of exposure on health from the effect of health on exposure
   - How to choose the exposure lag? E.g., is it the air pollution today, yesterday, or last week that is the important predictor of morbidity today?
Longitudinal Studies
The Scientific Opportunity:
- Observe individual changes over time
- Characterize the time-course of disease
Outcome Measures:
- A single outcome at a fixed follow-up time
- The time until an event occurs
- Repeated measures taken over time
Motivation
Cystic Fibrosis and Pulmonary Function
Several specific aspects are of interest:
1. What is the rate of decline in FEV1?
2. Is the time course different for males and females?
3. Is the time course different for F508 homozygous subjects?
Reference: Davis PB (1997) Journal of Pediatrics
Data
- ID = patient id
- FEV1 = percent-predicted forced expiratory volume in 1 second
- AGE = age (years)
- GENDER = sex (male/female)
- PSEUDOA = infection with Pseudomonas aeruginosa (no/yes)
- F508 = genotype (1=homozygous, 2=heterozygous, 3=none)
- PANCREAT = pancreatic enzyme supplementation (no/yes)
EDA: Numerical Summaries
(Tables: total number of subjects; distribution of the number of observations per subject, n_i; distribution of males/females; number of F508 mutations, 0/1/2)
EDA: Numerical Summaries
Age at entry: Median = 9.655; Quartiles = 7.758, 15.335
(Stem-and-leaf display of age at entry, stems 5 through 23 years)
(Figures: individual FEV1 trajectories versus age for a sample of subjects, with Pseudomonas (PA) and pancreatic enzyme (PC) status indicated on each panel)
(Figure: FEV1 versus age, all subjects pooled, with fitted cross-sectional slope)
(Figures: FEV1 versus age-at-entry, and FEV1 change versus age-since-entry, each with a fitted slope)
Distinguishing Cross-sectional and Longitudinal Associations
Cross-sectional data:
   Y_i1 = β_C x_i1 + ε_i1,   i = 1, ..., m      (1)
β_C represents the difference in average Y across two sub-populations which differ by one unit in x.
Longitudinal data:
   Y_ij = β_C x_i1 + β_L (x_ij - x_i1) + ε_ij,   j = 1, ..., n_i;  i = 1, ..., m      (2)
When j = 1, the two equations are the same; β_C has the same cross-sectional interpretation.
Subtract the equations above to obtain:
   (Y_ij - Y_i1) = β_L (x_ij - x_i1) + (ε_ij - ε_i1)
β_L represents the expected change in Y per unit change in x.
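A small simulation (our own construction, not from the slides) shows that fitting model (2) recovers distinct cross-sectional and longitudinal coefficients; all parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 4                    # m subjects, n visits each
beta_C, beta_L = 2.0, -1.0       # true cross-sectional and longitudinal effects

x1 = rng.normal(size=(m, 1))     # baseline covariate x_i1
# x_ij: baseline value plus within-subject changes at later visits
x = x1 + np.hstack([np.zeros((m, 1)), rng.normal(size=(m, n - 1))])
y = beta_C * x1 + beta_L * (x - x1) + rng.normal(scale=0.5, size=(m, n))

# Regress Y_ij on [x_i1, x_ij - x_i1], as in model (2)
X = np.column_stack([np.repeat(x1.ravel(), n), (x - x1).ravel()])
est, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
# est[0] estimates beta_C, est[1] estimates beta_L
```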
(Figure: reading score versus age, cross-sectional data)
(Figure: response versus months under cross-sectional sampling, longitudinal sampling, and both combined)
(Figures: age change in FEV1 by male/female, and by number of F508 mutations, 0/1/2)
EDA Summary
Observations:
- Systematic trends: time, gender, F508
- Random variation: individual, observation
Questions:
- Two time scales?
- Estimation / testing for rates of decline?
- Other?
Longitudinal Data Analysis
INTRODUCTION to CORRELATION and WEIGHTING
Longitudinal Data
The basic statistical problem is that variables from a given individual are correlated over time.
(generic) Q: So what?
(-) ignoring dependence can lead to invalid inference
(-) often limited information regarding dependence
(+) can observe change for individuals over time
(+) variety of statistical approaches available
Longitudinal Data
We need to account for the dependence.
(generic) Q: How?
1. Choice of Model
2. Choice of Estimator
3. Choice of Summaries
Dependent Data and Proper Variance Estimates
Let X_ij = 0 denote placebo assignment and X_ij = 1 denote active treatment.
(1) Consider (Y_i1, Y_i2) with (X_i1, X_i2) = (0, 0) for i = 1:n and (X_i1, X_i2) = (1, 1) for i = (n+1):2n
   μ̂_0 = (1/2n) Σ_{i=1}^{n} Σ_{j=1}^{2} Y_ij
   μ̂_1 = (1/2n) Σ_{i=n+1}^{2n} Σ_{j=1}^{2} Y_ij
   var(μ̂_1 - μ̂_0) = (1/n) {σ² (1 + ρ)}
Scenario 1:
subject      control (time 1, time 2)    treatment (time 1, time 2)
ID = 1       Y_{1,1}, Y_{1,2}
ID = 2       Y_{2,1}, Y_{2,2}
ID = 3       Y_{3,1}, Y_{3,2}
ID = 4                                   Y_{4,1}, Y_{4,2}
ID = 5                                   Y_{5,1}, Y_{5,2}
ID = 6                                   Y_{6,1}, Y_{6,2}
Dependent Data and Proper Variance Estimates
(2) Consider (Y_i1, Y_i2) with (X_i1, X_i2) = (0, 1) for i = 1:n and (X_i1, X_i2) = (1, 0) for i = (n+1):2n
   μ̂_0 = (1/2n) { Σ_{i=1}^{n} Y_i1 + Σ_{i=n+1}^{2n} Y_i2 }
   μ̂_1 = (1/2n) { Σ_{i=1}^{n} Y_i2 + Σ_{i=n+1}^{2n} Y_i1 }
   var(μ̂_1 - μ̂_0) = (1/n) {σ² (1 - ρ)}
Scenario 2 (crossover):
subject      control              treatment
ID = 1       Y_{1,1} (time 1)     Y_{1,2} (time 2)
ID = 2       Y_{2,1} (time 1)     Y_{2,2} (time 2)
ID = 3       Y_{3,1} (time 1)     Y_{3,2} (time 2)
ID = 4       Y_{4,2} (time 2)     Y_{4,1} (time 1)
ID = 5       Y_{5,2} (time 2)     Y_{5,1} (time 1)
ID = 6       Y_{6,2} (time 2)     Y_{6,1} (time 1)
Dependent Data and Proper Variance Estimates
If we simply had 2n independent observations on treatment (X = 1) and 2n independent observations on control, then we would obtain:
   var(μ̂_1 - μ̂_0) = σ²/(2n) + σ²/(2n) = (1/n) σ²
Q: What is the impact of dependence relative to the situation where all (2n + 2n) observations are independent?
(1) Positive dependence, ρ > 0, results in a loss of precision (Scenario 1)
(2) Positive dependence, ρ > 0, results in an improvement in precision (Scenario 2)!
Therefore:
- Dependent data impacts proper statements of precision
- Dependent data may increase or decrease standard errors depending on the design
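A quick Monte Carlo sketch (our own construction, not from the slides) confirms the two variance formulas: with σ² = 1, ρ = 0.6, and n = 50, Scenario 1 gives var ≈ (1 + ρ)/n = 0.032 while the crossover of Scenario 2 gives var ≈ (1 - ρ)/n = 0.008:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, reps = 50, 0.6, 20000
cov = np.array([[1.0, rho], [rho, 1.0]])        # sigma^2 = 1

# Y has shape (reps, 2n, 2): 2n subjects, each with 2 correlated measurements
Y = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, 2 * n))

# Scenario 1: subjects 1..n are controls at both times, n+1..2n are treated
d1 = Y[:, n:, :].mean(axis=(1, 2)) - Y[:, :n, :].mean(axis=(1, 2))

# Scenario 2 (crossover): each subject contributes one control and one
# treatment measurement
mu0 = (Y[:, :n, 0].sum(axis=1) + Y[:, n:, 1].sum(axis=1)) / (2 * n)
mu1 = (Y[:, :n, 1].sum(axis=1) + Y[:, n:, 0].sum(axis=1)) / (2 * n)
d2 = mu1 - mu0

var1, var2 = d1.var(), d2.var()   # approximately 0.032 and 0.008
```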
Weighted Estimation
Consider the situation where subjects report both the number of attempts and the number of successes: (Y_i, N_i)
Examples:
- live born (Y_i) in a litter (N_i)
- condoms used (Y_i) in sexual encounters (N_i)
- SAEs (Y_i) among total surgeries (N_i)
Q: How do we combine these data from i = 1:m subjects to estimate a common rate (proportion) of successes?
Weighted Estimation
Proposal 1:
   p̂_1 = Σ_i Y_i / Σ_i N_i
Proposal 2:
   p̂_2 = (1/m) Σ_i (Y_i / N_i)
Simple Example:
   Data: (Y_1, N_1) = (1, 10), (Y_2, N_2) = (2, 10)
   p̂_1 = (1 + 2)/(20) = 3/20
   p̂_2 = (1/2){1/10 + 2/10} = 0.15
Weighted Estimation
Note: each of these estimators, p̂_1 and p̂_2, can be viewed as a weighted estimator of the form:
   p̂_w = { Σ_i w_i (Y_i / N_i) } / { Σ_i w_i }
We obtain p̂_1 by letting w_i = N_i, corresponding to equal weight given to each binary outcome Y_ij, where Y_i = Σ_{j=1}^{N_i} Y_ij.
We obtain p̂_2 by letting w_i = 1, corresponding to equal weight given to each subject.
Q: What's optimal?
Weighted Estimation
A: Whatever weights are closest to 1/variance of Y_i/N_i (statistical theory: Gauss-Markov).
If subjects are perfectly homogeneous, then
   V(Y_i) = N_i p(1 - p)
and p̂_1 is best.
If subjects are heterogeneous then, for example,
   V(Y_i) = N_i p(1 - p) {1 + (N_i - 1) ρ}
and an estimator closer to p̂_2 is best.
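The two proposals and the general weighted form are easy to compute; the data below are made up (m = 3 subjects with unequal N_i, so the proposals disagree), and ρ is an assumed working value:

```python
import numpy as np

Y = np.array([1.0, 30.0, 2.0])    # successes (hypothetical data)
N = np.array([10.0, 50.0, 10.0])  # attempts

p1 = Y.sum() / N.sum()            # w_i = N_i : equal weight per binary outcome
p2 = np.mean(Y / N)               # w_i = 1   : equal weight per subject

# Inverse-variance weights under the heterogeneous model, where
# V(Y_i/N_i) is proportional to {1 + (N_i - 1) rho} / N_i
rho = 0.3
w = N / (1.0 + (N - 1.0) * rho)
pw = np.sum(w * Y / N) / np.sum(w)   # lands between p2 and p1
```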
Summary
- Longitudinal (dependent) data are common (and interesting!)
- Inference must account for the dependence
- The choice of weighting will depend on the variance/covariance of the response variables
Summary
Methodological Issues in the Study of Cognitive Decline, Morris et al., AJE (1999):
"In modeling associations between risk factors and change in cognitive function, the usual analytic issues are complicated by the need to consider time in the analysis. The potential for both fruitful investigation and misinterpretation of the data is increased. Analyses that make the most effective use of the data usually require the most advanced methods of longitudinal analysis."
Summary
Methodological Issues in the Study of Cognitive Decline, Morris et al., AJE (1999):
"Use of the longitudinal designs and advanced statistical models described here is well worth the extra effort required. Meeting the special challenges of measuring change in cognitive function should improve our ability to identify risk factors for cognitive decline and other diseases related to aging. Many of these issues apply to any study of disease processes entailing change with age."
Some References: Books
- Diggle PJ, Heagerty PJ, Liang K-Y, Zeger SL (2002) Analysis of Longitudinal Data, Second Edition, Oxford University Press
- Fitzmaurice GM, Laird NM, Ware JH (2004) Applied Longitudinal Analysis, Wiley
- Verbeke G, Molenberghs G (2000) Linear Mixed Models for Longitudinal Data, Springer
- Singer JD, Willett JB (2003) Applied Longitudinal Data Analysis, Oxford University Press
Longitudinal Data Analysis
INTRODUCTION to REGRESSION APPROACHES
Linear Mixed Model
- Regression model: mean response as a function of covariates (systematic variation)
- Random effects: variation from subject to subject in trajectory (random between-subject variation)
- Within-subject variation: variation of individual observations over time (random within-subject variation)
Scientific Questions as Regression
Questions concerning the rate of decline refer to the time slope for FEV1:
   E[FEV1 | X = (age, gender, f508)] = β_0(X) + β_1(X) · time
Time Scales:
- Let age = age-at-entry, age_i1
- Let agel = time-since-entry, age_ij - age_i1
CF Regression Model
Model:
   E[FEV1 | X_i] = β_0 + β_1 age + β_2 agel
                 + β_3 female + β_4 [f508 = 1] + β_5 [f508 = 2]
                 + β_6 female · agel + β_7 [f508 = 1] · agel + β_8 [f508 = 2] · agel
               = β_0(X_i) + β_1(X_i) · agel
Intercept, β_0(X):
          f508 = 0              f508 = 1                    f508 = 2
male      β_0 + β_1 age         β_0 + β_1 age + β_4         β_0 + β_1 age + β_5
female    β_0 + β_1 age + β_3   β_0 + β_1 age + β_3 + β_4   β_0 + β_1 age + β_3 + β_5
Slope, β_1(X):
          f508 = 0      f508 = 1            f508 = 2
male      β_2           β_2 + β_7           β_2 + β_8
female    β_2 + β_6     β_2 + β_6 + β_7     β_2 + β_6 + β_8
(Figure: fitted FEV1 versus age for males and females, f508 = 0 group)
(Figure: fitted FEV1 versus age for f508 = 0, 1, 2, males)
Define:
   Y_ij = FEV1 for subject i at time t_ij
   X_i = (X_i1, ..., X_i,n_i)
   X_ij = (X_ij,1, X_ij,2, ..., X_ij,p)   e.g., age, agel, gender, genotype
Issue: response variables measured on the same subject are correlated:
   cov(Y_ij, Y_ik) ≠ 0
Some Notation
It is useful to have notation for the stack of data that corresponds to each subject.
Let n_i denote the number of observations for subject i. Define the column vector:
   Y_i = (Y_i1, Y_i2, ..., Y_i,n_i)'
If the subjects are observed at a common set of times t_1, t_2, ..., t_m, then E(Y_ij) = μ_j denotes the mean of the population at time t_j.
Dependence and Correlation
Recall that observations are termed independent when deviation in one variable does not predict deviation in the other variable.
- Given two subjects with the same age and gender, the blood pressure of one patient is not predictive of the blood pressure of the other.
Observations are called dependent or correlated when one variable does predict the value of another variable.
- The LDL cholesterol of a given patient at age 57 is predictive of that same patient's LDL cholesterol at age 60.
Dependence and Correlation
Recall: the variance of a variable Y_ij (fix time t_j for now) is defined as:
   σ_j² = E[(Y_ij - μ_j)²] = E[(Y_ij - μ_j)(Y_ij - μ_j)]
The variance measures the average squared distance of an observation from the mean.
Dependence and Correlation
Define: the covariance of two variables Y_ij and Y_ik (fix t_j and t_k) is:
   σ_jk = E[(Y_ij - μ_j)(Y_ik - μ_k)]
The covariance measures whether, on average, departures in one variable, Y_ij - μ_j, go together with departures in a second variable, Y_ik - μ_k.
In simple linear regression of Y_ij on Y_ik, the regression coefficient β_1 in E(Y_ij | Y_ik) = β_0 + β_1 Y_ik is the covariance divided by the variance of Y_ik:
   β_1 = σ_jk / σ_k²
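A quick numerical check (our own) that the least-squares slope equals σ_jk / σ_k²:

```python
import numpy as np

rng = np.random.default_rng(2)
yk = rng.normal(size=100_000)
yj = 0.6 * yk + rng.normal(scale=0.8, size=100_000)   # true slope 0.6

# Slope via covariance / variance (the ddof convention cancels in the ratio)
slope_cov = np.cov(yj, yk)[0, 1] / np.var(yk, ddof=1)

# Slope via ordinary least squares with an intercept
X = np.column_stack([np.ones_like(yk), yk])
slope_ols = np.linalg.lstsq(X, yj, rcond=None)[0][1]
```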
Dependence and Correlation
Define: the correlation of two variables Y_ij and Y_ik (fix t_j and t_k) is:
   ρ_jk = E[(Y_ij - μ_j)(Y_ik - μ_k)] / (σ_j σ_k)
The correlation is a measure of dependence that takes values between -1 and +1.
Recall that a correlation of 0 implies that the two measures are unrelated (linearly).
Recall that a correlation of 1 implies that the two measures fall perfectly on a line: one exactly predicts the other!
Why interest in covariance and/or correlation?
Recall from the dependent-data scenarios above that the standard error for the sample mean difference μ̂_1 - μ̂_0 depends on ρ.
In general, a statistical model for the outcomes Y_i = (Y_i1, Y_i2, ..., Y_i,n_i) requires the following:
- Means: μ_j
- Variances: σ_j²
- Covariances: σ_jk, or correlations ρ_jk
Therefore, one approach to making inferences based on longitudinal data is to construct a model for each of these three components.
Something new to model:
   cov(Y_i) =
   [ var(Y_i1)            cov(Y_i1, Y_i2)      ...   cov(Y_i1, Y_i,n_i) ]
   [ cov(Y_i2, Y_i1)      var(Y_i2)            ...   cov(Y_i2, Y_i,n_i) ]
   [ ...                  ...                  ...   ...                ]
   [ cov(Y_i,n_i, Y_i1)   cov(Y_i,n_i, Y_i2)   ...   var(Y_i,n_i)       ]
   =
   [ σ_1²                     σ_1 σ_2 ρ_12           ...   σ_1 σ_{n_i} ρ_{1,n_i} ]
   [ σ_2 σ_1 ρ_21             σ_2²                   ...   σ_2 σ_{n_i} ρ_{2,n_i} ]
   [ ...                      ...                    ...   ...                   ]
   [ σ_{n_i} σ_1 ρ_{n_i,1}    σ_{n_i} σ_2 ρ_{n_i,2}  ...   σ_{n_i}²              ]
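In matrix form this is Σ = S R S, with S = diag(σ_1, ..., σ_{n_i}) and R the correlation matrix; a small sketch with assumed values:

```python
import numpy as np

sigma = np.array([2.0, 3.0, 4.0])       # sigma_j (assumed values)
R = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.8],
              [0.6, 0.8, 1.0]])         # rho_jk (assumed values)

S = np.diag(sigma)
Sigma = S @ R @ S                       # cov(Y_i): entry (j, k) is sigma_j sigma_k rho_jk
```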
Mean and Covariance Models for FEV1
Models:
   E(Y_ij | X_i) = μ_ij   (regression)
   cov(Y_i | X_i) = Σ_i = Z_i D Z_i' + R_i
                    (between subjects + within subjects)
Q: What are appropriate covariance models for the FEV1 data?
Wide Data
(Data array of residuals, rmat, in wide format: one row per subject; columns age8, age10, age12, age14, age16, age18, age20, age22, age24; NA where a subject was not observed at that age)
(Figure: pairwise scatterplots of residuals between the age categories age8 through age24)
(Tables: empirical covariance and correlation matrices of the residuals across the nine age categories; correlations are high and tend to decrease as the time separation between measurements increases)
(Table: number of observation pairs contributing to each entry of the empirical covariance matrix)
How to build models for correlation?
1. Mixed models: random effects; within-subject similarity is due to sharing a trajectory
2. Serial correlation: close in time implies strong similarity; correlation decreases as time separation increases
Linear Mixed Model
- Regression model: mean response as a function of covariates (systematic variation)
- Random effects: variation from subject to subject in trajectory (random between-subject variation)
- Within-subject variation: variation of individual observations over time (random within-subject variation)
(Figure: response trajectories over months for two subjects)
Levels of Analysis
We first consider the distribution of measurements within subjects:
   Y_ij = β_{0,i} + β_{1,i} t_ij + e_ij,   e_ij ~ N(0, σ²)
   E[Y_ij | X_i, β_i] = β_{0,i} + β_{1,i} t_ij = [1, t_ij] (β_{0,i}, β_{1,i})' = X_ij β_i
Levels of Analysis
We can equivalently separate the subject-specific regression coefficients into the average coefficient and the specific departure for subject i:
   β_{0,i} = β_0 + b_{0,i}
   β_{1,i} = β_1 + b_{1,i}
This allows another perspective:
   Y_ij = β_{0,i} + β_{1,i} t_ij + e_ij
        = (β_0 + β_1 t_ij) + (b_{0,i} + b_{1,i} t_ij) + e_ij
   E[Y_i | X_i, β_i] = X_i β + X_i b_i
                       (mean model + between-subject)
(Figure: sample of subject-specific fitted lines, FEV1 versus age)
(Figure: scatterplot of slope deviations b_1 versus intercept deviations b_0)
Levels of Analysis
Next we consider the distribution of patterns (parameters) among subjects:
   β_i ~ N(β, D),   or equivalently   b_i ~ N(0, D)
   Y_i = X_i β + X_i b_i + e_i
         (mean model + between-subject + within-subject)
(Figure: simulated trajectories under four models: fixed intercept & fixed slope; random intercept & fixed slope; fixed intercept & random slope; random intercept & random slope)
Between-subject Variation
We can use the idea of random effects to allow different types of between-subject heterogeneity.
The magnitude of heterogeneity is characterized by D:
   b_i = (b_{0,i}, b_{1,i})'
   var(b_i) = D = [ D_11  D_12 ]
                  [ D_21  D_22 ]
Between-subject Variation
The components of D can be interpreted as:
- D_11: the typical subject-to-subject variation in the overall level of the response
- D_22: the typical subject-to-subject variation in the change (time slope) of the response
- D_12: the covariance between individual intercepts and slopes
  - If positive, then subjects with high levels also have high rates of change
  - If negative, then subjects with high levels have low rates of change
Between-subject Variation: Examples
No random effects:
   Y_ij = β_0 + β_1 t_ij + e_ij = [1, t_ij] β + e_ij
Random intercepts:
   Y_ij = (β_0 + β_1 t_ij) + b_{0,i} + e_ij = [1, t_ij] β + [1] b_{0,i} + e_ij
Random intercepts and slopes:
   Y_ij = (β_0 + β_1 t_ij) + b_{0,i} + b_{1,i} t_ij + e_ij = [1, t_ij] β + [1, t_ij] b_i + e_ij
Mixed Models and Covariances/Correlation
Q: What is the correlation between outcomes Y_ij and Y_ik under these random effects models?
Random Intercept Model:
   Y_ij = β_0 + β_1 t_ij + b_{0,i} + e_ij
   Y_ik = β_0 + β_1 t_ik + b_{0,i} + e_ik
   var(Y_ij) = var(b_{0,i}) + var(e_ij) = D_11 + σ²
   cov(Y_ij, Y_ik) = cov(b_{0,i} + e_ij, b_{0,i} + e_ik) = D_11
Mixed Models and Covariances/Correlation
Random Intercept Model:
   corr(Y_ij, Y_ik) = D_11 / [ sqrt(D_11 + σ²) sqrt(D_11 + σ²) ]
                    = D_11 / (D_11 + σ²)
                    = between var / (between var + within var)
Therefore, any two outcomes have the same correlation; it doesn't depend on the specific times, nor on the distance between the measurements.
This is the exchangeable correlation model.
Assuming: var(e_ij) = σ², and cov(e_ij, e_ik) = 0.
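A simulation sketch (our own, with assumed values D_11 = 4 and σ² = 1, so the implied correlation is 0.8) verifies the exchangeable form empirically:

```python
import numpy as np

rng = np.random.default_rng(3)
m, D11, sigma2 = 200_000, 4.0, 1.0

b0 = rng.normal(scale=np.sqrt(D11), size=m)            # shared intercepts b_{0,i}
Y = b0[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(m, 2))

rho_hat = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]          # empirical correlation
rho_theory = D11 / (D11 + sigma2)                      # 4 / 5 = 0.8
```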
Mixed Models and Covariances/Correlation
Random Intercept and Slope Model:
   Y_ij = (β_0 + β_1 t_ij) + (b_{0,i} + b_{1,i} t_ij) + e_ij
   Y_ik = (β_0 + β_1 t_ik) + (b_{0,i} + b_{1,i} t_ik) + e_ik
   var(Y_ij) = var(b_{0,i} + b_{1,i} t_ij) + var(e_ij) = D_11 + 2 D_12 t_ij + D_22 t_ij² + σ²
   cov(Y_ij, Y_ik) = cov[(b_{0,i} + b_{1,i} t_ij + e_ij), (b_{0,i} + b_{1,i} t_ik + e_ik)]
                   = D_11 + D_12 (t_ij + t_ik) + D_22 t_ij t_ik
Mixed Models and Covariances/Correlation
Random Intercept and Slope Model:
   ρ_ijk = corr(Y_ij, Y_ik)
         = [ D_11 + D_12 (t_ij + t_ik) + D_22 t_ij t_ik ] /
           sqrt{ (D_11 + 2 D_12 t_ij + D_22 t_ij² + σ²) (D_11 + 2 D_12 t_ik + D_22 t_ik² + σ²) }
Therefore, two outcomes may not have the same correlation; the correlation depends on the specific times of the observations and does not have a simple form.
Assuming: var(e_ij) = σ², and cov(e_ij, e_ik) = 0.
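Equivalently, cov(Y_i) = Z_i D Z_i' + σ² I with random-effects design Z_i = [1, t_ij]; standardizing this matrix reproduces the correlation formula. A sketch with assumed values for D, σ², and the times:

```python
import numpy as np

D = np.array([[2.0, 0.3],
              [0.3, 0.5]])              # [[D11, D12], [D21, D22]] (assumed)
sigma2 = 1.0                            # within-subject variance (assumed)
t = np.array([0.0, 1.0, 2.0, 3.0])      # observation times (assumed)
Z = np.column_stack([np.ones_like(t), t])

Sigma = Z @ D @ Z.T + sigma2 * np.eye(len(t))
sd = np.sqrt(np.diag(Sigma))
Corr = Sigma / np.outer(sd, sd)

def rho(tj, tk):
    # The correlation formula for the random intercept and slope model
    num = D[0, 0] + D[0, 1] * (tj + tk) + D[1, 1] * tj * tk
    vj = D[0, 0] + 2 * D[0, 1] * tj + D[1, 1] * tj ** 2 + sigma2
    vk = D[0, 0] + 2 * D[0, 1] * tk + D[1, 1] * tk ** 2 + sigma2
    return num / np.sqrt(vj * vk)
```

Note that, unlike the exchangeable model, the correlation is not constant across pairs of times.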
Linear Mixed Model
- Regression model: mean response as a function of covariates (systematic variation)
- Random effects: variation from subject to subject in trajectory (random between-subject variation)
- Within-subject variation: variation of individual observations over time (random within-subject variation)
Covariance Models: Serial Models
Linear mixed models assume that each subject follows his/her own line.
In some situations the dependence is more local, meaning that observations close in time are more similar than those far apart in time.
Covariance Models
Define:
   e_i1 ~ N(0, σ²)
   e_ij = ρ e_{i,j-1} + ε_ij,   ε_ij ~ N(0, σ² (1 - ρ²)),   j > 1
This leads to autocorrelated errors:
   cov(e_ij, e_ik) = σ² ρ^{|j - k|}
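A simulation sketch of this AR(1) construction (our own, with ρ = 0.5 assumed) shows the lag-k correlation falling off as ρ^k:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, sigma, rho = 100_000, 6, 1.0, 0.5

e = np.empty((m, n))
e[:, 0] = rng.normal(scale=sigma, size=m)          # e_i1 ~ N(0, sigma^2)
for j in range(1, n):
    innov = rng.normal(scale=sigma * np.sqrt(1 - rho ** 2), size=m)
    e[:, j] = rho * e[:, j - 1] + innov            # e_ij = rho e_{i,j-1} + eps_ij

lag1 = np.corrcoef(e[:, 0], e[:, 1])[0, 1]         # approximately rho
lag3 = np.corrcoef(e[:, 0], e[:, 3])[0, 1]         # approximately rho**3
```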
(Figures: simulated error series over months for 20 subjects with ρ = 0.5, and for 20 subjects with ρ = 0.9)
(Figure: two subjects' responses over months, showing each subject's own line, the average line, and one subject's dropout)
Linear Mixed Model
- Regression model: mean response as a function of covariates (systematic variation)
- Random effects: variation from subject to subject in trajectory (random between-subject variation)
- Within-subject variation: variation of individual observations over time (random within-subject variation)