A short course in Longitudinal Data Analysis ESRC Research Methods and Short Course Material for Practicals with the joiner package. Lab 2 - June, 2008 1 jointdata objects To analyse longitudinal data using the joiner package, data sets have to be an object of type jointdata. The data set formatted as one row per observation can be transformed to an object of this type, > schiz <- as.jointdata(schiz, indv.col = 1, time.col = 2, Y.col = 3, + covariates.col = c(4, 5), survival.col = c(6, 7)) > names(schiz) [1] "longitudinal" "covariates" "survival" > schiz$longitudinal[1:6, ] indv Y time treat n.obs 1 1 91 0 1 1 2 2 72 0 2 1 3 3 108 0 1 2 4 3 110 1 1 2 5 4 106 0 1 2 6 4 93 1 1 2 > schiz$covariates[1:6, ] indv treat n.obs 1 1 1 1 2 2 2 1 3 3 1 2 4 4 1 2 5 5 1 2 6 6 1 2 > schiz$survival[1:6, ] 1
indv surv.time cens.ind 1 1 0.704 0 2 2 0.740 0 3 3 1.121 1 4 4 1.224 1 5 5 1.303 0 6 6 1.541 1 An object of type jointdata is a list, where the first element is a data frame containing longitudinal data, and if they exist, the second and third elements are data frames containing covariates and survival data, respectively. To test if the data set is now an object of type jointdata, use the function is.jointdata. > is.jointdata(schiz) [1] TRUE A summary of the data set is obtained with the command, > summary(schiz) n.indv: 150 times: 0 1 2 4 6 8 responses: Y ; covariates: treat n.obs ; failures: 63 2 Exploring Longitudinal Data The exploration of longitudinal data can be separated into the exploration of the mean and correlation structure, respectively. We will cover tabular and graphical summaries of the data. 2.1 Mean response profiles We start by examining the mean response profiles within each treatment group. Subsets of a jointdata object can be extracted using the function subset(object,subset). For example, to extract the subset of schizophrenia data for individuals under treatment 1, > schiz.subset1 <- subset(schiz, schiz$covariates$treat == 1) > names(schiz.subset1) [1] "longitudinal" "covariates" "survival" > schiz.subset1$longitudinal[1:6, ] indv Y time treat n.obs 1 1 91 0 1 1 2 3 108 0 1 2 3 3 110 1 1 2 4 4 106 0 1 2 5 4 93 1 1 2 6 5 77 0 1 2 2
> schiz.subset1$covariates[1:6, ] indv treat n.obs 1 1 1 1 2 3 1 2 3 4 1 2 4 5 1 2 5 6 1 2 6 7 1 2 Exercise : Create subsets of data for patients under treatments 2 and 3, respectively. The function mean.longitudinal calculates the mean repsonse at each time point. The input must be the longitudinal element of a jointdata object. For unbalanced designs this function has a lowess smoothing option, whereas for balanced designs it calculates the mean response at each time point. > mean.treat1 <- mean.longitudinal(schiz.subset1$longitudinal, + time.col = 3, Y.col = 2) > mean.treat1 $x [1] 0 1 2 4 6 8 $y [1] 93.40000 87.87755 84.47727 83.92500 75.67857 73.72000 Exercise : Calculate the longitudinal means for the subsets under treatments 2 and 3, respectively. The number of drop-outs at the first time point in treatment group 1 can be obtained by counting the number of individuals with only a single observation. To find the number of drop-outs at the second time point in treatment group 1, we count the number of individuals with two observations, etc, etc... + 1]) [1] 1 + 2]) [1] 5 + 3]) [1] 4 + 4]) 3
[1] 12 + 5]) [1] 3 Graphical representations of the longitudinal response profile of a jointdata object are obtained with the function plot. The commands lines and points can be used to superimpose curves onto the original plot. To plot the longitudinal profiles of all individuals in the data set, > plot(schiz, col = "grey", type = "l") longitudinal profile Response 40 60 80 100 120 140 160 0 2 4 6 8 Time Figure 1: Longitudinal profiles for all subjects across all treatment groups with superimposed mean response profiles for each treatment group The commands > lines(mean.treat1, col = "green") > lines(mean.treat2, col = "red") > lines(mean.treat3, col = "black",lty=2) superimpose the mean response profiles for each treatment group. In the plot displaying all longitudinal profiles it is difficult to discern between individuals and between treatment groups. Clarity is gained by constructing a dot plot of all responses from a particular treatment group, superimposing a random selection of individual longitudinal profiles, or a mean longitudinal 4
profile for the specific treatment. For treatment group 1 consider the profiles of a random sample of size 8 > plot(schiz.subset1, type = "p", col = "grey", pch = 20, main = "treatment 1", + ylim = c(30, 180)) > lines(sample(schiz.subset1, size = 8)) treatment 1 Response 50 100 150 0 2 4 6 8 Time Figure 2: Dot plot for all subjects in treatment group 1 with a random sample of 8 individual longitudinal profiles superimposed Exercise : Construct the same plot for treatment groups 2 and 3. 2.2 Exploring correlation structure For balanced designs the correlation structure can be explored by constructing a matrix of correlations for all pairs of time points. The functions cov.longitudinal and cor.longitudinal, for covariance and correlation, respectively, can be used to achieve this end. The input for these functions must be a longitudinal object with one row per observation. > table.cor.treat1 <- cor.longitudinal(schiz.subset1$longitudinal, + indv.col = 1, time.col = 3, Y.col = 2, na.rm=true) > names(table.cor.treat1) [1] "variance" "correlation" > round(table.cor.treat1$variance, 1) [1] 403.4 406.8 476.0 488.2 471.9 427.7 5
> round(table.cor.treat1$correlation, 3) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 1.000 0.645 0.546 0.559 0.453 0.400 [2,] 0.645 1.000 0.747 0.705 0.744 0.661 [3,] 0.546 0.747 1.000 0.834 0.859 0.753 [4,] 0.559 0.705 0.834 1.000 0.911 0.838 [5,] 0.453 0.744 0.859 0.911 1.000 0.924 [6,] 0.400 0.661 0.753 0.838 0.924 1.000 > pairs(schiz, pch = 19, cex = 0.5) A matrix of scatterplots, that is scatterplots of responses across all pairs of time points, displays how the association between longitudinal responses changes with time. Notice that the number of points in the scatterplot decreases with time, owing to dropout during the study. 60 100 140 40 80 120 40 80 120 60 100 140 Y.t0 Y.t1 60 100 140 Y.t2 40 80 120 40 80 120 Y.t4 Y.t6 40 80 120 40 80 120 Y.t8 60 100 140 40 80 120 40 80 120 Figure 3: matrix of scatterplots Exercise : Construct correlation matrices under treatments 2 and 3. For unbalanced studies, the variogram is used to explore correlation structure. Clearly, for balanced designs, the variogram is only defined for a small number of time lags. > vargm <- variogram(indv = schiz$longitudinal$indv, time = schiz$longitudinal$time, + Y = schiz$longitudinal$y) > names(vargm) 6
[1] "svar" "sigma2" > plot(vargm, ylim = c(0, 700)) Variogram 0 100 200 300 400 500 600 700 1 2 3 4 5 6 7 8 u Figure 4: variogram 3 The nlme library in R Load the package nlme. > library(nlme) We can fit a general linear mixed effects model using the lme function. With reference to the individual response profiles, we wish to fit a random intercept model to the data. > model1 <- lme(y ~ 1 + time, data = schiz$longitudinal, random = ~1 + indv, method = "ML") > model1 Linear mixed-effects model fit by maximum likelihood Data: schiz$longitudinal Log-likelihood: -2899.911 Fixed: Y ~ 1 + time (Intercept) time 89.341043-1.147556 7
Random effects: Formula: ~1 indv (Intercept) Residual StdDev: 16.93857 13.29633 Number of Observations: 685 Number of Groups: 150 The output parameters provided by nlme are not equivalent to those under the standard general linear model. To obtain the standard parameterisation the transformations are ν = intercept σ 2 = (residual) 2 (1 nugget) φ = range τ 2 = (residual) 2 nugget To fit a random slope model, that is a model specifying both a random intercept and random slope, > model2 <- lme(y ~ 1 + time, data = schiz$longitudinal, random = ~time + indv, method = "ML") Notice we set the random argument within lme() as time. 4 Heart data The Heart data set contains longitudinal responses with accompanying covariates and survival data for 256 patients. The data set includes three different longitudinal response variables for each patient - the logarithm of Gradient (pressure difference across valve), the logarithm of LVMI (left ventricular muscle mass) and EF (amount of blood expressed per beat). The data is unbalanced both with respect to the times at which the responses as recorded, and with respect to the number of observations per patient. There are between 1 and 10 longitudinal measurements per patient under each of the three response variables. Load the Heart data into your R workspace > data(heart) Owing to the data being unbalanced, it is already formatted as one-row-perobservation so there is no need to apply the command format.per.observation. > names(heart) 8
Entering?heart in the R console provides a description of the information stored in each column of the data frame. The longitudinal responses are stored in columns 4, 6 and 7, survival data ( fuyrs and status ) in columns 25 and 26, with covariates stored in the remaining columns. Convert the Heart data to an object of type jointdata. > heart <- as.jointdata(heart, indv.col=1, time.col=2,y.col=c(4,6,7), covariates.col=c(8,9,10,11,12,13,14,15,16,17,18,19,20,21,22), survival.col=c(25,26)) Fit a random intercept model to the data when the response variable is the logarithm of Gradient. > model.grad <- lme(y.log.grad ~ 1 + time + sex + age + lvh + prenyha + redo + size + con.cabg + creat + dm + acei + lv + emergenc + hc + sten.reg.mix, data=heart$longitudinal, random = ~1 indv, method = REML,na.action=na.omit) Exercise : Fit a random intercept model when the response variable is (a) Y.log.lvmi, and (b) Y.ef. Assign the names model.lvmi and model.ef, respectively, to these fits. REMEMBER TO SAVE YOUR WORKSPACE!! 9