A short course in Longitudinal Data Analysis ESRC Research Methods and Short Course Material for Practicals with the joiner package.




Lab 2 - June, 2008

1 jointdata objects

To analyse longitudinal data using the joiner package, a data set has to be an object of type jointdata. A data set formatted as one row per observation can be transformed to an object of this type with as.jointdata:

    > schiz <- as.jointdata(schiz, indv.col = 1, time.col = 2, Y.col = 3,
    +     covariates.col = c(4, 5), survival.col = c(6, 7))
    > names(schiz)
    [1] "longitudinal" "covariates"   "survival"
    > schiz$longitudinal[1:6, ]
      indv   Y time treat n.obs
    1    1  91    0     1     1
    2    2  72    0     2     1
    3    3 108    0     1     2
    4    3 110    1     1     2
    5    4 106    0     1     2
    6    4  93    1     1     2
    > schiz$covariates[1:6, ]
      indv treat n.obs
    1    1     1     1
    2    2     2     1
    3    3     1     2
    4    4     1     2
    5    5     1     2
    6    6     1     2
    > schiz$survival[1:6, ]
      indv surv.time cens.ind
    1    1     0.704        0
    2    2     0.740        0
    3    3     1.121        1
    4    4     1.224        1
    5    5     1.303        0
    6    6     1.541        1

An object of type jointdata is a list. The first element is a data frame containing the longitudinal data and, if they exist, the second and third elements are data frames containing the covariates and the survival data, respectively. To test whether the data set is now an object of type jointdata, use the function is.jointdata.

    > is.jointdata(schiz)
    [1] TRUE

A summary of the data set is obtained with the command

    > summary(schiz)
    n.indv: 150
    times: 0 1 2 4 6 8
    responses: Y ; covariates: treat n.obs ; failures: 63

2 Exploring Longitudinal Data

The exploration of longitudinal data can be separated into the exploration of the mean structure and of the correlation structure. We will cover tabular and graphical summaries of the data.

2.1 Mean response profiles

We start by examining the mean response profiles within each treatment group. Subsets of a jointdata object can be extracted using the function subset(object, subset). For example, to extract the subset of the schizophrenia data for individuals under treatment 1:

    > schiz.subset1 <- subset(schiz, schiz$covariates$treat == 1)
    > names(schiz.subset1)
    [1] "longitudinal" "covariates"   "survival"
    > schiz.subset1$longitudinal[1:6, ]
      indv   Y time treat n.obs
    1    1  91    0     1     1
    2    3 108    0     1     2
    3    3 110    1     1     2
    4    4 106    0     1     2
    5    4  93    1     1     2
    6    5  77    0     1     2
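As an aside, the list layout and the subsetting step shown above can be mimicked in base R, which may help to see what a jointdata object holds. This is only a minimal sketch and not the joiner implementation: make_joint_list and subset_joint_list are hypothetical helper names, and the toy values are the first rows of the schizophrenia data printed above.

```r
# Toy long-format data: first six longitudinal rows printed above
long <- data.frame(
  indv  = c(1, 2, 3, 3, 4, 4),
  Y     = c(91, 72, 108, 110, 106, 93),
  time  = c(0, 0, 0, 1, 0, 1),
  treat = c(1, 2, 1, 1, 1, 1)
)
# Survival rows for the same four individuals
surv <- data.frame(
  indv      = 1:4,
  surv.time = c(0.704, 0.740, 1.121, 1.224),
  cens.ind  = c(0, 0, 1, 1)
)

# Hypothetical helper: build a list with the same three elements
make_joint_list <- function(long, surv) {
  # number of longitudinal observations per individual, repeated on each row
  long$n.obs <- ave(long$Y, long$indv, FUN = length)
  # one row per individual for the covariates
  covariates <- unique(long[, c("indv", "treat", "n.obs")])
  rownames(covariates) <- NULL
  list(longitudinal = long, covariates = covariates, survival = surv)
}

# Hypothetical helper: keep only individuals whose covariate row satisfies 'keep'
subset_joint_list <- function(x, keep) {
  ids <- x$covariates$indv[keep]
  lapply(x, function(df) df[df$indv %in% ids, , drop = FALSE])
}

toy <- make_joint_list(long, surv)
toy.treat1 <- subset_joint_list(toy, toy$covariates$treat == 1)
names(toy.treat1)
toy.treat1$longitudinal
```

The condition is evaluated on the covariates data frame (one row per individual), and the resulting set of identifiers is then applied to all three data frames, so the elements stay consistent with one another.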

    > schiz.subset1$covariates[1:6, ]
      indv treat n.obs
    1    1     1     1
    2    3     1     2
    3    4     1     2
    4    5     1     2
    5    6     1     2
    6    7     1     2

Exercise: Create subsets of data for patients under treatments 2 and 3, respectively.

The function mean.longitudinal calculates the mean response at each time point. The input must be the longitudinal element of a jointdata object. For unbalanced designs this function has a lowess smoothing option, whereas for balanced designs it calculates the mean response at each time point.

    > mean.treat1 <- mean.longitudinal(schiz.subset1$longitudinal,
    +     time.col = 3, Y.col = 2)
    > mean.treat1
    $x
    [1] 0 1 2 4 6 8

    $y
    [1] 93.40000 87.87755 84.47727 83.92500 75.67857 73.72000

Exercise: Calculate the longitudinal means for the subsets under treatments 2 and 3, respectively.

The number of drop-outs at the first time point in treatment group 1 can be obtained by counting the number of individuals with only a single observation. To find the number of drop-outs at the second time point in treatment group 1, we count the number of individuals with two observations, and so on.

    + 1])
    [1] 1
    + 2])
    [1] 5
    + 3])
    [1] 4
    + 4])
    [1] 12
    + 5])
    [1] 3

Graphical representations of the longitudinal response profiles of a jointdata object are obtained with the function plot. The commands lines and points can be used to superimpose curves onto the original plot. To plot the longitudinal profiles of all individuals in the data set:

    > plot(schiz, col = "grey", type = "l")

Figure 1: Longitudinal profiles for all subjects across all treatment groups with superimposed mean response profiles for each treatment group (axes: Time, Response)

The commands

    > lines(mean.treat1, col = "green")
    > lines(mean.treat2, col = "red")
    > lines(mean.treat3, col = "black", lty = 2)

superimpose the mean response profiles for each treatment group.

In the plot displaying all longitudinal profiles it is difficult to distinguish between individuals and between treatment groups. Clarity is gained by constructing a dot plot of all responses from a particular treatment group and superimposing either a random selection of individual longitudinal profiles or the mean longitudinal profile for that treatment. For treatment group 1, consider the profiles of a random sample of size 8:

    > plot(schiz.subset1, type = "p", col = "grey", pch = 20, main = "treatment 1",
    +     ylim = c(30, 180))
    > lines(sample(schiz.subset1, size = 8))

Figure 2: Dot plot for all subjects in treatment group 1 with a random sample of 8 individual longitudinal profiles superimposed (axes: Time, Response)

Exercise: Construct the same plot for treatment groups 2 and 3.

2.2 Exploring correlation structure

For balanced designs the correlation structure can be explored by constructing a matrix of correlations for all pairs of time points. The functions cov.longitudinal and cor.longitudinal, for covariance and correlation, respectively, can be used for this purpose. The input for these functions must be a longitudinal object with one row per observation.

    > table.cor.treat1 <- cor.longitudinal(schiz.subset1$longitudinal,
    +     indv.col = 1, time.col = 3, Y.col = 2, na.rm = TRUE)
    > names(table.cor.treat1)
    [1] "variance"    "correlation"
    > round(table.cor.treat1$variance, 1)
    [1] 403.4 406.8 476.0 488.2 471.9 427.7
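The summaries used in this section (mean response per time point, drop-out counts, and correlations between time points) can also be computed with base R alone. The sketch below is an illustration under stated assumptions, not the joiner code: the data are invented, the design is assumed balanced, and missing pairs are handled with pairwise deletion, which may differ from what cor.longitudinal does with na.rm = TRUE.

```r
# Invented long-format data: 5 individuals, scheduled times 0, 1, 2
long <- data.frame(
  indv = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
  time = c(0, 1, 2, 0, 1, 0, 1, 2, 0, 0, 1, 2),
  Y    = c(90, 85, 80, 100, 96, 70, 72, 65, 110, 95, 90, 88)
)

# Mean response at each time point (balanced design, so no smoothing needed)
mean.by.time <- tapply(long$Y, long$time, mean)

# Observations per individual, then the number of individuals with
# exactly 1, 2, ... observations; those with fewer than 3 dropped out early
n.obs <- table(long$indv)
n.obs.counts <- table(as.vector(n.obs))

# Reshape to one row per individual and one column per time point,
# then correlate every pair of time points using the available pairs
wide <- reshape(long, idvar = "indv", timevar = "time", direction = "wide")
cor.mat <- cor(wide[, -1], use = "pairwise.complete.obs")
var.by.time <- apply(wide[, -1], 2, var, na.rm = TRUE)

mean.by.time
n.obs.counts
round(cor.mat, 3)
```

Because of drop-out, later columns of the wide data frame contain more missing values, so correlations involving later time points are based on fewer individuals.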

    > round(table.cor.treat1$correlation, 3)
          [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
    [1,] 1.000 0.645 0.546 0.559 0.453 0.400
    [2,] 0.645 1.000 0.747 0.705 0.744 0.661
    [3,] 0.546 0.747 1.000 0.834 0.859 0.753
    [4,] 0.559 0.705 0.834 1.000 0.911 0.838
    [5,] 0.453 0.744 0.859 0.911 1.000 0.924
    [6,] 0.400 0.661 0.753 0.838 0.924 1.000
    > pairs(schiz, pch = 19, cex = 0.5)

A matrix of scatterplots, that is, scatterplots of responses across all pairs of time points, displays how the association between longitudinal responses changes with time. Notice that the number of points in the scatterplots decreases with time, owing to dropout during the study.

Figure 3: matrix of scatterplots (panels Y.t0, Y.t1, Y.t2, Y.t4, Y.t6, Y.t8)

Exercise: Construct correlation matrices under treatments 2 and 3.

For unbalanced studies, the variogram is used to explore correlation structure. Clearly, for balanced designs, the variogram is only defined for a small number of time lags.

    > vargm <- variogram(indv = schiz$longitudinal$indv, time = schiz$longitudinal$time,
    +     Y = schiz$longitudinal$Y)
    > names(vargm)
    [1] "svar"   "sigma2"
    > plot(vargm, ylim = c(0, 700))

Figure 4: variogram (horizontal axis: u)

3 The nlme library in R

Load the package nlme.

    > library(nlme)

We can fit a general linear mixed effects model using the lme function. With reference to the individual response profiles, we wish to fit a random intercept model to the data.

    > model1 <- lme(Y ~ 1 + time, data = schiz$longitudinal, random = ~1 | indv,
    +     method = "ML")
    > model1
    Linear mixed-effects model fit by maximum likelihood
      Data: schiz$longitudinal
      Log-likelihood: -2899.911
      Fixed: Y ~ 1 + time
    (Intercept)        time
      89.341043   -1.147556

    Random effects:
     Formula: ~1 | indv
            (Intercept) Residual
    StdDev:    16.93857 13.29633

    Number of Observations: 685
    Number of Groups: 150

The output parameters provided by nlme are not equivalent to those under the standard general linear model. To obtain the standard parameterisation the transformations are

    ν  = intercept
    σ² = (residual)² × (1 − nugget)
    φ  = range
    τ² = (residual)² × nugget

To fit a random slope model, that is, a model specifying both a random intercept and a random slope:

    > model2 <- lme(Y ~ 1 + time, data = schiz$longitudinal, random = ~time | indv,
    +     method = "ML")

Notice that the random argument within lme() is now set to ~time | indv rather than ~1 | indv.

4 Heart data

The Heart data set contains longitudinal responses with accompanying covariates and survival data for 256 patients. The data set includes three different longitudinal response variables for each patient: the logarithm of Gradient (pressure difference across valve), the logarithm of LVMI (left ventricular muscle mass) and EF (amount of blood expressed per beat). The data are unbalanced both with respect to the times at which the responses are recorded and with respect to the number of observations per patient. There are between 1 and 10 longitudinal measurements per patient under each of the three response variables.

Load the Heart data into your R workspace.

    > data(heart)

Owing to the data being unbalanced, they are already formatted as one row per observation, so there is no need to apply the command format.per.observation.

    > names(heart)

Entering ?heart in the R console provides a description of the information stored in each column of the data frame. The longitudinal responses are stored in columns 4, 6 and 7, survival data (fuyrs and status) in columns 25 and 26, with covariates stored in the remaining columns.

Convert the Heart data to an object of type jointdata.

    > heart <- as.jointdata(heart, indv.col = 1, time.col = 2, Y.col = c(4, 6, 7),
    +     covariates.col = c(8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22),
    +     survival.col = c(25, 26))

Fit a random intercept model to the data when the response variable is the logarithm of Gradient.

    > model.grad <- lme(Y.log.grad ~ 1 + time + sex + age + lvh + prenyha + redo +
    +     size + con.cabg + creat + dm + acei + lv + emergenc + hc + sten.reg.mix,
    +     data = heart$longitudinal, random = ~1 | indv, method = "REML",
    +     na.action = na.omit)

Exercise: Fit a random intercept model when the response variable is (a) Y.log.lvmi, and (b) Y.ef. Assign the names model.lvmi and model.ef, respectively, to these fits.

REMEMBER TO SAVE YOUR WORKSPACE!!
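The lme calls used in this lab can be tried without the joiner data sets. The following is a minimal sketch that assumes only the nlme package and its built-in Orthodont data (growth measurements on 27 children), which is not part of this course material. It fits a random intercept model and a random intercept and slope model with the same form of the random argument as above, `~1 | group` and `~covariate | group`.

```r
library(nlme)

# Orthodont ships with nlme: 'distance' measured at ages 8, 10, 12, 14
# for each 'Subject'; used here purely as a stand-in data set
orth <- as.data.frame(Orthodont)

# Random intercept model: each subject has its own intercept
fit.int <- lme(distance ~ 1 + age, data = orth,
               random = ~ 1 | Subject, method = "ML")

# Random intercept and slope model: each subject also has its own age slope
fit.slope <- lme(distance ~ 1 + age, data = orth,
                 random = ~ age | Subject, method = "ML")

fixef(fit.int)              # fixed-effect estimates
VarCorr(fit.int)            # random-intercept and residual variances / std. devs
anova(fit.int, fit.slope)   # compare the two ML fits
```

Both models are fitted by maximum likelihood here so that their log-likelihoods are directly comparable. With method = "REML", as in the Heart example, such a comparison is only meaningful between models with the same fixed effects.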