Introduction to Longitudinal Data Analysis



Similar documents
Introduction to Data Analysis in Hierarchical Linear Models

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Qualitative vs Quantitative research & Multilevel methods

Εισαγωγή στην πολυεπίπεδη μοντελοποίηση δεδομένων με το HLM. Βασίλης Παυλόπουλος Τμήμα Ψυχολογίας, Πανεπιστήμιο Αθηνών

Assignments Analysis of Longitudinal data: a multilevel approach

HLM software has been one of the leading statistical packages for hierarchical

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations

Introducing the Multilevel Model for Change

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

10. Analysis of Longitudinal Studies Repeat-measures analysis

SAS Syntax and Output for Data Manipulation:

Overview of Methods for Analyzing Cluster-Correlated Data. Garrett M. Fitzmaurice

Illustration (and the use of HLM)

Moderation. Moderation

Linear Mixed-Effects Modeling in SPSS: An Introduction to the MIXED Procedure

4. Simple regression. QBUS6840 Predictive Analytics.

Consider a study in which. How many subjects? The importance of sample size calculations. An insignificant effect: two possibilities.

Simple Predictive Analytics Curtis Seare

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011

How to Get More Value from Your Survey Data

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

How To Understand Multivariate Models

The 3-Level HLM Model

Lecture 5 Three level variance component models

Chapter 15. Mixed Models Overview. A flexible approach to correlated data.

January 26, 2009 The Faculty Center for Teaching and Learning

The Latent Variable Growth Model In Practice. Individual Development Over Time

Department of Epidemiology and Public Health Miller School of Medicine University of Miami

Joseph Twagilimana, University of Louisville, Louisville, KY

Homework 8 Solutions

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

An introduction to hierarchical linear modeling

Chapter 5 Analysis of variance SPSS Analysis of variance

Multilevel Modeling Tutorial. Using SAS, Stata, HLM, R, SPSS, and Mplus

10. Comparing Means Using Repeated Measures ANOVA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

Introduction to Regression and Data Analysis

Introduction to Hierarchical Linear Modeling with R

Additional sources Compilation of sources:

Specifications for this HLM2 run

Chapter 7: Simple linear regression Learning Objectives

Multilevel Models for Longitudinal Data. Fiona Steele

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Better decision making under uncertain conditions using Monte Carlo Simulation

Using SAS Proc Mixed for the Analysis of Clustered Longitudinal Data

Binary Logistic Regression

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

College Readiness LINKING STUDY

Multivariate Normal Distribution

Statistical Rules of Thumb

16 : Demand Forecasting

(and sex and drugs and rock 'n' roll) ANDY FIELD

International Statistical Institute, 56th Session, 2007: Phil Everson

Handling attrition and non-response in longitudinal data

data visualization and regression

Regression Analysis: A Complete Example

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 22

Technical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE

Tips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD

SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

When to Use a Particular Statistical Test

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

861 Example SPLH. 5 page 1. prefer to have. New data in. SPSS Syntax FILE HANDLE. VARSTOCASESS /MAKE rt. COMPUTE mean=2. COMPUTE sal=2. END IF.

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Statistical Models in R

GLM I An Introduction to Generalized Linear Models

Use of deviance statistics for comparing models

Problem of Missing Data

Simple linear regression

17. SIMPLE LINEAR REGRESSION II

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

BIO 226: APPLIED LONGITUDINAL ANALYSIS COURSE SYLLABUS. Spring 2015

1 Theory: The General Linear Model

IBM SPSS Direct Marketing 19

RMTD 404 Introduction to Linear Models

Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from

MRes Psychological Research Methods

Using Mixture Latent Markov Models for Analyzing Change with Longitudinal Data

Including the Salesperson Effect in Purchasing Behavior Models Using PROC GLIMMIX

Using Excel for Statistics Tips and Warnings

Module 4 - Multiple Logistic Regression

The Basic Two-Level Regression Model

DISCRIMINANT FUNCTION ANALYSIS (DA)

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

Electronic Thesis and Dissertations UCLA

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Latent Growth Curve Analysis: A Gentle Introduction. Alan C. Acock* Department of Human Development and Family Sciences. Oregon State University

[This document contains corrections to a few typos that were found on the version available through the journal s web page]

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Transcription:

Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction to Longitudinal Data Analysis

Covered This Section Features of multilevel models The general class of models we will be using to investigate longitudinal data Features of longitudinal data Features of longitudinal models What can multilevel models do for you? Workshop overview Section 1: Introduction to Longitudinal Data Analysis 2

HIERARCHICAL/MULTILEVEL DATA STRUCTURES Section 1: Introduction to Longitudinal Data Analysis 3

Multilevel or Hierarchical Data Structures In the social and educational sciences there are many examples of Hierarchical Data I will use the terms Hierarchical and Multilevel synonymously Hierarchical is often used in educational sciences Multilevel is often used in the social sciences Natural question: What are Hierarchical Data? Two or more different dimensions of sampling units Unit are nested within the other units Later we will call units levels Section 1: Introduction to Longitudinal Data Analysis 4

Multilevel Data An educational example is when the two different units are classes and students For any given classroom We will have students Here we would say that students are nested within classroom. We could have variables at the classroom level and at the student level Section 1: Introduction to Longitudinal Data Analysis 5

Multilevel Data In this particular example we may be interested in Performance of each student as predicted by a set of variables such as gender, socio economic status (SES), But it also makes sense that different classroom characteristics can have an impact It could also be the case that students from the same classroom have observations that are more highly related to each other than students from different classrooms Literally: a non zero intraclass correlation In practice: these data have additional dependencies Section 1: Introduction to Longitudinal Data Analysis 6

Multilevel Data So we could have several different classes with different characteristics that impact the students differently Section 1: Introduction to Longitudinal Data Analysis 7

LONGITUDINAL DATA FROM A MULTILEVEL PERSPECTIVE Section 1: Introduction to Longitudinal Data Analysis 8

Longitudinal Data: A Different Type of Nesting We could have subject variables e.g., gender, job, And we could have variables about each time point e.g., energy level, stress level T1 T1 T3 T5 T2 T3 T4 T5 T1 T2 T4 T5 Section 1: Introduction to Longitudinal Data Analysis 9

Multilevel Data and Models Repeated measures models and longitudinal models are different types of multilevel models Trials/observations are nested with in subjects In this case the subjects may also be nested within classes We would have three levels of data Time (1), Subject (2), Class (3) Section 1: Introduction to Longitudinal Data Analysis 10

Data Requirements for Models Featured in Workshop We need multiple OUTCOMES from the same sampling unit 2 is the minimum (think pre/post tests), but just 2 can lead to problems: Only 1 kind of change is observable (1 difference) Can t distinguish real individual differences in change from error Repeated measures ANOVA (our baseline) is just fine for 2 observations Necessary assumption of sphericity is satisfied with only 2 observations even if compound symmetry doesn t hold More data is better (with diminishing returns) More occasions means we can get a better description of the form of change More persons means we can get better estimates of amount of individual differences in change; better prediction of those individual differences More items/stimuli means we have more power to show effects of differences between items/stimuli/conditions Section 1: Introduction to Longitudinal Data Analysis 11

What this Workshop is About Longitudinal data Same individual units of analysis measured at different occasions (which can range from milliseconds to decades) Repeated measures data Same individual units of analysis measured via different items, using different stimuli, or under different conditions Both of these fall under a more general category of multivariate data of varying complexity The link between them is the use of random effects to describe covariance of outcomes from the same unit Section 1: Introduction to Longitudinal Data Analysis 12

Types of Outcome Data Covered in this Workshop For our workshop, an outcome (dependent) variable: Has an interval scale A one unit difference means the same thing across all scale points Has scores with the same meaning over observations Includes meaning of construct Includes how items relate to the scale Implies measurement invariance We will note the following up front: FANCY STATISTICAL MODELS CANNOT SAVE BADLY MEASURED VARIABLES or BADLY DESIGNED LONGITUDINAL STUDIES Section 1: Introduction to Longitudinal Data Analysis 13

Levels of Analysis in Longitudinal Data Between Person (BP) Variation: Level 2 INTER individual Differences Time Invariant All longitudinal studies begin as cross sectional studies Within Person (WP) Variation: Level 1 INTRA individual Differences Time Varying Only longitudinal studies can provide this extra information Longitudinal studies allow examination of both types of relationships simultaneously (and their interactions) Any variable measured over time usually has both BP and WP variation BP = more/less than other people; WP = more/less than one s average Section 1: Introduction to Longitudinal Data Analysis 14

A Longitudinal Data Continuum Within Person Change: Systematic change Magnitude or direction of change can be different across individuals Growth curve models where time is meaningfully sampled Within Person Fluctuation: No systematic change Outcome just varies/fluctuates over time (e.g., emotion, stress) Time is just a way to get lots of data per individual Pure WP Change Pure WP Fluctuation Time Time Section 1: Introduction to Longitudinal Data Analysis 15

Options for Longitudinal Models Although models and software are logically separate, longitudinal data can be analyzed via multiple analytic frameworks: Multilevel/Mixed Models Dependency over time, persons, groups, etc. is modeled via random effects (multivariate through levels using stacked/long data) Builds on GLM, generalizes easier to additional levels of analysis Structural Equation Models Dependency over time only is modeled via latent variables (single level analysis using multivariate/wide data) Generalizes easier to broader analysis of latent constructs, mediation Because random effects and latent variables are the same thing, many longitudinal models can be specified/estimated either way And now Multilevel Structural Equation Models can do it all Section 1: Introduction to Longitudinal Data Analysis 16

WHAT CAN MLM DO FOR YOU? Section 1: Introduction to Longitudinal Data Analysis 17

What can MLM do for you? Model dependency across observations Longitudinal, clustered, and/or cross classified data? No problem! Tailor your model of sources of correlation to your data Include categorical or continuous predictors at any level Time varying, person level, group level predictors for each variance Explore reasons for dependency, don t just control for dependency Does not require same data structure for each person Unbalanced or missing data? No problem! You already know how (or you will soon)! Use SPSS Mixed, SAS Mixed, Stata, Mplus, R, HLM, MlwiN What s an intercept? What s a slope? What s a variance component? Section 1: Introduction to Longitudinal Data Analysis 18

1. Model Dependency Sources of dependency depend on the sources of variation created by your sampling design: residuals for outcomes from the same unit are likely to be related, which violates the general linear model independence assumption Levels for dependency = levels of random effects Sampling dimensions can be nested e.g., time within person, person within group, school within district If you can t figure out the direction of your nesting structure, odds are good you have a crossed sampling design instead e.g., persons crossed with items, raters crossed with targets To have a level, there must be random outcome variation due to sampling that remains after including the model s fixed effects e.g., treatment vs. control does not create another level of group Section 1: Introduction to Longitudinal Data Analysis 19

Longitudinal dependency comes from Mean differences across sampling units (e.g., persons) Creates constant dependency over time Will be represented by a random intercept in our models Individual differences in effects of predictors e.g., individual differences in growth Creates non constant dependency, the size of which depends on the value of the predictor at each occasion Will be represented by random slopes in our models Non constant within person correlation for unknown reasons (timedependent autocorrelation) Can add other patterns of correlation as needed for this Section 1: Introduction to Longitudinal Data Analysis 20

Why care about dependency? In other words, what happens if we have how we model dependencies wrong? Validity of the tests of the predictors depends on having the most right dependency model (right random effects in the model) Estimates will usually be ok as the come from fixed effects Standard errors (and thus p values) can be compromised The sources of variation that exist in your outcome will dictate what kinds of predictors will be useful Between Person variation needs Between Person predictors Within Person variation needs Within Person predictors Section 1: Introduction to Longitudinal Data Analysis 21

2. Include categorical or continuous predictors at any level of analysis ANOVA: test differences among discrete independent variable factor levels Between Groups: Gender, Intervention, Age Groups Within Subjects (Repeated Measures): Condition, Time Test main effects of continuous covariates (ANCOVA) Regression: test whether slopes relating predictors to outcomes are different from zero Persons measured once, differ categorically or continuously on a set of time invariant (person level) covariates What if a predictor is assessed repeatedly (time varying predictors) but can t be characterized by conditions? ANOVA or Regression won t work per se, so there is a need for MLM Section 1: Introduction to Longitudinal Data Analysis 22

2. Include categorical or continuous predictors at any level of analysis Some things don t change over measurements Sex, Ethnicity Time Invariant Predictor = Person Level Some things do change over measurements Health Status, Stress Levels, Living Arrangements Time Varying Predictor = Time Level Some predictors might be measured at higher levels Family SES, length of marriage, school size Interactions between levels may be included, too Does the effect of health status differ by gender and SES? Level: Time Person Family Section 1: Introduction to Longitudinal Data Analysis 23

3. Does not require same data structure per person (by accident or by design) RM ANOVA: uses multivariate (wide) data structure: ID Sex T1 T2 T3 T4 100 0 5 6 8 12 101 1 4 7. 11 People missing any data are excluded (data from ID 101 are not included at all) MLM: uses stacked (long) data structure: Only rows missing data are excluded 100 uses 4 cases 101 uses 3 cases ID Sex Time Y 100 0 1 5 100 0 2 6 100 0 3 8 100 0 4 12 101 1 1 4 101 1 2 7 101 1 3. 101 1 4 11 Time can also be unbalanced across people such that each person can have his or her own measurement schedule: Time 0.9 1.4 3.5 4.2 Section 1: Introduction to Longitudinal Data Analysis 24

4. You most likely already know how to use MLMs If you can do general linear models (ANOVA and Regression) then you can do MLM. How do you interpret an estimate for the intercept? the effect of a continuous variable? the effect of a categorical variable? a variance component (think residual variance)? Section 1: Introduction to Longitudinal Data Analysis 25

WORKSHOP OVERVIEW Section 1: Introduction to Longitudinal Data Analysis 26

Longitudinal Data Analysis: A 7 Step Program 1. Calculate nature of dependency (intraclass correlation or ICC) from an empty model 2. Decide on a metric of time to use 3. Decide on a centering point 4. Estimate a saturated means model and plot individual trajectories 5. If systematic change is present: evaluate fixed and random effects of time; otherwise evaluate alternative models for the variances 6. Consider possible alternative models for the residuals 7. (very seldom done) Evaluate remaining heterogeneity (not in workshop) Section 1: Introduction to Longitudinal Data Analysis 27

Computing Information We will use SAS PROC MIXED for all analyses in this workshop All examples have detailed syntax available online and in your course packet Section 1: Introduction to Longitudinal Data Analysis 28