Longitudinal Data Analysis
1 Longitudinal Data Analysis. Acknowledgment: Professor Garrett Fitzmaurice. INSTRUCTOR: Rino Bellocco, Department of Statistics & Quantitative Methods, University of Milano-Bicocca; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet
2 ORGANIZATION OF LECTURES Introduction and Overview Longitudinal Data: Basic Concepts Linear Models for Longitudinal Data Modelling the Mean: Analysis of Response Profiles Modelling the Mean: Parametric Curves Modelling the Covariance Linear Mixed Effects Models Extensions to Discrete Data: GEE
3 LONGITUDINAL DATA ANALYSIS LECTURE 1 3
4 INTRODUCTION Longitudinal Studies: Studies in which individuals are measured repeatedly through time. This course will cover the analysis and interpretation of results from longitudinal studies. Emphasis will be on model development, use of statistical software, and interpretation of results. Theoretical basis for results mentioned but not developed. No calculus or matrix algebra is assumed. 4
5 Features of Longitudinal Data Defining feature: repeated observations on individuals, allowing the direct study of change over time. Primary goal of a longitudinal study is to characterize the change in response and factors that influence change. With repeated measures on individuals, one can capture within-individual change. Note that measurements in a longitudinal study are commensurate, i.e., the same variable is measured repeatedly. 5
6 Longitudinal data require somewhat more sophisticated statistical techniques because the repeated observations are usually (positively) correlated. Sequential nature of the measures implies that certain types of correlation structures are likely to arise. Correlation must be accounted for in order to obtain valid inferences. 6
7 Example 1 Treatment of Lead-Exposed Children (TLC) Trial Exposure to lead during infancy is associated with substantial deficits in tests of cognitive ability Chelation treatment of children with high lead levels usually requires injections and hospitalization A new agent, Succimer, can be given orally Randomized trial examining changes in blood lead level during course of treatment 100 children randomized to placebo or Succimer Measures of blood lead level at baseline, 1, 4 and 6 weeks 7
8 Table 1: Blood lead levels (µg/dl) at baseline, week 1, week 4, and week 6 for 8 randomly selected children. Columns: ID, Group (P = Placebo; A = Succimer), Baseline, Week 1, Week 4, Week 6.
9 Table 2: Mean blood lead levels (and standard deviation) at baseline, week 1, week 4, and week 6. Standard deviations: Succimer (5.0), (7.7), (7.8), (9.2); Placebo (5.0), (5.5), (5.7), (6.2).
10 Figure 1: Plot of mean blood lead levels at baseline, week 1, week 4, and week 6 in the succimer and placebo groups. [y-axis: mean blood lead level (µg/dl); x-axis: time (weeks)]
11 Example 2 Six Cities Study of Air Pollution and Health Longitudinal study designed to characterize lung function growth in children and adolescents. Most children were enrolled between the ages of six and seven and measurements were obtained annually until graduation from high school. Focus on a randomly selected subset of the 300 female participants living in Topeka, Kansas. Response variable: volume of air exhaled in the first second of a spirometry manoeuvre, FEV1.
12 Table 3: Data on age, height, and FEV1 for a randomly selected girl from the Topeka data set. Columns: Subject ID, Age, Height, Time, FEV1. Note: Time represents time since entry to study.
13 Figure 2: Timeplot of log(FEV1/height) versus age for 50 randomly selected girls from the Topeka data set. [y-axis: log(FEV1/height); x-axis: age (years)]
14 Example 3 Influence of Menarche on Changes in Body Fat Prospective study on body fat accretion in a cohort of 162 girls from the MIT Growth and Development Study. At start of study, all the girls were pre-menarcheal and non-obese. All girls were followed over time according to a schedule of annual measurements until four years after menarche. The final measurement was scheduled on the fourth anniversary of their reported date of menarche. At each examination, a measure of body fatness was obtained based on bioelectric impedance analysis.
16 Figure 3: Timeplot of percent body fat against age (in years). 16
17 Consider an analysis of the changes in percent body fat before and after menarche. For the purposes of these analyses time is coded as time since menarche and can be positive or negative. Note: measurement protocol is the same for all girls. Study design is almost balanced if timing of measurement is defined as time since baseline measurement. It is inherently unbalanced when timing of measurements is defined as time since a girl experienced menarche. 17
18 Figure 4: Timeplot of percent body fat against time, relative to age of menarche (in years). [y-axis: percent body fat; x-axis: time relative to menarche (years)]
19 LONGITUDINAL DATA: BASIC CONCEPTS Defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time. Longitudinal studies allow direct study of change over time. The primary goal is to characterize the change in response over time and the factors that influence change. With repeated measures on individuals, we can capture within-individual change. 19
20 A longitudinal study can estimate change with great precision because each individual acts as his/her own control. By comparing each individual's responses at two or more occasions, a longitudinal analysis can remove extraneous, but unavoidable, sources of variability among individuals. This eliminates major sources of variability or noise from the estimation of within-individual change.
21 The assessment of within-subject change can only be achieved within a longitudinal study design In a cross-sectional study, we can only obtain estimates of between-individual differences in the response A cross-sectional study may allow comparison among sub-populations that happen to differ in age, but it does not provide any information about how individuals change In a cross-sectional study the effect of ageing is potentially confounded or mixed-up with possible cohort effects 21
22 Terminology Individuals/Subjects: Participants in a longitudinal study are referred to as individuals or subjects. Occasions: In a longitudinal study individuals are measured repeatedly at different occasions or times. The number of repeated observations, and their timing, can vary widely from one longitudinal study to another. When the number and the timing of the repeated measurements are the same for all individuals, the study design is said to be balanced over time. 22
23 Notation Let Y ij denote the response variable for the i th individual (i = 1,..., N) at the j th occasion (j = 1,..., n). If the repeated measures are assumed to be equally-separated in time, this notation will be sufficient. Later, we refine notation to handle the case where repeated measures are unequally-separated and unbalanced over time. We can represent the n observations on the N individuals in a two-dimensional array, with rows corresponding to individuals and columns corresponding to the responses at each occasion. 23
24 Table 4: Tabular representation of longitudinal data, with n repeated observations on N individuals.

Individual   Occasion 1   Occasion 2   Occasion 3   ...   Occasion n
1            y_11         y_12         y_13         ...   y_1n
2            y_21         y_22         y_23         ...   y_2n
...          ...          ...          ...          ...   ...
N            y_N1         y_N2         y_N3         ...   y_Nn
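In software, the layout of Table 4 maps directly onto an N × n array; a minimal sketch in Python (the numeric values are made up for illustration):

```python
import numpy as np

# Hypothetical responses for N = 3 individuals at n = 4 occasions:
# rows are individuals, columns are occasions (the layout of Table 4).
Y = np.array([
    [26.5, 24.7, 22.1, 21.0],   # individual 1
    [30.8, 26.9, 25.7, 23.8],   # individual 2
    [24.7, 25.3, 23.2, 22.1],   # individual 3
])

N, n = Y.shape
print(N, n)       # 3 4
print(Y[1, 2])    # response of individual 2 at occasion 3: 25.7
```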
25 Vector Notation: We can group the n repeated measures on the same individual into an n × 1 response vector:

$$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in} \end{pmatrix}.$$

Alternatively, we can denote the response vector Y_i as Y_i = (Y_{i1}, Y_{i2}, ..., Y_{in})'.
26 Covariance and Correlation An aspect of longitudinal data that complicates their statistical analysis is that repeated measures on the same individual are usually positively correlated. This violates the fundamental assumption of independence that is the cornerstone of many statistical techniques. What are the potential consequences of not accounting for correlation among longitudinal data in the analysis? An additional, although often overlooked, aspect of longitudinal data that complicates their statistical analysis is heterogeneous variability. 26
27 That is, the variability of the outcome at the end of the study is often discernibly different than the variability at the start of the study. This violates the assumption of homoscedasticity that is the basis for standard linear regression techniques. Thus, there are two aspects of longitudinal data that complicate their statistical analysis: (i) repeated measures on the same individual are usually positively correlated, and (ii) variability is often heterogeneous across measurement occasions. 27
28 Before we can give a formal definition of correlation we need to introduce the notion of expectation. We denote the expectation or mean of Y_ij by µ_j = E(Y_ij), where E(·) can be thought of as a long-run average. The mean, µ_j, provides a measure of the location of the center of the distribution of Y_ij.
29 The variance provides a measure of the spread or dispersion of the values of Y_ij around its respective mean: σ_j² = E[Y_ij − E(Y_ij)]² = E(Y_ij − µ_j)². The positive square-root of the variance, σ_j, is known as the standard deviation. The covariance between two variables, say Y_ij and Y_ik, σ_jk = E[(Y_ij − µ_j)(Y_ik − µ_k)], is a measure of the linear dependence between Y_ij and Y_ik. When the covariance is zero, there is no linear dependence between the responses at the two occasions.
30 The correlation between Y_ij and Y_ik is denoted by ρ_jk = E[(Y_ij − µ_j)(Y_ik − µ_k)] / (σ_j σ_k), where σ_j and σ_k are the standard deviations of Y_ij and Y_ik. The correlation, unlike the covariance, is a measure of dependence free of the scales of measurement of Y_ij and Y_ik. By definition, correlation must take values between −1 and 1. A correlation of −1 or 1 is obtained when there is a perfect linear relationship between the two variables.
31 For the vector of repeated measures, Y_i = (Y_{i1}, Y_{i2}, ..., Y_{in})', we define the variance-covariance matrix, Cov(Y_i):

$$\mathrm{Cov}(Y_i) = \begin{pmatrix} \mathrm{Var}(Y_{i1}) & \mathrm{Cov}(Y_{i1}, Y_{i2}) & \cdots & \mathrm{Cov}(Y_{i1}, Y_{in}) \\ \mathrm{Cov}(Y_{i2}, Y_{i1}) & \mathrm{Var}(Y_{i2}) & \cdots & \mathrm{Cov}(Y_{i2}, Y_{in}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(Y_{in}, Y_{i1}) & \mathrm{Cov}(Y_{in}, Y_{i2}) & \cdots & \mathrm{Var}(Y_{in}) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix},$$

where Cov(Y_ij, Y_ik) = σ_jk = σ_kj = Cov(Y_ik, Y_ij).
32 We can also define the correlation matrix, Corr(Y_i):

$$\mathrm{Corr}(Y_i) = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}.$$

This matrix is also symmetric in the sense that Corr(Y_ij, Y_ik) = ρ_jk = ρ_kj = Corr(Y_ik, Y_ij).
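For balanced data stored in an N × n array, sample versions of Cov(Y_i) and Corr(Y_i) can be computed directly; a sketch using simulated data with a shared individual effect to induce positive correlation (np.cov and np.corrcoef treat columns as variables when rowvar=False):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 4
# A shared individual effect induces positive correlation across occasions
b = rng.normal(0.0, 2.0, size=(N, 1))
Y = 25.0 + b + rng.normal(0.0, 3.0, size=(N, n))   # N x n responses

S = np.cov(Y, rowvar=False)        # n x n sample covariance matrix
R = np.corrcoef(Y, rowvar=False)   # n x n sample correlation matrix

# Both matrices are symmetric, and Corr(Y_i) has 1s on the diagonal
print(R.round(2))
```

With this construction the off-diagonal correlations are positive, mirroring the empirical pattern described above.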
33 Example: Treatment of Lead-Exposed Children Trial We restrict attention to the data from placebo group Data consist of 4 repeated measurements of blood lead levels obtained at baseline (or week 0), weeks 1, 4, and 6. The inter-dependence (or time-dependence) among the four repeated measures of blood lead level can be examined by constructing a scatter-plot of each pair of repeated measures. Examination of the correlations confirms that they are all positive and tend to decrease with increasing time separation. 33
34 Figure 5: Pairwise scatter-plots of blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group.
35 Table 5: Estimated covariance matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group of the TLC trial.
36 Table 6: Estimated correlation matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for children in the placebo group of the TLC trial.
37 Some Observations about Correlation in Longitudinal Data Empirical observations about the nature of the correlation among repeated measures in longitudinal studies: (i) correlations are positive, (ii) correlations decrease with increasing time separation, (iii) correlations between repeated measures rarely ever approach zero, and (iv) correlation between a pair of repeated measures taken very closely together in time rarely approaches one. 37
38 Consequences of Ignoring Correlation Potential implications of ignoring the correlation among the repeated measures. Consider only the first two repeated measures from the TLC trial, taken at baseline (or week 0) and week 1. Suppose it is of interest to determine whether there has been a change in the mean response over time. A natural estimate of the change in the mean response is δ̂ = µ̂_2 − µ̂_1, where

$$\hat\mu_j = \frac{1}{N}\sum_{i=1}^{N} Y_{ij}.$$
39 For the data from the TLC trial, the estimate of the change in the mean response over time in the succimer group is (or ). An expression for the variance of δ̂ is given by

$$\mathrm{Var}(\hat\delta) = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N}(Y_{i2} - Y_{i1})\right] = \frac{1}{N}\left(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}\right).$$

We can substitute estimates of the variances and covariance into this expression to obtain an estimate of the variance of δ̂: Var(δ̂) = (1/50)[ (15.5)] =
40 Consequences of Ignoring Correlation If we ignored the correlation and proceeded with an analysis assuming the observations are independent, we would have obtained the following (incorrect) estimate of the variance of δ̂: (1/50)[ ] = , which is approximately 1.6 times larger. In general, ignoring the correlation leads to incorrect inferences.
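The two variance calculations can be reproduced numerically; a sketch with simulated positively correlated data (the variances and covariance below are hypothetical, not the TLC estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
# Hypothetical Var(Y1) = 25, Var(Y2) = 29, Cov(Y1, Y2) = 18
cov = np.array([[25.0, 18.0],
                [18.0, 29.0]])
Y = rng.multivariate_normal([26.0, 24.0], cov, size=N)

s1 = np.var(Y[:, 0], ddof=1)
s2 = np.var(Y[:, 1], ddof=1)
s12 = np.cov(Y[:, 0], Y[:, 1])[0, 1]

var_correct = (s1 + s2 - 2 * s12) / N   # accounts for the covariance
var_naive   = (s1 + s2) / N             # (incorrectly) assumes independence

# With positive covariance the naive formula overstates Var(delta-hat),
# so tests of change over time would be too conservative here.
print(var_correct, var_naive)
```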
41 Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 1, Chapter 2 (Sections 2.1, 2.2, 2.3, 2.4) 41
42 APPLIED LONGITUDINAL ANALYSIS LECTURE 2 42
43 LINEAR MODELS FOR LONGITUDINAL DATA Notation: Previously, we assumed a sample of N subjects are measured repeatedly at n occasions. Either by design or happenstance, subjects may not have same number of repeated measures or be measured at same set of occasions. We assume there are n i repeated measurements on the i th subject and each Y ij is observed at time t ij. 43
44 We can group the response variables for the i-th subject into an n_i × 1 vector:

$$Y_i = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{i n_i} \end{pmatrix}, \quad i = 1, \ldots, N.$$

Associated with each Y_ij there is a p × 1 vector of covariates,

$$X_{ij} = \begin{pmatrix} X_{ij1} \\ \vdots \\ X_{ijp} \end{pmatrix}, \quad i = 1, \ldots, N; \; j = 1, \ldots, n_i.$$
45 We can group the vectors of covariates into an n_i × p matrix:

$$X_i = \begin{pmatrix} X_{i11} & X_{i12} & \cdots & X_{i1p} \\ X_{i21} & X_{i22} & \cdots & X_{i2p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{i n_i 1} & X_{i n_i 2} & \cdots & X_{i n_i p} \end{pmatrix}, \quad i = 1, \ldots, N.$$

X_i is simply an ordered collection of the values of the p covariates for the i-th subject at the n_i occasions.
46 Throughout this course we consider linear regression models for changes in the mean response over time: Y_ij = β_1 X_ij1 + β_2 X_ij2 + ⋯ + β_p X_ijp + e_ij, j = 1, ..., n_i; where β_1, ..., β_p are unknown regression coefficients. The e_ij are random errors, with mean zero, and represent deviations of the Y_ij's from their means, E(Y_ij | X_ij) = β_1 X_ij1 + β_2 X_ij2 + ⋯ + β_p X_ijp. Typically, although not always, X_ij1 = 1 for all i and j, so that E(Y_ij | X_ij) = β_1 + β_2 X_ij2 + ⋯ + β_p X_ijp, and then β_1 is the intercept term in the model.
47 Treatment of Lead-Exposed Children Trial For illustrative purposes, consider model that assumes mean blood lead level changes linearly over time, but at a rate that differs by group. Assume two treatment groups have different intercepts and slopes: Y ij = β 1 X ij1 + β 2 X ij2 + β 3 X ij3 + β 4 X ij4 + e ij, where X ij1 = 1 for all i and all j; X ij2 = t j, the week in which the blood lead level was obtained; X ij3 = 1 if the i th subject is assigned to the succimer group and X ij3 = 0 otherwise. X ij4 = t j if the i th subject is assigned to the succimer group and X ij4 = 0 otherwise. Alternatively, X ij4 = X ij2 X ij3. 47
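The four covariates defined above can be assembled into a design matrix; a sketch building X for a single (hypothetical) succimer child measured at weeks 0, 1, 4, and 6:

```python
import numpy as np

weeks = np.array([0.0, 1.0, 4.0, 6.0])
succimer = 1   # 1 = succimer, 0 = placebo (hypothetical assignment)

X = np.column_stack([
    np.ones_like(weeks),            # X1: intercept
    weeks,                          # X2: time (weeks)
    np.full_like(weeks, succimer),  # X3: group indicator
    weeks * succimer,               # X4: group-by-time interaction (X2 * X3)
])
print(X)
```

For a placebo child the last two columns would be all zeros, so the model reduces to the intercept and time columns, matching the group-specific means given below.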
48 Thus, for children in the placebo group E(Y ij X ij ) = β 1 + β 2 t j, where β 1 represents the mean blood lead level at baseline (week = 0) and β 2 is the constant rate of change in mean blood level. Similarly, for children in the succimer group E(Y ij X ij ) = (β 1 + β 3 ) + (β 2 + β 4 )t j, where β 2 + β 4 is the constant rate of change in mean blood level per week. Hypothesis that treatments are equally effective in reducing blood lead levels translated into hypothesis that β 4 = 0. 48
49 To reinforce notation, consider the responses and covariates at the 4 occasions for any individual. For example, the responses at the 4 occasions for ID = 046: The values of the covariates at the 4 occasions for ID = 046: This individual was assigned to treatment with placebo. 49
50 On the other hand, the responses at the 4 occasions for ID = 149: The values of the covariates at the 4 occasions for ID = 149: This individual was assigned to treatment with succimer. 50
51 Distributional Assumptions So far, the only assumptions made concern patterns of change in the mean response over time and their relation to covariates: E(Y_ij | X_ij) = β_1 X_ij1 + β_2 X_ij2 + ⋯ + β_p X_ijp. Next we consider distributional assumptions concerning the random errors, e_ij. Note: Y_ij is assumed to be comprised of two components, a systematic component, β_1 X_ij1 + β_2 X_ij2 + ⋯ + β_p X_ijp, and a random component, e_ij. Assumptions about the shape of the distribution of e_ij translate into assumptions about the distribution of Y_ij given X_ij. 51
52 We assume Y_i1, ..., Y_in_i have a multivariate normal distribution, with mean response E(Y_ij | X_ij) = µ_ij = β_1 X_ij1 + β_2 X_ij2 + ⋯ + β_p X_ijp, and covariance matrix, Σ_i = Cov(Y_i). The multivariate normal distribution can be considered the multivariate analogue of the univariate normal distribution. The multivariate normal distribution is completely specified by the means, µ_i1, ..., µ_in_i, and the covariance matrix, Σ_i. 52
53 Estimation: Maximum Likelihood A very general approach to estimation is the method of maximum likelihood (ML). Basic idea: use as estimates of β 1,..., β p and Σ i the values that are most probable (or most likely ) for the data that have actually been observed. ML estimates of β 1,..., β p and Σ i are those values that maximize the joint probability of the response variables evaluated at their observed values (the likelihood function). 53
54 The ML estimator of β 1,..., β p is the so-called generalized least squares (GLS) estimator, β 1 (Σ i ),..., β p (Σ i ), β = [ N ( X i Σ 1 ) ] 1 N X i i=1 i=1 ( X i Σ 1 i y i ), and depends on covariance among the repeated measures, Σ i. This is a generalization of the ordinary least squares (OLS) estimator used in standard linear regression. In general, there is no simple expression for ML estimator of Σ i ; instead, ML estimate of Σ i requires iterative techniques. Once ML estimate of Σ i has been obtained, we simply substitute the estimate of Σ i, say Σ i, into the GLS estimator of β 1,..., β p to obtain ML estimates, β 1 ( Σ i ),..., β p ( Σ i ). 54
55 Restricted Maximum Likelihood (REML) Estimation ML estimation of Σ i is known to be biased in small samples. Bias arises because ML estimate has not taken into account the fact that β 1,..., β p is also estimated from the data. Instead we can use a variant on ML estimation, known as restricted maximum likelihood (REML) estimation. When REML estimation is used to estimate Σ i, β 1,..., β p are estimated by the usual GLS estimator, β 1 ( Σ i ),..., β p ( Σ i ), except Σ i is now the REML estimate of Σ i. 55
56 Caution: While REML log-likelihood can be used to compare nested models for the covariance, it should not be used to compare nested regression models for the mean. Instead, nested models for the mean should be compared using likelihood ratio tests based on the ML log-likelihood. 56
57 Modelling Longitudinal Data Longitudinal data present two aspects of the data that require modelling: (i) mean response over time (ii) covariance Models for longitudinal data must jointly specify models for the mean and covariance. Modelling the Mean Two main approaches can be distinguished: (1) analysis of response profiles (2) parametric or semi-parametric curves. 57
58 Modelling the Covariance Three broad approaches can be distinguished: (1) unstructured or arbitrary pattern of covariance (2) covariance pattern models (3) random effects covariance structure
59 MODELLING THE MEAN: ANALYSIS OF RESPONSE PROFILES Basic idea: Compare groups in terms of mean response profiles over time. Useful for balanced longitudinal designs and when there is a single categorical covariate (perhaps denoting different treatment or exposure groups). Analysis of response profiles can be extended to handle more than a single group factor. Analysis of response profiles can also handle missing data. 59
60 Figure 6: Mean blood lead levels at baseline, week 1, week 4, and week 6 in the succimer and placebo groups. [y-axis: mean blood lead level (µg/dl); x-axis: time (weeks)]
61 Hypotheses concerning response profiles Given a sequence of n repeated measures on a number of distinct groups of individuals, there are three main questions: 1. Are the mean response profiles similar in the groups, in the sense that the mean response profiles are parallel? This is a question that concerns the group × time interaction effect; see Figure 7(a). 2. Assuming the mean response profiles are parallel, are the means constant over time, in the sense that the mean response profiles are flat? This is a question that concerns the time effect; see Figure 7(b).
62 3. Assuming that the population mean response profiles are parallel, are they also at the same level, in the sense that the mean response profiles for the groups coincide? This is a question that concerns the group effect; see Figure 7(c). For many longitudinal studies, main interest is in question 1.
63 Figure 7: Graphical representation of the null hypotheses of (a) no group × time interaction effect, (b) no time effect, and (c) no group effect. [Each panel plots the expected response against time for Group 1 and Group 2.]
64 Table 7: Mean response profile over time in G groups.

Group   Occasion 1   Occasion 2   ...   Occasion n
1       µ_1(1)       µ_2(1)       ...   µ_n(1)
2       µ_1(2)       µ_2(2)       ...   µ_n(2)
...     ...          ...          ...   ...
g       µ_1(g)       µ_2(g)       ...   µ_n(g)
...     ...          ...          ...   ...
G       µ_1(G)       µ_2(G)       ...   µ_n(G)
65 Consider the two-group case: G = 2. Define Δ_j = µ_j(1) − µ_j(2), j = 1, ..., n. With G = 2, the first hypothesis in an analysis of response profiles can be expressed as: No group × time interaction effect: H_0: Δ_1 = Δ_2 = ⋯ = Δ_n (n − 1 degrees of freedom). With G ≥ 2, the test of the null hypothesis of no group × time interaction effect has (G − 1) × (n − 1) degrees of freedom.
66 Focus of analysis is on a global test of the null hypothesis that the mean response profiles are parallel. This question concerns the group × time interaction effect. In testing this hypothesis, both group and time are regarded as categorical covariates (analogous to two-way ANOVA). Analysis of response profiles can be specified as a regression model with indicator variables for group and time. However, the correlation and variability among repeated measures on the same individuals must be properly accounted for.
67 Dummy or Indicator Variable Coding: Consider a factor with k levels: Define X 1 = 1 if measurement or subject belongs to level 1, and 0 otherwise. Define X 2 = 1 if measurement or subject belongs to level 2, and 0 otherwise. Define X 3,..., X k similarly. Note: In model with intercept term, we only require k 1 indicator variables. The choice of omitted indicator variable (e.g., X 1 ) determines what level of the factor is the reference level (e.g., first level). 67
68
Level   X2   X3   X4   ...   Xk
1       0    0    0    ...   0
2       1    0    0    ...   0
3       0    1    0    ...   0
...     ...  ...  ...  ...   ...
k       0    0    0    ...   1
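Generating the k − 1 indicator columns of this coding scheme is mechanical; a sketch for a factor with k = 4 levels, with level 1 as the reference (the factor values are made up):

```python
import numpy as np

levels = np.array([1, 2, 3, 4, 2, 1])   # hypothetical factor values for 6 units
k = 4

# k - 1 indicator columns X2, ..., Xk; level 1 (all zeros) is the reference.
X = np.column_stack([(levels == m).astype(int) for m in range(2, k + 1)])
print(X)
```

Each row matches the corresponding row of the table above: a unit at the reference level gets all zeros, and a unit at level m gets a one in column X_m only.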
69 For example, this leads to a simple way of expressing the mean response in a regression model (with intercept β_1): Y_ij = β_1 + β_2 X_2ij + ⋯ + β_k X_kij + ε_ij, or (equivalently) µ_j = E(Y_ij) = β_1 + β_2 X_2ij + ⋯ + β_k X_kij.
70 Note: Mean response for level 1: µ 1 = β 1 Mean response for level 2: µ 2 = β 1 + β 2. Mean response for level k: µ k = β 1 + β k Equivalently: Level 2 versus Level 1 = β 2 Level 3 versus Level 1 = β 3.. Level k versus Level 1 = β k 70
71 Choice of Reference Level The usual choice of reference group: (i) A natural baseline or comparison group, and/or (ii) group with largest sample size In longitudinal data setting, the baseline or first measurement occasion is a natural reference group for time. 71
72 In summary, analysis of response profiles can be specified as a regression model with indicator variables for group and time. The global test of the null hypothesis of parallel profiles translates into a hypothesis concerning regression coefficients for the group time interaction being equal to zero. Beyond testing the null hypothesis of parallel profiles, the estimated regression coefficients have meaningful interpretations. 72
73 Treatment of Lead-Exposed Children Trial In the TLC Trial there are two groups (placebo and succimer) and four measurement occasions (week 0, 1, 4, 6). Let X 1 = 1 for all children at all occasions. Creating indicator variables for group and time: Group: Let X 2 = 1 if child randomized to succimer, X 2 = 0 otherwise. Time: Let X 3 = 1 if measurement at week 1, X 3 = 0 otherwise Let X 4 = 1 if measurement at week 4, X 4 = 0 otherwise Let X 5 = 1 if measurement at week 6, X 5 = 0 otherwise 73
74 Analysis of response profiles model can be expressed as: Y = β_1 + β_2 X_2 + β_3 X_3 + β_4 X_4 + β_5 X_5 + β_6 X_2 X_3 + β_7 X_2 X_4 + β_8 X_2 X_5 + e. Test of group × time interaction: H_0: β_6 = β_7 = β_8 = 0. The analysis must also account for the correlation among repeated measures on the same child. The analysis of response profiles estimates separate variances for each occasion (4 variances) and six pairwise correlations.
75 Table 8: Estimated covariance matrix for the blood lead levels at baseline, week 1, week 4, and week 6 for the children from the TLC trial.
76 Note the discernible increase in the variability in blood lead levels from pre- to post-randomization. This increase in variability from baseline is probably due to: (1) given the treatment group assignment, there may be natural heterogeneity in the individual response trajectories over time, (2) the trial had an inclusion criterion that blood lead levels at baseline were in the range of micrograms/dl. 76
77 Table 9: Tests of fixed effects based on a profile analysis of the blood lead level data at baseline, weeks 1, 4, and 6.

EFFECT        DF    CHI-SQUARE    P-VALUE
GROUP         .     .             <
WEEK          .     .             <
GROUP*WEEK    .     .             <
78 Test of the group × time interaction is based on the (multivariate) Wald test (comparison of estimates to SEs). In the TLC trial, the question of main interest concerns the comparison of groups in terms of patterns of change from baseline. This question translates into a test of the group × time interaction. The test of the group × time interaction yields a Wald statistic of 113 with 3 degrees of freedom (p < 0.001). Because this is a global test, it indicates that the groups differ but does not tell us how they differ.
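The global Wald test has the form W = (Cβ̂)'(C V̂ C')⁻¹(Cβ̂), where V̂ is the estimated covariance matrix of β̂ and C picks out the interaction coefficients; W is compared to a chi-squared distribution with rank(C) degrees of freedom. A sketch with an assumed coefficient vector and (diagonal) covariance matrix, using purely illustrative numbers rather than the TLC estimates:

```python
import numpy as np

# Hypothetical estimates for the 8 coefficients of the response-profile model
beta_hat = np.array([26.3, 0.3, -1.6, -2.0, -2.1, -11.4, -8.8, -3.2])
V_hat = np.diag([0.5, 0.9, 0.8, 0.8, 0.9, 1.6, 1.5, 1.7])  # assumed Cov(beta-hat)

# C selects beta_6, beta_7, beta_8 (the group-by-time interaction terms)
C = np.zeros((3, 8))
C[0, 5] = C[1, 6] = C[2, 7] = 1.0

d = C @ beta_hat
W = d @ np.linalg.solve(C @ V_hat @ C.T, d)   # Wald statistic, 3 df
print(W)
```

In practice V̂ is not diagonal, but the same matrix expression applies unchanged.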
79 Recall, the analysis of response profiles model can be expressed as: Y = β_1 + β_2 X_2 + β_3 X_3 + β_4 X_4 + β_5 X_5 + β_6 X_2 X_3 + β_7 X_2 X_4 + β_8 X_2 X_5 + e. Test of group × time interaction: H_0: β_6 = β_7 = β_8 = 0. The 3 single-df contrasts for the group × time interaction have direct interpretations in terms of group comparisons of changes from baseline. They indicate that children treated with succimer have a greater decrease in mean blood lead levels from baseline at all occasions when compared to children treated with placebo (see Table 10).
80 Table 10: Estimated regression coefficients and standard errors based on a profile analysis of the blood lead level data. Columns: PARAMETER, GROUP, WEEK, ESTIMATE, SE, Z. Rows: INTERCEPT; GROUP (A); WEEK 1; WEEK 4; WEEK 6; GROUP*WEEK 1 (A); GROUP*WEEK 4 (A); GROUP*WEEK 6 (A).
81 Strengths and Weaknesses of Analysis of Response Profiles Strengths: Allows arbitrary patterns in the mean response over time and arbitrary patterns in the covariance. Analysis has a certain robustness since potential risks of bias due to misspecification of models for mean and covariance are minimal. Drawbacks: Requirement that the longitudinal design be balanced. Analysis cannot incorporate mistimed measurements. 81
82 Analysis ignores the time-ordering (time trends) of the repeated measures in a longitudinal study: occasion is regarded as a qualitative factor. Produces omnibus tests of effects that may have low power to detect group differences in specific trends in the mean response over time (e.g., linear trends in the mean response). The number of estimated parameters, G × n mean parameters and n(n + 1)/2 covariance parameters, grows rapidly with the number of measurement occasions.
83 Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 3 (Sections 3.1, 3.2, 3.4, 3.5) Chapter 4 (Sections 4.1, 4.2, 4.4, 4.5) Chapter 5 (Sections , 5.8, 5.9) 83
84 APPLIED LONGITUDINAL ANALYSIS LECTURE 3 84
85 MODELLING THE MEAN: PARAMETRIC CURVES Fitting parametric or semi-parametric curves to longitudinal data can be justified on substantive and statistical grounds. Substantively, in many studies the true underlying mean response process changes over time in a relatively smooth, monotonically increasing/decreasing pattern. Fitting parsimonious models for the mean response results in statistical tests of covariate effects (e.g., treatment × time interactions) with greater power than in profile analysis.
86 Polynomial Trends in Time Describe the patterns of change in the mean response over time in terms of simple polynomial trends. The means are modelled as an explicit function of time. This approach can handle highly unbalanced designs in a relatively seamless way. For example, mistimed measurements are easily incorporated in the model for the mean response. 86
87 Linear Trends over Time Simplest possible curve for describing changes in the mean response over time is a straight line. Slope has direct interpretation in terms of a constant change in mean response for a single unit change in time. Consider two-group study comparing treatment and control, where changes in mean response are approximately linear: E (Y ij ) = β 1 + β 2 Time ij + β 3 Group i + β 4 Time ij Group i, where Group i = 1 if i th individual assigned to treatment, and Group i = 0 otherwise; and Time ij denotes measurement time for the j th measurement on i th individual. 87
88 Model for the mean for subjects in control group: E (Y ij ) = β 1 + β 2 Time ij, while for subjects in treatment group, E (Y ij ) = (β 1 + β 3 ) + (β 2 + β 4 ) Time ij. Thus, each group s mean response is assumed to change linearly over time (see Figure 8). 88
89 Figure 8: Graphical representation of model with linear trends for two groups. [Expected response against time for Groups 1 and 2.]
90 Quadratic Trends over Time When changes in the mean response over time are not linear, higher-order polynomial trends can be considered. For example, if the means are monotonically increasing or decreasing over the course of the study, but in a curvilinear way, a model with quadratic trends can be considered. In a quadratic trend model the rate of change in the mean response is not constant but depends on time. Rate of change must be expressed in terms of two parameters. 90
91 Consider the two-group study example: E(Y_ij) = β_1 + β_2 Time_ij + β_3 Time²_ij + β_4 Group_i + β_5 Time_ij × Group_i + β_6 Time²_ij × Group_i. Model for subjects in the control group: E(Y_ij) = β_1 + β_2 Time_ij + β_3 Time²_ij; while the model for subjects in the treatment group: E(Y_ij) = (β_1 + β_4) + (β_2 + β_5) Time_ij + (β_3 + β_6) Time²_ij.
92 Figure 9: Graphical representation of model with quadratic trends for two groups. [Expected response against time for Groups 1 and 2.]
93 Note: mean response changes at different rate, depending upon Time ij. Rate of change in control group is β 2 + 2β 3 Time ij (derivation of this instantaneous rate of change requires familiarity with calculus). Thus, early in the study when Time ij = 1, rate of change is β 2 + 2β 3 ; while later in the study, say Time ij = 4, rate of change is β 2 + 8β 3. Regression coefficients, (β 2 + β 5 ) and (β 3 + β 6 ), have similar interpretations for treatment group. 93
94 Centering To avoid problems of collinearity it is advisable to center Time j on its mean value prior to the analysis. Replace Time j by its deviation from the mean of (Time 1,Time 2,...,Time n ). Note: centering of Time ij at individual-specific values (e.g., the mean of the n i measurement times for i th individual) should be avoided, as the interpretation of the intercept becomes meaningless. 94
95 Linear Splines If simplest possible curve is a straight line, then one way to extend the curve is to have sequence of joined line segments that produces a piecewise linear pattern. Linear spline models provide flexible way to accommodate many non-linear trends that cannot be approximated by simple polynomials in time. Basic idea: Divide time axis into series of segments and consider piecewise-linear trends, having different slopes but joined at fixed times. Locations where lines are tied together are known as knots. Resulting piecewise-linear curve is called a spline. Piecewise-linear model often called broken-stick model. 95
96 Figure 10: Graphical representation of model with linear splines for two groups, with common knot.
97 The simplest possible spline model has only one knot. For the two-group example, the linear spline model with knot at t* is E(Y_ij) = β1 + β2 Time_ij + β3 (Time_ij − t*)_+ + β4 Group_i + β5 Time_ij × Group_i + β6 (Time_ij − t*)_+ × Group_i, where (x)_+ is defined as a function that equals x when x is positive and equals zero otherwise. Thus, (Time_ij − t*)_+ is equal to (Time_ij − t*) when Time_ij > t* and is equal to zero when Time_ij ≤ t*. Note: (Time − t*)_+ can be created as max(Time − t*, 0).
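The truncated-line term (x)_+ is easy to construct in software; a sketch using the TLC measurement weeks (0, 1, 4, 6) and a knot at week 1 as illustrative values:

```python
import numpy as np

# (x)+ = max(x, 0), applied elementwise
def pos(x):
    return np.maximum(x, 0.0)

week = np.array([0.0, 1.0, 4.0, 6.0])
knot = 1.0

spline_term = pos(week - knot)
print(spline_term)  # zero up to the knot, (week - 1) afterwards

# Design matrix for one control-group subject:
# columns are intercept, Week, (Week - knot)+
X = np.column_stack([np.ones_like(week), week, spline_term])
print(X.shape)
```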
98 Model for subjects in the control group: E(Y_ij) = β1 + β2 Time_ij + β3 (Time_ij − t*)_+. When expressed in terms of the mean response prior to/after t*:

E(Y_ij) = β1 + β2 Time_ij, for Time_ij ≤ t*;
E(Y_ij) = (β1 − β3 t*) + (β2 + β3) Time_ij, for Time_ij > t*.

The slope prior to t* is β2 and following t* is (β2 + β3).
99 Model for subjects in the treatment group: E(Y_ij) = (β1 + β4) + (β2 + β5) Time_ij + (β3 + β6)(Time_ij − t*)_+. When expressed in terms of the mean response prior to/after t*:

E(Y_ij) = (β1 + β4) + (β2 + β5) Time_ij, for Time_ij ≤ t*;
E(Y_ij) = [(β1 + β4) − (β3 + β6) t*] + (β2 + β3 + β5 + β6) Time_ij, for Time_ij > t*.
100 Case Study 1: Vlagtwedde-Vlaardingen Study Epidemiologic study on risk factors for chronic obstructive lung disease. Sample participated in follow-up surveys approximately every 3 years for up to 19 years. Pulmonary function was determined by spirometry: FEV 1. We focus on subset of 133 residents aged 36 or older at entry into study and whose smoking status did not change during follow-up. Each participant was either a current or former smoker. 100
101 Figure 11: Mean FEV_1 (liters) at baseline (year 0), year 3, year 6, year 9, year 12, year 15, and year 19 in the current and former smoking exposure groups.
102 First we consider a linear trend in the mean response over time, with intercepts and slopes that differ for the two smoking exposure groups, assuming an unstructured covariance matrix. Based on the REML estimates of the regression coefficients in Table 11, the mean response for former smokers is E(Y_ij) = β̂1 + β̂3 Time_ij, while for current smokers it is E(Y_ij) = (β̂1 + β̂2) + (β̂3 + β̂4) Time_ij, where β̂2 and β̂4 are the coefficients of Smoke_i and Smoke_i × Time_ij.
103 Table 11: Estimated regression coefficients for the linear trend model for FEV_1 data from the Vlagtwedde-Vlaardingen study.

Variable              Smoking Group    Estimate    SE    Z
Intercept
Smoke_i               Current
Time_ij
Smoke_i × Time_ij     Current
104 Thus, both groups have a significant decline in mean FEV_1 over time, but there is no discernible difference between the two smoking exposure groups in the constant rate of change. That is, the Smoke_i × Time_ij interaction (i.e., the comparison of the two slopes) is not significant (Z = 1.42). But is the rate of change constant over time? The adequacy of the linear trend model can be assessed by including higher-order polynomial trends.
105 For example, we can consider a model that allows quadratic trends for changes in FEV_1 over time. Recall that the linear trend model is nested within the quadratic trend model. The maximized log-likelihoods for models with linear and quadratic trends are presented in Table 12. The LRT statistic, based on ML (not REML) estimation, can be compared to a chi-squared distribution with 2 degrees of freedom (that is, 6, the number of parameters in the quadratic trend model, minus 4, the number of parameters in the linear trend model).
106 Table 12: Maximized (ML) log-likelihoods for models with linear and quadratic trends for FEV_1 data from the Vlagtwedde-Vlaardingen study.

Model                     −2 (ML) Log-Likelihood
Quadratic Trend Model
Linear Trend Model

Likelihood Ratio: G² = 1.3, 2 df (p > 0.50)
107 The LRT comparing the quadratic and linear trend models produces G² = 1.3 with 2 degrees of freedom (p > 0.50). Thus, when compared to the quadratic trend model, the linear trend model appears to be adequate. Finally, for illustrative purposes, we can make a comparison with a cubic trend model. This produces an LRT statistic of G² = 4.4 with 4 degrees of freedom (p > 0.35), indicating again that the linear trend model is adequate.
108 Case Study 2: Treatment of Lead-Exposed Children Recall the data from the TLC trial: children were randomized to placebo or succimer, with measures of blood lead level at baseline and at 1, 4, and 6 weeks. The sequence of means over time in each group is displayed in Figure 12.
109 Figure 12: Mean blood lead levels (mcg/dL) at baseline, week 1, week 4, and week 6 in the succimer and placebo groups.
110 Given that there are non-linearities in the trends over time, higher-order polynomial models (e.g., a quadratic trend model) could be fit to the data. Alternatively, we can accommodate the non-linearity with a piecewise linear model with common knot at week 1, E(Y_ij) = β1 + β2 Week_ij + β3 (Week_ij − 1)_+ + β4 Group_i × Week_ij + β5 Group_i × (Week_ij − 1)_+, where Group_i = 1 if assigned to succimer, and Group_i = 0 otherwise.
111 In this piecewise linear model, the means for subjects in the placebo group are E(Y_ij) = β1 + β2 Week_ij + β3 (Week_ij − 1)_+, while in the succimer group E(Y_ij) = β1 + (β2 + β4) Week_ij + (β3 + β5)(Week_ij − 1)_+. Because of randomization, a common mean at baseline is assumed (no Group main effect).
112 Table 13: Estimated regression coefficients and standard errors based on a piecewise linear model, with knot at week 1.

Variable                       Group       Estimate    SE    Z
Intercept
Week_ij
(Week_ij − 1)_+
Group_i × Week_ij              Succimer
Group_i × (Week_ij − 1)_+      Succimer
113 When expressed in terms of the mean response prior to/after week 1, the estimated means in the placebo group are

μ̂_ij = β̂1 + β̂2 Week_ij, for Week_ij ≤ 1;
μ̂_ij = (β̂1 − β̂3) + (β̂2 + β̂3) Week_ij, for Week_ij > 1.

Thus, in the placebo group, the slope prior to week 1 is β̂2 = −1.63 and following week 1 is (β̂2 + β̂3).
114 Similarly, when expressed in terms of the mean response prior to and after week 1, the estimated means for subjects in the succimer group are given by

μ̂_ij = β̂1 + (β̂2 + β̂4) Week_ij, for Week_ij ≤ 1;
μ̂_ij = [β̂1 − (β̂3 + β̂5)] + (β̂2 + β̂3 + β̂4 + β̂5) Week_ij, for Week_ij > 1.
115 Estimates of the mean blood lead levels for the placebo and succimer groups are presented in Table 14. The estimated means appear to adequately fit the observed mean response profiles for the two treatment groups. Note that the piecewise linear and quadratic trend models (with common intercept for the two groups) are not nested. Because they have the same number of parameters, their log-likelihoods can be compared directly: the piecewise linear model has the smaller −2 (ML) log-likelihood and therefore fits better than the quadratic trend model.
116 Table 14: Estimated mean blood lead levels for the placebo and succimer groups from the linear spline model (knot at week 1). Observed means in parentheses.

Group       Week 0     Week 1     Week 4     Week 6
Succimer    (26.5)     (13.5)     (15.5)     (20.8)
Placebo     (26.3)     (24.7)     (24.1)     (23.2)
117 Further Reading: Fitzmaurice GM, Laird NM and Ware JH. (2004) Applied Longitudinal Analysis. Wiley. Chapter 6 117
118 APPLIED LONGITUDINAL ANALYSIS LECTURE 4 118
119 MODELLING THE COVARIANCE Longitudinal data present two aspects that require modelling: the mean response over time and the covariance among the repeated measures. Although these two aspects can be modelled separately, they are interrelated: the choices of models for the mean response and for the covariance are interdependent. A model for the covariance must be chosen on the basis of some assumed model for the mean response. The covariance between any pair of residuals, say [Y_ij − μ_ij(β)] and [Y_ik − μ_ik(β)], depends on the model for the mean, i.e., depends on β.
120 Modelling the Covariance Three broad approaches can be distinguished: (1) unstructured or arbitrary pattern of covariance (2) covariance pattern models (3) random effects covariance structure 120
121 Unstructured Covariance Appropriate when design is balanced and number of measurement occasions is relatively small. No explicit structure is assumed other than homogeneity of covariance across different individuals, Cov(Y i ) = Σ i = Σ. Chief advantage: no assumptions made about the patterns of variances and covariances. 121
122 With n measurement occasions, the unstructured covariance matrix has n(n+1)/2 parameters: the n variances and the n(n−1)/2 pairwise covariances (or correlations),

Cov(Y_i) = [ σ1²   σ12   ...   σ1n ]
           [ σ21   σ2²   ...   σ2n ]
           [  :     :     .     :  ]
           [ σn1   σn2   ...   σn² ]
123 Potential drawbacks: the number of covariance parameters grows rapidly with the number of measurement occasions: for n = 3 the number of covariance parameters is 6; for n = 5 it is 15; for n = 10 it is 55. When the number of covariance parameters is large relative to the sample size, estimation is likely to be very unstable. Use of an unstructured covariance is appealing only when N is large relative to n(n+1)/2. The unstructured covariance is also problematic when there are mistimed measurements.
124 Covariance Pattern Models When attempting to impose some structure on the covariance, a subtle balance needs to be struck. With too little structure there may be too many parameters to be estimated with limited amount of data. With too much structure, potential risk of model misspecification and misleading inferences concerning β. Classic tradeoff between bias and variance. Covariance pattern models have their basis in models for serial correlation originally developed for time series data. 124
125 Compound Symmetry Assumes the variance is constant across occasions, say σ², and Corr(Y_ij, Y_ik) = ρ for all j ≠ k.

Cov(Y_i) = σ² [ 1  ρ  ρ  ...  ρ ]
              [ ρ  1  ρ  ...  ρ ]
              [ ρ  ρ  1  ...  ρ ]
              [ :  :  :   .   : ]
              [ ρ  ρ  ρ  ...  1 ]

Parsimonious: two parameters regardless of the number of measurement occasions. However, these strong assumptions about variance and correlation are usually not valid with longitudinal data.
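A compound-symmetry matrix can be constructed directly; a minimal sketch with illustrative values of σ² and ρ:

```python
import numpy as np

def compound_symmetry(n, sigma2, rho):
    """Constant variance sigma2, constant correlation rho between all pairs."""
    R = np.full((n, n), rho)   # every off-diagonal correlation is rho
    np.fill_diagonal(R, 1.0)   # unit correlations on the diagonal
    return sigma2 * R

S = compound_symmetry(4, sigma2=2.0, rho=0.6)
print(S)
```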
126 Toeplitz Assumes the variance is constant across occasions, say σ², and Corr(Y_ij, Y_i,j+k) = ρ_k for all j and k.

Cov(Y_i) = σ² [ 1        ρ_1      ρ_2      ...  ρ_(n−1) ]
              [ ρ_1      1        ρ_1      ...  ρ_(n−2) ]
              [ ρ_2      ρ_1      1        ...  ρ_(n−3) ]
              [ :        :        :         .   :       ]
              [ ρ_(n−1)  ρ_(n−2)  ρ_(n−3)  ...  1       ]

Assumes the correlation among responses at adjacent measurement occasions is constant, ρ_1.
127 The Toeplitz covariance is only appropriate when measurements are made at equal (or approximately equal) intervals of time. It has n parameters (1 variance parameter and n − 1 correlation parameters). A special case of the Toeplitz covariance is the (first-order) autoregressive covariance.
128 Autoregressive Assumes the variance is constant across occasions, say σ², and Corr(Y_ij, Y_i,j+k) = ρ^k for all j and k, with ρ ≥ 0.

Cov(Y_i) = σ² [ 1        ρ        ρ²       ...  ρ^(n−1) ]
              [ ρ        1        ρ        ...  ρ^(n−2) ]
              [ ρ²       ρ        1        ...  ρ^(n−3) ]
              [ :        :        :         .   :       ]
              [ ρ^(n−1)  ρ^(n−2)  ρ^(n−3)  ...  1       ]

Parsimonious: only 2 parameters, regardless of the number of measurement occasions. Only appropriate when the measurements are made at equal (or approximately equal) intervals of time.
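Both the Toeplitz and autoregressive patterns depend on the occasions only through the lag |j − k|; a sketch constructing each from the lag matrix, with illustrative parameter values:

```python
import numpy as np

def lag_matrix(n):
    # |j - k| for every pair of measurement occasions
    idx = np.arange(n)
    return np.abs(np.subtract.outer(idx, idx))

def ar1_cov(n, sigma2, rho):
    # AR(1): Corr(Y_ij, Y_i,j+k) = rho**k
    return sigma2 * rho ** lag_matrix(n)

def toeplitz_cov(sigma2, rhos):
    # Toeplitz: one free correlation per lag, Corr at lag k is rhos[k-1]
    r = np.concatenate([[1.0], np.asarray(rhos, dtype=float)])
    return sigma2 * r[lag_matrix(len(r))]

print(ar1_cov(4, 1.0, 0.5))
print(toeplitz_cov(1.0, [0.5, 0.3, 0.1]))
```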
129 The compound symmetry, Toeplitz, and autoregressive covariances assume the variances are constant across time. This assumption can be relaxed by considering versions of these models with heterogeneous variances, Var(Y_ij) = σ_j². A heterogeneous autoregressive covariance pattern is

Cov(Y_i) = [ σ1²           ρσ1σ2         ρ²σ1σ3        ...  ρ^(n−1)σ1σn ]
           [ ρσ1σ2         σ2²           ρσ2σ3         ...  ρ^(n−2)σ2σn ]
           [ ρ²σ1σ3        ρσ2σ3         σ3²           ...  ρ^(n−3)σ3σn ]
           [ :             :             :              .   :           ]
           [ ρ^(n−1)σ1σn   ρ^(n−2)σ2σn   ρ^(n−3)σ3σn   ...  σn²         ]

and has n + 1 parameters (n variance parameters and 1 correlation parameter).
130 Banded Assumes the correlation is zero beyond some specified interval. For example, a banded covariance pattern with a band size of 3 assumes that Corr(Y_ij, Y_i,j+k) = 0 for k ≥ 3. It is possible to apply a banded pattern to any of the covariance pattern models considered so far.
131 A banded Toeplitz covariance pattern with a band size of 2 is given by

Cov(Y_i) = σ² [ 1    ρ_1  0    ...  0 ]
              [ ρ_1  1    ρ_1  ...  0 ]
              [ 0    ρ_1  1    ...  0 ]
              [ :    :    :     .   : ]
              [ 0    0    0    ...  1 ]

where ρ_2 = ρ_3 = ... = ρ_(n−1) = 0. Banding makes a very strong assumption about how quickly the correlation decays to zero with increasing time separation.
132 Exponential When measurement occasions are not equally spaced over time, the autoregressive model can be generalized as follows. Let {t_i1, ..., t_in} denote the observation times for the i-th individual and assume that the variance is constant across all measurement occasions, say σ², and Corr(Y_ij, Y_ik) = ρ^|t_ij − t_ik|, for ρ ≥ 0. The correlation between any pair of repeated measures decreases exponentially with the time separation between them.
133 This is referred to as the exponential covariance because it can be re-expressed as Cov(Y_ij, Y_ik) = σ² ρ^|t_ij − t_ik| = σ² exp(−θ |t_ij − t_ik|), where θ = −log(ρ), or ρ = exp(−θ), for θ ≥ 0. The exponential covariance model is invariant under linear transformation of the time scale: if we replace t_ij by (a + b t_ij) (e.g., if we replace time measured in weeks by time measured in days), the same form for the covariance matrix holds.
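The exponential covariance and its invariance under rescaling of the time axis can be sketched as follows, with illustrative σ² and ρ:

```python
import numpy as np

def exponential_cov(times, sigma2, rho):
    # Cov(Y_ij, Y_ik) = sigma2 * exp(-theta * |t_ij - t_ik|), theta = -log(rho)
    theta = -np.log(rho)
    gaps = np.abs(np.subtract.outer(times, times))
    return sigma2 * np.exp(-theta * gaps)

weeks = np.array([0.0, 1.0, 4.0, 6.0])     # unequally spaced occasions
S = exponential_cov(weeks, sigma2=1.0, rho=0.9)

# Invariance under linear transformation of the time scale: rescaling
# weeks to days (b = 7) only rescales theta, leaving Cov(Y_i) unchanged
days = 7.0 * weeks
S_days = exponential_cov(days, sigma2=1.0, rho=0.9 ** (1.0 / 7.0))
print(np.allclose(S, S_days))
```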
134 Choice among Covariance Pattern Models Choice of models for covariance and mean are interdependent. Choice of model for covariance should be based on a maximal model for the mean that minimizes any potential misspecification. With balanced designs and a very small number of discrete covariates, choose saturated model for the mean response. Saturated model: includes main effects of time (regarded as a within-subject factor) and all other main effects, in addition to their two- and higher-way interactions. 134
135 Maximal model should be in a certain sense the most elaborate model for the mean response that we would consider from a subject-matter point of view. Once maximal model has been chosen, residual variation and covariation can be used to select appropriate model for covariance. For nested covariance pattern models, a likelihood ratio test statistic can be constructed that compares full and reduced models. 135
136 Recall: two models are said to be nested when the reduced model is a special case of the full model. For example, the compound symmetry model is nested within the Toeplitz model, since if the former holds the latter must necessarily hold, with ρ_1 = ρ_2 = ... = ρ_(n−1). The likelihood ratio test statistic is obtained by taking twice the difference of the respective maximized REML log-likelihoods, G² = 2(l̂_full − l̂_red), and comparing the statistic to a chi-squared distribution with degrees of freedom equal to the difference between the numbers of covariance parameters in the full and reduced models.
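The test reduces to a small computation; a sketch with made-up log-likelihood values (not from the case study), using scipy for the chi-squared tail probability:

```python
from scipy.stats import chi2

def lrt(loglik_full, loglik_reduced, df):
    """Likelihood ratio test for nested covariance pattern models."""
    g2 = 2.0 * (loglik_full - loglik_reduced)
    return g2, chi2.sf(g2, df)

# e.g. Toeplitz (full) vs compound symmetry (reduced): with n = 5
# occasions the Toeplitz has 5 covariance parameters and CS has 2,
# so df = 5 - 2 = 3 (log-likelihood values here are illustrative)
g2, p = lrt(-1250.3, -1254.0, df=3)
print(g2, p)
```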
137 To compare non-nested models, an alternative approach is the Akaike Information Criterion (AIC). According to the AIC, given a set of competing models for the covariance, one should select the model that minimizes

AIC = −2 (maximized log-likelihood) + 2 (number of parameters) = −2(l̂ − c),

where l̂ is the maximized REML log-likelihood and c is the number of covariance parameters.
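Model selection by AIC is then a one-line computation per candidate; a sketch with made-up REML log-likelihoods (not from the case study):

```python
# AIC = -2*(maximized log-likelihood) + 2*(number of covariance parameters)
def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

# (log-likelihood, number of covariance parameters) per candidate model;
# with n = 5 occasions the unstructured model has n(n+1)/2 = 15 parameters
models = {
    "compound symmetry": (-1254.0, 2),
    "autoregressive":    (-1252.1, 2),
    "unstructured":      (-1248.5, 15),
}

# Select the model that minimizes AIC
best = min(models, key=lambda name: aic(*models[name]))
print(best)
```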
138 Exercise Therapy Trial: subjects were assigned to one of two weightlifting programs to increase muscle strength. Treatment 1: the number of repetitions of the exercises was increased as subjects became stronger. Treatment 2: the number of repetitions was held constant but the amount of weight was increased as subjects became stronger. Measurements of body strength were taken at baseline and on days 2, 4, 6, 8, 10, and 12. We focus only on measures of strength obtained at baseline (day 0) and on days 4, 6, 8, and 12.
139 Before considering models for the covariance, it is necessary to choose a maximal model for the mean response; we chose the saturated model for the mean. First, we consider an unstructured covariance matrix. Note that the variance is larger by the end of the study than at baseline, and the correlations decrease as the time separation between the repeated measures increases.
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationCase Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?
Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationCurriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010
Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different
More informationThis can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.
One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.
More informationBasic Concepts in Research and Data Analysis
Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationInteraction between quantitative predictors
Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors
More informationCOMMON CORE STATE STANDARDS FOR
COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More informationWhat is the purpose of this document? What is in the document? How do I send Feedback?
This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Statistics
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationIndices of Model Fit STRUCTURAL EQUATION MODELING 2013
Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:
More informationthe points are called control points approximating curve
Chapter 4 Spline Curves A spline curve is a mathematical representation for which it is easy to build an interface that will allow a user to design and control the shape of complex curves and surfaces.
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More informationIs the Basis of the Stock Index Futures Markets Nonlinear?
University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Is the Basis of the Stock
More informationBiostatistics Short Course Introduction to Longitudinal Studies
Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal
More informationThe Basic Two-Level Regression Model
2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,
More informationClustering in the Linear Model
Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple
More informationReview of the Methods for Handling Missing Data in. Longitudinal Data Analysis
Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics
More informationAnalysis of Variance. MINITAB User s Guide 2 3-1
3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationTechnical report. in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE
Linear mixedeffects modeling in SPSS AN INTRODUCTION TO THE MIXED PROCEDURE Table of contents Introduction................................................................3 Data preparation for MIXED...................................................3
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
More informationParametric and Nonparametric: Demystifying the Terms
Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD
More informationFitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,
More informationLAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE
LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-
More informationUse of deviance statistics for comparing models
A likelihood-ratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter
More informationIntroduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012
Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 202 ROB CRIBBIE QUANTITATIVE METHODS PROGRAM DEPARTMENT OF PSYCHOLOGY COORDINATOR - STATISTICAL CONSULTING SERVICE COURSE MATERIALS
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationSystematic Reviews and Meta-analyses
Systematic Reviews and Meta-analyses Introduction A systematic review (also called an overview) attempts to summarize the scientific evidence related to treatment, causation, diagnosis, or prognosis of
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More information