Analysis of Correlated Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington
|
|
- Junior Bridges
- 8 years ago
- Views:
Transcription
1 Analysis of Correlated Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Heagerty, 6
2 Course Outline Examples of longitudinal data Correlation and weighting Exploratory data analysis between- and within-person variation correlation / covariance Regression analysis Models for the mean Models for the correlation Linear mixed model GEE Heagerty, 6
3 Course Outline Missing data Binary and Count Data Models for the mean Models for the correlation Generalized linear mixed model GEE Transition models Time-dependent covariates Multilevel models Design considerations 3 Heagerty, 6
4 Bio & Notes Patrick J Heagerty Professor, University of Washington Collaborative roles = RWJ, VA ERIC, NIAMS MCRC, K Books: Diggle, Heagerty, Liang & Zeger Analysis of Longitudinal Data Oxford, van Belle, Fisher, Heagerty & Lumley Biostatistics Wiley, 4 (introductory chapter on LDA) 4 Heagerty, 6
5 Course Notes & Slides UW Biostat 57 = PhD applied core sequence Winter 999,,, UM Epi 766 = Longitudinal Data Analysis / Epi Summer (Summer 4 with VA/UW Biostat/Epi) Second Seattle Symposium (with S Zeger) Fall RAND short course Fall NICHD short course Fall 3 5 Heagerty, 6
6 Longitudinal Data Analysis INTRODUCTION to EXAMPLES AND ISSUES 6 Heagerty, 6
7 Outline Examples of longitudinal data between- and within-person variation correlation / covariance Scientific motivation Opportunities Issues Time scales Cross-sectional contrasts Longitudinal contrasts Correlation and weighting Impact of correlation Weighted estimation 7 Heagerty, 6
8 Continuous Longitudinal Data Example : Cystic Fibrosis and Lung Function There is a large registry of cystic fibrosis patient data Annual measurements include standard pulmonary function measures: FVC, FEV primary outcome: FEV percent predicted covariates: age, gender, genotype Q: Does change in lung function differ by gender and/or genotype? 8 Heagerty, 6
9 8- Heagerty, fev age
10 Continuous Longitudinal Data Example : Beta-carotene and vitamin E a phase II study to ascertain the pharmacokinetics of beta-carotene supplementation and the subsequent impact on vitamin E levels primary outcome: plasma measures taken monthly for 3 months prior to, 9 months during, and 3 months after supplementation covariates: dose (, 5, 3, 45, 6 mg/day) and time Q: What is the time course? Dose-response? Relationship between beta-carotene and vitamin E? 9 Heagerty, 6
11 9- Heagerty, 6 Dose = 45 Subject = 9 Subject = Subject = Subject = 3 beta-carotene (solid) beta-carotene (solid) beta-carotene (solid) beta-carotene (solid) months since randomization months since randomization months since randomization months since randomization Subject = 3 Subject = 3 Subject = 46 Subject = 47 beta-carotene (solid) beta-carotene (solid) beta-carotene (solid) beta-carotene (solid) Vitamin E (dashed) months since randomization months since randomization months since randomization months since randomization
12 9- Heagerty, 6 Dose = 45 Subject = 9 Subject = Subject = Subject = 3 log(beta-carotene) (solid) log(beta-carotene) (solid) log(beta-carotene) (solid) log(beta-carotene) (solid) months since randomization months since randomization months since randomization months since randomization Subject = 3 Subject = 3 Subject = 46 Subject = 47 log(beta-carotene) (solid) log(beta-carotene) (solid) log(beta-carotene) (solid) log(beta-carotene) (solid) Vitamin E (dashed) months since randomization months since randomization months since randomization months since randomization
13 Categorical Longitudinal Data Example 3: Maternal Stress and Child Morbidity daily indicators of stress (maternal), and illness (child) primary outcome: illness, utilization covariates: employment, stress Q: association between employment, stress and morbidity? Heagerty, 6
14 - Heagerty, 6
15 Illness Percent Employed= Employed= Day Stress Percent Employed= Employed= Day - Heagerty, 6
16 -3 Heagerty, 6 subject = 4 subject = subject = 56 Y illness N Y stress N Day Y illness N Y stress N Day Y illness N Y stress N Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day Y illness N Y stress N subject = Day
17 Continuous Longitudinal Data Example 4: PANSS Data PANSS is a standard symptom assessment for schizophrenic patients This study compares different doses of a new agent to a standard agent and to placebo primary outcome: PANSS covariates: treatment, time Q: What s the best treatment? Heagerty, 6
18 - Heagerty, 6 Last Visit = 8 Last Visit = 6 Last Visit = 4 PANSS Score PANSS Score PANSS Score Time Time Time Last Visit = Last Visit = Last Visit = PANSS Score PANSS Score PANSS Score Time Time Time
19 Longitudinal Data In longitudinal studies of health, we typically observe two distinct kinds of outcomes Times of clinical or other key events Repeated values of markers of the health status of participants In general terms, the scientific question is how explanatory variables affect times to clinical events and markers of the level or change in health status over time The relationship between the event times and markers can also be of interest Use markers as predictors of, or surrogates for, the clinical event time Heagerty, 6
20 Longitudinal Studies Benefits of longitudinal studies: Incident events are recorded Measure the new occurrence of disease Timing of disease onset can be correlated with recent changes in patient exposure and/or with chronic exposure Prospective ascertainment of exposure Participants can have their exposure status recorded at multiple follow-up visits This can alleviate recall bias Temporal order of exposures and outcomes is observed 3 Heagerty, 6
21 3 Measurement of individual change in outcomes A key strength of a longitudinal study is the ability to measure change in outcomes and/or exposure at the individual level Longitudinal studies provide the opportunity to observe individual patterns of change 4 Separation of time effects: Cohort, Period, Age When studying change over time there are many time scales to consider cohort scale is the time of birth such as 945 or 963 period is the current time such as 4 age is (period - cohort) A longitudinal study with times t, t, t n can characterize multiple time scales such as age and cohort effects using covariates derived from the calendar time and birth year: age of subject i at time t j is age ij = (t j birth i ); and cohort is cohort ij = birth i 4 Heagerty, 6
22 5 Control for cohort effects In a cross-sectional study the comparison of subgroups of different ages combines the effects of aging and the effects of different cohorts That is, comparison of outcomes measured in 3 among 58 year old subjects and among 4 year old subjects reflects both the fact that the groups differ by 8 years (aging) and the fact that the subjects were born in different eras In a longitudinal study the cohort under study is fixed and thus changes in time are not confounded by cohort differences An nice overview of LDA opportunities in respiratory epidemiology is presented in Weiss and Ware (996) Lebowitz (996) discusses age, period, and cohort effects 5 Heagerty, 6
23 Longitudinal Studies The benefits of a longitudinal design are not without cost There are several challenges posed: Challenges of longitudinal studies: Participant follow-up Risk of bias due to incomplete follow-up, or drop-out of study participants If subjects that are followed to the planned end of study differ from subjects who discontinue follow-up then a naive analysis may provide summaries that are not representative of the original target population 6 Heagerty, 6
24 Analysis of correlated data Statistical analysis of longitudinal data requires methods that can properly account for the intra-subject correlation of response measurements If such correlation is ignored then inferences such as statistical tests or confidence intervals can be grossly invalid 7 Heagerty, 6
25 3 Time-varying covariates Although longitudinal designs offer the opportunity to associate changes in exposure with changes in the outcome of interest, the direction of causality can be complicated by feedback between the outcome and the exposure Example = MSCM with stresss and illness Although scientific interest generally lies in the effect of exposure on health, reciprocal influence between exposure and outcome poses analytical difficulty when trying to separate the effect of exposure on health from the effect of health on exposure How to choose exposure lag? eg Is it the air pollution today, yesterday, or last week that is the important predictor of morbidity today? 8 Heagerty, 6
26 Longitudinal Studies The Scientific Opportunity Observe individual changes over time Characterize the time-course of disease Outcome Measures A single outcome at a fixed follow-up time The time until an event occurs Repeated measures taken over time 9 Heagerty, 6
27 Motivation Cystic Fibrosis and Pulmonary Function Several specific aspects are of interest: What is the rate of decline in FEV? Is the time course different for males and females? 3 Is the time course different for F58 homozygous subjects? Reference: Davis PB (997) Journal of Pediatrics Heagerty, 6
28 Data ID = patient id FEV = percent-predicted forced expiratory volume in second AGE = age (years) GENDER = sex (=male, =female) PSEUDOA = infection with Pseudomonas Aeruginosa (=no, 3=yes) F58 = genotype (=homozygous, =heterozygous, 3=none) PANCREAT = pancreatic enzyme supplmentation (,=no, =yes) Heagerty, 6
29 - Heagerty, 6 EDA: Numerical Summaries Total number of subjects = Number of observations (number of subjects with ni): Distribution of males / females male female 98 Number of mutations of f
30 - Heagerty, 6 EDA: Numerical Summaries Age at entry N = Median = 9655 Quartiles = 7758, 5335 Decimal point is at the colon 5 : : : : : : 3349 : : : : : : : : : 4 : 3778 : 5577 : : 8
31 ID = 57 ID = 5796 ID = 577 ID = 774 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 7 ID = 6345 ID = 884 ID = 749 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 3564 ID = 587 ID = 439 ID = 7755 FEV 5 5 FEV 5 5 FEV PA PA PA PA PC PC PC PC Heagerty, FEV 5 5
32 ID = 743 ID = 977 ID = 876 ID = 83 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 3399 ID = 979 ID = 8645 ID = 7 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 679 ID = 5 ID = 35 ID = 8 FEV 5 5 FEV 5 5 FEV PA PA PA PA PC PC PC PC Heagerty, FEV 5 5
33 ID = 9847 ID = 67 ID = 97 ID = 597 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 736 ID = 4367 ID = 63 ID = 945 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 6699 ID = 394 ID = 54 ID = 7 FEV 5 5 FEV 5 5 FEV PA PA PA PA PC PC PC PC Heagerty, FEV 5 5
34 ID = 837 ID = 74 ID = 76 ID = 74 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 7483 ID = 4864 ID = 786 ID = 754 FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC FEV 5 5 PA PC Age Age Age Age ID = 66 ID = 8579 ID = 383 ID = 9469 FEV 5 5 FEV 5 5 FEV PA PA PA PA PC PC PC PC Heagerty, FEV 5 5
35 -7 Heagerty, 6 FEV versus Age fev 5 5 slope = age
36 FEV versus Age-at-Entry FEV 5 5 slope = Age at Entry FEV change FEV Change versus Age-since-Entry slope = Age since Entry -8 Heagerty, 6
37 Distinguishing Cross-sectional and Longitudinal Associations Cross-sectional data Y i = β C x i + ɛ i, i =,, m () β C represents the difference in average Y across two sub-populations which differ by one unit in x Longitudinal data Y ij = β C x i + β L (x ij x i ) + ɛ ij, j =,, n i i =,, m () Heagerty, 6
38 When j =, the two equations are the same; β C has the same cross-sectional interpretation Subtract equations above to obtain (Y ij Y i ) = β L (x ij x i ) + (ɛ ij ɛ i ) β L represents the expected change in Y per unit change in x 3 Heagerty, 6
39 3- Heagerty, 6 A - Cross-Sectional Data Reading Score o o o o o o o o o o o o Age
40 Cross-sectional Longitudinal Response 5 5 Response Months Months Both Response Months 3- Heagerty, 6
41 Age change in FEV males females FEV by Male/Female Age change in FEV f58 f58 f58 FEV by f Heagerty, 6
42 EDA Summary Observations Systematic trends: time, gender, F58 Random variation: individual, observation Questions Two time scales? Estimation / testing for rates of decline? Other? 4 Heagerty, 6
43 Longitudinal Data Analysis INTRODUCTION to CORRELATION and WEIGHTING 5 Heagerty, 6
44 Longitudinal Data The basic statistical problem is that variables from a given individual are correlated over time (generic) Q: So what? (-) ignoring dependence can lead to invalid inference (-) often limited information regarding dependence (+) can observe change for individuals over time (+) variety of statistical approaches that are available 6 Heagerty, 6
45 Longitudinal Data need to account for the dependence (generic) Q: How? Choice of Model Choice of Estimator 3 Choice of Summaries 7 Heagerty, 6
46 Dependent Data and Proper Variance Estimates Let X ij = denote placebo assignment and X ij = denote active treatment () Consider (Y i, Y i ) with (X i, X i ) = (, ) for i = : n and (X i, X i ) = (, ) for i = (n + ) : n ˆµ = n ˆµ = n n i= j= n Y ij i=n+ j= var(ˆµ ˆµ ) = n {σ ( + ρ)} 8 Heagerty, 6 Y ij
47 8- Heagerty, 6 Scenario subject control treatment time time time time ID = Y, Y, ID = Y, Y, ID = 3 Y 3, Y 3, ID = 4 Y 4, Y 4, ID = 5 Y 5, Y 5, ID = 6 Y 6, Y 6,
48 Dependent Data and Proper Variance Estimates () Consider (Y i, Y i ) with (X i, X i ) = (, ) for i = : n and (X i, X i ) = (, ) for i = (n + ) : n ˆµ = n ˆµ = n { n Y i + i= { n Y i + i= n i=n+ n i=n+ Y i } Y i } var(ˆµ ˆµ ) = n {σ ( ρ)} 9 Heagerty, 6
49 9- Heagerty, 6 Scenario subject control treatment time time time time ID = Y, Y, ID = Y, Y, ID = 3 Y 3, Y 3, ID = 4 Y 4, Y 4, ID = 5 Y 5, Y 5, ID = 6 Y 6, Y 6,
50 Dependent Data and Proper Variance Estimates If we simply had n independent observations on treatment (X = ) and n independent observations on control then we d obtain var(ˆµ ˆµ ) = σ n + σ n = n σ Q: What is the impact of dependence relative to the situation where all (n + n) observations are independent? () positive dependence, ρ >, results in a loss of precision () positive dependence, ρ >, results in an improvement in precision! 3 Heagerty, 6
51 Therefore: Dependent data impacts proper statements of precision Dependent data may increase or decrease standard errors depending on the design 3 Heagerty, 6
52 Weighted Estimation Consider the situation where subjects report both the number of attempts and the number of successes: (Y i, N i ) Examples: live born (Y i ) in a litter (N i ) condoms used (Y i ) in sexual encounters (N i ) SAEs (Y i ) among total surgeries (N i ) Q: How to combine these data from i = : m subjects to estimate a common rate (proportion) of successes? 3 Heagerty, 6
53 Weighted Estimation Proposal : ˆp = i Y i / i N i Proposal : Simple Example: ˆp = m Y i /N i i Data : (, ) (, ) ˆp = ( + )/() = 3 ˆp = {/ + /} = 5 33 Heagerty, 6
54 Weighted Estimation Note: Each of these estimators, ˆp, and ˆp, can be viewed as weighted estimators of the form: ˆp w = { i w i Y i N i } / i We obtain ˆp by letting w i = N i, corresponding to equal weight given each to binary outcome, Y ij, Y i = N i j= Y ij We obtain ˆp by letting w i =, corresponding to equal weight given to each subject w i Q: What s optimal? 34 Heagerty, 6
55 Weighted Estimation A: Whatever weights are closest to /variance of Y i /N i (stat theory called Gauss-Markov ) If subjects are perfectly homogeneous then and ˆp is best V (Y i ) = N i p( p) If subjects are heterogeneous then, for example V (Y i ) = N i p( p){ + (N i )ρ} and an estimator closer to ˆp is best 35 Heagerty, 6
56 Summary Longitudinal (dependent) data are common (and interesting!) Inference must account for the dependence Consideration as to the choice of weighting will depend on the variance/covariance of the response variables 36 Heagerty, 6
57 Summary Methodological Issues in the Study of Cognitive Decline Morris et al AJE (999) In modeling associations between risk factors and change in cognitive function, the usual analytic issues are complicated by the need to consider time in the analysis The potential for both fruitful investigation and misinterpretation of the data is increased Analyses that make the most effective use of the data usually require the most advanced methods of longitudinal analysis 37 Heagerty, 6
58 Summary Methodological Issues in the Study of Cognitive Decline Morris et al AJE (999) Use of the longitudinal designs and advanced statistical models described here is well worth the extra effort required Meeting the special challenges of measuring change in cognitive function should improve our ability to identify risk factors for cognitive decline and other diseases related to aging Many of these issues apply to any study of disease processes entailing change with age 38 Heagerty, 6
59 Some References: Books Diggle PJ, Heagerty PJ, Liang K-Y, Zeger SL () Analysis of Longitudinal Data, Second Edition, Oxford University Press Fitzmaurice GM, Laird NM, Ware JM (4) Applied Longitudinal Analysis, Wiley Verbeke G, Molenberghs G () Linear Mixed Models for Longitudinal Data, Springer Singer JD, Willett JB (3) Applied Longitudinal Data Analysis, Oxford University Press 39 Heagerty, 6
60 Longitudinal Data Analysis INTRODUCTION to REGRESSION APPROACHES 4 Heagerty, 6
61 Linear Mixed Model Regression model: mean response as a function of covariates systematic variation Random effects: variation from subject-to-subject in trajectory random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 4 Heagerty, 6
62 Scientific Questions as Regression Questions concerning the rate of decline refer to the time slope for FEV: E[ FEV X = age, gender, f58] = β (X) + β (X) time Time Scales Let age = age-at-entry, age i Let agel = time-since-entry, age ij age i 4 Heagerty, 6
63 CF Regression Model Model: E[FEV X i ] = β +β age + β agel +β 3 female +β 4 f58 = + β 5 f58 = +β 6 female agel +β 7 f58 = agel + β 8 f58 = agel = β (X i ) + β (X i ) agel 43 Heagerty, 6
64 43- Heagerty, 6 Intercept f58= f58= f58= male β + β age β + β age β + β age +β 4 +β 5 female β + β age β + β age β + β age +β 3 +β 3 + β 4 +β 3 + β 5
65 43- Heagerty, 6 Slope f58= f58= f58= male β β + β 7 β + β 8 female β β + β 7 β + β 8 +β 6 +β 6 +β 6
66 43-3 Heagerty, 6 Gender Groups (f58==) FEV Male Female Age (years)
67 43-4 Heagerty, 6 Genotype Groups (male) FEV f58= f58= f58= Age (years)
68 Define Y ij = FEV for subject i at time t ij X i = (X ij,, X ini ) X ij = (X ij,, X ij,,, X ij,p ) age, agel, gender, genotype Issue: Response variables measured on the same subject are correlated cov(y ij, Y ik ) 44 Heagerty, 6
69 Some Notation It is useful to have some notation that can be used to discuss the stack of data that correspond to each subject Let n i denote the number of observations for subject i Define: Y i = Y i Y i Y ini If the subjects are observed at a common set of times t, t,, t m then E(Y ij ) = µ j denotes the mean of the population at time t j 45 Heagerty, 6
70 Dependence and Correlation Recall that observations are termed independent when deviation in one variable does not predict deviation in the other variable Given two subjects with the same age and gender, then the blood pressure for patient ID= is not predictive of the blood pressure for patient ID=334 Observations are called dependent or correlated when one variable does predict the value of another variable The LDL cholesterol of patient ID= at age 57 is predictive of the LDL cholesterol of patient ID= at age 6 46 Heagerty, 6
71 Dependence and Correlation Recall: The variance of a variable, Y ij (fix time t j for now) is defined as: σ j = E [ (Y ij µ j ) ] = E [(Y ij µ j )(Y ij µ j )] The variance measures the average distance that an observation falls away from the mean 47 Heagerty, 6
72 Dependence and Correlation Define: The covariance of two variables, Y ij, and Y ik (fix t j and t k ) is defined as: σ jk = E [(Y ij µ j )(Y ik µ k )] The covariance measures whether, on average, departures in one variable, Y ij µ j, go together with departures in a second variable, Y ik µ k In simple linear regression of Y ij on Y ik the regression coefficient β in E(Y ij Y ik ) = β + β Y ik is the covariance divided by the variance of Y ik : β = σ jk σ k 48 Heagerty, 6
73 Dependence and Correlation Define: The correlation of two variables, Y ij, and Y ik (fix t j and t k ) is defined as: ρ jk = E [(Y ij µ j )(Y ik µ k )] σ j σ k The correlation is a measure of dependence that takes values between - and + Recall that a correlation of implies that the two measures are unrelated (linearly) Recall that a correlation of implies that the two measures fall perfectly on a line one exactly predicts the other! 49 Heagerty, 6
74 Why interest in covariance and/or correlation? Recall that on pages 8 and 9 our standard error for the sample mean difference µ µ depends on ρ In general a statistical model for the outcomes Y i = (Y i, Y i,, Y ini ) requires the following: Means: µ j Variances: σ j Covariances: σ jk, or correlations ρ jk Therefore, one approach to making inferences based on longitudinal data is to construct a model for each of these three components 5 Heagerty, 6
75 Something new to model cov(y i ) = var(y i ) cov(y i, Y i ) cov(y i, Y ini ) cov(y i, Y i ) var(y i ) cov(y i, Y ini ) cov(y ini, Y i ) cov(y ini, Y i ) var(y ini ) = σ σ σ ρ σ σ ni ρ ni σ σ ρ σ σ σ ni ρ ni σ ni σ ρ ni σ ni σ ρ ni σ n i 5 Heagerty, 6
76 Mean and Covariance Models for FEV Models: E(Y ij X i ) = µ ij (regression) cov(y i X i ) = Σ i = Z i DZ T i } {{ } + R }{{} i between subjects within subjects Q: What are appropriate covariance models for the FEV data? 5 Heagerty, 6
77 5- Heagerty, 6 Wide Data Data array for the residuals (first rows of rmat ) age8 age age age4 age6 age8 age age age4 [,] NA NA NA NA NA [,] NA NA NA NA NA [3,] NA NA NA NA NA [4,] NA NA NA NA NA NA NA -4-6 [5,] NA NA NA NA NA [6,] NA NA NA NA [7,] NA NA NA NA NA [8,] NA NA NA NA NA [9,] NA NA NA NA NA NA [,] NA NA NA NA
78 age age age age 4 age age 8 age age *^ 99975*^ 6*^ age 4 5- Heagerty, 6
79 5-3 Heagerty, 6 Empirical Covariance Matrix: [,] [,] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,] NA NA NA [,] NA NA NA [3,] NA [4,] [5,] [6,] [7,] [8,] [9,] 687 Empirical Correlation Matrix: [,] [,] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,] NA NA NA [,] NA NA NA [3,] NA [4,] [5,] [6,] [7,] [8,] 78 [9,]
80 5-4 Heagerty, 6 Number of observations (pairs): [,] [,] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,] [,] [3,] [4,] [5,] [6,] [7,] [8,] 66 4 [9,] 48
81 How to build models for correlation? Mixed models random effects within-subject similarity due to sharing trajectory Serial correlation close in time implies strong similarity correlation decreases as time separation increases 53 Heagerty, 6
82 Linear Mixed Model Regression model: mean response as a function of covariates systematic variation Random effects: variation from subject-to-subject in trajectory random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 54 Heagerty, 6
83 54- Heagerty, 6 Two Subjects Response Months
84 Levels of Analysis We first consider the distribution of measurements within subjects: Y ij = β,i + β,i t ij + e ij e ij N (, σ ) E[Y i X i, β i ] = β,i + β,i t ij = [, time ij ] β,i β,i = X i β i 55 Heagerty, 6
85 Levels of Analysis We can equivalently separate the subject-specific regression coefficients into the average coefficient and the specific departure for subject i: β,i = β + b,i β,i = β + b,i This allows another perspective: Y ij = β,i + β,i t ij + e ij = (β + β t ij ) + (b,i + b,i t ij ) + e ij E[Y i X i, β i ] = X i β } {{ } + X i b } {{ } i mean model between-subject 56 Heagerty, 6
86 56- Heagerty, 6 Sample of Lines FEV Age (years)
87 56- Heagerty, 6 Intercepts and Slopes Slope Deviation (b) Intercept Deviation (b)
88 Levels of Analysis Next we consider the distribution of patterns (parameters) among subjects: equivalently β i N (β, D) b i N (, D) Y i = X i β } {{ } + X i b } {{ } i + e }{{} i mean model between-subject within-subject 57 Heagerty, 6
89 57- Heagerty, 6 Fixed intercept, Fixed slope Random intercept, Fixed slope beta + beta * time beta + beta * time time Fixed intercept, Random slope time Random intercept, Random slope beta + beta * time beta + beta * time time time
90 Between-subject Variation We can use the idea of random effects to allow different types of between-subject heterogeneity: The magnitude of heterogeneity is characterized by D: b i = var(b i ) = b,i b,i D D D D 58 Heagerty, 6
91 Between-subject Variation The components of D can be interpreted as: D the typical subject-to-subject deviation in the overall level of the response D the typical subject-to-subject deviation in the change (time slope) of the response D the covariance between individual intercepts and slopes If positive then subjects with high levels also have high rates of change If negative then subjects with high levels have low rates of change 59 Heagerty, 6
92 59- Heagerty, 6 Fixed intercept, Fixed slope Random intercept, Fixed slope beta + beta * time beta + beta * time time Fixed intercept, Random slope time Random intercept, Random slope beta + beta * time beta + beta * time time time
93 Between-subject Variation: Examples No random effects: Y ij = β + β t ij + e ij = [, time ij ]β + e ij Random intercepts: Y ij = (β + β t ij ) + b,i + e ij = [, time ij ]β + [ ]b,i + e ij Random intercepts and slopes: Y ij = (β + β t ij ) + b,i + b,i t ij + e ij = [, time ij ]β + [, time ij ]b i + e ij 6 Heagerty, 6
94 6- Heagerty, 6 Mixed Models and Covariances/Correlation Q: What is the correlation between outcomes Y ij and Y ik under these random effects models? Random Intercept Model Y ij = β + β t ij + b,i + e ij Y ik = β + β t ik + b,i + e ik var(y ij ) = var(b,i ) + var(e ij ) = D + σ cov(y ij, Y ik ) = cov(b,i + e ij, b,i + e ik ) = D
95 6- Heagerty, 6 Mixed Models and Covariances/Correlation Random Intercept Model corr(y ij, Y ik ) = D D + σ D + σ = D D + σ = between var between var + within var Therefore, any two outcomes have the same correlation Doesn t depend on the specific times, nor on the distance between the measurements Exchangeable correlation model Assuming: var(e ij ) = σ, and cov(e ij, e ik ) =
96 6-3 Heagerty, 6 Mixed Models and Covariances/Correlation Random Intercept and Slope Model Y ij = (β + β t ij ) + (b,i + b,i t ij ) + e ij Y ik = (β + β t ik ) + (b,i + b,i t ik ) + e ik var(y ij ) = var(b,i + b,i t ij ) + var(e ij ) = D + D t ij + D t ij + σ cov(y ij, Y ik ) = cov[(b,i + b,i t ij + e ij ), (b,i + b,i t ik + e ik )] = D + D (t ij + t ik ) + D t ij t ik
97 6-4 Heagerty, 6 Mixed Models and Covariances/Correlation Random Intercept and Slope Model ρ ijk = corr(y ij, Y ik ) = D + D (t ij + t ik ) + D t ij t ik D + D t ij + D t ij + σ D + D t ik + D t ik + σ Therefore, two outcomes may not have the same correlation Correlation depends on the specific times for the observations, and does not have a simple form Assuming: var(e ij ) = σ, and cov(e ij, e ik ) =
98 Linear Mixed Model Regression model: mean response as a function of covariates systematic variation Random effects: variation from subject-to-subject in trajectory random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 6 Heagerty, 6
99 Covariance Models Serial Models Linear mixed models assume that each subject follows his/her own line In some situations the dependence is more local meaning that observations close in time are more similar than those far apart in time 6 Heagerty, 6
100 Covariance Models Define e ij = ρ e ij + ɛ ij ɛ ij N (, σ ( ρ ) ) ɛ i N (, σ ) This leads to autocorrelated errors: cov(e ij, e ik ) = σ ρ j k 63 Heagerty, 6
101 ID=, rho=5 ID=, rho=5 ID=3, rho=5 ID=4, rho=5 y y y y months months months months ID=5, rho=5 ID=6, rho=5 ID=7, rho=5 ID=8, rho=5 y y y y months months months months ID=9, rho=5 ID=, rho=5 ID=, rho=5 ID=, rho=5 y y y y months months months months ID=3, rho=5 ID=4, rho=5 ID=5, rho=5 ID=6, rho=5 y y y y months months months months ID=7, rho=5 ID=8, rho=5 ID=9, rho=5 ID=, rho=5 y y y y months months months months 63- Heagerty, 6
102 ID=, rho=9 ID=, rho=9 ID=3, rho=9 ID=4, rho=9 y y y y months months months months ID=5, rho=9 ID=6, rho=9 ID=7, rho=9 ID=8, rho=9 y y y y months months months months ID=9, rho=9 ID=, rho=9 ID=, rho=9 ID=, rho=9 y y y y months months months months ID=3, rho=9 ID=4, rho=9 ID=5, rho=9 ID=6, rho=9 y y y y months months months months ID=7, rho=9 ID=8, rho=9 ID=9, rho=9 ID=, rho=9 y y y y months months months months 63- Heagerty, 6
103 63-3 Heagerty, 6 Two Subjects Response line for subject average line line for subject subject dropout Months
104 Linear Mixed Model Regression model: mean response as a function of covariates systematic variation Random effects: variation from subject-to-subject in trajectory random between-subject variation Within-subject variation: variation of individual observations over time random within-subject variation 64 Heagerty, 6
Chapter 1. Longitudinal Data Analysis. 1.1 Introduction
Chapter 1 Longitudinal Data Analysis 1.1 Introduction One of the most common medical research designs is a pre-post study in which a single baseline health status measurement is obtained, an intervention
More informationIntroduction to mixed model and missing data issues in longitudinal studies
Introduction to mixed model and missing data issues in longitudinal studies Hélène Jacqmin-Gadda INSERM, U897, Bordeaux, France Inserm workshop, St Raphael Outline of the talk I Introduction Mixed models
More informationAnalysis of Longitudinal Data for Inference and Prediction. Patrick J. Heagerty PhD Department of Biostatistics University of Washington
Analysis of Longitudinal Data for Inference and Prediction Patrick J. Heagerty PhD Department of Biostatistics University of Washington 1 ISCB 2010 Outline of Short Course Inference How to handle stochastic
More informationOverview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models
Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear
More informationLongitudinal Data Analysis
Longitudinal Data Analysis Acknowledge: Professor Garrett Fitzmaurice INSTRUCTOR: Rino Bellocco Department of Statistics & Quantitative Methods University of Milano-Bicocca Department of Medical Epidemiology
More informationIntroduction to Longitudinal Data Analysis
Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction
More informationTips for surviving the analysis of survival data. Philip Twumasi-Ankrah, PhD
Tips for surviving the analysis of survival data Philip Twumasi-Ankrah, PhD Big picture In medical research and many other areas of research, we often confront continuous, ordinal or dichotomous outcomes
More informationA LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop
A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter
More informationAn Application of the G-formula to Asbestos and Lung Cancer. Stephen R. Cole. Epidemiology, UNC Chapel Hill. Slides: www.unc.
An Application of the G-formula to Asbestos and Lung Cancer Stephen R. Cole Epidemiology, UNC Chapel Hill Slides: www.unc.edu/~colesr/ 1 Acknowledgements Collaboration with David B. Richardson, Haitao
More informationOrganizing Your Approach to a Data Analysis
Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize
More informationSPPH 501 Analysis of Longitudinal & Correlated Data September, 2012
SPPH 501 Analysis of Longitudinal & Correlated Data September, 2012 TIME & PLACE: Term 1, Tuesday, 1:30-4:30 P.M. LOCATION: INSTRUCTOR: OFFICE: SPPH, Room B104 Dr. Ying MacNab SPPH, Room 134B TELEPHONE:
More informationStatistical Rules of Thumb
Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN
More informationMATH5885 LONGITUDINAL DATA ANALYSIS
MATH5885 LONGITUDINAL DATA ANALYSIS Semester 1, 2013 CRICOS Provider No: 00098G 2013, School of Mathematics and Statistics, UNSW MATH5885 Course Outline Information about the course Course Authority: William
More informationElements of statistics (MATH0487-1)
Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -
More informationBiostatistics Short Course Introduction to Longitudinal Studies
Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal
More informationUSC Columbia, Lancaster, Salkehatchie, Sumter & Union campuses
NEW COURSE PROPOSAL USC Columbia, Lancaster, Salkehatchie, Sumter & Union campuses NCP INSTRUCTIONS: This form is used to add a new course to the University course database. This form is available online
More information4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4
4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression
More informationDepartment of Epidemiology and Public Health Miller School of Medicine University of Miami
Department of Epidemiology and Public Health Miller School of Medicine University of Miami BST 630 (3 Credit Hours) Longitudinal and Multilevel Data Wednesday-Friday 9:00 10:15PM Course Location: CRB 995
More informationThe Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities
The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1
More informationA course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011
A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 1 Abstract We report back on methods used for the analysis of longitudinal data covered by the course We draw lessons for
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationSPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg
SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way
More informationSection 14 Simple Linear Regression: Introduction to Least Squares Regression
Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship
More informationPRACTICE PROBLEMS FOR BIOSTATISTICS
PRACTICE PROBLEMS FOR BIOSTATISTICS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION 1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period.
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationStudy Design and Statistical Analysis
Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationSchool of Public Health and Health Services Department of Epidemiology and Biostatistics
School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated
More informationTime series analysis as a framework for the characterization of waterborne disease outbreaks
Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationHow to choose an analysis to handle missing data in longitudinal observational studies
How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationThe Latent Variable Growth Model In Practice. Individual Development Over Time
The Latent Variable Growth Model In Practice 37 Individual Development Over Time y i = 1 i = 2 i = 3 t = 1 t = 2 t = 3 t = 4 ε 1 ε 2 ε 3 ε 4 y 1 y 2 y 3 y 4 x η 0 η 1 (1) y ti = η 0i + η 1i x t + ε ti
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More information1 Simple Linear Regression I Least Squares Estimation
Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and
More informationElementary Statistics
Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,
More informationThe 3-Level HLM Model
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 2 Basic Characteristics of the 3-level Model Level-1 Model Level-2 Model Level-3 Model
More informationGeostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt
More informationBiostatistics: Types of Data Analysis
Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS
More informationAnalysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
More informationAppendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.
Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationExercise 1.12 (Pg. 22-23)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationThe Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.
The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationA Review of Cross Sectional Regression for Financial Data You should already know this material from previous study
A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL
More informationSelection bias in secondary analysis of electronic health record data. Sebastien Haneuse, PhD
Selection bias in secondary analysis of electronic health record data Sebastien Haneuse, PhD Department of Biostatistics Harvard School of Public Health 1 Symposium on Health Care Data Analytics, Seattle,
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationStatistics 151 Practice Midterm 1 Mike Kowalski
Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and
More informationLinear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares
Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects
More informationExploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationIllustration (and the use of HLM)
Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will
More informationLongitudinal Modeling of Lung Function in Respiratory Drug Development
Longitudinal Modeling of Lung Function in Respiratory Drug Development Fredrik Öhrn, PhD Senior Clinical Pharmacometrician Quantitative Clinical Pharmacology AstraZeneca R&D Mölndal, Sweden Outline A brief
More informationUsing Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses
Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract
More informationAssignments Analysis of Longitudinal data: a multilevel approach
Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans
More informationIntroduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012
Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 202 ROB CRIBBIE QUANTITATIVE METHODS PROGRAM DEPARTMENT OF PSYCHOLOGY COORDINATOR - STATISTICAL CONSULTING SERVICE COURSE MATERIALS
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationExample: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
More informationIntroduction to Regression and Data Analysis
Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it
More informationIntroduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationEconometric Methods for Panel Data
Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies
More informationMultilevel Models for Longitudinal Data. Fiona Steele
Multilevel Models for Longitudinal Data Fiona Steele Aims of Talk Overview of the application of multilevel (random effects) models in longitudinal research, with examples from social research Particular
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
More informationSYSTEMS OF REGRESSION EQUATIONS
SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations
More informationDescriptive Statistics and Exploratory Data Analysis
Descriptive Statistics and Exploratory Data Analysis Dean s s Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux,
More informationAnalyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest
Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t
More information10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
More informationWhen to Use a Particular Statistical Test
When to Use a Particular Statistical Test Central Tendency Univariate Descriptive Mode the most commonly occurring value 6 people with ages 21, 22, 21, 23, 19, 21 - mode = 21 Median the center value the
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationIntroduction to Data Analysis in Hierarchical Linear Models
Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM
More informationChapter 11 Introduction to Survey Sampling and Analysis Procedures
Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152
More informationFactors affecting online sales
Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationFIXED AND MIXED-EFFECTS MODELS FOR MULTI-WATERSHED EXPERIMENTS
FIXED AND MIXED-EFFECTS MODELS FOR MULTI-WATERSHED EXPERIMENTS Jack Lewis, Mathematical Statistician, Pacific Southwest Research Station, 7 Bayview Drive, Arcata, CA, email: jlewis@fs.fed.us Abstract:
More informationVariables. Exploratory Data Analysis
Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is
More informationSchools Value-added Information System Technical Manual
Schools Value-added Information System Technical Manual Quality Assurance & School-based Support Division Education Bureau 2015 Contents Unit 1 Overview... 1 Unit 2 The Concept of VA... 2 Unit 3 Control
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationFinal Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
More informationStat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015
Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field
More informationReview of the Methods for Handling Missing Data in. Longitudinal Data Analysis
Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics
More informationHighlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression
GLMM tutor Outline 1 Highlights the connections between different class of widely used models in psychological and biomedical studies. ANOVA Multiple Regression LM Logistic Regression GLM Correlated data
More informationProblem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationHLM software has been one of the leading statistical packages for hierarchical
Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush
More information