Analysis of Correlated Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington


1 Analysis of Correlated Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Heagerty, 6

2 Course Outline Examples of longitudinal data Correlation and weighting Exploratory data analysis between- and within-person variation correlation / covariance Regression analysis Models for the mean Models for the correlation Linear mixed model GEE Heagerty, 6

3 Course Outline Missing data Binary and Count Data Models for the mean Models for the correlation Generalized linear mixed model GEE Transition models Time-dependent covariates Multilevel models Design considerations 3 Heagerty, 6

4 Bio & Notes Patrick J Heagerty Professor, University of Washington Collaborative roles = RWJ, VA ERIC, NIAMS MCRC, K Books: Diggle, Heagerty, Liang & Zeger Analysis of Longitudinal Data Oxford, van Belle, Fisher, Heagerty & Lumley Biostatistics Wiley, 4 (introductory chapter on LDA) 4 Heagerty, 6

5 Course Notes & Slides UW Biostat 57 = PhD applied core sequence Winter 999,,, UM Epi 766 = Longitudinal Data Analysis / Epi Summer (Summer 4 with VA/UW Biostat/Epi) Second Seattle Symposium (with S Zeger) Fall RAND short course Fall NICHD short course Fall 3 5 Heagerty, 6

6 Longitudinal Data Analysis INTRODUCTION to EXAMPLES AND ISSUES 6 Heagerty, 6

7 Outline Examples of longitudinal data between- and within-person variation correlation / covariance Scientific motivation Opportunities Issues Time scales Cross-sectional contrasts Longitudinal contrasts Correlation and weighting Impact of correlation Weighted estimation 7 Heagerty, 6

8 Continuous Longitudinal Data
Example 1: Cystic Fibrosis and Lung Function
There is a large registry of cystic fibrosis patient data.
Annual measurements include standard pulmonary function measures: FVC, FEV1
primary outcome: FEV1 percent predicted
covariates: age, gender, genotype
Q: Does change in lung function differ by gender and/or genotype?

9 [Figure: FEV versus age]

10 Continuous Longitudinal Data
Example 2: Beta-carotene and vitamin E
a phase II study to ascertain the pharmacokinetics of beta-carotene supplementation and the subsequent impact on vitamin E levels
primary outcome: plasma measures taken monthly for 3 months prior to, 9 months during, and 3 months after supplementation
covariates: dose (0, 15, 30, 45, 60 mg/day) and time
Q: What is the time course? Dose-response? Relationship between beta-carotene and vitamin E?

11 [Figure: beta-carotene (solid) and vitamin E (dashed) versus months since randomization, one panel per subject, dose = 45]

12 [Figure: log(beta-carotene) (solid) and vitamin E (dashed) versus months since randomization, one panel per subject, dose = 45]

13 Categorical Longitudinal Data Example 3: Maternal Stress and Child Morbidity daily indicators of stress (maternal), and illness (child) primary outcome: illness, utilization covariates: employment, stress Q: association between employment, stress and morbidity? Heagerty, 6

15 [Figure: percent with illness and percent with stress versus day, by employment group]

16 [Figure: daily illness (Y/N) and stress (Y/N) indicators versus day, one panel per subject]

17 Continuous Longitudinal Data
Example 4: PANSS Data
PANSS is a standard symptom assessment for schizophrenic patients.
This study compares different doses of a new agent to a standard agent and to placebo.
primary outcome: PANSS
covariates: treatment, time
Q: What's the best treatment?

18 [Figure: PANSS score versus time, one panel per last-visit group]

19 Longitudinal Data
In longitudinal studies of health, we typically observe two distinct kinds of outcomes:
  Times of clinical or other key events
  Repeated values of markers of the health status of participants
In general terms, the scientific question is how explanatory variables affect times to clinical events and markers of the level or change in health status over time.
The relationship between the event times and markers can also be of interest:
  Use markers as predictors of, or surrogates for, the clinical event time

20 Longitudinal Studies
Benefits of longitudinal studies:
1. Incident events are recorded
   Measure the new occurrence of disease.
   Timing of disease onset can be correlated with recent changes in patient exposure and/or with chronic exposure.
2. Prospective ascertainment of exposure
   Participants can have their exposure status recorded at multiple follow-up visits.
   This can alleviate recall bias.
   Temporal order of exposures and outcomes is observed.

21 3. Measurement of individual change in outcomes
   A key strength of a longitudinal study is the ability to measure change in outcomes and/or exposure at the individual level.
   Longitudinal studies provide the opportunity to observe individual patterns of change.
4. Separation of time effects: Cohort, Period, Age
   When studying change over time there are many time scales to consider:
   cohort is the time of birth, such as 1945 or 1963
   period is the current time, such as 2004
   age is (period - cohort)
   A longitudinal study with times $t_1, t_2, \ldots, t_n$ can characterize multiple time scales such as age and cohort effects using covariates derived from the calendar time and birth year: the age of subject $i$ at time $t_j$ is $\text{age}_{ij} = (t_j - \text{birth}_i)$, and the cohort is $\text{cohort}_{ij} = \text{birth}_i$.

22 5. Control for cohort effects
   In a cross-sectional study the comparison of subgroups of different ages combines the effects of aging and the effects of different cohorts. That is, a comparison of outcomes measured in 2003 among 58-year-old subjects and among 40-year-old subjects reflects both the fact that the groups differ by 18 years (aging) and the fact that the subjects were born in different eras.
   In a longitudinal study the cohort under study is fixed, and thus changes in time are not confounded by cohort differences.
A nice overview of LDA opportunities in respiratory epidemiology is presented in Weiss and Ware (1996). Lebowitz (1996) discusses age, period, and cohort effects.

23 Longitudinal Studies
The benefits of a longitudinal design are not without cost. There are several challenges posed.
Challenges of longitudinal studies:
1. Participant follow-up
   Risk of bias due to incomplete follow-up, or drop-out of study participants.
   If subjects who are followed to the planned end of study differ from subjects who discontinue follow-up, then a naive analysis may provide summaries that are not representative of the original target population.

24 2. Analysis of correlated data
   Statistical analysis of longitudinal data requires methods that can properly account for the intra-subject correlation of response measurements.
   If such correlation is ignored, then inferences such as statistical tests or confidence intervals can be grossly invalid.

25 3. Time-varying covariates
   Although longitudinal designs offer the opportunity to associate changes in exposure with changes in the outcome of interest, the direction of causality can be complicated by feedback between the outcome and the exposure.
   Example: MSCM with stress and illness.
   Although scientific interest generally lies in the effect of exposure on health, reciprocal influence between exposure and outcome poses analytical difficulty when trying to separate the effect of exposure on health from the effect of health on exposure.
   How to choose the exposure lag? E.g., is it the air pollution today, yesterday, or last week that is the important predictor of morbidity today?

26 Longitudinal Studies The Scientific Opportunity Observe individual changes over time Characterize the time-course of disease Outcome Measures A single outcome at a fixed follow-up time The time until an event occurs Repeated measures taken over time 9 Heagerty, 6

27 Motivation
Cystic Fibrosis and Pulmonary Function
Several specific aspects are of interest:
1. What is the rate of decline in FEV1?
2. Is the time course different for males and females?
3. Is the time course different for F508 homozygous subjects?
Reference: Davis PB (1997) Journal of Pediatrics

28 Data
ID = patient id
FEV = percent-predicted forced expiratory volume in 1 second
AGE = age (years)
GENDER = sex (=male, =female)
PSEUDOA = infection with Pseudomonas aeruginosa (=no, 3=yes)
F508 = genotype (=homozygous, =heterozygous, 3=none)
PANCREAT = pancreatic enzyme supplementation (,=no, =yes)

29 - Heagerty, 6 EDA: Numerical Summaries Total number of subjects = Number of observations (number of subjects with ni): Distribution of males / females male female 98 Number of mutations of f

30 EDA: Numerical Summaries
Age at entry N = Median = 9655 Quartiles = 7758, 5335
[Stem-and-leaf display of age at entry]

31 [Figures, slides 31 to 34: FEV versus age for individual subjects, one panel per patient ID, with PA and PC indicators marked along the time axis]

35 [Figure: FEV versus age, with fitted slope]

36 [Figures: FEV versus age-at-entry, and FEV change versus age-since-entry, each with fitted slope]

37 Distinguishing Cross-sectional and Longitudinal Associations
Cross-sectional data:
$$Y_{i1} = \beta_C x_{i1} + \epsilon_{i1}, \quad i = 1, \ldots, m \qquad (1)$$
$\beta_C$ represents the difference in average $Y$ across two sub-populations which differ by one unit in $x$.
Longitudinal data:
$$Y_{ij} = \beta_C x_{i1} + \beta_L (x_{ij} - x_{i1}) + \epsilon_{ij}, \quad j = 1, \ldots, n_i; \; i = 1, \ldots, m \qquad (2)$$

38 When $j = 1$, the two equations are the same; $\beta_C$ has the same cross-sectional interpretation.
Subtract equation (1) from equation (2) to obtain
$$(Y_{ij} - Y_{i1}) = \beta_L (x_{ij} - x_{i1}) + (\epsilon_{ij} - \epsilon_{i1})$$
$\beta_L$ represents the expected change in $Y$ per unit change in $x$.
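The distinction between $\beta_C$ and $\beta_L$ can be illustrated with a small simulation. The sketch below is not from the notes: the data are hypothetical, generated from model (2) with deliberately different cross-sectional and longitudinal slopes, and the function names, sample sizes, and parameter values are assumptions chosen for illustration. It estimates $\beta_C$ from the baseline observations alone and $\beta_L$ from within-subject differences.

```python
import random

def slope_through_origin(x, y):
    """Least-squares slope for y = b * x with no intercept, as in models (1) and (2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def simulate(m=500, n_i=4, beta_c=2.0, beta_l=-1.0, sd=0.5, seed=1):
    """Hypothetical data generated from model (2); returns (beta_C_hat, beta_L_hat)."""
    rng = random.Random(seed)
    base_x, base_y, diff_x, diff_y = [], [], [], []
    for _ in range(m):
        x1 = rng.uniform(5.0, 15.0)               # baseline covariate x_i1
        y1 = beta_c * x1 + rng.gauss(0.0, sd)     # j = 1: model (1)
        base_x.append(x1)
        base_y.append(y1)
        for j in range(1, n_i):
            xj = x1 + j                           # covariate grows by one unit per visit
            yj = beta_c * x1 + beta_l * (xj - x1) + rng.gauss(0.0, sd)
            diff_x.append(xj - x1)                # within-subject change in x
            diff_y.append(yj - y1)                # within-subject change in Y
    return slope_through_origin(base_x, base_y), slope_through_origin(diff_x, diff_y)

beta_c_hat, beta_l_hat = simulate()
print(f"beta_C estimate: {beta_c_hat:.3f}, beta_L estimate: {beta_l_hat:.3f}")
```

With cross-sectional data only, the first estimate is all that is available; the second requires repeated measurements on the same subjects.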

39 [Figure: A - Cross-Sectional Data; reading score versus age]

40 [Figure: response versus months in three panels: cross-sectional, longitudinal, both]

41 [Figures: change in FEV versus age, by male/female and by f508 genotype]

42 EDA Summary Observations Systematic trends: time, gender, F58 Random variation: individual, observation Questions Two time scales? Estimation / testing for rates of decline? Other? 4 Heagerty, 6

43 Longitudinal Data Analysis INTRODUCTION to CORRELATION and WEIGHTING 5 Heagerty, 6

44 Longitudinal Data The basic statistical problem is that variables from a given individual are correlated over time (generic) Q: So what? (-) ignoring dependence can lead to invalid inference (-) often limited information regarding dependence (+) can observe change for individuals over time (+) variety of statistical approaches that are available 6 Heagerty, 6

45 Longitudinal Data need to account for the dependence (generic) Q: How? Choice of Model Choice of Estimator 3 Choice of Summaries 7 Heagerty, 6

46 Dependent Data and Proper Variance Estimates
Let $X_{ij} = 0$ denote placebo assignment and $X_{ij} = 1$ denote active treatment. Throughout, $\sigma^2 = \text{var}(Y_{ij})$ and $\rho = \text{corr}(Y_{i1}, Y_{i2})$.
(1) Consider $(Y_{i1}, Y_{i2})$ with $(X_{i1}, X_{i2}) = (0, 0)$ for $i = 1 : n$ and $(X_{i1}, X_{i2}) = (1, 1)$ for $i = (n + 1) : 2n$
$$\hat\mu_0 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{2} Y_{ij}, \qquad \hat\mu_1 = \frac{1}{2n} \sum_{i=n+1}^{2n} \sum_{j=1}^{2} Y_{ij}$$
$$\text{var}(\hat\mu_1 - \hat\mu_0) = \frac{1}{n} \{\sigma^2 (1 + \rho)\}$$

47 Scenario 1
subject    control              treatment
           time 1    time 2     time 1    time 2
ID = 1     Y_{1,1}   Y_{1,2}
ID = 2     Y_{2,1}   Y_{2,2}
ID = 3     Y_{3,1}   Y_{3,2}
ID = 4                          Y_{4,1}   Y_{4,2}
ID = 5                          Y_{5,1}   Y_{5,2}
ID = 6                          Y_{6,1}   Y_{6,2}

48 Dependent Data and Proper Variance Estimates
(2) Consider $(Y_{i1}, Y_{i2})$ with $(X_{i1}, X_{i2}) = (0, 1)$ for $i = 1 : n$ and $(X_{i1}, X_{i2}) = (1, 0)$ for $i = (n + 1) : 2n$
$$\hat\mu_0 = \frac{1}{2n} \left\{ \sum_{i=1}^{n} Y_{i1} + \sum_{i=n+1}^{2n} Y_{i2} \right\}, \qquad \hat\mu_1 = \frac{1}{2n} \left\{ \sum_{i=1}^{n} Y_{i2} + \sum_{i=n+1}^{2n} Y_{i1} \right\}$$
$$\text{var}(\hat\mu_1 - \hat\mu_0) = \frac{1}{n} \{\sigma^2 (1 - \rho)\}$$

49 Scenario 2
subject    control              treatment
           time 1    time 2     time 1    time 2
ID = 1     Y_{1,1}                        Y_{1,2}
ID = 2     Y_{2,1}                        Y_{2,2}
ID = 3     Y_{3,1}                        Y_{3,2}
ID = 4               Y_{4,2}    Y_{4,1}
ID = 5               Y_{5,2}    Y_{5,1}
ID = 6               Y_{6,2}    Y_{6,1}

50 Dependent Data and Proper Variance Estimates
If we simply had $2n$ independent observations on treatment ($X = 1$) and $2n$ independent observations on control, then we'd obtain
$$\text{var}(\hat\mu_1 - \hat\mu_0) = \frac{\sigma^2}{2n} + \frac{\sigma^2}{2n} = \frac{1}{n} \sigma^2$$
Q: What is the impact of dependence relative to the situation where all $(2n + 2n)$ observations are independent?
(1) positive dependence, $\rho > 0$, results in a loss of precision
(2) positive dependence, $\rho > 0$, results in an improvement in precision!

51 Therefore: Dependent data impacts proper statements of precision Dependent data may increase or decrease standard errors depending on the design 3 Heagerty, 6
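The two variance formulas can be checked numerically. The following sketch is an illustration under stated assumptions rather than part of the notes: it assumes each subject's pair $(Y_{i1}, Y_{i2})$ has common variance $\sigma^2$ and correlation $\rho$, writes $\hat\mu_1 - \hat\mu_0$ as a weighted sum of the observations, and accumulates the variance subject by subject. The function name and the values of `n`, `sigma2`, and `rho` are hypothetical.

```python
def var_of_difference(design, n, sigma2, rho):
    """Variance of mu1_hat - mu0_hat for 2n subjects with two observations each.

    design(i) returns the treatment codes (x_i1, x_i2) for subject i (0-based).
    """
    total = 0.0
    for i in range(2 * n):
        # Weight on Y_ij: +1/(2n) if treated at time j, -1/(2n) if on control.
        c = [(1.0 if x == 1 else -1.0) / (2 * n) for x in design(i)]
        # c' Sigma c with Sigma = sigma2 * [[1, rho], [rho, 1]]
        total += sigma2 * (c[0] ** 2 + c[1] ** 2 + 2 * rho * c[0] * c[1])
    return total

n, sigma2, rho = 50, 4.0, 0.6

def parallel(i):
    """Design (1): treatment is constant within subject."""
    return (0, 0) if i < n else (1, 1)

def crossover(i):
    """Design (2): each subject receives both treatments."""
    return (0, 1) if i < n else (1, 0)

v1 = var_of_difference(parallel, n, sigma2, rho)    # sigma2 * (1 + rho) / n
v2 = var_of_difference(crossover, n, sigma2, rho)   # sigma2 * (1 - rho) / n
v_indep = sigma2 / n                                # all 4n observations independent
print(v1, v_indep, v2)
```

For positive `rho` the ordering is `v1 > v_indep > v2`: the same correlation hurts the between-subject comparison and helps the within-subject comparison.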

52 Weighted Estimation Consider the situation where subjects report both the number of attempts and the number of successes: (Y i, N i ) Examples: live born (Y i ) in a litter (N i ) condoms used (Y i ) in sexual encounters (N i ) SAEs (Y i ) among total surgeries (N i ) Q: How to combine these data from i = : m subjects to estimate a common rate (proportion) of successes? 3 Heagerty, 6

53 Weighted Estimation
Proposal 1: $\hat p_1 = \sum_i Y_i / \sum_i N_i$
Proposal 2: $\hat p_2 = \frac{1}{m} \sum_i Y_i / N_i$
Simple Example: Data : (, ) (, ) $\hat p_1$ = ( + )/() = 3 $\hat p_2$ = {/ + /} = 5

54 Weighted Estimation
Note: Each of these estimators, $\hat p_1$ and $\hat p_2$, can be viewed as weighted estimators of the form:
$$\hat p_w = \left\{ \sum_i w_i \frac{Y_i}{N_i} \right\} \Big/ \sum_i w_i$$
We obtain $\hat p_1$ by letting $w_i = N_i$, corresponding to equal weight given to each binary outcome $Y_{ij}$, where $Y_i = \sum_{j=1}^{N_i} Y_{ij}$.
We obtain $\hat p_2$ by letting $w_i = 1$, corresponding to equal weight given to each subject.
Q: What's optimal?

55 Weighted Estimation
A: Whatever weights are closest to 1/variance of $Y_i / N_i$ (stat theory called "Gauss-Markov").
If subjects are perfectly homogeneous then
$$V(Y_i) = N_i \, p (1 - p)$$
and $\hat p_1$ is best.
If subjects are heterogeneous then, for example,
$$V(Y_i) = N_i \, p (1 - p) \{1 + (N_i - 1) \rho\}$$
and an estimator closer to $\hat p_2$ is best.
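A short sketch of the weighted estimator may help connect the two proposals. The data values below are hypothetical (they are not the values from the slide's example), and the helper names are assumptions. The inverse-variance weights follow from the heterogeneous-subject variance above: $\text{var}(Y_i / N_i) \propto \{1 + (N_i - 1)\rho\} / N_i$, so $w_i \propto N_i / \{1 + (N_i - 1)\rho\}$.

```python
def weighted_proportion(y, n, w):
    """p_w = sum_i w_i (Y_i / N_i) / sum_i w_i."""
    return sum(wi * yi / ni for wi, yi, ni in zip(w, y, n)) / sum(w)

def inverse_variance_weights(n, rho):
    """Weights proportional to 1 / var(Y_i / N_i) under the overdispersed variance."""
    return [ni / (1 + (ni - 1) * rho) for ni in n]

# Hypothetical data: (successes, attempts) for two subjects.
y = [1, 30]
n = [2, 100]

p1 = weighted_proportion(y, n, w=n)               # pooled: sum(Y) / sum(N)
p2 = weighted_proportion(y, n, w=[1] * len(n))    # average of subject proportions

for rho in (0.0, 0.1, 0.5, 1.0):
    p_opt = weighted_proportion(y, n, inverse_variance_weights(n, rho))
    print(f"rho = {rho:.1f}: p_w = {p_opt:.4f}")
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}")
```

At `rho = 0` the inverse-variance estimator coincides with $\hat p_1$, and as `rho` approaches 1 it moves to $\hat p_2$, which is the sense in which the best weighting depends on the within-subject correlation.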

56 Summary Longitudinal (dependent) data are common (and interesting!) Inference must account for the dependence Consideration as to the choice of weighting will depend on the variance/covariance of the response variables 36 Heagerty, 6

57 Summary
Methodological Issues in the Study of Cognitive Decline, Morris et al, AJE (1999)
In modeling associations between risk factors and change in cognitive function, the usual analytic issues are complicated by the need to consider time in the analysis. The potential for both fruitful investigation and misinterpretation of the data is increased. Analyses that make the most effective use of the data usually require the most advanced methods of longitudinal analysis.

58 Summary
Methodological Issues in the Study of Cognitive Decline, Morris et al, AJE (1999)
Use of the longitudinal designs and advanced statistical models described here is well worth the extra effort required. Meeting the special challenges of measuring change in cognitive function should improve our ability to identify risk factors for cognitive decline and other diseases related to aging. Many of these issues apply to any study of disease processes entailing change with age.

59 Some References: Books
Diggle PJ, Heagerty PJ, Liang K-Y, Zeger SL (2002) Analysis of Longitudinal Data, Second Edition, Oxford University Press
Fitzmaurice GM, Laird NM, Ware JH (2004) Applied Longitudinal Analysis, Wiley
Verbeke G, Molenberghs G (2000) Linear Mixed Models for Longitudinal Data, Springer
Singer JD, Willett JB (2003) Applied Longitudinal Data Analysis, Oxford University Press

60 Longitudinal Data Analysis INTRODUCTION to REGRESSION APPROACHES 4 Heagerty, 6

61 Linear Mixed Model
Regression model: mean response as a function of covariates ("systematic variation")
Random effects: variation from subject-to-subject in trajectory ("random between-subject variation")
Within-subject variation: variation of individual observations over time ("random within-subject variation")

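62 Scientific Questions as Regression
Questions concerning the rate of decline refer to the time slope for FEV:
$$E[\text{FEV} \mid X = \text{age, gender, f508}] = \beta_0(X) + \beta_1(X) \cdot \text{time}$$
Time Scales
Let age0 = age-at-entry, $\text{age}_{i1}$
Let ageL = time-since-entry, $\text{age}_{ij} - \text{age}_{i1}$

63 CF Regression Model
Model:
$$\begin{aligned} E[\text{FEV} \mid X_i] &= \beta_0 + \beta_1 \, \text{age0} + \beta_2 \, \text{ageL} \\ &\quad + \beta_3 \, \text{female} + \beta_4 \, [\text{f508} = 1] + \beta_5 \, [\text{f508} = 2] \\ &\quad + \beta_6 \, \text{female} \cdot \text{ageL} + \beta_7 \, [\text{f508} = 1] \cdot \text{ageL} + \beta_8 \, [\text{f508} = 2] \cdot \text{ageL} \\ &= \beta_0(X_i) + \beta_1(X_i) \cdot \text{ageL} \end{aligned}$$

64 Intercept
          f508=0                          f508=1                                    f508=2
male      $\beta_0 + \beta_1$ age0             $\beta_0 + \beta_1$ age0 $+ \beta_4$             $\beta_0 + \beta_1$ age0 $+ \beta_5$
female    $\beta_0 + \beta_1$ age0 $+ \beta_3$   $\beta_0 + \beta_1$ age0 $+ \beta_3 + \beta_4$   $\beta_0 + \beta_1$ age0 $+ \beta_3 + \beta_5$

65 Slope
          f508=0               f508=1                         f508=2
male      $\beta_2$              $\beta_2 + \beta_7$              $\beta_2 + \beta_8$
female    $\beta_2 + \beta_6$    $\beta_2 + \beta_7 + \beta_6$    $\beta_2 + \beta_8 + \beta_6$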
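The intercept and slope tables can be reproduced mechanically from a coefficient vector. The sketch below assumes the genotype covariate is coded 0, 1, 2 with 0 as the reference group (the coding is not legible in the transcription, so this is an assumption), and the coefficient values are made up purely for illustration; they are not estimates from the CF data.

```python
def intercept_and_slope(beta, age0, female, f508):
    """Group-specific intercept beta_0(X) and slope beta_1(X) for the CF model.

    beta   : sequence (beta_0, ..., beta_8) in the order used in the model
    age0   : age at entry
    female : 1 for female, 0 for male
    f508   : genotype group, assumed coded 0 (reference), 1, or 2
    """
    g1 = 1 if f508 == 1 else 0
    g2 = 1 if f508 == 2 else 0
    intercept = beta[0] + beta[1] * age0 + beta[3] * female + beta[4] * g1 + beta[5] * g2
    slope = beta[2] + beta[6] * female + beta[7] * g1 + beta[8] * g2
    return intercept, slope

# Hypothetical coefficients, for illustration only.
beta = [100.0, -0.5, -2.0, -3.0, -4.0, -6.0, -0.3, 0.2, -0.4]

for female in (0, 1):
    for f508 in (0, 1, 2):
        b0, b1 = intercept_and_slope(beta, age0=10.0, female=female, f508=f508)
        label = "female" if female else "male"
        print(f"{label:6s} f508={f508}: intercept = {b0:6.1f}, slope = {b1:5.2f}")
```

A question such as "is the time course different for males and females?" then corresponds to asking whether $\beta_6 = 0$, since $\beta_6$ is the only term by which the male and female slopes differ.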

66 [Figure: Gender Groups; FEV versus age (years) for males and females within one f508 group]

67 [Figure: Genotype Groups (male); FEV versus age (years) for the three f508 groups]

68 Define
$Y_{ij}$ = FEV for subject $i$ at time $t_{ij}$
$X_i = (X_{i1}, \ldots, X_{in_i})$
$X_{ij} = (X_{ij,1}, X_{ij,2}, \ldots, X_{ij,p})$: age0, ageL, gender, genotype
Issue: Response variables measured on the same subject are correlated:
$$\text{cov}(Y_{ij}, Y_{ik}) \neq 0$$

69 Some Notation
It is useful to have some notation that can be used to discuss the stack of data that correspond to each subject.
Let $n_i$ denote the number of observations for subject $i$. Define:
$$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in_i} \end{pmatrix}$$
If the subjects are observed at a common set of times $t_1, t_2, \ldots, t_m$, then $E(Y_{ij}) = \mu_j$ denotes the mean of the population at time $t_j$.

70 Dependence and Correlation Recall that observations are termed independent when deviation in one variable does not predict deviation in the other variable Given two subjects with the same age and gender, then the blood pressure for patient ID= is not predictive of the blood pressure for patient ID=334 Observations are called dependent or correlated when one variable does predict the value of another variable The LDL cholesterol of patient ID= at age 57 is predictive of the LDL cholesterol of patient ID= at age 6 46 Heagerty, 6

71 Dependence and Correlation
Recall: The variance of a variable, $Y_{ij}$ (fix time $t_j$ for now), is defined as:
$$\sigma_j^2 = E[(Y_{ij} - \mu_j)^2] = E[(Y_{ij} - \mu_j)(Y_{ij} - \mu_j)]$$
The variance measures the average squared distance that an observation falls away from the mean.

72 Dependence and Correlation
Define: The covariance of two variables, $Y_{ij}$ and $Y_{ik}$ (fix $t_j$ and $t_k$), is defined as:
$$\sigma_{jk} = E[(Y_{ij} - \mu_j)(Y_{ik} - \mu_k)]$$
The covariance measures whether, on average, departures in one variable, $Y_{ij} - \mu_j$, go together with departures in a second variable, $Y_{ik} - \mu_k$.
In simple linear regression of $Y_{ij}$ on $Y_{ik}$, the regression coefficient $\beta_1$ in $E(Y_{ij} \mid Y_{ik}) = \beta_0 + \beta_1 Y_{ik}$ is the covariance divided by the variance of $Y_{ik}$:
$$\beta_1 = \frac{\sigma_{jk}}{\sigma_k^2}$$

73 Dependence and Correlation
Define: The correlation of two variables, $Y_{ij}$ and $Y_{ik}$ (fix $t_j$ and $t_k$), is defined as:
$$\rho_{jk} = \frac{E[(Y_{ij} - \mu_j)(Y_{ik} - \mu_k)]}{\sigma_j \sigma_k}$$
The correlation is a measure of dependence that takes values between -1 and +1.
Recall that a correlation of 0 implies that the two measures are unrelated (linearly).
Recall that a correlation of 1 implies that the two measures fall perfectly on a line: one exactly predicts the other!
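The three definitions (variance, covariance, correlation) and the regression identity $\beta_1 = \sigma_{jk} / \sigma_k^2$ can be checked on a tiny data set. The numbers below are hypothetical and the helper names are assumptions; the moments are computed with divisor $n$ so that they mirror the expectation-based definitions rather than the usual $n - 1$ sample versions.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(a, b):
    """Average product of deviations, mirroring E[(Y_ij - mu_j)(Y_ik - mu_k)]."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical responses at two times t_j and t_k for five subjects.
y_j = [1.0, 2.0, 3.0, 4.0, 5.0]
y_k = [2.0, 4.0, 5.0, 4.0, 5.0]

sigma_jk = covariance(y_j, y_k)
var_k = covariance(y_k, y_k)            # the variance is the covariance of a variable with itself
beta_1 = sigma_jk / var_k               # slope from regressing Y_ij on Y_ik
rho_jk = correlation(y_j, y_k)

print(f"cov = {sigma_jk:.3f}, slope = {beta_1:.3f}, corr = {rho_jk:.3f}")
print(f"corr of a variable with a linear function of itself: {correlation(y_j, [2 * v + 1 for v in y_j]):.3f}")
```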

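74 Why interest in covariance and/or correlation?
Recall from the two design scenarios (pages 28 and 29 of the notes) that the standard error for the sample mean difference $\hat\mu_1 - \hat\mu_0$ depends on $\rho$.
In general a statistical model for the outcomes $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ requires the following:
Means: $\mu_j$
Variances: $\sigma_j^2$
Covariances: $\sigma_{jk}$, or correlations $\rho_{jk}$
Therefore, one approach to making inferences based on longitudinal data is to construct a model for each of these three components.

75 Something new to model
$$\text{cov}(Y_i) = \begin{pmatrix} \text{var}(Y_{i1}) & \text{cov}(Y_{i1}, Y_{i2}) & \cdots & \text{cov}(Y_{i1}, Y_{in_i}) \\ \text{cov}(Y_{i2}, Y_{i1}) & \text{var}(Y_{i2}) & \cdots & \text{cov}(Y_{i2}, Y_{in_i}) \\ \vdots & \vdots & \ddots & \vdots \\ \text{cov}(Y_{in_i}, Y_{i1}) & \text{cov}(Y_{in_i}, Y_{i2}) & \cdots & \text{var}(Y_{in_i}) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho_{12} & \cdots & \sigma_1 \sigma_{n_i} \rho_{1 n_i} \\ \sigma_2 \sigma_1 \rho_{21} & \sigma_2^2 & \cdots & \sigma_2 \sigma_{n_i} \rho_{2 n_i} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n_i} \sigma_1 \rho_{n_i 1} & \sigma_{n_i} \sigma_2 \rho_{n_i 2} & \cdots & \sigma_{n_i}^2 \end{pmatrix}$$

76 Mean and Covariance Models for FEV
Models:
$$E(Y_{ij} \mid X_i) = \mu_{ij} \quad \text{(regression)}$$
$$\text{cov}(Y_i \mid X_i) = \Sigma_i = \underbrace{Z_i D Z_i^T}_{\text{between subjects}} + \underbrace{R_i}_{\text{within subjects}}$$
Q: What are appropriate covariance models for the FEV data?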

77 Wide Data
Data array for the residuals (first rows of "rmat"), one column per age from age8 to age24 in two-year steps, with NA where a subject has no observation at that age. [Numeric values not recoverable]

78 [Figure: pairwise scatterplots of residuals by age]

79 Empirical Covariance Matrix and Empirical Correlation Matrix, 9 x 9, indexed by age. [Numeric values not recoverable]

80 Number of observations (pairs) contributing to each matrix entry. [Numeric values not recoverable]

81 How to build models for correlation? Mixed models random effects within-subject similarity due to sharing trajectory Serial correlation close in time implies strong similarity correlation decreases as time separation increases 53 Heagerty, 6


83 [Figure: Two Subjects; response versus months]

84 Levels of Analysis
We first consider the distribution of measurements within subjects:
$$Y_{ij} = \beta_{0,i} + \beta_{1,i} t_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2)$$
$$E[Y_{ij} \mid X_i, \beta_i] = \beta_{0,i} + \beta_{1,i} t_{ij} = [1, \text{time}_{ij}] \begin{pmatrix} \beta_{0,i} \\ \beta_{1,i} \end{pmatrix}, \qquad E[Y_i \mid X_i, \beta_i] = X_i \beta_i$$

85 Levels of Analysis
We can equivalently separate the subject-specific regression coefficients into the average coefficient and the specific departure for subject $i$:
$$\beta_{0,i} = \beta_0 + b_{0,i}, \qquad \beta_{1,i} = \beta_1 + b_{1,i}$$
This allows another perspective:
$$Y_{ij} = \beta_{0,i} + \beta_{1,i} t_{ij} + e_{ij} = (\beta_0 + \beta_1 t_{ij}) + (b_{0,i} + b_{1,i} t_{ij}) + e_{ij}$$
$$E[Y_i \mid X_i, \beta_i] = \underbrace{X_i \beta}_{\text{mean model}} + \underbrace{X_i b_i}_{\text{between-subject}}$$

86 [Figure: Sample of Lines; FEV versus age (years)]

87 [Figure: Intercepts and Slopes; slope deviation versus intercept deviation]

88 Levels of Analysis
Next we consider the distribution of patterns (parameters) among subjects:
$$\beta_i \sim N(\beta, D)$$
equivalently
$$b_i \sim N(0, D)$$
$$Y_i = \underbrace{X_i \beta}_{\text{mean model}} + \underbrace{X_i b_i}_{\text{between-subject}} + \underbrace{e_i}_{\text{within-subject}}$$
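The two levels can be made concrete by generating data from them. The sketch below is a simplified illustration, not the notes' own code: it assumes the random intercept and random slope are independent (a diagonal $D$), and the parameter values, seed, and function name are hypothetical.

```python
import random

def simulate_subject(times, beta, sd_b, sigma, rng):
    """One subject's responses: (beta_0 + beta_1 t) + (b_0i + b_1i t) + e_ij.

    sd_b  : standard deviations of the random intercept and slope
            (independence of b_0i and b_1i is an assumption of this sketch)
    sigma : standard deviation of the within-subject errors e_ij
    """
    b0 = rng.gauss(0.0, sd_b[0])     # between-subject: departure in level
    b1 = rng.gauss(0.0, sd_b[1])     # between-subject: departure in slope
    return [(beta[0] + beta[1] * t) + (b0 + b1 * t) + rng.gauss(0.0, sigma)
            for t in times]

rng = random.Random(2006)
times = [0, 1, 2, 3, 4]
cohort = [simulate_subject(times, beta=(90.0, -2.0), sd_b=(10.0, 1.0), sigma=3.0, rng=rng)
          for _ in range(5)]

for i, y in enumerate(cohort, start=1):
    print(i, [round(v, 1) for v in y])
```

Setting both entries of `sd_b` and `sigma` to zero returns the population mean line for every subject, which is a quick way to see which source of variation each term controls.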

89 [Figure: beta0 + beta1 * time versus time in four panels: fixed intercept, fixed slope; random intercept, fixed slope; fixed intercept, random slope; random intercept, random slope]

90 Between-subject Variation
We can use the idea of random effects to allow different types of between-subject heterogeneity.
The magnitude of heterogeneity is characterized by $D$:
$$b_i = \begin{pmatrix} b_{0,i} \\ b_{1,i} \end{pmatrix}, \qquad \text{var}(b_i) = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$$

91 Between-subject Variation
The components of $D$ can be interpreted as:
$\sqrt{D_{11}}$: the typical subject-to-subject deviation in the overall level of the response
$\sqrt{D_{22}}$: the typical subject-to-subject deviation in the change (time slope) of the response
$D_{12}$: the covariance between individual intercepts and slopes
  If positive, then subjects with high levels also have high rates of change.
  If negative, then subjects with high levels have low rates of change.

92 [Figure: beta0 + beta1 * time versus time in four panels: fixed intercept, fixed slope; random intercept, fixed slope; fixed intercept, random slope; random intercept, random slope]

93 Between-subject Variation: Examples
No random effects:
$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + e_{ij} = [1, \text{time}_{ij}] \beta + e_{ij}$$
Random intercepts:
$$Y_{ij} = (\beta_0 + \beta_1 t_{ij}) + b_{0,i} + e_{ij} = [1, \text{time}_{ij}] \beta + [1] \, b_{0,i} + e_{ij}$$
Random intercepts and slopes:
$$Y_{ij} = (\beta_0 + \beta_1 t_{ij}) + b_{0,i} + b_{1,i} t_{ij} + e_{ij} = [1, \text{time}_{ij}] \beta + [1, \text{time}_{ij}] \, b_i + e_{ij}$$

94 Mixed Models and Covariances/Correlation
Q: What is the correlation between outcomes $Y_{ij}$ and $Y_{ik}$ under these random effects models?
Random Intercept Model
$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0,i} + e_{ij}$$
$$Y_{ik} = \beta_0 + \beta_1 t_{ik} + b_{0,i} + e_{ik}$$
$$\text{var}(Y_{ij}) = \text{var}(b_{0,i}) + \text{var}(e_{ij}) = D_{11} + \sigma^2$$
$$\text{cov}(Y_{ij}, Y_{ik}) = \text{cov}(b_{0,i} + e_{ij}, \; b_{0,i} + e_{ik}) = D_{11}$$

95 Mixed Models and Covariances/Correlation
Random Intercept Model
$$\text{corr}(Y_{ij}, Y_{ik}) = \frac{D_{11}}{\sqrt{D_{11} + \sigma^2} \sqrt{D_{11} + \sigma^2}} = \frac{D_{11}}{D_{11} + \sigma^2} = \frac{\text{between var}}{\text{between var} + \text{within var}}$$
Therefore, any two outcomes have the same correlation. It doesn't depend on the specific times, nor on the distance between the measurements.
"Exchangeable" correlation model
Assuming: $\text{var}(e_{ij}) = \sigma^2$, and $\text{cov}(e_{ij}, e_{ik}) = 0$ for $j \neq k$
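The exchangeable structure implied by the random intercept model can be seen by writing out the covariance matrix and converting it to a correlation matrix. In this sketch the intercept variance is written `D11` (the subscript is an assumption, since the index digits are not legible in the transcription), and the numeric values are hypothetical.

```python
import math

def random_intercept_cov(n_obs, D11, sigma2):
    """cov(Y_i) under the random intercept model: D11 in every cell, plus sigma2 on the diagonal."""
    return [[D11 + (sigma2 if j == k else 0.0) for k in range(n_obs)]
            for j in range(n_obs)]

def to_correlation(S):
    """Convert a covariance matrix to a correlation matrix."""
    n = len(S)
    return [[S[j][k] / math.sqrt(S[j][j] * S[k][k]) for k in range(n)]
            for j in range(n)]

# Hypothetical variance components: between-subject 9, within-subject 3.
S = random_intercept_cov(n_obs=4, D11=9.0, sigma2=3.0)
R = to_correlation(S)

for row in R:
    print([round(r, 2) for r in row])
```

Every off-diagonal entry equals `D11 / (D11 + sigma2)`, regardless of which two occasions are compared.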

96 Mixed Models and Covariances/Correlation
Random Intercept and Slope Model
$$Y_{ij} = (\beta_0 + \beta_1 t_{ij}) + (b_{0,i} + b_{1,i} t_{ij}) + e_{ij}$$
$$Y_{ik} = (\beta_0 + \beta_1 t_{ik}) + (b_{0,i} + b_{1,i} t_{ik}) + e_{ik}$$
$$\text{var}(Y_{ij}) = \text{var}(b_{0,i} + b_{1,i} t_{ij}) + \text{var}(e_{ij}) = D_{11} + 2 D_{12} t_{ij} + D_{22} t_{ij}^2 + \sigma^2$$
$$\text{cov}(Y_{ij}, Y_{ik}) = \text{cov}[(b_{0,i} + b_{1,i} t_{ij} + e_{ij}), \; (b_{0,i} + b_{1,i} t_{ik} + e_{ik})] = D_{11} + D_{12} (t_{ij} + t_{ik}) + D_{22} t_{ij} t_{ik}$$

97 Mixed Models and Covariances/Correlation
Random Intercept and Slope Model
$$\rho_{ijk} = \text{corr}(Y_{ij}, Y_{ik}) = \frac{D_{11} + D_{12} (t_{ij} + t_{ik}) + D_{22} t_{ij} t_{ik}}{\sqrt{D_{11} + 2 D_{12} t_{ij} + D_{22} t_{ij}^2 + \sigma^2} \; \sqrt{D_{11} + 2 D_{12} t_{ik} + D_{22} t_{ik}^2 + \sigma^2}}$$
Therefore, two outcomes may not have the same correlation. Correlation depends on the specific times for the observations, and does not have a simple form.
Assuming: $\text{var}(e_{ij}) = \sigma^2$, and $\text{cov}(e_{ij}, e_{ik}) = 0$ for $j \neq k$
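To see how the correlation varies with the observation times, the expression for $\rho_{ijk}$ can be evaluated directly. The function below is a sketch: the names `D11`, `D12`, `D22` for the elements of $D$ are an assumed labeling, the variance components are hypothetical, and the formula applies to two distinct observations ($j \neq k$) with independent within-subject errors.

```python
import math

def ris_var(t, D11, D12, D22, sigma2):
    """var(Y_ij) under the random intercept and slope model."""
    return D11 + 2 * D12 * t + D22 * t * t + sigma2

def ris_cov(tj, tk, D11, D12, D22):
    """cov(Y_ij, Y_ik) for two distinct observations of the same subject."""
    return D11 + D12 * (tj + tk) + D22 * tj * tk

def ris_corr(tj, tk, D11, D12, D22, sigma2):
    """corr(Y_ij, Y_ik) for two distinct observations at times tj and tk."""
    return ris_cov(tj, tk, D11, D12, D22) / math.sqrt(
        ris_var(tj, D11, D12, D22, sigma2) * ris_var(tk, D11, D12, D22, sigma2))

# Hypothetical variance components.
D11, D12, D22, sigma2 = 4.0, 0.0, 1.0, 1.0

r_early = ris_corr(0, 1, D11, D12, D22, sigma2)   # times 0 and 1
r_late = ris_corr(3, 4, D11, D12, D22, sigma2)    # times 3 and 4: same gap, different times
r_far = ris_corr(0, 4, D11, D12, D22, sigma2)     # times 0 and 4

print(f"corr(0,1) = {r_early:.3f}, corr(3,4) = {r_late:.3f}, corr(0,4) = {r_far:.3f}")
```

Two pairs of observations separated by the same gap need not share a correlation here, in contrast with the random intercept model; with `D12 = D22 = 0` the function reduces to `D11 / (D11 + sigma2)`.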


99 Covariance Models Serial Models Linear mixed models assume that each subject follows his/her own line In some situations the dependence is more local meaning that observations close in time are more similar than those far apart in time 6 Heagerty, 6

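100 Covariance Models
Define
$$e_{ij} = \rho \cdot e_{i,j-1} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2 (1 - \rho^2))$$
with the initial error in each subject's series distributed $N(0, \sigma^2)$.
This leads to autocorrelated errors:
$$\text{cov}(e_{ij}, e_{ik}) = \sigma^2 \rho^{|j - k|}$$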
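The autocorrelated error model can be simulated directly from its recursive definition and compared with the stated covariance $\sigma^2 \rho^{|j-k|}$. This sketch uses hypothetical values for $\rho$ and $\sigma$, equally spaced observation times, and an arbitrary seed and number of simulated subjects; the function names are assumptions.

```python
import random

def simulate_ar1(n_times, rho, sigma, rng):
    """One subject's error series: e_j = rho * e_{j-1} + innovation, started at N(0, sigma^2)."""
    e = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * (1.0 - rho ** 2) ** 0.5   # keeps var(e_j) = sigma^2 at every j
    for _ in range(1, n_times):
        e.append(rho * e[-1] + rng.gauss(0.0, innovation_sd))
    return e

def empirical_cov(series, j, k):
    """Covariance between positions j and k across simulated subjects."""
    m = len(series)
    mean_j = sum(s[j] for s in series) / m
    mean_k = sum(s[k] for s in series) / m
    return sum((s[j] - mean_j) * (s[k] - mean_k) for s in series) / m

rho, sigma = 0.5, 1.0
rng = random.Random(63)
series = [simulate_ar1(5, rho, sigma, rng) for _ in range(4000)]

for lag in range(5):
    theory = sigma ** 2 * rho ** lag
    print(f"lag {lag}: empirical = {empirical_cov(series, 0, lag):.3f}, theory = {theory:.3f}")
```

Unlike the random intercept model, the covariance here decays with the separation between observations.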

101 [Figure: simulated series, y versus months, for 20 subjects, rho = 0.5]

102 [Figure: simulated series, y versus months, for 20 subjects, rho = 0.9]

103 [Figure: Two Subjects; response versus months, showing the line for each subject, the average line, and a subject dropout]



More information

MATH5885 LONGITUDINAL DATA ANALYSIS

MATH5885 LONGITUDINAL DATA ANALYSIS MATH5885 LONGITUDINAL DATA ANALYSIS Semester 1, 2013 CRICOS Provider No: 00098G 2013, School of Mathematics and Statistics, UNSW MATH5885 Course Outline Information about the course Course Authority: William

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

Biostatistics Short Course Introduction to Longitudinal Studies

Biostatistics Short Course Introduction to Longitudinal Studies Biostatistics Short Course Introduction to Longitudinal Studies Zhangsheng Yu Division of Biostatistics Department of Medicine Indiana University School of Medicine Zhangsheng Yu (Indiana University) Longitudinal

More information

USC Columbia, Lancaster, Salkehatchie, Sumter & Union campuses

USC Columbia, Lancaster, Salkehatchie, Sumter & Union campuses NEW COURSE PROPOSAL USC Columbia, Lancaster, Salkehatchie, Sumter & Union campuses NCP INSTRUCTIONS: This form is used to add a new course to the University course database. This form is available online

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

Department of Epidemiology and Public Health Miller School of Medicine University of Miami

Department of Epidemiology and Public Health Miller School of Medicine University of Miami Department of Epidemiology and Public Health Miller School of Medicine University of Miami BST 630 (3 Credit Hours) Longitudinal and Multilevel Data Wednesday-Friday 9:00 10:15PM Course Location: CRB 995

More information

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1

More information

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011

A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 A course on Longitudinal data analysis : what did we learn? COMPASS 11/03/2011 1 Abstract We report back on methods used for the analysis of longitudinal data covered by the course We draw lessons for

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

PRACTICE PROBLEMS FOR BIOSTATISTICS

PRACTICE PROBLEMS FOR BIOSTATISTICS PRACTICE PROBLEMS FOR BIOSTATISTICS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION 1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period.

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Study Design and Statistical Analysis

Study Design and Statistical Analysis Study Design and Statistical Analysis Anny H Xiang, PhD Department of Preventive Medicine University of Southern California Outline Designing Clinical Research Studies Statistical Data Analysis Designing

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

School of Public Health and Health Services Department of Epidemiology and Biostatistics

School of Public Health and Health Services Department of Epidemiology and Biostatistics School of Public Health and Health Services Department of Epidemiology and Biostatistics Master of Public Health and Graduate Certificate Biostatistics 0-04 Note: All curriculum revisions will be updated

More information

Time series analysis as a framework for the characterization of waterborne disease outbreaks

Time series analysis as a framework for the characterization of waterborne disease outbreaks Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

How to choose an analysis to handle missing data in longitudinal observational studies

How to choose an analysis to handle missing data in longitudinal observational studies How to choose an analysis to handle missing data in longitudinal observational studies ICH, 25 th February 2015 Ian White MRC Biostatistics Unit, Cambridge, UK Plan Why are missing data a problem? Methods:

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

The Latent Variable Growth Model In Practice. Individual Development Over Time

The Latent Variable Growth Model In Practice. Individual Development Over Time The Latent Variable Growth Model In Practice 37 Individual Development Over Time y i = 1 i = 2 i = 3 t = 1 t = 2 t = 3 t = 4 ε 1 ε 2 ε 3 ε 4 y 1 y 2 y 3 y 4 x η 0 η 1 (1) y ti = η 0i + η 1i x t + ε ti

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

The 3-Level HLM Model

The 3-Level HLM Model James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 2 Basic Characteristics of the 3-level Model Level-1 Model Level-2 Model Level-3 Model

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Biostatistics: Types of Data Analysis

Biostatistics: Types of Data Analysis Biostatistics: Types of Data Analysis Theresa A Scott, MS Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott Theresa A Scott, MS

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.

Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study. Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College. The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

Selection bias in secondary analysis of electronic health record data. Sebastien Haneuse, PhD

Selection bias in secondary analysis of electronic health record data. Sebastien Haneuse, PhD Selection bias in secondary analysis of electronic health record data Sebastien Haneuse, PhD Department of Biostatistics Harvard School of Public Health 1 Symposium on Health Care Data Analytics, Seattle,

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

More information

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Illustration (and the use of HLM)

Illustration (and the use of HLM) Illustration (and the use of HLM) Chapter 4 1 Measurement Incorporated HLM Workshop The Illustration Data Now we cover the example. In doing so we does the use of the software HLM. In addition, we will

More information

Longitudinal Modeling of Lung Function in Respiratory Drug Development

Longitudinal Modeling of Lung Function in Respiratory Drug Development Longitudinal Modeling of Lung Function in Respiratory Drug Development Fredrik Öhrn, PhD Senior Clinical Pharmacometrician Quantitative Clinical Pharmacology AstraZeneca R&D Mölndal, Sweden Outline A brief

More information

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses

Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses Using Repeated Measures Techniques To Analyze Cluster-correlated Survey Responses G. Gordon Brown, Celia R. Eicheldinger, and James R. Chromy RTI International, Research Triangle Park, NC 27709 Abstract

More information

Assignments Analysis of Longitudinal data: a multilevel approach

Assignments Analysis of Longitudinal data: a multilevel approach Assignments Analysis of Longitudinal data: a multilevel approach Frans E.S. Tan Department of Methodology and Statistics University of Maastricht The Netherlands Maastricht, Jan 2007 Correspondence: Frans

More information

Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012

Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 2012 Introduction to Structural Equation Modeling (SEM) Day 4: November 29, 202 ROB CRIBBIE QUANTITATIVE METHODS PROGRAM DEPARTMENT OF PSYCHOLOGY COORDINATOR - STATISTICAL CONSULTING SERVICE COURSE MATERIALS

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

A Basic Introduction to Missing Data

A Basic Introduction to Missing Data John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Econometric Methods for Panel Data

Econometric Methods for Panel Data Based on the books by Baltagi: Econometric Analysis of Panel Data and by Hsiao: Analysis of Panel Data Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies

More information

Multilevel Models for Longitudinal Data. Fiona Steele

Multilevel Models for Longitudinal Data. Fiona Steele Multilevel Models for Longitudinal Data Fiona Steele Aims of Talk Overview of the application of multilevel (random effects) models in longitudinal research, with examples from social research Particular

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

More information

Descriptive Statistics and Exploratory Data Analysis

Descriptive Statistics and Exploratory Data Analysis Descriptive Statistics and Exploratory Data Analysis Dean s s Faculty and Resident Development Series UT College of Medicine Chattanooga Probasco Auditorium at Erlanger January 14, 2008 Marc Loizeaux,

More information

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

When to Use a Particular Statistical Test

When to Use a Particular Statistical Test When to Use a Particular Statistical Test Central Tendency Univariate Descriptive Mode the most commonly occurring value 6 people with ages 21, 22, 21, 23, 19, 21 - mode = 21 Median the center value the

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Introduction to Data Analysis in Hierarchical Linear Models

Introduction to Data Analysis in Hierarchical Linear Models Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM

More information

Chapter 11 Introduction to Survey Sampling and Analysis Procedures

Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter 11 Introduction to Survey Sampling and Analysis Procedures Chapter Table of Contents OVERVIEW...149 SurveySampling...150 SurveyDataAnalysis...151 DESIGN INFORMATION FOR SURVEY PROCEDURES...152

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

FIXED AND MIXED-EFFECTS MODELS FOR MULTI-WATERSHED EXPERIMENTS

FIXED AND MIXED-EFFECTS MODELS FOR MULTI-WATERSHED EXPERIMENTS FIXED AND MIXED-EFFECTS MODELS FOR MULTI-WATERSHED EXPERIMENTS Jack Lewis, Mathematical Statistician, Pacific Southwest Research Station, 7 Bayview Drive, Arcata, CA, email: jlewis@fs.fed.us Abstract:

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

Schools Value-added Information System Technical Manual

Schools Value-added Information System Technical Manual Schools Value-added Information System Technical Manual Quality Assurance & School-based Support Division Education Bureau 2015 Contents Unit 1 Overview... 1 Unit 2 The Concept of VA... 2 Unit 3 Control

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015 Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

More information

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis

Review of the Methods for Handling Missing Data in. Longitudinal Data Analysis Int. Journal of Math. Analysis, Vol. 5, 2011, no. 1, 1-13 Review of the Methods for Handling Missing Data in Longitudinal Data Analysis Michikazu Nakai and Weiming Ke Department of Mathematics and Statistics

More information

Highlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression

Highlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression GLMM tutor Outline 1 Highlights the connections between different class of widely used models in psychological and biomedical studies. ANOVA Multiple Regression LM Logistic Regression GLM Correlated data

More information

Problem of Missing Data

Problem of Missing Data VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

HLM software has been one of the leading statistical packages for hierarchical

HLM software has been one of the leading statistical packages for hierarchical Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

More information