Linear Mixed Models PGRM 15. Outline. Linear regression. Correlated measurements (eg repeated)

Transcription

1 Lear ixed odels PGR 15 Outle Lear regression Correlated measurements (eg repeated) Random effects leadg to different components of variance & correlated measurements Different Correlation Structures Simple Analysis of Clustered Data Split Plot Analysis Repeated easures Analysis 1

2 Relationship between variables Yield and fertilizer level Temperature and bacterial growth Light and respiration rate Age and alcohol consumption Can the relationship be represented by a simple curve? Is there necessarily a causal relationship? Simple lear regression 8 6 Y X 2

3 Simple lear regression model Data: (x i, y i ) Probabilistic model: yi = 0 1 i β + β x + ε Random term with variance σ 2 Population Characteristic β 0 β 1 Sample Statistic $ β 0 $ β 1 To predict from the model: yˆ i =β ˆ ˆ 0+ β1x i σ 2 s 2 Simple lear regression Estimated equation of the le: ^ y = x 8 6 Y 4 2 Intercept = β 0 = 2 ^ Slope = β 1 = 0.75 ^ X 3

4 Simple lear regression Estimated equation of the le: ^ y = x 8 6 Y 4 2 Variance = s 2 = X Assumptions for a Simple Lear Regression model Note: If you are fittg a simple lear regression model to your own data, there are assumptions that must be satisfied. d details of how to test the assumptions for your fitted model any basic statistics text book. 4

5 Regression example Distance from fire ire damage station x (miles) y ($1000) Damage caused by fire ($1000) vs Distance from fire station (miles) 50 damage ($1000) Distance from fire station (miles) 5

6 Damage caused by fire ($1000) vs Distance from fire station (miles) damage ($1000) y ^ = x Distance from fire station (miles) Damage caused by fire ($1000) vs Distance from fire station (miles) damage ($1000) Distance from fire station (miles) 6

7 Correlated easurements Two values are correlated if knowledge of one provides formation about the other eg if knowg one value is high (relative to the appropriate mean) means the other is also likely to be high (+ve correlation), or low (-ve correlation) Correlation can be visualized by trends plots Independent values are uncorrelated Positive correlation often arises between values resultg from shared fluences When modellg, shared fluences must be taken to account (eg by cludg random effects, or specifyg a correlation structure) Pearson s correlation coefficient ρ (rho) easures the strength and direction (+ve or ve) of lear relationship between two variables measured on the same experimental unit Does not depend on the units of measurement Lies between 1 and +1 In fittg a regression le y = a + bx the estimate of ρ, r, gives: 100r 2 = % of variation y explaed by the lear regression Hence r = 0.5 dicates a negative relationship that explas 25% of the variation y. r = 0.8 explas approximately 2/3 of the variation. 7

8 Interpretg r A high value of r corresponds to pots closely scattered around a le Examples of correlated responses Clustered data easurements on piglets from several litters responses from piglets with a litter are related, sharg parental and environmental fluences Experiment at several sites responses with a site are related Split-plot designs (with ma and sub-plots see later) Responses from the same ma plot will be related, sharg the ma plot characteristics Repeated easures from experimental units responses from the same unit are related, sharg the fluential characteristics of the unit if repeated over time, responses from the same unit close time may be more closely related than responses further apart 8

9 Identifyg Correlation & Different Sources of Variation Components of variation & correlation So far background noise has been accounted for by a sgle source of variation the variation between experimental units sharg the same values of explanatory variables. With clustered data, some of the variation will be due to differences between clusters (eg litters, sites). Individuals with a litter are related genetically. Plots with a site are related as they share common environmental elements. Clusters may form a source of random variation (any particular litter could have been allocated to a different treatment group but all piglets the litter would then have the same treatment and would not be dependent. Variation between sites may contribute to the variation of the estimate of a treatment effect based on a number of sites. The treatment effect may also vary over sites. Observations with the same value of a random effect will be positively correlated 9

10 ore general correlation structures Introducg a random effect for clusters of measurements assumes a very restricted structure of correlation ore generally we identify clusters whose members are correlated specify the correlation structure Example of clustered data analysis 27 sites with 30 plots per site = 810 plots our species 2 grasses (G) & 2 legumes (L) Simplex design at two densities 10

11 Location of Sites 16 id European (E) 4 Northern European (NE) 3 oist editerranean () 2 Dry editerranean (D) 2 Other A typical E site Species - G 1 Lolium perenne - G 2 Dactylis glomerata - L 1 Trifolium pratense - L 2 Trifolium repens irst complete year s data was used. 11

12 Diversity effect is positive and significant at most sites Country Type Av mono (t ha -1 ) δˆ (t ha -1 ) Germany E Ireland E Lithuania E Lithuania E Lithuania E Netherlands E Norway E Norway E Norway E Poland E Spa E Sweden E Sweden E Switzerland E Wales E Wales E Average Site fixed vs site random Site ixed Proc glm data=yield; Class site monomix density; odel tot_yld=site density monomix ; Lsmeans monomix/stderr; Run; Diff 2.39 SE LSmeans ean SE ono ix Site Random Proc mixed data=yield; Class site monomix density; odel tot_yld= density monomix; Lsmeans monomix/stderr; Random site Run; Diff 2.39 SE LSmeans ean SE ono ix

13 onomix tested agast random site*monomix Estimate of diversity effect is the same (2.39) SE of diversity effect is now greater (0.254) Loss of precision is compensated for by a wider range of ference about the diversity effect Any new site predict a diversity effect of 2.39 but use the se = for settg bounds for the prediction. Unbalanced mixed model analysis The example data was balanced each treatment (combation of a level of V with a level of N) appeared the same number of times (once!) each block or unbalanced data every comparison has a different SE. It s important to use the ddfm = kenwardroger option on the model statement to drop teraction terms if non-significant See PGR

14 Cost data revisited meta analysis Consistency of treatment effects across multiple sites (meta analysis) If an experiment is repeated across a number of sites the question arises as to how well the average treatment effect estimated across all sites represents the difference between treatments. Is there variation the onomix effect across sites? If sites are regarded as random (drawn at random from some population of sites), how well does the average onomix effect over sites represent the typical monomix effect at a site drawn from the same population.. To use the average monomix effect this way its should conta variation due to how the effect varies from site to site (i.e. formation on the site x monomix teraction. This is achieved by cludg the SAS program the le random site site*monomix; Diversity effect is positive and significant at most sites eta analysis Country Type Av mono (t ha -1 ) δˆ (t ha -1 ) Germany E Ireland E Lithuania E Lithuania E Lithuania E Netherlands E Norway E Norway E Norway E Poland E Spa E Sweden E Sweden E Switzerland E Wales E Wales E Average

15 SAS programs Site random proc mixed data=yield; class site monomix density; model tot_yld = density monomix /ddfm=kenwardroger ; run; random site; lsmeans monomix; estimate 'diversity effect' monomix -1 1; SAS programs Site random proc mixed data=yield; class site monomix density; model tot_yld = density monomix /ddfm=kenwardroger ; run; random site; random site*monomix; lsmeans monomix; estimate 'diversity effect' monomix -1 1; 15

16 Site ixed Proc glm data=yield; Site fixed vs site random and Site*monomix cluded Class site monomix density; odel tot_yld=site density monomix site*monomix; Lsmeans monomix/stderr; Run; Diff 2.39 SE LSmeans ean SE ono ix Site Random Proc mixed data=yield; Class site monomix density; odel tot_yld= density monomix; Lsmeans monomix/stderr; Random site Random site*monomix Run; Diff 2.39 SE LSmeans ean SE ono ix Comparison of models Random effects SITE SITE SITE*ONOIX Variance Estimates Site Site * onomix 0.70 Residual log Likelihood ono SE ixture SE Diversity effect SE P value <0.001 <

17 Split Plot Experimental Design Split Plot: effect of fertiliser on 3 varieties of oats In each of 6 homogeneous blocks: Varieties allocated to ma plots at random In each sub plot varyg levels of fertiliser N (kg/ha) are allocated at random 17

18 Analysis of split plot design Interest is on effect of V (variety) and N (nitrogen level) on Y (yield) possible sources of background variation Initially regard N level as categorical odel terms would clude B (block), V, N and the teraction V*N In addition model must reflect the split plot structure: yields from the same ma plot will share similar (randomly allocated) characteristics, and consequently be correlated a plots are identified by specifyg B and V, ie B*V. SAS/IXED procedure caters for the random effects of B*V. SAS/IXED code proc mixed data = oats; class b v n; model y = b n v n*v / htype = 1; random b*v; estimate 'Variety 1v2' v 1-1 0; estimate 'N 1v2' n ; run; b*v each level of this corresponds to a ma plot which contributes its characteristics to the response 18

19 SAS/IXED OUTPUT Type 1 Tests of ixed Effects Effect Num D Den D Value Pr > b n <.0001 v v*n V*N teraction is far from significant (so the model could be re-run without this term) 2. N has a strong effect 3. Varieties do not appear to differ SAS/IXED OUTPUT Covariance Parameter Estimates Cov Parm Estimate b*v Residual There are 2 sources of background variation: 1. with ma plots (component of variance: ) 2. between ma plots (component of variance: ) Different comparisons will use appropriate combations of these 19

20 LSmeans & SEDs Code lsmeans v n v*n / diff; a effect means (ie for v (SED = 7.079), n (SED = 4.436) and SEDs are appropriate if v*n is non-significant Comparison of the 12 v*n means volves 2 different SEDs When comparg 2 levels of N with a particular variety (SED = 7.683) When makg comparisons where the variety changes (SED = 9.715) lsmeans / diff; OUTPUT fferences of Least Squares eans Effect Variety Nitrogen Variety Nitrogen Estimate Standard Error D t Value Pr > t v Golden Ra arvellous v Golden Ra Victory v arvellous Victory n <.0001 n <.0001 n <.0001 n n <.0001 n v*n Golden Ra 0 Golden Ra v*n Golden Ra 0 Golden Ra <.0001 v*n Golden Ra 0 Golden Ra <.0001 v*n v*n Golden Ra Golden Ra 0 0 arvellous arvellous

21 Is there a simple description of the N effect? Usg SAS/EANS procedure calculate the mean yield for each treatment (V, N combation) Plot ean Yield v Nitrogen Level for each variety it a le for each variety. Split plot design 21

22 SAS/IXED: Split Plot Analysis N seems to have equal effect on yield for the 3 varieties Plot suggests effect of N is lear Add variable nval = n to data set oats it & test for separate slopes and LO: proc mixed data = oats; class b v n; model y = b nval n v nval*v n*v / htype = 1; random b*v; run; n n*v terms measure LO nval*v checks equal slopes SAS OUTPUT Type 1 Tests of ixed Effects Num Den Effect D D Value Pr > b nval <.0001 n v nval*v v*n LO terms not significant (p =.27,.93) 2. Lear effect of N same for the 3 varieties (p =.63) 3. Lear effect of N highly significant (p <.0001) 4. Refit model without LO terms and teraction nval*v to get coefficient of nval: 0.59 (SE: ) 22

23 Practical exercise SAS/IXED for Split Plot Analysis Lab Session 6 Exercise 6.2 (a) (e) Simple Analyses of Correlated Data 23

24 A simple analysis combe correlated values Seedlgs are planted 20 each of 10 contaers and the dry mass of each plant measured after 60 days. Responses from plants the same contaer will be correlated. Use as response variable the total dry mass of all 20 plants the same contaer. This gives 10 dependent response values Responses from piglets the same litter could be combed, though adjustments would have to be made for varyg litter size. Averages from large litters could give more precise formation than from small, but may also be lower. A simple analysis combe correlated values easurg plant height every 14 days for 10 weeks gives 6 measurements for each plant (after 0, 2, 4, 6, 8, and 10 weeks) If daily growth is of terest a sgle value from each plant could be calculated by (fal height itial height)/140 the slope of the fitted le to the data from the plant 24

25 Repeated easures Analysis utiple measures on the same units at the same time Repeated measurements on the same units over time Example The weights of 27 male and female children were measured at ages 8, 10, 12 and 14 years. Child Gender To keep SAS happy replace 8, 10, etc with w_age8, w_age10, etc Child Gender

26 Issues Repeated easures Analysis 1. What questions are to be addressed by the analysis? Can these be answered by reducg the measurements to a sgle value? eg fal weight weight ga over a particular period (from 8 to 10 or 10 to 14 yr) the slope of a straight le Issues Repeated easures Analysis 2. Are the measurements taken at equally spaced time pots? When times are not equally spaced, and even more so when patterns of timg differ from unit to unit, the analysis is more difficult 26

27 Issues Repeated easures Analysis 3. What is a reasonable error/correlation structure for the data? Are measurements equally related, irrespective of whether they are close together time or otherwise? Does the correlation decle as time pots are farther apart? Can this decle be described simply? Child Weights Example simple analysis usg 2 possible sgle values data c; set childweights; wga1 = y4 - y1; run; proc glm data = c; class gender; model wga1 = gender; lsmeans gender; estimate 'diff -' gender -1 1; quit; 27

28 SAS OUTPUT Ga 8 10 yr Parameter Estimate Standard Error t Value Pr > t diff Ga yr Parameter Estimate Standard Error t Value Pr > t diff Differences between genders depend on age ie there is a gender year teraction The complete data set needs to be analysed Gettg the picture! Are the slopes equal? 28

29 Repeated easures Analysis of Weight Restructure data for SAS Obs Child Gender weight age Centre age usg cage = age - 11 This means that gender comparisons will be made at age 11 yr. Repeated easures Analysis of Weight Observations from the same child will be correlated A simple way to clude this is to troduce a random effect for the child However: this assumes that any two observations on the same child are equally correlated, dependent of whether far apart or not (COPOUND SYETRY) proc mixed data = cw; class gender cage child; model weight = gender cage cage*gender / ddfm = kenwardroger htype = 1; random child; run; Note: it s best to use model option: ddfm = kenwardroger 29

30 IXED selected OUTPUT Least Squares eans Effect Gender Estimate Standard Error D t Value Pr > t Gender <.0001 Gender <.0001 Effect Gender cage Type 1 Tests of ixed Effects Num D Den D Value Pr > <.0001 Covariance Parameter Estimates Cov Parm Child Residual Estimate cage*gender The common correlation between observations on the same subject (child) is /( ) = 0.63 IXED selected OUTPUT Gender Difference Differences of Least Squares eans Effect Gender Gender Estimate Standard Error D t Value Pr > t Gender LSmeans and difference refer to cage = 0 30

31 IXED selected OUTPUT Age Effect Solution for ixed Effects Effect Gender Estimate Standard Error D t Value Pr > t Intercept <.0001 Gender Gender cage <.0001 cage*gender cage*gender Slopes : = : These differ significantly (p = ) IXED selected OUTPUT rcorr option gives correlation between the 4 measurements on each subject (child) Estimated R Correlation atrix for Child 1 Row Col1 Col2 Col3 Col Note the CS structure and value of the correlation 31

32 Was compound symmetry a good choice for Child Weights example? What are the alternatives? Alternatives to Compound Symmetry Alternatively we can specify that measurements on the same subject are correlated with compound symmetry or another structure as follows: IXED syntax: replace random child; with Choice of correlation structure: cs ar(1) unstructured repeated / type = cs subject = child rcorr; With balanced data these are equivalent, but repeated permits more complex structures: AR(1) correlation gets less for measurement father apart beg ρ, ρ 2, ρ 3, for measurements 1, 2, 3, time units apart (eg.5,.25,.125, ) UNSTRUCTURED no particular structure assumed 32

33 Gettg the correlation structure correct Unstructured correlation Estimated R Correlation atrix for Child 1 Row Col1 Col2 Col3 Col Not very different from CS 2. Other results similar 3. Correlation does not decrease with separation don t use AR(1) correlation Gettg the correlation structure correct Tests for correlation structure Chi-squared tests can compare CS and AR(1) with unstructured (UN) (they are cluded UN) If both CS and AR(1) are not rejected when compared to UN there isn t a formal test; be guided by appropriateness and the prciple of maximisg likelihood or the formal test we need to note the number of estimated parameters the structures beg compared the difference is the df for the relevant chisquared test 33

34 Chi-squared test Need to re-run IXED with different structures. The default method used by IXED is REL which is best for determg random effects IXED OUTPUT (usg REL) cludes a table of it cludg -2 Res Log Likelihood The number of covariance parameters is OUTPUT the Dimensions Testg whether a simplified covariance structure will do the chi-squared df is the difference covariance parameters where p = 1 - cdf( chisq, d, df); d = change (-2 Res Log Likelihood) Child Weights Example Structure df -2 Res Log Likelihood Change from UN p UN CS (8 df) 0.32 AR(1) (8 df) 0.01 data _null_; p = 1-cdf( chisq, 9.3, 8); put p = ; run; Result - SAS LOG: p= CS ok AR(1) rejected 34

35 ore correlation structures See SAS Help! or example: suppose some subjects are more variable than others, but there is still a common correlation between different measurements on the same subject. Covariance Parameter Estimates Cov Parm Subject Estimate Var(1) Child Var(2) Child Var(3) Child Use Heterogeneous CS: Var(4) CSH Child Child type = csh Structure CSH CS df Res Log Likelihood Change from CSH 1.8 (3 df) p 0.61 CS not rejected when compared to CSH SAS/IXED code The Idea Summarised Values that are correlated must be identified - each set of correlated values beg a level of some factor Introduce a variable identifyg correlated values Include it the class statement Use a repeated statement & specify the correlation structure Alternatively: use a random statement if the correlation is simply due to units sharg a common fluences 35

36 Practical exercise SAS/IXED for Repeated easures Analysis Lab Session 6 Exercise 6.3 (a) (d) Random coefficients model Previous odel assumes a sgle regression le, with subjects (children) deviatg from this each year with correlated deviations. ore generally we could allow each subject to have its own regression le; this has the advantage beg possible when subjects are measured at different times Sce subjects are randomly allocated to treatments, the dividual tercepts and slopes are random variables We fit a fixed le (tercept and slope beg the estimated mean for all subjects) Each subject has its own le resultg from addg/subtractg from the mean tercept and slope. 36

37 Random coefficients model 6 (of 11) females 6 (of 16) males Plots show a le fit to each child s data (for the first 6 males and 6females) SAS/IXED: random coefficients model SAS/IXED code proc mixed data = cw; class gender child; model weight = gender cage gender*cage / ddfm = kenwardroger htype = 1 solution; random tercept cage / type = un subject = child gcorr; run; 37

38 IXED: selected OUTPUT Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) Child UN(2,1) Child UN(2,2) Child Residual Estimated G Correlation atrix Row Effect Child Col1 Col2 1 Intercept age Type 1 Tests of ixed Effects Effect Num D Den D Value Pr > Gender cage <.0001 cage*gender Correlation between tercept and slope: Variances: tercept 5.79, slope 0.033, residual Conclusions: the pattern of growth differs between males and females (p = ), as with earlier analysis. Average slopes unchanged. The End but hopefully just the begng! 38