Introduction to PROC MIXED


Table of Contents

1. Short description of methods of estimation used in PROC MIXED
2. Description of the syntax of PROC MIXED
3. References
4. Examples and comparisons of results from MIXED and GLM
   - balanced data: fixed effect model and mixed effect model
   - unbalanced data: mixed effect model

1. Short description of methods of estimation used in PROC MIXED

The SAS procedures GLM and MIXED can both be used to fit linear models. PROC GLM was designed to fit fixed effect models and was later amended to fit some random effect models through the RANDOM statement with the TEST option. The REPEATED statement in PROC GLM makes it possible to estimate and test repeated measures models with an arbitrary correlation structure for the repeated observations.

PROC MIXED was specifically designed to fit mixed effect models. It can model random and mixed effect data, repeated measures, spatial data, data with heterogeneous variances and autocorrelated observations. The MIXED procedure is more general than GLM in the sense that it gives the user more flexibility in specifying correlation structures, which is particularly useful in repeated measures and random effect models. It has to be emphasized, however, that PROC MIXED is not an extended, more general version of GLM. The two procedures are based on different statistical principles and use different estimation methods.

GLM uses ordinary least squares (OLS) estimation: the parameter estimates are the values of the model parameters that minimize the sum of squared differences between the observed and predicted values of the dependent variable. That approach leads to the familiar analysis of variance table, in which the variability in the dependent variable (the total sum of squares) is divided into variability due to different sources (sums of squares for the effects in the model). PROC MIXED does not produce an analysis of variance table, because it uses estimation methods based on different principles.
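The OLS criterion described above can be written compactly in matrix form. The sketch below assumes the fixed effect model is written as y = Xb + e and that it contains an intercept (needed for the sum of squares decomposition); the symbols \hat{y}_i and \bar{y} are introduced here only for illustration.

```latex
\begin{align*}
\hat{b}_{\mathrm{OLS}} &= \arg\min_{b}\,(y - Xb)'(y - Xb)
  \quad\Longrightarrow\quad X'X\,\hat{b}_{\mathrm{OLS}} = X'y, \\
\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{total SS}}
  &= \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{model SS}}
   + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{error SS}} .
\end{align*}
```

The second line is the decomposition that the analysis of variance table reports, with the model sum of squares split further among the effects.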
PROC MIXED has three options for the method of estimation: ML (maximum likelihood), REML (restricted or residual maximum likelihood, the default method) and MIVQUE0 (minimum variance quadratic unbiased estimation). ML and REML are based on the maximum likelihood approach. They require the assumption that the distribution of the dependent variable (that is, of the error term and the random effects) is normal. ML is the regular maximum likelihood method: its parameter estimates are the values of the model parameters that maximize the likelihood function. REML is a variant of maximum likelihood estimation; REML estimators are obtained by maximizing not the whole likelihood function but only the part of it that is invariant to the fixed effects part of the linear model.

In other words, if y = Xb + Zu + e, where Xb is the fixed effects part, Zu is the random effects part and e is the error term, then the REML estimates are obtained by maximizing the likelihood function of K'y, where K is a full rank matrix with columns orthogonal to the columns of the X matrix, that is, K'X = 0. This leads to the REML estimator of the variance-covariance matrix of y, say V, which does not depend on the choice of the matrix K. Then the generalized least squares equations, known also from the weighted least squares approach and the GLM procedure, X'(inverse of V)Xb = X'(inverse of V)y, with V replaced by its estimator, are solved to obtain the estimates of the fixed effects parameters b.

It is assumed that the random effects u and the error vector e are normally distributed, uncorrelated and have expectation 0. Under the assumption that u and e are not correlated, V, the variance-covariance matrix of y, is equal to ZGZ' + R, where G and R are the variance matrices of u and e, respectively.

Estimators of V can also be obtained in PROC MIXED by the MIVQUE0 method. For a short description of the method see reference (3), p. 506. This method has two advantages over ML and REML: it does not require the normality assumption for computing the estimators, and it does not involve iterations. However, simulation studies by Swallow and Monahan (1984) present evidence favoring ML and REML over MIVQUE0. PROC MIXED uses the MIVQUE0 estimates as starting values for the ML and REML procedures.

For balanced data the REML method of PROC MIXED provides estimators and hypothesis test results that are identical to ANOVA (the OLS method of GLM), provided that the ANOVA estimators of the variance components are not negative. The estimators, as in GLM, are unbiased and have minimum variance properties. The ML estimators are biased in that case. In the general case of unbalanced data neither the ML nor the REML estimators are unbiased, and they do not have to be equal to those obtained from PROC GLM.
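For reference, the model and the estimating equations described above can be collected in one place. This is a sketch in the document's notation (y, X, b, Z, u, e, V, G, R); the symbols n (number of observations), p (rank of X) and r (the generalized least squares residual) are introduced here for illustration, and the log likelihoods are given in their usual profiled form, which is an assumption about presentation rather than a quotation from the source.

```latex
\begin{align*}
y &= Xb + Zu + e, \qquad u \sim N(0, G), \quad e \sim N(0, R), \quad \operatorname{Cov}(u, e) = 0, \\
V &= \operatorname{Var}(y) = ZGZ' + R, \\
X'\hat{V}^{-1}X\,\hat{b} &= X'\hat{V}^{-1}y, \\
\ell_{\mathrm{ML}}(G, R) &= -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\,r'V^{-1}r - \tfrac{n}{2}\log(2\pi), \\
\ell_{\mathrm{REML}}(G, R) &= -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X'V^{-1}X|
   - \tfrac{1}{2}\,r'V^{-1}r - \tfrac{n-p}{2}\log(2\pi), \\
r &= y - X\,(X'V^{-1}X)^{-}X'V^{-1}y .
\end{align*}
```

The extra term in the REML criterion, the log determinant of X'V^{-1}X, is what accounts for the degrees of freedom used in estimating the fixed effects.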
There are many models involving forms of the variance-covariance structure of the observations that cannot be analyzed using PROC GLM with the RANDOM / TEST option or with the REPEATED statement. PROC MIXED can handle such cases. It also has to be mentioned that PROC GLM was designed for the analysis of fixed effects models, and all its computations are done under the assumption that there is only one variance component in the model, the error term. The RANDOM statement with the TEST option can be used to get the right tests when random effects are present in the model, but some printed results, such as variances and standard errors, will still be incorrect.

2. Description of the syntax of PROC MIXED

The PROC MIXED syntax is similar to the syntax of PROC GLM. There are, however, a few important differences:

- the RANDOM and REPEATED statements are used differently;
- random effects are not listed in the MODEL statement;
- GLM has MEANS and LSMEANS statements, whereas MIXED has only the LSMEANS statement;
- GLM offers Type I, II, III and IV tests for fixed effects, while MIXED offers Type I and Type III.

The following is the general form of the PROC MIXED step:

    PROC MIXED options;
    CLASS variable-list;
    MODEL dependent = fixed-effects / options;
    RANDOM random-effects / options;
    REPEATED repeated-effects / options;
    CONTRAST 'label' fixed-effect values random-effect values / options;
    ESTIMATE 'label' fixed-effect values random-effect values / options;
    LSMEANS fixed-effects / options;
    MAKE 'table' OUT=SAS-data-set <options>;
    RUN;

The CONTRAST, ESTIMATE, LSMEANS, MAKE and RANDOM statements can appear multiple times; all other statements can appear only once. The PROC MIXED and MODEL statements are required. The MODEL statement must appear after the CLASS statement if a CLASS statement is used. The CONTRAST, ESTIMATE, LSMEANS, RANDOM and REPEATED statements must follow the MODEL statement. The CONTRAST and ESTIMATE statements must follow the RANDOM statement if a RANDOM statement is used.

A detailed description of all functions and options of each PROC MIXED statement is given in SAS/STAT Software Changes and Enhancements through Release 6.11 and SAS/STAT Software Changes and Enhancements for Release 6.12, SAS Institute Inc. (1996). The following is a short summary of selected, most often used, MIXED procedure statements.

PROC MIXED <options>;

Selected options:

DATA=SAS-data-set
    Names the SAS data set to be used by PROC MIXED. The default is the most recently created data set.

METHOD=REML, METHOD=ML, METHOD=MIVQUE0
    Specifies the estimation method. See Section 1 for a brief description of the methods and references. REML is the default method.

COVTEST
    Prints asymptotic standard errors and Wald Z-tests for the variance-covariance structure parameter estimates. For example, if a random effect A is included in the model, then the estimator of the variance of A will be printed together with the Wald test of the hypothesis that the variance of A is 0. The COVTEST option is specified after PROC MIXED and before the semicolon. For example:

    proc mixed data=mydata method=reml covtest;

CLASS variables;

Lists the classification variables (categorical independent variables in the model). For example:

    proc mixed data=mydata covtest;
    class group gender agecat;

MODEL dependent = fixed-effects </options>;

The MODEL statement names a single dependent variable and the fixed effects, that is, the independent variables that are not random. An intercept is included in the model by default. The NOINT option can be used to remove the intercept.

NOTE: Even though PROC MIXED allows only one dependent variable in the MODEL statement, it is possible to use it to model, for example, multivariate repeated measures. In such a case, the data set has to be properly prepared and should contain a variable indicating the measurement type. The correlation between observations on the same unit has to be modeled properly with the REPEATED statement. For example, suppose your observed data consist of heights and weights of children measured over several successive years. Your input data set should then contain variables similar to the following:

- Y, all of the heights and weights, with a separate observation (line in the data file) for each
- VAR, indicating whether the measurement is a height or a weight
- YEAR, indicating the year of measurement
- CHILD, indicating the child on which the measurement was taken.

Selected options of the MODEL statement:

CHISQ
    Requests that chi-square tests (Wald tests) be performed for all fixed effects in addition to the F-tests.

DDFM=RESIDUAL, DDFM=CONTAIN, DDFM=BETWITHIN, DDFM=SATTERTH
    The DDFM= option specifies the method for computing the denominator degrees of freedom for the tests of fixed effects. DDFM=SATTERTH results in the Satterthwaite approximation for the denominator degrees of freedom. For balanced designs with random effects it produces the same test results as the RANDOM / TEST option in PROC GLM (if the default METHOD=REML is used in PROC MIXED).

P
    Requests that the predicted values be printed.

RANDOM random-effects </options>;

The RANDOM statement defines the random effects in the model. It can be used to specify traditional variance components (independent random effects with different variances) or to list correlated random effects and specify a correlation structure for them with the TYPE=covariance-structure option. A variety of structures are available (see references 5 and 6); the most often used are TYPE=VC, a variance components structure, and TYPE=UN, an unstructured, that is, arbitrary covariance matrix. TYPE=VC is the default structure. In the following example, the effect of subject is random.

    proc mixed data=one method=reml covtest;
    class gender treat subject;
    model y = gender treat gender*treat / ddfm=satterth;
    random subject(gender);
    run;

In the next example two random effects are specified (besides the error term), and it is assumed that they are correlated. The intercept and the slope coefficient in the regression equation each have a fixed part and a random part, and the random parts are assumed to be correlated. The model is

    yij = a0 + aj + b0*time + bj*time + eij,

where yij is observation i for person j. The random effects aj, bj and eij are assumed to have normal distributions with mean zero and different variances, and it is also assumed that aj and bj are correlated.

    proc mixed data=one method=reml covtest;
    class person;
    model y = time / solution;
    random intercept time / type=un subject=person;
    run;

REPEATED repeated-effects / options;

The REPEATED statement is used in PROC MIXED to specify the covariance structure of the error term. The repeated effect has to be categorical, has to appear in the CLASS statement, and the data have to be sorted accordingly. For example, suppose that for each subject a measurement was taken at five equally spaced time points. Time is the repeated effect, and the data have to be sorted by subject and by time within each subject. If time is also used as a continuous independent variable in the model, then a new variable, say t, identical to time has to be defined, and t should be used in the CLASS and REPEATED statements. For example:

    data one;
    set one;
    t = time;
    run;

    proc sort data=one;
    by group id t;
    run;

    proc mixed data=one covtest;
    class t group id;
    model y = group time group*time;
    repeated t / type=ar(1) subject=id;
    run;

The TYPE= option in the REPEATED statement specifies the type of the error correlation structure. The one specified in the above example is the first-order autoregressive correlation. The SUBJECT= option is needed to identify the observations that are correlated. Observations within the same subject are correlated with the type of correlation specified in TYPE=; observations from different subjects are independent.
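To make the structure concrete, the following sketch writes out the first-order autoregressive covariance matrix for the five equally spaced time points of one subject. The symbols \sigma^2 (the error variance), \rho (the correlation between adjacent time points) and R_i (the covariance block of the errors for subject i) are names introduced here for illustration.

```latex
R_i = \sigma^2
\begin{pmatrix}
1      & \rho   & \rho^2 & \rho^3 & \rho^4 \\
\rho   & 1      & \rho   & \rho^2 & \rho^3 \\
\rho^2 & \rho   & 1      & \rho   & \rho^2 \\
\rho^3 & \rho^2 & \rho   & 1      & \rho   \\
\rho^4 & \rho^3 & \rho^2 & \rho   & 1
\end{pmatrix},
\qquad
\operatorname{Cov}(e_{ij}, e_{ik}) = \sigma^2 \rho^{|j-k|} .
```

The full error covariance matrix R is then block diagonal with one such block per subject, which is how independence between subjects enters the model. Only two parameters are estimated, whatever the number of time points.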

The TYPE= option allows for many types of correlation structures. The most commonly used are autoregressive, compound symmetry, Huynh-Feldt, Toeplitz, variance components, unstructured and spatial. For the complete list and examples, see references (7) and (8).

CONTRAST 'label' fixed-effect values random-effect values / options;
ESTIMATE 'label' fixed-effect values random-effect values / options;

The CONTRAST statement is used when custom hypothesis tests are needed, and the ESTIMATE statement when custom estimates are needed. Although they were extended in PROC MIXED to include random effects, their use is very similar to that of the CONTRAST and ESTIMATE statements in PROC GLM.

- LABEL is required for every CONTRAST or ESTIMATE statement. It identifies the contrast or estimated parameter on the output. It cannot be longer than 20 characters.
- FIXED-EFFECT is the name of an effect appearing in the MODEL statement.
- RANDOM-EFFECT is the name of an effect appearing in the RANDOM statement.
- VALUES are the coefficients of the contrast to be tested or of the parameter to be estimated.

For example, suppose that we want to test whether there is a significant effect of treat in group 2, where treat has three levels and group has four levels. We also want to estimate the mean for treat 1 in group 2, the mean for treat 2 in group 2 and the difference between these two means. We will need the following CONTRAST and ESTIMATE statements to obtain these results.

    proc mixed data=one method=reml covtest;
    class group treat subject;
    model y = group treat group*treat / ddfm=satterth;
    random subject(group);
    contrast 'treat in group 2' treat group*treat ,
                                treat group*treat ;
    estimate 'treat1 group2 mean' intercept 1 group treat group*treat ;
    estimate 'treat2 group2 mean' intercept 1 group treat group*treat ;
    estimate 'mean diff t1g2-t2g2' treat group*treat ;

LSMEANS fixed-effects / options;

LSMEANS computes the least squares means of fixed effects. The ADJUST= option requests a multiple comparison adjustment to the p-values for pair-wise comparisons of means. The following adjustments are available: BON (Bonferroni), DUNNETT, SCHEFFE, SIDAK, SIMULATE, SMM (GT2) and TUKEY. The ADJUST= option results in all possible pair-wise comparisons. If only comparisons with a control level are needed, then PDIFF=CONTROL should be used in addition to the ADJUST= option. The SLICE= option tests the significance of one effect at each level of another effect.

For example, suppose that we want to compute the least squares means for group*treat and do pair-wise comparisons with the control being group 1 and treat 1. We also want to test the significance of the treat effect within each group level using the SLICE= option.

    proc mixed data=one method=reml covtest;
    class group treat subject;
    model y = group treat group*treat / ddfm=satterth;
    random subject(group);
    lsmeans group*treat / adjust=bon pdiff=control('1' '1') slice=group;
    run;

MAKE 'table' OUT=SAS-data-set <options>;

The MAKE statement converts any table produced by PROC MIXED into a SAS data set. The NOPRINT option can be used to prevent printing of the requested table. Only requested or default output can be converted into a SAS data set. Hence, in particular, the P option has to be used in the MODEL statement to produce a data set with predicted values, and the LSMEANS statement has to be included to output least squares means. For example:

    proc mixed data=one method=reml covtest;
    class group treat subject;
    model y = group treat group*treat / ddfm=satterth p;
    random subject(group);
    lsmeans group*treat / adjust=bon pdiff=control('1' '1') slice=group;
    make 'LSMeans' out=gtmeans;
    make 'predicted' out=pred noprint;
    run;

    proc print data=gtmeans;
    run;

    proc print data=pred;
    run;

3. References

Statistics books:

1. Searle, Shayle R. (1987). Linear Models for Unbalanced Data. John Wiley & Sons.
2. Searle, Shayle R. (1971). Linear Models. John Wiley & Sons.
3. Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. John Wiley & Sons.
4. Verbeke, G., Molenberghs, G. (Editors) (1997). Linear Mixed Models in Practice: A SAS-Oriented Approach. Springer-Verlag.

SAS Institute books:

5. Littell, Ramon C., Milliken, George A., Stroup, Walter W., Wolfinger, Russell D. (1996). SAS System for Mixed Models. SAS Institute Inc.
6. SAS Institute Course Notes (1996). Advanced General Linear Models with an Emphasis on Mixed Models. SAS Institute Inc.
7. SAS/STAT Software Changes and Enhancements through Release 6.11. SAS Institute Inc.
8. SAS/STAT Software Changes and Enhancements for Release 6.12. SAS Institute Inc.

4. Examples and comparisons of the results from PROC MIXED and PROC GLM

Example 1. Fixed effect model, balanced data.

In this example, 36 subjects are randomly assigned to 12 group-treatment combinations, 3 to each combination. There are three treatments and four groups. In the following program, the factor treat with 3 levels is the effect of the treatment and the factor group with 4 levels is the effect of the group. As you can see below, the results from both procedures are identical.

Program:

    options ls=76;
    data one;
    input y group treat subject;
    cards;

    ;
    run;

    proc mixed data=one method=reml;
    class group treat;
    model y = group treat group*treat;
    lsmeans group*treat / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'treat in group 2' treat group*treat ,
                                treat group*treat ;
    estimate 'treat1 group2 mean' intercept 1 group treat group*treat ;
    estimate 'treat2 group2 mean' intercept 1 group treat group*treat ;
    estimate 'mean diff t1g2-t2g2' treat group*treat ;

    proc glm data=one;
    class group treat;
    model y = group treat group*treat;
    lsmeans group*treat / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'treat in group 2' treat group*treat ,
                                treat group*treat ;
    estimate 'treat1 group2 mean' intercept 1 group treat 1 0 0 group*treat ;
    estimate 'treat2 group2 mean' intercept 1 group treat group*treat ;
    estimate 'mean diff t1g2-t2g2' treat group*treat ;

Results:

[Output listing for Example 1; the numeric values were not preserved. The tables printed are listed below with their columns and rows.]

The MIXED Procedure
- Class Level Information: GROUP, TREAT
- Tests of Fixed Effects (Source, NDF, DDF, Type III F, Pr > F): GROUP, TREAT, GROUP*TREAT
- ESTIMATE Statement Results (Parameter, Estimate, Std Error, DF, t, Pr > |t|): treat1 group2 mean, treat2 group2 mean, mean diff t1g2-t2g2
- CONTRAST Statement Results (Source, NDF, DDF, F, Pr > F): treat in group 2
- Least Squares Means (Effect, GROUP, TREAT, LSMEAN, Std Error): one row for each of the 12 GROUP*TREAT cells
- Differences of Least Squares Means (Effect, GROUP, TREAT, _GROUP, _TREAT, Difference, Std Error, DF, t, Pr > |t|, Adjustment, Adj P): 11 comparisons with the control cell, Bonferroni adjustment
- Tests of Effect Slices (Effect, GROUP, NDF, DDF, F, Pr > F): GROUP*TREAT sliced by each of the 4 groups

General Linear Models Procedure
- Class Level Information: GROUP, TREAT
- Analysis of variance for dependent variable Y (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F): Model, Error, Corrected Total; followed by R-Square, C.V., Root MSE, Y Mean
- Type III tests (Source, DF, Type III SS, Mean Square, F Value, Pr > F): GROUP, TREAT, GROUP*TREAT
- Least Squares Means, adjustment for multiple comparisons: Bonferroni (GROUP, TREAT, Y LSMEAN, Pr > |T| H0: LSMEAN=CONTROL)
- GROUP*TREAT Effect Sliced by GROUP for Y (GROUP, DF, Sum of Squares, Mean Square, F Value, Pr > F)
- Contrasts (Contrast, DF, Contrast SS, Mean Square, F Value, Pr > F): treat in group 2
- Estimates (Parameter, Estimate, T for H0: Parameter=0, Pr > |T|, Std Error of Estimate): treat1 group2 mean, treat2 group2 mean, mean diff t1g2-t2g2

Example 2. Mixed effect model, balanced data.

In this example, 12 subjects are randomly assigned to 4 groups, 3 to each group. There are three observations for each subject, corresponding to measurements taken at times 1, 2 and 3. In the following program, the factor time with 3 levels is the effect of time and the factor group with 4 levels is the effect of the group. A mixed effect model with fixed effects of group and time and a random effect of subject will be used to analyze the data. It is assumed that the effect of the subject has a normal distribution with mean 0 and variance sigmas squared (it measures between-subject variability). It is also assumed that the error term has a normal distribution with mean 0 and variance sigmae squared (it measures within-subject error), and that the error and subject effects are not correlated.

As you can see below, the results of MIXED and GLM are not identical in every respect. The F and p-values for the tests are the same, provided that the values from PROC MIXED are compared with the Tests of Hypotheses for Mixed Model Analysis of Variance from PROC GLM, not with the main General Linear Models Procedure ANOVA table. The values in the main ANOVA table of PROC GLM are incorrect for this example; they are computed under the assumption that subject is a fixed effect.

However, the standard errors of the least squares means and of the requested estimates are not the same for PROC MIXED and PROC GLM. The ones printed by PROC MIXED are correct. Again, PROC GLM computed the standard errors assuming that the subject effect is fixed. Note that the standard error for the third estimate, the mean difference between time 1 and time 2 in group 2, is the same for both. This is because the effect of the subject cancels out when that difference is computed.

Also note that the PROC GLM results printed in the Tests of Hypotheses table include the F-test for the significance of the subject effect. That test is not printed by PROC MIXED; the corresponding table includes only the fixed effects. The estimates of the variances of the random effects, in this case sigmas squared (variance of the subject effect) and sigmae squared (variance of the error term), are printed in the table named Covariance Parameter Estimates. The test of significance is the Wald test.

The estimates are consistent with the PROC GLM results. The residual variance in PROC MIXED is the same as MSS (the mean square) for the error in PROC GLM. The subject variance can be computed from the GLM Type III Expected Mean Square table:

    Source            Type III Expected Mean Square
    GROUP             Var(Error) + 3 Var(SUBJECT(GROUP)) + Q(GROUP,GROUP*TIME)
    SUBJECT(GROUP)    Var(Error) + 3 Var(SUBJECT(GROUP))
    TIME              Var(Error) + Q(TIME,GROUP*TIME)
    GROUP*TIME        Var(Error) + Q(GROUP*TIME)

According to that table, the expected value of MSS(subject) is var(error) + 3*var(subject). Hence var(subject) = (MSS(subject) - var(error))/3. Since the expected value of MSS(error) is var(error), we can use MSS(error) as the estimate of var(error) and replace var(error) with MSS(error) in the above formula.
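The moment-matching step just described can be written out as follows. The notation \sigma_s^2 and \sigma_e^2 stands for the "sigmas squared" and "sigmae squared" of the text, and MS denotes a mean square from the PROC GLM output; the coefficient 3 is the number of observations per subject taken from the expected mean square table.

```latex
\begin{align*}
E[\mathrm{MS}_{\mathrm{subject(group)}}] &= \sigma_e^2 + 3\,\sigma_s^2, &
E[\mathrm{MS}_{\mathrm{error}}] &= \sigma_e^2, \\
\hat{\sigma}_e^2 &= \mathrm{MS}_{\mathrm{error}}, &
\hat{\sigma}_s^2 &= \frac{\mathrm{MS}_{\mathrm{subject(group)}} - \mathrm{MS}_{\mathrm{error}}}{3} .
\end{align*}
```

For balanced data these ANOVA estimators coincide with the REML estimates whenever the difference in the numerator is not negative, which is the condition stated in Section 1.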
Thus, var(subject) = (MSS(subject) - MSS(error))/3 = 3.5139, which is the same as the value printed in the PROC MIXED Covariance Parameter Estimates table for the subject.

Program:

    options ls=76;
    data one;
    input y group time subject;
    cards;
    ;
    run;

    proc sort data=one;
    by group subject time;
    run;

    proc mixed data=one method=reml covtest;
    class group time subject;
    model y = group time group*time / ddfm=satterth;
    random subject(group);
    lsmeans group*time / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'time in group 2' time group*time ,
                               time group*time ;
    estimate 'time1 group2 mean' intercept 1 group time group*time ;
    estimate 'time2 group2 mean' intercept 1 group time group*time ;
    estimate 'mean diff t1g2-t2g2' time group*time ;

    proc glm data=one;
    class group time subject;
    model y = group subject(group) time group*time;
    random subject(group) / test;
    lsmeans group*time / stderr;
    lsmeans group*time / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'time in group 2' time group*time ,
                               time group*time ;
    estimate 'time1 group2 mean' intercept 1 group time group*time ;
    estimate 'time2 group2 mean' intercept 1 group time group*time ;
    estimate 'mean diff t1g2-t2g2' time group*time ;

Results:

[Output listing for Example 2; the numeric values were not preserved. The tables printed are listed below with their columns and rows.]

The MIXED Procedure
- Class Level Information: GROUP, TIME, SUBJECT
- Covariance Parameter Estimates (REML) (Cov Parm, Estimate, Std Error, Z, Pr > |Z|): SUBJECT(GROUP), Residual
- Tests of Fixed Effects (Source, NDF, DDF, Type III F, Pr > F): GROUP, TIME, GROUP*TIME
- ESTIMATE Statement Results (Parameter, Estimate, Std Error, DF, t, Pr > |t|): time1 group2 mean, time2 group2 mean, mean diff t1g2-t2g2
- CONTRAST Statement Results (Source, NDF, DDF, F, Pr > F): time in group 2
- Least Squares Means (Effect, GROUP, TIME, LSMEAN, Std Error, DF, t, Pr > |t|): one row for each of the 12 GROUP*TIME cells
- Tests of Effect Slices (Effect, GROUP, NDF, DDF, F, Pr > F): GROUP*TIME sliced by each of the 4 groups

General Linear Models Procedure
- Class Level Information: GROUP, TIME, SUBJECT
- Analysis of variance for dependent variable Y (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F): Model, Error, Corrected Total; followed by R-Square, C.V., Root MSE, Y Mean
- Type III tests (Source, DF, Type III SS, Mean Square, F Value, Pr > F): GROUP, SUBJECT(GROUP), TIME, GROUP*TIME
- Type III Expected Mean Square:

      GROUP             Var(Error) + 3 Var(SUBJECT(GROUP)) + Q(GROUP,GROUP*TIME)
      SUBJECT(GROUP)    Var(Error) + 3 Var(SUBJECT(GROUP))
      TIME              Var(Error) + Q(TIME,GROUP*TIME)
      GROUP*TIME        Var(Error) + Q(GROUP*TIME)

- Tests of Hypotheses for Mixed Model Analysis of Variance, dependent variable Y (DF, Type III MS, Denominator DF, Denominator MS, F Value, Pr > F):

      Source: GROUP *            Error: MS(SUBJECT(GROUP))
      Source: SUBJECT(GROUP)     Error: MS(Error)
      Source: TIME *             Error: MS(Error)
      Source: GROUP*TIME         Error: MS(Error)
      * - This test assumes one or more other fixed effects are zero.

- Least Squares Means (GROUP, TIME, Y LSMEAN, Std Err LSMEAN, Pr > |T| H0: LSMEAN=0)
- GROUP*TIME Effect Sliced by GROUP for Y (GROUP, DF, Sum of Squares, Mean Square, F Value, Pr > F)
- Contrasts (Contrast, DF, Contrast SS, Mean Square, F Value, Pr > F): time in group 2
- Estimates (Parameter, Estimate, T for H0: Parameter=0, Pr > |T|, Std Error of Estimate): time1 group2 mean, time2 group2 mean, mean diff t1g2-t2g2

Example 3. Mixed effect model, unbalanced data.

In this example, there are 2 subjects in group 1, 3 in group 2, 4 in group 3 and 3 in group 4. For subjects in groups 1 and 3 there are three observations per subject, corresponding to measurements taken under conditions 1, 2 and 3. For subjects in groups 2 and 4 there are two observations per subject, corresponding to measurements taken under different conditions, 4 and 5. In the following program, the factor cond with 5 levels is the effect of the condition and the factor group with 4 levels is the effect of the group. A mixed effect model with fixed effects of group and cond(group) and a random effect of subject will be used to analyze the data. It is assumed that the effect of the subject has a normal distribution with mean 0 and variance sigmas squared (it measures between-subject variability). It is also assumed that the error term has a normal distribution with mean 0 and variance sigmae squared (it measures within-subject variability), and that the error and subject effects are not correlated.

Note the use of the option E3 in the MODEL statement. It makes PROC MIXED print the coefficients of the Type III contrasts for the hypotheses about the model effects. As can be seen below, the results of PROC MIXED and PROC GLM are different in this case.

Program:

    options ls=76;
    data one;
    input y group cond subject;
    cards;

    ;
    run;

    proc sort data=one;
    by group subject cond;
    run;

    proc mixed data=one method=reml covtest;
    class group cond subject;
    model y = group cond(group) / ddfm=satterth e3;
    random subject(group);
    lsmeans cond(group) / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'cond 1 vs 2 in group 1' cond(group) ;
    contrast 'cond 1 vs 2 in group 3' cond(group) ;
    estimate 'diff c1g1-c1g3' group cond(group) ;

    proc glm data=one;
    class group cond subject;
    model y = group subject(group) cond(group);
    random subject(group) / test;
    lsmeans cond(group) / stderr;
    lsmeans cond(group) / adjust=bon pdiff=control('1' '1') slice=group;
    contrast 'cond 1 vs 2 in group 1' cond(group) ;
    contrast 'cond 1 vs 2 in group 3' cond(group) ;
    estimate 'diff c1g1-c1g3' group cond(group) ;

Results:

[Output listing for Example 3; the numeric values were not preserved. The tables printed are listed below with their columns and rows.]

The MIXED Procedure
- Class Level Information: GROUP, COND, SUBJECT
- Covariance Parameter Estimates (REML) (Cov Parm, Estimate, Std Error, Z, Pr > |Z|): SUBJECT(GROUP), Residual
- Type III Coefficients for COND(GROUP) (Effect, GROUP, COND, Row 1 to Row 6): INTERCEPT, 4 GROUP rows, 10 COND(GROUP) rows
- Tests of Fixed Effects (Source, NDF, DDF, Type III F, Pr > F): GROUP, COND(GROUP)
- ESTIMATE Statement Results (Parameter, Estimate, Std Error, DF, t, Pr > |t|): diff c1g1-c1g3
- CONTRAST Statement Results (Source, NDF, DDF, F, Pr > F): cond 1 vs 2 in group 1, cond 1 vs 2 in group 3
- Least Squares Means (Effect, GROUP, COND, LSMEAN, Std Error, DF, t, Pr > |t|): one row for each of the 10 COND(GROUP) cells
- Tests of Effect Slices (Effect, GROUP, NDF, DDF, F, Pr > F): COND(GROUP) sliced by each of the 4 groups

General Linear Models Procedure
- Class Level Information: GROUP, COND, SUBJECT
- Analysis of variance for dependent variable Y (Source, DF, Sum of Squares, Mean Square, F Value, Pr > F): Model, Error, Corrected Total; followed by R-Square, C.V., Root MSE, Y Mean
- Type III tests (Source, DF, Type III SS, Mean Square, F Value, Pr > F): GROUP, SUBJECT(GROUP), COND(GROUP)
- Type III Expected Mean Square (the numeric multipliers of Var(SUBJECT(GROUP)) were not preserved):

      GROUP             Var(Error) + ... Var(SUBJECT(GROUP)) + Q(GROUP,COND(GROUP))
      SUBJECT(GROUP)    Var(Error) + ... Var(SUBJECT(GROUP))
      COND(GROUP)       Var(Error) + Q(COND(GROUP))

- Tests of Hypotheses for Mixed Model Analysis of Variance (DF, Type III MS, Denominator DF, Denominator MS, F Value, Pr > F):

      Source: GROUP *            Error: a linear combination of MS(SUBJECT(GROUP)) and MS(Error)
      Source: SUBJECT(GROUP)     Error: MS(Error)
      Source: COND(GROUP)        Error: MS(Error)
      * - This test assumes one or more other fixed effects are zero.

- Least Squares Means (COND, GROUP, Y LSMEAN, Std Err LSMEAN, Pr > |T| H0: LSMEAN=0)
- COND(GROUP) Effect Sliced by GROUP for Y (GROUP, DF, Sum of Squares, Mean Square, F Value, Pr > F)
- Contrasts for dependent variable Y (Contrast, DF, Contrast SS, Mean Square, F Value, Pr > F): cond 1 vs 2 in group 1, cond 1 vs 2 in group 3
- Estimates (Parameter, Estimate, T for H0: Parameter=0, Pr > |T|, Std Error of Estimate): diff c1g1-c1g3
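The CONTRAST and ESTIMATE statements in the examples need one coefficient per level of each effect they name. As an illustration for the 4 x 3 layout of Example 1, the following sketch shows one valid set of coefficients. It is a reconstruction under stated assumptions, not the original program: the data set sketch, its simulated values and the variable rep are hypothetical, and the cell order assumes CLASS group treat, so that the group*treat coefficients run g1t1, g1t2, g1t3, g2t1, and so on. Any other pair of rows spanning the same treat differences within group 2 would give the same contrast F-test.

```sas
/* Hypothetical balanced data: 4 groups x 3 treatments x 3 subjects.  */
/* The values are simulated for illustration only.                    */
data sketch;
  call streaminit(2024);
  do group = 1 to 4;
    do treat = 1 to 3;
      do rep = 1 to 3;
        y = 10 + group + 2*treat + rand('normal');
        output;
      end;
    end;
  end;
run;

proc mixed data=sketch method=reml;
  class group treat;
  model y = group treat group*treat;

  /* Effect of treat within group 2: two rows, treat 1 vs 2 and 1 vs 3. */
  contrast 'treat in group 2'
    treat 1 -1  0  group*treat 0 0 0  1 -1  0  0 0 0  0 0 0,
    treat 1  0 -1  group*treat 0 0 0  1  0 -1  0 0 0  0 0 0;

  /* Cell mean = intercept + group 2 + treat level + its interaction cell. */
  estimate 'treat1 group2 mean'
    intercept 1 group 0 1 0 0 treat 1 0 0
    group*treat 0 0 0  1 0 0  0 0 0  0 0 0;
  estimate 'treat2 group2 mean'
    intercept 1 group 0 1 0 0 treat 0 1 0
    group*treat 0 0 0  0 1 0  0 0 0  0 0 0;

  /* Difference of the two cell means: intercept and group cancel out. */
  estimate 'mean diff t1g2-t2g2'
    treat 1 -1 0  group*treat 0 0 0  1 -1 0  0 0 0  0 0 0;
run;
```

The same pattern carries over to Example 2 with time in place of treat. A quick check on a set of coefficients is that the two cell-mean estimates should match the corresponding rows of the LSMEANS group*treat table.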


9.2 User s Guide SAS/STAT. The MIXED Procedure. (Book Excerpt) SAS Documentation SAS/STAT 9.2 User s Guide The MIXED Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete

More information

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2 University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

861 Example SPLH. 5 page 1. prefer to have. New data in. SPSS Syntax FILE HANDLE. VARSTOCASESS /MAKE rt. COMPUTE mean=2. COMPUTE sal=2. END IF.

861 Example SPLH. 5 page 1. prefer to have. New data in. SPSS Syntax FILE HANDLE. VARSTOCASESS /MAKE rt. COMPUTE mean=2. COMPUTE sal=2. END IF. SPLH 861 Example 5 page 1 Multivariate Models for Repeated Measures Response Times in Older and Younger Adults These data were collected as part of my masters thesis, and are unpublished in this form (to

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995.

Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990 1995. Lecture 18 1. Random intercepts and slopes 2. Notation for mixed effects models 3. Comparing nested models 4. Multilevel/Hierarchical models 5. SAS versions of R models in Gelman and Hill, chapter 12 1

More information

ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS

ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS Many treatments are equally spaced (incremented). This provides us with the opportunity to look at the response curve

More information

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA

Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA Paper P-702 Individual Growth Analysis Using PROC MIXED Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Individual growth models are designed for exploring longitudinal data on individuals

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form. One-Degree-of-Freedom Tests Test for group occasion interactions has (number of groups 1) number of occasions 1) degrees of freedom. This can dilute the significance of a departure from the null hypothesis.

More information

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations

Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations Research Article TheScientificWorldJOURNAL (2011) 11, 42 76 TSW Child Health & Human Development ISSN 1537-744X; DOI 10.1100/tsw.2011.2 Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts,

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format: Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

Mixed Models. Jing Cheng Gayla Olbricht Nilupa Gunaratna Rebecca Kendall Alex Lipka Sudeshna Paul Benjamin Tyner. May 19, 2005

Mixed Models. Jing Cheng Gayla Olbricht Nilupa Gunaratna Rebecca Kendall Alex Lipka Sudeshna Paul Benjamin Tyner. May 19, 2005 Mixed Models Jing Cheng Gayla Olbricht Nilupa Gunaratna Rebecca Kendall Alex Lipka Sudeshna Paul Benjamin Tyner May 19, 2005 1 Contents 1 Introduction 3 2 Two-Way Mixed Effects Models 3 2.1 Pearl Data

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA

An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA ABSTRACT An Introduction to Statistical Tests for the SAS Programmer Sara Beck, Fred Hutchinson Cancer Research Center, Seattle, WA Often SAS Programmers find themselves in situations where performing

More information

ADVANCED FORECASTING MODELS USING SAS SOFTWARE

ADVANCED FORECASTING MODELS USING SAS SOFTWARE ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 gjha_eco@iari.res.in 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

More information

Interactions involving Categorical Predictors

Interactions involving Categorical Predictors Interactions involving Categorical Predictors Today s Class: To CLASS or not to CLASS: Manual vs. program-created differences among groups Interactions of continuous and categorical predictors Interactions

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Use of deviance statistics for comparing models

Use of deviance statistics for comparing models A likelihood-ratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter

More information

Experimental Design for Influential Factors of Rates on Massive Open Online Courses

Experimental Design for Influential Factors of Rates on Massive Open Online Courses Experimental Design for Influential Factors of Rates on Massive Open Online Courses December 12, 2014 Ning Li nli7@stevens.edu Qing Wei qwei1@stevens.edu Yating Lan ylan2@stevens.edu Yilin Wei ywei12@stevens.edu

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

1 Theory: The General Linear Model

1 Theory: The General Linear Model QMIN GLM Theory - 1.1 1 Theory: The General Linear Model 1.1 Introduction Before digital computers, statistics textbooks spoke of three procedures regression, the analysis of variance (ANOVA), and the

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY

SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY SP10 From GLM to GLIMMIX-Which Model to Choose? Patricia B. Cerrito, University of Louisville, Louisville, KY ABSTRACT The purpose of this paper is to investigate several SAS procedures that are used in

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data. Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Statistical Functions in Excel

Statistical Functions in Excel Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

More information

Highlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression

Highlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression GLMM tutor Outline 1 Highlights the connections between different class of widely used models in psychological and biomedical studies. ANOVA Multiple Regression LM Logistic Regression GLM Correlated data

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Imputing Missing Data using SAS

Imputing Missing Data using SAS ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

More information

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

More information

From the help desk: Swamy s random-coefficients model

From the help desk: Swamy s random-coefficients model The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

More information

Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases:

Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: Profile Analysis Introduction Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: ) Comparing the same dependent variables

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Logistic (RLOGIST) Example #1

Logistic (RLOGIST) Example #1 Logistic (RLOGIST) Example #1 SUDAAN Statements and Results Illustrated EFFECTS RFORMAT, RLABEL REFLEVEL EXP option on MODEL statement Hosmer-Lemeshow Test Input Data Set(s): BRFWGT.SAS7bdat Example Using

More information

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS * Hun Myoung Park,

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

1.1. Simple Regression in Excel (Excel 2010).

1.1. Simple Regression in Excel (Excel 2010). .. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > Add-Ins > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under

More information

Longitudinal Data Analysis

Longitudinal Data Analysis Longitudinal Data Analysis Acknowledge: Professor Garrett Fitzmaurice INSTRUCTOR: Rino Bellocco Department of Statistics & Quantitative Methods University of Milano-Bicocca Department of Medical Epidemiology

More information

Financial Risk Management Exam Sample Questions/Answers

Financial Risk Management Exam Sample Questions/Answers Financial Risk Management Exam Sample Questions/Answers Prepared by Daniel HERLEMONT 1 2 3 4 5 6 Chapter 3 Fundamentals of Statistics FRM-99, Question 4 Random walk assumes that returns from one time period

More information

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

HLM software has been one of the leading statistical packages for hierarchical

HLM software has been one of the leading statistical packages for hierarchical Introductory Guide to HLM With HLM 7 Software 3 G. David Garson HLM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush

More information

The ANOVA for 2x2 Independent Groups Factorial Design

The ANOVA for 2x2 Independent Groups Factorial Design The ANOVA for 2x2 Independent Groups Factorial Design Please Note: In the analyses above I have tried to avoid using the terms "Independent Variable" and "Dependent Variable" (IV and DV) in order to emphasize

More information

Testing for Lack of Fit

Testing for Lack of Fit Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010 FOR THE SECOND PART OF THIS DOCUMENT GO TO www.uvm.edu/~dhowell/methods/supplements/mixed Models Repeated/Mixed Models for

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Multivariate Analysis of Variance (MANOVA) Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu Keywords: MANCOVA, special cases, assumptions, further reading, computations Introduction

More information
