# Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990-1995

## Transcription

Lecture outline:

1. Random intercepts and slopes
2. Notation for mixed effects models
3. Comparing nested models
4. Multilevel/Hierarchical models
5. SAS versions of R models in Gelman and Hill, chapter 12

### Random intercepts and slopes

Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from 1990-1995.

Variables in the data listing: Obs, family_id, income, year, expenses, debt (yes/no), cohort (A/B), time, income_1000s.

(Example data adapted from UCLA Academic Technology Services.)

Mean function:

Model A:

    class cohort year;
    model income_1000s = year cohort year*cohort;

Model B:

    class cohort;
    model income_1000s = year cohort year*cohort;

What's the difference?

Model A: mean function with year categorical.

Model B: mean function with year continuous. Interpretation: each cohort's slope is the mean annual change in income.

Use time = year - 1989 = 1, ..., 6 instead of year; this is better numerically.

    Proc Mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
    run;

Solution for Fixed Effects

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |

Write the equation for each cohort. Do the cohorts have different slopes?
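Writing the equation for each cohort amounts to adding the cohort-specific terms to the reference-level terms. The sketch below shows that bookkeeping in Python; the estimates are hypothetical placeholders (not the lecture's values), and it assumes the usual SAS convention that the last class level (cohort B) is the reference level with its estimates set to zero.

```python
# Hypothetical fixed-effect estimates (NOT the lecture's values).
# Assumption: cohort B is the reference level, so its cohort and
# time*cohort estimates are zero, as in the SAS "solution" output.
fixed = {
    "Intercept": 40.0,
    "time": 2.0,
    "cohort_A": -5.0,
    "time_x_cohort_A": 0.5,
}

def cohort_line(cohort, est):
    """Return (intercept, slope) of the fitted mean line for one cohort."""
    is_a = 1.0 if cohort == "A" else 0.0
    intercept = est["Intercept"] + is_a * est["cohort_A"]
    slope = est["time"] + is_a * est["time_x_cohort_A"]
    return intercept, slope

for c in ("A", "B"):
    b0, b1 = cohort_line(c, fixed)
    print(f"cohort {c}: income_1000s = {b0:.1f} + {b1:.1f} * time")
```

With this layout, the time*cohort A estimate is exactly the difference between the two cohort slopes, which is why its test answers the question about different slopes.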

How can we graph these lines? Ask for LSmeans:

    Proc Mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
      lsmeans time*cohort;
    run;

The SAS log reports, after the lsmeans statement:

    ERROR: Only class variables allowed in this effect.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE MIXED used (Total process time): real time 0.01 sec

What's wrong? time is not in the class statement: it is a continuous predictor, and lsmeans accepts only effects made of class variables.

Get fitted values to graph by adding points to the data set:

    data pred;
      input family_id time cohort $;
      year = time + 1989;
      cards;
    0 1 A
    0 6 A
    0 1 B
    0 6 B
    ;
    data family_income;
      set pred econ_long;
    run;

The added rows have no income value, so they do not affect the fit but still receive fitted values.

Proc Mixed does not have an output statement. Instead, there are options for the model statement; outpredm gives fitted means.

    Proc Mixed data=family_income;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution outpredm=fitted_values;
      random intercept / subject=family_id v vcorr;
    run;
    proc print data=fitted_values (obs=12);
    run;

The printout of fitted_values has the columns Obs, family_id, time, cohort, year, income, expenses, debt, income_1000s, Pred, StdErrPred, DF, Alpha, Lower, Upper, and Resid, with six rows for cohort A followed by six rows for cohort B.

    proc SGplot data=fitted_values;
      where family_id = 0;
      series x=year y=pred / group=cohort;
    run;

### Adding a random slope

Random intercept model:

    Proc Mixed data=family_income;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
    run;

Random slope and intercept model:

    Proc Mixed data=family_income;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept time / subject=family_id v vcorr;
    run;

time is a continuous predictor, so a random time effect is a random slope.

Fixed effects from random intercept model:

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |

Fixed effects from random slope and intercept model:

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |

The mean functions (two lines) look almost exactly the same; the change is in the p-values. Do the cohorts have different mean annual increases in income?

2 covariance parameters from random intercept model:

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | family_id | |
| Residual | | |

Income variance at each year = Res + Intercept.

Correlation between any two years for the same family:

$$\hat\rho = \frac{\text{Intercept}}{\text{Res} + \text{Intercept}}$$

Estimated V Correlation Matrix for family_id 1: a 6 × 6 matrix (Row by Col1 to Col6) with 1 on the diagonal and $\hat\rho$ in every off-diagonal cell.
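The structure of this correlation matrix follows directly from the two covariance parameters. As a rough sketch, the Python below builds the covariance matrix for one family under a random intercept model and converts it to correlations; the two variance components are hypothetical placeholders, not the lecture's estimates.

```python
# Hypothetical variance components (NOT the lecture's estimates).
var_intercept = 90.0   # variance of the random family intercepts
var_residual = 10.0    # residual variance
n_years = 6

# Covariance matrix for one family: the intercept variance is shared by
# every pair of years, and the residual variance is added on the diagonal.
V = [[var_intercept + (var_residual if r == c else 0.0)
      for c in range(n_years)]
     for r in range(n_years)]

# Correlation between any two different years in the same family.
rho = var_intercept / (var_intercept + var_residual)

# Convert V to a correlation matrix.
corr = [[V[r][c] / (V[r][r] * V[c][c]) ** 0.5 for c in range(n_years)]
        for r in range(n_years)]

print("variance at each year:", V[0][0])
print("correlation between years:", rho)
```

Every off-diagonal entry of `corr` equals `rho`, which is the compound symmetry pattern: equal variances and one common correlation, no matter how far apart the years are.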

3 covariance parameters from random slope and intercept model:

| Cov Parm | Subject | Estimate | Meaning |
|---|---|---|---|
| Intercept | family_id | | variance of random intercepts |
| time | family_id | | variance of random slopes |
| Residual | | | |

No longer have compound symmetry: in the Estimated V Correlation Matrix for family_id 1 (6 × 6, Row by Col1 to Col6), the correlations are no longer all equal.

The 3 covariance parameters from the random slope and intercept model also give changing income variance over time, along the diagonal of the Estimated V Matrix for family_id 1 (6 × 6, Row by Col1 to Col6).

The models for the mean function are the same in the two models, but the random effects are different. How do we compare the models to decide which fits better?
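To see why a random slope breaks compound symmetry, it can help to compute the implied covariance matrix by hand. The sketch below assumes independent random intercepts and slopes (no intercept-slope covariance, matching the three parameters listed) and uses hypothetical variance components rather than the lecture's estimates.

```python
# Hypothetical variance components (NOT the lecture's estimates).
var_int = 60.0    # variance of random intercepts
var_slope = 4.0   # variance of random slopes
var_res = 10.0    # residual variance
times = [1, 2, 3, 4, 5, 6]

def cov(s, t):
    """Covariance of one family's income at times s and t.

    Assumes the random intercept and random slope are independent,
    so Cov = var_int + s*t*var_slope, plus var_res when s == t.
    """
    return var_int + s * t * var_slope + (var_res if s == t else 0.0)

V = [[cov(s, t) for t in times] for s in times]
variances = [V[i][i] for i in range(len(times))]
corr = [[V[r][c] / (V[r][r] * V[c][c]) ** 0.5 for c in range(len(times))]
        for r in range(len(times))]

print("variance by year:", variances)
print("corr(time 1, time 2):", round(corr[0][1], 3))
print("corr(time 1, time 6):", round(corr[0][5], 3))
```

The diagonal grows with time because of the `s * t * var_slope` term, and the correlations differ from pair to pair, so neither of the compound symmetry properties holds.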

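### Notation for mixed effects models

Random intercept model:

$$\text{income}_{ijk} = (\beta_0 + b_{0k}) + \beta_1(\text{Cohort}_i) + \beta_2(\text{Year}_j) + \beta_3(\text{Cohort}_i \times \text{Year}_j) + \varepsilon_{ijk}$$

where $\{b_{0k}\}$ are independent Normal$(0, \sigma_b^2)$, the errors $\{\varepsilon_{ijk}\}$ are independent Normal$(0, \sigma_e^2)$, and $\{b_{0k}\}$ are independent of the errors $\{\varepsilon_{ijk}\}$.

For each family, there is 1 random effect (intercept) and 6 fixed effect parameters:

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |

Random slope and intercept model:

$$\text{income}_{ijk} = (\beta_0 + b_{0k}) + \beta_1(\text{Cohort}_i) + (\beta_2 + b_{2k})(\text{Year}_j) + \beta_3(\text{Cohort}_i \times \text{Year}_j) + \varepsilon_{ijk}$$

where $\{b_{0k}\}$ are independent Normal$(0, \sigma_0^2)$, $\{b_{2k}\}$ are independent Normal$(0, \sigma_2^2)$, the errors $\{\varepsilon_{ijk}\}$ are independent Normal$(0, \sigma_e^2)$, and $\{b_{0k}\}$, $\{b_{2k}\}$, and $\{\varepsilon_{ijk}\}$ are mutually independent.

For each family, there are 2 random effects (intercept and slope) and 6 fixed effect parameters:

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |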

Rearrange the models, putting random effects last:

$$\text{income}_{ijk} = \beta_0 + \beta_1(\text{Cohort}_i) + \beta_2(\text{Year}_j) + \beta_3(\text{Cohort}_i \times \text{Year}_j) + b_{0k} + \varepsilon_{ijk}$$

$$\text{income}_{ijk} = \beta_0 + \beta_1(\text{Cohort}_i) + \beta_2(\text{Year}_j) + \beta_3(\text{Cohort}_i \times \text{Year}_j) + b_{0k} + b_{2k}(\text{Year}_j) + \varepsilon_{ijk}$$

In matrix form, these models are often written

$$y = X\beta + Zb + \varepsilon,$$

where $y$ is $n \times 1$, $X$ contains predictors for fixed effects, and $Z$ contains predictors for random effects.

In SAS notation, G is the covariance matrix of the random effects $b$, and R is the block-diagonal covariance matrix of the errors $\varepsilon$.

Random intercept model:

| Dimensions | |
|---|---|
| Covariance Parameters | 2 |
| Columns in X | 6 (fixed) |
| Columns in Z Per Subject | 1 (random) |
| Subjects | 50 |
| Max Obs Per Subject | 6 |

Random slope and intercept model:

| Dimensions | |
|---|---|
| Covariance Parameters | 3 |
| Columns in X | 6 |
| Columns in Z Per Subject | 2 |
| Subjects | 50 |
| Max Obs Per Subject | 6 |
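As an illustration of these dimensions, here is a sketch of the blocks of $X$ and $Z$ for one family $k$ in cohort A observed at times 1 to 6. The column order is an assumption (it follows the order of effects in the model statement: Intercept, time, cohort A, cohort B, time*cohort A, time*cohort B), and one column is kept for every class level, which is why $X$ has 6 columns.

$$
X_k =
\begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 2 & 1 & 0 & 2 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 6 & 1 & 0 & 6 & 0
\end{pmatrix},
\qquad
Z_k =
\begin{pmatrix}
1 & 1 \\
1 & 2 \\
\vdots & \vdots \\
1 & 6
\end{pmatrix}
$$

For the random intercept model, $Z_k$ is only the first column of ones. With $G = \operatorname{diag}(\sigma_0^2, \sigma_2^2)$ and $R_k = \sigma_e^2 I_6$, the covariance matrix of the six observations for family $k$ is

$$
\operatorname{Var}(y_k) = Z_k\, G\, Z_k^{\top} + R_k .
$$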

### Comparing nested models

The models for the mean function are the same in the two models, but the random effects are different. How do we compare the models to decide which fits better?

The random intercept model is nested in the random slope and intercept model, because all the parameters of the first model are contained in the second. Test whether the extra parameters in the larger model are needed.

General test to compare nested models:

- $H_0$: the extra parameters in the larger model are all zero; that is, the smaller model fits as well as the larger one.
- $H_A$: the extra parameters in the larger model are not all zero; that is, the larger model fits better than the smaller one.

This is a general test to compare nested models:

- Mean functions must be identical to compare covariance structures.
- Covariance structures must be identical to compare mean functions.

The test is based on the difference in log likelihood values for the two models:

$$X^2 = (-2\,\text{Res Log Likelihood, smaller model}) - (-2\,\text{Res Log Likelihood, larger model})$$

$X^2$ has approximately a chi-square distribution, with degrees of freedom equal to the difference in the number of parameters:

df = (number of parameters, larger model) - (number of parameters, smaller model).

For random intercept model:

| Dimensions | |
|---|---|
| Covariance Parameters | 2 |
| Columns in X | 6 |
| Columns in Z Per Subject | 1 |

| Fit Statistics | |
|---|---|
| -2 Res Log Likelihood | |
| AIC (smaller is better) | |
| AICC (smaller is better) | |
| BIC (smaller is better) | |

For random slope and intercept model:

| Dimensions | |
|---|---|
| Covariance Parameters | 3 |
| Columns in X | 6 |
| Columns in Z Per Subject | 2 |

| Fit Statistics | |
|---|---|
| -2 Res Log Likelihood | |

Test statistic is $X^2 = 285.6$, with 3 - 2 = 1 df.

Use SAS to calculate the test statistic and find the p-value: probchi(x, n) gives the probability of a value ≤ x for a chi-square variable with n degrees of freedom. (We want the probability of a value ≥ x.)

    data chisq;
      LL_diff = 285.6;
      param_diff = 3 - 2;
      pvalue = 1 - probchi(LL_diff, param_diff);
    run;
    Proc Print data=chisq;
    run;

The printout has the columns Obs, LL_diff, param_diff, and pvalue; report this as p < .0001.

Conclusion?
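The same upper-tail probability can be cross-checked outside SAS. The sketch below is a Python stand-in, not part of the lecture: it relies on the fact that a chi-square variable with 1 df is the square of a standard normal, so the upper tail is `erfc(sqrt(x / 2))`. It only handles 1 df; the function name is an illustrative choice.

```python
import math

def chisq_upper_tail_1df(x):
    """P(X >= x) for a chi-square variable with 1 degree of freedom.

    A chi-square(1) variable is Z**2 for standard normal Z, so
    P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

x2 = 285.6          # test statistic from the lecture
p = chisq_upper_tail_1df(x2)
print("p < .0001:", p < 0.0001)
```

The familiar 5% critical value for 1 df, about 3.84, makes a quick sanity check: `chisq_upper_tail_1df(3.841)` is close to 0.05.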

Revisit the fixed effects results from the random slope and intercept model:

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |
| time*cohort | A | | | | | |
| time*cohort | B | | | | | |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| time | | | | <.0001 |
| cohort | | | | |
| time*cohort | | | | |

Do we need to keep the interaction term?

Random slope and intercept main-effects model:

    Proc Mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort / solution;
      random intercept time / subject=family_id v vcorr;
    run;

| Dimensions | |
|---|---|
| Covariance Parameters | 3 |
| Columns in X | 4 |
| Columns in Z Per Subject | 2 |

The covariance structure is the same as before, but the model for the mean is nested in the interaction model.

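For interaction model with random slope and intercept:

| Dimensions | |
|---|---|
| Covariance Parameters | 3 |
| Columns in X | 6 |
| Columns in Z Per Subject | 2 |
| -2 Res Log Likelihood | |

For main-effects model with random slope and intercept:

| Dimensions | |
|---|---|
| Covariance Parameters | 3 |
| Columns in X | 4 |
| Columns in Z Per Subject | 2 |
| -2 Res Log Likelihood | |

Test statistic is $X^2 = 0.4$, with 1 df (1 non-zero interaction parameter). From SAS, p ≈ 0.53, the upper-tail chi-square probability of 0.4 with 1 df.

Strictly, restricted (REML) likelihoods are comparable only between models with the same fixed effects, so a likelihood ratio test of the mean function should be based on fits with method=ml.

We already have a test for this: the type III F-test.

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| time | | | | <.0001 |
| cohort | | | | |
| time*cohort | | | | |

The F-test is not exactly the same as the likelihood ratio test, but very similar.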

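Solution for Fixed Effects

| Effect | cohort | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | <.0001 |
| time | | | | | | <.0001 |
| cohort | A | | | | | |
| cohort | B | | | | | |

Type 3 Tests of Fixed Effects

| Effect | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|
| time | | | | <.0001 |
| cohort | | | | |

What is the mean annual increase in income? Do the cohorts have different starting incomes?

The main-effects model fits parallel lines, one for each cohort.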

### Examples of multi-level or hierarchical data

Example 1. Study of standardized test scores from 4th grade students.

- Sample: 8000 students at 46 schools in Wisconsin and Texas.
- Student-level predictors: gender, race, pre-test scores.
- School-level predictors: state, school district, public/private, socio-economic status of school's neighborhood.
- Student-level regression of scores on student characteristics.
- School-level regression of school mean score on school, district, state characteristics.

Example 2. Retrospective study to assess the effect of surgical volume on early hospital mortality for pediatric cardiac surgery (L Kochilas, Plan B project).

- Patient-level predictors: age, gender, risk-score for surgery.
- Hospital-level predictors: time period, surgical volume.
- How does the effect of surgical volume on probability of survival vary between different types of patients?

Example 3. Measurements of radon (carcinogenic gas) in samples of homes in 85 counties in Minnesota.

- Aim: estimate county mean radon levels.
- House-level predictor: floor where radon measurement was taken: basement (floor=0), first floor (floor=1).
- County-level predictor: uranium measurement for county.

Gelman and Hill (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge U Press. Chapter 12: multi-level models in R, which we will fit in Proc Mixed.

Variables in the data listing: House, log_radon, floor, county_number, uranium.

Model 1. Random intercept for each county with house-level predictor (floor)

Random intercept model for radon measured in house i in county j:

$$\text{radon in house}_{ij} = \beta_0 + (\text{random county}_j \text{ effect}) + \beta\,\text{floor}_i + \varepsilon_{ij}$$

Gelman and Hill, 12.4, model radon measurements $y_{ij}$:

$$y_{ij} = \alpha_{j[i]} + \beta\,\text{floor}_i + \varepsilon_{ij}$$

Assume $\alpha_{j[i]}$ are Normal$(0, \sigma_\alpha^2)$ and independent of the errors $\{\varepsilon_{ij}\} \sim$ Normal$(0, \sigma_y^2)$.

SAS version of this model:

$$y_{ij} = (\beta_0 + b_j) + \beta\,\text{floor}_i + \varepsilon_{ij}$$

Estimate only $\sigma_\alpha^2$ instead of 85 regression coefficients for 85 counties.

The sums $(\beta_0 + b_j) = \beta_0 + (\text{random county}_j \text{ effect})$ are the estimated mean radon levels in each county, so we want to save the random intercepts:

    Proc Mixed data=arhm.radon;
      class county_number;
      model radon = floor / solution ddfm=bw;
      random intercept / subject=county_number v vcorr solution;
      ODS output SolutionR = A;   /* saves random effects to A */
    run;
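Estimating one variance instead of 85 separate county coefficients means each county estimate is pulled toward the overall mean, more strongly for counties with few houses. The Python sketch below illustrates that weighting for the simplified case with no floor predictor, using the precision-weighted average described by Gelman and Hill; all numbers are hypothetical, and the function name is an illustrative choice.

```python
def pooled_county_mean(ybar_county, n_houses, ybar_all, var_county, var_house):
    """Approximate multilevel estimate of a county mean (no predictors).

    Weighted average of the county's own mean and the overall mean:
    the county mean gets weight n_houses / var_house, the overall
    mean gets weight 1 / var_county.
    """
    w_county = n_houses / var_house
    w_all = 1.0 / var_county
    return (w_county * ybar_county + w_all * ybar_all) / (w_county + w_all)

# Hypothetical values: overall mean 1.3, a county whose houses average 2.0,
# between-county variance 0.1, within-county (house) variance 0.6.
small = pooled_county_mean(2.0, 2, 1.3, 0.1, 0.6)     # county with 2 houses
large = pooled_county_mean(2.0, 100, 1.3, 0.1, 0.6)   # county with 100 houses

print("2 houses:  ", round(small, 3))
print("100 houses:", round(large, 3))
```

A county with only two houses stays close to the overall mean, while a county with 100 houses is estimated almost entirely from its own data.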

| Class Level Information | |
|---|---|
| Class | county_number |
| Levels | |
| Values | |

| Dimensions | |
|---|---|
| Covariance Parameters | 2 |
| Columns in X | 2 |
| Columns in Z Per Subject | 1 |
| Subjects | 85 |
| Max Obs Per Subject | 116 |

| Number of Observations | |
|---|---|
| Number of Observations Read | 919 |
| Number of Observations Used | 919 |
| Number of Observations Not Used | 0 |

Slope for floor is averaged across counties:

Solution for Fixed Effects

| Effect | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | | | | | <.0001 |
| floor | | | | | <.0001 |

Covariance estimates of $\sigma_\alpha^2$ and $\sigma_y^2$ (R gives the square roots):

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | county_number | |
| Residual | | |

Random effects for each county (Gelman and Hill: county-level errors, p 260):

Solution for Random Effects

| Effect | county_number | Estimate | Std Err Pred | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|---|
| Intercept | | | | | | |
| Intercept | | | | | | <.0001 |
| Intercept | | | | | | |
| Intercept | | | | | | |

To get estimates of county means, we need fitted values that add these random intercepts to the overall intercept.

In the model options, outpredm gives the fitted mean (fixed effects only); outpred gives fitted fixed + random effects.

    proc mixed data=arhm.radon;   * p 259;
      class county_number;
      model radon = floor / solution ddfm=bw outpred=county_estimates;
      random intercept / subject=county_number v vcorr;
    run;
    proc print data=county_estimates(obs=15);
    run;
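Because outpred returns one fitted value per house, a county-level summary needs a single row per county at floor=0. One possible approach, sketched here in Python rather than SAS, is to remove the floor effect from each fitted value and keep the first row seen for each county. The rows and the floor coefficient below are hypothetical; in practice they would come from the outpred data set and the fixed-effects solution.

```python
# Hypothetical rows from an outpred-style data set:
# (county_number, floor, pred), where pred = intercept + county effect
# + beta_floor * floor.
rows = [
    (1, 1, 0.50),
    (1, 0, 1.19),
    (2, 0, 0.93),
    (2, 0, 0.93),
    (3, 1, 0.80),
]
beta_floor = -0.69  # hypothetical fixed-effect estimate for floor

def county_estimates(rows, beta_floor):
    """Return {county_number: fitted value at floor=0}, one entry per county."""
    est = {}
    for county, floor, pred in rows:
        # Subtracting beta_floor * floor moves the fitted value to floor=0;
        # setdefault keeps only the first row seen for each county.
        est.setdefault(county, pred - beta_floor * floor)
    return est

for county, value in sorted(county_estimates(rows, beta_floor).items()):
    print(county, round(value, 2))
```

This also covers counties such as county 3 in the toy data, where no house was measured in the basement, since the fitted value is shifted to floor=0 rather than filtered.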

The printout of county_estimates has the columns radon, floor, county_number, uranium, Pred, StdErrPred, DF, Alpha, Lower, Upper, and Resid.

How can we get one observation per county at floor=0?

Model 2. Group-level predictor + subject-level predictor (Gelman & Hill, 12.6)

Two regression models: lower level for houses, upper level for counties.

House-level regression

$$(\text{radon in house}_{ij}) = \beta_0 + (\text{random county}_j \text{ effect}) + \beta\,\text{floor}_i + \varepsilon_{ij}$$

combined with county-level regression

$$(\text{mean radon, county}_j) = \gamma_0 + \gamma_1\,(\text{uranium, county}_j) + e_j$$

Gelman and Hill notation:

$$y_{ij} = \alpha_{j[i]} + \beta\,\text{floor}_i + \varepsilon_{ij}$$

$$\alpha_j = \gamma_0 + \gamma_1 u_j + e_j$$
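Substituting the county-level regression into the house-level regression shows why a single mixed model covers both levels. This sketch assumes the county-level regression has an intercept and slope, written here as $\gamma_0$ and $\gamma_1$ (Gelman and Hill's symbols; treat them as an assumption if the notation above differs), with $u_j$ the county uranium measurement:

$$
\begin{aligned}
y_{ij} &= \alpha_{j[i]} + \beta\,\text{floor}_i + \varepsilon_{ij}, \qquad \alpha_j = \gamma_0 + \gamma_1 u_j + e_j \\
       &= \underbrace{\gamma_0 + \gamma_1 u_j + \beta\,\text{floor}_i}_{\text{fixed effects}} \;+\; \underbrace{e_j}_{\text{random county intercept}} \;+\; \varepsilon_{ij}
\end{aligned}
$$

The fixed part has three terms (intercept, uranium, floor), and the county-level error $e_j$ plays the role of the random intercept, so uranium simply enters as one more fixed-effect predictor that is constant within a county.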

To fit this in Proc Mixed, just add the county-level predictor. Uranium is constant across houses within a county.

    Proc Mixed data=arhm.radon;   * GH p 266;
      class county_number;
      model radon = floor uranium / solution ddfm=bw;
      random intercept / subject=county_number v vcorr solution;
    run;

Fixed effects:

Solution for Fixed Effects

| Effect | Estimate | Standard Error | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | | | | | <.0001 |
| floor | | | | | <.0001 |
| uranium | | | | | <.0001 |

Does uranium help the model?


### Multilevel Modeling Tutorial. Using SAS, Stata, HLM, R, SPSS, and Mplus

Using SAS, Stata, HLM, R, SPSS, and Mplus Updated: March 2015 Table of Contents Introduction... 3 Model Considerations... 3 Intraclass Correlation Coefficient... 4 Example Dataset... 4 Intercept-only Model

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Cool Tools for PROC LOGISTIC

Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT

### New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010 FOR THE SECOND PART OF THIS DOCUMENT GO TO www.uvm.edu/~dhowell/methods/supplements/mixed Models Repeated/Mixed Models for

### xtmixed & denominator degrees of freedom: myth or magic

xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### The Basic Two-Level Regression Model

2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,

### Use of deviance statistics for comparing models

A likelihood-ratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter

### Factor Analysis. Factor Analysis

Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we

### Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

### Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

### Module 14: Missing Data Stata Practical

Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### Using PROC MIXED in Hierarchical Linear Models: Examples from two- and three-level school-effect analysis, and meta-analysis research

Using PROC MIXED in Hierarchical Linear Models: Examples from two- and three-level school-effect analysis, and meta-analysis research Sawako Suzuki, DePaul University, Chicago Ching-Fan Sheu, DePaul University,

### Mihaela Ene, Elizabeth A. Leighton, Genine L. Blue, Bethany A. Bell University of South Carolina

Paper 134-2014 Multilevel Models for Categorical Data using SAS PROC GLIMMIX: The Basics Mihaela Ene, Elizabeth A. Leighton, Genine L. Blue, Bethany A. Bell University of South Carolina ABSTRACT Multilevel

### Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

### ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into

### Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)

### 1.1. Simple Regression in Excel (Excel 2010).

.. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > Add-Ins > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under

### 2. Making example missing-value datasets: MCAR, MAR, and MNAR

Lecture 20 1. Types of missing values 2. Making example missing-value datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association