Family economics data: total family income, expenditures, debt status for 50 families in two cohorts (A and B), annual records from


 Anne Bruce
Lecture outline
1. Random intercepts and slopes
2. Notation for mixed effects models
3. Comparing nested models
4. Multilevel/Hierarchical models
5. SAS versions of R models in Gelman and Hill, chapter 12

Random intercepts and slopes

Family economics data: annual records of total family income, expenditures, and debt status for 50 families in two cohorts (A and B).

Variables in the data listing: Obs, family_id, income, year, expenses, debt (yes/no), cohort (A/B), time, income_1000s.

(Example data adapted from UCLA Academic Technology Services)
Mean function: two specifications

Model A:
    class cohort year;
    model income_1000s = year cohort year*cohort;

Model B:
    class cohort;
    model income_1000s = year cohort year*cohort;

What's the difference?
Model A: mean function with year categorical.
[Figure: Model A mean function]

Model B: mean function with year continuous.
[Figure: Model B mean function]

Interpretation: the cohort slope is the mean annual change in income.
Use time = 1, ..., 6 (a shifted version of year) instead of year; this is better numerically.

    proc mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
    run;

Solution for Fixed Effects
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B

Write the equation for each cohort. Do the cohorts have different slopes?
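One way to write the two cohort equations, assuming SAS's default coding for class variables in which the last level (here cohort B) is the reference and its cohort and time*cohort parameters are set to zero. The subscripted beta symbols are labels introduced here for the rows of the fixed effects table; they are not notation from the slides.

```latex
\begin{aligned}
\text{Cohort A:}\quad E(\text{income\_1000s}) &=
  (\beta_{\text{Intercept}} + \beta_{\text{cohort A}})
  + (\beta_{\text{time}} + \beta_{\text{time*cohort A}})\,\text{time} \\
\text{Cohort B:}\quad E(\text{income\_1000s}) &=
  \beta_{\text{Intercept}} + \beta_{\text{time}}\,\text{time}
\end{aligned}
```

Under this coding the time*cohort A estimate is the difference between the two slopes, so its t-test addresses whether the cohorts have different slopes.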
How can we graph these lines? Ask for LSmeans:

    proc mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
      lsmeans time*cohort;
    run;

SAS log:

    ERROR: Only class variables allowed in this effect.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE MIXED used (Total process time): real time 0.01 sec

What's wrong? The lsmeans statement accepts only effects built from class variables, and time is a continuous predictor (it is not in the class statement).
Get fitted values to graph by adding points to the data set:

    data pred;
      input family_id time cohort $;
      year = time ;
      cards;
    0 1 A
    0 6 A
    0 1 B
    0 6 B
    ;
    data family_income;
      set pred econ_long;
    run;

Proc Mixed does not have an output statement. Instead, there are options for the model statement.

    proc mixed data=family_income;
      class family_id cohort;
      /* outpredm gives fitted means */
      model income_1000s = time cohort time*cohort / solution outpredm=fitted_values;
      random intercept / subject=family_id v vcorr;
    run;

    proc print data=fitted_values (obs=12);
    run;
[Printed output: first 12 observations of fitted_values, six rows for cohort A followed by six for cohort B. Besides the input variables, the columns are Pred, StdErrPred, DF, Alpha, Lower, Upper, and Resid.]

    proc sgplot data=fitted_values;
      where family_id = 0;
      series x=year y=pred / group=cohort;
    run;
Adding a random slope

Random intercept model:

    proc mixed data=family_income;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept / subject=family_id v vcorr;
    run;

Random slope and intercept model:

    proc mixed data=family_income;
      class family_id cohort;
      model income_1000s = time cohort time*cohort / solution;
      random intercept time / subject=family_id v vcorr;
    run;

time is a continuous predictor, so a random time effect is a random slope.

Fixed effects from random intercept model:
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B

Fixed effects from random slope and intercept model:
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B
The mean functions (two lines) look almost exactly the same; the change is in the p-values. Do the cohorts have different mean annual increases in income?

2 covariance parameters from random intercept model:
Cov Parm   Subject    Estimate
Intercept  family_id
Residual

Income variance at each year = Residual + Intercept.
Estimated correlation between years: $\hat{\rho}$ = Intercept / (Residual + Intercept).

Estimated V Correlation Matrix for family_id 1 (rows 1 to 6, columns Col1 to Col6).
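The variance and correlation formulas follow directly from the random intercept model. A short derivation, writing $\sigma_b^2$ for the Intercept variance and $\sigma_e^2$ for the Residual variance, and assuming (as the model does) that random intercepts and errors are independent; the index $j \ne j'$ denotes two different years for the same family $k$:

```latex
\begin{aligned}
\operatorname{Var}(y_{kj}) &= \operatorname{Var}(b_{0k}) + \operatorname{Var}(\varepsilon_{kj})
  = \sigma_b^2 + \sigma_e^2, \\
\operatorname{Cov}(y_{kj}, y_{kj'}) &= \operatorname{Var}(b_{0k}) = \sigma_b^2, \\
\operatorname{Corr}(y_{kj}, y_{kj'}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}.
\end{aligned}
```

Every pair of years shares the same correlation and every year has the same variance, which is the compound symmetry pattern shown in the V correlation matrix.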
3 covariance parameters from random slope and intercept model:
Cov Parm   Subject    Estimate
Intercept  family_id             (variance of random intercepts)
time       family_id             (variance of random slopes)
Residual

We no longer have compound symmetry:
Estimated V Correlation Matrix for family_id 1 (rows 1 to 6, columns Col1 to Col6).

The 3 covariance parameters from the random slope and intercept model also give changing income variance over time (along the diagonal):
Estimated V Matrix for family_id 1 (rows 1 to 6, columns Col1 to Col6).

The models for the mean function are the same in the two models, but the random effects are different. How do we compare the models to decide which fits better?
Notation for mixed effects models

Random intercept model:

$\text{income}_{ijk} = (\beta_0 + b_{0k}) + \beta_1 (\text{Cohort}_i) + \beta_2 (\text{Year}_j) + \beta_3 (\text{Cohort}_i \times \text{Year}_j) + \varepsilon_{ijk}$,

where $\{b_{0k}\}$ are independent Normal$(0, \sigma_b^2)$, errors $\{\varepsilon_{ijk}\}$ are independent Normal$(0, \sigma_e^2)$, and $\{b_{0k}\}$ are independent of the errors $\{\varepsilon_{ijk}\}$.

For each family, there is 1 random effect (intercept) and 6 fixed effect parameters:
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B

Random slope and intercept model:

$\text{income}_{ijk} = (\beta_0 + b_{0k}) + \beta_1 (\text{Cohort}_i) + (\beta_2 + b_{2k})(\text{Year}_j) + \beta_3 (\text{Cohort}_i \times \text{Year}_j) + \varepsilon_{ijk}$,

where $\{b_{0k}\}$ are independent Normal$(0, \sigma_0^2)$, $\{b_{2k}\}$ are independent Normal$(0, \sigma_2^2)$, errors $\{\varepsilon_{ijk}\}$ are independent Normal$(0, \sigma_e^2)$, and $\{b_{0k}\}$, $\{b_{2k}\}$, and $\{\varepsilon_{ijk}\}$ are mutually independent.

For each family, there are 2 random effects (intercept and slope) and 6 fixed effect parameters:
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B
Rearrange the models, putting random effects last:

Random intercept model:
$\text{income}_{ijk} = \beta_0 + \beta_1 (\text{Cohort}_i) + \beta_2 (\text{Year}_j) + \beta_3 (\text{Cohort}_i \times \text{Year}_j) + b_{0k} + \varepsilon_{ijk}$

Random slope and intercept model:
$\text{income}_{ijk} = \beta_0 + \beta_1 (\text{Cohort}_i) + \beta_2 (\text{Year}_j) + \beta_3 (\text{Cohort}_i \times \text{Year}_j) + b_{0k} + b_{2k} (\text{Year}_j) + \varepsilon_{ijk}$

In matrix form, these models are often written

$y = X\beta + Zb + \varepsilon$, with $y$ of dimension $(n \times 1)$,

where X contains predictors for fixed effects and Z contains predictors for random effects.

In SAS notation, G is the covariance matrix of the random effects b, and R is the block-diagonal covariance matrix of the errors $\varepsilon$.

Random intercept model:
Dimensions
Covariance Parameters      2
Columns in X               6   (fixed)
Columns in Z Per Subject   1   (random)
Subjects                   50
Max Obs Per Subject        6

Random slope and intercept model:
Dimensions
Covariance Parameters      3
Columns in X               6
Columns in Z Per Subject   2
Subjects                   50
Max Obs Per Subject        6
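The matrix form implies that one family's observations have marginal covariance V = Z G Z' + R. The Python sketch below builds V for a single family with 6 annual records under both models. It is an illustration only: the variance values are hypothetical (not the SAS estimates), the helper names are made up here, G is taken as diagonal to match the mutual-independence assumption, and R is taken as a residual variance times the identity.

```python
# Sketch: marginal covariance V = Z G Z' + R for one family with 6 records.
# All variance values are hypothetical, chosen only for illustration.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def marginal_cov(z, g, resid_var):
    """Return V = Z G Z' + resid_var * I for a single subject."""
    v = matmul(matmul(z, g), transpose(z))
    for i in range(len(v)):
        v[i][i] += resid_var
    return v

times = [1, 2, 3, 4, 5, 6]

# Random intercept model: Z is one column of ones, G is 1 x 1.
z_int = [[1.0] for _ in times]
v_int = marginal_cov(z_int, [[4.0]], resid_var=1.0)

# Random slope and intercept model: Z has columns (1, time), G is diagonal.
z_slope = [[1.0, float(t)] for t in times]
v_slope = marginal_cov(z_slope, [[4.0, 0.0], [0.0, 0.25]], resid_var=1.0)

print([v_int[i][i] for i in range(6)])    # constant along the diagonal
print([v_slope[i][i] for i in range(6)])  # increases with time
```

With a random intercept only, every diagonal entry and every off-diagonal entry of V is constant (compound symmetry); adding the time column to Z makes the variance grow with time.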
Comparing nested models

The models for the mean function are the same in the two models, but the random effects are different. How do we compare the models to decide which fits better?

The random intercept model is nested in the random slope and intercept model, because all the parameters of the first model are contained in the second. Test whether the extra parameters in the larger model are needed.

General test to compare nested models:

H0: the extra parameters in the larger model are all zero; that is, the smaller model fits as well as the larger one.
HA: the extra parameters in the larger model are not all zero; that is, the larger model fits better than the smaller one.

This is a general test to compare nested models:
- Mean functions must be identical to compare covariance structures.
- Covariance structures must be identical to compare mean functions.
The test is based on the difference in log likelihood values for the two models:

X = (-2 Res Log Likelihood, smaller model) - (-2 Res Log Likelihood, larger model)

X has approximately a chi-square distribution, with degrees of freedom equal to the difference in the number of parameters:

df = (number of parameters, larger model) - (number of parameters, smaller model).

For random intercept model:
Covariance Parameters      2
Columns in X               6
Columns in Z Per Subject   1

Fit Statistics
-2 Res Log Likelihood
AIC (smaller is better)
AICC (smaller is better)
BIC (smaller is better)

For random slope and intercept model:
Covariance Parameters      3
Columns in X               6
Columns in Z Per Subject   2
-2 Res Log Likelihood

Test statistic is X = 285.6, with 3 - 2 = 1 df.
Use SAS to calculate the test statistic and find the p-value:

probchi(x, n) gives the probability of a value ≤ x for a chi-square variable with n degrees of freedom. We want the probability of a value ≥ x, so subtract the result from 1.

    data chisq;
      LL_diff = 285.6;
      param_diff = 3 - 2;
      pvalue = 1 - probchi(LL_diff, param_diff);
    run;

    proc print data=chisq;
    run;

The printed columns are Obs, LL_diff, param_diff, and pvalue; report this as p < .0001.

Conclusion?
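As a cross-check on the upper-tail calculation, here is a small Python sketch for the 1 df case only. It relies on the identity that a chi-square variable with 1 df is a squared standard normal, so P(X ≥ x) = erfc(sqrt(x / 2)). The function name is hypothetical, and the code does not handle other degrees of freedom.

```python
import math

def chisq_upper_tail_1df(x):
    """P(chi-square with 1 df >= x), via the complementary error function."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))

lr_stat = 285.6  # test statistic reported in the notes, 1 df
p_value = chisq_upper_tail_1df(lr_stat)
print(p_value < 0.0001)  # True
```

The lower-tail probability, which is what probchi returns, is 1 minus this value.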
Revisit the fixed effects results from the random slope and intercept model:

Solution for Fixed Effects
Effect       cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                              <.0001
time                                                   <.0001
cohort       A
cohort       B
time*cohort  A
time*cohort  B

Type 3 Tests of Fixed Effects
Effect       Num DF  Den DF  F Value  Pr > F
time                                  <.0001
cohort
time*cohort

Do we need to keep the interaction term?

Random slope and intercept main-effects model:

    proc mixed data=econ_long;
      class family_id cohort;
      model income_1000s = time cohort / solution;
      random intercept time / subject=family_id v vcorr;
    run;

Dimensions
Covariance Parameters      3
Columns in X               4
Columns in Z Per Subject   2

The covariance structure is the same as before, but the model for the mean is nested in the interaction model.
For interaction model with random slope and intercept:
Covariance Parameters      3
Columns in X               6
Columns in Z Per Subject   2
-2 Res Log Likelihood

For main-effects model with random slope and intercept:
Covariance Parameters      3
Columns in X               4
Columns in Z Per Subject   2
-2 Res Log Likelihood

Test statistic is X = 0.4, with 1 df (1 nonzero interaction parameter). The p-value again comes from SAS.

Note that likelihood ratio tests of fixed effects are usually based on maximum likelihood fits (method=ml in Proc Mixed), because restricted likelihoods from models with different mean functions are not directly comparable.

We already have a test for this: the type III F-test.

Type 3 Tests of Fixed Effects
Effect       Num DF  Den DF  F Value  Pr > F
time                                  <.0001
cohort
time*cohort

The F-test is not exactly the same as the likelihood ratio test, but it is very similar.
Solution for Fixed Effects
Effect     cohort  Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                            <.0001
time                                                 <.0001
cohort     A
cohort     B

Type 3 Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
time                             <.0001
cohort

What is the mean annual increase in income? Do the cohorts have different starting incomes?

The main-effects model fits parallel lines.
[Figure: fitted lines from the main-effects model]
Examples of multilevel or hierarchical data

Example 1. Study of standardized test scores from 4th grade students.
Sample: 8000 students at 46 schools in Wisconsin and Texas.
Student-level predictors: gender, race, pretest scores.
School-level predictors: state, school district, public/private, socioeconomic status of school's neighborhood.
- Student-level regression of scores on student characteristics
- School-level regression of school mean score on school, district, state characteristics

Example 2. Retrospective study to assess the effect of surgical volume on early hospital mortality for pediatric cardiac surgery (L Kochilas, Plan B project).
Patient-level predictors: age, gender, risk score for surgery.
Hospital-level predictors: time period, surgical volume.
How does the effect of surgical volume on probability of survival vary between different types of patients?
Example 3. Measurements of radon (carcinogenic gas) in samples of homes in 85 counties in Minnesota.
Aim: estimate county mean radon levels.
House-level predictor: floor where the radon measurement was taken: basement (floor=0), first floor (floor=1).
County-level predictor: uranium measurement for the county.

Gelman and Hill (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge U Press.
Chapter 12: multilevel models in R, which we will fit in Proc Mixed.

Variables in the data listing: House, log_radon, floor, county_number, uranium.
Model 1. Random intercept for each county with house-level predictor (floor)

Random intercept model for radon measured in house i in county j:

(radon in house ij) = $\beta_0$ + (random county-j effect) + $\beta\,\text{floor}_i$ + $\varepsilon_{ij}$

Gelman and Hill, 12.4, model radon measurements $y_{ij}$:

$y_{ij} = \alpha_{j[i]} + \beta\,\text{floor}_i + \varepsilon_{ij}$

Assume the $\alpha_{j[i]}$ are Normal$(\mu_\alpha, \sigma_\alpha^2)$ and independent of the errors $\{\varepsilon_{ij}\} \sim$ Normal$(0, \sigma_y^2)$.

SAS version of this model:

$y_{ij} = (\beta_0 + b_j) + \beta\,\text{floor}_i + \varepsilon_{ij}$, where $\beta_0 = \mu_\alpha$ and the $b_j$ are Normal$(0, \sigma_\alpha^2)$.

Estimate only $\sigma_\alpha^2$ instead of 85 regression coefficients for 85 counties.

The sums $(\beta_0 + b_j)$ = $\beta_0$ + (random county-j effect) are the estimated mean radon levels in each county, so we want to save the random intercepts:

    proc mixed data=arhm.radon;
      class county_number;
      model radon = floor / solution ddfm=bw;
      random intercept / subject=county_number v vcorr solution;
      /* saves random effects to A */
      ods output SolutionR = A;
    run;
Class Level Information
Class: county_number (85 levels)

Dimensions
Covariance Parameters      2
Columns in X               2
Columns in Z Per Subject   1
Subjects                   85
Max Obs Per Subject        116

Number of Observations
Number of Observations Read       919
Number of Observations Used       919
Number of Observations Not Used   0

The slope for floor is averaged across counties:

Solution for Fixed Effects
Effect     Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                    <.0001
floor                                        <.0001

Covariance estimates of $\sigma_\alpha^2$ and $\sigma_y^2$ (R gives the square roots):

Covariance Parameter Estimates
Cov Parm   Subject        Estimate
Intercept  county_number
Residual
Random effects for each county (Gelman and Hill: county-level errors, p 260):

Solution for Random Effects
Columns: Effect, county_number, Estimate, Std Err Pred, DF, t Value, Pr > |t| (one Intercept row per county).

To get estimates of county means, we need fitted values that add these random intercepts to the overall intercept.

In the model options, outpredm gives the fitted mean (fixed effects only); outpred gives fitted fixed + random effects.

    proc mixed data=arhm.radon;  * p 259;
      class county_number;
      model radon = floor / solution ddfm=bw outpred=county_estimates;
      random intercept / subject=county_number v vcorr;
    run;

    proc print data=county_estimates (obs=15);
    run;
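The arithmetic behind the outpred fitted values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SAS output: the intercept, floor slope, and random intercepts are hypothetical numbers, and the names `fixed_intercept`, `floor_slope`, `random_intercepts`, and `county_fitted` are made up for this example.

```python
# Hypothetical estimates, for illustration only (not values from the SAS output).
fixed_intercept = 1.5                             # overall intercept
floor_slope = -0.5                                # coefficient for floor
random_intercepts = {1: -0.25, 2: -0.5, 3: 0.0}   # county_number -> b_j

def county_fitted(county, floor):
    """Fitted value = fixed effects + that county's random intercept."""
    return fixed_intercept + random_intercepts[county] + floor_slope * floor

# One estimate per county at floor=0 (basement): overall intercept + b_j.
county_means = {c: county_fitted(c, floor=0) for c in random_intercepts}
print(county_means)
```

Evaluating the fitted value at floor=0 for each county gives one estimated county mean per county, rather than one row per house.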
[Printed output: first 15 observations of county_estimates. Besides the input variables, the columns are Pred, StdErrPred, DF, Alpha, Lower, Upper, and Resid.]

How can we get one observation per county at floor=0?

Model 2. Group-level predictor + subject-level predictor (Gelman & Hill, 12.6)

Two regression models: lower level for houses, upper level for counties.

House-level regression:
(radon in house ij) = $\beta_0$ + (random county-j effect) + $\beta\,\text{floor}_i$ + $\varepsilon_{ij}$

combined with county-level regression:
(mean radon, county j) = $\gamma_0 + \gamma_1$ (uranium, county j) + $e_j$

Gelman and Hill notation:
$y_{ij} = \alpha_{j[i]} + \beta\,\text{floor}_i + \varepsilon_{ij}$
$\alpha_j = \gamma_0 + \gamma_1 u_j + e_j$
To fit this in Proc Mixed, just add the county-level predictor. Uranium is constant across houses within a county.

    proc mixed data=arhm.radon;  * GH p 266;
      class county_number;
      model radon = floor uranium / solution ddfm=bw;
      random intercept / subject=county_number v vcorr solution;
    run;

Fixed effects:

Solution for Fixed Effects
Effect     Estimate  Std Error  DF  t Value  Pr > |t|
Intercept                                    <.0001
floor                                        <.0001
uranium                                      <.0001

Does uranium help the model?
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationMultilevel Modeling Tutorial. Using SAS, Stata, HLM, R, SPSS, and Mplus
Using SAS, Stata, HLM, R, SPSS, and Mplus Updated: March 2015 Table of Contents Introduction... 3 Model Considerations... 3 Intraclass Correlation Coefficient... 4 Example Dataset... 4 Interceptonly Model
More informationInternational Statistical Institute, 56th Session, 2007: Phil Everson
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA Email: peverso1@swarthmore.edu 1. Introduction
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationCool Tools for PROC LOGISTIC
Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT
More informationNew SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
More informationChapter 29 The GENMOD Procedure. Chapter Table of Contents
Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationMIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010
MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 1 DAVID C. HOWELL 4/26/2010 FOR THE SECOND PART OF THIS DOCUMENT GO TO www.uvm.edu/~dhowell/methods/supplements/mixed Models Repeated/Mixed Models for
More informationxtmixed & denominator degrees of freedom: myth or magic
xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or
More informationRegression III: Advanced Methods
Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models
More informationThe Basic TwoLevel Regression Model
2 The Basic TwoLevel Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,
More informationUse of deviance statistics for comparing models
A likelihoodratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter
More informationFactor Analysis. Factor Analysis
Factor Analysis Principal Components Analysis, e.g. of stock price movements, sometimes suggests that several variables may be responding to a small number of underlying forces. In the factor model, we
More informationComparing Multiple Proportions, Test of Independence and Goodness of Fit
Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2
More informationInteraction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015
Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,
More informationLecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationModule 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189250103 and MRC grant G0900724
More informationSimple Linear Regression, Scatterplots, and Bivariate Correlation
1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.
More informationUsing PROC MIXED in Hierarchical Linear Models: Examples from two and threelevel schooleffect analysis, and metaanalysis research
Using PROC MIXED in Hierarchical Linear Models: Examples from two and threelevel schooleffect analysis, and metaanalysis research Sawako Suzuki, DePaul University, Chicago ChingFan Sheu, DePaul University,
More informationMihaela Ene, Elizabeth A. Leighton, Genine L. Blue, Bethany A. Bell University of South Carolina
Paper 1342014 Multilevel Models for Categorical Data using SAS PROC GLIMMIX: The Basics Mihaela Ene, Elizabeth A. Leighton, Genine L. Blue, Bethany A. Bell University of South Carolina ABSTRACT Multilevel
More informationData Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationln(p/(1p)) = α +β*age35plus, where p is the probability or odds of drinking
Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into
More informationSurvey, Statistics and Psychometrics Core Research Facility University of NebraskaLincoln. LogRank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of NebraskaLincoln LogRank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
More information1.1. Simple Regression in Excel (Excel 2010).
.. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > AddIns > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under
More information2. Making example missingvalue datasets: MCAR, MAR, and MNAR
Lecture 20 1. Types of missing values 2. Making example missingvalue datasets: MCAR, MAR, and MNAR 3. Common methods for missing data 4. Compare results on example MCAR, MAR, MNAR data 1 Missing Data
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationLongitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations
Research Article TheScientificWorldJOURNAL (2011) 11, 42 76 TSW Child Health & Human Development ISSN 1537744X; DOI 10.1100/tsw.2011.2 Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts,
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More information