Regression III: Advanced Methods




Lecture 16: Generalized Additive Models
Regression III: Advanced Methods
Bill Jacoby
Michigan State University
http://polisci.msu.edu/jacoby/icpsr/regress3

Goals of the Lecture
Introduce Additive Models
Explain how they extend from simple nonparametric regression (i.e., local polynomial regression)
Discuss estimation using backfitting
Explain how to interpret their results
Conclude with some examples of Additive Models applied to real social science data

Limitations of the Multiple Nonparametric Models

Recall that the general nonparametric model (both the lowess smooth and the smoothing spline) takes the following form:

$$Y_i = f(x_{i1}, x_{i2}, \dots, x_{ik}) + \varepsilon_i$$

As we see here, the multiple nonparametric model allows all possible interactions between the independent variables in their effects on Y: we specify a jointly conditional functional form. This model is ideal under the following circumstances:

1. There are no more than two predictors
2. The pattern of nonlinearity is complicated and thus cannot be easily modelled with a simple transformation or polynomial regression
3. The sample size is sufficiently large

Limitations of the Multiple Nonparametric Models (2)

The general nonparametric model becomes impossible to interpret, and unstable, as we add more explanatory variables:

1. For example, in the lowess case, as the number of variables increases, the window span must become wider in order to ensure that each local regression has enough cases. This process can create significant bias (the curve becomes too smooth)
2. It is impossible to interpret general nonparametric regression when there are more than two variables: there are no coefficients, and we cannot graph effects in more than three dimensions

These limitations lead us to the Additive Models

Additive Regression Models

Additive regression models essentially apply local regression to low-dimensional projections of the data. The nonparametric additive regression model is

$$Y_i = \alpha + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_k(x_{ik}) + \varepsilon_i$$

The f_j are arbitrary functions estimated from the data; the errors ε are assumed to have constant variance and a mean of 0. Additive models create an estimate of the regression surface from a combination of a collection of one-dimensional functions. The estimated functions f_j are the analogues of the coefficients in linear regression

Additive Regression Models (2)

The assumption that the contribution of each covariate is additive is analogous to the assumption in linear regression that each component enters separately. Recall that the linear regression model is

$$Y_i = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$$

where the β_j represent linear effects. For the additive model we instead model Y as an additive combination of arbitrary functions of the Xs:

$$Y_i = \alpha + f_1(X_{i1}) + f_2(X_{i2}) + \cdots + f_k(X_{ik}) + \varepsilon_i$$

The f_j represent arbitrary functions that can be estimated by lowess or smoothing splines

Additive Regression Models (3)

Now comes the question: how do we find these arbitrary functions? If the Xs were completely independent (which will not be the case) we could simply estimate each functional form using a nonparametric regression of Y on each of the Xs separately. Similarly, in linear regression, when the Xs are completely uncorrelated the partial regression slopes are identical to the marginal regression slopes. Since the Xs are related, however, we need to proceed in another way, in effect removing the effects of the other predictors, which are unknown, before we begin. We use a procedure called backfitting to find each curve, controlling for the effects of the others

Estimation and Backfitting

Suppose that we had a two-predictor additive model:

$$Y_i = \alpha + f_1(x_{i1}) + f_2(x_{i2}) + \varepsilon_i$$

If we (unrealistically) knew the partial regression function f_2 but not f_1, we could rearrange the equation in order to solve for f_1:

$$Y_i - f_2(x_{i2}) = \alpha + f_1(x_{i1}) + \varepsilon_i$$

In other words, smoothing Y_i - f_2(x_{i2}) against x_{i1} produces an estimate of α + f_1(x_{i1}). Simply put, knowing one function allows us to find the other. In the real world, however, we don't know either, so we must proceed initially with estimates

Estimation and Backfitting (2)

1. We start by expressing the variables in mean-deviation form so that the partial regressions sum to zero, thus eliminating the individual intercepts
2. We then take preliminary estimates of each function from a least-squares regression of Y on the Xs
3. These estimates are then used as step (0) in an iterative estimation process
4. We then find the partial residuals for X_1, which remove from Y its linear relationship to X_2 but retain the relationship between Y and X_1

Estimation and Backfitting (3)

The partial residuals for X_1 are then

$$e_{i1} = Y_i - \hat{f}_2(x_{i2})$$

5. The same procedure as in step 4 is done for X_2
6. Next we smooth these partial residuals against their respective Xs, providing a new estimate of each function:

$$\hat{f}_j = \mathbf{S}_j \mathbf{e}_j$$

where S_j is the (n × n) smoother transformation matrix for X_j that depends only on the configuration of the X_{ij} for the jth predictor

Estimation and Backfitting (4)

This process of finding new estimates of the functions by smoothing the partial residuals is repeated until the partial functions converge; that is, when the estimates of the smooth functions stabilize from one iteration to the next, we stop. When this process is done, we obtain estimates of s_j(X_{ij}) for every value of X_j. More importantly, we will have reduced a multiple regression to a series of two-dimensional partial-regression problems, making interpretation easy: since each partial regression is only two-dimensional, the functional forms can be plotted on two-dimensional plots showing the partial effects of each X_j on Y. In other words, perspective plots are no longer necessary unless we include an interaction between two smoothed terms
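The algorithm is easy to express directly. Below is a minimal backfitting sketch for two predictors, using loess() as the smoother; the function backfit and its arguments are hypothetical illustrations (mgcv's gam() does this properly), and the convergence check is deliberately crude:

    ## Minimal backfitting sketch for Y = alpha + f1(x1) + f2(x2) + error
    backfit <- function(y, x1, x2, tol = 1e-6, max.iter = 50) {
      y.c <- y - mean(y)                    # mean-deviation form (step 1)
      b <- coef(lm(y.c ~ x1 + x2))          # preliminary least-squares estimates (step 2)
      f1 <- b["x1"] * (x1 - mean(x1))       # linear starting values, step (0) (step 3)
      f2 <- b["x2"] * (x2 - mean(x2))
      for (i in 1:max.iter) {
        f1.old <- f1
        f1 <- fitted(loess((y.c - f2) ~ x1))    # smooth partial residuals for x1 (steps 4, 6)
        f1 <- f1 - mean(f1)                     # keep each partial function centred
        f2 <- fitted(loess((y.c - f1) ~ x2))    # same procedure for x2 (step 5)
        f2 <- f2 - mean(f2)
        if (max(abs(f1 - f1.old)) < tol) break  # stop once the estimates stabilize
      }
      list(alpha = mean(y), f1 = f1, f2 = f2)
    }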

Interpreting the Effects

A plot of X_j versus s_j(X_j) shows the relationship between X_j and Y holding constant the other variables in the model. Since Y is expressed in mean-deviation form, the smooth term s_j(X_j) is also centered, and thus each plot represents how Y changes relative to its mean with changes in X. Interpreting the scale of the graphs then becomes easy: the value of 0 on the Y-axis is the mean of Y. As the line moves away from 0 in a negative direction, we subtract the distance from the mean when determining the fitted value. For example, if the mean is 45, and for a particular X-value (say x = 15) the curve is at s_j(X_j) = 4, then the fitted value of Y controlling for all other explanatory variables is 45 + 4 = 49. If there are several nonparametric relationships, we can add together the effects from the separate graphs for any particular observation to find its fitted value of Y

Additive Regression Models in R: Example: Canadian prestige data

Here we use the Canadian Prestige data to fit an additive model in which prestige is regressed on income and education. In R we use the gam function (for generalized additive models) found in the mgcv package. The gam function in mgcv fits only smoothing splines (local polynomial regression can be done in S-PLUS). The formula takes the same form as for the glm function, except that we now have the option of mixing parametric terms and smoothed estimates: smooths will be fit to any variable specified with the s(variable) argument. The simple R-script is as follows:
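The script itself appeared as an image in the original slides; a minimal reconstruction, assuming the Prestige data frame from the car package (with variables prestige, income, and education):

    library(mgcv)   # provides gam() and the s() smooth specification
    library(car)    # assumed source of the Canadian Prestige data frame
    mod.gam <- gam(prestige ~ s(income) + s(education), data = Prestige)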

Additive Regression Models in R: Example: Canadian prestige data (2)

The summary function returns tests for each smooth, the degrees of freedom for each smooth, and an adjusted R-square for the model. The deviance can be obtained from the deviance(model) command
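Continuing from the model fitted above:

    summary(mod.gam)    # tests and degrees of freedom for each smooth; adjusted R-square
    deviance(mod.gam)   # residual deviance of the additive model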

Additive Regression Models in R: Example: Canadian prestige data (3)

Again, as with other nonparametric models, we have no slope parameters to investigate (we do have an intercept, however). A plot of the regression surface is therefore necessary

Additive Regression Models in R: Example: Canadian prestige data (4)

Additive Model: We can see the nonlinear relationship of both education and income with prestige, but there is no interaction between them; i.e., the slope for income is the same at every value of education. We can compare this model to the general nonparametric regression model

[Figure: perspective plot of the fitted additive-model surface, Prestige plotted against Income and Education]

Additive Regression Models in R: Example: Canadian prestige data (5)

General Nonparametric Model: This model is quite similar to the additive model, but there are some nuances, particularly in the midrange of income, that are not picked up by the additive model because the Xs do not interact

[Figure: perspective plot of the general nonparametric regression surface, Prestige plotted against Income and Education]

Additive Regression Models in R: Example: Canadian prestige data (6)

Perspective plots can also be made automatically using the persp.gam function. These graphs include a 95% confidence region

[Figure: persp.gam perspective plot of the fitted surface against income and education; the red and green surfaces mark +/- 2 standard errors]
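persp.gam comes from older releases of mgcv; in current versions the equivalent display (the fitted surface plus surfaces at plus and minus a chosen number of standard errors, drawn in red and green) is produced by vis.gam. A sketch, assuming mod.gam from the earlier slide:

    ## Perspective plot of the fitted surface with +/- 2 se surfaces:
    vis.gam(mod.gam, view = c("income", "education"), se = 2)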

Additive Regression Models in R: Example: Canadian prestige data (7)

Since the slices of the additive regression surface in the direction of one predictor (holding the other constant) are parallel, we can graph each partial-regression function separately. This is the benefit of the additive model: we can graph as many plots as there are variables, allowing us to easily visualize the relationships. In other words, a multidimensional regression has been reduced to a series of two-dimensional partial-regression plots. To get these in R, see the script following the plots below:

Additive Regression Models in R: Example: Canadian prestige data (8)

[Figure: partial-regression plots of s(income, 3.12) against income and s(education, 3.18) against education]

Additive Regression Models in R: Example: Canadian prestige data (9)

[Figure: the same partial-regression functions, s(income, 3.12) and s(education, 3.18), plotted against income and education]

R-script for previous slide
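That script was also an image in the original; a plausible reconstruction using the plot method for gam objects:

    ## One partial-regression plot per smooth term, on a single page; the
    ## y-axis labels s(income,3.12) and s(education,3.18) report the
    ## estimated degrees of freedom of each smooth
    plot(mod.gam, pages = 1, se = TRUE)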

Residual Sum of Squares

As was the case for smoothing splines and lowess smooths, statistical inference and hypothesis testing are based on the residual sum of squares (or deviance, in the case of generalized additive models) and the degrees of freedom. The RSS for an additive model is easily defined in the usual manner:

$$\text{RSS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

The approximate degrees of freedom, however, need to be adjusted from the regular nonparametric case, because we are no longer specifying a jointly-conditional functional form

Degrees of Freedom

Recall that for nonparametric regression, the approximate degrees of freedom are equal to the trace of the smoother matrix (the matrix that projects Y onto Y-hat). We extend this to the additive model:

$$df_j = \mathrm{tr}(\mathbf{S}_j) - 1$$

1 is subtracted from each df, reflecting the constraint that each partial-regression function sums to zero (the individual intercepts have been removed). Parametric terms entered in the model each occupy a single degree of freedom, as in the linear regression case. The individual degrees of freedom are then combined into a single measure:

$$df_{\text{model}} = 1 + \sum_j df_j$$

1 is added to the final degrees of freedom to account for the overall constant in the model

Testing for Linearity

I can compare the linear model of prestige regressed on income and education with the additive model by carrying out an analysis of deviance. I begin by fitting the linear model using the gam function. Next I want the residual degrees of freedom from the additive model
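A sketch of those two steps, assuming the objects from the earlier slides:

    ## Fit the parallel linear model with gam() so its deviance is comparable:
    mod.lin <- gam(prestige ~ income + education, data = Prestige)
    ## Residual degrees of freedom of the additive model:
    mod.gam$df.residual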

Testing for Linearity (2)

Now I simply calculate the difference in deviance between the two models relative to the difference in degrees of freedom (difference in df = 7.3 - 2 ≈ 5). This gives a chi-square test for linearity. The difference between the models is highly statistically significant: the additive model describes the relationship between prestige and education and income much better
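Done by hand, the test looks roughly like this (a sketch; for a Gaussian model the deviance change is scaled by the additive model's estimated error variance, stored in the fitted object as sig2):

    dev.diff <- deviance(mod.lin) - deviance(mod.gam)        # change in deviance
    df.diff  <- mod.lin$df.residual - mod.gam$df.residual    # change in df (about 5)
    pchisq(dev.diff / mod.gam$sig2, df.diff, lower.tail = FALSE)  # p-value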

Testing for Linearity (3)

An anova function written by John Fox (see the R-script for this class) makes the analysis of deviance simpler to implement. As we see here, the results are identical to those found on the previous slide
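Current mgcv also provides an anova method for gam objects that performs the same comparison directly:

    ## Analysis of deviance comparing the linear and additive fits:
    anova(mod.lin, mod.gam, test = "Chisq")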