Curvilinear Regression Analysis
Applied Analysis, Lecture #18 - April 7, 2005
Today's Lecture
- ANOVA with a continuous independent variable.
- Curvilinear regression analysis.
- Interactions with continuous variables.
An Example
From Pedhazur, pp. 513-514:
Assume that in an experiment on the learning of paired associates, the independent variable is the number of exposures to a list. Specifically, 15 subjects are randomly assigned, in equal numbers, to five levels of exposure to a list, so that one group is given one exposure, a second group is given two exposures, and so on to five exposures for the fifth group. The dependent variable measure is the number of correct responses on a subsequent test.
The Analysis
Running an ANOVA (from Analyze...General Linear Model...Univariate in SPSS) produces these results:
[SPSS ANOVA output table]
The Interpretation
From the example, we could test the hypothesis:
H0: µ1 = µ2 = µ3 = µ4 = µ5
Here, F(4,10) = 2.10, which gives a p-value of 0.156. Using any reasonable Type-I error rate (like 0.05), we would fail to reject the null hypothesis. We would then conclude that there is no effect of number of exposures on learning (as measured by test score).
Note that for this analysis four coded vectors were produced (four degrees of freedom for the numerator).
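As a quick check (mine, not from the slides), the reported p-value can be reproduced from the F statistic with SciPy's F distribution:

from scipy import stats

# p-value for F(4, 10) = 2.10: upper-tail area of the F distribution
p_anova = stats.f.sf(2.10, dfn=4, dfd=10)
print(round(p_anova, 3))  # ~0.156, matching the SPSS output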
A New Analysis
Instead of running an ANOVA to test for differences between the means of the test scores at each level of X, couldn't we run a linear regression?
In the words of Marv Albert: YES!
For the linear regression to be valid, the means of the levels of X must fall on the linear regression line. The key point is that the means must follow a linear trend.
Using the difference between the ANOVA and the regression, I will show you how you can test for a linear trend in the analysis.
Multiple Regression Results
Running a linear regression (from Analyze...Regression...Linear in SPSS) produces these results:
[SPSS regression output table]
Multiple Regression Results
From the example, we could test the hypothesis:
H0: b1 = 0
Here, F(1,13) = 8.95, which gives a p-value of 0.010. Using any reasonable Type-I error rate (like 0.05), we would reject the null hypothesis. We would then conclude that there is a significant relationship between number of exposures and learning (as measured by test score).
This conclusion is different from the one we drew before. What is different about our analysis?
SS Differences
Notice that from the ANOVA analysis, SS treatment = 8.40. From the regression analysis, SS regression = 7.50.
Note the difference between the two: SS treatment is larger. The difference between SS treatment and SS regression is termed SS deviation:
SS deviation = SS treatment - SS regression = 8.40 - 7.50 = 0.90.
Take a look at how that difference comes about.
SS Differences
The estimated regression line is: Y' = 2.7 + 0.5X

X   N_X   Ȳ_X   Y'_X   (Ȳ_X - Y'_X)   (Ȳ_X - Y'_X)²   N_X(Ȳ_X - Y'_X)²
1    3    3.0    3.2       -0.2           0.04             0.12
2    3    4.0    3.7        0.3           0.09             0.27
3    3    4.0    4.2       -0.2           0.04             0.12
4    3    5.0    4.7        0.3           0.09             0.27
5    3    5.0    5.2       -0.2           0.04             0.12
                              Σ N_X(Ȳ_X - Y'_X)² = 0.90
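A minimal sketch (not from the slides) reproducing this table with NumPy; the group means and the fitted line Y' = 2.7 + 0.5X are taken directly from the slide:

import numpy as np

x = np.array([1, 2, 3, 4, 5])                 # levels of exposure
n = np.array([3, 3, 3, 3, 3])                 # subjects per level
group_means = np.array([3.0, 4.0, 4.0, 5.0, 5.0])

predicted = 2.7 + 0.5 * x                     # fitted regression line at each level
ss_deviation = np.sum(n * (group_means - predicted) ** 2)
print(ss_deviation)                           # 0.90: deviation of group means from linearity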
Data Scatterplot
[Scatterplot: number correct (y-axis, 2.00-6.00) against number of exposures (x-axis, 1.00-5.00), with the five group means marked M.]
SS Differences
The value obtained on the previous slide, 0.90, was equal to the SS deviation. The SS deviation is literally the calculation of a statistic that measures a variable's deviation from linearity.
This value serves as a basis for the question: "What is the difference between restricting the data to conform to a linear trend and placing no such restriction?" (Pedhazur, p. 517)
SS Differences
When SS treatment is calculated, there is no restriction on the means of the treatment groups.
If the means fall on a (straight) line, there will be no difference between SS treatment and SS regression, and SS deviation = 0. With departures from linearity, SS treatment will be much larger than SS regression.
Do you feel a statistical hypothesis test coming on?
Hypothesis Test
SS treatment can be partitioned into two components: SS regression (also called the SS due to linearity), and the remainder, the SS due to deviation from linearity.

Source                       df     SS      MS      F
Between Treatments            4    8.40
  Linearity                   1    7.50    7.50    7.50
  Deviation From Linearity    3    0.90    0.30    0.30
Within Treatments            10   10.00    1.00
Total                        14   18.40

If the SS due to linearity leads to a significant F value, then one can conclude that a linear trend exists and that linear regression is appropriate.
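A hedged sketch (mine, not from the slides) of how the F ratios in this table arise: each MS is divided by MS within, and p-values come from the F distribution with the corresponding degrees of freedom:

from scipy import stats

ms_within = 10.00 / 10           # SS within / df within = 1.00

# F for linearity: MS linearity / MS within, on (1, 10) df
f_linear = (7.50 / 1) / ms_within
p_linear = stats.f.sf(f_linear, dfn=1, dfd=10)

# F for deviation from linearity: MS deviation / MS within, on (3, 10) df
f_deviation = (0.90 / 3) / ms_within
p_deviation = stats.f.sf(f_deviation, dfn=3, dfd=10)

print(f_linear, p_linear)        # 7.50 -> significant: a linear trend exists
print(f_deviation, p_deviation)  # 0.30 -> n.s.: no significant departure from linearity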
The Polynomial Model
The preceding example demonstrated how a linear trend could be detected using a statistical hypothesis test. A linear trend is something we are very familiar with, having encountered linear regression for most of this course.
Curvilinear regression analysis can be used to determine whether not-so-linear trends exist between Y and X. Pedhazur distinguishes between two possible types of trends:
- Intrinsically linear.
- Intrinsically nonlinear.
The Polynomial Model
An intrinsically linear model is one that is linear in its parameters but not linear in the variables. By transformation, such a model may be reduced to a linear model. Such models are the focus of the remainder of this lecture.
An intrinsically nonlinear model is one that may not be coerced into linearity by transformation. Such models often require more complicated estimation algorithms than what is provided by least squares and the GLM.
The Polynomial Model
A simple regression model extension for curved relations is the polynomial model, such as the following second-degree polynomial:
Y' = a + b1X1 + b2X1²
One could also estimate a third-degree polynomial:
Y' = a + b1X1 + b2X1² + b3X1³
Or a fourth-degree polynomial:
Y' = a + b1X1 + b2X1² + b3X1³ + b4X1⁴
And so on...
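A minimal sketch (my illustration, with made-up data) of fitting a second-degree polynomial by ordinary least squares; the model is still linear in its parameters, so it can be estimated with an ordinary linear-model routine once X² is added as a column of the design matrix:

import numpy as np

# hypothetical data: x values and a curved response
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([4.2, 7.9, 10.1, 11.8, 12.2, 12.1])

# design matrix with intercept, X, and X^2 columns
X = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)  # least-squares estimates of the quadratic model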
The Polynomial Model: Estimation
The way of determining the extent to which a given model is applicable is similar to determining whether added variables significantly improve the predictive ability of a regression model.
Beginning with a linear model (a first-degree polynomial), estimate the model, denoted as R²_y.x. The tests of incremental variance accounted for are done at each level of the polynomial:
Linear: R²_y.x
Quadratic: R²_y.x,x² - R²_y.x
Cubic: R²_y.x,x²,x³ - R²_y.x,x²
Quartic: R²_y.x,x²,x³,x⁴ - R²_y.x,x²,x³
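A sketch (mine, not from the slides) of the incremental F test behind each comparison: F = [(R²_full - R²_reduced)/df1] / [(1 - R²_full)/df2], where df1 is the number of added terms and df2 the residual df of the fuller model. The example call uses the R² values from the practice-time example below, with N = 18 inferred from the residual d.f. of 16 in the Curve Estimation output:

from scipy import stats

def increment_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F test for the R^2 increase when moving to a higher-degree polynomial."""
    df1 = k_full - k_reduced         # number of added terms
    df2 = n - k_full - 1             # residual df of the fuller model
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, stats.f.sf(f, df1, df2)

# quadratic vs. linear (N = 18); ~15.8 with rounded R² values
print(increment_f(0.943, 0.883, n=18, k_full=2, k_reduced=1))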
A New Example
From Pedhazur, p. 522:
Suppose that we are interested in the effect of time spent in practice on the performance of a visual discrimination task. Subjects are randomly assigned to different levels of practice, following which a test of visual discrimination is administered, and the number of correct responses is recorded for each subject.
As there are six levels, the highest-degree polynomial possible for these data is the fifth. Our aim, however, is to determine the lowest-degree polynomial that best fits the data.
Data Scatterplot
[Scatterplot: Task Score (y-axis, 5.00-20.00) against Practice Time (x-axis, 2.50-10.00).]
Estimation In SPSS
To estimate the degree of a polynomial, one must first create new variables in SPSS, each representing X raised to a given power. Then successive regression analyses must be run, each adding a level to the equation:

Model        R²      Increase Over Previous       F
X           0.883                              121.029 *
X, X²       0.943           0.060               15.604 *
X, X², X³   0.946           0.003                0.911

Because adding X³ did not significantly increase R², we stop with the quadratic model.
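As a check on the quadratic row (my arithmetic, using the rounded R² values and N = 18, consistent with the d.f. of 16 in the Curve Estimation output below): F = (0.943 - 0.883) / [(1 - 0.943)/15] ≈ 15.8, which agrees with the reported 15.604 up to rounding of the R² values.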
Estimation In SPSS
Of course, there is an easier way... In SPSS, go to Analyze...Regression...Curve Estimation.
Estimation In SPSS
MODEL: MOD_2. Independent: x

Dependent   Mth   Rsq    d.f.     F      Sigf     b0        b1       b2       b3
y           LIN   .883    16    121.03   .000    3.2667    1.5571
y           QUA   .943    15    123.55   .000   -1.9000    3.4946   -.1384
y           CUB   .946    14     82.18   .000     .6667    1.8803    .1290   -.0127
Data Scatterplot
[figure]
Parameter Interpretation
The b parameters in a polynomial regression are nearly impossible to interpret. An independent variable is represented by more than a single vector - what's held constant?
The relative magnitude of the b parameters for different degrees cannot be compared, because the variance of the higher-degree terms explodes:
X: s²_x    X²: (s²_x)²    X³: (s²_x)³    ...
Variable Centering
Centering variables in a polynomial equation can avoid collinearity problems. Centering does not change the R² of a model, only the regression parameters.
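A small demonstration (mine, with made-up data) of why centering helps: for a positive predictor, the correlation between X and X² is typically near 1, but it drops sharply once X is centered at its mean:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=100)         # hypothetical positive predictor
xc = x - x.mean()                        # centered predictor

r_raw = np.corrcoef(x, x ** 2)[0, 1]     # collinearity of X with X^2
r_centered = np.corrcoef(xc, xc ** 2)[0, 1]
print(r_raw, r_centered)                 # ~0.98 vs. a value much closer to 0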
Multiple Curvilinear Regression
Running multiple curvilinear regression models is a straightforward extension of what was shown today:
Y' = a + b1X + b2Z + b3XZ + b4X² + b5Z²
Note the cross-product XZ. This cross-product term is tested above and beyond X and Z individually.
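A sketch (my illustration, with hypothetical data) of building the design matrix for this model and fitting it by least squares; the XZ column carries the interaction:

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
z = rng.uniform(0, 5, size=50)
y = 1 + 0.5 * x + 0.8 * z + 0.3 * x * z + rng.normal(scale=0.5, size=50)

# columns: intercept, X, Z, XZ, X^2, Z^2
D = np.column_stack([np.ones_like(x), x, z, x * z, x ** 2, z ** 2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(coef)  # a, b1..b5; b3 estimates the cross-product (interaction) effect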
Final Thought
Curvilinear regression can be accomplished using techniques we are familiar with. Interpretation can be tricky...
We are all lucky to be students during this season...
Next Time
No class next week (I'm in Montreal...if you are there, say hello).
Chapter 14: Continuous and categorical independent variables.
Comedy provided by this guy: [photo]