POLYNOMIAL AND MULTIPLE REGRESSION



Polynomial Regression

Polynomial regression is used to fit nonlinear (e.g. curvilinear) data with a least squares linear regression model. It is a form of linear regression that allows one to predict a single y variable by decomposing the x variable into an nth-order polynomial. It generally has the form:

    y = a + β1 x + β2 x^2 + β3 x^3 + ... + βk x^k

In polynomial regression, different powers of the x variable are successively added to the equation to see if they increase r^2 significantly. Note that as successive powers of x are added to the equation, the best-fit line changes shape:

    y = a + β1 x                        (a straight line)
    y = a + β1 x + β2 x^2               (a quadratic equation; produces a parabola)
    y = a + β1 x + β2 x^2 + β3 x^3      (a cubic equation; produces an s-shaped curve)
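As a concrete illustration, here is a minimal Python sketch (assuming NumPy is available; the x and y arrays are invented for illustration, not from the text) of fitting successively higher-order polynomials and watching how r^2 changes:

    # Fit polynomials of increasing order and report r^2 for each.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([2.1, 3.9, 7.2, 11.8, 18.3, 26.0, 35.9, 47.6])

    for order in (1, 2, 3):
        coeffs = np.polyfit(x, y, deg=order)   # least squares fit of an order-k polynomial
        y_hat = np.polyval(coeffs, x)          # fitted values
        ss_res = np.sum((y - y_hat) ** 2)      # unexplained (residual) sum of squares
        ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
        r2 = 1 - ss_res / ss_tot               # r^2 for this order
        print(f"order {order}: r^2 = {r2:.4f}")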

An examination of a scatter plot of the data should help you decide what power to use. For a curve with a single bend, use the quadratic equation. If the curve has two bends, you'll need a cubic model. Curves with multiple kinks need even higher-order terms; it's rare to go past a quadratic.

Significance tests should be conducted to test whether the polynomial regression explains a significant amount of the variance. One conducts hypothesis tests at each step to see if the increase in r^2 is significant. Thus you test the difference in the r^2 value between the lower-power and next-higher-power equations, asking the question: does an additional term significantly increase the r^2 value? Begin by testing if the data are linear. If so, stop (no polynomial regression is necessary). If the linear regression is insignificant, compare the r^2 for the linear versus the r^2 for the quadratic equation. Continue adding powers and doing tests until significance is reached.

    H0: r^2_i = r^2_j
    H1: r^2_i < r^2_j

where i refers to the lower-power equation and j refers to the higher-power equation. This test is done using the F test statistic:

    F = (r^2_j - r^2_i) / [(1 - r^2_j) / df_j],   where df_j = n - j - 1

F is distributed with j degrees of freedom in the numerator and n - j - 1 degrees of freedom in the denominator, where j is the power of the higher-order polynomial.

Warning: with the addition of successive powers of x, the r^2 will no doubt increase. As the power of x (k, the order of the polynomial equation) approaches n - 1, the equation will begin to give a "perfect fit" by connecting all the data points. This is not what one wants! It is generally best to stay within the quadratic (x^2) or cubic (x^3) orders.
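A sketch of this stepwise test, implementing the F statistic exactly as reconstructed above (SciPy is assumed for the F distribution; the r^2 values plugged in at the bottom are invented):

    # F test for the increase in r^2 when moving from the order-i
    # to the order-j polynomial, with df_j = n - j - 1 as in the text.
    from scipy import stats

    def f_test_increase(r2_i, r2_j, n, j):
        df_num = j                                   # numerator df, per the text
        df_den = n - j - 1                           # denominator df
        F = (r2_j - r2_i) / ((1 - r2_j) / df_den)
        p = stats.f.sf(F, df_num, df_den)            # upper-tail probability
        return F, p

    # e.g. linear r^2 = 0.62 versus quadratic r^2 = 0.87 for n = 10 points
    F, p = f_test_increase(0.62, 0.87, n=10, j=2)
    print(f"F = {F:.2f}, p = {p:.4f}")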

Multiple Regression (Least Squares)

Multiple regression allows one to predict a single variable from a set of k additional variables, where k = m - 1. It generally has the form:

    y = a + β1 x1 + β2 x2 + ... + βk xk

    a = initial intercept
    β = partial regression coefficients for variables x1 ... xk

One can get the vector of coefficients (the β's) by the equation B = A^-1 C, where:

    B = vector of regression coefficients
    A = matrix of sums of squares and cross products (based on mean deviations)
    C = vector of sums of cross products of y

    [β1]   [ Σx1^2    ...  Σx1·xk ]^-1 [ Σx1·y ]
    [ ⋮ ] = [   ⋮      ⋱      ⋮   ]    [   ⋮   ]
    [βk]   [ Σxk·x1   ...  Σxk^2  ]    [ Σxk·y ]

To get the initial intercept (a), plug in the means for y and each x, then solve for a:

    a = ȳ - β1 x̄1 - β2 x̄2 - ... - βk x̄k

The fitted value for each object regressed onto the line is:

    ŷi = a + β1 xi1 + β2 xi2 + ... + βk xik

and each observed value is yi = ŷi + ei.
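A minimal NumPy sketch of this matrix solution (it uses np.linalg.solve rather than forming A^-1 explicitly, which is numerically safer but mathematically equivalent; the function name is illustrative):

    import numpy as np

    def multiple_regression(X, y):
        """X is an (n, k) array of predictors, y an (n,) response.
        Returns the intercept a and coefficient vector B = A^-1 C."""
        Xd = X - X.mean(axis=0)             # mean deviations of each predictor
        yd = y - y.mean()                   # mean deviations of y
        A = Xd.T @ Xd                       # k x k sums of squares / cross products
        C = Xd.T @ yd                       # k-vector of cross products with y
        B = np.linalg.solve(A, C)           # partial regression coefficients
        a = y.mean() - B @ X.mean(axis=0)   # a = ybar - sum(beta_j * xbar_j)
        return a, B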

The error term (ei) for each ŷi can be found as (yi - ŷi). The β's may be difficult to interpret (but may be thought of as weights or loadings).

In matrix form, Y = XB + E, where:

    Y = vector of observations of the dependent variable
    X = matrix of independent variables
    B = column vector of regression coefficients
    E = column vector of residuals (error terms)

B is determined so as to minimize the sum of squares of the residuals, and corresponds to finding the best-fitting k-dimensional hyperplane in m-dimensional space. The least-squares approach minimizes the sum of the squared distances parallel to the y-axis.

Significance tests on multiple regression

    H0: β1 = β2 = ... = βk = 0
    H1: at least one of the β coefficients ≠ 0

The statistics are the same as with bivariate regression and analysis of variance. Partition the variance into three different sums of squares:

    SS_R = Σ(ŷi - ȳ)^2
    SS_D = Σ(yi - ŷi)^2
    SS_T = SS_R + SS_D

    Variance explained by regression:  s^2_R = SS_R / k
    Unexplained variance:              s^2_D = SS_D / (n - k - 1)
    Total variance:                    s^2_T = SS_T / (n - 1)

Note: the ratio SS_R/SS_T = R^2, where R is the multiple correlation coefficient.
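A sketch of this variance partition, assuming y holds the observations, y_hat the fitted values, and k the number of predictors (names are illustrative):

    import numpy as np

    def variance_partition(y, y_hat, k):
        """Partition the sums of squares for a regression with k predictors."""
        n = len(y)
        SS_R = np.sum((y_hat - y.mean()) ** 2)   # explained (regression) SS
        SS_D = np.sum((y - y_hat) ** 2)          # unexplained (residual) SS
        SS_T = SS_R + SS_D                       # total SS
        s2_R = SS_R / k                          # variance explained by regression
        s2_D = SS_D / (n - k - 1)                # unexplained variance
        s2_T = SS_T / (n - 1)                    # total variance
        R2 = SS_R / SS_T                         # squared multiple correlation
        return SS_R, SS_D, SS_T, s2_R, s2_D, s2_T, R2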

Use the F-test to see if the regression explains a significant proportion of the variance, i.e. to test whether the multiple regression coefficients are all 0. The observed value of F can be obtained by:

    F = s^2_R / s^2_D

and should be tested against a critical value of F from a table for a desired α, with k df in the numerator and n - k - 1 df in the denominator.

Each individual regression coefficient can also be tested with a t-test by calculating the standard error for each variable. For each coefficient j, calculate the standard error (se_bj) and then the t value:

    se_bj = sqrt(s^2_D · a^-1_jj)

where a^-1_jj is the jth diagonal element of the inverse of the matrix of sums of squares and cross products, and

    t = (βj - 0) / se_bj

Test the observed t-value for each coefficient, which is distributed with n - k - 1 degrees of freedom.
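A sketch combining the overall F test and the per-coefficient t tests, following the formulas above (NumPy and SciPy assumed; the function name is illustrative):

    import numpy as np
    from scipy import stats

    def regression_tests(X, y):
        """Overall F test and per-coefficient t tests for multiple regression."""
        n, k = X.shape
        Xd = X - X.mean(axis=0)
        yd = y - y.mean()
        A = Xd.T @ Xd                              # SSCP matrix (mean deviations)
        B = np.linalg.solve(A, Xd.T @ yd)          # partial regression coefficients
        y_hat = y.mean() + Xd @ B                  # fitted values
        SS_R = np.sum((y_hat - y.mean()) ** 2)
        SS_D = np.sum((y - y_hat) ** 2)
        s2_D = SS_D / (n - k - 1)
        F = (SS_R / k) / s2_D                      # F = s^2_R / s^2_D
        p_F = stats.f.sf(F, k, n - k - 1)
        se = np.sqrt(s2_D * np.diag(np.linalg.inv(A)))  # se_bj = sqrt(s^2_D * a^-1_jj)
        t = B / se                                 # t = (beta_j - 0) / se_bj
        p_t = 2 * stats.t.sf(np.abs(t), n - k - 1) # two-tailed, n-k-1 df
        return F, p_F, t, p_t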

e.g. Variables measured on trilobites:

    Sample   L (Y)   W (X1)   GSL (X2)   GA (X3)   TH (X4)
       1     3       6.2      4.05       1.3       5.05
       2     4.2     6.45     4.3        1.5       5.5
       3     4.9     6.95     3.9        1.15      5.7
       4     4.65    6.75     4.1        1.2       5.65
       5     4.75    7        4          1.2       5.3
       6     5.5     7.15     4.1        1.2       5.6
       7     5.3     7.35     4.05       1.3       5.85
       8     4.75    6.85     4.1        1.2       5.55
       9     5.05    6.85     4.35       1.35      5.5
      10     5.15    6.85     4.1        1.5       5.65
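Putting the pieces together on the trilobite table, a self-contained least-squares fit via np.linalg.lstsq (the numeric values follow the table as printed above and should be treated as illustrative):

    import numpy as np

    # Columns: L (Y), W (X1), GSL (X2), GA (X3), TH (X4)
    data = np.array([
        [3.0,  6.2,  4.05, 1.3,  5.05],
        [4.2,  6.45, 4.3,  1.5,  5.5 ],
        [4.9,  6.95, 3.9,  1.15, 5.7 ],
        [4.65, 6.75, 4.1,  1.2,  5.65],
        [4.75, 7.0,  4.0,  1.2,  5.3 ],
        [5.5,  7.15, 4.1,  1.2,  5.6 ],
        [5.3,  7.35, 4.05, 1.3,  5.85],
        [4.75, 6.85, 4.1,  1.2,  5.55],
        [5.05, 6.85, 4.35, 1.35, 5.5 ],
        [5.15, 6.85, 4.1,  1.5,  5.65],
    ])
    y = data[:, 0]                                        # L is the response
    X = np.column_stack([np.ones(len(y)), data[:, 1:]])   # intercept column + predictors
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # least squares solution
    print("a =", coef[0])
    print("betas =", coef[1:])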