1 Least Squares Regression
Alan T. Arnholt
Department of Mathematical Sciences, Appalachian State University
arnholt@math.appstate.edu
Spring 2006 R Notes
Copyright © 2006 Alan T. Arnholt

2 Least Squares Regression: Overview of Regression; The R Script

5 Least Squares Regression
When a linear pattern is evident from a scatterplot, the relationship between the two variables is often modeled with a straight line. When modeling a bivariate relationship, Y is called the response or dependent variable, and x is called the predictor or independent variable. The simple linear regression model is written

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$  (1)
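
To make the pieces of model (1) concrete, here is a small simulation sketch. It is not from the original slides, and the coefficient values (2 and 0.5) and error standard deviation (1) are made up for illustration:

> set.seed(13)                          # reproducible errors
> x <- seq(1, 10, length = 50)          # predictor values
> Y <- 2 + 0.5 * x + rnorm(50, sd = 1)  # model (1) with made-up beta0 = 2, beta1 = 0.5
> plot(x, Y)                            # scatterplot shows a roughly linear pattern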

6 OLS
The goal is to estimate the coefficients $\beta_0$ and $\beta_1$ in (1). The best-known method of estimating these coefficients is ordinary least squares (OLS). OLS chooses, among all possible lines, the one that minimizes the sum of the squared vertical deviations of the $Y_i$'s from the line. Specifically, the sum of the squared residuals ($\hat{\varepsilon}_i = e_i = Y_i - \hat{Y}_i$) is minimized when the OLS estimators of $\beta_0$ and $\beta_1$ are

$b_0 = \bar{y} - b_1 \bar{x}$  (2)

$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$  (3)

respectively. Note that the estimated regression function is written as $\hat{Y}_i = b_0 + b_1 x_i$.
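
One way to convince yourself that Equations (2) and (3) do minimize the sum of squared residuals is to compare them with a direct numerical minimization. This sketch is not from the original slides; it reuses the simulated x and Y from above:

> sse <- function(b) sum((Y - b[1] - b[2] * x)^2)  # SSE for a candidate (intercept, slope)
> optim(c(0, 0), sse)$par                          # numerical minimizer of the SSE
> b1 <- sum((x - mean(x)) * (Y - mean(Y))) / sum((x - mean(x))^2)
> b0 <- mean(Y) - b1 * mean(x)
> c(b0, b1)                                        # closed-form estimates agree to optim's tolerance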

7 Figure: Graph depicting residuals. The vertical distances shown with a dotted line between the $Y_i$'s, depicted with a solid circle, and the $\hat{Y}_i$'s, depicted with a clear square, are the residuals. (The plot labels the second residual, $\hat{\varepsilon}_2 = Y_2 - \hat{Y}_2$.)

11 Example
Use the data frame Gpa from the BSDA package to:
1. Create a scatterplot of CollGPA versus HSGPA.
2. Find the least squares estimates of $\beta_0$ and $\beta_1$ using Equations (2) and (3), respectively.
3. Find the least squares estimates of $\beta_0$ and $\beta_1$ using the R function lm().
4. Add the least squares line to the scatterplot created in part 1 using the R function abline().

12 R Code
Code for part 1.

> library(BSDA)
> attach(Gpa)
> Y <- CollGPA
> x <- HSGPA
> plot(x, Y, col = "blue",
+      main = "Scatterplot of College Versus High School GPA",
+      xlab = "High School GPA", ylab = "College GPA")

13 Scatterplot of GPA
Figure: Scatterplot requested in part 1 (College GPA versus High School GPA).

14 Using Equations (2) and (3) to Find $b_0$ and $b_1$
Using Equations (2) and (3) to answer part 2.

> b1 <- sum( (x - mean(x)) * (Y - mean(Y)) ) /
+       sum( (x - mean(x))^2 )
> b0 <- mean(Y) - b1 * mean(x)
> c(b0, b1)
[1] -0.950366  1.346999
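
The slide for part 3 does not survive in this transcription. As a minimal sketch of the missing step, the object model used on the following pages is presumably created by fitting the regression with lm(); its coefficients are simply the values $b_0$ and $b_1$ computed above:

> model <- lm(Y ~ x)  # least squares fit of CollGPA on HSGPA
> coef(model)         # matches the hand-computed b0 and b1
(Intercept)           x
  -0.950366    1.346999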

17 Using abline()
Using the R function abline() to add the least squares regression line to Figure 2 on page 13. abline() adds one or more straight lines to the current plot. The arguments to abline() are a = b0 (the intercept) and b = b1 (the slope).

> abline(model, col = "blue", lwd = 2)

Note: the object model contains $b_0$ and $b_1$.
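
Because a is the intercept and b is the slope, an equivalent call (not shown in the deck) passes the hand-computed estimates from page 14 directly:

> abline(a = b0, b = b1, col = "blue", lwd = 2)  # draws the same line as abline(model)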

18 Scatterplot of GPA with Superimposed Least Squares Regression Line
Figure: Scatterplot requested in part 4 (the scatterplot from part 1 with the least squares line added).

21 Residuals and Predicted (Fitted) Values
The i-th residual is defined to be $e_i = Y_i - \hat{Y}_i$. The value $\hat{Y}_i$ produced by the estimated regression function $\hat{Y}_i = b_0 + b_1 x_i$ for a given $x_i$ is referred to as the predicted value, as well as the fitted value. The R functions predict() and fitted() can be used on lm objects.

22 Using fitted() and predict()

> yhat <- b0 + b1 * x
> yhatrp <- predict(model)
> yhatrf <- fitted(model)
> e <- Y - yhat
> er <- resid(model)
> COMPARE <- rbind(yhat, yhatrp, yhatrf, e, er)
> COMPARE[, 1:4]  # all rows, columns 1:4
              1          2         3         4
yhat    2.68653  3.2253294 1.8783309 3.3600293
yhatrp  2.68653  3.2253294 1.8783309 3.3600293
yhatrf  2.68653  3.2253294 1.8783309 3.3600293
e      -0.48653 -0.4253294 0.5216691 0.4399707
er     -0.48653 -0.4253294 0.5216691 0.4399707

23 Sum of Squares Due to Error
The sum of squares due to error (also called the residual sum of squares) is defined as

$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$  (4)

Use the definition in (4) and the R function anova() to compute the SSE for the regression of Y on x (Gpa).

24 R Code

> SSE <- sum(e^2)
> SSE
[1] 1.502284
> anova(model)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value   Pr(>F)
x          1 3.7177  3.7177  19.798 0.002141 **
Residuals  8 1.5023  0.1878
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(model)[2,2]
[1] 1.502284
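
As a cross-check not shown in the deck, the residual sum of squares can also be extracted from the fitted model directly, since deviance() applied to an lm object returns the SSE:

> deviance(model)  # same value as anova(model)[2,2]
[1] 1.502284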

25 Pretty ANOVA Table

          Df Sum Sq Mean Sq F value Pr(>F)
x          1  3.718   3.718  19.798  0.002
Residuals  8  1.502   0.188
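
The deck does not say how the table above was typeset. One common approach, offered here only as an assumption, is the xtable package, which converts an anova object into a LaTeX table:

> library(xtable)       # assumed helper package; not named in the deck
> xtable(anova(model))  # prints LaTeX code for the ANOVA table above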

26 Link to the R Script
Go to my web page: Script for Regression.
Homework: problems 2.35-2.40 and 2.42-2.46. See me if you need help!