Coefficient of Determination




Coefficient of Determination

The coefficient of determination $R^2$ (or sometimes $r^2$) is another measure of how well the least squares equation $\hat{y} = b_0 + b_1 x$ performs as a predictor of $y$. $R^2$ is computed as

$$ R^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{SS_{yy}}{SS_{yy}} - \frac{SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}. $$

$R^2$ measures the relative sizes of $SS_{yy}$ and $SSE$: the smaller $SSE$ is, the more reliable the predictions obtained from the model.

Coefficient of Determination (cont'd)

The higher the $R^2$, the more useful the model. $R^2$ takes on values between 0 and 1.

Essentially, $R^2$ tells us how much better we can do in predicting $y$ by using the model and computing $\hat{y}$ than by just using the mean $\bar{y}$ as a predictor.

Note that when we use the model and compute $\hat{y}$, the prediction depends on $x$, because $\hat{y} = b_0 + b_1 x$. Thus, we act as if $x$ contains information about $y$. If we just use $\bar{y}$ to predict $y$, then we are saying that $x$ does not contribute information about $y$, and thus our predictions of $y$ do not depend on $x$.

Coefficient of Determination (cont'd)

More formally:

- $SS_{yy}$ measures the deviations of the observations from their mean: $SS_{yy} = \sum_i (y_i - \bar{y})^2$. If we were to use $\bar{y}$ to predict $y$, then $SS_{yy}$ would measure the variability of the $y_i$ around their predicted value $\bar{y}$.
- $SSE$ measures the deviations of the observations from their predicted values: $SSE = \sum_i (y_i - \hat{y}_i)^2$.
- If $x$ contributes no information about $y$, then $SS_{yy}$ and $SSE$ will be almost identical, because $b_1 \approx 0$.
- If $x$ contributes lots of information about $y$, then $SSE$ is very small.

Interpretation: $R^2$ tells us how much better we do by using the regression equation rather than just $\bar{y}$ to predict $y$.
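As a quick illustration of these quantities, here is a minimal Python sketch with made-up data (not the Tampa sales data used later) that fits the least squares line and computes $R^2 = 1 - SSE/SS_{yy}$:

    # Minimal sketch with made-up data: fit y-hat = b0 + b1*x by least
    # squares, then compute R^2 = 1 - SSE/SSyy.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 8.0, 9.8]

    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n

    SSxx = sum((xi - xbar) ** 2 for xi in x)
    SSxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = SSxy / SSxx          # least squares slope
    b0 = ybar - b1 * xbar     # least squares intercept

    yhat = [b0 + b1 * xi for xi in x]
    SSyy = sum((yi - ybar) ** 2 for yi in y)               # total variation
    SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # unexplained variation
    print(1 - SSE / SSyy)     # R^2 close to 1: x explains most of y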

Coefficient of Determination - Example

Consider the Tampa sales example. From the printout, $R^2 = 0.9453$. Interpretation: about 94.5% of the variability observed in sale prices can be explained by the assessed values of the homes. Thus, the assessed value of a home contributes a lot of information about the home's sale price.

We can also find the pieces we need to compute $R^2$ by hand in either JMP or SAS output:

- $SS_{yy} - SSE$ is called the Model sum of squares in both SAS and JMP.
- $SSE$ is called the Error sum of squares in both SAS and JMP.

In the Tampa sales example, the Model sum of squares is 1673142 and $SSE = 96746$, so $SS_{yy} = 1673142 + 96746 = 1769888$ and

$$ R^2 = \frac{1673142}{1769888} = 1 - \frac{96746}{1769888} = 0.9453. $$
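A quick arithmetic check of these numbers:

    # Check the Tampa sales R^2 arithmetic (values from the printout).
    SS_model = 1673142.0      # Model sum of squares = SSyy - SSE
    SSE = 96746.0             # Error sum of squares
    SSyy = SS_model + SSE     # total sum of squares = 1769888
    print(round(1 - SSE / SSyy, 4))   # 0.9453, matching the printout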

Estimation and prediction

With our regression model, we might wish to do two things:

1. Estimate the mean (or expected) value of $y$ for a given $x$.
2. Predict the value of a single $y$ given a value of $x$.

In both cases, we use the same sample estimator (or predictor): $\hat{y} = b_0 + b_1 x$. The difference between estimating a mean and predicting a single observation lies in the accuracy with which we can do each of these two things: the standard errors in the two cases are different.

Estimation and prediction (cont'd)

The standard deviation of the estimator $\hat{y}$ of the mean of $y$ at a certain value of $x$, say $x_p$, is

$$ \sigma_{\hat{y}} = \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}, $$

where

- $\sigma$ is the error standard deviation, estimated by RMSE (or $S$);
- $x_p$ is the specific value of $x$ for which we wish to estimate the mean of $y$;
- $\sigma_{\hat{y}}$ is called the standard error of $\hat{y}$.

If we use RMSE in place of $\sigma$, we obtain an estimate $\hat{\sigma}_{\hat{y}}$.

Estimation and prediction (cont'd)

The standard deviation of the predictor $\hat{y}$ of an individual $y$ value at a certain value of $x$, say $x_p$, is

$$ \sigma_{(y - \hat{y})} = \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}. $$

We call $\sigma_{(y - \hat{y})}$ the standard error of prediction. If we use RMSE (or $S$) in place of $\sigma$, then we have an estimate of the standard error of prediction, and we denote the estimate by $\hat{\sigma}_{(y - \hat{y})}$.
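The two formulas differ only by the extra 1 under the square root. Here is a small Python sketch of both (the function names are mine, for illustration only):

    import math

    def se_mean(s, n, xp, xbar, SSxx):
        # estimated standard error of y-hat as an estimator of
        # the mean of y at x = xp
        return s * math.sqrt(1 / n + (xp - xbar) ** 2 / SSxx)

    def se_pred(s, n, xp, xbar, SSxx):
        # estimated standard error of prediction for a single y at x = xp;
        # the extra 1 accounts for the variability of y around the true line
        return s * math.sqrt(1 + 1 / n + (xp - xbar) ** 2 / SSxx)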

Estimation and prediction - Example

Consider the Tampa sales example, and refer to the JMP output. From the output, RMSE $= S = 32.78$, and the mean assessed value is $\bar{x} = 201.75$ (in thousands of dollars).

We wish to estimate the mean price of houses assessed at $x_p = 320$ (i.e., $320,000) and also compute $\hat{\sigma}_{\hat{y}}$, the standard error of $\hat{y}$:

$$ \hat{y} = 20.94 + 1.069 \times 320 = 363. $$

To compute $\hat{\sigma}_{\hat{y}}$ we also need $SS_{xx}$. We use the computational formula

$$ SS_{xx} = \sum_i x_i^2 - n \bar{x}^2. $$

Estimation and prediction - Example

To get $\sum_i x_i^2$ we can create a new column in JMP equal to Assvalue squared, and then ask for its sum. In the Tampa sales example:

$$ SS_{xx} = \sum_i x_i^2 - n \bar{x}^2 = 5,209,570.75 - 92 \times 201.75^2 = 1,464,889. $$

An estimate of the standard error of $\hat{y}$ is now:

$$ \hat{\sigma}_{\hat{y}} = 32.78 \sqrt{\frac{1}{92} + \frac{(320 - 201.75)^2}{1,464,889}} = 32.78 \sqrt{0.01086 + 0.009545} = 4.68. $$

Estimation and prediction - Example

Suppose that now we wish to predict the sale price of a single house that is appraised at $320,000. The point estimate is the same as before:

$$ \hat{y} = 20.94 + 1.069 \times 320 = 363. $$

The standard error of prediction, however, is computed using the second formula:

$$ \hat{\sigma}_{(y - \hat{y})} = S \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}. $$

Estimation and prediction - Example

We have $S$ (or RMSE), $n$, $(x_p - \bar{x})^2$ and $SS_{xx}$ from before, so all we need to do is plug in:

$$ \hat{\sigma}_{(y - \hat{y})} = 32.78 \sqrt{1 + \frac{1}{92} + \frac{(320 - 201.75)^2}{1,464,889}} = 32.78 \sqrt{1 + 0.01086 + 0.009545} = 33.11. $$
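Plugging the example's numbers into the two formulas reproduces both standard errors; a self-contained check in Python:

    import math

    s, n, xp, xbar = 32.78, 92, 320.0, 201.75
    SSxx = 5209570.75 - n * xbar ** 2   # computational formula: 1,464,889
    d2 = (xp - xbar) ** 2 / SSxx        # (x_p - xbar)^2 / SSxx = 0.009545

    print(round(s * math.sqrt(1 / n + d2), 2))       # 4.68  (mean of y)
    print(round(s * math.sqrt(1 + 1 / n + d2), 2))   # 33.11 (single y)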

Estimation and prediction - Example

Note that in the Tampa sales example, $\hat{\sigma}_{(y - \hat{y})} > \hat{\sigma}_{\hat{y}}$ (33.11 versus 4.68). This is always true: we can estimate the mean value of $y$ for a given $x_p$ much more accurately than we can predict the value of a single $y$ at $x = x_p$.

In estimating the mean of $y$ at $x = x_p$, the only uncertainty arises because we do not know the true regression line. In predicting a single $y$ at $x = x_p$, we face two sources of uncertainty: the true regression line plus the expected variability of the $y$ values around the true line.

Estimation and prediction - Using JMP

For each observation in a dataset we can get $\hat{y}$, $\hat{\sigma}_{\hat{y}}$, and $\hat{\sigma}_{(y - \hat{y})}$ from JMP (or from SAS). In JMP:

1. Choose Fit Model.
2. From the Response icon, choose Save Columns and then choose Predicted Values, Std Error of Predicted, and Std Error of Individual.

Estimation and prediction - Using JMP

A VERY unfortunate thing: JMP calls things differently from the book.

- In the book, $\hat{\sigma}_{\hat{y}}$ is the standard error of estimation, but in JMP it is the standard error of prediction.
- In the book, $\hat{\sigma}_{(y - \hat{y})}$ is the standard error of prediction, but in JMP it is the standard error of individual.

SAS uses the same names as the book: standard error of the mean and standard error of prediction.

Confidence intervals

We can compute a $100(1 - \alpha)\%$ CI for the true mean of $y$ at $x = x_p$. We can also compute a $100(1 - \alpha)\%$ CI for the true value of a single $y$ at $x = x_p$. In both cases, the formula is the same as the general formula for a CI:

$$ \text{estimator} \pm t_{\alpha/2,\,n-2} \times \text{standard error}. $$

Confidence intervals (cont'd)

The CI for the true mean of $y$ at $x = x_p$ is

$$ \hat{y} \pm t_{\alpha/2,\,n-2}\, \hat{\sigma}_{\hat{y}}. $$

The CI for the true value of a single $y$ at $x = x_p$ is

$$ \hat{y} \pm t_{\alpha/2,\,n-2}\, \hat{\sigma}_{(y - \hat{y})}. $$

Confidence intervals - Example

In the Tampa sales example, we computed $\hat{y}$ for $x = 320$, and we also computed the standard error of the mean of $y$ and the standard error of a single $y$ at $x = 320$.

The 95% CI for the true mean of $y$ at $x = 320$ is

$$ \hat{y} \pm t_{\alpha/2,\,n-2}\, \hat{\sigma}_{\hat{y}} = 363 \pm 1.98 \times 4.68 = (354, 372). $$

The 95% CI for the true value of a single $y$ at $x = 320$ is

$$ \hat{y} \pm t_{\alpha/2,\,n-2}\, \hat{\sigma}_{(y - \hat{y})} = 363 \pm 1.98 \times 33.11 = (297, 429). $$
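The same intervals can be sketched in Python, using scipy for the t critical value ($t_{0.025,\,90} \approx 1.99$, which the slide rounds to 1.98):

    from scipy.stats import t

    n, yhat = 92, 363.0
    tcrit = t.ppf(0.975, df=n - 2)   # two-sided 95%: t_{0.025, 90}, about 1.99

    for se in (4.68, 33.11):         # SE of the mean, SE of prediction
        lo, hi = yhat - tcrit * se, yhat + tcrit * se
        print(round(lo), round(hi))  # about (354, 372) and (297, 429)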

Confidence intervals - Interpretation

The 95% CI for the mean sale price of houses assessed at $320,000 is $354,000 to $372,000. If many houses assessed at about $320,000 go on the market, we expect the mean sale price of those houses to fall within those two values.

The 95% CI for the sale price of a single house assessed at $320,000 is $297,000 to $429,000. That means that a homeowner whose house is valued at $320,000 can expect to get between $297,000 and $429,000 if she decides to sell it.

Again, notice that it is much more difficult to precisely predict a single value than it is to estimate the mean of many values. See Figure 3.25 on page 135 of the textbook.