Univariate Regression




Correlation and Regression

- The regression line summarizes the linear relationship between 2 variables.
- The correlation coefficient, r, measures the strength of the relationship: the closer r is to +/- 1, the more closely the points of the scatterplot approach the regression line.

Squared Correlations

- r² is the proportion of the variance in the variable y which is accounted for by its relationship to x
  o i.e., how closely the dots cluster around the regression line
  o If r = .45, the two variables share ~20% of their variance (.45² ≈ .20)

Residuals

- Points usually don't all lie on the line.
  o The vertical difference between a real, observed y-value (Y) and the value that the regression line predicts (Ŷ) is called the residual.
- Regression involves an independent (or explanatory) variable and a dependent (or response) variable.

The linear regression equation: Ŷ = a + bX

- It summarises / models the real observations.
- It allows us to predict the value of y for a given value of x that we have not observed, provided x lies within the limits of the observed data sample (i.e. between the min. and max. observed values of x).
- It describes the straight line which minimizes the squared deviations of the observed values of Y from those on the regression line, i.e., the squared residuals.

Simple Linear Regression

- The model: Yᵢ = α + βXᵢ + εᵢ
  o α, β: model parameters
  o εᵢ: the unpredictable random disturbance term
- α and β are unknown, and must be estimated using sample data.
- We use the estimated regression equation Ŷ = a + bX.
- Greek letters (α, β, γ) are used to denote parameters of a regression equation; English letters (a, b, c) are used to denote the estimates of these parameters.
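As a concrete illustration of obtaining the estimates a and b, here is a minimal Python sketch using the standard closed-form least-squares formulas; the data values are invented for illustration only.

```python
# Least-squares estimates a and b for the fitted line Y-hat = a + bX,
# plus the residuals (observed Y minus predicted Y-hat).
# Illustrative data, not from the notes:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# b = sum((x_i - mean_x)(y_i - mean_y)) / sum((x_i - mean_x)^2)
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b)
```

Note that the residuals from a least-squares fit always sum to (numerically) zero, which is one quick sanity check on the computation.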

- We want to find the regression model which minimises the error term.
  o Use the method of least squares to estimate α and β.

Method of Least Squares

- Provides the regression line in which the sum of squared differences between the observed values and the values predicted by the model is as small as possible:
  SS = Σ(Y − Y_pred)² = Σ(observed − model)²
  o Differences are squared so that positive and negative deviations (Y − Y_pred) do not cancel out.

Goodness of Fit

- How well does the model describe / reproduce the observed data?

Sums of Squares (SS)

- Total sum of squares: SS_T = Σ(yᵢ − ȳ)²
  i.e. the sum of the squared differences between each observed value of y and the mean of y.

- Residual sum of squares: SS_R = Σ(yᵢ − ŷᵢ)²
  i.e. the sum of the squared differences between each observed value of y and its corresponding predicted value of y.
- Model sum of squares: SS_M = Σ(ŷᵢ − ȳ)²
  i.e. the sum of the squared differences between each predicted value of y and the mean of y.

- SS_T = SS_M + SS_R
- SS_T, SS_M and SS_R are the same quantities as SS Total, SS Between and SS Within, respectively, in ANOVA.

R²

- R² = SS_M / SS_T
- An indication of how much better the model is at predicting Y than if only the mean of Y were used.
- Equal to the (Pearson correlation coefficient)².
- We want the ratio SS_M : SS_T to be LARGE.
- R² represents the proportion of variance in y that can be explained by the model; R² × 100 = the percentage of variance accounted for by the model.
- In univariate regression, the correlation coefficient is r = √R²
  o This doesn't capture whether the relationship is positive / negative, but that can be established by looking at a scatter plot or at the sign of b in the regression equation.
- If the model is good at predicting, then SS_M will be large compared to SS_R.

Testing the Model Using the F-Ratio

- F = MS_M / MS_R
- SS are totals, and are therefore affected by sample size; Mean Squares (MS) can be used instead (as in ANOVA):
  MS_M = SS_M / df_M, where df_M = k, the number of predictors (k = 1 in simple regression)
  MS_R = SS_R / df_R, where df_R = N − k − 1
- The F-ratio tells us how much better our model is at predicting values of Y than chance alone (the mean).
- As with ANOVA, we want our F to be LARGE.
- Calculate the critical value, or look it up in a table, and provide a p-value:
  o Generally speaking, when p < .05, the result is said to be significant.
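The sums-of-squares partition, R², and F-ratio above can be computed directly from a fitted line. A small Python sketch, again with invented data and k = 1 predictor:

```python
# Partition SS_T into SS_M + SS_R, then form R^2 and the F-ratio
# for a simple (k = 1) regression. Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

mean_x = sum(x) / n
mean_y = sum(y) / n
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

ss_t = sum((yi - mean_y) ** 2 for yi in y)              # total SS
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS
ss_m = sum((yh - mean_y) ** 2 for yh in y_hat)          # model SS

r_squared = ss_m / ss_t        # proportion of variance explained
ms_m = ss_m / k                # df_M = k
ms_r = ss_r / (n - k - 1)      # df_R = N - k - 1
f_ratio = ms_m / ms_r
print(r_squared, f_ratio)
```

The check ss_t ≈ ss_m + ss_r mirrors the identity SS_T = SS_M + SS_R; the resulting F would then be compared against the critical value for (df_M, df_R) degrees of freedom.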

Important Values in a Regression Output

- From the output you can write the regression equation Yᵢ = b₀ + b₁Xᵢ, e.g.:
  Health Value = 12.357 + (.237) × X
- X is significantly positively correlated with Y.

- X explains approximately 13.4% of the variance in Y.
- This is greater than the proportion expected by chance.
- We can predict Y from X.
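Prediction from the example output above is just substitution into the fitted equation. A short sketch using the coefficients from the example (b₀ = 12.357, b₁ = .237); the X value passed in is made up:

```python
# Predict Health Value from the fitted equation in the example output:
# Y-hat = b0 + b1 * X, with b0 = 12.357 and b1 = 0.237.
b0, b1 = 12.357, 0.237

def predict_health(x):
    """Predicted Health Value for a given value of X."""
    return b0 + b1 * x

print(predict_health(10))  # prediction for an illustrative X = 10
```

As the notes caution earlier, such predictions are only trustworthy for X values within the range of the observed data.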