Formula for linear models. Prediction, extrapolation, significance test against zero slope.



Similar documents
SPSS Guide: Regression Analysis

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

Chapter 23. Inferences for Regression

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

The Dummy s Guide to Data Analysis Using SPSS

Module 5: Statistical Analysis

An SPSS companion book. Basic Practice of Statistics

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Directions for using SPSS

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Simple linear regression

Chapter 7: Simple linear regression Learning Objectives

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Relationships Between Two Variables: Scatterplots and Correlation

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Univariate Regression

How To Run Statistical Tests in Excel

Table of Contents. Preface

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

SPSS Explore procedure

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Introduction to Regression and Data Analysis

Using SPSS, Chapter 2: Descriptive Statistics

Vectors. Objectives. Assessment. Assessment. Equations. Physics terms 5/15/14. State the definition and give examples of vector and scalar variables.

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

A full analysis example Multiple correlations Partial correlations

Curve Fitting in Microsoft Excel By William Lee

Updates to Graphing with Excel

Module 3: Correlation and Covariance

Dealing with Data in Excel 2010

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Regression and Correlation

Correlation and Regression Analysis: SPSS

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

Introduction to SPSS 16.0

Microsoft Excel Tutorial

Simple Predictive Analytics Curtis Seare

Using Microsoft Excel to Plot and Analyze Kinetic Data

Summary of important mathematical operations and formulas (from first tutorial):

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Lab 1: The metric system measurement of length and weight

Chapter 7: Modeling Relationships of Multiple Variables with Linear Regression

0 Introduction to Data Analysis Using an Excel Spreadsheet

Main Effects and Interactions

Exercise 1.12 (Pg )

Simple Regression Theory II 2010 Samuel L. Baker

Mario Guarracino. Regression

Introduction Course in SPSS - Evening 1

II. DISTRIBUTIONS distribution normal distribution. standard scores

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared.

The correlation coefficient

Simple Linear Regression

Basic Statistical Analysis in Excel NICAR 15 / Norm Lewis, University of Florida

5 Correlation and Data Exploration

Linear Models in STATA and ANOVA

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

x x y y Then, my slope is =. Notice, if we use the slope formula, we ll get the same thing: m =

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Factors affecting online sales

1.3 LINEAR EQUATIONS IN TWO VARIABLES. Copyright Cengage Learning. All rights reserved.

An introduction to IBM SPSS Statistics

Projects Involving Statistics (& SPSS)

(More Practice With Trend Forecasts)

Data analysis and regression in Stata

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure

Directions for Frequency Tables, Histograms, and Frequency Bar Charts

Math 132. Population Growth: the World

Correlation key concepts:

Figure 1. An embedded chart on a worksheet.

Chapter 5 Analysis of variance SPSS Analysis of variance

The Circumference Function

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Moderation. Moderation

Father s height (inches)

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

4. Simple regression. QBUS6840 Predictive Analytics.

An analysis method for a quantitative outcome and two categorical explanatory variables.

SPSS Tutorial, Feb. 7, 2003 Prof. Scott Allard

Part 2: Analysis of Relationship Between Two Variables

Estimating a market model: Step-by-step Prepared by Pamela Peterson Drake Florida Atlantic University

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

SPSS Tests for Versions 9 to 13

Pearson s Correlation

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y X

Getting Correct Results from PROC REG

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

Section 3 Part 1. Relationships between two numerical variables

Transcription:

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Last time, we looked the linear regression formula.

It s the line that fits the data best. The Pearson correlation can be considered a measure of how well that line fits.

We can compute the slope b from the correlation coefficient r, the standard deviations of the x data and y data.

The slope is rate that y changes for each unit x changes. Y increases by b units when X increases by one unit

The trend is stronger when the correlation is stronger, so the slope increases as the correlation r increases.

If y changes a lot relative to x, it has to have wider range and more variance than x, so s y will be bigger than s x.

The simplest way to find the intercept a is to use the slope b, and the sample means of x and y.

Example: We have data with the following properties. Find the slope and intercept. = 12 = 7 = 4 = 22 r = -0.35 Find the regression line from this data.

We start with the slope b (Apologies to the alphabet, b comes before a in regression)

Next we get the intercept a, using the slope b that we just found.

Now that we have the slope and intercept, our regression equation is:

If each dot were a note, it would sound like a dragon walking across a piano.

Regression in SPSS. To find the slope and intercept, also called the we first go to Analyze

Put the dependent (Y) variable in, and Put the independent (X) variable in. Later, we ll put multiple variables in Independent, but not yet. Then click OK. (not shown, at bottom of pop-up)

The results! The results of interest are the unstandardized coefficients is the intercept, a = 4.701 is the slope, b = -0.841

We can also find the p-value of the hypothesis that the slope is zero. (Small sig. means there is significant evidence of a slope) When there s only one independent variable, this is the same as the p-value of the correlation.

We might also be interested in drawing the regression line as well as getting its formula. There s an option to do so with a scatterplot, so build a scatterplot first. (Graphs Legacy Dialogs Scatter/Dot, Choose Simple Scatter and OK, Put the Dependent in Y-axis, Independent in X-axis)

Once you have a scatterplot, go to the output window and on it. Choose Edit Content In Separate Window

If a pop-up like the properties window comes up, just ignore it and click Close.

In the Chart Editor window that comes up, go to. Then, click with the window that pops up after. Clicking on the icon that looks like Fit Line at Total does this too.

Now you have a regression line for your data. Serves 6-8, best when chilled.

You and SPSS: Making beautiful music together. Now with watermarks!

With a linear regression, we can use a value of X to predict a value of Y. Our best prediction would be that any new data is right on the line. Let s say, after doing the books vs. TV analysis, we wanted to predict how many books/year were read by a child who watched x = 2.5 hours of TV/day.

The regression line was: = 4.701 0.841 X

So, to predict newcomer s books/year, we replace the TV/year variable x with 2.5. = 4.701 (0.841) = By our estimate, a child watches 2.5 hours of TV/day also reads books/year.

We could choose different x values and get different y predictions.. = 4.701 (0.841) (1.5) = 3.440 = 4.701 (0.841) (5) = 0.496

But we can t choose any value: Some predictions don t make any sense, we won t know that until after we draw the regression line. 7 hours/day of TV -1.186 books/year Some predictions won t make sense no matter what the model is. -2 hours/day of TV 6.383 books/year 27 hours/day of TV -18.006 books/year

How do we know what s allowed for prediction? values, those are predicted from between observed data points, are usually good.

values, those predicted beyond the data, are bad. A regression line tells us the trend among the data that we see, it tells us nothing of the data beyond.

If you extrapolate, you can get some pretty absurd results. Source: xkcd.com/605

Sometimes the problem with extrapolating can be more subtle. is the act of extrapolating in time. Without data, extrapolation is meaningless.

The intercept is the Y value when X is zero. When X = 0 is within the data (or could reasonably be), the intercept means something in the real world. X: Average monthly temperature (Celsius) Y: Monthly heating bill (CDN $) = 45.00-1.60X Slope: Intercept:

The intercept is the Y value when X is zero. When X = 0 is within the data (or could reasonably be), the intercept means something in the real world. X: Average monthly temperature (Celsius) Y: Monthly heating bill (CDN $) = 45.00-1.60X Slope: For every, the heating bill Intercept: The average heating bill

When X = 0 is beyond the data, the intercept is useful only as a mathematical construct. X: Resting Heart Rate (beats per minute) Y: Body Mass Index (kg / m 2 ) = 16.7 + 0.17X Slope: Intercept: Source: http://www.pps.org.pk/pjp/6-1/talay.pdf, Pak. J. Phisol. (2010) 6:1

X: Resting Heart Rate (beats per minute) Y: Body Mass Index (kg / m 2 ) = 16.7 + 0.17X Intercept: The average person with a resting heart rate of zero has a BMI of 16.7. Slope: For every additional beat-per-minute, BMI increases by 0.17 kg / m 2.

BMI = 16.7 + 0.17(Heart Rate) Intercept: The average person with a resting heart rate of zero has a BMI of 16.7. Is this reasonable? Doesn t this imply that dead people, on average have a BMI of 16.7? Was anyone with a heart rate near zero even tested? Go to the source: http://www.pps.org.pk/pjp/6-1/talay.pdf

Final note: As long as the relationship is linear (or close), slope always has a real-world interpretation. The slope interpretation implies that it s for the interval of observed data and not anything beyond that. X = 0 isn t always within the observed interval, so the intercept doesn t always have a real-world interpretation.

Chapter 11 to be continued Mon, July 16. Next time: Midterm review (send suggestions/requests).