THE LEAST SQUARES LINE (other names Best-Fit Line or Regression Line )



Similar documents
Curve Fitting in Microsoft Excel By William Lee

Spreadsheets and Laboratory Data Analysis: Excel 2003 Version (Excel 2007 is only slightly different)

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Dealing with Data in Excel 2010

A Guide to Using Excel in Physics Lab

Graphing in excel on the Mac

Scientific Graphing in Excel 2010

Using Microsoft Excel to Plot and Analyze Kinetic Data

What is the difference between simple and compound interest and does it really matter?

Excel Math Project for 8th Grade Identifying Patterns

Microsoft Excel Tutorial

HOW MUCH WILL I SPEND ON GAS?

Updates to Graphing with Excel

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

(Least Squares Investigation)

Creating Charts in Microsoft Excel A supplement to Chapter 5 of Quantitative Approaches in Business Studies

Summary of important mathematical operations and formulas (from first tutorial):

Creating a Gradebook in Excel

How to make a line graph using Excel 2007

Prism 6 Step-by-Step Example Linear Standard Curves Interpolating from a standard curve is a common way of quantifying the concentration of a sample.

Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.

ITS Training Class Charts and PivotTables Using Excel 2007

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Lab 1: The metric system measurement of length and weight

Excel Tutorial. Bio 150B Excel Tutorial 1

Exploring Relationships between Highest Level of Education and Income using Corel Quattro Pro

Using Excel for Handling, Graphing, and Analyzing Scientific Data:

Worksheet A5: Slope Intercept Form

Probability Distributions

Introduction to SPSS 16.0

Creating an Excel XY (Scatter) Plot

Interactive Excel Spreadsheets:

Scatter Plots with Error Bars

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Excel Pivot Tables. Blue Pecan Computer Training Ltd - Onsite Training Provider :: :: info@bluepecan.co.

This activity will show you how to draw graphs of algebraic functions in Excel.

0 Introduction to Data Analysis Using an Excel Spreadsheet

Creating and Formatting Charts in Microsoft Excel

How To Analyze Data In Excel 2003 With A Powerpoint 3.5

USING EXCEL ON THE COMPUTER TO FIND THE MEAN AND STANDARD DEVIATION AND TO DO LINEAR REGRESSION ANALYSIS AND GRAPHING TABLE OF CONTENTS

Lab 11: Budgeting with Excel

Absorbance Spectrophotometry: Analysis of FD&C Red Food Dye #40 Calibration Curve Procedure

The Center for Teaching, Learning, & Technology

In this example, Mrs. Smith is looking to create graphs that represent the ethnic diversity of the 24 students in her 4 th grade class.

Excel -- Creating Charts

ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA

Microsoft Excel: Exercise 5

Basic Pivot Tables. To begin your pivot table, choose Data, Pivot Table and Pivot Chart Report. 1 of 18

SPSS Guide: Regression Analysis

Assignment objectives:

To launch the Microsoft Excel program, locate the Microsoft Excel icon, and double click.

FREE FALL. Introduction. Reference Young and Freedman, University Physics, 12 th Edition: Chapter 2, section 2.5

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Coins, Presidents, and Justices: Normal Distributions and z-scores

Microsoft Excel. Qi Wei

Generating ABI PRISM 7700 Standard Curve Plots in a Spreadsheet Program

3. Solve the equation containing only one variable for that variable.

Using Excel 2003 with Basic Business Statistics

Getting started in Excel

Excel Basics By Tom Peters & Laura Spielman

Tutorial on Using Excel Solver to Analyze Spin-Lattice Relaxation Time Data

Experiment 1A: Excel Graphing Exercise

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

Plots, Curve-Fitting, and Data Modeling in Microsoft Excel

TIPS FOR DOING STATISTICS IN EXCEL

Excel 2003 A Beginners Guide

Question 2: How do you solve a matrix equation using the matrix inverse?

Acceleration of Gravity Lab Basic Version

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

MS Excel. Handout: Level 2. elearning Department. Copyright 2016 CMS e-learning Department. All Rights Reserved. Page 1 of 11

table to see that the probability is (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: = 1.

Excel 2007 Basic knowledge

ROUND(cell or formula, 2)

Estimating a market model: Step-by-step Prepared by Pamela Peterson Drake Florida Atlantic University

Control Debt Use Credit Wisely

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Excel 2007 A Beginners Guide

Data exploration with Microsoft Excel: analysing more than one variable

Making a Chart Using Excel

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Module 3: Correlation and Covariance

Excel What you will do:

Tutorial 2: Using Excel in Data Analysis

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

PERFORMING REGRESSION ANALYSIS USING MICROSOFT EXCEL

The University of the State of New York REGENTS HIGH SCHOOL EXAMINATION INTEGRATED ALGEBRA. Thursday, August 16, :30 to 11:30 a.m.

2. Setting Up The Charts

Microsoft Excel Tutorial

Microsoft Excel Basics

Create a Poster Using Publisher

Applied. Grade 9 Assessment of Mathematics LARGE PRINT RELEASED ASSESSMENT QUESTIONS

Years after US Student to Teacher Ratio

Intro to Excel spreadsheets

Introduction to Microsoft Excel 2007/2010

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Sample Table. Columns. Column 1 Column 2 Column 3 Row 1 Cell 1 Cell 2 Cell 3 Row 2 Cell 4 Cell 5 Cell 6 Row 3 Cell 7 Cell 8 Cell 9.

MARS STUDENT IMAGING PROJECT

Instructions for Using Excel as a Grade Book

Math 132. Population Growth: the World

Transcription:

Sales THE LEAST SQUARES LINE (other names Best-Fit Line or Regression Line ) 1 Problem: A sales manager noticed that the annual sales of his employees increase with years of experience. To estimate the annual sales for his potential new sales person he collected data concerning annual sales and years of experience of his current employees: We ll use his data to create a formula that will help him estimate annual sales based on years of experience. Years of experience 1 3 8 10 13 Annual sales in thousands 110 10 Work: Figure to the right shows scatter graph of his data. Each point represents data on one single current employee. For example: the first employee has 1 year of experience and made thousands in sales, so he is represented by the point with coordinates (1,). 150 130 10 110 90 70 0 1 3 4 5 6 7 8 9 10 11 1 13 14 Years We ll create the equation of the least squares line, which is also called the best-fit line or regression line. The line is passing in between our points while the sum of the squares of the vertical distances from the data points to the line is as small as possible. The picture to the left shows the least square line passing in between our points, and the distances d 1, d, d 3, d 4, and d 5. The equation of the least square line is found by minimizing the sum: ( d1 ) ( d) ( d3) ( d4) ( d5) The procedure will be omitted in this paper. The final result of the minimization are formulas that let us calculate coefficients a and b for equation of the least square line y ax b. This equation may be used for the prediction of sales. FORMULAS FOR COEFFICIENTS OF THE LEAST SQUARES LINE: n xy x y y a x a b n x x n In this formula n stands for number of observed cases. In our case that is 5 employees. So: n 5 Symbol is called sigma and stands for the sum of all. In our case: x 1 3 8 10 13 35 (sum of all x values) y 110 110 10 550 (sum of all y values) x 1 3 8 10 13 343 (sum of squares of x values) xy 1 3 8110 1010 13 4 (sum of xy products)

We are now ready to use results of our calculations in the formula: 5 4 35550 150 550 4.388 35 396.4 a 4.388 b 79. 84 5343 35 490 5 5 That means that the equation of the least square line is y 4.388x 79. 84 Conclusion: We created the formula y 4.388x 79. 84 that may be used to predict sales in thousands for a future employee. For example, formula may predict that an employee with x=15 years of experience will generate: y 4.38815 79.84 145 thousands in sales per year. Is our prediction reliable? Once an equation is found for the least square line, we need to have some way of judging just how good the equation is for predictive purposes. In order to have a quantitative basis for confidence in our predictions, we need to calculate coefficient of correlation, denoted r. It may be calculated using the following formula: FORMULA FOR COEFFICIENT OF CORRELATION r n n xy x y x x n y y We ll calculate the coefficient of correlation for data in our example: y 110 10 6500 54 35550 150 150 r 0.971 534335 56500 550 490 00 13.594 The value of r that is close to 1 indicates that our formula will give us a reliable prediction for sales level, based on years of experience of the employee. What does the Coefficient of Correlation tell us? The correlation coefficient is always a number between 1and 1. The picture below shows how its value numerically describes our data: The equation may be used as a source for reliable prediction if the correlation coefficient is a number that is close to 1or1. That means that your observed values are close to the least square line (second and fourth picture above). If not, the value of the correlation coefficient is closer to 0. Such a small value for the coefficient of correlation indicates that the observed data are widely spread around, so our formula is not reliable source of prediction (like on first and third picture above). It is usually more convenient to numerically measure reliability of our formula using the square of correlation coefficient. Some books and computer software are using symbol R for it (read as r square ) In our example; R 0.971 0.943 R square is always a number between 0 and 1. Values of R that are close to 1 indicate reliable formula.

3 FINDING THE LEAST SQUARES LINE USING EXCEL 007 Problem: A sales manager noticed that the annual sales of his employees increase with years of experience. We ll use Excel to graph his data and create a formula that will help him estimate annual sales based on years of experience. Years of experience 1 3 8 10 13 Annual sales in thousands 110 10 Work Open a new blank sheet in Excel and type in data as shown to the right. Step in cell B1. While keeping left finger down, pool cursor to cell F. This procedure will highlight (color blue) coordinates of points that should be graphed. Once your data are highlighted, click on insert and select chart from the menu as shown below left. New Chart Wizard pop-up screen will ask you what kind of chart do you want. Select XY(Scatter) that compares pairs of values as shown below right. Click on finish. A simple scatter graph of your data will appear, as shown to the right. The next step is to draw the least squares line and calculate its equation and R square. Carefully right-click on any data point on the graph. A small pop-up screen will come out as pictured below. Select Add trendline. 1 10 40 0 0 0 4 6 8 10 1 14 Series1

4 A new pop-up screen will come out. Select Linear Trend and click on tab Options as on the picture to the right.. In Options tab check Display equation on chart and check Display R- Squared value as shown on picture below to the left. Click OK to obtain the final picture. The final picture shown below features the equation of the least squares line y 4.3878x 79.86 and the value of R square R 0. 9434 1 10 40 0 0 y = 4.3878x + 79.86 R = 0.9434 0 4 6 8 10 1 14 NOTICE You can switch x and y on your graph and in your formula by clicking on chart tools switch x-y to get both chart and table printed next to each other: click on blank area outside of your chart right before saving the document Practice questions: 1) Find the equation of the least squares line and the coefficient of correlation for the data shown: (Answer: y 0.614 x 10.68, R² = 0.9759, r=-0.9879 ) x 3 5 7 8 y 8.7 7.9 6. 5.8 ) The table to the right shows the relationship between MPG (miles per gallon) and HP (horsepower) for some sports cars produced in year 015. Find the equation of the least squares line and use it to estimate MPG for Chevrolet Corvette that has 4 HP engine. (Answer: Using the obtained equation y = -0.03x + 30.88 R 0.858 we estimate 17 MPG. Notice that the real Corvette MPG is 1 so we conclude that this car has better than expected MPG.) x=hp y=mpg Mitcubishi Lancer 148 9 Mazda Miata 158 3 Subaru WRX 68 4 Porche Boxter 315 3 Chevy Camaro 33 0 Ferrari Spider 570 14 3) Millers are buying a house in NWI so they d like to estimate yearly house taxes in Valparaiso. Using data from ReMax web site they obtained 014 taxes for 5 houses in that city. Use the table to create the least squares line equation and its coefficient of correlation. Use this equation to estimate yearly taxes for a $00,000 house in Valparaiso. x=house price in thousands 45 340 50 70 y=tax paid in year 014 3570 370 0 1700 810 (Answer: using y = 7.6x + 34.1 we estimate yearly tax for 00 thousands house to be $1554. r=0.97)

5 4) The table below gives monthly sales y (in thousands), corresponding to advertising expenditures x (in thousands). Advertising expenditures x 0 1 3 4 5 6 Monthly sales y 3 9 11 15 17 0 7 Create the least squares line. Examine if advertising expenditures appear to have strong effect on monthly sales by calculating and interpreting R, and if so, predict the monthly sales if 11 thousands dollars is spent on advertising. (Answer: y 3.57x 3. 86 and R 0. 97, so if we spend 11 thousand dollars on advertising the monthly sales will increase to 43 thousand dollars.) 5) A hospital conducted study to determine relation between age and blood pressure of their patients. The table below shows collected data. Find the equation of the least squares line and use it to calculate blood pressure of a 50 years old patient. Age x 43 48 56 61 67 70 Pressure y 18 10 135 143 141 15 (Answer: y 0.964x 81. 048, so 50 years old patient should have 19.) 6) Find the equation of the least squares line for the data shown below: x 0 5 10 0 y 8 7 5 7) A researcher wishes to see whether there is a relationship between number of hours of study and test scores on exam, so she collected data shown in the table below. Find the equation of the least squares line and use it to calculate how many hours should a student study to obtain 93 percent on the test. Hours of study x 6 1 5 3 Score on test y 8 63 57 88 68 75 8) The table below compares rents for one-bedroom and two-bedroom apartments in 7 different cities. Find the equation of the least squares line and R square. A one-bedroom rent in a doorman building in Lower Manhattan averaged $3000 in august 007. Calculate a two bedroom rent using the obtained formula. One-bedroom rent x 78 486 451 59 618 50 845 Two bedroom rent y 13 90 739 954 1055 875 1455 9) Thanks to the progress of science the cancer survival rate is improving over time. The table shows 10-year survival rate for diagnosis year 71 81 91 101 ovary cancer in England, diagnosed in year 1971, 1981, 1991 percent lived over 10 years 18 5 7 and 001. According to the table 7% of the patients diagnosed in year 001 survived over 10 years. Find the equation of the least squares line and R square. Use it to estimate the 10-year ovary cancer survival rate for the patients who are diagnosed in year 015. 10) Millers are buying a house in NWI so they d like to estimate yearly house taxes in Crown Point. Using data from ReMax web site they obtained 014 taxes for 6 houses in that city. Use the table to create the least squares line equation. Use this equation to estimate yearly taxes for a $00,000 house in Crown Point. x=price in thousands 155 15 05 75 110 7 y=tax for year 014 3096 117 819 385 877 13390