MATLAB Workshop 15 - Linear Regression in MATLAB




Objectives: Learn how to obtain the coefficients of a straight-line fit to data, display the resulting equation as a line on the data plot, and display the equation and goodness-of-fit statistic on the graph.

MATLAB Features:

data analysis

  Command: polyfit(x,y,N)
  Action:  finds the linear least-squares coefficients for the polynomial equation of degree N that is the best fit to the (x,y) data set.

graphics commands

  Command: plot(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on linearly scaled axes with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: semilogy(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on a graph with the y-axis scaled in powers of 10 and the x-axis scaled linearly, with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: loglog(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on a graph with both the x- and y-axes scaled in powers of 10, with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: xlabel(xname)
  Action:  adds the text in the string variable xname below the x-axis.

  Command: ylabel(yname)
  Action:  adds the text in the string variable yname beside the y-axis.

  Command: title(graphname)
  Action:  adds the text in the string variable graphname above the plot.

  Command: axis('equal')
  Action:  forces equal scaling on the x- and y-axes.

  Command: hold on
  Action:  maintains the current plot for additional plotting overlay.

  Command: hold off
  Action:  turns off hold on.

  Command: text(x,y,'string')
  Action:  displays string at the (x,y)-coordinates on the current plot.

  Command: gtext('string')
  Action:  displays string at the plot location designated by cross-hairs.

graph symbol options

  Color         Symbol                  Line
  y  yellow     .  point                -   solid line
  m  magenta    o  circle               :   dotted line
  c  cyan       x  x-mark               -.  dash-dot line
  r  red        +  plus                 --  dashed line
  g  green      *  star
  b  blue       s  square
  w  white      d  diamond
  k  black      v  triangle (down)
                ^  triangle (up)
                <  triangle (left)
                >  triangle (right)
                p  pentagram
                h  hexagram
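As an illustration of how the options in the table combine, the sketch below builds symbol strings from one color character, one symbol character, and (optionally) one line-style string. The data vectors are made up for this example and are not part of the workshop.

```matlab
% Illustrative data only (not from the workshop)
x = 0:5;
y = x.^2;

% 'ko'   : black circles, no connecting line
% 'rs--' : red squares joined by a dashed line
h1 = plot(x, y, 'ko');
hold on
h2 = plot(x, y + 5, 'rs--');
hold off
```

Leaving out the line-style characters, as in 'ko', displays the data points only, which is the form used for the data plots in this workshop.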

Textbook costs

Concerned about the ever-rising cost of textbooks, an engineering student decided to see whether the cost of textbooks in a particular subject was related to the number of pages. He went to the bookstore and found the following data for 10 mechanical engineering books:

Mechanical Engineering textbook cost versus number of pages

  Number of pages    Cost, $
  166                 54.00
  195                 82.00
  200                 72.00
  260                 72.00
  265                 90.00
  335                124.00
  370                 94.00
  450                118.00
  517                152.00
  552                132.00

Using the MATLAB script developed in Workshop 14, the engineer produced the plot shown at the right. The data does look as if it fits a linear relationship. Several questions arise. The first is what are the appropriate values for the coefficients \(a_1\) and \(a_0\) in the linear equation

\[ C = a_1 P + a_0 \]

where \(C\) is the textbook cost, $, and \(P\) is the number of pages, that best describes the data. A second question is what does this line look like when plotted with the data. A third question is how well does the line actually represent the data.

(1) Create a plot of cost versus number of pages. Create a data file containing the data. Use your script from Workshop 14 to create a figure showing the data points as illustrated above. Check to see what variables are in the Workspace by typing who at the command prompt. You should have at least xdat, ydat, symbol, xname, yname, and graphname. Why?

(2) MATLAB connects the dots. Because the graph variable information is present in the Workspace, we can use the Command Window to illustrate some more features of graphs and graph management in MATLAB. What would happen if we let MATLAB draw a line for the data points? To observe this, enter

    » hold on
    » plot(xdat,ydat,'r-')

at the command prompt. The hold command is used to manage figure display. hold on says to keep the current figure and superimpose any additional plot commands on top of it. hold off says to replace the current figure with whatever the next plot command dictates. In this case, the plot command asks that the same data be plotted, but this time with a red line. The figure at the right results. MATLAB by itself will connect the dots - not very useful if we are trying to find an equation that relates the cost to the number of pages.

Let's return to the original data plot. Unfortunately, there is no undo command that will remove the line just added. You will have to replace the figure - but it can be done from the Command Window by issuing the following commands (why?).

    » hold off
    » plot(xdat,ydat,symbol)
    » xlabel(xname)
    » ylabel(yname)
    » title(graphname)

Fitting a line to data

Many methods exist for finding a best-fit line or curve to some data. One of the most popular is called least squares regression or linear regression. For a straight-line approximation, we are seeking the line

\[ y = a_1 x + a_0 \]

that best approximates the data. If we knew the values for \(a_1\) and \(a_0\), we could estimate the y-values for each of the data points by

\[ (y_{est})_i = a_1 (x_{dat})_i + a_0 \]

where \(i\) refers to an individual data point. The error associated with the estimate is defined as the vertical distance between the data point and the proposed line, i.e.,

\[ e_i = (y_{dat})_i - (y_{est})_i \]

where \(e_i\) is the error. Linear regression finds values for \(a_1\) and \(a_0\) by a mathematical procedure that minimizes the sum of the error-squared for all of the data points.

(3) Least squares in MATLAB. Because fitting a line to data is such a common activity, MATLAB has a single command that will find the estimates,

    coeff = polyfit(xdat,ydat,N)

where coeff is a variable that will capture the coefficients for the best-fit equation, xdat is the x-data vector, ydat is the y-data vector, and N is the degree of the polynomial line (or curve) that you want to fit the data to. A straight line is a 1st-degree polynomial, so the value for N would be 1. Find the best fit to the book data by entering

    » coeff = polyfit(xdat,ydat,1)
    coeff =
        0.2048   31.2181

MATLAB responds with the coefficient vector in the order \([a_1 \; a_0]\). (How would you suppress display of coeff?) Thus, according to MATLAB and the least squares procedure, the best-fit equation for the line representing a linear relation between the cost of a Mechanical Engineering text and the number of pages is

\[ C = 0.2048 P + 31.2181 \]

(4) Displaying the best fit on the data graph. Visual confirmation that the best-fit equation is indeed representative of the data comes next. There are two problems at the moment. The first is that we have the coefficients for the equation, but not the x- and y-vectors that are required for the plot command. The x- and y-vectors will need to be generated. This brings us to the second problem. Remember that MATLAB uses connect-the-dots for creating a line. If the plot points for the data are far apart, the line might have angles and corners and not appear smooth. In order to counter this, we need to use a large number of points when plotting a line. This will make any point-to-point distance small and make the resulting connect-the-dots picture look smooth. Generally 200 points are sufficient, but you might want to use more. Thus, the steps that we need to follow to create a smooth line fit to the data are to

  1. define a vector of 200 x-points in the range of the data
  2. calculate the corresponding vector of y-points
  3. display the x- and y-points as a line in the figure.

To see how this works, enter the following at the command prompt

    » xline = linspace( min(xdat), max(xdat), 200);

What does the linspace command do? The min command? The max command? Why does this work to create a vector of x-values that span the data domain? Note that the variable name xline is being used to distinguish this vector from the data vector. Now enter

    » yline = coeff(1)*xline+coeff(2);

This command creates a vector of y-values corresponding to the best-fit equation. Why? We can now plot the best-fit line

    » hold on
    » plot(xline,yline,'r-')
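The steps above can be collected into one short script. This is a minimal sketch that types the textbook data in directly rather than loading the Workshop 14 data file. It uses polyval, a standard MATLAB function that is not introduced in this workshop, as an equivalent to writing out coeff(1)*xline+coeff(2). The closed-form check at the end is an added illustration of what a straight-line least-squares fit computes, and the names xbar, ybar, a1, and a0 are chosen here for the example.

```matlab
% Textbook data from the table in this workshop
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $

% Straight-line (degree 1) least-squares fit: coeff = [a1 a0]
coeff = polyfit(xdat, ydat, 1);

% 200 evenly spaced x-points spanning the data, and the fitted y-values
xline = linspace(min(xdat), max(xdat), 200);
yline = polyval(coeff, xline);      % same as coeff(1)*xline + coeff(2)

% Data as points, best-fit line in red
plot(xdat, ydat, 'ko')
hold on
plot(xline, yline, 'r-')
hold off

% Cross-check (assumption: standard closed-form least-squares formulas
% for a straight line, not part of the workshop text)
xbar = mean(xdat);
ybar = mean(ydat);
a1 = sum((xdat - xbar).*(ydat - ybar)) / sum((xdat - xbar).^2);
a0 = ybar - a1*xbar;
```

In recent MATLAB releases xline and yline are also the names of built-in functions for drawing constant lines. Using them as variable names, as the workshop does, hides those functions in the current workspace until the variables are cleared.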

The result, displayed at the right, shows that the best fit calculated by the polyfit command is a reasonable representation of the data. The next question is: how good?

Error and goodness-of-fit estimation

As engineers, we should always be interested in knowing how close our approximations (in this case, the line) actually come to the measured, physical reality. As can be seen in the approximation at the right, only one data point actually seems to fall on or near the line! The first question we can ask is what is the absolute error associated with the fit. This can be calculated as

\[ e_i = \left| (y_{dat})_i - (y_{est})_i \right| \]

for each data point. Note that the absolute error treats positive and negative deviations of the data from the line in the same manner. In MATLAB code, this becomes

    » yest = coeff(1)*xdat+coeff(2);
    » abs_error = abs(ydat-yest);

Given abs_error, we can extract the magnitude of the maximum absolute error and the data point at which it occurred by using a variation of the max command:

    » [max_abs_error, maxpt] = max(abs_error);

max_abs_error will have the value of the maximum absolute error and maxpt will be the index where it is found in abs_error. For the plot above,

    max_abs_error = 24.1809
    maxpt = 6

Thus the maximum error is found at the sixth data point (xdat = 335; where did this come from?). How could you find the minimum absolute error?

The absolute error provides the magnitude of the error. However, this does not tell us how serious the error actually is. For example, which is better: an absolute error of 50 units relative to an expected value of 100 units, or an absolute error of 50 units relative to an expected value of 5000 units? Both have the same absolute error. But the percentage error in the first case is 50% while it is only 1% in the second case. Relative error is like a percentage error in that it expresses how large the error is compared with the expected value. Relative error is sometimes referred to as the fractional error because it is obtained by dividing the absolute error by the magnitude of the corresponding y-value.
The MATLAB command to do element-by-element division is

    » rel_error = abs_error./ydat;

How would you find the greatest relative error and the location at which it occurs?
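A minimal sketch of the error calculations described above, again with the data typed in directly so that it runs on its own. The names x_at_max, min_abs_error, minpt, max_rel_error, and maxrelpt are chosen here for illustration and do not appear in the workshop.

```matlab
% Textbook data and straight-line fit
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $
coeff = polyfit(xdat, ydat, 1);

% Estimated y-value at each data point and the absolute error
yest = coeff(1)*xdat + coeff(2);
abs_error = abs(ydat - yest);

% Largest absolute error, its index, and the x-value at that index
[max_abs_error, maxpt] = max(abs_error);
x_at_max = xdat(maxpt);

% min works the same way as max
[min_abs_error, minpt] = min(abs_error);

% Relative (fractional) error: element-by-element division by ydat
rel_error = abs_error./ydat;
[max_rel_error, maxrelpt] = max(rel_error);
```

The largest absolute error and the largest relative error need not occur at the same data point, because the relative error also depends on the size of the corresponding y-value.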

A commonly used statistic that is related to the error, but is not the same as the error, is the goodness-of-fit \(r^2\) (r-squared) statistic. The \(r^2\) statistic ranges from a value of 0 for absolutely no relation between the data and the line to a value of 1, which occurs only if all of the data fall exactly on the line, i.e., no error. In some engineering disciplines, an equation fitted to data is acceptable only if \(r^2 > 0.9\). Other engineering disciplines might find an \(r^2\) as low as 0.7 acceptable for use.

The \(r^2\) statistic is calculated from

\[ r^2 = 1 - \frac{SSE}{SST} \]

where

\[ SSE = \sum_{i=1}^{n} \left[ (y_{dat})_i - (y_{est})_i \right]^2 \]

and

\[ SST = \sum_{i=1}^{n} \left[ (y_{dat})_i - y_{ave} \right]^2 \]

MATLAB implementation of these equations is straightforward. For example, what (single) MATLAB command would you use to compute the average value of the y-data? The \(r^2\) statistic for the textbook cost versus number of pages fit is \(r^2 = 0.8204\).

(5) Calculate the various error estimates. Implement the MATLAB commands (in the Command Window) to find the following

  1) Maximum absolute error
  2) Index of the value where the maximum absolute error was found.
  3) X-data point where maximum absolute error was found.
  4) Maximum relative error
  5) Index of the value where the maximum relative error was found.
  6) X-data point where maximum relative error was found.
  7) \(r^2\) statistic for the fit.

Displaying equation and \(r^2\) statistic on the graph

The final bell and whistle in displaying data and a best-fit line to the data on a graph is to also display the equation and \(r^2\) statistic as text. In order to do this, we need to build both the equation and \(r^2\) as string variables for display. The equation can be built from the following commands

    » a1str = num2str(coeff(1));
    » a0str = num2str(coeff(2));
    » eqnstr = ['y = (', a1str, ')*x + (', a0str, ')'];

where the first two commands convert the numbers for the equation coefficients to their equivalent strings. The third command creates a string variable with the text and coefficients in order.
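The \(r^2\) formulas given earlier in this section translate almost directly into MATLAB. The following is a sketch with the data typed in directly. The names yave, SSE, and SST follow the symbols in the formulas, and rsq is the variable name used for the \(r^2\) value when the display string is built.

```matlab
% Textbook data and straight-line fit
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $
coeff = polyfit(xdat, ydat, 1);
yest = coeff(1)*xdat + coeff(2);

% Average of the y-data
yave = mean(ydat);

% Sum of squared errors about the fitted line
SSE = sum((ydat - yest).^2);

% Total sum of squares about the average
SST = sum((ydat - yave).^2);

% Goodness-of-fit statistic
rsq = 1 - SSE/SST;
```

The .^ operator squares each element of the vector separately, and sum then adds the squared terms, which mirrors the summation over i = 1 to n in the formulas.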
The \(r^2\) statistic string can be built by the command

    » rsqstr = ['r^2 = ', num2str(rsq)];

where rsq is the variable holding the computed \(r^2\) value. This command uses the num2str command inside the brackets to create the string rather than creating another variable to hold the conversion. The process of building the strings part by part is referred to as concatenation.

Both the equation and the \(r^2\) statistic can be displayed by using the text command:

    text(X,Y,'text to display')

where X and Y are the (x,y)-coordinates on the current plot at which to start the text string. As always, the text string can be a string variable name. An alternative is to use the gtext command

    gtext({eqnstr,rsqstr})

This causes cross-hairs to appear on the plot, as shown above and to the right, which can be moved by moving the mouse. A left-click on the mouse will cause the requested strings to be placed at the location of the cross-hairs, as shown in the figure above and to the left. Note how the \(r^2\) string is displayed with the number 2 showing as an exponent. Why?

(6) Display equation on graph.

  1) Display the equation and \(r^2\) statistic on the current graph using text.
  2) Display the equation and \(r^2\) statistic on the current graph using gtext.

Exercises:

1. Modify your linearplot function from Workshop 14 so that it will now
   a. Display the data points (as previously);
   b. Calculate a best-fit line to the data;
   c. Display the best-fit line as a line only;
   d. Calculate the \(r^2\) statistic;
   e. Display the equation and \(r^2\) statistic on the plot;
      Note: you should tell the user what to do if you use gtext.
   f. Return the equation coefficients and \(r^2\) statistic to the calling function.

2. Test your modified function by running your script from Workshop 14 and reproducing the graphs of this workshop.

Recap: You should have learned

  - That MATLAB uses connect-the-dots to draw lines between points.
  - How to use polyfit to find a best-fit straight line to data.
  - How to display a best-fit straight line to data on the same plot as the data.
  - That many points are required to have a smooth line displayed in MATLAB.
  - The meaning of and how to calculate absolute error.
  - How to find maximum and minimum absolute error and their x-location.
  - The meaning of and how to calculate relative error.
  - How to find maximum and minimum relative error and their x-location.
  - How to calculate the \(r^2\) statistic.
  - How to display text strings on the plot.
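As a closing illustration of the last recap item, the sketch below places the equation and \(r^2\) strings on the plot with text rather than gtext, so that it runs without mouse input. The data are typed in directly, and the placement coordinates (the left edge of the data at the top of the y-range) are an arbitrary choice made for this example.

```matlab
% Textbook data, straight-line fit, and r-squared
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $
coeff = polyfit(xdat, ydat, 1);
yest = coeff(1)*xdat + coeff(2);
rsq = 1 - sum((ydat - yest).^2)/sum((ydat - mean(ydat)).^2);

% Build the display strings by concatenation
eqnstr = ['y = (', num2str(coeff(1)), ')*x + (', num2str(coeff(2)), ')'];
rsqstr = ['r^2 = ', num2str(rsq)];

% Data points and fitted line (xdat is already sorted, so the fitted
% values at the data points draw as a single straight line)
plot(xdat, ydat, 'ko')
hold on
plot(xdat, yest, 'r-')
hold off

% A cell array of strings is displayed with one string per line,
% starting at the chosen (x,y) location (arbitrary for this example)
text(min(xdat), max(ydat), {eqnstr, rsqstr});
```

With text the location has to be chosen in data coordinates ahead of time, whereas gtext lets the user pick the location on the figure with the mouse.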