MATLAB Workshop 15 - Linear Regression in MATLAB

Objectives: Learn how to obtain the coefficients of a straight-line fit to data, display the resulting equation as a line on the data plot, and display the equation and goodness-of-fit statistic on the graph.

MATLAB Features:

data analysis

  Command: polyfit(x,y,N)
  Action:  finds the least-squares coefficients of the polynomial of degree N that best fits the (x,y) data set.

graphics commands

  Command: plot(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on linearly-scaled axes with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: semilogy(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on a graph with the y-axis scaled in powers of 10 and the x-axis scaled linearly, with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: loglog(x,y,symbol)
  Action:  creates a pop-up window that displays the (x,y) data points specified on a graph with both the x- and y-axes scaled in powers of 10, with the symbol (and color) specified in the string variable symbol. The data points are supplied as separate x and y vectors. MATLAB automatically scales the axes to fit the data.

  Command: xlabel(xname)
  Action:  adds the text in the string variable xname below the x-axis.

  Command: ylabel(yname)
  Action:  adds the text in the string variable yname alongside the y-axis.

  Command: title(graphname)
  Action:  adds the text in the string variable graphname above the plot.

  Command: axis('equal')
  Action:  forces equal scaling on the x- and y-axes.

  Command: hold on
  Action:  maintains the current plot so that additional plotting is overlaid on it.

  Command: hold off
  Action:  turns off hold on.

  Command: text(X,Y,'string')
  Action:  displays string at the (X,Y)-coordinates on the current plot.

  Command: gtext('string')
  Action:  displays string at the plot location designated by the cross-hairs.

Graph Symbol Options

  Color          Symbol                   Line
  y  yellow      .  point                 -   solid line
  m  magenta     o  circle                :   dotted line
  c  cyan        x  x-mark                -.  dash-dot line
  r  red         +  plus                  --  dashed line
  g  green       *  star
  b  blue        s  square
  w  white       d  diamond
  k  black       v  triangle (down)
                 ^  triangle (up)
                 <  triangle (left)
                 >  triangle (right)
                 p  pentagram
                 h  hexagram
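
The symbol string combines one entry from each column of the table above. As a small illustration (the x and y values here are made up for the example and are not part of the workshop data), 'ro' requests red circles with no connecting line, and 'b--' requests a blue dashed line with no markers:

```matlab
% Made-up sample data, used only to show how symbol strings are read
x = 0:0.5:5;
y = x.^2;

h1 = plot(x, y, 'ro');    % r = red, o = circle marker, no line style given
hold on
h2 = plot(x, y, 'b--');   % b = blue, -- = dashed line, no marker given
hold off
```

When the string names a marker but no line style, only the markers are drawn; when it names a line style but no marker, only the line is drawn.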

Textbook costs

Concerned about the ever-rising cost of textbooks, an engineering student decided to see whether the cost of textbooks in a particular subject was related to the number of pages. He went to the bookstore and found the following data for 10 mechanical engineering books:

Mechanical Engineering textbook cost versus number of pages

  Number of pages    Cost, $
  166                 54.00
  195                 82.00
  200                 72.00
  260                 72.00
  265                 90.00
  335                124.00
  370                 94.00
  450                118.00
  517                152.00
  552                132.00

Using the MATLAB script developed in Workshop 14, the engineer produced the plot shown at the right. The data does look as if it fits a linear relationship. Several questions arise. The first is what are the appropriate values for the coefficients $a_1$ and $a_0$ in the linear equation,

  $C = a_1 P + a_0$

where $C$ is the textbook cost, $, and $P$ is the number of pages, that best describes the data. A second question is what does this line look like when plotted with the data. A third question is how well does the line actually represent the data.

(1) Create a plot of cost versus number of pages.

Create a data file containing the data. Use your script from Workshop 14 to create a figure showing the data points as illustrated above. Check to see what variables are in the Workspace by typing who at the command prompt. You should have at least xdat, ydat, symbol, xname, yname, and graphname. Why?

(2) MATLAB connects the dots.

Because the graph variable information is present in the Workspace, we can use the Command Window to illustrate some more features of graphs and graph management in MATLAB. What would happen if we let MATLAB draw a line for the data points? To observe this, enter

» hold on
» plot(xdat,ydat,'r-')
at the command prompt.

The hold command is used to manage figure display. hold on says to keep the current figure and superimpose any additional plot commands on top of it. hold off says to replace the current figure with whatever the next plot command dictates. In this case, the plot command asks that the same data be plotted, but this time with a red line. The figure at the right results.

MATLAB by itself will connect the dots - not very useful if we are trying to find an equation that relates the cost to the number of pages.

Let's return to the original data plot. Unfortunately, there is no undo command that will remove the line just added. You will have to replace the figure - but it can be done from the Command Window by issuing the following commands (why?).

» hold off
» plot(xdat,ydat,symbol)
» xlabel(xname)
» ylabel(yname)
» title(graphname)

Fitting a line to data

Many methods exist for finding a best-fit line or curve to some data. One of the most popular is called least squares regression or linear regression. For a straight-line approximation, we are seeking the line

  $y = a_1 x + a_0$

that best approximates the data. If we knew the values for $a_1$ and $a_0$, we could estimate the y-values for each of the data points by

  $(y_{est})_i = a_1 (x_{dat})_i + a_0$

where $i$ refers to an individual data point. The error associated with the estimate is defined as the vertical distance between the data point and the proposed line, i.e.,

  $e_i = (y_{dat})_i - (y_{est})_i$

where $e_i$ is the error. Linear regression finds values for $a_1$ and $a_0$ by a mathematical procedure that minimizes the sum of the error-squared for all of the data points.

(3) Least squares in MATLAB.

Because fitting a line to data is such a common activity, MATLAB has a single command that will find the estimates,

  coeff = polyfit(xdat,ydat,N)
where coeff is a variable that will capture the coefficients for the best-fit equation, xdat is the x-data vector, ydat is the y-data vector, and N is the degree of the polynomial line (or curve) that you want to fit the data to. A straight line is a first-degree polynomial, so the value for N would be 1. Find the best fit to the book data by entering

» coeff = polyfit(xdat,ydat,1)
coeff =
    0.2048   31.2181

MATLAB responds with the coefficient vector in the order [$a_1$ $a_0$]. (How would you suppress display of coeff?) Thus, according to MATLAB and the least squares procedure, the best-fit equation for the line representing a linear relation between the cost of a Mechanical Engineering text and the number of pages is

  $C = 0.2048 P + 31.2181$

(4) Displaying the best fit on the data graph.

Visual confirmation that the best-fit equation is indeed representative of the data comes next. There are two problems at the moment. The first is that we have the coefficients for the equation, but not the x- and y-vectors that are required for the plot command. The x- and y-vectors will need to be generated. This brings us to the second problem. Remember that MATLAB uses connect the dots for creating a line. If the plot points for the data are far apart, the line might have angles and corners and not appear smooth. In order to counter this, we need to use a large number of points when plotting a line. This will make any point-to-point distance small and make the resulting connect-the-dots picture look smooth. Generally 200 points are sufficient, but you might want to use more.

Thus, the steps that we need to follow to create a smooth line fit to the data are to
  1. define a vector of 200 x-points in the range of the data
  2. calculate the corresponding vector of y-points
  3. display the x- and y-points as a line in the figure.

To see how this works, enter the following at the command prompt

» xline = linspace( min(xdat), max(xdat), 200);

What does the linspace command do? The min command? The max command?
Why does this work to create a vector of x-values that span the data domain? Note that the variable name xline is being used to distinguish this vector from the data vector. Now enter

» yline = coeff(1)*xline+coeff(2);

This command creates a vector of y-values corresponding to the best-fit equation. Why? We can now plot the best-fit line

» hold on
» plot(xline,yline,'r-')
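
The three steps above can be collected into one short script. This is only a sketch: it assumes the textbook data is typed in directly rather than loaded from the data file, it picks 'bo' as the data symbol, and it uses polyval (which evaluates a polynomial from its coefficient vector) as an alternative to writing out coeff(1)*xline+coeff(2).

```matlab
% Textbook data from the table, typed in directly for this sketch
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $

coeff = polyfit(xdat, ydat, 1);                  % coefficients in the order [a1 a0]

xline = linspace(min(xdat), max(xdat), 200);     % step 1: 200 evenly spaced x-points
yline = polyval(coeff, xline);                   % step 2: same values as coeff(1)*xline + coeff(2)

plot(xdat, ydat, 'bo')                           % data as points only (symbol is an assumption)
hold on
plot(xline, yline, 'r-')                         % step 3: best-fit line as a red line
hold off
```

In recent MATLAB releases xline and yline are also the names of built-in plotting functions. Using them as variable names works, but it hides those functions for as long as the variables exist, so names such as xfit and yfit may be preferable in new code.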
The result, displayed at the right, shows that the best fit calculated by the polyfit command is a reasonable representation of the data. The next question is: how good?

Error and goodness-of-fit estimation

As engineers, we should always be interested in knowing how close our approximations (in this case, the line) actually come to the measured, physical reality. As can be seen in the approximation at the right, only one data point actually seems to fall on or near the line!

The first question we can ask is what is the absolute error associated with the fit. This can be calculated as

  $e_i = \left| (y_{dat})_i - (y_{est})_i \right|$

for each data point. Note that the absolute error treats positive and negative deviations of the data from the line in the same manner. In MATLAB code, this becomes

» yest = coeff(1)*xdat+coeff(2);
» abs_error = abs(ydat-yest);

Given abs_error, we can extract the magnitude of the maximum absolute error and the data point at which it occurred by using a variation of the max command:

» [max_abs_error, maxpt] = max(abs_error);

max_abs_error will have the value of the maximum absolute error and maxpt will be the index where it is found in abs_error. For the plot above,

  max_abs_error = 24.1809
  maxpt = 6

Thus the maximum error is found at the sixth data point (xdat = 335; where did this come from?). How could you find the minimum absolute error?

The absolute error provides the magnitude of the error. However, this does not tell us how serious the error actually is. For example, which is better: an absolute error of 50 units relative to an expected value of 100 units or an absolute error of 50 units relative to an expected value of 5000 units? Both have the same absolute error. But the percentage error in the first case is 50% while it is only 1% in the second case.

Relative error is like a percentage error in that it expresses how large the error is compared with the expected value. Relative error is sometimes referred to as the fractional error because it is obtained by dividing the absolute error by the magnitude of the corresponding y-value.
The MATLAB command to do element-by-element division is

» rel_error = abs_error./ydat;

How would you find the greatest relative error and the location at which it occurs?
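
One possible way to collect the error calculations above is sketched here. It assumes the textbook data is typed in directly, and the variable names min_abs_error, minpt, max_rel_error, and maxrelpt are introduced only for this illustration. The min command accepts the same two-output form as max.

```matlab
% Textbook data from the table, typed in directly for this sketch
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $

coeff = polyfit(xdat, ydat, 1);           % [a1 a0]
yest  = coeff(1)*xdat + coeff(2);         % line evaluated at each data point

abs_error = abs(ydat - yest);             % absolute error at each point
rel_error = abs_error ./ abs(ydat);       % element-by-element division by |ydat|

% Two-output forms return both the value and the index where it occurs
[max_abs_error, maxpt]    = max(abs_error);
[min_abs_error, minpt]    = min(abs_error);
[max_rel_error, maxrelpt] = max(rel_error);

% The index is used to look up the x-data point where each error occurs
fprintf('max abs error %.4f at xdat = %g\n', max_abs_error, xdat(maxpt));
fprintf('min abs error %.4f at xdat = %g\n', min_abs_error, xdat(minpt));
fprintf('max rel error %.4f at xdat = %g\n', max_rel_error, xdat(maxrelpt));
```

Dividing by abs(ydat) rather than ydat gives the same result for this data, because every cost is positive, and keeps the relative error non-negative for data sets that contain negative y-values.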

A commonly used statistic that is related to the error, but is not the same as the error, is the goodness-of-fit $r^2$ (r-squared) statistic. The $r^2$ statistic ranges from a value of 0 for absolutely no relation between the data and the line to a value of 1, which occurs only if all of the data fall exactly on the line, i.e., no error. In some engineering disciplines, an equation fitted to data is acceptable only if $r^2 > 0.9$. Other engineering disciplines might find an $r^2$ as low as 0.7 acceptable for use.

The $r^2$ statistic is calculated from

  $r^2 = 1 - \frac{SSE}{SST}$

where

  $SSE = \sum_{i=1}^{n} \left[ (y_{dat})_i - (y_{est})_i \right]^2$

and

  $SST = \sum_{i=1}^{n} \left[ (y_{dat})_i - y_{ave} \right]^2$

MATLAB implementation of these equations is straightforward. For example, what (single) MATLAB command would you use to compute the average value of the y-data?

The $r^2$ statistic for the textbook cost versus number of pages fit is $r^2 = 0.8204$.

(5) Calculate the various error estimates.

Implement the MATLAB commands (in the Command Window) to find the following
  1) Maximum absolute error
  2) Index of the value where the maximum absolute error was found.
  3) X-data point where maximum absolute error was found.
  4) Maximum relative error
  5) Index of the value where the maximum relative error was found.
  6) X-data point where maximum relative error was found.
  7) $r^2$ statistic for the fit.

Displaying equation and $r^2$ statistic on the graph

The final bell and whistle in displaying data and a best line fit to the data on a graph is to also display the equation and $r^2$ statistic as text. In order to do this, we need to build both the equation and $r^2$ as string variables for display. The equation can be built from the following commands

» a1str = num2str(coeff(1));
» a0str = num2str(coeff(2));
» eqnstr = ['y = (', a1str, ')*x + (', a0str, ')'];

where the first two commands convert the numbers for the equation coefficients to their equivalent strings. The third command creates a string variable with the text and coefficients in order.
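
The string commands in this section need a variable rsq that holds the $r^2$ statistic. A minimal sketch of computing it from the SSE and SST formulas given earlier follows; it assumes the textbook data is typed in directly, and the names yave, SSE, and SST are chosen here to match the symbols in the formulas.

```matlab
% Textbook data from the table, typed in directly for this sketch
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $

coeff = polyfit(xdat, ydat, 1);       % [a1 a0]
yest  = coeff(1)*xdat + coeff(2);     % line evaluated at each data point

yave = mean(ydat);                    % average of the y-data
SSE  = sum((ydat - yest).^2);         % sum of squared deviations from the line
SST  = sum((ydat - yave).^2);         % sum of squared deviations from the average
rsq  = 1 - SSE/SST;                   % r^2 statistic

fprintf('r^2 = %.4f\n', rsq);
```

The .^ operator squares each element of the vector separately, which is what the summations in the formulas require.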
The $r^2$ statistic string can be built by the command

» rsqstr = ['r^2 = ', num2str(rsq)];

This command uses the num2str command internally to create the string rather than creating another variable to hold the conversion. The process of building the strings part by part is referred to as concatenation.

Both the equation and the $r^2$ statistic can be displayed by using the text command:

  text(X,Y,'text to display')

where X and Y are the (x,y)-coordinates on the current plot at which to start the text string. As always, the text string can be a string variable name. An alternative is to use the gtext command

  gtext({eqnstr,rsqstr})
This causes cross-hairs to appear on the plot, as shown above and to the right, which can be moved by moving the mouse. A left-click on the mouse will cause the requested strings to be placed at the location of the cross-hairs, as shown in the figure above and to the left. Note how the $r^2$ equation string is displayed with the number 2 showing as an exponent. Why?

(6) Display equation on graph.
  1) Display the equation and $r^2$ statistic on the current graph using text.
  2) Display the equation and $r^2$ statistic on the current graph using gtext.

Exercises:

1. Modify your linearplot function from Workshop 14 so that it will now
   a. Display the data points (as previously);
   b. Calculate a best-fit line to the data;
   c. Display the best-fit line as a line only;
   d. Calculate the $r^2$ statistic;
   e. Display the equation and $r^2$ statistic on the plot;
      Note: you should tell the user what to do if you use gtext.
   f. Return the equation coefficients and $r^2$ statistic to the calling function.

2. Test your modified function by running your script from Workshop 14 and reproducing the graphs of this workshop.

Recap: You should have learned
  - That MATLAB uses connect-the-dots to draw lines between points.
  - How to use polyfit to find a best-fit straight line to data.
  - How to display a best-fit straight line to data on the same plot as the data.
  - That many points are required to have a smooth line displayed in MATLAB.
  - The meaning of and how to calculate absolute error.
  - How to find maximum and minimum absolute error and their x-location.
  - The meaning of and how to calculate relative error.
  - How to find maximum and minimum relative error and their x-location.
  - How to calculate the $r^2$ statistic.
  - How to display text strings on the plot.
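
As a closing illustration of the text command covered in this workshop, the sketch below places the equation and $r^2$ strings on the plot without any mouse interaction. The placement at (min(xdat), max(ydat)), the 'bo' data symbol, and the 'VerticalAlignment' setting are assumptions made for this example, not choices prescribed by the workshop. Passing a cell array to text puts each string on its own line, in the same way as the gtext call shown earlier.

```matlab
% Textbook data from the table, typed in directly for this sketch
xdat = [166 195 200 260 265 335 370 450 517 552];   % number of pages
ydat = [54 82 72 72 90 124 94 118 152 132];         % cost, $

coeff = polyfit(xdat, ydat, 1);                     % [a1 a0]
yest  = coeff(1)*xdat + coeff(2);
rsq   = 1 - sum((ydat - yest).^2) / sum((ydat - mean(ydat)).^2);

% Build the display strings by concatenation
eqnstr = ['y = (', num2str(coeff(1)), ')*x + (', num2str(coeff(2)), ')'];
rsqstr = ['r^2 = ', num2str(rsq)];

plot(xdat, ydat, 'bo')                              % data points (symbol is an assumption)
hold on
plot(xdat, yest, 'r-')                              % a straight line needs only its end points to look straight
hold off

% Start the text at the upper-left corner of the data range
ht = text(min(xdat), max(ydat), {eqnstr, rsqstr}, 'VerticalAlignment', 'top');
```

The text is positioned in data coordinates, so the same call on a different data set will follow that data's range.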