Forecasting with Regression Analysis
Richard S. Barr

Causal, Explanatory Forecasting
  Assumes a cause-and-effect relationship between a system's inputs and its output.
  [Diagram: Inputs -> System (cause-and-effect relationship) -> Output; labeled "The job of forecasting"]

Regression Analysis
  Determines and measures the relationship between two or more variables.
  Simple linear regression: 2 variables
  Multiple linear regression: 3+ variables

Simple Linear Regression
  Evaluates the relationship ("going together") of two variables:
    Dependent variable ($Y$)
    Independent variable ($X$)
  Relationship depicted by a straight-line model: $Y = a + bX$

Forecasting
  Build the model using historical data.
  Then use knowledge of the independent variable ($X$) to forecast the value of the dependent variable ($Y$).
  Assumptions:
    The relationship between $X$ and $Y$ is strong.
    The future follows the past.

Which Is Independent?
  Sales, Age, Wear, Demand, Price, Advertising, Equipment, Time, Units sold
Regression Forecasting Steps
  1. Plot the scatter diagram.
  2. Compute the regression equation.
  3. Forecast using the regression model and estimates of $X$.

Scatter Diagram
  The first step for simple regression modeling.
  Used to:
    Display historical raw data
    Spot patterns of relationships
  Will help you determine if regression is appropriate.

Types of Relationships
  Direct linear
    Positive relationship
    As $X$ increases, $Y$ tends to increase by a constant amount.
  Inverse linear
    Negative relationship
    As $X$ increases, $Y$ tends to decrease by a constant amount.
  No correlation
    A change in $X$ tells nothing about $Y$.
  Nonlinear relationship
    As $X$ increases, $Y$ changes by a varying amount.
Regression Model

Regression Line
  Expresses the relationship between $X$ and $Y$ as a straight line:
    $Y_c = a + bX$ (the regression line)
  where
    $Y_c$ = estimated average $Y$ for a given $X$
    $X$ = actual value of the independent variable
    $a$ = estimated $Y$-intercept ($Y$ if $X = 0$)
    $b$ = estimated slope of the regression line
  [Figure: the regression line $Y_c = a + bX$, with intercept $a$ and slope $b$]
  $$\text{slope} = \frac{\text{change in } Y}{\text{change in } X}$$

Purposes for the Regression
  Provides a mathematical definition of the relationship
    Precise; accuracy depends on data fit
  Is a standard of perfect correlation
    Can compare the line with actual data values
    If all values are on the line, perfect correlation
  Is a model for forecasting $Y$ using $X$
    Plug an $X$-value into $Y_c = a + bX$

Which Line Is Best?
  There are many possibilities for $a$ and $b$.
  Each defines a different line and model.
  To evaluate mathematically, let:
    $Y$ = historical value of $Y$ for a given $X$
    $Y_c$ = $Y$ calculated using $X$ in the regression line
    $(Y - Y_c)$ = deviation, the error between actual and model forecast

Measuring Goodness of Fit
  Measuring the fit of the line to the data:
  Sum of the deviations
    $$\sum_{i=1}^{n} (Y_i - Y_{c,i})$$
    Is 0 for any line going through $(\bar{X}, \bar{Y})$, due to +/- cancellations.
  Sum of the squared deviations
    $$\sum_{i=1}^{n} (Y_i - Y_{c,i})^2$$
    Eliminates the sign problem.
    Is the generally accepted least-squares criterion.
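The two goodness-of-fit measures above can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up data set (the values of `X` and `Y` and all function names are hypothetical, not from the slides). It tries three different slopes, forces each line through the point of means, and shows that the plain sum of deviations cannot tell the lines apart while the sum of squared deviations can.

```python
# Hypothetical data for illustration only (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def intercept_through_means(b, xs, ys):
    """Intercept a that makes the line Yc = a + b*X pass through (X-bar, Y-bar)."""
    return mean(ys) - b * mean(xs)


def sum_of_deviations(a, b, xs, ys):
    """Sum of (Y - Yc); positive and negative errors cancel."""
    return sum(y - (a + b * x) for x, y in zip(xs, ys))


def sum_of_squared_deviations(a, b, xs, ys):
    """Sum of (Y - Yc)^2; the least-squares criterion."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Try three candidate slopes; each line is forced through the point of means.
results = {}
for b in (0.0, 0.6, 2.0):
    a = intercept_through_means(b, X, Y)
    results[b] = (
        sum_of_deviations(a, b, X, Y),
        sum_of_squared_deviations(a, b, X, Y),
    )
    print(f"b={b}: sum of deviations={results[b][0]:.6f}, "
          f"sum of squared deviations={results[b][1]:.6f}")
```

For this data the slope 0.6 gives the smallest squared total of the three candidates, which is why the squared criterion, not the plain sum, is used to pick the line.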
Least-Squares Regression Line
  To minimize the squared deviations use:
    $$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$
  where:
    $n$ = number of data points
    $\bar{X}$, $\bar{Y}$ = mean of the $X$'s, $Y$'s
    $\sum XY$ = sum of {$X$ times $Y$}
    $\sum X^2$ = sum of {$X$'s squared}

Mail Order Sales vs. Advertising
  Date of Advertising: Sept. 9, Sept. 6, Oct. , Oct. 9, Oct. 16, Oct. 3
  $ Spent on Advertising: $1,700 3,000 ,000 1,500 0 1,500
  $ Sales in Next Week: $,000 ,000 ,000 ,000 ,000 ,000

Scatter Plot
  [Scatter plot: Sales ($Y$) versus Advertising ($X$, $000s)]

Computing the Regression Line
  Worksheet columns: (1) Advert. $X$, (2) Sales $Y$
  Step 1: Sum column (1) for $\sum X$
  Step 2: Sum column (2) for $\sum Y$
  Step 3: (1) × (2) = (3); sum column (3) for $\sum XY$
  Step 4: (1)$^2$ = (4); sum column (4) for $\sum X^2$
  Step 5: Compute the mean of $X$: $\bar{X} = \sum X / n$
  Step 6: Compute the mean of $Y$: $\bar{Y} = \sum Y / n$
  Compute $b$:
    $$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
  Compute $a$:
    $$a = \bar{Y} - b\bar{X}$$
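The six worksheet steps and the two closing formulas translate directly into code. This is a minimal sketch in Python, assuming a made-up data set. The function name `least_squares_line` and the sample values are hypothetical, and the slide's own advertising data is not used because its figures are incomplete.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for Yc = a + b*X using the worksheet steps."""
    n = len(xs)
    sum_x = sum(xs)                                # Step 1: sum column (1)
    sum_y = sum(ys)                                # Step 2: sum column (2)
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # Step 3: (1) x (2), summed
    sum_x2 = sum(x * x for x in xs)                # Step 4: (1) squared, summed
    x_bar = sum_x / n                              # Step 5: mean of X
    y_bar = sum_y / n                              # Step 6: mean of Y
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data for illustration only (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a, b = least_squares_line(X, Y)
print(f"Yc = {a:.2f} + {b:.2f} X")
```

The function fails with a division by zero if every X value is identical, since the denominator of b is then zero. A production version would guard against that case.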
The Regression Equation
  The resultant equation: $Y_c = 7.4 + 34.49X$
  Interpretation and reasonableness check:
    $a = 7.4$ = ____
    $b = 34.49$ = ____
  Forecast sales with $1800 advertising: ____

Evaluating the Model
  How Well Did We Do?

Compare Actuals with Estimates
  Columns: Model Estimate $Y_c$; Error $(Y - Y_c)$; Error$^2$ $(Y - Y_c)^2$
  .0 66.09 .93 76.44 59.19 8.15 59.19
  -6.09 -0.93 8.56 -4.19 1. 0.81
  37.11 0.87 73.8 17.58 4.4 5

Correlation Analysis
  Measures the degree of association between two variables.

Measuring Correlation
  We compare two approaches to estimating or forecasting $Y$ for a given $X$:
    Using the mean of $Y$
    Using our least-squares regression line
  We could use $\bar{Y}$ to estimate $Y$ (for any $X$) and, on average, be OK.
  Can regression do better?

Variation Analysis
  Let's look at variations around the regression line to see how much better it explains the $Y$s than the mean.

Variation Analysis
  [Figure: data point $(x_1, y_1)$, the regression line $Y_c$, and the mean $\bar{Y}$]

Explained Deviation
  Explained deviation from the mean: $(Y_c - \bar{Y})$
  Deviation explained by the regression line
  [Figure: the explained deviation at $x_1$, from $\bar{Y}$ up to $Y_{c1}$]

Unexplained Deviation
  Deviation from the mean not explained by the regression line: $(y_1 - Y_c)$
  [Figure: the unexplained deviation at $x_1$, from $Y_{c1}$ up to $y_1$, above the explained deviation]

Total Deviation
  The total deviation from the mean = explained + unexplained
  [Figure: the total deviation at $x_1$, from $\bar{Y}$ up to $y_1$]

Variation
  Variation is the sum of squared deviations from the mean of $Y$.
  Total variation = Explained + Unexplained variation
    $$\sum (Y - \bar{Y})^2 = \sum (Y_c - \bar{Y})^2 + \sum (Y - Y_c)^2$$
  Sample coefficient of determination:
    $$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

Portion Explained, $r^2$
  The fraction of variation from the mean explained by the regression line:
    $$r^2 = \frac{\sum (Y_c - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
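The split of total variation into explained and unexplained parts can be verified on a small example. This is a minimal sketch in Python, assuming made-up data (the sample values and the helper names `fit_line` and `variation_parts` are hypothetical). It fits the least-squares line, computes the three sums of squares, and forms the coefficient of determination as explained over total.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Yc = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    return y_bar - b * x_bar, b


def variation_parts(xs, ys):
    """Return (total, explained, unexplained) variation about the mean of Y."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [a + b * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((yc - y_bar) ** 2 for yc in fitted)
    unexplained = sum((y - yc) ** 2 for y, yc in zip(ys, fitted))
    return total, explained, unexplained


# Hypothetical data for illustration only (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

total, explained, unexplained = variation_parts(X, Y)
r_squared = explained / total
print(f"total={total:.3f} explained={explained:.3f} "
      f"unexplained={unexplained:.3f} r^2={r_squared:.3f}")
```

The identity total = explained + unexplained holds only for the least-squares line. For an arbitrary line the two parts generally do not add up to the total.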
Extreme Values of $r^2$
  $r^2 = 1$
    Perfect linear correlation
    All points are explained by the line
    All points are on the line
  $r^2 = 0$
    No correlation
    The regression does not explain the data any better than the mean of $Y$
    $X$ provides no useful information about $Y$ in this context

Correlation
  The correlation coefficient, $r$: $r = \pm\sqrt{r^2}$
  Unitless
  Sign: + if $b > 0$, - if $b < 0$
  Simply a different way of expressing the relationship (correlation) between two variables

Correlation Coefficient
  $r = \pm 1$
    Only if a perfect linear relationship $Y = a + bX$ exists
    All points on the line
  Some think that it looks better than $r^2$
    $r^2 = 0.36$, $r = 0.6$

[Figure: Example Scatterplot A, $y$ versus $x$]
[Figure: Example Scatterplot B, $y$ versus $x$]

Correlation Coefficient
  Shows:
    The direction of the relationship
    The strength of association
  Cautions:
    It only measures linear association
    It is unstable with a small sample size
    It is distorted by extreme values or by including different data sets in the analysis
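The caution about extreme values can be illustrated with a short calculation. This is a minimal sketch in Python, assuming made-up data and a hypothetical helper `correlation`. It uses the standard product-moment form of r, which is not written on the slides but gives the same value as taking the square root of r² with the sign of b. Adding a single extreme point to a weakly correlated sample changes r sharply.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r (product-moment form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data for illustration only (not from the slides).
X = [1, 2, 3, 4, 5, 6]
Y = [3, 1, 4, 2, 5, 3]

r_without = correlation(X, Y)

# Add one extreme observation far from the rest of the sample.
r_with = correlation(X + [30], Y + [30])

print(f"r without extreme value: {r_without:.3f}")
print(f"r with extreme value:    {r_with:.3f}")
```

With only six or seven observations, one point dominates the sums, which also reflects the caution about small sample sizes.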
Nonlinear Relationship

Monkey Data
  Columns: observation number, Wt, Ht
  1 9 45 7 3 35 17 4 39 9 5 53 31 6 41 1 7 51 31 8 35 13 9 57 37 10 57 41 11 45 45 1 47 35 13 35 5 14 49 5 15 43 31 16 51 33 17 31 9 18 53 7 19 47 17 0 51 45

Monkey & King Kong Data
  The same monkey observations, plus one extreme observation:
  KK 1 150

Multiple Regression
  Same concept, more variables

Multiple Regression Models
  An extension of the simple case
  Permits use of more variables to try to explain more variation
  Example model: $Y = a + b_1 X_1 + b_2 X_2$

Real Estate Example
  Monthly sales ($Y$) are related to:
    Mortgage rates ($X_1$)
    Number of salespersons ($X_2$)
  With simple regression models:
    $Y = a + b_1 X_1$, $r^2 = 0.36$
    $Y = a + b_2 X_2$, $r^2 = 0.5$
  Multiple regression model:
    $Y = a + b_1 X_1 + b_2 X_2$, $r^2 = 0.49$, not 1!
Real Estate Example
  Why is not more variation explained?
  Multicollinearity exists: $X_1$ is correlated with $X_2$
  We want independence of the $X$s (uncorrelated)
  [Diagram: total variation, with overlapping regions explained by $X_1$ and explained by $X_2$]

MLR Software

MLR Input
  Title line
  Variables and observations
  Labels for variables, dependent last
  For each observation $j$:
    $X_j$ values, followed by $Y_j$
    $X_j$s in label order
  Blanks separate all values and labels

MLR Reports
  Descriptive statistics
  Correlation matrix and determinant
  Regression equation, each variable:
    Label
    Coefficient
    Beta value
    Standard error of the coefficient
    t-statistic and probability that $b = 0$
  Analysis of variance
    P(insignificant regression model)
  Summary statistics
    $r^2$
    $s_{y,x}$
  Residual summary (optional)
    Residuals (errors)
    Graph

Standard Error of the Estimate
  The standard deviation of the observed values of $Y$ from the regression line:
    $$s_{y,x} = \sqrt{\frac{\sum (Y - Y_c)^2}{n - 2}} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$
  On average, how the data varies around the regression line
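The standard error of the estimate can be computed in two equivalent ways. This is a minimal sketch in Python, assuming made-up data, hypothetical function names, and the usual $n - 2$ divisor for simple regression (an assumption, since the divisor is not fully legible on the slide). The first function uses the residuals directly. The second uses the shortcut with $\sum Y^2$, $\sum Y$ and $\sum XY$, which is valid only when a and b are the least-squares values.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Yc = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    return y_bar - b * x_bar, b


def std_error_from_residuals(xs, ys, a, b):
    """sqrt( sum (Y - Yc)^2 / (n - 2) ), assuming the n - 2 divisor."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def std_error_shortcut(xs, ys, a, b):
    """sqrt( (sum Y^2 - a*sum Y - b*sum XY) / (n - 2) ), least-squares a, b only."""
    sum_y2 = sum(y * y for y in ys)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (len(xs) - 2))


# Hypothetical data for illustration only (not from the slides).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

a, b = fit_line(X, Y)
s_direct = std_error_from_residuals(X, Y, a, b)
s_short = std_error_shortcut(X, Y, a, b)
print(f"s_yx (residuals) = {s_direct:.4f}, s_yx (shortcut) = {s_short:.4f}")
```

The shortcut saves computing each fitted value by hand, but it subtracts large, nearly equal numbers and so is more sensitive to rounding than the residual form.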
Confidence Intervals
  Using the 68-95-99.7 Rule of Normality:
    $\mu \pm 1\sigma$ includes 68% of all values
    $\mu \pm 2\sigma$ includes 95%
    $\mu \pm 3\sigma$ includes 99.7%
  $b \pm Z\,s_{y,x}$ gives a confidence interval for a given probability and associated $Z$-value.
  If $Z = 1$, there is 68% confidence that the interval contains the true regression coefficient.