Chapter: Regression Analysis

Definition: When the values of two variables are measured for each member of a population or sample, the resulting data is called bivariate. When both variables are quantitative, we may represent the data set as a set of ordered pairs of numbers, $(x, y)$. The variable $x$ is called the input (or independent) variable; the variable $y$ is called the response (or dependent) variable.

We may examine the relationship between the two variables graphically using a scatter diagram, or scatterplot.

The simplest type of model relating two quantitative variables is called a simple linear regression model, in which there is an assumed linear relationship between two variables. One variable is called the independent variable, or predictor variable. The other variable is called the dependent variable, or the response variable.

Simple Linear Regression Model

The response variable is assumed to be related to the predictor variable according to the following equation:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$

where
$Y_i$ = the value of the response variable for the $i$th member of the sample,
$\beta_0$ = a parameter, called the intercept of the line of best fit, or the regression line,
$\beta_1$ = a parameter, called the slope of the line of best fit, or the regression line,
$x_i$ = the value of the predictor variable for the $i$th member of the sample,
$\varepsilon_i$ = a random error variable associated with the $i$th member of the sample; it is assumed that the random errors are independent and identically distributed, with $\varepsilon_i \sim \text{Normal}(0, \sigma^2)$.

A picture of the model is shown on p. 39.

Since it is assumed that a linear trend relationship exists between the predictor variable and the response variable, before we proceed to use the model, we must do a scatterplot to see whether the assumption of linearity is reasonable.

We need to use sample data to estimate the three parameters, $\beta_0$, $\beta_1$, and $\sigma^2$. The estimation will be done using the method of least squares. Given a sample of size $n$, the data consists of $n$ ordered pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We will find the best estimators of the slope and intercept by minimizing the residual sum of squares (also called the error sum of squares):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2,$$

with respect to the two parameters. In doing this, we are minimizing the sum of the squared vertical distances of the data points from the line of best fit to the data. A concrete example is useful here.

Example: p. 3
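Imagine constructing this scatterplot concretely as follows:
1) Draw the coordinate axes on a sheet of plywood.
2) Hammer nails into the board at each data point.
3) Obtain a thin wooden dowel and six rubber bands.
4) Place each rubber band around the dowel and one of the nails.
5) Wait until the dowel comes to rest.

The rest position of the dowel will be the minimum energy configuration of the system, the configuration for which there will be the least total stretching of the rubber bands. This position will also be the least squares regression line relating thermal conductivity and density.

We differentiate SSE w.r.t. each parameter, and set each derivative equal to 0, obtaining

$$\frac{\partial SSE}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \quad \text{and} \quad \frac{\partial SSE}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$

This gives us two equations in two unknowns, called the normal equations:

$$\sum_{i=1}^{n} y_i = n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i, \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2.$$

The solution is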
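$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$

where

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right), \qquad SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2.$$

Then the estimated regression line, or line of best fit to the data, is given by:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

The estimate of the error variance is found from the error sum of squares to be

$$\hat{\sigma}^2 = MSE = \frac{SSE}{n - 2}.$$

There are only $n - 2$ degrees of freedom associated with the error sum of squares because two parameters, the slope and the intercept, have already been estimated.

To do inference, we need to know the distributional properties of the estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$. One of the basic assumptions of the model is that the random error terms, $\varepsilon_i$, are i.i.d. normal with mean 0 and common variance $\sigma^2$. Then $Y_i \sim \text{Normal}(\beta_0 + \beta_1 x_i, \sigma^2)$. Furthermore, the $Y_i$'s are independent of each other. From the normal equations, it is clear that $\hat{\beta}_1$ is a linear function of the $Y_i$'s, and that $\hat{\beta}_0$ is also a linear function of the $Y_i$'s. We know that a statistic that is a linear function of independent normal random variables also has a normal distribution.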
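Specifically, it can be shown that both estimators are unbiased, and that

$$\hat{\beta}_0 \sim \text{Normal}\left( \beta_0,\ \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}} \right) \right), \quad \text{and that} \quad \hat{\beta}_1 \sim \text{Normal}\left( \beta_1,\ \frac{\sigma^2}{SS_{xx}} \right),$$

where $SS_{xx} = \sum (x_i - \bar{x})^2$. We may use these facts to do hypothesis testing and interval estimation about the slope and intercept.

The standard error of $\hat{\beta}_0$ is given by

$$SE(\hat{\beta}_0) = \sqrt{MSE \left( \frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}} \right)}.$$

The standard error of $\hat{\beta}_1$ is given by

$$SE(\hat{\beta}_1) = \sqrt{\frac{MSE}{SS_{xx}}}.$$

Therefore, we find that

$$\frac{\hat{\beta}_0 - \beta_0}{\sqrt{MSE \left( \frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}} \right)}} \sim t(n - 2), \quad \text{and that} \quad \frac{\hat{\beta}_1 - \beta_1}{\sqrt{MSE / SS_{xx}}} \sim t(n - 2).$$

We want to test whether there is a linear trend relationship between the predictor and the response variable. Our hypotheses are

$$H_0: \beta_1 = 0 \quad \text{v.} \quad H_a: \beta_1 \neq 0.$$

We may use the distributional properties of the estimated slope to find a test statistic: under $H_0$, the statistic $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$ has a $t(n - 2)$ distribution. We may do the hypothesis test using the t-distribution of the estimator.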
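The estimators, standard errors, and slope test statistic described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a substitute for a statistics package; the function name `fit_simple_linear_regression`, the dictionary keys, and the small data set at the bottom are all hypothetical choices made for this sketch (the numbers are made up and do not come from any study).

```python
import math


def fit_simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1*x, with standard errors.

    Hypothetical helper written for illustration only.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross products.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Solution of the normal equations.
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar

    # Error sum of squares and its mean square on n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    # Estimated standard errors of the two estimators.
    se_slope = math.sqrt(mse / ss_xx)
    se_intercept = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_xx))

    return {
        "slope": slope,
        "intercept": intercept,
        "mse": mse,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
        # Test statistic for H0: slope = 0; compare with t(n - 2).
        "t_slope": slope / se_slope,
        "df": n - 2,
    }


# Made-up numbers, used only to show how the helper is called.
demo = fit_simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(demo["slope"], demo["se_slope"], demo["t_slope"], demo["df"])
```

The absolute value of `t_slope` would then be compared with the critical value of the t-distribution with `df` degrees of freedom, taken from a table or from statistical software.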
Example: The paper "A study of stainless steel stress-corrosion cracking by potential measurements" (Corrosion, 1962, pp. 425-432) reported on the relationship between applied stress (the predictor variable, $x$, kg/mm$^2$) and time to fracture (the response variable, $y$, hours) for 18-8 stainless steel under uniaxial tensile stress in a 40% CaCl$_2$ solution at 100°C. Ten different settings of applied stress were used, and the resulting data values (as read from a graph which appeared in the paper) are given in the table below:

x (kg/mm^2):   2.5    5     10    15    17.5   20    25    30    35    40
y (hours):     63     58    55    61    62     37    38    45    46    19

We want to 1) determine whether there is a linear trend relationship between applied tensile stress and time to fracture, and 2) estimate the relationship.

We first do a scatterplot, using Excel:

[Figure: Scatterplot of Time to Fracture v. Tensile Stress. Horizontal axis: Tensile Stress (kg/square mm). Vertical axis: Time to Fracture (Hours).]

It appears that there is a moderately strong negative linear trend relationship between time to fracture and tensile stress.
Next we want to test whether this relationship generalizes to the entire population of 18-8 stainless steel samples.

Step 1: $H_0: \beta_1 = 0$ v. $H_a: \beta_1 \neq 0$.
Step 2: $\alpha = 0.05$.
Step 3: The test statistic that will be used is $F = MSR / MSE$, which under the null hypothesis has an $F(1, 8)$ distribution, since $n - 2 = 8$.
Step 4: We will reject the null hypothesis if the value of the test statistic is greater than $F_{1, 8, 0.05} = 5.32$.
Step 5: We enter the data into Excel. We choose Tools, Data Analysis, and Regression. Excel produces the following ANOVA table.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7953
R Square             0.6325
Adjusted R Square    0.5866
Standard Error       9.1243
Observations         10

ANOVA
              df    SS           MS           F          Significance F
Regression    1     1146.3761    1146.3761    13.7698    0.0059
Residual      8     666.0239     83.2530
Total         9     1812.4

              Coefficients    Standard Error    t Stat     P-value
Intercept     66.4177         5.6481            11.7592    2.50E-06
X Variable 1  -0.9009         0.2428            -3.7108    0.0059
Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient evidence to conclude that $\beta_1 \neq 0$; i.e., there is a linear trend relationship between tensile stress and time to fracture.

Def: The coefficient of determination is defined by

$$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}.$$

This quantity is the proportion of the variation of the response variable that is explained by the linear relationship between the predictor variable and the response variable.

In our example, $R^2 = 0.6325$. Hence 63.25% of the variation in time to fracture is explained by the linear relationship between tensile stress and time to fracture. A large value for $R^2$ (near 1) indicates that the model has good explanatory power. A value for $R^2$ near 0 indicates that the model does not have good explanatory power.

The estimated regression equation (line of best fit) may also be read from the last table in the Excel output. We have

$$\hat{Y} = 66.4177 - 0.9009x.$$

This says that for every 1 kg/mm$^2$ increase in tensile stress, the time to fracture decreases by 0.9009 hours, on average. If the applied tensile stress is 12 kg/mm$^2$, then the predicted time to fracture is

$$\hat{Y} = 66.4177 - (0.9009)(12) = 55.6069 \text{ hours}.$$
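The quantities in this example can be cross-checked without Excel. The sketch below is not part of the original analysis; it assumes the ten (stress, time) pairs of the stainless steel example are as listed in the code, and recomputes the fitted line, the ANOVA sums of squares, the F statistic, $R^2$, and a prediction using only the Python standard library. The variable names and the `predict` helper are illustrative choices.

```python
# Applied stress (kg/mm^2) and time to fracture (hours); assumed to be the
# ten data pairs of the stainless steel example.
stress = [2.5, 5, 10, 15, 17.5, 20, 25, 30, 35, 40]
hours = [63, 58, 55, 61, 62, 37, 38, 45, 46, 19]

n = len(stress)
x_bar = sum(stress) / n
y_bar = sum(hours) / n

# Least squares estimates of the slope and intercept.
ss_xx = sum((x - x_bar) ** 2 for x in stress)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stress, hours))
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# ANOVA decomposition: SST = SSR + SSE.
sst = sum((y - y_bar) ** 2 for y in hours)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(stress, hours))
ssr = sst - sse

# Mean squares, F statistic on (1, n - 2) degrees of freedom, and R^2.
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
r_squared = ssr / sst


def predict(x):
    """Predicted time to fracture (hours) at applied stress x (kg/mm^2)."""
    return intercept + slope * x


print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"F = {f_stat:.4f} on (1, {n - 2}) df, R^2 = {r_squared:.4f}")
print(f"prediction at 12 kg/mm^2 = {predict(12):.4f} hours")
```

The printed values should agree with the Excel output up to rounding. The prediction may differ in the last decimal place from a hand calculation that uses the rounded coefficients.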