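Multiple Regression

Multiple regression relates a response (dependent, output) variable y to a set of explanatory (independent, input, predictor) variables $x_1, x_2, \ldots, x_k$. It is a technique for modeling the relationship between one response variable and several predictor variables.

Multiple regression model:
$$y = \underbrace{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}_{\text{deterministic component}} + \underbrace{\varepsilon}_{\text{random component}}$$

The parameters $\beta_0, \beta_1, \ldots, \beta_k$ in the model can all be estimated by the least-squares estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$. These minimize
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki}) \right]^2.$$

The least-squares regression equation:
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k$$

Example: study weight (y) using age ($x_1$) and height ($x_2$).
Data: age (months), height (inches), and weight (pounds) were recorded for a group of school children.

[Figure: scatter plots of weight against age and of weight against height]

The scatter plots above show that both age and height are linearly related to weight.

Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ with weight y, age $x_1$, and height $x_2$.

[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate); predictors: (Constant), age, height]

Coefficient of determination ($R^2$): the percentage of variability in the response variable (weight) that can be described by the predictor variables ($x_1$, $x_2$) through the model.

[SPSS output: ANOVA table (Sum of Squares, df, Mean Square, F, Sig. for Regression, Residual, Total), Sig. = .000; predictors: (Constant), age, height; dependent variable: Weight]

Test for significance of the model:
$H_0$: the model is insignificant (the slope $\beta$'s are all zero, $\beta_1 = \beta_2 = 0$).
$H_a$: the model is significant (at least one of $\beta_1, \beta_2$ is not zero).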
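The least-squares criterion and $R^2$ can be made concrete with a minimal pure-Python sketch that solves the normal equations $X^\top X \hat\beta = X^\top y$. The function names and the toy data are illustrative assumptions. This is not the SPSS procedure and not the children's data. A real analysis would use a statistics package.

```python
# Minimal least-squares sketch for y = b0 + b1*x1 + ... + bk*xk (pure Python).
# Function names are illustrative, not from any library.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # normal equations X'X b = X'y


def predict(b, row):
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


def r_squared(rows, y, b):
    """Coefficient of determination: 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - predict(b, row)) ** 2 for row, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up toy data (not the slide's data): y = 1 + 2*x1 + 3*x2 exactly.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = fit_least_squares(rows, y)
print([round(v, 6) for v in b])          # [1.0, 2.0, 3.0]
print(round(r_squared(rows, y, b), 6))   # 1.0
```

The toy response is an exact linear function of the predictors, so the fit recovers the true coefficients and $R^2 = 1$. With noisy data, $R^2$ falls below 1. If the predictors are perfectly collinear, $X^\top X$ is singular and this sketch raises `ZeroDivisionError`.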
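Model estimation: SPSS output

[SPSS output: Coefficients table (Unstandardized B and Std. Error, Standardized Beta, t, Sig., and Collinearity Statistics: Tolerance, VIF) for (Constant), age, height; Tolerance = .579 and VIF = 1.727 for both predictors]

Tests for regression coefficients:
$H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$ vs. $H_a: \beta_2 \neq 0$

Collinearity statistics: if the VIF (variance inflation factor) is greater than 10, there is a problem of multicollinearity. (Some say the VIF needs to be less than 4.)

Least-squares regression equation:
$$\hat y = -127.82 + 0.24 x_1 + 3.09 x_2$$

The model estimates the average weight of children who are 144 months old and 55 inches tall as
$$-127.82 + 0.24(144) + 3.09(55) = 76.69 \text{ lbs}.$$

How to interpret $\beta_0$, $\beta_1$, and $\beta_2$?
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where y: weight, $x_1$: age, $x_2$: height.
$\beta_0$ is the constant, or y-intercept, of the model. It is the average response when both predictor variables are 0. Here age 0 and height 0 lie far outside the data, so $\beta_0$ has no practical meaning on its own.
$\beta_1$ is the rate of change of the expected (average) weight per unit change of age, adjusted for the height variable (that is, with height held fixed).
$\beta_2$ is the rate of change of the expected (average) weight per unit change of height, adjusted for the age variable (that is, with age held fixed).

Other possible models (y: weight, $x_1$: age, $x_2$: height):
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
$y = \beta_0 + \beta_2 x_2 + \varepsilon$
With an interaction term (non-additive):
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$

Coefficient estimation with interaction between $x_1$ and $x_2$ (INTAG_HT $= x_1 x_2$):
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$ with weight y, age $x_1$, and height $x_2$.

[SPSS output: Coefficients table for (Constant), age, height, INTAG_HT, with Tolerance and VIF]

The VIF values in this output are far above 10. Such high VIF values imply very serious collinearity, so the interaction term should not be used in the model.

For boys:

[SPSS output: Coefficients table for (Constant), age, height, with Tolerance and VIF; dependent variable: weight]

Is there serious collinearity? Write the weight prediction equation using age and height as predictor variables. Find the average weight for boys who are 144 months old and 55 inches tall.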
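The arithmetic of the prediction and the "adjusted for" interpretation of $\beta_1$ and $\beta_2$ are easy to check in code. This sketch assumes the fitted coefficients −127.82, 0.24, and 3.09, which reproduce the 76.69 lbs estimate. The function name `predicted_weight` is hypothetical.

```python
# Fitted equation (coefficients assumed as reconstructed above):
#   y_hat = -127.82 + 0.24*x1 + 3.09*x2, x1 = age (months), x2 = height (inches)
B0, B_AGE, B_HEIGHT = -127.82, 0.24, 3.09


def predicted_weight(age_months, height_inches):
    """Estimated average weight (lbs) for children of this age and height."""
    return B0 + B_AGE * age_months + B_HEIGHT * height_inches


print(round(predicted_weight(144, 55), 2))  # 76.69

# One more month of age with height held fixed changes the estimate by b1.
print(round(predicted_weight(145, 55) - predicted_weight(144, 55), 2))  # 0.24
# One more inch of height with age held fixed changes it by b2.
print(round(predicted_weight(144, 56) - predicted_weight(144, 55), 2))  # 3.09
```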
For girls:

[SPSS output: Coefficients table for (Constant), age, height, with Tolerance and VIF; dependent variable: weight]

Is there serious collinearity? Write the weight prediction equation using age and height as predictor variables. Find the average weight for girls who are 144 months old and 55 inches tall.

Indicator variables are binary variables that take only two possible values, 0 and 1. They can be used to include categorical variables in the model.
Gender: Male = 1, Female = 0.

[Figure: weight by gender (male, female)]

[SPSS output: Group Statistics (N, Mean, Std. Deviation, Std. Error Mean of weight for male and female)]

One binary independent variable (a model for the two-independent-samples situation under the equal-variances condition):
$y = \beta_0 + \beta_1 x + \varepsilon$,
where y: weight, x: gender (x = 0 for female, x = 1 for male).
When x = 0: $y = \beta_0 + \varepsilon$.
When x = 1: $y = \beta_0 + \beta_1 + \varepsilon$.
The difference of the means of the two categories is $\beta_1$. The two-independent-samples t-test can therefore be modeled with a simple linear regression model: the t statistic for $\hat\beta_1$ equals the equal-variances-assumed two-sample t statistic.

[SPSS output: two-independent-samples t-test comparing the mean weight of males and females; Levene's Test for Equality of Variances (F, Sig.) and t-test for Equality of Means (t, df, Sig. (2-tailed), Mean Difference, Std. Error Difference), for "Equal variances assumed" and "Equal variances not assumed"]

[SPSS output: linear regression with gender as predictor, Coefficients table]

Age and gender as predictor variables:
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ with y weight, $x_1$ age, and $x_2$ gender (0 female, 1 male).

[SPSS output: Coefficients table for (Constant), age, gender]

[Figure: weight against age, by gender (male, female)]

Age and gender are both significant variables for predicting weight. There is a significant difference in average weight between genders when adjusted for the age variable.
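The equivalence between the two-sample t-test and regression on a 0/1 indicator can be checked numerically. This is a minimal sketch on made-up weights (not the slide's data). It computes the pooled (equal-variances) t statistic and the regression t statistic for $\hat\beta_1$, and confirms that the intercept and slope equal the female mean and the mean difference.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pooled_t(female, male):
    """Equal-variances two-sample t statistic for mean(male) - mean(female)."""
    n0, n1 = len(female), len(male)
    m0, m1 = mean(female), mean(male)
    ss0 = sum((v - m0) ** 2 for v in female)
    ss1 = sum((v - m1) ** 2 for v in male)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)  # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def dummy_regression(female, male):
    """Fit y = b0 + b1*x with x = 0 (female), 1 (male); return b0, b1, t(b1)."""
    x = [0] * len(female) + [1] * len(male)
    y = list(female) + list(male)
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    return b0, b1, b1 / math.sqrt(mse / sxx)


female = [95.0, 101.5, 92.0, 99.0, 104.5]  # made-up weights (lbs)
male = [103.0, 98.5, 110.0, 106.0]
b0, b1, t_reg = dummy_regression(female, male)
print(b0, mean(female))                # intercept equals the female mean
print(b1, mean(male) - mean(female))   # slope equals the difference in means
print(t_reg, pooled_t(female, male))   # the two t statistics agree
```

The agreement is exact in theory. The fitted values are the group means, so the regression SSE equals the pooled within-group sum of squares, and $S_{xx} = n_0 n_1 / n$.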
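Age, height, and gender as predictors:
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ with y weight, $x_1$ age, $x_2$ height, and $x_3$ gender (0 female, 1 male).

[Figure: weight by gender (male, female)]

[SPSS output: Coefficients table for (Constant), age, height, gender]

The gender variable becomes insignificant once the age and height variables are in the model. When average weight is compared between genders, adjusted for age and height, the difference is statistically insignificant.

How to include a categorical variable in the model?
The proper way to include a categorical variable is to use indicator variables. For a categorical variable with k categories, set up k − 1 indicator variables. The category whose indicators are all 0 serves as the baseline.
Race variable: White = 1, Black = 2, Hispanic = 3. Then 3 − 1 = 2 indicator variables will be needed.

Common mistake: using the internally coded values of a categorical explanatory variable directly in linear regression calculations, for example
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$,
where y: body fat percentage, $x_1$: number of hours of exercise per week, and $x_2$: race (White = 1, Black = 2, Hispanic = 3).
This treats the arbitrary codes as equally spaced numbers. The model is then forced to assume that the Black versus White difference equals the Hispanic versus Black difference.

Use of indicator variables $x_2$ and $x_3$ for the race variable:
$x_2 = 1$ represents White, otherwise $x_2 = 0$;
$x_3 = 1$ represents Black, otherwise $x_3 = 0$;
$x_2 = 0$ and $x_3 = 0$ represent Hispanic.
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$, where y: body fat percentage, $x_1$: number of hours of exercise per week.

Interpretation of the model:
Race White ($x_2 = 1$, $x_3 = 0$): $y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$
Race Black ($x_2 = 0$, $x_3 = 1$): $y = (\beta_0 + \beta_3) + \beta_1 x_1 + \varepsilon$
Race Hispanic ($x_2 = 0$, $x_3 = 0$): $y = \beta_0 + \beta_1 x_1 + \varepsilon$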
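Here is a minimal sketch of k − 1 indicator coding, followed by predictions from the race model. The helper names and the coefficients `b0`…`b3` are made up for illustration; they are not the fitted values from the slides. Hispanic is the baseline, matching $x_2$ = White and $x_3$ = Black above.

```python
def indicator_columns(values, categories, baseline):
    """Build k - 1 indicator (0/1) columns; the baseline category gets none."""
    if baseline not in categories:
        raise ValueError("baseline must be one of the categories")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in categories
        if level != baseline
    }


race = ["White", "Black", "Hispanic", "White", "Hispanic"]
cols = indicator_columns(race, ["White", "Black", "Hispanic"], baseline="Hispanic")
print(cols["White"])  # [1, 0, 0, 1, 0]
print(cols["Black"])  # [0, 1, 0, 0, 0]

# Hypothetical coefficients (made up): intercept, hours, White, Black.
b0, b1, b2, b3 = 30.0, -0.5, 2.0, 1.0


def body_fat(hours, race_value):
    """Estimated average body fat % under the indicator-variable model."""
    x2 = 1 if race_value == "White" else 0
    x3 = 1 if race_value == "Black" else 0
    return b0 + b1 * hours + b2 * x2 + b3 * x3


for r in ("White", "Black", "Hispanic"):
    print(r, body_fat(10, r))  # White 27.0, then Black 26.0, then Hispanic 25.0
```

The three groups share the slope $\beta_1$ and differ only in intercept. The White and Black lines sit $\beta_2$ and $\beta_3$ above the Hispanic baseline at every number of exercise hours.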
Suppose that the least-squares regression equation for the model above is $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$. For a person who exercises $x_1$ hours per week, the estimated average body fat is:
White: $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2(1) + \hat\beta_3(0)$
Black: $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2(0) + \hat\beta_3(1)$
Hispanic: $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2(0) + \hat\beta_3(0)$

Study female life expectancy using the percentage of urbanization and the birth rate.

[Figure: scatter plots of Female life expectancy 1992 against Births per 1000 population, 1992 and against Percent urban, 1992]

Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where y: female life expectancy, $x_1$: birth rate, $x_2$: percent urban.

[SPSS output: Model Summary, R = .904, R Square = .817; predictors: (Constant), Births per 1000 population, 1992, Percent urban, 1992]

Coefficient of determination: the percentage of variability in the response variable (female life expectancy) that can be described by the predictor variables (birth rate, percentage of urbanization) through the model.

[SPSS output: ANOVA table, Sig. = .000; predictors: (Constant), Births per 1000 population, 1992, Percent urban, 1992; dependent variable: Female life expectancy 1992]

Test for significance of the model:
$H_0$: the model is insignificant (the slope $\beta$'s are all zero, $\beta_1 = \beta_2 = 0$).
$H_a$: the model is significant (at least one of $\beta_1, \beta_2$ is not zero).

Model estimation (SPSS output):

[SPSS output: Coefficients table for (Constant), Births per 1000 population, 1992, Percent urban, 1992, with Tolerance and VIF; dependent variable: Female life expectancy 1992]

Tests for regression coefficients:
$H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
$H_0: \beta_2 = 0$ vs. $H_a: \beta_2 \neq 0$

Collinearity statistics: if the VIF (variance inflation factor) is greater than 10, there is a multicollinearity problem. (Some say the VIF needs to be less than 4.)

Least-squares regression equation for estimating the average response value: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$. The coefficient for birth rate is negative and the coefficient for percent urban is positive. To estimate the average female life expectancy for countries with a given birth rate per 1000 and a given percentage of urbanization, substitute those two values into this equation.
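The overall F test and the Adjusted R Square in the SPSS output can both be written in terms of $R^2$, n observations, and k predictors. This is a minimal sketch. It uses $R^2 = 0.817$ (the square of R = .904) with k = 2. The sample size n = 100 is a hypothetical value chosen only for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F statistic for H0: beta_1 = ... = beta_k = 0, via R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


r2, n, k = 0.817, 100, 2  # n is hypothetical
print(round(adjusted_r_squared(r2, n, k), 3))  # 0.813
print(round(overall_f(r2, n, k), 1))           # 216.5
```

This F equals the ANOVA ratio MS(Regression)/MS(Residual). It is compared with an F distribution on k and n − k − 1 degrees of freedom.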
Female life expectancy with more explanatory variables

Response variable: female life expectancy. Explanatory variables: birth rate, urbanization, phones, doctors, and GDP.

[Figure: scatter plot matrix before transformation (Female life expectancy 1992, Births per 1000 population 1992, Percent urban 1992, Phones per 100 people, Doctors per 10,000 people, GDP per capita)]

Which variables are significant factors for female life expectancy in the model?

[Figure: scatter plot matrix after ln() transformation of phones, doctors, and GDP]

[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate, Durbin-Watson); predictors: (Constant), Natural log of GDP, Percent urban 1992, Births per 1000 population 1992, Natural log of doctors per 10000, Natural log of phones per 100 people; dependent variable: Female life expectancy 1992]

[SPSS output: ANOVA table, Regression df = 5, Sig. = .000; same predictors and dependent variable]

Multicollinearity:

[SPSS output: Coefficients table (B, Std. Error, Standardized Beta, t, Sig., Tolerance, VIF) for the five predictors]

Tolerance measures the strength of the linear relationship between one independent variable and the other independent variables. It equals $1 - R_j^2$, where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other predictors. Tolerance should preferably be higher than 0.1. VIF is the reciprocal of tolerance, $\text{VIF}_j = 1/(1 - R_j^2)$, so a tolerance above 0.1 corresponds to a VIF below 10.

Stepwise selection:

[SPSS output: ANOVA tables for the three stepwise models. Model a predictors: (Constant), Natural log of phones per 100 people. Model b predictors: (Constant), Natural log of phones per 100 people, Births per 1000 population, 1992. Model c predictors: (Constant), Natural log of phones per 100 people, Births per 1000 population, 1992, Natural log of doctors per 10000. Dependent variable: Female life expectancy 1992]
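Tolerance and VIF can be computed directly from their definitions. This is a minimal sketch for the two-predictor case, where regressing one predictor on the other gives $R_j^2 = r^2$, the squared correlation. With more than two predictors, $R_j^2$ would come from an auxiliary multiple regression instead. The data and function names are made up for illustration.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def tolerance_and_vif(x1, x2):
    """Two predictors: tolerance = 1 - r^2 (same for both), VIF = 1/tolerance."""
    r = correlation(x1, x2)
    tolerance = 1 - r ** 2
    if tolerance == 0:
        raise ValueError("predictors are perfectly collinear; VIF is infinite")
    return tolerance, 1 / tolerance


# Made-up predictor values, for illustration only.
age = [120, 130, 140, 150, 160, 170]
height = [52, 55, 54, 58, 60, 61]
tol, vif = tolerance_and_vif(age, height)
print(round(tol, 3), round(vif, 3))
print("serious collinearity" if vif > 10 else "VIF below 10")
```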
What are the variables that are significantly related to female life expectancy?

[SPSS output: Coefficients table for the three stepwise models (B, Std. Error, Standardized Beta, t, Sig., Tolerance, VIF). Predictors entered in order: Natural log of phones per 100 people; Births per 1000 population, 1992; Natural log of doctors per 10000. Dependent variable: Female life expectancy 1992]

Uses of regression analysis:
Description (model, system, relation): the relation between life expectancy and birth rate, GDP, ...; the relation between salary and rank, years of service, ...
Control: died too young, underpaid, overpaid, ...
Prediction: life expectancy, salary for newcomers, future salary, ...
Variable screening (important factors): significant factors for life expectancy, significant factors for salary.

Construction of regression models:
1. Hypothesize the form of the model for y in terms of $x_1, x_2, \ldots, x_k$:
   selecting the predictor variables;
   deciding the functional form of the regression equation;
   defining the scope of the model (design range).
2. Collect the sample data (observations, experiments).
3. Use the sample data to estimate the unknown parameters in the model.
4. Understand the distribution of the random error.
5. Model diagnostics, residual analysis.
6. Apply the model in decision making.
7. Review the model with new data.

What is a linear model?
Examples of linear models:
$y = \beta_0 + \beta_1 x_1 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$
$y = \beta_0 + \beta_1 \ln(x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{x} + \varepsilon$
A model is linear if it is linear in terms of the parameters $\beta_0, \beta_1, \ldots$. It may still be nonlinear in the x's.
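"Linear in the parameters" means a curved relationship such as $y = \beta_0 + \beta_1 \ln(x) + \varepsilon$ can still be fitted by ordinary least squares after transforming the predictor. This is the same idea as the ln() transformations of phones, doctors, and GDP. This is a minimal sketch on made-up data generated exactly from $y = 2 + 3\ln(x)$. The helper name `fit_simple_linear` is hypothetical.

```python
import math


def fit_simple_linear(z, y):
    """Least-squares fit of y = b0 + b1*z; returns (b0, b1)."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    b1 = szy / szz
    return y_bar - b1 * z_bar, b1


# Made-up data generated exactly from y = 2 + 3*ln(x).
x = [1.0, 2.0, 5.0, 10.0, 50.0]
y = [2 + 3 * math.log(xi) for xi in x]

# y is nonlinear in x but linear in (b0, b1): regress y on z = ln(x).
b0, b1 = fit_simple_linear([math.log(xi) for xi in x], y)
print(round(b0, 6), round(b1, 6))  # 2.0 3.0
```

By contrast, a model such as $y = \beta_0 e^{\beta_1 x} + \varepsilon$ is nonlinear in $\beta_1$. It cannot be fitted this way without changing the error structure.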