The Simple Linear Regression Model

The correlation coefficient is non-parametric and only indicates that two variables are associated with one another; it does not tell us what kind of relationship links them.

Regression models help to investigate bivariate and multivariate relationships between variables, where we can hypothesize that one variable depends on another variable or on a combination of other variables.

Relationships between variables in political science and economics are normally not exact unless they are true by definition. Most often they include a non-structural or random component, due to the probabilistic nature of theories and hypotheses in political science, measurement errors, etc.

Regression analysis makes it possible to find average relationships that may not be obvious from just "eye-balling" the data. It requires an explicit formulation of the structural and random components of a hypothesized relationship between variables.

Example: a positive relationship between unemployment and government spending.

Simple linear regression analysis

Linear relationship between x (explanatory variable) and y (dependent variable):

$$y_i = \alpha + \beta x_i + \epsilon_i$$

Epsilon describes the random component of the linear relationship between x and y.

[Figure: plot of y against x]

y_i is the value of the dependent variable (spending) in observation i (e.g. the UK).

y_i is determined by 2 components:

1. The non-random/structural component alpha + beta*x_i, where x_i is the independent/explanatory variable (unemployment) in observation i (UK), and alpha and beta are fixed quantities, the parameters of the model. Alpha is called the constant or intercept and measures the value at which the regression line crosses the y-axis. Beta is called the coefficient or slope and measures the steepness of the regression line.
2. The random component, called the disturbance or error term, epsilon_i in observation i.

A simple example:

x has 10 observations: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

The true relationship between y and x is y = 5 + 1*x; thus, the true y takes on the values: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

There is some disturbance, e.g. a measurement error, which follows a standard normal distribution. Thus the y we can measure takes on the values: 6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82. These are close to the true values, but for any given observation the observed value is a little larger or smaller than the true value.

The relationship between x and y should hold true on average, but it is not exact.

When we do our analysis, we don't know the true relationship and the true y; we just have the observed x and y.

We know that the relationship between x and y should have the following form: y = alpha + beta*x + epsilon (we hypothesize a linear relationship).

The regression analysis "estimates" the parameters alpha and beta by using the given observations for x and y.

The simplest form of estimating alpha and beta is called ordinary least squares (OLS) regression.
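As a minimal sketch of this example in Python (the variable names are illustrative and not from the slides), the true values can be rebuilt from y = 5 + 1*x and compared with the observed values listed above; the differences are the disturbances:

```python
# x and the observed y are copied from the example; y_true follows y = 5 + 1*x.
x = list(range(10))
y_true = [5 + 1 * xi for xi in x]
y_obs = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

# Disturbance = observed value minus true value, rounded to the two
# decimals in which the observed values are reported.
disturbance = [round(yo - yt, 2) for yo, yt in zip(y_obs, y_true)]
mean_disturbance = sum(disturbance) / len(disturbance)

print(disturbance)
print(round(mean_disturbance, 3))
```

The disturbances take both signs, which is why the relationship holds only on average and not for each single observation.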

OLS regression: draw a line through the scatter plot in a way that minimizes the deviations of the single observations from the line.

[Figure: scatter plot of y against x with the fitted regression line (fitted values); marked are the intercept alpha, the observations y_1 and y_7, their fitted values $\hat{y}_1$ and $\hat{y}_7$, and the residual epsilon_7]

Minimize the sum of all squared deviations from the line (squared residuals):

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \qquad \hat{\epsilon}_i = y_i - \hat{y}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$$

This is done mathematically by the statistical program at hand.

The values of the dependent variable on the line are called the predicted values of the regression ($\hat{y}$): 4.97, 6.03, 7.10, 8.16, 9.22, 10.28, 11.34, 12.41, 13.47, 14.53. These are very close to the "true values". The estimated alpha = 4.97 and beta = 1.06.

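OLS regression

Ordinary least squares regression minimizes the sum of squared residuals:

$$y_i = \alpha + \beta x_i + \epsilon_i$$

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$$

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\epsilon}_i^2 \rightarrow \min$$

Components:
Dependent variable (DV): y
At least 1 independent variable (IV): x
Constant or intercept term: alpha
Regression coefficient, slope: beta
Error term, residuals: epsilon

[Figure: component-plus-residual plot against x]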

Derivation of the OLS parameters alpha and beta

The relationship between x and y is described by the function:

$$y_i = \alpha + \beta x_i + \epsilon_i$$

The difference between the dependent variable y and the estimated systematic influence of x on y is called the residual:

$$e_i = y_i - \hat{\alpha} - \hat{\beta} x_i$$

To obtain the optimal estimates for alpha and beta we need a choice criterion; in the case of OLS this criterion is the sum of squared residuals. We calculate alpha and beta for the case in which the sum of all squared deviations (residuals) is minimal:

$$\min_{\hat{\alpha}, \hat{\beta}} S(\hat{\alpha}, \hat{\beta}) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

Squaring the residuals is necessary so that a) positive and negative deviations do not cancel each other out, and b) positive and negative estimation errors enter with the same weight: because of the squaring it is irrelevant whether the expected value for observation y_i is underestimated or overestimated.

Since the measure is additive, no single value is of utmost relevance. Especially large residuals, however, receive a stronger weight due to squaring.
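The criterion can be made concrete with a small sketch that evaluates the sum of squared residuals for a few candidate lines on the example data. The function name `ssr` and the candidate "flatter line" are illustrative assumptions, not part of the slides.

```python
# Observed data from the simple example.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]


def ssr(alpha, beta):
    """Sum of squared residuals S(alpha, beta) for the example data."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Compare the (rounded) OLS estimates with two other candidate lines.
candidates = {
    "OLS estimates (4.97, 1.06)": (4.97, 1.06),
    "true parameters (5, 1)": (5.0, 1.0),
    "flatter line (6, 0.8)": (6.0, 0.8),
}
for label, (a, b) in candidates.items():
    print(f"{label}: S = {ssr(a, b):.3f}")
```

In this sample the OLS line yields a smaller sum of squared residuals than even the true parameters, because OLS fits the observed data, disturbances included.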

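Minimizing the function requires calculating the first-order conditions with respect to alpha and beta and setting them to zero:

$$\text{I:}\quad \frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \hat{\alpha}} = -2 \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$$

$$\text{II:}\quad \frac{\partial S(\hat{\alpha}, \hat{\beta})}{\partial \hat{\beta}} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$$

This is just a linear system of two equations with two unknowns, alpha and beta, which we can solve mathematically for alpha:

$$\text{I:}\quad \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0 \;\Rightarrow\; n\hat{\alpha} = \sum_{i=1}^{n} y_i - \hat{\beta} \sum_{i=1}^{n} x_i \;\Rightarrow\; \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}$$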

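and for beta:

$$\begin{aligned}
\text{II:}\quad & \sum_{i=1}^{n} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0 \\
& \sum_{i=1}^{n} x_i (y_i - \bar{y} + \hat{\beta} \bar{x} - \hat{\beta} x_i) = 0 \\
& \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \hat{\beta} \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0 \\
& \hat{\beta} = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}
\end{aligned}$$

In matrix notation the estimator is $(X'X)^{-1} X'y$.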
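As a numerical check of this derivation, the sketch below solves the two first-order conditions as a 2x2 linear system (written as n*alpha + sum(x)*beta = sum(y) and sum(x)*alpha + sum(x^2)*beta = sum(x*y)) using Cramer's rule, then confirms that both conditions hold at the solution. It uses the data of the simple example; all variable names are illustrative.

```python
# Observed data from the simple example.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Cramer's rule for the system
#   n     * alpha + sum_x  * beta = sum_y
#   sum_x * alpha + sum_xx * beta = sum_xy
det = n * sum_xx - sum_x ** 2
alpha_hat = (sum_y * sum_xx - sum_x * sum_xy) / det
beta_hat = (n * sum_xy - sum_x * sum_y) / det

# First-order conditions I and II, without the constant factor -2.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
foc_alpha = sum(residuals)
foc_beta = sum(xi * ei for xi, ei in zip(x, residuals))

print(round(alpha_hat, 2), round(beta_hat, 2))
print(abs(foc_alpha) < 1e-9, abs(foc_beta) < 1e-9)
```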

Naturally we still have to verify whether $\hat{\alpha}$ and $\hat{\beta}$ really minimize the sum of squared residuals and satisfy the second-order conditions of the minimization problem. For this we need the second derivatives of the function with respect to alpha and beta, which are given by the so-called Hessian matrix (matrix of second derivatives). (I spare the mathematical derivation.)

The Hessian matrix has to be positive definite (for this 2x2 matrix: its first diagonal element and its determinant must both be larger than 0) so that $\hat{\alpha}$ and $\hat{\beta}$ globally minimize the sum of squared residuals. Only in this case are alpha and beta optimal estimates for the relationship between the dependent variable y and the independent variable x.
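For reference, the omitted second-order check can be sketched as follows for the sum of squared residuals S. This is the standard textbook argument under the model above, not necessarily the author's own derivation.

```latex
H =
\begin{pmatrix}
\dfrac{\partial^2 S}{\partial \hat{\alpha}^2} &
\dfrac{\partial^2 S}{\partial \hat{\alpha}\,\partial \hat{\beta}} \\[2ex]
\dfrac{\partial^2 S}{\partial \hat{\beta}\,\partial \hat{\alpha}} &
\dfrac{\partial^2 S}{\partial \hat{\beta}^2}
\end{pmatrix}
= 2
\begin{pmatrix}
n & \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2
\end{pmatrix}

\det H = 4\left( n \sum_{i=1}^{n} x_i^2 - \Big(\sum_{i=1}^{n} x_i\Big)^2 \right)
       = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2
```

Since 2n > 0 and det H > 0 whenever the x_i are not all identical, the Hessian is positive definite and the solution of the first-order conditions is the global minimum.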

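Regression coefficient:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Beta equals the covariance between y and x divided by the variance of x.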
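The covariance-over-variance reading of this formula can be checked directly on the data of the simple example. In the sketch below (illustrative names), the sample covariance and variance both use the divisor n - 1, which cancels in the ratio.

```python
# Observed data from the simple example.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance of x and y, and sample variance of x.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

beta_hat = cov_xy / var_x             # slope
alpha_hat = y_bar - beta_hat * x_bar  # intercept from condition I

print(round(beta_hat, 2), round(alpha_hat, 2))
```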

Interpretation of regression results:

reg y x

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   89.78
       Model |  1248.96129     1  1248.96129           Prob > F      =  0.0000
    Residual |   1363.2539    98  13.9107541           R-squared     =  0.4781
-------------+------------------------------           Adj R-squared =  0.4728
       Total |  2612.21519    99   26.386012           Root MSE      =  3.7297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.941914   .2049419     9.48   0.000     1.535213    2.348614
       _cons |   .8609647   .4127188     2.09   0.040     .0419377    1.679992
------------------------------------------------------------------------------

If x increases by 1 unit, y increases by 1.94 units: the interpretation is linear and straightforward.
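Several summary statistics in this output follow from the sums of squares and the coefficient table by simple arithmetic. The sketch below recomputes them from the numbers printed above; the variable names are illustrative, and the formulas are the usual textbook definitions, assumed here to match what the program reports.

```python
import math

# Values copied from the regression output.
n_obs = 100
ss_model, df_model = 1248.96129, 1
ss_resid, df_resid = 1363.2539, 98
ss_total = ss_model + ss_resid

ms_model = ss_model / df_model  # mean square of the model
ms_resid = ss_resid / df_resid  # mean square of the residuals

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

# t statistic = coefficient divided by its standard error.
coef_x, se_x = 1.941914, 0.2049419
coef_cons, se_cons = 0.8609647, 0.4127188
t_x = coef_x / se_x
t_cons = coef_cons / se_cons

print(round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_stat, 2), round(root_mse, 4))
print(round(t_x, 2), round(t_cons, 2))
```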

Interpretation: example

alpha = 4.97, beta = 1.06

Education and earnings: no education gives you a minimal hourly wage of around 5 pounds. Each additional year of education increases the hourly wage by approximately 1 pound.

[Figure: fitted values plotted against x, with the intercept alpha and the slope beta = 1.06 marked]
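Read as a prediction rule, the example amounts to the following sketch. The function name is hypothetical, and the estimates are the rounded values quoted above.

```python
# Rounded estimates from the example: intercept and slope.
ALPHA_HAT = 4.97  # predicted hourly wage (pounds) with no education
BETA_HAT = 1.06   # change in hourly wage per additional year of education


def predicted_hourly_wage(years_of_education):
    """Predicted hourly wage in pounds for a given number of years of education."""
    return ALPHA_HAT + BETA_HAT * years_of_education


for years in (0, 1, 10):
    print(years, round(predicted_hourly_wage(years), 2))
```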

Properties of the OLS estimator:

Since alpha and beta are estimates of the unknown parameters, $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ estimates the mean function or the systematic part of the regression equation. Since a random variable can be predicted best by the mean function (under the mean squared error criterion), $\hat{y}$ can be interpreted as the best prediction of y.

The difference between the dependent variable y and its least squares prediction is the least squares residual: e = y - $\hat{y}$ = y - ($\hat{\alpha}$ + $\hat{\beta}$*x).

A large residual e can be due either to a poor estimation of the parameters of the model or to a large unsystematic part of the regression equation.

For the OLS model to be the best estimator of the relationship between x and y, several conditions (full ideal conditions, Gauss-Markov conditions) have to be met.

If the "full ideal conditions" are met, one can argue that the OLS estimator imitates the properties of the unknown model of the population. This means, for example, that the explanatory variables and the error term are uncorrelated.