We will consider the linear regression model in matrix form. For simple linear regression, meaning one predictor, the model is

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad \text{for } i = 1, 2, \dots, n $$

This model includes the assumption that the $\varepsilon_i$'s are a sample from a population with mean zero and standard deviation $\sigma$. In most cases we also assume that this population is normally distributed. The multiple linear regression model is

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + \varepsilon_i \qquad \text{for } i = 1, 2, \dots, n $$

This model includes the assumption about the $\varepsilon_i$'s stated just above. This requires building up our symbols into vectors. Thus

$$ y_{n \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} $$

captures the entire dependent variable in a single symbol. The $n \times 1$ part of the notation is just a shape reminder. These get dropped once the context is clear. For simple linear regression, we will capture the independent variable through this matrix:

$$ X_{n \times 2} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} $$

The coefficient vector will be $\beta_{2 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$ and the noise vector will be $\varepsilon_{n \times 1} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$.
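To make the stacked notation concrete, here is a minimal numerical sketch (not part of the original development) that builds $y$ and $X$ with Python's numpy; the data values are invented purely for illustration:

    import numpy as np

    # Invented data: n = 4 observations of a single predictor
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    n = len(x)

    # Design matrix for simple linear regression: a column of ones
    # (for the intercept beta_0) alongside the predictor column
    X = np.column_stack([np.ones(n), x])    # shape (n, 2)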
The simple linear regression model is written then as $y = X\beta + \varepsilon$. The product part, meaning $X\beta$, is found through the usual rule for matrix multiplication as

$$ X\beta = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix} $$

We usually write the model without the shape reminders as $y = X\beta + \varepsilon$. This is a shorthand notation for

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \beta_0 + \beta_1 x_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{pmatrix} $$

It is helpful that the multiple regression story with $K$ predictors leads to the same model expression $y = X\beta + \varepsilon$ (just with different shapes). As a notational convenience, let $p = K + 1$. In the multiple regression case, we have

$$ X_{n \times p} = \begin{pmatrix} 1 & x_{11} & x_{12} & \dots & x_{1K} \\ 1 & x_{21} & x_{22} & \dots & x_{2K} \\ 1 & x_{31} & x_{32} & \dots & x_{3K} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{nK} \end{pmatrix} \qquad \text{and} \qquad \beta_{p \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix} $$

The detail shown here is to suggest that $X$ is a tall, skinny matrix. We formally require $n \ge p$. In most applications, $n$ is much, much larger than $p$. The ratio $n/p$ is often in the hundreds.
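The multiplication rule above is easy to verify numerically. A short sketch, with invented coefficient values, showing that $X\beta$ reproduces $\beta_0 + \beta_1 x_i$ row by row, and how a multiple-regression $X$ with $p = K + 1$ columns is assembled:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    X = np.column_stack([np.ones(4), x])
    beta = np.array([0.5, 1.5])              # invented beta_0, beta_1

    # X @ beta gives beta_0 + beta_1 * x_i in row i
    print(X @ beta)                          # [2.  3.5 5.  6.5]
    print(beta[0] + beta[1] * x)             # the same vector

    # For K = 2 predictors, p = 3: stack one more (invented) column
    x2 = np.array([0.3, 0.1, 0.4, 0.2])
    X_multi = np.column_stack([np.ones(4), x, x2])   # shape (4, 3)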
If it happens that $n/p$ is as small as 5, we will worry that we don't have enough data (reflected in $n$) to estimate the number of parameters in $\beta$ (reflected in $p$). The multiple regression model is now $y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \varepsilon_{n \times 1}$, and this is a shorthand for

$$ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_K x_{1K} + \varepsilon_1 \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_K x_{2K} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_K x_{nK} + \varepsilon_n \end{pmatrix} $$

The model form $y = X\beta + \varepsilon$ is thus completely general. The assumptions on the noise terms can be written as $E[\varepsilon] = 0$ and $\text{Var}[\varepsilon] = \sigma^2 I$. The $I$ here is the $n \times n$ identity matrix. That is,

$$ I = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} $$

The variance assumption can be written as

$$ \text{Var}[\varepsilon] = \begin{pmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{pmatrix} $$

You may see this expressed as $\text{Cov}(\varepsilon_i, \varepsilon_j) = \sigma^2 \delta_{ij}$, where

$$ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} $$
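The assumption $\text{Var}[\varepsilon] = \sigma^2 I$ can be made concrete by simulation. A sketch assuming $\sigma = 2$: the sample covariance matrix of many simulated noise vectors should be close to $\sigma^2 I$, roughly 4 on the diagonal and roughly 0 off it.

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma = 5, 2.0                        # assumed values for this sketch

    # 100,000 draws of a noise vector with independent N(0, sigma^2) entries
    eps = rng.normal(0.0, sigma, size=(100_000, n))

    # Sample covariance matrix: approximately sigma^2 I = 4 I
    print(np.cov(eps, rowvar=False).round(2))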
We will use $b$ as the estimate for the unknown parameter vector $\beta$. You will also find the notation $\hat{\beta}$ for this estimate. Once we get $b$, we can compute the fitted vector $\hat{y} = Xb$. This fitted vector represents an ex-post guess at the expected value of $y$. The estimate $b$ is found so that the fitted vector $\hat{y}$ is close to the actual data vector $y$. Closeness is defined in the least squares sense, meaning that we want to minimize the criterion $Q$, where

$$ Q = \sum_{i=1}^{n} \bigl( y_i - (Xb)_i \bigr)^2 \qquad \bigl( (Xb)_i \text{ denotes the } i\text{th entry of } Xb \bigr) $$

This can be done by differentiating this quantity $p = K + 1$ times, once with respect to $b_0$, once with respect to $b_1$, ..., and once with respect to $b_K$. This is routine in simple regression ($K = 1$), and it's possible with a lot of messy work in general. It happens that $Q$ is the squared length of the vector difference $y - Xb$. This means that we can write

$$ Q = (y - Xb)'(y - Xb) $$

This represents $Q$ as a $1 \times 1$ matrix, and so we can think of $Q$ as an ordinary number. There are several ways to find the $b$ that minimizes $Q$. The simple solution we'll show here (alas) requires knowing the answer and working backward. Define the $n \times n$ matrix

$$ H = X (X'X)^{-1} X' $$

We will call $H$ the hat matrix, and it has some important uses. There are several technical comments about $H$, with a numerical check after the list:

(1) Finding $H$ requires the ability to get $(X'X)^{-1}$. This $p \times p$ matrix inversion is possible if and only if $X$ has full rank $p$. Things get very interesting when $X$ almost has full rank $p$; that's a longer story for another time.

(2) The matrix $H$ is idempotent. The defining condition for idempotence is this: the matrix $C$ is idempotent $\Leftrightarrow$ $CC = C$. Only square matrices can be idempotent. Since $H$ is square (it's $n \times n$), it can be checked for idempotence. You will indeed find that $HH = H$.

(3) The $i$th diagonal entry of $H$, the one in position $(i, i)$, will be identified for later use as the $i$th leverage value. The notation is usually $h_i$, but you'll also see $h_{ii}$.
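All three comments can be seen directly on a small example. A sketch, reusing the invented data from above, that forms $H$ and checks symmetry, idempotence, and the leverages:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    X = np.column_stack([np.ones(4), x])

    # Hat matrix H = X (X'X)^{-1} X'
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    print(np.allclose(H, H.T))               # True: H is symmetric
    print(np.allclose(H @ H, H))             # True: H is idempotent

    # Leverage values h_i are the diagonal entries of H
    print(np.diag(H))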
Now write $y$ in the form $Hy + (I - H)y$, and let's develop $Q$. This will require using the fact that $H$ is symmetric, meaning $H' = H$. This will also require using the transpose of a matrix product; specifically, the property will be $(Xb)' = b'X'$.

$$ \begin{aligned} Q &= (y - Xb)'(y - Xb) \\ &= \bigl( \{Hy - Xb\} + (I - H)y \bigr)' \bigl( \{Hy - Xb\} + (I - H)y \bigr) \\ &= \{Hy - Xb\}'\{Hy - Xb\} + \{Hy - Xb\}'(I - H)y \\ &\qquad + \bigl( (I - H)y \bigr)'\{Hy - Xb\} + \bigl( (I - H)y \bigr)'(I - H)y \end{aligned} $$

The second and third summands above are zero. This is a consequence of

$$ (I - H)X = X - HX = X - X(X'X)^{-1}X'X = X - X = 0 $$

together with $H(I - H) = H - HH = 0$, which follows from idempotence. Therefore

$$ Q = \{Hy - Xb\}'\{Hy - Xb\} + \bigl( (I - H)y \bigr)'(I - H)y $$

If this is to be minimized over choices of $b$, then the minimization can only be done with regard to the first summand $\{Hy - Xb\}'\{Hy - Xb\}$, since the second summand does not involve $b$. It is possible to make the vector $Hy - Xb$ equal to $0$ by selecting $b = (X'X)^{-1}X'y$. This is very easy to see, as

$$ Hy = X(X'X)^{-1}X'y = Xb $$

This $b = (X'X)^{-1}X'y$ is known as the least squares estimate of $\beta$.
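A numerical sketch of the conclusion (invented data again): computing $b = (X'X)^{-1}X'y$ via a linear solve, which is preferred to forming the inverse explicitly, and confirming that $Hy$ equals the fitted vector $Xb$.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    X = np.column_stack([np.ones(4), x])

    # b = (X'X)^{-1} X'y, computed as the solution of (X'X) b = X'y
    b = np.linalg.solve(X.T @ X, X.T @ y)

    # Hy equals the fitted vector Xb, as the derivation promises
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    print(np.allclose(H @ y, X @ b))         # True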
For the simple linear regression case ($K = 1$), the estimate $b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}$ can be found with relative ease. The slope estimate is $b_1 = \dfrac{S_{xy}}{S_{xx}}$, where

$$ S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i x_i y_i - n\bar{x}\bar{y} \qquad \text{and} \qquad S_{xx} = \sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - n(\bar{x})^2 $$

The intercept estimate then follows as $b_0 = \bar{y} - b_1 \bar{x}$. For the multiple regression case, the calculation involves the inversion of the $p \times p$ matrix $X'X$. This task is best left to computer software. There is a computational trick, called mean-centering, that converts the problem to a simpler one of inverting a $K \times K$ matrix.

The matrix notation will allow the proof of two very helpful facts, checked numerically below the list:

* $E[b] = \beta$. This means that $b$ is an unbiased estimate of $\beta$. This is a good thing, but there are circumstances in which biased estimates will work a little bit better.

* $\text{Var}[b] = \sigma^2 (X'X)^{-1}$. This identifies the variances and covariances of the estimated coefficients. It's critical to note that the separate entries of $b$ are not statistically independent.
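Both facts, and the agreement between the $S_{xy}/S_{xx}$ formula and the matrix formula, can be checked by Monte Carlo. A sketch with assumed true values $\beta = (1, 2)'$ and $\sigma = 0.5$:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta_true, sigma = np.array([1.0, 2.0]), 0.5    # assumed true values

    # One simulated sample: the Sxy/Sxx slope matches the matrix slope
    y = X @ beta_true + rng.normal(0.0, sigma, n)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    Sxx = np.sum((x - x.mean()) ** 2)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(np.isclose(Sxy / Sxx, b[1]))              # True

    # 20,000 replications: the average b is close to beta_true (unbiasedness)
    # and the sample covariance of b is close to sigma^2 (X'X)^{-1}
    draws = np.array([
        np.linalg.solve(X.T @ X,
                        X.T @ (X @ beta_true + rng.normal(0.0, sigma, n)))
        for _ in range(20_000)])
    print(draws.mean(axis=0).round(3))              # roughly [1.0, 2.0]
    print(np.cov(draws, rowvar=False).round(4))
    print((sigma ** 2 * np.linalg.inv(X.T @ X)).round(4))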