Linear Regression and Multilinear Regression

1 Linear Regression

For this section, assume that we are given data points of the form (x_i, y_i) for i = 1, ..., N and that we desire to fit a line to these data points. So, we need to find the slope and y-intercept that make a line fit these data best. The approach used in linear regression is to minimize the sum of the squares of the differences between the data and a line; that is, find the values of a and b that minimize the sum

    R = \sum_{i=1}^{N} (a x_i + b - y_i)^2    (1)

Minimizing this sum is called least squares minimization. The minimum of the sum above will occur at a critical point. By the first derivative test from calculus, we can find critical points by solving the system

    \partial R / \partial a = 0,  \qquad  \partial R / \partial b = 0    (2)

Specifically, finding the partial derivatives and rewriting gives

    a \sum x_i^2 + b \sum x_i = \sum x_i y_i
    a \sum x_i + N b = \sum y_i    (3)

Since the values \sum x_i, \sum x_i^2, \sum x_i y_i, \sum y_i can be computed from the data, the above is a system of 2 equations in 2 unknowns. These equations are known as the normal equations.

For example, consider performing linear regression on the data points (1, 10.1), (2, 10.4), (3, 10.9), (4, 10.8), (5, 11.0). Then, the normal equations in Eq. (3) become

    55 a + 15 b = 161.8
    15 a + 5 b = 53.2

Solving this system gives a = 0.22, b = 9.98. So, the regression line (the "line of best fit") for the above data is y = 0.22 x + 9.98.
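To make the example concrete, here is a small Python sketch of the same computation. NumPy and the variable names (xs, ys, coeffs, rhs) are my additions, not part of the notes; the code simply forms the sums appearing in Eq. (3) and solves the resulting 2-by-2 system.

    import numpy as np

    # Example data from the text: (1, 10.1), (2, 10.4), (3, 10.9), (4, 10.8), (5, 11.0)
    xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    ys = np.array([10.1, 10.4, 10.9, 10.8, 11.0])
    N = len(xs)

    # Coefficient matrix and right-hand side of the normal equations, Eq. (3)
    coeffs = np.array([[np.sum(xs**2), np.sum(xs)],
                       [np.sum(xs),    N]])
    rhs = np.array([np.sum(xs * ys), np.sum(ys)])

    a, b = np.linalg.solve(coeffs, rhs)
    print(a, b)  # a = 0.22, b = 9.98, matching the worked example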
The normal equations in Eq. (3) can be solved to obtain general expressions for a and b. These are the nasty formulas that are often taught in traditional, elementary statistics courses:

    a = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - (\sum x_i)^2}, \qquad b = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{N \sum x_i^2 - (\sum x_i)^2}

Using matrices makes dealing with the normal equations a little easier. For example, we can write the normal equations in Eq. (3) as

    \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & N \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix}

To solve this system, we must invert the 2-by-2 coefficient matrix and multiply both sides by that inverse to get

    \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & N \end{pmatrix}^{-1} \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix}

So, the real work in solving this system is in inverting (if possible) the coefficient matrix. Most applications of linear regression involve a large number of data points, and hence, a simple way of computing the coefficient matrix is useful. The standard approach is to create the coefficient matrix from another matrix A which is defined using the data as follows:

    A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{pmatrix}

Then, we can write

    A^t A = \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & N \end{pmatrix}, \qquad A^t Y = \begin{pmatrix} \sum x_i y_i \\ \sum y_i \end{pmatrix}

(Be sure to check by hand that this is true.) Thus, the normal equations can be written

    A^t A W = A^t Y    (4)

with

    W = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}
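As a sketch of this matrix formulation (again using NumPy, with variable names of my own choosing), the code below builds A row by row from the example data, forms A^t A and A^t Y, which contain exactly the sums from Eq. (3), and solves Eq. (4) numerically; the general closed-form solution is derived next in the text.

    import numpy as np

    xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    ys = np.array([10.1, 10.4, 10.9, 10.8, 11.0])

    # The matrix A has one row [x_i, 1] per data point; Y is the column of y values
    A = np.column_stack([xs, np.ones_like(xs)])
    Y = ys

    AtA = A.T @ A   # equals [[sum x_i^2, sum x_i], [sum x_i, N]]
    AtY = A.T @ Y   # equals [sum x_i y_i, sum y_i]

    # Solve the normal equations A^t A W = A^t Y for W = [a, b]
    W = np.linalg.solve(AtA, AtY)
    print(W)        # [0.22, 9.98], the same regression line as before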
The solution to this matrix equation is then

    W = (A^t A)^{-1} A^t Y,    (5)

if (A^t A)^{-1} exists.

Although the definition of A appears to be arbitrarily chosen, there is a good reason behind it. Remember that for linear regression, we want to find a and b so that we can reproduce y_i from x_i for each data point. Specifically, we want

    a x_1 + b = y_1
    a x_2 + b = y_2
    a x_3 + b = y_3
    ...
    a x_N + b = y_N

This system of equations can also be written, using our definitions of A, W, and Y, as

    A W = Y

To solve this system, we would like to multiply both sides by A^{-1}. BUT, A is not a square matrix and cannot be inverted. So, we need to rewrite the system so that a square matrix is involved. To do this, multiply both sides of the matrix equation by A^t to obtain the matrix version of the normal equations,

    A^t A W = A^t Y

(In general the system A W = Y has no exact solution, since there are N equations and only two unknowns; multiplying by A^t produces the system whose solution is the least squares fit.) So, the definition of A makes sense and works!

2 Multilinear Regression

With multilinear regression, we assume that the dependent data, y_i, depend linearly on several independent variables, x_1, x_2, ..., x_k. For the purposes of this discussion, assume that the given data depend on only two independent variables. So, data points are of the form

    (x_{11}, x_{21}, y_1), (x_{12}, x_{22}, y_2), (x_{13}, x_{23}, y_3), ..., (x_{1N}, x_{2N}, y_N)

The goal is to minimize the sum

    R = \sum_{i=1}^{N} (a_1 x_{1i} + a_2 x_{2i} + b - y_i)^2    (6)
Ideally, we want to find a_1, a_2, and b so that

    a_1 x_{11} + a_2 x_{21} + b = y_1
    a_1 x_{12} + a_2 x_{22} + b = y_2
    a_1 x_{13} + a_2 x_{23} + b = y_3
    ...
    a_1 x_{1N} + a_2 x_{2N} + b = y_N

Rewrite this system as

    A W = Y,

where

    A = \begin{pmatrix} x_{11} & x_{21} & 1 \\ x_{12} & x_{22} & 1 \\ x_{13} & x_{23} & 1 \\ \vdots & \vdots & \vdots \\ x_{1N} & x_{2N} & 1 \end{pmatrix}, \qquad W = \begin{pmatrix} a_1 \\ a_2 \\ b \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}

To solve this system, we would like to multiply both sides by A^{-1}. BUT, A is not a square matrix and cannot be inverted. So, we need to rewrite the system so that a square matrix is involved. To do this, multiply both sides of the matrix equation by A^t to obtain

    A^t A W = A^t Y

Note that this equation is the same as Eq. (4); the difference lies only in how the matrix A is created. Indeed, if you take the partial derivatives \partial R / \partial a_1, \partial R / \partial a_2, and \partial R / \partial b as we did in linear regression, you will find that the equation above represents the normal equations for multilinear regression.

Now, A^t A is a square matrix, and we can multiply both sides of our system by its inverse, if it exists. Thus, the solution for multilinear regression is

    W = (A^t A)^{-1} A^t Y,    (7)

if (A^t A)^{-1} exists. The fact that the matrix equations for linear and multilinear regression look the same makes this matrix approach very appealing. Also, there are many numerical methods for computing (A^t A)^{-1} accurately.

Finally, consider the data set below as an example:

    (0, 0.30, 10.14), (0.69, 0.60, 11.93), (1.10, 0.90, 13.57),
    (1.39, 1.20, 14.17), (1.61, 1.50, 15.25), (1.79, 1.80, 16.15)
Then,

    A = \begin{pmatrix} 0 & 0.30 & 1 \\ 0.69 & 0.60 & 1 \\ 1.10 & 0.90 & 1 \\ 1.39 & 1.20 & 1 \\ 1.61 & 1.50 & 1 \\ 1.79 & 1.80 & 1 \end{pmatrix}, \qquad Y = \begin{pmatrix} 10.14 \\ 11.93 \\ 13.57 \\ 14.17 \\ 15.25 \\ 16.15 \end{pmatrix}

and so

    A^t A = \begin{pmatrix} 9.41 & 8.71 & 6.58 \\ 8.71 & 8.19 & 6.30 \\ 6.58 & 6.30 & 6.00 \end{pmatrix}

Then,

    W = \begin{pmatrix} a_1 \\ a_2 \\ b \end{pmatrix} = (A^t A)^{-1} A^t Y = \begin{pmatrix} 2.09 \\ 1.50 \\ 9.69 \end{pmatrix}

Thus, the plane that best fits the data is y = 2.09 x_1 + 1.50 x_2 + 9.69.
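The following sketch (NumPy again, with names of my own choosing) reproduces this multilinear example. It solves the normal equations as in Eq. (7) and also calls np.linalg.lstsq, which computes the least squares solution without explicitly forming (A^t A)^{-1} and is the usual choice in practice. Because the data above are rounded to two decimals, the computed coefficients land close to, but not exactly at, the values 2.09, 1.50, 9.69 reported in the text.

    import numpy as np

    # Data points (x1, x2, y) from the multilinear example
    data = np.array([
        [0.00, 0.30, 10.14],
        [0.69, 0.60, 11.93],
        [1.10, 0.90, 13.57],
        [1.39, 1.20, 14.17],
        [1.61, 1.50, 15.25],
        [1.79, 1.80, 16.15],
    ])
    x1, x2, y = data[:, 0], data[:, 1], data[:, 2]

    # A has one row [x1_i, x2_i, 1] per data point
    A = np.column_stack([x1, x2, np.ones_like(x1)])
    Y = y

    # Normal-equations solution, Eq. (7)
    W_normal = np.linalg.solve(A.T @ A, A.T @ Y)

    # Least squares solution that avoids forming (A^t A)^{-1} explicitly
    W_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)

    print(W_normal)  # approximately [2.08, 1.49, 9.69] from the rounded data
    print(W_lstsq)   # same coefficients, computed by a more stable method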