Limitations of regression analysis
Ragnar Nymoen
Department of Economics, UiO
8 February 2009
Overview
What are the limitations to regression?
Simultaneous equations bias.
Measurement errors in explanatory variables.
In both cases the explanatory variable is not exogenous in the econometric sense.
Main references: G Ch 15.1 and 15.2; B Ch 8.1, 10.1 and 10.2; K Ch 9.3 and 10.2.
What are the limitations to regression analysis?
It is not linearity in variables, as we have seen. It is not linearity in parameters either, although we have only covered the linear regression model here. Remember that by first estimating the linear model we can use the results to estimate parameters that are non-linear functions of the estimated model's parameters (the delta method, or its equivalent in the Bårdsen method). If the model is non-linear in the parameters from the outset, we can use Non-Linear Least Squares to fit the best non-linear curve to the data (Greene Ch 11, not in the syllabus for this course). Nor is regression confined to a single equation, as we have seen with the SURE estimator. The real limitation of the regression model arises when the regression function does not contain the parameter of interest.
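As a small illustration of the delta method mentioned above, the Python sketch below takes an estimate of $b_2$ and its standard error and returns the Keynesian multiplier $g(b_2) = 1/(1-b_2)$ together with its approximate delta-method standard error, $|g'(\hat b_2)|\,\mathrm{se}(\hat b_2)$. The function name and the numbers ($\hat b_2 = 0.75$, standard error 0.05) are hypothetical, and the sketch presumes that the estimate of $b_2$ is itself consistent, which, as the rest of this lecture shows, OLS does not guarantee.

```python
def multiplier_with_delta_se(b2_hat: float, se_b2: float) -> tuple[float, float]:
    """Estimate g(b2) = 1/(1 - b2) and its delta-method standard error.

    Delta method: se(g(b2_hat)) is approximated by |g'(b2_hat)| * se(b2_hat),
    where g'(b2) = 1/(1 - b2)**2 for this particular g.
    """
    if not 0.0 < b2_hat < 1.0:
        raise ValueError("expects 0 < b2_hat < 1")
    g = 1.0 / (1.0 - b2_hat)
    g_prime = 1.0 / (1.0 - b2_hat) ** 2
    return g, abs(g_prime) * se_b2


# Hypothetical estimate and standard error, for illustration only.
mult, se_mult = multiplier_with_delta_se(0.75, 0.05)
print(f"multiplier = {mult:.2f}, delta-method s.e. = {se_mult:.2f}")
```

With these inputs the multiplier is 4 and its approximate standard error is 0.8: the uncertainty in $\hat b_2$ is scaled up by $g'(\hat b_2) = 16$.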
A simple Keynes model
Let $Y_t$ denote GDP in period $t = 1, 2, \ldots, T$. $C_t$ is endogenous expenditure and $X_t$ denotes exogenous expenditure. Assume that $C_t$ depends on GDP; then our example model is
$$Y_t = C_t + X_t \quad (1)$$
$$C_t = b_1 + b_2 Y_t + \varepsilon_t, \quad 0 < b_2 < 1 \quad (2)$$
$\varepsilon_t$ is a random disturbance term. We assume that it is white noise, uncorrelated with $X_t$. For simplicity we assume normality, $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$. The parameter of interest is the marginal propensity to consume, $b_2$.
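The reduced form of the model
(1) and (2) define a simultaneous equations model. Solving for the two endogenous variables gives
$$Y_t = \pi_{11} + \pi_{12} X_t + \varepsilon_{1t} \quad (3)$$
$$C_t = \pi_{21} + \pi_{22} X_t + \varepsilon_{2t} \quad (4)$$
where
$$\pi_{11} = \frac{b_1}{1-b_2}, \quad \pi_{12} = \frac{1}{1-b_2}, \quad \varepsilon_{1t} = \frac{1}{1-b_2}\varepsilon_t,$$
$$\pi_{21} = \frac{b_1}{1-b_2}, \quad \pi_{22} = \frac{b_2}{1-b_2}, \quad \varepsilon_{2t} = \frac{1}{1-b_2}\varepsilon_t.$$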
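The reduced-form coefficients can be checked mechanically. The sketch below (Python, standard library only) computes them from hypothetical structural values $b_1 = 10$ and $b_2 = 0.75$, then plugs the reduced form back into (1) and (2) for an arbitrary $X_t$ and $\varepsilon_t$ to confirm that both structural equations hold. The function and dictionary key names are illustrative assumptions, not taken from the slides.

```python
def reduced_form(b1: float, b2: float) -> dict[str, float]:
    """Reduced-form coefficients of Y_t = C_t + X_t and C_t = b1 + b2*Y_t + eps_t."""
    if not 0.0 < b2 < 1.0:
        raise ValueError("the model assumes 0 < b2 < 1")
    k = 1.0 / (1.0 - b2)  # the multiplier 1/(1 - b2)
    return {
        "pi11": b1 * k, "pi12": k,       # Y_t equation (3)
        "pi21": b1 * k, "pi22": b2 * k,  # C_t equation (4)
        "scale": k,                      # eps_1t = eps_2t = scale * eps_t
    }


# Hypothetical structural values and one arbitrary draw of (X_t, eps_t).
b1, b2 = 10.0, 0.75
rf = reduced_form(b1, b2)
x_t, eps_t = 3.0, 0.5
y_t = rf["pi11"] + rf["pi12"] * x_t + rf["scale"] * eps_t
c_t = rf["pi21"] + rf["pi22"] * x_t + rf["scale"] * eps_t

# Both structural equations should hold exactly.
print(y_t - (c_t + x_t))              # residual of identity (1)
print(c_t - (b1 + b2 * y_t + eps_t))  # residual of consumption function (2)
```

Note that $\pi_{22} = \pi_{12} - 1$, which is just the identity (1) carried over to the reduced form.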
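The distribution of Y and C
The reduced form written more compactly is
$$Y_t = \mu_{yt} + \varepsilon_{1t} \quad (5)$$
$$C_t = \mu_{ct} + \varepsilon_{2t} \quad (6)$$
where
$$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \Bigg|\, X_t \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_y^2 & \sigma_{cy} \\ \sigma_{cy} & \sigma_c^2 \end{pmatrix} \right). \quad (7)$$
That is, conditional on $X_t$, the stochastic variables $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are bivariate normal with zero expectations and the variance matrix shown in (7).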
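Conditional distribution of C
It follows that $Y_t$ and $C_t$ are normally distributed with the same covariance matrix as $(\varepsilon_{1t}, \varepsilon_{2t})'$ and expectations $\mu_{yt} = \pi_{11} + \pi_{12} X_t$ and $\mu_{ct} = \pi_{21} + \pi_{22} X_t$. It also follows (Lecture 1) that the conditional distribution of $C_t$ given $Y_t$ (and $X_t$) is normal with conditional expectation
$$E[C_t \mid Y_t, X_t] = \mu_{ct} - \frac{\sigma_{cy}}{\sigma_y^2}\mu_{yt} + \frac{\sigma_{cy}}{\sigma_y^2} Y_t \quad (8)$$
$$= \pi_{21} + \pi_{22} X_t - \frac{\sigma_{cy}}{\sigma_y^2}(\pi_{11} + \pi_{12} X_t) + \frac{\sigma_{cy}}{\sigma_y^2} Y_t$$
$$= \left(\pi_{21} - \frac{\sigma_{cy}}{\sigma_y^2}\pi_{11}\right) + \left(\pi_{22} - \frac{\sigma_{cy}}{\sigma_y^2}\pi_{12}\right) X_t + \frac{\sigma_{cy}}{\sigma_y^2} Y_t$$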
We see that
The macro model implies (8) as the conditional expectation of $C_t$. It is the valid regression model of $C_t$ on $Y_t$ (and $X_t$), and it can be estimated with full efficiency by OLS. But it will not deliver an estimate of the marginal propensity to consume, $b_2$! In sum: the regression function implied by (1) and (2) is (8), not the regression of $C_t$ on $Y_t$ and a constant. And the regression function (8) is not helpful for estimating the parameter of interest $b_2$. In fact, since $\varepsilon_{1t} = \varepsilon_{2t}$, we have $\sigma_{cy} = \sigma_y^2 = \sigma_\varepsilon^2/(1-b_2)^2$, so $\sigma_{cy}/\sigma_y^2 = 1$ and (8) reduces to $Y_t - X_t$: in this special case it estimates the identity (1).
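A quick numerical check of this special case: the hypothetical helper below builds the reduced-form coefficients and the (degenerate) covariance matrix for the Keynes model and returns the three coefficients of (8). The parameter values are illustrative assumptions.

```python
def conditional_mean_coefs(b1: float, b2: float, sigma2_eps: float) -> tuple[float, float, float]:
    """Coefficients (constant, on X_t, on Y_t) of E[C_t | Y_t, X_t] in (8) for the Keynes model."""
    k = 1.0 / (1.0 - b2)
    pi11, pi12, pi21, pi22 = b1 * k, k, b1 * k, b2 * k
    # eps_1t = eps_2t = k * eps_t, so sigma_y^2 = sigma_cy = k^2 * sigma_eps^2.
    s_yy = s_cy = k ** 2 * sigma2_eps
    g = s_cy / s_yy  # sigma_cy / sigma_y^2
    return pi21 - g * pi11, pi22 - g * pi12, g


const, coef_x, coef_y = conditional_mean_coefs(b1=10.0, b2=0.75, sigma2_eps=2.0)
print(const, coef_x, coef_y)  # 0.0 -1.0 1.0, i.e. E[C_t | Y_t, X_t] = Y_t - X_t
```

Changing `b1`, `b2` or `sigma2_eps` leaves the result unchanged (up to rounding): (8) is always the identity, so it carries no information about $b_2$.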
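Simultaneity bias in the macro model example
Suppose we estimate the consumption function by OLS regardless. We will estimate some parameter. What is it?
$$\hat b_2 = \frac{\sum C_t (Y_t - \bar Y)}{\sum (Y_t - \bar Y)^2} = \frac{\sum C_t (Y_t - \bar Y)}{\sum Y_t (Y_t - \bar Y)}$$
where $\bar Y = (1/T)\sum Y_t$. Substituting (2) for $C_t$:
$$\hat b_2 = \frac{1}{\sum Y_t (Y_t - \bar Y)} \sum_t \{b_1 + b_2 Y_t + \varepsilon_t\}(Y_t - \bar Y) = b_2 + \frac{\sum \varepsilon_t (Y_t - \bar Y)}{\sum (Y_t - \bar Y)^2} \quad (9)$$
We must evaluate the term $\sum \varepsilon_t (Y_t - \bar Y) / \sum (Y_t - \bar Y)^2$ in the light of the model.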
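Equation (9) is an exact algebraic identity in any sample, before any probability limits are taken. The sketch below simulates the model with Python's `random` module and checks that the OLS slope equals $b_2$ plus the ratio in (9). Normal $X_t$ and $\varepsilon_t$, the helper name `simulate_keynes`, and all parameter values are assumptions made only for illustration.

```python
import random


def simulate_keynes(T, b1, b2, sigma_eps, mean_x, sd_x, seed=0):
    """Draw X_t, eps_t, Y_t, C_t from (1)-(2), using the reduced form for Y_t."""
    rng = random.Random(seed)
    X = [rng.gauss(mean_x, sd_x) for _ in range(T)]
    eps = [rng.gauss(0.0, sigma_eps) for _ in range(T)]
    Y = [(b1 + x + e) / (1.0 - b2) for x, e in zip(X, eps)]  # reduced form (3)
    C = [y - x for y, x in zip(Y, X)]                          # identity (1)
    return X, eps, Y, C


b1, b2 = 10.0, 0.75
X, eps, Y, C = simulate_keynes(T=200, b1=b1, b2=b2, sigma_eps=2.0,
                               mean_x=50.0, sd_x=2.0, seed=1)

y_bar = sum(Y) / len(Y)
dev = [y - y_bar for y in Y]
ssd = sum(d * d for d in dev)
b2_hat = sum(c * d for c, d in zip(C, dev)) / ssd                # OLS slope
decomposition = b2 + sum(e * d for e, d in zip(eps, dev)) / ssd  # right-hand side of (9)
print(b2_hat, decomposition)
```

The two printed numbers agree up to rounding, and with these values they should land noticeably above $b_2 = 0.75$, anticipating the bias derived next.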
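Since $C_t$ depends on $Y_t$, and $Y_t$ in turn depends on the consumption shocks $\varepsilon_t$ (through the identity (1); see the reduced form (3)), $\varepsilon_t$ and $Y_t$ are correlated. This correlation does not go away as $T$ grows. Using the reduced-form expression for $Y_t$, the denominator can be written as
$$\frac{1}{T}\sum (Y_t - \bar Y)^2 = \frac{1}{T}\sum \left[\pi_{12}(X_t - \bar X) + (\varepsilon_{1t} - \bar\varepsilon_1)\right]^2$$
Taking probability limits:
$$\operatorname{plim}\frac{1}{T}\sum (Y_t - \bar Y)^2 = \pi_{12}^2 \operatorname{plim}\frac{1}{T}\sum (X_t - \bar X)^2 + 2\pi_{12}\operatorname{plim}\frac{1}{T}\sum (X_t - \bar X)(\varepsilon_{1t} - \bar\varepsilon_1) + \operatorname{plim}\frac{1}{T}\sum (\varepsilon_{1t} - \bar\varepsilon_1)^2 = \pi_{12}^2 \operatorname{Var}(X_t) + \sigma_y^2,$$
where the middle term vanishes because $X_t$ is uncorrelated with $\varepsilon_t$, and hence with $\varepsilon_{1t}$.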
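The probability limit of the denominator can be checked by brute force: with a large $T$, the mean squared deviation of $Y_t$ should be close to $\pi_{12}^2 \operatorname{Var}(X_t) + \sigma_y^2$, and the cross term close to zero. The sketch below assumes normal $X_t$ with variance 4, $\sigma_\varepsilon = 2$, $b_1 = 10$ and $b_2 = 0.75$ (all hypothetical), so the target is $16 \cdot 4 + 64 = 128$.

```python
import random


def mean_cross_dev(a, b):
    """(1/T) * sum of (a_t - mean(a)) * (b_t - mean(b))."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


# Hypothetical parameter values; X_t is drawn as normal purely for illustration.
b1, b2, sigma_eps, sd_x, T = 10.0, 0.75, 2.0, 2.0, 200_000
rng = random.Random(2)
X = [rng.gauss(50.0, sd_x) for _ in range(T)]
eps1 = [rng.gauss(0.0, sigma_eps) / (1.0 - b2) for _ in range(T)]  # eps_1t = eps_t/(1 - b2)
Y = [(b1 + x) / (1.0 - b2) + e1 for x, e1 in zip(X, eps1)]         # reduced form (3)

pi12 = 1.0 / (1.0 - b2)
sigma2_y = sigma_eps ** 2 / (1.0 - b2) ** 2
denominator = mean_cross_dev(Y, Y)    # (1/T) sum (Y_t - Ybar)^2
cross_term = mean_cross_dev(X, eps1)  # middle term, should be near 0
plim_value = pi12 ** 2 * sd_x ** 2 + sigma2_y
print(denominator, plim_value, cross_term)
```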
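$$\operatorname{plim}(\hat b_2 - b_2) = \frac{\operatorname{plim}\frac{1}{T}\sum \varepsilon_t (Y_t - \bar Y)}{\operatorname{plim}\frac{1}{T}\sum (Y_t - \bar Y)^2} = \frac{\operatorname{Cov}(\varepsilon_t, Y_t)}{\pi_{12}^2 \operatorname{Var}(X_t) + \sigma_y^2}$$
From the reduced form we also have
$$\operatorname{Cov}(\varepsilon_t, Y_t) = E[\varepsilon_t \varepsilon_{1t}] = E\left[\varepsilon_t \frac{1}{1-b_2}\varepsilon_t\right] = \frac{1}{1-b_2}\operatorname{Var}(\varepsilon_t)$$
so that, using $\pi_{12} = 1/(1-b_2)$ and $\sigma_y^2 = \sigma_\varepsilon^2/(1-b_2)^2$,
$$\operatorname{plim}(\hat b_2 - b_2) = \frac{\sigma_\varepsilon^2/(1-b_2)}{\frac{1}{(1-b_2)^2}\left(\operatorname{Var}(X_t) + \sigma_\varepsilon^2\right)}$$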
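The inconsistency of OLS
$$\operatorname{plim}(\hat b_2 - b_2) = \frac{\sigma_\varepsilon^2/(1-b_2)}{\frac{1}{(1-b_2)^2}\left(\operatorname{Var}(X_t) + \sigma_\varepsilon^2\right)} = \frac{(1-b_2)\,\sigma_\varepsilon^2}{\operatorname{Var}(X_t) + \sigma_\varepsilon^2} = \frac{1-b_2}{1 + \operatorname{Var}(X_t)/\sigma_\varepsilon^2}$$
The bias is positive, since $0 < b_2 < 1$. A large variance of $X_t$ relative to the variance of $\varepsilon_t$ reduces the bias, but it does not remove it. The reason is that OLS assumes the wrong model for $C_t$, one with $\operatorname{Cov}(Y_t, \varepsilon_t) = 0$, which does not hold here.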
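The sketch below codes the inconsistency formula and compares it with OLS on one large simulated sample (hypothetical values $b_2 = 0.75$ and $\operatorname{Var}(X_t) = \sigma_\varepsilon^2 = 4$, so the predicted probability limit of $\hat b_2$ is $0.875$). The final loop shows the bias shrinking, but staying positive, as $\operatorname{Var}(X_t)/\sigma_\varepsilon^2$ grows. The normal draws and the helper names are assumptions for illustration.

```python
import random


def simultaneity_bias(b2: float, var_x: float, sigma2_eps: float) -> float:
    """plim(b2_hat) - b2 = (1 - b2) / (1 + Var(X_t)/sigma_eps^2) in the Keynes model."""
    return (1.0 - b2) / (1.0 + var_x / sigma2_eps)


def ols_slope(dep, reg):
    """OLS slope from regressing dep on reg and a constant."""
    mr, md = sum(reg) / len(reg), sum(dep) / len(dep)
    s_rd = sum((a - mr) * (b - md) for a, b in zip(reg, dep))
    s_rr = sum((a - mr) ** 2 for a in reg)
    return s_rd / s_rr


b1, b2, sigma_eps, sd_x, T = 10.0, 0.75, 2.0, 2.0, 50_000
rng = random.Random(3)
X = [rng.gauss(50.0, sd_x) for _ in range(T)]
Y = [(b1 + x + rng.gauss(0.0, sigma_eps)) / (1.0 - b2) for x in X]  # reduced form (3)
C = [y - x for y, x in zip(Y, X)]                                   # identity (1)

b2_hat = ols_slope(C, Y)
theory = b2 + simultaneity_bias(b2, sd_x ** 2, sigma_eps ** 2)
print(f"OLS: {b2_hat:.4f}  b2 + bias: {theory:.4f}  true b2: {b2}")

# A larger Var(X_t) relative to sigma_eps^2 shrinks the bias, but it stays positive.
for var_x in (4.0, 40.0, 400.0):
    print(var_x, simultaneity_bias(b2, var_x, sigma_eps ** 2))
```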
Example with an expectations variable
Assume the simple regression model (in Greene's notation again):
$$y_i = \beta_1 + \beta_2 x_i^* + \varepsilon_i, \quad i = 1, 2, \ldots, n \quad (10)$$
with all the classical assumptions holding. If $x_i^*$ is an expectations variable that we as econometricians cannot observe, or cannot measure without error, we can still try to estimate $\beta_1$ and $\beta_2$ using the observable (actual) $x_i$. We then need to make assumptions about the properties of the difference
$$u_i = x_i - x_i^*. \quad (11)$$
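Assumptions:
$u_i$ is random, with zero mean and variance $\sigma_u^2$;
$\operatorname{Cov}(u_i, \varepsilon_i) = 0$;
$\operatorname{Cov}(u_i, x_i^*) = 0$;
both $u_i$ and $\varepsilon_i$ have the classical properties.
The model that we estimate becomes
$$y_i = \beta_1 + \beta_2 x_i + \nu_i \quad (12)$$
$$\nu_i = \varepsilon_i - \beta_2 u_i \quad (13)$$
But
$$E[x_i \nu_i] = E[(x_i^* + u_i)(\varepsilon_i - \beta_2 u_i)] = -\beta_2 \sigma_u^2 \neq 0.$$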
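A simulation makes the sign of $E[x_i \nu_i]$ concrete. The sketch assumes, purely for illustration, normal $x_i^*$, $u_i$ and $\varepsilon_i$ with $\operatorname{Var}(x_i^*) = 4$, $\sigma_u^2 = \sigma_\varepsilon^2 = 1$ and $\beta_2 = 2$, so the sample moment should be near $-\beta_2\sigma_u^2 = -2$.

```python
import random

# Hypothetical values chosen only for illustration.
beta2 = 2.0
var_xstar, sigma2_u, sigma2_eps = 4.0, 1.0, 1.0
n = 200_000
rng = random.Random(4)

xstar = [rng.gauss(0.0, var_xstar ** 0.5) for _ in range(n)]  # unobserved x_i^*
u = [rng.gauss(0.0, sigma2_u ** 0.5) for _ in range(n)]        # measurement error u_i
eps = [rng.gauss(0.0, sigma2_eps ** 0.5) for _ in range(n)]    # disturbance eps_i

x = [xs + ui for xs, ui in zip(xstar, u)]                      # observed x_i, from (11)
nu = [e - beta2 * ui for e, ui in zip(eps, u)]                 # composite disturbance (13)

sample_moment = sum(xi * vi for xi, vi in zip(x, nu)) / n
print(sample_moment, -beta2 * sigma2_u)
```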
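OLS gives
$$\hat\beta_2 = \beta_2 + \frac{\sum_i \nu_i (x_i - \bar x)}{\sum_i (x_i - \bar x)^2}$$
and we have
$$\operatorname{plim}(\hat\beta_2 - \beta_2) = \frac{\operatorname{plim}\frac{1}{n}\sum_i \nu_i (x_i - \bar x)}{\operatorname{plim}\frac{1}{n}\sum_i (x_i - \bar x)^2}.$$
We already have that $-\beta_2\sigma_u^2$ goes into the numerator. The denominator takes more work (as in the simultaneous equations case), but since $x_i = x_i^* + u_i$ with $\operatorname{Cov}(u_i, x_i^*) = 0$, it boils down to the sum of the variances of $x_i^*$ and $u_i$. Hence
$$\operatorname{plim}(\hat\beta_2 - \beta_2) = \frac{-\beta_2 \sigma_u^2}{\operatorname{Var}(x_i^*) + \sigma_u^2}.$$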
$$\operatorname{plim}\hat\beta_2 = \frac{\beta_2}{1 + \sigma_u^2/\operatorname{Var}(x_i^*)} < \beta_2 \quad \text{if } \beta_2 > 0.$$
It can be shown that the inverse regression, $x_i$ on $y_i$, gives an overestimate of $\beta_2$ (through the reciprocal of its slope), so the two OLS regressions bracket the true parameter. Measurement errors in $y_i$: no bias problem, but potential for heteroscedasticity. Solution to both classes of bias problems exemplified here: replace OLS with other estimators, IV and 2SLS, as we shall see.
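The bounding result can be illustrated by simulation. Under the assumptions above, a short derivation gives the probability limit of the reciprocal of the slope from regressing $x_i$ on $y_i$ as $\beta_2 + \sigma_\varepsilon^2/(\beta_2 \operatorname{Var}(x_i^*))$, since $\operatorname{Cov}(x_i, y_i) = \beta_2 \operatorname{Var}(x_i^*)$ and $\operatorname{Var}(y_i) = \beta_2^2 \operatorname{Var}(x_i^*) + \sigma_\varepsilon^2$. The sketch uses hypothetical values $\beta_1 = 1$, $\beta_2 = 2$, $\operatorname{Var}(x_i^*) = 4$, $\sigma_u^2 = \sigma_\varepsilon^2 = 1$ and normal draws, giving the bracket $1.6 < 2 < 2.125$.

```python
import random


def ols_slope(dep, reg):
    """OLS slope from regressing dep on reg and a constant."""
    mr, md = sum(reg) / len(reg), sum(dep) / len(dep)
    s_rd = sum((a - mr) * (b - md) for a, b in zip(reg, dep))
    s_rr = sum((a - mr) ** 2 for a in reg)
    return s_rd / s_rr


# Hypothetical values chosen only for illustration.
beta1, beta2 = 1.0, 2.0
var_xstar, sigma2_u, sigma2_eps = 4.0, 1.0, 1.0
n = 200_000
rng = random.Random(5)

xstar = [rng.gauss(0.0, var_xstar ** 0.5) for _ in range(n)]                      # x_i^*
y = [beta1 + beta2 * xs + rng.gauss(0.0, sigma2_eps ** 0.5) for xs in xstar]     # model (10)
x = [xs + rng.gauss(0.0, sigma2_u ** 0.5) for xs in xstar]                        # x_i = x_i^* + u_i

lower = ols_slope(y, x)        # y on x: attenuated towards zero
upper = 1.0 / ols_slope(x, y)  # reciprocal of the slope from x on y
attenuated = beta2 / (1.0 + sigma2_u / var_xstar)
reverse = beta2 + sigma2_eps / (beta2 * var_xstar)
print(lower, beta2, upper)
print(attenuated, reverse)
```

In this setup the lower bound moves with $\sigma_u^2$, while the upper bound depends on $\sigma_\varepsilon^2$ instead; neither regression alone recovers $\beta_2$.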