# Instrumental variables and panel data methods in economics and finance

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Instrumental variables and panel data methods in economics and finance Christopher F Baum Boston College and DIW Berlin February 2009 Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

2 Instrumental variables estimators Regression with Instrumental Variables What are instrumental variables (IV) methods? Most widely known as a solution to endogenous regressors: explanatory variables correlated with the regression error term, IV methods provide a way to nonetheless obtain consistent parameter estimates. Although IV estimators address issues of endogeneity, the violation of the zero conditional mean assumption caused by endogenous regressors can also arise for two other common causes: measurement error in regressors (errors-in-variables) and omitted-variable bias. The latter may arise in situations where a variable known to be relevant for the data generating process is not measurable, and no good proxies can be found. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

3 Instrumental variables estimators Regression with Instrumental Variables What are instrumental variables (IV) methods? Most widely known as a solution to endogenous regressors: explanatory variables correlated with the regression error term, IV methods provide a way to nonetheless obtain consistent parameter estimates. Although IV estimators address issues of endogeneity, the violation of the zero conditional mean assumption caused by endogenous regressors can also arise for two other common causes: measurement error in regressors (errors-in-variables) and omitted-variable bias. The latter may arise in situations where a variable known to be relevant for the data generating process is not measurable, and no good proxies can be found. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

4 Instrumental variables estimators First let us consider a path diagram illustrating the problem addressed by IV methods. We can use ordinary least squares (OLS) regression to consistently estimate a model of the following sort. Standard regression: y = xb + u no association between x and u; OLS consistent x y u Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

5 Instrumental variables estimators However, OLS regression breaks down in the following circumstance: Endogeneity: y = xb + u correlation between x and u; OLS inconsistent x y u The correlation between x and u (or the failure of the zero conditional mean assumption E[u x] = 0) can be caused by any of several factors. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

6 Instrumental variables estimators However, OLS regression breaks down in the following circumstance: Endogeneity: y = xb + u correlation between x and u; OLS inconsistent x y u The correlation between x and u (or the failure of the zero conditional mean assumption E[u x] = 0) can be caused by any of several factors. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

7 Endogeneity Instrumental variables estimators Endogeneity We have stated the problem as that of endogeneity: the notion that two or more variables are jointly determined in the behavioral model. This arises naturally in the context of a simultaneous equations model such as a supply-demand system in economics, in which price and quantity are jointly determined in the market for that good or service. A shock or disturbance to either supply or demand will affect both the equilibrium price and quantity in the market, so that by construction both variables are correlated with any shock to the system. OLS methods will yield inconsistent estimates of any regression including both price and quantity, however specified. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

8 Endogeneity Instrumental variables estimators Endogeneity We have stated the problem as that of endogeneity: the notion that two or more variables are jointly determined in the behavioral model. This arises naturally in the context of a simultaneous equations model such as a supply-demand system in economics, in which price and quantity are jointly determined in the market for that good or service. A shock or disturbance to either supply or demand will affect both the equilibrium price and quantity in the market, so that by construction both variables are correlated with any shock to the system. OLS methods will yield inconsistent estimates of any regression including both price and quantity, however specified. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

9 Instrumental variables estimators Endogeneity As a different example, consider a cross-sectional regression of public health outcomes (say, the proportion of the population in various cities suffering from a particular childhood disease) on public health expenditures per capita in each of those cities. We would hope to find that spending is effective in reducing incidence of the disease, but we also must consider the reverse causality in this relationship, where the level of expenditure is likely to be partially determined by the historical incidence of the disease in each jurisdiction. In this context, OLS estimates of the relationship will be biased even if additional controls are added to the specification. Although we may have no interest in modeling public health expenditures, we must be able to specify such an equation in order to identify the relationship of interest, as we discuss henceforth. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

10 Instrumental variables estimators Endogeneity As a different example, consider a cross-sectional regression of public health outcomes (say, the proportion of the population in various cities suffering from a particular childhood disease) on public health expenditures per capita in each of those cities. We would hope to find that spending is effective in reducing incidence of the disease, but we also must consider the reverse causality in this relationship, where the level of expenditure is likely to be partially determined by the historical incidence of the disease in each jurisdiction. In this context, OLS estimates of the relationship will be biased even if additional controls are added to the specification. Although we may have no interest in modeling public health expenditures, we must be able to specify such an equation in order to identify the relationship of interest, as we discuss henceforth. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

11 Instrumental variables estimators Endogeneity The solution provided by IV methods may be viewed as: Instrumental variables regression: y = xb + u z uncorrelated with u, correlated with x z x y u The additional variable z is termed an instrument for x. In general, we may have many variables in x, and more than one x correlated with u. In that case, we shall need at least that many variables in z. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

12 Instrumental variables estimators Choice of instruments Choice of instruments To deal with the problem of endogeneity in a supply-demand system, a candidate z will affect (e.g.) the quantity supplied of the good, but not directly impact the demand for the good. An example for an agricultural commodity might be temperature or rainfall: clearly exogenous to the market, but likely to be important in the production process. For the public health example, we might use per capita income in each city as an instrument or z variable. It is likely to influence public health expenditure, as cities with a larger tax base might be expected to spend more on all services, and will not be directly affected by the unobserved factors in the primary relationship. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

13 Instrumental variables estimators Choice of instruments Choice of instruments To deal with the problem of endogeneity in a supply-demand system, a candidate z will affect (e.g.) the quantity supplied of the good, but not directly impact the demand for the good. An example for an agricultural commodity might be temperature or rainfall: clearly exogenous to the market, but likely to be important in the production process. For the public health example, we might use per capita income in each city as an instrument or z variable. It is likely to influence public health expenditure, as cities with a larger tax base might be expected to spend more on all services, and will not be directly affected by the unobserved factors in the primary relationship. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

14 Instrumental variables estimators Choice of instruments But why should we not always use IV? It may be difficult to find variables that can serve as valid instruments. Many variables that have an effect on included endogenous variables also have a direct effect on the dependent variable. IV estimators are innately biased, and their finite-sample properties are often problematic. Thus, most of the justification for the use of IV is asymptotic. Performance in small samples may be poor. The precision of IV estimates is lower than that of OLS estimates (least squares is just that). In the presence of weak instruments (excluded instruments only weakly correlated with included endogenous regressors) the loss of precision will be severe, and IV estimates may be no improvement over OLS. This suggests we need a method to determine whether a particular regressor must be treated as endogenous. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

15 Instrumental variables estimators Choice of instruments But why should we not always use IV? It may be difficult to find variables that can serve as valid instruments. Many variables that have an effect on included endogenous variables also have a direct effect on the dependent variable. IV estimators are innately biased, and their finite-sample properties are often problematic. Thus, most of the justification for the use of IV is asymptotic. Performance in small samples may be poor. The precision of IV estimates is lower than that of OLS estimates (least squares is just that). In the presence of weak instruments (excluded instruments only weakly correlated with included endogenous regressors) the loss of precision will be severe, and IV estimates may be no improvement over OLS. This suggests we need a method to determine whether a particular regressor must be treated as endogenous. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

16 Instrumental variables estimators Choice of instruments But why should we not always use IV? It may be difficult to find variables that can serve as valid instruments. Many variables that have an effect on included endogenous variables also have a direct effect on the dependent variable. IV estimators are innately biased, and their finite-sample properties are often problematic. Thus, most of the justification for the use of IV is asymptotic. Performance in small samples may be poor. The precision of IV estimates is lower than that of OLS estimates (least squares is just that). In the presence of weak instruments (excluded instruments only weakly correlated with included endogenous regressors) the loss of precision will be severe, and IV estimates may be no improvement over OLS. This suggests we need a method to determine whether a particular regressor must be treated as endogenous. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

17 The IV-GMM estimator IV estimation as a GMM problem Before discussing further the motivation for various weak instrument diagnostics, we define the setting for IV estimation as a Generalized Method of Moments (GMM) optimization problem. Economists consider GMM to be the invention of Lars Hansen in his 1982 Econometrica paper, but as Alistair Hall points out in his 2005 book, the method has its antecedents in Karl Pearson s Method of Moments [MM] (1895) and Neyman and Egon Pearson s minimum Chi-squared estimator [MCE] (1928). Their MCE approach overcomes the difficulty with MM estimators when there are more moment conditions than parameters to be estimated. This was recognized by Ferguson (Ann. Math. Stat. 1958) for the case of i.i.d. errors, but his work had no impact on the econometric literature. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

18 The IV-GMM estimator The model: y = Xβ + u, u (0, Ω) X (N k). Define a matrix Z (N l) where l k. This is the Generalized Method of Moments IV (IV-GMM) estimator. The l instruments give rise to a set of l moments: g i (β) = Z i u i = Z i (y i x i β), i = 1, N where each g i is an l-vector. The method of moments approach considers each of the l moment equations as a sample moment, which we may estimate by averaging over N: ḡ(β) = 1 N N z i (y i x i β) = 1 N Z u i=1 The GMM approach chooses ˆβ to solve ḡ( ˆβ GMM ) = 0. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

19 The IV-GMM estimator Exact identification and 2SLS If l = k, the equation to be estimated is said to be exactly identified by the order condition for identification: that is, there are as many excluded instruments as included right-hand endogenous variables. The method of moments problem is then k equations in k unknowns, and a unique solution exists, equivalent to the standard IV estimator: ˆβ IV = (Z X) 1 Z y In the case of overidentification (l > k) we may define a set of k instruments: ˆX = Z (Z Z ) 1 Z X = P Z X which gives rise to the two-stage least squares (2SLS) estimator ˆβ 2SLS = ( ˆX X) 1 ˆX y = (X P Z X) 1 X P Z y which despite its name is computed by this single matrix equation. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

20 The IV-GMM estimator Exact identification and 2SLS If l = k, the equation to be estimated is said to be exactly identified by the order condition for identification: that is, there are as many excluded instruments as included right-hand endogenous variables. The method of moments problem is then k equations in k unknowns, and a unique solution exists, equivalent to the standard IV estimator: ˆβ IV = (Z X) 1 Z y In the case of overidentification (l > k) we may define a set of k instruments: ˆX = Z (Z Z ) 1 Z X = P Z X which gives rise to the two-stage least squares (2SLS) estimator ˆβ 2SLS = ( ˆX X) 1 ˆX y = (X P Z X) 1 X P Z y which despite its name is computed by this single matrix equation. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

21 The IV-GMM estimator The IV-GMM approach The IV-GMM approach In the 2SLS method with overidentification, the l available instruments are boiled down" to the k needed by defining the P Z matrix. In the IV-GMM approach, that reduction is not necessary. All l instruments are used in the estimator. Furthermore, a weighting matrix is employed so that we may choose ˆβ GMM so that the elements of ḡ( ˆβ GMM ) are as close to zero as possible. With l > k, not all l moment conditions can be exactly satisfied, so a criterion function that weights them appropriately is used to improve the efficiency of the estimator. The GMM estimator minimizes the criterion J( ˆβ GMM ) = N ḡ( ˆβ GMM ) W ḡ( ˆβ GMM ) where W is a l l symmetric weighting matrix. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

22 The IV-GMM estimator The IV-GMM approach The IV-GMM approach In the 2SLS method with overidentification, the l available instruments are boiled down" to the k needed by defining the P Z matrix. In the IV-GMM approach, that reduction is not necessary. All l instruments are used in the estimator. Furthermore, a weighting matrix is employed so that we may choose ˆβ GMM so that the elements of ḡ( ˆβ GMM ) are as close to zero as possible. With l > k, not all l moment conditions can be exactly satisfied, so a criterion function that weights them appropriately is used to improve the efficiency of the estimator. The GMM estimator minimizes the criterion J( ˆβ GMM ) = N ḡ( ˆβ GMM ) W ḡ( ˆβ GMM ) where W is a l l symmetric weighting matrix. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

23 The IV-GMM estimator The GMM weighting matrix Solving the set of FOCs, we derive the IV-GMM estimator of an overidentified equation: ˆβ GMM = (X ZWZ X) 1 X ZWZ y which will be identical for all W matrices which differ by a factor of proportionality. The optimal weighting matrix, as shown by Hansen (1982), chooses W = S 1 where S is the covariance matrix of the moment conditions to produce the most efficient estimator: S = E[Z uu Z ] = lim N N 1 [Z ΩZ ] With a consistent estimator of S derived from 2SLS residuals, we define the feasible IV-GMM estimator as ˆβ FEGMM = (X Z Ŝ 1 Z X) 1 X Z Ŝ 1 Z y where FEGMM refers to the feasible efficient GMM estimator. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

24 The IV-GMM estimator The GMM weighting matrix Solving the set of FOCs, we derive the IV-GMM estimator of an overidentified equation: ˆβ GMM = (X ZWZ X) 1 X ZWZ y which will be identical for all W matrices which differ by a factor of proportionality. The optimal weighting matrix, as shown by Hansen (1982), chooses W = S 1 where S is the covariance matrix of the moment conditions to produce the most efficient estimator: S = E[Z uu Z ] = lim N N 1 [Z ΩZ ] With a consistent estimator of S derived from 2SLS residuals, we define the feasible IV-GMM estimator as ˆβ FEGMM = (X Z Ŝ 1 Z X) 1 X Z Ŝ 1 Z y where FEGMM refers to the feasible efficient GMM estimator. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

25 The IV-GMM estimator IV-GMM and the distribution of u The derivation makes no mention of the form of Ω, the variance-covariance matrix (vce) of the error process u. If the errors satisfy all classical assumptions are i.i.d., S = σ 2 ui N and the optimal weighting matrix is proportional to the identity matrix. The IV-GMM estimator is merely the standard IV (or 2SLS) estimator. If there is heteroskedasticity of unknown form, we usually compute robust standard errors in any Stata estimation command to derive a consistent estimate of the vce. In this context, Ŝ = 1 N N i=1 û 2 i Z i Z i where û is the vector of residuals from any consistent estimator of β (e.g., the 2SLS residuals). For an overidentified equation, the IV-GMM estimates computed from this estimate of S will be more efficient than 2SLS estimates. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

26 The IV-GMM estimator IV-GMM and the distribution of u The derivation makes no mention of the form of Ω, the variance-covariance matrix (vce) of the error process u. If the errors satisfy all classical assumptions are i.i.d., S = σ 2 ui N and the optimal weighting matrix is proportional to the identity matrix. The IV-GMM estimator is merely the standard IV (or 2SLS) estimator. If there is heteroskedasticity of unknown form, we usually compute robust standard errors in any Stata estimation command to derive a consistent estimate of the vce. In this context, Ŝ = 1 N N i=1 û 2 i Z i Z i where û is the vector of residuals from any consistent estimator of β (e.g., the 2SLS residuals). For an overidentified equation, the IV-GMM estimates computed from this estimate of S will be more efficient than 2SLS estimates. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

27 The IV-GMM estimator IV-GMM and the distribution of u We must distinguish the concept of IV/2SLS estimation with robust standard errors from the concept of estimating the same equation with IV-GMM, allowing for arbitrary heteroskedasticity. Compare an overidentified regression model estimated (a) with IV and classical standard errors and (b) with robust standard errors. Model (b) will produce the same point estimates, but different standard errors in the presence of heteroskedastic errors. However, if we reestimate that overidentified model using the GMM two-step estimator, we will get different point estimates because we are solving a different optimization problem: one in the l-space of the instruments (and moment conditions) rather than the k-space of the regressors, and l > k. We will also get different standard errors, and in general smaller standard errors as the IV-GMM estimator is more efficient. This does not imply, however, that summary measures of fit will improve. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

28 The IV-GMM estimator IV-GMM and the distribution of u We must distinguish the concept of IV/2SLS estimation with robust standard errors from the concept of estimating the same equation with IV-GMM, allowing for arbitrary heteroskedasticity. Compare an overidentified regression model estimated (a) with IV and classical standard errors and (b) with robust standard errors. Model (b) will produce the same point estimates, but different standard errors in the presence of heteroskedastic errors. However, if we reestimate that overidentified model using the GMM two-step estimator, we will get different point estimates because we are solving a different optimization problem: one in the l-space of the instruments (and moment conditions) rather than the k-space of the regressors, and l > k. We will also get different standard errors, and in general smaller standard errors as the IV-GMM estimator is more efficient. This does not imply, however, that summary measures of fit will improve. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

29 The IV-GMM estimator IV-GMM cluster-robust estimates If errors are considered to exhibit arbitrary intra-cluster correlation in a dataset with M clusters, we may derive a cluster-robust IV-GMM estimator using M Ŝ = where j=1 û j ûj û j = (y j x j ˆβ)X Z (Z Z ) 1 z j The IV-GMM estimates employing this estimate of S will be both robust to arbitrary heteroskedasticity and intra-cluster correlation, equivalent to estimates generated by Stata s cluster(varname) option. For an overidentified equation, IV-GMM cluster-robust estimates will be more efficient than 2SLS estimates. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

30 The IV-GMM estimator IV-GMM cluster-robust estimates The concept of the cluster-robust covariance matrix has been extended by Cameron, Gelbach and Miller (NBER TWP327, 2006) and Thompson (SSRN, 2009) to define two-way clustering. This allows for arbitrary within-cluster correlation in two cluster dimensions. For instance, in a panel dataset, we may want to allow observations belonging to each individual unit to be arbitrarily correlated, and we may want to allow observations coming from a particular time period to be arbitrarily correlated. Heretofore, a common tactic involved (one-way) clustering by unit (e.g., firm), and the introduction of a set of time (e.g., year) dummies. Although the latter account for any macro-level heterogeneity, they do not relax the assumption of independence across units at each point in time, which may be highly unlikely. Thus, it may be beneficial to utilize two-way clustering in estimating the covariance matrix. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

31 The IV-GMM estimator IV-GMM cluster-robust estimates The concept of the cluster-robust covariance matrix has been extended by Cameron, Gelbach and Miller (NBER TWP327, 2006) and Thompson (SSRN, 2009) to define two-way clustering. This allows for arbitrary within-cluster correlation in two cluster dimensions. For instance, in a panel dataset, we may want to allow observations belonging to each individual unit to be arbitrarily correlated, and we may want to allow observations coming from a particular time period to be arbitrarily correlated. Heretofore, a common tactic involved (one-way) clustering by unit (e.g., firm), and the introduction of a set of time (e.g., year) dummies. Although the latter account for any macro-level heterogeneity, they do not relax the assumption of independence across units at each point in time, which may be highly unlikely. Thus, it may be beneficial to utilize two-way clustering in estimating the covariance matrix. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

32 The IV-GMM estimator IV-GMM HAC estimates The IV-GMM approach may also be used to generate HAC standard errors: those robust to arbitrary heteroskedasticity and autocorrelation. Although the best-known HAC approach in econometrics is that of Newey and West, using the Bartlett kernel (per Stata s newey), that is only one choice of a HAC estimator that may be applied to an IV-GMM problem. Baum Schaffer Stillman s ivreg2 (from the SSC Archive) and Stata 10 s ivregress provide several choices for kernels. For some kernels, the kernel bandwidth (roughly, number of lags employed) may be chosen automatically in either command. The HAC options may be combined with one- or two-way clustering. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

33 The IV-GMM estimator IV-GMM HAC estimates The IV-GMM approach may also be used to generate HAC standard errors: those robust to arbitrary heteroskedasticity and autocorrelation. Although the best-known HAC approach in econometrics is that of Newey and West, using the Bartlett kernel (per Stata s newey), that is only one choice of a HAC estimator that may be applied to an IV-GMM problem. Baum Schaffer Stillman s ivreg2 (from the SSC Archive) and Stata 10 s ivregress provide several choices for kernels. For some kernels, the kernel bandwidth (roughly, number of lags employed) may be chosen automatically in either command. The HAC options may be combined with one- or two-way clustering. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

34 The IV-GMM estimator The ivreg2 command Implementation in Stata The estimators we have discussed are available from Baum, Schaffer and Stillman s ivreg2 package, revised February 2010 (ssc describe ivreg2). The ivreg2 command has the same basic syntax as Stata s older ivreg command: ivreg2 depvar [varlist1] (varlist2=instlist) /// [if] [in] [, options] The l variables in varlist1 and instlist comprise Z, the matrix of instruments. The k variables in varlist1 and varlist2 comprise X. Both matrices by default include a units vector. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

35 The IV-GMM estimator The ivreg2 command Implementation in Stata The estimators we have discussed are available from Baum, Schaffer and Stillman s ivreg2 package, revised February 2010 (ssc describe ivreg2). The ivreg2 command has the same basic syntax as Stata s older ivreg command: ivreg2 depvar [varlist1] (varlist2=instlist) /// [if] [in] [, options] The l variables in varlist1 and instlist comprise Z, the matrix of instruments. The k variables in varlist1 and varlist2 comprise X. Both matrices by default include a units vector. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

36 The IV-GMM estimator ivreg2 options By default ivreg2 estimates the IV estimator, or 2SLS estimator if l > k. If the gmm2s option is specified in conjunction with robust, cluster() or bw(), it estimates the IV-GMM estimator. With the robust option, the vce is heteroskedasticity-robust. With the cluster(varname) or cluster(varname1 varname2) option, the vce is cluster-robust. With the robust and bw( ) options, the vce is HAC with the default Bartlett ( Newey West ) kernel. Other kernel( ) choices lead to alternative HAC estimators. In ivreg2, both robust and bw( ) options must be specified to produce HAC. Estimates produced with bw( ) alone are robust to arbitrary autocorrelation but assume homoskedasticity. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

37 The IV-GMM estimator ivreg2 options By default ivreg2 estimates the IV estimator, or 2SLS estimator if l > k. If the gmm2s option is specified in conjunction with robust, cluster() or bw(), it estimates the IV-GMM estimator. With the robust option, the vce is heteroskedasticity-robust. With the cluster(varname) or cluster(varname1 varname2) option, the vce is cluster-robust. With the robust and bw( ) options, the vce is HAC with the default Bartlett ( Newey West ) kernel. Other kernel( ) choices lead to alternative HAC estimators. In ivreg2, both robust and bw( ) options must be specified to produce HAC. Estimates produced with bw( ) alone are robust to arbitrary autocorrelation but assume homoskedasticity. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

38 The IV-GMM estimator ivreg2 options By default ivreg2 estimates the IV estimator, or 2SLS estimator if l > k. If the gmm2s option is specified in conjunction with robust, cluster() or bw(), it estimates the IV-GMM estimator. With the robust option, the vce is heteroskedasticity-robust. With the cluster(varname) or cluster(varname1 varname2) option, the vce is cluster-robust. With the robust and bw( ) options, the vce is HAC with the default Bartlett ( Newey West ) kernel. Other kernel( ) choices lead to alternative HAC estimators. In ivreg2, both robust and bw( ) options must be specified to produce HAC. Estimates produced with bw( ) alone are robust to arbitrary autocorrelation but assume homoskedasticity. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

39 The IV-GMM estimator Example of IV and IV-GMM estimation Example of IV and IV-GMM estimation We illustrate with a wage equation estimated from the Griliches dataset (griliches76) of very young men s wages. Their log(wage) is explained by completed years of schooling, experience, job tenure and IQ score. The IQ variable is considered endogenous, and instrumented with three factors: their mother s level of education (med), their score on a standardized test (kww) and their age. The estimation in ivreg2 is performed with ivreg2 lw s expr tenure (iq = med kww age) where the parenthesized expression defines the included endogenous and excluded exogenous variables. You could also use official Stata s ivregress 2sls. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

40 The IV-GMM estimator Example of IV and IV-GMM estimation. esttab, label stat(rmse) mtitles(iv IVrob IVGMMrob) nonum IV IVrob IVGMMrob iq score (-1.06) (-1.01) (-1.34) completed years of ~ g 0.122*** 0.122*** 0.128*** (7.68) (7.51) (7.88) experience, years *** *** *** (5.15) (5.10) (5.26) tenure, years *** *** *** (4.78) (4.51) (4.96) Constant 4.441*** 4.441*** 4.523*** (14.22) (13.21) (13.46) rmse t statistics in parentheses * p<0.05, ** p<0.01, *** p<0.001 Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

41 The IV-GMM estimator Example of IV and IV-GMM estimation These three columns compare standard IV (2SLS) estimates, IV with robust standard errors, and IV-GMM with robust standard errors, respectively. Notice that the coefficients point estimates change when IV-GMM is employed, and that their t-statistics are larger than those of robust IV. Note also that the IQ score is not significant in any of these models. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

42 Tests of overidentifying restrictions Tests of overidentifying restrictions If and only if an equation is overidentified, we may test whether the excluded instruments are appropriately independent of the error process. That test should always be performed when it is possible to do so, as it allows us to evaluate the validity of the instruments. A test of overidentifying restrictions regresses the residuals from an IV or 2SLS regression on all instruments in Z. Under the null hypothesis that all instruments are uncorrelated with u, the test has a large-sample χ 2 (r) distribution where r is the number of overidentifying restrictions. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

43 Tests of overidentifying restrictions Tests of overidentifying restrictions If and only if an equation is overidentified, we may test whether the excluded instruments are appropriately independent of the error process. That test should always be performed when it is possible to do so, as it allows us to evaluate the validity of the instruments. A test of overidentifying restrictions regresses the residuals from an IV or 2SLS regression on all instruments in Z. Under the null hypothesis that all instruments are uncorrelated with u, the test has a large-sample χ 2 (r) distribution where r is the number of overidentifying restrictions. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

44 Tests of overidentifying restrictions Under the assumption of i.i.d. errors, this is known as a Sargan test, and is routinely produced by ivreg2 for IV and 2SLS estimates. It can also be calculated after ivreg estimation with the overid command, which is part of the ivreg2 suite. After ivregress, the command estat overid provides the test. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

45 Tests of overidentifying restrictions If we have used IV-GMM estimation in ivreg2, the test of overidentifying restrictions becomes J: the GMM criterion function. Although J will be identically zero for any exactly-identified equation, it will be positive for an overidentified equation. If it is too large, doubt is cast on the satisfaction of the moment conditions underlying GMM. The test in this context is known as the Hansen test or J test, and is calculated by ivreg2 when the gmm2s option is employed. The Sargan Hansen test of overidentifying restrictions should be performed routinely in any overidentified model estimated with instrumental variables techniques. Instrumental variables techniques are powerful, but if a strong rejection of the null hypothesis of the Sargan Hansen test is encountered, you should strongly doubt the validity of the estimates. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

46 Tests of overidentifying restrictions If we have used IV-GMM estimation in ivreg2, the test of overidentifying restrictions becomes J: the GMM criterion function. Although J will be identically zero for any exactly-identified equation, it will be positive for an overidentified equation. If it is too large, doubt is cast on the satisfaction of the moment conditions underlying GMM. The test in this context is known as the Hansen test or J test, and is calculated by ivreg2 when the gmm2s option is employed. The Sargan Hansen test of overidentifying restrictions should be performed routinely in any overidentified model estimated with instrumental variables techniques. Instrumental variables techniques are powerful, but if a strong rejection of the null hypothesis of the Sargan Hansen test is encountered, you should strongly doubt the validity of the estimates. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

47 Tests of overidentifying restrictions For instance, let s rerun the last IV-GMM model we estimated and focus on the test of overidentifying restrictions provided by the Hansen J statistic. The model is overidentified by two degrees of freedom, as there is one endogenous regressor and three excluded instruments. We see that the J statistic strongly rejects its null, casting doubts on the quality of these estimates. Let s reestimate the model excluding age from the instrument list and see what happens. We will see that the sign and significance of the key endogenous regressor changes as we respecify the instrument list, and the p-value of the J statistic becomes large when age is excluded. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

48 Tests of overidentifying restrictions For instance, let s rerun the last IV-GMM model we estimated and focus on the test of overidentifying restrictions provided by the Hansen J statistic. The model is overidentified by two degrees of freedom, as there is one endogenous regressor and three excluded instruments. We see that the J statistic strongly rejects its null, casting doubts on the quality of these estimates. Let s reestimate the model excluding age from the instrument list and see what happens. We will see that the sign and significance of the key endogenous regressor changes as we respecify the instrument list, and the p-value of the J statistic becomes large when age is excluded. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

49 Tests of overidentifying restrictions Example: Test of overidentifying restrictions. esttab, label stat(j jdf jp) mtitles(age no_age) nonum age no_age iq score ** (-1.34) (2.97) completed years of ~ g 0.128*** ** (7.88) (2.63) experience, years *** *** (5.26) (5.58) tenure, years *** *** (4.96) (3.48) Constant 4.523*** 2.989*** (13.46) (7.58) j jdf 2 1 jp 1.50e t statistics in parentheses * p<0.05, ** p<0.01, *** p< sjlog close Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

50 Tests of overidentifying restrictions Testing a subset of overidentifying restrictions We may be quite confident of some instruments independence from u but concerned about others. In that case a GMM distance or C test may be used. The orthog( ) option of ivreg2 tests whether a subset of the model s overidentifying restrictions appear to be satisfied. This is carried out by calculating two Sargan Hansen statistics: one for the full model and a second for the model in which the listed variables are (a) considered endogenous, if included regressors, or (b) dropped, if excluded regressors. In case (a), the model must still satisfy the order condition for identification. The difference of the two Sargan Hansen statistics, often termed the GMM distance or C statistic, will be distributed χ 2 under the null hypothesis that the specified orthogonality conditions are satisfied, with d.f. equal to the number of those conditions. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

51 Tests of overidentifying restrictions Testing a subset of overidentifying restrictions We may be quite confident of some instruments independence from u but concerned about others. In that case a GMM distance or C test may be used. The orthog( ) option of ivreg2 tests whether a subset of the model s overidentifying restrictions appear to be satisfied. This is carried out by calculating two Sargan Hansen statistics: one for the full model and a second for the model in which the listed variables are (a) considered endogenous, if included regressors, or (b) dropped, if excluded regressors. In case (a), the model must still satisfy the order condition for identification. The difference of the two Sargan Hansen statistics, often termed the GMM distance or C statistic, will be distributed χ 2 under the null hypothesis that the specified orthogonality conditions are satisfied, with d.f. equal to the number of those conditions. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

52 Tests of overidentifying restrictions Testing a subset of overidentifying restrictions A variant on this strategy is implemented by the endog( ) option of ivreg2, in which one or more variables considered endogenous can be tested for exogeneity. The C test in this case will consider whether the null hypothesis of their exogeneity is supported by the data. If all endogenous regressors are included in the endog( ) option, the test is essentially a test of whether IV methods are required to estimate the equation. If OLS estimates of the equation are consistent, they should be preferred. In this context, the test is equivalent to a Hausman test comparing IV and OLS estimates, as implemented by Stata s hausman command with the sigmaless option. Using ivreg2, you need not estimate and store both models to generate the test s verdict. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

53 Tests of overidentifying restrictions Testing a subset of overidentifying restrictions A variant on this strategy is implemented by the endog( ) option of ivreg2, in which one or more variables considered endogenous can be tested for exogeneity. The C test in this case will consider whether the null hypothesis of their exogeneity is supported by the data. If all endogenous regressors are included in the endog( ) option, the test is essentially a test of whether IV methods are required to estimate the equation. If OLS estimates of the equation are consistent, they should be preferred. In this context, the test is equivalent to a Hausman test comparing IV and OLS estimates, as implemented by Stata s hausman command with the sigmaless option. Using ivreg2, you need not estimate and store both models to generate the test s verdict. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

54 Tests of overidentifying restrictions Testing a subset of overidentifying restrictions For instance, with the model above, we might question whether IV techniques are needed. We can conduct the C test via: ivreg2 lw s expr tenure (iq=med kww), gmm2s robust endog(iq) where the endog(iq) option tests the null hypothesis that iq is properly exogenous in this model. The test statistic has a p-value of , suggesting that the data overwhelmingly reject the use of OLS in favor of IV. At the same time, the J statistic (with a p-value of 0.60) indicates that the overidentifying restrictions are not rejected. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

55 Testing for weak instruments The weak instruments problem Instrumental variables methods rely on two assumptions: the excluded instruments are distributed independently of the error process, and they are sufficiently correlated with the included endogenous regressors. Tests of overidentifying restrictions address the first assumption, although we should note that a rejection of their null may be indicative that the exclusion restrictions for these instruments may be inappropriate. That is, some of the instruments have been improperly excluded from the regression model s specification. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

56 Testing for weak instruments The specification of an instrumental variables model asserts that the excluded instruments affect the dependent variable only indirectly, through their correlations with the included endogenous variables. If an excluded instrument exerts both direct and indirect influences on the dependent variable, the exclusion restriction should be rejected. This can be readily tested by including the variable as a regressor. In our earlier example we saw that including age in the excluded instruments list caused a rejection of the J test. We had assumed that age could be treated as excluded from the model. Is that assumption warranted? If age is entered as a regressor, it has a t-statistic over 8. Thus, its rejection as an excluded instrument may well reflect the misspecification of the equation, omitting age. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

57 Testing for weak instruments The specification of an instrumental variables model asserts that the excluded instruments affect the dependent variable only indirectly, through their correlations with the included endogenous variables. If an excluded instrument exerts both direct and indirect influences on the dependent variable, the exclusion restriction should be rejected. This can be readily tested by including the variable as a regressor. In our earlier example we saw that including age in the excluded instruments list caused a rejection of the J test. We had assumed that age could be treated as excluded from the model. Is that assumption warranted? If age is entered as a regressor, it has a t-statistic over 8. Thus, its rejection as an excluded instrument may well reflect the misspecification of the equation, omitting age. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

58 Testing for weak instruments The specification of an instrumental variables model asserts that the excluded instruments affect the dependent variable only indirectly, through their correlations with the included endogenous variables. If an excluded instrument exerts both direct and indirect influences on the dependent variable, the exclusion restriction should be rejected. This can be readily tested by including the variable as a regressor. In our earlier example we saw that including age in the excluded instruments list caused a rejection of the J test. We had assumed that age could be treated as excluded from the model. Is that assumption warranted? If age is entered as a regressor, it has a t-statistic over 8. Thus, its rejection as an excluded instrument may well reflect the misspecification of the equation, omitting age. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

59 Testing for weak instruments To test the second assumption that the excluded instruments are sufficiently correlated with the included endogenous regressors we should consider the goodness-of-fit of the first stage regressions relating each endogenous regressor to the entire set of instruments. It is important to understand that the theory of single-equation ( limited information ) IV estimation requires that all columns of X are conceptually regressed on all columns of Z in the calculation of the estimates. We cannot meaningfully speak of this variable is an instrument for that regressor or somehow restrict which instruments enter which first-stage regressions. Stata s ivregress or ivreg2 will not let you do that because such restrictions only make sense in the context of estimating an entire system of equations by full-information methods (for instance, with reg3). Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

60 Testing for weak instruments To test the second assumption that the excluded instruments are sufficiently correlated with the included endogenous regressors we should consider the goodness-of-fit of the first stage regressions relating each endogenous regressor to the entire set of instruments. It is important to understand that the theory of single-equation ( limited information ) IV estimation requires that all columns of X are conceptually regressed on all columns of Z in the calculation of the estimates. We cannot meaningfully speak of this variable is an instrument for that regressor or somehow restrict which instruments enter which first-stage regressions. Stata s ivregress or ivreg2 will not let you do that because such restrictions only make sense in the context of estimating an entire system of equations by full-information methods (for instance, with reg3). Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

61 Testing for weak instruments The first and ffirst options of ivreg2 present several useful diagnostics that assess the first-stage regressions. If there is a single endogenous regressor, these issues are simplified, as the instruments either explain a reasonable fraction of that regressor s variability or not. With multiple endogenous regressors, diagnostics are more complicated, as each instrument is being called upon to play a role in each first-stage regression. With sufficiently weak instruments, the asymptotic identification status of the equation is called into question. An equation identified by the order and rank conditions in a finite sample may still be effectively unidentified. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

62 Testing for weak instruments The first and ffirst options of ivreg2 present several useful diagnostics that assess the first-stage regressions. If there is a single endogenous regressor, these issues are simplified, as the instruments either explain a reasonable fraction of that regressor s variability or not. With multiple endogenous regressors, diagnostics are more complicated, as each instrument is being called upon to play a role in each first-stage regression. With sufficiently weak instruments, the asymptotic identification status of the equation is called into question. An equation identified by the order and rank conditions in a finite sample may still be effectively unidentified. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

63 Testing for weak instruments As Staiger and Stock (Econometrica, 1997) show, the weak instruments problem can arise even when the first-stage t- and F-tests are significant at conventional levels in a large sample. In the worst case, the bias of the IV estimator is the same as that of OLS, IV becomes inconsistent, and instrumenting only aggravates the problem. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

64 Testing for weak instruments Beyond the informal rule-of-thumb diagnostics such as F > 10, ivreg2 computes several statistics that can be used to critically evaluate the strength of instruments. We can write the first-stage regressions as X = Z Π + v With X 1 as the endogenous regressors, Z 1 the excluded instruments and Z 2 as the included instruments, this can be partitioned as X 1 = [Z 1 Z 2 ] [Π 11 Π 12 ] + v 1 The rank condition for identification states that the L K 1 matrix Π 11 must be of full column rank. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

65 Testing for weak instruments The Anderson canonical correlation statistic We do not observe the true Π 11, so we must replace it with an estimate. Anderson s (John Wiley, 1984) approach to testing the rank of this matrix (or that of the full Π matrix) considers the canonical correlations of the X and Z matrices. If the equation is to be identified, all K of the canonical correlations will be significantly different from zero. The squared canonical correlations can be expressed as eigenvalues of a matrix. Anderson s CC test considers the null hypothesis that the minimum canonical correlation is zero. Under the null, the test statistic is distributed χ 2 with (L K + 1) d.f., so it may be calculated even for an exactly-identified equation. Failure to reject the null suggests the equation is unidentified. ivreg2 reports this Lagrange Multiplier (LM) statistic. Christopher F Baum (Boston College) IVs and Panel Data Feb / 43

### Using instrumental variables techniques in economics and finance

Using instrumental variables techniques in economics and finance Christopher F Baum 1 Boston College and DIW Berlin German Stata Users Group Meeting, Berlin, June 2008 1 Thanks to Mark Schaffer for a number

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

### Instrumental Variables Regression. Instrumental Variables (IV) estimation is used when the model has endogenous s.

Instrumental Variables Regression Instrumental Variables (IV) estimation is used when the model has endogenous s. IV can thus be used to address the following important threats to internal validity: Omitted

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Evaluating one-way and two-way cluster-robust covariance matrix estimates

Evaluating one-way and two-way cluster-robust covariance matrix estimates Christopher F Baum 1 Austin Nichols 2 Mark E Schaffer 3 1 Boston College and DIW Berlin 2 Urban Institute 3 Heriot Watt University

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

### 1 The Problem: Endogeneity There are two kinds of variables in our models: exogenous variables and endogenous variables. Endogenous Variables: These a

Notes on Simultaneous Equations and Two Stage Least Squares Estimates Copyright - Jonathan Nagler; April 19, 1999 1. Basic Description of 2SLS ffl The endogeneity problem, and the bias of OLS. ffl The

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Weak instruments: An overview and new techniques

Weak instruments: An overview and new techniques July 24, 2006 Overview of IV IV Methods and Formulae IV Assumptions and Problems Why Use IV? Instrumental variables, often abbreviated IV, refers to an

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### Dynamic Panel Data estimators

Dynamic Panel Data estimators Christopher F Baum EC 823: Applied Econometrics Boston College, Spring 2013 Christopher F Baum (BC / DIW) Dynamic Panel Data estimators Boston College, Spring 2013 1 / 50

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### Econometric analysis of the Belgian car market

Econometric analysis of the Belgian car market By: Prof. dr. D. Czarnitzki/ Ms. Céline Arts Tim Verheyden Introduction In contrast to typical examples from microeconomics textbooks on homogeneous goods

### Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Enhanced routines for instrumental variables/gmm estimation and testing

The Stata Journal (yyyy) vv, Number ii, pp. 1 38 Enhanced routines for instrumental variables/gmm estimation and testing Christopher F. Baum Mark E. Schaffer Boston College Heriot Watt University Steven

### Clustering in the Linear Model

Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

### Cointegration. Basic Ideas and Key results. Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board

Cointegration Basic Ideas and Key results Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### Employer-Provided Health Insurance and Labor Supply of Married Women

Upjohn Institute Working Papers Upjohn Research home page 2011 Employer-Provided Health Insurance and Labor Supply of Married Women Merve Cebi University of Massachusetts - Dartmouth and W.E. Upjohn Institute

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Note 2 to Computer class: Standard mis-specification tests

Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### Panel Data Analysis in Stata

Panel Data Analysis in Stata Anton Parlow Lab session Econ710 UWM Econ Department??/??/2010 or in a S-Bahn in Berlin, you never know.. Our plan Introduction to Panel data Fixed vs. Random effects Testing

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

### 1 Demand Estimation. Empirical Problem Set # 2 Graduate Industrial Organization, Fall 2005 Glenn Ellison and Stephen Ryan

Empirical Problem Set # 2 Graduate Industrial Organization, Fall 2005 Glenn Ellison and Stephen Ryan 1 Demand Estimation The intent of this problem set is to get you familiar with Stata, if you are not

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### S6: Spatial regression models: : OLS estimation and testing

: Spatial regression models: OLS estimation and testing Course on Spatial Econometrics with Applications Profesora: Coro Chasco Yrigoyen Universidad Autónoma de Madrid Lugar: Universidad Politécnica de

### Clustered Errors in Stata

10 Sept 2007 Clustering of Errors Cluster-Robust Standard Errors More Dimensions A Seemingly Unrelated Topic Clustered Errors Suppose we have a regression model like Y it = X it β + u i + e it where the

### Powerful new tools for time series analysis

Christopher F Baum Boston College & DIW August 2007 hristopher F Baum ( Boston College & DIW) NASUG2007 1 / 26 Introduction This presentation discusses two recent developments in time series analysis by

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### If the model with two independent variables is the true model, the error term of the model with one independent variable can be expressed as follows:

Omitted Variable Bias Model with one independent variable: Model with two independent variables: If the model with two independent variables is the true model, the error term of the model with one independent

### ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL DATA FROM NORTH CAROLINA BADI H. BALTAGI*

JOURNAL OF APPLIED ECONOMETRICS J. Appl. Econ. 21: 543 547 (2006) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jae.861 ESTIMATING AN ECONOMIC MODEL OF CRIME USING PANEL

### 1. THE LINEAR MODEL WITH CLUSTER EFFECTS

What s New in Econometrics? NBER, Summer 2007 Lecture 8, Tuesday, July 31st, 2.00-3.00 pm Cluster and Stratified Sampling These notes consider estimation and inference with cluster samples and samples

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables We often consider relationships between observed outcomes

### Online Appendices to the Corporate Propensity to Save

Online Appendices to the Corporate Propensity to Save Appendix A: Monte Carlo Experiments In order to allay skepticism of empirical results that have been produced by unusual estimators on fairly small

### Correlated Random Effects Panel Data Models

INTRODUCTION AND LINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. The Linear

### The following postestimation commands for time series are available for regress:

Title stata.com regress postestimation time series Postestimation tools for regress with time series Description Syntax for estat archlm Options for estat archlm Syntax for estat bgodfrey Options for estat

### Hedonism example. Our questions in the last session. Our questions in this session

Random Slope Models Hedonism example Our questions in the last session Do differences between countries in hedonism remain after controlling for individual age? How much of the variation in hedonism is

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT

### Chapter 9 Assessing Studies Based on Multiple Regression

Chapter 9 Assessing Studies Based on Multiple Regression Solutions to Empirical Exercises 1. Age 0.439** (0.030) Age 2 Data from 2004 (1) (2) (3) (4) (5) (6) (7) (8) Dependent Variable AHE ln(ahe) ln(ahe)

### 171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

### Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### Estimating the random coefficients logit model of demand using aggregate data

Estimating the random coefficients logit model of demand using aggregate data David Vincent Deloitte Economic Consulting London, UK davivincent@deloitte.co.uk September 14, 2012 Introduction Estimation

### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

### Accounting for Time-Varying Unobserved Ability Heterogeneity within Education Production Functions

Accounting for Time-Varying Unobserved Ability Heterogeneity within Education Production Functions Weili Ding Queen s University Steven F. Lehrer Queen s University and NBER July 2008 Abstract Traditional

### Regression Analysis (Spring, 2000)

Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity

### Panel Data: Linear Models

Panel Data: Linear Models Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Panel Data: Linear Models 1 / 45 Introduction Outline What

### CHAPTER 6. SIMULTANEOUS EQUATIONS

Economics 24B Daniel McFadden 1999 1. INTRODUCTION CHAPTER 6. SIMULTANEOUS EQUATIONS Economic systems are usually described in terms of the behavior of various economic agents, and the equilibrium that

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### ill-defined statistical test for fractional integration parameter

ill-defined statistical test for fractional integration parameter Ahmed BENSALMA 1 1 ENSSEA Pôle universitaire Koléa, Tipaza, Algeria (ex/ INPS 11 Chemin Doudou mokhtar Ben-aknoun Alger) Email: bensalma.ahmed@gmail.com

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Introduction to Regression Models for Panel Data Analysis. Indiana University Workshop in Methods October 7, 2011. Professor Patricia A.

Introduction to Regression Models for Panel Data Analysis Indiana University Workshop in Methods October 7, 2011 Professor Patricia A. McManus Panel Data Analysis October 2011 What are Panel Data? Panel

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### An Introduction to Path Analysis. nach 3

An Introduction to Path Analysis Developed by Sewall Wright, path analysis is a method employed to determine whether or not a multivariate set of nonexperimental data fits well with a particular (a priori)

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,

### Performing Unit Root Tests in EViews. Unit Root Testing

Página 1 de 12 Unit Root Testing The theory behind ARMA estimation is based on stationary time series. A series is said to be (weakly or covariance) stationary if the mean and autocovariances of the series

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA ABSTRACT The decision of whether to use PLS instead of a covariance

### Multivariate Analysis of Variance (MANOVA)

Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

### Working Paper Number 103 December 2006

Working Paper Number 103 December 2006 How to Do xtabond2: An Introduction to Difference and System GMM in Stata By David Roodman Abstract The Arellano-Bond (1991) and Arellano-Bover (1995)/Blundell-Bond

### Statistical Significance and Bivariate Tests

Statistical Significance and Bivariate Tests BUS 735: Business Decision Making and Research 1 1.1 Goals Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions,

### Implementations of tests on the exogeneity of selected. variables and their Performance in practice ACADEMISCH PROEFSCHRIFT

Implementations of tests on the exogeneity of selected variables and their Performance in practice ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Universiteit van Amsterdam op gezag

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### Online Appendix Assessing the Incidence and Efficiency of a Prominent Place Based Policy

Online Appendix Assessing the Incidence and Efficiency of a Prominent Place Based Policy By MATIAS BUSSO, JESSE GREGORY, AND PATRICK KLINE This document is a Supplemental Online Appendix of Assessing the

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Lecture 5 Hypothesis Testing in Multiple Linear Regression

Lecture 5 Hypothesis Testing in Multiple Linear Regression BIOST 515 January 20, 2004 Types of tests 1 Overall test Test for addition of a single variable Test for addition of a group of variables Overall

### Applications of Structural Equation Modeling in Social Sciences Research

American International Journal of Contemporary Research Vol. 4 No. 1; January 2014 Applications of Structural Equation Modeling in Social Sciences Research Jackson de Carvalho, PhD Assistant Professor

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Appendices with Supplementary Materials for CAPM for Estimating Cost of Equity Capital: Interpreting the Empirical Evidence

Appendices with Supplementary Materials for CAPM for Estimating Cost of Equity Capital: Interpreting the Empirical Evidence This document contains supplementary material to the paper titled CAPM for estimating

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

Lecture 11: Outliers and Influential data Regression III: Advanced Methods William G. Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Outlying Observations: Why pay attention?

### Examining the effects of exchange rates on Australian domestic tourism demand: A panel generalized least squares approach

19th International Congress on Modelling and Simulation, Perth, Australia, 12 16 December 2011 http://mssanz.org.au/modsim2011 Examining the effects of exchange rates on Australian domestic tourism demand:

### Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

### Testing The Quantity Theory of Money in Greece: A Note

ERC Working Paper in Economic 03/10 November 2003 Testing The Quantity Theory of Money in Greece: A Note Erdal Özmen Department of Economics Middle East Technical University Ankara 06531, Turkey ozmen@metu.edu.tr

### Practical. I conometrics. data collection, analysis, and application. Christiana E. Hilmer. Michael J. Hilmer San Diego State University

Practical I conometrics data collection, analysis, and application Christiana E. Hilmer Michael J. Hilmer San Diego State University Mi Table of Contents PART ONE THE BASICS 1 Chapter 1 An Introduction

### The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood