Using instrumental variables techniques in economics and finance
Christopher F Baum, Boston College and DIW Berlin
German Stata Users Group Meeting, Berlin, June

Thanks to Mark Schaffer for a number of useful suggestions.

Christopher F Baum (Boston College, DIW) IV techniques in economics and finance DESUG, Berlin, June / 49
Introduction

What are instrumental variables (IV) methods? Most widely known as a solution to the problem of endogenous regressors (explanatory variables correlated with the regression error term), IV methods provide a way to obtain consistent parameter estimates nonetheless. However, as Cameron and Trivedi point out in Microeconometrics (2005), this method, "widely used in econometrics and rarely used elsewhere, is conceptually difficult and easily misused" (p. 95).

My goal today is to present an overview of IV estimation and lay out the benefits and pitfalls of the IV approach. I will discuss the latest enhancements to IV methods available in Stata 9.2 and 10, including the latest release of Baum, Schaffer, and Stillman's widely used ivreg2, available for Stata 9.2 or better, and Stata 10's ivregress.
Introduction

The discussion that follows is presented in much greater detail in three sources:

- Enhanced routines for instrumental variables/GMM estimation and testing. Baum, C.F., Schaffer, M.E., Stillman, S., Stata Journal 7(4). Preprint: Boston College Economics working paper.
- An Introduction to Modern Econometrics Using Stata, Baum, C.F., Stata Press, 2006 (particularly Chapter 8).
- Instrumental variables and GMM: Estimation and testing. Baum, C.F., Schaffer, M.E., Stillman, S., Stata Journal 3(1), 1-31. Downloadable from IDEAS.
Introduction

First let us consider a path diagram illustrating the problem addressed by IV methods. We can use ordinary least squares (OLS) regression to consistently estimate a model of the following sort.

Standard regression: y = xb + u. With no association between x and u, OLS is consistent.

[Path diagram: x and u each point to y; no link between x and u]
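As a quick numerical sketch of this point (mine, not from the slides, and in Python rather than Stata), when x and u are generated independently, the OLS formula recovers the true coefficients; the data-generating process and coefficient values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(12345)
N = 50_000
x = rng.normal(size=N)
u = rng.normal(size=N)          # drawn independently of x: E[u|x] = 0 holds
y = 1.0 + 0.5 * x + u           # assumed true intercept 1.0, true slope 0.5

X = np.column_stack([np.ones(N), x])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: (X'X)^{-1} X'y
print(b_ols)                                 # close to the true [1.0, 0.5]
```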
Introduction

However, OLS regression breaks down in the following circumstance.

Endogeneity: y = xb + u. With correlation between x and u, OLS is inconsistent.

[Path diagram: x and u each point to y; x and u are correlated]

The correlation between x and u (or the failure of the zero conditional mean assumption E[u|x] = 0) can be caused by any of several factors.
Introduction: Endogeneity

We have stated the problem as that of endogeneity: the notion that two or more variables are jointly determined in the behavioral model. This arises naturally in the context of a simultaneous equations model such as a supply-demand system in economics, in which price and quantity are jointly determined in the market for that good or service.

A shock or disturbance to either supply or demand will affect both the equilibrium price and quantity in the market, so that by construction both variables are correlated with any shock to the system. OLS methods will yield inconsistent estimates of any regression including both price and quantity, however specified.
Introduction: Endogeneity

As a different example, consider a cross-sectional regression of public health outcomes (say, the proportion of the population in various cities suffering from a particular childhood disease) on public health expenditures per capita in each of those cities. We would hope to find that spending is effective in reducing incidence of the disease, but we must also consider the reverse causality in this relationship: the level of expenditure is likely to be partially determined by the historical incidence of the disease in each jurisdiction.

In this context, OLS estimates of the relationship will be biased even if additional controls are added to the specification. Although we may have no interest in modeling public health expenditures, we must be able to specify such an equation in order to identify the relationship of interest, as we discuss below.
Introduction: Measurement error in a regressor

Although IV methods were first developed to cope with the problem of endogeneity in a simultaneous system, the correlation of regressor and error may arise for other reasons. The presence of measurement error in a regressor will, in general, cause the same correlation of regressor and error in a model where behavior depends upon the true value of x and the statistician observes only an inaccurate measurement of x.

Even if we assume that the magnitude of the measurement error is independent of the true value of x (often an inappropriate assumption), measurement error will cause OLS to produce biased and inconsistent estimates of all parameters, not only that of the mismeasured regressor.
Introduction: Unobservable or latent factors

Another commonly encountered problem involves unobservable factors. Both y and x may be affected by latent factors such as ability. Consider a regression of (log) earnings (y) on years of schooling (x). The error term u embodies all other factors that affect earnings, such as the individual's innate ability or intelligence. But ability is surely likely to be correlated with educational attainment, causing a correlation between regressor and error. Mathematically, this is the same problem as that caused by endogeneity or measurement error.

In a panel or longitudinal dataset, we could deal with this unobserved heterogeneity with the first-difference or individual fixed-effects transformations. But in a cross-sectional dataset, we do not have that luxury, and must resort to other methods such as IV estimation.
Instrumental variables methods

The solution provided by IV methods may be viewed as follows.

Instrumental variables regression: y = xb + u, with z uncorrelated with u but correlated with x.

[Path diagram: z points to x; x and u each point to y; no link between z and u]

The additional variable z is termed an instrument for x. In general, we may have many variables in x, and more than one x correlated with u. In that case, we shall need at least that many variables in z.
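The logic of the diagram can be sketched numerically (my own illustrative simulation, not from the slides): a common shock e makes x endogenous, OLS is biased, and the exactly identified IV estimator (Z'X)^{-1} Z'y recovers the true slope. All variable names and parameter values here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2024)
N = 50_000
z = rng.normal(size=N)               # instrument: moves x, unrelated to u
e = rng.normal(size=N)               # common shock creating endogeneity
x = 0.8 * z + e + rng.normal(size=N)
u = e + rng.normal(size=N)           # u correlated with x through e
y = 1.0 + 0.5 * x + u                # assumed true slope 0.5

X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # biased upward: cov(x,u) > 0
b_iv  = np.linalg.solve(Z.T @ X, Z.T @ y)   # exactly identified IV: (Z'X)^{-1} Z'y
print(b_ols[1], b_iv[1])                     # OLS slope well above 0.5; IV near 0.5
```

Here the OLS slope converges to 0.5 + cov(x,u)/var(x), roughly 0.88 under this design, while the IV slope converges to the true 0.5.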
Instrumental variables methods: Choice of instruments

To deal with the problem of endogeneity in a supply-demand system, a candidate z will affect (e.g.) the quantity supplied of the good, but not directly impact the demand for the good. An example for an agricultural commodity might be temperature or rainfall: clearly exogenous to the market, but likely to be important in the production process.

For the public health example, we might use per capita income in each city as an instrument or z variable. It is likely to influence public health expenditure, as cities with a larger tax base might be expected to spend more on all services, and it will not be directly affected by the unobserved factors in the primary relationship.
Instrumental variables methods: Choice of instruments

For the problem of measurement error in a regressor, a common choice of instrument (z) is the rank of the mismeasured variable. Although the mismeasured variable contains an element of measurement error, if that error is relatively small, it will not alter the rank of the observation in the distribution.

In the case of latent factors, such as a regression of log earnings on years of schooling, we might be able to find an instrument (z) in the form of the mother's or father's years of schooling. More educated parents are more likely to produce more educated children; at the same time, the unobserved factors influencing the individual's educational attainment cannot affect prior events, such as their parents' schooling.

Example: OLS vs IV
Instrumental variables methods: But why should we not always use IV?

- It may be difficult to find variables that can serve as valid instruments. Many variables that have an effect on included endogenous variables also have a direct effect on the dependent variable.
- IV estimators are innately biased, and their finite-sample properties are often problematic. Thus, most of the justification for the use of IV is asymptotic. Performance in small samples may be poor.
- The precision of IV estimates is lower than that of OLS estimates (least squares is just that). In the presence of weak instruments (excluded instruments only weakly correlated with included endogenous regressors), the loss of precision will be severe, and IV estimates may be no improvement over OLS.

This suggests we need a method to determine whether a particular regressor must be treated as endogenous.
Instrumental variables methods: But why should we not always use IV?

Instruments may be weak: satisfactorily exogenous, but only weakly correlated with the endogenous regressors. As Bound, Jaeger, and Baker (NBER TWP 1993, JASA 1995) argue, the cure can be worse than the disease.

Staiger and Stock (Econometrica, 1997) formalized the definition of weak instruments. Many researchers conclude from their work that if the first-stage F statistic exceeds 10, their instruments are sufficiently strong. This criterion does not necessarily establish the absence of a weak instruments problem. Stock and Yogo (Camb. U. Press festschrift, 2005) further explore the issue and provide useful rules of thumb for evaluating the weakness of instruments. ivreg2 and Stata 10's ivregress now present Stock-Yogo tabulations based on the Cragg-Donald statistic.
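To make the first-stage F statistic concrete, here is a minimal sketch (mine, not from the slides) of how it is formed for one endogenous regressor: regress x on the excluded instruments and test their joint significance. The simulated design and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
z1 = rng.normal(size=N)
z2 = rng.normal(size=N)
x = 0.4 * z1 + 0.4 * z2 + rng.normal(size=N)    # assumed first-stage relation

# First stage: regress x on a constant plus the excluded instruments z1, z2,
# then form the F statistic for H0: both instrument coefficients are zero.
Zfull = np.column_stack([np.ones(N), z1, z2])
b = np.linalg.solve(Zfull.T @ Zfull, Zfull.T @ x)
rss_u = np.sum((x - Zfull @ b) ** 2)             # unrestricted RSS
rss_r = np.sum((x - x.mean()) ** 2)              # restricted: constant only
q, dof = 2, N - 3                                # 2 restrictions, N-3 df
F = ((rss_r - rss_u) / q) / (rss_u / dof)
print(F)
```

With instruments this strong the F statistic is far above the rule-of-thumb threshold of 10; with weak instruments (coefficients near zero) it would fall below it, signaling trouble even though the estimator is formally identified.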
The IV-GMM estimator: IV estimation as a GMM problem

Before discussing further the motivation for various weak instrument diagnostics, we define the setting for IV estimation as a Generalized Method of Moments (GMM) optimization problem. Economists consider GMM to be the invention of Lars Hansen in his 1982 Econometrica paper, but as Alastair Hall points out in his 2005 book, the method has its antecedents in Karl Pearson's Method of Moments [MM] (1895) and Neyman and Egon Pearson's minimum Chi-squared estimator [MCE] (1928). Their MCE approach overcomes the difficulty with MM estimators when there are more moment conditions than parameters to be estimated. This was recognized by Ferguson (Ann. Math. Stat. 1958) for the case of i.i.d. errors, but his work had no impact on the econometric literature.
The IV-GMM estimator

We consider the model

$$y = X\beta + u, \quad u \sim (0, \Omega)$$

with X being N x k, and define a matrix of instruments Z, N x l, where l >= k. This is the setting for the Generalized Method of Moments IV (IV-GMM) estimator. The l instruments give rise to a set of l moments:

$$g_i(\beta) = Z_i' u_i = Z_i'(y_i - x_i \beta), \quad i = 1, \ldots, N$$

where each g_i is an l-vector. The method of moments approach considers each of the l moment equations as a sample moment, which we may estimate by averaging over N:

$$\bar{g}(\beta) = \frac{1}{N} \sum_{i=1}^{N} z_i (y_i - x_i \beta) = \frac{1}{N} Z'u$$

The GMM approach chooses an estimate that solves $\bar{g}(\hat{\beta}_{GMM}) = 0$.
The IV-GMM estimator: Exact identification and 2SLS

If l = k, the equation to be estimated is said to be exactly identified by the order condition for identification: that is, there are as many excluded instruments as included right-hand endogenous variables. The method of moments problem is then k equations in k unknowns, and a unique solution exists, equivalent to the standard IV estimator:

$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$$

In the case of overidentification (l > k) we may define a set of k instruments:

$$\hat{X} = Z(Z'Z)^{-1}Z'X = P_Z X$$

which gives rise to the two-stage least squares (2SLS) estimator

$$\hat{\beta}_{2SLS} = (\hat{X}'X)^{-1} \hat{X}'y = (X'P_Z X)^{-1} X'P_Z y$$

which, despite its name, is computed by this single matrix equation.
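The "single matrix equation" claim can be verified directly in a small sketch (mine, not from the slides): the one-shot formula and a literal two-stage computation give the same coefficients. The simulated design is an assumption:

```python
import numpy as np

rng = np.random.default_rng(99)
N = 5000
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # l = 3 instruments
e = rng.normal(size=N)
x = Z[:, 1] + 0.5 * Z[:, 2] + e + rng.normal(size=N)
u = e + rng.normal(size=N)
y = 1.0 + 0.5 * x + u                                        # assumed true slope 0.5
X = np.column_stack([np.ones(N), x])                         # k = 2 < l: overidentified

# Fitted instruments X-hat = P_Z X, computed without forming the N x N matrix P_Z
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# One-shot 2SLS: (X' P_Z X)^{-1} X' P_Z y, using X' P_Z = Xhat'
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

# The literal "two stages": regress X on Z (already done), then y on X-hat
b_two_stage = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)

print(b_2sls, b_two_stage)   # identical up to floating-point error
```

The two agree because P_Z is idempotent, so X'P_Z X = (P_Z X)'(P_Z X).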
The IV-GMM estimator: The IV-GMM approach

In the 2SLS method with overidentification, the l available instruments are "boiled down" to the k needed by defining the P_Z matrix. In the IV-GMM approach, that reduction is not necessary. All l instruments are used in the estimator. Furthermore, a weighting matrix is employed so that we may choose $\hat{\beta}_{GMM}$ so that the elements of $\bar{g}(\hat{\beta}_{GMM})$ are as close to zero as possible. With l > k, not all l moment conditions can be exactly satisfied, so a criterion function that weights them appropriately is used to improve the efficiency of the estimator. The GMM estimator minimizes the criterion

$$J(\hat{\beta}_{GMM}) = N \, \bar{g}(\hat{\beta}_{GMM})' \, W \, \bar{g}(\hat{\beta}_{GMM})$$

where W is an l x l symmetric weighting matrix.
The IV-GMM estimator: The GMM weighting matrix

Solving the set of first-order conditions, we derive the IV-GMM estimator of an overidentified equation:

$$\hat{\beta}_{GMM} = (X'ZWZ'X)^{-1} X'ZWZ'y$$

which will be identical for all W matrices that differ only by a factor of proportionality. The optimal weighting matrix, as shown by Hansen (1982), chooses W = S^{-1}, where S is the covariance matrix of the moment conditions, to produce the most efficient estimator:

$$S = E[Z'uu'Z] = \lim_{N \to \infty} N^{-1} [Z'\Omega Z]$$

With a consistent estimator of S derived from 2SLS residuals, we define the feasible IV-GMM estimator as

$$\hat{\beta}_{FEGMM} = (X'Z\hat{S}^{-1}Z'X)^{-1} X'Z\hat{S}^{-1}Z'y$$

where FEGMM refers to the feasible efficient GMM estimator.
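The two-step logic behind the FEGMM formula can be sketched as follows (my own illustration, not from the slides): first obtain 2SLS residuals, then build the heteroskedasticity-robust estimate of S and re-solve with W = S-hat^{-1}. The heteroskedastic design below is an assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5000
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
e = rng.normal(size=N)
x = Z[:, 1] + Z[:, 2] + e + rng.normal(size=N)
u = (e + rng.normal(size=N)) * (1 + 0.5 * np.abs(Z[:, 1]))  # heteroskedastic error
y = 1.0 + 0.5 * x + u                                        # assumed true slope 0.5
X = np.column_stack([np.ones(N), x])

# Step 1: 2SLS, to obtain consistent residuals u-hat
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
uhat = y - X @ b_2sls

# Step 2: robust S-hat = (1/N) sum_i u_i^2 z_i z_i', then W = S-hat^{-1}
S = (Z * uhat[:, None] ** 2).T @ Z / N
W = np.linalg.inv(S)
A = X.T @ Z @ W @ Z.T @ X
b_fegmm = np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)   # feasible efficient GMM
print(b_fegmm)
```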
The IV-GMM estimator: IV-GMM and the distribution of u

The derivation makes no mention of the form of Ω, the variance-covariance matrix (vce) of the error process u. If the errors satisfy all classical assumptions, that is, they are i.i.d. with Ω = σ²_u I_N, then S is proportional to Z'Z/N, the optimal weighting matrix is proportional to (Z'Z)^{-1}, and the IV-GMM estimator is merely the standard IV (or 2SLS) estimator.

If there is heteroskedasticity of unknown form, we usually compute robust standard errors in any Stata estimation command to derive a consistent estimate of the vce. In this context,

$$\hat{S} = \frac{1}{N} \sum_{i=1}^{N} \hat{u}_i^2 Z_i' Z_i$$

where û is the vector of residuals from any consistent estimator of β (e.g., the 2SLS residuals). For an overidentified equation, the IV-GMM estimates computed from this estimate of S will be more efficient than 2SLS estimates.
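The i.i.d. special case can be checked numerically (my own sketch, not from the slides): setting W proportional to (Z'Z)^{-1} in the GMM formula reproduces 2SLS exactly. The simulated design is an assumption:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 2000
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
e = rng.normal(size=N)
x = Z[:, 1] + Z[:, 2] + e + rng.normal(size=N)
y = 1.0 + 0.5 * x + e + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

# 2SLS via the single matrix equation
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

# GMM with W = (Z'Z)^{-1}, the optimal weighting matrix under i.i.d. errors
W = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ W @ Z.T @ X
b_gmm = np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

print(np.max(np.abs(b_gmm - b_2sls)))   # zero up to floating-point error
```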
The IV-GMM estimator: IV-GMM and the distribution of u

We must distinguish the concept of IV/2SLS estimation with robust standard errors from the concept of estimating the same equation with IV-GMM, allowing for arbitrary heteroskedasticity. Compare an overidentified regression model estimated (a) with IV and classical standard errors and (b) with IV and robust standard errors. Model (b) will produce the same point estimates, but different standard errors in the presence of heteroskedastic errors.

However, if we reestimate that overidentified model using the two-step GMM estimator, we will get different point estimates, because we are solving a different optimization problem: one in the l-space of the instruments (and moment conditions) rather than the k-space of the regressors, with l > k. We will also get different standard errors, and in general smaller standard errors, as the IV-GMM estimator is more efficient. This does not imply, however, that summary measures of fit will improve.

Example: IV and IV(robust) vs IV-GMM
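The contrast can be sketched in ivreg2 syntax. The Griliches (1976) wage equation is an assumed example here (lw on schooling and job characteristics, iq endogenous, and med, kww, age as excluded instruments); the variable names are illustrative, not necessarily those used in the talk.

```stata
* (a) 2SLS with classical standard errors
ivreg2 lw s expr tenure rns smsa (iq = med kww age)
* (b) same point estimates, heteroskedasticity-robust standard errors
ivreg2 lw s expr tenure rns smsa (iq = med kww age), robust
* two-step efficient IV-GMM: different point estimates, generally smaller SEs
ivreg2 lw s expr tenure rns smsa (iq = med kww age), gmm2s robust
```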
The IV-GMM estimator: IV-GMM cluster-robust estimates

If the errors are considered to exhibit arbitrary intra-cluster correlation in a dataset with M clusters, we may derive a cluster-robust IV-GMM estimator using

Ŝ = Σ_{j=1}^{M} û_j û_j′, where û_j = (y_j − x_j β̂) X′Z (Z′Z)⁻¹ z_j

The IV-GMM estimates employing this estimate of S will be robust to both arbitrary heteroskedasticity and intra-cluster correlation, equivalent to those generated by Stata's cluster(varname) option. For an overidentified equation, IV-GMM cluster-robust estimates will be more efficient than 2SLS estimates.
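A cluster-robust counterpart might be specified as follows; year is a hypothetical cluster variable chosen purely for illustration, and the other variable names are the assumed Griliches wage-equation names.

```stata
* cluster-robust two-step IV-GMM
ivreg2 lw s expr tenure rns smsa (iq = med kww age), gmm2s cluster(year)
```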
The IV-GMM estimator: IV-GMM HAC estimates

The IV-GMM approach may also be used to generate HAC standard errors: those robust to arbitrary heteroskedasticity and autocorrelation. Although the best-known HAC approach in econometrics is that of Newey and West, using the Bartlett kernel (per Stata's newey), that is only one choice of HAC estimator that may be applied to an IV-GMM problem. ivreg2 and Stata 10's ivregress provide several choices of kernel. For some kernels, the kernel bandwidth (roughly, the number of lags employed) may be chosen automatically in both commands. In ivreg2 (but not in ivregress) you may also specify a vce that is robust to autocorrelation while maintaining the assumption of conditional homoskedasticity: that is, AC without the H.
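A sketch for a time-series application; y, x1, x2, z1, z2 and time are placeholder names, not variables from the talk's examples.

```stata
tsset time
* HAC IV-GMM with the Bartlett kernel and automatic bandwidth selection
ivreg2 y x1 (x2 = z1 z2), gmm2s robust bw(auto) kernel(bartlett)
* AC without the H: autocorrelation-robust but conditionally homoskedastic
ivreg2 y x1 (x2 = z1 z2), gmm2s bw(4)
```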
The IV-GMM estimator: Implementation in Stata

The estimators we have discussed are available from Baum, Schaffer and Stillman's ivreg2 package (ssc describe ivreg2). The ivreg2 command has the same basic syntax as Stata's older ivreg command:

ivreg2 depvar [varlist1] (varlist2 = instlist) [if] [in] [, options]

The l variables in varlist1 and instlist comprise Z, the matrix of instruments. The k variables in varlist1 and varlist2 comprise X. By default both matrices include a units vector.
The IV-GMM estimator: ivreg2 options

By default ivreg2 estimates the IV estimator, or the 2SLS estimator if l > k. If the gmm2s option is specified in conjunction with robust, cluster() or bw(), it estimates the IV-GMM estimator. With the robust option, the vce is heteroskedasticity-robust. With the cluster(varname) option, the vce is cluster-robust. With the robust and bw( ) options, the vce is HAC with the default Bartlett kernel, i.e., Newey–West. Other kernel( ) choices lead to alternative HAC estimators. In ivreg2, both the robust and bw( ) options must be specified for HAC; estimates produced with bw( ) alone are robust to arbitrary autocorrelation but assume homoskedasticity.
Tests of overidentifying restrictions

If and only if an equation is overidentified, we may test whether the excluded instruments are appropriately independent of the error process. That test should always be performed when it is possible to do so, as it allows us to evaluate the validity of the instruments. A test of overidentifying restrictions regresses the residuals from an IV or 2SLS regression on all instruments in Z. Under the null hypothesis that all instruments are uncorrelated with u, the test statistic has a large-sample χ²(r) distribution, where r is the number of overidentifying restrictions. Under the assumption of i.i.d. errors, this is known as the Sargan test, and it is routinely produced by ivreg2 for IV and 2SLS estimates. It can also be calculated after ivreg estimation with the overid command, which is part of the ivreg2 suite. After ivregress, the command estat overid provides the test.
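In command form, continuing with the assumed Griliches wage specification:

```stata
* Sargan statistic reported routinely by ivreg2 under i.i.d. errors
ivreg2 lw s expr tenure rns smsa (iq = med kww age)
* after official ivregress, the test is a postestimation command
ivregress 2sls lw s expr tenure rns smsa (iq = med kww age)
estat overid
```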
Tests of overidentifying restrictions

If we have used IV-GMM estimation in ivreg2, the test of overidentifying restrictions becomes J: the GMM criterion function. Although J will be identically zero for any exactly-identified equation, it will be positive for an overidentified equation. If it is too large, doubt is cast on the satisfaction of the moment conditions underlying GMM. The test in this context is known as the Hansen test or J test, and it is routinely calculated by ivreg2 when the gmm2s option is employed. The Sargan–Hansen test of overidentifying restrictions should be performed routinely in any overidentified model estimated with instrumental variables techniques. Instrumental variables techniques are powerful, but if a strong rejection of the null hypothesis of the Sargan–Hansen test is encountered, you should strongly doubt the validity of the estimates.
Tests of overidentifying restrictions

For instance, let's rerun the last IV-GMM model we estimated and focus on the test of overidentifying restrictions provided by the Hansen J statistic. The model is overidentified by two degrees of freedom, as there is one endogenous regressor and three excluded instruments. We see that the J statistic strongly rejects its null, casting doubt on the quality of these estimates. Let's reestimate the model excluding age from the instrument list and see what happens. We will see that the sign and significance of the key endogenous regressor change as we respecify the instrument list.

Example: Tests of overidentifying restrictions
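The respecification described above might run as follows, again assuming the Griliches wage variables (one endogenous regressor, three excluded instruments, so J has two d.f.):

```stata
* Hansen J reported with two degrees of freedom
ivreg2 lw s expr tenure rns smsa (iq = med kww age), gmm2s robust
* drop age from the excluded instruments and compare
ivreg2 lw s expr tenure rns smsa (iq = med kww), gmm2s robust
```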
Tests of overidentifying restrictions: Testing a subset of overidentifying restrictions

We may be quite confident of some instruments' independence from u but concerned about others. In that case a GMM distance or C test may be used. The orthog( ) option of ivreg2 tests whether a subset of the model's overidentifying restrictions appears to be satisfied. This is carried out by calculating two Sargan–Hansen statistics: one for the full model and a second for the model in which the listed variables are (a) considered endogenous, if included regressors, or (b) dropped, if excluded instruments. In case (a), the model must still satisfy the order condition for identification. The difference of the two Sargan–Hansen statistics, often termed the GMM distance or C statistic, will be distributed χ² under the null hypothesis that the specified orthogonality conditions are satisfied, with d.f. equal to the number of those conditions.

Example: C (GMM distance) test of a subset of overidentifying restrictions
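A sketch of the orthog( ) form of the C test; the choice of kww and age as the suspect subset is hypothetical, and dropping them must leave the equation identified (here med alone exactly identifies it).

```stata
* C test of the orthogonality of kww and age
ivreg2 lw s expr tenure rns smsa (iq = med kww age), gmm2s robust orthog(kww age)
```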
Tests of overidentifying restrictions: Testing a subset of overidentifying restrictions

A variant on this strategy is implemented by the endog( ) option of ivreg2, with which one or more variables considered endogenous can be tested for exogeneity. The C test in this case considers whether the null hypothesis of their exogeneity is supported by the data. If all endogenous regressors are included in the endog( ) option, the test is essentially a test of whether IV methods are required to estimate the equation: if OLS estimates of the equation are consistent, they should be preferred. In this context, the test is equivalent to a Hausman test comparing IV and OLS estimates, as implemented by Stata's hausman command with the sigmamore option. Using ivreg2, you need not estimate and store both models to generate the test's verdict.
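With the assumed wage specification, testing the lone endogenous regressor takes one option:

```stata
* C test of the exogeneity of iq; a rejection supports treating it as endogenous
ivreg2 lw s expr tenure rns smsa (iq = med kww age), endog(iq)
```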
Testing for weak instruments: The weak instruments problem

Instrumental variables methods rely on two assumptions: the excluded instruments are distributed independently of the error process, and they are sufficiently correlated with the included endogenous regressors. Tests of overidentifying restrictions address the first assumption, although we should note that a rejection of their null may indicate that the exclusion restrictions for those instruments are inappropriate: that is, some of the instruments have been improperly excluded from the regression model's specification.
Testing for weak instruments

The specification of an instrumental variables model asserts that the excluded instruments affect the dependent variable only indirectly, through their correlations with the included endogenous variables. If an excluded instrument exerts both direct and indirect influences on the dependent variable, the exclusion restriction should be rejected. This can be readily tested by including the variable as a regressor. In our earlier example we saw that including age in the excluded instruments list caused a rejection of the J test. We had assumed that age could be treated as excluded from the model. Is that assumption warranted?

Example: Test of exclusion of an instrument
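The exclusion of age can be probed directly by moving it from the instrument list into the regressors (again assuming the Griliches wage variables):

```stata
* age now an included exogenous regressor rather than an excluded instrument
ivreg2 lw s expr tenure rns smsa age (iq = med kww), gmm2s robust
```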
Testing for weak instruments

To test the second assumption, that the excluded instruments are sufficiently correlated with the included endogenous regressors, we should consider the goodness of fit of the first-stage regressions relating each endogenous regressor to the entire set of instruments. It is important to understand that the theory of single-equation ("limited information") IV estimation requires that all columns of X are conceptually regressed on all columns of Z in the calculation of the estimates. We cannot meaningfully speak of "this variable is an instrument for that regressor" or somehow restrict which instruments enter which first-stage regressions. Stata's ivregress and ivreg2 will not let you do that, because such restrictions only make sense in the context of estimating an entire system of equations by full-information methods (for instance, with reg3).
Testing for weak instruments

The first and ffirst options of ivreg2 present several useful diagnostics that assess the first-stage regressions. If there is a single endogenous regressor, these issues are simplified, as the instruments either explain a reasonable fraction of that regressor's variability or they do not. With multiple endogenous regressors, the diagnostics are more complicated, as each instrument is being called upon to play a role in each first-stage regression. With sufficiently weak instruments, the asymptotic identification status of the equation is called into question. An equation identified by the order and rank conditions in a finite sample may still be effectively unidentified.
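Both options attach to an ordinary ivreg2 call; variable names here follow the assumed Griliches example.

```stata
* full first-stage regressions plus diagnostics
ivreg2 lw s expr tenure rns smsa (iq = med kww age), first
* first-stage diagnostics only, without the full regression output
ivreg2 lw s expr tenure rns smsa (iq = med kww age), ffirst
```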
Testing for weak instruments

As Staiger and Stock (Econometrica, 1997) show, the weak instruments problem can arise even when the first-stage t- and F-tests are significant at conventional levels in a large sample. In the worst case, the bias of the IV estimator is the same as that of OLS, IV becomes inconsistent, and instrumenting only aggravates the problem.
Testing for weak instruments

Beyond informal rule-of-thumb diagnostics such as F > 10, ivreg2 computes several statistics that can be used to critically evaluate the strength of instruments. We can write the first-stage regressions as

X = Z Π + v

With X₁ the endogenous regressors, Z₁ the excluded instruments and Z₂ the included instruments, this can be partitioned as

X₁ = [Z₁ Z₂] [Π₁₁′ Π₁₂′]′ + v₁

that is, X₁ = Z₁ Π₁₁ + Z₂ Π₁₂ + v₁. The rank condition for identification states that the L₁ × K₁ matrix Π₁₁ must be of full column rank.
Testing for weak instruments: The Anderson canonical correlation statistic

We do not observe the true Π₁₁, so we must replace it with an estimate. Anderson's (John Wiley, 1984) approach to testing the rank of this matrix (or that of the full Π matrix) considers the canonical correlations of the X and Z matrices. If the equation is to be identified, all K of the canonical correlations will be significantly different from zero. The squared canonical correlations can be expressed as eigenvalues of a matrix. Anderson's CC test considers the null hypothesis that the minimum canonical correlation is zero. Under the null, the test statistic is distributed χ² with (L − K + 1) d.f., so it may be calculated even for an exactly-identified equation. Failure to reject the null suggests the equation is unidentified. ivreg2 routinely reports this Lagrange multiplier (LM) statistic.
Testing for weak instruments: The Cragg–Donald statistic

The Cragg–Donald (C–D) statistic is a closely related test of the rank of a matrix. While the Anderson CC test is an LR test, the C–D test is a Wald statistic with the same asymptotic distribution. The C–D statistic plays an important role in Stock and Yogo's work (see below). Both the Anderson and C–D tests are reported by ivreg2 with the first option. Recent research by Kleibergen and Paap (K–P) (Journal of Econometrics, 2006) has developed a robust version of a test of the rank of a matrix: e.g., testing for underidentification. The statistic has been implemented by Kleibergen and Schaffer as the command ranktest. If non-i.i.d. errors are assumed, the ivreg2 output contains the K–P rk statistic in place of the Anderson canonical correlation statistic as a test of underidentification.
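In practice, ranktest is installed once from SSC, and a robust vce triggers the K–P rk statistic automatically; the specification below is the assumed wage-equation example.

```stata
ssc install ranktest
* with a robust vce, the reported underidentification test is the K-P rk statistic
ivreg2 lw s expr tenure rns smsa (iq = med kww age), robust
```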
Testing for i.i.d. errors in an IV context

In the context of an equation estimated with instrumental variables, the standard diagnostic tests for heteroskedasticity and autocorrelation are generally not valid. In the case of heteroskedasticity, Pagan and Hall (Econometric Reviews, 1983) showed that the Breusch–Pagan or Cook–Weisberg tests (estat hettest) are generally not usable in an IV setting. They propose a test that is appropriate in IV estimation when heteroskedasticity may be present in more than one structural equation. Mark Schaffer's ivhettest, part of the ivreg2 suite, performs the Pagan–Hall test under a variety of assumptions about the indicator variables. It will also reproduce the Breusch–Pagan test if applied in an OLS context.
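The routine runs as a postestimation command after ivreg2; the specification is again the assumed Griliches example.

```stata
ivreg2 lw s expr tenure rns smsa (iq = med kww age)
* Pagan-Hall test with the default choice of indicator variables
ivhettest
```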
Testing for i.i.d. errors in an IV context

By the same token, the Breusch–Godfrey statistic used in the OLS context (estat bgodfrey) is generally not appropriate in the presence of endogenous regressors, overlapping data or conditional heteroskedasticity of the error process. Cumby and Huizinga (Econometrica, 1992) proposed a generalization of the B–G statistic that handles each of these cases. Their test is actually more general in another way: its null hypothesis is that the regression error is a moving average of known order q ≥ 0, against the general alternative that autocorrelations of the regression error are nonzero at lags greater than q. In that context, it can be used to test that autocorrelations beyond any q are zero. Like the B–G test, it can test multiple lag orders. The C–H test is available as Baum and Schaffer's ivactest routine, part of the ivreg2 suite.
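A sketch in a time-series setting; y, x1, x2, z1, z2 and time are placeholder names, and the data must be tsset first.

```stata
tsset time
ivreg2 y x1 (x2 = z1 z2)
* Cumby-Huizinga test at its default lag order
ivactest
```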
Panel data IV estimation

The features of ivreg2 are also available in the routine xtivreg2, a wrapper for ivreg2. This routine of Mark Schaffer's extends Stata's xtivreg, supporting the fixed-effects (fe) and first-difference (fd) estimators. The xtivreg2 routine is available from SSC. Just as ivreg2 may be used to conduct a Hausman test of IV vs. OLS, Schaffer and Stillman's xtoverid routine may be used to conduct a Hausman test of random vs. fixed effects after xtreg, re and xtivreg, re. This routine can also calculate tests of overidentifying restrictions after those two commands, as well as after xthtaylor. The xtoverid routine is also available from SSC.
79 LIML and GMM-CUE estimation

The OLS and IV estimators are special cases of k-class estimators: OLS with k = 0 and IV with k = 1. Limited-information maximum likelihood (LIML) is another member of this class, with k chosen optimally in the estimation process.

Like any ML estimator, LIML is invariant to normalization. In an equation with two endogenous variables, it does not matter whether you specify y_1 or y_2 as the left-hand variable. Another virtue of the LIML estimator is that it has been found to be more resistant to weak-instrument problems than the IV estimator. On the down side, it makes the distributional assumption of normally distributed (and i.i.d.) errors.

ivreg2 produces LIML estimates with the liml option, and liml is a subcommand for Stata 10's ivregress.
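The k-class family can be written in one formula and checked in a few lines. The sketch below (Python, with an invented data-generating process; not Stata's implementation) verifies that k = 0 reproduces OLS and k = 1 reproduces 2SLS/IV; LIML would choose k >= 1 as the smallest root of a characteristic equation, which is omitted here for brevity:

```python
import numpy as np

def k_class(y, X, Z, k):
    """k-class estimator: beta(k) = (X'(I - k*Mz)X)^-1 X'(I - k*Mz)y,
    where Mz = I - Z(Z'Z)^-1 Z' annihilates the instruments.
    k = 0 gives OLS, k = 1 gives 2SLS/IV."""
    n = len(y)
    Mz = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = np.eye(n) - k * Mz
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)

rng = np.random.default_rng(1)
n = 400
Z = rng.standard_normal((n, 2))            # two instruments
u = rng.standard_normal(n)
x = Z @ np.array([1.0, 0.5]) + 0.8 * u     # endogenous regressor
y = 1.5 * x + u                            # true coefficient is 1.5
X = x[:, None]

b_ols = k_class(y, X, Z, 0.0)              # biased away from 1.5
b_iv = k_class(y, X, Z, 1.0)               # consistent for 1.5
```

Comparing b_ols against the direct OLS formula (X'X)^-1 X'y confirms the equivalence at k = 0.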
80 LIML and GMM-CUE estimation

If the i.i.d. assumption of LIML is not reasonable, you may use the GMM equivalent: the continuously updated GMM estimator, or CUE estimator. In ivreg2, the cue option, combined with the robust, cluster() and/or bw() options, specifies that non-i.i.d. errors are to be modeled. GMM-CUE requires numerical optimization via Stata's ml command, and may require many iterations to converge.

ivregress provides an iterated GMM estimator, which is not the same estimator as GMM-CUE.

Example: LIML and GMM-CUE
81 LIML and GMM-CUE estimation: The GMM-CUE estimator

To understand what GMM-CUE is all about, recall that IV-GMM is a two-step estimator: we calculate consistent parameter estimates, yielding residuals, which give us the estimated \hat{S}^{-1} matrix that appears in the feasible estimator presented earlier. The second-step minimization problem for the FEGMM estimator is

\hat\beta_{2SEGMM} \equiv \arg\min_{\hat\beta} \; J(\hat\beta) = n \, \bar{g}(\hat\beta)' \, \big(\hat{S}(\tilde\beta)\big)^{-1} \, \bar{g}(\hat\beta)

in which the weighting matrix \big(\hat{S}(\tilde\beta)\big)^{-1}, formed from the first-stage residuals, is treated as a matrix of constants. The second-stage residuals in the orthogonality conditions \bar{g} are chosen in the minimization process.
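The two-step structure is easy to see in code. Here is a Python sketch of feasible efficient IV-GMM (illustrative only, not ivreg2's implementation; the simulated data and coefficient values are invented), exploiting the closed-form solution that exists when the moment conditions are linear in beta:

```python
import numpy as np

def gmm_two_step(y, X, Z, robust=True):
    """Two-step feasible efficient IV-GMM.
    Step 1: 2SLS gives a consistent first-step beta and residuals.
    Step 2: form S-hat from those residuals, hold it FIXED, and
    minimize J(b) = n * gbar(b)' S^-1 gbar(b); for linear moments
    gbar(b) = Z'(y - Xb)/n this has the closed form below."""
    n = len(y)
    # step 1: 2SLS
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    b1 = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
    e = y - X @ b1
    # step 2: weighting matrix from first-step residuals
    if robust:
        S = (Z * e[:, None]**2).T @ Z / n   # Z' diag(e^2) Z / n
    else:
        S = (e @ e / n) * (Z.T @ Z / n)     # sigma^2 * Qzz
    W = np.linalg.inv(S)
    ZX, Zy = Z.T @ X, Z.T @ y
    b2 = np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)
    gbar = Z.T @ (y - X @ b2) / n
    J = n * gbar @ W @ gbar                 # Hansen J statistic
    return b2, J

rng = np.random.default_rng(2)
n = 500
Z = rng.standard_normal((n, 2))             # overidentified: 2 instruments
u = rng.standard_normal(n)
x = Z @ np.array([1.0, 0.5]) + 0.8 * u
X, y = x[:, None], 1.5 * x + u
b2, J = gmm_two_step(y, X, Z)
```

The key point for the discussion that follows: in step 2 the matrix S is evaluated once, at the first-step estimates, and treated as constant.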
82 LIML and GMM-CUE estimation: The GMM-CUE estimator

In contrast, the GMM-CUE estimator (Hansen, Heaton and Yaron (JBES, 1996)) is defined as

\hat\beta_{CUE} \equiv \arg\min_{\hat\beta} \; J(\hat\beta) = n \, \bar{g}(\hat\beta)' \, \big(\hat{S}(\hat\beta)\big)^{-1} \, \bar{g}(\hat\beta)

where the weighting matrix \big(\hat{S}(\hat\beta)\big)^{-1} is a function of the \beta being estimated. The residuals in S are the same residuals that appear in \bar{g}, and estimation of S is done simultaneously with the estimation of \beta. This makes the estimation problem nonlinear, requiring numerical optimization. The current version of ivreg2 implements this estimator using Stata's maximum likelihood (ml) facilities, in ado-file code.
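The contrast with two-step GMM can be sketched numerically: because S(beta) moves with the candidate beta, the objective must be minimized by a numerical optimizer even though the moments are linear. The following Python illustration (invented data; it stands in for, but is not, the Mata implementation discussed below) uses scipy's Nelder-Mead minimizer:

```python
import numpy as np
from scipy.optimize import minimize

def omega(beta, y, X, Z):
    """S(beta): heteroskedasticity-robust moment covariance,
    evaluated at the candidate beta (not at a first-step estimate)."""
    n = len(y)
    e = y - X @ np.atleast_1d(beta)
    return (Z * e[:, None]**2).T @ Z / n

def cue_objective(beta, y, X, Z):
    """J(b) = n * gbar(b)' S(b)^-1 gbar(b).  Unlike two-step GMM,
    the weighting matrix is recomputed at every trial b, making the
    problem nonlinear even with linear moment conditions."""
    n = len(y)
    e = y - X @ np.atleast_1d(beta)
    gbar = Z.T @ e / n
    return n * gbar @ np.linalg.solve(omega(beta, y, X, Z), gbar)

rng = np.random.default_rng(3)
n = 500
Z = rng.standard_normal((n, 2))
u = rng.standard_normal(n)
x = Z @ np.array([1.0, 0.5]) + 0.8 * u
X, y = x[:, None], 1.5 * x + u              # true coefficient is 1.5

res = minimize(cue_objective, x0=np.zeros(1), args=(y, X, Z),
               method="Nelder-Mead")
b_cue = float(res.x[0])
```

The minimizer drives J down from its (large) value at beta = 0 to a value near the chi-squared overidentification statistic, with b_cue close to the true coefficient.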
83 LIML and GMM-CUE estimation: The GMM-CUE estimator

We can perform this numerical optimization much more efficiently in Stata 10 by making use of Mata's suite of optimize() functions. Mark Schaffer and I have implemented GMM-CUE using optimize() for two cases of interest: those of i.i.d. errors and arbitrary heteroskedasticity (robust). First, we have the almost trivial routine defining the objective function:

void m_mycuecrit(todo, beta, j, g, H)
{
    external Y, X, Z, e, omega, robustflag
    omega = m_myomega(beta, Y, X, Z, robustflag)
    W = invsym(omega)
    N = rows(Z)
    e = Y - X * beta'
    // calculate gbar = Z'e/N
    gbar = 1/N * quadcross(Z, e)
    j = N * gbar' * W * gbar
}
84 LIML and GMM-CUE estimation: The GMM-CUE estimator

This routine calls a quite brief function that computes the covariance matrix:

real matrix m_myomega(real rowvector beta, real colvector Y,
                      real matrix X, real matrix Z, string scalar robust)
{
    // calculate residuals from the coefficient estimates
    N = rows(Z)
    e = Y - X * beta'
    if (robust == "") {
        // compute classical, non-robust covariance matrix
        QZZ = 1/N * quadcross(Z, Z)
        sigma2 = 1/N * quadcross(e, e)
        omega = sigma2 * QZZ
    }
    else {
        // compute heteroskedasticity-consistent covariance matrix
        e2 = e:^2
        omega = 1/N * quadcross(Z, e2, Z)
    }
    _makesymmetric(omega)
    return (omega)
}
85 LIML and GMM-CUE estimation: The GMM-CUE estimator

Although the wrapper ado-file we have constructed (mygmmcue) only handles some of ivreg2's options for the specification of the covariance matrix, it supports all of the features of a Stata estimation command: the results are accessible after estimation, and standard post-estimation commands may be used.

Most importantly, the Mata-based routine runs from 12 to 20 times faster than the ml-based routine. This suggests that investment in Mata's optimize() facilities may pay sizable dividends for anyone with nonlinear optimization problems.

Example: Mata-based estimation of GMM-CUE
86 Shameless advert

A concise introduction to Mata's facilities and a number of cookbook examples of its use are provided in chapters of my book An Introduction to Stata Programming, to be published this summer by Stata Press.