
Lecture 9 Heteroskedasticity

In this chapter, we aim to answer the following questions:
1. What is the nature of heteroskedasticity?
2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?

9.1 The Nature of Heteroskedasticity

Homoskedasticity: the classical regression model assumes that the residuals ε_i are identically distributed with mean zero and equal variance σ², i.e., E(ε_i | X_i) = 0 and Var(ε_i | X_i) = σ², where X_i denotes {X_i2, ..., X_ik}, for i = 1, 2, ..., n. Because the variance is a measure of the dispersion of the observed values of the dependent variable y around the regression line β_1 + β_2 X_2 + ... + β_k X_k, homoskedasticity means that the dispersion is the same across all observations. In many situations, however, this assumption is false.

Example 1. Take a sample of household consumption expenditure and income. Since households with low incomes do not have much flexibility in spending, consumption patterns among such low-income households may not vary very much. On the other hand, rich families have

a great deal of flexibility in spending. Some might be large consumers; others might be large savers and investors in financial markets. This implies that actual consumption might be quite different from average consumption. Therefore, it is very likely that higher-income households have a larger dispersion around mean consumption than lower-income households. Such a situation is called heteroskedasticity.

Example 2. The annual salary and the number of years since earning the Ph.D. for 222 professors from seven universities (Example 8.1, Ramanathan (2002)). Look at the scatter diagram of log salary against years since Ph.D. [Figure 8.2]. The spread around an average straight-line relation is not uniform, which violates the usual assumption of homoskedasticity of the error terms.

Heteroskedasticity also arises when one uses grouped data rather than individual data, and it can occur in time series data as well.

Let's relax the assumption that the residual variance is constant across observations and assume heteroskedasticity instead: assume ε_i is a random variable with E(ε_i | X_i) = 0 and Var(ε_i | X_i) = E(ε_i² | X_i) = σ_i², for i = 1, ..., n. This implies that each observation has a different error variance.

ASSUMPTION A4: ε_i is a random variable with E(ε_i | X_i) = 0 and Var(ε_i | X_i) = E(ε_i² | X_i) = σ_i², for i = 1, ..., n. Thus,

    E[εε′] = σ²Ω = diag(σ_1², σ_2², ..., σ_n²)

It will sometimes be useful to write σ_i² = σ² ω_i. For convenience, we shall use the normalization

    tr(Ω) = Σ_{i=1}^n ω_i = n
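As a small numerical illustration of Assumption A4 and the normalization tr(Ω) = n, the following sketch (all variance values hypothetical) builds the diagonal matrix Ω from per-observation variances σ_i²:

```python
import numpy as np

# Hypothetical per-observation error variances sigma_i^2 (Assumption A4)
sigma2_i = np.array([0.5, 1.0, 1.5, 2.0, 5.0])
n = sigma2_i.size

# Write sigma_i^2 = sigma^2 * omega_i with the normalization tr(Omega) = n:
# choosing sigma^2 as the mean of the sigma_i^2 makes sum(omega_i) = n.
sigma2 = sigma2_i.mean()
omega = sigma2_i / sigma2
Omega = np.diag(omega)          # E[ee'] = sigma^2 * Omega under A4

print(np.trace(Omega))          # equals n by construction
```

The choice of σ² is a normalization, not a substantive assumption: any rescaling of σ² is absorbed by the ω_i.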

9.2 Consequences of Ignoring Heteroskedasticity

    y_i = β_1 + β_2 X_i2 + ... + β_k X_ik + ε_i,   where Var(ε_i | X_i) = σ_i² for i = 1, ..., n.

That is, the error variances differ across observations and are unknown. In the presence of heteroskedasticity, the OLS estimator b is still unbiased, consistent, and asymptotically normally distributed.

Effects on the properties of the OLS estimators

If one ignores heteroskedasticity and uses the OLS estimators to estimate the β's, the properties of unbiasedness and consistency are not violated. But the OLS estimator is no longer efficient: it is possible to find an alternative unbiased linear estimator with a lower variance than the OLS estimator.

Inefficiency of OLS estimators: consider

    y_i = β x_i + ε_i,   Var[ε_i] = σ² ω_i

Assume that y_i and x_i are measured as deviations from their means, so E[y_i] = E[x_i] = 0. Let x denote the column vector of n observations on x_i, and define

    Ω = diag(ω_1, ω_2, ..., ω_n)

The variances of the OLS and GLS estimators of β are

    Var[b] = σ² (x′x)⁻¹ x′Ωx (x′x)⁻¹ = σ² Σ_{i=1}^n x_i² ω_i / (Σ_{i=1}^n x_i²)²

    Var[β̂] = σ² [x′Ω⁻¹x]⁻¹ = σ² / Σ_{i=1}^n (x_i²/ω_i)

In the special case ω_i = x_i² (writing z_i = x_i²), we can show that

    Var[b]/Var[β̂] = [(1/n) Σ_{i=1}^n z_i²] / z̄² = [z̄² + (1/n) Σ_{i=1}^n (z_i − z̄)²] / z̄² = 1 + Var[x_i²]/(E[x_i²])² > 1

This shows that the gain in efficiency from GLS over OLS can be substantial.

Effects on the tests of hypotheses

The estimated variances and covariances of the OLS estimates of the β's are biased and inconsistent when heteroskedasticity is present but ignored [see Kmenta (1986)]. Therefore, the tests of hypotheses are no longer valid.

Effects on forecasting

Forecasts based on the OLS estimates will still be unbiased, but they are inefficient (because the estimates are inefficient).

PROPERTIES of OLS with Heteroskedastic Errors:
1. The estimates and forecasts based on OLS will still be unbiased and consistent.
2. The OLS estimates are no longer BLUE and will be inefficient. Forecasts will also be inefficient.
3. The estimated variances and covariances of the regression coefficients will be biased and inconsistent, and hence tests of hypotheses are invalid.

The Estimated Covariance Matrix of b

The conventionally estimated covariance matrix for the OLS estimator, σ²(X′X)⁻¹, is inappropriate; the appropriate matrix is σ²(X′X)⁻¹(X′ΩX)(X′X)⁻¹. The error that results from using the conventional estimator is shown below. We know

    s² = e′e/(n − K) = ε′Mε/(n − K)

where

    M = I − X(X′X)⁻¹X′

Thus, we can obtain

    s² = ε′ε/(n − K) − ε′X(X′X)⁻¹X′ε/(n − K)

Taking the expectation of the two parts separately,

    E[ε′ε/(n − K)] = tr(E[εε′])/(n − K) = nσ²/(n − K)

and

    E[ε′X(X′X)⁻¹X′ε/(n − K)] = tr(E[(X′X)⁻¹X′εε′X])/(n − K)
                             = tr[σ² (X′X/n)⁻¹ (X′ΩX/n)] / (n − K)

As n → ∞,

    E[ε′ε/(n − K)] → σ²   and   E[ε′X(X′X)⁻¹X′ε/(n − K)] → 0   if b is consistent.

Therefore, if b is consistent, then lim_{n→∞} E[s²] = σ². This implies that if plim b = β, then plim s² = σ².

The difference between the conventional estimator and the appropriate covariance matrix for b is

    Est.Var[b] − Var[b] = s²(X′X)⁻¹ − σ²(X′X)⁻¹(X′ΩX)(X′X)⁻¹

Estimating the Appropriate Covariance Matrix for OLS

    Σ = (1/n) σ² X′ΩX = (1/n) Σ_{i=1}^n σ_i² x_i x_i′

White (1980) shows that, under very general conditions, the matrix

    S_0 = (1/n) Σ_{i=1}^n e_i² x_i x_i′

where e_i denotes the ith least squares residual, is a consistent estimator of Σ. Therefore, the White estimator,

    Est.Var[b] = n (X′X)⁻¹ S_0 (X′X)⁻¹

can be used as an estimate of the true variance of the least squares estimator.

9.3 Testing for Heteroskedasticity

Scatter diagram of squared residuals

Before actually carrying out any formal tests of heteroskedasticity, it is useful to examine the model's residuals visually: plot the squares of the residuals obtained by applying OLS to the model (i.e., e_i², i = 1, ..., n) against a variable that is suspected to be the cause of heteroskedasticity. If the model has several explanatory variables, we can graph e_i² against each of these variables, or against ŷ_i, the fitted value of the dependent variable. This graphing technique is only suggestive of heteroskedasticity and is not a substitute for formal testing. [Gujarati (2003), Figure 10.7]
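The White estimator Est.Var[b] = n(X′X)⁻¹S_0(X′X)⁻¹ can be sketched directly in numpy. The data below are simulated (all names and parameter values hypothetical); the error standard deviation is made to grow with the regressor so the robust and conventional standard errors visibly differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 5.0, n)
X = np.column_stack([np.ones(n), x])      # design matrix with a constant
eps = rng.normal(0.0, x)                  # error s.d. grows with x: heteroskedastic
y = 2.0 + 3.0 * x + eps

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                     # OLS coefficients
e = y - X @ b                             # OLS residuals

# Conventional (homoskedastic) estimator: s^2 (X'X)^{-1}
s2 = e @ e / (n - X.shape[1])
V_conv = s2 * XtX_inv

# White estimator: n (X'X)^{-1} S0 (X'X)^{-1}, with S0 = (1/n) sum e_i^2 x_i x_i'
S0 = (X * e[:, None] ** 2).T @ X / n
V_white = n * XtX_inv @ S0 @ XtX_inv

print(np.sqrt(np.diag(V_conv)))           # conventional standard errors
print(np.sqrt(np.diag(V_white)))          # heteroskedasticity-robust standard errors
```

Only the standard errors change; the coefficient estimates b are the OLS estimates either way, consistent with Section 9.2.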

Goldfeld-Quandt Test

Goldfeld and Quandt (1965): a test based on the ratio of variances. The idea: if the error variances are equal across observations (i.e., homoskedastic), then the variance for one part of the sample will be the same as the variance for another part of the sample, so one can test for the equality of error variances using an F-test on the ratio of two variances. Divide the sample of observations into three parts, then discard the middle observations. Estimate the model for each of the two remaining sets of observations and compute the corresponding residual variances. Use an F-test to test for the equality of these two variances:

    F[n_1 − K, n_2 − K] = [e_1′e_1/(n_1 − K)] / [e_2′e_2/(n_2 − K)]

Formal steps for the Goldfeld-Quandt test:

Step 1: Identify a variable (Z) to which the error variance σ_i² is related. Suppose that σ_i² is suspected of being positively related to Z_i. Arrange the data set according to increasing values of Z_i. (Z_i could be one of the X's in the regression; for example, σ_i² = σ² x_i² for some variable x.)

Step 2: Divide the sample of n observations into the first n_1 and the last n_2, thus omitting the middle observations n_1 + 1 through n − n_2. The number of observations to be omitted is arbitrary and is usually between one-sixth and one-third of the sample. Note that n_1 and n_2 must be greater than the number of coefficients to be estimated.

Step 3: Estimate separate regressions for observations 1 through n_1 and for observations n − n_2 + 1 through n.

Step 4: Obtain the error sums of squares:

    SSR_1 = Σ_{i=1}^{n_1} e_i²   and   SSR_2 = Σ_{i=n−n_2+1}^{n} e_i²

Under H_0 (homoskedasticity), each SSR/σ² has a χ² distribution, and the ratio of two independent χ²-distributed random variables, each divided by its degrees of freedom, is F-distributed. Therefore, the GQ statistic is computed as follows:

Step 5: Compute

    GQ = σ̂_2²/σ̂_1² = [SSR_2/(n_2 − k)] / [SSR_1/(n_1 − k)]

where k is the number of regression coefficients including the constant term. Under the null hypothesis of homoskedasticity, GQ ~ F_{n_2−k, n_1−k}. If the disturbances are normally distributed, then the GQ statistic is exactly F-distributed under the null hypothesis. If GQ > F_α, reject the null of homoskedasticity and conclude that heteroskedasticity is present, where α is the significance level.

Testing Homoskedasticity by the Lagrange Multiplier (LM) Tests

    y_i = β_1 + β_2 X_i2 + ... + β_k X_ik + ε_i

We are interested in testing whether Assumption A4 holds, so we test the null hypothesis

    H_0: Var(ε_i | X_i) = σ²

Because ε is assumed to have a zero conditional mean, E(ε_i | X_i) = 0, we have Var(ε_i | X_i) = E(ε_i² | X_i), and so the null hypothesis of homoskedasticity is equivalent to

    H_0: E(ε_i² | X_i) = σ²

A simple approach is to assume a linear function for σ_i²:

    σ_i² = α_1 + α_2 Z_i2 + ... + α_p Z_ip

where the error variance σ_i² is related to a number of variables Z_2, ..., Z_p. Then the null hypothesis can be written as

    H_0: α_2 = ... = α_p = 0

F-statistic for testing α_2 = ... = α_p = 0:

Step 1: Regress y against a constant term, X_2, ..., X_k, and obtain the estimated residuals e_i for i = 1, ..., n.

Step 2: Regress e_i² against a constant term, Z_2, ..., Z_p, and obtain the OLS estimates α̂_1, α̂_2, ..., α̂_p. Denote the corresponding R-squared by R²_{e²}. Then

    F = [R²_{e²}/(p − 1)] / [(1 − R²_{e²})/(n − p)]

Under H_0, this F statistic has an F_{p−1, n−p} distribution.

LM test for testing α_2 = ... = α_p = 0:

    LM = n · R²_{e²}

Under H_0, this LM statistic has a χ²_{p−1} distribution. The LM version of this test is known as the Breusch-Pagan test for heteroskedasticity (BP test).

A Simple Example of the Breusch-Pagan Test

Breusch and Pagan (1980): a test based on the Lagrange multiplier test principle, for cases where the error variance σ_i² is not constant but is related to a number of variables Z_2, ..., Z_p (some or all of which might be the X's in the model). The simplest example assumes that the Z's are the X's. Hence the model becomes

    y_i = β_1 + β_2 X_i2 + ... + β_k X_ik + ε_i
    σ_i² = α_1 + α_2 X_i2 + ... + α_k X_ik

Under H_0: α_2 = ... = α_k = 0, the variance is a constant, indicating homoskedasticity. The Breusch-Pagan test is a version of the LM statistic for the hypothesis α_2 = ... = α_k = 0. The LM test consists of running an auxiliary regression and using it to construct a test statistic.

Step 1: Estimate y_i = β_1 + β_2 x_i2 + ... + β_k x_ik + ε_i by OLS and compute e_i = y_i − β̂_1 − β̂_2 X_i2 − ... − β̂_k X_ik and σ̂² = Σ e_i²/n.

Step 2: e_i² is an estimate of the error variance σ_i². If σ_i² = α_1 + α_2 X_i2 + ... + α_k X_ik were valid, one would expect e_i² to be related to the X's. Run the regression of e_i² against a constant term, X_i2, ..., X_ik, and compute the LM statistic n · R²_{e²}. This LM statistic is distributed χ²_{k−1} under H_0.

The Breusch-Pagan test has been shown to be sensitive to any violation of the normality assumption. It also requires prior knowledge of what might be causing the heteroskedasticity.

Auxiliary equations for the error variance, E(ε_i² | X_i):

    σ_i²     = α_1 + α_2 Z_i2 + ... + α_p Z_ip      (9.1)
    σ_i      = α_1 + α_2 Z_i2 + ... + α_p Z_ip      (9.2)
    ln(σ_i²) = α_1 + α_2 Z_i2 + ... + α_p Z_ip      (9.3)

Formulation (9.3) is equivalent to σ_i² = exp(α_1 + α_2 Z_i2 + ... + α_p Z_ip), where exp denotes the exponential function, p is the number of unknown coefficients, and the Z's are variables with known values (some or all of the Z's might be the X's in the model).

1. Breusch-Pagan test (Breusch and Pagan, 1979): uses formulation (9.1).
2. Glejser test (Glejser, 1969): uses formulation (9.2).
3. Harvey-Godfrey test (Harvey, 1976, and Godfrey, 1978): uses formulation (9.3).

Because we do not know σ_i, we use estimates obtained by applying OLS to y_i = β_1 + β_2 X_i2 + ... + β_k X_ik + ε_i to obtain the estimated residuals e_i. Then use e_i² for σ_i², |e_i| for σ_i, and ln(e_i²) for ln(σ_i²).
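The two-step Breusch-Pagan procedure (LM = n · R²_{e²} from the auxiliary regression of e_i² on the Z's) can be sketched as follows. The data and function name are hypothetical; the Z's are taken to be the X's, as in the simple example above:

```python
import numpy as np

def breusch_pagan(y, X):
    """LM statistic n * R^2 from regressing squared OLS residuals on the
    regressors (here Z = X). X must include a constant column."""
    n = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)       # Step 1: OLS residuals
    e2 = (y - X @ b) ** 2
    a, *_ = np.linalg.lstsq(X, e2, rcond=None)      # Step 2: auxiliary regression
    r2 = 1 - np.sum((e2 - X @ a) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2                                   # ~ chi^2_{p-1} under H0

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1, 4, n)
X = np.column_stack([np.ones(n), x])
y_het = 1 + 2 * x + rng.normal(0, x)      # error s.d. grows with x
y_hom = 1 + 2 * x + rng.normal(0, 1, n)   # constant error variance

print(breusch_pagan(y_het, X))   # large: reject homoskedasticity
print(breusch_pagan(y_hom, X))   # small: do not reject
```

With one Z variable the statistic is compared against a χ² critical value with 1 degree of freedom (3.84 at the 5 percent level).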

White's Test

The Goldfeld-Quandt test is not as useful as the LM tests because it cannot accommodate situations where several variables jointly cause heteroskedasticity, as in Equations (9.1), (9.2), and (9.3), and by discarding the middle observations we throw away valuable information. The Breusch-Pagan test has been shown to be sensitive to any violation of the normality assumption. Moreover, all of the previous tests require prior knowledge of what might be causing the heteroskedasticity.

White (1980): a direct test for heteroskedasticity that is very closely related to the Breusch-Pagan test but does not assume any prior knowledge of the form of the heteroskedasticity. White's test is a large-sample LM test with a particular choice for the Z's, and it does not depend on the normality assumption. Consider

    y_i = β_1 + β_2 X_i2 + β_3 X_i3 + ε_i
    σ_i² = α_1 + α_2 X_i2 + α_3 X_i3 + α_4 X_i2² + α_5 X_i3² + α_6 X_i2 X_i3

Step 1: Regress y against a constant term, X_2, and X_3, and obtain the estimated residuals e_i = y_i − β̂_1 − β̂_2 X_i2 − β̂_3 X_i3.

Step 2: Regress e_i² against a constant term, X_i2, X_i3, X_i2², X_i3², and X_i2 X_i3. Compute n · R²_{e²}, where n is the sample size and R²_{e²} is the unadjusted R-squared from the auxiliary regression.

Step 3: Reject H_0: α_2 = α_3 = α_4 = α_5 = α_6 = 0 if n R²_{e²} > χ²_5(0.05), the upper 5 percent point of the χ² distribution with 5 degrees of freedom.

Although White's test is a large-sample test, it has been found useful in samples of 30 or more. If the null is not rejected, the residuals may be treated as homoskedastic.
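The three steps above, for the two-regressor model, can be sketched as follows (function name and simulated data are hypothetical; the auxiliary regression includes levels, squares, and the cross product):

```python
import numpy as np

def white_test(y, x2, x3):
    """White's n * R^2 statistic for a model with two regressors: auxiliary
    regression of e^2 on a constant, levels, squares, and the cross product."""
    n = len(y)
    X = np.column_stack([np.ones(n), x2, x3])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)              # Step 1
    e2 = (y - X @ b) ** 2
    Z = np.column_stack([np.ones(n), x2, x3, x2**2, x3**2, x2 * x3])
    a, *_ = np.linalg.lstsq(Z, e2, rcond=None)             # Step 2
    r2 = 1 - np.sum((e2 - Z @ a) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2                                          # ~ chi^2_5 under H0

rng = np.random.default_rng(2)
n = 800
x2, x3 = rng.uniform(1, 3, n), rng.uniform(1, 3, n)
y = 1 + x2 + x3 + rng.normal(0, x2 * x3)   # heteroskedastic in both regressors

stat = white_test(y, x2, x3)
print(stat, stat > 11.07)   # Step 3: chi^2_5 upper 5 percent point is about 11.07
```

Note that nothing about the cause of the heteroskedasticity was specified; the squares and cross products play the role of the unknown Z's.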

If some of the explanatory variables are dummy variables, say X_i2 is a dummy variable, then X_i2² = X_i2, and hence the square should not be included separately; otherwise there will be exact multicollinearity and the auxiliary regression cannot be run. With k explanatory variables (including the constant term), the auxiliary regression will have k(k + 1)/2 coefficients, including the constant term. The number of observations must be larger than that, and hence n > k(k + 1)/2 is a necessary condition.

9.4 Estimation Procedures

If the assumption of homoskedasticity is rejected, we have to find alternative estimation procedures that are superior to OLS. Several approaches to estimation follow.

Heteroskedasticity Consistent Covariance Matrix (HCCM) Estimation

Heteroskedasticity-robust inference: White (1980) proposes a method of obtaining consistent estimators of the variances and covariances of the OLS estimators, known as the HCCM estimator.

Generalized (or Weighted) Least Squares When Ω Is Known

Before the development of heteroskedasticity-robust statistics by White (1980), the usual solution to the problem of heteroskedasticity was to model and estimate the specific form of the heteroskedasticity. This leads to a more efficient estimator than OLS, and it produces t and F statistics that have t and F distributions. However, this approach suffers from the problem that the nature of the heteroskedasticity is generally unknown. The GLS estimator is

    β̂ = (X′Ω⁻¹X)⁻¹ X′Ω⁻¹y

Example 1: Consider the most general case, Var[ε_i] = σ_i² = σ² ω_i.

Then

    Ω = diag(ω_1, ω_2, ..., ω_n)

The GLS estimator is obtained by regressing Py on PX, where P = Ω^{−1/2}, i.e.,

    Py = [y_1/√ω_1, y_2/√ω_2, ..., y_n/√ω_n]′   and   PX = [x_1/√ω_1, x_2/√ω_2, ..., x_n/√ω_n]′

Applying OLS to the transformed model, we obtain the weighted least squares (WLS) estimator

    β̂ = [Σ_{i=1}^n w_i x_i x_i′]⁻¹ [Σ_{i=1}^n w_i x_i y_i]

where w_i = 1/ω_i.

Example 2: Consider σ_i² = σ² x_ik². Then the transformed model for GLS is

    y/x_k = β_k + β_k* + β_1 (x_1/x_k) + β_2 (x_2/x_k) + ... + ε/x_k

where the original intercept contributes the term β_k* = β_0/x_k, and the coefficient β_k of x_k becomes the intercept of the transformed model.

Example 3: If the variance is proportional to x_k instead of x_k², i.e., σ_i² = σ² x_ik, then the transformed model for GLS is

    y/√x_k = β_k √x_k + β_k* + β_1 (x_1/√x_k) + β_2 (x_2/√x_k) + ... + ε/√x_k

where β_k* = β_0/√x_k.
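The equivalence between the WLS formula and OLS on the transformed (P-weighted) data can be checked numerically. The sketch below assumes the Example 2 form σ_i² = σ² x_k², so ω_i = x_k² and w_i = 1/x_k² (data simulated, all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
xk = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), xk])
y = 1.0 + 2.0 * xk + rng.normal(0, xk)     # Var[e_i] = sigma^2 * x_k^2

# Known form: omega_i = x_k^2, so the WLS weights are w_i = 1/x_k^2.
w = 1.0 / xk**2

# (a) WLS formula: beta = (sum w_i x_i x_i')^{-1} (sum w_i x_i y_i)
A = (X * w[:, None]).T @ X
c = (X * w[:, None]).T @ y
beta_wls = np.linalg.solve(A, c)

# (b) OLS on the transformed model: divide every term by x_k (i.e., by sqrt(omega_i))
beta_transformed, *_ = np.linalg.lstsq(X / xk[:, None], y / xk, rcond=None)

assert np.allclose(beta_wls, beta_transformed)   # same estimator, two routes
print(beta_wls)
```

Both routes solve the same normal equations, which is why they agree to machine precision.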

Weighted Least Squares (WLS)

Heteroskedasticity known up to a multiplicative constant: let's specify the form of the heteroskedasticity as

    Var(ε_i | X_i2, ..., X_ik) = σ_i² = σ² Z_i²,   or equivalently σ_i = σ Z_i

where the values of Z_i are known for all i. Z_i could be h(X_i2, ..., X_ik), some function of the explanatory variables that determines the heteroskedasticity. Write Var(ε_i | X_i) = σ_i² = σ² Z_i², where X_i denotes all independent variables for observation i, and Z_i changes across observations.

Suppose the original equation is

    y_i = β_1 + β_2 X_i2 + ... + β_k X_ik + ε_i

and assumptions A1-A3 are satisfied. Since Var(ε_i | X_i) = E(ε_i² | X_i) = σ² Z_i²,

    E[(ε_i/Z_i)² | X_i] = E(ε_i² | X_i)/Z_i² = (σ² Z_i²)/Z_i² = σ²

We divide the original equation by Z_i to get

    y_i/Z_i = β_1 (1/Z_i) + β_2 (X_i2/Z_i) + ... + β_k (X_ik/Z_i) + ε_i/Z_i

or

    y_i* = β_1 X_i1* + β_2 X_i2* + ... + β_k X_ik* + ε_i*

where X_i1* = 1/Z_i and X_ij* = X_ij/Z_i for j = 2, ..., k. Hence, the OLS estimators obtained by regressing y_i* against X_i1*, X_i2*, ..., X_ik* will be BLUE. These estimators, β_1*, β_2*, ..., β_k*, will differ from the OLS estimators in the original equation. The β_j* are examples of generalized least squares (GLS) estimators.

Summary of Weighted Least Squares (WLS): define w_i = 1/σ_i and rewrite the original equation as

    w_i y_i = β_1 w_i + β_2 (w_i X_i2) + ... + β_k (w_i X_ik) + (w_i ε_i)

Minimize the weighted sum of squared residuals:

    Σ (w_i ε_i)² = Σ (w_i y_i − β_1 w_i − β_2 w_i X_i2 − ... − β_k w_i X_ik)²

Remarks:

Observations for which σ_i² is large are given less weight in WLS. The resulting estimators are identical to those obtained by applying OLS to the transformed equation in y* and the X*'s.

Multiplicative Heteroskedasticity with a Known Proportional Factor: assume the heteroskedasticity is such that the residual standard deviation σ_i is proportional to some known variable Z_i:

    Var(ε_i | X_i) = σ_i² = σ² Z_i²,   or equivalently σ_i = σ Z_i

Divide every term in the original equation by Z_i:

    y_i/Z_i = β_1 (1/Z_i) + β_2 (X_i2/Z_i) + ... + β_k (X_ik/Z_i) + ε_i/Z_i

or

    y_i* = β_1 X_i1* + β_2 X_i2* + ... + β_k X_ik* + ε_i*    (9.4)

We have

    Var(ε_i* | X_i) = Var(ε_i/Z_i | X_i) = Var(ε_i | X_i)/Z_i² = σ²

Estimates obtained by regressing y_i* against X_i1*, X_i2*, ..., X_ik* will be BLUE (when σ_i² = Z_i² σ²). This is the same as WLS with w_i = 1/Z_i. Because the GLS estimates are BLUE, the OLS estimates of the original equation are inefficient.

Estimated Generalized Least Squares (EGLS) or Feasible GLS (FGLS)

As the structure of the heteroskedasticity is generally unknown (that is, Z_i or σ_i is unknown), one must first obtain estimates of σ_i by some means and then use the weighted least squares procedure. This method is called estimated generalized least squares (EGLS). There are many ways to model heteroskedasticity, but we study one particularly flexible approach.

Assume that

    Var(u | X) = σ² exp(δ_1 + δ_2 X_2 + ... + δ_k X_k)

If the parameters δ_j were known, we could simply apply WLS to obtain efficient estimators of the β_j. The feasible approach is to use the data to estimate the δ_j, and then to use these estimates δ̂_j to construct weights. Following the setup of Var(u | X), we can write

    u² = σ² exp(δ_1 + δ_2 X_2 + ... + δ_k X_k) v

where E(v | X) = 1. If we assume that v is actually independent of X, we can write

    log(u²) = δ_1 + δ_2 X_2 + ... + δ_k X_k + e

where E(e) = 0 and e is independent of X; the intercept in this equation differs from δ_1. Since this equation satisfies the Gauss-Markov assumptions, we can get unbiased estimators of the δ_j by using OLS, replacing the unobserved u with the OLS residuals. Run the regression of log(e_i²) on a constant, X_2, ..., X_k, and call the fitted values ĝ_i. The estimates of σ_i² are then simply

    σ̂_i² = exp(ĝ_i)

and we may use WLS with weights 1/σ̂_i.

SUMMARY of the FGLS Procedure to Correct for Heteroskedasticity:
1. Run the regression of y on a constant, X_2, ..., X_k, and obtain the residuals e_i, i = 1, ..., n.
2. Create log(e_i²) by first squaring the OLS residuals and then taking the natural log.
3. Run the regression of log(e_i²) on a constant, X_2, ..., X_k, and obtain the fitted values ĝ_i.
4. Exponentiate the fitted values to obtain σ̂_i² = exp(ĝ_i).
5. Estimate the equation

       y = β_1 + β_2 X_2 + ... + β_k X_k + u

   by WLS, using weights 1/σ̂_i.
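The five-step FGLS summary can be sketched compactly (function name and simulated data hypothetical; the variance is generated in the exponential form assumed above):

```python
import numpy as np

def fgls(y, X):
    """FGLS following the five-step summary: OLS residuals -> regress log(e^2)
    on X -> sigma_hat_i^2 = exp(fitted) -> WLS with weights 1/sigma_hat_i."""
    # Step 1: OLS residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Steps 2-3: regress log(e^2) on X and keep the fitted values g_hat
    d, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)
    g_hat = X @ d
    # Step 4: variance estimates
    sigma_hat = np.sqrt(np.exp(g_hat))
    # Step 5: WLS = OLS on data divided by sigma_hat_i
    b_fgls, *_ = np.linalg.lstsq(X / sigma_hat[:, None], y / sigma_hat, rcond=None)
    return b_fgls

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
u = rng.normal(0, np.exp(0.2 + 0.5 * x))   # Var(u|X) of the assumed exponential form
y = 1.0 + 3.0 * x + u

print(fgls(y, X))   # should be close to the true (1, 3)
```

Because the δ_j are estimated rather than known, FGLS is only asymptotically efficient, but it is consistent under the stated assumptions.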

9.5 Linear Probability Model (LPM)

Nature of the LPM:

    y = β_1 + β_2 X_2 + ... + β_k X_k + u
    E(y | X) = E(y | X_2, ..., X_k) = β_1 + β_2 X_2 + ... + β_k X_k

where y is a binary variable. It follows that

    P(y = 1 | X_2, ..., X_k) = β_1 + β_2 X_2 + ... + β_k X_k

More important, the linear probability model violates the assumption of homoskedasticity. When y is a binary variable, we have

    Var(y | X) = p(X)[1 − p(X)]

where p(X) denotes the probability of success: p(X) = β_1 + β_2 X_2 + ... + β_k X_k. This indicates that heteroskedasticity is present in the LPM, which implies that the OLS estimators are inefficient. Hence we have to correct for heteroskedasticity when estimating the LPM if we want a more efficient estimator than OLS.

Procedure for estimating the LPM:
1. Obtain the OLS estimators of the LPM.
2. Determine whether all of the OLS fitted values ŷ_i satisfy 0 < ŷ_i < 1. If so, proceed to step (3); if not, some adjustment is needed to bring all fitted values into the unit interval.
3. Construct the estimates of σ_i²: σ̂_i² = ŷ_i(1 − ŷ_i).
4. Estimate the equation

       y = β_1 + β_2 X_2 + ... + β_k X_k + u

   by WLS, using weights w_i = 1/σ̂_i.
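The four-step LPM procedure can be sketched as below (function name and data hypothetical). Clipping the fitted values into the unit interval in step 2 is one simple adjustment among several possible ones:

```python
import numpy as np

def lpm_wls(y, X):
    """WLS estimation of the linear probability model: OLS, adjust fitted
    values into (0, 1), sigma_hat_i^2 = p(1-p), then WLS with 1/sigma_hat_i."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)       # step 1
    p_hat = X @ b_ols
    p_hat = np.clip(p_hat, 0.01, 0.99)                  # step 2 (one simple adjustment)
    sigma_hat = np.sqrt(p_hat * (1 - p_hat))            # step 3
    b_wls, *_ = np.linalg.lstsq(X / sigma_hat[:, None], # step 4: WLS
                                y / sigma_hat, rcond=None)
    return b_ols, b_wls

rng = np.random.default_rng(5)
n = 3000
x = rng.uniform(0, 1, n)
p = 0.2 + 0.5 * x                            # true success probability
y = (rng.uniform(size=n) < p).astype(float)  # binary dependent variable
X = np.column_stack([np.ones(n), x])

b_ols, b_wls = lpm_wls(y, X)
print(b_ols, b_wls)   # both estimate (0.2, 0.5); WLS is the more efficient one
```

Both estimators are consistent for the LPM coefficients; the gain from WLS is in efficiency, as in the general heteroskedastic case.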

Maximum Likelihood Estimation (MLE)

The MLE method estimates β (the collection β_0, β_1, ..., β_k), σ², and Ω(θ), the error covariance matrix parameterized by θ, at the same time. Let Γ = Ω⁻¹. Then the log-likelihood can be written as

    log L = −(n/2)[log(2π) + log σ²] − (1/(2σ²)) ε′Γε + (1/2) log|Γ|

First-order conditions (FOC):

    ∂log L/∂β  = (1/σ²) X′Γ(Y − Xβ) = 0
    ∂log L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (Y − Xβ)′Γ(Y − Xβ) = 0
    ∂log L/∂Γ  = (1/2)[Γ⁻¹ − (1/σ²) εε′] = (1/(2σ²)) (σ²Ω − εε′) = 0

9.6 Heteroskedasticity-Robust Inference After OLS Estimation

Since hypothesis tests and confidence intervals based on OLS are invalid in the presence of heteroskedasticity, we must decide whether to abandon OLS entirely or to reformulate adequate test statistics and confidence intervals. For the latter option, we have to adjust the standard errors and the t, F, and LM statistics so that they are valid in the presence of heteroskedasticity of unknown form. This procedure is called heteroskedasticity-robust inference, and it is valid in large samples.

How to estimate the variance, Var(β̂_j), in the presence of heteroskedasticity

Consider the simple regression model

    y_i = β_0 + β_1 x_i + ε_i

and assume that assumptions A1-A3 are satisfied. If the errors are heteroskedastic, then

    Var(ε_i | x_i) = σ_i²

The OLS estimator can be written as

    β̂_1 = β_1 + Σ_{i=1}^n (x_i − x̄) ε_i / Σ_{i=1}^n (x_i − x̄)²

and we have

    Var(β̂_1) = Σ_{i=1}^n (x_i − x̄)² σ_i² / SST_x²

where SST_x = Σ_{i=1}^n (x_i − x̄)² is the total sum of squares of the x_i. Note: when σ_i² = σ² for all i, Var(β̂_1) reduces to the usual form, σ²/SST_x.

Regarding the estimation of Var(β̂_1) in the presence of heteroskedasticity, White (1980) proposed a procedure that is valid in large samples. Let e_i denote the OLS residuals from the initial regression of y on x. White (1980) suggested that a valid estimator of Var(β̂_1), for heteroskedasticity of any form (including homoskedasticity), is

    Var̂(β̂_1) = Σ_{i=1}^n (x_i − x̄)² e_i² / SST_x²

Brief sketch of the proof (for the complete proof, please refer to White (1980)):

    n · Σ_{i=1}^n (x_i − x̄)² e_i² / SST_x²  →p  E[(x_i − μ_x)² ε_i²]/(σ_x²)²
    n · Var(β̂_1) = n · Σ_{i=1}^n (x_i − x̄)² σ_i² / SST_x²  →  the same limit

Therefore, by the law of large numbers and the central limit theorem, we can use the estimator Var̂(β̂_1) = Σ_{i=1}^n (x_i − x̄)² e_i² / SST_x² to construct confidence intervals and t tests.

For the multiple regression model

    y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + ε_i

under assumptions A1-A3, the valid estimator of Var(β̂_j) is

    Var̂(β̂_j) = Σ_{i=1}^n r̂_ij² e_i² / SSR_j²

where r̂_ij denotes the ith residual from regressing x_j on all other independent variables (including an intercept), and SSR_j = Σ_{i=1}^n r̂_ij² = SST_j(1 − R_j²) is the sum of squared residuals from that regression.

REMARKS: The variance of the usual OLS estimator β̂_j is

    Var(β̂_j) = σ² / [SST_j (1 − R_j²)],   for j = 1, ..., k,

where SST_j = Σ_{i=1}^n (x_ij − x̄_j)² and R_j² is the R² from regressing x_j on all other independent variables (including an intercept). The square root of Var̂(β̂_j) is called the heteroskedasticity-robust standard error for β̂_j. Once the heteroskedasticity-robust standard errors are obtained, we can construct a heteroskedasticity-robust t statistic:

    t = (estimate − hypothesized value) / standard error

References

Greene, W. H., 2003, Econometric Analysis, 5th ed., Prentice Hall. Chapter 11.
Gujarati, D. N., 2003, Basic Econometrics, 4th ed., McGraw-Hill. Chapter 10.
Ramanathan, R., 2002, Introductory Econometrics with Applications, 5th ed., Harcourt College Publishers. Chapter 8.
Ruud, P. A., 2000, An Introduction to Classical Econometric Theory, 1st ed., Oxford University Press. Chapter 18.


More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

CAPM, Arbitrage, and Linear Factor Models

CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, Linear Factor Models 1/ 41 Introduction We now assume all investors actually choose mean-variance e cient portfolios. By equating these investors

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.

The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College. The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-26-32 Available online at http://www.bioinfo.in/contents.php?id=32 ANALYSIS OF FACTOR BASED DATA MINING TECHNIQUES

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Testing for Granger causality between stock prices and economic growth

Testing for Granger causality between stock prices and economic growth MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Causal Forecasting Models

Causal Forecasting Models CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental

More information

From the help desk: Swamy s random-coefficients model

From the help desk: Swamy s random-coefficients model The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Week 5: Multiple Linear Regression

Week 5: Multiple Linear Regression BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Elements of econometrics

Elements of econometrics Elements of econometrics C. Dougherty EC2020 2014 Undergraduate study in Economics, Management, Finance and the Social Sciences This is an extract from a subject guide for an undergraduate course offered

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

STAT 830 Convergence in Distribution

STAT 830 Convergence in Distribution STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

More information

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013

Regression Analysis. Regression Analysis MIT 18.S096. Dr. Kempthorne. Fall 2013 Lecture 6: Regression Analysis MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Regression Analysis 1 Outline Regression Analysis 1 Regression Analysis MIT 18.S096 Regression Analysis 2 Multiple Linear

More information

State Space Time Series Analysis

State Space Time Series Analysis State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

More information

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014. University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

ANOVA. February 12, 2015

ANOVA. February 12, 2015 ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

More information

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Lecture 8: Gamma regression

Lecture 8: Gamma regression Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

More information

Moderation. Moderation

Moderation. Moderation Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models:

Chapter 10: Basic Linear Unobserved Effects Panel Data. Models: Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information