Economics 345 Applied Econometrics


Lab 5: Hypothesis Testing in Multivariate Linear Regression II

Prof: Martin Farnham
TAs: Joffré Leroux, Rebecca Wortzman, Yiying Yang

Open EViews, and open the EViews workfile bwght.wf1. These data, along with all of the data for this course, should be on the network drive sfgclients on uvic\storage (S:). Your computer should be mapped to this drive. Browse to this drive and to the folder \social sciences\economics\econ 345\Wooldridge Eviews Files\

Introduction: This week, we're going to add to our portfolio of hypothesis tests by testing some non-zero null hypotheses, learning a bit about how to use p-values, and doing F-tests.

1) Some F-test preliminaries (your lecture notes may be useful here)

i.a) What is the formula for the F-statistic, using sums of squared residuals?

i.b) What is the formula for the F-statistic, using R-squared values?

i.c) What is the intuition for the F-test?

2) Cigarette use and child birth weight.

The following exercise is based on Example 4.9 in your text. We are going to examine the relationship between a woman's decision to smoke during pregnancy and the birth weight of her child. Low birth weight is considered an indicator of poor infant health, and can lead to health complications early in life. Load the dataset bwght.wf1 from the folder with Wooldridge datasets.

ii.a) First, estimate the following model of child birth weight:

$$\text{bwght} = \beta_0 + \beta_1\,\text{cigs} + \beta_2\,\text{parity} + \beta_3\,\text{faminc} + \beta_4\,\text{motheduc} + \beta_5\,\text{fatheduc} + u$$

The variables are defined as follows:

bwght: birth weight in ounces

cigs: average number of cigarettes the mother smoked per day during pregnancy
parity: birth order of child
faminc: family income in thousands of dollars
motheduc: years of schooling of mother
fatheduc: years of schooling of father

What sign would you expect on each of the coefficients in this model? For each, give a brief explanation. Do the signs you obtain match your expectations? Which don't?

ii.b) Now let's perform the F-test proposed in Example 4.9. Formally, the test is set up as follows:

$H_0$: $\beta_4 = 0$, $\beta_5 = 0$
$H_1$: At least one of these coefficients is non-zero.

ii.c) Before we can proceed, we need to make sure the sample we work with will be the same for both our restricted and unrestricted models. Any time you re-estimate a model omitting one or more variables, there's some chance your sample size will change. This is because one of the variables you omit may have some missing values. Observations that EViews tossed out due to missing data when you estimated the unrestricted model may be automatically re-included once you estimate the restricted model. This causes the sample size to change. If you're trying to compare something like the SSR for two different models, you want the sample held constant for the comparison.

I bring this up because, in the particular case we're looking at, it turns out there are some missing data for fatheduc and motheduc. So the first thing we should do, to prevent the problem noted above, is to restrict our sample to observations for which values of fatheduc and motheduc are not missing. This will ensure that our sample remains constant when we compare the restricted and unrestricted models.

To help with this, I've created a dummy variable called no_educ that equals 1 for every observation that has missing data for fatheduc or motheduc. Let's limit our sample to observations for which no_educ=0. To do this, go to the Workfile window, and select Sample.
In the IF condition window, type

no_educ=0

Notice that when you return to the Workfile window, the number of observations in the sample has dropped to [...]. Whew. Now we're ready to proceed with our correctly restricted sample.

ii.d) First, we'll calculate the F-stat for the above hypothesis test. Then we'll see how to generate it in EViews. Write down the restricted model that is implied by this null hypothesis.

How many exclusion restrictions are implied by this null hypothesis? This gives you your value for q in calculating your F-stat. How many degrees of freedom are there in the unrestricted model?

ii.d) Let's first calculate the F-stat for the above null hypothesis, using the formula that makes use of the SSR for each model. To do this, you need to estimate both the unrestricted and the restricted models, and find the SSR for each. This should be labeled "Sum squared resid" in your EViews output. Calculate the F-stat using this approach. To save time on the next step, record the R-squared value for each of the models you estimate.

ii.e) Now calculate the F-stat for the same null, using the formula that makes use of the R-squared for each model. Your answers from the two methods may differ slightly due to rounding error (they may also differ slightly from the value given in your text). But they should be quite close.

ii.f) How is the F-stat that you've calculated distributed? Recall that you state the distribution of the F-stat with reference to both the numerator degrees of freedom and the denominator degrees of freedom. Can you reject the null at the 5 percent significance level? To answer this, you will need to use Table G.3 in the back of your text, or an online tabulation of the cdf for the F-statistic, to find the critical value of the F-distribution. Note that with the online tabulation you will need to scroll down to the F-table and then make sure you pick the table that corresponds to an alpha of 0.05. What is the critical value? Is your F-stat greater or less than this critical value? If it's greater, then you reject the null. If it's less, you fail to reject the null.

ii.g) Now that you know you can perform the F-test by hand, let's do it the easy way. Let's ask EViews to do the F-test for us. To do this, first make sure to estimate the unrestricted model (so EViews has it in its memory). Then, in the Equation window, select View/Coefficient Tests/Wald-Coefficient Restrictions.
This will give you a window that allows you to place restrictions on the coefficients in your model. The coefficients we're interested in restricting are the coefficients on motheduc and fatheduc, and we want to restrict their values to zero. The syntax for placing restrictions on coefficients is quite simple. In this case, we type in

C(5)=0,C(6)=0

This just tells EViews that we want to restrict the fifth and sixth coefficients in the model to equal zero. Note that because EViews treats the intercept as C(1), the coefficients on the fourth and fifth variables in the model are denoted C(5) and C(6) respectively.
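As a cross-check on the hand calculations in (ii.d) and (ii.e), the two versions of the F-statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, and the SST and SSR figures are placeholders, not the answers to the lab. Only q = 2 and the denominator df of 1185 are taken from the lab text. Substitute the values from your own EViews output.

```python
def f_stat_from_ssr(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


def f_stat_from_r2(r2_r, r2_ur, q, df_ur):
    """F = [(R2_ur - R2_r) / q] / [(1 - R2_ur) / df_ur]."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)


# Placeholder inputs (hypothetical, NOT the lab's answers).
sst = 1000.0     # total sum of squares, identical in both models
ssr_ur = 900.0   # SSR of the unrestricted model
ssr_r = 910.0    # SSR of the restricted model
q = 2            # number of exclusion restrictions
df_ur = 1185     # n - k - 1 in the unrestricted model

# With the same sample (and hence the same SST), R-squared = 1 - SSR/SST.
r2_ur = 1.0 - ssr_ur / sst
r2_r = 1.0 - ssr_r / sst

f_ssr = f_stat_from_ssr(ssr_r, ssr_ur, q, df_ur)
f_r2 = f_stat_from_r2(r2_r, r2_ur, q, df_ur)

print(f"F from SSR:       {f_ssr:.4f}")
print(f"F from R-squared: {f_r2:.4f}")
```

The two formulas agree exactly here because both models share the same SST, which is precisely why the sample must be held constant across the restricted and unrestricted regressions. With rounded figures copied from printed output, expect small differences.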

After entering the restrictions C(5)=0,C(6)=0, you will get a window with output in it. The first line of the output gives you your F-stat. Confirm that this number is close to what you previously calculated by hand. Notice that EViews gives you the df for the numerator and denominator, which allows you to quickly reference an F-table for the level of significance you're interested in. Wasn't that easy? There's much less chance of slipping up on your arithmetic when you let EViews calculate the thing for you.

ii.h) Notice also that EViews gives you a value labeled "Probability". This is the p-value associated with your F-statistic. Since F-tables are somewhat cumbersome to consult, this is a very handy thing for EViews to give you.

What does the p-value really tell you? It lets you know how far out in the F distribution your F-stat lies. To be precise, if you take (1-p)*100, which in this case is 76.2, this tells you what percentile your F-stat lies in, in the relevant F-distribution (in this case, the one with a df of (2, 1185)). Now, if you're choosing to perform your hypothesis test at the 5% significance level, you know this means that if you get an F-stat that lies at or above the 95th percentile, you will reject the null. So when EViews automatically tells you that the p-value is 0.238, you can immediately think, "Ah, so this F-stat lies at the 76.2 percentile, which is below the 95th percentile, hence I fail to reject the null at the 5 percent significance level." (Note that this interpretation doesn't always hold with the t-distribution, as it depends on whether your alternative is one- or two-sided.)

Another way to interpret the p-value is that it tells you the lowest significance level at which you can reject the null. In this case, the lowest significance level at which you can reject the null is the 23.8% level. This is a very high significance level (corresponding to a very high probability of Type I error).
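The link between the p-value, the percentile, and the 5% critical value can be checked numerically. The sketch below is an illustration, not part of the EViews workflow: it relies on the fact that when the numerator df equals 2, the upper-tail probability of an F(2, d) variable has the closed form (1 + 2F/d)^(-d/2), so only plain Python is needed. The function names are invented for this sketch, and the shortcut does not apply for other numerator df. The inputs 0.238 and 1185 come from the lab text.

```python
def f_upper_tail_numdf2(f_value, denom_df):
    """P(F > f_value) for F(2, denom_df). Valid ONLY for numerator df = 2."""
    return (1.0 + 2.0 * f_value / denom_df) ** (-denom_df / 2.0)


def f_value_from_tail_numdf2(tail_prob, denom_df):
    """Inverse of the function above: the F value whose upper tail is tail_prob."""
    return (denom_df / 2.0) * (tail_prob ** (-2.0 / denom_df) - 1.0)


denom_df = 1185    # denominator df reported in the lab
p_value = 0.238    # p-value reported in the lab

# F-stat implied by the reported p-value, and the 5% critical value.
f_implied = f_value_from_tail_numdf2(p_value, denom_df)
crit_5pct = f_value_from_tail_numdf2(0.05, denom_df)

# Percentile of the F-stat: (1 - p) * 100.
percentile = (1.0 - f_upper_tail_numdf2(f_implied, denom_df)) * 100.0

reject_at_5pct = f_implied > crit_5pct

print(f"implied F-stat:    {f_implied:.3f}")
print(f"5% critical value: {crit_5pct:.3f}")
print(f"percentile:        {percentile:.1f}")
print(f"reject at 5%?      {reject_at_5pct}")
```

The critical value comes out at about 3.00, in line with the large-denominator-df row of a 5% F-table, and the implied F-stat sits well below it, so the script reaches the same "fail to reject" conclusion as the p-value reasoning above.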
Generally, social scientists don't conduct hypothesis tests at anything above the 10% level. So, given such a high p-value, we have a slam-dunk case of failing to reject the null.

ii.i) What does it mean that we fail to reject the null? It means that the coefficient estimates on motheduc and fatheduc are jointly statistically insignificant. Another way of saying this is that mother's and father's education are jointly statistically insignificant determinants of child birth weight. Given that we've found these to be jointly insignificant, we will probably want to omit them from the model, as their inclusion may increase the variance of the coefficient estimate on cigs, our key RHS variable of interest (see last week's lab for more detail on this problem).

ii.j) Briefly confirm that last point in (ii.i) by checking that the standard error of the estimate of the coefficient on cigs is smaller when you omit motheduc and fatheduc. Given the joint insignificance of motheduc and fatheduc, are you concerned about the implications of their exclusion for omitted variables bias?

ii.k) Looking at your estimates from the restricted model (having dropped motheduc and fatheduc), notice the p-value associated with the t-statistic for faminc. Keeping in mind that EViews assumes a null of zero and a 2-sided alternative when calculating this p-value, what does this tell you about the statistical significance of the coefficient estimate

on faminc given a two-sided alternative to the null that the coefficient on faminc equals zero? Can you reject the null that the coefficient on faminc equals zero at the 5% level? What about the 1% level? Isn't that handy? No t-table needed (for the zero null)! Note that if you find this confusing, Figure 4.6 in your text may be helpful.

3) Does education from a junior college have a different impact on wages than education from a university?

Cultural note: A junior college in the US is roughly equivalent to a 2-year college in Canada.

Close your previous workfile and load the workfile twoyear.wf1 (from the Econ 345 folder). This follows the discussion on pages [...] of your textbook.

iii.a) Estimate the following model:

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{jc} + \beta_2\,\text{univ} + \beta_3\,\text{exper} + u$$

iii.b) Comment on the coefficient signs and magnitudes for jc and univ. Are the signs as expected? What do the magnitudes imply about the contribution of 1 extra year of junior college education to the wage? What do they imply about the contribution of 1 extra year of university to the wage? How do the relative magnitudes compare? Is this what you would expect?

iii.c) At standard significance levels, would you say all the coefficients in this model are statistically significant?

iii.d) Suppose a policymaker is interested in testing whether university education raises wages more than junior college education. They might set up the following hypothesis test:

$H_0$: $\beta_1 = \beta_2$
$H_1$: $\beta_1 < \beta_2$

Write down the formula for the t-statistic for this null hypothesis.

iii.e) Is the denominator of this t-statistic readily obtained from regular EViews output?

iii.f) Here's one way to do this problem. Recall from lecture (and the text) that you can directly calculate

$$t = \frac{\hat\beta_1 - \hat\beta_2}{\mathrm{se}(\hat\beta_1 - \hat\beta_2)}$$

but that obtaining the denominator can be a bit tricky. The expression for the denominator is given in your text as

$$\mathrm{se}(\hat\beta_1 - \hat\beta_2) = \left\{ [\mathrm{se}(\hat\beta_1)]^2 + [\mathrm{se}(\hat\beta_2)]^2 - 2 s_{12} \right\}^{1/2}, \quad \text{where } s_{12} = \mathrm{Cov}(\hat\beta_1, \hat\beta_2).$$

The first two terms on the RHS are easily obtained, but $s_{12}$ is not immediately obvious in the EViews output.

It turns out that $s_{12}$ is not difficult to obtain. After estimating the model, you simply go to the Equation window and select View/Covariance Matrix. From the covariance matrix, you select the element that corresponds to jc in the rows and univ in the columns (or univ in the rows and jc in the columns). This number should be 1.93E-06 (which is scientific notation for 0.00000193). If you open a spreadsheet in Excel you can directly calculate the standard error of $\hat\beta_1 - \hat\beta_2$. Due to rounding error, you may find you get a slightly different answer, but you should get something on the order of [...].

Calculate your t-stat for the above null. You should obtain a t-stat in the range of -1.4 to -1.5 (I'm giving you a range, because rounding error may lead you to get a different estimate from me). Write down the t-stat you get for later reference. Given this, do you reject the null hypothesis? An online t-table is available here.

iii.g) Given that this was a somewhat awkward way to obtain a t-stat for our null, let's consider respecifying the model in such a way as to make the hypothesis test easier to perform. Consider the following restatement of the null hypothesis:

$H_0$: $\beta_1 - \beta_2 = 0$

Given this statement of the null, we can come up with the following rearrangement:

$H_0$: $\theta_1 = \beta_1 - \beta_2 = 0$
$H_1$: $\theta_1 < 0$

We can use this to help respecify the model in a way that gives us a direct estimate of $\theta_1$, and a direct estimate of the standard error of $\hat\theta_1$ (which is the same as the standard error of $\hat\beta_1 - \hat\beta_2$, which is what we were having a hard time obtaining above). Substituting $\beta_1 = \theta_1 + \beta_2$ into the original equation, we get

$$\log(\text{wage}) = \beta_0 + (\theta_1 + \beta_2)\,\text{jc} + \beta_2\,\text{univ} + \beta_3\,\text{exper} + u$$

$$\log(\text{wage}) = \beta_0 + \theta_1\,\text{jc} + \beta_2\,(\text{jc} + \text{univ}) + \beta_3\,\text{exper} + u.$$

Let (jc + univ) = totcoll. Then

$$\log(\text{wage}) = \beta_0 + \theta_1\,\text{jc} + \beta_2\,\text{totcoll} + \beta_3\,\text{exper} + u$$

The variable totcoll is already defined in your dataset, so go ahead and estimate this new specification.

iii.h) Do you reject the null that $\beta_1 = \beta_2$? Write down the t-stat for your null hypothesis. Is it roughly the same as what you obtained above?

iii.i) Finally, here's the simplest way to test the null that $\beta_1 = \beta_2$. Estimate your original model with jc and univ (not totcoll) on the RHS. In the Equation window, go to View/Coefficient Tests/Wald-Coefficient Restrictions and enter

C(2)=C(3)

This will give you some output similar to what you received when you did the F-test above. Here, you want to look at the bottom line of output, where it gives the value of the coefficient and the standard error for C(2)-C(3). You can calculate the t-stat for your null hypothesis using these figures. Simply divide the estimated value of $\beta_1 - \beta_2$ by its standard error, and confirm that the t-stat you generate matches the t-stat you obtained above.

iii.j) Note that there's even a fourth way to obtain your t-statistic for the above null. Recall from lecture that if you conduct an F-test with one linear restriction, the F-statistic you obtain for that restriction equals the t-stat (for that single restriction) squared. So, in looking at the output that EViews generated when you imposed the coefficient restrictions, look at the top line and the F-stat that was generated. Take its square root (and attach the sign of the estimated difference, since the square root alone is always positive). Voila! The t-stat for your null hypothesis above!

The nice thing about using the F-stat is that the p-value for the F-stat is the same as the 2-sided p-value for the t-stat. You can use this p-value to quickly assess whether you would reject the null at whatever significance level you are conducting your test. You have to be careful of one thing though.
The p-value you will have is for a two-sided alternative. In our test, our alternative is one-sided. For a one-sided test (see the lecture notes for a reminder on this) you divide the two-sided p-value by 2. This gives you a one-sided p-value of 0.0711. In other words, you can reject the null at the 10 percent level, but not at the 5 percent level. The lowest significance level at which you can reject the null is the 7.11 percent level.

Note: There was a lot covered in this lab. If your head is spinning a bit at the end, you might consider going through it a little more slowly on your own time. I think you'll find that it reinforces some material from lecture, and hopefully clarifies the implementation of some things you've only read about or seen in lecture.
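For anyone reviewing part 3 outside the lab, the spreadsheet calculation in (iii.f) and the F = t² relationship in (iii.j) can be sketched in Python. Treat this as a rough illustration: the coefficient and standard-error values below are stand-ins chosen for the example, and should be replaced with the figures from your own EViews output. Only the covariance 1.93E-06 is taken from the lab text. The p-value uses a large-sample normal approximation rather than the exact t-distribution EViews uses, so it will not match the EViews figure to every decimal place.

```python
import math


def se_of_difference(se_b1, se_b2, cov_b1_b2):
    """se(b1 - b2) = sqrt(se(b1)^2 + se(b2)^2 - 2 * Cov(b1, b2))."""
    return math.sqrt(se_b1 ** 2 + se_b2 ** 2 - 2.0 * cov_b1_b2)


# Stand-in estimates (illustrative; use your own EViews output).
b_jc, se_jc = 0.0667, 0.0068
b_univ, se_univ = 0.0769, 0.0023
s12 = 1.93e-06   # Cov(b_jc, b_univ), as given in the lab text

se_diff = se_of_difference(se_jc, se_univ, s12)
t_stat = (b_jc - b_univ) / se_diff

# With a single linear restriction, the F-stat equals the t-stat squared.
f_equiv = t_stat ** 2

# Two-sided p-value via the standard normal (large-sample approximation).
p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2.0))

# One-sided p-value for H1: beta_1 < beta_2. Halving the two-sided value
# is only right when the t-stat has the sign predicted by H1.
if t_stat < 0:
    p_one_sided = p_two_sided / 2.0
else:
    p_one_sided = 1.0 - p_two_sided / 2.0

print(f"se of difference: {se_diff:.5f}")
print(f"t-stat:           {t_stat:.3f}")
print(f"F (= t squared):  {f_equiv:.3f}")
print(f"one-sided p:      {p_one_sided:.4f}")
```

With these stand-in inputs the t-stat lands inside the -1.4 to -1.5 range mentioned in (iii.f), and the one-sided p-value falls between 0.05 and 0.10, matching the lab's conclusion of rejecting at the 10 percent level but not at the 5 percent level.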