Econ 371 Problem Set #3 Answer Sheet


Reginald Neil Garrett

4.3 In this question, you are told that an OLS regression analysis of average weekly earnings yields the following estimated model:

\[ \widehat{AWE} = 696.7 + 9.6 \times Age, \quad R^2 = 0.023, \quad SER = \ldots \]

a. The first question asks you to explain what the coefficient values mean. The coefficient 9.6 shows the marginal effect of Age on AWE; that is, AWE is expected to increase by $9.60 for each additional year of age. 696.7 is the intercept of the regression line. It determines the overall level of the line, indicating the average weekly earnings for someone just born, i.e., Age = 0. Clearly, one would not want to put much emphasis on this prediction.

b. The second question asks what the units of measurement are for the SER. SER is in the same units as the dependent variable Y, or AWE in this example. Thus SER is measured in dollars per week.

c. This question asks for the units of \(R^2\). \(R^2\) is unit-free.

d. This question asks for the regression's predicted earnings for a 25-year-old worker and a 45-year-old worker. Our model implies that \(\widehat{AWE} = 696.7 + 9.6 \times 25 = \$936.70\) and \(\widehat{AWE} = 696.7 + 9.6 \times 45 = \$1{,}128.70\).

e. Part e of the question asks if the regression will give a reliable prediction for a 99-year-old worker. The answer in this case is no. The oldest worker in the sample is 65 years old, and 99 years is far outside the range of the sample data. It is usually inadvisable to use a regression model, particularly a linear one, to make predictions outside the range of the sample data.

f. Here you are asked whether it is plausible that the distribution of the errors in the regression is normal. It is unlikely that the underlying error terms are normal. Indeed, it is probably the case that the distribution of earnings is positively skewed and has kurtosis larger than the normal. Income levels are bounded below by zero, which also ends up bounding the error terms. In addition, there are likely to be large outliers on the right-hand side of the distribution but not the left, due to extreme income cases such as Bill Gates, Tiger Woods, etc., with no comparable extremes on the left-hand side, again because income is bounded below by zero.

g. Finally, you are asked what the average value of AWE is in the sample. Since \(\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}\), it follows that \(\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}\). Thus the sample mean of AWE is \(696.7 + 9.6\,\bar{X}\), where \(\bar{X}\) is the sample mean of Age, which gives $1,…

This question has two parts.

a. First, you are asked to show that if \(\hat\beta_1 = 0\), then \(R^2 = 0\). In this case \(\hat\beta_0 = \bar{Y}\), so that \(\hat{Y}_i = \hat\beta_0 = \bar{Y}\) for all i, yielding \(ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = 0\) and therefore \(R^2 = ESS/TSS = 0\).

b. Second, you are asked if \(R^2 = 0\) implies that \(\hat\beta_1 = 0\). If \(R^2 = 0\), then \(ESS = 0\), so that \(\hat{Y}_i = \bar{Y}\) for all i. But \(\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i\), so that \(\bar{Y} = \hat\beta_0 + \hat\beta_1 X_i\) for all i, which implies that either \(\hat\beta_1 = 0\) or \(X_i\) is constant for all i. If \(X_i\) is constant for all i, then \(\sum_{i=1}^{n} (X_i - \bar{X})^2 = 0\) and \(\hat\beta_1\) is undefined (see equation …).

In this question, you are told that an OLS regression analysis of test scores on class size (CS) yields:

\[ \widehat{TestScore} = \ldots - 5.82 \times CS, \quad R^2 = 0.08, \quad SER = \ldots \]

a. The first part of the question asks you to construct a 95% confidence interval for \(\beta_1\). This is given by:

\[ \hat\beta_1 \pm 1.96\,SE(\hat\beta_1) = -5.82 \pm \ldots = (\ldots,\ \ldots) \]


b. This second part of the question asks you to compute the p-value associated with the hypothesis \(H_0: \beta_1 = 0\). The p-value is computed using:

\[ p\text{-value} = 2\Phi\left(-|t^{act}|\right) = 2\Phi\left(-\left|\frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}\right|\right) = \ldots \]

Clearly we would reject the null hypothesis at both the 5% and 1% levels.

c. In part c, you are asked to test the null hypothesis \(H_0: \beta_1 = -5.6\) and to predict whether or not -5.6 would be contained in a 95% confidence interval for \(\beta_1\). The p-value is computed using:

\[ p\text{-value} = 2\Phi\left(-|t^{act}|\right) = 2\Phi\left(-\left|\frac{\hat\beta_1 - (-5.6)}{SE(\hat\beta_1)}\right|\right) = 2\Phi(-0.1) = 2(0.46) = 0.92 \]

The p-value is larger than 0.10, so we cannot reject the null hypothesis at the 10%, 5% or 1% significance level. Because \(H_0: \beta_1 = -5.6\) is not rejected at the 5% level, this value is contained in the 95% confidence interval.

5.7 This question considers a linear regression model with a sample size of n = 250. Specifically, the study finds:

\[ \hat{Y} = \ldots + 3.2 X, \quad R^2 = 0.26, \quad SER = \ldots \]

a. The first part of the question asks you to test the hypothesis \(H_0: \beta_1 = 0\) at the 5% level. The p-value is computed using:

\[ p\text{-value} = 2\Phi\left(-|t^{act}|\right) = 2\Phi\left(-\left|\frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}\right|\right) = 2\Phi(-2.13) = \ldots \]

Clearly we would reject the null hypothesis at the 5% level, since the p-value is less than 0.05.

b. This part of the question asks you to construct a 95% confidence interval for \(\beta_1\). This is given by:

\[ \hat\beta_1 \pm 1.96\,SE(\hat\beta_1) = 3.2 \pm \ldots = (0.26,\ \ldots) \]

c. Part c asks if you would be surprised to learn that \(Y_i\) and \(X_i\) are independent. You should be. If Y and X are independent, then \(\beta_1 = 0\); but this null hypothesis was rejected at the 5% level in part a.


d. \(H_0: \beta_1 = 0\) would be rejected at the 5% level in 5% of the samples; 95% of the confidence intervals would contain the value \(\beta_1 = 0\).

The two empirical exercises in this homework use the same dataset: TeachingRatings. The data can be downloaded from the Web site listed in the assignment, which you can also reach from the class website. A program that carries out all of the tasks for problems E4.2 and E5.2 is appended to this answer sheet.

E4.2 The specific questions you are asked to respond to are:

a. From Figure 1, we can see that there appears to be a weak positive relationship between course evaluation and the beauty index.

[Figure 1: Scatter Plot of CourseEval versus Beauty (course_eval on the vertical axis, beauty on the horizontal axis)]

b. The regression results are as follows:

\[ \widehat{CourseEval} = \ldots + \ldots \times Beauty \]

The variable Beauty has a mean equal to 0, and the estimated intercept is the mean of the dependent variable (CourseEval) minus the estimated slope times the mean of the regressor (Beauty). Thus, the estimated intercept is equal to the mean of CourseEval.

c. Next, you are asked to predict the CourseEval of Watson (Beauty = 0) and Stock (Beauty = 0.789). Using our regression results, Watson's predicted CourseEval is … and Stock's predicted CourseEval is …. The program provides two different ways of computing these predicted values. One uses the scalar command and the other uses the lincom command.

d. The standard deviation of course evaluations is 0.55 and the standard deviation of beauty is …. A one standard deviation increase in beauty is expected to increase course evaluation by … = 0.105, or 1/5 of a standard deviation of course evaluations. The effect is small.

e. The regression \(R^2\) is 0.036, so that Beauty explains only 3.6% of the variance in course evaluations.

E5.2 This question uses the results from E4.2, reported above. You are asked to estimate the model regressing CourseEval on Beauty. The resulting parameter estimates are:

\[ \widehat{CourseEval} = \ldots + \ldots \times Beauty \]


The t-statistic is 4.12, which has a p-value of <0.001, so the null hypothesis can be rejected at the 1% level and thus also at the 5% and 10% levels.
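A few of the hand calculations in 5.7 and E4.2 can be re-derived from the numbers that survive in this copy. The following is only a sketch under stated assumptions: the slope 3.2, the t-statistic 2.13 and the lower confidence bound 0.26 are taken from the answer to 5.7, and 0.105 and 0.55 from E4.2 part d. The standard error of 1.5 is not stated here; it is backed out from the lower bound and should be read as an inference rather than a reported value. The tail probability \(\Phi(-2.13) \approx 0.0166\) is the usual standard normal table value. The block assumes the amsmath package.

```latex
\begin{align*}
% 5.7: standard error implied by the reported lower bound (inferred, not reported)
SE(\hat\beta_1) &\approx \frac{3.2 - 0.26}{1.96} = 1.5 \\
% 5.7a: t-statistic and two-sided p-value
t^{act} &= \frac{3.2 - 0}{1.5} \approx 2.13, \qquad
p\text{-value} = 2\Phi(-2.13) \approx 2(0.0166) \approx 0.033 < 0.05 \\
% 5.7b: 95% confidence interval
\hat\beta_1 \pm 1.96\,SE(\hat\beta_1) &= 3.2 \pm 2.94 = (0.26,\ 6.14) \\
% E4.2d: effect of a one-SD change in Beauty, in SD units of CourseEval
\frac{0.105}{0.55} &\approx 0.19 \approx \tfrac{1}{5}
\end{align*}
```

Because 0.033 lies between 0.01 and 0.05, the null hypothesis in 5.7 is rejected at the 5% level but would not be rejected at the 1% level.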

* Problem Set #3
#delimit ;
clear;
cap log close;
cd "R:\users\jaherrig\My Documents\Classes\Economics 371\Stata";

* Specify the output file;
log using Prob3F09.log, replace;
set more off;

* Read in and summarize the data;
use TeachingRatings.dta;
describe;
summarize course_eval beauty;

* Plot course_eval versus beauty for question E4.1a;
twoway scatter course_eval beauty;

* Estimate the model for question E4.1b (r requests robust standard errors);
reg course_eval beauty, r;

* Compute Fitted Values for question E4.1c using Scalar;
scalar drop _all;
scalar Watson = _b[_cons] + 0*_b[beauty];
scalar Stock = _b[_cons] + 0.789*_b[beauty];
scalar list;

* Compute Fitted Values for question E4.1c using lincom;
lincom _cons;
lincom _cons + 0.789*beauty;

log close;
clear;
exit;
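As a rough cross-check on the hand calculations in the answers, Stata's display command can evaluate them directly. This is a sketch and not part of the original program. It assumes the default line delimiter, so it would need `#delimit cr` first if `#delimit ;` is active. The inputs are the values quoted in the answers: the coefficients 696.7 and 9.6 from 4.3, and the t-statistics 2.13, 0.1 and 4.12. Here normal() is Stata's standard normal CDF, so the p-values use the large-sample normal approximation.

```stata
* Sketch only: cross-checks for the hand calculations (not in the original do-file).
* Assumes the default delimiter; issue "#delimit cr" first if "#delimit ;" is active.

* 4.3d: predicted AWE at ages 25 and 45
display 696.7 + 9.6*25
display 696.7 + 9.6*45

* Two-sided p-values from the normal approximation, 2*Phi(-|t|)
display 2*normal(-abs(2.13))   // 5.7 part a
display 2*normal(-abs(-0.1))   // class-size question, part c
display 2*normal(-abs(4.12))   // E5.2
```

Wrapping the t-statistic in abs() keeps the formula two-sided whatever the sign of the estimate.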

      log:  R:\users\jaherrig\My Documents\Classes\Economics 371\Stata\Prob3F09.log
 log type:  text
opened on:  12 Oct 2009, 07:08:13

. set more off;

. * Read in and summarize the data;

. use TeachingRatings.dta;

. describe;

Contains data from TeachingRatings.dta
  obs:   463
 vars:   8                  10 Dec … …:29
 size:   15,… (…% of memory free)

                storage  display  value
variable name   type     format   label   variable label
minority        float    %9.0g            Minority
age             float    %9.0g            Professor's age
female          float    %9.0g            female = 1
onecredit       byte     %8.0g            Equal 1 if a one-credit course
beauty          float    %9.0g
course_eval     float    %9.0g
intro           float    %9.0g
nnenglish       float    %9.0g

Sorted by:

. summarize course_eval beauty;

Variable       Obs   Mean   Std. Dev.   Min   Max
course_eval    …
beauty         …

. * Plot course_eval versus beauty for question E4.1a;

. twoway scatter course_eval beauty;

. * Estimate the model for question E4.1b;

. reg course_eval beauty, r;

Linear regression        Number of obs = 463
                         F(1, 461)     = …
                         Prob > F      = …
                         R-squared     = …
                         Root MSE      = …

                      Robust
course_eval   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
beauty        …
_cons         …

. * Compute Fitted Values for question E4.1c using Scalar;

. scalar drop _all;

. scalar Watson = _b[_cons] + 0*_b[beauty];

. scalar Stock = _b[_cons] + 0.789*_b[beauty];

. scalar list;
     Stock = 4.…
    Watson = …

. * Compute Fitted Values for question E4.1c using lincom;

. lincom _cons;

 ( 1)  _cons = 0

course_eval   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
(1)           …

. lincom _cons + 0.789*beauty;

 ( 1)  … beauty + _cons = 0

course_eval   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
(1)           …

. log close;
      log:  R:\users\jaherrig\My Documents\Classes\Economics 371\Stata\Prob3F09.log
 log type:  text
closed on:  12 Oct 2009, 07:08:…
