Page 1 of 7
August 2013 EXAMINATIONS ECO220Y1Y Solutions

PART 1: 20 multiple choice questions with point values from 1 to 3 points each for a total of 47 points

(1) Determine whether the following statement is correct: The only way to improve power of the test, while holding significance level fixed, is to increase sample size. (A)
(2) As a result of the above hypothesis test, a researcher failed to reject the null hypothesis. What does it mean? (C)
(3) Which of the following statements describes the conclusion of the researcher in question (2)? (C)
(4) If you change the significance level from 0.05 to 0.01, what will happen to the critical value and the probability of making a type II error? (E)
(5) For the hypothesis test of a population proportion, suppose you obtained the Z test statistic from a random sample to be -0.5. Which of the following statements is true? (B)
(6) What is the point estimate of mean response time by the company's customer service? (D)
(7) Is he correct? (B)
(8) What is the value of that satisfies the following equation? (D)
(9) The mean of the dependent variable is 5.44. What is the mean of the independent variable? (A)
(10) If an additional point with and is added in the regression analysis, what will happen to the OLS estimate for the coefficient of? (B)
(11) How do you interpret the value of? (C)
(12) What is the test statistic for the following set of hypotheses? (D)
(13) Which of the following models violates the linearity assumption for a simple regression model? (E)
(14) Based on this scatter plot, which one of the assumptions of the simple regression model is likely to be violated? (B)
(15) What kind of data is it? (B)
(16) Which variable has a statistically significant slope coefficient estimate at the 1 percent significance level? (A)
(17) How do you interpret the coefficient estimate for PC? (C)
(18) Suppose that the researchers are going to add another explanatory variable, the number of siblings student i has, in the model. How will it change SST? (C)
(19) Determine whether the following statement is correct: Adjusted R-Squared=0.71 implies that 71% of the variation in the dependent variable is explained by the linear model. (B)
(20) What is the probability,? (E)
PART 2: 3 written questions with varying point values worth a total of 33 points

(21) [9 pts] A researcher would like to investigate how long typical new parents take parental leave after having a child in Canada. Suppose that he randomly sampled 251 parents across Canada who experienced the birth of a child in 2010 and worked full time prior to that. He found that the mean length of parental leave is 35 weeks and the standard deviation is 8.5 weeks in the sample. The histogram of the sample looks close to bell-shaped.

(a) [4 pts] Obtain a 0.95 confidence interval for the mean length of parental leave among those who became parents in 2010 in Canada. [Answer with quantitative analysis and 2 values]

Based on the information given above, the underlying population distribution of length of parental leave in Canada is approximately normal. Therefore, we can use Student's t model for inference about the population mean. A 0.95 confidence interval can be obtained by the following formula:

$\bar{y} \pm t^*_{n-1} \times \frac{s}{\sqrt{n}}$

where $t^*_{n-1}$ is the t critical value for $\alpha/2 = 0.025$ with $n-1$ degrees of freedom. Given that the sample size is 251, the degrees of freedom for the t statistic is $251 - 1 = 250$. Therefore, the t critical value for $\alpha/2 = 0.025$ with 250 degrees of freedom is 1.969. Given $\bar{y} = 35$, $s = 8.5$, and $n = 251$, the 0.95 confidence interval for the mean length of parental leave is $35 \pm 1.969 \times \frac{8.5}{\sqrt{251}} = (33.943, 36.056)$.

(b) [5 pts] The researcher found that 25 parents (about 10 percent) in the sample responded that the length of their parental leave was within the interval calculated in (a). Explain why this is consistent with the result obtained in (a). [Answer with quantitative analysis & 2-3 sentences]

The result obtained in (a) is consistent with the fact that about 10 percent of parents in the sample reported a parental leave between 33.943 and 36.056 weeks. The interval obtained in (a) is a confidence interval for the population mean, which is based on the sampling distribution of the sample mean, whereas the distribution of individual values in the sample reflects the distribution of the population.
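As a numerical check on parts (a) and (b), the sketch below recomputes the interval and then the share of individual leave lengths that would fall inside it. It is only an illustration under stated assumptions: the critical value 1.969 is taken from the solution rather than computed, the population is treated as exactly normal with mean 35 and standard deviation 8.5 (the sample estimates), and all variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Sample summary from question (21)
n = 251
sample_mean = 35.0   # weeks
sample_sd = 8.5      # weeks
t_crit = 1.969       # assumption: t critical value quoted in the solution (df = 250)

# Part (a): confidence interval for the population MEAN
std_error = sample_sd / math.sqrt(n)
margin = t_crit * std_error
lower, upper = sample_mean - margin, sample_mean + margin

# Part (b): share of INDIVIDUAL values inside that interval,
# assuming the population is Normal(35, 8.5)
population = NormalDist(mu=sample_mean, sigma=sample_sd)
share_inside = population.cdf(upper) - population.cdf(lower)

print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
print(f"Share of individuals inside the CI: {share_inside:.3f}")
```

The interval is narrow because it uses the standard error (8.5 divided by the square root of 251), while the spread of individual values uses the full standard deviation of 8.5, so only a small share of individuals lands inside it.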
Given the information, the point estimate for the population mean is 35 weeks and that for the population standard deviation is 8.5 weeks. That means the fraction of the population that falls between 33.943 and 36.056 is approximately $P(33.943 < Y < 36.056) = P(-0.12 < Z < 0.12) \approx 0.10$. Therefore, observing that about 10 percent of the sample falls within the 0.95 confidence interval for the population mean is consistent with the result in (a).

(22) [11 pts] Usually, more people take out mortgages when the cost of borrowing falls. The following variables measure the amount of mortgages and the cost of borrowing.

Mortgage_t : total mortgage outstanding (in million US dollars) at time t
IntRate_t : interest rate (in percentage points) at time t

The table below shows the summary statistics of annual data from the U.S. between 1980 and 2005.

Variable   n    Mean     Std. Dev.   Min     Max
Mortgage   26   151.87   23.86       112.4   210.8
IntRate    26   8.88     2.58        5.7     14.7

The regression result is reported in the table below. Assume that all assumptions of the simple regression model are satisfied.

Regression Results
Dependent variable is: Mortgage
R-Squared = 0.7056, R-squared (adjusted) = 0.6933
s = 13.21, n = 26

Variable    Coef     SE(Coef)   t-ratio   P-value
Intercept   220.89   9.46       23.3499   <0.0001
IntRate     -7.78    1.03       -7.55     <0.0001

(a) [3 pts] Fully interpret the coefficient estimate for IntRate. Include a comment on its statistical significance. [Answer with 2-3 sentences]

When the interest rate increases by 1 percentage point, total mortgage outstanding decreases, on average, by 7.78 million dollars. The p-value for this coefficient is less than 0.0001, which is below any conventional significance level. Therefore, this coefficient is statistically significantly different from zero, and we can conclude that there is a statistically significant linear relationship between total mortgage outstanding and the interest rate.
(b) [3 pts] Obtain a 90% prediction interval of Mortgage when IntRate is 9.5 percent and interpret the estimate. [Answer with quantitative analysis, 2 values, 1-2 sentences]

The formula for a $1-\alpha$ prediction interval is given as follows:

$\hat{y}_\nu \pm t^*_{n-2} \times \sqrt{SE^2(b_1) \times (x_\nu - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2}$

where $t^*_{n-2}$ is the t critical value for $\alpha/2$ with $n-2$ degrees of freedom. Since the sample size is 26, the degrees of freedom is 24. The t critical value for $\alpha/2 = 0.05$ is 1.711. Given $x_\nu = 9.5$, the predicted value of Mortgage is $\hat{y}_\nu = 220.89 - 7.78 \times 9.5 = 146.98$. Therefore,

$146.98 \pm 1.711 \times \sqrt{1.03^2 \times (9.5 - 8.88)^2 + \frac{13.21^2}{26} + 13.21^2} = 146.98 \pm 23.06 = (123.92, 170.04)$

With 0.90 confidence, total mortgage outstanding in a given year with an interest rate of 9.5 percent is at least 123.92 and at most 170.04 million dollars.

(c) [5 pts] What is the prediction of Mortgage when IntRate is 2.1 percent? How reliable is the prediction? [Answer with quantitative analysis, a value, & 2-3 sentences]

Given the value of $x$ (IntRate) to be 2.1, the predicted value of Mortgage is $220.89 - 7.78 \times 2.1 = 204.55$ million dollars. However, this prediction is not reliable because 2.1 is outside the range of values observed for IntRate (5.7 to 14.7), so this is an extrapolation. We are making a strong assumption that the estimated relationship between Mortgage and IntRate still holds at IntRate = 2.1, outside the observed range of IntRate.
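The calculations in parts (b) and (c) can be reproduced with a short sketch. It plugs the reported regression output into the prediction-interval formula, taking the critical value 1.711 from the solution rather than computing it. The function and variable names are hypothetical, chosen only for this illustration.

```python
import math

# Regression output and summary statistics from question (22)
b0, b1 = 220.89, -7.78        # intercept and slope on IntRate
se_b1 = 1.03                  # standard error of the slope
s_e = 13.21                   # standard deviation of the residuals
n = 26
x_bar = 8.88                  # sample mean of IntRate
x_min, x_max = 5.7, 14.7      # observed range of IntRate
t_crit = 1.711                # assumption: critical value quoted in the solution (df = 24)


def predict(x):
    """Point prediction of Mortgage (million USD) at interest rate x."""
    return b0 + b1 * x


def prediction_interval(x):
    """90% prediction interval for an individual year with interest rate x."""
    se_pred = math.sqrt(se_b1**2 * (x - x_bar) ** 2 + s_e**2 / n + s_e**2)
    margin = t_crit * se_pred
    return predict(x) - margin, predict(x) + margin


def is_extrapolation(x):
    """True when x lies outside the observed range of IntRate."""
    return not (x_min <= x <= x_max)


low, high = prediction_interval(9.5)
y_extrap = predict(2.1)

print(f"Part (b): ({low:.2f}, {high:.2f})")
print(f"Part (c): {y_extrap:.2f}, extrapolation: {is_extrapolation(2.1)}")
```

The range check makes the point of part (c) explicit: the arithmetic works for any interest rate, but the data only support predictions between 5.7 and 14.7.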
(23) [13 pts] A researcher would like to investigate the relationship between hourly wage rate (measured in dollars) and workers' characteristics. The following table shows the definitions of the variables that describe a worker's characteristics.

Variable      Definition
College_i     a dummy variable that takes 1 if worker i attended college, 0 otherwise
Female_i      a dummy variable that takes 1 if worker i is female, 0 otherwise
Age_i         age of worker i
Northeast_i   a dummy variable that takes 1 if worker i lives in Northeast, 0 otherwise
Midwest_i     a dummy variable that takes 1 if worker i lives in Midwest, 0 otherwise
South_i       a dummy variable that takes 1 if worker i lives in South, 0 otherwise
West_i        a dummy variable that takes 1 if worker i lives in West, 0 otherwise

Note that a worker lives in exactly one of the four regions: Northeast, Midwest, South, and West.

The following table presents the regression result.

Dependent Variable: Hourly wage rate

Variable           Coefficient   (SE)
College (X1)       5.24          (0.11)
Female (X2)        -1.02         (0.33)
Age (X3)           0.29          (0.01)
Age*Female (X4)    -0.11         (0.01)
Northeast (X5)     1.06          (0.14)
Midwest (X6)       0.83          (0.15)
South (X7)         -0.19         (0.15)
Intercept          2.69          (0.25)
s                  2.93
R^2                0.76
n                  3000

(a) [4 pts] Is this regression statistically significant overall? Write down the set of hypotheses to be tested and explain. [A set of hypotheses, answer with a quantitative analysis and 1 sentence]

H0: All of the slope coefficients are jointly zero
HA: Not all of the slope coefficients are jointly zero
Or

$H_0: \beta_1 = \beta_2 = \dots = \beta_7 = 0$
$H_A$: At least one $\beta_j \neq 0$

Given that $n = 3000$ and $k = 7$ (seven explanatory variables), the numerator degrees of freedom is $k = 7$ and the denominator degrees of freedom is $n - k - 1 = 3000 - 7 - 1 = 2992$. Thus we use the critical value for F with $(7, 2992)$ degrees of freedom, with significance level $\alpha = 0.05$. The rejection region for significance level 0.05 is F > 2.01. The F test statistic is:

$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} = \frac{0.76 / 7}{0.24 / 2992} = 1353.524$

Since 1353.524 > 2.01, we reject the null hypothesis that all slope coefficients are jointly 0. We conclude that the model overall is statistically significant at the 0.05 significance level.

(b) [3 pts] Fully interpret the coefficient on the variable Age (X3). Include a comment on its statistical significance. [Answer with 2-3 sentences]

An increase in age by 1 year is associated with, on average, a 0.29 dollar higher hourly wage for male workers, holding all other factors constant. The t statistic for the test of $H_0: \beta_3 = 0$ is $0.29 / 0.01 = 29$, which is greater than 2.576, the t critical value at the 1 percent significance level for the two-sided test with $df = \infty$ (since $n$ is big enough). We can therefore reject the null hypothesis in favor of the alternative. Thus, this coefficient estimate is statistically significantly different from zero.

(c) [3 pts] Fully interpret the coefficient on the variable Age*Female (X4). Include a comment on its statistical significance. [Answer with 2-3 sentences]

It implies that a one-year increase in age raises the hourly wage of female workers by 0.11 dollar less than that of male workers, that is, by 0.18 dollar (0.29 - 0.11) on average, controlling for all factors considered in this model. The t statistic for the two-sided test is -11 (= -0.11/0.01), which is in the rejection region at any conventional significance level. Therefore, we conclude that there is enough evidence of a statistically significant difference in the age coefficients between male and female workers.

(d) [3 pts] Fully interpret the coefficient on the variable Midwest (X6). Include a comment on its statistical significance.
[Answer with 2-3 sentences]

It implies that the hourly wage rate is, on average, 0.83 dollar higher for workers living in the Midwest than for those living in the West (the omitted region), holding all other factors constant. The t statistic for this coefficient is 5.53 (= 0.83/0.15), which is larger than the critical value for any conventional significance level. Therefore, the difference in wage rate between workers in the Midwest and workers in the West is statistically significant.
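The test statistics used throughout question (23) can be recomputed from the regression table. The sketch below uses only the reported R-squared, sample size, coefficients, and standard errors. The variable names are hypothetical, and the critical values themselves (such as 2.01 for F) are taken as given in the solution rather than computed.

```python
# Reported regression summary for question (23)
n = 3000     # sample size
k = 7        # number of explanatory variables
r2 = 0.76    # R-squared

# Overall F statistic: explained variation per slope coefficient
# relative to unexplained variation per residual degree of freedom
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# Coefficient estimates and standard errors for parts (b), (c), (d)
coefficients = {
    "Age": (0.29, 0.01),
    "Age*Female": (-0.11, 0.01),
    "Midwest": (0.83, 0.15),
}

# t statistic for H0: coefficient = 0 is estimate / standard error
t_stats = {name: est / se for name, (est, se) in coefficients.items()}

# Implied age slope for female workers: Age plus the interaction term
female_age_slope = coefficients["Age"][0] + coefficients["Age*Female"][0]

print(f"F = {f_stat:.3f}")
for name, t in t_stats.items():
    print(f"t({name}) = {t:.2f}")
print(f"Age slope for female workers = {female_age_slope:.2f}")
```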