August 2013 EXAMINATIONS ECO220Y1Y. Solutions. PART 1: 20 multiple choice questions with point values from 1 to 3 points each for a total of 47 points

(1) Determine whether the following statement is correct: "The only way to improve power of the test, while holding significance level fixed, is to increase sample size." (A)
(2) As a result of the above hypothesis test, a researcher failed to reject the null hypothesis. What does it mean? (C)
(3) Which of the following statements describes the conclusion of the researcher in question (2)? (C)
(4) If you change the significance level from 0.05 to 0.01, what will happen to the critical value and the probability of making a type II error? (E)
(5) For the hypothesis test of a population proportion, suppose the Z test statistic obtained from a random sample is -0.5. Which of the following statements is true? (B)
(6) What is the point estimate of mean response time by the company's customer service? (D)
(7) Is he correct? (B)
(8) What is the value of that satisfies the following equation? (D)
(9) The mean of the dependent variable is 5.44. What is the mean of the independent variable? (A)
(10) If an additional point with and is added in the regression analysis, what will happen to the OLS estimate for the coefficient of? (B)
(11) How do you interpret the value of? (C)
(12) What is the test statistic for the following set of hypotheses? (D)
(13) Which of the following models violates the linearity assumption for a simple regression model? (E)
(14) Based on this scatter plot, which one of the assumptions of the simple regression model is likely to be violated? (B)
(15) What kind of data is it? (B)
(16) Which variable has a statistically significant slope coefficient estimate at the 1 percent significance level? (A)

(17) How do you interpret the coefficient estimate for PC? (C)
(18) Suppose that the researchers are going to add another explanatory variable, the number of siblings student i has, to the model. How will it change SST? (C)
(19) Determine whether the following statement is correct: "Adjusted R-squared = 0.71 implies that 71% of the variation in the dependent variable is explained by the linear model." (B)
(20) What is the probability,? (E)

PART 2: 3 written questions with varying point values worth a total of 33 points

(21) [9 pts] A researcher would like to investigate how long typical new parents take parental leave after having a child in Canada. Suppose that he randomly sampled 251 parents across Canada who experienced the birth of a child in 2010 and worked full time prior to that. He found that the mean length of parental leave in the sample is 35 weeks and the standard deviation is 8.5 weeks. The histogram of the sample looks close to bell-shaped.

(a) [4 pts] Obtain a 0.95 confidence interval for the mean length of parental leave among those who became parents in 2010 in Canada. [Answer with quantitative analysis and 2 values]

Because the sample histogram is close to bell-shaped, it is reasonable to assume that the population distribution of parental-leave length in Canada is approximately normal. Therefore, we can use Student's t model for inference about the population mean. A 0.95 confidence interval is given by

$$\bar{x} \pm t^*_{0.025,\,n-1}\,\frac{s}{\sqrt{n}}$$

where $t^*_{0.025,\,n-1}$ is the t critical value for $\alpha/2 = 0.025$ with $n-1$ degrees of freedom. The sample size is 251, so the degrees of freedom are $n-1 = 250$ and the t critical value is 1.969. With $\bar{x} = 35$ and $s = 8.5$, the 0.95 confidence interval for the mean length of parental leave is

$$35 \pm 1.969 \times \frac{8.5}{\sqrt{251}} = 35 \pm 1.056 = (33.944,\ 36.056) \text{ weeks.}$$

(b) [5 pts] The researcher found that 25 parents (about 10 percent) in the sample reported a parental leave whose length was within the interval calculated in (a). Explain why this is consistent with the result obtained in (a). [Answer with quantitative analysis & 2-3 sentences]

The result in (a) is consistent with the fact that only about 10 percent of parents in the sample reported parental leave between 33.944 and 36.056 weeks. The interval in (a) is a confidence interval for the population mean. It is based on the sampling distribution of the sample mean, whose standard error is only $s/\sqrt{n} \approx 0.54$ weeks. The individual responses, by contrast, reflect the distribution of parental-leave lengths in the population, whose standard deviation (about 8.5 weeks) is much larger. A 95% confidence interval for the mean is therefore not expected to contain 95% of individual observations.
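The arithmetic in (a) can be reproduced with a short Python sketch, together with the normal-model share of individuals that the argument for (b) relies on. The sketch uses only the standard library. The t critical value 1.969 is copied from the solution rather than computed, and the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Sample summary from question (21)
n = 251          # sample size
xbar = 35.0      # sample mean (weeks)
s = 8.5          # sample standard deviation (weeks)
t_crit = 1.969   # t critical value, alpha/2 = 0.025, df = 250 (taken from the solution)

# Part (a): 0.95 confidence interval for the population mean
se_mean = s / math.sqrt(n)        # standard error of the sample mean
margin = t_crit * se_mean         # margin of error
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.3f}, {upper:.3f})")   # 95% CI: (33.944, 36.056)

# Part (b): share of *individuals* inside that interval if leave length ~ N(35, 8.5)
population = NormalDist(mu=xbar, sigma=s)
share = population.cdf(upper) - population.cdf(lower)
print(f"Share of individuals inside the CI: {share:.3f}")   # about 0.099
```

The contrast between `se_mean` (about 0.54) and `s` (8.5) is the whole point of (b). The interval is narrow because it describes the mean, not individual parents.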

Given the information, the point estimate of the population mean is 35 weeks and that of the population standard deviation is 8.5 weeks. If parental-leave length is approximately $N(35, 8.5)$, the fraction of the population that falls between 33.944 and 36.056 weeks is approximately

$$P(33.944 < X < 36.056) = P\left(-\frac{1.056}{8.5} < Z < \frac{1.056}{8.5}\right) = P(-0.124 < Z < 0.124) \approx 0.099.$$

Therefore, observing that about 10 percent of the sample falls within the 0.95 confidence interval for the population mean is consistent with (a).

(22) [11 pts] Usually, there are more mortgage borrowers when the cost of borrowing is lower. The following variables measure the amount of mortgages and the cost of borrowing:

$\text{Mortgage}_t$: total mortgage outstanding (in million US dollars) at time t
$\text{IntRate}_t$: interest rate (in percentage points) at time t

The table below shows the summary statistics of annual data from the U.S. between 1980 and 2005.

Variable    n    Mean     Std. Dev.   Min     Max
Mortgage    26   151.87   23.86       112.4   210.8
IntRate     26   8.88     2.58        5.7     14.7

The regression result is reported in the table below. Assume that all assumptions of the simple regression model are satisfied.

Regression results. Dependent variable is: Mortgage
R-squared = 0.7056, R-squared (adjusted) = 0.6933, s = 13.21, n = 26

Variable    Coef     SE(Coef)   t-ratio   P-value
Intercept   220.89   9.46       23.3499   <0.0001
IntRate     -7.78    1.03       -7.55     <0.0001

(a) [3 pts] Fully interpret the coefficient estimate for IntRate. Include a comment on its statistical significance. [Answer with 2-3 sentences]

When the interest rate increases by 1 percentage point, total mortgage outstanding decreases, on average, by 7.78 million dollars. The p-value for this coefficient is less than 0.0001, below any conventional significance level. The coefficient is therefore statistically significantly different from zero, and we conclude that there is a statistically significant linear relationship between total mortgage outstanding and the interest rate.
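As an optional cross-check that is not part of the original solution, a minimal Python sketch can confirm that the reported coefficients agree with the summary table. It uses the simple-regression identities $r = -\sqrt{R^2}$ (negative because the slope is negative), $b_1 = r\,s_y/s_x$ and $b_0 = \bar{y} - b_1\bar{x}$. It also computes the t-ratio $b_1/SE(b_1)$. The variable names are illustrative.

```python
import math

# Summary statistics and regression output from question (22)
mean_x, sd_x = 8.88, 2.58        # IntRate
mean_y, sd_y = 151.87, 23.86     # Mortgage
r_squared = 0.7056
b1, se_b1 = -7.78, 1.03          # reported slope and its standard error

# In simple regression r^2 = R^2, and r has the same sign as the slope
r = -math.sqrt(r_squared)
b1_implied = r * sd_y / sd_x                # b1 = r * s_y / s_x
b0_implied = mean_y - b1_implied * mean_x   # the OLS line passes through (x-bar, y-bar)

t_ratio = b1 / se_b1                        # t-ratio for H0: beta1 = 0
print(f"r = {r:.2f}, implied slope = {b1_implied:.2f}, implied intercept = {b0_implied:.2f}")
print(f"t-ratio for IntRate = {t_ratio:.2f}")
```

The implied slope (about -7.77) and intercept (about 220.85) agree with the reported -7.78 and 220.89 up to the rounding of the summary statistics. The t-ratio comes out at about -7.55, as in the table.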

(b) [3 pts] Obtain a 90% prediction interval of Mortgage when IntRate is 9.5 percent and interpret the estimate. [Answer with quantitative analysis, 2 values, 1-2 sentences]

The formula for a $1-\alpha$ prediction interval is

$$\hat{y}_\nu \pm t^*_{\alpha/2,\,n-2}\sqrt{SE^2(b_1)\,(x_\nu - \bar{x})^2 + \frac{s^2}{n} + s^2}$$

where $t^*_{\alpha/2,\,n-2}$ is the t critical value for $\alpha/2$ with $n-2$ degrees of freedom. The sample size is 26, so the degrees of freedom are 24, and the t critical value for $\alpha/2 = 0.05$ is 1.711. The predicted value of Mortgage at $x_\nu = 9.5$ is $\hat{y}_\nu = 220.89 - 7.78 \times 9.5 = 146.98$. Therefore,

$$146.98 \pm 1.711\sqrt{1.03^2\,(9.5 - 8.88)^2 + \frac{13.21^2}{26} + 13.21^2} = 146.98 \pm 23.06 = (123.92,\ 170.04).$$

With 90% confidence, total mortgage outstanding in a single year in which the interest rate is 9.5 percent will be between 123.92 and 170.04 million dollars.

(c) [5 pts] What is the prediction of Mortgage when IntRate is 2.1 percent? How reliable is the prediction? [Answer with quantitative analysis, a value, & 2-3 sentences]

For IntRate = 2.1, the predicted value of Mortgage is $220.89 - 7.78 \times 2.1 = 204.55$ million dollars. However, this prediction is not reliable. The value 2.1 lies outside the range of IntRate observed in the data (5.7 to 14.7 percent), so the prediction is an extrapolation. It rests on the strong assumption that the relationship between Mortgage and IntRate estimated over the observed range also holds at IntRate = 2.1.
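The prediction interval in (b) and the extrapolation check in (c) can be sketched in Python under the same assumptions. The sketch uses only the standard library, and the t critical value 1.711 is copied from the solution. `predict` and `prediction_interval` are hypothetical helper names, not part of any library.

```python
import math

# Regression output and summary statistics from question (22)
b0, b1 = 220.89, -7.78
se_b1 = 1.03
s = 13.21                  # residual standard deviation
n = 26
mean_x = 8.88              # mean of IntRate
x_min, x_max = 5.7, 14.7   # observed range of IntRate
t_crit = 1.711             # t critical value, alpha/2 = 0.05, df = n - 2 = 24 (from the solution)

def predict(x):
    """Point prediction of Mortgage at IntRate = x."""
    return b0 + b1 * x

def prediction_interval(x):
    """90% prediction interval for a single new year with IntRate = x."""
    se_pred = math.sqrt(se_b1**2 * (x - mean_x)**2 + s**2 / n + s**2)
    y_hat = predict(x)
    return y_hat - t_crit * se_pred, y_hat + t_crit * se_pred

lo, hi = prediction_interval(9.5)
print(f"y_hat(9.5) = {predict(9.5):.2f}, 90% PI = ({lo:.2f}, {hi:.2f})")
# y_hat(9.5) = 146.98, 90% PI = (123.92, 170.04)

x_new = 2.1
inside = x_min <= x_new <= x_max   # False means the prediction is an extrapolation
print(f"y_hat(2.1) = {predict(x_new):.2f}; within observed IntRate range: {inside}")
# y_hat(2.1) = 204.55; within observed IntRate range: False
```

The $s^2$ term under the square root is what separates a prediction interval for one year from a confidence interval for the mean response. It dominates the other two terms here.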

(23) [13 pts] A researcher would like to investigate the relationship between the hourly wage rate (measured in dollars) and workers' characteristics. The following table defines the variables that describe workers' characteristics.

Variable       Definition
College_i      dummy variable equal to 1 if worker i attended college, 0 otherwise
Female_i       dummy variable equal to 1 if worker i is female, 0 otherwise
Age_i          age of worker i
Northeast_i    dummy variable equal to 1 if worker i lives in the Northeast, 0 otherwise
Midwest_i      dummy variable equal to 1 if worker i lives in the Midwest, 0 otherwise
South_i        dummy variable equal to 1 if worker i lives in the South, 0 otherwise
West_i         dummy variable equal to 1 if worker i lives in the West, 0 otherwise

Note that each worker lives in exactly one of the four regions: Northeast, Midwest, South, and West. The following table presents the regression result (standard errors in parentheses).

Dependent variable: hourly wage rate
College (X1)        5.24  (0.11)
Female (X2)        -1.02  (0.33)
Age (X3)            0.29  (0.01)
Age*Female (X4)    -0.11  (0.01)
Northeast (X5)      1.06  (0.14)
Midwest (X6)        0.83  (0.15)
South (X7)         -0.19  (0.15)
Intercept           2.69  (0.25)
s                   2.93
R^2                 0.76
n                   3000

(a) [4 pts] Is this regression statistically significant overall? Write down the set of hypotheses to be tested and explain. [A set of hypotheses, answer with a quantitative analysis and 1 sentence]

$H_0$: All of the slope coefficients are jointly zero
$H_A$: Not all of the slope coefficients are zero

Or, equivalently, $H_0: \beta_1 = \beta_2 = \cdots = \beta_7 = 0$ versus $H_A$: at least one $\beta_j \neq 0$ ($j = 1, \dots, 7$).

We have n = 3000 and k = 7 (seven explanatory variables), so the numerator degrees of freedom are $k = 7$ and the denominator degrees of freedom are $n - k - 1 = 3000 - 7 - 1 = 2992$. We therefore use the critical value of F with (7, 2992) degrees of freedom at significance level $\alpha = 0.05$, and the rejection region is F > 2.01. The F test statistic is

$$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{0.76/7}{0.24/2992} = 1353.524.$$

Since 1353.524 > 2.01, we reject the null hypothesis that all slope coefficients are jointly 0. We conclude that the model is statistically significant overall at the 0.05 significance level, and given the size of F, at any conventional level.

(b) [3 pts] Fully interpret the coefficient on the variable Age (X3). Include a comment on its statistical significance. [Answer with 2-3 sentences]

For male workers, an increase in age of 1 year is associated with a hourly wage rate that is 0.29 dollar higher on average, holding all other factors constant. The t statistic for testing $H_0: \beta_3 = 0$ against $H_A: \beta_3 \neq 0$ is $t = 0.29/0.01 = 29$. This exceeds 2.576, the two-sided critical value at the 1 percent significance level; with $n - k - 1 = 2992$ degrees of freedom, the standard normal value can be used. We therefore reject the null hypothesis in favor of the alternative, and this coefficient estimate is statistically significantly different from zero.

(c) [3 pts] Fully interpret the coefficient on the variable Age*Female (X4). Include a comment on its statistical significance. [Answer with 2-3 sentences]

The coefficient implies that the wage increase associated with one more year of age is 0.11 dollar smaller for female workers than for male workers. For women, one more year of age is associated with a wage rate that is 0.18 dollar (0.29 - 0.11) higher on average, controlling for all other factors in the model. The t statistic for the two-sided test of $H_0: \beta_4 = 0$ is $-0.11/0.01 = -11$, which is in the rejection region at any conventional significance level. Therefore, there is enough evidence to conclude that the age coefficient differs significantly between male and female workers.

(d) [3 pts] Fully interpret the coefficient on the variable Midwest (X6). Include a comment on its statistical significance.
[Answer with 2-3 sentences] The hourly wage rate is on average 0.83 dollar higher for workers living in the Midwest than for those living in the West (the omitted region), holding all other factors constant. The t statistic for this coefficient is 5.53 (= 0.83/0.15), larger than the critical value at any conventional significance level. Therefore, the difference in wage rates between workers in the Midwest and workers in the West is statistically significant.
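The F statistic in (a) and the t-ratios used in (b) through (d) can be reproduced with a short Python sketch. As in (b), it assumes that with 2992 degrees of freedom the two-sided 1 percent critical value can be approximated by the standard normal value 2.576. The dictionary layout and variable names are illustrative only.

```python
# Regression output from question (23): name -> (coefficient, standard error)
coefs = {
    "College":    (5.24, 0.11),
    "Female":     (-1.02, 0.33),
    "Age":        (0.29, 0.01),
    "Age*Female": (-0.11, 0.01),
    "Northeast":  (1.06, 0.14),
    "Midwest":    (0.83, 0.15),
    "South":      (-0.19, 0.15),
}
n, k, r_squared = 3000, 7, 0.76

# Part (a): overall F statistic computed from R^2
df1, df2 = k, n - k - 1
f_stat = (r_squared / df1) / ((1 - r_squared) / df2)
print(f"F({df1}, {df2}) = {f_stat:.3f}")   # F(7, 2992) = 1353.524

# Parts (b)-(d): t-ratios for H0: beta_j = 0 (two-sided, normal approximation for df = 2992)
crit_1pct = 2.576
for name, (b, se) in coefs.items():
    t = b / se
    print(f"{name:<11} t = {t:7.2f}  significant at 1%: {abs(t) > crit_1pct}")

# Part (c): age slope for female workers = Age + Age*Female
female_age_slope = coefs["Age"][0] + coefs["Age*Female"][0]
print(f"Age slope for women: {female_age_slope:.2f}")
```

Because West is the omitted region, each regional t-ratio tests a difference from the West. This is why (d) is phrased relative to workers in the West.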