Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis




Reading: ACME Clinic (ECON 4550 Coursepak, Page 47) and Big Suzy's Snack Cakes (ECON 4550 Coursepak, Page 51)

Definitions and Concepts:

Sample Maximum: the largest realized value of a variable.

Sample Minimum: the smallest realized value of a variable.

Dummy Variable: a variable that indicates whether an observation is characterized by a particular attribute (typically equal to 1 if the attribute is true and equal to 0 otherwise).

Omitted Variable Bias: a problem of distorted regression results arising from specifying a model which leaves out one or more important independent variables (i.e., a specification of the true model which is wrong because not all of the relevant X variables were included).
For such a bias to arise in linear regression, the omitted variable must (i) be a true determinant of the dependent variable and (ii) be strongly correlated with one or more of the included independent variables.
If such a relevant independent variable is omitted, then the estimated coefficient on the strongly correlated (included) independent variable is partly measuring the impact of the omitted variable.

Note: an Excel file containing the data used in each of the examples discussed in lecture is posted on the course webpage (http://ksuweb.kennesaw.edu/~tmathew7/econ4550.html)

1. Estimating an Average Cost Function

Consider an automobile manufacturer trying to estimate ATC(q), based on past realizations of Average Total Costs for different levels of Output.

Assume $ATC(q) = b_0 + b_1 q + b_2 q^2$

We have data on Average Costs and Quantity of Output for each of the past 26 weeks as follows:

Average Costs   Quantity   Average Costs   Quantity
39,380          758        36,580          114
9,10            100        33,980          65
51,00           69         71,560          800
34,500          571        36,900          44
3,980           584        4,900           48
18,790          576        56,90           804
18,00           434        3,655           641
59,10           75         31,40           431
41,15           60         18,50           14
1,990           300        17,70           47
17,450          85         19,980          60
51,985          796        33,450          150
4,500           308        14,10           40

Start by computing some descriptive statistics for the variables in our data set: sample mean, sample standard deviation, sample maximum, and sample minimum. In practice, this partly serves as a check to identify any errors in the dataset.

Descriptive Statistics:
            Average Costs   Quantity
Mean        33,048          489.7
Std Dev     15,153.12       19.68
Maximum     71,560          804
Minimum     1,990           100

In order for our data to match the assumed functional form for Average Costs, we need to do a non-linear transformation of Quantity (i.e., compute Quantity Squared for each observation).

Regression results from Excel:

Example 1: Estimating an Average Cost Function

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.826186414
R Square             0.682583991
Adjusted R Square    0.65498599
Standard Error       8900.667707
Observations         26

ANOVA
             df    SS            MS            F             Significance F
Regression   2     3918323443    1959161722    24.73005667   1.85666E-06
Residual     23    1822103369    79221885.63
Total        25    5740426813

               Coefficients   Standard Error   t Stat         P-value        Lower 95%      Upper 95%
Intercept      41208.54946    7596.660953      5.424560833    1.63735E-05    25493.65897    56923.43996
X Variable 1   -120.684758    36.01458995      -3.350996315   0.002767997    -195.1866138   -46.1829
X Variable 2   0.17805629     0.038240066      4.656275685    0.000109633    0.098950686    0.257161894

(X Variable 1 is Quantity and X Variable 2 is Quantity Squared.)

Estimated equation of $\hat{b}_0 + \hat{b}_1 q + \hat{b}_2 q^2$:
$41{,}208.55 - 120.68q + (0.1781)q^2$

Note, all p-values are small enough so that each estimated coefficient is statistically significant at a 1% error level.

$R^2 = .68258$

(?) What is the Efficient Scale of Production for this firm?

(A) Recall, the Efficient Scale of Production is the quantity of output that minimizes Average Total Costs of Production. We have estimated Average Total Costs of Production to be:
$ATC(q) = 41{,}208.55 - 120.68q + (0.1781)q^2$

From here, we have:
$ATC'(q) = -120.68 + (0.3562)q$ and $ATC''(q) = 0.3562$

So $ATC'(q) < 0$ for small quantities and $ATC'(q) > 0$ for large quantities. Since $ATC''(q) > 0$ everywhere, Average Total Costs are minimized where $ATC'(q) = 0$:
$-120.68 + (0.3562)q = 0$
$(0.3562)q = 120.68$
$q = \frac{120.68}{0.3562} \approx 338.80$

Thus, the Efficient Scale of Production is roughly 339 units of output.
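The first-order condition above can be checked numerically. The following is a minimal sketch, not part of the original handout: the function names are hypothetical, and the coefficient values are the estimates as read from the derivation (41,208.55, -120.68 and 0.1781), which is an assumption about the transcription.

```python
def atc(q, b0, b1, b2):
    """Quadratic average total cost: ATC(q) = b0 + b1*q + b2*q^2."""
    return b0 + b1 * q + b2 * q ** 2


def efficient_scale(b1, b2):
    """Quantity where ATC'(q) = b1 + 2*b2*q = 0; a minimum only if b2 > 0."""
    if b2 <= 0:
        raise ValueError("ATC has no interior minimum unless b2 > 0")
    return -b1 / (2.0 * b2)


# Assumed coefficient estimates, taken from the estimated equation in the text.
B0, B1, B2 = 41208.55, -120.68, 0.1781

q_star = efficient_scale(B1, B2)
min_atc = atc(q_star, B0, B1, B2)

print(f"efficient scale: {q_star:.2f} units")
print(f"ATC at the efficient scale: {min_atc:,.2f}")
```

Evaluating `atc` one unit to either side of `q_star` gives a slightly higher average cost, which is a quick way to confirm the point is a minimum rather than a maximum.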

2. Estimating Demand

Consider a coffee house with retail outlets in 32 markets. For each market they have data on annual quantity sold, price per unit, average income, and price set by a rival.

Store Number   Quantity Sold / Price / Average Income / Rival Price
1     476,500.10 33,560 1.85
2     358,750.15 30,10 1.90
3     443,900.05 34,50 1.80
4     54,450.0 3,340.05
5     433,575.0 41,750.15
6     498,790 1.65 34,50 1.45
7     389,670.45 5,690.5
8     430,560.40 33,40.10
9     575,690.0 37,800.00
10    40,350.15 8,900.05
11    430,150.65 3,450.5
12    470,00.0 34,150 1.95
13    34,175.5 33,5 1.80
14    530,10 1.90 43,750 1.60
15    638,900 1.95 4,990 1.75
16    67,340 1.75 3,785 1.55
17    609,510.05 31,140.00
18    410,10.5 5,670 1.90
19    410,450.30 9,310.05
20    575,750 1.80 38,800 1.75
21    45,790.5 37,75 1.80
22    64,900 1.85 40,050 1.75
23    43,910.05 34,800 1.95
24    579,800.10 4,500 1.55
25    388,750 1.70 6,700 1.40
26    505,675 1.85 9,750 1.75
27    575,680.10 3,000.15
28    517,750 1.95 33,540 1.80
29    57,50 1.90 38,765 1.70
30    540,000.15 39,975 1.95
31    540,85 1.95 41,00 1.50
32    480,100.40 35,800.0

Descriptive Statistics:
            Quantity    Price   Income      Rival Price
Mean        494,861     2.09    34,655.47   1.86565
Std Dev     86,754.68   0.3     5,047.94    0.36001
Maximum     67,340      2.65    43,750      .5
Minimum     34,175      1.65    5,670       1.4

Suppose they conjecture that:
$quantity = A \cdot (price)^{B_1} (income)^{B_2} (rival\_price)^{B_3}$

Note: $\ln(x^a y^b z^c) = a\ln(x) + b\ln(y) + c\ln(z)$

Thus, the demand relation above can be expressed as:
$\ln(quantity) = \ln(A) + B_1 \ln(price) + B_2 \ln(income) + B_3 \ln(rival\_price)$
$\ln(quantity) = B_0 + B_1 \ln(price) + B_2 \ln(income) + B_3 \ln(rival\_price)$

We can do a transformation of variables and run a linear regression!

Regression results from Excel (see following page).

From here, we can essentially undo the previous transformation of variables. Note, since $B_0 = \ln(A)$ and $\hat{B}_0 = 7.001467$, it follows that $\hat{A} = \exp\{7.001467\} \approx 1{,}098.24$

So, our estimated equation is:
$quantity = \hat{A} \cdot (price)^{\hat{B}_1} (income)^{\hat{B}_2} (rival\_price)^{\hat{B}_3}$
$quantity = 1{,}098.24 \cdot (price)^{-1.2681} (income)^{0.6309} (rival\_price)^{0.7067}$

Recognize that, fixing income and rival price, this demand function is of the constant elasticity form
=> Price Elasticity of Demand is $\varepsilon_p = -1.2681$ => Elastic Demand

Further, Income Elasticity of Demand is $\varepsilon_I = 0.6309$ => Normal Good

And Cross Price Elasticity of Demand (with respect to rival price) is $\varepsilon_{X,p_Y} = 0.7067$ => the good in question is a Substitute for the good being sold by the rival firm

Example 2: Estimating Demand

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.714436775
R Square             0.510419906
Adjusted R Square    0.457964895
Standard Error       0.131195
Observations         32

ANOVA
             df    SS            MS            F            Significance F
Regression   3     0.502454162   0.167484721   9.7306574    0.00014510
Residual     28    0.481939582   0.017212128
Total        31    0.984393744

               Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept      7.001467495    1.76409634       3.9686145      0.000457071   3.38764788    10.6158711
X Variable 1   -1.268085531   0.38195871       -3.319954385   0.002509044   -2.05049504   -0.485678559
X Variable 2   0.630866119    0.165579736      3.81004445     0.00069764    0.291691406   0.970040831
X Variable 3   0.70675885     0.3319797        2.129154635    0.041771      0.02680348    1.38664941

(X Variable 1 is ln(price), X Variable 2 is ln(income), and X Variable 3 is ln(rival_price).)

Estimated equation of $\hat{B}_0 + \hat{B}_1 \ln(price) + \hat{B}_2 \ln(income) + \hat{B}_3 \ln(rival\_price)$ is:
$(7.0015) - 1.2681\ln(price) + .6309\ln(income) + .7067\ln(rival\_price)$

Note, all p-values are small enough so that each estimated coefficient is statistically significant at a 5% error level.

$R^2 = .5104$
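Undoing the log transformation can be sketched in a few lines. This is an illustrative sketch only: the function name and the evaluation point (price 2.00, income 35,000, rival price 1.85) are hypothetical, and the coefficient values are the estimates as read from the text (7.001467, -1.2681, 0.6309, 0.7067), which is an assumption about the transcription.

```python
import math

# Assumed estimates from the log-log regression in the text.
B0_HAT = 7.001467   # intercept, equal to ln(A)
B1_HAT = -1.2681    # own-price elasticity
B2_HAT = 0.6309     # income elasticity
B3_HAT = 0.7067     # cross-price elasticity with respect to rival price

# Undo the transformation: B0 = ln(A)  =>  A = exp(B0).
A_HAT = math.exp(B0_HAT)


def predicted_quantity(price, income, rival_price):
    """Constant-elasticity demand: A * price^B1 * income^B2 * rival_price^B3."""
    return A_HAT * price ** B1_HAT * income ** B2_HAT * rival_price ** B3_HAT


# Hypothetical evaluation point: raise price by 1%, hold everything else fixed.
base = predicted_quantity(2.00, 35000.0, 1.85)
bumped = predicted_quantity(2.00 * 1.01, 35000.0, 1.85)
pct_change = 100.0 * (bumped / base - 1.0)

print(f"A-hat = {A_HAT:.2f}")
print(f"a 1% price increase changes quantity by {pct_change:.3f}%")
```

Because the demand function has the constant elasticity form, the percentage response to a 1% price change is close to the price exponent no matter which evaluation point is chosen.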

3. ACME Clinic (Page 47 in Coursepak)

1. Based upon Exhibit A, are male nurses paid less than female nurses? If so, by how much? Is that difference statistically significant?
2. What about the clinic's claim that Mr. Jones is appropriately paid if you account for his below average education? Is that supported by the data? If education is the only determinant of compensation, what is a fair estimate of what Mr. Jones' salary should be?
3. After conducting your preliminary analysis, you interview supervisors in the clinic and find that years of experience are also highly valued by the clinic. Based on that observation, you request data on the experience of the nurses and receive data contained in Exhibit B. How is your analysis altered if you consider experience as a factor that determines compensation? Is Mr. Jones underpaid according to this analysis? Why not?
4. How do you reconcile the apparent contradiction between your answers above?

Exhibit A (with Gender=1 for female and Gender=0 for male)

ID #   Salary   Education   Gender
1      49,380   4           1
2      33,400   3           0
3      40,940   1           1
4      43,440   4           1
5      24,960   0           0
6      47,580   5           1
7      33,400   3           0
8      37,520   4           0
9      45,080   2           1
10     36,820   0           1
11     43,100   2           1
12     64,820   2           1
13     33,980   4           0
14     43,440   4           1
15     29,260   2           0
16     26,940   0           0
17     53,300   4           1
18     61,250   5           1
19     60,750   5           1
20     29,980   2           0

Descriptive Statistics:
            Salary      Education   Gender
Mean        41,967      2.8         0.6
Std Dev     11,596.76   1.67        0.50
Maximum     64,820      5           1
Minimum     24,960      0           0

1. Based upon Exhibit A, are male nurses paid less than female nurses? If so, by how much? Is that difference statistically significant?

Observe that from the dataset we can compute that the Average Salary of Female nurses is $49,158.33, while the Average Salary of Male nurses is only $31,180.00 => Male nurses are paid $17,978.33 less!

If we run a regression to estimate the equation $salary = b_0 + b_1(female)$, we get the results labeled Example 3 ACME Clinic [Regression (i)].

So, based upon the results of this regression, it appears as if Male nurses are paid less ($17,978.33 less!) than Female nurses. Further, this difference is statistically significant at a .01% error level.

2. What about the clinic's claim that Mr. Jones is appropriately paid if you account for his below average education? Is that supported by the data? If education is the only determinant of compensation, what is a fair estimate of what Mr. Jones' salary should be?

To determine the relation between education and salary (assuming education is the only determinant of salary), run a regression on the equation $salary = b_0 + b_1(education)$. Doing so, we get the results labeled Example 3 ACME Clinic [Regression (ii)].

So, based upon the results of this regression, it appears that nurses with more education are paid higher salaries. Mr. Jones' education level (only 2 years) is slightly below the sample mean of (2.8).

But, by the estimated equation $31{,}029.74 + 3{,}906.17(education)$, the expected salary of a nurse with 2 years of education should be $31{,}029.74 + 3{,}906.17(2) = 38{,}842.08$ => Mr. Jones' salary of only $29,980 is well below this amount.

Thus, the Clinic's claim that Mr. Jones' low salary is accounted for by his below average education is not supported by the data.

3. After conducting your preliminary analysis, you interview supervisors in the clinic and find that years of experience are also highly valued by the clinic. Based on that observation, you request data on the experience of the nurses and receive data contained in Exhibit B. How is your analysis altered if you consider experience as a factor that determines compensation? Is Mr. Jones underpaid according to this analysis? Why not?

We now have Exhibit B:

ID #   Salary   Education   Female   Experience
1      49,380   4           1        11
2      33,400   3           0        4
3      40,940   1           1        10
4      43,440   4           1        8
5      24,960   0           0        3
6      47,580   5           1        9
7      33,400   3           0        4
8      37,520   4           0        5
9      45,080   2           1        11
10     36,820   0           1        9
11     43,100   2           1        10
12     64,820   2           1        21
13     33,980   4           0        3
14     43,440   4           1        8
15     29,260   2           0        3
16     26,940   0           0        4
17     53,300   4           1        12
18     61,250   5           1        17
19     60,750   5           1        17
20     29,980   2           0        3

Descriptive Statistics:
            Salary      Education   Gender   Experience
Mean        41,967      2.8         0.6      8.6
Std Dev     11,596.76   1.67        0.50     5.26
Max         64,820      5           1        21
Min         24,960      0           0        3

To determine the relation between salary and all three independent variables, run a regression on $salary = b_0 + b_1(education) + b_2(female) + b_3(experience)$. Doing so, we get the results labeled Example 3 ACME Clinic [Regression (iii)].

So, based upon the results of this regression, there is no statistically significant difference in salaries of females versus males.

Accounting for Mr. Jones' education level (only 2 years) and experience (only 3 years), his expected salary is
$19{,}829.50 + 2{,}054.64(2) + 706.58(0) + 1{,}855.88(3) = 29{,}506.42$

His actual salary of $29,980 is greater than this estimated expected salary (an estimate that takes into account his level of education and experience) => if anything, he's slightly overpaid.
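The expected-salary arithmetic above can be written as a small function. This is a sketch for illustration: the function and dictionary names are hypothetical, and the coefficients are the Regression (iii) estimates as read from the text (19,829.50; 2,054.64; 706.58; 1,855.88), an assumption about the transcription.

```python
# Assumed Regression (iii) estimates: salary on education, female, experience.
COEF = {
    "intercept": 19829.50,
    "education": 2054.64,
    "female": 706.58,
    "experience": 1855.88,
}


def expected_salary(education, female, experience):
    """Fitted value from the three-variable salary regression."""
    return (COEF["intercept"]
            + COEF["education"] * education
            + COEF["female"] * female
            + COEF["experience"] * experience)


# Mr. Jones: 2 years of education, male (female = 0), 3 years of experience.
jones_expected = expected_salary(education=2, female=0, experience=3)
jones_actual = 29980.0
gap = jones_actual - jones_expected

print(f"expected: {jones_expected:,.2f}  actual: {jones_actual:,.2f}  gap: {gap:,.2f}")
```

A positive gap means the actual salary exceeds the fitted value, which is the sense in which Mr. Jones is "slightly overpaid" once education and experience are accounted for.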

4. How do you reconcile the apparent contradiction between your answers above?

To answer Question (1) we ran a regression for the equation $salary = b_0 + b_1(female)$ and found the impact of female to be statistically significant.

To answer Question (3) we ran a regression for $salary = b_0 + b_1(education) + b_2(female) + b_3(experience)$ and found the impact of education and experience to be statistically significant, but the impact of female to not be statistically significant.

When running this latter regression, we are determining the impact of changes in each independent variable, controlling for differences in each of the other independent variables (recall, for multiple regression the interpretation of each coefficient is along the lines of "all other factors fixed").

The regression we ran to answer Question (1) suffers from an Omitted Variable Bias, due to the fact that for this population there is a strong, positive correlation between Female and Experience.

Recall, definition of Correlation Coefficient:
$\rho_{XY} = \frac{cov(X, Y)}{s_X s_Y}$

Value of the correlation coefficient between each pair of independent variables:

             Education     Female        Experience
Education    1
Female       0.275344396   1
Experience   0.307617041   0.792986037   1

The Correlation Coefficient between Experience and Female is (.79299), which is fairly close to the upper bound of (1).

Recall that Omitted Variable Bias arises when a true determinant of the dependent variable is left out and is strongly correlated with an included independent variable. For the regression we ran to answer Question (1), this was precisely the case. Recall, the specified equation for this regression was $salary = b_0 + b_1(female)$. We omitted Experience, which is highly correlated with Female => when doing so, the estimated coefficient for Female is actually providing a measure of both gender and the highly correlated experience.

Once we include both Female and Experience, the coefficient on Female only measures the impact of gender and not the impact of experience => from these results we see that experience has a statistically significant impact on salary, while gender does not.

Thus, the better results in this case are those from the regression which includes all three potential determinants of salary (i.e., results for the estimation of the equation $salary = b_0 + b_1(education) + b_2(female) + b_3(experience)$, as estimated within our answer to Question 3) => these results do NOT suffer from any Omitted Variable Bias.
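The pairwise correlations discussed above can be computed directly from the definition. The sketch below is illustrative: the function name is hypothetical, and the three columns are the Exhibit B values as read here, with digits that the extraction appears to have dropped restored, so they should be treated as an assumed transcription rather than the authoritative dataset.

```python
import math


def correlation(x, y):
    """Pearson correlation: cov(X, Y) / (s_X * s_Y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Assumed transcription of Exhibit B, in ID order 1 through 20.
education = [4, 3, 1, 4, 0, 5, 3, 4, 2, 0, 2, 2, 4, 4, 2, 0, 4, 5, 5, 2]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
experience = [11, 4, 10, 8, 3, 9, 4, 5, 11, 9, 10, 21, 3, 8, 3, 4, 12, 17, 17, 3]

r_female_experience = correlation(female, experience)
r_education_female = correlation(education, female)
r_education_experience = correlation(education, experience)

print(f"Female vs Experience:    {r_female_experience:.4f}")
print(f"Education vs Female:     {r_education_female:.4f}")
print(f"Education vs Experience: {r_education_experience:.4f}")
```

The Female and Experience pair stands out as the strongly correlated one, which is the condition under which leaving Experience out of the regression loads its effect onto the Female coefficient.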

Example 3: ACME Clinic [Regression (i)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.779213999
R Square             0.607174456
Adjusted R Square    0.585350814
Standard Error       7467.528844
Observations         20

ANOVA
             df    SS           MS            F             Significance F
Regression   1     1551458253   1551458253    27.82186741   5.14359E-05
Residual     18    1003751767   55763987.04
Total        19    2555210020

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept      31180          2640.17014       11.80984494   6.5351E-10    25633.20836   36726.79164
X Variable 1   17978.33333    3408.444997      5.274643818   5.14359E-05   10817.4561    25139.21055

(X Variable 1 is the Female dummy.)

Estimated equation:
$\hat{b}_0 + \hat{b}_1(female) = 31{,}180 + 17{,}978.33(female)$

=> if we run a regression with only one X variable that happens to be a dummy, then $\hat{b}_0$ is equal to the average value of the observations with (dummy)=(0) and $\hat{b}_1$ is equal to the difference between the average value of the observations with (dummy)=(1) and the average value of the observations with (dummy)=(0).

Each estimated coefficient is significant at a .01% error level.

$R^2 = .60717$
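The claim that a regression on a single dummy reproduces the two group means can be verified with the closed-form least-squares formulas. This is a sketch under stated assumptions: the function name is hypothetical, and the salary and gender columns are the Exhibit A values as read here (with dropped digits restored), so they are an assumed transcription.

```python
def simple_ols(x, y):
    """Least squares for y = b0 + b1*x: slope = cov(x, y) / var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Assumed transcription of Exhibit A, in ID order 1 through 20.
salary = [49380, 33400, 40940, 43440, 24960, 47580, 33400, 37520, 45080, 36820,
          43100, 64820, 33980, 43440, 29260, 26940, 53300, 61250, 60750, 29980]
female = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

b0, b1 = simple_ols(female, salary)

# Group means computed directly, for comparison with the coefficients.
male_pay = [s for s, f in zip(salary, female) if f == 0]
female_pay = [s for s, f in zip(salary, female) if f == 1]
mean_male = sum(male_pay) / len(male_pay)
mean_female = sum(female_pay) / len(female_pay)

print(f"b0 = {b0:,.2f}   mean(male)   = {mean_male:,.2f}")
print(f"b1 = {b1:,.2f}   mean(female) - mean(male) = {mean_female - mean_male:,.2f}")
```

The intercept matches the average for the dummy = 0 group and the slope matches the difference in group averages, so the single-dummy regression adds statistical inference (standard errors, p-values) but no new point estimate.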

Example 3: ACME Clinic [Regression (ii)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.56362876
R Square             0.317677379
Adjusted R Square    0.279770567
Standard Error       9841.74103
Observations         20

ANOVA
             df    SS            MS            F            Significance F
Regression   1     811732422.3   811732422.3   8.38048559   0.009650965
Residual     18    1743477598    96859866.54
Total        19    2555210020

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept      31029.73684    4372.30819       7.096877778   1.9141E-06    21843.858     40215.61549
X Variable 1   3906.165414    1349.3360        2.894906313   0.009650965   1071.341719   6740.989109

(X Variable 1 is Education.)

Estimated equation:
$\hat{b}_0 + \hat{b}_1(education) = 31{,}029.74 + 3{,}906.17(education)$

Each estimated coefficient is significant at a 1% error level.

$R^2 = .31768$

Example 3: ACME Clinic [Regression (iii)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.997744418
R Square             0.99549393
Adjusted R Square    0.994649034
Standard Error       848.3061073
Observations         20

ANOVA
             df    SS            MS            F             Significance F
Regression   3     2543696048    847898682.7   1178.253594   5.6633E-19
Residual     16    11513972.03   719623.2516
Total        19    2555210020

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept      19829.49549    443.6150013      44.69978569   3.1608E-18    18889.0737    20769.9178
X Variable 1   2054.6398      122.41135        16.78471645   1.40007E-11   1795.13933    2314.14069
X Variable 2   706.57569      636.476074       1.110136419   0.283346397   -642.693738   2055.84471
X Variable 3   1855.87999     61.499549        30.17713499   1.56335E-15   1725.506784   1986.253195

(X Variable 1 is Education, X Variable 2 is the Female dummy, and X Variable 3 is Experience.)

Estimated equation:
$\hat{b}_0 + \hat{b}_1(education) + \hat{b}_2(female) + \hat{b}_3(experience) = 19{,}829.50 + 2{,}054.64(education) + 706.58(female) + 1{,}855.88(experience)$

$R^2 = .99549$

However, the coefficient for the Female dummy variable is no longer statistically significant (p-value of .28335).

Multiple Choice Questions:

1. _____ refers to a problem of distorted regression results arising from specifying a model which leaves out one or more important independent variables.
A. Selection Bias
B. A Dummy Variable
C. Omitted Variable Bias
D. A Log-Transformation

2. A Dummy Variable
A. is typically defined in such a way that it can take on any value between (-1) and (1), but cannot take on values less than (-1) or greater than (1).
B. can only ever be included in a regression as the "Y variable" (and never as one of the "X variables").
C. indicates whether an observation is characterized by a particular attribute.
D. More than one (perhaps all) of the above answers is correct.

3. Henry ran a regression to estimate $q_x = b_0 + b_1 \ln(p_x) + b_2 \ln(p_y) + b_3 \ln(Inc)$, where $q_x$ denotes quantity of good x, $p_x$ denotes price of good x, $p_y$ denotes price of good y, and $Inc$ denotes per capita Income in the market for good x. His estimated coefficient values are b ˆ0 4. 38731, b ˆ1 1. 4189, b ˆ 0. 50407, and b ˆ3 0. 386. These results would suggest that
A. good x is an inferior good.
B. good x is a substitute for good y.
C. good x is a complement to good y.
D. More than one (perhaps all) of the above answers is correct.

4. Suppose you have the following observations on the value of variable X1: 9, 10, 21, 16, 13, 18, 7, 10, 9, 6, 14, and 11. For these observations, the Sample Minimum is
A. 6.
B. 1.
C. 1.
D. 144.

Problem Solving or Short Answer Questions:

1. John is planning on running a regression in order to determine the factors influencing salaries of public school teachers in the state of Georgia. He has obtained data on current salary, level of education, number of years of teaching experience, age, and gender for a random sample of ,457 teachers in the state. Every teacher in his sample has at least a Bachelor's degree, but some have a Master's Degree or Doctorate. He has created a dummy variable (named "AdvDeg") to indicate whether or not each individual has one of these advanced degrees. He has also created a dummy variable (named "Male") to indicate the gender of each individual. Before running his regression, he computed Descriptive Statistics for each variable, as reported below:

            Salary      AdvDeg   Experience   Age     Male
Mean        45,071.79   0.74     7.59         40.18   0.9
Std Dev     15,655.14   .56      13.8         15.18   0.46
Max         7,130       15       94           65      1
Min         -8,759      0        1            -7      0

Based upon these reported values, do you have any observations to offer about his dataset? Explain.

2. Amy ran a regression to estimate the parameters in the equation $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$. In part, her regression results are:

Regression Statistics
R Square             1.895479353
Adjusted R Square    0.713649764
Observations         357

               Coefficients   P-value
Intercept      39.79148       0.0345917
X Variable 1   56.1198        0.00056794
X Variable 2   1.7891         0.01789495
X Variable 3   537.41849      1.03478E-10
X Variable 4   89.0347        0.085734195

A. Based upon her reported p-values, which of her coefficient estimates are statistically significant at a 5% error level? Which of her coefficient estimates are statistically significant at a 1% error level?
B. Do you have any concerns with her reported regression results? If so, explain.

Answer Questions 3 through 5 using the data posted online at:
http://ksuweb.kennesaw.edu/~tmathew7/econ4550/05_regressionanalysisapplications_problemsetdata_summer015.xlsx

3. You have been hired by Jim Highland Homes (a custom home builder operating in northern Georgia, northeastern Alabama, northwestern South Carolina, and southwestern North Carolina) to conduct an analysis to determine the factors influencing the price of homes. Specifically, you are given the data contained in the worksheet titled "Data for Question 3". This dataset contains observations on Selling Price, Square Footage, and Lot Size (in acres), for a sample of 88 recently sold new homes in a market where Jim Highland Homes is considering starting a new development. Some of these properties were also on either a waterfront lot or a golf course lot, as indicated in the dataset.

3A. Determine the value of Sample Mean, Sample Standard Deviation, Sample Maximum, and Sample Minimum for each of the variables in this dataset.

3B. Run a regression on the equation $(price) = b_0 + b_1(SqFootage) + b_2(LotSize) + b_3(Waterfront) + b_4(GolfCourse)$ and state the estimated coefficient values for this regression.

3C. Based upon the estimated coefficient values, how much of a premium are people willing to pay for a Waterfront Lot? How much of a premium are people willing to pay for a Golf Course Lot?

3D. Which coefficient estimates are statistically significant at a 10% error level? Which coefficient estimates are statistically significant at a 1% error level?

4. Mo, Caleb, and Gene have been hired by the U.S. Federal Trade Commission to conduct a study on the impact of market power on the pricing patterns of firms. They have been provided with the data in the worksheet titled "Data for Question 4". This dataset contains observations on Price, Marginal Cost, the value of C4, and the value of HHI for 100 firms operating in 9 different industries within the U.S.

4A. Mo claims, "I know from my economics classes that firms with substantial market power charge higher prices than firms with less market power. Since C4 is a good measure of market power, we should run a regression on the equation $(price) = b_0 + b_1(C4)$. I am very confident that we will get good results, with $\hat{b}_1 > 0$." Run the regression suggested by Mo. Based upon the resulting value of $R^2$ and the resulting p-values, would Mo obtain the results that he expects? Explain.

4B. Caleb says, "It is true that firms with substantial market power charge higher prices than firms with less market power. But, C4 is not a good measure of market power; HHI is a superior measure. We should run a regression on the equation $(price) = b_0 + b_1(HHI)$. For this regression we are sure to get good results, with $\hat{b}_1 > 0$." Run the regression suggested by Caleb. Based upon the resulting value of $R^2$ and the resulting p-values, would Caleb obtain the results that he expects? Explain.

4C. Gene storms out of the room yelling, "I can't work with you idiots. IEPR! IEPR!!! Don't you remember anything from your economics classes!? With this data, if you are going to run a regression it should be on an equation along the lines of either $\frac{p - MC}{p} \cdot 100 = b_0 + b_1(C4)$ or $\frac{p - MC}{p} \cdot 100 = b_0 + b_1(HHI)$. IEPR!!! IEPR!!!!!" What is this "IEPR" that Gene is ranting about? Run the regressions suggested by Gene. Based upon the resulting values of $R^2$ and the resulting p-values, are the results of these regressions better than those suggested in parts (4A) and (4B)?

4D. Using the results of the first regression suggested by Gene, what would be the impact on firm pricing of a change in market structure that increases the value of C4 by (5)? Explain.

5. Professor Tufnel teaches an introductory marketing class at a small university near Des Moines, Indiana. He has been accused of gender discrimination (specifically, of giving female students lower grades than male students). Using the data in the worksheet titled "Data for Problem 5", you need to evaluate the validity of this accusation. This spreadsheet provides a summary of the Semester Average, Combined SAT Score, Age (a dummy variable indicating if the student is over the age of 25), Gender (a dummy variable indicating if the student is male), and Major of each of the 61 students enrolled in his class during the most recent semester.

5A. Determine the Mean of Semester Average for male students and for female students. How do these two values compare to each other?

5B. Run a regression on the equation $(SemAvg) = b_0 + b_1(SAT) + b_2(Over25) + b_3(Male)$. Based upon the results of this regression, is there evidence of gender discrimination? Is the difference in assigned grades between genders statistically significant at a 1% error level? Explain.

5C. After receiving a report of your results from the regression in part (5B), Professor Tufnel discussed your findings with Professors St. Hubbins and Smalls, two econometricians in his college. They think that there is a major error with the analysis above. They suggest that a regression should be run on the equation $(SemAvg) = b_0 + b_1(SAT) + b_2(Over25) + b_3(Male) + b_4(Bus)$, where (Bus) is a dummy variable indicating whether the student is majoring in one of the three business majors (Economics, Finance, or Marketing) offered by their college. (To assist in the construction of this dummy variable, the business majors have been color-coded light green in Column E of the spreadsheet.) After running the regression suggested by Professors St. Hubbins and Smalls, does there appear to be any evidence of gender discrimination? Explain.

5D. Determine the value of the correlation coefficient between each pair of the variables (SAT), (Over25), (Male), and (Bus). Based upon these values, explain the apparent discrepancy between the regression results from (5B) and the regression results from (5C).

Answers to Multiple Choice Questions:

1. C  2. C  3. D  4. A

Answers to Problem Solving or Short Answer Questions:

1. Based upon the reported Descriptive Statistics, there appear to be some errors in his dataset. First, the reported minimum values for Salary and Age are each negative. These values do not make sense, since each of these variables should always be positive in value. Second, AdvDeg is a dummy variable, which should only take on a value of either (0) or (1). Thus, the reported maximum value of (15) cannot be correct. Finally, the reported maximum value for Experience is (94). Since this variable is measuring number of years of teaching experience, this reported value is most certainly a mistake.

2A. Based upon the reported p-values, her estimates for $b_0$, $b_1$, and $b_3$ are statistically significant at a 5% error level (while the estimates for $b_2$ and $b_4$ are not). Further, her estimates for $b_1$ and $b_3$ are statistically significant at a 1% error level (while the estimates for $b_0$, $b_2$, and $b_4$ are not).

2B. Her reported value for $R^2$ is approximately 1.89548. The mathematical upper bound for $R^2$ is a value of (1). Thus, there would seem to be some sort of error with her reported results.

3A.
            Price        Sq Footage   Lot Size   Waterfront Lot   Golf Course Lot
mean        43,464.98    2,486.46     0.3        0.4              0.
std dev     58,98.63     590.46       0.10       0.43             0.4
max         40,950       4,010        0.65       1                1
min         143,800      1,480        0.17       0                0

3B. The estimated coefficients are: $\hat{b}_0 = $ 3,039.5619, $\hat{b}_1 = $ 9.7640, $\hat{b}_2 = $ 0,133.7176, $\hat{b}_3 = $ 8,17.1178, and $\hat{b}_4 = $ 6,03.570.

3C. These results imply that a home on a Waterfront Lot will sell for a premium of $8,17.1, while a home on a Golf Course Lot will sell for a premium of $6,03.6.

3D. Based upon the obtained p-values, the estimates for $b_1$, $b_2$, $b_3$, and $b_4$ are statistically significant at a 10% error level (while the estimate for $b_0$ is not). Further, only the estimates for $b_1$ and $b_3$ are statistically significant at a 1% error level.
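The bound cited in (2B) follows from how $R^2$ is computed: it is one minus the ratio of the residual sum of squares to the total sum of squares, and for a least-squares fit with an intercept that ratio lies between 0 and 1. A minimal Python sketch, using made-up numbers rather than any data from these problems (the helper names `fit_line` and `r_squared` are invented here):

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(y, fitted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ssr / sst

# Made-up illustrative data (an assumption, not from the problem set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
r2 = r_squared(y, fitted)
print(r2)
```

Because the residual sum of squares cannot be negative, a reported value above 1 signals a reporting or transcription error rather than an unusually good fit.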

4A. For this regression, $R^2 = $ .00354. The p-values of (.14969) and (.55740) imply that neither $\hat{b}_0$ nor $\hat{b}_1$ is statistically significant. So, no, the results of this regression are not good.

4B. For this regression, $R^2 = $ .0163. The p-value of (.055) implies that $\hat{b}_1$ is not statistically significant. So again, no, the results of this regression are not good.

4C. Gene is ranting about the Inverse Elasticity Pricing Rule. Recall, this rule states that in order to be maximizing profit, a firm must be operating where $\frac{p - MC}{p} = \frac{1}{|\varepsilon_p|}$. That is, where the markup of price over Marginal Cost (as a percentage of price) is equal to the inverse of the absolute value of Price Elasticity of Demand. Since firms with more market power would tend to face demand for their output that is less elastic (so that the inverse of the absolute value of elasticity is greater in value), we could reasonably expect there to be a positive relation between either C4 or HHI (recall, these are measures of market structure for which a larger value corresponds to a market that is less competitive, in which case firms have more market power) and any increasing function of $\frac{p - MC}{p}$. By considering $100\left(\frac{p - MC}{p}\right)$, Gene is simply suggesting that this percentage markup be stated in such a way as to make the values fall between (0) and (100).

For the regression on $100\left(\frac{p - MC}{p}\right) = b_0 + b_1(C4)$, we obtain $R^2 = $ .78061, along with p-values of (.00574) and (4.76E-34). These results are much better than those in part (4A). Finally, $\hat{b}_1 = $ .53902, suggesting a positive relation between (C4) and the percentage markup (as expected).

For the regression on $100\left(\frac{p - MC}{p}\right) = b_0 + b_1(HHI)$, we obtain $R^2 = $ .7878, along with p-values of (5.0E-10) and (9.E-35). These results are much better than those in part (4B). Finally, $\hat{b}_1 = $ .01463, suggesting a positive relation between (HHI) and the percentage markup (as expected).

4D. If there were a change in market structure causing C4 to increase in value by (5), then, using the value of $\hat{b}_1 = $ .53902 from the results of the first regression in part (4C), firms in the industry would increase their expected percentage markup by approximately $5 \times .53902 = 2.6951$.

5A. There are a total of 35 male students in the sample. These students have a semester average of $\frac{2,810}{35} = 80.286$. There are a total of 26 female students in the sample. These students have a semester average of $\frac{1,999}{26} = 76.885$. Thus, a simple comparison of sample means between genders shows that the mean semester average of male students is 3.401 higher than that of female students.

5B. Running a regression for $(SemAvg) = b_0 + b_1(SAT) + b_2(Over25) + b_3(Male)$, we obtain $\hat{b}_3 = $ 5.60159. Based upon the p-value for this estimated coefficient (of .0019), this estimate is statistically significant at the 1% error level. Thus, these results would seem to provide evidence of gender discrimination, since the expected semester average of a male student is 5.60159 points above that for a female student, even after controlling for SAT Score and Age.

5C. Running a regression for $(SemAvg) = b_0 + b_1(SAT) + b_2(Over25) + b_3(Male) + b_4(Bus)$, we obtain $\hat{b}_3 = $ 1.7588 (with a p-value of .8345). Based upon this p-value, gender no longer has a statistically significant impact on semester average. That is, once we control for SAT Score, Age, and Major, there no longer appears to be a difference in grades between male and female students.

5D. The numerical values of the six relevant correlation coefficients are:

            SAT        Over 25    Male       Business
SAT         1
Over 25     0.05558    1
Male        0.13573    0.033917   1
Business    0.0811     0.06059    0.47005    1

Note that there is a strong, positive correlation between being male and being a business major (implied by the value of 0.47005 above). The regression results from (5C) suggest that while semester averages in this marketing course do not differ between male and female students, there is a substantial, statistically significant difference in performance between business majors and non-business majors (the estimated value of the coefficient attached to (Bus) is $\hat{b}_4 = $ 8.13439, with a p-value of 3.8904E-06). When the dummy variable identifying college major is left out of the regression (as was done in part (5B)), the results suffer from an omitted variable bias, since the estimated coefficient for (Male) (of $\hat{b}_3 = $ 5.60159, with a p-value of .0019) is partly capturing this difference in performance resulting from chosen major.
In summary, once we control for SAT Score, Age, and Major, there is no longer any evidence of gender discrimination. Perhaps a better explanation is simply that students who choose to major in a business discipline are likely to be more interested in and perform better in a marketing class (compared to students who have chosen to major in a non-business discipline).
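The omitted variable bias described in (5D) can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not a re-analysis of the worksheet: the data are synthetic, the true effect of gender is set to zero, business majors are assumed to score 8 points higher, and male students are assumed to be more likely to choose a business major. The helper `ols` is a hypothetical plain-Python least-squares routine written for this example.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.
    Each row must already include a leading 1.0 for the intercept."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * p for v, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(0)
male, bus, sem_avg = [], [], []
for _ in range(5000):
    m = 1.0 if random.random() < 0.5 else 0.0
    # Assumption: male students are more likely to be business majors.
    b = 1.0 if random.random() < (0.7 if m else 0.2) else 0.0
    # Assumed true model: major adds 8 points, gender adds nothing.
    male.append(m)
    bus.append(b)
    sem_avg.append(75.0 + 8.0 * b + random.gauss(0.0, 5.0))

# Short regression omits (Bus); long regression includes it.
short_fit = ols([[1.0, m] for m in male], sem_avg)
long_fit = ols([[1.0, m, b] for m, b in zip(male, bus)], sem_avg)

print("coefficient on Male, Bus omitted: ", short_fit[1])
print("coefficient on Male, Bus included:", long_fit[1])
print("coefficient on Bus:               ", long_fit[2])
```

In this setup the short regression attributes part of the business-major effect to (Male), while the long regression returns a (Male) coefficient near zero, which is the same qualitative pattern seen when moving from (5B) to (5C).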