QUALITATIVE EXPLANATORY VARIABLES

Similar documents
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 7: Dummy variable regression

Solución del Examen Tipo: 1

MULTIPLE REGRESSION WITH CATEGORICAL DATA

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

STATISTICAL ANALYSIS OF UBC FACULTY SALARIES: INVESTIGATION OF

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

Chapter 7: Simple linear regression Learning Objectives

SPSS Guide: Regression Analysis

LOGIT AND PROBIT ANALYSIS

Linear Models in STATA and ANOVA

17. SIMPLE LINEAR REGRESSION II

Binary Logistic Regression

Module 5: Multiple Regression Analysis

The correlation coefficient

11. Analysis of Case-control Studies Logistic Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Premaster Statistics Tutorial 4 Full solutions

Directions for using SPSS

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

Simple Predictive Analytics Curtis Seare

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Introduction to Regression and Data Analysis

Topic 1 - Introduction to Labour Economics. Professor H.J. Schuetze Economics 370. What is Labour Economics?

Financial Risk Management Exam Sample Questions/Answers

Module 3: Correlation and Covariance

Example: Boats and Manatees

Econometrics Simple Linear Regression

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Statistical tests for SPSS

A Primer on Forecasting Business Performance

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

Logistic Regression. BUS 735: Business Decision Making and Research

Nonlinear Regression Functions. SW Ch 8 1/54/

Association Between Variables

2. Linear regression with multiple regressors

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Violent crime total. Problem Set 1

Name: Date: Use the following to answer questions 2-3:

Review Jeopardy. Blue vs. Orange. Review Jeopardy

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

y = a + bx Chapter 10: Horngren 13e The Dependent Variable: The cost that is being predicted The Independent Variable: The cost driver

Part 2: Analysis of Relationship Between Two Variables

Interaction between quantitative predictors

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Multiple Linear Regression

Introduction to Quantitative Methods

Regression III: Advanced Methods

Correlation and Simple Linear Regression

2. Simple Linear Regression

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

II. DISTRIBUTIONS distribution normal distribution. standard scores

NOTES ON HLM TERMINOLOGY

Standard errors of marginal effects in the heteroskedastic probit model

Testing for Granger causality between stock prices and economic growth

MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM. R, analysis of variance, Student test, multivariate analysis

August 2012 EXAMINATIONS Solution Part I

Rockefeller College University at Albany

Regression Analysis (Spring, 2000)

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

Session 9 Case 3: Utilizing Available Software Statistical Analysis

Exercise 1.12 (Pg )

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

GLM I An Introduction to Generalized Linear Models

Session 7 Bivariate Data and Analysis

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

Simple Regression Theory II 2010 Samuel L. Baker

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared.

Forecast. Forecast is the linear function with estimated coefficients. Compute with predict command

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Statistical Models in R

Final Exam Practice Problem Answers

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

13. Poisson Regression Analysis

Introduction to Multilevel Modeling Using HLM 6. By ATS Statistical Consulting Group

Panel Data Analysis in Stata

Causal Forecasting Models

containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

Power and sample size in multilevel modeling

Revisiting Inter-Industry Wage Differentials and the Gender Wage Gap: An Identification Problem

Additional sources Compilation of sources:

Categorical Data Analysis

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Factors affecting online sales

Transcription:

1 QUALITATIVE EXPLANATORY VARIABLES QUANTITATIVE VS QUALITATIVE VARIABLES A quantitative variable is a variable that can be numerically measured on some well defined scale (e.g., income, price, output, age, height, weight, and family size). A qualitative variable is a variable that indicates the presence or absence of a quality or a characteristic. It has as many categories as possible characteristics. Examples of qualitative variables are as follows. 1) Gender: male or female. 2) Homeownership: own a house or don t own a house 3) Smoker: smoke or don t smoke. 4) Education: high school, college, graduate school. To quantify a qualitative variable, we construct one or more artificial variables called dummy variables. A dummy variable can take two values: 0 or 1. The variable takes the value 1 if the characteristic is present and a value of 0 if the characteristic is absent. ANALYSIS OF VARIANCE MODELS A model for which the dependent variable is a quantitative variable, and all explanatory variables are qualitative variables is called an analysis of variance model. An analysis of variance model is a particular type of classical linear regression model. Analysis of Variance Model with One Qualitative Variable with Two Categories Suppose that we have information on the monthly wage of 49 workers. Suppose that we postulate that an individual s monthly wage depends upon his or her gender. To quantify the qualitative variable gender, we create a dummy variable, designated G t. It is defined as follows. G t = 1 if male G t = 0 if female We can specify this statistical model of wage determination as follows Y t = a + bg t + ε t The error term ε t satisfies all of the assumptions of the classical linear regression model. Y t is the monthly wage of the tth worker, a quantitative variable. G t measures the qualitative variable gender. The equation tells us that the monthly wage for the tth male (G t =1) is given by Y t = a + b + ε t The monthly wage for the tth female (G t = 0) is given by Y t = a + ε t

2 This means that the intercept of the population regression line measures the average monthly wage of female workers (the control group). The slope of the population regression line measures the difference between the average salary of male workers and the average salary of female workers. Thus, if b > 0, then the average salary of male workers is greater than the average salary of female workers. If b< 0, then the average salary of male workers is less than the average salary of female workers. Estimation of the Statistical Model To obtain estimates of the population parameters a and b, we can regress Y t on a constant term and the dummy variable G t using the OLS estimator. It can be shown that the estimate of a is the sample mean salary for females, and the estimate of a + b is the sample mean salary for males. In this case, if we were to divide the 49 workers into 2 groups, male and female, and calculate the sample mean wage for each group, we would get the same result as that obtained from the regression model. Test of the Hypothesis that the Two Population Means Are Equal Suppose that we want to test the following hypothesis: The population mean salary of females is equal to the population mean salary of males. The null and alternative hypotheses can be expressed as the following restriction on the parameters of the statistical model. H o: b 1 = 0 H 1: b 1 0 To test this null hypothesis, we can use a t-test. Note that the t-test of the null hypothesis b = 0 in the regression model is exactly the same as the t-test of the hypothesis that two population means are equal. Analysis of Variance Model with One Qualitative Variable with More Than Two Categories Suppose that we postulate that an individual s monthly wage depends upon his/her job type. Suppose that there are 4 different job types: professional, clerical, crafts, maintenance. This is an example of a qualitative variable that has 4 categories. To quantify the qualitative variable job type, we can define 4 dummy variables, one for each of the four categories. Define the following. J 1 = 1 if professional J 2 = 1 if clerical J 3 = 1 if crafts J 4 = 1 if maintenance J 1 = 0 otherwise J 2 = 0 otherwise J 3 = 0 otherwise J 4 = 0 otherwise Note that if for a particular worker J 1 =1 then J 2 = 0, J 3 = 0, and J 4 = 0 for this worker. If J 2 =1 then J 1 = 0, J 3 = 0, and J 4 = 0 for this worker, etc. The statistical model of wage determination can be specified as follows Y t = a + b 1J t1 + b 2J t2 + b 3J t3 + ε t.

3 Note that we excluded one of the dummy variables. We excluded the dummy variable for maintenance, J 4. By doing this, we have selected maintenance as the control group. It is represented by the constant term a. In general, when we represent a qualitative variable with k categories with k dummy variables, we can only include k 1 of those dummy variables in the regression model. This is because if we included all k dummy variables in the model, we would have perfect multicollinearity and we could not estimate the model. The category for which we do not include a dummy variable is represented by the constant and is interpreted as the control group or reference group. Thus, it was not necessary to create a dummy variable for the category maintenance. The monthly wage for the tth maintenance worker is given by Y t = a + ε t The monthly wage for the tth professional worker is given by Y t = a + b 1 + ε t The monthly wage for the tth clerical worker is given by Y t = a + b 2 + ε t The monthly wage for the tth crafts worker is given by Y t = a + b 3 + ε t It follows that b 1 is the difference between the average salary of a professional and a maintenance workers; b 2 is the difference between the average salary of a clerical worker and a maintenance worker; b 3 is the difference between the average salary of a crafts worker and a maintenance worker. Estimation of the Statistical Model To obtain estimates of the population parameters, a, b 1, b 2 and b 3, we can regress Y t on a constant term and the dummy variables J t1, J t2, J t3 using the OLS estimator. It can be shown that a is the sample mean salary for maintenance workers, a + b 1 is the sample mean salary for professionals, a + b 2 is the sample mean salary for clerical workers, and a + b 3 is the sample mean salary for crafts workers. In this case, if we were to divide the 49 workers into 4 groups, maintenance, professional, clerical, and crafts, and calculate the sample mean wage for each group, we would get the same result as that obtained from the regression model. Tests of Hypotheses There are a number of alternative hypotheses that we can test.

4 Analysis of Variance Model with Two Qualitative Variables Suppose that we postulate that an individual s monthly wage depends on his/her job type and his/her gender. Job-type is represented by the 4 dummy variables defined above: J 1, J 2, J 3, J 4. Gender is represented by the dummy variable G defined above. This is an example of a model with two qualitative variables. The qualitative variable job type has 4 categories. The qualitative variable gender has two categories. The statistical model of wage determination can be specified as follows Y t = a + b 1J t1 + b 2J t2 + b 3J t3 + b 4G t + ε t Note that, once again, we have excluded one of the dummy variables for the qualitative variable job type. We excluded the dummy variable for maintenance. By doing this we have selected maintenance for the control group. For the gender dummy variable, G = 0 indicates female, and therefore we have selected female for the control group. Thus, in this model, with two qualitative variables, the control group is female maintenance workers. The group female maintenance workers is represented by the constant. The monthly wage for the tth worker is given by tth female maintenance worker: Y t = a + ε t tth male maintenance worker. Y t = a + b 4 + ε t tth female professional: tth male professional: tth female clerical worker: tth male clerical worker: tth female crafts worker: tth male crafts worker: Y t = a + b 1 + ε t Y t = a + b 1 + b 4 + ε t Y t = a + b 2 + ε t Y t = a + b 2 + b 4 + ε t Y t = a + b 3 + ε t Y t = a + b 3 + b 4 + ε t It is important to understand that this model imposes the restriction that the difference between the average salary of a male and a female is the same regardless of the job type; that is, the male/female wage differential is the same for maintenance workers, professionals, clerical workers, and crafts workers. This difference is given by the parameter attached to the dummy variable for gender (G), which is b 4. Estimation of the Model To obtain estimates of the population parameters a, b 1, b 2, b 3, and b 4, we can regress Y t on a constant term and the dummy variables J 1, J 2, J 3, J 4 and G using the OLS estimator

5 Tests of Hypotheses Once again, we can test hypotheses about differences between or among population mean salaries for the various groups using the appropriate t-test or F-test. Regression Model Vs Analysis of Variance This regression model with two qualitative variables (job type and gender) represented by 4 dummy variables as regressors describes the same phenomenon and leads to the same test results as a two-way analysis of variance model. Analysis of Variance Model with Two Qualitative Variables and Interaction Terms The regression model with two or more qualitative variables can be generalized further by including interaction terms. Once again, suppose that we postulate that an individual s monthly wage depends upon his/her job type and his/her gender. The previous model imposes the restriction that the difference between the monthly wage of a male and a female is the same regardless of their job type. This male/female wage difference is given by the parameter attached to the dummy variable for gender (G), which is b 4. Suppose that we don t want to impose this restriction. Suppose that we want to allow the difference between the monthly wage of a male and a female to differ by job type. To do this, we need to include interaction terms between gender and job type in the model. The statistical model of wage determination can be specified as follows Y t = a + b 1J t1 + b 2J t2 + b 3J t3 + b 4G t + b 5J t1g t + b 6J t2g t + b 7J t3g t + ε t To obtain an interaction term between gender and a particular job type category, we simply multiply the dummy variable for the gender by the dummy variable for that job type category. Notice that once again the control group is female maintenance workers. The monthly wage for the tth worker is given by tth female maintenance worker: Y t = a tth male maintenance worker. Y t = a + b 4 tth female professional: Y t = a + b 1 tth male professional: Y t = a + b 1 + b 4 + b 5 tth female clerical worker: Y t = a + b 2 tth male clerical worker: Y t = a + b 2 + b 4 + b 6 tth female crafts worker: Y t = a + b 3 tth male crafts worker: Y t = a + b 3 + b 4 + b 7

6 Notice that the difference between the average salary for a male and female can be different for maintenance workers, professionals, clerical workers, and crafts workers. The male/female wage differential will differ for these different job types if the coefficients of the interaction terms, b 5, b 6, and b 7, are non-zero and have different magnitudes. Estimation of the Model To obtain estimates of the population parameters a, b 1, b 2, b 3, b 4, b 5, b 6, and b 7, we can regress Y t on a constant term and the dummy variables and interaction variables (which are also dummy variables) J 1, J 2, J 3, J 4, G, J 1G, J 2G, and J 3G using the OLS estimator. It can be shown that the above means are identical to the sample means for 8 separate groups of workers: female maintenance workers, male maintenance workers, female professionals, male professionals, female clerical workers, male clerical workers, female crafts workers, male craft workers. Thus, if we were to divide the 49 workers into 8 groups, female maintenance workers, male maintenance workers, female professionals, male professionals, female clerical workers, male clerical workers, female crafts workers, male craft workers, and calculate the sample mean for each of the groups we would get the same results. Tests of Hypotheses Once again, we can test hypotheses about differences between or among population mean salaries for the various groups using the appropriate t-test or F-test. Regression Model Vs Analysis of Variance This regression model with two qualitative variables (job type and gender) and interaction terms between the two qualitative variables, represented by 7 dummy variables as regressors, describes the same phenomenon and leads to the same test results as a two-way analysis of variance model with interactions. ANALYSIS OF COVARIANCE MODELS In an analysis of variance model, all of the explanatory variables are qualitative variables measured by dummy variables. These types of models are not used very often in economics. This is because the dependent variable usually depends on one or more quantitative variables as well as one or more qualitative variables. A statistical model that includes both qualitative and quantitative explanatory variables is often times called an analysis of covariance model in the statistics literature. Varying Intercept Parameter Models A varying intercept model allows the intercept to differ for two or more categories, but requires the slope to be the same for all categories. Example Suppose that we postulate that an individual s monthly wage depends upon his/her gender and his/her years of work experience. That is,

7 Wage = (gender, experience). Gender is a qualitative variable that has two categories: male and female. Experience is a quantitative variable that is measured in years. We can specify this statistical model of wage determination as follows Y t = a + bg t + b 2E t + ε t Where G is the dummy variable for gender (G = 1 if male; G = 0 if female), and E is years of work experience. The error term satisfies all of the assumptions of the classical linear regression model. By including the quantitative variable experience, we can now address the question: Is there a wage differential between male and female workers who have the same number of years of work experience? The monthly wage for the tth male worker with E t years of experience, and the tth female worker with E t years of experience are given by tth male worker with E t years of experience: tth female worker with E t years of experience: Y t = a + b 1 + b 2E t + ε t Y t = a + b 2E t + ε t Thus, we are postulating that the intercept for the wage function differs for males and females. The intercept for males is given by (a + b 1) and the intercept for females is given by a. Thus the difference between the intercepts is given by b. If b 1 > 0, then for any given number of years of experience males make higher wages than females. If b 1 < 0, then for any given number of years of experience males make lower wages than females. Estimation of the Statistical Model To obtain estimates of the population parameters, we can regress the monthly wage on the dummy variable for gender and the number of years of experience, using OLS. Hypothesis Tests To test the hypothesis of no gender discrimination, we can test the following restriction on the parameters of the statistical model. H o: b1 = 0 H 1: b 0 Extension of Model We can include more than one quantitative variable in the model if we so desire. For example, we could include both years of experience and years of schooling. The conditional means would then give us information on the average wages of males and females for any given number of years of experience and years of schooling. When testing for wage discrimination between males and females, it is important to control for all important factors that influence wages other than

8 gender. If we don t, then the coefficient attached to gender will be biased and the test of wage discrimination will not be valid. Varying Slope Parameter Models A varying slope parameter model allows the slope to differ for two or more categories, but requires the intercept to be the same for all categories. Example Suppose that we postulate that an individual s monthly wage depends his/her years of work experience. That is, Wage = (experience). However, we also postulate that the amount by which the wage changes for each additional year of work experience differs for males and females. We can specify this statistical model of wage determination as follows Y t = a + b 1E t + b 2E tg t + ε t Notice that we have included an interaction term between experience and gender. The monthly wage for the tth male worker with E t years of experience, and the tth female worker with E t years of experience are given by tth male worker with E t years of experience: tth female worker with E t years of experience: Y t = a + (b 1 + b 2)E t + ε t Y t = a + b 1E t + ε t Thus, we are postulating that the slope for the wage function differs for males and females. The slope for males is given by (b 1 + b 2) and the slope for females is given by b 1. Thus the difference between the slopes is given by b 2, which is the coefficient of the interaction term between experience and gender. If b 2 > 0, then an additional year of experience for males results in a bigger wage increase than for males. If b 2 < 0, then an additional year of experience for males results in a smaller wage increase for males than for females. Estimation of the Statistical Model To obtain estimates of the population parameters, we can regress the monthly wage on the number of years of experience and the interaction term between experience and gender, using OLS. Hypothesis Tests To test the hypothesis of no gender discrimination in wage increases for additional years of experience, we can test the following restriction on the parameters of the statistical model:

9 H o: b 2 = 0 against H 1: b 2 0 Extension of Model We can include more than one quantitative variable in the model if we so desire. For example, we could include both years of experience and years of schooling. We could allow the wage increase for an additional year of experience and the wage increase for an additional year of schooling to differ for males and females. Varying Intercept and Slope Parameter Models A varying intercept and slope parameter model allows both the intercept and slope to differ for two or more categories. Example Suppose that we postulate that an individual s monthly wage depends upon his/her gender and his/her years of work experience. That is, Wage = (gender, experience). Suppose that we postulate that the wages of males and females will be different for any given number of years of experience, and wage increases of males and females will be different for an additional year of work experience. We can specify this statistical model of wage determination as follows Y t = a + b 1G t + b 2E t + b 3E tg t + ε t The monthly wage for the tth male worker with E t years of experience, and the tth female worker with E t years of experience are given by tth male worker with E t years of experience: tth female worker with E t years of experience: Y t = (a + b 1) + (b 2 + b 3)E t + ε t Y t = a + b 2E t + ε t Thus, we are postulating that both the intercept and the slope for the wage function differs for males and females. The intercept for males is given by (a + b 1) and the intercept for females is given by a. The slope for males is given by (b 2 + b 3) and the slope for females is given by b 2. Thus the difference between the intercepts is given by b. If b 1 > 0, then for any given number of years of experience males make higher wages than females. If b 1 < 0, then for any given number of years of experience male make lower wages than females. The difference between the slopes is given by b 3, which is the coefficient of the interaction term between experience and gender. If b 3> 0, then an additional year of experience for males results in a bigger wage increase than for males. If b 3 < 0, then an additional year of experience for males results in a smaller wage increase for males than for females

10 Estimation of the Statistical Model To obtain estimates of the population parameters, we can regress the monthly wage on the dummy variable for gender and the number of years of experience, and the interaction term between experience and gender, using OLS. Hypothesis Tests To test the hypothesis of no gender discrimination, we can test the following restrictions on the parameters of the statistical model. H o: b 1 = b 3 = 0 H 1: At least one is non-zero Alternative Specification of the Model It can be shown that the coefficient estimates of the varying intercept and slope parameter model given by Y t = a + b 1G t + b 2E t + b 3E tg t + ε t are exactly the same as the coefficient estimates that we would obtain if we estimated two separate wage equations: one wage equation for males, and one wage equation for females. Thus, we could specify this model equivalently as follows. Wage equation for males: Y t = o + 1E t + ε t, where o = (a + b 1), and 1 = (b 2 + b 3). Wage equation for female: Y t = a + b 2E t + t. Thus, we have a sample of 49 workers: 25 male workers, and 24 female workers. We can regress the wage of males on years of experience for males for the subsample of 25 male workers. We can regress the wage of females on years of experience for females for the subsample of 24 workers. Difference between the Two Models The only difference between the two models involves the estimation of the error variance 2. If the error variance for males is the same as the error variance for females, then using the dummy variable model will give us a more efficient estimate of 2. This estimate is ESS/ (n k), where ESS is the residual sum of squares and n k is the number of degrees of freedom for the dummy variable model.