Chapter 7: Dummy variable regression



Similar documents
IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

17. SIMPLE LINEAR REGRESSION II

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Multinomial and Ordinal Logistic Regression

Multivariate Logistic Regression

Elements of statistics (MATH0487-1)

Solución del Examen Tipo: 1

Nonlinear Regression Functions. SW Ch 8 1/54/

11. Analysis of Case-control Studies Logistic Regression

Introduction to Regression and Data Analysis

Point Biserial Correlation Tests

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Introduction to Quantitative Methods

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

Module 5: Multiple Regression Analysis

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb

2. What is the general linear model to be used to model linear trend? (Write out the model) = or

Panel Data Analysis in Stata

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Interaction between quantitative predictors

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Forecast. Forecast is the linear function with estimated coefficients. Compute with predict command

Contrast Coding in Multiple Regression Analysis: Strengths, Weaknesses, and Utility of Popular Coding Structures

5. Linear Regression

Interaction effects and group comparisons Richard Williams, University of Notre Dame, Last revised February 20, 2015

Multiple Linear Regression

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Research Methods & Experimental Design

Simple Linear Regression Inference

Pearson's Correlation Tests

Using Heterogeneous Choice Models. To Compare Logit and Probit Coefficients Across Groups

Regression Analysis (Spring, 2000)

Directions for using SPSS

Chapter 7: Simple linear regression Learning Objectives

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

5. Multiple regression

II. DISTRIBUTIONS distribution normal distribution. standard scores

MULTIPLE REGRESSION WITH CATEGORICAL DATA

IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD

Financial Risk Management Exam Sample Questions/Answers

Univariate Regression

Correlational Research

Introduction to Fixed Effects Methods

Topic 1 - Introduction to Labour Economics. Professor H.J. Schuetze Economics 370. What is Labour Economics?

Part 2: Analysis of Relationship Between Two Variables

Moderator and Mediator Analysis

LOGIT AND PROBIT ANALYSIS

2. Linear regression with multiple regressors

Lecture 15. Endogeneity & Instrumental Variable Estimation

Statistics Review PSY379

August 2012 EXAMINATIONS Solution Part I

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Additional sources Compilation of sources:

Testing and Interpreting Interactions in Regression In a Nutshell

The correlation coefficient

Econometrics Simple Linear Regression

Multiple Regression: What Is It?

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

Multiple Choice Models II

A Primer on Forecasting Business Performance

HYPOTHESIS TESTING: POWER OF THE TEST

Regression Analysis: A Complete Example

Week TSX Index

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Minitab Tutorials for Design and Analysis of Experiments. Table of Contents

14.74 Lecture 9: Child Labor

SIMON FRASER UNIVERSITY

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Module 5: Statistical Analysis

Regression III: Advanced Methods

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Factors affecting online sales

Independent t- Test (Comparing Two Means)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

Name: Date: Use the following to answer questions 2-3:

Statistical Machine Learning

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Transcription:

Chapter 7: Dummy variable regression Why include a qualitative independent variable?........................................ 2 Simplest model 3 Simplest case............................................................. 4 Example (continued)........................................................ 5 Possible solution: separate regressions............................................ 6 Independent variable vs. regressor............................................... 7 Common slope model....................................................... 8 Testing................................................................. 9 More general models 10 More than one quantitative independent variable..................................... 11 Polytomous independent variables............................................... 12 Example (continued)........................................................ 13 Testing with polytomous independent variable....................................... 14 R commands............................................................. 15 More than one qualitative independent variable...................................... 16 Interaction 17 Definition................................................................ 18 Interaction vs. correlation..................................................... 19 Constructing regressors...................................................... 20 Testing................................................................. 21 Principle of marginality...................................................... 22 Polytomous independent variables............................................... 23 Hypothesis tests........................................................... 24 Standardized estimates....................................................... 25 Interaction between categorical variables........................................... 26 1

Why include a qualitative independent variable? We are interested in the effect of a qualitative independent variable (for example: do men earn more than women?) We want to better predict/describe the dependent variable. We can make the errors smaller by including variables like gender, race, etc. Qualitative variables may be confounding factors. Omitting them may cause biased estimates of other coefficients. 2 / 26 Simplest model 3 / 26 Simplest case Example: Dependent variable: income One quantitative independent variable: education One dichotomous (can take two values) independent variable: gender Assume effect of either independent variable is the same, regardless of the value of the other variable (additivity, parallel regression lines) - See pictures from book. Usual assumptions on statistical errors: independent, zero means, constant variance, normally distributed, fixed X s or X independent of statistical errors. 4 / 26 Example (continued) Suppose that we are interested in the effect of education on income, and that gender has an effect on income. See pictures from book. Scenario 1: Gender and education are uncorrelated Gender is not a confounding factor Omitting gender gives correct slope estimate, but larger errors Scenario 2: Gender and education are correlated Gender is a confounding factor Omitting gender gives biased slope estimate, and larger errors 5 / 26 2

Possible solution: separate regressions Fit separate regression for men and women Disadvantages: How to test for the effect of gender? If it is reasonable to assume that regressions for men and women are parallel, then it is more efficient to use all data to estimate the common slope. 6 / 26 Independent variable vs. regressor Y =income, X=education, D=regressor for gender: D i = { 1 for men 0 for women Independent variable = real variables of interest Regressor = variable put in the regression model In general, regressors are functions of the independent variables. Sometimes regressors are equal to the independent variables. 7 / 26 Common slope model Y i = α + βx i + γd i + ǫ i For women (D i = 0): For men (D i = 1): Y i = α + βx i + γ 0 + ǫ i = α + βx i + ǫ i Y i = α + βx i + γ 1 + ǫ i = (α + γ) + βx i + ǫ i See picture from book. What are the interpretations of α, β and γ? What happens if we code D = 1 for women and D = 0 for men? 8 / 26 3

Testing Test the partial effect of gender: H 0 : γ = 0, H a : γ 0 Same as before: Compute t-statistic or incremental F-test Test the partial effect of education: H 0 : β = 0, H a : β 0 Same as before: Compute t-statistic or incremental F-test Cystic fibrosis example. 9 / 26 More general models 10 / 26 More than one quantitative independent variable All methods go through, as long as we assume parallel regression surfaces. Model: Y i = α + β 1 X i1 + + β k X ik + γd i + ǫ i. Women (D i = 0): Y i = α + β 1 X i1 + + β k X ik + γ 0 + ǫ i = α + β 1 X i1 + + β k X ik + ǫ i Men (D i = 1): Y i = α + β 1 X i1 + + β k X ik + γ 1 + ǫ i = (α + γ) + β 1 X i1 + + β k X ik + ǫ i Interpretation of α, β 1,...,β k, γ. 11 / 26 4

Polytomous independent variables Qualitative variable with more than two categories Example: Duncan data: Dependent variable: Y =prestige Quantitative independent variables: X 1 =income and X 2 =education Qualitative independent variable: type (bc, prof, wc) D 1 and D 2 are regressors for type: Type D 1 D 2 Blue collar (bc) 0 0 Professsional (prof) 1 0 White collar (wc) 0 1 If there are p categories, use p 1 dummy regressors. What happens if we use p regressors? 12 / 26 Example (continued) Y = α + β 1 X 1 + β 2 X 2 + γ 1 D 1 + γ 2 D 2 + ǫ Blue collar (D i1 = 0 and D i2 = 0): Y i = α + β 1 X i1 + β 2 X i2 + γ 1 0 + γ 2 0 + ǫ i = α + β 1 X i1 + β 2 X i2 + ǫ i Professional (D i1 = 1 and D i2 = 0): Y i = α + β 1 X i1 + β 2 X i2 + γ 1 1 + γ 2 0 + ǫ i = (α + γ 1 ) + β 1 X i1 + β 2 X i2 + ǫ i White collar (D i1 = 0 and D i2 = 1): Y i = α + β 1 X i1 + β 2 X i2 + γ 1 0 + γ 2 1 + ǫ i = (α + γ 2 ) + β 1 X i1 + β 2 X i2 + ǫ i 13 / 26 5

Testing with polytomous independent variable Test partial effect of type, i.e., the effect of type controlling for income and education. H 0 : γ 1 = γ 2 = 0 H a : at least one γ j 0, j = 1,2. Incremental F-test: Null model: Y = α + β 1 X 1 + β 2 X 2 + ǫ Full model: Y = α + β 1 X 1 + β 2 X 2 + γ 1 D 1 + γ 2 D 2 + ǫ What do the individual p-values in summary(lm()) mean? First look at F-test, then at individual p-values 14 / 26 R commands Creating dummy variables by hand: D1 <- (type=="prof")*1 D2 <- (type=="wc")*1 m1 <- lm(prestige education+income+d1+d2) Letting R do things automatically: m1 <- lm(prestige education+income+type) m1 <- lm(prestige education+income+factor(type)) The use of factor(): factor() is not needed in this example, because the coding of the categories is in words: bc, prof, wc. It is essential to use factor() if the coding of the categories is numerical! To be safe, you can always use factor. Example R-code 15 / 26 6

More than one qualitative independent variable Example: Y =prestige, X 1 =income, X 2 =education, Type D 1 D 2 Blue collar 0 0 Professional 1 0 White collar 0 1 and Gender D3 Women 0 Men 1 Y = α + β 1 X 1 + β 2 X 2 + γ 1 D 1 + γ 2 D 2 + γ 3 D 3 + ǫ What is the equation for men with professional jobs? And for women with white collar jobs? 16 / 26 Interaction 17 / 26 Definition Two variables are said to interact in determining a dependent variable if the partial effect of one depends on the value of the other. So far we only studied models without interaction. Interaction between a quantitative and a qualitative variable means that the regression surfaces are not parallel. See picture. Interaction between two qualitative variables means that the effect of one of the variables depends on the value of the other variable. Example: the effect of type of job on prestige is bigger for men than for women. Interaction between two quantitative variables is a bit harder to interpret, and we may consider that later. 18 / 26 Interaction vs. correlation First, note that in general, the independent variables are not independent of each other. Correlation: Independent variables are statistically related to each other. Interaction: Effect of one independent variable on the dependent variable depends on the value of the other independent variable. Two independent variables can interact whether or not they are correlated. 19 / 26 7

Constructing regressors Y =income, X=education, D=dummy for gender Y i = α + βx i + γd i + δ(x i D i ) + ǫ i Note X D is a new regressor. It is a function of X and D, but not a linear function. Therefore we do not get perfect collinearity. Women (D i = 0): Men (D i = 1) Y i = α + βx i + γ 0 + δ(x i 0) + ǫ i = α + βx i + ǫ i Y i = α + βx i + γ 1 + δ(x i 1) + ǫ i = (α + γ) + (β + δ)x i + ǫ i Interpretation of α, β, γ, δ. 20 / 26 Testing Testing for interaction is testing for a difference in slope between men and women. H 0 : δ = 0 and H a : δ 0. What is the difference between: The model with interaction Fitting two separate regression lines for men and women 21 / 26 Principle of marginality If interaction is significant, do not test or interpret main effects: First test for interaction effect. If no interaction, test and interpret main effects. If interaction is included in the model, main effects should also be included. See pictures of models that violate the principle of marginality. 22 / 26 8

Polytomous independent variables Create interaction regressors by taking the products of all dummy variable regressors and the quantitative variable. Example: Y =prestige, X 1 =education, X 2 =income D 1,D 2 =dummies for type of job Y = α + β 1 X 1 + β 2 X 2 + γ 1 D 1 + γ 2 D 2 + δ 11 X 1 D 1 + δ 12 X 1 D 2 + δ 21 X 2 D 1 + δ 22 X 2 D 2 + ǫ Interpretation of parameters 23 / 26 Hypothesis tests When testing for main effects and interactions, follow principle of marginality Use incremental F-test Examples in R-code 24 / 26 Standardized estimates Do not standardize dummy-regressor coefficients. Dummy regressor coefficient has clear interpretation. By standardizing it, this interpretation gets lost. Therefore we don t standardize dummy regressor coefficients. Also, don t standardize interaction regressors. You can standardize the quantitative independent variable before taking its product with the dummy regressor. 25 / 26 9

Interaction between categorical variables Example: Does reproduction reduce lifespan of male fruitflies? Experiment: male flies with 1 pregnant (not receptive) female per day male flies with 8 pregnant females per day male flies with 1 virgin (receptive) female per day male flies with 8 virgin females per day male flies without females Each group contains 25 fruitflies Available information: Thorax length in mm Percentage of time sleeping Longevity in days See plots 26 / 26 10