Direct and indirect effects in a logit model



Similar documents
ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

Module 14: Missing Data Stata Practical

Competing-risks regression

From this it is not clear what sort of variable that insure is so list the first 10 observations.

Estimating the random coefficients logit model of demand using aggregate data

Discussion Section 4 ECON 139/ Summer Term II

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised February 21, 2015

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Multinomial and Ordinal Logistic Regression

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

Implementation Committee for Gender Based Salary Adjustments (as identified in the Pay Equity Report, 2005)

FOREIGN AFFAIRS PROGRAM EVALUATION GLOSSARY CORE TERMS

August 2012 EXAMINATIONS Solution Part I

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week (0.052)

Interaction effects between continuous variables (Optional)

Calculating Effect-Sizes

Regression III: Advanced Methods

Missing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center

Ordinal Regression. Chapter

SAS Software to Fit the Generalized Linear Model

Corporate Defaults and Large Macroeconomic Shocks

Nested Logit. Brad Jones 1. April 30, University of California, Davis. 1 Department of Political Science. POL 213: Research Methods

Probit Analysis By: Kim Vincent

Nonlinear Regression Functions. SW Ch 8 1/54/

III. INTRODUCTION TO LOGISTIC REGRESSION. a) Example: APACHE II Score and Mortality in Sepsis

LOGIT AND PROBIT ANALYSIS

MULTIPLE REGRESSION EXAMPLE

SUMAN DUVVURU STAT 567 PROJECT REPORT

Sample Size Calculation for Longitudinal Studies

From the help desk: hurdle models

How to set the main menu of STATA to default factory settings standards

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham City DH1 3LE UK

Introduction. Survival Analysis. Censoring. Plan of Talk

Statistical modelling with missing data using multiple imputation. Session 4: Sensitivity Analysis after Multiple Imputation

10 Dichotomous or binary responses

Correlation and Regression

Simple Random Sampling

Linear Regression Models with Logarithmic Transformations

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

The CRM for ordinal and multivariate outcomes. Elizabeth Garrett-Mayer, PhD Emily Van Meter

Simple Linear Regression

Private Forms of Unemployment Protection and Social Stratification. Alison Koslowski University of Edinburgh, UK

GLOSSARY OF EVALUATION TERMS

Software for Analysis of YRBS Data

and Gologit2: A Program for Ordinal Variables Last revised May 12, 2005 Page 1 ologit y x1 x2 x3 gologit2 y x1 x2 x3, pl lrforce

Basic Statistical and Modeling Procedures Using SAS

The Regression Calibration Method for Fitting Generalized Linear Models with Additive Measurement Error

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Development of the nomolog program and its evolution

Regression Modeling Strategies

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

Chapter 18. Effect modification and interactions Modeling effect modification

especially with continuous

October 31, The effect of price on youth alcohol. consumption in Canada. Stephenson Strobel & Evelyn Forget. Introduction. Data and Methodology

Lecture 1: Review and Exploratory Data Analysis (EDA)

Logit and Probit. Brad Jones 1. April 21, University of California, Davis. Bradford S. Jones, UC-Davis, Dept. of Political Science

Introduction to structural equation modeling using the sem command

Education and Wage Differential by Race: Convergence or Divergence? *

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén Table Of Contents

Logistic regression modeling the probability of success

Classification Problems

Using Stata for Categorical Data Analysis

Standard errors of marginal effects in the heteroskedastic probit model

Statistics, Data Analysis & Econometrics

Lecture 15. Endogeneity & Instrumental Variable Estimation

SUPPORTING LOGISTICS DECISIONS BY USING COST AND PERFORMANCE MANAGEMENT TOOLS. Zoltán BOKOR. Abstract. 1. Introduction

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

The average hotel manager recognizes the criticality of forecasting. However, most

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

A General Approach to Variance Estimation under Imputation for Missing Survey Data

WOMEN-OWNED BUSINESSES AND ACCESS TO BANK CREDIT: A TIME SERIES PERSPECTIVE

VI. Introduction to Logistic Regression

Complex Survey Design Using Stata

The Wage Return to Education: What Hides Behind the Least Squares Bias?

Does Financial Sophistication Matter in Retirement Preparedness of U.S Households? Evidence from the 2010 Survey of Consumer Finances

MEASURING THE INVENTORY TURNOVER IN DISTRIBUTIVE TRADE

Online Appendix to Are Risk Preferences Stable Across Contexts? Evidence from Insurance Data

Youth Risk Behavior Survey (YRBS) Software for Analysis of YRBS Data

Transcription:

Department of Social Research Methodology Vrije Universiteit Amsterdam http://home.fsw.vu.nl/m.buis/

Outline The aim

The Total Effect X Y

The Total Effect parental class attend college

The Indirect Effect Z a b X Y

The Indirect Effect during high school a b parental class attend college

The Direct Effect Z a b X c Y

The Direct Effect during high school a b parental class c attend college

The aim Z The aim is to find the size of the indirect effect relative to the total effect. a b X c Y

Outline The aim

Estimation When using regress: 1. college = class + 2. college = class

Estimation When using regress: 1. college = class + 2. college = class The direct effect is the effect of class in model 1.

Estimation When using regress: 1. college = class + 2. college = class The direct effect is the effect of class in model 1. The total effect is the effect of class in model 2.

Estimation When using regress: 1. college = class + 2. college = class The direct effect is the effect of class in model 1. The total effect is the effect of class in model 2. The indirect effect is the total effect - direct effect.

Estimation When using regress: 1. college = class + 2. college = class The direct effect is the effect of class in model 1. The total effect is the effect of class in model 2. The indirect effect is the total effect - direct effect. This won t work when using logit

Why the naive method doesn t work Easiest explained when there is no indirect effect.

Why the naive method doesn t work Easiest explained when there is no indirect effect. The total effect = the direct effect + the indirect effect.

Why the naive method doesn t work Easiest explained when there is no indirect effect. The total effect = the direct effect + the indirect effect. So, the total effect should be the same as the direct effect when there is no indirect effect.

Why the naive method doesn t work Easiest explained when there is no indirect effect. The total effect = the direct effect + the indirect effect. So, the total effect should be the same as the direct effect when there is no indirect effect. So, the effect of class in a model that controls for (the direct effect ) should be the same as the effect of class in a model that does not control for (the total effect ).

Effect while controlling for 3 1.5 log odds 0 1.5 0 proportion high medium low 1.5 transformation controlled 3 proportion high status log odds log odds low status proportion effect controlled

Averaging the proportions 3 log odds 1.5 0 1.5 0 proportion high medium low not controlled 1.5 transformation controlled 3 proportion high status log odds log odds low status proportion effect controlled

Effect while not controlling for 3 log odds 1.5 0 1.5 3 proportion high status log odds log odds low status proportion 1.5 0 proportion high medium low not controlled transformation controlled not constrolled effect controlled not constrolled

Outline The aim

Indirect effect present log odds 3 1.5 0 1.5 3 prop. high status log odds log odds low status prop. 1.5 0 proportion high medium low not controlled transformation controlled not constrolled effect controlled not constrolled

Indirect effect 3 log odds 1.5 0 1.5 3 prop. high status log odds indirect effect log odds low status prop. 1.5 0 proportion factual high medium low not controlled counterfactual high low not controlled

Direct effect 3 log odds 1.5 0 1.5 3 prop. high status log odds direct effect log odds low status prop. 1.5 0 proportion factual high medium low not controlled counterfactual high low not controlled

Direct and indirect effects in logit 3 log odds 1.5 0 1.5 3 proportion high status log odds indirect effect direct effect total effect log odds low status proportion 1.5 0 proportion factual high medium low not controlled counterfactual high low not controlled

The logic can be reversed 3 log odds 1.5 0 total effect direct effect 1.5 0 proportion factual high medium low not controlled 1.5 3 prop. high status log odds indirect effect log odds low status prop. counterfactual high low not constrolled

Extension Erikson et al. (2005) propose to compute the average proportions given the observed and counterfactual distribution of by assuming that is normally distributed, and then integrate over this normal distribution.

Extension Erikson et al. (2005) propose to compute the average proportions given the observed and counterfactual distribution of by assuming that is normally distributed, and then integrate over this normal distribution. Alternatively, these averages can be computed by predicting the observed and counterfactual proportions, add them up and divide by the number of respondents in that group.

Extension Erikson et al. (2005) propose to compute the average proportions given the observed and counterfactual distribution of by assuming that is normally distributed, and then integrate over this normal distribution. Alternatively, these averages can be computed by predicting the observed and counterfactual proportions, add them up and divide by the number of respondents in that group. The latter method has the advantage of making less assumptions about the distribution of, as it integrates over the empirical distribution of instead of over a normal distribution.

Outline The aim

Descriptives. table ocf57 if!missing(hsrankq, college), /// > contents(mean college mean hsrankq freq) /// > format(%9.3g) stubwidth(15) occupation of r father in 1957 mean(college) mean(hsrankq) Freq. lower.284 48.2 5,218 middle.38 50.6 868 higher.619 56.2 2,837

The ldecomp package ldecomp depvar [ if ] [ in ] [ weight ], direct(varname) indirect(varlist) [ obspr predpr predodds or rindirect normal range(##) nip(#) interactions nolegend nodecomp nobootstrap bootstrap_options ]

Decomposition of log odds ratios. ldecomp college, direct(ocf57) indirect(hsrankq) rind nolegend (running _ldecomp on estimation sample) Bootstrap replications (50) 1 2 3 4 5... 50 Bootstrap results Number of obs = 8923 Replications = 50 Observed Bootstrap Normal-based Coef. Std. Err. z P> z [95% Conf. Interval] 2/1 total.4367997.0689983 6.33 0.000.3015655.5720339 indirect1.0593679.0274903 2.16 0.031.0054878.1132479 direct1.3774319.0729045 5.18 0.000.2345416.5203221 indirect2.0586611.0277894 2.11 0.035.0041949.1131272 direct2.3781386.0731829 5.17 0.000.2347028.5215744 3/1 total 1.410718.0486595 28.99 0.000 1.315347 1.506088 indirect1.2058881.0176897 11.64 0.000.1712169.2405594 direct1 1.204829.0471822 25.54 0.000 1.112354 1.297305 indirect2.2012494.0186496 10.79 0.000.1646968.237802 direct2 1.209468.0472203 25.61 0.000 1.116918 1.302018 3/2 total.9739179.0775048 12.57 0.000.8220112 1.125825 indirect1.1461109.0303069 4.82 0.000.0867104.2055115 direct1.8278069.0730737 11.33 0.000.684585.9710288 indirect2.1432144.0317941 4.50 0.000.080899.2055297 direct2.8307035.073588 11.29 0.000.6864737.9749333

Relative effects 2/1r 3/1r 3/2r method1.1359155.0684536 1.99 0.047.0017489.2700821 method2.1342975.0691401 1.94 0.052 -.0012147.2698096 average.1351065.0687727 1.96 0.049.0003144.2698986 method1.1459457.0121411 12.02 0.000.1221496.1697418 method2.1426575.0127593 11.18 0.000.1176498.1676652 average.1443016.0123863 11.65 0.000.1200249.1685782 method1.1500239.0290461 5.17 0.000.0930946.2069532 method2.1470497.0307503 4.78 0.000.0867803.2073192 average.1485368.0298296 4.98 0.000.0900718.2070018

Decomposition of odds ratios. ldecomp college, direct(ocf57) indirect(hsrankq) or nolegend (running _ldecomp on estimation sample) Bootstrap replications (50) 1 2 3 4 5... 50 Bootstrap results Number of obs = 8923 Replications = 50 Observed Bootstrap Normal-based Odds Ratio Std. Err. z P> z [95% Conf. Interval] 2/1 total 1.547746.1206692 5.60 0.000 1.328423 1.80328 indirect1 1.061166.0267866 2.35 0.019 1.009942 1.114987 direct1 1.458534.1116674 4.93 0.000 1.2553 1.694672 indirect2 1.060416.0264864 2.35 0.019 1.009754 1.11362 direct2 1.459565.1118132 4.94 0.000 1.256074 1.696023 3/1 total 4.098896.1715291 33.71 0.000 3.776123 4.449258 indirect1 1.228616.0194388 13.01 0.000 1.191101 1.267312 direct1 3.33619.1467056 27.40 0.000 3.060695 3.636483 indirect2 1.22293.0201835 12.19 0.000 1.184004 1.263136 direct2 3.351702.1467947 27.62 0.000 3.075992 3.652124 3/2 total 2.6483.2089437 12.34 0.000 2.26887 3.091182 indirect1 1.157325.031425 5.38 0.000 1.097343 1.220585 direct1 2.288295.1853601 10.22 0.000 1.952368 2.682022 indirect2 1.153977.0309939 5.33 0.000 1.094802 1.216351 direct2 2.294933.1868413 10.20 0.000 1.956454 2.69197

Does it matter? Table: Comparing different estimates of the size of indirect effect relative to the size of the total effect generalization (Erikson et al. 2005) naive middle v. low method1.1359.1107 method2.1343.1088 average.1351.1098.0087 high v. low method1.1459.1107 method2.1427.0990 average.1443.1048.0142 high v. middle method1.1500.1075 method2.1470.0968 average.1485.1021.0167

Discussion This is an area of active research

Discussion There are unanswered questions:

Discussion There are unanswered questions: The need to take the average indirect effect is less than elegant.

Discussion There are unanswered questions: The need to take the average indirect effect is less than elegant. How does it relate to the alternative method proposed by Fairlie (2005) and implemented by Ben Jann as the fairlie package?

References Buis, M. L.. http://home.fsw.vu.nl/m.buis/wp/ldecomp.html Erikson, R., J. H. Goldthorpe, M. Jackson, M. Yaish, and D. R. Cox. On class differentials in educational attainment. Proceedings of the National Academy of Science, 102:9730 9733, 2005. Fairlie, R. W. An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement, 30:305 316, 2005.