
Applied Regression Analysis Using STATA
Josef Brüderl

Regression analysis is the statistical method most often used in social research. The reason is that most social researchers are interested in identifying causal effects from non-experimental data, and regression is the method for doing this.

The term "regression": In 1889 Sir Francis Galton investigated the relationship between the body size of fathers and sons, and thereby invented regression analysis. He estimated
    S_Son = 85.7 + 0.56 · S_Father.
This means that the size of the son "regresses towards the mean". Therefore, he named his method regression. Thus, the term regression stems from the first application of this method! In most later applications, however, there is no regression towards the mean.

1a) The Idea of a Regression

We consider two variables (Y, X). The data are realizations of these variables, (y_1, x_1), ..., (y_n, x_n), resp. (y_i, x_i) for i = 1, ..., n. Y is the dependent variable, X is the independent variable (regression of Y on X).

The general idea of a regression is to consider the conditional distribution f(y | X = x). This is hard to interpret: the major function of statistical methods, namely to reduce the information in the data to a few numbers, is not fulfilled. Therefore one characterizes the conditional distribution by some of its aspects:

Y metric: conditional arithmetic mean
Y metric, ordinal: conditional quantile
Y nominal: conditional frequencies (cross tabulation!)

Thus, we can formulate a regression model for every level of measurement of Y.

Regression with discrete X
In this case we compute an index number of the conditional distribution for every X-value.

Example: Income and Education (ALLBUS 1994)
Y is monthly net income, X is the highest educational level. Y is metric, so we compute conditional means E(Y | x). Comparing these means tells us something about the effect of education on income (analysis of variance). The following graph is the scattergram of the data. Since education has only four values, income values would conceal each other; therefore, the values are jittered for this graph. The conditional means are connected by a line to emphasize the pattern of the relationship.

[Figure: jittered scattergram of income (Einkommen in DM) by education (Bildung: Haupt, Real, Abitur, Uni) with the conditional means connected; "Nur Vollzeit, unter 10.000 DM" (N = 1459)]
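A table of such conditional means, and the corresponding conditional distributions, can be produced in current Stata roughly as follows (a minimal sketch; the variable names income and educ are assumed, not taken from the original do-files):

. tabulate educ, summarize(income) means
. graph box income, over(educ)

The first command lists E(Y | x) for every education group (the "variance analysis" view); the second shows one boxplot per group.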

Regression with continuous X
Since X is continuous, we cannot calculate conditional index numbers (too few cases per x-value). Two procedures are possible.

Nonparametric Regression
Naive nonparametric regression: Dissect the x-range into intervals (slices). Within each interval compute the conditional index number, and connect these numbers. The resulting nonparametric regression line is very crude for broad intervals; with finer intervals, however, one runs out of cases. This problem grows exponentially more serious as the number of X's increases ("curse of dimensionality").
Local averaging: Calculate the index number in a neighborhood surrounding each x-value. Intuitively, a window with constant bandwidth moves along the X-axis; one computes the conditional index number from the y-values within the window and connects these numbers. With a small bandwidth one gets a rough regression line. More sophisticated versions of this method weight the observations within the window (locally weighted averaging).

Parametric Regression
One assumes that the conditional index numbers follow a function g(x; θ). This is a parametric regression model. Given the data and the model, one estimates the parameters in such a way that a chosen criterion function is optimized.

Example: OLS regression
One assumes a linear model for the conditional means,
    E(Y | x) = g(x; α, β) = α + βx.
The estimation criterion is usually to minimize the sum of squared residuals (OLS):
    min_{α,β} Σ_{i=1}^n (y_i − g(x_i; α, β))².
It should be emphasized that this is only one of the many

possible models. One could easily conceive of further models (quadratic, logarithmic, ...) and alternative estimation criteria (LAD, ML, ...). OLS is so popular because its estimators are easy to compute and to interpret.

Comparing nonparametric and parametric regression
Data are from the ALLBUS 1994. Y is monthly net income and X is age. We compare:
1) a local mean regression (red)
2) a (naive) local median regression (green)
3) an OLS regression (blue)

[Figure: income (DM) against age (Alter, 15-65) with the three regression lines; "Nur Vollzeit, unter 10.000 DM" (N = 1461)]

All three regression lines tell us that average conditional income increases with age. Both local regressions show that there is non-linearity. Their advantage is that they fit the data better, because they do not assume a heroic model with only a few parameters. OLS, on the other side, has the advantage that it is much easier to interpret, because it reduces the information in the data very much (a single slope, β̂ = 37.3).
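In current Stata such a comparison can be sketched roughly as follows (the variable names income and age are assumed; the naive local median regression would have to be constructed by hand, so only the scatterplot, a lowess mean smoother, and the OLS line are overlaid):

. twoway (scatter income age, jitter(2)) (lowess income age) (lfit income age)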

Interpretation of a regression
A regression shows us whether the conditional distributions differ for differing x-values. If they do, there is an association between X and Y. In a multiple regression we can even partial out spurious and indirect effects. But whether this association is the result of a causal mechanism, a regression cannot tell us. Therefore, in the following I do not use the term "causal effect". To establish causality one needs a theory that provides a mechanism which produces the association between X and Y (Goldthorpe (2000), On Sociology). Example: age and income.

1b) Exploratory Data Analysis

Before running a parametric regression, one should always examine the data. Example: Anscombe's quartet.

Univariate distributions
Example: monthly net income (v423, ALLBUS 1994), only full-time employed (v25 = 1) and under age 66 (v247 ≤ 65). N = 1475.

[Figure: histogram of income (Anteil against DM, 18 bins) and boxplot of income (eink); outliers labelled with their case numbers]

The histogram is drawn with 18 bins. It is obvious that the distribution is positively skewed. The boxplot shows the three quartiles. The height of the box is the interquartile range (IQR); it represents the middle half of the data. The whiskers on each side of the box mark the last observation that is at most 1.5 IQR away. Outliers are marked by their case number. Boxplots are helpful to identify the skew of a distribution and possible outliers.

Nonparametric density curves are provided by the kernel density estimator. The density is estimated locally at n points: observations within an interval of size 2w (w = half-width) are weighted by a kernel function. The following plots are based on an Epanechnikov kernel.

[Figure: kernel density estimates of income (Kerndichteschätzer) with half-widths w = 1000 and w = 300]

Comparing distributions
Often one wants to compare an empirical sample distribution with the normal distribution. A useful graphical method are normal probability plots (resp. normal quantile comparison plots). One plots empirical quantiles against normal quantiles. If the

Applied Regression Analysis, Josef Brüderl 8 data follow a normal distribution the quantile curve should be close to a line with slope one. 18 15 12 DM 9 6 3-3 3 6 9 Inverse Normal Our income distribution is obviously not normal. The quantile curve shows the pattern positive skew, high outliers. Bivariate data Bivariate associations can best be judged with a scatterplot. The pattern of the relationship can be visualized by plotting a nonparametric regression curve. Most often used is the lowess smoother (locally weighted scatterplot smoother). One computes a linear regression at point x i. Data in the neighborhood with a chosen bandwidth are weighted by a tricubic. Based on the estimated regression parameters y i is computed. This is done for all x-values. Then connect (x i, y i ) which gives the lowess curve. The higher the bandwidth is, the smoother is the lowess curve.

Example: income by education
Income is defined as above. Education (in years) includes vocational training. N = 1471.

[Figure: two lowess smooths of income (DM) against education in years (Bildung, 8-24); left: bandwidth = .8, not jittered; right: bandwidth = .3, jittered]

Since education is discrete, one should jitter (the graph on the left is not jittered, on the right the jitter is 2% of the plot area). The bandwidth is lower in the graph on the right (.3, i.e. 30% of the cases are used to compute the local regressions); therefore the curve is closer to the data. But usually one would want a curve as on the left, because one is only interested in the rough pattern of the association. We observe a slight non-linearity above 19 years of education.

Transforming data
Skewness and outliers are a problem for mean regression models. Fortunately, power transformations help to reduce skewness and to bring in outliers. Tukey's "ladder of powers":

    x³        q = 3
    x^1.5     q = 1.5      apply if negative skew
    x         q = 1        (no transformation)
    x^0.5     q = 0.5
    ln x      q = 0        apply if positive skew
    x^−0.5    q = −0.5

Example: income distribution
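A hedged sketch of how such transformed variables might be generated in Stata (the variable names eink, lneink and inveink are assumed to match the plots below; the negative sign for the reciprocal is the usual convention to preserve the ordering of the values):

. generate lneink  = ln(eink)
. generate sqrteink = sqrt(eink)
. generate inveink = -1/eink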

[Figure: kernel density estimates of income (DM, q = 1), lneink (q = 0), and inveink (q = −1)]

Appendix: power functions, ln- and e-function

    x^0.5 = x^(1/2) = √x,    x^−0.5 = 1/x^0.5 = 1/√x,    x^−1 = 1/x.

ln denotes the (natural) logarithm to the base e = 2.71828...:
    y = ln x  ⟺  e^y = x.
From this follows ln(e^y) = e^(ln y) = y.

[Figure: graphs of the ln- and e-function]

Some arithmetic rules:

    e^x · e^y = e^(x+y)        ln(xy) = ln x + ln y
    e^x / e^y = e^(x−y)        ln(x/y) = ln x − ln y
    (e^x)^y = e^(xy)           ln(x^y) = y · ln x

2) OLS Regression

As mentioned before, OLS regression models the conditional means as a linear function:
    E(Y | x) = β₀ + β₁x.
This is the regression model! Better known is the equation that results from it to describe the data:
    y_i = β₀ + β₁x_i + ε_i,   i = 1, ..., n.
A parametric regression model models an index number of the conditional distributions; as such it needs no error term. However, the equation that describes the data in terms of the model needs one.

Multiple regression
The decisive enlargement is the introduction of additional independent variables:
    y_i = β₀ + β₁x_i1 + β₂x_i2 + ... + β_p x_ip + ε_i,   i = 1, ..., n.
At first, this is only an enlargement of dimensionality: this equation defines a p-dimensional surface. But there is an important difference in interpretation. In simple regression the slope coefficient gives the marginal relationship. In multiple regression the slope coefficients are partial coefficients: each slope represents the effect on the dependent variable of a one-unit increase in the corresponding independent variable, holding constant the values of the other independent variables. Partial regression coefficients give the direct effect of a variable that remains after controlling for the other variables.

Example: Status Attainment (Blau/Duncan 1967)
Dependent variable: monthly net income in DM. Independent variables: prestige of the father (magnitude prestige scale, values 20-190) and education (years, 9-22). Sample: West-German men under 66, full-time employed. First we look at the effect of status ascription (prestige of the father).

. regress income prestf, beta

  Number of obs = 616     F(1, 614) = 40.50     Prob > F = 0.0000
  R-squared = 0.0619      Adj R-squared = 0.0604     Root MSE = 1877.2
  (Model SS = 142723777, df 1;  Residual SS = 2.1636e+09, df 614;  Total SS = 2.3063e+09, df 615)

      income        Coef.   Std. Err.        t    P>|t|        Beta
  ------------------------------------------------------------------
      prestf     16.16277    2.539641     6.36    0.000     .248764
       _cons      2587.74     163.915    15.79    0.000           .

Prestige of the father has a strong effect on the income of the son: 16 DM per prestige point. This is the marginal effect. Now we look for the intervening mechanisms. Attainment (education) might be one.

. regress income educ prestf, beta

  Number of obs = 616     F(2, 613) = 60.99     Prob > F = 0.0000
  R-squared = 0.1660      Adj R-squared = 0.1632     Root MSE = 1771.4
  (Model SS = 382767979, df 2;  Residual SS = 1.9236e+09, df 613;  Total SS = 2.3063e+09, df 615)

      income        Coef.   Std. Err.        t    P>|t|        Beta
  ------------------------------------------------------------------
        educ     262.3797     29.9993     8.75    0.000     .362727
      prestf     5.391151    2.694496     2.00    0.046    .0829762
       _cons    -34.14422    337.3229    -0.10    0.919           .

The effect becomes much smaller: a large part is explained via education. This can be visualized by a path diagram (the path coefficients are the standardized regression coefficients).

[Path diagram: prestige father → education (.46); education → income (.36); prestige father → income (.08); residual 1 on education, residual 2 on income]

The direct effect of prestige of the father is .08. But there is an additional large indirect effect: .46 × .36 ≈ .17. Direct plus

indirect effect gives the total effect ("causal effect").

A word of caution: the coefficients of the multiple regression are not causal effects! To establish causality we would have to find mechanisms that explain why prestige of the father and education have an effect on income.

Another word of caution: Do not apply multiple regression automatically. We are not always interested in partial effects; sometimes we want to know the marginal effect. For instance, to answer public policy questions we would use marginal effects (e.g. in international comparisons). To provide an explanation we would try to isolate direct and indirect effects (disentangle the mechanisms).

Finally, a graphical view of our regression (not shown, graph too big).

Estimation
Using matrix notation, these are the essential equations:
    y = (y_1, y_2, ..., y_n)',
    X = n × (p+1) matrix with rows (1, x_i1, ..., x_ip),
    β = (β₀, β₁, ..., β_p)',   ε = (ε₁, ε₂, ..., ε_n)'.
This is the multiple regression equation:
    y = Xβ + ε.
Assumptions: ε ~ N_n(0, σ²I), Cov(x, ε) = 0, rg(X) = p + 1.

Estimation: Using OLS we obtain the estimator for β,
    β̂ = (X'X)⁻¹ X'y.
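These matrix formulas can be checked numerically in Mata, Stata's matrix language (a sketch that is not part of the original script; it uses the built-in auto data purely as an example):

sysuse auto, clear
mata:
    y = st_data(., "price")
    X = J(rows(y), 1, 1), st_data(., ("mpg", "weight"))  // constant plus regressors
    b = invsym(X'X)*X'y                                  // beta-hat = (X'X)^-1 X'y
    e = y - X*b                                           // residuals
    s2 = (e'e)/(rows(y) - cols(X))                        // residual variance
    V  = s2*invsym(X'X)                                   // sampling variance of beta-hat
    b, sqrt(diagonal(V))                                  // coefficients and standard errors
end

Comparing the displayed column with ". regress price mpg weight" confirms the formulas.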

Now we can estimate the fitted values
    ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy.
The residuals are
    ε̂ = y − ŷ = y − Hy = (I − H)y.
The residual variance is
    σ̂² = ε̂'ε̂ / (n − p − 1) = (y − Xβ̂)'(y − Xβ̂) / (n − p − 1).
For tests we need the sampling variances (the squared standard errors σ̂²_β̂j are on the main diagonal of this matrix):
    V̂(β̂) = σ̂² (X'X)⁻¹.
The squared multiple correlation is
    R² = ESS/TSS = 1 − RSS/TSS = 1 − Σᵢ ε̂ᵢ² / Σᵢ (yᵢ − ȳ)² = 1 − ε̂'ε̂ / (y'y − n ȳ²).

Categorical variables
Of great practical importance is the possibility to include categorical (nominal or ordinal) X-variables. The most popular way to do this is by coding dummy regressors.

Example: Regression on income
Dependent variable: monthly net income in DM. Independent variables: years of education, prestige of the father, years of labor market experience, sex, West/East, occupation. Sample: under 66, full-time employed. The dichotomous variables are represented by one dummy each. The polytomous variable (occupation) is coded by the following design matrix:

    occupation       D1   D2   D3   D4
    blue collar       1    0    0    0
    white collar      0    1    0    0
    civil servant     0    0    1    0
    self-employed     0    0    0    1

One dummy has to be left out (otherwise there would be linear dependency amongst the regressors). This defines the reference group. We drop D1.

  Number of obs = 1240    F(8, 1231) = 78.61    Prob > F = 0.0000
  R-squared = 0.3381      Adj R-squared = 0.3338    Root MSE = 1381.8
  (Model SS = 1.2007e+09, df 8;  Residual SS = 2.3503e+09, df 1231;  Total SS = 3.5510e+09, df 1239)

      income        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        educ     182.9042    17.45326   10.480    0.000     148.6628    217.1456
         exp     26.71962    3.671445    7.278    0.000     19.51664     33.9226
      prestf     4.163393    1.423944    2.924    0.004     1.369768    6.957019
       woman    -797.7655     92.5283   -8.622    0.000    -979.2956   -616.2354
        east    -1059.817     86.8629  -12.200    0.000    -1230.122   -889.5123
       white     379.9241     102.523    3.706    0.000     178.7903     581.058
       civil     419.7903    172.6672    2.431    0.015     81.03569    758.5449
        self     1163.615    143.5888    8.104    0.000     881.9094    1445.321
       _cons        52.95     217.857    0.243    0.808    -374.4947    480.3947

The model represents parallel regression surfaces, one for each category of the categorical variables. The effects represent the distances between these surfaces. The t-values test the difference to the reference group. This is not a test of whether occupation has a significant effect; to test this, one has to perform an incremental F-test.

. test white civil self

 ( 1)  white = 0.0
 ( 2)  civil = 0.0
 ( 3)  self = 0.0

       F(  3,  1231) =   21.92
            Prob > F =  0.0000

Modeling Interactions
Two X-variables are said to interact when the partial effect of one depends on the value of the other. The most popular way to model this is by introducing a product regressor (multiplicative interaction). Rule: specify models including both main and interaction effects.

Dummy interaction

Applied Regression Analysis, Josef Brüderl 16 woman east woman*east man west man east 1 woman west 1 woman east 1 1 1

Example: Regression on income, interaction woman*east

  Number of obs = 1240    F(9, 1230) = 74.34    Prob > F = 0.0000
  R-squared = 0.3523      Adj R-squared = 0.3476    Root MSE = 1367.4
  (Model SS = 1.2511e+09, df 9;  Residual SS = 2.3000e+09, df 1230;  Total SS = 3.5510e+09, df 1239)

      income        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        educ     188.4242     17.3053   10.888    0.000     154.4736    222.3749
         exp     24.64689    3.655269    6.743    0.000     17.47564    31.81815
      prestf      3.89539     1.41127    2.762    0.006      1.12887     6.66191
       woman     -1123.29    110.9954  -10.120    0.000    -1341.051   -905.5285
        east    -1380.968    105.8774  -13.043    0.000    -1588.689   -1173.248
       white     361.5235    101.5193    3.561    0.000     162.3533    560.6937
       civil     392.3995    170.9586    2.295    0.022     56.99687    727.8021
        self      1134.45    142.2115    7.977    0.000      855.414     1413.49
     womeast     930.7147     179.355    5.189    0.000     578.8392     1282.59
       _cons     143.9125     216.342    0.665    0.506    -280.4535    568.2786

Models with interaction effects are difficult to understand. Conditional effect plots help very much (here for exp = 0, prestf = 50, blue collar):

[Figure: conditional effect plots of income (Einkommen) against education (Bildung, 8-18) for m_west, m_ost, f_west, f_ost; left: without interaction, right: with interaction]

Slope interaction

                  woman   east   woman*east   educ   educ*east
    man west        0       0         0         x        0
    man east        0       1         0         x        x
    woman west      1       0         0         x        0
    woman east      1       1         1         x        x

Example: Regression on income, interaction educ*east

  Number of obs = 1240    F(10, 1229) = 68.17    Prob > F = 0.0000
  R-squared = 0.3568      Adj R-squared = 0.3516    Root MSE = 1363.3
  (Model SS = 1.2670e+09, df 10;  Residual SS = 2.2841e+09, df 1229;  Total SS = 3.5510e+09, df 1239)

      income        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        educ     218.8579    20.15265   10.860    0.000      179.325    258.3953
         exp     24.74317     3.64427    6.790    0.000     17.59349    31.89285
      prestf     3.651288     1.40836    2.593    0.010      .888338    6.414238
       woman    -1136.907    110.7549  -10.265    0.000    -1354.197   -919.6178
        east     -239.378    404.7151   -0.591    0.554    -1033.394    554.6381
       white     382.5477    101.4652    3.770    0.000     183.4837    581.6118
       civil     360.5762    170.7848    2.111    0.035     25.51422    695.6382
        self     1145.624    141.8297    8.077    0.000     867.3686    1423.879
     womeast     906.5249    178.9995    5.064    0.000     555.3465    1257.703
    educeast    -88.43585    30.26686   -2.922    0.004    -147.8163   -29.05542
       _cons    -225.3985    249.9567   -0.902    0.367    -715.7875    264.9905

[Figure: conditional effect plot of income (Einkommen) against education (Bildung, 8-18) for m_west, m_ost, f_west, f_ost, model with the educ*east interaction]

Applied Regression Analysis, Josef Brüderl 19 The interaction educ*east is significant. Obviously the returns to education are lower in East-Germany. Note that the main effect of east changed dramatically! It would be wrong to conclude that there is no significant income difference between West and East. The reason is that the main effect now represents the difference at educ. This is a consequence of dummy coding. Plotting conditional effect plots is the best way to avoid such erroneous conclusions. If one has interest in the West-East difference one could center educ (educ educ). Then the east-dummy gives the difference at the mean of educ. Or one could use ANCOVA coding (deviation coding plus centered metric variables, see Fox p. 194).

3) Regression Diagnostics

Assumptions often do not hold in applications. Parametric regression models use strong assumptions; therefore, it is essential to test these assumptions.

Collinearity
Problem: Collinearity means that regressors are correlated. It is not a severe violation of the regression assumptions (only in extreme cases). Under collinearity OLS estimates are consistent, but standard errors are increased (estimates are less precise). Thus, collinearity is mainly a problem for researchers who plug in many highly correlated items.
Diagnosis: Collinearity can be assessed by the variance inflation factors (VIF, the factor by which the sampling variance of an estimator is increased due to collinearity):
    VIF_j = 1 / (1 − R²_j),
where R²_j results from a regression of X_j on the other covariates. For instance, if R_j = .9 (an extreme value!), then √VIF = 2.29: the S.E. roughly doubles and the t-value is cut in half. Thus, VIFs below 4 are usually no problem.
Remedy: Gather more data. Build an index.

Example: Regression on income (only West-Germans)

. regress income educ exp prestf woman white civil self
  (output omitted)
. vif

    Variable |      VIF       1/VIF
-------------+----------------------
       white |     1.65    0.606236
        educ |     1.49    0.672516
        self |     1.32    0.758856
       civil |     1.31    0.763223
      prestf |     1.26    0.795292
       woman |     1.16    0.865340
         exp |     1.12    0.896798
-------------+----------------------
    Mean VIF |     1.33
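The auxiliary-regression logic behind the VIF can be checked directly (a sketch, assuming the same variable names):

. quietly regress educ exp prestf woman white civil self
. display "VIF(educ) = " 1/(1 - e(r2))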

Nonlinearity
Problem: Nonlinearity biases the estimators.
Diagnosis: Nonlinearity can best be seen in the residual plot. An enhanced version is the component-plus-residual plot (cprplot): one adds β̂_j·x_ij to the residual, i.e. one adds the (partial) regression line.
Remedy: Transformation, using the ladder, or adding a quadratic term.

Example: Regression on income (only West-Germans)

[Figure: component-plus-residual plot, e(eink | X, exp) + b*exp against exp; blue: regression line, green: lowess. Annotation: Con −293, EXP 29 (t = 6.16), ..., N = 849, R² = 33.3%]

There is obvious nonlinearity. Therefore, we add EXP².

[Figure: component-plus-residual plot after adding EXP². Annotation: Con −1257, EXP 155 (t = 9.1), EXP² −2.8 (t = 7.69), ..., N = 849, R² = 37.7%]

Now it works. How can we interpret such a quadratic regression?

    y_i = β₀ + β₁x_i + β₂x_i² + ε_i,   i = 1, ..., n.

If β₁ > 0 and β₂ < 0, we have an inverse U-pattern; if β₁ < 0 and β₂ > 0, we have a U-pattern. The maximum (minimum) is obtained at
    X_max = −β₁ / (2β₂).
In our example this is 155 / (2 · 2.8) = 27.7.

Heteroscedasticity
Problem: Under heteroscedasticity OLS estimators are unbiased and consistent, but no longer efficient, and the S.E. are biased.
Diagnosis: Plot ε̂ against ŷ (residual-versus-fitted plot, rvfplot). Nonconstant spread means heteroscedasticity.
Remedy: Transformation (see below); WLS (one needs to know the weights); White-estimator (Stata option "robust").

Example: Regression on income (only West-Germans)

[Figure: residual-versus-fitted plot (Residuals against Fitted values)]

It is obvious that the residual variance increases with ŷ.
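A sketch of these diagnostic steps in Stata (variable names assumed; cprplot and rvfplot must follow a regress command):

. generate exp2 = exp^2
. regress income educ exp exp2 prestf woman white civil self
. cprplot exp, lowess
. rvfplot
. regress income educ exp exp2 prestf woman white civil self, robust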

Nonnormality
Problem: Significance tests are invalid. However, the central-limit theorem assures that inferences are approximately valid in large samples.
Diagnosis: Normal-probability plot of the residuals (not of the dependent variable!).
Remedy: Transformation.

Example: Regression on income (only West-Germans)

[Figure: normal-probability plot of the residuals (Residuals against Inverse Normal)]

Especially at high incomes there is a departure from normality (positive skew). Since we observe heteroscedasticity and nonnormality, we should apply a proper transformation. Stata has a nice command that helps here:

Applied Regression Analysis, Josef Brüderl 24 qladder income cubic square identity 5.4e+12 3.1e+8 175-8.9e+11-5.6e+7-2298.94-8.9e+11 1.e+12-5.6e+7 8.3e+7-2298.94 8672.72 sqrt log 1/sqrt 132.288 9.76996 -.552 13.2541 13.2541 96.3811 6.16121 6.51716 9.3884 -.45932 -.33484 -.552.26 inverse 8.6e-7 1/square 1.7e-9 1/cube -.211-4.5e-6-9.4e-9 -.145.26-1.3e-6 8.6e-7-2.e-9 1.7e-9 income Quantile-Normal Plots by Transformation A log-transformation (q ) seems best. Using ln(income) as dependent variable we obtain the following plots: 1.5 1.5 1 1 Residuals.5 -.5 Residuals.5 -.5-1 -1-1.5-1.5 7 7.5 8 8.5 9 Fitted values -1 -.5.5 1 Inverse Normal This transformation alleviates our problems. There is no heteroscedasticity and only light nonnormality (heavy tails).

This is our result:

. regress lnincome educ exp exp2 prestf woman white civil self

  Number of obs = 849     F(8, 840) = 82.80     Prob > F = 0.0000
  R-squared = 0.4409      Adj R-squared = 0.4356    Root MSE = .35057
  (Model SS = 81.4123948, df 8;  Residual SS = 103.237891, df 840;  Total SS = 184.650286, df 848)

    lnincome        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        educ     .0591425    .0054807   10.791    0.000      .048385       .0699
         exp     .0496282    .0041655   11.914    0.000     .0414522    .0578041
        exp2    -.0009166    .0000908  -10.092    0.000    -.0010949   -.0007383
      prestf      .000618    .0004518    1.368    0.172    -.0002689    .0015048
       woman    -.3577554    .0291036  -12.292    0.000    -.4148798   -.3006311
       white     .1714642    .0310117    5.529    0.000     .1105966    .2323318
       civil     .1705233    .0488323    3.492    0.001     .0746757    .2663709
        self     .2252737    .0442668    5.089    0.000     .1383872    .3121602
       _cons     6.669825    .0734731   90.779    0.000     6.525613    6.814038

R² for the regression on income was 37.7%; here it is 44.1%. However, it makes no sense to compare the two, because the variance to be explained differs between these two dependent variables! Note that we finally arrived at a specification that is identical to the one derived from human capital theory. Thus, data-driven diagnostics strongly support the validity of human capital theory!

Interpretation: The problem with transformations is that interpretation becomes more difficult. In our case we arrived at a semi-logarithmic specification, so the standard interpretation of the regression coefficients is no longer valid. Now our model is
    ln y_i = β₀ + β₁x_i + ε_i,   or   E(y | x) = e^(β₀ + β₁x).
Coefficients are effects on ln(income). This nobody can understand; one wants an interpretation in terms of income. The marginal effect on income is
    dE(y | x) / dx = E(y | x) · β₁.

The discrete (unit) effect on income is
    E(y | x + 1) − E(y | x) = E(y | x) · (e^β₁ − 1).
Unlike in the linear regression model, the two effects are not equal and depend on the value of X! It is generally preferable to use the discrete effect. This, however, can be transformed:
    [E(y | x + 1) − E(y | x)] / E(y | x) = e^β₁ − 1.
This is the percentage change of Y with a unit increase of X. Thus, the coefficients of a semi-logarithmic regression can be interpreted as discrete percentage effects (rates of return). This interpretation is eased further if β₁ is small (|β₁| ≤ .1): then e^β₁ − 1 ≈ β₁.

Example: For women we have e^(−.358) − 1 = −.30. Women's earnings are 30% below men's. These are percentage effects; don't confuse this with absolute change! Let's produce a conditional-effect plot (prestf = 50, educ = 13, blue collar).

[Figure: conditional-effect plot of income (Einkommen) against labor market experience (Berufserfahrung); blue: woman, red: man]

Clearly the absolute difference between men and women depends on exp, but the relative difference is constant.
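After the lnincome regression these percentage effects can be displayed directly from the stored coefficients (a sketch, not from the original do-file):

. display "discrete % effect of woman: " exp(_b[woman]) - 1
. display "discrete % effect of one more year of education: " exp(_b[educ]) - 1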

Influential data
A data point is influential if it changes the results of a regression.
Problem: (only in extreme cases) The regression does not represent the majority of cases, but only a few.
Diagnosis: Influence on coefficients = leverage × discrepancy. Leverage is an unusual x-value, discrepancy is outlyingness.
Remedy: Check whether the data point is correct. If yes, then try to improve the specification (are there common characteristics of the influential points?). Don't throw away influential points (robust regression)! This is data manipulation.

Partial-regression plot
Scattergrams are useful in simple regression. In multiple regression one has to use partial-regression scattergrams (added-variable plot in Stata, avplot). Plot the residuals from the regression of Y on all X (without X_j) against the residuals from the regression of X_j on the other X. Thus one partials out the effects of the other X-variables.

Influence statistics
Influence can be measured directly by dropping observations: how does β̂_j change if we drop case i (β̂_j(i))?
    DFBETAS_ij = (β̂_j − β̂_j(i)) / σ̂_β̂j(i)
shows the (standardized) influence of case i on coefficient j. DFBETAS_ij > 0: case i pulls β̂_j up; DFBETAS_ij < 0: case i pulls β̂_j down. Influential are cases beyond the cutoff 2/√n. There is a DFBETAS_ij for every case and variable; to judge against the cutoff, one should use index-plots. It is easier to use Cook's D, which is a measure that averages the DFBETAS. The cutoff here is 4/n.

Example: Regression on income (only West-Germans)
For didactical purposes we use again the regression on income. Let's have a look at the effect of self.

[Figure: partial-regression (added-variable) plot for self, e(eink | X) against e(selbst | X); annotation: coef = 159.4996, se = 18.553, t = 8.81]

[Figure: index-plot of DFBETAS(self) against case number (Fallnummer)]

There are some self-employed persons with high income residuals who pull up the regression line. Obviously the cutoff is much too low. However, it is easier to have a look at the index-plot for Cook's D.

[Figure: index-plot of Cook's D against case number (Fallnummer)]

Again the cutoff is much too low. But we identify two cases which differ very much from the rest. Let's have a look at these data:

             income        yhat    exp   woman   self          D
     32.      17500    5880.125   31.5       0      1   .1492927
    692.      17500    5735.749   28.5       0      1    .175122

These are two self-employed men with extremely high incomes (above 15.000 DM is the true value). They exert strong influence on the regression. What to do? Obviously we have a problem with self-employed people that is not cured by including the dummy. Thus, there is good reason to drop the self-employed from the sample. This is also what theory would tell us. Our final result is then (on ln(income)):

  Number of obs = 756     F(7, 748) = 105.47    Prob > F = 0.0000
  R-squared = 0.4967      Adj R-squared = 0.4920    Root MSE = .28661
  (Model SS = 60.649112, df 7;  Residual SS = 61.4445399, df 748;  Total SS = 122.093652, df 755)

    lnincome        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        educ      .057521    .0047798   12.034    0.000     .0481377    .0669043
         exp     .0433609    .0037117   11.682    0.000     .0360743    .0506475
        exp2    -.0007881    .0000834   -9.455    0.000    -.0009517   -.0006245
      prestf     .0005446    .0003951    1.378    0.168     -.000231    .0013203
       woman    -.3211721    .0249711  -12.862    0.000    -.3701904   -.2721538
       white     .1630886    .0258418    6.311    0.000     .1123575    .2138197
       civil     .1790793    .0402933    4.444    0.000     .0999779    .2581807
       _cons     6.743215    .0636083  106.012    0.000     6.618343    6.868087

Since we changed our specification, we should start anew and test whether the regression assumptions also hold for this specification.
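The influence diagnostics of this section could be computed along these lines (a sketch; the variable name D and the cutoff follow the text, everything else is assumed):

. regress income educ exp prestf woman white civil self
. avplot self
. dfbeta self
. predict D, cooksd
. list income exp woman self D if D > 4/e(N) & D < .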

4) Binary Response Models

With Y nominal, a mean regression makes no sense. One can, however, investigate conditional relative frequencies. Thus a regression is given by the J + 1 functions
    π_j(x) = f(Y = j | X = x)   for j = 0, 1, ..., J.
For discrete X this is a cross tabulation! If we have many X and/or continuous X, however, it makes sense to use a parametric model. The functions used must have the following properties:
    0 ≤ π_j(x; β) ≤ 1   for j = 0, ..., J,   and   Σ_{j=0}^{J} π_j(x; β) = 1.
Therefore, most binary models use distribution functions.

The binary logit model
Y is dichotomous (J = 1). We choose the logistic distribution Λ(z) = exp(z) / (1 + exp(z)), so we get the binary logit model (logistic regression). Further, we specify a linear model for z (β₀ + β₁x₁ + ... + β_p x_p = β'x):
    P(Y = 1) = e^(β'x) / (1 + e^(β'x)) = 1 / (1 + e^(−β'x)),
    P(Y = 0) = 1 − P(Y = 1) = 1 / (1 + e^(β'x)).
Coefficients are not easy to interpret; below we will discuss this in detail. Here we use only the sign interpretation (positive means P(Y = 1) increases with X).

Example 1: party choice and West/East (discrete X)
In the ALLBUS there is a "Sonntagsfrage" (v329). We dichotomize: CDU/CSU = 1, other party = 0 (only those who would vote). We look for the effect of West/East. This is the crosstab:

Applied Regression Analysis, Josef Brüderl 31 east cdu 1 Total ----------- ---------------------- ---------- 143 563 166 66.18 77.98 69.89 ----------- ---------------------- ---------- 1 533 159 692 33.82 22.2 3.11 ----------- ---------------------- ---------- Total 1576 722 2298 1. 1. 1. This is the result of a logistic regression:. logit cdu east Iteration : log likelihood -145.9621 Iteration 1: log likelihood -1389.123 Iteration 2: log likelihood -1389.67 Iteration 3: log likelihood -1389.67 Logit estimates Number of obs 2298 LR chi2(1) 33.91 Prob chi2. Log likelihood -1389.67 Pseudo R2.121 -------------------------------------------------------------------- cdu Coef. Std. Err. z P z [95% Conf. Interval] ----- -------------------------------------------------------------- east -.59344.14452-5.68. -.797679 -.388499 cons -.671335.532442-12.69. -.7756918 -.5669783 -------------------------------------------------------------------- The negative coefficient tells us, that East-Germans vote less often for CDU (significantly). However, this only reproduces the crosstab in a complicated way: P Y 1 X East 1. 22 1 e.671.593 P Y 1 X West 1. 338. 1 e.671 Thus, the logistic brings an advantage only in multivariate models.

Why not OLS? It is possible to estimate an OLS regression with such data:
    E(Y | x) = P(Y = 1 | x) = β'x.
This is the linear probability model. It has, however, nonnormal and heteroscedastic residuals. Further, prognoses can lie beyond [0, 1]. Nevertheless, it often works pretty well.

. regress cdu east                                        R-squared = 0.0143

         cdu        Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
  -------------------------------------------------------------------------------
        east    -.1179764    .0204775   -5.761    0.000    -.1581326   -.0778201
       _cons      .338198    .0114781   29.465    0.000     .3156894    .3607066

It gives a discrete effect on P(Y = 1); this is exactly the percentage point difference from the crosstab. Given the ease of interpretation of this model, one should not discard it from the beginning.

Example 2: party choice and age (continuous X)

. logit cdu age

Iteration 0:  log likelihood = -1405.2452
  ...
Iteration 3:  log likelihood = -1364.6916

Logit estimates:  Number of obs = 2296,  LR chi2(1) = 81.11,  Prob > chi2 = 0.0000
Log likelihood = -1364.6916,  Pseudo R2 = 0.0289

         cdu        Coef.   Std. Err.        z    P>|z|
  ------------------------------------------------------
         age     .0245216     .002765    8.869    0.000
       _cons      -2.1266      .14339   -14.55    0.000

. regress cdu age                                         R-squared = 0.0353

         cdu        Coef.   Std. Err.        t    P>|t|
  ------------------------------------------------------
         age     .0051239     .000559    9.166    0.000
       _cons     .0637782    .0275796    2.313    0.021

With age, P(CDU) increases. The linear model says the same.

Applied Regression Analysis, Josef Brüderl 33 1.8 CDU.6.4.2 1 2 3 4 5 6 7 8 9 1 Alter This is a (jittered) scattergram of the data with estimated regression lines: OLS (blue), logit (green), lowess (brown). They are almost identical. The reason is that the logistic function is almost linear in interval. 2,. 8. Lowess hints towards a nonmonotone effect at young ages (this is a diagnostic plot to detect deviations from the logistic function). Interpretation of logit coefficients There are many ways to interpret the coefficients of a logistic regression. This is due to the nonlinear nature of the model. Effects on a latent variable It is possible to formulate the logit model as a threshold model with a continuous, latent variable Y. Example from above: Y is the (unobservable) utility difference between CDU and other parties. We specify a linear regression model for Y : y x, We do not observe Y,but only the resulting binary choice variable Y that results form the following threshold model: y 1, for y, y, for y. To make the model practical, one has to assume a distribution for. With the logistic distribution, we obtain the logit model.

Applied Regression Analysis, Josef Brüderl 34 Thus, logit coefficients could be interpreted as discrete effects on Y. Since the scale of Y is arbitrary, this interpretation is not useful. Note: It is erroneous to state that the logit model contains no error term. This becomes obvious if we formulate the logit as threshold model on a latent variable. Probabilities, odds, and logits Let s now assume a continuous X. The logit model has three equivalent forms: Probabilities: P Y 1 x e x 1 e. x Odds: P Y 1 x P Y x e x. Logits (Log-Odds): P Y 1 x ln x. P Y x Example: For these plots 4,. 8 : 1 5 5.9 4.5 4.8 4 3.7 3.5 2.6 3 1 P.5 O 2.5 L.4 2-1.3 1.5-2.2 1-3.1.5-4 1 2 3 4 5 6 7 8 9 1 X 1 2 3 4 5 6 7 8 9 1 X -5 1 2 3 4 5 6 7 8 9 1 X probability odd logit Logit interpretation is the discrete effect on the logit. Most people, however, do not understand what a change in the logit means. Odds interpretation e is the (multiplicative) discrete effect on the odds (e x 1 e x e ). Odds are also not easy to understand, nevertheless this is the standard interpretation in the literature.

Applied Regression Analysis, Josef Brüderl 35 Example 1: e.593.55. The Odds CDU vs. Others is in the East smaller by the factor.55: Odds east.22/.78.282, Odds west.338/.662.51, thus.51.55.281. Note: Odds are difficult to understand. This leads to often erroneous interpretations. in the example the odds are smaller by about half, not P(CDU)! Example 2: e.245 1.248. For every year the odds increase by 2.5%. In 1 years they increase by 25%? No, because e.245 1 1.248 1 1.278. Probability interpretation This is the most natural interpretation, since most people have an intuitive understanding of what a probability is. The drawback is, however, that these effects depend on the X-value (see plot above). Therefore, one has to choose a value (usually x )at which to compute the discrete probability effect x 1 P Y 1 x 1 P Y 1 x e e x 1 e x 1 1 e. x Normally you would have to calculate this by hand, however Stata has a nice ado. Example 1: The discrete effect is. 338. 22. 118, i.e.-12 percentage points. Example 2: Mean age is 46.374. Therefore 1 1 e 1. 512. 2.1.245 47.374 1 e 2.1.245 46.374 The 47. year increases P(CDU) by.5 percentage points. Note: The linear probability model coefficients are identical with these effects! Marginal effects Stata computes marginal probability effects. These are easier to compute, but they are only approximations to the discrete effects. For the logit model

Applied Regression Analysis, Josef Brüderl 36 P Y 1 x x Example: 4,,8, x 7 e x P Y 1 x P Y x. 1 e x 2 P 1.9.8.7.6.5.4.3.2.1 1 2 3 4 5 6 7 8 9 1 X P Y 1 7 1 1 e 4.8 7.832, P Y 1 8 1 1 e 4.8 8. 917 discrete:.917.832.85 marginal:.832 1.832. 8. 112 ML estimation We have data y i,x i and a regression model f Y y X x;. We want to estimate the parameter in such a way that the model fits the data best. There are different criteria to do this. The best known is maximum likelihood (ML). The idea is to choose the that maximizes the likelihood of the data. Given the model and independent draws from it the likelihood is: n L f y i, x i ;. i 1 The ML estimate results from maximizing this function. For computational reasons it is better to maximize the log likelihood: n l lnf y i, x i ;. i 1

Applied Regression Analysis, Josef Brüderl 37 Compute the first derivatives and set them equal. ML estimates have some desirable statistical properties (asymptotic). consistent: E ML normally distributed: ML N, I 1, where I E 2 ln L efficient: ML estimates obtain minimal variance (Rao-Cramer) ML estimates for the binary logit model The probability to observe a data point with Y 1 isp(y 1). Accordingly for Y. Thus the likelihood is L i 1 The log likelihood is n l n i 1 n i 1 e x i 1 e x i y i ln Taking derivatives yields: l y i 1 1 e x i 1 y i e x i 1 e x i 1 y i ln 1 1 e x i y i n x i ln 1 e x i. i 1 y i x i e x i 1 e x i x i. Setting equal to yields the estimation equations: y i x i e x i 1 e x i. x i These equations have no closed form solution. One has to solve them by iterative numerical algorithms..

Significance tests and model fit

Overall significance test
Compare the log likelihood of the full model (ln L₁) with the one from the constant-only model (ln L₀). Compute the likelihood ratio test statistic:
    χ² = −2 ln(L₀ / L₁) = 2 (ln L₁ − ln L₀).
Under the null H₀: β₁ = β₂ = ... = β_p = 0 this statistic is asymptotically χ²(p) distributed.
Example 2: ln L₁ = −1364.7 and ln L₀ = −1405.2 (Iteration 0). χ² = 2(−1364.7 + 1405.2) = 81.0. With one degree of freedom we can reject H₀.

Testing one coefficient
Compute the z-value (coefficient/s.e.), which is asymptotically normally distributed. One could also use the LR-test (this test is "better"). Use the LR-test also to test restrictions on a set of coefficients.

Model fit
With nonmetric Y we can no longer define a unique measure of fit like R² (this is due to the different conceptions of variation in nonmetric models). Instead there are many pseudo-R² measures. The most popular one is McFadden's pseudo-R²:
    R²_MF = (ln L₀ − ln L₁) / ln L₀ = 1 − ln L₁ / ln L₀.
Experience tells that it is conservative. Another one is McKelvey-Zavoina's pseudo-R² (formula see Long, p. 15). This measure is suggested by the authors of several simulation studies, because it most closely approximates the R² obtained from regressions on the underlying latent variable. A completely different approach has been suggested by Raftery (see Long, pp. 11). He favors the use of the Bayesian information criterion (BIC). This measure can also be used to compare non-nested models!
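The overall LR test and McFadden's R² can be reproduced from stored results (a sketch for Example 2; fitstat, used below, is part of Scott Long's spost ado package):

. quietly logit cdu
. estimates store null
. quietly logit cdu age
. lrtest null
. display "McFadden R2 = " 1 - e(ll)/e(ll_0)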

An example using Stata
We continue our party choice model by adding education, occupation, and sex (output modified by inserting odds ratios and marginal effects).

. logit cdu educ age east woman white civil self trainee

Iteration 0:  log likelihood = -757.2306
Iteration 1:  log likelihood = -718.71868
Iteration 2:  log likelihood = -718.25280
Iteration 3:  log likelihood = -718.25194

Logit estimates:  Number of obs = 1262,  LR chi2(8) = 77.96,  Prob > chi2 = 0.0000
Log likelihood = -718.25194,  Pseudo R2 = 0.0515

         cdu        Coef.   Std. Err.        z    P>|z|   Odds Ratio    MargEff
  -------------------------------------------------------------------------------
        educ      -.04362    .0264973   -1.646    0.100     .9573177     -.0087
         age     .0351726    .0059116    5.950    0.000     1.035799      .0070
        east     -.491153    .1510739   -3.251    0.001      .612047      -.098
       woman    -.1647772    .1421791   -1.159    0.246      .848083     -.0329
       white     .1342369    .1687518    0.795    0.426     1.143664      .0268
       civil      .396132     .279057    1.420    0.156     1.486660      .0791
        self     .6567997    .2148196    3.057    0.002     1.928610      .1311
     trainee     .4691257    .4937517    0.950    0.342     1.598596      .0937
       _cons    -1.783349    .4114883   -4.334    0.000

Thanks to Scott Long there are several helpful ados:

. fitstat

Measures of Fit for logit of cdu

Log-Lik Intercept Only:      -757.230     Log-Lik Full Model:        -718.252
D(1253):                     1436.504     LR(8):                       77.956
                                          Prob > LR:                    0.000
McFadden's R2:                  0.051     McFadden's Adj R2:            0.040
Maximum Likelihood R2:          0.060     Cragg & Uhler's R2:           0.086
McKelvey and Zavoina's R2:      0.086     Efron's R2:                   0.066
Variance of y*:                 3.600     Variance of error:            3.290
Count R2:                       0.723     Adj Count R2:                 0.039
AIC:                            1.153     AIC*n:                     1454.504
BIC:                        -7510.484     BIC':                       -20.833

. prchange, help

logit: Changes in Predicted Probabilities for cdu

            min->max      0->1     -+1/2    -+sd/2   MargEfct
     educ     -.1292    -.0140    -.0087    -.0240     -.0087
      age      .4271     .0028     .0070     .0088      .0070
     east     -.0935    -.0935    -.0978    -.0448      -.098