Multilevel Analysis (ver. 1.0)


Oscar Torres-Reyna, Data Consultant
otorres@princeton.edu
http://dss.princeton.edu/training/

Motivation

Use a multilevel model whenever your data is grouped (or nested) in more than one category (for example, states, countries, etc.).

Multilevel models allow you to:
- Study effects that vary by entity (or group)
- Estimate group-level averages

Some advantages:
- Regular regression ignores the average variation between entities.
- Individual regressions may face sample problems and lack generalization.

Variation between entities

use http://dss.princeton.edu/training/schools.dta
bysort school: egen y_mean = mean(y)
twoway scatter y school, msize(tiny) || connected y_mean school, connect(l) clwidth(thick) clcolor(black) mcolor(black) msymbol(none) || , ytitle(y)

[Figure: scatter of Score (y) by school, with the school means (y_mean) connected by a thick black line. x-axis: school; y-axis: y.]

Individual regressions (no-pooling approach)

statsby inter=_b[_cons] slope=_b[x1], by(school) saving(ols, replace): regress y x1
sort school
merge school using ols
drop _merge
gen yhat_ols = inter + slope*x1
sort school x1
separate y, by(school)
separate yhat_ols, by(school)
twoway connected yhat_ols1-yhat_ols65 x1 || lfit y x1, clwidth(thick) clcolor(black) legend(off) ytitle(y)

[Figure: one fitted OLS line per school, with the pooled linear fit in thick black. x-axis: Reading test; y-axis: y.]
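The `merge school using ols` line above uses the older merge syntax. In Stata 11 and later the same many-to-one match is usually written with an explicit merge type. A minimal sketch, assuming `ols.dta` is the file saved by `statsby` above, with one row per school:

```stata
* Assumes ols.dta holds one row per school (inter, slope), as saved by statsby
merge m:1 school using ols
drop _merge
```

With the `m:1` form the data in memory does not have to be sorted by school first.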

Varying-intercept model (null)

$y_i = \alpha_{j[i]} + \epsilon_i$

. xtmixed y || school: , mle nolog

Mixed-effects ML regression                     Number of obs      =      4059
Group variable: school                          Number of groups   =        65
                                                Obs per group: min =         2
                                                               avg =      62.4
                                                               max =       198
                                                Wald chi2(0)       =         .
Log likelihood = -14851.50                      Prob > chi2        =         .

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.1317104   .5362722    -0.25   0.806    -1.182784    .9193634

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Identity             |
                   sd(_cons) |   4.106553   .3999163      3.392995    4.970174
-----------------------------+------------------------------------------------
                sd(Residual) |   9.207357   .1030214      9.007636    9.411505

LR test vs. linear regression: chibar2(01) = 498.7    Prob >= chibar2 = 0.0000

Reading the output:
- _cons: mean of the school-level intercepts
- sd(_cons): standard deviation at the school level (level 2)
- sd(Residual): standard deviation at the individual level (level 1)
- LR test: H0: random effects = 0

Intraclass correlation:

$$\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{sd(\_cons)^2}{sd(\_cons)^2 + sd(Residual)^2} = \frac{4.11^2}{4.11^2 + 9.21^2} \approx 0.17$$

If the intraclass correlation (ICC) approaches 0, then the grouping by schools (or entities) is of no use (you may as well run a simple regression). If the ICC approaches 1, then there is no variance to explain at the individual level: everybody within a group is the same. An intraclass correlation tells you about the correlation of the observations (cases) within a cluster (http://www.ats.ucla.edu/stat/stata/library/cpsu.htm).
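The intraclass correlation for the null model can be reproduced by hand from the two standard deviations in the random-effects table. A minimal sketch; the scalar names `sd_u` and `sd_e` are illustrative, and the values are typed in from the output rather than read from stored results:

```stata
* sd(_cons): standard deviation at the school level
scalar sd_u = 4.106553
* sd(Residual): standard deviation at the individual level
scalar sd_e = 9.207357
* Share of the total variance that lies between schools
display "ICC = " %5.3f sd_u^2 / (sd_u^2 + sd_e^2)
```

The result should round to about 0.17, matching the calculation on the slide.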

Varying-intercept model (one level-1 predictor)

$y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$

. xtmixed y x1 || school: , mle nolog

Mixed-effects ML regression                     Number of obs      =      4059
Group variable: school                          Number of groups   =        65
                                                Obs per group: min =         2
                                                               avg =      62.4
                                                               max =       198
                                                Wald chi2(1)       =   2042.57
Log likelihood = -14024.799                     Prob > chi2        =    0.0000

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5633697   .0124654    45.19   0.000     .5389381    .5878014
       _cons |   .0238706   .4002258     0.06   0.952    -.7605576    .8082987

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Identity             |
                   sd(_cons) |   3.035271   .3052516      2.492262    3.696592
-----------------------------+------------------------------------------------
                sd(Residual) |   7.521481   .0841759      7.358295    7.688285

LR test vs. linear regression: chibar2(01) = 403.27   Prob >= chibar2 = 0.0000

Reading the output:
- _cons: mean of the school-level intercepts
- sd(_cons): standard deviation at the school level (level 2)
- sd(Residual): standard deviation at the individual level (level 1)
- LR test: H0: random effects = 0

Intraclass correlation:

$$\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{sd(\_cons)^2}{sd(\_cons)^2 + sd(Residual)^2} = \frac{3.03^2}{3.03^2 + 7.52^2} \approx 0.14$$

Varying-intercept, varying-coefficient model

$y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i$

. xtmixed y x1 || school: x1, mle nolog covariance(unstructure)

Mixed-effects ML regression                     Number of obs      =      4059
Group variable: school                          Number of groups   =        65
                                                Obs per group: min =         2
                                                               avg =      62.4
                                                               max =       198
                                                Wald chi2(1)       =    779.80
Log likelihood = -14004.613                     Prob > chi2        =    0.0000

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5567291   .0199367    27.92   0.000     .5176539    .5958043
       _cons |  -.1150841   .3978336    -0.29   0.772    -.8948236    .6646554

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Unstructured         |
                      sd(x1) |   .1205631   .0189827      .0885508    .1641483
                   sd(_cons) |   3.007436   .3044138      2.466252    3.667375
              corr(x1,_cons) |   .4975474   .1487416      .1572843    .7322131
-----------------------------+------------------------------------------------
                sd(Residual) |   7.440788   .0839482      7.278059    7.607157

LR test vs. linear regression: chi2(3) = 443.64       Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Reading the output:
- _cons: mean of the school-level intercepts
- sd(x1), sd(_cons): standard deviations at the school level (level 2)
- sd(Residual): standard deviation at the individual level (level 1)
- LR test: H0: random effects = 0

Intraclass correlation:

$$\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{sd(\_cons)^2 + sd(x1)^2}{sd(\_cons)^2 + sd(x1)^2 + sd(Residual)^2} = \frac{0.12^2 + 3.01^2}{0.12^2 + 3.01^2 + 7.44^2} \approx 0.14$$

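Varying-slope model

$y_i = \alpha + \beta_{j[i]} x_i + \epsilon_i$

. xtmixed y x1 || _all: R.x1, mle nolog

Mixed-effects ML regression                     Number of obs      =      4059
Group variable: _all                            Number of groups   =         1
                                                Obs per group: min =      4059
                                                               avg =    4059.0
                                                               max =      4059
                                                Wald chi2(1)       =   2186.09
Log likelihood = -14226.433                     Prob > chi2        =    0.0000

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5950551   .0127269    46.76   0.000     .5701108    .6199995
       _cons |   -.011948   .1263914    -0.09   0.925    -.2596706    .2357746

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
                    sd(R.x1) |   .0003388   .1806391             0           .
-----------------------------+------------------------------------------------
                sd(Residual) |   8.052417   .0893722      7.879142    8.229502

LR test vs. linear regression: chibar2(01) = 0.00     Prob >= chibar2 = 1.0000

Reading the output:
- _cons: mean of the group-level intercepts
- sd(R.x1): standard deviation at the group level (level 2)
- sd(Residual): standard deviation at the individual level (level 1)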

Postestimation

Comparing models using the likelihood-ratio test

Use the likelihood-ratio test (lrtest) to compare models fitted by maximum likelihood. This test compares the log likelihoods (shown in the output) of two models and tests whether they are significantly different.

/* Fitting random intercepts and storing results */
quietly xtmixed y x1 || school: , mle nolog
estimates store ri

/* Fitting random coefficients and storing results */
quietly xtmixed y x1 || school: x1, mle nolog covariance(unstructure)
estimates store rc

/* Running the likelihood-ratio test to compare */
lrtest ri rc

. lrtest ri rc

Likelihood-ratio test                                 LR chi2(2)  =     40.37
(Assumption: ri nested in rc)                         Prob > chi2 =    0.0000

Note: LR test is conservative

The null hypothesis is that there is no significant difference between the two models. If Prob > chi2 is below 0.05, you may reject the null and conclude that there is a statistically significant difference between the models. In the example above we reject the null and conclude that the random-coefficients model provides a better fit: it has the higher (less negative) log likelihood.
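The LR statistic reported by lrtest can be checked by hand: it is twice the difference between the two log likelihoods, compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters (here 2: the slope standard deviation and its correlation with the intercept). A minimal sketch, with the log likelihoods typed in from the two xtmixed outputs; the scalar names are illustrative:

```stata
* Log likelihoods reported by the random-intercept and random-coefficient fits
scalar ll_ri = -14024.799
scalar ll_rc = -14004.613
* LR statistic: twice the difference in log likelihoods
scalar lr = 2*(ll_rc - ll_ri)
display "LR chi2(2)  = " %6.2f lr
* Upper-tail chi-squared probability with 2 degrees of freedom
display "Prob > chi2 = " %6.4f chi2tail(2, lr)
```

This should reproduce the 40.37 and 0.0000 shown by lrtest.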

Varying-intercept, varying-coefficient model: postestimation

. xtmixed y x1 || school: x1, mle nolog covariance(unstructure) variance

Mixed-effects ML regression                     Number of obs      =      4059
Group variable: school                          Number of groups   =        65
                                                Obs per group: min =         2
                                                               avg =      62.4
                                                               max =       198
                                                Wald chi2(1)       =    779.80
Log likelihood = -14004.613                     Prob > chi2        =    0.0000

           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5567291   .0199367    27.92   0.000     .5176539    .5958043
       _cons |  -.1150841   .3978336    -0.29   0.772    -.8948236    .6646554

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Unstructured         |
                     var(x1) |   .0145355   .0045772      .0078412    .0269446
                  var(_cons) |    9.04467    1.83101      6.082398    13.44964
               cov(x1,_cons) |   .1804036   .0691515      .0448692     .315938
-----------------------------+------------------------------------------------
               var(Residual) |   55.36533   1.249282      52.97014    57.86883

LR test vs. linear regression: chi2(3) = 443.64       Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Reading the output:
- _cons: mean of the school-level intercepts
- var(x1), var(_cons): variances at the school level (level 2)
- var(Residual): variance at the individual level (level 1)

Intraclass correlation:

$$\text{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} = \frac{var(\_cons) + var(x1)}{var(\_cons) + var(x1) + var(Residual)} = \frac{0.014 + 9.045}{0.014 + 9.045 + 55.365} \approx 0.14$$

Postestimation: variance-covariance matrix

. xtmixed y x1 || school: x1, mle nolog covariance(unstructure) variance

  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
school: Unstructured         |
                     var(x1) |   .0145355   .0045772      .0078412    .0269446
                  var(_cons) |    9.04467    1.83101      6.082398    13.44964
               cov(x1,_cons) |   .1804036   .0691515      .0448692     .315938
-----------------------------+------------------------------------------------
               var(Residual) |   55.36533   1.249282      52.97014    57.86883

LR test vs. linear regression: chi2(3) = 443.64       Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.

Variance-covariance matrix:

. estat recovariance

Random-effects covariance matrix for level school

             |        x1      _cons
-------------+----------------------
          x1 |  .0145355
       _cons |  .1804036    9.04467

. estat recovariance, correlation

Random-effects correlation matrix for level school

             |        x1      _cons
-------------+----------------------
          x1 |         1
       _cons |  .4975474          1

The correlation between the intercept and x1 shows a close relationship between the average of y and x1.
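The correlation matrix follows from the covariance matrix by dividing the covariance by the product of the two standard deviations. As a worked check with the estimates reported above:

```latex
\operatorname{corr}(x1, \_cons)
  = \frac{\operatorname{cov}(x1, \_cons)}{\sqrt{\operatorname{var}(x1)\,\operatorname{var}(\_cons)}}
  = \frac{0.1804036}{\sqrt{0.0145355 \times 9.04467}}
  \approx \frac{0.1804}{0.3626}
  \approx 0.4975
```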

Postestimation: estimating random effects (group-level errors)

$y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i$

$y_i = \underbrace{\alpha + \beta x_i}_{\text{fixed effects}} + \underbrace{u_{\alpha,j[i]} + u_{\beta,j[i]}\, x_i}_{\text{random effects}} + \epsilon_i$

To estimate the random effects u, use the command predict with the option reffects. This gives you the best linear unbiased predictions (BLUPs) of the random effects, which show how much each group's intercept and slope deviate from the estimated fixed coefficients.

After running xtmixed, type:

predict u*, reffects

Two new variables are created:

u1   BLUP r.e. for school: x1       /* u_beta  */
u2   BLUP r.e. for school: _cons    /* u_alpha */

Postestimation: estimating random effects (group-level errors)

Fixed effects only: $y = -0.12 + 0.56\,x1$

Fixed plus random effects: $y = -0.12 + 0.56\,x1 + u_\alpha + u_\beta\,x1$

To explore some results, type:

bysort school: generate groups = (_n==1)   /* _n==1 selects the first case of each group */
list school u2 u1 if school<=10 & groups

. list school u2 u1 if school<=10 & groups

      | school          u2          u1 |
      |--------------------------------|
   1. |      1    3.749336    .1249755 |
  74. |      2    4.702129    .1647261 |
 129. |      3    4.797682    .0808666 |
 181. |      4    .3502505    .1271821 |
 260. |      5    2.462805    .0720576 |
 295. |      6    5.183809    .0586242 |
 375. |      7    3.640942   -.1488697 |
 463. |      8    -.121886    .0068855 |
 565. |      9   -1.767982   -.0886194 |
 599. |     10   -3.139076   -.1360763 |

Here u2 and u1 are the group-level errors for the intercept and the slope, respectively. For the first school the equation would be:

$$y_1 = -0.12 + 0.56\,x1 + 3.75 + 0.12\,x1 = (-0.12 + 3.75) + (0.56 + 0.12)\,x1 = 3.63 + 0.68\,x1$$

Postestimation: estimating intercept/slope

$$y_1 = -0.12 + 0.56\,x1 + 3.75 + 0.12\,x1 = (-0.12 + 3.75) + (0.56 + 0.12)\,x1 = 3.63 + 0.68\,x1$$

To estimate intercepts and slopes per school, type:

gen intercept = _b[_cons] + u2
gen slope = _b[x1] + u1
list school intercept slope if school<=10 & groups

Compare the result for school 1 with the coefficients in the equation above.

. list school intercept slope if school<=10 & groups

      | school   intercept       slope |
      |--------------------------------|
   1. |      1    3.634251    .6817045 |
  74. |      2    4.587045    .7214552 |
 129. |      3    4.682596    .6375957 |
 181. |      4    .2351664    .6839111 |
 260. |      5    2.347721    .6287867 |
 295. |      6    5.068725    .6153533 |
 375. |      7    3.525858    .4078594 |
 463. |      8   -.2369701    .5636145 |
 565. |      9   -1.883067    .4681097 |
 599. |     10   -3.254161    .4206528 |

Postestimation: fitted values

Using intercept and slope you can estimate yhat. Type:

gen yhat = intercept + (slope*x1)

Or, after xtmixed, type:

predict yhat_fit, fitted
list school yhat yhat_fit if school<=10 & groups

. list school yhat yhat_fit if school<=10 & groups

      | school        yhat    yhat_fit |
      |--------------------------------|
   1. |      1     -1.4943     -1.4943 |
  74. |      2    -15.3951    -15.3951 |
 129. |      3   -7.179871   -7.179871 |
 181. |      4    -15.8805    -15.8805 |
 260. |      5   -5.193317   -5.193318 |
 295. |      6   -3.836668   -3.836667 |
 375. |      7   -6.084939   -6.084939 |
 463. |      8   -13.98353   -13.98353 |
 565. |      9     -15.609     -15.609 |
 599. |     10   -9.341847   -9.341847 |
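The two columns should agree up to rounding in the last stored digit, since both are the fixed part plus the predicted random effects. A quick way to confirm this over the whole dataset rather than the listed rows; a sketch assuming yhat and yhat_fit were created as shown above (the variable name diff is illustrative):

```stata
* Assumes yhat and yhat_fit already exist in memory
generate double diff = abs(yhat - yhat_fit)
* The maximum should be close to zero (float rounding only)
summarize diff
```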

Postestimation: fitted values (graph)

You can plot individual regressions. Type:

twoway connected yhat_fit x1 if school<=10, connect(l)

[Figure: fitted lines for the first ten schools. x-axis: Reading test; y-axis: Fitted values: xb + Zu.]

Postestimation: residuals

After xtmixed you can get the residuals by typing:

predict resid, residuals
predict resid_std, rstandard   /* residuals/sd(Residual) */

A quick check for normality in the residuals:

qnorm resid_std

[Figure: normal quantile plot. x-axis: Inverse Normal; y-axis: Standardized residuals.]

Useful links / Recommended books / References

DSS Online Training Section: http://dss.princeton.edu/training/
UCLA Resources: http://www.ats.ucla.edu/stat/
Princeton DSS Libguides: http://libguides.princeton.edu/dss

Books/References

Beyond "Fixed Versus Random Effects": A Framework for Improving Substantive and Statistical Analysis of Panel, Time-Series Cross-Sectional, and Multilevel Data / Brandon Bartels. http://polmeth.wustl.edu/retrieve.php?id=838
Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence / Daniel Hoechle. http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
An Introduction to Modern Econometrics Using Stata / Christopher F. Baum. Stata Press, 2006.
Data Analysis Using Regression and Multilevel/Hierarchical Models / Andrew Gelman, Jennifer Hill. Cambridge; New York: Cambridge University Press, 2007.
Data Analysis Using Stata / Ulrich Kohler, Frauke Kreuter. 2nd ed., Stata Press, 2009.
Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba. Princeton University Press, 1994.
Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J.: Prentice Hall, 2008.
Introduction to Econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson Addison Wesley, 2007.
Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods / Sam Kachigan. New York: Radius Press, c1986.
Statistics with Stata (updated for version 9) / Lawrence Hamilton. Thomson Brooks/Cole, 2006.
Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King. Cambridge University Press, 1989.