# Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Size: px
Start display at page:

Download "Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:"

## Transcription

1 Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random effect modeling. Compare correlation specification Interpret model coefficients Use the pig data which is in wide format:. ** clear any existing data **. clear. ** increase the memory **. set memory 40m. ** increase the number of variables you can include in the dataset **. set matsize 500. ** read in the data set **. ** Make sure you specify the path correctly! **. use pigs.stata.dta, clear. ** alternatively click on: file>open>pigs.stata.dta **. ** reshape to long **. reshape long week, i(id) j(time). rename week weight. ** sort data appropriately **. sort Id time. ** spaghetti plot data **. twoway line weight time, c(l) weight time

2 Part A. Ordinary Least Squares (OLS) by hand using Stata matrix functions We wish to fit a model given by: Weight where i = pig ID and j = time.. gen intercept=1. mkmat intercept time, matrix(x) 2 ij = β 0 + β1 Timeij + ε ij ε ij ~ N( 0, σ. **look at which matrices Stata has defined **. matrix dir X[432,2]. ** read in outcome variable as a vector **. mkmat weight, matrix(y). ** use definition of OLS estimates to get parameter estimates **. matrix betahatols = invsym(x'*x)*x'*y. ** look at your estimates **. matrix list betahatols betahatols[2,1] weight intercept time Check and see if your estimates are the same as Stata:. regress weight time Source SS df MS Number of obs = F( 1, 430) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = weight Coef. Std. Err. t P> t [95% Conf. Interval] time _cons )

3 Part B. Weighted Least Squares (WLS)!!!WARNING!!! We can find wls in STATA in the options of xtreg, be. What does xtreg do? Search in the STATA help file.. ** Try the be and wls option **. xtreg weight time, be wls i(id) Between regression (regression on group means) Number of obs = 432 Group variable: Id Number of groups = 48 R-sq: within =. Obs per group: min = 9 between = avg = 9.0 overall = max = 9 F(0,47) = 0.00 sd(u_i + avg(e_i.))= Prob > F =. weight Coef. Std. Err. t P> t [95% Conf. Interval] time (dropped) _cons What model did Stata fit? Stata doesn't really have a generic WLS command. To fit WLS ourselves, we first get a uniform (exchangeable) correlation estimate, using the GEE routine for convenience. Another possibility is to average all the correlations in the ACF.. ** the quietly option hides the results from the output window **. quietly xtgee weight time, i(id) corr(exch). xtcorr Estimated within-id correlation matrix R: c1 c2 c3 c4 c5 c6 c7 c8 c9 r r r r r r r r r So a good estimate of rho is Create WLS weighting matrix by hand. Use uniform correlation formula: Corr = (1-rho)I + rho(11'). ** Using Stata matrix commands **

4 . ** This is one way to define a number in Stata, since the **. ** STATA dataset only deals with variables **. matrix rho= matrix list rho symmetric rho[1,1] c1 r ** create a vector of 1s, with the length as the number of **. ** observations within each subject, here is 9 **. matrix one = (1\1\1\1\1\1\1\1\1). matrix list one one[9,1] c1 r1 1 r2 1 r3 1 r4 1 r5 1 r6 1 r7 1 r8 1 r9 1. ** create the matrix of all 1s **. matrix oneonet = one*(one'). matrix list oneonet symmetric oneonet[9,9] r1 r2 r3 r4 r5 r6 r7 r8 r9 r1 1 r2 1 1 r r r r r r r ** symmetric means it is a symmetric matrix, and only half of the ** values are displayed. ** ** create the identity matrix using the I() command, **. matrix I = I(9). matrix list I symmetric I[9,9] c1 c2 c3 c4 c5 c6 c7 c8 c9 r1 1 r2 0 1 r r r r r

5 r r ** create the correlation matrix according to the fomular **. matrix R = (1-rho)*I + rho*oneonet. matrix list R symmetric R[9,9] r1 r2 r3 r4 r5 r6 r7 r8 r9 r1 1 r r r r r r r r Trick: use xtgee command to do the WLS with the created correlation matrix. But GEE and WLS are not equivalent.. xtgee weight time, i(id) t(time) corr(fixed R) Iteration 1: tolerance = 7.330e-15 GEE population-averaged model Number of obs = 432 Group and time vars: Id time Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: fixed (specified) max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = time _cons WLS and GEE are for marginal models, the estimated coefficients correspond to population-average effect. Although the estimated coefficients are the same as the model without considering withsubject correlation, the standard errors change, our inference changes. Try WLS with another uniform correlation, like 0.5 for fun!. matrix rho=0.5. matrix one = (1\1\1\1\1\1\1\1\1). matrix oneonet = one*(one'). matrix I = I(9). matrix R = (1-rho)*I + rho*oneonet

6 . matrix list R symmetric R[9,9] r1 r2 r3 r4 r5 r6 r7 r8 r9 r1 1 r2.5 1 r r r r r r r Do WLS with new weighting (correlation) matrix and compare:. xtgee weight time, i(id) t(time) corr(fixed R) Iteration 1: tolerance = 1.047e-14 GEE population-averaged model Number of obs = 432 Group and time vars: Id time Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: fixed (specified) max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = time _cons The estimated coefficients are still the same, the standard errors change.

7 Part C. Comparison of the correlation structures: independent, exchangeable, ARI, and unstructured. Correlation Structure Independent Exchangeable ARI Unstructured Model OLS GEE and Random Effect Model GEE and Random Effect Model GEE I. Independent correlation structure: (1) Ordinary Least Squares. regress weight time Source SS df MS Number of obs = F( 1, 430) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = weight Coef. Std. Err. t P> t [95% Conf. Interval] time _cons predict wtpred. twoway line weight wtpred time,c(l) clwidth(thin thick) time weight Fitted values. drop wtpred

8 (2) GEE with independent correlation:. xtgee weight time, i(id) corr(ind) Iteration 1: tolerance = 1.971e-15 GEE population-averaged model Number of obs = 432 Group variable: Id Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: independent max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = Pearson chi2(432): Deviance = Dispersion (Pearson): Dispersion = time _cons xtcorr Estimated within-id correlation matrix R: c1 c2 c3 c4 c5 c6 c7 c8 c9 r r r r r r r r r We can see that they give the same results. The slight difference in the standard error of the coefficients should come from that GEE needs iteration for estimation. Alternative commands (xtreg, pa) is just GEE estimate.

9 II. Exchangeable (uniform) correlation structure: Here we assume the correlation between measurements within the same pig Cor(Y ij, Y ij ) is a constant. Specifically, this correlation is independent of the time between measurements. (1) Marginal specification of uniform correlation via GEE: GEE is one method to analyze longitudinal data when marginal correlation structure is specified. We wish to fit the model: Weight ij 2 = β + β Time + ε ε N( 0, σ ) cor( ε, ε ) = ρ 0 1 ij ij ij ~ ij ij' Alternatively: ( ) Var Weight i 1 2 ρ = σ ρ ρ ρ 1 K ρ ρ M 1 ρ ρ ρ ρ 1. xtgee weight time, i(id) corr(exch) Iteration 1: tolerance = 5.585e-15 GEE population-averaged model Number of obs = 432 Group variable: Id Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: exchangeable max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = time _cons ** Alternative to fit marginal uniform corr with GEE **. xtreg weight time, pa i(id) corr(exch) To look at the estimated correlation matrix:. xtcorr Estimated within-id correlation matrix R: c1 c2 c3 c4 c5 c6 c7 c8 c9 r r r r

10 Plot of the predicted growth trend:. xtreg weight time, pa i(id) corr(exch). predict wtpred. twoway line weight wtpred time,c(l) clwidth(thin thick) time weight xb. drop wtpred We can see it is the same as the one from independent correlation structure. This is because in linear regression, the estimate of the mean function coefficient is independent of the estimate of the variance and the correlation structure. (2) Conditional specification of uniform correlation using random effect model: Random effect model specifies the correlation structure through the conditional distribution, conditioned on the random effects U_i. We can derive the marginal correlation structure from the conditional distribution. The marginal correlation structure induced from random intercept model is exchangeable. We wish to fit the model: Weight β β ε 2 ~ N( 0, σ ) 2 ε ~ N( 0, σ ) ij = 0 + ui + 1 Timeij + ij ui u ij ε This gives the same variance structure 1 ρ ρ ρ 2 2 ρ 1 M ρ Var ( Weight i ) = ( σ τ + σ u ) ρ 1 ρ where K ρ ρ ρ 1 σ ρ = σ σ 2 u 2 + τ 2 u

11 The Stata command xtreg allow two methods to fit the random effect model: GLS and MLE.. xtreg weight time, re i(id) Random-effects GLS regression Number of obs = 432 Group variable: Id Number of groups = 48 R-sq: within = Obs per group: min = 9 between = avg = 9.0 overall = max = 9 Random effects u_i ~ Gaussian Wald chi2(1) = corr(u_i, X) = 0 (assumed) Prob > chi2 = time _cons sigma_u sigma_e rho (fraction of variance due to u_i). xtreg weight time, re i(id) mle time _cons /sigma_u /sigma_e rho Likelihood-ratio test of sigma_u=0: chibar2(01)= Prob>=chibar2 = The MLE method for fitting random effect model can also be obtained by the following command:. xtreg weight time, i(id) mle It performs exactly the same procedure as the previous command. Plot the predicted growth trend from the random effect model:. quietly xtreg weight time, re i(id) mle. ** predict the marginal mean function **. predict wtpred (option xb assumed; fitted values).** predict the individual pig mean function **. predict wtpredi, xbu

12 . twoway line weight wtpredi wtpred time,c(l) time weight Xb Xb + u[id]. drop wtpred wtpredi It can be seen that the marginal mean growth trend which is averaged from the individual growth trend is the same as the one estimated using marginal specification of the correlation structure. III. AR1 correlation structure: (1) Marginal specification of AR1 correlation structure:. xtgee weight time, i(id) corr(ar1) t(time) Iteration 1: tolerance = Iteration 2: tolerance = Iteration 3: tolerance = 4.366e-07 GEE population-averaged model Number of obs = 432 Group and time vars: Id time Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: AR(1) max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = time _cons

13 . xtcorr Estimated within-id correlation matrix R: c1 c2 c3 c4 c5 c6 c7 c8 c9 r r r r r r r r r The AR parameter is , the correlation with time lag 1. (2) Conditional specification of AR1 correlation using random effect model: ** xtregar cannot use the variable specificed for time as covariate in ** regression. ** generate variable "week" for the tsset command **. gen week=time. xtset Id week panel variable: Id, 1 to 48 time variable: week, 1 to 9. xtregar weight time RE GLS regression with AR(1) disturbances Number of obs = 432 Group variable: Id Number of groups = 48 R-sq: within = Obs per group: min = 9 between = avg = 9.0 overall = max = 9 Wald chi2(2) = corr(u_i, Xb) = 0 (assumed) Prob > chi2 = time _cons rho_ar (estimated autocorrelation coefficient) sigma_u sigma_e rho_fov (fraction of variance due to u_i) theta We can see the results are different from the marginal specification, in both the standard error of beta and in the AR parameter.. ** predict the marginal mean trend **. predict wtpred. ** predict the random intercept **. predict ui, u (48 missing values generated) (48 missing values generated). ** generate the individual mean trend **

14 . gen wtpredi = wtpred + ui (48 missing values generated). twoway (line weight time, c(l)) (line wtpredi time, c(l) ) (line wpred time, lwidth(thick)) time weight Linear prediction wtpredi. drop wtpred wtpredi u The individual growth trend is different from the ones with uniform correlation. This is because we give different assumption for the individual random intercept. IV. Unstructured correlation: All the above are parametric methods, because we make assumptions for the form of the correlation structure. The unstructured methods is non-parametric, therefore it will reflect the actual correlation structure in the data. However, it has the drawback that it takes away too much of the degree of freedom in the model fitting, since every single element in the correlation matrix is estimated as a parameter. The guideline would be: when the number of observations for each subject is not big, i.e. the correlation matrix is not in large dimension, it is suggested to compare the unstructured inference with other parametric inference. If they are close, then we used the parametric inference because they are more statistically powerful and does not lose much information. If they are not close, unstructured inference should be used. However, when the correlation matrix is too large, we have to use parametric methods.

15 . xtgee weight time, i(id) corr(unstr) t(time) Iteration 1: tolerance = Iteration 2: tolerance = Iteration 3: tolerance = Iteration 4: tolerance = Iteration 5: tolerance = Iteration 6: tolerance = 3.304e-06 Iteration 7: tolerance = 7.981e-07 GEE population-averaged model Number of obs = 432 Group and time vars: Id time Number of groups = 48 Link: identity Obs per group: min = 9 Family: Gaussian avg = 9.0 Correlation: unstructured max = 9 Wald chi2(1) = Scale parameter: Prob > chi2 = time _cons working correlation matrix not positive definite convergence not achieved. xtcorr c1 c2 c3 c4 c5 c6 c7 c8 c9 r r r r r r r r r We can see the estimated correlation matrix is neither uniform or AR1. More importantly, the standard error for betas are very different from the ones with uniform and AR1 correlation structure. So should use unstructured. Q1 Does this look stationary? No!! Correlations with the same time lag are different. So the correlation does not depend on the length of lag only. Q2 How can we program an "unstructured" correlation structure with random effects? No, you couldn t, random effects model is completely parametric.

16 Finally, a comparison of regression coefficients and standard errors: Cor Structure Model Time se (Time) Independence OLS Exchangeable WLS rho = rho = GEE rho = RE rho = AR1 GEE AR = RE AR = Unstructured GEE

### Sample Size Calculation for Longitudinal Studies

Sample Size Calculation for Longitudinal Studies Phil Schumm Department of Health Studies University of Chicago August 23, 2004 (Supported by National Institute on Aging grant P01 AG18911-01A1) Introduction

### Milk Data Analysis. 1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED

1. Objective Introduction to SAS PROC MIXED Analyzing protein milk data using STATA Refit protein milk data using PROC MIXED 2. Introduction to SAS PROC MIXED The MIXED procedure provides you with flexibility

### Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)

Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation

### DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10

### Correlated Random Effects Panel Data Models

INTRODUCTION AND LINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. The Linear

### From the help desk: Swamy s random-coefficients model

The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

### xtmixed & denominator degrees of freedom: myth or magic

xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or

### Multilevel Analysis and Complex Surveys. Alan Hubbard UC Berkeley - Division of Biostatistics

Multilevel Analysis and Complex Surveys Alan Hubbard UC Berkeley - Division of Biostatistics 1 Outline Multilevel data analysis Estimating specific parameters of the datagenerating distribution (GEE) Estimating

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2)

Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2) Oscar Torres-Reyna otorres@princeton.edu December 2007 http://dss.princeton.edu/training/ Intro Panel data (also known as longitudinal

### Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005

Panel Data Analysis Josef Brüderl, University of Mannheim, March 2005 This is an introduction to panel data analysis on an applied level using Stata. The focus will be on showing the "mechanics" of these

### HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009

HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal

### Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### STATA Commands for Unobserved Effects Panel Data

STATA Commands for Unobserved Effects Panel Data John C Frain November 24, 2008 Contents 1 Introduction 1 2 Examining Panel Data 4 3 Estimation using xtreg 8 3.1 Introduction..........................................

### Stata Walkthrough 4: Regression, Prediction, and Forecasting

Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting

### Addressing Alternative. Multiple Regression. 17.871 Spring 2012

Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Nonlinear relationships Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Nonlinear relationships Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February, 5 Sources: Berry & Feldman s Multiple Regression in Practice 985; Pindyck and Rubinfeld

### The following postestimation commands for time series are available for regress:

Title stata.com regress postestimation time series Postestimation tools for regress with time series Description Syntax for estat archlm Options for estat archlm Syntax for estat bgodfrey Options for estat

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic

### Interaction effects between continuous variables (Optional)

Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### Discussion Section 4 ECON 139/239 2010 Summer Term II

Discussion Section 4 ECON 139/239 2010 Summer Term II 1. Let s use the CollegeDistance.csv data again. (a) An education advocacy group argues that, on average, a person s educational attainment would increase

### Rockefeller College University at Albany

Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

### MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing Sector

Journal of Modern Accounting and Auditing, ISSN 1548-6583 November 2013, Vol. 9, No. 11, 1519-1525 D DAVID PUBLISHING A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing

### I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS * Hun Myoung Park,

### Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

### A Simple Feasible Alternative Procedure to Estimate Models with High-Dimensional Fixed Effects

DISCUSSION PAPER SERIES IZA DP No. 3935 A Simple Feasible Alternative Procedure to Estimate Models with High-Dimensional Fixed Effects Paulo Guimarães Pedro Portugal January 2009 Forschungsinstitut zur

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### Clustering in the Linear Model

Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### 10 Dichotomous or binary responses

10 Dichotomous or binary responses 10.1 Introduction Dichotomous or binary responses are widespread. Examples include being dead or alive, agreeing or disagreeing with a statement, and succeeding or failing

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### 25 Working with categorical data and factor variables

25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

### Random effects and nested models with SAS

Random effects and nested models with SAS /************* classical2.sas ********************* Three levels of factor A, four levels of B Both fixed Both random A fixed, B random B nested within A ***************************************************/

### Using Stata 9 & Higher for OLS Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 8, 2015

Using Stata 9 & Higher for OLS Regression Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 8, 2015 Introduction. This handout shows you how Stata can be used

### Quick Stata Guide by Liz Foster

by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the

### I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

### especially with continuous

Handling interactions in Stata, especially with continuous predictors Patrick Royston & Willi Sauerbrei German Stata Users meeting, Berlin, 1 June 2012 Interactions general concepts General idea of a (two-way)

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Linear Regression Models with Logarithmic Transformations

Linear Regression Models with Logarithmic Transformations Kenneth Benoit Methodology Institute London School of Economics kbenoit@lse.ac.uk March 17, 2011 1 Logarithmic transformations of variables Considering

### The average hotel manager recognizes the criticality of forecasting. However, most

Introduction The average hotel manager recognizes the criticality of forecasting. However, most managers are either frustrated by complex models researchers constructed or appalled by the amount of time

### Regression Analysis (Spring, 2000)

Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity

### E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

### Longitudinal Data Analysis: Stata Tutorial

Part A: Overview of Stata I. Reading Data: Longitudinal Data Analysis: Stata Tutorial use Read data that have been saved in Stata format. infile Read raw data and dictionary files. insheet Read spreadsheets

### Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

### Does corporate performance predict the cost of equity capital?

AMERICAN JOURNAL OF SOCIAL AND MANAGEMENT SCIENCES ISSN Print: 2156-1540, ISSN Online: 2151-1559, doi:10.5251/ajsms.2011.2.1.26.33 2010, ScienceHuβ, http://www.scihub.org/ajsms Does corporate performance

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal

MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims

### Title. Syntax. stata.com. fp Fractional polynomial regression. Estimation

Title stata.com fp Fractional polynomial regression Syntax Menu Description Options for fp Options for fp generate Remarks and examples Stored results Methods and formulas Acknowledgment References Also

### is paramount in advancing any economy. For developed countries such as

Introduction The provision of appropriate incentives to attract workers to the health industry is paramount in advancing any economy. For developed countries such as Australia, the increasing demand for

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies

### Models for Longitudinal and Clustered Data

Models for Longitudinal and Clustered Data Germán Rodríguez December 9, 2008, revised December 6, 2012 1 Introduction The most important assumption we have made in this course is that the observations

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Week 5: Multiple Linear Regression

BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

### Panel Data Analysis in Stata

Panel Data Analysis in Stata Anton Parlow Lab session Econ710 UWM Econ Department??/??/2010 or in a S-Bahn in Berlin, you never know.. Our plan Introduction to Panel data Fixed vs. Random effects Testing

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### SUMAN DUVVURU STAT 567 PROJECT REPORT

SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Competing-risks regression

Competing-risks regression Roberto G. Gutierrez Director of Statistics StataCorp LP Stata Conference Boston 2010 R. Gutierrez (StataCorp) Competing-risks regression July 15-16, 2010 1 / 26 Outline 1. Overview

### Highlights the connections between different class of widely used models in psychological and biomedical studies. Multiple Regression

GLMM tutor Outline 1 Highlights the connections between different class of widely used models in psychological and biomedical studies. ANOVA Multiple Regression LM Logistic Regression GLM Correlated data

### Estimating the random coefficients logit model of demand using aggregate data

Estimating the random coefficients logit model of demand using aggregate data David Vincent Deloitte Economic Consulting London, UK davivincent@deloitte.co.uk September 14, 2012 Introduction Estimation

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### An introduction to GMM estimation using Stata

An introduction to GMM estimation using Stata David M. Drukker StataCorp German Stata Users Group Berlin June 2010 1 / 29 Outline 1 A quick introduction to GMM 2 Using the gmm command 2 / 29 A quick introduction

### 1.1. Simple Regression in Excel (Excel 2010).

.. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > Add-Ins > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### ADVANCED FORECASTING MODELS USING SAS SOFTWARE

ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 gjha_eco@iari.res.in 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting

### Topic 3. Chapter 5: Linear Regression in Matrix Form

Topic Overview Statistics 512: Applied Linear Models Topic 3 This topic will cover thinking in terms of matrices regression on multiple predictor variables case study: CS majors Text Example (NKNW 241)

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Data Analysis Methodology 1

Data Analysis Methodology 1 Suppose you inherited the database in Table 1.1 and needed to find out what could be learned from it fast. Say your boss entered your office and said, Here s some software project

### For more information on the Stata Journal, including information for authors, see the web page. http://www.stata-journal.com

The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A & M University College Station, Texas 77843 979-845-3142; FAX 979-845-3144 jnewton@stata-journal.com Associate Editors Christopher

### Implementing Panel-Corrected Standard Errors in R: The pcse Package

Implementing Panel-Corrected Standard Errors in R: The pcse Package Delia Bailey YouGov Polimetrix Jonathan N. Katz California Institute of Technology Abstract This introduction to the R package pcse is

### 1 Simple Linear Regression I Least Squares Estimation

Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### Econometrics I: Econometric Methods

Econometrics I: Econometric Methods Jürgen Meinecke Research School of Economics, Australian National University 24 May, 2016 Housekeeping Assignment 2 is now history The ps tute this week will go through