Getting Started in Fixed/Random Effects Models using R
(ver. 0.1-Draft)
Oscar Torres-Reyna, Data Consultant
[email protected]
Intro

Panel data (also known as longitudinal or cross-sectional time-series data) is a dataset in which the behavior of entities is observed across time. These entities could be states, companies, individuals, countries, etc. Panel data looks like this: one row per entity-year combination, with columns such as country, year, y, x1, x2, x3 (the example table did not survive transcription).

For a brief introduction to the theory behind panel data analysis, please see:

The contents of this document rely heavily on "Panel Data Econometrics in R: the plm package" and on notes from the ICPSR's Summer Program in Quantitative Methods of Social Research (summer 2010).
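As a sketch of the layout described above, the following builds a small balanced panel in base R. The variable names (country, year, y, x1) mirror the slides, but the values are simulated for illustration; this is not the course dataset.

```r
# Build a toy balanced panel: 7 entities observed over 10 years.
set.seed(123)
toy <- data.frame(
  country = rep(LETTERS[1:7], each = 10),   # 7 entities (n = 7)
  year    = rep(1990:1999, times = 7),      # 10 periods (T = 10)
  y       = rnorm(70),                      # outcome (illustrative values)
  x1      = rnorm(70)                       # predictor (illustrative values)
)
head(toy)    # one row per country-year combination
nrow(toy)    # N = n * T = 70
```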
Exploring panel data

library(foreign)
Panel <- read.dta("...")  # dataset URL not preserved in this transcription

coplot(y ~ year | country, type="l", data=Panel)  # Lines
coplot(y ~ year | country, type="b", data=Panel)  # Points and lines
# Bars at the top indicate the corresponding panel (i.e., countries) from
# left to right, starting on the bottom row (Muenchen/Hilbe: 355)
Exploring panel data

library(foreign)
Panel <- read.dta("...")  # dataset URL not preserved

library(car)
scatterplot(y ~ year | country, boxplots=FALSE, smooth=TRUE, reg.line=FALSE, data=Panel)
FIXED-EFFECTS MODEL
(Covariance Model, Within Estimator, Individual Dummy Variable Model, Least Squares Dummy Variable Model)
Fixed effects: Heterogeneity across countries (or entities)

library(foreign)
Panel <- read.dta("...")  # dataset URL not preserved

library(gplots)
plotmeans(y ~ country, main="Heterogeneity across countries", data=Panel)
# plotmeans draws a 95% confidence interval around the means
detach("package:gplots")  # Remove package gplots from the workspace

Heterogeneity: unobserved variables that do not change over time.
Fixed effects: Heterogeneity across years

library(foreign)
Panel <- read.dta("...")  # dataset URL not preserved

library(gplots)
plotmeans(y ~ year, main="Heterogeneity across years", data=Panel)
# plotmeans draws a 95% confidence interval around the means
detach("package:gplots")  # Remove package gplots from the workspace

Heterogeneity: unobserved variables that do not change over time.
OLS regression

> library(foreign)
> Panel <- read.dta("...")  # dataset URL not preserved
> ols <- lm(y ~ x1, data=Panel)
> summary(ols)

(The summary output -- residual quantiles, coefficient table, R-squared, and F-statistic -- did not survive transcription; x1 was not statistically significant. Residual standard error: 3.028e+09 on 68 degrees of freedom.)

Regular OLS regression does not consider heterogeneity across groups or time.

> yhat <- ols$fitted
> plot(Panel$x1, Panel$y, pch=19, xlab="x1", ylab="y")
> abline(lm(Panel$y ~ Panel$x1), lwd=3, col="red")
Fixed effects using the least squares dummy variable (LSDV) model

> library(foreign)
> Panel <- read.dta("...")  # dataset URL not preserved
> fixed.dum <- lm(y ~ x1 + factor(country) - 1, data=Panel)
> summary(fixed.dum)

(The coefficient table -- x1 plus one dummy per country, A through G -- did not survive transcription; x1 is significant once the country dummies are included. Residual standard error: 2.796e+09 on 62 degrees of freedom; F-statistic on 8 and 62 DF, p-value: 8.892e-06.)

For the theory behind fixed effects, please see:
Least squares dummy variable model

> yhat <- fixed.dum$fitted
> library(car)
> scatterplot(yhat ~ Panel$x1 | Panel$country, boxplots=FALSE, xlab="x1", ylab="yhat", smooth=FALSE)
> abline(lm(Panel$y ~ Panel$x1), lwd=3, col="red")  # OLS regression line, for comparison
Comparing OLS vs. the LSDV model

Each component of the factor variable (country) absorbs the effects particular to each country. Predictor x1 was not significant in the OLS model; once differences across countries are controlled for, x1 becomes significant in OLS_DUM (i.e., the LSDV model).

> library(apsrtable)
> apsrtable(ols, fixed.dum, model.names = c("OLS", "OLS_DUM"))  # Displays a table in LaTeX form

(The LaTeX table produced here did not survive transcription; the rows that did survive show N = 70 for both models, R-squared of 0.01 for OLS vs. 0.44 for OLS_DUM, and adjusted R-squared of 0.37 for OLS_DUM. Standard errors in parentheses; * indicates significance at p < 0.05.)

In the OLS model, the coefficient of x1 indicates how much y changes when x1 increases by one unit (notice x1 is not significant). In the LSDV model, the coefficient of x1 indicates how much y changes over time, controlling for differences across countries, when x1 increases by one unit (notice x1 is significant).

> cat(apsrtable(ols, fixed.dum, model.names = c("OLS", "OLS_DUM"), Sweave=FALSE), file="ols_fixed1.txt")  # Exports the table to a text file (in LaTeX code)
Fixed effects: n entity-specific intercepts (using plm)

> library(plm)
> fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
> summary(fixed)

Here y is the outcome variable and x1 the predictor; index gives the panel setting (entity, time) and model="within" is the fixed-effects option. In the summary header, n = number of groups/panels, T = number of years, N = total number of observations; here, Balanced Panel: n=7, T=10, N=70. (The residual quantiles and coefficient table did not survive transcription; x1 was significant at the 5% level.)

The coefficient of x1 indicates how much y changes over time, on average per country, when x1 increases by one unit.

Pr(>|t|): two-tail p-values test the hypothesis that each coefficient is different from 0. To reject this, the p-value has to be lower than 0.05 (95%; you could also choose an alpha of 0.10); if that is the case, you can say the variable has a significant influence on your dependent variable (y).

The F-statistic tests whether all the coefficients in the model are jointly different from zero; if its p-value is < 0.05, your model is OK.

> fixef(fixed)  # Display the fixed effects (constants for each country, A through G)
> pFtest(fixed, ols)  # Testing for fixed effects, null: OLS better than fixed

F test for individual effects: df1 = 6, df2 = 62 (the F statistic and p-value did not survive transcription). If the p-value is < 0.05, then the fixed-effects model is a better choice.
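The "within" label can be made concrete: the fixed-effects estimate is just OLS after subtracting each entity's mean from y and x1, and it coincides with the slope from the LSDV (dummy-variable) regression. A minimal base-R sketch on simulated data (not the course dataset; plm is not needed for this demonstration):

```r
# Simulate a panel where entity effects are correlated with the regressor,
# so pooled OLS would be biased but the within estimator is not.
set.seed(42)
id    <- rep(1:7, each = 10)           # 7 entities, 10 periods each
alpha <- rnorm(7)[id]                  # entity-specific intercepts
x1    <- rnorm(70) + alpha             # regressor correlated with the effects
y     <- 2 * x1 + alpha + rnorm(70)    # true slope is 2

# LSDV: entity dummies absorb the fixed effects
b_lsdv <- coef(lm(y ~ x1 + factor(id)))["x1"]

# Within transformation: subtract entity means, then run OLS
y_dm  <- y  - ave(y,  id)
x1_dm <- x1 - ave(x1, id)
b_within <- coef(lm(y_dm ~ x1_dm - 1))["x1_dm"]

all.equal(unname(b_lsdv), unname(b_within))   # the two point estimates coincide
```

This equivalence is why the fixed-effects estimator is also called the within estimator: only variation within each entity identifies the slope.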
RANDOM-EFFECTS MODEL
(Random Intercept, Partial Pooling Model)
Random effects (using plm)

> random <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="random")
> summary(random)

Oneway (individual) effect Random Effect Model (Swamy-Arora's transformation). As before, index gives the panel setting (entity, time) and model="random" is the random-effects option; Balanced Panel: n=7, T=10, N=70. The Effects block reports the idiosyncratic and individual variance components, their shares, and theta. (The numeric values and the coefficient table did not survive transcription.)

Interpretation of the coefficients is tricky, since they include both the within-entity and between-entity effects. In the case of TSCS data, the coefficient represents the average effect of x1 on y when x1 changes across time and between countries by one unit. (The Pr(>|t|) and F-statistic interpretation notes from the fixed-effects slide apply here as well.)

# Setting the data as panel data (an alternative way to run the above model)
Panel.set <- plm.data(Panel, index = c("country", "year"))
# Random effects using the panel setting (same output as above)
random.set <- plm(y ~ x1, data = Panel.set, model="random")
summary(random.set)

For the theory behind random effects, please see:
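The theta reported in the Effects block is the quasi-demeaning weight of the Swamy-Arora transformation: the random-effects estimator runs OLS on y_it - theta * ybar_i (and likewise for x1). A sketch of how theta follows from the variance components; the component values here are assumed for illustration, not estimates from the course data:

```r
# Quasi-demeaning parameter for a balanced one-way random-effects model:
# theta = 1 - sqrt(sigma2_e / (sigma2_e + T * sigma2_u))
sigma2_e <- 1.0   # idiosyncratic (within-entity) variance -- assumed value
sigma2_u <- 3.0   # individual (between-entity) variance   -- assumed value
T_len    <- 10    # periods per entity

theta <- 1 - sqrt(sigma2_e / (sigma2_e + T_len * sigma2_u))
theta
# theta near 0 -> the RE estimator collapses to pooled OLS (no entity effect)
# theta near 1 -> the RE estimator approaches the fixed-effects (within) estimator
```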
FIXED OR RANDOM?
Fixed or random: Hausman test

To decide between fixed or random effects you can run a Hausman test, where the null hypothesis is that the preferred model is random effects and the alternative is fixed effects (see Greene, 2008, chapter 9). It basically tests whether the unique errors (u_i) are correlated with the regressors; the null hypothesis is that they are not. Run a fixed-effects model and save the estimates, then run a random-effects model and save the estimates, then perform the test. For a single coefficient, the statistic is H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), distributed chi-squared under the null. If the p-value is significant (for example < 0.05), use fixed effects; if not, use random effects.

> phtest(fixed, random)

Hausman Test: chisq = 3.674, df = 1 (the p-value did not survive transcription); alternative hypothesis: one model is inconsistent. If the p-value is < 0.05, then use fixed effects.

For the theory behind fixed/random effects, please see:
OTHER TESTS / DIAGNOSTICS
Testing for time-fixed effects

> library(plm)
> fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
> fixed.time <- plm(y ~ x1 + factor(year), data=Panel, index=c("country", "year"), model="within")
> summary(fixed.time)

(Balanced Panel: n=7, T=10, N=70; the coefficient table for x1 and the year dummies did not survive transcription. F-statistic on 10 and 53 DF.)

> # Testing time-fixed effects. The null is that no time-fixed effects are needed
> pFtest(fixed.time, fixed)

F test for individual effects: F = 1.209, df1 = 9, df2 = 53 (p-value lost in transcription). If the p-value is < 0.05, use time-fixed effects; in this example, there is no need for time-fixed effects.

> plmtest(fixed, c("time"), type=("bp"))

Lagrange Multiplier Test - time effects (Breusch-Pagan): df = 1 (statistic and p-value lost in transcription).
Testing for random effects: Breusch-Pagan Lagrange multiplier (LM)

> # Regular OLS (pooling model) using plm
> pool <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="pooling")
> summary(pool)

(Pooling Model; Balanced Panel: n=7, T=10, N=70; the coefficient table did not survive transcription.)

The LM test helps you decide between a random-effects regression and a simple OLS regression. The null hypothesis in the LM test is that the variance across entities is zero; that is, no significant difference across units (i.e., no panel effect).

> # Breusch-Pagan Lagrange multiplier for random effects. Null is no panel effect (i.e., OLS better).
> plmtest(pool, type=c("bp"))

Lagrange Multiplier Test - (Breusch-Pagan): df = 1 (statistic and p-value lost in transcription).

Here we failed to reject the null and conclude that random effects is not appropriate; that is, there is no evidence of significant differences across countries, so you can run a simple OLS regression.
Testing for cross-sectional dependence/contemporaneous correlation: Breusch-Pagan LM test of independence and Pesaran CD test

According to Baltagi, cross-sectional dependence is a problem in macro panels with long time series. It is not much of a problem in micro panels (few years and a large number of cases). The null hypothesis in the B-P/LM and Pesaran CD tests of independence is that residuals across entities are not correlated; both tests check for such correlation*. Cross-sectional dependence (also called contemporaneous correlation) can lead to bias in test results.

> fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
> pcdtest(fixed, test = c("lm"))

Breusch-Pagan LM test for cross-sectional dependence in panels: df = 21 (statistic and p-value lost in transcription).

> pcdtest(fixed, test = c("cd"))

Pesaran CD test for cross-sectional dependence in panels (statistic and p-value lost in transcription). Conclusion in the slide: no cross-sectional dependence.

*Source: Hoechle, Daniel, "Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence".
Testing for serial correlation

Serial correlation tests apply to macro panels with long time series; it is not a problem in micro panels (with very few years). The null is that there is no serial correlation.

> pbgtest(fixed)
Loading required package: lmtest

Breusch-Godfrey/Wooldridge test for serial correlation in panel models: df = 10 (statistic and p-value lost in transcription). Conclusion in the slide: no serial correlation.
Testing for unit roots/stationarity

The (augmented) Dickey-Fuller test checks for stochastic trends. The null hypothesis is that the series has a unit root (i.e., is non-stationary). If a unit root is present, you can take the first difference of the variable.

> Panel.set <- plm.data(Panel, index = c("country", "year"))
> library(tseries)
> adf.test(Panel.set$y, k=2)

Augmented Dickey-Fuller Test: Lag order = 2 (statistic and p-value lost in transcription); alternative hypothesis: stationary. If the p-value is < 0.05, then no unit root is present.
Testing for heteroskedasticity

The null hypothesis for the Breusch-Pagan test is homoskedasticity.

> library(lmtest)
> bptest(y ~ x1 + factor(country), data = Panel, studentize=FALSE)

Breusch-Pagan test: df = 7 (statistic and p-value lost in transcription). Conclusion in the slide: presence of heteroskedasticity.

If heteroskedasticity is detected, you can use a robust covariance matrix to account for it; see the following pages.
Controlling for heteroskedasticity: robust covariance matrix estimation (sandwich estimator)

The vcovHC function estimates three heteroskedasticity-consistent covariance estimators:

"white1" - for general heteroskedasticity but no serial correlation. Recommended for random effects.
"white2" - "white1" restricted to a common variance within groups. Recommended for random effects.
"arellano" - both heteroskedasticity and serial correlation. Recommended for fixed effects.

The following options apply*:

HC0 - heteroskedasticity consistent. The default.
HC1, HC2, HC3 - recommended for small samples. HC3 gives less weight to influential observations.
HC4 - small samples with influential observations.
HAC - heteroskedasticity and autocorrelation consistent (type ?vcovHAC for more details).

See the following pages for examples. For more details see "Econometric Computing with HC and HAC Covariance Matrix Estimators" (page 4) and Stock and Watson.

*Kleiber and Zeileis.
Controlling for heteroskedasticity: random effects

> random <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="random")
> coeftest(random)  # Original coefficients
> coeftest(random, vcovHC)  # Heteroskedasticity-consistent coefficients
> coeftest(random, vcovHC(random, type = "HC3"))  # Heteroskedasticity-consistent coefficients, type HC3

> # The following shows the HC standard errors of the coefficients
> t(sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"), function(x) sqrt(diag(vcovHC(random, type = x)))))

(The numeric coefficient tables and the standard errors for each HC type did not survive transcription.)
Controlling for heteroskedasticity: fixed effects

> fixed <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
> coeftest(fixed)  # Original coefficients
> coeftest(fixed, vcovHC)  # Heteroskedasticity-consistent coefficients
> coeftest(fixed, vcovHC(fixed, method = "arellano"))  # Heteroskedasticity-consistent coefficients (Arellano)
> coeftest(fixed, vcovHC(fixed, type = "HC3"))  # Heteroskedasticity-consistent coefficients, type HC3

> # The following shows the HC standard errors of the coefficients
> t(sapply(c("HC0", "HC1", "HC2", "HC3", "HC4"), function(x) sqrt(diag(vcovHC(fixed, type = x)))))

(The numeric coefficient tables and the standard errors for each HC type did not survive transcription; in the original output, x1 was significant at the 5% level with the original standard errors.)
References/Useful links

- DSS Online Training Section
- Princeton DSS Libguides
- John Fox's site
- Quick-R
- UCLA Resources to learn and use R
- UCLA Resources to learn and use Stata
- DSS - Stata
- DSS - R
- Panel Data Econometrics in R: the plm package
- Econometric Computing with HC and HAC Covariance Matrix Estimators
References/Recommended books

- An R Companion to Applied Regression, 2nd ed. / John Fox, Sanford Weisberg. Sage Publications, 2011.
- Data Manipulation with R / Phil Spector. Springer, 2008.
- Applied Econometrics with R / Christian Kleiber, Achim Zeileis. Springer, 2008.
- Introductory Statistics with R / Peter Dalgaard. Springer, 2008.
- Complex Surveys: A Guide to Analysis Using R / Thomas Lumley. Wiley, 2010.
- Applied Regression Analysis and Generalized Linear Models / John Fox. Sage, 2008.
- R for Stata Users / Robert A. Muenchen, Joseph Hilbe. Springer, 2010.
- Introduction to Econometrics, 2nd ed. / James H. Stock, Mark W. Watson. Boston: Pearson Addison Wesley.
- Data Analysis Using Regression and Multilevel/Hierarchical Models / Andrew Gelman, Jennifer Hill. Cambridge: Cambridge University Press.
- Econometric Analysis, 6th ed. / William H. Greene. Upper Saddle River, N.J.: Prentice Hall.
- Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert O. Keohane, Sidney Verba. Princeton University Press.
- Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King. Cambridge University Press, 1989.
- Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods / Sam Kachigan. New York: Radius Press, 1986.
- Statistics with Stata (updated for version 9) / Lawrence Hamilton. Thomson Brooks/Cole.
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
Note 2 to Computer class: Standard mis-specification tests
Note 2 to Computer class: Standard mis-specification tests Ragnar Nymoen September 2, 2013 1 Why mis-specification testing of econometric models? As econometricians we must relate to the fact that the
Final Exam Practice Problem Answers
Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
MIXED MODEL ANALYSIS USING R
Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1
Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch
Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
This chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
Financial Risk Models in R: Factor Models for Asset Returns. Workshop Overview
Financial Risk Models in R: Factor Models for Asset Returns and Interest Rate Models Scottish Financial Risk Academy, March 15, 2011 Eric Zivot Robert Richards Chaired Professor of Economics Adjunct Professor,
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
Multivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
Correlated Random Effects Panel Data Models
INTRODUCTION AND LINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. The Linear
Exchange Rate Regime Analysis for the Chinese Yuan
Exchange Rate Regime Analysis for the Chinese Yuan Achim Zeileis Ajay Shah Ila Patnaik Abstract We investigate the Chinese exchange rate regime after China gave up on a fixed exchange rate to the US dollar
Independent t- Test (Comparing Two Means)
Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent
Chapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
SYSTEMS OF REGRESSION EQUATIONS
SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations
Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)
1 Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship) I. Authors should report effect sizes in the manuscript and tables when reporting
Getting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
Basic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
Lecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
INTERNATIONAL UNIVERSITY OF JAPAN Public Management and Policy Analysis Program Graduate School of International Relations
INTERNATIONAL UNIVERSITY OF JAPAN Public Management and Policy Analysis Program Graduate School of International Relations ADC6512 Topics in Data Analysis (Panel Data Models Using Stata) (2 Credits) Winter
Multiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)
Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation
Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools
Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................
The Effect of R&D Expenditures on Stock Returns, Price and Volatility
Master Degree Project in Finance The Effect of R&D Expenditures on Stock Returns, Price and Volatility A study on biotechnological and pharmaceutical industry in the US market Aleksandra Titi Supervisor:
Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is
In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:
Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random
Module 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
Lucky vs. Unlucky Teams in Sports
Lucky vs. Unlucky Teams in Sports Introduction Assuming gambling odds give true probabilities, one can classify a team as having been lucky or unlucky so far. Do results of matches between lucky and unlucky
EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST
EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions
Handling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
Spatial panel models
Spatial panel models J Paul Elhorst University of Groningen, Department of Economics, Econometrics and Finance PO Box 800, 9700 AV Groningen, the Netherlands Phone: +31 50 3633893, Fax: +31 50 3637337,
ADVANCED FORECASTING MODELS USING SAS SOFTWARE
ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 [email protected] 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.
MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
Testing for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
Air passenger departures forecast models A technical note
Ministry of Transport Air passenger departures forecast models A technical note By Haobo Wang Financial, Economic and Statistical Analysis Page 1 of 15 1. Introduction Sine 1999, the Ministry of Business,
THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS
THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS ANTONIO AGUIRRE UFMG / Department of Economics CEPE (Centre for Research in International Economics) Rua Curitiba, 832 Belo Horizonte
