# Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used to test the joint significance of a subset of coefficients, namely. Take for example the two variables from the example above, baths and bedrms. These two variables are individually insignificant based on t-tests with very high p values. But before dropping them together, we may want to test the joint significance of them using Wald test. Run in Stata: test bedrms baths The command test bedrms baths tests whether baths and bedrms are insignificant jointly. Since the null says they are, and F-stat s p-value=0.6375, then we cannot reject the null. DROP both baths and bedrms from the regression equation. They don t belong to the model. MODEL A reg price sqft bedrms baths Source SS df MS Number of obs 14 F (3, 10) Model Prob > F Residual R-squared Adj R-squared Total Root MSE price Coef. Std. Err. t P>t [95% Conf. Interval] sqft bedrms

2 baths _cons ( 1) bedrms = 0 ( ) baths = 0 F(, 10) = 0.47 Prob > F = Special Wald Test This is an F-test for the significance of all variables in the model, i.e. sqft, bedrms and baths. Hence, the null states betas of all variables in the model are set equal to zero. Null H 0 : β sqft = βbedrms = βbaths = 0 Alternative H A : At least some are non zero. Run in Stata this command: test bedrms baths sqft ( 1) bedrms = 0 ( ) baths = 0 ( 3) sqft = 0 F( 3, 10) = Prob > F = Notice that F-test p-value is , which is lower than 1% of α. Hence, we can reject the null and at least some variables in this trio is significant. This variable is SQFT! (based on the t-test). RESTRICTED MODEL MODEL B reg price sqft Source SS df MS Number of obs 14

3 F( 1, 1) Model Prob > F 0 Residual R-squared Adj R- squared Total Root MSE price Coef. Std. Err. t P>t [95% Conf. Interval] sqft cons OLS Regression Regress yvar xvarlist Regress the dependent variable yvar on the independent variables xvarlist. Regress yvar xvarlist, vce(robust) Regress, but this time compute robust (Eicker- Huber-White)standard errors. We are always using the vce(robust) option because we want consistent (i. e,, asymptotically unbiased) results, but we do not want to have to assume homoskedasticity and normality of the random error terms. So, remember always to specify the vce(robust) option after estimation commands. The vce stands for variance-covariance estimates (of the estimated model parameters). Regress yvar xvarlist, vce(robust) level(#) Regress with robust standard errors, and this time change the confidence interval to #% (e.g. use 99 for a 99% confidence interval). Improved Robust Standard Errors in Finite Samples For robust standard errors, an apparent improvement is possible. Davidson and MacKinnon*report two variance-covariance estimation methods that seem, at least in their Monte Carlo simulations, to converge more quickly, as sample size n increases, to the correct variance covariance estimates. Thus their methods seem to be better, although they require more computational time. Stata by default makes Davidson and MacKinnon s recommended simple degrees of freedom correction by multiplying the estimated variance matrix by n/(n-k). However, we should learn about an alternative in which the squared residuals are rescaled. To use this formula, specify vce(hc) instead of vce(robust). An alternative is vce(hc3) instead of vce(robust).

4 Weighted Least Squares We learn about (variance-) weighted least squares. If you know (to within a constant multiple) the variances of the error terms for all observations, this yields more efficient estimates (OLS with robust standard errors works properly using asymptotic methods but is not the most efficient estimator). Suppose you have, stored in a variable sdvar, a reasonable estimate of the standard deviation of the error term for each observation. Then weighted least squares can be performed as follows: Run in Stata: vwls yvar xvarlist, sd(sdvar) 3. Post-Estimation Commands Commands described here work after OLS regression. They sometimes work after other estimation commands, depending on the command. Fitted Values, Residuals, and Related Plots predict yhatvar After a regression, create a new variable, having the name you enter here, that contains for each observation its fitted value ˆyi. predict rvar, residuals After a regression, create a new variable, having the name you enter here, that contains for each observation its residual ˆ ui. scatter y yhat x Plot variables named y and yhat versus x. scatter resids x It is wise to plot your residuals versus each of your x-variables. Such residual plots may reveal a systematic relationship that your analysis has ignored. It is also wise to plot your residuals versus the fitted values of y, again to check for a possible nonlinearity that your analysis has ignored. rvfplot Plot the residuals versus the fitted values of y. rvpplot Plot the residuals versus a predictor (x-variable). Confidence Intervals and Hypothesis Tests For a single coefficient in your statistical model, the confidence interval is already reported in the table of regression results, along with a -sided t-test for whether the true coefficient is zero. However, you may need to carry out F-tests, as well as compute confidence intervals and t-tests for linear combinations of coefficients in the model.

5 Here are example commands. Note that when a variable name is used in this subsection, it really refers to the coefficient (the β k) in front of that variable in the model equation. Run in Stata: lincom logpl+logpk+logpf Compute the estimated sum of three model coefficients, which are the coefficients in front of the variables named logpl, logpk, and logpf. Along with this estimated sum, carry out a t-test with the null hypothesis being that the linear combination equals zero, and compute a confidence interval. lincom *logpl+1*logpk-1*logpf Like the above, but now the formula is a different linear combination of regression coefficients. lincom *logpl+1*logpk-1*logpf, level(#) As above, but this time change the confidence interval to #% (e.g. use 99 for a 99% confidence interval). test logpl+logpk+logpf==1 Test the null hypothesis that the sum of the coefficients of variables logpl, logpk, and logpf, totals to 1. This only makes sense after a regression involving variables with these names. This is an F-test. test (logq==logq1) (logq3==logq1) (logq4==logq1) (logq5==logq1) Test the null hypothesis that four equations are all true simultaneously: the coefficient of logq equals the coefficient of logq1, the coefficient of logq3 equals the coefficient of logq1, the coefficient of logq4 equals the coefficient of logq1, and the coefficient of logq5 equals the coefficient of logq1; i.e., they are all equal to each other. This is an F- test. test x3 x4 x5 Test the null hypothesis that the coefficient of x3 equals 0 and the coefficient of x4 equals 0 and the coefficient of x5 equals 0. This is an F-test. Nonlinear Hypothesis Tests After estimating a model, you could do something like the following: testnl _b[popdensity]*_b[landarea] = 3000 Test a nonlinear hypothesis. Note that coefficients must be specified using _b, whereas the linear test command lets you omit the _b[]. testnl (_b[mpg] = 1/_b[weight]) (_b[trunk] = 1/_b[length]) For multi-equation tests you can put parentheses around each equation (or use multiple equality signs in the same equation)

7 estimated coefficient of any given variable would be if any particular observation were dropped from the data. To do so for one variable, for all observations, use this command: predict newvarname, dfbeta(varname) Computes the influence statistic ( DFBETA ) for varname: how much the estimated coefficient of varname would change if each observation were excluded from the data. The change divided by the standard error of varname, for each observation i, is stored in the ith observation of the newly created variable newvarname. Then you might use summarize newvarname, detail to find out the largest values by which the estimates would change (relative to the standard error of the estimate). If these are large (say close to 1 or more), then you might be alarmed that one or more observations may completely change your results, so you had better make sure those results are valid or else use a more robust estimation technique (such as robust regression, which is not related to robust standard errors, or quantile regression, both available in Stata). If you want to compute influence statistics for many or all regressors, Stata s dfbeta command lets you do so in one step. Functional Form Test It is sometimes important to ensure that you have the right functional form for variables in your regression equation. Sometimes you don t want to be perfect, you just want to summarize roughly how some independent variables affect the dependent variable. But sometimes, e.g., if you want to control fully for the effects of an independent variable, it can be important to get the functional form right (e.g., by adding polynomials and interactions to the model). To check whether the functional form is reasonable and consider alternative forms, it helps to plot the residuals versus the fitted values and versus the predictors. Another approach is to formally test the null hypothesis that the patterns in the residuals cannot be explained by powers of the fitted values. One such formal test is the Ramsey RESET test: estat ovtest Ramsey s (1969) regression equation specification error test. Heteroskedasticity Tests After running a regression, you can carry out White s test for heteroskedasticity using the command: estat imtest, white Note, however, that there are many other heteroskedasticity tests that may be more appropriate. Stata s imtest command also carries out other tests, and the commands hettest and szroeter carry out different tests for heteroskedasticity. The Breusch-Pagan Lagrange multiplier test, which assumes normally distributed errors, can be carried out after running a regression, by using the command: estat hettest, normal

8 Other tests that do not require normally distributed errors include: estat hettest, iid (Heteroskedasticity test Koenker s (1981) s score test, assumes iid errors.) estat hettest, fstat (Heteroskedasticity test Wooldridge s (006) F-test, assumes iid errors.) estat szroeter, rhs mtest(bonf) (Heteroskedasticity test Szroeter (1978) rank test for null hypothesis that variance of error term is unrelated to each variable.) estat imtest ( Heteroskedasticity test Cameron and Trivedi (1990), also includes tests for higher-order moments of residuals (skewness and kurtosis). Serial Correlation Tests To carry out these tests in Stata, you must first tsset your data. For a Breusch-Godfrey test where, say, p = 3, do your regression and then use Stata s estat bgodfrey command: estat bgodfrey, lags(1 3) Heteroskedasticity tests including White test. Other tests for serial correlation are available. For example, the Durbin-Watson d-statistic is available using Stata s estat dwatson command. However the Durbin-Watson statistic assumes there is no endogeneity even under the alternative hypothesis, an assumption which is typically violated if there is serial correlation, so you really should use the Breusch-Godfrey test instead (or use Durbin s alternative test, estat durbinalt ). 4.LM (Lagrange multiplier) Test on Non-linearities and Model Specification/ Likelihood Ratio Test Run in Stata: Step 1 : reg y x 1 x x 3 (Run in Stata the dependable variable with the independable variables) Step : estimates store a 1 Step 3: Gen X sq= X ( To generate the square of the variable X ) Step 4: reg Y x 1 x x 3 x sq ( Now run the regression including the new variable X )

9 Step 5: estimates store a Step 6: lrtest a 1 a Step 7: Reject H 0 if : a.fstat > F* b.p-value< α

### Useful Stata Commands (for Stata version 12)

Useful Stata Commands (for Stata version 12) Kenneth L. Simons This document is updated continually. For the latest version, open it from the course disk space. This document briefly summarizes Stata commands

### Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Centre for Excellence in Epidemiology University of Manchester 08/11/2016 Structure This Week What is a linear model? How

### The following postestimation commands for time series are available for regress:

Title stata.com regress postestimation time series Postestimation tools for regress with time series Description Syntax for estat archlm Options for estat archlm Syntax for estat bgodfrey Options for estat

### Heteroskedasticity Richard Williams, University of Notre Dame, Last revised January 30, 2015

Heteroskedasticity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 30, 2015 These notes draw heavily from Berry and Feldman, and, to a lesser extent, Allison,

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

### Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 24, 2015 Rather than repeat material that is

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

### Regression in Stata. Alicia Doyle Lynch Harvard-MIT Data Center (HMDC)

Regression in Stata Alicia Doyle Lynch Harvard-MIT Data Center (HMDC) Documents for Today Find class materials at: http://libraries.mit.edu/guides/subjects/data/ training/workshops.html Several formats

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS

DETERMINANTS OF CAPITAL ADEQUACY RATIO IN SELECTED BOSNIAN BANKS Nađa DRECA International University of Sarajevo nadja.dreca@students.ius.edu.ba Abstract The analysis of a data set of observation for 10

### Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Regression Analysis (Spring, 2000)

Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)

Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation

### Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY

Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY ABSTRACT: This project attempted to determine the relationship

### Quantitative Methods for Economics Tutorial 9. Katherine Eyal

Quantitative Methods for Economics Tutorial 9 Katherine Eyal TUTORIAL 9 4 October 2010 ECO3021S Part A: Problems 1. In Problem 2 of Tutorial 7, we estimated the equation ŝleep = 3, 638.25 0.148 totwrk

### Rockefeller College University at Albany

Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

### Quantitative Methods for Economics Tutorial 12. Katherine Eyal

Quantitative Methods for Economics Tutorial 12 Katherine Eyal TUTORIAL 12 25 October 2010 ECO3021S Part A: Problems 1. State with brief reason whether the following statements are true, false or uncertain:

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### Econ 371 Problem Set #3 Answer Sheet

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore

### How Do We Test Multiple Regression Coefficients?

How Do We Test Multiple Regression Coefficients? Suppose you have constructed a multiple linear regression model and you have a specific hypothesis to test which involves more than one regression coefficient.

### Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015

Interaction effects and group comparisons Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Note: This handout assumes you understand factor variables,

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

### 12-1 Multiple Linear Regression Models

12-1.1 Introduction Many applications of regression analysis involve situations in which there are more than one regressor variable. A regression model that contains more than one regressor variable is

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### Department of Economics, Session 2012/2013. EC352 Econometric Methods. Exercises from Week 03

Department of Economics, Session 01/013 University of Essex, Autumn Term Dr Gordon Kemp EC35 Econometric Methods Exercises from Week 03 1 Problem P3.11 The following equation describes the median housing

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### From the help desk: Swamy s random-coefficients model

The Stata Journal (2003) 3, Number 3, pp. 302 308 From the help desk: Swamy s random-coefficients model Brian P. Poi Stata Corporation Abstract. This article discusses the Swamy (1970) random-coefficients

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Estimation and forecasting: OLS, IV, IV-GMM

Estimation and forecasting: OLS, IV, IV-GMM Christopher F Baum Boston College and DIW Berlin Birmingham Business School, March 2013 Christopher F Baum (BC / DIW) Estimation and forecasting BBS 2013 1 /

### Stata Walkthrough 4: Regression, Prediction, and Forecasting

Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting

### Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

### Lecture 16. Endogeneity & Instrumental Variable Estimation (continued)

Lecture 16. Endogeneity & Instrumental Variable Estimation (continued) Seen how endogeneity, Cov(x,u) 0, can be caused by Omitting (relevant) variables from the model Measurement Error in a right hand

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### Linear Regression with One Regressor

Linear Regression with One Regressor Michael Ash Lecture 10 Analogy to the Mean True parameter µ Y β 0 and β 1 Meaning Central tendency Intercept and slope E(Y ) E(Y X ) = β 0 + β 1 X Data Y i (X i, Y

MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies

### I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s

I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS * Hun Myoung Park,

### Chapter 11: Two Variable Regression Analysis

Department of Mathematics Izmir University of Economics Week 14-15 2014-2015 In this chapter, we will focus on linear models and extend our analysis to relationships between variables, the definitions

### Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything

### Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions

Wooldridge, Introductory Econometrics, 3d ed. Chapter 12: Serial correlation and heteroskedasticity in time series regressions What will happen if we violate the assumption that the errors are not serially

### Quick Stata Guide by Liz Foster

by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the

### Lectures 8, 9 & 10. Multiple Regression Analysis

Lectures 8, 9 & 0. Multiple Regression Analysis In which you learn how to apply the principles and tests outlined in earlier lectures to more realistic models involving more than explanatory variable and

### Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 21, 2015 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

### Cointegration and the ECM

Cointegration and the ECM Two nonstationary time series are cointegrated if they tend to move together through time. For instance, we have established that the levels of the Fed Funds rate and the 3-year

### 25 Working with categorical data and factor variables

25 Working with categorical data and factor variables Contents 25.1 Continuous, categorical, and indicator variables 25.1.1 Converting continuous variables to indicator variables 25.1.2 Converting continuous

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

### Lab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:

Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### The Numbers Behind the MLB Anonymous Students: AD, CD, BM; (TF: Kevin Rader)

The Numbers Behind the MLB Anonymous Students: AD, CD, BM; (TF: Kevin Rader) Abstract This project measures the effects of various baseball statistics on the win percentage of all the teams in MLB. Data

### ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: IV AND CONTROL FUNCTIONS, II Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Quantile Treatment Effects 2. Control Functions

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Regression Techniques in Stata

Regression Techniques in Stata Christopher F Baum Boston College and DIW Berlin University of Adelaide, June 2010 Christopher F Baum (BC / DIW) Regression Techniques in Stata Adelaide, June 2010 1 / 134

### International Statistical Institute, 56th Session, 2007: Phil Everson

Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

### Interaction effects between continuous variables (Optional)

Interaction effects between continuous variables (Optional) Richard Williams, University of Notre Dame, http://www.nd.edu/~rwilliam/ Last revised February 0, 05 This is a very brief overview of this somewhat

### Variance of OLS Estimators and Hypothesis Testing. Randomness in the model. GM assumptions. Notes. Notes. Notes. Charlie Gibbons ARE 212.

Variance of OLS Estimators and Hypothesis Testing Charlie Gibbons ARE 212 Spring 2011 Randomness in the model Considering the model what is random? Y = X β + ɛ, β is a parameter and not random, X may be

### Econ 371 Problem Set #4 Answer Sheet. P rice = (0.485)BDR + (23.4)Bath + (0.156)Hsize + (0.002)LSize + (0.090)Age (48.

Econ 371 Problem Set #4 Answer Sheet 6.5 This question focuses on what s called a hedonic regression model; i.e., where the sales price of the home is regressed on the various attributes of the home. The

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Testing for Lack of Fit

Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

### Inference for Regression

Simple Linear Regression Inference for Regression The simple linear regression model Estimating regression parameters; Confidence intervals and significance tests for regression parameters Inference about

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Estimating and interpreting structural equation models in Stata 12

Estimating and interpreting structural equation models in Stata 12 David M. Drukker Director of Econometrics Stata Stata Conference, Chicago July 14, 2011 1 / 31 Purpose and outline Purpose To excite structural-equation-model

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Wooldridge, Introductory Econometrics, 4th ed. Multiple regression analysis:

Wooldridge, Introductory Econometrics, 4th ed. Chapter 4: Inference Multiple regression analysis: We have discussed the conditions under which OLS estimators are unbiased, and derived the variances of

### Correlation and Simple Linear Regression

Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

### The Bootstrap. 1 Introduction. The Bootstrap 2. Short Guides to Microeconometrics Fall Kurt Schmidheiny Unversität Basel

Short Guides to Microeconometrics Fall 2016 The Bootstrap Kurt Schmidheiny Unversität Basel The Bootstrap 2 1a) The asymptotic sampling distribution is very difficult to derive. 1b) The asymptotic sampling

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### ECON Introductory Econometrics. Lecture 17: Experiments

ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### where b is the slope of the line and a is the intercept i.e. where the line cuts the y axis.

Least Squares Introduction We have mentioned that one should not always conclude that because two variables are correlated that one variable is causing the other to behave a certain way. However, sometimes

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Residuals. Residuals = ª Department of ISM, University of Alabama, ST 260, M23 Residuals & Minitab. ^ e i = y i - y i

A continuation of regression analysis Lesson Objectives Continue to build on regression analysis. Learn how residual plots help identify problems with the analysis. M23-1 M23-2 Example 1: continued Case

### Questions and Answers on Hypothesis Testing and Confidence Intervals

Questions and Answers on Hypothesis Testing and Confidence Intervals L. Magee Fall, 2008 1. Using 25 observations and 5 regressors, including the constant term, a researcher estimates a linear regression

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### CORRELATION AND SIMPLE REGRESSION ANALYSIS USING SAS IN DAIRY SCIENCE

CORRELATION AND SIMPLE REGRESSION ANALYSIS USING SAS IN DAIRY SCIENCE A. K. Gupta, Vipul Sharma and M. Manoj NDRI, Karnal-132001 When analyzing farm records, simple descriptive statistics can reveal a

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### Multiple Regression Analysis in Minitab 1

Multiple Regression Analysis in Minitab 1 Suppose we are interested in how the exercise and body mass index affect the blood pressure. A random sample of 10 males 50 years of age is selected and their

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part