Regression Analysis (Spring, 2000)




By Wonjae

Purposes:
a. Explaining the relationship between the Y and X variables with a model (explain a variable Y in terms of Xs)
b. Estimating and testing the strength of their relationship
c. Predicting the Y value for a given X value (how does a change in X affect Y, ceteris paribus?) (By constructing the SRF, we can estimate the PRF.)

OLS (ordinary least squares) method: a method of choosing the SRF so that the sum of the squared residuals is as small as possible. Cf. think of the quadratic function of the sum of squares and the use of differentiation.

Steps of regression analysis:
1. Determine the independent and dependent variables: state a one-dimensional function model!
2. Check that the assumptions for the dependent variable are satisfied: residual analysis!
   a. Linearity (assumption 1)
   b. Normality (assumption 3): draw a histogram of the residuals (dependent variable) or a normal P-P plot (SPSS: Statistics > Regression > Linear > Plots > Histogram, Normal P-P plot of regression standardized residuals)
   c. Equal variance (homoscedasticity: assumption 4): draw a scatter plot of the residuals (SPSS: Statistics > Regression > Linear > Plots: Y = *ZRESID, X = *ZPRED). Its shape should be rectangular! If there is no symmetric pattern in the scatter plot, we should suspect the linearity.
   d. Independence (assumptions 5 and 6: no autocorrelation between the disturbances, zero covariance between the error term and X): each observation should be independent
3. Look at the correlation between the two variables by drawing a scatter graph (SPSS: Graph > Scatter > Simple):
   a. Is there any correlation?
   b. Is there a linear relation?
   c. Are there outliers? If yes, clarify the reason and adjust for them! (Make the outliers a dummy as a new variable, and run the regression analysis again.)
   d. Are there separated groups? If yes, it means the data came from different populations.
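The OLS idea above (choose the SRF that minimizes the sum of squared residuals; differentiation yields closed-form estimates) can be sketched in plain Python. This is an illustrative sketch with made-up data, not part of the original handout:

```python
# Minimal OLS sketch: fit Y = b0 + b1*X using the closed-form
# least-squares estimates obtained by differentiating the sum of
# squared residuals (hypothetical data for illustration only).

def ols_fit(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)                      # Sxx
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # Sxy
    b1 = sxy / sxx            # slope estimate
    b0 = ybar - b1 * xbar     # intercept estimate
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)                   # fitted SRF coefficients
print(abs(sum(resid)) < 1e-9)   # with an intercept, OLS residuals sum to ~0
```

The residuals summing to (numerically) zero is exactly the first-order condition from the differentiation the handout alludes to.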

4. Obtain a proper model by using a statistical package (SPSS).
5. Test the model:
   a. Test the significance of the model (the significance of the slope): F-test. In the ANOVA table, find the F-value and the p-value (Sig.). If the p-value is smaller than alpha, the model is significant.
   b. Test the goodness of fit of the model. In the Model Summary, look at R-square, the coefficient of determination. It measures the proportion or percentage of the total variation in Y explained by the regression model. (If the model is significant but R-square is small, the observed values are widely spread around the regression line.)
6. Test that the slope is significantly different from zero:
   a. Look at the t-value in the Coefficients table and find the p-value.
   b. t-square should be equal to the F-value.
7. If the model is significant, show the model and interpret it! Steps:
   a. Show the SRF.
   b. In the Model Summary, interpret R-square!
   c. In the ANOVA table, show the table and interpret the F-value and the null hypothesis!
   d. In the Coefficients table, show the table and interpret the beta values!
   e. Show the residual statistics and the residual scatter plot!
   If the model is not significant, interpret it like this: "The X variable is of little help in explaining the Y variable," or "There is no linear relationship between the X variable and the Y variable."
8. Mean estimation (prediction) and individual prediction: we can predict the mean, individual values, and their confidence intervals. (SPSS: Statistics > Regression > Linear > Save > Predicted values: Unstandardized)
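Step 6b's claim that t-square equals the F-value holds exactly in simple regression (since SSR = b1^2 * Sxx), and can be checked numerically. A sketch with hypothetical data, not from the handout:

```python
import math

# Verify numerically that in simple linear regression the squared
# t-statistic for the slope equals the ANOVA F-statistic, and
# compute R-square = SSR / SST (hypothetical data).

def slope_t_f_r2(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    ssr = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)              # regression SS
    mse = sse / (n - 2)                  # residual mean square
    t = b1 / math.sqrt(mse / sxx)        # t-statistic for H0: slope = 0
    f = ssr / mse                        # F = MSR/MSE with MSR = SSR/1
    r2 = ssr / (ssr + sse)               # coefficient of determination
    return t, f, r2

t, f, r2 = slope_t_f_r2([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
print(t * t, f)    # the two values agree
print(r2)          # share of variation in Y explained by the model
```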

Testing a Model (Wonjae)

Before setting up a model:
1. Identify the linear relationship between each independent variable and the dependent variable. Create a scatter plot for each X and Y.
   (STATA: plot Y X1 / plot Y X2 / ovtest, rhs / graph Y X1 X2 X3, matrix / avplots)
2. Check the partial correlation for each X and Y.
   (STATA: pcorr Y X1 X2 / pcorr X1 Y X2 / pcorr X2 Y X1)

After setting up a model:
1. Testing whether two different variables have the same coefficient. The null hypothesis is that the X1 and X2 variables have the same impact on Y.
   (STATA: test X1 = X2)
2. Testing multicollinearity (Gujarati, p. 345)
   1) Detection
      - High R-square but few significant t-ratios
      - High pair-wise (zero-order) correlations among regressors (STATA: graph Y X1 X2 X3, matrix / avplots)
      - Examination of partial correlations
      - Auxiliary regressions
      - Eigenvalues and condition index
      - Tolerance and variance inflation factor (STATA: vif)
        Interpretation: if a VIF is in excess of 20, or a tolerance (1/VIF) is .05 or less, there might be a problem of multicollinearity.
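The VIF in the detection list comes from exactly the auxiliary-regression idea mentioned above: regress each X on the other regressors and set VIF = 1/(1 - R^2) of that auxiliary regression. A sketch for the two-regressor case with made-up, nearly collinear data (names and data are hypothetical):

```python
# VIF sketch for two regressors: run the auxiliary regression of
# X1 on X2, take its R-square, and compute VIF = 1/(1 - R^2).
# Tolerance is 1/VIF. Data are hypothetical and nearly collinear.

def vif_two_regressors(x1, x2):
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxx = sum((b - m2) ** 2 for b in x2)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    slope = sxy / sxx
    intercept = m1 - slope * m2
    sse = sum((a - (intercept + slope * b)) ** 2 for a, b in zip(x1, x2))
    sst = sum((a - m1) ** 2 for a in x1)
    r2 = 1 - sse / sst        # R-square of the auxiliary regression
    return 1 / (1 - r2)       # variance inflation factor

x2 = [1, 2, 3, 4, 5, 6]
x1 = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0]   # tracks x2 almost perfectly
vif = vif_two_regressors(x1, x2)
print(vif, 1 / vif)   # VIF well above 20 and tolerance below .05
```

By the handout's rule of thumb (VIF > 20 or tolerance < .05), this pair of regressors signals a multicollinearity problem.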

   2) Correction
      A. Do nothing.
         - If the main purpose of modeling is predicting Y only, then don't worry (since the ESS is left the same).
         - Don't worry about multicollinearity if the R-squared from the regression exceeds the R-squared of any independent variable regressed on the other independent variables.
         - Don't worry about it if the t-statistics are all greater than 2. (Kennedy, Peter. 1998. A Guide to Econometrics: 187)
      B. Incorporate additional information.
         - After examining the correlations between all the variables, find the variable most strongly related to the others, and simply omit it. (STATA: corr X1 X2 X3) Be careful of specification error, unless the true coefficient of that variable is zero.
         - Increase the number of observations.
         - Formalize relationships among regressors, for example by creating interaction term(s). If the multicollinearity is believed to arise from an actual approximate linear relationship among some of the regressors, this relationship could be formalized and the estimation could then proceed in the context of a simultaneous-equation estimation problem.
         - Specify a relationship among some parameters: if it is well known that a specific relationship exists among some of the parameters in the estimating equation, incorporate this information. The variances of the estimates will be reduced.
         - Form a principal component: form a composite index variable capable of representing the group of variables by itself, but only if the variables included in the composite have some useful combined economic interpretation.
         - Incorporate estimates from other studies: see Kennedy (1998: 188-189).
         - Shrink the OLS estimates: see Kennedy.
3. Heteroscedasticity
   1) Detection
      - Create a scatter plot of the squared residuals against Y (p. 368).
      - Create a scatter plot of each X against the standardized Y residuals ("Partial Regression Plot" in SPSS).
        (STATA: predict rstan / plot X1 rstan

        / plot X2 rstan)
      - White's test (p. 379)
        Step 1: regress your model (STATA: reg Y X1 X2)
        Step 2: obtain the residuals and the squared residuals (STATA: predict resi / gen resi2 = resi^2)
        Step 3: generate the fitted values yhat and the squared fitted values yhat2 (STATA: predict yhat / gen yhat2 = yhat^2)
        Step 4: run the auxiliary regression and get its R-square (STATA: reg resi2 yhat yhat2)
        Step 5: (1) using the F-statistic and its p-value, evaluate the null hypothesis; or (2) compare the calculated chi-square (n times R-square) with the critical chi-square. If the calculated value is greater than the critical value (reject the null), there might be heteroscedasticity, or specification bias, or both.
      - Cook & Weisberg test (STATA: hettest)
      - The Breusch-Pagan test (STATA: reg Y X1 X2 / predict resi / gen resi2 = resi^2 / reg resi2 X1 X2)
   2) Remedial measures
      - When the variance is known: use the WLS method. (STATA: reg Y* X0* X1*, noconstant) Cf. Y* = Y/δ, X* = X/δ.
      - When the variance is not known: use the White method. (STATA: gen X2r = sqrt(X) / gen dX2r = 1/X2r / gen Y* = Y/X2r / reg Y* dX2r X2r, noconstant)
4. Autocorrelation
   1) Detection
      - Create a plot of the residuals against the lagged residuals.
        (STATA: predict resi, resi / gen laggedresi = resi[_n-1] / plot resi laggedresi)
      - Durbin-Watson d test: run the OLS regression and obtain the residuals, compute d, find the critical values d_L and d_U given N and K, and decide according to the decision rules.
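The d statistic in the Durbin-Watson test above (what Stata's dwstat reports) is simple to compute by hand. A sketch with hypothetical residuals, not from the handout:

```python
# Durbin-Watson d statistic sketch:
#   d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_t e_t^2
# d near 2 suggests no first-order autocorrelation; d near 0
# suggests positive autocorrelation, d near 4 negative.
# The residual series below is hypothetical.

def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

resi = [0.5, 0.6, 0.4, 0.5, 0.3, 0.2, 0.3, 0.1]   # slowly drifting: positively correlated
d = durbin_watson(resi)
rho_hat = 1 - d / 2      # rough estimate of rho from d (see remedial measures)
print(d, rho_hat)        # small d, large rho_hat: positive autocorrelation
```

The rho_hat line previews the remedial step that follows: the estimated rho is what gets used to transform the variables before re-running the regression.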

        (STATA: dwstat)
      - Runs test (STATA: predict resi, resi / runtest resi)
   2) Remedial measures (pp. 426-433)
      - Estimate rho: rho = 1 - d/2 (Durbin-Watson), or rho = [n^2 (1 - d/2) + k^2] / (n^2 - k^2) (Theil-Nagar).
      - Regress with the transformed variables and get the new d statistic. Compare it with the critical values d_L and d_U.
5. Testing normality of residuals
   - Obtain a normal probability plot. (In SPSS, with ZY and ZX, choose "Normal probability plot.")
     (STATA: predict resi, resi / egen zr = std(resi) / pnorm zr)
6. Testing outliers
   - Detection (STATA: avplot X1 / cprplot X1 / rvpplot X1)
   - Cook's distance (cooksd) test: Cook's distance measures the aggregate change in the estimated coefficients when each observation is left out of the estimation.
     (STATA:
       regress Y X1 X2
       predict c, cooksd
       display 4/d.f.         <- remember this cut-point value!
       list X1 X2 c           <- compare the values in c with the cut point!
       list c if c > 4/d.f.   <- identify which observations are outliers!
       drop if c > 4/d.f.     <- if you want to drop the outliers!
     )
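The Cook's-distance screen above (flag observations whose distance exceeds a cut point, then inspect or drop them) can be sketched for simple regression, where the leverage h_ii has a closed form. The data are hypothetical, and the code uses the common 4/n cut point rather than the handout's 4/d.f.; Stata's cooksd computes the same quantity for the general model:

```python
# Cook's distance sketch for simple regression (hypothetical data):
#   D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
# with leverage h_ii = 1/n + (x_i - xbar)^2 / Sxx and p = 2
# parameters. Observations with D_i above the cut point are
# flagged as influential (cut point 4/n here; the handout's
# Stata recipe uses 4/d.f.).

def cooks_distances(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - 2)
    p = 2                                            # parameters: intercept + slope
    ds = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx           # leverage of observation i
        ds.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return ds

x = [1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.0, 2.9, 4.1, 5.0, 6.1, 12.0]   # last point breaks the linear pattern
ds = cooks_distances(x, y)
cut = 4 / len(x)
flagged = [i for i, d in enumerate(ds) if d > cut]
print(flagged)   # only the aberrant last observation exceeds the cut point
```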