Economics 345 Applied Econometrics
Lab 2: Simple Linear Regression
Prof: Martin Farnham
TAs: Rebecca Wortzman

Review from last lab
[Figure: the standard normal distribution. From: http://www.six-sigma-material.com/normal-distribution.html]
For a 95% confidence level, $\alpha = 0.05$, so the critical value is the quantile with cumulative probability $1 - 0.05/2 = 0.975$ (z = 1.96); equivalently, each tail holds $(1 - 0.95)/2 = 0.025$.
[Figure: normal distribution table. From: https://www.scribd.com/doc/51960227/normal-distribution-table-positive-negative]
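As a quick check on the review above, the sketch below recovers the two-sided critical value from the $1 - \alpha/2$ quantile instead of a printed table. It assumes Python 3.8+ and uses the standard library's `statistics.NormalDist`; the helper name `two_sided_critical_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence: float) -> float:
    """Return z such that P(-z < Z < z) = confidence for a standard normal Z."""
    alpha = 1.0 - confidence
    # Each tail holds alpha/2, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z = two_sided_critical_value(0.95)
print(round(z, 2))
```

Changing `confidence` to 0.90 or 0.99 gives the other common critical values in the same way.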

LAB 2: Looking at measurable student outcomes and school district expenditure
DATA: alabama.wf1
FOUND IN: sfgclients on uvic\storage (S:). Browse to this drive and then to the folder \social sciences\economics\econ 345\Wooldridge Eviews Files\

Introduction: A perennial question among social scientists in general, and economists focusing on education in particular, is what effect school spending has on student outcomes. Measurable student outcomes we could focus on include test scores, dropout rates, wages upon graduation, college attendance rates, etc. In this lab we will focus on standardized reading and math test scores for students in grades 8-9.

Recall from class, OLS estimation:
Important for empirical work:
- How can we interpret these estimates (what do the estimates of Beta mean)?
- Do our data satisfy the assumptions we make when we use OLS?
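For a simple regression the OLS estimates are given by the standard formulas $\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The pure-Python sketch below implements them directly; the function name `ols_simple` and the four data points are hypothetical illustrations, not the Alabama data.

```python
from statistics import fmean

def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the OLS line y = b0 + b1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Slope: co-movement of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data (not from alabama.wf1): y rises by about 2 per unit of x.
b0, b1 = ols_simple([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

The slope is read as "the predicted change in y for a one-unit increase in x", which is exactly the interpretation question posed above.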

Data: 127 school districts in the State of Alabama in the late 1980s. We focus on 3 variables today:
score89: Average reading and math standardized test score for 8th-9th grade students. y = score89 is our dependent variable. (In standard deviation units: each student's score is expressed as a number of standard deviations from the mean. In other words, the score has been standardized (as we saw in lecture) to have a mean of zero and a standard deviation of 1.)
exppup: Average expenditure per pupil in the district. x = exppup is our independent (or explanatory) variable.
pcy: Per capita income in the district. pcy is our control variable, i.e., we want the effect of expenditure holding per capita income fixed.

Some Descriptive Statistics: Means, medians and histograms
1. What are the mean and median of the test score (score89) and of average expenditure per pupil (exppup)?
o RECALL: you can group these two variables and look at their statistics at the same time.
What does a large difference between the mean and the median imply?

2. What is the distribution of test scores and of expenditure per pupil?
o (i.e., create a histogram for each of these variables) What can we say about the respective distributions? How do they differ?
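To see why a gap between the mean and the median matters, the sketch below uses invented per-pupil spending figures for six districts (hypothetical, not from alabama.wf1) in which one district spends far more than the rest. That single outlier pulls the mean well above the median, which often points to a right-skewed distribution.

```python
from statistics import mean, median

# Hypothetical per-pupil spending (dollars) in six districts; the last is an outlier.
spending = [1500, 1550, 1600, 1650, 1700, 4000]

avg = mean(spending)
mid = median(spending)
print(f"mean={avg:.1f}, median={mid:.1f}")

# Rule of thumb: a mean well above the median suggests a long right tail
# (positive skew); a mean well below it suggests a long left tail.
if avg > mid:
    print("mean > median: right-skewed")
elif avg < mid:
    print("mean < median: left-skewed")
else:
    print("mean = median: no evidence of skew from this comparison")
```

The median is barely affected by the outlier, which is why the histogram and the mean/median comparison tell a consistent story.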

TWO WAYS TO RESTRICT YOUR SAMPLE

Sample only values above the mean
3. Create a histogram of test scores for districts with per pupil expenditures greater than or equal to the statewide average.
- QUICK>SAMPLE>IF conditional: exppup>=@mean(exppup)
- Now create a histogram for test scores. Does it differ?
o How many observations are in this sample?

[Figure: histogram of SCORE89 for the restricted sample]
Series: SCORE89
Sample: 1 127 IF EXPPUP>=@MEAN(EXPPUP,"@all")
Observations: 51
Mean          0.325873
Median        0.286800
Maximum       4.722100
Minimum      -2.804000
Std. Dev.     1.503898
Skewness      0.289054
Kurtosis      3.348888
Jarque-Bera   0.968855
Probability   0.616050

Return to looking at the full sample (e.g., smpl @all in the command bar).
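The Jarque-Bera line in this output can be reproduced from the reported skewness and kurtosis. The sketch below uses the standard formula $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, which under normality is approximately $\chi^2$ with 2 degrees of freedom; for 2 degrees of freedom the upper-tail probability is simply $e^{-JB/2}$. The helper name `jarque_bera` is hypothetical, and the inputs are copied from the printed output, so expect tiny rounding differences.

```python
import math

def jarque_bera(n: int, skewness: float, kurtosis: float) -> tuple[float, float]:
    """Return (JB statistic, p-value) from sample size, skewness and kurtosis."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For a chi-square with 2 degrees of freedom, P(X > jb) = exp(-jb / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Values copied from the SCORE89 output above (51 observations).
jb, p = jarque_bera(51, 0.289054, 3.348888)
print(f"JB={jb:.4f}, p={p:.4f}")
```

A large p-value like this one means the test does not reject normality of test scores in the restricted sample.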

Create a dummy variable, and sample only observations where dummy=1
4. Separately examine the histogram of test scores for districts with per pupil expenditures above the statewide median, and then for districts below the statewide median.
- Create a dummy variable that separates high- and low-spending school districts. In the command bar: genr highspend=(exppup>=@median(exppup))
- Now restrict the sample to include only high-spending districts, i.e., those at or above the statewide median:
- QUICK>SAMPLE>IF conditional: highspend=1
- OR, in the command bar: smpl @all if exppup>=@median(exppup)

5. What is the mean value of score89? Looking at the histogram, does most of the sample distribution lie above or below zero?
6. Compare this to the bottom half of the districts.
- Change the sample: QUICK>SAMPLE>IF: highspend=0
7. What is the mean value of score89? Looking at the histogram, does most of the sample distribution lie above or below zero? What might this imply about expenditure in school districts? Is this the result you expected?
Return to the full sample.
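The genr and sample steps above amount to building a 0/1 indicator and then averaging score89 within each group. The sketch below is a minimal Python analogue; the helper `group_means` and the six-district data are hypothetical, and the cutoff mirrors `exppup>=@median(exppup)`.

```python
from statistics import mean, median

def group_means(spend: list[float], score: list[float]) -> dict[int, float]:
    """Mean score for low (0) and high (1) spenders, split at the median spend."""
    cutoff = median(spend)
    # highspend = 1 when spending is at or above the median, else 0.
    highspend = [1 if s >= cutoff else 0 for s in spend]
    return {
        flag: mean(sc for sc, h in zip(score, highspend) if h == flag)
        for flag in (0, 1)
    }

# Hypothetical districts: spending per pupil and average standardized score.
spend = [1400, 1500, 1600, 1700, 1800, 1900]
score = [-0.8, -0.3, 0.1, 0.2, 0.6, 0.9]
print(group_means(spend, score))
```

Comparing the two group means is the numerical counterpart of comparing the two histograms in questions 5 to 7.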

Descriptive Statistics Continued: Correlation
1. What is the correlation between exppup and score89? Is the correlation higher or lower than you expected?
- Make sure you are using the entire sample.
- Click exppup, and CTRL + CLICK score89; then SHOW > OK.
- VIEW>COVARIANCE ANALYSIS> check the CORRELATIONS box.
- The off-diagonal elements give the correlation between exppup and score89.
Recall from class:

2. To generate a simple scatter plot of exppup and score89:
- Click exppup, and CTRL + CLICK score89; then SHOW.
- VIEW>GRAPH>SCATTER>SIMPLE SCATTER
3. Do you observe a positive or a negative relationship between the two variables of interest? Is this consistent with the correlation coefficient you obtained above?
4. Think back over what you've done so far in this lab. What appears to be the relationship between per pupil spending and standardized test scores? Does what you've observed thus far tell you anything about the direction of causality between these two variables?
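For reference, the sample correlation is $r_{xy} = \sum (x_i - \bar x)(y_i - \bar y) / \sqrt{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2}$. The sketch below computes it on hypothetical toy data, shows that it does not depend on the units of x, and cross-checks against the lab: in a simple regression with an intercept, $R^2 = r_{xy}^2$, so the R-squared of 0.127187 from the regression of score89 on exppup implies a correlation of about 0.357 (positive, since the estimated slope is positive). The function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (not the Alabama districts).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
r = pearson_r(x, y)

# Measuring x in different units (a positive rescaling) leaves r unchanged.
r_rescaled = pearson_r([1000 * xi for xi in x], y)
print(round(r, 4), round(r_rescaled, 4))

# Cross-check with the lab: one regressor plus an intercept gives R^2 = r^2,
# and the positive slope on exppup picks the positive square root.
implied_r = math.sqrt(0.127187)
print(round(implied_r, 3))
```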

Regression Analysis of the Effect of School Spending on Test Scores:
1. Write out the population regression function for a regression of test scores on per pupil expenditures.
2. Run a regression of test scores on per pupil expenditures.
- In the command bar: ls score89 c exppup
- ls: tells EViews we want to run an ordinary least squares regression.
- c: is for the constant. The c tells EViews to estimate an intercept term; if you leave it out, EViews will fit a line that passes through the origin.
- We're interested in the effect that spending has on scores, so score89 is the dependent variable here (it is listed first).
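To see what the c term does, the sketch below compares the OLS slope with an estimated intercept (as in `ls y c x`) to the slope of a line forced through the origin, $\tilde\beta_1 = \sum x_i y_i / \sum x_i^2$ (as in `ls y x`). The function names and the data, which lie exactly on $y = 10 + x$, are hypothetical.

```python
def slope_with_intercept(x: list[float], y: list[float]) -> float:
    """OLS slope when an intercept is also estimated (like `ls y c x`)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def slope_through_origin(x: list[float], y: list[float]) -> float:
    """OLS slope when the line is forced through (0, 0) (like `ls y x`)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical data lying exactly on y = 10 + 1 * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [11.0, 12.0, 13.0, 14.0]
print(slope_with_intercept(x, y))   # recovers the true slope of 1
print(slope_through_origin(x, y))   # about 4.33: distorted by the missing intercept
```

When the true intercept is not zero, dropping c pushes that intercept into the slope, which is why the lab always includes c.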

Interpreting Output:

Dependent Variable: SCORE89
Method: Least Squares
Date: 09/26/16   Time: 16:55
Sample: 1 127
Included observations: 127

Variable    Coefficient   Std. Error   t-statistic   Prob.
C           -4.105969     0.987363     -4.158519     0.0001
EXPPUP       0.002553     0.000598      4.267907     0.0000

R-squared            0.127187    Mean dependent var      0.083917
Adjusted R-squared   0.120204    S.D. dependent var      1.266603
S.E. of regression   1.188041    Akaike info criterion   3.198111
Sum squared resid    176.4302    Schwarz criterion       3.242902
Log likelihood      -201.0801    Hannan-Quinn criter.    3.216309
F-statistic          18.21503    Durbin-Watson stat      1.619126
Prob(F-statistic)    0.000039

3. Of particular interest is the estimated slope coefficient. What is it?
4. What are the R-squared and the sum of squared residuals (SSR)?
5. Can you determine the total sum of squares (SST) from this information? What is SSE?
Recall from class:
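Question 5 can be answered from the printed numbers alone. Since $R^2 = 1 - SSR/SST$ and $SST = SSE + SSR$, the sketch below backs out SST and SSE (SSE here is the explained sum of squares). As cross-checks, SST should also equal $(n-1)$ times the squared S.D. of the dependent variable, and with a single regressor the F-statistic equals the squared t-statistic. All inputs are copied from the output above, so small rounding differences are expected.

```python
n = 127
r_squared = 0.127187
ssr = 176.4302        # "Sum squared resid"
sd_y = 1.266603       # "S.D. dependent var"
t_exppup = 4.267907   # t-statistic on EXPPUP
f_stat = 18.21503     # "F-statistic"

sst = ssr / (1 - r_squared)       # from R^2 = 1 - SSR/SST
sse = sst - ssr                   # explained sum of squares
sst_check = (n - 1) * sd_y ** 2   # same quantity from the sample S.D.

print(f"SST={sst:.2f}, SSE={sse:.2f}, SST check={sst_check:.2f}")
print(f"R^2 = SSE/SST = {sse / sst:.6f}")
print(f"t^2={t_exppup ** 2:.3f} vs F={f_stat:.3f}")
```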

Regression Analysis Controlling for Income:
1. Now include a second variable on the right-hand side (RHS) of the regression model (rewrite the population model).
- The second RHS variable will be pcy.
- COMMAND: ls score89 c exppup pcy
2. Has the R-squared increased?
3. Did you expect that the R-squared would increase? Explain. What has happened to the coefficient on average expenditure per pupil? Interpret this result.

Dependent Variable: SCORE89
Method: Least Squares
Date: 09/26/16   Time: 17:04
Sample: 1 127
Included observations: 127

Variable    Coefficient   Std. Error   t-statistic   Prob.
C           -1.673325     0.861572     -1.942178     0.0544
EXPPUP      -0.000528     0.000622     -0.848403     0.3978
PCY          0.000243     3.05E-05      7.979017     0.0000

R-squared            0.423286    Mean dependent var      0.083917
Adjusted R-squared   0.413984    S.D. dependent var      1.266603
S.E. of regression   0.969606    Akaike info criterion   2.799484
Sum squared resid    116.5768    Schwarz criterion       2.866670
Log likelihood      -174.7672    Hannan-Quinn criter.    2.826781
F-statistic          45.50563    Durbin-Watson stat      1.850062
Prob(F-statistic)    0.000000
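Adding a regressor on the same sample can never lower R-squared, because OLS could always set the new coefficient to zero and reproduce the old fit. Adjusted R-squared, $\bar R^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ with $k$ slope regressors, charges for each extra variable. The sketch below reproduces both reported adjusted values and shows that both models share the same SST, so the drop in SSR from 176.4302 to 116.5768 is what raises R-squared. The helper name `adjusted_r2` is hypothetical; the inputs are copied from the two outputs.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 127
models = {
    "score89 on exppup": (0.127187, 176.4302, 1),
    "score89 on exppup, pcy": (0.423286, 116.5768, 2),
}
for name, (r2, ssr, k) in models.items():
    sst = ssr / (1 - r2)  # should be (nearly) identical across the two models
    print(f"{name}: adj R2={adjusted_r2(r2, n, k):.6f}, SST={sst:.2f}")
```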

Regression Analysis: Do the Assumptions Hold Up?
4. Before pcy was included, do you think the assumption that the x's and the u's were uncorrelated was realistic?
- i.e., are there things that could be correlated with expenditure in a district that might also affect test scores?
5. Explain the likely relationship between pcy and exppup, and how omitting pcy would likely affect the relationship between exppup and u.
- i.e., do you think the assumption that exppup and u are uncorrelated would still hold?
6. In light of this discussion, do you think the original estimate of the slope coefficient (when only exppup was on the RHS) was unbiased?

7. How does the ability to control for something like pcy improve your insight into the relationship between spending and test scores, compared with the starting analysis in this lab, where you just looked at correlations, scatterplots, etc.?
8. Remember that the error term captures the effect of all other factors that affect score89. Can you think of some other factors that might affect score89?
9. Are any of these likely to be correlated with exppup?
10. If so, does the coefficient estimate on exppup capture the true CAUSAL relationship between spending and test scores?
11. What could you do to get a better estimate of that causal relationship?
Next week we are going to talk about hypothesis testing, and whether these relationships are statistically significant.
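The two regressions in this lab are linked by an exact OLS identity (same sample, both with intercepts): if $\tilde\beta_1$ is the exppup slope from the simple regression, $\hat\beta_1$ and $\hat\beta_2$ are the slopes on exppup and pcy in the multiple regression, and $\tilde\delta_1$ is the slope from regressing pcy on exppup, then $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$. The lab did not run the pcy-on-exppup regression, so the sketch below only backs out the $\tilde\delta_1$ implied by the rounded coefficients printed above; treat the result as approximate.

```python
beta1_simple = 0.002553     # exppup slope, score89 on exppup only
beta1_multiple = -0.000528  # exppup slope, score89 on exppup and pcy
beta2_multiple = 0.000243   # pcy slope, score89 on exppup and pcy

# Rearranging beta1_simple = beta1_multiple + beta2_multiple * delta1:
delta1_implied = (beta1_simple - beta1_multiple) / beta2_multiple
omitted_term = beta2_multiple * delta1_implied

print(f"implied slope of pcy on exppup: {delta1_implied:.1f}")
print(f"omitted-variable term beta2 * delta1: {omitted_term:.6f}")
```

Both pieces come out positive: districts that spend more per pupil also tend to have higher per capita income, and income is positively associated with scores. That is the pattern consistent with upward omitted-variable bias in the simple-regression slope on exppup.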