Economics 345 Applied Econometrics
Lab 2: Simple Linear Regression
Prof: Martin Farnham
TAs: Rebecca Wortzman

Review from last lab

[Figure: normal distribution. From: http://www.six-sigma-material.com/normal-distribution.html]

$1 - 0.05/2 = 0.975$
$(1 - 0.95)/2 = 0.025$

[Table: normal distribution table, positive and negative values. From: https://www.scribd.com/doc/51960227/normal-distribution-table-positive-negative]
LAB 2: Looking at measurable student outcomes and school district expenditure.

DATA: alabama.wf1
FOUND IN: sfgclients on uvic\storage (S:). Browse to this drive and to the folder \social sciences\economics\econ 345\Wooldridge Eviews Files\

Introduction:
A perennial question among social scientists in general, and economists focusing on education in particular, is what effect school spending has on student outcomes. Measurable student outcomes we could focus on include test scores, dropout rates, wages upon graduation, college attendance rates, etc. In this lab we will focus on standardized test scores for reading and math administered to students in grades 8-9.

Recall from class, OLS estimation:

Important for empirical work:
- How can we interpret these estimates (what do estimates of Beta mean)?
- Do our data satisfy the assumptions we make when we do OLS?
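For reference, the OLS estimates for the simple regression model $y = \beta_0 + \beta_1 x + u$ can be written in the standard textbook form below. The notation ($\hat{\beta}_0$, $\hat{\beta}_1$, $\bar{x}$, $\bar{y}$) is an assumption and may differ slightly from what was used in class.

```latex
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
```

In this lab, $y$ is score89 and $x$ is exppup, so $\hat{\beta}_1$ is the estimated change in the standardized score associated with a one-unit increase in expenditure per pupil.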
Data:
Number of School Districts: 127 in the State of Alabama in the late 1980s.

Focusing on 3 variables today:
- score89: Average reading and math standardized test score for 8th-9th grade students. y = score89, our dependent variable. (In standard deviation units, i.e. each student's score is expressed as a number of standard deviations from the mean. In other words, the score has been standardized (as we saw in lecture) to have a mean of zero and a standard deviation of 1.)
- exppup: Average expenditure per pupil in the district. x = exppup, our independent variable (or our explanatory variable).
- pcy: Per capita income in the district. This is our control variable, i.e. it lets us ask: what is the effect of expenditure, holding per capita income fixed?
Some Descriptive Statistics: Means, medians and histograms

1. What are the mean and median test score (score89) and average expenditure per pupil (exppup)?
   o RECALL, you can group these two variables and look at the statistics at the same time.
   What does a large difference between mean and median imply?
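To see how the mean and the median can diverge, here is a minimal Python sketch. The numbers are hypothetical and are not taken from alabama.wf1; the variable names are made up for illustration.

```python
from statistics import mean, median

# Hypothetical values, NOT taken from alabama.wf1.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 12]  # one unusually large observation

# In the symmetric list the mean and the median coincide.
sym_mean, sym_median = mean(symmetric), median(symmetric)

# In the skewed list the large value pulls the mean up; the median barely moves.
skew_mean, skew_median = mean(right_skewed), median(right_skewed)

print("symmetric:   ", sym_mean, sym_median)
print("right-skewed:", skew_mean, skew_median)
```

Comparing the two printed lines with the histograms you build next is one way to check your reading of the EViews statistics.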
2. What is the distribution for test scores and expenditure per pupil?
   o (i.e. create a histogram for both of these variables)
   What can we say about the respective distributions? How do they differ?
TWO WAYS TO RESTRICT YOUR SAMPLE

Sample only values above the mean

3. Create a histogram of test scores for districts with per pupil expenditures that are greater than or equal to the statewide average.
- QUICK>SAMPLE>IF conditional: exppup>=@mean(exppup)
- Now create a histogram for test scores. Does it differ?
   o How many observations are in this sample?

[Figure: histogram of SCORE89. Sample 1 127 IF EXPPUP>=@MEAN(EXPPUP,"@all")]

Series: SCORE89
Observations    51
Mean            0.325873
Median          0.286800
Maximum         4.722100
Minimum        -2.804000
Std. Dev.       1.503898
Skewness        0.289054
Kurtosis        3.348888
Jarque-Bera     0.968855
Probability     0.616050

Return to looking at the full sample.
Create a dummy variable, and sample only observations where dummy=1

4. Separately examine the histogram of test scores for districts that have per pupil expenditures above the statewide median, and then for districts below the statewide median.
- Create a dummy variable that separates high and low spending school districts
   o In the Command bar: genr highspend=(exppup>=@median(exppup))
- Now, restrict our sample to include only high-spending school districts, i.e. those at or above the statewide median
   o QUICK>SAMPLE>IF conditional: highspend=1
   o OR, in the Command bar: smpl @all if exppup>=@median(exppup)
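The two ways of restricting a sample described above can be sketched outside EViews as follows. This is an illustration in Python under stated assumptions: the six districts and their values are hypothetical, not from alabama.wf1, and the list names simply mirror the EViews series names.

```python
from statistics import mean, median

# Hypothetical district data, NOT from alabama.wf1.
exppup = [1400, 1500, 1600, 1700, 1800, 2500]
score89 = [-0.9, -0.2, 0.1, -0.1, 0.4, 1.0]

# Way 1: restrict the sample directly with a condition
# (the analogue of QUICK>SAMPLE>IF with exppup>=@mean(exppup)).
cutoff = mean(exppup)
above_mean_scores = [s for x, s in zip(exppup, score89) if x >= cutoff]

# Way 2: build a 0/1 dummy first, then sample on the dummy
# (the analogue of genr highspend=(exppup>=@median(exppup))).
med = median(exppup)
highspend = [int(x >= med) for x in exppup]
high_scores = [s for d, s in zip(highspend, score89) if d == 1]
low_scores = [s for d, s in zip(highspend, score89) if d == 0]

print(len(above_mean_scores), "districts at or above the mean")
print(len(high_scores), "districts at or above the median")
```

Note that the mean and the median generally split the sample differently, so the two restricted samples need not contain the same number of observations.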
5. What is the mean value of score89? Looking at the histogram, is most of the sample distribution lying above or below zero?
6. Compare this to the bottom half of the districts.
- Change sample: QUICK>SAMPLE>IF: highspend=0
7. What is the mean value of score89? Looking at the histogram, is most of the sample distribution lying above or below zero? What might this imply about expenditure in school districts? Is this the result you expected?

Return to the full sample.
Descriptive Statistics Continued: Correlation

1. What is the correlation between exppup and score89? Is the correlation higher or lower than you expected?
- Make sure you have the entire sample
- Click exppup, and CTRL + CLICK score89
- SHOW>OK
- VIEW>COVARIANCE ANALYSIS> check the CORRELATIONS box
- The off-diagonal elements give the correlation between exppup and score89.

Recall from class:
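The sample correlation is the sample covariance divided by the product of the two sample standard deviations. A minimal Python sketch of that calculation follows; the data are hypothetical (not from alabama.wf1) and the helper `cov` is written here for illustration only.

```python
from math import sqrt

# Hypothetical data, NOT from alabama.wf1.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

def cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

# Correlation = Cov(x, y) / (sd(x) * sd(y)); note cov(x, x) is the variance of x.
r_xy = cov(x, y) / sqrt(cov(x, x) * cov(y, y))

# The 2x2 correlation matrix: ones on the diagonal, r_xy off the diagonal.
corr_matrix = [[cov(u, v) / sqrt(cov(u, u) * cov(v, v)) for v in (x, y)]
               for u in (x, y)]

print(r_xy)
for row in corr_matrix:
    print(row)
```

The matrix has the same layout as the EViews correlation view: each variable is perfectly correlated with itself, and the off-diagonal entries are the number of interest.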
2. Generate a simple scatter plot of exppup and score89.
- Click exppup, and CTRL + CLICK score89: SHOW
- VIEW>GRAPH>SCATTER>SIMPLE SCATTER
3. Do you observe a positive or a negative relationship between the two variables of interest? Is this consistent with the correlation coefficient you obtained above?
4. Think back now over what you've done so far in this lab. What appears to be the relationship between per pupil spending and standardized test scores? Does what you've observed thus far tell you anything about the direction of causality between these two variables?
Regression Analysis of the Effect of School Spending on Test Scores:

1. Write out the population regression function for a regression of test scores on per pupil expenditures.
2. Run a regression of test scores on per pupil expenditures.
- In the Command bar: ls score89 c exppup
   o ls: tells EViews we want to do an ordinary least squares regression.
   o c: is for constant. The c is needed to tell EViews to estimate an intercept term. If you leave out the c, EViews will fit a line that passes through the origin.
   o We're interested in the effect that spending has on scores, so score89 is the dependent variable here and is listed first.
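To see why the c matters, the following Python sketch fits the same data with and without an intercept. It is an illustration only: the four data points are hypothetical and were constructed to lie exactly on the line y = 2 + 0.5x, and the variable names are assumptions.

```python
# Hypothetical data lying exactly on y = 2 + 0.5 * x (NOT from alabama.wf1).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 3.0, 3.5, 4.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# With an intercept (the analogue of: ls y c x).
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Without an intercept (the analogue of: ls y x): the line is forced through (0, 0).
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print("with intercept:   ", intercept, slope)
print("through the origin:", slope_origin)
```

Forcing the line through the origin changes the slope estimate whenever the true intercept is not zero, which is why the lab always includes c.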
Interpreting Output:

Dependent Variable: SCORE89
Method: Least Squares
Date: 09/26/16   Time: 16:55
Sample: 1 127
Included observations: 127

Variable    Coefficient    Std. Error    t-statistic    Prob.
C           -4.105969      0.987363      -4.158519      0.0001
EXPPUP       0.002553      0.000598       4.267907      0.0000

R-squared             0.127187     Mean dependent var       0.083917
Adjusted R-squared    0.120204     S.D. dependent var       1.266603
S.E. of regression    1.188041     Akaike info criterion    3.198111
Sum squared resid     176.4302     Schwarz criterion        3.242902
Log likelihood       -201.0801     Hannan-Quinn criter.     3.216309
F-statistic           18.21503     Durbin-Watson stat       1.619126
Prob(F-statistic)     0.000039

3. Of particular interest is the estimate produced of the slope coefficient. What is the coefficient?
4. What is the R-squared and the Sum of Squared Residuals (SSR)?
5. Can you determine the total sum of squares (SST) from this information? What is SSE?

Recall from class:
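One way to work through question 5 is sketched below in Python. It assumes the Wooldridge naming convention (SST = SSE + SSR, where SSE is the explained sum of squares and SSR is the residual sum of squares) and the identity R-squared = 1 - SSR/SST; check these against your class notes. The input numbers are copied from the EViews output above; the variable names are made up for illustration.

```python
# Values read off the regression output above.
r_squared = 0.127187
ssr = 176.4302            # "Sum squared resid"
sd_dep_var = 1.266603     # "S.D. dependent var"
n = 127                   # "Included observations"

# R^2 = 1 - SSR/SST  =>  SST = SSR / (1 - R^2)
sst = ssr / (1.0 - r_squared)

# SST = SSE + SSR  =>  SSE = SST - SSR  (assumed convention: SSE = explained)
sse = sst - ssr

# Cross-check: the sample variance of y is SST / (n - 1).
sst_check = (n - 1) * sd_dep_var ** 2

print(round(sst, 2), round(sse, 2), round(sst_check, 2))
```

The cross-check uses only the reported standard deviation of the dependent variable, so agreement between the two SST values is a sign that the identities have been applied correctly.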
Regression Analysis controlling for Income:

1. Now include a second variable on the right-hand-side (RHS) of the regression model (rewrite the population model).
- The second RHS variable will be pcy.
- COMMAND: ls score89 c exppup pcy
2. Has the R-squared increased?
3. Did you expect that the R-squared would increase? Explain. What has happened to the coefficient on average expenditure per pupil? Interpret this result.

Dependent Variable: SCORE89
Method: Least Squares
Date: 09/26/16   Time: 17:04
Sample: 1 127
Included observations: 127

Variable    Coefficient    Std. Error    t-statistic    Prob.
C           -1.673325      0.861572      -1.942178      0.0544
EXPPUP      -0.000528      0.000622      -0.848403      0.3978
PCY          0.000243      3.05E-05       7.979017      0.0000

R-squared             0.423286     Mean dependent var       0.083917
Adjusted R-squared    0.413984     S.D. dependent var       1.266603
S.E. of regression    0.969606     Akaike info criterion    2.799484
Sum squared resid     116.5768     Schwarz criterion        2.866670
Log likelihood       -174.7672     Hannan-Quinn criter.     2.826781
F-statistic           45.50563     Durbin-Watson stat       1.850062
Prob(F-statistic)     0.000000
Regression Analysis: do the assumptions hold up?

4. Prior to the inclusion of pcy, do you think the assumption that the x's and the u's were uncorrelated was realistic?
- i.e. are there things that could be correlated with expenditure in a district that also might affect test scores?
5. Explain the likely relationship between pcy and exppup, and how the omission of pcy would be likely to affect the relationship between exppup and u.
- i.e. do you think the assumption that exppup and u are uncorrelated would hold?
6. In light of this discussion, do you think the original estimate of the slope coefficient you obtained (when only exppup was on the RHS) was unbiased?
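The questions above concern what happens when a relevant variable is left out of a regression. The Python sketch below illustrates the mechanism on deliberately artificial data: the eight observations are hypothetical (not from alabama.wf1), and the score is constructed to depend only on income, never on spending. All names are assumptions made for illustration.

```python
# Synthetic, hypothetical data: NOT the Alabama data set.
income = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
wiggle = [0.3, -0.2, 0.1, -0.4, 0.4, -0.1, 0.2, -0.3]  # spending variation
spend = [2.0 + 0.2 * z + w for z, w in zip(income, wiggle)]  # richer => more spending
score = [-2.3 + 0.2 * z for z in income]  # by construction: no spending effect

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_simple(y, x):
    """OLS slope from regressing y on a constant and x."""
    yd, xd = demean(y), demean(x)
    return dot(xd, yd) / dot(xd, xd)

def slopes_two(y, x1, x2):
    """OLS slopes from regressing y on a constant, x1 and x2 (normal equations)."""
    yd, a, b = demean(y), demean(x1), demean(x2)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    say, sby = dot(a, yd), dot(b, yd)
    det = saa * sbb - sab ** 2
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det

short_spend = slope_simple(score, spend)                 # income omitted
long_spend, long_income = slopes_two(score, spend, income)  # income controlled for

print("spending coefficient, income omitted: ", round(short_spend, 3))
print("spending coefficient, income included:", round(long_spend, 3))
print("income coefficient:                   ", round(long_income, 3))
```

In this constructed example the short regression attributes income's effect to spending, and the spending coefficient falls to zero once income is held fixed. Whether something similar is happening in the Alabama regressions is for you to argue from the output.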
7. How does the ability to control for something like pcy improve your insight into the relationship between spending and test scores, when compared with the starting analysis in this lab, where you just looked at correlations, scatterplots, etc.?
8. Remember that the error term captures the effect of all other factors that affect score89. Can you think of some other factors that might affect score89?
9. Are any of these likely to be correlated with exppup?
10. If so, then does the coefficient estimate on exppup capture the true CAUSAL relationship between spending and test scores?
11. What could you do to get a better estimate of that causal relationship?

Next week we are going to talk about hypothesis testing, and whether these relationships are statistically significant.