Business Statistics 41000: Simple Linear Regression


1 Business Statistics 41000: Simple Linear Regression Drew D. Creal University of Chicago, Booth School of Business March 7 and 8,

2 Class information Drew D. Creal Office: 404 Harper Center Office hours: email me for an appointment Office phone:

3 Course schedule Week # 1: Plotting and summarizing univariate data Week # 2: Plotting and summarizing bivariate data Week # 3: Probability 1 Week # 4: Probability 2 Week # 5: Probability 3 Week # 6: In-class exam and Probability 4 Week # 7: Statistical inference 1 Week # 8: Statistical inference 2 Week # 9: Simple linear regression Week # 10: Multiple linear regression 3

4 Outline of today's topics I. Motivation: why regression? II. The simple linear regression model III. Interpretation of the regression parameters IV. Regression as a model of P(Y = y X = x) V. Estimation of the regression parameters VI. Plug-in prediction VII. Confidence Intervals and Hypothesis Tests for the Regression Parameters VIII. Fits, residuals, and R-squared IX. Application: The Market Model 4

5 Motivation: why regression? 5

6 Motivation: why regression? Regression is a useful tool for many reasons. The most important are: Prediction/forecasting Measuring dependence (e.g. correlation) between variables. 6

7 Motivation: why regression? Consider the housing data (MidCity.xls). There are two numeric variables: the price of each home and its size in square feet. 7

8 Motivation: why regression? Consider only the data on housing prices. For the moment, assume we do not observe house sizes. After Lectures # 3-7, we recognize that our sample of n = 128 homes is exactly that...a sample. What if we are interested in learning about the population and want to account for sampling uncertainty? After all, we could have observed a different sample of houses. How could we model the price of a house using probability? 8

9 Motivation: why regression? We can define a r.v. Y i to be the price of the i-th house! It is a r.v. because before the house is sold we do not know how much it will sell for. We treat our data (y 1, y 2,..., y n ) as the outcome of a sequence of r.v.'s (Y 1, Y 2,..., Y n ). We can model this data with an i.i.d. Normal model: p(y 1, y 2,..., y n ) = p(y 1 ) p(y 2 ) p(y n ) In other words, each observation is a normal r.v.: p(y i ) = N(µ, σ 2 ) 9

10 Motivation: why regression? Suppose you are going to sell your house in this city. You are interested in the average price µ of a house. Following Lectures #7-8, we can perform statistical inference for the parameters of interest, e.g. E[Y ] = µ. For example, we could use x̄ as an estimator of µ. We can use x̄ to predict what the house Y i will sell for. But, we are ignoring information about house sizes!! We are currently only looking at the marginal distribution P(Y = y). 10

11 Motivation: why regression? To incorporate information on house sizes, we need to define a second r.v. X i, which is the size of the i-th house. In reality, we not only observe outcomes of the r.v.'s (Y 1, Y 2,..., Y n ); we observe outcomes on the pairs (X i, Y i ). We believe that a house's size is clearly related to its price. Which distribution are we interested in? 1. P(Y = y X = x) 2. P(Y = y, X = x) 11

12 Motivation: why regression? Which distribution are we interested in? 1. P(Y = y X = x) 2. P(Y = y, X = x) Since we believe that a house's size is clearly related to its price, we think that P(Y = y X = x) ≠ P(Y = y). KEY POINT: Regression provides a simple way to model the conditional distribution of Y given X = x. This allows us to incorporate our information on house sizes. It may help us improve our prediction of house prices. 12

13 Simple Linear Regression (NOTE: The term simple linear regression means we are looking at a relationship between two variables. In Lecture # 10, we will do multiple linear regression (one y, lots of x's).) 13

14 Simple Linear Regression When doing regression, we care about the conditional distribution P(Y = y X = x) = p(y x). We use the following terminology: Y is the dependent variable. X is the independent variable, the explanatory variable, or sometimes just the regressor. 14

15 Simple Linear Regression Consider modeling the house prices y i as an approximate linear function of their size x i : y i = b + m x i + error. We need the errors because this linear relationship is not exact. y depends on other things besides x that we don't observe in our sample. 15

16 Simple Linear Regression Why are we approaching the problem in this way? Here are three reasons. 1. Sometimes you know x and just need to predict y, as in the housing price problem from the homework. 2. The conditional distribution is an excellent way to think about the relationship between two variables. 3. Linear relationships are easy to work with and are a good approximation in lots of real world problems. 16

17 Simple Linear Regression The simple linear regression model is Y i = α + βx i + ε i. ε i N(0, σ 2 ) i.i.d. ε i is independent of X i. The intercept is α. The slope is β. We use the normal distribution to describe the errors. 17
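
The model on this slide can be simulated in a few lines of Python. This is a sketch with hypothetical parameter values (alpha = 20, beta = 70, sigma = 15, sizes in thousands of square feet), not the lecture's housing estimates:

```python
import numpy as np

# A sketch of simulating the simple linear regression model
# Y_i = alpha + beta * X_i + eps_i,  eps_i ~ iid N(0, sigma^2),
# with the errors drawn independently of x. Parameter values are hypothetical.
rng = np.random.default_rng(0)
alpha, beta, sigma = 20.0, 70.0, 15.0   # assumed "true" parameters
n = 128
x = rng.uniform(1.0, 3.0, size=n)       # e.g. house sizes in 1000s of sq ft
eps = rng.normal(0.0, sigma, size=n)    # normal errors, independent of x
y = alpha + beta * x + eps              # implied prices
```

Fitting a line to (x, y) simulated this way should recover values near alpha and beta, up to sampling error.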

18 Simple Linear Regression: Remarks The parameters of our model are α, β, and σ. The slope β measures the change in y when x increases by 1 unit. The intercept α is the value y takes when x = 0. The linear relationship holds for each pair (X i, Y i ). Consequently, it is common to drop the subscripts and write Y = α + βx + ε instead of Y i = α + βx i + ε i. The assumption that X is independent of ε is important. It implies that they are uncorrelated. 18

19 Simple Linear Regression: Remarks The parameters of our model are α, β, and σ. This is just like p from the i.i.d. Bernoulli(p) model or µ from the i.i.d. Normal(µ, σ 2 ) model. Just like p and µ, the true parameters are unknown when using real data. Remember how the parameter p had a natural interpretation in the voting example, i.e. it was the fraction of the population that voted Democrat. Similarly, we interpret α and β as true population parameters. How realistic this is may depend on the setting. 19

20 Interpretation of the regression parameters α, β, and σ 20

21 IMPORTANT Given a specific value X = x, how do we interpret α, β, and σ? β tells us: if the value we saw for X was one unit bigger, how much would our prediction for Y change? α tells us: what would we predict for Y if x = 0? σ tells us: if α + βx is our prediction for Y given x, how big is the error associated with this prediction? 21

22 Simple Linear Regression Here is a picture of our model. We are simply drawing a line through the data. α is the intercept. Y The error ε i for this observation This is the true relationship between Y and X without the errors Y = α+ β X The intercept α X 22

23 Simple Linear Regression β measures the slope of the line. Y The green bar is β which is what happens to Y for a 1 unit change in X. The blue bar is a 1 unit change in X. X 23

24 Simple Linear Regression How do we get y 1 given a specific value for X 1 = x 1? We have y 1 = α + β x 1 + ε 1, where ε 1 was a draw from a normal distribution N(0, σ 2 ). In the figure, the green dashed line is the realized value of ε 1, the vertical distance between y 1 and the line at α + β x 1. 24

25 Simple Linear Regression Each ε i is i.i.d. N(0, σ 2 ). The variance σ 2 measures the spread of the normal distribution, i.e. the size of our errors. Y Each ε i is an independent draw from a normal distribution N(0,σ 2 ) X 25

26 Simple Linear Regression In practice, we only observe the data! Y We have to estimate the unknown true values α and β. X 26

27 Simple Linear Regression What role does the variance σ 2 play? The variance of the error term σ 2 describes how big the errors are on average. When σ 2 is smaller (right) the data are closer to the true regression line. 27

28 Simple Linear Regression What role does the variance σ 2 play? The variance will determine how wide (or narrow) our predictive intervals are. 28

29 Regression as a model of P(Y = y X = x) 29

30 Simple Linear Regression Regression looks at the conditional distribution of Y given X. Instead of coming up with a story for the joint distribution p(x, y): What do I think the next (x, y) pair will be? Regression just talks about the conditional distribution p(y x): Given a value for x, what will the next y be? 30

31 Regression as a model of P(Y = y X = x) Our model is: Y = α + βx + ε ε N(0, σ 2 ) where ε is independent of X. Regression is a model for the conditional distribution P(Y = y X = x). What are the mean and variance of the conditional distribution? E[Y X = x] V[Y X = x] 31

32 Regression as a model of P(Y = y X = x) Since our model is linear Y = α + βx + ε ε N(0, σ 2 ) we can use our formulas for linear functions! First, we can compute the conditional mean E[Y X = x] = E[α + βx + ε X = x] = α + βx + E[ε X = x] = α + βx 32

33 Regression as a model of P(Y = y X = x) Since our model is linear Y = α + βx + ε ε N(0, σ 2 ) we can use our formulas for linear functions! And, we can compute the conditional variance V[Y X = x] = V[α + βx + ε X = x] = V[ε X = x] = σ 2 33

34 Regression as a model of P(Y = y X = x) Another way of thinking about our model is: P(Y X = x) = N(α + βx, σ 2 ) In other words, Y X = x N(α + βx, σ 2 ) The conditional distribution of Y is normal with mean: E[Y X = x] = α + βx variance: V[Y X = x] = σ 2 34

35 Prediction using P(Y = y X = x) Suppose for the moment we know α, β, and σ. Given a specific value for X = x and our model, what is our prediction of Y? Y X = x N(α + βx, σ 2 ) Our prediction is the mean: α + βx Since Y has a (conditional) normal distribution, we know that there is a 95% probability that the observed y will be within 2σ of this prediction. 35

36 Prediction using P(Y = y X = x) Given a specific value for X = x, we can predict. Our prediction is y = α + βx, and the red line from α + βx - 2σ to α + βx + 2σ is a 95% predictive interval, i.e. the empirical rule. 36

37 Prediction using P(Y = y X = x) Consider two different values x 1 and x 2. Note that since σ 2 is the same for both, the size of the two 95% predictive intervals (α + βx - 2σ, α + βx + 2σ) is the same. 37

38 Prediction using P(Y = y X = x) Important. 1. The width of the prediction interval produced from P(Y = y X = x) will (typically) be smaller than P(Y = y). 2. The variance of the conditional distribution P(Y = y X = x) cannot be larger than the variance of P(Y = y). 3. Using information on X will help us predict Y. 4. We can see this visually on the house price data (next slide). 38

39 Prediction using P(Y = y X = x) Looking at the scatterplot of price against size: What would a 95% prediction interval look like using only housing prices? What does a 95% prediction interval look like if x i = 2200? 39

40 Prediction using P(Y = y X = x) Given a specific value for x, our prediction is the conditional mean α + βx and with 95% probability the observed value y will lie in the interval (α + βx - 2σ, α + βx + 2σ). In practice, we do not know the true parameters α, β, and σ. We have to estimate them from the observed data! 40

41 Estimation of the regression parameters α, β, and σ 41

42 Estimates In Lectures #7 and #8, we investigated two models: the i.i.d. Bernoulli(p) model with unknown parameter p and the i.i.d. Normal(µ, σ 2 ) model with unknown parameter µ. We chose estimators and considered the sampling distributions of the estimators. For the i.i.d. Bernoulli(p) model, we used ˆp as an estimator of the unknown parameter p. For the i.i.d. Normal(µ, σ 2 ) model, we used x̄ as an estimator of the unknown parameter µ. The goal in this section is to find estimators for α, β, and σ. 42

43 Estimates Simple linear regression assumes a linear relationship between Y and X : Y = α + βx + ε ε N(0, σ 2 ) where ε is independent of X. In practice, we don t know α, β and σ. They are unknown parameters in our model. We have to estimate them using the data we see! We have already seen the estimators in Lecture #2! 43

44 Linear regression formulas We saw in Lecture #2 that the estimators for α and β are slope: ˆβ = s_xy / s_x² = r_xy (s_y / s_x) intercept: ˆα = ȳ - ˆβ x̄ The formulas for the slope and intercept just use the sample mean, sample covariance, and sample variance. In a moment, I will show you how we got these formulas. What are the units of ˆα and ˆβ? 44
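
These formulas can be sketched directly in Python on synthetic data (the generating values 20, 70, 15 are assumptions for illustration); note that the two forms of the slope estimator agree:

```python
import numpy as np

# Sketch: least squares estimates from sample moments (synthetic data;
# the data-generating values are assumptions for illustration).
rng = np.random.default_rng(1)
x = rng.normal(2.0, 0.5, size=200)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=200)

s_xy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance s_xy
s2_x = np.var(x, ddof=1)                     # sample variance s_x^2
beta_hat = s_xy / s2_x                       # slope: s_xy / s_x^2
alpha_hat = y.mean() - beta_hat * x.mean()   # intercept: ybar - slope * xbar

# Equivalent form of the slope: r_xy * (s_y / s_x)
r_xy = np.corrcoef(x, y)[0, 1]
beta_hat2 = r_xy * y.std(ddof=1) / x.std(ddof=1)
```

Either form gives the same fitted line, and both match what a canned least squares routine returns.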

45 Estimates The results for a regression of house price on house size. We will discuss all the output throughout the lecture. 45

46 How do we interpret the estimates? You can (and should!) interpret ˆβ as saying a house that is 1000 square feet larger sells for about $70,000 more. You probably should not interpret ˆα as saying the price of a house of size zero is -$10,000. Are there any houses of size zero in our data? Would we want to use this data to predict the price of an 8,000 square foot mansion? 46

47 Estimates Given our estimates of ˆα and ˆβ, these determine a new regression line y = ˆα + ˆβx which is called the fitted regression line. Remember that due to sampling error, our estimate ˆα is not going to be exactly equal to α and our estimate ˆβ is not going to be exactly equal to β. Consequently, the fitted regression line is not going to be exactly equal to the true regression line: y = α + βx 47

48 The Fitted Regression Line What does the fitted regression line look like? On real data, we can t see the true line. 200 Price Fitted regression line y = x House size 48

49 The Fitted Regression Line On simulated data, we can see that the fitted regression line is not the same as the true line. Y Unobserved true line y = α+ β x Fitted line y = ^α + ^β x based on our estimates ^α and ^β X 49

50 How did we get the estimators ˆα and ˆβ? The fitted regression line y = ˆα + ˆβx is not going to be exactly equal to the true regression line y = α + βx However, we would like to choose ˆα and ˆβ to make them close! One way of doing this is called least squares. 50

51 Linear regression formulas Define the residual as e i = y i - (ˆα + ˆβ x i ). The residual is the distance between the observed y i and the corresponding point on our fitted line ˆα + ˆβ x i. (NOTE: We will discuss these concepts in more detail below.) ˆα and ˆβ are the least squares estimates of α and β. Using calculus we can show that the estimates ˆα and ˆβ minimize the function SSR = Σ_{i=1}^n (y i - ˆα - ˆβ x i )² where SSR stands for the sum of squared residuals. 51
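
One way to see the least squares property is to check numerically that perturbing the closed-form estimates can only increase the SSR (synthetic data; the generating values are illustrative assumptions):

```python
import numpy as np

# Numerical check on synthetic data: the closed-form estimates minimize
# SSR, so perturbing either coefficient can only increase it.
rng = np.random.default_rng(7)
x = rng.uniform(1.0, 3.0, size=50)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=50)

def ssr(a, b):
    """Sum of squared residuals for the line y = a + b x."""
    return np.sum((y - a - b * x) ** 2)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
```

Because SSR is a convex quadratic in (a, b), the moment-based estimates sit at its global minimum.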

52 Estimates What do the residuals look like? 200 Price Fitted regression line y = x Three different residuals e i = y i x i House size 52

53 Linear regression formulas Our estimate of σ is just the sample standard deviation of the residuals e i : s e = sqrt( Σ_{i=1}^n e i ² / (n - 2) ) = sqrt( Σ_{i=1}^n (y i - ˆα - ˆβ x i )² / (n - 2) ) Here we divide by n - 2 instead of n - 1 for the same technical reasons (to get an unbiased estimator). s e just asks: on average, how far are our observed values y i away from the line we fitted? 53
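
A short sketch of this estimator on synthetic data (the true parameter values below are assumptions); note the divisor n - 2 and that least squares residuals average to zero:

```python
import numpy as np

# Sketch of the estimate of sigma on synthetic data (true sigma assumed 15):
# divide the sum of squared residuals by n - 2, then take the square root.
rng = np.random.default_rng(2)
n = 150
x = rng.uniform(1.0, 3.0, size=n)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=n)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
e = y - (alpha_hat + beta_hat * x)        # residuals
s_e = np.sqrt(np.sum(e ** 2) / (n - 2))   # divide by n - 2, not n or n - 1
```

With a moderately large sample, s_e should land near the assumed sigma of 15.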

54 Estimate for σ Excel automatically prints out the estimate of σ. 54

55 Plug-in prediction 55

56 Prediction Earlier when we knew the true values α, β, and σ, we stated the conditional distribution of Y as Y X = x N(α + βx, σ 2 ) Using this, we formed a 95% prediction interval: α + βx ± 2σ. Given our least squares estimates ˆα, ˆβ, and s e, we can form a 95% prediction interval by plugging in our estimates: (ˆα + ˆβx - 2s e, ˆα + ˆβx + 2s e ). 56

57 Prediction Given ˆα, ˆβ, and s e, we can get 95% prediction intervals: the fitted line y = ˆα + ˆβ x with bands at y = ˆα + ˆβ x + 2 s e and y = ˆα + ˆβ x - 2 s e, plotted over price against house size. 57

58 Prediction Suppose x = 2.2. Then we compute ˆα + ˆβ x and 2 s e and form the interval. For x = 2.2, the interval is (99.46, ). 58

59 Summary: estimators and prediction The unknown parameters and their estimators: α is estimated by ˆα, β by ˆβ, and σ by s e. Given a value for x, the 95% plug-in predictive interval is ˆα + ˆβx ± 2s e 59
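
The plug-in interval can be wrapped in a small helper. The estimates below are illustrative round numbers in the spirit of the housing example (the intercept near -10 echoes the earlier interpretation slide), not the lecture's exact output:

```python
# Sketch of a plug-in 95% predictive interval. The estimates below are
# hypothetical placeholders, not the lecture's exact regression output.
alpha_hat, beta_hat, s_e = -10.0, 70.0, 14.0

def predict_interval(x):
    """Return (lower, point, upper): alpha_hat + beta_hat*x plus/minus 2*s_e."""
    point = alpha_hat + beta_hat * x
    return point - 2.0 * s_e, point, point + 2.0 * s_e

lo, pt, hi = predict_interval(2.2)   # e.g. a house of 2200 square feet
```

The interval has the same width 4 s_e at every x, which is exactly the equal-width feature noted on the two-interval slide.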

60 Confidence Intervals and Hypothesis Tests for α, β, and σ 60

61 Sampling distributions for ˆα and ˆβ Thus far, we have assumed that there exists a true linear relationship Y = α + βx + ε ε N(0, σ 2 ) for unknown parameters α, β, and σ. I have shown you the formulas for our estimators ˆα, ˆβ, and s e. Remember that our estimators are random variables. Why?? 61

62 Sampling distributions for ˆα and ˆβ Recall that we view our estimators ˆα, ˆβ and s e as random variables. For each possible sample of data that you might observe, you will likely have different values for ˆα, ˆβ, and s e. Sampling error! For example, there may be many possible samples on house prices and sizes that you could take resulting in different values for ˆα, ˆβ, and s e. 62

63 The Sampling Distribution of an Estimator The sampling distribution of an estimator is a probability distribution that describes all the possible values we might see if we could repeat our sample over and over again; i.e., if we could see other potential samples from the population we are studying. 63

64 Sampling distributions for ˆα and ˆβ When we view ˆα, ˆβ, and s e as estimators, they are random variables and each will have their own sampling distribution. It can be shown that (when n is large) the sampling distributions for ˆα and ˆβ are both normal distributions (due to the CLT). I won t derive the mathematical details of the sampling distributions here like I did for ˆp and x in Lecture #7. Nevertheless, we can construct standard errors and build confidence intervals for the true unknown parameters α, β, and σ just like we did for p and µ in Lectures #7 and #8. 64

65 Standard Errors for ˆα and ˆβ Let sˆα denote the standard error associated with the estimate ˆα. Let s ˆβ denote the standard error associated with the estimate ˆβ. 65

66 ASIDE: Unbiasedness As a side note, it can also be shown that ˆα and ˆβ are unbiased: E[ˆα X ] = α E[ ˆβ X ] = β Intuitively, our estimate can turn out to be too big or too small, but it is not systematically too high or too low. We will recover the true value on average. (NOTE: The expectation (or average) is being taken over hypothetical random samples we might observe from the model.) 66

67 Confidence Intervals for α and β We can also build confidence intervals for α and β. In practice, you will often see confidence intervals for α and β constructed using the Student's t distribution instead of the standard normal. The reasoning behind this is the same as when we standardized the estimator x̄ in Lecture #8. Again, we are standardizing the estimators ˆα and ˆβ to compute the test statistic. This means we are dividing them by the standard errors sˆα and s ˆβ, which need to be estimated from the data. 67

68 Confidence Intervals The 95% confidence interval for α is ˆα ± tval sˆα where tval = T.INV(0.05, n - 2) (NOTE: in Excel) The 95% confidence interval for β is ˆβ ± tval s ˆβ where tval = T.INV(0.05, n - 2) (NOTE: in Excel) Remember that if n > 30, the tval is roughly 2. 68

69 Confidence Intervals In the housing data (MidCity.xls), we have n = 128 observations. A 95% confidence interval for the slope β is: ˆβ ± 2 s ˆβ = 70.23 ± 2(9.43) = 70.23 ± 18.86 = (51.37, 89.09) This is pretty big. We aren't very certain of the true slope β. 69
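
The interval arithmetic here can be reproduced directly; the standard error 9.43 is as reported, and the point estimate 70.23 is the midpoint implied by the reported interval (51.37, 89.09):

```python
# Reproducing the confidence interval arithmetic for the housing slope.
# se_beta is as reported; beta_hat is the midpoint implied by the
# reported interval (51.37, 89.09).
beta_hat = 70.23
se_beta = 9.43
ci = (beta_hat - 2.0 * se_beta, beta_hat + 2.0 * se_beta)
```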

70 Confidence Intervals Excel automatically prints out the 95% confidence intervals for α and β. 70

71 ASIDE: Normality Assumption of ε Normality of the errors ε in the linear equation Y = α + βX + ε is not a crucial assumption. When the sample size n is large, the sampling distributions of the estimators ˆα and ˆβ will still be (approximately) normal distributions. This is because ˆα and ˆβ are just averages of y and x and we can apply the Central Limit Theorem. Even if ε is not normal, the confidence intervals will be approximately valid: 95% C.I. for α : ˆα ± 2 sˆα 95% C.I. for β : ˆβ ± 2 s ˆβ 71

72 Hypothesis Tests for α and β Using the sampling distributions of the estimators ˆα and ˆβ, we can also perform hypothesis tests. Let H 0 : α = α 0 or H 0 : β = β 0 be a null hypothesis in which you are interested (α 0 and β 0 are just numbers). In practice, we construct the test statistics using the standardized values. Consequently, we use the Student's t distribution as the sampling distribution of our test statistics. 72

73 Hypothesis Tests for α To test the null hypothesis H 0 : α = α 0 vs. H a : α ≠ α 0 we reject at the 5% level if |t| > tval, where we define t = (ˆα - α 0 ) / sˆα and tval = T.INV(0.05, n - 2) (NOTE: in Excel); otherwise we fail to reject. Remember: if n > 30, the tval is roughly 2, so we reject if |t| > 2. 73

74 Hypothesis Tests for β To test the null hypothesis H 0 : β = β 0 vs. H a : β ≠ β 0 we reject at the 5% level if |t| > tval, where we define t = (ˆβ - β 0 ) / s ˆβ and tval = T.INV(0.05, n - 2) (NOTE: in Excel); otherwise we fail to reject. Remember: if n > 30, the tval is roughly 2, so we reject if |t| > 2. 74
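
This decision rule can be sketched as a small function, using the n > 30 rule of thumb tval = 2. The housing numbers use the reported standard error 9.43 and the slope estimate implied by the interval (51.37, 89.09):

```python
# Sketch of the two-sided t-test decision rule at the 5% level, using
# the n > 30 rule of thumb tval = 2 from the slides.
def reject_h0(beta_hat, beta0, se_beta, tval=2.0):
    """Reject H0: beta = beta0 when |t| > tval; also return t."""
    t = (beta_hat - beta0) / se_beta
    return abs(t) > tval, t

# Housing example: test H0: beta = 0 with beta_hat = 70.23, se = 9.43.
reject, t = reject_h0(70.23, 0.0, 9.43)
```

Here t is far above 2, so H0: beta = 0 is rejected; a null closer to the estimate (say beta0 = 75) would not be.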

75 Hypothesis Tests for β IMPORTANT: The null hypothesis that: H 0 : β = 0 plays a very important role in regression analysis. Why? Remember, the conditional distribution of Y is Y X = x N(α + βx, σ 2 ) Consequently, if β = 0 then the conditional distribution of Y does not depend on X. This means that the random variables Y and X are independent (at least according to our model)!! 75

76 Hypothesis Tests Excel automatically prints out the t-tests for the null hypotheses that H 0 : α = 0 and H 0 : β = 0 versus the alternatives that they are not zero. 76

77 p-values Most regression packages automatically print out the p-values for the hypotheses that the intercept is 0 and that the slope is 0. That's the p-value column in the StatPro output. Is the intercept 0? p-value = .59, so we fail to reject. Is the slope 0? p-value = .0000, so we reject. From a practical standpoint, what does this mean? Rejecting H 0 : β = 0 means that we find evidence that square footage does significantly impact the housing price! 77

78 p-values How is Excel getting this p-value? For n greater than about 30, the t-stat can be interpreted as a z-value. Thus we can compute the p-value using the normal distribution. For example, we can compute the p-value for the intercept: t = (ˆα - α 0 ) / sˆα = -0.53 If we take this as our z-value, we get a p-value of 2*(1 - NORM.DIST(ABS(-0.53),0,1,1)) = 0.596 78
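
The same calculation can be replicated with the Python standard library, treating the t-stat of -0.53 as a z-value:

```python
from statistics import NormalDist

# Replicating the slide's normal-approximation p-value: for large n the
# t-stat is treated as a z-value, so the two-sided p-value is
# 2 * (1 - Phi(|t|)) under the standard normal CDF Phi.
t = -0.53
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))   # about 0.596
```

Since the p-value is far above 0.05, we fail to reject the hypothesis that the intercept is zero, matching the output slide.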

79 p-values Excel automatically prints out the p-values. 79

80 Fits, residuals, and R-squared 80

81 Fitted values and residuals Our model is Y = α + βX + ε ε N(0, σ 2 ) Conditional on a value x i, we think of each y i as a draw from Y i = α + β x i + ε i, where α + β x i is the part of y that depends on x and ε i is the part of y that has nothing to do with x. 81

82 Fitted values and residuals We want to ask, How well does X explain Y? We could think about this by breaking up Y into two parts: α + βx i (part that s explained by x) ε i (part that s NOT explained by x) But remember, we don t know α or β!! However, we can use our estimates ˆα and ˆβ to create estimates of these two parts for each observation in our sample. 82

83 Fitted values and residuals So let's suppose we have some data (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ) and we've run a regression. That is, we've computed the estimates ˆα, ˆβ, and s e. For each (x i, y i ) in the data, the unknown α + β x i is estimated by ˆα + ˆβ x i, and the unknown error ε i = y i - (α + β x i ) is estimated by the residual e i = y i - (ˆα + ˆβ x i ). 83

84 Fitted values and residuals Define two new variables ŷ i and e i as follows: ŷ i = ˆα + ˆβ x i and e i = y i - ŷ i. Notice that we have broken up each observation into two parts: y i = ŷ i + e i ŷ i is called the fitted value for the i-th observation. It is the part of y i that is explained by x i. e i is called the residual for the i-th observation. It is the part of y i that is left unexplained. 84

85 Fitted values and residuals What do e i and ŷ i look like? Y The residuals e i are the purple lines y = ^α + ^β x The fitted values ^y i are the dashed green lines. X 85

86 Fitted values and residuals Remember the residuals and fitted line for the housing data. 200 Price Fitted regression line y = x Three different residuals e i = y i x i House size 86

87 Least squares interpretation We stated earlier that ˆα and ˆβ are often called the least squares estimates of α and β. The line we are fitting through the data is the best fitting line because ˆα and ˆβ are chosen to minimize the function SSR = Σ_{i=1}^n (y i - ˆα - ˆβ x i )² where SSR stands for the sum of squared residuals. A by-product of this is that by construction our residuals e i will have nice properties. 87

88 Properties of the residual Two important properties of the residuals e i are: The sample mean of the residuals equals zero: ē = (1/n) Σ_{i=1}^n e i = 0. The sample correlation between the residuals e and the explanatory variable x is zero: cor(e, x) = 0. Let's see what this looks like graphically on the housing data. 88
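
Both properties can be verified numerically on synthetic data (the generating values are illustrative assumptions):

```python
import numpy as np

# Numerical check of the two residual properties: mean zero and zero
# sample correlation with x (synthetic data, illustrative parameters).
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 3.0, size=100)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=100)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
e = y - (alpha_hat + beta_hat * x)

mean_e = e.mean()                   # zero up to floating-point error
cor_e_x = np.corrcoef(e, x)[0, 1]   # zero up to floating-point error
```

These hold exactly (not just approximately) for least squares residuals, which is why the residual plot on the next slide is centered on zero with no trend.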

89 Properties of the residual This is the fitted regression line (left) and the residuals (right) Price 150 Residuals House size House size Notice how the residuals demonstrate no obvious pattern and visually look like they have mean zero. 89

90 Properties of the residual Consider another line that is NOT the least squares line: compare the least squares fitted line y = ˆα + ˆβ x with an alternative line. Notice how the residuals computed from this alternative line leave a clear downward-sloping pattern. 90

91 Properties of the residuals We know that cor(e, x) = 0, which means that: cor(e, x) = 0 implies cor(e, ˆα + ˆβ x) = 0, i.e. cor(e, ŷ) = 0. In other words, the sample correlation between residuals and fitted values is zero. Therefore, we now have the three properties: y i = ŷ i + e i ē = (1/n) Σ_{i=1}^n e i = 0. cor(e, ŷ) = 0. 91

92 Properties of the residuals Given y i = ŷ i + e i, we can show two more important properties. Notice that y i is a linear function of ŷ i and e i. Using the formulas for the sample mean and variance from Lecture # 2, we have: the sample mean of y equals the sample mean of ŷ (since ē = 0), and s 2 y = s 2 ŷ + s 2 e (since e and ŷ are uncorrelated). 92

93 Properties of the residuals What does the second property s 2 y = s 2 ŷ + s 2 e mean? Multiplying through by (n - 1): Σ_{i=1}^n (y i - ȳ)² = Σ_{i=1}^n (ŷ i - ȳ)² + Σ_{i=1}^n e i ² Intuitively, it says that the variance of our dependent variable y can be broken apart into two pieces: Σ_{i=1}^n (y i - ȳ)². This is the total variation in y. Σ_{i=1}^n (ŷ i - ȳ)². This is the variation in y explained by x. Σ_{i=1}^n e i ². This is the unexplained variation in y. 93
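
This decomposition can be checked numerically on synthetic data (illustrative generating values):

```python
import numpy as np

# Numerical check of the decomposition
# sum (y_i - ybar)^2 = sum (yhat_i - ybar)^2 + sum e_i^2
# on synthetic data with illustrative parameter values.
rng = np.random.default_rng(4)
x = rng.uniform(1.0, 3.0, size=80)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=80)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
yhat = alpha_hat + beta_hat * x
e = y - yhat

total = np.sum((y - y.mean()) ** 2)         # total variation in y
explained = np.sum((yhat - y.mean()) ** 2)  # variation explained by x
unexplained = np.sum(e ** 2)                # unexplained variation
```

The identity holds exactly because the residuals are uncorrelated with the fitted values, so the cross term vanishes.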

94 R-squared R² = explained variation / total variation = Σ_{i=1}^n (ŷ i - ȳ)² / Σ_{i=1}^n (y i - ȳ)² Intuitively, R² measures the amount of variation in y we can explain with x. It is always the case that 0 ≤ R² ≤ 1. The closer R-squared is to 1, the better the (in-sample) fit. 94

95 R-squared Excel automatically prints out these results, including the explained variation Σ_{i=1}^n (ŷ i - ȳ)² and the unexplained variation Σ_{i=1}^n e i ². 95

96 R-squared For simple linear regression (only one x), R-squared is the squared correlation between y and x! You can easily test this by going into Excel and computing the correlation between y and x. For example, in the table of correlations for our housing data, squaring the correlation between price and size reproduces the regression R². 96
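
A quick numerical check, on synthetic data, that R-squared equals the squared correlation in simple linear regression:

```python
import numpy as np

# Check that, with a single x, R-squared equals the squared sample
# correlation between y and x (synthetic data, illustrative parameters).
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 3.0, size=60)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=60)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
yhat = alpha_hat + beta_hat * x

r2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r_xy = np.corrcoef(x, y)[0, 1]   # r_xy squared should equal r2
```

This equivalence is special to the one-regressor case; with multiple x's, R-squared generalizes to the squared correlation between y and the fitted values.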

97 R-squared Excel automatically prints out the R 2. 97

98 Application: The Market Model 98

99 The Market Model In finance, a popular model is to regress stock returns against returns on some market index, such as the S&P 500. The slope of the regression line, referred to as beta, is a measure of how sensitive a stock is to movements in the market. Usually, a beta less than 1 means the stock is less risky than the market, equal to 1 same risk as the market and greater than 1, riskier than the market. 99
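
The market model regression can be sketched on simulated monthly returns; the "true" beta of 1.2, the return means, and the volatilities below are assumptions for illustration, not estimates from the GE data:

```python
import numpy as np

# Sketch of the market model on simulated monthly returns. The "true"
# beta of 1.2 and the return distributions are illustrative assumptions.
rng = np.random.default_rng(6)
n = 255
mkt = rng.normal(0.01, 0.04, size=n)                       # market returns
stock = 0.001 + 1.2 * mkt + rng.normal(0.0, 0.05, size=n)  # stock returns

# The estimated slope is the stock's beta: its sensitivity to the market.
beta_hat = np.cov(mkt, stock, ddof=1)[0, 1] / np.var(mkt, ddof=1)
alpha_hat = stock.mean() - beta_hat * mkt.mean()
```

A beta estimate above 1 suggests the stock amplifies market movements (riskier than the market); below 1, that it dampens them.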

100 The Market Model I have collected monthly data on General Electric (GE) and the S&P 500 from January 1989 to March 2010, plotted as GE returns against S&P 500 returns. 100

101 The Market Model The regression we are running is GE i = α + βs&p500 i + ε i ε i N(0, σ 2 ) Before we see the results, what do you think our estimate ˆβ will be? Do you think we will reject the hypothesis H 0 : β = 0 at the 5% level? How can we test the hypothesis H 0 : β = 1 at the 5% level? 101

102 The Market Model Here are the results of the regression from Excel: the summary measures (Multiple R, R-Square, StErr of Est), the ANOVA table, and the regression coefficients for the Constant and SP500 with their standard errors, t-values, p-values, and 95% confidence limits. Our estimates are ˆα = and ˆβ = ! 102

103 The Market Model Suppose we want to test the hypothesis H 0 : β = 0 at the 5% level. First, how do we interpret this test? t = (ˆβ - 0) / s ˆβ The critical value is: tval = T.INV(0.05,252) = Do we reject? What is the p-value? 103

104 The Market Model Excel reports the same value of the test statistic for this hypothesis. 104

105 The Market Model Suppose we want to test the hypothesis H 0 : β = 1 at the 5% level. First, how do we interpret this test? t = (ˆβ - 1) / s ˆβ = 3.53 The critical value is: tval = T.INV(0.05,252) = Do we reject the hypothesis? 105

106 The Market Model How do we construct 95% confidence intervals for α and β? Find the 95% critical value: tval = T.INV(0.05,252) = Our 95% confidence interval for α is then: ˆα ± tval sˆα Our 95% confidence interval for β is then: ˆβ ± tval s ˆβ = (1.1146, ) 106

107 The Market Model Excel reports the same values for the 95% confidence intervals. 107

108 The Market Model Here is a picture of the fitted regression line of GE returns against S&P 500 returns. 108

109 The Market Model Here is a picture of the residuals e i plotted against the S&P 500 returns. 109


More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

ABSORBENCY OF PAPER TOWELS

ABSORBENCY OF PAPER TOWELS ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

2 Sample t-test (unequal sample sizes and unequal variances)

2 Sample t-test (unequal sample sizes and unequal variances) Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Interaction between quantitative predictors

Interaction between quantitative predictors Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Statistics 104: Section 6!

Statistics 104: Section 6! Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

More information