Business Statistics 41000: Simple Linear Regression


1 Business Statistics 41000: Simple Linear Regression Drew D. Creal University of Chicago, Booth School of Business March 7 and 8,

2 Class information Drew D. Creal Office: 404 Harper Center Office hours: email me for an appointment Office phone:

3 Course schedule Week # 1: Plotting and summarizing univariate data Week # 2: Plotting and summarizing bivariate data Week # 3: Probability 1 Week # 4: Probability 2 Week # 5: Probability 3 Week # 6: In-class exam and Probability 4 Week # 7: Statistical inference 1 Week # 8: Statistical inference 2 Week # 9: Simple linear regression Week # 10: Multiple linear regression 3

4 Outline of today's topics I. Motivation: why regression? II. The simple linear regression model III. Interpretation of the regression parameters IV. Regression as a model of P(Y = y X = x) V. Estimation of the regression parameters VI. Plug-in prediction VII. Confidence Intervals and Hypothesis Tests for the Regression Parameters VIII. Fits, residuals, and R-squared IX. Application: The Market Model 4

5 Motivation: why regression? 5

6 Motivation: why regression? Regression is a useful tool for many reasons. The most important are: Prediction/forecasting Measuring dependence (e.g. correlation) between variables. 6

7 Motivation: why regression? Consider the housing data (MidCity.xls). There are two numeric variables: the price of each home and its size in square feet. 7

8 Motivation: why regression? Consider only the data on housing prices. For the moment, assume we do not observe house sizes. After Lectures # 3-7, we recognize that our sample of n = 128 homes is exactly that...a sample. What if we are interested in learning about the population and want to account for sampling uncertainty? After all, we could have observed a different sample of houses. How could we model the price of a house using probability? 8

9 Motivation: why regression? We can define a r.v. Y i to be the price of the i-th house! It is a r.v. because before the house is sold we do not know how much it will sell for. We treat our data (y 1, y 2,..., y n ) as the outcome of a sequence of r.v.'s (Y 1, Y 2,..., Y n ). We can model this data with an i.i.d. Normal model: p(y 1, y 2,..., y n ) = p(y 1 ) p(y 2 ) p(y n ) In other words, each observation is a normal r.v.: p(y i ) = N(µ, σ 2 ) 9

10 Motivation: why regression? Suppose you are going to sell your house in this city. You are interested in the average price µ of a house. Following Lectures #7-8, we can perform statistical inference for the parameters of interest, e.g. E[Y ] = µ. For example, we could use x̄ as an estimator of µ. We can use x̄ to predict what the house Y i will sell for. But, we are ignoring information about house sizes!! We are currently only looking at the marginal distribution P(Y = y). 10

11 Motivation: why regression? To incorporate information on house sizes, we need to define a second r.v. X i, which is the size of the i-th house. In reality, we not only observe outcomes of the r.v.'s (Y 1, Y 2,..., Y n ); we observe outcomes on the pairs (X i, Y i ). We believe that a house's size is clearly related to its price. Which distribution are we interested in? 1. P(Y = y X = x) 2. P(Y = y, X = x) 11

12 Motivation: why regression? Which distribution are we interested in? 1. P(Y = y X = x) 2. P(Y = y, X = x) Since we believe that a house's size is clearly related to its price, we think that P(Y = y X = x) ≠ P(Y = y). KEY POINT: Regression provides a simple way to model the conditional distribution of Y given X = x. This allows us to incorporate our information on house sizes. It may help us improve our prediction of house prices. 12

13 Simple Linear Regression (NOTE: The term simple linear regression means we are looking at a relationship between two variables. In Lecture # 10, we will do multiple linear regression (one y, lots of x's).) 13

14 Simple Linear Regression When doing regression, we care about the conditional distribution P(Y = y X = x) = p(y x). We use the following terminology: Y is the dependent variable. X is the independent variable, the explanatory variable, or sometimes just the regressor. 14

15 Simple Linear Regression Consider modeling the house prices y i as an approximate linear function of their size x i : y i = b + m x i + error. We need the errors because this linear relationship is not exact. y depends on other things besides x that we don't observe in our sample. 15

16 Simple Linear Regression Why are we approaching the problem in this way? Here are three reasons. 1. Sometimes you know x and just need to predict y, as in the housing price problem from the homework. 2. The conditional distribution is an excellent way to think about the relationship between two variables. 3. Linear relationships are easy to work with and are a good approximation in lots of real world problems. 16

17 Simple Linear Regression The simple linear regression model is Y i = α + βx i + ε i. ε i N(0, σ 2 ) i.i.d. ε i is independent of X i. The intercept is α. The slope is β. We use the normal distribution to describe the errors. 17
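
The model on this slide can be simulated in a few lines of Python. This is a sketch with hypothetical parameter values (alpha = 20, beta = 70, sigma = 15, sizes in thousands of square feet), not the lecture's housing estimates:

```python
import numpy as np

# A sketch of simulating the simple linear regression model
# Y_i = alpha + beta * X_i + eps_i,  eps_i ~ iid N(0, sigma^2),
# with the errors drawn independently of x. Parameter values are hypothetical.
rng = np.random.default_rng(0)
alpha, beta, sigma = 20.0, 70.0, 15.0   # assumed "true" parameters
n = 128
x = rng.uniform(1.0, 3.0, size=n)       # e.g. house sizes in 1000s of sq ft
eps = rng.normal(0.0, sigma, size=n)    # normal errors, independent of x
y = alpha + beta * x + eps              # implied prices
```

Fitting a line to (x, y) simulated this way should recover values near alpha and beta, up to sampling error.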

18 Simple Linear Regression: Remarks The parameters of our model are α, β, and σ. The slope β measures the change in y when x increases by 1 unit. The intercept α is the value y takes when x = 0. The linear relationship holds for each pair (X i, Y i ). Consequently, it is common to drop the subscripts and write Y = α + βx + ε instead of Y i = α + βx i + ε i. The assumption that X is independent of ε is important. It implies that they are uncorrelated. 18

19 Simple Linear Regression: Remarks The parameters of our model are α, β, and σ. This is just like p from the i.i.d. Bernoulli(p) model or µ from the i.i.d. Normal(µ, σ 2 ) model. Just like p and µ, the true parameters are unknown when using real data. Remember how the parameter p had a natural interpretation in the voting example, i.e. it was the fraction of the population that voted Democrat. Similarly, we interpret α and β as true population parameters. How realistic this is may depend on the setting. 19

20 Interpretation of the regression parameters α, β, and σ 20

21 IMPORTANT Given a specific value X = x, how do we interpret α, β, and σ? β tells us: if the value we saw for X was one unit bigger, how much would our prediction for Y change? α tells us: what would we predict for Y if x = 0? σ tells us: if α + βx is our prediction for Y given x, how big is the error associated with this prediction? 21

22 Simple Linear Regression Here is a picture of our model. We are simply drawing a line through the data. α is the intercept. Y The error ε i for this observation This is the true relationship between Y and X without the errors Y = α+ β X The intercept α X 22

23 Simple Linear Regression β measures the slope of the line. Y The green bar is β which is what happens to Y for a 1 unit change in X. The blue bar is a 1 unit change in X. X 23

24 Simple Linear Regression How do we get y 1 given a specific value for X 1 = x 1? We have y 1 = α + β x 1 + ε 1, where ε 1 was a draw from a normal distribution N(0, σ 2 ). In the figure, the green dashed line is the realized value of ε 1, the vertical distance between y 1 and the line at α + β x 1. 24

25 Simple Linear Regression Each ε i is i.i.d. N(0, σ 2 ). The variance σ 2 measures the spread of the normal distribution, i.e. the size of our errors. Y Each ε i is an independent draw from a normal distribution N(0,σ 2 ) X 25

26 Simple Linear Regression In practice, we only observe the data! Y We have to estimate the unknown true values α and β. X 26

27 Simple Linear Regression What role does the variance σ 2 play? The variance of the error term σ 2 describes how big the errors are on average. When σ 2 is smaller (right) the data are closer to the true regression line. 27

28 Simple Linear Regression What role does the variance σ 2 play? The variance will determine how wide (or narrow) our predictive intervals are. 28

29 Regression as a model of P(Y = y X = x) 29

30 Simple Linear Regression Regression looks at the conditional distribution of Y given X. Instead of coming up with a story for the joint distribution p(x, y): What do I think the next (x, y) pair will be? Regression just talks about the conditional distribution p(y x): Given a value for x, what will the next y be? 30

31 Regression as a model of P(Y = y X = x) Our model is: Y = α + βx + ε ε N(0, σ 2 ) where ε is independent of X. Regression is a model for the conditional distribution P(Y = y X = x). What are the mean and variance of the conditional distribution? E[Y X = x] V[Y X = x] 31

32 Regression as a model of P(Y = y X = x) Since our model is linear Y = α + βx + ε ε N(0, σ 2 ) we can use our formulas for linear functions! First, we can compute the conditional mean E[Y X = x] = E[α + βx + ε X = x] = α + βx + E[ε X = x] = α + βx 32

33 Regression as a model of P(Y = y X = x) Since our model is linear Y = α + βx + ε ε N(0, σ 2 ) we can use our formulas for linear functions! And, we can compute the conditional variance V[Y X = x] = V[α + βx + ε X = x] = V[ε X = x] = σ 2 33

34 Regression as a model of P(Y = y X = x) Another way of thinking about our model is: P(Y X = x) = N(α + βx, σ 2 ) In other words, Y X = x N(α + βx, σ 2 ) The conditional distribution of Y is normal with mean: E[Y X = x] = α + βx variance: V[Y X = x] = σ 2 34

35 Prediction using P(Y = y X = x) Suppose for the moment we know α, β, and σ. Given a specific value for X = x and our model, what is our prediction of Y? Y X = x N(α + βx, σ 2 ) Our prediction is the mean: α + βx Since Y has a (conditional) normal distribution, we know that there is a 95% probability that the observed y will be within 2σ of this prediction. 35

36 Prediction using P(Y = y X = x) Given a specific value for X = x, we can predict. Our prediction is y = α + βx, and the red line from α + βx - 2σ to α + βx + 2σ is a 95% predictive interval, i.e. the empirical rule. 36

37 Prediction using P(Y = y X = x) Consider two different values x 1 and x 2. Note that since σ 2 is the same for both, the size of the two 95% predictive intervals (α + βx - 2σ, α + βx + 2σ) is the same. 37

38 Prediction using P(Y = y X = x) Important. 1. The width of the prediction interval produced from P(Y = y X = x) will (typically) be smaller than P(Y = y). 2. The variance of the conditional distribution P(Y = y X = x) cannot be larger than the variance of P(Y = y). 3. Using information on X will help us predict Y. 4. We can see this visually on the house price data (next slide). 38

39 Prediction using P(Y = y X = x) Looking at the scatterplot of price against size: What would a 95% prediction interval look like using only housing prices? What does a 95% prediction interval look like if x i = 2200? 39

40 Prediction using P(Y = y X = x) Given a specific value for x, our prediction is the conditional mean α + βx and with 95% probability the observed value y will lie in the interval (α + βx - 2σ, α + βx + 2σ). In practice, we do not know the true parameters α, β, and σ. We have to estimate them from the observed data! 40

41 Estimation of the regression parameters α, β, and σ 41

42 Estimates In Lectures #7 and #8, we investigated two models: the i.i.d. Bernoulli(p) model with unknown parameter p and the i.i.d. Normal(µ, σ 2 ) model with unknown parameter µ. We chose estimators and considered the sampling distributions of the estimators. For the i.i.d. Bernoulli(p) model, we used ˆp as an estimator of the unknown parameter p. For the i.i.d. Normal(µ, σ 2 ) model, we used x̄ as an estimator of the unknown parameter µ. The goal in this section is to find estimators for α, β, and σ. 42

43 Estimates Simple linear regression assumes a linear relationship between Y and X : Y = α + βx + ε ε N(0, σ 2 ) where ε is independent of X. In practice, we don t know α, β and σ. They are unknown parameters in our model. We have to estimate them using the data we see! We have already seen the estimators in Lecture #2! 43

44 Linear regression formulas We saw in Lecture #2 that the estimators for α and β are slope: ˆβ = s_xy / s_x² = r_xy (s_y / s_x) intercept: ˆα = ȳ - ˆβ x̄ The formulas for the slope and intercept just use the sample mean, sample covariance, and sample variance. In a moment, I will show you how we got these formulas. What are the units of ˆα and ˆβ? 44
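
These formulas can be sketched directly in Python on synthetic data (the generating values 20, 70, 15 are assumptions for illustration); note that the two forms of the slope estimator agree:

```python
import numpy as np

# Sketch: least squares estimates from sample moments (synthetic data;
# the data-generating values are assumptions for illustration).
rng = np.random.default_rng(1)
x = rng.normal(2.0, 0.5, size=200)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=200)

s_xy = np.cov(x, y, ddof=1)[0, 1]            # sample covariance s_xy
s2_x = np.var(x, ddof=1)                     # sample variance s_x^2
beta_hat = s_xy / s2_x                       # slope: s_xy / s_x^2
alpha_hat = y.mean() - beta_hat * x.mean()   # intercept: ybar - slope * xbar

# Equivalent form of the slope: r_xy * (s_y / s_x)
r_xy = np.corrcoef(x, y)[0, 1]
beta_hat2 = r_xy * y.std(ddof=1) / x.std(ddof=1)
```

Either form gives the same fitted line, and both match what a canned least squares routine returns.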

45 Estimates The results for a regression of house price on house size. We will discuss all the output throughout the lecture. 45

46 How do we interpret the estimates? You can (and should!) interpret ˆβ as saying a house that is 1000 square feet larger sells for about $70,000 more. You probably should not interpret ˆα as saying the price of a house of size zero is -$10,000. Are there any houses of size zero in our data? Would we want to use this data to predict the price of an 8,000 square foot mansion? 46

47 Estimates Given our estimates of ˆα and ˆβ, these determine a new regression line y = ˆα + ˆβx which is called the fitted regression line. Remember that due to sampling error, our estimate ˆα is not going to be exactly equal to α and our estimate ˆβ is not going to be exactly equal to β. Consequently, the fitted regression line is not going to be exactly equal to the true regression line: y = α + βx 47

48 The Fitted Regression Line What does the fitted regression line look like? On real data, we can t see the true line. 200 Price Fitted regression line y = x House size 48

49 The Fitted Regression Line On simulated data, we can see that the fitted regression line is not the same as the true line. Y Unobserved true line y = α+ β x Fitted line y = ^α + ^β x based on our estimates ^α and ^β X 49

50 How did we get the estimators ˆα and ˆβ? The fitted regression line y = ˆα + ˆβx is not going to be exactly equal to the true regression line y = α + βx However, we would like to choose ˆα and ˆβ to make them close! One way of doing this is called least squares. 50

51 Linear regression formulas Define the residual as e i = y i - (ˆα + ˆβ x i ). The residual is the distance between the observed y i and the corresponding point on our fitted line ˆα + ˆβ x i. (NOTE: We will discuss these concepts in more detail below.) ˆα and ˆβ are the least squares estimates of α and β. Using calculus we can show that the estimates ˆα and ˆβ minimize the function SSR = Σ_{i=1}^n (y i - ˆα - ˆβ x i )² where SSR stands for the sum of squared residuals. 51
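
One way to see the least squares property is to check numerically that perturbing the closed-form estimates can only increase the SSR (synthetic data; the generating values are illustrative assumptions):

```python
import numpy as np

# Numerical check on synthetic data: the closed-form estimates minimize
# SSR, so perturbing either coefficient can only increase it.
rng = np.random.default_rng(7)
x = rng.uniform(1.0, 3.0, size=50)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=50)

def ssr(a, b):
    """Sum of squared residuals for the line y = a + b x."""
    return np.sum((y - a - b * x) ** 2)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
```

Because SSR is a convex quadratic in (a, b), the moment-based estimates sit at its global minimum.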

52 Estimates What do the residuals look like? 200 Price Fitted regression line y = x Three different residuals e i = y i x i House size 52

53 Linear regression formulas Our estimate of σ is just the sample standard deviation of the residuals e i : s e = sqrt( Σ_{i=1}^n e i ² / (n - 2) ) = sqrt( Σ_{i=1}^n (y i - ˆα - ˆβ x i )² / (n - 2) ) Here we divide by n - 2 instead of n - 1 for the same technical reasons (to get an unbiased estimator). s e just asks: on average, how far are our observed values y i away from the line we fitted? 53
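
A short sketch of this estimator on synthetic data (the true parameter values below are assumptions); note the divisor n - 2 and that least squares residuals average to zero:

```python
import numpy as np

# Sketch of the estimate of sigma on synthetic data (true sigma assumed 15):
# divide the sum of squared residuals by n - 2, then take the square root.
rng = np.random.default_rng(2)
n = 150
x = rng.uniform(1.0, 3.0, size=n)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=n)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
e = y - (alpha_hat + beta_hat * x)        # residuals
s_e = np.sqrt(np.sum(e ** 2) / (n - 2))   # divide by n - 2, not n or n - 1
```

With a moderately large sample, s_e should land near the assumed sigma of 15.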

54 Estimate for σ Excel automatically prints out the estimate of σ. 54

55 Plug-in prediction 55

56 Prediction Earlier when we knew the true values α, β, and σ, we stated the conditional distribution of Y as Y X = x N(α + βx, σ 2 ) Using this, we formed a 95% prediction interval: α + βx ± 2σ. Given our least squares estimates ˆα, ˆβ, and s e, we can form a 95% prediction interval by plugging in our estimates: (ˆα + ˆβx - 2s e, ˆα + ˆβx + 2s e ). 56

57 Prediction Given ˆα, ˆβ, and s e, we can get 95% prediction intervals: the fitted line y = ˆα + ˆβ x with bands at y = ˆα + ˆβ x + 2 s e and y = ˆα + ˆβ x - 2 s e, plotted over price against house size. 57

58 Prediction Suppose x = 2.2. Then we compute ˆα + ˆβ x and 2 s e and form the interval. For x = 2.2, the interval is (99.46, ). 58

59 Summary: estimators and prediction The unknown parameters and their estimators: α is estimated by ˆα, β by ˆβ, and σ by s e. Given a value for x, the 95% plug-in predictive interval is ˆα + ˆβx ± 2s e 59
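
The plug-in interval can be wrapped in a small helper. The estimates below are illustrative round numbers in the spirit of the housing example (the intercept near -10 echoes the earlier interpretation slide), not the lecture's exact output:

```python
# Sketch of a plug-in 95% predictive interval. The estimates below are
# hypothetical placeholders, not the lecture's exact regression output.
alpha_hat, beta_hat, s_e = -10.0, 70.0, 14.0

def predict_interval(x):
    """Return (lower, point, upper): alpha_hat + beta_hat*x plus/minus 2*s_e."""
    point = alpha_hat + beta_hat * x
    return point - 2.0 * s_e, point, point + 2.0 * s_e

lo, pt, hi = predict_interval(2.2)   # e.g. a house of 2200 square feet
```

The interval has the same width 4 s_e at every x, which is exactly the equal-width feature noted on the two-interval slide.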

60 Confidence Intervals and Hypothesis Tests for α, β, and σ 60

61 Sampling distributions for ˆα and ˆβ Thus far, we have assumed that there exists a true linear relationship Y = α + βx + ε ε N(0, σ 2 ) for unknown parameters α, β, and σ. I have shown you the formulas for our estimators ˆα, ˆβ, and s e. Remember that our estimators are random variables. Why?? 61

62 Sampling distributions for ˆα and ˆβ Recall that we view our estimators ˆα, ˆβ and s e as random variables. For each possible sample of data that you might observe, you will likely have different values for ˆα, ˆβ, and s e. Sampling error! For example, there may be many possible samples on house prices and sizes that you could take resulting in different values for ˆα, ˆβ, and s e. 62

63 The Sampling Distribution of an Estimator The sampling distribution of an estimator is a probability distribution that describes all the possible values we might see if we could repeat our sample over and over again; i.e., if we could see other potential samples from the population we are studying. 63

64 Sampling distributions for ˆα and ˆβ When we view ˆα, ˆβ, and s e as estimators, they are random variables and each will have their own sampling distribution. It can be shown that (when n is large) the sampling distributions for ˆα and ˆβ are both normal distributions (due to the CLT). I won t derive the mathematical details of the sampling distributions here like I did for ˆp and x in Lecture #7. Nevertheless, we can construct standard errors and build confidence intervals for the true unknown parameters α, β, and σ just like we did for p and µ in Lectures #7 and #8. 64

65 Standard Errors for ˆα and ˆβ Let sˆα denote the standard error associated with the estimate ˆα. Let s ˆβ denote the standard error associated with the estimate ˆβ. 65

66 ASIDE: Unbiasedness As a side note, it can also be shown that ˆα and ˆβ are unbiased: E[ˆα X ] = α E[ ˆβ X ] = β Intuitively, our estimate can turn out to be too big or too small, but it is not systematically too high or too low. We will recover the true value on average. (NOTE: The expectation (or average) is being taken over hypothetical random samples we might observe from the model.) 66

67 Confidence Intervals for α and β We can also build confidence intervals for α and β. In practice, you will often see confidence intervals for α and β constructed using the Student's t distribution instead of the standard normal. The reasoning behind this is the same as when we standardized the estimator x̄ in Lecture #8. Again, we are standardizing the estimators ˆα and ˆβ to compute the test statistic. This means we are dividing them by the standard errors sˆα and s ˆβ, which need to be estimated from the data. 67

68 Confidence Intervals The 95% confidence interval for α is ˆα ± tval sˆα where tval = T.INV(0.05, n - 2) (NOTE: in Excel) The 95% confidence interval for β is ˆβ ± tval s ˆβ where tval = T.INV(0.05, n - 2) (NOTE: in Excel) Remember that if n > 30, the tval is roughly 2. 68

69 Confidence Intervals In the housing data (MidCity.xls), we have n = 128 observations. A 95% confidence interval for the slope β is: ˆβ ± 2 s ˆβ = 70.23 ± 2(9.43) = 70.23 ± 18.86 = (51.37, 89.09) This is pretty big. We aren't very certain of the true slope β. 69
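
The interval arithmetic here can be reproduced directly; the standard error 9.43 is as reported, and the point estimate 70.23 is the midpoint implied by the reported interval (51.37, 89.09):

```python
# Reproducing the confidence interval arithmetic for the housing slope.
# se_beta is as reported; beta_hat is the midpoint implied by the
# reported interval (51.37, 89.09).
beta_hat = 70.23
se_beta = 9.43
ci = (beta_hat - 2.0 * se_beta, beta_hat + 2.0 * se_beta)
```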

70 Confidence Intervals Excel automatically prints out the 95% confidence intervals for α and β. 70

71 ASIDE: Normality Assumption of ε Normality of the errors ε in the linear equation Y = α + βX + ε is not a crucial assumption. When the sample size n is large, the sampling distributions of the estimators ˆα and ˆβ will still be (approximately) normal distributions. This is because ˆα and ˆβ are just averages of y and x and we can apply the Central Limit Theorem. Even if ε is not normal, the confidence intervals will be approximately valid: 95% C.I. for α : ˆα ± 2 sˆα 95% C.I. for β : ˆβ ± 2 s ˆβ 71

72 Hypothesis Tests for α and β Using the sampling distributions of the estimators ˆα and ˆβ, we can also perform hypothesis tests. Let H 0 : α = α 0 or H 0 : β = β 0 be a null hypothesis in which you are interested (α 0 and β 0 are just numbers). In practice, we construct the test statistics using the standardized values. Consequently, we use the Student's t distribution as the sampling distribution of our test statistics. 72

73 Hypothesis Tests for α To test the null hypothesis H 0 : α = α 0 vs. H a : α ≠ α 0 we reject at the 5% level if |t| > tval, where we define t = (ˆα - α 0 ) / sˆα and tval = T.INV(0.05, n - 2) (NOTE: in Excel); otherwise we fail to reject. Remember: if n > 30, the tval is roughly 2, so we reject if |t| > 2. 73

74 Hypothesis Tests for β To test the null hypothesis H 0 : β = β 0 vs. H a : β ≠ β 0 we reject at the 5% level if |t| > tval, where we define t = (ˆβ - β 0 ) / s ˆβ and tval = T.INV(0.05, n - 2) (NOTE: in Excel); otherwise we fail to reject. Remember: if n > 30, the tval is roughly 2, so we reject if |t| > 2. 74
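
This decision rule can be sketched as a small function, using the n > 30 rule of thumb tval = 2. The housing numbers use the reported standard error 9.43 and the slope estimate implied by the interval (51.37, 89.09):

```python
# Sketch of the two-sided t-test decision rule at the 5% level, using
# the n > 30 rule of thumb tval = 2 from the slides.
def reject_h0(beta_hat, beta0, se_beta, tval=2.0):
    """Reject H0: beta = beta0 when |t| > tval; also return t."""
    t = (beta_hat - beta0) / se_beta
    return abs(t) > tval, t

# Housing example: test H0: beta = 0 with beta_hat = 70.23, se = 9.43.
reject, t = reject_h0(70.23, 0.0, 9.43)
```

Here t is far above 2, so H0: beta = 0 is rejected; a null closer to the estimate (say beta0 = 75) would not be.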

75 Hypothesis Tests for β IMPORTANT: The null hypothesis that: H 0 : β = 0 plays a very important role in regression analysis. Why? Remember, the conditional distribution of Y is Y X = x N(α + βx, σ 2 ) Consequently, if β = 0 then the conditional distribution of Y does not depend on X. This means that the random variables Y and X are independent (at least according to our model)!! 75

76 Hypothesis Tests Excel automatically prints out the t-tests for the null hypotheses that H 0 : α = 0 and H 0 : β = 0 versus the alternatives that they are not zero. 76

77 p-values Most regression packages automatically print out the p-values for the hypotheses that the intercept is 0 and that the slope is 0. That's the p-value column in the StatPro output. Is the intercept 0? p-value = .59, so we fail to reject. Is the slope 0? p-value = .0000, so we reject. From a practical standpoint, what does this mean? Rejecting H 0 : β = 0 means that we find evidence that square footage does significantly impact the housing price! 77

78 p-values How is Excel getting this p-value? For n greater than about 30, the t-stat can be interpreted as a z-value. Thus we can compute the p-value using the normal distribution. For example, we can compute the p-value for the intercept: t = (ˆα - α 0 ) / sˆα = -0.53 If we take this as our z-value, we get a p-value of 2*(1 - NORM.DIST(ABS(-0.53),0,1,1)) = 0.596 78
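
The same calculation can be replicated with the Python standard library, treating the t-stat of -0.53 as a z-value:

```python
from statistics import NormalDist

# Replicating the slide's normal-approximation p-value: for large n the
# t-stat is treated as a z-value, so the two-sided p-value is
# 2 * (1 - Phi(|t|)) under the standard normal CDF Phi.
t = -0.53
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))   # about 0.596
```

Since the p-value is far above 0.05, we fail to reject the hypothesis that the intercept is zero, matching the output slide.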

79 p-values Excel automatically prints out the p-values. 79

80 Fits, residuals, and R-squared 80

81 Fitted values and residuals Our model is Y = α + βX + ε ε N(0, σ 2 ) Conditional on a value x i, we think of each y i as a draw from Y i = α + β x i + ε i, where α + β x i is the part of y that depends on x and ε i is the part of y that has nothing to do with x. 81

82 Fitted values and residuals We want to ask, How well does X explain Y? We could think about this by breaking up Y into two parts: α + βx i (part that s explained by x) ε i (part that s NOT explained by x) But remember, we don t know α or β!! However, we can use our estimates ˆα and ˆβ to create estimates of these two parts for each observation in our sample. 82

83 Fitted values and residuals So let's suppose we have some data (x 1, y 1 ), (x 2, y 2 ),..., (x n, y n ) and we've run a regression. That is, we've computed the estimates ˆα, ˆβ, and s e. For each (x i, y i ) in the data, the unknown α + β x i is estimated by ˆα + ˆβ x i, and the unknown error ε i = y i - (α + β x i ) is estimated by the residual e i = y i - (ˆα + ˆβ x i ). 83

84 Fitted values and residuals Define two new variables ŷ i and e i as follows: ŷ i = ˆα + ˆβ x i and e i = y i - ŷ i. Notice that we have broken up each observation into two parts: y i = ŷ i + e i ŷ i is called the fitted value for the i-th observation. It is the part of y i that is explained by x i. e i is called the residual for the i-th observation. It is the part of y i that is left unexplained. 84

85 Fitted values and residuals What do e i and ŷ i look like? Y The residuals e i are the purple lines y = ^α + ^β x The fitted values ^y i are the dashed green lines. X 85

86 Fitted values and residuals Remember the residuals and fitted line for the housing data. 200 Price Fitted regression line y = x Three different residuals e i = y i x i House size 86

87 Least squares interpretation We stated earlier that ˆα and ˆβ are often called the least squares estimates of α and β. The line we are fitting through the data is the best fitting line because ˆα and ˆβ are chosen to minimize the function SSR = Σ_{i=1}^n (y i - ˆα - ˆβ x i )² where SSR stands for the sum of squared residuals. A by-product of this is that by construction our residuals e i will have nice properties. 87

88 Properties of the residual Two important properties of the residuals e i are: The sample mean of the residuals equals zero: ē = (1/n) Σ_{i=1}^n e i = 0. The sample correlation between the residuals e and the explanatory variable x is zero: cor(e, x) = 0. Let's see what this looks like graphically on the housing data. 88
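
Both properties can be verified numerically on synthetic data (the generating values are illustrative assumptions):

```python
import numpy as np

# Numerical check of the two residual properties: mean zero and zero
# sample correlation with x (synthetic data, illustrative parameters).
rng = np.random.default_rng(3)
x = rng.uniform(1.0, 3.0, size=100)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=100)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
e = y - (alpha_hat + beta_hat * x)

mean_e = e.mean()                   # zero up to floating-point error
cor_e_x = np.corrcoef(e, x)[0, 1]   # zero up to floating-point error
```

These hold exactly (not just approximately) for least squares residuals, which is why the residual plot on the next slide is centered on zero with no trend.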

89 Properties of the residual This is the fitted regression line (left) and the residuals (right) Price 150 Residuals House size House size Notice how the residuals demonstrate no obvious pattern and visually look like they have mean zero. 89

90 Properties of the residual Consider another line that is NOT the least squares line: compare the least squares fitted line y = ˆα + ˆβ x with an alternative line. Notice how the residuals computed from this alternative line leave a clear downward-sloping pattern. 90

91 Properties of the residuals We know that cor(e, x) = 0, which means that: cor(e, x) = 0 implies cor(e, ˆα + ˆβ x) = 0, i.e. cor(e, ŷ) = 0. In other words, the sample correlation between residuals and fitted values is zero. Therefore, we now have the three properties: y i = ŷ i + e i ē = (1/n) Σ_{i=1}^n e i = 0. cor(e, ŷ) = 0. 91

92 Properties of the residuals Given y i = ŷ i + e i, we can show two more important properties. Notice that y i is a linear function of ŷ i and e i. Using the formulas for the sample mean and variance from Lecture # 2, we have: the sample mean of y equals the sample mean of ŷ (since ē = 0), and s 2 y = s 2 ŷ + s 2 e (since e and ŷ are uncorrelated). 92

93 Properties of the residuals What does the second property s 2 y = s 2 ŷ + s 2 e mean? Multiplying through by (n - 1): Σ_{i=1}^n (y i - ȳ)² = Σ_{i=1}^n (ŷ i - ȳ)² + Σ_{i=1}^n e i ² Intuitively, it says that the variance of our dependent variable y can be broken apart into two pieces: Σ_{i=1}^n (y i - ȳ)². This is the total variation in y. Σ_{i=1}^n (ŷ i - ȳ)². This is the variation in y explained by x. Σ_{i=1}^n e i ². This is the unexplained variation in y. 93
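
This decomposition can be checked numerically on synthetic data (illustrative generating values):

```python
import numpy as np

# Numerical check of the decomposition
# sum (y_i - ybar)^2 = sum (yhat_i - ybar)^2 + sum e_i^2
# on synthetic data with illustrative parameter values.
rng = np.random.default_rng(4)
x = rng.uniform(1.0, 3.0, size=80)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=80)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
yhat = alpha_hat + beta_hat * x
e = y - yhat

total = np.sum((y - y.mean()) ** 2)         # total variation in y
explained = np.sum((yhat - y.mean()) ** 2)  # variation explained by x
unexplained = np.sum(e ** 2)                # unexplained variation
```

The identity holds exactly because the residuals are uncorrelated with the fitted values, so the cross term vanishes.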

94 R-squared R² = explained variation / total variation = Σ_{i=1}^n (ŷ i - ȳ)² / Σ_{i=1}^n (y i - ȳ)² Intuitively, R² measures the amount of variation in y we can explain with x. It is always the case that 0 ≤ R² ≤ 1. The closer R-squared is to 1, the better the (in-sample) fit. 94

95 R-squared Excel automatically prints out these results, including the explained variation Σ_{i=1}^n (ŷ i - ȳ)² and the unexplained variation Σ_{i=1}^n e i ². 95

96 R-squared For simple linear regression (only one x), R-squared is the squared correlation between y and x! You can easily test this by going into Excel and computing the correlation between y and x. For example, in the table of correlations for our housing data, squaring the correlation between price and size reproduces the regression R². 96
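
A quick numerical check, on synthetic data, that R-squared equals the squared correlation in simple linear regression:

```python
import numpy as np

# Check that, with a single x, R-squared equals the squared sample
# correlation between y and x (synthetic data, illustrative parameters).
rng = np.random.default_rng(5)
x = rng.uniform(1.0, 3.0, size=60)
y = 20.0 + 70.0 * x + rng.normal(0.0, 15.0, size=60)

beta_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_hat = y.mean() - beta_hat * x.mean()
yhat = alpha_hat + beta_hat * x

r2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r_xy = np.corrcoef(x, y)[0, 1]   # r_xy squared should equal r2
```

This equivalence is special to the one-regressor case; with multiple x's, R-squared generalizes to the squared correlation between y and the fitted values.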

97 R-squared Excel automatically prints out the R 2. 97

98 Application: The Market Model 98

99 The Market Model In finance, a popular model is to regress stock returns against returns on some market index, such as the S&P 500. The slope of the regression line, referred to as beta, is a measure of how sensitive a stock is to movements in the market. Usually, a beta less than 1 means the stock is less risky than the market, equal to 1 same risk as the market and greater than 1, riskier than the market. 99
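
The market model regression can be sketched on simulated monthly returns; the "true" beta of 1.2, the return means, and the volatilities below are assumptions for illustration, not estimates from the GE data:

```python
import numpy as np

# Sketch of the market model on simulated monthly returns. The "true"
# beta of 1.2 and the return distributions are illustrative assumptions.
rng = np.random.default_rng(6)
n = 255
mkt = rng.normal(0.01, 0.04, size=n)                       # market returns
stock = 0.001 + 1.2 * mkt + rng.normal(0.0, 0.05, size=n)  # stock returns

# The estimated slope is the stock's beta: its sensitivity to the market.
beta_hat = np.cov(mkt, stock, ddof=1)[0, 1] / np.var(mkt, ddof=1)
alpha_hat = stock.mean() - beta_hat * mkt.mean()
```

A beta estimate above 1 suggests the stock amplifies market movements (riskier than the market); below 1, that it dampens them.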

100 The Market Model I have collected monthly data on General Electric (GE) and the S&P 500 from January 1989 to March 2010, plotted as GE returns against S&P 500 returns. 100

101 The Market Model The regression we are running is GE i = α + βs&p500 i + ε i ε i N(0, σ 2 ) Before we see the results, what do you think our estimate ˆβ will be? Do you think we will reject the hypothesis H 0 : β = 0 at the 5% level? How can we test the hypothesis H 0 : β = 1 at the 5% level? 101

102 The Market Model Here are the results of the regression from Excel: the summary measures (Multiple R, R-Square, StErr of Est), the ANOVA table, and the regression coefficients for the Constant and SP500 with their standard errors, t-values, p-values, and 95% confidence limits. Our estimates are ˆα = and ˆβ = ! 102

103 The Market Model Suppose we want to test the hypothesis H 0 : β = 0 at the 5% level. First, how do we interpret this test? t = (ˆβ - 0) / s ˆβ The critical value is: tval = T.INV(0.05,252) = Do we reject? What is the p-value? 103

104 The Market Model Excel reports the same value of the test statistic for this hypothesis. 104

105 The Market Model Suppose we want to test the hypothesis H 0 : β = 1 at the 5% level. First, how do we interpret this test? t = (ˆβ - 1) / s ˆβ = 3.53 The critical value is: tval = T.INV(0.05,252) = Do we reject the hypothesis? 105

106 The Market Model How do we construct 95% confidence intervals for α and β? Find the 95% critical value: tval = T.INV(0.05,252) = Our 95% confidence interval for α is then: ˆα ± tval sˆα Our 95% confidence interval for β is then: ˆβ ± tval s ˆβ = (1.1146, ) 106

107 The Market Model Excel reports the same values for the 95% confidence intervals. 107

108 The Market Model Here is a picture of the fitted regression line of GE returns against S&P 500 returns. 108

109 The Market Model Here is a picture of the residuals e i plotted against the S&P 500 returns. 109


More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

ABSORBENCY OF PAPER TOWELS

ABSORBENCY OF PAPER TOWELS ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

2 Sample t-test (unequal sample sizes and unequal variances)

2 Sample t-test (unequal sample sizes and unequal variances) Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Simple Linear Regression

Simple Linear Regression STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

More information

Interaction between quantitative predictors

Interaction between quantitative predictors Interaction between quantitative predictors In a first-order model like the ones we have discussed, the association between E(y) and a predictor x j does not depend on the value of the other predictors

More information

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

More information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Correlation key concepts:

Correlation key concepts: CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

More information

Statistics 104: Section 6!

Statistics 104: Section 6! Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

More information