# Lecture 2: Simple Linear Regression



1 DMBA: Statistics. Lecture 2: Simple Linear Regression: Least Squares, SLR properties, Inference, and Forecasting. Carlos Carvalho, The University of Texas McCombs School of Business. mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1

2 Today's Plan 1. The Least Squares Criterion 2. The Simple Linear Regression Model 3. Estimation for the SLR Model (sampling distributions, confidence intervals, hypothesis testing) 2

3 Linear Prediction $\hat{Y}_i = b_0 + b_1 X_i$, where $b_0$ is the intercept and $b_1$ is the slope. We find $b_0$ and $b_1$ using least squares. 3

4 The Least Squares Criterion The formulas for $b_0$ and $b_1$ that minimize the least squares criterion are: $b_1 = \text{corr}(X, Y) \cdot \frac{s_Y}{s_X}$ and $b_0 = \bar{Y} - b_1 \bar{X}$, where $s_Y = \sqrt{\sum_{i=1}^n (Y_i - \bar{Y})^2/(n-1)}$ and $s_X = \sqrt{\sum_{i=1}^n (X_i - \bar{X})^2/(n-1)}$. 4
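
As a quick numerical check, here is a minimal Python sketch (using NumPy, with made-up data rather than the course's housing data) showing that the slope corr(X, Y) · sY/sX matches a direct least squares fit:

```python
import numpy as np

# Made-up illustrative data (not the course's housing data)
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 3.0, size=50)            # e.g. size in 1000 sqft
y = 40 + 45 * x + rng.normal(0, 10, size=50)  # e.g. price in $1000s

r = np.corrcoef(x, y)[0, 1]      # sample correlation
s_x = x.std(ddof=1)              # sample sd of X
s_y = y.std(ddof=1)              # sample sd of Y

b1 = r * s_y / s_x               # slope: correlation times a scale factor
b0 = y.mean() - b1 * x.mean()    # intercept: line goes through the point of means

# The same (b0, b1) minimize the sum of squared residuals directly:
slope, intercept = np.polyfit(x, y, deg=1)
```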

5 Correlation and Covariance Review: covariance and correlation measure the direction and strength of the linear relationship between the variables $Y$ and $X$: $\text{cov}(Y, X) = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{n-1}$ 5

6 Correlation and Covariance Correlation is the standardized covariance: $\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}} = \frac{\text{cov}(X, Y)}{\text{sd}(X)\,\text{sd}(Y)}$ The correlation is scale invariant and the units of measurement don't matter: it is always true that $-1 \leq \text{corr}(X, Y) \leq 1$. This gives the direction ($-$ or $+$) and strength ($0 \to 1$) of the linear relationship between $X$ and $Y$. 6
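
A small sketch of these two properties with simulated data (the data and the unit-change factors are assumptions, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)

# Correlation as standardized covariance
cov_xy = np.cov(x, y)[0, 1]
corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Scale invariance: changing units (e.g. feet -> meters, $ -> $1000s)
# leaves the correlation unchanged
corr_rescaled = np.corrcoef(0.3048 * x, y / 1000)[0, 1]
```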

7 Correlation [Figure: four scatterplots illustrating different correlation values; the numbers were lost in transcription.] 7

8 Correlation Correlation only measures linear relationships: $\text{corr}(X, Y) = 0$ does not mean the variables are not related! [Figure: scatterplots of strongly related variables with near-zero correlation, e.g. corr = 0.01.] Also be careful with influential observations. 8

9 Back to Least Squares 1. Intercept: $b_0 = \bar{Y} - b_1 \bar{X}$, so $\bar{Y} = b_0 + b_1 \bar{X}$. The point $(\bar{X}, \bar{Y})$ is on the regression line! Least squares finds the point of means and rotates the line through that point until it gets the right slope. 2. Slope: $b_1 = \text{corr}(X, Y) \cdot \frac{s_Y}{s_X}$. So the right slope is the correlation coefficient times a scaling factor that ensures the proper units for $b_1$. 9

10 More on Least Squares From now on, the terms fitted values ($\hat{Y}_i$) and residuals ($e_i$) refer to those obtained from the least squares line. The fitted values and residuals have some special properties. Let's look at the housing data analysis to figure out what these properties are... 10

11 The Fitted Values and X [Figure: fitted values plotted against $X$; $\text{corr}(\hat{Y}, X) = 1$.] 11

12 The Residuals and X [Figure: residuals plotted against $X$; $\text{corr}(e, X) = 0$ and $\text{mean}(e) = 0$.] 12

13 Why? What is the intuition for the relationship between $\hat{Y}$, $e$, and $X$? Let's consider some crazy alternative line: [Figure: scatterplot of $Y$ against $X$ comparing the crazy line with the LS line.] 13

14 Fitted Values and Residuals This is a bad fit! We are underestimating the value of small houses and overestimating the value of big houses. [Figure: residuals from the crazy line plotted against $X$; $\text{corr}(e, X) = -0.7$.] Clearly, we have left some predictive ability on the table! 14

15 Fitted Values and Residuals As long as the correlation between $e$ and $X$ is non-zero, we could always adjust our prediction rule to do better. We need to exploit all of the predictive power in the $X$ values and put this into $\hat{Y}$, leaving no "Xness" in the residuals. In summary: $Y = \hat{Y} + e$, where $\hat{Y}$ is made from $X$, so $\text{corr}(X, \hat{Y}) = 1$; and $e$ is unrelated to $X$, so $\text{corr}(X, e) = 0$. 15
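
These properties are easy to verify numerically; a minimal sketch with simulated data (all values assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 3 + 2 * x + rng.normal(0, 1, size=100)

b1, b0 = np.polyfit(x, y, deg=1)   # least squares fit (slope first)
yhat = b0 + b1 * x                 # fitted values, made from X
e = y - yhat                       # residuals

mean_e = e.mean()                         # ~ 0
corr_e_x = np.corrcoef(e, x)[0, 1]        # ~ 0: no "Xness" left in residuals
corr_x_yhat = np.corrcoef(x, yhat)[0, 1]  # ~ 1: Y-hat is a linear function of X
```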

16 Another way to derive things The intercept: the residuals average to zero, $\frac{1}{n}\sum_{i=1}^n e_i = 0$, i.e. $\frac{1}{n}\sum_{i=1}^n (Y_i - b_0 - b_1 X_i) = 0$, so $\bar{Y} - b_0 - b_1 \bar{X} = 0$, giving $b_0 = \bar{Y} - b_1 \bar{X}$. 16

17 Another way to derive things The slope: $\text{corr}(e, X) = 0$ means $\sum_{i=1}^n e_i (X_i - \bar{X}) = 0$, i.e. $\sum_{i=1}^n (Y_i - b_0 - b_1 X_i)(X_i - \bar{X}) = \sum_{i=1}^n (Y_i - \bar{Y} - b_1 (X_i - \bar{X}))(X_i - \bar{X}) = 0$ (substituting $b_0 = \bar{Y} - b_1 \bar{X}$), so $b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = r_{xy}\,\frac{s_y}{s_x}$. 17

18 Decomposing the Variance How well does the least squares line explain variation in $Y$? Since $\hat{Y}$ and $e$ are uncorrelated (i.e. $\text{cov}(\hat{Y}, e) = 0$), $\text{var}(Y) = \text{var}(\hat{Y} + e) = \text{var}(\hat{Y}) + \text{var}(e)$. This leads to $\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n e_i^2$ 18

19 Decomposing the Variance ANOVA Tables SSR: variation in $Y$ explained by the regression line. SSE: variation in $Y$ that is left unexplained. SST = SSR + SSE, so SSR = SST means a perfect fit. Be careful of similar acronyms; e.g., some books use SSR for the residual sum of squares. 19

20 Decomposing the Variance [Figure: the ANOVA table from the regression output.] 20

21 A Goodness of Fit Measure: R 2 The coefficient of determination, denoted by $R^2$, measures goodness of fit: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, with $0 \leq R^2 \leq 1$. The closer $R^2$ is to 1, the better the fit. 21

22 A Goodness of Fit Measure: R 2 An interesting fact: $R^2 = r_{xy}^2$ (i.e., $R^2$ is the squared correlation): $R^2 = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = \frac{\sum_{i=1}^n (b_0 + b_1 X_i - b_0 - b_1 \bar{X})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = \frac{b_1^2 \sum_{i=1}^n (X_i - \bar{X})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = b_1^2\,\frac{s_x^2}{s_y^2} = r_{xy}^2$ No surprise: the higher the sample correlation between $X$ and $Y$, the better you are doing in your regression. 22
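
Both facts, the variance decomposition and $R^2 = r_{xy}^2$, can be checked numerically; a sketch with simulated data (assumed, not the house data):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 5, size=80)
y = 1 + 0.5 * x + rng.normal(0, 0.7, size=80)

b1, b0 = np.polyfit(x, y, deg=1)
yhat = b0 + b1 * x
e = y - yhat

SST = ((y - y.mean()) ** 2).sum()     # total variation in Y
SSR = ((yhat - y.mean()) ** 2).sum()  # explained by the line
SSE = (e ** 2).sum()                  # left unexplained

R2 = SSR / SST
r_xy = np.corrcoef(x, y)[0, 1]
```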

23 Back to the House Data [Figure: regression output for the housing data.] 23

24 Prediction and the Modelling Goal A prediction rule is any function where you input $X$ and it outputs $\hat{Y}$ as a predicted response at $X$. The least squares line is a prediction rule: $\hat{Y} = f(X) = b_0 + b_1 X$ 24

25 Prediction and the Modelling Goal Ŷ is not going to be a perfect prediction. We need to devise a notion of forecast accuracy. 25

26 Prediction and the Modelling Goal There are two things that we want to know: What value of Y can we expect for a given X? How sure are we about this forecast? Or how different could Y be from what we expect? Our goal is to measure the accuracy of our forecasts or how much uncertainty there is in the forecast. One method is to specify a range of Y values that are likely, given an X value. Prediction Interval: probable range for Y-values given X 26

27 Prediction and the Modelling Goal Key Insight: To construct a prediction interval, we will have to assess the likely range of residual values corresponding to a $Y$ value that has not yet been observed! We will build a probability model (e.g., normal distribution). Then we can say something like "with 95% probability the residual will be between -$28,000 and $28,000." We must also acknowledge that the fitted line may be fooled by particular realizations of the residuals. 27

28 The Simple Linear Regression Model The power of statistical inference comes from the ability to make precise statements about the accuracy of the forecasts. In order to do this we must invest in a probability model. Simple Linear Regression Model: $Y = \beta_0 + \beta_1 X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. The error term $\varepsilon$ is independent, idiosyncratic noise. 28

29 Independent Normal Additive Error Why do we have $\varepsilon \sim N(0, \sigma^2)$? $E[\varepsilon] = 0$ implies $E[Y \mid X] = \beta_0 + \beta_1 X$ ($E[Y \mid X]$ is the conditional expectation of $Y$ given $X$). Many things are close to Normal (central limit theorem). MLE estimates for the $\beta$'s are the same as the LS $b$'s. It works! This is a very robust model for the world. We can think of $\beta_0 + \beta_1 X$ as the true regression line. 29

30 The Regression Model and our House Data Think of $E[Y \mid X]$ as the average price of houses with size $X$: some houses could have a higher than expected value, some lower, and the true line tells us what to expect on average. The error term represents the influence of factors other than $X$. 30

31 Conditional Distributions The conditional distribution for $Y$ given $X$ is normal: $Y \mid X \sim N(\beta_0 + \beta_1 X, \sigma^2)$. $\sigma$ controls dispersion: [Figure: normal curves of varying spread around the regression line.] 31

32 Conditional vs Marginal Distributions More on the conditional distribution: $Y \mid X \sim N(E[Y \mid X], \text{var}(Y \mid X))$. The mean is $E[Y \mid X] = E[\beta_0 + \beta_1 X + \varepsilon] = \beta_0 + \beta_1 X$. The variance is $\text{var}(\beta_0 + \beta_1 X + \varepsilon) = \text{var}(\varepsilon) = \sigma^2$. Remember our sliced boxplots: $\sigma^2 < \text{var}(Y)$ if $X$ and $Y$ are related. 32

33 Prediction Intervals with the True Model You are told (without looking at the data) that $\beta_0 = 40$, $\beta_1 = 45$, $\sigma = 10$, and you are asked to predict the price of a 1500 square foot house. What do you know about $Y$ from the model? $Y = 40 + 45(1.5) + \varepsilon = 107.5 + \varepsilon$ Thus our prediction for price is $Y \sim N(107.5, 10^2)$ 33

34 Prediction Intervals with the True Model The model says that the mean value of a 1500 sq. ft. house is $107,500 and that the deviation from the mean is within $20,000 with 95% probability. We are 95% sure that $-20 < \varepsilon < 20$, so $87,500 < Y < $127,500. In general, the 95% Prediction Interval is $PI = \beta_0 + \beta_1 X \pm 2\sigma$. 34
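
The slide's arithmetic, worked through in code:

```python
# True-model prediction interval from the slide:
# beta0 = 40, beta1 = 45, sigma = 10, house size 1500 sqft (X = 1.5)
beta0, beta1, sigma = 40.0, 45.0, 10.0
x_f = 1.5

mean_y = beta0 + beta1 * x_f   # expected price, in $1000s
lo = mean_y - 2 * sigma        # lower 95% prediction bound
hi = mean_y + 2 * sigma        # upper 95% prediction bound
print(mean_y, lo, hi)          # 107.5 87.5 127.5
```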

35 Summary of Simple Linear Regression Assume that all observations are drawn from our regression model and that the errors on those observations are independent. The model is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where the $\varepsilon_i$ are independent and identically distributed $N(0, \sigma^2)$. The SLR model has 3 basic parameters: $\beta_0$ and $\beta_1$ (the linear pattern) and $\sigma$ (variation around the line). 35

36 Key Characteristics of Linear Regression Model Mean of Y is linear in X. Error terms (deviations from line) are normally distributed (very few deviations are more than 2 sd away from the regression mean). Error terms have constant variance. 36

37 Break Back in 15 minutes... 37

38 Recall: Estimation for the SLR Model SLR assumes every observation in the dataset was generated by the model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. This is a model for the conditional distribution of $Y$ given $X$. We use least squares to estimate $\beta_0$ and $\beta_1$: $\hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}$ and $\hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X}$. 38

39 Estimation for the SLR Model 39

40 Estimation of Error Variance Recall that $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, and that $\sigma$ drives the width of the prediction intervals: $\sigma^2 = \text{var}(\varepsilon_i) = E[(\varepsilon_i - E[\varepsilon_i])^2] = E[\varepsilon_i^2]$ A sensible strategy would be to estimate the average squared error with the sample average of squared residuals: $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n e_i^2$ 40

41 Estimation of Error Variance However, this is not an unbiased estimator of $\sigma^2$. We have to alter the denominator slightly: $s^2 = \frac{1}{n-2}\sum_{i=1}^n e_i^2 = \frac{SSE}{n-2}$ (2 is the number of regression coefficients, one each for $\beta_0$ and $\beta_1$). We have $n-2$ degrees of freedom because 2 have been used up in the estimation of $b_0$ and $b_1$. We usually use $s = \sqrt{SSE/(n-2)}$, which is in the same units as $Y$. 41
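
A minimal sketch of this estimator on simulated data (the data-generating values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0, 4, size=n)
y = 2 + 3 * x + rng.normal(0, 1.5, size=n)  # true sigma = 1.5

b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)

SSE = (e ** 2).sum()
s2 = SSE / (n - 2)   # divide by n - 2, not n: two coefficients are used up
s = np.sqrt(s2)      # in the same units as Y
```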

42 Degrees of Freedom Degrees of freedom is the number of times you get to observe useful information about the variance you're trying to estimate. For example, consider $SST = \sum_{i=1}^n (Y_i - \bar{Y})^2$: If $n = 1$, $\bar{Y} = Y_1$ and $SST = 0$: since $Y_1$ is used up estimating the mean, we haven't observed any variability! For $n > 1$, we've only had $n - 1$ chances for deviation from the mean, and we estimate $s_y^2 = SST/(n-1)$. In regression with $p$ coefficients (e.g., $p = 2$ in SLR), you only get $n - p$ real observations of variability, so DoF $= n - p$. 42

43 Estimation of Error Variance Where is $s$ in the Excel output? Remember that whenever you see "standard error", read it as estimated standard deviation ($\sigma$ is the standard deviation). [Figure: Excel regression output with $s$ highlighted.] 43

44 Sampling Distribution of Least Squares Estimates How much do our estimates depend on the particular random sample that we happen to observe? Imagine: Randomly draw different samples of the same size. For each sample, compute the estimates b 0, b 1, and s. If the estimates don t vary much from sample to sample, then it doesn t matter which sample you happen to observe. If the estimates do vary a lot, then it matters which sample you happen to observe. 44

45 Sampling Distribution of Least Squares Estimates 45

46 Sampling Distribution of Least Squares Estimates 46

47 Sampling Distribution of Least Squares Estimates LS lines are much closer to the true line when $n = 50$. For $n = 5$, some lines are close, others aren't: we need to get lucky. 47

48 Review: Sampling Distribution of the Sample Mean Step back for a moment and consider the mean for an iid sample of $n$ observations of a random variable $\{X_1, \ldots, X_n\}$. Suppose that $E(X_i) = \mu$ and $\text{var}(X_i) = \sigma^2$. Then $E(\bar{X}) = \frac{1}{n}\sum E(X_i) = \mu$ and $\text{var}(\bar{X}) = \text{var}\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum \text{var}(X_i) = \frac{\sigma^2}{n}$. If $X$ is normal, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. If $X$ is not normal, we have the central limit theorem (more in a minute)! 48

49 Oracle vs SAP Example (understanding variation) 49

50 Oracle vs SAP 50

51 Oracle vs SAP Do you really believe that SAP affects ROE? How else could we look at this question? 51

52 Central Limit Theorem The simple CLT states that for iid random variables $X$ with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean becomes normal as the number of observations $n$ gets large. That is, $\bar{X}_n \sim N(\mu, \frac{\sigma^2}{n})$ approximately, and sample averages tend to be normally distributed in large samples. 52
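
The histograms on the next few slides are produced by the R one-liner shown with each; a rough Python analogue of the same simulation (an illustration, not the course code) is:

```python
import numpy as np

rng = np.random.default_rng(5)

def exp_sample_means(n, reps=1000):
    """Means of `reps` samples of size n from an Exponential(1) distribution."""
    return rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

means_n2 = exp_sample_means(2)        # still skewed, far from normal
means_n1000 = exp_sample_means(1000)  # close to N(1, 1/1000)
```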

53 Central Limit Theorem Exponential random variables don't look very normal: [Figure: the exponential density $f(x)$, for which $E[X] = 1$ and $\text{var}(X) = 1$.] 53

54 Central Limit Theorem [Histogram: 1000 means from $n = 2$ samples, generated in R by `apply(matrix(rexp(2 * 1000), ncol = 1000), 2, mean)`.] 54

55 Central Limit Theorem [Histogram: 1000 means from $n = 5$ samples, generated in R by `apply(matrix(rexp(5 * 1000), ncol = 1000), 2, mean)`.] 55

56 Central Limit Theorem [Histogram: 1000 means from $n = 10$ samples, generated in R by `apply(matrix(rexp(10 * 1000), ncol = 1000), 2, mean)`.] 56

57 Central Limit Theorem [Histogram: 1000 means from $n = 100$ samples, generated in R by `apply(matrix(rexp(100 * 1000), ncol = 1000), 2, mean)`.] 57

58 Central Limit Theorem [Histogram: 1000 means from $n = 1000$ samples, generated in R by `apply(matrix(rexp(1000 * 1000), ncol = 1000), 2, mean)`.] 58

59 Sampling Distribution of b 1 The sampling distribution of $b_1$ describes how the estimator $b_1 = \hat{\beta}_1$ varies over different samples with the $X$ values fixed. It turns out that $b_1$ is normally distributed: $b_1 \sim N(\beta_1, \sigma_{b_1}^2)$. $b_1$ is unbiased: $E[b_1] = \beta_1$. The sampling sd $\sigma_{b_1}$ determines the precision of $b_1$. 59

60 Sampling Distribution of b 1 Can we intuit what should be in the formula for $\sigma_{b_1}$? How should $\sigma$ figure in the formula? What about $n$? Anything else? $\text{var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} = \frac{\sigma^2}{(n-1)\,s_x^2}$ Three factors: sample size ($n$), error variance ($\sigma^2 = \sigma_\varepsilon^2$), and $X$-spread ($s_x$). 60
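
A simulation sketch of this formula (the design and parameter values are assumptions): hold the $X$ values fixed, refit the line on many simulated samples, and compare the spread of the slopes to $\sigma/\sqrt{(n-1)s_x^2}$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta0, beta1, sigma = 50, 1.0, 2.0, 1.0
x = rng.uniform(0, 10, size=n)   # X values held fixed across samples

sd_b1_formula = sigma / np.sqrt((n - 1) * x.var(ddof=1))

b1_draws = np.empty(4000)
for i in range(4000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1_draws[i], _ = np.polyfit(x, y, deg=1)  # slope from this sample
```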

61 Sampling Distribution of b 0 The intercept is also normal and unbiased: $b_0 \sim N(\beta_0, \sigma_{b_0}^2)$, where $\sigma_{b_0}^2 = \text{var}(b_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)\,s_x^2}\right)$ What is the intuition here? 61

62 The Importance of Understanding Variation When estimating a quantity, it is vital to develop a notion of the precision of the estimation; for example: estimate the slope of the regression line estimate the value of a house given its size estimate the expected return on a portfolio estimate the value of a brand name estimate the damages from patent infringement Why is this important? We are making decisions based on estimates, and these may be very sensitive to the accuracy of the estimates! 62

63 The Importance of Understanding Variation Example from everyday life: when building a house, we may only need to cut a piece of wood to within 1/4 inch; when building a fine cabinet, the estimates may have to be accurate to 1/16 or even 1/32 of an inch. The standard deviations of the least squares estimators of the slope and intercept give a precise measurement of the accuracy of the estimator. However, these formulas aren't especially practical since they involve the unknown quantity $\sigma$. 63

64 Estimated Variance We estimate variation with "sample standard deviations": $s_{b_1} = \sqrt{\frac{s^2}{(n-1)\,s_x^2}}$ and $s_{b_0} = \sqrt{s^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)\,s_x^2}\right)}$ Recall that $s = \sqrt{\sum e_i^2/(n-2)}$ is the estimator for $\sigma = \sigma_\varepsilon$. Hence, $s_{b_1} = \hat{\sigma}_{b_1}$ and $s_{b_0} = \hat{\sigma}_{b_0}$ are estimated coefficient sd's. A high level of info/precision/accuracy means small $s_b$ values. 64

65 Normal and Student's t Recall what Student discovered: if $\theta \sim N(\mu, \sigma^2)$, but you estimate $\sigma^2 \approx s^2$ based on $n - p$ degrees of freedom, then $\theta \sim t_{n-p}(\mu, s^2)$. For example: $\bar{Y} \sim t_{n-1}(\mu, s_y^2/n)$, $b_0 \sim t_{n-2}(\beta_0, s_{b_0}^2)$, and $b_1 \sim t_{n-2}(\beta_1, s_{b_1}^2)$. The t distribution is just a fat-tailed version of the normal. As $n - p$ grows, the tails get skinny and the t becomes normal. 65

66 Standardized Normal and Student's t We'll also usually standardize things: $\frac{b_j - \beta_j}{\sigma_{b_j}} \sim N(0, 1)$ becomes $\frac{b_j - \beta_j}{s_{b_j}} \sim t_{n-2}(0, 1)$. We use $Z \sim N(0, 1)$ and $Z_{n-p} \sim t_{n-p}(0, 1)$ to represent standard random variables. Notice that the t and normal distributions depend upon assumed values for $\beta_j$: this forms the basis for confidence intervals, hypothesis testing, and p-values. 66

67 Testing and Confidence Intervals (in 3 slides) Suppose $Z_{n-p}$ is distributed $t_{n-p}(0, 1)$. A centered interval is $P(-t_{n-p,\alpha/2} < Z_{n-p} < t_{n-p,\alpha/2}) = 1 - \alpha$ 67

68 Confidence Intervals Since $b_j \sim t_{n-p}(\beta_j, s_{b_j})$, $1 - \alpha = P\left(-t_{n-p,\alpha/2} < \frac{b_j - \beta_j}{s_{b_j}} < t_{n-p,\alpha/2}\right) = P(b_j - t_{n-p,\alpha/2}\,s_{b_j} < \beta_j < b_j + t_{n-p,\alpha/2}\,s_{b_j})$ Thus $(1-\alpha) \cdot 100\%$ of the time, $\beta_j$ is within the Confidence Interval: $b_j \pm t_{n-p,\alpha/2}\,s_{b_j}$ 68
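
A coverage sketch of this claim (all settings assumed, and using the rule-of-thumb cutoff of 2 in place of the exact t quantile $t_{28,0.025} \approx 2.048$, so coverage lands slightly under 95%):

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta0, beta1, sigma = 30, 1.0, 2.0, 1.0
x = rng.uniform(0, 10, size=n)   # fixed design

reps, hits = 2000, 0
for _ in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1, b0 = np.polyfit(x, y, deg=1)
    e = y - (b0 + b1 * x)
    s2 = (e ** 2).sum() / (n - 2)
    s_b1 = np.sqrt(s2 / ((n - 1) * x.var(ddof=1)))
    # interval b1 +/- 2 s_b1: does it contain the true beta1?
    if b1 - 2 * s_b1 < beta1 < b1 + 2 * s_b1:
        hits += 1

coverage = hits / reps   # should be close to 0.95
```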

69 Testing Similarly, suppose that assuming $b_j \sim t_{n-p}(\beta_j, s_{b_j})$ for our sample $b_j$ leads to (recall $Z_{n-p} \sim t_{n-p}(0, 1)$) $P\left(Z_{n-p} < -\left|\frac{b_j - \beta_j}{s_{b_j}}\right|\right) + P\left(Z_{n-p} > \left|\frac{b_j - \beta_j}{s_{b_j}}\right|\right) = \varphi$. Then the p-value is $\varphi = 2P(Z_{n-p} > |b_j - \beta_j|/s_{b_j})$. You do this calculation for $\beta_j = \beta_j^0$, an assumed null/safe value, and only reject $\beta_j^0$ if $\varphi$ is too small (e.g., $\varphi < 1/20$). In regression, $\beta_j^0 = 0$ almost always. 69

70 More Detail... Confidence Intervals Why should we care about Confidence Intervals? The confidence interval captures the amount of information in the data about the parameter. The center of the interval tells you what your estimate is. The length of the interval tells you how sure you are about your estimate. 70

71 More Detail... Testing Suppose that we are interested in the slope parameter, $\beta_1$. For example, is there any evidence in the data to support the existence of a relationship between $X$ and $Y$? We can rephrase this in terms of competing hypotheses. $H_0$: $\beta_1 = 0$. Null/safe; implies no effect and we ignore $X$. $H_1$: $\beta_1 \neq 0$. Alternative; leads us to our best guess $\hat{\beta}_1 = b_1$. 71

72 Hypothesis Testing If we want statistical support for a certain claim about the data, we want that claim to be the alternative hypothesis. Our hypothesis test will either reject or not reject the null hypothesis (the default if our claim is not true). If the hypothesis test rejects the null hypothesis, we have statistical support for our claim! 72

73 Hypothesis Testing We use $b_j$ for our test about $\beta_j$: reject $H_0$ when $b_j$ is far from $\beta_j^0$ (usually 0); assume $H_0$ when $b_j$ is close to $\beta_j^0$. An obvious tactic is to look at the difference $b_j - \beta_j^0$. But this measure doesn't take into account the uncertainty in estimating $b_j$: what we really care about is how many standard deviations $b_j$ is away from $\beta_j^0$. 73

74 Hypothesis Testing The t-statistic for this test is $z_{b_j} = \frac{b_j - \beta_j^0}{s_{b_j}}$ ($= \frac{b_j}{s_{b_j}}$ for $\beta_j^0 = 0$). If $H_0$ is true, this should be distributed $z_{b_j} \sim t_{n-p}(0, 1)$. Small $|z_{b_j}|$ leaves us happy with the null $\beta_j^0$; large $|z_{b_j}|$ (i.e., bigger than about 2) should get us worried! 74

75 Hypothesis Testing We assess the size of $z_{b_j}$ with the p-value: $\varphi = P(|Z_{n-p}| > |z_{b_j}|) = 2P(Z_{n-p} > |z_{b_j}|)$ (once again, $Z_{n-p} \sim t_{n-p}(0, 1)$). [Figure: t density with 8 df; the two shaded tails beyond $\pm z$ sum to a p-value of 0.05.] 75

76 Hypothesis Testing The p-value is the probability, assuming that the null hypothesis is true, of seeing something more extreme (further from the null) than what we have observed. You can think of $1 - \varphi$ (inverse p-value) as a measure of distance between the data and the null hypothesis. In other words, $1 - \varphi$ is the strength of evidence against the null. 76

77 Hypothesis Testing The formal 2-step approach to hypothesis testing: 1. Pick the significance level $\alpha$ (often $1/20 = 0.05$), our acceptable risk (probability) of rejecting a true null hypothesis (we call this a type 1 error). This $\alpha$ plays the same role as $\alpha$ in CIs. 2. Calculate the p-value, and reject $H_0$ if $\varphi < \alpha$ (in favor of our best alternative guess; e.g., $\beta_j = b_j$). If $\varphi > \alpha$, continue working under null assumptions. This is equivalent to having the rejection region $|z_{b_j}| > t_{n-p,\alpha/2}$. 77
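
A sketch of the whole procedure on simulated data, with the p-value computed by Monte Carlo under the null rather than from t tables (every value here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 5, size=n)
y = 1 + 0.8 * x + rng.normal(0, 1, size=n)  # true slope is 0.8, not 0

def t_stat(x, y):
    """t-statistic z = b1 / s_b1 for H0: beta1 = 0."""
    n = len(x)
    b1, b0 = np.polyfit(x, y, deg=1)
    e = y - (b0 + b1 * x)
    s2 = (e ** 2).sum() / (n - 2)
    return b1 / np.sqrt(s2 / ((n - 1) * x.var(ddof=1)))

z_obs = t_stat(x, y)

# Null distribution: refit on data generated with beta1 = 0
z_null = np.array([t_stat(x, rng.normal(0, 1, size=n)) for _ in range(2000)])
phi = (np.abs(z_null) >= abs(z_obs)).mean()  # two-sided p-value

alpha = 0.05
reject = phi < alpha   # step 2: reject H0 if the p-value is below alpha
```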

78 Example: Hypothesis Testing Consider again a CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?) $H_0$: $\beta_0 = 0$, there is no free money. $H_1$: $\beta_0 \neq 0$, Windsor makes (or loses) money regardless of the market. 78

79 Hypothesis Testing: Windsor Fund Example [Figure: regression output for the Windsor fund, with $b_0$, $s_{b_0}$, $b_1$, and $s_{b_1}$ marked.] It turns out that we reject the null at $\alpha = .05$ ($\varphi = .0105$). Thus Windsor does have an alpha over the market. 79

80 Example: Hypothesis Testing Looking at the slope, this is a very rare case where the null hypothesis is not zero: $H_0$: $\beta_1 = 1$, Windsor is just the market (+ alpha). $H_1$: $\beta_1 \neq 1$, Windsor softens or exaggerates market moves. We are asking whether or not Windsor moves in a different way than the market (e.g., is it more conservative?). Now compute $t = \frac{b_1 - 1}{s_{b_1}}$; since $|t| > t_{n-2,\alpha/2} = t_{178,0.025} = 1.96$, we reject $H_0$ at the 5% level. 80

81 Forecasting The conditional forecasting problem: given covariate $X_f$ and sample data $\{X_i, Y_i\}_{i=1}^n$, predict the future observation $Y_f$. The solution is to use our LS fitted value: $\hat{Y}_f = b_0 + b_1 X_f$. This is the easy bit. The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. 81

82 Forecasting If we use $\hat{Y}_f$, our prediction error is $e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$ 82

83 Forecasting This can get quite complicated! A simple strategy is to build the following $(1-\alpha) \cdot 100\%$ prediction interval: $b_0 + b_1 X_f \pm t_{n-2,\alpha/2}\,s$ A large predictive error variance (high uncertainty) comes from: large $s$ (i.e., large $\varepsilon$'s); small $n$ (not enough data); small $s_x$ (not enough observed spread in covariates); a large difference between $X_f$ and $\bar{X}$. Just remember that you are uncertain about $b_0$ and $b_1$! Reasonably inflating the uncertainty in the interval above is always a good idea... as always, this is problem dependent. 83
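
A minimal sketch of this simple interval on simulated data (all values assumed; per the slide's caveat, this interval ignores the uncertainty in $b_0$ and $b_1$ and so is somewhat too narrow):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 40
x = rng.uniform(0, 10, size=n)
y = 5 + 1.2 * x + rng.normal(0, 2, size=n)

b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)
s = np.sqrt((e ** 2).sum() / (n - 2))   # estimated residual sd

x_f = 7.0                 # new covariate value
y_f_hat = b0 + b1 * x_f   # point forecast
pi = (y_f_hat - 2 * s, y_f_hat + 2 * s)   # rough 95% PI, using 2 for the t quantile
```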

84 Forecasting For $X_f$ far from our $\bar{X}$, the space between the prediction bands is magnified... [Figure: prediction bands widening away from $\bar{X}$.] 84

85 Glossary and Equations $\hat{Y}_i = b_0 + b_1 X_i$ is the $i$th fitted value. $e_i = Y_i - \hat{Y}_i$ is the $i$th residual. $s$: standard error of the regression residuals (estimates $\sigma = \sigma_\varepsilon$), with $s^2 = \frac{1}{n-2}\sum e_i^2$. $s_{b_j}$: standard error of the regression coefficients, $s_{b_1} = \sqrt{\frac{s^2}{(n-1)\,s_x^2}}$ and $s_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1)\,s_x^2}}$ 85

86 Glossary and Equations $\alpha$ is the significance level (probability of a type 1 error). $t_{n-p,\alpha/2}$ is the value such that for $Z_{n-p} \sim t_{n-p}(0, 1)$, $P(Z_{n-p} > t_{n-p,\alpha/2}) = P(Z_{n-p} < -t_{n-p,\alpha/2}) = \alpha/2$. $z_{b_j} \sim t_{n-p}(0, 1)$ is the standardized coefficient t-value: $z_{b_j} = \frac{b_j - \beta_j^0}{s_{b_j}}$ ($= b_j/s_{b_j}$ most often). The $(1-\alpha) \cdot 100\%$ confidence interval for $\beta_j$ is $b_j \pm t_{n-p,\alpha/2}\,s_{b_j}$. 86


### OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance

Lecture 5: Hypothesis Testing What we know now: OLS is not only unbiased it is also the most precise (efficient) unbiased estimation technique - ie the estimator has the smallest variance (if the Gauss-Markov

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### Coefficient of Determination

Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Lecture 13 More on hypothesis testing

Lecture 13 More on hypothesis testing Thais Paiva STA 111 - Summer 2013 Term II July 22, 2013 1 / 27 Thais Paiva STA 111 - Summer 2013 Term II Lecture 13, 07/22/2013 Lecture Plan 1 Type I and type II error

### Two-sample inference: Continuous data

Two-sample inference: Continuous data Patrick Breheny April 5 Patrick Breheny STA 580: Biostatistics I 1/32 Introduction Our next two lectures will deal with two-sample inference for continuous data As

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Linear and Piecewise Linear Regressions

Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis

### fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson

fifty Fathoms Statistics Demonstrations for Deeper Understanding Tim Erickson Contents What Are These Demos About? How to Use These Demos If This Is Your First Time Using Fathom Tutorial: An Extended Example

### Simple Linear Regression

Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression Statistical model for linear regression Estimating

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

### 2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.\$ and Sales \$: 1. Prepare a scatter plot of these data. The scatter plots for Adv.\$ versus Sales, and Month versus

### Wooldridge, Introductory Econometrics, 4th ed. Multiple regression analysis:

Wooldridge, Introductory Econometrics, 4th ed. Chapter 4: Inference Multiple regression analysis: We have discussed the conditions under which OLS estimators are unbiased, and derived the variances of

### 0.1 Multiple Regression Models

0.1 Multiple Regression Models We will introduce the multiple Regression model as a mean of relating one numerical response variable y to two or more independent (or predictor variables. We will see different

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Linear Regression with One Regressor

Linear Regression with One Regressor Michael Ash Lecture 10 Analogy to the Mean True parameter µ Y β 0 and β 1 Meaning Central tendency Intercept and slope E(Y ) E(Y X ) = β 0 + β 1 X Data Y i (X i, Y

### Covariance and Correlation

Covariance and Correlation ( c Robert J. Serfling Not for reproduction or distribution) We have seen how to summarize a data-based relative frequency distribution by measures of location and spread, such

### Notes on Applied Linear Regression

Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

### Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### The Simple Linear Regression Model: Specification and Estimation

Chapter 3 The Simple Linear Regression Model: Specification and Estimation 3.1 An Economic Model Suppose that we are interested in studying the relationship between household income and expenditure on

### In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a

Math 143 Inference on Regression 1 Review of Linear Regression In Chapter 2, we used linear regression to describe linear relationships. The setting for this is a bivariate data set (i.e., a list of cases/subjects

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Chapter 8: Interval Estimates and Hypothesis Testing

Chapter 8: Interval Estimates and Hypothesis Testing Chapter 8 Outline Clint s Assignment: Taking Stock Estimate Reliability: Interval Estimate Question o Normal Distribution versus the Student t-distribution:

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### Estimation and Confidence Intervals

Estimation and Confidence Intervals Fall 2001 Professor Paul Glasserman B6014: Managerial Statistics 403 Uris Hall Properties of Point Estimates 1 We have already encountered two point estimators: th e

### Basic Statistcs Formula Sheet

Basic Statistcs Formula Sheet Steven W. ydick May 5, 0 This document is only intended to review basic concepts/formulas from an introduction to statistics course. Only mean-based procedures are reviewed,

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### An example ANOVA situation. 1-Way ANOVA. Some notation for ANOVA. Are these differences significant? Example (Treating Blisters)

An example ANOVA situation Example (Treating Blisters) 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

### Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

### Hypothesis Testing - II

-3σ -2σ +σ +2σ +3σ Hypothesis Testing - II Lecture 9 0909.400.01 / 0909.400.02 Dr. P. s Clinic Consultant Module in Probability & Statistics in Engineering Today in P&S -3σ -2σ +σ +2σ +3σ Review: Hypothesis

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### Variances and covariances

Chapter 4 Variances and covariances 4.1 Overview The expected value of a random variable gives a crude measure for the center of location of the distribution of that random variable. For instance, if the

### Inference for Regression

Simple Linear Regression Inference for Regression The simple linear regression model Estimating regression parameters; Confidence intervals and significance tests for regression parameters Inference about

### Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG820). December 15, 2012.

Solutions for the exam for Matematisk statistik och diskret matematik (MVE050/MSG810). Statistik för fysiker (MSG8). December 15, 12. 1. (3p) The joint distribution of the discrete random variables X and

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

### Heteroskedasticity and Weighted Least Squares

Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/

### The Assumption(s) of Normality

The Assumption(s) of Normality Copyright 2000, 2011, J. Toby Mordkoff This is very complicated, so I ll provide two versions. At a minimum, you should know the short one. It would be great if you knew