" Y. Notation and Equations for Regression Lecture 11/4. Notation:


Notation:

m: The number of predictor variables in a regression.
X_i: One of multiple predictor variables. The subscript i represents any number from 1 through m, so the predictor variables are X_1, X_2, …, X_m.
Y: The outcome variable that is being predicted or explained in a regression.
Ŷ: The estimated outcome value as predicted by the regression equation.
b_i: The regression coefficient for predictor X_i.
b_0: The intercept in the regression equation.
σ_{b_i}: The standard error of a regression coefficient.
SS_Y: The total sum of squares for Y, which represents the uncertainty in the outcome.
SS_residual: The residual sum of squares, representing the uncertainty left over after we use regression to generate the best possible predictions for the outcome.
SS_regression: The sum of squares for the regression, representing the amount of uncertainty or variability that the predictors can explain.
R²: The proportion of variability explained by the regression.
MS_residual: The residual mean square, which is the mean squared error of the regression prediction, Ŷ; also used as an estimate of the population variance, σ²_Y.
MS_regression: The mean square for the regression.
F: The F statistic, which is the test statistic for deciding whether a regression explains meaningful variability in the outcome.

Regression equation. A regression equation is a formula that uses a set of predictor variables (X_i) to make a prediction for some outcome variable (Y). The prediction is called Ŷ. The purpose of a regression equation is to find the best possible way to do this, that is, to combine the predictors in a way that gives estimates (Ŷ) that are as close as possible to the right answer (Y). When there's only one predictor (m = 1), regression is essentially the same as correlation, and the regression equation is the same as the regression line we would draw on a scatterplot. The equation for a line always has the form Ŷ = b_0 + b_1 X, where b_0 and b_1 are numbers representing the intercept and the slope. Finding the line that's closest to the data is the same as choosing b_0 and b_1 so that the predicted Ŷ values are as close as possible to the true Y values. When there is more than one predictor, we take the basic equation for a line and extend it in a natural way.

Ŷ = b_0 + b_1 X_1 + b_2 X_2 + … + b_m X_m
  = b_0 + Σ_{i=1 to m} b_i X_i        (1)

The two lines of Equation 1 mean the same thing. The first line writes everything out, and the second line combines things into a sum (you can choose to remember either one). In the sum, the index i takes on all values from 1 to m: when i = 1, the summand is b_1 X_1; when i = 2, the summand is b_2 X_2; and so on up to b_m X_m. Notice that the contribution of each predictor (b_i X_i) is the same as in the equation for a simple line with one predictor. This is why we say that each predictor has a linear effect on the predicted outcome: if we held all of the predictors except one fixed, then the relationship between the remaining predictor and the prediction would be a line.

The goal of regression is to find the values of b_0 through b_m that lead to the best predictions. Therefore, the b's are a main focus of regression. Each b_i is called the regression coefficient for its corresponding predictor (e.g., b_1 is the regression coefficient for X_1). The regression coefficient tells what kind of influence each predictor has on the predicted outcome. When a regression coefficient is positive, the predictor has a positive effect, just like with a positive correlation. When a regression coefficient is negative, the predictor has a negative effect, just like with a negative correlation. The magnitude or absolute value of b_i tells how strong the effect is. If b_i is near zero, then X_i has a weak effect on the outcome, just like with a correlation near zero. If b_i is large (either positive or negative), then X_i has a strong effect on the outcome, just like with a correlation near ±1.

The difference between regression coefficients and correlations is that regression coefficients aren't standardized, meaning they're not restricted to lie between -1 and 1. Therefore, the strength of a regression coefficient needs to be interpreted in terms of the units of the predictor and outcome variables. In general, b_i tells how many units Ŷ increases by for every unit increase in X_i. For example, if X_i is height (in inches) and Y is how long it takes a person to run a mile (in seconds), b_i = 7 would mean that for every extra inch of height, a person tends to take 7 seconds longer to run a mile. A negative coefficient means a decrease; e.g., b_i = -5 would mean that for every extra inch of height, a person tends to take 5 seconds less to run a mile.

b_0 is a special regression coefficient called the intercept. Just like the intercept of a simple line, b_0 tells what the value of Ŷ is when all the X_i's are zero (i.e., where the line intersects the Y axis). If zero isn't a sensible value for any of the predictors (e.g., no one is 0 inches tall), then the intercept taken by itself won't be a very sensible value for the outcome (e.g., it could be -50 seconds). Therefore, we generally don't think too hard about what the value of the intercept indicates; we just know that it needs to be in the regression equation so that the overall pattern of predictions can be shifted up or down as needed to match the true values of the outcome.
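To make this concrete, here is a minimal R sketch on simulated data. Everything in it is invented for illustration: the sample size, the variable names (height, miles, mile_time), and the true coefficient values, which echo the height-and-mile-time example above. R's built-in lm() function finds the intercept and coefficients for us.

```r
# Minimal sketch with simulated data; all names and values are hypothetical.
# Y is mile time in seconds; X1 is height in inches; X2 is weekly training miles.
set.seed(1)
n         <- 100
height    <- rnorm(n, mean = 68, sd = 3)                       # X1
miles     <- rnorm(n, mean = 20, sd = 5)                       # X2
mile_time <- 600 + 7 * height - 5 * miles + rnorm(n, sd = 30)  # Y, plus noise

# lm() chooses b0, b1, b2 to minimize the squared prediction error.
fit <- lm(mile_time ~ height + miles)
coef(fit)  # (Intercept) is b0; "height" and "miles" are b1 and b2
```

Because the data were generated with b_0 = 600, b_1 = 7, and b_2 = -5, coef(fit) should return estimates near those values.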

Partitioning sums of squares. One important question with any regression equation is how well it does at predicting the outcome. We answer this question in terms of how much variability in Y is explained by the regression, meaning how much uncertainty goes away when we use the regression equation to predict the outcome. Variability or uncertainty in this case is measured in sums of squares (SS), which are very similar to variance and mean squared error, except that we don't divide by degrees of freedom (yet). The total variability in Y is called SS_Y (the sum of squares for Y) and is defined as

SS_Y = Σ (Y − M_Y)²        (2)

As usual, M_Y represents the mean of the sample of Y values. Notice that if we divided SS_Y by n − 1, we would have the sample variance of Y. So the sum of squares for Y is just like the variance of Y except that we don't divide by n − 1 (i.e., it's a sum instead of an average). As with variance, we can think of the sum of squares as a measure of uncertainty, or how much error we would expect to make if we had to guess the value of Y blindly. If we have no information about a subject, our best guess of their Y score is the mean, M_Y. Therefore (Y − M_Y)² is our squared error, and SS_Y is the sum of the squared error over all subjects.

If we do know something about a subject, then we can make a better prediction of their Y score than by blindly guessing the mean. This is what the regression equation does for us: it uses all the predictors, X_i, to come up with the best possible prediction, Ŷ. Once we have Ŷ, we can ask how well it does as an estimate of Y, again by adding up the squared error over all subjects. The result is called the residual sum of squares, because it represents the uncertainty in Y that's left over after we do the regression. Notice that the residual sum of squares is the same as mean squared error except that, once again, it's a sum instead of a mean, because we don't divide by degrees of freedom.

SS_residual = Σ (Ŷ − Y)²        (3)

As stated above, the goal of regression is to find the values of the regression coefficients (b_i) that lead to the best predictions. By best predictions, we mean minimizing the squared difference between Y and Ŷ; in other words, the goal is to minimize SS_residual. This is how we determine the regression coefficients (or, usually, how a computer determines them for us).

The question now is how well the regression did. If the predictors do a really good job of explaining the outcome, SS_residual will be close to zero. If the predictors tell us little or nothing about the outcome, then SS_residual will be close to the original, total sum of squares, SS_Y. SS_residual is always less than or equal to SS_Y (because SS_Y is the error of the naive prediction, M_Y, and SS_residual is the error of the best possible prediction, Ŷ), but the question is how much less. The reduction in uncertainty from SS_Y to SS_residual is called SS_regression, because it's the amount of variability in the outcome (Y) that the regression can explain.

SS_regression = SS_Y − SS_residual
SS_Y = SS_regression + SS_residual        (4)

The two versions of Equation 4 say the same thing. The first version shows how we get SS_regression by subtracting the residual sum of squares from the total sum of squares. The second version shows how the total sum of squares (i.e., the original variability in the data) can be broken into two parts: the portion that we can explain using the predictors (SS_regression) and the portion that we cannot explain (SS_residual).
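Continuing the hypothetical sketch above (same fit object), the three sums of squares take one line each, and the partition in Equation 4 can be verified numerically.

```r
# Continuing the sketch: compute each sum of squares from the fitted model.
Y_hat <- fitted(fit)                                   # the predictions, Y-hat
SS_Y          <- sum((mile_time - mean(mile_time))^2)  # Equation 2
SS_residual   <- sum((Y_hat - mile_time)^2)            # Equation 3
SS_regression <- SS_Y - SS_residual                    # Equation 4, first form

all.equal(SS_Y, SS_regression + SS_residual)  # TRUE: the partition holds
```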

Explained variability. Once we've worked out the total variability (SS_Y) and the portion explained by the predictors (SS_regression), we can calculate their ratio. The ratio is the fraction of the total variability explained by the predictors, and it's our final measure of how useful the regression was. The fraction of explained variability is called R², because it extends the idea of the squared correlation, r². Recall that when we're doing a simple correlation (i.e., there's only one predictor), r² is the fraction of the variance in Y that can be explained by X. In other words, when there's only one predictor, R² and r² are equal; R² is just a more general concept that works when there are multiple predictors. If R² is close to 1, then the regression explains most of the variability in Y, meaning that if we know the values of the predictors for some subject, then we can confidently predict the outcome for that subject. If R² is close to 0, then the predictors don't give us much information about the outcome. (R² is always between 0 and 1.)

R² = SS_regression / SS_Y        (5)

Hypothesis testing: The effect of one predictor. Once we've run a regression to find the regression coefficients for a set of predictors, we can ask how reliable those coefficients are. The regression coefficients are statistics, meaning they're computed from a sample (for each subject in our sample, we have measured all the X_i's and Y). We can use the regression coefficients as estimates of the population values, but as with all estimators, they are imperfect. If we gathered a new sample and ran the regression on the new data, we'd obtain somewhat different values for the regression coefficients. Therefore, each b_i has a sampling distribution, which represents the probabilities of all possible values we could get for b_i if we replicated the experiment. Each b_i also has a standard error, which as usual is the standard deviation of the sampling distribution. The standard error tells us how reliable our estimate is, meaning how far we can expect it to be from the true population value.

We won't go into detail about how the standard errors of the regression coefficients are calculated, because it's much easier to use a computer for this. However, it's important to understand what the standard errors can tell us. First, the standard error can be used to create a confidence interval for each b_i, in the same way we create confidence intervals for means. I won't describe the math, but the idea is the same as before: the confidence interval is centered on the actual value of b_i obtained from the sample, and the width of the confidence interval is determined by the size of the standard error.

The second thing we can use standard errors for is hypothesis testing. In regression, the most common hypotheses to test concern whether the predictors have reliable influences in the population. This corresponds to asking whether the true values of the regression coefficients are different from zero. For each predictor, X_i, the null hypothesis b_i = 0 states that X_i has no reliable effect on Y (the alternative hypothesis is b_i ≠ 0). Notice that there's a separate null hypothesis for each predictor, and we can test each one individually. To test whether b_i = 0, we calculate a t statistic equal to b_i (the actual regression coefficient we obtained from our sample) divided by its standard error.

t = b_i / σ_{b_i}        (6)
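Continuing the sketch, R² (Equation 5) and the t statistic for one coefficient (Equation 6) can be computed by hand and checked against what summary(fit) reports. The two-tailed p-value at the end uses df = n − m − 1, which is explained in the next section.

```r
# Continuing the sketch: explained variability (Equation 5).
R2 <- SS_regression / SS_Y  # same value as summary(fit)$r.squared

# t statistic for one coefficient (Equation 6): estimate / standard error.
est <- summary(fit)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_height <- est["height", "Estimate"] / est["height", "Std. Error"]

# Two-tailed p-value, using df = n - m - 1 (see the next section):
m <- 2                                  # two predictors in this sketch
2 * pt(-abs(t_height), df = n - m - 1)  # matches est["height", "Pr(>|t|)"]
```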

Just as with the t statistic for a t-test, t for a regression coefficient tells us how large that coefficient is relative to how large it would be expected to be by chance (i.e., according to the null hypothesis that its true value is zero). If t is large (either positive or negative), then b_i is larger than would be expected by chance, so chance is not a good explanation for the result that we got. In this case, we reject the null hypothesis and adopt the alternative hypothesis that b_i ≠ 0 (i.e., that X_i has a real effect on Y). If t is close to zero, then b_i fits with what we'd expect by chance, so we retain the null hypothesis that b_i = 0 (i.e., X_i has no real effect on Y).

The t statistic from a regression is used in the same way as in a t-test. If |t| is greater than t_crit, then we reject the null hypothesis. The alternative (and equivalent) approach is to compute a p-value, which is the probability of a result as or more extreme than t: p = p(|t_df| ≥ |t|). This is the formula for a two-tailed test, but we can also compute a one-tailed p-value if the direction of the effect (i.e., the sign of b_i) was predicted in advance. In either case, we reject the null hypothesis if p < α. The only remaining information needed to find t_crit or p is the degrees of freedom. As usual, the degrees of freedom for t equal the degrees of freedom for the standard error used to compute t. The standard error for a regression coefficient, σ_{b_i}, comes from SS_residual, which, as explained below, has n − m − 1 degrees of freedom. Therefore, a t-test for a regression coefficient uses df = n − m − 1 (you don't need to memorize this).

Hypothesis testing of multiple predictors. Another way to test whether predictors have reliable effects on the outcome is to test whether they explain more variability than would be expected by chance. This can be done with multiple predictors at once, by comparing how much variability the regression explains with those predictors included to how much it explains with them left out. We'll focus on the most common situation, where we want to test whether the full set of predictors, X_1 through X_m, collectively tells us anything meaningful about the outcome, Y.

As explained above, SS_regression represents the amount of variability in Y explained by all of the predictors. We want to test whether SS_regression is larger than would be expected by chance. Our null hypothesis is that none of the predictors has any effect on the outcome, meaning that the true values of the regression coefficients (except perhaps the intercept, b_0) are all zero. Notice that this is the same null hypothesis used above for testing predictors one at a time, except that now we're testing them all at once, to see whether any predictor gives reliable information about the outcome.

Even if all the b_i's are zero in the population, the regression coefficients we get from our sample will deviate from zero because of sampling variability. This leads the variability explained by the regression, SS_regression, to be greater than zero, even though this explained variability is meaningless random error. Therefore, SS_regression has a sampling distribution that, as usual, tells us how large it can be expected to be just by chance. Comparing the actual value of SS_regression obtained from the sample to the amount expected by chance allows us to test whether chance is a good explanation for the results (H0) or the predictors are explaining something real about the outcome (H1).

To test whether SS_regression is larger than would be expected by chance, we first divide it by its degrees of freedom to get the mean square, MS_regression.

MS_regression = SS_regression / df_regression        (7)

According to the null hypothesis that the regression doesn't explain anything real, MS_regression has a (modified) chi-square distribution, multiplied by σ²_Y, the variance of Y in the population. If we knew σ²_Y, then we would know the likelihood function for MS_regression exactly. This is the same situation that came up with t-tests, where we knew the likelihood function for M except for not knowing σ². Once again, we divide by an estimate of σ²_Y to get our final test statistic, and once again, we estimate σ²_Y using the residual mean square. In this case, the residual mean square is the residual sum of squares (see Eq. 3) divided by its degrees of freedom.

MS_residual = SS_residual / df_residual        (8)

When we divide MS_regression by MS_residual, σ²_Y cancels out, and we end up with a test statistic that doesn't depend on any population parameters. That is, we have a test statistic with a likelihood function that we know exactly, which is what's required for hypothesis testing. The test statistic is called F.

F = MS_regression / MS_residual        (9)

According to the null hypothesis, the F statistic has what's called an F distribution. F distributions arise any time you divide one chi-square variable by another, such as MS_regression and MS_residual. Because the distribution of a chi-square variable depends on its degrees of freedom, an F distribution depends on the degrees of freedom of both chi-square variables that it's based on. That is, an F distribution is defined by two degrees of freedom. So the last things we need to know are the degrees of freedom for MS_regression and MS_residual. The total degrees of freedom in SS_Y equal n − 1, and these are divided up between SS_regression and SS_residual. SS_residual loses m degrees of freedom for the m regression coefficients (b_1 through b_m) that go into defining Ŷ, and these degrees of freedom end up in SS_regression (essentially because SS_regression = SS_Y − SS_residual). The degrees of freedom for the sums of squares carry over to the mean squares, so MS_regression has m degrees of freedom and MS_residual has n − m − 1 degrees of freedom. (You don't need to memorize the degrees of freedom, but you should understand the basic idea of where they come from.)

Once we have F and both dfs, we can compare F to its sampling distribution to get a p-value. The goal here is to decide whether F is bigger than would be expected by chance. If so, then SS_regression is bigger than would be expected by chance, which means the regression is explaining something meaningful. Therefore, we want to know the probability of an F value as big as or bigger than the one we got from the data. We always compute a one-tailed p-value (using the upper tail), because an F that is unusually small doesn't tell us anything interesting. The p-value isn't something that can be calculated by hand; instead we use a computer (e.g., the pf() function in R). As always, if the p-value is less than our alpha level (e.g., .05), then we reject the null hypothesis and conclude that the predictors are explaining something meaningful about the outcome.

p = p(F_{df_regression, df_residual} ≥ F)        (10)
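Continuing the sketch, the quantities in Equations 7 through 10 take one line each; pf() with lower.tail = FALSE gives the upper-tail (one-tailed) probability. The result should match the F statistic, degrees of freedom, and p-value printed at the bottom of summary(fit).

```r
# Continuing the sketch: mean squares, F, and the one-tailed p-value.
df_regression <- m                               # m df for the regression
df_residual   <- n - m - 1                       # what's left of the n - 1 total
MS_regression <- SS_regression / df_regression   # Equation 7
MS_residual   <- SS_residual / df_residual       # Equation 8
F_stat        <- MS_regression / MS_residual     # Equation 9

# Equation 10: probability of an F this big or bigger under the null.
pf(F_stat, df_regression, df_residual, lower.tail = FALSE)
```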

This hypothesis test may seem like a lot, but conceptually there are four simple steps. These same steps are used in other statistical tests, such as ANOVA, which we will learn next. Remembering and understanding these steps will not only help you understand hypothesis testing with regression, but will also make understanding ANOVA a lot easier.

1. Break the total sum of squares into an explainable sum of squares and an unexplainable sum of squares.
2. Divide each sum of squares by its degrees of freedom to get a mean square.
3. Divide the explainable mean square by the unexplainable mean square to get the test statistic, F.
4. Compare F to an F distribution to get the p-value, or find the critical value and compare F directly to F_crit (see the sketch below).
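As a closing sketch for step 4 (assuming α = .05), the critical-value route uses qf() to find F_crit and compares F to it directly; this is equivalent to checking whether p < α.

```r
# Step 4, critical-value version (assuming alpha = .05):
F_crit <- qf(0.95, df_regression, df_residual)
F_stat > F_crit  # TRUE: F exceeds F_crit, so reject the null hypothesis
```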
