# Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Save this PDF as:

Size: px
Start display at page:

Download "Regression in ANOVA. James H. Steiger. Department of Psychology and Human Development Vanderbilt University"

## Transcription

1 Regression in ANOVA James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30

2 Regression in ANOVA 1 Introduction 2 Basic Linear Regression in R 3 Multiple Regression in R 4 Nested Models 5 ANOVA as Dummy Variable Regression James H. Steiger (Vanderbilt University) 2 / 30

3 Introduction Introduction In this module, we begin the study of the classic analysis of variance (ANOVA) designs. Since we shall be analyzing these models using R and the regression framework of the General Linear Model, we start by recalling some of the basics of regression modeling. We work through linear regression and multiple regression, and include a brief tutorial on the statistical comparison of nested multiple regression models. We then show how the classic ANOVA model can be (and is) analyzed as a multiple regression model. James H. Steiger (Vanderbilt University) 3 / 30

4 Basic Linear Regression in R Basic Linear Regression in R Let s define and plot some artificial data on two variables. > set.seed(12345) > x <- rnorm(25) > y <- sqrt(1/2) * x + sqrt(1/2) * rnorm(25) > plot(x, y) y x James H. Steiger (Vanderbilt University) 4 / 30

5 Basic Linear Regression in R Basic Linear Regression in R We want to predict y from x using least squares linear regression. We seek to fit a model of the form y i = β 0 + β 1 x i + e i = ŷ i + e i while minimizing the sum of squared errors in the up-down plot direction. We fit such a model in R by creating a fit object and examining its contents. We see that the formula for ŷ i is a straight line with slope β 1 and intercept β 0. James H. Steiger (Vanderbilt University) 5 / 30

6 Basic Linear Regression in R Basic Linear Regression in R We start by creating the model with a model specification formula. This formula corresponds to the model stated on the previous slide in a specific way: 1 Instead of an equal sign, a is used. 2 The coefficients themselves are not listed, only the predictor variables. 3 The error term is not listed 4 The intercept term generally does not need to be listed, but can be listed with a 1. So the model on the previous page is translated as y ~ x. James H. Steiger (Vanderbilt University) 6 / 30

7 Basic Linear Regression in R Basic Linear Regression in R We create the fit object as follows. > fit.1 <- lm(y ~ x) Once we have created the fit object, we can examine its contents. > summary(fit.1) Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) x *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 23 degrees of freedom Multiple R-squared: ,Adjusted R-squared: F-statistic: on 1 and 23 DF, p-value: James H. Steiger (Vanderbilt University) 7 / 30

8 Basic Linear Regression in R Basic Linear Regression in R We see the printed coefficients for the intercept and for x. There are statistical t tests for each coefficient. These are tests of the null hypothesis that the coefficient is zero. There is also a test of the hypothesis that the squared multiple correlation (the square of the correlation between ŷ and y) is zero. Standard errors are also printed, so you can compute confidence intervals. (How would you do that quickly in your head? (C.P.) The slope is not significantly different from zero. Does that surprise you? (C.P.) The squared correlation is What is the correlation in the population? (C.P.) James H. Steiger (Vanderbilt University) 8 / 30

9 Basic Linear Regression in R Basic Linear Regression in R If we want, we can, in the case of simple bivariate regression, add a regression line to the plot automatically using the abline function. > plot(x, y) > abline(fit.1, col = "red") y x James H. Steiger (Vanderbilt University) 9 / 30

10 Multiple Regression in R Multiple Regression in R If we have more than one predictor, we have a multiple regression model. Suppose, for example, we add another predictor w to our artificial data set. We design this predictor to be completely uncorrelated with the other predictor and the criterion, so this predictor is, in the population, of no value. Now our model becomes y i = β 0 + β 1 x i + β 2 w i + e i > w <- rnorm(25) James H. Steiger (Vanderbilt University) 10 / 30

11 Multiple Regression in R Multiple Regression in R How would we set up and fit the model y i = β 0 + β 1 x i + β 2 w i + e i in R? James H. Steiger (Vanderbilt University) 11 / 30

12 Multiple Regression in R Multiple Regression in R How would we set up and fit the model y i = β 0 + β 1 x i + β 2 w i + e i in R? That s right, > fit.2 <- lm(y ~ x + w) James H. Steiger (Vanderbilt University) 12 / 30

13 Multiple Regression in R Multiple Regression in R > summary(fit.2) Call: lm(formula = y ~ x + w) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) x *** w Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 22 degrees of freedom Multiple R-squared: ,Adjusted R-squared: F-statistic: on 2 and 22 DF, p-value: James H. Steiger (Vanderbilt University) 13 / 30

14 Nested Models Introduction Nested Models The situation we examined in the previous sections is a simple example of a sequence of nested models. One model is nested within another if it is a special case of the other in which some model coefficients are constrained to be zero. The model with only x as a predictor is a special case of the model with x and w as predictors, with the coefficient β 2 constrained to be zero. James H. Steiger (Vanderbilt University) 14 / 30

15 Nested Models Model Comparison Nested Models When two models are nested multiple regression models, there is a simple procedure for comparing them. This procedure tests whether the more complex model is significantly better than the simpler model. In the sample, of course, the more complex of two nested models will always fit at least as well as the less complex model. James H. Steiger (Vanderbilt University) 15 / 30

16 Nested Models Nested Models Partial F -Tests: A General Approach Suppose Model A includes Model B as a special case. That is, Model B is a special case of Model A where some terms have coefficients of zero. Then Model B is nested within Model A. If we define SS a to be the sum of squared residuals for Model A, SS b the sum of squared residuals for Model B. Since Model B is a special case of Model A, model A is more complex so SS b will always be as least as large as SS a. We define df a to be n p a, where p a is the number of terms in Model A including the intercept, and correspondingly df b = n p b. Then, to compare Model B against Model A, we compute the partial F statistic as follows. F dfb df a,df a = MS comparison MS res = (SS b SS a )/(p a p b ) SS a /df a (1) James H. Steiger (Vanderbilt University) 16 / 30

17 Nested Models Nested Models Partial F -Tests: A General Approach R will perform the partial F -test automatically, using the anova command. > anova(fit.1, fit.2) Analysis of Variance Table Model 1: y ~ x Model 2: y ~ x + w Res.Df RSS Df Sum of Sq F Pr(>F) e Note that the p value for the model difference test is the same as the p value for the t-test of the significance of the coefficient for w shown previously. James H. Steiger (Vanderbilt University) 17 / 30

18 Nested Models Partial F -Tests: A General Approach Nested Models What happens if we call the anova command with just a single model? > anova(fit.1) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x *** Residuals Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Note that the p-value for this test is the same as the p-value for the overall test of zero squared multiple correlation shown in the output summary for fit.1. What is going on? James H. Steiger (Vanderbilt University) 18 / 30

19 Nested Models Nested Models Partial F -Tests: A General Approach It turns out, if you call the anova command with a single fit object, it startes by comparing the first non-intercept term in the model against a baseline model with no predictors (i.e., just an intercept). If there is a second predictor, it compares the model with both predictors against the model with just one predictor. It produces this sequence of comparisons automatically. To demonstrate, let s fit a model with just an intercept. > fit.0 <- lm(y ~ 1) Recall that the 1 in the model formula stands for the intercept. No let s perform a partial F -test comparing fit.0 with fit.1. James H. Steiger (Vanderbilt University) 19 / 30

20 Nested Models Nested Models Partial F -Tests: A General Approach Here we go. > anova(fit.0, fit.1) Analysis of Variance Table Model 1: y ~ 1 Model 2: y ~ x Res.Df RSS Df Sum of Sq F Pr(>F) *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Note that we get exactly the same result for the model comparison as we got when we ran anova on just the fit.1 object. James H. Steiger (Vanderbilt University) 20 / 30

21 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression Suppose we have 3 groups, and we want to test the null hypothesis that all 3 come from populations with the same mean. A side assumption is that all groups have the same variance, and that the population distributions are normal. The alternative hypothesis is that at least one of the groups has a mean that is different from the others. Suppose that we want to test this hypothesis with some artificial data. In Group 1, the scores are 1,2,3. In Group 2, the scores are 4,5,6. In Group 3, they are 7,8,9. How can we set up a regression model corresponding to the null model? James H. Steiger (Vanderbilt University) 21 / 30

22 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression The Null Model Actually, such a model is very simple to specify, providing we learn a couple of simple tricks. First, instead of conceptualizing our scores as 3 columns with 3 numbers in each column, imagine them as stacked in a single vector of 9 scores, representing 9 observations from the variable y. Our null model is simply y i = β 0 + e i (2) Think about it. If all 3 population means are equal to a common value, then all 9 scores represent random variation around a single value β 0. James H. Steiger (Vanderbilt University) 22 / 30

23 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression The Null Model So in R, we could fit the model as follows. > y <- 1:9 > model.0 <- lm(y ~ 1) We want to compare this model against a model that allows each group to have its own mean. How do we do that? The answer is to create dummy predictors. Let s see how that is done. James H. Steiger (Vanderbilt University) 23 / 30

24 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression The Alternative Model Remember, our baseline model includes an intercept. Let s consider why R signifies and intercept with a 1. The model can be rewritten as y i = β 0 + e i y i = β 0 One + e i where One is a dummy variable that always takes on the value 1. Since every variable has a 1 for the (implicit) intercept variable, we need to allow Groups 1 and 2 to vary from the value β 0 in order for them to be modeled as having different means from Group 3. James H. Steiger (Vanderbilt University) 24 / 30

25 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression The Alternative Model That is easy to do. Simply create two more dummy variables called Group1 and Group2. Group1 takes on the value 1 for any observation in Group 1, but it takes on the value 0 otherwise. Group2 takes on the value 2 for any observation in Group 2, but takes on the value 0 otherwise. > Group1 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0) > Group2 <- c(0, 0, 0, 1, 1, 1, 0, 0, 0) Our non-null model is then We fit this model in R as y i = β 0 + β 1 Group 1 + β 2 Group 2 + e i > model.1 <- lm(y ~ 1 + Group1 + Group2) James H. Steiger (Vanderbilt University) 25 / 30

26 ANOVA as Dummy Variable Regression ANOVA as Dummy Variable Regression The Alternative Model To test the null hypothesis of equal means against the alternative that the means may not be equal, we compare the two models. > anova(model.0, model.1) Analysis of Variance Table Model 1: y ~ 1 Model 2: y ~ 1 + Group1 + Group2 Res.Df RSS Df Sum of Sq F Pr(>F) ** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Notice that the value of the F -statistic is 27.00, so the model that allows the group means to each be different is significantly better than the model that forces them all to be the same. Since our null model only had an intercept, we would get the identical result running the anova James H. Steiger (Vanderbilt University) 26 / 30

27 ANOVA as Dummy Variable Regression Using Factor Variables A couple of observations are in order. First, we would get the same F -statistic had we chosen our dummy variables to be Group2 and Group3 (or Group1 and Group3) instead of Group1 and Group2. Second, although this is straightforward, it is tedious. R developers have automated the whole process through the use of factor variables. A factor variable contains codes for the various groups. If you include a factor variable in a regression formula, R automatically substitutes dummy variables for it. Let s create a factor variable called Group. > Group <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)) James H. Steiger (Vanderbilt University) 27 / 30

28 ANOVA as Dummy Variable Regression Using Factor Variables Now, we ll fit a model with the Group variable as the only predictor. > anova.model <- lm(y ~ Group) > anova(anova.model) Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) Group *** Residuals Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Since the factor variable includes two dummy predictors (implicitly), the anova command compares a model with both predictors against a model with just the intercept. James H. Steiger (Vanderbilt University) 28 / 30

29 ANOVA as Dummy Variable Regression Using Factor Variables > summary(anova.model) Call: lm(formula = y ~ Group) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) * Group * Group *** --- Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1 on 6 degrees of freedom Multiple R-squared: 0.9,Adjusted R-squared: F-statistic: 27 on 2 and 6 DF, p-value: James H. Steiger (Vanderbilt University) 29 / 30

30 ANOVA as Dummy Variable Regression Using Factor Variables Note that the factor variable decided to create dummy variables for Group2 and Group3, rather than Group1 and Group2. In our data, the cell means were 2,5, and 8. Note how the intercept and the coefficients for Groups 2 and 3 reproduce those means. Scores in Group 1 are reproduced only with the intercept (2) and error, so they have a mean of 2. Scores in Group 2 are reproduced with the intercept (2) plus the coefficient for Group 2 (i.e., 3) plus error, so they are estimated to have a population mean of 5, and so on. James H. Steiger (Vanderbilt University) 30 / 30

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

### We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

### Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was

### Comparing Nested Models

Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller

### ANOVA. February 12, 2015

ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Lecture 5 Hypothesis Testing in Multiple Linear Regression

Lecture 5 Hypothesis Testing in Multiple Linear Regression BIOST 515 January 20, 2004 Types of tests 1 Overall test Test for addition of a single variable Test for addition of a group of variables Overall

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

### Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### Testing for Lack of Fit

Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

### Correlation and Simple Linear Regression

Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

### Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything

### Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is

In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### N-Way Analysis of Variance

N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version R

PubHlth 640. Regression and Correlation Page 1 of 0 Unit Regression and Correlation Practice Problems SOLUTIONS Version R 1. A regression analysis of measurements of a dependent variable Y on an independent

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### Introduction to Stata

Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the mid-range of how easy it is to use. Other options include SPSS,

### Psychology 205: Research Methods in Psychology

Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2

Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS t-test X 2 X 2 AOVA (F-test) t-test AOVA

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Multiple Linear Regression. Multiple linear regression is the extension of simple linear regression to the case of two or more independent variables.

1 Multiple Linear Regression Basic Concepts Multiple linear regression is the extension of simple linear regression to the case of two or more independent variables. In simple linear regression, we had

### And sample sizes > tapply(count, spray, length) A B C D E F And a boxplot: > boxplot(count ~ spray) How does the data look?

ANOVA in R 1-Way ANOVA We re going to use a data set called InsectSprays. 6 different insect sprays (1 Independent Variable with 6 levels) were tested to see if there was a difference in the number of

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### t-tests and F-tests in regression

t-tests and F-tests in regression Johan A. Elkink University College Dublin 5 April 2012 Johan A. Elkink (UCD) t and F-tests 5 April 2012 1 / 25 Outline 1 Simple linear regression Model Variance and R

### 1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F.

Final Exam 1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F. 3. Identify the graphic form and nature of the source

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### General Regression Formulae ) (N-2) (1 - r 2 YX

General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang

Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

### Regression in Stata. Alicia Doyle Lynch Harvard-MIT Data Center (HMDC)

Regression in Stata Alicia Doyle Lynch Harvard-MIT Data Center (HMDC) Documents for Today Find class materials at: http://libraries.mit.edu/guides/subjects/data/ training/workshops.html Several formats

### Week 5: Multiple Linear Regression

BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### 1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

### EPS 625 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM

EPS 6 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM ANCOVA One Continuous Dependent Variable (DVD Rating) Interest Rating in DVD One Categorical/Discrete Independent Variable

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### 2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.\$ and Sales \$: 1. Prepare a scatter plot of these data. The scatter plots for Adv.\$ versus Sales, and Month versus

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### Multiple Hypothesis Testing: The F-test

Multiple Hypothesis Testing: The F-test Matt Blackwell December 3, 2008 1 A bit of review When moving into the matrix version of linear regression, it is easy to lose sight of the big picture and get lost

### Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

### MIXED MODEL ANALYSIS USING R

Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg

### Chapter 9. Section Correlation

Chapter 9 Section 9.1 - Correlation Objectives: Introduce linear correlation, independent and dependent variables, and the types of correlation Find a correlation coefficient Test a population correlation

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### Regression and Correlation

Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis

### Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

### n + n log(2π) + n log(rss/n)

There is a discrepancy in R output from the functions step, AIC, and BIC over how to compute the AIC. The discrepancy is not very important, because it involves a difference of a constant factor that cancels

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Chapter 6: t test for dependent samples

Chapter 6: t test for dependent samples ****This chapter corresponds to chapter 11 of your book ( t(ea) for Two (Again) ). What it is: The t test for dependent samples is used to determine whether the

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### How to calculate an ANOVA table

How to calculate an ANOVA table Calculations by Hand We look at the following example: Let us say we measure the height of some plants under the effect of different fertilizers. Treatment Measures Mean

### 2 Sample t-test (unequal sample sizes and unequal variances)

Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing