DEPARTMENT OF PSYCHOLOGY, UNIVERSITY OF LANCASTER
MSC IN PSYCHOLOGICAL RESEARCH METHODS
ANALYSING AND INTERPRETING DATA 2, PART 1, WEEK 9


Analysis of covariance and multiple regression

So far in this course, we have looked at statistical models for continuous response (or dependent) variables. We have dealt with:

Response variable    Explanatory variables    Method
continuous           1 continuous             Regression
continuous           1 categorical            Dummy variable regression / ANOVA table (equivalent to the independent samples t-test)
continuous           2 categorical            Dummy variable regression / ANOVA table; test of interaction possible

We have used similar techniques in each of these cases. We have fitted a model with one or more explanatory variables, looked at their parameter estimates and used the t-values to determine whether terms are significant. The response variable remains continuous, but we now look at the situation where there is one categorical and one continuous explanatory variable. This type of model has a special name: analysis of covariance. We will continue to develop the idea of an interaction term, introducing it into the model and testing its significance. We will, however, be using the same type of approach.

An example of ANCOVA (analysis of covariance): BLOCK DESIGNS

A sample of 24 children was randomly chosen from the 5th grade of a state primary school in Sydney. Each child was assigned to one of two experimental groups. The children had to complete four of the 3x3 squared designs in the block design subtest of the Wechsler Intelligence Scale for Children (WISC). Children in the first group were told to start with a row of three blocks (the ROW group), and children in the second group were told to start with a corner of three blocks (the CORNER group). The total time to complete the task was then measured. Before the experiment began, the extent of each child's field dependence was measured using the Embedded Figures Test (EFT), which measures the extent to which subjects can abstract the logical structure of a problem from its context. High scores correspond to high field dependence.[1]

We therefore have one continuous explanatory variable, EFT, and one categorical variable, GROUP. An analysis with mixed types of independent variables is sometimes known as analysis of covariance (ANCOVA).

[1] The data are reported in Aitkin, M., Anderson, D., Francis, B. and Hinde, J. (1987) Statistical Modelling in GLIM, Oxford University Press.

[Data listing: time and eft scores for the row group and for the corner group]

The data is stored in blockdesign.dat. We read the data into the data frame block:

block = read.table("blockdesign.dat", header=TRUE)
block$groupf = factor(block$group)

We define two subsets of the data: a subset for the row group (group==0) and a subset for the corner group (group==1).

row = block[block$group==0, ]
corner = block[block$group==1, ]

The object block can be thought of as a matrix: we are selecting all rows for which group equals 1 (or 0), and selecting all columns.

names(row); summary(row)

We can first plot the data.

plot(block$eft, block$time, type="n", xlab="eft score", ylab="time to complete")
points(row$eft, row$time, pch=1, col=2)
points(corner$eft, corner$time, pch=2, col=4)
legend(20, 750, c("row group","corner group"), pch=c(1,2), col=c(2,4))

[Scatterplot: time to complete against EFT score, with separate symbols for the row group and the corner group]

We can see that there appears to be a strong relationship between TIME and EFT, and perhaps the corner group is taking less time. We explore this using statistical modelling. We can consider a number of different models.

If we fit a model with no explanatory variables, then we have a null model: this says that there is no relationship between the time taken and either EFT or the experimental GROUP.

m1 = lm(block$time ~ 1)
anova(m1)

Analysis of Variance Table
Response: block$time
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals

> summary(m1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-13 ***

The model is ESTIMATED TIME = 384.3. The t-test on the estimate of the intercept is simply indicating whether there is evidence that the intercept estimate is different from zero.

lines(block$eft, m1$fitted.values)
title("null model - mean only")

[Plot: "null model - mean only" - the fitted constant mean added to the scatterplot of time to complete against EFT score]

Sums of squares =   Degrees of freedom = 23

We can then add either EFT or GROUP as an explanatory variable. Suppose we add EFT first of all. The model is then TIME = B0 + B1 EFT: there is no group effect, but there is a linear relationship between TIME and EFT.

m2 = lm(time ~ eft, data=block)
anova(m2)

Response: time
          Df Sum Sq Mean Sq F value Pr(>F)
eft                                        **
Residuals

> summary(m2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-06 ***
eft                                        **

The model now becomes ESTIMATED TIME = intercept + slope x EFT. Thus, for every unit increase of EFT, the TIME to complete the task increases by 2 seconds. Under this model, the experimental group makes no difference.
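Predicted completion times for particular EFT scores can be obtained from this fitted model with predict(). A usage sketch (the EFT values 30 and 60 are arbitrary illustrative scores, not values from the slides):

new = data.frame(eft = c(30, 60))   # hypothetical EFT scores
predict(m2, newdata = new)          # estimated completion times under the eft-only model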

Repeating the earlier graphics commands, then:

lines(block$eft, m2$fitted.values)
title("model: EFT")

[Plot: "model: EFT" - the fitted regression line on EFT added to the scatterplot]

SS =   df = 22

We can then add in the group effect. This main effects model is the following model:

TIME = B0 + B1 EFT + B2 GROUP

If GROUP is defined to be GROUP = 0 (row group) and GROUP = 1 (corner group), then this model becomes:

TIME = B0 + B1 EFT          (row group)
TIME = B0 + B2 + B1 EFT     (corner group)

The estimate of the group effect B2 is simply the difference between the fitted line for the row group and the fitted line for the corner group.

m3 = lm(time ~ eft + groupf, data=block)
anova(m3)

Response: time
           Df Sum Sq Mean Sq F value Pr(>F)
eft                                        **
groupf
Residuals

summary(m3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-06 ***
eft                                        **
groupf

The estimate of the GROUP effect shows that the corner group is estimated to be performing over 44 seconds faster than the row group. However, the GROUP effect is not significant. The slopes of the two fitted lines are the same. The estimates are

ESTIMATED TIME = B0 + B1 EFT            (row group)
               = (B0 + B2) + B1 EFT     (corner group)
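The two fitted lines can also be recovered directly from the coefficients of m3. A sketch (the coefficient positions assume the formula time ~ eft + groupf exactly as fitted above):

b = coef(m3)    # (Intercept), eft slope, groupf dummy coefficient, in that order
b[1]            # intercept of the row group line (reference level of groupf)
b[1] + b[3]     # intercept of the corner group line
b[2]            # common slope on eft for both groups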

Repeating the earlier graphics commands, then:

lines(row$eft, m3$fitted.values[block$groupf==0], col=2, lty=1)
lines(corner$eft, m3$fitted.values[block$groupf==1], col=4, lty=2)
title("model: eft+group")

SS =   df = 21

The fitted lines of this model are parallel. However, the slope in the row group might be different from the slope in the corner group.

As with the 2-way ANOVA model last week, this model is known as an interaction model. We want to specify an interaction between groupf and eft: this is saying that the effect of groupf depends on the level of eft. This is a two-way interaction, an interaction between two terms in the model. More complex interaction terms can be fitted. We fit the model using:

m4 = lm(time ~ eft + groupf + eft:groupf, data=block)
anova(m4)

Response: time
           Df Sum Sq Mean Sq F value Pr(>F)
eft                                        **
groupf
eft:groupf
Residuals
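As an aside (a sketch, not one of the original commands), the * operator in an R model formula is shorthand for main effects plus their interaction, so the same model can be specified more compactly:

m4b = lm(time ~ eft * groupf, data=block)   # expands to eft + groupf + eft:groupf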

summary(m4)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                **
eft                                        **
groupf
eft:groupf

We now see that the fitted model is

ESTIMATED TIME = B0 + B1 EFT                    (row group)
               = (B0 + B2) + (B1 + B3) EFT      (corner group)

where B3 is the eft:groupf interaction coefficient, the difference in slopes. The slope of the corner group regression line is estimated to be different from that of the row group, and this group also has a higher intercept. The regression lines show an increasing difference between the estimated times as EFT increases. However, this difference is not statistically significant. We display the plot, as before.

lines(row$eft, m4$fitted.values[block$groupf==0], col=2, lty=1)
lines(corner$eft, m4$fitted.values[block$groupf==1], col=4, lty=2)
title("model: eft+group+eft:group")

[Plot: "model: eft+group+eft:group" - separate fitted lines for the row group and the corner group]

SS =   df = 20

Model simplification

There are two strategies.

AIC method. We have already seen this method. We can construct a table of AIC values for each model and choose the model with the lowest value.

model                   AIC value
eft
eft+group
eft+group+eft:group

From this analysis we choose the model eft as being the best. However, the interaction model has an AIC value which is very close to the minimum and might be an alternative.
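In R the AIC values can be obtained directly from the fitted model objects. A sketch using the models fitted above:

AIC(m2, m3, m4)   # eft, eft+groupf, eft+groupf+eft:groupf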

Classic backward elimination: When we examine the significance level of the interaction term, we find that the difference in slopes is not statistically significantly different from zero (p=0.13). We reject this model in favour of the main effects model already discussed. The main effects model can also be simplified, as there is no evidence that the group effect is significantly different from zero (p=0.33). Our final model is one where there is no group difference and eft is the single predictor of time.

In this example, the two methods give the same result. In general, the AIC method is less parsimonious and more likely to retain terms in the model. Backward elimination at the 5% level (p=0.05) will be more likely to exclude terms and is a more conservative method.
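Backward elimination can be carried out step by step in R with drop1(), which tests each term that can currently be removed. A sketch using the models above:

drop1(m4, test="F")   # F-test for dropping the eft:groupf interaction
drop1(m3, test="F")   # F-tests for dropping groupf (or eft) from the main effects model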

Constructing an Analysis of Variance table

We can construct an analysis of variance table by examining the sums of squares given for each model, and differencing them. We start at the most complex model, and use the residual sums of squares from this model as our estimate of residual variation. Thus the residual sums of squares is on 20 df. We now work backwards, removing the most complex terms first. So the sums of squares due to the interaction is the difference between the residual sums of squares of the main effects model and of the interaction model, on 21 - 20 = 1 degree of freedom.

Sums of squares due to    SS    df
EFT
GROUP given EFT
INTERACTION

Source             Sum of Squares    df    Mean square    F-value    p-value
EFT
GROUP given EFT
Interaction
Residual
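R will produce the same sequential table by comparing the nested models directly. A sketch using the models fitted above:

anova(m1, m2, m3, m4)   # sequential comparisons: eft, then groupf given eft, then the interaction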

Multiple linear regression

In simple linear regression, we attempt to explain the dependent variable in terms of an intercept and a single independent variable which is continuous. Multiple linear regression extends these techniques to allow for several independent continuous variables. Least squares is again used to determine the unstandardised coefficients (the Bs) in the more general equation:

Dependent variable = Intercept (or B0) + B1 x Independent variable 1 + B2 x Independent variable 2 + ...

Example: professors' salary data

University students often complain that universities reward professors for research but not for teaching, and argue that professors react to this situation by devoting more time and energy to research publications and less time and energy to classroom activities. Professors counter that research and teaching go hand in hand: more research makes better teachers. A US student organization decided to investigate part of the issue. They randomly selected 50 psychology professors in their area. The students recorded the salaries of the professors, their average teaching evaluations, and the total number of articles published in their careers.

salary       salary (in 1000$)
evaluation   average teaching evaluation (10 point scale)
articles     number of articles published

All variables can be treated as continuous (although there is an argument for categorising evaluation: what is it?). We therefore use multiple regression to investigate the relationship between SALARY, teaching quality and research output.

We look at the correlation between the three variables:

profs = read.table("profs.dat", header=TRUE)
cor(profs)

           salary  evaluation  articles
salary
evaluation
articles

There is a strong correlation between every pair of variables. Research is associated with good teaching, and both are correlated with salaries. We could plot each variable against each other to obtain a graphical picture of each relationship pair. We fit a linear model to investigate further.

m1 = lm(salary ~ evaluation + articles, data=profs)
summary(m1)

            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                e-07 ***
evaluation
articles                                   e-10 ***

We look at the first part of the output. This is familiar. We see that the variable evaluation (once articles is in the model) is not significant. However, the variable articles is significant. This suggests that we can remove evaluation from the model. The number of articles appears to determine salary, but the teaching evaluation score does not provide an additional component. Salary is not determined by teaching quality. We look to exclude evaluation from the model, but first we look at the other part of the output.

Residual standard error:        on 47 degrees of freedom
Multiple R-Squared:      Adjusted R-squared:
F-statistic: 60.7 on 2 and 47 DF, p-value: 9.453e-14

The Multiple R-squared is a measure of the proportion of variance explained by the model. We can find from anova calculations that the sums of squares from the null (constant mean) model is 8272.0 and the residual sums of squares from the articles+evaluation model is 2308.8, so the proportion of variance explained is 1 - (2308.8/8272.0) = 0.72. The adjusted R-squared adjusts this for the number of predictors in the model, and estimates the proportion of variance that would be explained in the population.
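These sums of squares can be extracted in R and the R-squared reproduced by hand. A sketch using the profs data and the model m1 fitted above:

ss_null  = sum(resid(lm(salary ~ 1, data=profs))^2)   # total (null model) sum of squares
ss_model = sum(resid(m1)^2)                           # residual sum of squares for evaluation + articles
1 - ss_model/ss_null                                  # multiple R-squared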

> m2 = lm(salary ~ articles, data=profs)
> summary(m2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                < 2e-16 ***
articles                                     e-15 ***

Multiple R-Squared:      Adjusted R-squared:

The estimate for articles remains very close to the value in the previous output. The multiple R-squared and adjusted R-squared are close to the previous values. Our final model is

estimated salary = intercept + 1.112 x (number of articles)

How well does this model fit?

plot(profs$articles, profs$salary, xlab="number of articles", ylab="salary")
lines(profs$articles, m2$fitted.values)

[Scatterplot: salary against number of articles, with the fitted regression line]

The straight line seems to fit the data well, and the observed points are well scattered around the line. The next lecture will look at diagnostics for regression models and how to test the assumptions of the linear model more carefully. The model says that each additional published article is associated with an estimated salary increase of $1,112.
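A confidence interval for this estimated increase (in units of $1000, matching how salary is recorded) can be obtained from the fitted model. A usage sketch:

confint(m2)   # 95% confidence intervals for the intercept and the articles coefficient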

Assumptions of linear regression

a) Effects are linear: each covariate has a linear rather than a non-linear relationship to the dependent variable.
b) The residuals have a Normal distribution.
c) There is constant variance: the variance does not depend on the covariates or on the mean.
d) The observations are independent. Important: this is part of the design. In the block design experiment above, the observations would not be independent if one child copied from another.
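Anticipating next week's lecture on diagnostics, a quick visual check of assumptions b) and c) can be made from the residuals of a fitted model. A sketch for the salary model m2:

plot(fitted(m2), resid(m2), xlab="fitted values", ylab="residuals")   # look for curvature or changing spread
abline(h=0)
qqnorm(resid(m2)); qqline(resid(m2))                                  # rough check of Normality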

Using SPSS

In SPSS, we use the SPSS Linear Regression procedure in ENTER mode (all the independent variables are entered into the model) with SALARY as the Dependent Variable and EVALUATI and ARTICLES as Independent Variables, giving the following (edited) summary results:

Model Summary
Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .849(a)
a. Predictors: (Constant), articles, evaluati

Coefficients(a)
                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta                         t    Sig.
1   (Constant)
    evaluati
    articles
a. Dependent Variable: salary

3D plots

A simple 3D scatterplot can be produced by:

library(lattice)
cloud(profs$salary ~ profs$articles + profs$evaluation)

More complex interactive plots can be carried out if you have the ability to install software.
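For example, assuming the rgl package can be installed (an assumption, not part of the course software), an interactive 3D scatterplot that can be rotated with the mouse is given by this sketch:

# install.packages("rgl")   # only if the package is not already available
library(rgl)
plot3d(profs$articles, profs$evaluation, profs$salary,
       xlab="articles", ylab="evaluation", zlab="salary")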

Different types of multiple regression

The preceding example showed regression used for exploration and prediction. We wanted to explore the data set to build a suitable statistical model for salary from the teaching evaluation and research publications. This is akin to separating out the structure of the data from the noise. The regression analysis tells us which independent (or predictor) variables are not needed for the structure, and can therefore be considered as part of the noise.

However, another reason to carry out multiple regression is to CONTROL for the effect of other variables. We may be interested in the association between levels of a hormone and an aggression score. However, our data is observational, not experimental. Observational data means that we survey people and then measure both the hormone level and the aggression score. Thus we cannot randomly assign individuals to low or high hormone levels: our data will be unbalanced and will depend on who we survey. Additionally, we might imagine that other variables affect aggression score, in particular age and gender. We need to control for the effect of these variables in our analysis.
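As an illustration only (the data frame hormones and the variables aggression, hormone, age and gender are hypothetical names, not a real data set from this course), controlling for age and gender amounts to including them in the model alongside the variable of interest:

m_adj = lm(aggression ~ hormone + age + gender, data=hormones)
summary(m_adj)   # the hormone coefficient is now adjusted for age and gender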
