# PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 (12/31/2016)


## Transcription

1 Learning objectives:

- Understand when to use multiple regression
- Understand the multiple regression equation and what the coefficients represent
- Understand different methods of entry: simultaneous, hierarchical, stepwise
- Understand how to conduct a multiple regression using SPSS
- Understand how to interpret multiple regression output
- Understand the assumptions of multiple regression and how to test them

Linear regression is a model for predicting the value of one variable from another. Multiple regression is a natural extension of this model: we use it to predict values of an outcome from several predictors. It is a hypothetical model of the relationship between several variables. The outcome variable should be continuous, measured at the interval or ratio scale.

2 With multiple regression, the relationship is described using a variation of the equation of a straight line.

Simple linear regression: Y = A + BX + e

Multiple regression: Y = A + B1X1 + B2X2 + ... + BnXn + e

A is the Y-intercept: the value of the Y variable when all Xs = 0. This is the point at which the line (simple linear regression) or plane (multiple regression) crosses the Y-axis (vertical). B1 is the coefficient for variable 1, B2 is the coefficient for variable 2, and Bn is the coefficient for the nth variable.

An important idea in multiple regression is statistical control, which refers to mathematical operations that remove the effects of other variables from the relationship between a predictor variable and an outcome variable. Terms used to refer to this statistical control include partialling, controlling for, residualizing, and holding constant. The resulting association between a predictor variable and the outcome variable (after controlling for the other predictors) is referred to as a unique association.
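The multiple regression equation above can be made concrete with a short numeric sketch. The intercept and slope values below are hypothetical, chosen purely for illustration:

```python
# Prediction from the multiple regression equation:
#   Y = A + B1*X1 + B2*X2 + ... + Bn*Xn
# The intercept and slopes below are made-up values, not estimates from data.

def predict(intercept, slopes, xs):
    """Return the predicted Y for one case given its predictor values."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))

# Hypothetical model with A = 10, B1 = 2, B2 = -0.5:
y_hat = predict(10.0, [2.0, -0.5], [3.0, 4.0])  # 10 + 2*3 + (-0.5)*4 = 14
```

The error term e is the part of Y the model cannot account for; predictions use only the systematic part of the equation.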

3 A record company boss was interested in predicting record sales from advertising. Data: 200 different album releases. Outcome variable: sales (CDs and downloads) in the week after release. Predictor variables: the amount (in £s) spent promoting the record before release (from our linear regression lecture) and the number of plays on the radio (new variable).

[Figure: regression plane defined by the Y-intercept and the slopes for advertising (B Advertising) and airplay (B Airplay)]

4 Methods of entry:

- Simultaneous: all predictors are entered simultaneously
- Hierarchical (also referred to as sequential): the researcher decides the order in which variables are entered into the model
- Stepwise: predictors are selected using their semi-partial correlation with the outcome

These methods differ in how they allocate shared (or overlapping) variance among the predictors in the model.

[Figure: Venn diagrams of three overlapping predictor variables showing (A) the overlapping variance sections and the allocation of overlapping variance in (B) standard, (C) hierarchical, and (D) stepwise multiple regression. In standard multiple regression, each predictor gets credit only for its unique section; none of the predictors get credit for the shared sections.]

5 [Figure: Venn diagram panels repeated, highlighting the allocation of overlapping variance in standard multiple regression]

Simultaneous (standard) regression: all variables are entered into the model on the same step. This is the simplest type of multiple regression. Each predictor variable is assessed as if it had been entered into the model after all other predictors. Each predictor is evaluated in terms of what it adds to the prediction of the outcome variable beyond the predictability afforded by all the other predictors (i.e., how much unique variance in the outcome variable does it explain?).

6 [Figure: Venn diagram panels repeated, highlighting the allocation of overlapping variance in hierarchical multiple regression]

Hierarchical regression: the researcher decides the order in which the predictors are entered into the model. For example, the researcher may enter known predictors (based on past research) into the model first, with new predictors being entered on subsequent steps (or blocks) of the model. It is the best method of entry when based on theory testing: it allows you to see the unique predictive influence of a new variable on the outcome because known predictors are held constant in the model. The results that you obtain depend on the variables that you enter into the model, so it is important to have good theoretical reasons for including a particular variable. Weakness of hierarchical multiple regression: it is heavily reliant on the researcher knowing what he or she is doing!

7 [Figure: Venn diagram panels repeated, highlighting the allocation of overlapping variance in stepwise multiple regression]

Stepwise regression: variables are entered into the model based on mathematical criteria. This is a controversial approach and it is not one that I would recommend you use in most cases. The computer selects variables in steps. There are actually three types of stepwise regression:

- Forward selection: the model starts out empty and predictor variables are added one at a time provided they meet statistical criteria for entry (e.g., statistical significance). Variables remain in the model once they are added.
- Backward deletion: the model starts out with all predictors and variables are deleted one at a time based on a statistical criterion (e.g., statistical significance).
- Stepwise: a compromise between forward selection and backward deletion. The model starts out empty and predictor variables are added one at a time based on a statistical criterion, but they can also be deleted at a later step if they fail to continue to display adequate predictive ability.

Step 1: SPSS looks for the predictor that can explain the most variance in the outcome variable.
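Forward selection can be sketched in a few lines. This is not SPSS's algorithm (SPSS uses significance-based entry criteria); the sketch below substitutes a simple R-squared-improvement threshold as the entry criterion, and all function names are my own:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS model with an intercept; X has shape (n, k)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dev = y - y.mean()
    return 1 - (resid @ resid) / (dev @ dev)

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: on each step, add the predictor that raises
    R^2 the most; stop when no predictor improves R^2 by at least min_gain."""
    remaining = list(range(X.shape[1]))
    chosen, best_r2 = [], 0.0
    while remaining:
        r2, j = max((r_squared(X[:, chosen + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen, best_r2
```

Note that, exactly as the slide warns, a small change to `min_gain` can change which variables end up in the model.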

8 [Figure: example Venn diagram with criterion variable Exam Performance; Predictor Variable 1 (Revision Time) accounts for 33.1% of the variance, Predictor Variable 2 (Previous Exam) explains a further 1.7%, and Predictor Variable 3 (Difficulty) a further 1.3%]

Step 2: having selected the first predictor, a second one is chosen from the remaining predictors. The semi-partial correlation is used as the criterion for selection: the variable with the strongest semi-partial correlation is selected on Step 2. The same process is repeated until the remaining predictor variables fail to reach the statistical criteria (e.g., statistical significance).

9 Partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on them both. Semi-partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on only one of them. In stepwise regression, the semi-partial correlation controls for the overlap between the previous predictor variable(s) and the new predictor variable without removing the variability in the outcome variable that is accounted for by the previous predictor variables.

[Figure: Venn diagrams contrasting the partial correlation and the semi-partial correlation]

The semi-partial correlation measures the relationship between a predictor and the outcome, controlling for the relationship between that predictor and any others already in the model. It measures the unique contribution of a predictor to explaining the variance of the outcome.
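The partial/semi-partial distinction can be made concrete by residualizing: both are ordinary correlations, differing only in whether the third variable is removed from both variables or from just one. A sketch using least-squares residualization (function names are my own):

```python
import numpy as np

def residualize(a, b):
    """Residuals of a after removing the linear effect of b (with intercept)."""
    B = np.column_stack([np.ones(len(b)), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(y, x, z):
    """Correlation of y and x with z removed from BOTH variables."""
    return corr(residualize(y, z), residualize(x, z))

def semipartial_corr(y, x, z):
    """Correlation of y and x with z removed from x ONLY
    (the outcome keeps all of its variance, as in stepwise selection)."""
    return corr(y, residualize(x, z))
```

Because the semi-partial correlation leaves the outcome's variance intact, its absolute value can never exceed that of the corresponding partial correlation.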

10 Problems with stepwise methods: they rely on a mathematical criterion, so variable selection may depend upon only slight differences in the semi-partial correlation. These slight numerical differences can lead to major theoretical differences. These models are often distorted by random sampling variation, and there is a danger of over-fitting (including too many predictor variables that do not explain meaningful variability in the outcome) or under-fitting (leaving out important predictor variables). Stepwise methods should be used only for exploratory purposes. If stepwise methods are used, then you should cross-validate your results using a second sample.

[Figure: Venn diagram panels repeated for stepwise multiple regression]

Sample size: there is no exact number of participants required for multiple regression. You need enough participants to generate a stable correlation matrix. One suggestion is to use the formula 50 + 8m to estimate the number of participants you need (where m is the number of predictors in your model). If there is a lot of noise in the data, you may need more than that; if there is little noise, you can get by with less. If you are interested in generalizing your model, then you may need at least twice that number of participants.

11 [No transcribable text on this slide]

12 [SPSS screenshots: the predictors entered in two blocks (Step 1, Step 2)]

The output gives us the estimated coefficients of the model. Test statistics and their significance are produced for each coefficient: a t-test is used to see whether the coefficient is significantly different from 0.

13 Confidence intervals: produces confidence intervals for each of the unstandardized coefficients. CIs can be useful for assessing the likely value of the coefficient in the population. Model fit: provides a statistical test of the model's ability to predict the outcome variable (the F-test), as well as the corresponding R, R-squared, and adjusted R-squared. R-squared change: displays the change in R-squared resulting from the inclusion of additional predictors on subsequent steps. This measure is extremely useful for determining the contribution of new predictors to explaining variance in the outcome.

14 Descriptives: displays a table of the means, standard deviations, and number of observations for all of the variables in the model. It will also display a correlation matrix for all of the variables, which can be very useful for determining whether there is multicollinearity. Part and partial correlations: displays the partial correlation between each predictor and the outcome, controlling for all other predictors in the model. It also produces semi-partial correlations between each predictor and the outcome (i.e., the unique relationship between a predictor and the outcome). Collinearity diagnostics: displays collinearity statistics including the VIF and tolerance (we will cover these later).

15 Durbin-Watson: this option produces the Durbin-Watson test statistic, which tests the assumption of independent errors. SPSS does not provide the significance value, so you have to decide for yourself whether the value is different enough from 2 to be of concern. Casewise diagnostics: this option lists the observed values of the outcome, the predicted values of the outcome, the difference between these values (the residual), and the standardized difference. The default is to list these values for cases more than 3 standard deviations out, but you should consider changing this to 2 standard deviations.

- R (multiple correlation): the correlation between the observed values of the outcome and the values predicted by the model
- R-squared (multiple correlation squared): the proportion of variance in the outcome that is accounted for by the predictor variables in the model
- Adjusted R-squared: an estimate of R-squared in the population rather than in this particular sample (it captures shrinkage). It has been criticized because it tells us nothing about how well the model would perform with a different set of data.

16 This is the R for Model 1, which reflects the degree of association between our first model (advertising budget only) and the criterion variable. This is the R-squared for Model 1, which represents the amount of variance in the criterion variable that is explained by the model (SS_M) relative to how much variance there was to explain in the first place (SS_T). To express this value as a percentage, multiply it by 100 (which will tell you the percentage of variation in the criterion variable that can be explained by the model). This is the adjusted R-squared for the first model, which corrects R-squared for the number of predictor variables included in the model. It will always be less than or equal to R-squared. It is a more conservative estimate of model fit because it penalizes researchers for including predictor variables that are not strongly associated with the criterion variable.
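The arithmetic behind R-squared and adjusted R-squared can be sketched directly from the sums of squares. The numbers below are hypothetical, not taken from the album-sales output; the adjustment shown is the standard correction for n cases and k predictors:

```python
def r2_and_adjusted(ss_total, ss_residual, n, k):
    """R^2 = SS_M / SS_T (equivalently 1 - SS_R / SS_T), plus the usual
    adjusted R^2 correction for n cases and k predictors."""
    r2 = 1 - ss_residual / ss_total          # SS_M = SS_T - SS_R
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Hypothetical sums of squares: SS_T = 100, SS_R = 66.5, n = 200, k = 1
r2, adj = r2_and_adjusted(100.0, 66.5, 200, 1)
```

As the slide notes, the adjusted value is always less than or equal to R-squared, and the gap widens as predictors are added without explanatory payoff.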

17 This is the standard error of the estimate for the first model, which is a measure of the accuracy of predictions: the larger the standard error of the estimate, the more error in our model. The change statistics are not very useful for the first step because they compare this model (i.e., one predictor) with an empty model (i.e., no predictors), so the R-squared change is simply equal to the R-squared. This is the R for Model 2, which reflects the degree of association between this model (advertising budget, attractiveness of the band, and airplay on the radio) and the criterion variable.

18 This is the R-squared for Model 2, which represents the amount of variance in the criterion variable that is explained by the model (SS_M) relative to how much variance there was to explain in the first place (SS_T). To express this value as a percentage, multiply it by 100 (which will tell you the percentage of variation in the criterion variable that can be explained by the model). This is the adjusted R-squared for Model 2, which corrects R-squared for the number of predictor variables included in the model. It will always be less than or equal to R-squared. It is a more conservative estimate of model fit because it penalizes researchers for including predictor variables that are not strongly associated with the criterion variable. This is the standard error of the estimate for Model 2, which is a measure of the accuracy of predictions: the larger the standard error of the estimate, the more error in our model.

19 The change statistics tell us whether the addition of the other predictors in Model 2 significantly improved the model fit. A significant F-change value means that there has been a significant improvement in model fit (i.e., more variance in the outcome variable has been explained by Model 2 than by Model 1).

[SPSS ANOVA tables for Models 1 and 2, with SS_M, MS_M, SS_R, MS_R, and SS_T labeled]

The ANOVA for Model 1 tells us whether the model, overall, results in a significantly good degree of prediction of the outcome variable, and the ANOVA for Model 2 tells us the same for Model 2. However, the ANOVA does not tell us about the individual contribution of variables in the model.
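The F-change statistic described above compares the R-squared values of two nested models. A minimal sketch of that formula (the R-squared values plugged in below are hypothetical, not the textbook output):

```python
def f_change(r2_old, r2_new, n, k_old, k_new):
    """F statistic for the change in R^2 when k_new - k_old predictors
    are added to a model fit on n cases:
        F = ((R2_new - R2_old) / df_num) / ((1 - R2_new) / df_den)."""
    df_num = k_new - k_old       # number of predictors added
    df_den = n - k_new - 1       # residual df of the larger model
    f = ((r2_new - r2_old) / df_num) / ((1 - r2_new) / df_den)
    return f, df_num, df_den

# Hypothetical: R^2 rises from .335 (1 predictor) to .665 (3 predictors), n = 200
f, df1, df2 = f_change(0.335, 0.665, 200, 1, 3)
```

The resulting F would then be compared against the F distribution with (df_num, df_den) degrees of freedom, which is what SPSS's "Sig. F Change" column reports.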

20 The F-test examines whether the variance explained by the model (SS_M) is significantly greater than the error within the model (SS_R). It tells us whether using the model is significantly better at predicting values of the outcome than simply using the mean. This is the Y-intercept for Model 1 (i.e., the place where the line intercepts the Y-axis). This means that when £0 is spent on advertising, the model predicts that 134,140 records will be sold (the unit of measurement is thousands of records). This t-test compares the value of the Y-intercept with 0. If it is significant, then the value of the Y-intercept (134.14 in this example) is significantly different from 0.
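The t-test on a coefficient is simply the estimate divided by its standard error, then compared against 0. A one-line sketch (the numbers in the test are illustrative, not the SPSS output):

```python
def coefficient_t(b, se_b):
    """t statistic testing whether a regression coefficient differs from 0:
    t = B / SE(B). Compare against a t distribution with n - k - 1 df."""
    return b / se_b
```

The same ratio is reported for the intercept and for every slope in the SPSS coefficients table.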

21 This is the unstandardized slope (gradient) coefficient. It is the change in the outcome associated with a one-unit change in the predictor variable. This means that for every 1-point increase in the advertising budget (the unit of measurement is thousands of pounds), the model predicts an increase of 96 records sold (the unit of measurement for record sales is thousands of records). This is the standardized coefficient (β). It represents the strength of the association between the predictor and the criterion variable. If there is more than one predictor, then β may exceed ±1. This t-test compares the magnitude of the standardized coefficient (β) with 0. If it is significant, then the value of β (0.578 in this example) is significantly different from 0 (i.e., the predictor variable is significantly associated with the criterion variable).
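The unstandardized and standardized coefficients are linked by the standard deviations of the predictor and the outcome. A sketch of that conversion — the standard deviations below are illustrative values I chose so the result lands near the β of .578 quoted above, not numbers read from the output:

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope B into a standardized beta:
    beta = B * (SD of predictor / SD of outcome)."""
    return b * sd_x / sd_y

# Illustrative: B = 0.096 (thousands of records per thousand pounds),
# SD(advertising) = 485.66, SD(sales) = 80.70  ->  beta of about .58
beta = standardized_beta(0.096, 485.66, 80.70)
```

This is why β is unit-free: multiplying by SD(x)/SD(y) cancels the measurement units on both sides.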

22 This is the Y-intercept for Model 2 (i.e., the place where the plane intercepts the Y-axis). This means that when £0 is spent on advertising, the record receives 0 airplay, and the attractiveness of the band is 0, the model predicts that -26,613 records will be sold (the unit of measurement is thousands of records). This t-test compares the value of the Y-intercept for Model 2 with 0. If it is significant, then the value of the Y-intercept (-26.613 in this example) is significantly different from 0. This is the information for advertising budget when airplay and attractiveness are also included in the model.

23 This is the unstandardized slope (gradient) coefficient for airplay when advertising budget and attractiveness are also included in the model. It is the change in the outcome associated with a one-unit change in the predictor variable. This means that for every additional play on the radio, the model predicts an increase of 3,367 records sold (the unit of measurement for record sales is thousands of records). This is the unstandardized slope (gradient) coefficient for attractiveness of the band when advertising budget and airplay are also included in the model. It is the change in the outcome associated with a one-unit change in the predictor variable. This means that for every additional point of attractiveness, the model predicts an increase of 11,086 records sold (the unit of measurement for record sales is thousands of records). This is the standardized coefficient (β). It represents the strength of the association between airplay and record sales. It is standardized, so it can be compared with the other βs.

24 This is the standardized coefficient (β). It represents the strength of the association between the attractiveness of the band and record sales. It is standardized, so it can be compared with the other βs. This t-test compares the magnitude of the standardized coefficient (β) with 0. If it is significant, then the value of β (0.512 in this example) is significantly different from 0 (i.e., airplay is significantly associated with record sales). This t-test compares the magnitude of the standardized coefficient (β) with 0. If it is significant, then the value of β (0.192 in this example) is significantly different from 0 (i.e., attractiveness of the band is significantly associated with record sales).

25 Unstandardized coefficients (Bs): the change in the outcome variable associated with a single-unit change in the predictor variable. Standardized coefficients (βs): tell us the same thing as the unstandardized coefficients, but expressed in terms of standard deviations (i.e., the number of standard deviations by which you would expect the outcome variable to change for a one standard deviation change in the predictor variable).

There are two ways to assess the accuracy of the model in the sample: residual statistics and influential cases. Standardized residuals: in an average sample, 95% of standardized residuals should lie between ±2 and 99% should lie between ±2.5. Outliers: any case for which the absolute value of the standardized residual is 3 or more is likely to be an outlier. Influential cases: Mahalanobis distance measures the distance of each case from the means of the predictor variables (absolute values greater than 15 may be problematic, but see Barnett & Lewis, 1978); Cook's distance measures the influence of a single case on the model as a whole (values greater than 1 may be cause for concern according to Weisberg, 1982).

Can the model be generalized to other data? Two approaches: (1) randomly separate a data set into two halves, estimate the equation with one half, then apply it to the other half and see if it fits; or (2) collect a second sample, estimate the equation with the first sample, then apply it to the second sample and see if it predicts.
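The outlier screen described above can be sketched in a few lines. Note this is a simplified approximation: SPSS standardizes residuals using the model's standard error of the estimate, whereas the sketch below simply divides by the residuals' own standard deviation (function names are my own):

```python
import numpy as np

def standardized_residuals(y, y_hat):
    """Residuals divided by their standard deviation (a rough stand-in for
    SPSS's standardized residuals, which use the SE of the estimate)."""
    resid = y - y_hat
    return resid / resid.std(ddof=1)

def flag_outliers(z_resid, cutoff=3.0):
    """Indices of cases whose |standardized residual| meets the cutoff,
    following the rule of thumb on the slide."""
    return np.flatnonzero(np.abs(z_resid) >= cutoff)
```

With a 3-SD cutoff, roughly one case in a few hundred will be flagged even when the model is fine, which is why flagged cases should be inspected rather than deleted automatically.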

26 When we run a regression, we hope to be able to generalize the sample model to the entire population. To do this, several assumptions must be met; violating these assumptions prevents us from adequately generalizing our results beyond the present sample.

- Variable type: the outcome must be continuous; predictors can be continuous or dichotomous
- Non-zero variance: predictors must not have zero variance
- Linearity: the relationship we model is, in reality, linear (we will talk about curvilinear associations later)
- Independence: all values of the outcome should come from a different person
- Normality: the outcome variable should be normally distributed (no skewness or outliers); normally distributed predictors can make interpretation easier, but this is not required
- No multicollinearity: predictors must not be highly correlated, because this can cause problems in estimating the coefficients
- No multivariate outliers: leverage (an observation with an extreme value on a predictor variable is said to have high leverage) and influence (individual observations that have an undue impact on the coefficients), assessed with Mahalanobis distance and Cook's distance
- Homoscedasticity (homogeneity of variance): for each value of the predictors, the variance of the error term should be constant
- Independent errors: for any pair of observations, the error terms should be uncorrelated
- Normally distributed errors
- Model specification: the model should include relevant variables and exclude irrelevant variables; it should not include only linear terms when the relationships between the predictors and the outcome variable are non-linear

27 Multicollinearity exists if predictors are highly correlated. The best prediction occurs when the predictors are relatively independent of each other but each is highly correlated with the outcome variable. Problems stemming from multicollinearity: it increases the standard errors of the coefficients (creating untrustworthy coefficients); it limits the size of R because the variables may be accounting for the same variance in the outcome (i.e., they are not accounting for unique variance); and it makes it difficult to distinguish the importance of predictors because of their overlap (they appear to be interchangeable with each other). This assumption can be checked with collinearity diagnostics: look at the correlation matrix of the predictors for values above .80; the Variance Inflation Factor (VIF) indicates whether a variable has a strong linear relationship with the other predictors; Tolerance = 1 − R² (from regressing that predictor on the others). Tolerance should be more than 0.2 (Menard, 1995) and the VIF should be less than 10 (Myers, 1990), or these values may be indicative of problems.
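Tolerance and the VIF follow directly from the definition above: regress each predictor on all the others, take 1 − R² as the tolerance, and invert it for the VIF. A minimal sketch (function name is my own):

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column of X (n cases by k predictors), regress it on the
    remaining columns; return a list of (tolerance, VIF) pairs, where
    tolerance = 1 - R^2 and VIF = 1 / tolerance."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        ss_total = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - (resid ** 2).sum() / ss_total
        tolerance = 1 - r2
        out.append((tolerance, 1 / tolerance))
    return out
```

Against the slide's rules of thumb, a tolerance below 0.2 or a VIF above 10 for any predictor would be a warning sign.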

28 Homoscedasticity/independence of errors: plot ZRESID against ZPRED. Normality of errors: inspect a normal probability plot.

[Figure: residual scatterplots — a random, even cloud of points indicates the assumptions are met (good); a funnel shape indicates heteroscedasticity (bad)]

29 Residuals should be normally distributed.

[Figure: normal probability plots and histograms of residuals, good vs. bad examples]

The straight line represents a normal distribution, so deviations from that line tell us that the residuals for our data are not normally distributed.


Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### Module 3 - Multiple Linear Regressions

Module 3 - Multiple Linear Regressions Objectives Understand the strength of Multiple linear regression (MLR) in untangling cause and effect relationships Understand how MLR can answer substantive questions

### Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

### Introduction to Linear Regression

14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### ANOVA Analysis of Variance

ANOVA Analysis of Variance What is ANOVA and why do we use it? Can test hypotheses about mean differences between more than 2 samples. Can also make inferences about the effects of several different IVs,

### Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Some Critical Information about SOME Statistical Tests and Measures of Correlation/Association

Some Critical Information about SOME Statistical Tests and Measures of Correlation/Association This information is adapted from and draws heavily on: Sheskin, David J. 2000. Handbook of Parametric and

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### Introduction to Linear Regression

14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

### An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### , has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### FEV1 (litres) Figure 1: Models for gas consumption and lung capacity

Simple Linear Regression: Reliability of predictions Richard Buxton. 2008. 1 Introduction We often use regression models to make predictions. In Figure 1 (a), we ve fitted a model relating a household

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### 4. Multiple Regression in Practice

30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

### USING SAS/STAT SOFTWARE'S REG PROCEDURE TO DEVELOP SALES TAX AUDIT SELECTION MODELS

USING SAS/STAT SOFTWARE'S REG PROCEDURE TO DEVELOP SALES TAX AUDIT SELECTION MODELS Kirk L. Johnson, Tennessee Department of Revenue Richard W. Kulp, David Lipscomb College INTRODUCTION The Tennessee Department

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

### Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

### Chapter 9. Section Correlation

Chapter 9 Section 9.1 - Correlation Objectives: Introduce linear correlation, independent and dependent variables, and the types of correlation Find a correlation coefficient Test a population correlation

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

### Homework 11. Part 1. Name: Score: / null

Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Data analysis process

Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

### A synonym is a word that has the same or almost the same definition of

Slope-Intercept Form Determining the Rate of Change and y-intercept Learning Goals In this lesson, you will: Graph lines using the slope and y-intercept. Calculate the y-intercept of a line when given

### Chapter 5 Estimating Demand Functions

Chapter 5 Estimating Demand Functions 1 Why do you need statistics and regression analysis? Ability to read market research papers Analyze your own data in a simple way Assist you in pricing and marketing

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

### TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics

UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002

### Module 2 - Simple Linear Regression

Module 2 - Simple Linear Regression OBJECTIVES 1. Know how to graph and explore associations in data 2. Understand the basis of statistical summaries of association (e.g. variance, covariance, Pearson's

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to