# , then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients (

Save this PDF as:

Size: px
Start display at page:

Download ", then the form of the model is given by: which comprises a deterministic component involving the three regression coefficients ("

## Transcription

1 Multiple regression Introduction Multiple regression is a logical extension of the principles of simple linear regression to situations in which there are several predictor variables. For instance if we have two predictor variables, X1 and X 2, then the form of the model is given by: Y 0 1X1 2 X 2 e which comprises a deterministic component involving the three regression coefficients ( 0, 1 and 2 ) and a random component involving the residual (error) term, e. Note that the predictor variables can be either continuous or categorical. In the case of the latter these variables need to be coded as dummy variables (not considered in this tutorial). The response variable must be measured on a continuous scale. The residual terms represent the difference between the predicted and observed values of individuals. They are assumed to be independently and identically distributed normally with 2 zero mean and variance, and account for natural variability as well as maybe measurement error. For the two (continuous) predictor example the deterministic component is in the form of a plane which provides the predicted (mean/expected) response for given predictor variable value combinations. Thus if we want the expected value for the specific values x 1 and x 2, then this is obtained from the orthogonal projection from the point (x 1,x 2 ) in the X1 X2 plane to the expected value plane in the 3D space. The resulting Y value is the expected value from this explanatory variable combination.

2 Observed values for this combination of explanatory variables are drawn from a normal 2 distribution with variance centred on the expected value point: Our data should thus appear to be a collection of points that are randomly scattered with constant variability around the plane. The multiple regression model fitting process takes such data and estimates the regression coefficients ( 0, 1 and 2 ) that yield the plane that has best fit amongst all planes.

3 Model assumptions The assumptions build on those of simple linear regression: Ratio of cases to explanatory variables. Invariably this relates to research design. The minimum requirement is to have at least five times more cases than explanatory variables. If the response variable is skewed then this number may be substantially more. Outliers. These can have considerable impact upon the regression solution and their inclusion needs to be carefully considered. Checking for extreme values should form part of the initial data screening process and should be performed on both the response and explanatory variables. Univariate outliers can simply be identified by considering the distributions of individual variables say by using boxplots. Multivariate outliers can be detected from residual scatterplots. Multicollinearity and singularity. Multicollinearity exists when there are high correlations among the explanatory variables. Singularity exists when there is perfect correlation between explanatory variables. The presence of either affect the intepretation of the explanatory variables effect on the response variable. Also it can lead to numerical problems in finding the regression solution. The presence of multicollinearity can be detected by examining the correlation matrix (say r= 0.9 and above). If there is a pair of variables that appear to be highly multicollinear then only one should be used in the regression. Note; some context dependent thought has to be given as to which one to retain! Normality, linearity, homoscedasticity and independence of residuals. The first three of these assumptions are checked using residual diagnostic plots after having fit a multiple regression model. The independence of residuals is usually assumed to be true if we have indeed collected a random sample from the relvant population.

4 Example Suppose we are interested in predicting the current market value of houses in a particular city. We have collected data that comprises a random sample of 30 house current values ( 1,000s) together with their corresponding living area (100 ft 2 ) and the distance in miles from the city centre. Can we build a multiple regression model that can successfully predict house values using the living area and distance variables?

5 If we consider the relationship between value and area it appears that there is a very significant positive correlation between the two variables (i.e. value increases with area). Fitting a simple linear regression model indicates that 61.4% of the variability in the values is explained by the area. If we consider the relationship between value and distance it appears that there is a significant negative correlation between the two variables (i.e. value decreases with distance). Fitting a simple linear regression model indicates that 17.7% of the variability in the values is explained by the distance from the city centre.

6 Thus individually either variable is useful for predicting a house value. We shall now consider the fitting of a multiple regression model that uses both variables for predictions. First of all we need to address the assumptions that we check before fitting a multiple regression model. Ratio of cases to explanatory variables. We have 30 cases and 2 explanatory variables. Looking at the boxplot of the response variable value does not overtly worry us that there is a skewness problem. Thus as we have 15 times more cases than explanatory variables we should have an adequate number of cases. Outliers The boxplot above of the response variable does not identify any outliers and neither do the two boxplots below of the explanatory variables:

7 Multicollinearity and singularity Examining the correlation between the two explanatory variables reveals that there is not a significant correlation between them. Thus we have no concerns over multicollinearity. Independence of residuals Our data has come from a random sample and thus the observations should be independent and hence the residuals should be too. It appears that our pre model fitting assumption checks are satisfactory, and so we can now consider the multiple regression output.

8 The unstandardized coefficients are the coefficients of the estimated regression model. Thus the expected value of a house is given by: value distance area. Recalling that value is measured in 1,000s and area is in units of 100 ft 2, we can interpret the coefficients (and associated 95% confidence intervals) as follows. For each one mile increase in distance from the city centre, the expected change in house value is - 9,456 (- 11,456, - 7,476). Thus house values drop by 9,456 for each one mile from the city centre. For each 100 ft 2 increase in area, the expected house value is expected to increase by 12,548 ( 10,870, 14,226). The significance tests of the two explanatory variable coefficients indicate that both of the explanatory variables are significant (p<.001) for predicting house values. If however either had a p-value >.05, then we could infer that the offending variable(s) are not significant for predicting house values. Note that the intercept here gives the expected value of 80,121 for what would be a house of no area in the exact middle of the city. It is debateable whether this makes any sense and can be dismissed by the fact that these values of the explanatory variables are an extrapolation from what we have observed. The standardized coefficients are appropriate in multiple regression when we have explanatory variables that are measured on different units (which is the case here). These coefficients are obtained from regression after the explanatory variables are all standardized. The idea is that the coefficients of explanatory variables can be more easily compared with each other as they are then on the same scale. Here we see that the area standardised coefficient is larger in absolute value than that of distance: thus we can conclude that a change in area has a greater relative effect on house value than does a change in distance from the city centre.

9 Examining the model summary table: The multiple correlation coefficient, R, indicates that we have a very high correlation of.957 between our response variable and the two explanatory variables. From the R squared value (coefficient of determination) we can see that the model fits the data reasonably well; 91.5% of the variation in the house values can be explained by the fitted model together with the house area and distance from the city centre values. The adjusted R square value is attempts to correct for this. Here we can see it has slightly reduced the estimated proportion. If you have a small data set it may be worth reporting the adjusted R squared value. The standard error of the estimate is the estimate of the standard deviation of the error term of the model,. This gives us an idea of the expected variability of predictions and is used in calculation of confidence intervals and significance tests.

10 The remaining output is concerned with checking the model assumptions of normality, linearity and homoscedasticity of the residuals. Residuals are the differences between the observed and predicted responses. The residual scatterplots allow you to check: Normality: the residuals should be normally distributed about the predicted responses; Linearity: the residuals should have a straight line relationship with the predicted responses; Homoscedasticity: the variance of the residuals about predicted responses should be the same for all predicted responses. The above table summarises the predicted values and residuals in unstandarised and standardised forms. It is usual practice to consider standardised residuals due to their ease of interpretation. For instance outliers (observations that do not appear to fit the model that well) can be identified as those observations with standardised residual values above 3.3 (or less than -3.3). From the above we can see that we do not appear to have any outliers. The above plot is a check on normality; the histogram should appear normal; a fitted normal distribution aids us in our consideration. Serious departures would suggest that normality assumption is not met. Here we have a histogram that does look reasonably normal given that we have only 30 data points and thus we have no real cause for concern.

11 The above plot is a check on normality; the plotted points should follow the straight line. Serious departures would suggest that normality assumption is not met. Here we have no major cause for concern. The above scatterplot of standardised residuals against predicted values should be a random pattern centred around the line of zero standard residual value. The points should have the same dispersion about this line over the predicted value range. From the above we can see no clear relationship between the residuals and the predicted values which is consistent with the assumption of linearity. Thus we are happy that the assumptions of the model have been met and thus would be confident about any inference/predictions that we gain from the model.

12 Predictions In order to get an expected house value for particular distance and area values we can use the fitted equation. For example, for a house that is 5 miles from the city centre and is 1,400 ft 2 : i.e. 208,513. value Alternatively, we could let a statistics program do the work and calculate confidence or prediction intervals at the same time. For instance, when requesting a predicted value in SPSS we can also obtain the following: the predicted values for the various explanatory variable combinations together with the associated standard errors of the predictions; 95% CI for the expected response; 95% CI for individual predicted responses. Returning to our example we get the following: the expected house value is 208,512 (s.e. = 2,067.6); we are 95% certain that interval from 204,269 to 212,754 covers the unknown expected house value; we are 95% certain that interval from 193,051 to 223,972 covers the range of predicted individual house value observations.

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Spearman s correlation

Spearman s correlation Introduction Before learning about Spearman s correllation it is important to understand Pearson s correlation which is a statistical measure of the strength of a linear relationship

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Pearson s correlation

Pearson s correlation Introduction Often several quantitative variables are measured on each member of a sample. If we consider a pair of such variables, it is frequently of interest to establish if there

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Running head: ASSUMPTIONS IN MULTIPLE REGRESSION 1. Assumptions in Multiple Regression: A Tutorial. Dianne L. Ballance ID#

Running head: ASSUMPTIONS IN MULTIPLE REGRESSION 1 Assumptions in Multiple Regression: A Tutorial Dianne L. Ballance ID#00939966 University of Calgary APSY 607 ASSUMPTIONS IN MULTIPLE REGRESSION 2 Assumptions

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Canonical Correlation

Chapter 400 Introduction Canonical correlation analysis is the study of the linear relations between two sets of variables. It is the multivariate extension of correlation analysis. Although we will present

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### , has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### REGRESSION LINES IN STATA

REGRESSION LINES IN STATA THOMAS ELLIOTT 1. Introduction to Regression Regression analysis is about eploring linear relationships between a dependent variable and one or more independent variables. Regression

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### STAT 350 Practice Final Exam Solution (Spring 2015)

PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### Correlation and Simple Linear Regression

Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### Correlation and Regression Analysis: SPSS

Correlation and Regression Analysis: SPSS Bivariate Analysis: Cyberloafing Predicted from Personality and Age These days many employees, during work hours, spend time on the Internet doing personal things,

### a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams?

1. Phone surveys are sometimes used to rate TV shows. Such a survey records several variables listed below. Which ones of them are categorical and which are quantitative? - the number of people watching

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### Linear Regression in SPSS

Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

### Yiming Peng, Department of Statistics. February 12, 2013

Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

### Chapter 10 - Practice Problems 1

Chapter 10 - Practice Problems 1 1. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner)

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) The Exam The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40 multiple choice questions, and the second

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Prediction and Confidence Intervals in Regression

Fall Semester, 2001 Statistics 621 Lecture 3 Robert Stine 1 Prediction and Confidence Intervals in Regression Preliminaries Teaching assistants See them in Room 3009 SH-DH. Hours are detailed in the syllabus.

### Teaching Multivariate Analysis to Business-Major Students

Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Chapter 8 Graphs and Functions:

Chapter 8 Graphs and Functions: Cartesian axes, coordinates and points 8.1 Pictorially we plot points and graphs in a plane (flat space) using a set of Cartesian axes traditionally called the x and y axes

### Confidence Intervals for One Standard Deviation Using Standard Deviation

Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

### Collinearity of independent variables. Collinearity is a condition in which some of the independent variables are highly correlated.

Collinearity of independent variables Collinearity is a condition in which some of the independent variables are highly correlated. Why is this a problem? Collinearity tends to inflate the variance of

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### Module 3 - Multiple Linear Regressions

Module 3 - Multiple Linear Regressions Objectives Understand the strength of Multiple linear regression (MLR) in untangling cause and effect relationships Understand how MLR can answer substantive questions

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Simple Linear Regression

STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals (9.3) Conditions for inference (9.1) Want More Stats??? If you have enjoyed learning how to analyze

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity