# The importance of graphing the data: Anscombe s regression examples

Size: px
Start display at page:

Transcription

1 The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC

2 The Objective To demonstrate that good graphs are an essential part of linear regression analysis. B. Weaver, NHRC

3 Not this kind of regression analysis B. Weaver, NHRC

4 This kind of regression analysis B. Weaver, NHRC

5 A very brief primer on simple linear regression B. Weaver, NHRC

6 Simple linear regression A model in which X is used to predict Y. Y is a continuous variable with interval scale properties. In the prototypical case, X is also a continuous variable with interval-scale properties. Example: Y = distance in a 6-minute walk test X = FEV1 B. Weaver, NHRC

7 Back to high school Equation for a straight line Y = bx + a SLOPE INTERCEPT b = slope of the line = the rise over the run a = the value of Y when X = 0 B. Weaver, NHRC

8 Example of a straight line Gym membership Annual fee = \$100 Fee per visit = \$2 Let X = the number of visits to the gym Let Y = the total cost Y = 2X Let X = 200 visits to the gym Total cost = 2(200) = \$500 B. Weaver, NHRC

9 What if the relationship is imperfect? Straight line for a perfect relationship: Y = bx + a Straight line for an imperfect relationship: Y = bx + a Y = bx + a Two different symbols for the predicted value of Y B. Weaver, NHRC

10 R-squared R-squared = the proportion of variability in Y that is accounted for by explanatory variables in the model. For a simple linear regression model (i.e., one predictor variable), R-squared = the proportion of the variability in Y that can be accounted for by the linear relationship between X and Y The adjusted R-squared corrects for upward bias in R-squared B. Weaver, NHRC

11 Anscombe s examples (1973) Frank Anscombe devised 4 sets of X-Y pairs He performed simple linear regression for each data set Here are the results B. Weaver, NHRC

12 Means & Standard Deviations X Y Data Set N Mean SD Mean SD The means and SDs for the 4 data sets are identical to two decimals. B. Weaver, NHRC

13 Correlations between X and Y Data Set Pearson r R-squared Adj. R-sq SE Correlations, R-squared, adjusted R- squared, and standard errors are all identical to two decimals. B. Weaver, NHRC

14 ANOVA Summary Tables Data Set Source SS df MS F p Regression Residual Total Regression Residual Total Regression Residual Total Regression Residual Total B. Weaver, NHRC

15 The Regression Coefficients Data Set B SE t p 95% CI Lower Upper Constant X Constant X Constant X Constant X For all 4 models, Y = 0.5(X) + 3 B. Weaver, NHRC

16 Which Model is Best? Judging by everything we ve just seen, it appears that the models are all equally good But if that were true, I wouldn t be doing this talk! It is well known that good graphs are an essential part of data analysis (Tukey, 1977; Tufte, 1997) Let s look at some graphs that show the relationship between X and Y B. Weaver, NHRC

17 Scatter-plot for Data Set 1 10 data points Influential point Not a good model B. Weaver, NHRC

18 Scatter-plot for Data Set 2 Perfect linear relationship except for one outlier Better model than for Data Set 1, but still not great. B. Weaver, NHRC

19 Scatter-plot for Data Set 3 Wrong model! The relationship between X and Y is curvilinear, not linear! The model should include both X and X 2 as predictors. B. Weaver, NHRC

20 Scatter-plot for Data Set 4 This is a good looking plot. No influential points; straight line provides a good fit. B. Weaver, NHRC

21 Summary The usual summary statistics for the 4 regression models were virtually identical Scatter-plots revealed that only one of the 4 data sets gave us a good model Appropriate graphs are an essential part of data analysis B. Weaver, NHRC

22 What about multivariable models? Scatter-plots are useful for simple linear regression models (i.e., only one predictor variable) But often, we have multiple, or multivariable regression models (i.e., 2 or more predictor variables) In that case, it is more common to assess the fit of the model by looking at residual plots B. Weaver, NHRC

23 What is a residual? In linear regression, a residual is an error in prediction Residual = (Y Y ) = (actual score predicted score) B. Weaver, NHRC

24 Set 1: Scatter-plot vs. Residual Plot Scatter-plot Residual Plot Y Residual X Predicted value of Y B. Weaver, NHRC

25 Set 2: Scatter-plot vs. Residual Plot Scatter-plot Residual Plot Residual Predicted value of Y B. Weaver, NHRC

26 Set 3: Scatter-plot vs. Residual Plot Scatter-plot Residual Plot Residual Predicted value of Y Runs of same-sign residuals B. Weaver, NHRC

27 Set 4: Scatter-plot vs. Residual Plot Scatter-plot Residual Plot Residual Predicted value of Y B. Weaver, NHRC

28 Summary The usual summary statistics for the 4 regression models were virtually identical Scatter-plots revealed that only one of the 4 data sets gave us a good model Residual plots reveal the same thing, and have the advantage of being applicable to multivariable regression models Appropriate graphs are an essential part of data analysis B. Weaver, NHRC

29 Questions? I think you should be more explicit here in step 2. B. Weaver, NHRC

30 References Anscombe FJ. (1973). Graphs in statistical analysis. The American Statistician, 27, Tufte ER. (1997). Visual Explanations, Images and Quantities, Evidence and Narrative (3rd Ed.). Graphics Press: Cheshire. Tukey JW. (1977). Exploratory data analysis. Addison-Wesley: Reading, Mass. B. Weaver, NHRC

31 Extra Slides B. Weaver, NHRC

32 Just as one would expect! The experimentalist comes running excitedly into the theorist's office, waving a graph taken off his latest experiment. "Hmmm," says the theorist, "That's exactly where you'd expect to see that peak. Here's the reason (long logical explanation follows)." In the middle of it, the experimentalist says "Wait a minute", studies the chart for a second, and says, "Oops, this is upside down." He fixes it. "Hmmm," says the theorist, "you'd expect to see a dip in exactly that position. Here's the reason...". B. Weaver, NHRC

33 Best-fitting line: Least squares criterion Many lines could be placed on the scatter-plot, but only one of them is considered the best-fitting line. The most common criterion for best-fitting is that the sum of the squared errors in prediction is minimized. This is called the least-squares criterion. B. Weaver, NHRC

34 Illustration of Least Squares Error in prediction B. Weaver, NHRC

35 Illustration of Least Squares Squared error in prediction Error = 0 for this point, so no square Squared error in prediction B. Weaver, NHRC

36 Illustration of Least Squares Sum of squared errors = the sum of the areas of all these squares For any other regression line, the sum of the squared errors would be greater. B. Weaver, NHRC

37 What is a residual plot? Scatter-plot with: X = the fitted (or predicted) value of Y Y = the residual (i.e., the error in prediction) Residuals should be independent of the fitted value of Y There should be no serial correlation in the residuals (e.g., long runs of same-sign residuals) Both of these problems (plus some others) can be detected via residual plots Advantage of residual plots: they can be used in multivariable (i.e., multi-predictor) regression models B. Weaver, NHRC

38 Examples of residual plots Curvilinear relationship Residual Predicted Y Outlier Heteroscedasticity B. Weaver, NHRC

39 Example of a good residual plot B. Weaver, NHRC

40 Example of a zig-zag pattern You do not want to see this kind of zig-zag pattern in the residual plot. B. Weaver, NHRC

41 Simple linear regression & correlation Pearson r = the correlation It measures of the direction and strength of the linear association between X and Y It ranges from -1 to +1 B. Weaver, NHRC

42 Direction of the linear relationship Positive relationship Negative relationship As X increases, Y increases As X increases, Y decreases B. Weaver, NHRC

43 Perfect vs. Imperfect Relationship Perfect relationship Imperfect relationship B. Weaver, NHRC

44 r-squared The square of Pearson r is a measure of how well the regression model fits the observed data It gives the proportion of variability in Y that is accounted for the linear relationship between X and Y. E.g., let r = 0.6 (or -0.6) r 2 = 0.36 So 36% of the variability in the Y-scores is accounted for by the linear relationship between X and Y B. Weaver, NHRC

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Module 7 Test Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. You are given information about a straight line. Use two points to graph the equation.

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### 1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Outline: Demand Forecasting

Outline: Demand Forecasting Given the limited background from the surveys and that Chapter 7 in the book is complex, we will cover less material. The role of forecasting in the chain Characteristics of

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

### 1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

### STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012) TA: Zhen (Alan) Zhang zhangz19@stt.msu.edu Office hour: (C500 WH) 1:45 2:45PM Tuesday (office tel.: 432-3342) Help-room: (A102 WH) 11:20AM-12:30PM,

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### Section 1.5 Linear Models

Section 1.5 Linear Models Some real-life problems can be modeled using linear equations. Now that we know how to find the slope of a line, the equation of a line, and the point of intersection of two lines,

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

### Florida Math for College Readiness

Core Florida Math for College Readiness Florida Math for College Readiness provides a fourth-year math curriculum focused on developing the mastery of skills identified as critical to postsecondary readiness

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### Homework 11. Part 1. Name: Score: / null

Name: Score: / Homework 11 Part 1 null 1 For which of the following correlations would the data points be clustered most closely around a straight line? A. r = 0.50 B. r = -0.80 C. r = 0.10 D. There is

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

### TIME SERIES ANALYSIS & FORECASTING

CHAPTER 19 TIME SERIES ANALYSIS & FORECASTING Basic Concepts 1. Time Series Analysis BASIC CONCEPTS AND FORMULA The term Time Series means a set of observations concurring any activity against different

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### Introduction to Linear Regression

14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Homework 8 Solutions

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Algebra 1 Course Information

Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### Multiple Regression: What Is It?

Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Introduction to Data Analysis in Hierarchical Linear Models

Introduction to Data Analysis in Hierarchical Linear Models April 20, 2007 Noah Shamosh & Frank Farach Social Sciences StatLab Yale University Scope & Prerequisites Strong applied emphasis Focus on HLM

### Introduction to Linear Regression

14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

### 10. Analysis of Longitudinal Studies Repeat-measures analysis

Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

### Moderation. Moderation

Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

### August 2012 EXAMINATIONS Solution Part I

August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

### Mario Guarracino. Regression

Regression Introduction In the last lesson, we saw how to aggregate data from different sources, identify measures and dimensions, to build data marts for business analysis. Some techniques were introduced

### Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Calibration and Linear Regression Analysis: A Self-Guided Tutorial Part 1 Instrumental Analysis with Excel: The Basics CHM314 Instrumental Analysis Department of Chemistry, University of Toronto Dr. D.

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Testing for Lack of Fit

Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

### Part Three. Cost Behavior Analysis

Part Three Cost Behavior Analysis Cost Behavior Cost behavior is the manner in which a cost changes as some related activity changes An understanding of cost behavior is necessary to plan and control costs

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

### Using Excel for Statistical Analysis

Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

Business Valuation Review Regression Analysis in Valuation Engagements By: George B. Hawkins, ASA, CFA Introduction Business valuation is as much as art as it is science. Sage advice, however, quantitative

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### Statistical Models in R

Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

### PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### 1.1. Simple Regression in Excel (Excel 2010).

.. Simple Regression in Excel (Excel 200). To get the Data Analysis tool, first click on File > Options > Add-Ins > Go > Select Data Analysis Toolpack & Toolpack VBA. Data Analysis is now available under

### Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### Curve Fitting. Before You Begin

Curve Fitting Chapter 16: Curve Fitting Before You Begin Selecting the Active Data Plot When performing linear or nonlinear fitting when the graph window is active, you must make the desired data plot

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

### Curve Fitting in Microsoft Excel By William Lee

Curve Fitting in Microsoft Excel By William Lee This document is here to guide you through the steps needed to do curve fitting in Microsoft Excel using the least-squares method. In mathematical equations

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### 1 Simple Linear Regression I Least Squares Estimation

Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and