# Lecture 7 Linear Regression Diagnostics

## Transcription

Lecture 7: Linear Regression Diagnostics. BIOST 515, January 27, 2004. (Slide footer: BIOST 515, Lecture 6.)

### Major assumptions

1. The relationship between the outcomes and the predictors is (approximately) linear.
2. The error term $\varepsilon$ has zero mean.
3. The error term $\varepsilon$ has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed, or we have an adequate sample size to rely on large-sample theory.

We should always check fitted models to make sure that these assumptions have not been violated.

Departures from the underlying assumptions cannot be detected using any of the summary statistics we've examined so far, such as the $t$ or $F$ statistics or $R^2$. In fact, tests based on these statistics may lead to incorrect inference, since they are based on many of the assumptions above.

### Residual analysis

The diagnostic methods we'll be exploring are based primarily on the residuals. Recall that the residual is defined as

$$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n,$$

where $\hat{y} = X\hat{\beta}$. If the model is appropriate, it is reasonable to expect the residuals to exhibit properties that agree with the stated assumptions.

### Characteristics of residuals

The mean of the $\{e_i\}$ is 0:

$$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0.$$

The estimate of the population variance computed from the sample of the $n$ residuals is

$$S^2 = \frac{1}{n-p-1}\sum_{i=1}^{n} e_i^2,$$

which is the residual mean square, $MSE = SSE/(n-p-1)$.
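The two properties above can be checked numerically. The following is a minimal sketch on simulated data, not code from the lecture: the sample size, coefficients, noise level, and all variable names are assumptions chosen for illustration, and the fit includes an intercept column.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed; the data are simulated
n, p = 50, 2                            # n observations, p predictors (assumed values)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
beta = np.array([1.0, 2.0, -0.5])       # hypothetical true coefficients
y = X @ beta + rng.normal(scale=1.5, size=n)

# Ordinary least squares fit and residuals e_i = y_i - yhat_i
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

e_bar = e.mean()                        # zero up to floating-point error
sse = float(e @ e)                      # residual sum of squares
mse = sse / (n - p - 1)                 # residual mean square, S^2

print(f"mean residual = {e_bar:.2e}, MSE = {mse:.3f}")
```

The divisor $n-p-1$ counts the $p$ slopes plus the intercept as estimated parameters.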

The $\{e_i\}$ are not independent random variables. In general, if the number of residuals ($n$) is large relative to the number of independent variables ($p$), the dependency can be ignored for all practical purposes in an analysis of residuals.

### Methods for standardizing residuals

- Standardized residuals
- Studentized residuals
- Jackknife residuals

### Standardized residuals

An obvious choice for scaling residuals is to divide them by their estimated standard error. The quantity

$$z_i = \frac{e_i}{\sqrt{MSE}}$$

is called a standardized residual. Based on the linear regression assumptions, we might expect the $z_i$'s to resemble a sample from a $N(0, 1)$ distribution.

### Studentized residuals

Using $MSE$ as the variance of the $i$th residual $e_i$ is only an approximation. We can improve the residual scaling by dividing $e_i$ by the standard deviation of the $i$th residual. We can show that the covariance matrix of the residuals is

$$\operatorname{var}(e) = \sigma^2 (I - H).$$

Recall that $H = X(X'X)^{-1}X'$ is the hat matrix. The variance of the $i$th residual is

$$\operatorname{var}(e_i) = \sigma^2 (1 - h_i),$$

where $h_i$ is the $i$th element on the diagonal of the hat matrix and $0 \le h_i \le 1$.
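For reference, the covariance result follows in a few lines. This sketch assumes $\operatorname{var}(y) = \sigma^2 I$ (assumptions 3 and 4 above) and uses the fact that $I - H$ is symmetric and idempotent:

$$
\begin{aligned}
e &= y - \hat{y} = y - Hy = (I - H)\,y, \\
\operatorname{var}(e) &= (I - H)\operatorname{var}(y)(I - H)' = \sigma^2 (I - H)(I - H)' = \sigma^2 (I - H).
\end{aligned}
$$

Reading off the $i$th diagonal element gives $\operatorname{var}(e_i) = \sigma^2(1 - h_i)$.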

The quantity

$$r_i = \frac{e_i}{\sqrt{MSE\,(1 - h_i)}}$$

is called a studentized residual and approximately follows a $t$ distribution with $n-p-1$ degrees of freedom (assuming the assumptions stated at the beginning of the lecture are satisfied).

Studentized residuals have a mean near 0 and a variance,

$$\frac{1}{n-p-1}\sum_{i=1}^{n} r_i^2,$$

that is slightly larger than 1. In large data sets, the standardized and studentized residuals should not differ dramatically.

### Jackknife residuals

The quantity

$$r_{(-i)} = r_i\sqrt{\frac{MSE}{MSE_{(-i)}}} = \frac{e_i}{\sqrt{MSE_{(-i)}\,(1 - h_i)}} = r_i\sqrt{\frac{(n-p-1)-1}{(n-p-1)-r_i^2}}$$

is called a jackknife residual (or R-Student residual). $MSE_{(-i)}$ is the residual variance computed with the $i$th observation deleted.

Jackknife residuals have a mean near 0 and a variance

$$\frac{1}{(n-p-1)-1}\sum_{i=1}^{n} r_{(-i)}^2$$

that is slightly greater than 1. Jackknife residuals are usually the preferred residual for regression diagnostics.
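The three scalings can be computed side by side. The sketch below uses simulated data (the sample size, coefficients, and names such as `mse_deleted` are illustrative assumptions, not from the lecture). It also checks the closed-form jackknife expression against a brute-force version that refits the model with each observation deleted.

```python
import numpy as np

rng = np.random.default_rng(1)          # simulated data; all values are assumptions
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
h = np.diag(H)                          # leverages h_i
e = y - H @ y                           # residuals
df = n - p - 1
mse = e @ e / df

z = e / np.sqrt(mse)                            # standardized residuals
r = e / np.sqrt(mse * (1 - h))                  # studentized residuals
r_jack = r * np.sqrt((df - 1) / (df - r**2))    # jackknife residuals, closed form

def mse_deleted(i):
    """Residual mean square from the fit with observation i removed."""
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    res = y[keep] - X[keep] @ b
    return res @ res / (df - 1)         # one fewer observation, same number of parameters

# Brute-force jackknife residuals: e_i / sqrt(MSE_(-i) * (1 - h_i))
r_jack_refit = np.array(
    [e[i] / np.sqrt(mse_deleted(i) * (1 - h[i])) for i in range(n)]
)

print(np.allclose(r_jack, r_jack_refit))
```

The closed form avoids $n$ separate refits, which is why it is convenient in practice.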

### How to use residuals for diagnostics?

Residual analysis is usually done graphically. We may look at

- Quantile plots: to assess normality
- Scatterplots: to assess model assumptions, such as constant variance and linearity, and to identify potential outliers
- Histograms, stem-and-leaf diagrams, and boxplots

### Quantile-quantile plots

Quantile-quantile plots can be useful for comparing two samples to determine if they arise from the same distribution. Similarly, we can compare the quantiles of a sample to the quantiles expected if the sample came from some distribution $F$, for a visual assessment of whether the sample arises from $F$. In linear regression, this can help us assess the normality of the residuals (if we have relied on an assumption of normality).

To construct a quantile-quantile plot for the residuals, we plot the quantiles of the residuals against the theoretical quantiles expected if the residuals arose from a normal distribution. If the residuals come from a normal distribution, the plot should resemble a straight line. A straight line connecting the 1st and 3rd quartiles is often added to the plot to aid in visual assessment.
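The construction described above can be sketched without any plotting library by computing the coordinates directly. The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function name `normal_qq` and the simulated samples; statistical software may use slightly different positions.

```python
import numpy as np
from statistics import NormalDist

def normal_qq(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    sample = np.sort(np.asarray(residuals, dtype=float))
    n = sample.size
    probs = (np.arange(1, n + 1) - 0.5) / n       # assumed plotting positions
    theoretical = np.array([NormalDist().inv_cdf(pr) for pr in probs])
    return theoretical, sample

rng = np.random.default_rng(2)                    # simulated samples for illustration

# Normal sample: points should fall close to a straight line
theo, samp = normal_qq(rng.normal(size=200))
corr_normal = np.corrcoef(theo, samp)[0, 1]

# Skewed (exponential) sample: points bend away from a line
theo_s, samp_s = normal_qq(rng.exponential(size=200))
corr_skewed = np.corrcoef(theo_s, samp_s)[0, 1]

print(f"normal: {corr_normal:.3f}, skewed: {corr_skewed:.3f}")
```

The correlation between the two sets of quantiles is only a rough numeric stand-in for "looks like a straight line"; the plot itself shows where the departure occurs (tails, skew).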

### Samples from N(0, 1) distribution

[Figure: normal Q-Q plot; theoretical quantiles vs. sample quantiles.]

### Samples from a skewed distribution

[Figure: normal Q-Q plot (theoretical quantiles vs. residuals) and histogram of y (residuals vs. density).]

### Samples from a heavy-tailed distribution

[Figure: normal Q-Q plot (theoretical quantiles vs. residuals) and histogram of y (residuals vs. density).]

### Samples from a light-tailed distribution

[Figure: normal Q-Q plot (theoretical quantiles vs. residuals) and histogram of y (residuals vs. density).]

### Scatterplots

Another useful aid for inspection is a scatterplot of the residuals against the fitted values and/or the predictors. These plots can help us identify:

- Non-constant variance
- Violation of the assumption of linearity
- Potential outliers

### Satisfactory residual plot

[Figure: residuals $e_i$ plotted against fitted values $\hat{y}_i$.]

### Non-constant variance

[Figures, two slides: residuals $e_i$ plotted against fitted values $\hat{y}_i$.]

### Example

Suppose the true relationship between a predictor, $x$, and an outcome, $y$, is

$$E(y_i) = 1 + 2x_i - 0.25x_i^2,$$

but we fit the model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$

Can we diagnose this with residual plots?
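A small simulation shows what the residuals from the misspecified fit look like. This is a sketch, not the lecture's own code: the sample size, the range of $x$, the noise standard deviation, and the curvature summary (correlation between the residuals and the squared, centered predictor) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 8, size=n)                         # assumed range for x
y = 1 + 2 * x - 0.25 * x**2 + rng.normal(scale=0.5, size=n)  # assumed noise sd

def fit(X, y):
    """Least squares fit; returns coefficients and residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

X_lin = np.column_stack([np.ones(n), x])              # misspecified: linear in x only
X_quad = np.column_stack([np.ones(n), x, x**2])       # correctly specified
_, e_lin = fit(X_lin, y)
b_quad, e_quad = fit(X_quad, y)

# Curvature left in the residuals: correlation with the squared, centered predictor
xc2 = (x - x.mean()) ** 2
curv_lin = np.corrcoef(xc2, e_lin)[0, 1]
curv_quad = np.corrcoef(xc2, e_quad)[0, 1]

print(f"linear fit: {curv_lin:.2f}, quadratic fit: {curv_quad:.2e}")
```

A strongly negative value for the linear fit corresponds to the inverted-U pattern that a plot of residuals against fitted values would show. For the quadratic fit the value is zero up to rounding, because the residuals are orthogonal to every column of the design matrix.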

First fit: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

[Figure: scatterplot of x vs. y.]

[Table: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept) and x; values not recovered in the transcription.]

[Figure: normal Q-Q plot; theoretical quantiles vs. sample quantiles.]

[Figure: fitted values versus residuals; $\hat{y}_i$ vs. $e_i$.]

Next, we fit $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$.

[Table: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), x, and the squared term in x; values not recovered in the transcription.]

[Figure: normal Q-Q plot; theoretical quantiles vs. sample quantiles.]

[Figure: fitted values versus residuals; $\hat{y}_i$ vs. $e_i$.]

What if we have more than one predictor and only one is misspecified? Suppose the true model is

$$E[y_i] = 1 + 3w_i + 2x_i - 0.25x_i^2$$

and we fit

$$y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \varepsilon_i.$$

How can we diagnose model misspecification with residual plots in this case?
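One way to approach the question is to look at the residuals against each predictor separately. The sketch below simulates the two-predictor setting; the distributions of $w$ and $x$, the noise level, and the helper `curvature` are assumptions for illustration, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
w = rng.normal(size=n)                    # assumed distribution for w
x = rng.uniform(0, 8, size=n)             # assumed range for x
y = 1 + 3 * w + 2 * x - 0.25 * x**2 + rng.normal(scale=0.5, size=n)

# Fit the misspecified model: linear in both w and x
X = np.column_stack([np.ones(n), w, x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

def curvature(v, resid):
    """Correlation between residuals and the squared, centered predictor."""
    return np.corrcoef((v - v.mean()) ** 2, resid)[0, 1]

curv_x = curvature(x, e)                  # strong: x is the misspecified predictor
curv_w = curvature(w, e)                  # weak: w enters the model correctly

print(f"residuals vs x: {curv_x:.2f}, residuals vs w: {curv_w:.2f}")
```

In this simulation the curvature appears in the residuals plotted against $x$ but not against $w$, which points to $x$ as the term that needs a different functional form.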

[Figure: pairwise scatterplots of y, w, and x.]

[Table: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), x, and w; values not recovered in the transcription.]

[Figure: normal Q-Q plot; theoretical quantiles vs. sample quantiles.]

[Figure: fitted values $\hat{y}_i$ vs. residuals $e_i$.]

[Figure: x vs. residuals.]

[Figure: w vs. residuals.]

Next fit: $y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \beta_3 x_i^2 + \varepsilon_i$.

[Table: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), x, the squared term in x, and w; values not recovered in the transcription.]

[Figure: normal Q-Q plot; theoretical quantiles vs. sample quantiles.]

[Figure: fitted values $\hat{y}_i$ vs. residuals $e_i$.]

### Partial regression plots

- A variation of the plot of residuals versus predictors.
- Allows us to study the marginal relationship of a regressor given the other variables that are in the model.
- Also called the added variable plot or the adjusted variable plot.

For example, suppose we are considering the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,$$

and we are concerned about the correct specification of the relationship between $x_1$ and $y$. To create a partial regression plot, we

1. Regress $y$ on $x_2$ and obtain the fitted values and residuals:

   $$\hat{y}_i(x_2) = \hat{\theta}_0 + \hat{\theta}_1 x_{i2}, \qquad e_i(y \mid x_2) = y_i - \hat{y}_i(x_2), \quad i = 1, \ldots, n.$$

2. Regress $x_1$ on $x_2$ and calculate the residuals:

   $$\hat{x}_{i1}(x_2) = \hat{\alpha}_0 + \hat{\alpha}_1 x_{i2}, \qquad e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2), \quad i = 1, \ldots, n.$$

3. Plot the $y$ residuals $e_i(y \mid x_2)$ against the $x_1$ residuals $e_i(x_1 \mid x_2)$.
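The three steps above can be sketched as follows on simulated data (the coefficients, the correlation between $x_1$ and $x_2$, and the helper `residuals` are assumptions for illustration). The sketch also checks a standard property of this construction: the least squares slope of the $y$ residuals on the $x_1$ residuals equals the coefficient of $x_1$ in the full multiple regression.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)              # x1 correlated with x2 (assumed)
y = 1 + 2 * x1 - 1 * x2 + rng.normal(size=n)    # hypothetical true model

def residuals(response, predictor):
    """Residuals from regressing `response` on an intercept and `predictor`."""
    X = np.column_stack([np.ones(len(predictor)), predictor])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return response - X @ coef

e_y = residuals(y, x2)      # step 1: e_i(y | x2)
e_x1 = residuals(x1, x2)    # step 2: e_i(x1 | x2)

# Step 3 would plot e_y against e_x1; here we compute the slope of that plot.
# Both residual vectors have mean zero, so no intercept is needed.
slope_partial = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Coefficient of x1 from the full model y ~ x1 + x2
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print(f"partial slope = {slope_partial:.4f}, full-model beta_1 = {beta_full[1]:.4f}")
```

Because the slope of the plot matches $\hat{\beta}_1$, systematic curvature around that line suggests the relationship between $x_1$ and $y$, given $x_2$, is not linear.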

### Partial regression plot from previous example

[Figure: partial regression plot; axes labeled x and y.]

### Comments on partial regression plots

- Use with caution: they only suggest possible relationships between the predictor and the response.
- In general, they will not detect interactions between regressors.
- The presence of strong multicollinearity can cause partial regression plots to give incorrect information.

### Next

- Variance stabilizing transformations
- Identifying outliers and influential observations
