
1 Author(s): Kerby Shedden, Ph.D., 2010

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License. We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should reach out with any questions, corrections, or clarification regarding the use of content.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: some medical content is graphic and may not be suitable for all viewers.

2 Regression diagnostics

Kerby Shedden, Department of Statistics, University of Michigan

December 7, 2015

3 Motivation

When working with a linear model with design matrix X, the conventional linear modeling assumptions can be expressed as E[Y|X] ∈ col(X) and var[Y|X] = σ²I. Least squares point estimates and inferences depend on these assumptions approximately holding. Inferences for small sample sizes may also depend on the distribution of Y − E[Y|X] being approximately multivariate Gaussian, but for moderate or large sample sizes this is not critical. Regression diagnostics are approaches for assessing how well the key linear modeling assumptions hold in a particular data set.

4 Residuals

Linear models can be expressed in two equivalent ways:

Expression based on moments: E[Y|X] ∈ col(X) and var[Y|X] = σ²I.

Expression based on an additive error model: Y = Xβ + ε, where ε is random with E[ε|X] = 0 and cov[ε|X] = σ²I.

Since the residuals can be viewed as predictions of the errors, it turns out that regression model diagnostics can often be developed using the residuals. Recall that the residuals can be expressed as R = (I − P)Y, where P is the projection onto col(X).

5 Residuals

The residuals have two key mathematical properties, regardless of the correctness of the model specification:

The residuals sum to zero, since (I − P)1 = 0 and hence 1′R = 1′(I − P)Y = 0.

The residuals and fitted values are orthogonal (they have zero sample covariance): ĉov(R, Ŷ|X) ∝ (R − R̄)′Ŷ = R′Ŷ = Y′(I − P)PY = 0.

These properties hold as long as an intercept is included in the model (so that P1 = 1, where 1 is a vector of 1's).
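To make these properties concrete, here is a minimal numpy sketch (simulated data; all names are invented for illustration) that forms the projection matrix and checks both identities numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix onto col(X)
Yhat = P @ Y                            # fitted values
R = Y - Yhat                            # residuals, R = (I - P)Y

print(R.sum())    # ~0: residuals sum to zero (intercept is in the model)
print(R @ Yhat)   # ~0: residuals are orthogonal to the fitted values
```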

6 Residuals

If the basic linear model assumptions hold, these two properties have population counterparts:

The expected value of each residual is zero: E[R|X] = (I − P)E[Y|X] = 0 ∈ R^n.

The population covariance between any residual and any fitted value is zero: cov(R, Ŷ|X) = E[RŶ′|X] = (I − P)cov(Y|X)P = σ²(I − P)P = 0 ∈ R^(n×n).

7 Residuals

If the model is correctly specified, there is a simple formula for the variances and covariances of the residuals:

cov(R|X) = (I − P) E[YY′|X] (I − P) = (I − P)(Xββ′X′ + σ²I)(I − P) = σ²(I − P).

If the model is correctly specified, the standardized residuals

(Y_i − Ŷ_i) / σ̂

and the Studentized residuals

(Y_i − Ŷ_i) / (σ̂(1 − P_ii)^(1/2))

approximately have mean zero and variance one.

8 External standardization of residuals

Let σ̂²_(i) be the estimate of σ² obtained by fitting a regression model omitting the i-th case. It turns out that we can calculate this value without actually refitting the model:

σ̂²_(i) = ((n − p − 1)σ̂² − r_i²/(1 − P_ii)) / (n − p − 2),

where r_i is the residual for the model fit to all data. The externally standardized residuals are

(Y_i − Ŷ_i) / σ̂_(i),

and the externally Studentized residuals are

(Y_i − Ŷ_i) / (σ̂_(i)(1 − P_ii)^(1/2)).
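As an illustration (not part of the original slides), the deletion formula can be applied and spot-checked against an actual refit in a few lines of numpy; the data and names below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
R = Y - P @ Y
h = np.diag(P)                              # leverages P_ii
s2 = (R @ R) / (n - p - 1)                  # sigma^2-hat from the full fit

# leave-one-out variance estimates, without refitting (formula above)
s2_loo = ((n - p - 1) * s2 - R**2 / (1 - h)) / (n - p - 2)
ext_studentized = R / np.sqrt(s2_loo * (1 - h))

# spot-check: actually delete case 0 and refit
keep = np.arange(n) != 0
b = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
r = Y[keep] - X[keep] @ b
assert np.isclose(s2_loo[0], (r @ r) / (n - 1 - p - 1))
```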

9 Outliers and masking

In some settings, residuals can be used to identify outliers. However, in a small data set, a large outlier will increase the value of σ̂, and hence may mask itself. Externally Studentized residuals solve the problem of a single large outlier masking itself. But masking may still occur if multiple large outliers are present.

10 Outliers and masking

If multiple large outliers may be present, we may use alternative estimates of the scale parameter σ:

Interquartile range (IQR): this is the difference between the 75th percentile and the 25th percentile of the distribution or data. The IQR of the standard normal distribution is approximately 1.35, so IQR/1.35 can be used to estimate σ.

Median Absolute Deviation (MAD): this is the median value of the absolute deviations from the median of the distribution or data, i.e. median(|Z − median(Z)|). The MAD of the standard normal distribution is approximately 0.675, so MAD/0.675 can be used to estimate σ.

These alternative estimates of σ can be used in place of the usual σ̂ for standardizing or Studentizing residuals.
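A small numpy sketch of these two robust scale estimates, using the constants above (function name and data are invented for the example):

```python
import numpy as np

def robust_scale_estimates(z):
    """Two outlier-resistant estimates of sigma for roughly Gaussian data."""
    q75, q25 = np.percentile(z, [75, 25])
    iqr_sigma = (q75 - q25) / 1.35                           # IQR of N(0,1) is ~1.35
    mad_sigma = np.median(np.abs(z - np.median(z))) / 0.675  # MAD of N(0,1) is ~0.675
    return iqr_sigma, mad_sigma

z = np.random.default_rng(1).normal(size=1000)
print(robust_scale_estimates(z))   # both close to 1
```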

11 PRESS residuals

If case i is deleted and a prediction of Y_i is made from the remaining data, we can compare the observed and predicted values to get the prediction residual:

R_(i) ≡ Y_i − Ŷ_(i)i.

A simple formula for the prediction residual is given by

R_(i) = Y_i − X_i: β̂_(i) = Y_i − X_i: (β̂ − R_i (X′X)⁻¹ X_i:′ / (1 − P_ii)) = R_i / (1 − P_ii).

The sum of squares of the prediction residuals is called PRESS (prediction error sum of squares). It is equivalent to using leave-one-out cross validation to estimate the generalization error.
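The shortcut formula makes PRESS essentially free once the hat matrix is available. A minimal numpy sketch (simulated data, invented names) that also verifies the formula against brute-force leave-one-out:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
R = Y - P @ Y
h = np.diag(P)

press_resid = R / (1 - h)           # prediction residuals via the shortcut formula
PRESS = np.sum(press_resid**2)      # leave-one-out prediction error sum of squares

# brute-force check: delete each case, refit, and predict it
loo = []
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
    loo.append(Y[i] - X[i] @ b)
assert np.allclose(loo, press_resid)
```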

12 Leverage

Leverage is a measure of how strongly the data for case i determine the fitted value Ŷ_i. Since Ŷ = PY, we have Ŷ_i = Σ_j P_ij Y_j, so it is natural to define the leverage for case i as P_ii, where P is the projection matrix onto col(X). This is related to the fact that the variance of the i-th residual is σ²(1 − P_ii). Since the residuals have mean zero, when P_ii is close to 1 the residual will likely be close to zero. This means that the fitted line will usually pass close to (X_i, Y_i) if it is a high leverage point.

13 Leverage

[Figure: the coefficients P_ij plotted against X_j, for a specific value of i, in a simple linear regression. In this setting the fitted values can be written

Ŷ_k = Σ_i [(S² + n(X_i − X̄)(X_k − X̄)) / (nS²)] Y_i,

with S² = Σ_j (X_j − X̄)².]

14 Leverage

If we use basis functions, the coefficients P_ij in each row of P are much more local. [Figure: P_ij plotted against X_j.]

15 Leverage

What is a big leverage? The average leverage is trace(P)/n = (p + 1)/n. If the leverage for a particular case is two or more times greater than the average leverage, it may be considered to have high leverage.

In simple linear regression, it is easy to show that

var(Y_i − α̂ − β̂X_i) = (n − 1)σ²/n − σ²(X_i − X̄)² / Σ_j (X_j − X̄)².

This implies that when p = 1,

P_ii = 1/n + (X_i − X̄)² / Σ_j (X_j − X̄)².
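As a rough illustration of these formulas and the two-times-average rule of thumb, a numpy sketch with one artificially extreme design point (all names invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
x[0] = 6.0                          # one design point far from the others
X = np.column_stack([np.ones(n), x])

h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages P_ii
h_simple = 1/n + (x - x.mean())**2 / np.sum((x - x.mean())**2)
assert np.allclose(h, h_simple)     # the p = 1 formula above

p = 1
flags = np.where(h > 2 * (p + 1) / n)[0]   # two-times-average-leverage rule
print(flags)                        # case 0 is flagged (perhaps with a few others)
```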

16 Leverage

[Figure: leverage values in a simple linear regression; the data (Y against X) are shown together with the leverage of each point plotted against X.]

17 Leverage

[Figure: leverage values in a linear regression with two independent variables, shown over the (X_1, X_2) plane.]

18 Leverage

In general,

P_ii = X_i (X′X)⁻¹ X_i′ = X_i (X′X/n)⁻¹ X_i′ / n,

where X_i is the i-th row of X (including the intercept). Let X̃_i be row i of X without the intercept, let μ_X be the sample mean of the X̃_i, and let Σ_X be the sample covariance matrix of the X̃_i (scaled by n rather than n − 1). It is a fact that

X_i (X′X/n)⁻¹ X_i′ = (X̃_i − μ_X)′ Σ_X⁻¹ (X̃_i − μ_X) + 1,

and therefore

P_ii = ((X̃_i − μ_X)′ Σ_X⁻¹ (X̃_i − μ_X) + 1) / n.

Note that this implies that P_ii ≥ 1/n.

19 Leverage

The expression (X̃_i − μ_X)′ Σ_X⁻¹ (X̃_i − μ_X) is the squared Mahalanobis distance between X̃_i and μ_X. Thus there is a direct relationship between the Mahalanobis distance of a point relative to the center of the covariate set and its leverage.
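The leverage/Mahalanobis identity above is easy to verify numerically; a minimal numpy sketch (simulated covariates, invented names):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 3
Z = rng.normal(size=(n, p))                      # covariates, without the intercept
X = np.column_stack([np.ones(n), Z])

h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages P_ii

mu = Z.mean(axis=0)
S = (Z - mu).T @ (Z - mu) / n                    # covariance scaled by n, as above
d2 = np.einsum('ij,jk,ik->i', Z - mu, np.linalg.inv(S), Z - mu)  # Mahalanobis^2

assert np.allclose(h, (d2 + 1) / n)              # P_ii = (d_i^2 + 1)/n
```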

20 Influence

Influence measures the degree to which deletion of a case changes the fitted model. We will see that this is different from leverage: a high leverage point has the potential to be influential, but is not always influential.

The deleted slope for case i is the fitted slope vector that is obtained upon deleting case i. The following identity allows the deleted slopes to be calculated efficiently:

β̂_(i) = β̂ − R_i/(1 − P_ii) · (X′X)⁻¹ X_i:′,

where R_i is the i-th residual, and X_i: is row i of the design matrix.
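A minimal numpy sketch (invented data and names) applying the deleted-slope identity and checking it against an actual refit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ Y
R = Y - X @ beta
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)      # leverages P_ii

i = 0
beta_del = beta - (R[i] / (1 - h[i])) * (XtX_inv @ X[i])   # identity above

# check against an actual refit with case i removed
keep = np.arange(n) != i
assert np.allclose(beta_del, np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0])
```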

21 Influence

The deleted fitted values Ŷ_(i) are

Ŷ_(i) = X β̂_(i) = Ŷ − R_i/(1 − P_ii) · X (X′X)⁻¹ X_i:′.

Influence can be measured by Cook's distance:

D_i = (Ŷ − Ŷ_(i))′(Ŷ − Ŷ_(i)) / ((p + 1)σ̂²)
    = R_i² / ((1 − P_ii)² (p + 1) σ̂²) · X_i: (X′X)⁻¹ X_i:′
    = P_ii (R_i^s)² / ((1 − P_ii)(p + 1)),

where R_i is the residual and R_i^s is the studentized residual.
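Putting the pieces together, Cook's distances can be computed directly from the residuals and leverages; a numpy sketch with simulated data (invented names):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
R = Y - P @ Y
h = np.diag(P)
s2 = (R @ R) / (n - p - 1)

Rs = R / np.sqrt(s2 * (1 - h))          # studentized residuals
D = h * Rs**2 / ((1 - h) * (p + 1))     # Cook's distance, final form above
print(np.where(D > 1)[0])               # likely empty for well-behaved data
```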

22 Influence

Cook's distance approximately captures the average squared change in fitted values due to deleting case i, in error variance units. Cook's distance is large only if both the leverage P_ii is high and the studentized residual for the i-th case is large. As a general rule, D_i values from 1/2 to 1 are high, and values greater than 1 are considered to be a possible problem.

23 Influence

[Figure: Cook's distances for each point in a simple linear regression, plotted against X.]

24 Influence

[Figure: Cook's distances for a linear regression with two variables, shown over the (X_1, X_2) plane.]

25 Regression graphics

Quite a few graphical techniques have been proposed to aid in visualizing regression relationships. We will discuss the following plots:

1. Scatterplots of Y against individual X variables.
2. Scatterplots of X variables against each other.
3. Residuals versus fitted values plots.
4. Added variable plots.
5. Partial residual plots.
6. Residual quantile plots.

26 Scatterplots of Y against individual X variables

[Figure: simulated data with E[Y|X] = X_1 − X_2 + X_3, var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3. Panels show Y plotted against X_1, X_2, X_3, and X_1 − X_2 + X_3.]

27 Scatterplots of X variables against each other

[Figure: the same simulated design, E[Y|X] = X_1 − X_2 + X_3, var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3. Panels show pairwise scatterplots of X_1, X_2, and X_3.]

28 Residuals against fitted values plot

[Figure: residuals plotted against fitted values for the model E[Y|X] = X_1 − X_2 + X_3, var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3.]

29 Residuals against fitted values plots

[Figure: heteroscedastic errors, with E[Y|X] = X_1 + X_3 and var[Y|X] = 2 + X_1 + X_3, so the error variance increases with the covariates; var(X_j) = 1, cor(X_j, X_k) = 0.3. Residuals plotted against fitted values.]

30 Residuals against fitted values plots

[Figure: nonlinear mean structure, with E[Y|X] = X_1², var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3. Residuals plotted against fitted values.]

31 Added variable plots

Suppose P_j is the projection onto the span of all covariates except X_j, and define Ŷ_j = P_j Y and X̃_j = P_j X_j. The added variable plot is a scatterplot of Y − Ŷ_j against X_j − X̃_j. The squared correlation coefficient of the points in the added variable plot is the partial R² for variable j. Added variable plots are also called partial regression plots.
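A minimal numpy sketch (simulated data resembling the example on the next slide; names invented) that constructs the added variable plot coordinates and the partial R²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
C = 0.3 + 0.7 * np.eye(3)                       # var 1, pairwise correlation 0.3
X = rng.multivariate_normal(np.zeros(3), C, size=n)
Y = X[:, 0] - X[:, 1] + X[:, 2] + rng.normal(size=n)

j = 0                                            # covariate of interest
others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
Pj = others @ np.linalg.solve(others.T @ others, others.T)   # projection P_j

y_part = Y - Pj @ Y                 # Y with the other covariates projected out
x_part = X[:, j] - Pj @ X[:, j]     # X_j with the other covariates projected out

partial_r2 = np.corrcoef(x_part, y_part)[0, 1] ** 2
print(partial_r2)                   # squared correlation of the plotted points
```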

32 Added variable plots

[Figure: added variable plots for X_1, X_2, and X_3 under the model E[Y|X] = X_1 − X_2 + X_3, var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3.]

33 Partial residual plot

Suppose we fit the model

Ŷ_i = β̂′X_i = β̂_0 + β̂_1 X_i1 + ··· + β̂_p X_ip.

The partial residual plot for covariate j is a plot of β̂_j X_ij + R_i against X_ij, where R_i is the residual. The partial residual plot attempts to show how covariate j is related to Y, if we control for the effects of all other covariates.
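A short numpy sketch (simulated data, invented names) computing partial residuals for one covariate under a nonlinear mean, as in the example on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
C = 0.3 + 0.7 * np.eye(3)
X = rng.multivariate_normal(np.zeros(3), C, size=n)
Y = X[:, 0]**2 + rng.normal(size=n)              # nonlinear in X_1

Xd = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xd, Y, rcond=None)[0]
R = Y - Xd @ beta

j = 0                                            # covariate of interest
partial_resid = beta[j + 1] * X[:, j] + R        # beta_j * X_ij + R_i
# plotting partial_resid against X[:, j] would reveal the quadratic shape
```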

34 Partial residual plot

[Figure: partial residual plots for X_1, X_2, and X_3 under the model E[Y|X] = X_1², var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3.]

35 Residual quantile plots

[Figure: quantiles of the standardized residuals plotted against standard normal quantiles for the model E[Y|X] = X_1², var[Y|X] = 1, var(X_j) = 1, cor(X_j, X_k) = 0.3, with t-distributed errors.]

36 Transformations

If it appears that the linear model assumptions do not hold, it may be possible to continuously transform either Y or X so that the linear model becomes more consistent with the data.

37 Variance stabilizing transformations

A common violation of the linear model assumptions is a mean/variance relationship, where E[Y_i] and var(Y_i) are related. Suppose that var(Y_i) = g(E[Y_i])σ², and let f(·) be a transform to be applied to the Y_i. The goal is to find a transform such that the variances of the transformed responses are constant. Using a Taylor expansion,

f(Y_i) ≈ f(E[Y_i]) + f′(E[Y_i])(Y_i − E[Y_i]).

38 Variance stabilizing transformations

Therefore

var f(Y_i) ≈ f′(E[Y_i])² var(Y_i) = f′(E[Y_i])² g(E[Y_i]) σ².

The goal is to find f such that f′ = 1/√g.

Example: Suppose g(z) = z^λ. This includes the Poisson regression case λ = 1, where the variance is proportional to the mean, and the case λ = 2, where the standard deviation is proportional to the mean. When λ = 1, f solves f′(z) = 1/√z, so f is the square root function. When λ = 2, f solves f′(z) = 1/z, so f is the logarithm function.
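As a quick check of the λ = 1 case, a numpy sketch (invented example) showing that the square root roughly stabilizes the variance of Poisson draws:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([4.0, 16.0, 64.0, 256.0])          # a range of Poisson means
draws = rng.poisson(mu, size=(100_000, 4))

print(draws.var(axis=0))             # variance grows in proportion to the mean
print(np.sqrt(draws).var(axis=0))    # roughly constant (~0.25) after the sqrt
```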

39 Log/log regression

Suppose we fit a simple linear regression of the form

E(log(Y) | log(X)) = α + β log(X),

and suppose the logarithms are base 10. Let X_z = X · 10^z. Under the model,

E(log(Y) | X_z) − E(log(Y) | X) = βz.

Using the crude approximation log E(Y|X) ≈ E(log(Y)|X), we conclude that E(Y|X) is approximately scaled by a factor of 10^(βz) when X is scaled by a factor of 10^z. This holds for relatively small values of z, where the crude approximation holds. Thus in a log/log model, we may say that an f% change in X is approximately associated with an fβ% change in the expected response.

40 Maximum likelihood estimation of a data transformation

The Box-Cox family of transforms is

y → (y^λ − 1)/λ,

which makes sense only when all Y_i are positive. The Box-Cox family includes the identity (λ = 1), all power transformations such as the square root (λ = 1/2) and reciprocal (λ = −1), and the logarithm in the limiting case λ → 0.

41 Maximum likelihood estimation of a data transformation

Suppose we assume that for some value of λ, the transformed data follow a linear model with Gaussian errors. We can then set out to estimate λ. The joint log-likelihood of the transformed data is

−(n/2) log(2π) − n log σ − (1/(2σ²)) Σ_i (Y_i^(λ) − X_i′β)².

Next we transform this back to a likelihood in terms of Y_i = g_λ⁻¹(Y_i^(λ)). This joint log-likelihood is

−(n/2) log(2π) − n log σ − (1/(2σ²)) Σ_i (g_λ(Y_i) − X_i′β)² + Σ_i log J_i,

where the Jacobian is

log J_i = log g_λ′(Y_i) = (λ − 1) log Y_i.

42 Maximum likelihood estimation of a data transformation

The joint log-likelihood for the Y_i is

−(n/2) log(2π) − n log σ − (1/(2σ²)) Σ_i (g_λ(Y_i) − X_i′β)² + (λ − 1) Σ_i log Y_i.

This likelihood is maximized with respect to λ, β, and σ to identify the MLE.

43 Maximum likelihood estimation of a data transformation

To do the maximization, let Y^(λ) ≡ g_λ(Y) denote the transformed observed responses, and let Ŷ^(λ) denote the fitted values from regressing Y^(λ) on X. Since σ does not appear in the Jacobian,

σ̂²_λ ≡ n⁻¹ ‖Y^(λ) − Ŷ^(λ)‖²

will be the maximizing value of σ². Therefore the MLE of β and λ will maximize

−n log σ̂_λ + (λ − 1) Σ_i log Y_i.
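A minimal numpy sketch (invented data and names) of this profile likelihood, evaluated over a grid of λ values:

```python
import numpy as np

def boxcox_profile_loglik(Y, X, lam):
    """Profile log-likelihood of lambda (up to a constant), per the slide above."""
    Yl = np.log(Y) if abs(lam) < 1e-12 else (Y**lam - 1) / lam   # g_lambda(Y)
    beta = np.linalg.lstsq(X, Yl, rcond=None)[0]
    s2 = np.mean((Yl - X @ beta) ** 2)            # sigma^2-hat_lambda
    return -0.5 * len(Y) * np.log(s2) + (lam - 1) * np.sum(np.log(Y))

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = np.exp(X @ np.array([1.0, 0.5]) + 0.3 * rng.normal(size=n))  # log scale is "right"

grid = np.linspace(-1, 2, 61)
ll = [boxcox_profile_loglik(Y, X, lam) for lam in grid]
print(grid[np.argmax(ll)])           # maximizer should be near 0 (log transform)
```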

44 Collinearity Diagnostics

Collinearity inflates the sampling variances of covariate effect estimates. To understand the effect of collinearity on var(β̂_j), reorder the columns and partition the design matrix X as

X = (X_j, X_0) = (X̃_j + (X_j − X̃_j), X_0),

where X_0 is the n × p matrix consisting of all columns in X except X_j, and X̃_j is the projection of X_j onto col(X_0). Therefore

H ≡ X′X = (X_j′X_j, X_j′X_0 ; X_0′X_j, X_0′X_0).

Since var(β̂_j) = σ² (H⁻¹)_11, we want a simple expression for (H⁻¹)_11.

45 Collinearity Diagnostics

A symmetric block matrix can be inverted using

(A, B ; B′, C)⁻¹ = (S⁻¹, −S⁻¹BC⁻¹ ; −C⁻¹B′S⁻¹, C⁻¹ + C⁻¹B′S⁻¹BC⁻¹),

where S = A − BC⁻¹B′. Therefore

(H⁻¹)_11 = 1 / (X_j′X_j − X_j′P_0X_j),

where P_0 = X_0(X_0′X_0)⁻¹X_0′ is the projection matrix onto col(X_0).

46 Collinearity Diagnostics

Since X̃_j ∈ col(X_0), we have X_j′P_0X_j = ‖X̃_j‖², and since X̃_j′(X_j − X̃_j) = 0, it follows that

X_j′X_j = ‖X̃_j‖² + ‖X_j − X̃_j‖²,

so

(H⁻¹)_11 = 1 / ‖X_j − X̃_j‖².

This makes sense, since smaller values of ‖X_j − X̃_j‖ correspond to greater collinearity.

47 Collinearity Diagnostics

Let R²_jX be the coefficient of determination (multiple R²) for the regression of X_j on the other covariates:

R²_jX = 1 − (X_j − X̃_j)′(X_j − X̃_j) / (X_j′X_j) = 1 − ‖X_j − X̃_j‖² / ‖X_j‖².

Combining the two equations yields

(H⁻¹)_11 = (1 / (X_j′X_j)) · (1 / (1 − R²_jX)).

48 Collinearity Diagnostics

The two factors in the expression

(H⁻¹)_11 = (1 / (X_j′X_j)) · (1 / (1 − R²_jX))

reflect two different sources of variance of β̂_j:

1/(X_j′X_j) = 1/((n − 1) var̂(X_j)) reflects the scaling of X_j.

The variance inflation factor (VIF) 1/(1 − R²_jX) is scale-free. It is always greater than or equal to 1, and is equal to 1 only if X_j is orthogonal to the other covariates. Large values of the VIF indicate that parameter estimation is strongly affected by collinearity.
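A minimal numpy sketch (invented function and data) computing the VIF for each covariate by regressing it on the others, per the definition above:

```python
import numpy as np

def vif(X):
    """Variance inflation factor 1/(1 - R^2_jX) for each column of X (no intercept)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # centering plays the intercept's role
    out = []
    for j in range(p):
        xj = Xc[:, j]
        X0 = np.delete(Xc, j, axis=1)
        fit = X0 @ np.linalg.lstsq(X0, xj, rcond=None)[0]
        r2 = 1 - np.sum((xj - fit) ** 2) / np.sum(xj**2)
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
Z[:, 2] = Z[:, 0] + 0.1 * rng.normal(size=500)   # near-collinear pair of columns
print(vif(Z))                                    # large VIFs for columns 0 and 2
```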
