Week 3: Residual Analysis (Chapter 3)


Properties of Residuals

Here are some properties of residuals, some of which you learned in previous lectures:

1. Definition of residual: $e_i = Y_i - \hat{Y}_i$
2. $\varepsilon_i = Y_i - E(Y_i)$
3. $\varepsilon_i \sim NID(0, \sigma^2)$
4. Mean: $\bar{e} = \sum e_i / n = 0$
5. Variance: $\sum (e_i - \bar{e})^2 / (n - 2) = \sum e_i^2 / (n - 2) = SSE / (n - 2) = MSE$

Semistudentized Residuals

We will use standardized residuals in some of our analyses of residuals. The standardization formula,

$e_i^* = \frac{e_i - \bar{e}}{\sqrt{MSE}} = \frac{e_i}{\sqrt{MSE}}$,

is often used to standardize residuals. KNNL (page 103) explain that this standardization would create a studentized residual if $\sqrt{MSE}$ were an estimate of the standard deviation of the residual. However, this formula does not produce truly studentized residuals because $\sqrt{MSE}$ is only an approximation of the standard deviation of $e_i$. They discuss how to calculate studentized residuals in Chapter 10. Still, this formula is the basis of many residual analysis techniques, and it is the formula used by many statistical software packages, such as SAS, to standardize the residuals.
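As a small illustration of the standardization above, the following sketch fits a simple linear regression by least squares and divides each residual by $\sqrt{MSE}$. The data values and the function names (`fit_slr`, `semistudentized_residuals`) are hypothetical and chosen only for illustration; they do not come from KNNL.

```python
import math

def fit_slr(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def semistudentized_residuals(x, y):
    """Return (residuals, semistudentized residuals, MSE)."""
    b0, b1 = fit_slr(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (len(x) - 2)  # SSE / (n - 2)
    root_mse = math.sqrt(mse)
    return resid, [e / root_mse for e in resid], mse

# Hypothetical illustrative data (not from the textbook).
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
resid_demo, semi_demo, mse_demo = semistudentized_residuals(x_demo, y_demo)
```

Two quick checks follow from the properties listed above: the residuals sum to zero, and the squared semistudentized residuals sum to n - 2, because that sum equals SSE / MSE.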

Six Areas in Which We Will Use Residual Analysis

1. regression not linear
2. non-constant variance
3. independence of residuals
4. outliers
5. normality of errors
6. important predictor (independent) variables

Visual Diagnostics for Residuals

Residual plots are quite useful for examining residuals in the above six categories (Figure 3.4, page 106). You can also use a normal probability plot to examine whether the residuals are normally distributed (normal probability plots are often standard output in statistical analysis software).

Statistical Tests for Residuals

Normality: You can use the Shapiro-Wilk statistic (in SAS, PROC UNIVARIATE reports it when the NORMAL option is specified). KNNL also mention the correlation test for normality, which they say is easier to use than the Shapiro-Wilk test. You can also use standard goodness of fit tests, like the chi-square or Kolmogorov-Smirnov tests.

Autocorrelation (randomness): You can use the Durbin-Watson statistic to determine whether you have significant autocorrelation (this is available in SAS). You can also use a runs test.
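The two autocorrelation checks can be sketched in a few lines. The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals; values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The function names and the residual sequences below are hypothetical, and judging significance still requires Durbin-Watson bounds or a runs-test reference distribution, which this sketch does not include.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def count_sign_runs(residuals):
    """Number of runs of same-signed residuals (zeros are skipped)."""
    signs = [e > 0 for e in residuals if e != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# Hypothetical residual sequences for illustration only.
alternating = [1, -1, 1, -1]   # sign flips every step (negative autocorrelation)
clustered = [1, 1, -1, -1]     # same signs cluster (positive autocorrelation)

dw_alternating = durbin_watson(alternating)
dw_clustered = durbin_watson(clustered)
runs_alternating = count_sign_runs(alternating)
runs_clustered = count_sign_runs(clustered)
```

Too few runs (clustered signs) point toward positive autocorrelation, and too many runs point toward negative autocorrelation.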

Non-constant variance (heteroscedasticity): KNNL present two tests that can be used to check for constant variance: 1) the Modified Levene or Brown-Forsythe test, and 2) the Breusch-Pagan test. The Modified Levene test does not require that the errors be normally distributed, unlike the Breusch-Pagan test. KNNL report that the Modified Levene test is actually quite robust against severe departures from normality. The sample size, though, does need to be large. KNNL present an example of the Modified Levene test on page 117 and an example of the Breusch-Pagan test on page 119.

Lack of Fit Test

A lack of fit test is used to determine whether a regression function adequately fits the data. This test assumes that the Y's are independent, normally distributed, and have constant variance. It also requires that replicates at one or more levels of X are available. When replicates are available, the error can be divided into two components: 1) pure error, and 2) lack of fit error.

The pure error component recognizes that replications exist for some levels of X. The sum of squares for the pure error can be expressed as:

$SSPE = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$,

where c = number of levels of X and $n_j$ = number of observations for a given level of X. The degrees of freedom associated with SSPE = n - c. With this information, we can compute an unbiased estimator of the error variance:

$MSPE = \frac{SSPE}{n - c}$.

The lack of fit component is simply the difference between the overall error, SSE, and the pure error component, SSPE:

$SSLF = SSE - SSPE$, or $Y_{ij} - \hat{Y}_{ij} = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \hat{Y}_{ij})$.

SSLF can be expressed directly as:

$SSLF = \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2$,

with c - 2 degrees of freedom. Here $\hat{Y}_j$ is the fitted value at the j-th level of X, so $\hat{Y}_{ij} = \hat{Y}_j$ for every observation at that level. The lack of fit aspect can be seen in the difference $\bar{Y}_j - \hat{Y}_j$. If the difference is small, then you can conclude that the regression model is a better fit than if the difference is large. With SSLF and its degrees of freedom, we can calculate the mean square for the lack of fit:

$MSLF = \frac{SSLF}{c - 2}$.

Now that we have expressions for two mean squares, we can construct an F-statistic for our lack of fit test:

$F = \frac{MSLF}{MSPE}$.

We can carry out the test as an ANOVA in the usual fashion.

Note: KNNL present the lack of fit test in the context of the general linear test. They present the full model and the reduced model, which provide the appropriate sums of squares to construct the F-statistic for the lack of fit test.
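The step from the residual decomposition to SSE = SSPE + SSLF can be written out as a short derivation. It uses only the notation already introduced, together with the fact that the fitted value is the same for every observation at a given level of X, so $\hat{Y}_{ij} = \hat{Y}_j$. The cross-product term vanishes because deviations from a level mean sum to zero within that level.

```latex
\begin{aligned}
SSE &= \sum_{j=1}^{c}\sum_{i=1}^{n_j} \left(Y_{ij} - \hat{Y}_j\right)^2
     = \sum_{j=1}^{c}\sum_{i=1}^{n_j} \left[(Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \hat{Y}_j)\right]^2 \\
    &= \sum_{j=1}^{c}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2
     + \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2
     + 2\sum_{j=1}^{c} (\bar{Y}_j - \hat{Y}_j) \underbrace{\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)}_{=\,0} \\
    &= SSPE + SSLF .
\end{aligned}
```

The degrees of freedom split the same way: (n - 2) = (n - c) + (c - 2).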

Example: Problem 3.15 on page 150.

In this example, a chemist measured the concentration of a solution (Y) over time (X) (n = 15 solutions). The n = 15 solutions were randomly divided into five sets of three, and the five sets were measured after 1, 3, 5, 7, and 9 hours, respectively. Here are the data:

Obs   X   Y
1     9   0.07
2     9   0.09
3     9   0.08
4     7   0.16
5     7   0.17
6     7   0.21
7     5   0.49
8     5   0.58
9     5   0.53
10    3   1.22
11    3   1.15
12    3   1.07
13    1   2.84
14    1   2.57
15    1   3.10

Hypotheses

$H_0: E(Y) = \beta_0 + \beta_1 X$
$H_a: E(Y) \ne \beta_0 + \beta_1 X$
$\alpha = 0.05$

Decision Rule

If $F \ge F_{1-\alpha,\, c-2,\, n-c} = F_{.95,\, 3,\, 10} = 3.71$, then reject $H_0$ and conclude that the regression function does not adequately fit the data (i.e., a significant lack of fit exists).

Results

Regression equation: $\hat{Y} = 2.5753 - 0.324 X$. Since F = 58.75 >> 3.71 (p << 0.0001), reject $H_0$ and conclude there is a lack of fit.
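The arithmetic behind this example can be reproduced with a short script. The sketch below fits the line by least squares, groups the responses by level of X to get the pure error, and forms the F-statistic; the variable names are my own, and the critical value 3.71 is simply copied from the decision rule rather than computed. Carried at full precision the ratio comes out near 58.6; the 58.75 in the ANOVA table corresponds to dividing the rounded mean squares, 0.9224 / 0.0157.

```python
from collections import defaultdict

# Solution concentration data from the example (X = hours, Y = concentration).
x = [9, 9, 9, 7, 7, 7, 5, 5, 5, 3, 3, 3, 1, 1, 1]
y = [0.07, 0.09, 0.08, 0.16, 0.17, 0.21, 0.49, 0.58, 0.53,
     1.22, 1.15, 1.07, 2.84, 2.57, 3.10]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares fit of the simple linear regression model.
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Overall error sum of squares.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Pure error: variation of replicates around their own level mean.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
c = len(groups)
sspe = 0.0
for ys in groups.values():
    level_mean = sum(ys) / len(ys)
    sspe += sum((yi - level_mean) ** 2 for yi in ys)

# Lack of fit is what remains of SSE after removing pure error.
sslf = sse - sspe
mslf = sslf / (c - 2)
mspe = sspe / (n - c)
f_stat = mslf / mspe

F_CRITICAL = 3.71  # F(.95; 3, 10), taken from the decision rule above
reject_h0 = f_stat >= F_CRITICAL

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSE = {sse:.4f}, SSPE = {sspe:.4f}, SSLF = {sslf:.4f}")
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```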

ANOVA table

Source          df            SS        MS        F        P-value
Regression      1             12.597    12.597
Error           n - 2 = 13    2.9246    0.225
  Lack of Fit   c - 2 = 3     2.767     0.9224    58.75    < 0.0001
  Pure Error    n - c = 10    0.1574    0.0157
Total           n - 1 = 14    15.5218

Remedial Measures

If the SLR model does not fit the data, then you have two choices:

1. Find a new model form; i.e., pick a nonlinear model.
2. Transform the data.

Data Transformations

The following transformations on X will often linearize a nonlinear relationship. If they do not work adequately, then you should use a nonlinear regression model, which we will discuss later in the class.

$X' = X^2$
$X' = \sqrt{X}$
$X' = \log X$ or $\ln X$
$X' = 1/X$

The following transformations of Y (similar to the ones above for X) will often stabilize the variance and/or fix non-normality in the errors. Often, these transformations on Y will fix non-normality and stabilize the variance simultaneously.

$Y' = \sqrt{Y}$ (Y ~ Poisson)
$Y' = \arcsin \sqrt{Y}$ (Y ~ Binomial)
$Y' = 1/Y$
$Y' = \log Y$ or $\ln Y$ (Y ~ Lognormal)

You can also use weighted regression to stabilize the variance. We will discuss this topic later in Chapter 11.

Also, KNNL mention Box-Cox transformations (page 134) as remedial measures for unequal variances, non-normality, etc. Box-Cox transformations are useful when you don't know exactly which transformation on Y to use. The Box-Cox procedure uses maximum likelihood estimation to identify the appropriate transformation on Y from a family of power functions. They also present an approximation technique that does not require you to maximize the likelihood function, since many statistical analysis packages typically do not allow you to use Box-Cox transformations (see page 136).
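One way to carry out a Box-Cox search without specialized software is a grid search: for each candidate power λ, transform Y, regress the transformed response on X, and keep the λ with the smallest SSE. So that SSE values are comparable across λ, the sketch below uses the usual standardized form $W = K_1 (Y^\lambda - 1)$ for λ ≠ 0 and $W = K_2 \ln Y$ for λ = 0, where $K_2$ is the geometric mean of Y and $K_1 = 1 / (\lambda K_2^{\lambda - 1})$. The function names and the coarse grid are my own choices, and the data are the solution concentration values from the lack of fit example; treat this as an illustration rather than the exact procedure in KNNL.

```python
import math

# Solution concentration data from the lack of fit example.
x = [9, 9, 9, 7, 7, 7, 5, 5, 5, 3, 3, 3, 1, 1, 1]
y = [0.07, 0.09, 0.08, 0.16, 0.17, 0.21, 0.49, 0.58, 0.53,
     1.22, 1.15, 1.07, 2.84, 2.57, 3.10]

def sse_linear(xs, ws):
    """SSE from a least-squares straight-line fit of ws on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    wbar = sum(ws) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxw = sum((xi - xbar) * (wi - wbar) for xi, wi in zip(xs, ws))
    b1 = sxw / sxx
    b0 = wbar - b1 * xbar
    return sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(xs, ws))

def boxcox_standardized(ys, lam):
    """Standardized Box-Cox transform (requires all ys > 0)."""
    n = len(ys)
    k2 = math.exp(sum(math.log(v) for v in ys) / n)  # geometric mean
    if lam == 0:
        return [k2 * math.log(v) for v in ys]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (v ** lam - 1) for v in ys]

# Coarse, hypothetical grid of candidate powers.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
sse_by_lambda = {lam: sse_linear(x, boxcox_standardized(y, lam)) for lam in grid}
best_lambda = min(sse_by_lambda, key=sse_by_lambda.get)

for lam in grid:
    print(f"lambda = {lam:+.1f}  SSE = {sse_by_lambda[lam]:.4f}")
print("best lambda on this grid:", best_lambda)
```

On this coarse grid the smallest SSE occurs at λ = 0, which points toward a log transformation of Y for these data; a finer grid around that value would refine the choice. At λ = 1 the standardized transform is just Y - 1, so its SSE matches the untransformed regression.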