Weighted least squares


Patrick Breheny | February 7 | BST 760: Advanced Regression

Introduction

As a precursor to fitting generalized linear models, let's first deal with the case of a normally distributed outcome with unequal variances. Consider the data set statin, which contains (simulated) records on diabetic patients, collected from 130 practices (hospitals, clinics, etc.) in Pennsylvania. The outcome variable is the average LDL cholesterol level of the diabetic patients at each practice, and the explanatory variable is the percent of diabetic patients at the practice who are on statin drugs. Only practice-level (i.e., not individual-level) data were available.

Ordinary least squares fit

When we fit a simple linear regression model, we see that the correlation between Statin and LDL is 0.1, and is not significant (p = 0.27). However, this model treats each practice with equal weight, despite the fact that some practices (such as Philadelphia hospitals) have nearly 2,500 patients while others (small rural clinics) have only 11 diabetic patients. Clearly, the results for the large practices are less variable (i.e., contain more information); it therefore stands to reason that they should receive greater weight in the model.

Weighted least squares

Recall that the negative log-likelihood of the model under the assumption of normality is proportional to

    \sum_i \frac{(y_i - \mu_i)^2}{2\sigma_i^2};

note that we are no longer assuming equal variance. In particular, since the outcome in this case is an average, it is reasonable to assume \sigma_i^2 = \sigma^2 / n_i, where n_i is the number of patients in clinic i. Thus, the maximum likelihood estimate will minimize the weighted residual sum of squares,

    \sum_i \frac{(y_i - \mu_i)^2}{2\sigma^2 / n_i} \propto \sum_i w_i (y_i - \mu_i)^2,

where w_i = n_i.

Solving for the WLS estimate

In matrix form, we need to minimize (y - X\beta)^T W (y - X\beta), where W is a diagonal matrix of weights. Taking the derivative with respect to \beta, we find that the solution is

    \hat\beta = (X^T W X)^{-1} X^T W y.

Note that, for the case where n_i = 1 for all i, W = I and our result reduces to the ordinary least squares estimate.
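As a sanity check, the closed-form solution above can be compared against R's built-in weighted fit. The sketch below uses simulated data; the sample size, weights, and data-generating model are illustrative assumptions, not the statin data set.

```r
# Illustrative check: beta-hat = (X'WX)^{-1} X'Wy matches lm() with weights
# (simulated data; not the statin data set)
set.seed(1)
n  <- 50
ni <- sample(10:2500, n, replace = TRUE)          # "patients per practice"
x  <- runif(n, 0, 100)
y  <- 2 - 0.02 * x + rnorm(n, sd = 1 / sqrt(ni))  # Var(y_i) = sigma^2 / n_i
w  <- ni                                          # w_i = n_i

X <- cbind(1, x)                                  # design matrix
W <- diag(w)                                      # diagonal weight matrix
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

fit <- lm(y ~ x, weights = w)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))  # TRUE
```

Note that `solve(A, b)` solves the linear system directly, which is numerically preferable to forming the inverse (X^T W X)^{-1} explicitly.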

Sampling distribution of \hat\beta

Note that \hat\beta is once again a linear combination of \{y_i\}, and is therefore normally distributed; recalling that Var(y) = \sigma^2 W^{-1},

    \hat\beta \sim N(\beta, \sigma^2 (X^T W X)^{-1}).

The subsequent derivation of confidence intervals and tests is similar to ordinary linear regression. This model can be fit in R using the weights= option, or in SAS with a WEIGHT statement.
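The variance formula can likewise be checked numerically: the standard errors reported by a weighted lm fit equal the square roots of the diagonal of \hat\sigma^2 (X^T W X)^{-1}. Again, the data below are an illustrative simulation, not the statin data.

```r
# Illustrative check of Var(beta-hat) = sigma^2 (X'WX)^{-1}
# (simulated data; not the statin data set)
set.seed(1)
n  <- 50
ni <- sample(10:2500, n, replace = TRUE)
x  <- runif(n, 0, 100)
y  <- 2 - 0.02 * x + rnorm(n, sd = 1 / sqrt(ni))
w  <- ni

fit <- lm(y ~ x, weights = w)

X <- cbind(1, x)
W <- diag(w)
sigma2_hat <- sum(w * residuals(fit)^2) / fit$df.residual  # weighted MSE
V  <- sigma2_hat * solve(t(X) %*% W %*% X)
se <- sqrt(diag(V))

all.equal(as.numeric(se),
          as.numeric(summary(fit)$coefficients[, "Std. Error"]))  # TRUE
```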

Results

Estimated effect of Statin:

                  Estimate    SE      t      p
    \beta_Statin   -0.19     0.07   -2.78   0.01

Taking the heteroskedasticity into account, we now have a highly significant test and a more accurate estimate (the data were simulated with a true \beta_Statin = -0.2). Additionally, the (weighted) correlation increased from 0.1 to 0.24.

Introduction

The preceding approach (known as weighted least squares, or WLS) easily handles unequal variances for problems in which the relative variances of the outcomes are known (note that it was not necessary to know the actual variances, just their relative ratios). However, what about situations where the variance seems to increase with the mean? Here, we don't know the means ahead of time (otherwise we wouldn't need to fit a model), and therefore we don't know the appropriate weights, either.

Hand speed data

For example, consider a study in which the investigator measured hand speed (the time it takes to remove a bolt from an S-shaped iron rod) for a number of subjects of various ages.

[Figure: scatterplot of Time (roughly 1 to 7) versus Age (roughly 20 to 80)]

Remarks

There are a number of interesting things going on in this data set:
- Older individuals take longer to complete the task than younger ones
- However, the trend does not seem to be linear
- As mean hand speed goes up, so does variability

Iterative reweighting

Roughly speaking, it seems that SD \propto \mu, so Var(y) \propto \mu^2. Of course, we don't know \mu. Once we fit the model, however, we have estimates for \{\hat\mu_i\} and therefore for the weights \{\hat w_i\}, where \hat w_i = 1 / \hat\mu_i^2. However, once we change \{w_i\}, we change the fit, and thus we change \{\hat\mu_i\}, which changes \{w_i\}, and so on...

Iterative reweighting (cont'd)

Consider, then, the following approach to model fitting:
(1) Fit the model, obtaining \hat\beta and \hat\mu
(2) Use \hat\mu to recalculate w
(3) Repeat steps (1) and (2) until the model stops changing (i.e., until convergence)

This approach is known as the iteratively reweighted least squares (IRLS) algorithm.

An iterative loop in R

One way to implement this algorithm is with

    fit.w <- lm(Time ~ Age, Data)
    for (i in 1:20) {
      w <- 1 / fit.w$fitted.values^2
      fit.w <- lm(Time ~ Age, Data, weights = w)
    }

This implementation assumes that 20 iterations are enough to achieve convergence. This is fine for this case, but in general it is better to write a repeat or while loop that checks for convergence at each iteration and terminates the loop when convergence is reached.
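A convergence-checked version of that loop might look as follows. The data frame here is a simulated stand-in (the real hand-speed data are not reproduced on these slides), and the tolerance and iteration cap are arbitrary illustrative choices.

```r
# IRLS with an explicit convergence check (simulated stand-in data,
# generated so that SD is roughly proportional to the mean)
set.seed(2)
Data <- data.frame(Age = runif(100, 20, 80))
Data$Time <- with(Data, (1 + 0.03 * Age) * exp(rnorm(100, sd = 0.2)))

fit.w <- lm(Time ~ Age, Data)                 # initial (OLS) fit
for (it in 1:100) {                           # cap iterations as a safeguard
  beta_old <- coef(fit.w)
  w <- 1 / fit.w$fitted.values^2              # Var(y_i) proportional to mu_i^2
  fit.w <- lm(Time ~ Age, Data, weights = w)
  if (max(abs(coef(fit.w) - beta_old)) < 1e-8) break   # converged
}
```

The coefficient change is a simple convergence criterion; the relative change in the weighted residual sum of squares is another common choice.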

OLS vs. WLS

[Figure: scatterplot of Time versus Age with the OLS and WLS regression lines]

                Estimate   Std. Error   t value      p
    OLS \beta_Age  0.034     0.0018      18.7     < 0.0001
    WLS \beta_Age  0.029     0.0013      21.9     < 0.0001

OLS vs. WLS (cont'd)

Note that by taking heteroskedasticity into account, the slope is lowered somewhat, as the regression line is less influenced by the highly variable points on the right. However, by modeling the distribution of the outcome more accurately, the standard error of our estimate is reduced from 0.0018 to 0.0013. Consequently, we even obtain a larger test statistic, despite the fact that the estimate is closer to zero. Furthermore, R^2 increases from 0.41 to 0.49.

An inappropriate variance model

Note that this only improves our results if we model the variance accurately. Suppose we were instead to iteratively reweight according to \hat w_i = \hat\mu_i^2, giving the most variable observations the greatest weight. Fitting this model, we more than double our standard error and decrease R^2 from 0.41 to 0.30:

              Estimate   Std. Error   t value      p
    \beta_Age   0.065     0.0044       14.7     < 0.0001

Quadratic model

Finally, we might also consider a quadratic fit:

[Figure: scatterplot of Time versus Age with quadratic OLS and WLS fits]

The fits are actually very similar in this case, although once again we achieve reductions in standard error and an improvement in R^2, from 0.49 to 0.53.