Section 6: Model Selection, Logistic Regression and more...




Carlos M. Carvalho
The University of Texas McCombs School of Business
http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/

Model Building Process

When building a regression model, remember that simplicity is your friend: smaller models are easier to interpret and have fewer unknown parameters to be estimated. Keep in mind that every additional parameter represents a cost!

The first step of every model building exercise is the selection of the universe of variables to be potentially used. This task is solved entirely through your experience and context-specific knowledge...

- Think carefully about the problem.
- Consult subject matter research and experts.
- Avoid the mistake of selecting too many variables.

Model Building Process

With a universe of variables in hand, the goal now is to select the model. Why not just include all the variables? Big models tend to over-fit and find features that are specific to the data in hand, i.e., not generalizable relationships. The result is bad predictions and bad science! In addition, bigger models have more parameters and potentially more uncertainty about everything we are trying to learn... (check the beer and weight example!)

We need a strategy to build a model in a way that accounts for the trade-off between fitting the data and the uncertainty associated with the model.

Out-of-Sample Prediction

One idea is to focus on the model's ability to predict... How do we evaluate a forecasting model? Make predictions! Basic idea: we want to use the model to forecast outcomes for observations we have not seen before. Use the data to create a prediction problem, and see how our candidate models perform. We'll use most of the data for training the model, and the left-over part for validating the model.

Out-of-Sample Prediction

In a cross-validation scheme, you fit a bunch of models to most of the data (the training sample) and choose the model that performed best on the rest (the left-out sample).

- Fit the model on the training data.
- Use the model to predict Ŷ_j values for all of the N_LO left-out data points.
- Calculate the mean squared error for these predictions:

    MSE = (1/N_LO) * sum_{j=1}^{N_LO} (Y_j - Ŷ_j)^2
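A minimal sketch of this train/validate split in R, assuming a hypothetical data frame df with a response Y and a predictor X (the names are placeholders, not the course's actual data):

    # Minimal sketch of the train/validate idea; `df`, `Y`, and `X` are
    # hypothetical names standing in for your data.
    set.seed(1)
    n <- nrow(df)
    train <- sample(1:n, size = floor(0.75 * n))   # ~75% of rows for training

    fit <- lm(Y ~ X, data = df[train, ])           # fit on the training sample

    Yhat <- predict(fit, newdata = df[-train, ])   # predict the left-out points
    MSE <- mean((df$Y[-train] - Yhat)^2)           # out-of-sample mean squared error
    MSE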

Example

To illustrate the potential problems of over-fitting the data, let's look again at the Telemarketing example, this time with multiple polynomial terms...

[Scatter plot of calls (roughly 15 to 40) against months of employment (roughly 10 to 35)]

Example

Let's evaluate the fit of each model by its R² (on the training data).

[Plot: training-data R² (about 0.775 to 0.779) versus polynomial order 2 through 10]

Example

How about the MSE (on the left-out data)?

[Plot: left-out-data RMSE (about 2.155 to 2.175) versus polynomial order 2 through 10]
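A sketch of how these two comparisons could be produced, assuming a hypothetical data frame telemkt with columns calls and months (object and column names are assumptions, not the course's actual code):

    # Compare in-sample R^2 and out-of-sample RMSE across polynomial orders.
    set.seed(1)
    n <- nrow(telemkt)
    train <- sample(1:n, floor(0.75 * n))

    orders <- 2:10
    rsq  <- numeric(length(orders))
    rmse <- numeric(length(orders))
    for (k in seq_along(orders)) {
      fit <- lm(calls ~ poly(months, orders[k]), data = telemkt[train, ])
      rsq[k]  <- summary(fit)$r.squared                        # training-data fit
      yhat    <- predict(fit, newdata = telemkt[-train, ])
      rmse[k] <- sqrt(mean((telemkt$calls[-train] - yhat)^2))  # left-out data error
    }

    plot(orders, rsq,  type = "b", xlab = "Polynomial Order", ylab = "R-squared")
    plot(orders, rmse, type = "b", xlab = "Polynomial Order", ylab = "RMSE")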

BIC for Model Selection

Another way to evaluate a model is to use an information criterion: a metric that attempts to quantify how well our model would have predicted the data (regardless of what you've estimated for the β_j's). A good alternative is the BIC, the Bayes Information Criterion, which is based on a Bayesian philosophy of statistics:

    BIC = n log(s²) + p log(n)

You want to choose the model that leads to the minimum BIC.

BIC for Model Selection

One (very!) nice thing about the BIC is that you can interpret it in terms of model probabilities. Given a list of possible models {M_1, M_2, ..., M_R}, the probability that model i is correct is

    P(M_i) ≈ exp(-BIC(M_i)/2) / sum_{r=1}^{R} exp(-BIC(M_r)/2)
           = exp(-[BIC(M_i) - BIC_min]/2) / sum_{r=1}^{R} exp(-[BIC(M_r) - BIC_min]/2)

(Subtract BIC_min = min{BIC(M_1), ..., BIC(M_R)} from each BIC for numerical stability.)
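A small R sketch of this calculation, assuming fits is a hypothetical list of candidate models all fitted to the same n observations:

    # Turn BIC values into approximate model probabilities.
    bic <- sapply(fits, function(f) extractAIC(f, k = log(n))[2])  # BIC of each model

    d <- bic - min(bic)                  # subtract BIC_min for numerical stability
    probs <- exp(-0.5 * d) / sum(exp(-0.5 * d))
    round(probs, 3)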

BIC for Model Selection

Thus BIC is an alternative to testing for comparing models:

- It is easy to calculate.
- You are able to evaluate model probabilities.
- There are no multiple-testing-type worries.
- It generally leads to simpler models than F-tests.

As with testing, you need to narrow down your options before comparing models. What if there are too many possibilities?

Stepwise Regression

One computational approach to building a regression model step by step is stepwise regression. There are three options (a sketch using R's step() function follows below):

- Forward: adds one variable at a time until no remaining variable makes a significant contribution (or meets a chosen criterion... which could be out-of-sample prediction).
- Backward: starts with all possible variables and removes one at a time until further deletions would do more harm than good.
- Stepwise: just like the forward procedure, but allows for deletions at each step.
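All three options can be run with R's built-in step() function. A minimal sketch, assuming a hypothetical data frame df with response Y and a pool of candidate predictors:

    # Stepwise search with step(); k = log(n) makes the criterion BIC rather than AIC.
    n <- nrow(df)
    null <- lm(Y ~ 1, data = df)    # intercept only
    full <- lm(Y ~ ., data = df)    # every candidate variable

    fwd  <- step(null, scope = formula(full), direction = "forward",  k = log(n))
    bwd  <- step(full, direction = "backward", k = log(n))
    both <- step(null, scope = formula(full), direction = "both",     k = log(n))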

LASSO

The LASSO is a shrinkage method that performs automatic variable selection. Yet another alternative... it has similar properties to stepwise regression but is more automatic: R does it for you! The LASSO solves the following problem:

    arg min_β { sum_{i=1}^{N} (Y_i - X_i'β)² + λ sum_j |β_j| }

- Coefficients can be set exactly to zero (automatic model selection).
- Very efficient computational methods exist.
- λ is often chosen via CV.
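One common way to fit the LASSO in R is the glmnet package (the package is not named in the original slides); a minimal sketch, assuming X is a numeric predictor matrix and Y the response vector:

    # LASSO fit with lambda chosen by cross-validation.
    library(glmnet)

    cvfit <- cv.glmnet(X, Y, alpha = 1)     # alpha = 1 is the LASSO penalty
    plot(cvfit)                             # CV error as a function of log(lambda)

    coef(cvfit, s = "lambda.min")           # coefficients at the CV-chosen lambda;
                                            # many are set exactly to zero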

One informal but very useful idea to put it all together... I like to build models from the bottom, up:

- Set aside a set of points to be your validation set (if the dataset is large enough).
- Working on the training data, add one variable at a time, deciding which one to add based on some criterion:
  1. largest increase in R² while significant,
  2. largest reduction in MSE while significant,
  3. BIC, etc.
- At every step, carefully analyze the output and check the residuals!
- Stop when no additional variable produces a significant improvement.
- Always make sure you understand what the model is doing in the specific context of your problem.

Binary Response Data

Let's now look at data where the response Y is a binary variable (taking the value 0 or 1): win or lose, sick or healthy, buy or not buy, pay or default, thumbs up or down. The goal is generally to predict the probability that Y = 1, and you can then do classification based on this estimate.

Binary Response Data

Y is an indicator: Y = 0 or 1. The conditional mean is thus

    E[Y | X] = p(Y = 1 | X) · 1 + p(Y = 0 | X) · 0 = p(Y = 1 | X)

The mean function is a probability, so we need a model that gives mean/probability values between 0 and 1. We'll use a transform function that takes the right-hand side of the model (x'β) and gives back a value between zero and one.

Binary Response Data

The binary choice model is

    p(Y = 1 | X_1 ... X_d) = S(β_0 + β_1 X_1 + ... + β_d X_d)

where S is a function that increases in value from zero to one.

Binary Response Data

There are two main functions used for this:

- Logistic regression: S(z) = e^z / (1 + e^z).
- Probit regression: S(z) = pnorm(z), the standard normal CDF.

Both functions are S-shaped and take values in (0, 1). Probit is used by economists, logit by biologists, and the rest of us are fairly indifferent: they result in practically the same fit.
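A quick base-R sketch comparing the two link functions (nothing here is specific to the course data):

    # Logit and probit transforms are both S-shaped and nearly identical in practice.
    z <- seq(-4, 4, length.out = 200)

    logit_p  <- exp(z) / (1 + exp(z))   # same as plogis(z)
    probit_p <- pnorm(z)                # standard normal CDF

    matplot(z, cbind(logit_p, probit_p), type = "l", lty = 1:2, col = 1, ylab = "S(z)")
    legend("topleft", legend = c("logit", "probit"), lty = 1:2)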

Logistic Regression

We'll use logistic regression, such that

    p(Y = 1 | X_1 ... X_d) = exp[β_0 + β_1 X_1 + ... + β_d X_d] / (1 + exp[β_0 + β_1 X_1 + ... + β_d X_d])

The logit link is more common, and it's the default in R. These models are easy to fit in R:

    glm(Y ~ X1 + X2, family=binomial)

The g stands for generalized, and family=binomial indicates that Y = 0 or 1. Otherwise, generalized linear models use the same syntax as lm().

Logistic Regression

What is happening here? Instead of least squares, glm is maximizing the product of probabilities:

    prod_{i=1}^{n} P(Y_i | x_i) = prod_{i=1}^{n} ( exp[x_i'b] / (1 + exp[x_i'b]) )^{Y_i} · ( 1 / (1 + exp[x_i'b]) )^{1 - Y_i}

This maximizes the likelihood of our data (which is also what least squares did).
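To see that this is what glm() is doing, here is a sketch that maximizes the same likelihood directly with optim() on simulated data (all numbers below are made up purely for illustration):

    # Check that maximizing the binomial likelihood reproduces glm's estimates.
    set.seed(1)
    n <- 500
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(-0.5 + 1.0 * x))      # true coefficients: -0.5 and 1.0

    negloglik <- function(b) {
      p <- plogis(b[1] + b[2] * x)                 # exp(xb) / (1 + exp(xb))
      -sum(y * log(p) + (1 - y) * log(1 - p))      # minus log of the product above
    }

    optim(c(0, 0), negloglik)$par                  # direct maximum likelihood
    coef(glm(y ~ x, family = binomial))            # essentially the same answer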

Logistic Regression

The important things are basically the same as before:

- Individual parameter p-values are interpreted as always.
- extractAIC(reg, k=log(n)) will get your BICs.
- The predict function works as before, but you need to add type="response" to get p̂_i = exp[x'b]/(1 + exp[x'b]) (otherwise it just returns the linear function x'β).

Unfortunately, techniques for residual diagnostics and model checking are different (but we'll not worry about that today). Also, without sums of squares there are no R², anova, or F-tests!
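A short sketch of these pieces, assuming fit is a fitted logistic regression and df the data frame it was fit to (hypothetical names):

    # BIC via extractAIC, and fitted probabilities via predict(..., type="response").
    extractAIC(fit, k = log(nrow(df)))                      # returns (edf, BIC)

    phat <- predict(fit, newdata = df, type = "response")   # exp(x'b) / (1 + exp(x'b))
    xb   <- predict(fit, newdata = df)                      # default: the linear part x'b
    head(cbind(xb, phat))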

Example: Basketball Spreads

NBA basketball point spreads: we have Las Vegas betting point spreads for 553 NBA games and the resulting scores. We can use logistic regression of the game outcome onto the spread to predict the probability of the favored team winning.

- Response: favwin = 1 if the favored team wins.
- Covariate: spread is the Vegas point spread.

[Figures: histogram of the point spread split by favwin = 1 / favwin = 0, and a plot of favwin (0/1) against spread (0 to 40)]

Example: Basketball Spreads

This is a weird situation where we assume there is no intercept. There is considerable evidence that betting odds are efficient, and a spread of zero implies p(win) = 0.5 for each team. Thus

    p(win) = exp[β_0] / (1 + exp[β_0]) = 1/2  implies  β_0 = 0

The model we want to fit is thus

    p(favwin | spread) = exp[β · spread] / (1 + exp[β · spread])

Example: Basketball Spreads

    summary(nbareg <- glm(favwin ~ spread - 1, family=binomial))

Some things are different (z instead of t) and some are missing (F, R²).

Example: Basketball Spreads

The fitted model is

    p(favwin | spread) = exp[0.156 · spread] / (1 + exp[0.156 · spread])

[Plot: fitted P(favwin) rising from 0.5 toward 1.0 as spread goes from 0 to 30]

Example: Basketball Spreads

We could consider other models... and compare them with BIC!

Our efficient-Vegas model:
    > extractAIC(nbareg, k=log(553))
    1.000 534.287

A model that includes a non-zero intercept:
    > extractAIC(glm(favwin ~ spread, family=binomial), k=log(553))
    2.0000 540.4333

What if we throw in home-court advantage?
    > extractAIC(glm(favwin ~ spread + favhome, family=binomial), k=log(553))
    3.0000 545.6371

The simplest model is best. (The model probabilities are roughly 19/20, 1/20, and zero.)

Example: Basketball Spreads

Let's use our model to predict the results of two games.

Portland vs. Golden State, where the spread is PRT by 8:

    p(PRT win) = exp[0.156 · 8] / (1 + exp[0.156 · 8]) = 0.78

Chicago vs. Orlando, where the spread is ORL by 4:

    p(CHI win) = 1 / (1 + exp[0.156 · 4]) = 0.35
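These two numbers are easy to reproduce in R with the logistic CDF plogis():

    # Predictions from the fitted no-intercept model: p(favwin) = plogis(0.156 * spread).
    b <- 0.156

    plogis(b * 8)       # Portland favored by 8: about 0.78
    1 - plogis(b * 4)   # Chicago is the 4-point underdog: about 0.35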

Example: Credit Scoring

A common business application of logistic regression is evaluating the credit quality of (potential) debtors:

- Take a list of borrower characteristics.
- Build a prediction rule for their credit.
- Use this rule to automatically evaluate applicants (and track your risk profile).

You can do all of this with logistic regression, and then use the predicted probabilities to build a classification rule.

Example: Credit Scoring

We have data on 1000 loan applicants at German community banks, along with a judgement of each loan outcome (good or bad). The data include 20 borrower characteristics, including:

- credit history (5 categories),
- housing (rent, own, or free),
- the loan purpose and duration,
- the installment rate as a percent of income.

Example: Credit Scoring

We can use forward stepwise regression to build a model:

    null <- glm(Y ~ history3, family=binomial, data=credit[train,])
    full <- glm(Y ~ ., family=binomial, data=credit[train,])
    reg <- step(null, scope=formula(full), direction="forward", k=log(n))
    ...
    Step: AIC=882.94
    Y[train] ~ history3 + checkingstatus1 + duration2 + installment8

The null model has credit history as a variable, since I'd include this regardless, and we've left out 200 points for validation.

Classification

A common goal with logistic regression is to classify the inputs according to their predicted response probabilities. For example, we might want to classify the German borrowers as having good or bad credit (i.e., do we loan to them?). A simple classification rule is to say that anyone with p(good | x) > 0.5 can get a loan, and the rest do not; a small sketch of this rule follows below.
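A minimal sketch of the 0.5 cut-off rule, assuming phat holds fitted probabilities p(good | x) and y the observed outcomes for the same borrowers (hypothetical objects):

    # Cut the fitted probabilities at 0.5 to turn them into loan decisions.
    decision <- ifelse(phat > 0.5, "loan", "no loan")
    table(decision, y)    # cross-tabulate decisions against observed outcomes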

Example: Credit Scoring

Let's use the validation set to compare this model and the full model:

    > full <- glm(formula(terms(Y[train] ~ ., data=covars)), data=covars[train,], family=binomial)
    > predreg <- predict(reg, newdata=covars[-train,], type="response")
    > predfull <- predict(full, newdata=covars[-train,], type="response")
    > # 1 = false negative, -1 = false positive
    > errorreg <- Y[-train] - (predreg >= .5)
    > errorfull <- Y[-train] - (predfull >= .5)
    > # misclassification rates:
    > mean(abs(errorreg))
    0.220
    > mean(abs(errorfull))
    0.265

Our model classifies borrowers correctly 78% of the time.