

Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables - logistic regression

Response variables assume only two values, say Y_j = 1 or Y_j = 0, called success and failure (spam detection, credit scoring, contracting an infection, ...).

Model: E Y_j = pr(Y_j = 1) = π(x_j) ∈ [0, 1], with

π(x) = p(x, b) = ψ(m(x, b)), where m(x, b) = b_1 + Σ_{k=2}^d b_k f_k(x)

as in linear regression, and ψ(u) is the logistic function:

ψ(u) = e^u / (1 + e^u) = 1 / (1 + e^(-u))
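A minimal Python sketch of these definitions (the slides themselves use MATLAB; the coefficient vector b and the single basis function f_2(x) = x below are made-up illustrations):

```python
import math

def psi(u):
    """Logistic function: maps any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def p(x, b, basis):
    """Success probability p(x, b) = psi(b_1 + sum_{k>=2} b_k f_k(x))."""
    m = b[0] + sum(bk * fk(x) for bk, fk in zip(b[1:], basis))
    return psi(m)

# Illustrative model with one basis function f_2(x) = x, i.e. m(x, b) = b_1 + b_2 x
b = [-15.0, 5.0]
basis = [lambda x: x]
print(p(3.0, b, basis))   # m(3, b) = 0, so psi(0) = 0.5
```

Since ψ maps R onto (0, 1), any linear predictor m(x, b) yields a valid probability.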

Prof. Dr. J. Franke All of Statistics 1.53 logistic function ψ(u): R → [0, 1]

Prof. Dr. J. Franke All of Statistics 1.54

Y_1, ..., Y_N independent Bernoulli (0-1) variables with parameters pr(Y_j = 1) = p(x_j, b).

Likelihood = probability of observing the data, as a function of the parameter:

L(b | Y_1, ..., Y_N) = Π_{j=1}^N p(x_j, b)^(Y_j) (1 - p(x_j, b))^(1 - Y_j)

Maximizing the log-likelihood

l(b | Y_1, ..., Y_N) = Σ_{j=1}^N ( Y_j log p(x_j, b) + (1 - Y_j) log(1 - p(x_j, b)) )

yields the maximum likelihood estimate b̂ of b.

Quite similar to non-Gaussian linear regression, e.g. b̂ approximately normal for large N, model selection using AIC, BIC or cross-validation, ...
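The maximization has no closed form, but the log-likelihood is concave, so even plain gradient ascent finds b̂. A self-contained Python sketch for the simple model pr(Y_j = 1) = ψ(b_1 + b_2 x_j) on simulated data (the true parameters (-2, 1), sample size, and learning rate are illustrative choices, not from the slides):

```python
import math, random

def psi(u):
    # numerically safe logistic function
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def log_likelihood(b, xs, ys):
    # l(b) = sum_j [ Y_j log p_j + (1 - Y_j) log(1 - p_j) ]
    ll = 0.0
    for x, y in zip(xs, ys):
        pj = psi(b[0] + b[1] * x)
        ll += y * math.log(pj) + (1 - y) * math.log(1 - pj)
    return ll

def fit(xs, ys, steps=3000, lr=0.001):
    # gradient ascent; grad l(b) = sum_j (Y_j - p_j) * (1, x_j)
    b = [0.0, 0.0]
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - psi(b[0] + b[1] * x)
            g0 += r
            g1 += r * x
        b[0] += lr * g0
        b[1] += lr * g1
    return b

random.seed(0)
true_b = (-2.0, 1.0)                 # illustrative true parameters
xs = [random.uniform(0, 4) for _ in range(500)]
ys = [1 if random.random() < psi(true_b[0] + true_b[1] * x) else 0 for x in xs]
b_hat = fit(xs, ys)
print(b_hat)                         # should land near (-2, 1) for large N
```

Standard routines (e.g. MATLAB's glmfit below) use Newton-type iterations (iteratively reweighted least squares) instead, which converge in far fewer steps.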

Prof. Dr. J. Franke All of Statistics 1.55 logistic regression: p(x) = ψ(5x - 15), x_j uniformly distributed

Prof. Dr. J. Franke All of Statistics 1.55 logistic regression, including estimated p(x)

Prof. Dr. J. Franke All of Statistics 1.56 logistic regression with overfitted estimate p̂(x) = ψ(Σ_{l=0}^3 b̂_{l+1} x^l)

Prof. Dr. J. Franke All of Statistics 1.57 Regression and classification

Classification problem: Given some object belonging to one of the classes C_1, ..., C_m, decide to which one, based on observed features ξ_1, ..., ξ_p.

Class indicator: Y = k if the object belongs to class C_k.

Class probabilities given the feature values: pr(Y = k | ξ_1, ..., ξ_p) = p_k(ξ_1, ..., ξ_p)

Bayes classifier: Ŷ = argmax_{k=1,...,m} p_k(ξ_1, ..., ξ_p) ∈ {1, ..., m}

Prof. Dr. J. Franke All of Statistics 1.58

p_k, k = 1, ..., m, are estimated from a training set (data) Y_j, ξ_{j1}, ..., ξ_{jp}, j = 1, ..., N, which are assumed to be independent.

For m = 2 classes, e.g. logistic regression may be used:

pr(Y_j = 1 | ξ_{j1}, ..., ξ_{jp}) = p_1(ξ_{j1}, ..., ξ_{jp}, b) = ψ(m(ξ_{j1}, ..., ξ_{jp}, b)),

where m(u_1, ..., u_p, b) = b_1 + Σ_{k=2}^d b_k f_k(u_1, ..., u_p), and

pr(Y_j = 2 | ξ_{j1}, ..., ξ_{jp}) = p_2(ξ_{j1}, ..., ξ_{jp}, b) = 1 - p_1(ξ_{j1}, ..., ξ_{jp}, b).

For general m: p_m = 1 - (p_1 + ... + p_{m-1}), and

pr(Y_j = l | ξ_{j1}, ..., ξ_{jp}) = p_l(ξ_{j1}, ..., ξ_{jp}, b) = ψ(m_l(ξ_{j1}, ..., ξ_{jp}, b(l)))
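A minimal Python sketch of the Bayes classifier Ŷ = argmax_k p_k(ξ); the three class-probability functions below are made-up placeholders standing in for estimates p̂_k obtained from a training set:

```python
def bayes_classify(p_funcs, xi):
    """Assign feature value(s) xi to the class k = 1..m with the highest probability."""
    probs = [pk(xi) for pk in p_funcs]
    return max(range(len(probs)), key=lambda k: probs[k]) + 1  # classes numbered 1..m

# Toy example: m = 3 classes, one feature, invented probability functions
p_funcs = [
    lambda x: 0.7 if x < 0 else 0.1,   # class 1 likely for negative x
    lambda x: 0.2,                     # class 2 constant
    lambda x: 0.1 if x < 0 else 0.7,   # class 3 likely for positive x
]
print(bayes_classify(p_funcs, -1.5))   # class 1
print(bayes_classify(p_funcs, 2.0))    # class 3
```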

Prof. Dr. J. Franke All of Statistics 1.59 Using qualitative information

In regression and classification, qualitative predictor variables or features appear.

Example (regression): Steel rods of various material characteristics ξ_1, ..., ξ_p (usually quantitative) and shape, e.g. quadratic, hexagonal, octagonal and circular cross section (qualitative). How does bending strength depend on, in particular, the shape?

Transform qualitative into quantitative variables using dummy variables, e.g. in the example:

quadratic: ξ_{p+1} = 0, ξ_{p+2} = 0
hexagonal: ξ_{p+1} = 0, ξ_{p+2} = 1
octagonal: ξ_{p+1} = 1, ξ_{p+2} = 0
circular:  ξ_{p+1} = 1, ξ_{p+2} = 1
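The dummy coding above can be sketched in a few lines of Python (the feature values in the example call are made up):

```python
# Two dummy variables encode the four cross-section shapes from the slide
DUMMIES = {
    "quadratic": (0, 0),
    "hexagonal": (0, 1),
    "octagonal": (1, 0),
    "circular":  (1, 1),
}

def add_shape_dummies(xi, shape):
    """Append the dummies (xi_{p+1}, xi_{p+2}) to the quantitative features xi."""
    return list(xi) + list(DUMMIES[shape])

print(add_shape_dummies([2.5, 0.7], "octagonal"))  # [2.5, 0.7, 1, 0]
```

Two 0/1 dummies suffice here because they take 4 distinct value pairs, one per shape category.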

Prof. Dr. J. Franke All of Statistics 1.60 Quick MATLAB regression

Linear regression: Y_j = Σ_{k=1}^d b_k f_k(ξ_j) + Z_j, design matrix X_{j,k} = f_k(ξ_j), j = 1, ..., N, k = 1, ..., d.

b = regress(Y, X): least squares estimate

[b, bint, r, rint, stats] = regress(Y, X): additionally returns the vector r of sample residuals and confidence intervals bint, rint for the coordinates of b and r (the latter for outlier detection); stats = (R^2, F-statistic, p-value, σ̂^2)

Logistic regression: pr(Y_j = 1) = ψ(b_1 + Σ_{k=2}^d b_k f_k(ξ_j)), design matrix X_{j,k} = f_k(ξ_j), j = 1, ..., N, k = 2, ..., d.

b = glmfit(X, Y, 'binomial'): ML estimate. glmfit also covers other generalized linear models.

Prof. Dr. J. Franke All of Statistics 2.1 Design of Experiments (Versuchsplanung)

Limited amount of time/money: sample size N fixed.

Given this constraint, can we increase the quality of an estimate or the power of a test by a clever choice of observations?

Example 1: Linear regression Y_j = m(x_j, b) + Z_j, j = 1, ..., N. How to choose x_1, ..., x_N such that the mean-squared estimation error of the least squares estimate b̂,

mse(b̂) = E ||b̂ - b||^2 = Σ_{k=1}^d var b̂_k

(due to unbiasedness of least squares), is minimal?

Prof. Dr. J. Franke All of Statistics 2.2 equidistant design on [0, 1]: x_j - x_{j-1} = 1/N

Prof. Dr. J. Franke All of Statistics 2.3 optimal design on [0,1]: x 1 =... = x 50 = 0, x 51 =... = x 100 = 1

Prof. Dr. J. Franke All of Statistics 2.4 least squares regression lines for both designs and true curve

Prof. Dr. J. Franke All of Statistics 2.5 other realization

Prof. Dr. J. Franke All of Statistics 2.6

Data generating mechanism: Y_j = 0.5 + 1 · x_j + Z_j, with Z_1, ..., Z_N i.i.d. N(0, 1), where the same Z_j have been chosen for both designs.

For general linear regression, always E b̂ = b, and the covariance matrix of b̂ is σ^2 (X^T X)^(-1), so

(1/σ^2) mse(b̂) = tr (X^T X)^(-1),

where tr = trace = sum of diagonal elements.

In the example:
equidistant: var b̂_1 = 0.04, var b̂_2 = 0.12, mse(b̂) = 0.16
optimal: var b̂_1 = 0.02, var b̂_2 = 0.04, mse(b̂) = 0.06
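These variances can be checked directly from tr (X^T X)^(-1). A Python sketch for the simple linear model with design columns (1, x), N = 100 and σ^2 = 1 (the midpoint grid (j - 0.5)/100 is one possible equidistant choice; the slides do not state the exact grid):

```python
def design_variances(xs):
    """Return (var b1, var b2, mse) for design points xs, assuming sigma^2 = 1."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx      # det(X^T X) for columns (1, x)
    var_b1 = sxx / det           # diagonal entries of (X^T X)^{-1}
    var_b2 = n / det
    return var_b1, var_b2, var_b1 + var_b2

equidistant = [(j - 0.5) / 100 for j in range(1, 101)]
optimal = [0.0] * 50 + [1.0] * 50

print([round(v, 2) for v in design_variances(equidistant)])  # [0.04, 0.12, 0.16]
print([round(v, 2) for v in design_variances(optimal)])      # [0.02, 0.04, 0.06]
```

Putting half the points at each end of [0, 1] maximizes det(X^T X), which is why the endpoint design beats the equidistant one here.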

Prof. Dr. J. Franke All of Statistics 2.7 optimal design

Prof. Dr. J. Franke All of Statistics 2.8 residual plot - no warning for model misspecification possible

Prof. Dr. J. Franke All of Statistics 2.9 equidistant design

Prof. Dr. J. Franke All of Statistics 2.10 residual plot

Prof. Dr. J. Franke All of Statistics 2.11 90% optimal design, 10% safeguard for model misspecification

Prof. Dr. J. Franke All of Statistics 2.12 residual plot

Prof. Dr. J. Franke All of Statistics 2.13 Example 2: Classification

2 classes C_0, C_1, class indicator Y_j ∈ {0, 1}, class probabilities depending on feature values:

pr(Y_j = 1 | ξ_{j1}, ..., ξ_{jp}) = p(ξ_{j1}, ..., ξ_{jp}) = p(ξ_j)
pr(Y_j = 0 | ξ_{j1}, ..., ξ_{jp}) = 1 - p(ξ_j)
p(ξ_j) = ψ(b_1 + Σ_{k=2}^d b_k f_k(ξ_j))

Bayes classification: Ŷ_j = 1 if p(ξ_j) > 1/2, and Ŷ_j = 0 else.

Goal: misclassification probability pr(Y_j ≠ Ŷ_j) small!

Frequently, one misclassification type is more important, e.g. pr(Y_j ≠ Ŷ_j | Y_j = 1) small!

Problem if most ξ_j lie in {z : p(z) ≤ 1/2}.
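One standard remedy when pr(Y_j ≠ Ŷ_j | Y_j = 1) matters most (a common practice, not spelled out on the slide) is to lower the classification threshold below 1/2. A Python sketch on simulated data mimicking the slides' example p(x) = ψ(5x - 15), with x uniform on [0, 4] as an assumed feature distribution; the numbers will not match the table on slide 2.15:

```python
import math, random

def psi(u):
    return 1.0 / (1.0 + math.exp(-u))

def error_rates(threshold, data):
    """Overall error rate and error rate given Y = 1 for: Y_hat = 1 iff p(x) > threshold."""
    errs = errs_pos = npos = 0
    for x, y in data:
        y_hat = 1 if psi(5 * x - 15) > threshold else 0  # classify with the true p(x)
        errs += (y_hat != y)
        if y == 1:
            npos += 1
            errs_pos += (y_hat != y)
    return errs / len(data), errs_pos / max(npos, 1)

random.seed(1)
data = []
for _ in range(10000):
    x = random.uniform(0, 4)     # most x fall where p(x) < 1/2: unbalanced classes
    y = 1 if random.random() < psi(5 * x - 15) else 0
    data.append((x, y))

for t in (0.5, 0.2):
    print(t, error_rates(t, data))
```

Lowering the threshold from 0.5 to 0.2 trades a slightly higher overall error for fewer missed Y_j = 1 cases, since more of the minority class gets classified as 1.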

Prof. Dr. J. Franke All of Statistics 2.14 logistic regression with overfitted estimate p̂(x) = ψ(Σ_{l=0}^3 b̂_{l+1} x^l)

Prof. Dr. J. Franke All of Statistics 2.15 Misclassification error probabilities

Classification rule applied to 100000 new (x_i, Y_i):

using p̂(x):      all: 0.011   only Y_j = 1: 0.366   only Y_j = 0: 0.001
using true p(x):  all: 0.009   only Y_j = 1: 0.222   only Y_j = 0: 0.003

The unbalanced design favours the majority class. Remedy: use a more balanced design or advanced techniques to improve the function estimate in particular regions.