Machine Learning Logistic Regression




Machine Learning: Logistic Regression
Jeff Howbert, Introduction to Machine Learning, Winter 2012

Logistic regression
The name is somewhat misleading: it is really a technique for classification, not regression. "Regression" comes from the fact that we fit a linear model to the feature space. It involves a more probabilistic view of classification.

Different ways of expressing probability
Consider a two-outcome probability space, where p(O1) = p and p(O2) = 1 - p = q. The probability of O1 can be expressed in three equivalent notations:

  notation               expression   range          midpoint (no preference)
  standard probability   p            0 to 1         0.5
  odds                   p / q        0 to +inf      1
  log odds (logit)       log(p / q)   -inf to +inf   0
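As a quick illustration (a Python sketch, not part of the original slides), the odds and log-odds notations can be computed directly from a standard probability:

```python
import math

def to_odds(p):
    """Convert a standard probability p to odds p / (1 - p)."""
    return p / (1.0 - p)

def to_logit(p):
    """Convert a standard probability p to log odds log(p / (1 - p))."""
    return math.log(to_odds(p))

# p = 0.5 is the "no preference" midpoint in every notation:
# probability 0.5, odds 1, log odds 0.
print(to_odds(0.5), to_logit(0.5))
```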

Log odds
Numeric treatment of outcomes O1 and O2 is equivalent: if neither outcome is favored over the other, log odds = 0; if one outcome is favored with log odds = x, the other outcome is disfavored with log odds = -x. Especially useful in domains where relative probabilities can be minuscule. Example: multiple sequence alignment in computational biology.

From probability to log odds (and back again)

  logit function:     z = log( p / (1 - p) )
  logistic function:  p = e^z / (1 + e^z) = 1 / (1 + e^(-z))
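The two functions above are inverses of each other, which is easy to check numerically (a Python sketch):

```python
import math

def logit(p):
    """z = log(p / (1 - p)): maps (0, 1) to (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """p = 1 / (1 + e^(-z)): maps (-inf, +inf) back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Round trip: logistic(logit(p)) recovers p.
for p in (0.1, 0.5, 0.9):
    assert abs(logistic(logit(p)) - p) < 1e-12
```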

Standard logistic function
[figure: plot of the standard logistic function]

Scenario: Logistic regression
A multidimensional feature space (features can be categorical or continuous). Outcome is discrete, not continuous. We'll focus on the case of two classes. It seems plausible that a linear decision boundary (hyperplane) will give good predictive accuracy.

Using a logistic regression model
The model consists of a vector β in d-dimensional feature space (plus an intercept α). For a point x in feature space, project it onto β to convert it into a real number z in the range -inf to +inf:

  z = α + β · x = α + β1 x1 + ... + βd xd

Map z to the range 0 to 1 using the logistic function:

  p = 1 / (1 + e^(-z))

Overall, logistic regression maps a point x in d-dimensional feature space to a value in the range 0 to 1.
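The two steps (project onto β, then squash with the logistic function) can be sketched in a few lines of Python; the α, β, and x values here are made-up illustrations, not taken from the slides:

```python
import math

def predict(alpha, beta, x):
    """Map a point x to a probability in (0, 1): first project it
    onto beta (z = alpha + beta . x), then apply the logistic
    function p = 1 / (1 + e^(-z))."""
    z = alpha + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values: z = -1 + 2*1 + 0.5*2 = 2
p = predict(-1.0, [2.0, 0.5], [1.0, 2.0])
```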

Using a logistic regression model
The prediction from a logistic regression model can be interpreted as:
- a probability of class membership
- a class assignment, by applying a threshold to the probability; the threshold represents a decision boundary in feature space

Training a logistic regression model
Need to optimize β so the model gives the best possible reproduction of the training set labels. Usually done by numerical approximation of maximum likelihood. On really large datasets, stochastic gradient descent may be used instead.
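A minimal sketch of such training in Python, using batch gradient ascent on the log-likelihood (the MATLAB demo referenced later presumably uses a library routine; this is only a stand-in to show the idea):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit intercept alpha and weights beta by batch gradient
    ascent on the log-likelihood; the gradient contribution of
    each sample is (y_i - p_i) * x_i."""
    d = len(X[0])
    alpha, beta = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_a, grad_b = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(alpha + sum(b * v for b, v in zip(beta, xi)))
            err = yi - p
            grad_a += err
            for j in range(d):
                grad_b[j] += err * xi[j]
        alpha += lr * grad_a
        beta = [b + lr * g for b, g in zip(beta, grad_b)]
    return alpha, beta
```

Swapping the full-batch loop for single-sample updates would give the stochastic gradient descent variant mentioned above.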

Logistic regression in one dimension
[figure slides: logistic curves fit to one-dimensional example data]

Logistic regression in one dimension
Parameters control the shape and location of the sigmoid curve:
- α controls the location of the midpoint
- β controls the slope of the rise
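Concretely, in one dimension the curve is p = 1 / (1 + e^(-(α + β x))), so its midpoint (where p = 0.5) sits at x = -α/β and β scales how steeply it rises. A small check with illustrative (not slide-derived) parameter values:

```python
import math

def sigmoid_1d(alpha, beta, x):
    """One-dimensional logistic curve p = 1 / (1 + e^(-(alpha + beta*x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

alpha, beta = -3.0, 1.5        # illustrative parameters
midpoint = -alpha / beta       # x = 2.0: the point where p = 0.5
assert abs(sigmoid_1d(alpha, beta, midpoint) - 0.5) < 1e-12
```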


Logistic regression in two dimensions
Subset of the Fisher iris dataset: two classes, first two columns (SL, SW) as features.
[figure: scatter plot of the two classes with the fitted decision boundary]

Logistic regression in two dimensions
Interpreting the model vector of coefficients. From MATLAB: B = [ 13.0460  -1.9024  -0.4047 ], where α = B(1) and β = [ β1 β2 ] = B(2:3). Together α and β define the location and orientation of the decision boundary:
- -α / ||β|| is the signed distance of the decision boundary from the origin
- the decision boundary is perpendicular to β
- the magnitude of β defines the gradient of probabilities between 0 and 1
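This geometric reading can be verified numerically. The Python sketch below takes the B vector reported above, finds the foot of the perpendicular from the origin to the boundary α + β · x = 0, and checks that its length equals |α| / ||β||:

```python
import math

B = [13.0460, -1.9024, -0.4047]   # coefficient vector from the slide
alpha, beta = B[0], B[1:]

norm_b = math.sqrt(sum(b * b for b in beta))
# Foot of the perpendicular from the origin onto the boundary
# alpha + beta . x = 0; it lies along the direction of beta.
foot = [-alpha * b / norm_b**2 for b in beta]

# The foot really is on the boundary...
assert abs(alpha + sum(b * v for b, v in zip(beta, foot))) < 1e-9
# ...and its distance from the origin is |alpha| / ||beta||.
distance = math.sqrt(sum(v * v for v in foot))
assert abs(distance - abs(alpha) / norm_b) < 1e-9
```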

Heart disease dataset
13 attributes (see heart.docx for details): 2 demographic (age, gender) and 11 clinical measures of cardiovascular status and performance. 2 classes: absence (1) or presence (2) of heart disease. 270 samples. Dataset taken from the UC Irvine Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/statlog+(heart). Preformatted for MATLAB as heart.mat.

MATLAB interlude
matlab_demo_05.m

Logistic regression
Advantages:
- Makes no assumptions about distributions of classes in feature space
- Easily extended to multiple classes (multinomial regression)
- Natural probabilistic view of class predictions
- Quick to train
- Very fast at classifying unknown records
- Good accuracy for many simple data sets
- Resistant to overfitting
- Model coefficients can be interpreted as indicators of feature importance
Disadvantages:
- Linear decision boundary