Classification using Logistic Regression



Similar documents
Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

Lecture 8 February 4

CSCI567 Machine Learning (Fall 2014)

Programming Exercise 3: Multi-class Classification and Neural Networks

: Introduction to Machine Learning Dr. Rita Osadchy

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Natural Language Processing. Today. Logistic Regression Models. Lecture 13 10/6/2015. Jim Martin. Multinomial Logistic Regression

Introduction to Logistic Regression

STA 4273H: Statistical Machine Learning

Lecture 6: Logistic Regression

Logistic Regression for Spam Filtering

CSC 411: Lecture 07: Multiclass Classification

(Quasi-)Newton methods

Linear Threshold Units

Supervised Learning (Big Data Analytics)

Statistical Machine Learning

Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur

Lecture 2: August 29. Linear Programming (part I)

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

Advanced analytics at your hands

Chapter 12 Discovering New Knowledge Data Mining

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Introduction to Online Learning Theory

Lecture 3: Linear methods for classification

Machine Learning and Pattern Recognition Logistic Regression

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Scalable Machine Learning - or what to do with all that Big Data infrastructure

Machine Learning Logistic Regression

Boosting.

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Predict Influencers in the Social Network

IFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,...

3F3: Signal and Pattern Processing

Distributed Machine Learning and Big Data

Azure Machine Learning, SQL Data Mining and R

Introduction to Learning & Decision Trees

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

An Introduction to Machine Learning and Natural Language Processing Tools

Chapter 4: Artificial Neural Networks

Big Data - Lecture 1 Optimization reminders

Data Mining Practical Machine Learning Tools and Techniques

Lecture 2: The SVM classifier

ADVANCED MACHINE LEARNING. Introduction

Classification of Bad Accounts in Credit Card Industry

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

A Simple Introduction to Support Vector Machines

Data Mining - Evaluation of Classifiers

Multi-Class and Structured Classification

Data Mining Classification: Decision Trees

Big Data Analytics CSCI 4030

Cross-validation for detecting and preventing overfitting

Machine Learning Big Data using Map Reduce

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

Big Data Analysis. Rajen D. Shah (Statistical Laboratory, University of Cambridge) joint work with Nicolai Meinshausen (Seminar für Statistik, ETH

Making Sense of the Mayhem: Machine Learning and March Madness

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Spark: Cluster Computing with Working Sets

Learning is a very general term denoting the way in which agents:

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

Pattern Analysis. Logistic Regression. 12. Mai Joachim Hornegger. Chair of Pattern Recognition Erlangen University

1 What is Machine Learning?

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Knowledge-based systems and the need for learning

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Version Spaces.

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers

NC STATE UNIVERSITY Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Bootstrapping Big Data

L3: Statistical Modeling with Hadoop

LCs for Binary Classification

Microsoft Azure Machine learning Algorithms

Simple and efficient online algorithms for real world applications

Evolutionary Detection of Rules for Text Categorization. Application to Spam Filtering

Machine Learning for Data Science (CS4786) Lecture 1

Applied Multivariate Analysis - Big data analytics

Linear Models for Classification

Mathematical Models of Supervised Learning and their Application to Medical Diagnosis

Feedforward Neural Networks and Backpropagation

Introduction to Support Vector Machines. Colin Campbell, Bristol University

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Better credit models benefit us all

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

1. Classification problems

Systematic Reviews and Meta-analyses

Pattern Recognition and Prediction in Equity Market

Christfried Webers. Canberra February June 2015

ADVANCED SCHOOL OF SYSTEMS AND DATA STUDIES (ASSDAS) PROGRAM: CTech in Computer Science

Football Match Winner Prediction

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Data Mining Techniques for Prognosis in Pancreatic Cancer

Transcription:

Classification using Logistic Regression Ingmar Schuster Patrick Jähnichen using slides by Andrew Ng Institut für Informatik

This lecture covers Logistic regression hypothesis Decision Boundary Cost function (why we need a new one) Simplified Cost function & Gradient Descent Advanced Optimization Algorithms Multiclass classification Logistic regression 2

Logistic regression Hypothesis Representation Logistic regression 3

Classification Problems Classification malignant or benign cancer Spam or Ham Human face or no human face Positive Sentiment? Binary Decision Task (in most simple case) Want Data point belongs to class if close to 1 Doesn't belong to class if close to 0 Logistic regression 4

Logistic Function (Sigmoid Function) maps into interval [0;1] 0 asymptote for 1 asymptote for Sigmoid Function (S-shape) Logistic Function Logistic regression 5

Hypothesis Interpretation Because probabilites should sum to 1, define If interpret as 70% chance data point belongs to class If classify as positive sentiment, malignant tumor,... Logistic regression 6

Logistic regression Decision boundary Logistic regression 7

If or equivalently predict y = 1 If or equivalently predict y = 0 Logistic regression 8

Example If and Prediction y = 1 whenever Logistic regression 9

Example If and Prediction y = 1 whenever Logistic regression 10

Logistic regression Cost Function Logistic regression 11

Training and cost function Training data wih m datapoints, n features where Average cost Logistic regression 12

Reusing Linear Regression cost Cost from linear regression with logistic regression hypothesis leads to non-convex average cost Convex J easier to optimize (no local optima) All function values below intersection with any line 13

Logistic Regression Cost function If y = 1 and h(x) = 1, Cost = 0 But for Corresponds to intuition: if prediction is h(x) = 0 but actual value was y = 1, learning algorithm will be penalized by large cost Logistic regression 14

Logistic Regression Cost function If y = 0 and h(x) = 0, Cost = 0 But for Logistic regression 15

Logistic regression Simplified Cost Function & Gradient Descent Logistic regression 16

Simplified Cost Function (1) Original cost of single training example Because we always have y = 0 or y = 1 we can simplify the cost function definition to To convince yourself, use the simplified cost function to calculate Logistic regression 17

Simplified Cost Function (2) Cost function for training set Find parameter argument that minimizes J: To make predictions given new x output Logistic regression 18

Gradient Descent for logistic regression Gradient Descent to minimize logistic regression cost function with identical algorithm as for linear regression Logistic regression 19

Beyond Gradient Descent - Advanced Optimization Logistic regression 20

Advanced Optimization Algorithms Given functions to compute an optimization algorithm will compute Optimization Algorithms (Gradient Descent) Conjugate Gradient BFGS & L-BFGS Advantages Often faster convergence No learning rate to choose Disadvantages Complex Logistic regression 21

Preimplemented Alorithms Advanced optimization algorithms exist already in Machine Learning packages for important languages Octave/Matlab R Java Rapidminer under the hood Logistic regression 22

Multiclass Classification (by cheap trickery) Logistic regression 23

Multiclass classification problems Classes of Emails: Work, Friends, Invoices, Job Offers Medical diagnosis: Not ill, Asthma, Lung Cancer Weather: Sunny, Cloudy, Rain, Snow Number classes as 1, 2, 3,... Logistic regression 24

Binary vs. Multiclass Classification Logistic regression 25

One versus all Logistic regression 26

Train logistic regression classifier for each class i to predict probability of y = i On new x predict class i which satisfies Logistic regression 27

This lecture covered Logistic regression hypothesis Decision Boundary Cost function(why we need a new one) Simplified Cost function & Gradient Descent Advanced Optimization Algorithms Multiclass classification Logistic regression 28

Pictures Tumor picture by flickr-user bc the path, License CC SA NC Lightbulb picture from openclipart.org, public domain Machine Learning Introduction 29