Logistic regression CS434


Bayes (Naïve or not) Classifiers: Generative Approach. What do we mean by a generative approach? We assume that each data point is generated following a generative process governed by p(y) and p(x|y). We learn p(y) and p(x|y), and then apply Bayes rule to compute p(y|x) for making predictions.

The generative approach is just one type of learning approach used in machine learning. Learning a correct generative model p(x|y) is difficult: density estimation is a challenging problem in its own right, and sometimes unnecessary. In contrast, LTU, KNN and DT are what we call discriminative methods. They are not concerned with any generative model; they only care about finding a good discriminative function. LTU, KNN and DT learn deterministic functions, not probabilistic ones. One can also take a probabilistic approach to learning discriminative functions, i.e., learn p(y|x) directly without learning p(x|y). Logistic regression is one such approach.

Logistic regression. Recall the problem of regression: it learns a mapping from an input vector x to a continuous output y. Logistic regression extends traditional regression to handle binary outputs. In particular, we assume that

P(y=1|x) = g(w0 + w1 x1 + ... + wm xm) = g(w^T x), where g(t) = e^t / (1 + e^t)

is the sigmoid function.
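
The sigmoid and the resulting conditional probability can be sketched in a few lines of Python (a minimal illustration, not part of the original slides; the two-branch evaluation is a standard numerical-stability trick):

```python
import math

def sigmoid(t):
    """Sigmoid (logistic) function g(t) = e^t / (1 + e^t) = 1 / (1 + e^(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For very negative t, compute via e^t to avoid overflow in exp(-t).
    et = math.exp(t)
    return et / (1.0 + et)

def p_y1_given_x(w, x):
    """P(y=1|x) = g(w^T x); w and x are equal-length lists (put a constant 1 in x for w0)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
```

Note that g maps any real score w^T x into (0, 1), which is what lets us read the output as a probability.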

Logistic Regression. Equivalently, we have the following:

log [ P(y=1|x) / P(y=0|x) ] = w0 + w1 x1 + ... + wm xm,

where the left-hand side is the log odds of y=1. Side note: the odds in favor of an event are the quantity p/(1-p), where p is the probability of the event. If I toss a fair die, what are the odds that I'll get a six? ((1/6)/(5/6) = 1/5.) In other words, LR assumes that the log odds is a linear function of the input features.
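
A quick numeric check of the odds definition, and of the fact that taking the log odds of a sigmoid output recovers the linear score (the value 1.7 below is made up for illustration):

```python
import math

def odds(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

# Fair die: P(six) = 1/6, so the odds of rolling a six are (1/6)/(5/6) = 1/5.
print(odds(1 / 6))                  # approximately 0.2, i.e., 1 to 5

# Log odds of a sigmoid output recovers the linear score w^T x:
wx = 1.7                            # pretend this is w^T x for some example
p = 1.0 / (1.0 + math.exp(-wx))
print(math.log(odds(p)))            # approximately 1.7
```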

Learning for logistic regression. Given a set of training data points, we would like to find a weight vector w such that P(y=1|x) = e^(w^T x) / (1 + e^(w^T x)) is large (e.g., close to 1) for positive training examples, and small (e.g., close to 0) otherwise. In other words, a good weight vector w should satisfy the following: w^T x should take large negative values for - points, and large positive values for + points.

Learning for logistic regression. This can be captured by the log likelihood function:

L(w) = sum_i log P(y^i | x^i, w) = sum_i [ y^i log P(y=1 | x^i, w) + (1 - y^i) log(1 - P(y=1 | x^i, w)) ]

Note that the superscript i is an index to the examples in the training set. This is called the likelihood function of w, and by maximizing this objective function, we perform what we call maximum likelihood estimation of the parameter w.
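
The log likelihood above translates directly into code; a sketch (assuming y is 0/1 and each x already contains a constant 1 feature for w0):

```python
import math

def log_likelihood(w, X, y):
    """L(w) = sum_i [ y^i log p^i + (1 - y^i) log(1 - p^i) ], where p^i = P(y=1|x^i, w)."""
    total = 0.0
    for xi, yi in zip(X, y):
        t = sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-t))  # sigmoid of the linear score
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total
```

With w = 0 every p^i is 0.5, so L(w) = N log 0.5; any weight vector that fits the data better raises L(w) above that baseline.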

MLE for logistic regression.

l(w) = log prod_i P(y^i | x^i, w) = sum_i log P(y^i | x^i, w)

w_MLE = argmax_w l(w) = argmax_w sum_i log P(y^i | x^i, w) = argmax_w sum_i [ y^i log P(y=1 | x^i, w) + (1 - y^i) log(1 - P(y=1 | x^i, w)) ]

Equivalently, given a set of training data points, we would like to find a weight vector w such that P(y=1 | x^i, w) is large (e.g., close to 1) for positive training examples, and small (e.g., close to 0) otherwise: the same as our intuition.

Optimizing l(w). Unfortunately this does not have a closed-form solution. Instead, we iteratively search for the optimal w: start with a random w, then iteratively improve it (similar to the Perceptron) by moving toward the gradient direction (the fastest increasing direction).

Gradient Descent/Ascent Example. Start from a random initial point. Iteratively move toward the direction that improves the objective at the maximal rate. Stop when reaching a local optimum (where the gradient is 0).
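
The loop described on this slide, applied to a toy one-dimensional objective (the function, step size, and stopping tolerance are made up for illustration):

```python
def gradient_ascent(grad, w0, rate=0.1, tol=1e-8, max_iters=10_000):
    """Start at w0; repeatedly step in the gradient direction until the step becomes tiny."""
    w = w0
    for _ in range(max_iters):
        step = rate * grad(w)
        w += step
        if abs(step) < tol:   # near a point where the gradient is ~0
            break
    return w

# Maximize f(w) = -(w - 3)^2, whose gradient is f'(w) = -2(w - 3); the optimum is w = 3.
w_star = gradient_ascent(lambda w: -2.0 * (w - 3.0), w0=0.0)
print(round(w_star, 4))   # 3.0
```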

Batch Learning for Logistic Regression. Note: y takes 0/1 here, not -1/+1.

Given: training examples (x^i, y^i), i = 1, ..., N
Let w = (0, 0, 0, ..., 0)
Repeat until convergence:
    d = (0, 0, 0, ..., 0)
    For i = 1 to N do:
        p^i = e^(w^T x^i) / (1 + e^(w^T x^i))
        error = y^i - p^i
        d = d + error * x^i        (gradient contribution from the i-th example)
    w = w + lambda * d             (lambda is the learning rate)
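
The algorithm on this slide, written out as a runnable sketch (the variable names, the toy data, and the fixed epoch budget standing in for "repeat until convergence" are my own choices):

```python
import math

def stable_sigmoid(t):
    """Sigmoid evaluated so that neither branch can overflow."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def train_logistic_batch(X, y, rate=0.1, epochs=1000):
    """Batch gradient ascent on the log likelihood; y must be 0/1.
    Each x in X should include a constant 1 feature for the bias w0."""
    m = len(X[0])
    w = [0.0] * m
    for _ in range(epochs):                  # "repeat until convergence"
        d = [0.0] * m                        # accumulated gradient
        for xi, yi in zip(X, y):
            p = stable_sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            error = yi - p                   # this example's contribution weight
            for j in range(m):
                d[j] += error * xi[j]        # gradient contribution from example i
        w = [wj + rate * dj for wj, dj in zip(w, d)]  # move along the gradient
    return w

# Toy 1-D data (first feature is the constant 1 for the bias):
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = train_logistic_batch(X, y)
print(stable_sigmoid(w[0] + 2.0 * w[1]) > 0.5)   # True: x = 2 is classified positive
```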

Logistic Regression vs. Perceptron. Note the striking similarity between the two algorithms. In fact, LR learns a linear decision boundary (how so? homework assignment). What are the differences? They use different ways to train the weights, and LR produces a probability estimate.
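
To make the similarity concrete, here are the two per-example update rules side by side (a sketch; the slides describe the batch version of LR, but the comparison is clearest per example). The only difference is whether the prediction is a hard threshold or a soft sigmoid probability:

```python
import math

def perceptron_update(w, x, y, rate=0.1):
    """Perceptron (0/1 labels): hard thresholded prediction, then an error-times-input step."""
    pred = 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0
    return [wj + rate * (y - pred) * xj for wj, xj in zip(w, x)]

def logistic_update(w, x, y, rate=0.1):
    """Logistic regression: the same step, but with a soft probabilistic prediction."""
    p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
    return [wj + rate * (y - p) * xj for wj, xj in zip(w, x)]
```

From w = (0, 0) on a positive example x = (1, 1), the perceptron already predicts 1 and makes no change, while logistic regression predicts p = 0.5 and still nudges the weights: it keeps pushing probabilities toward 0 or 1 even on correctly classified points.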

Logistic Regression vs. Naïve Bayes. If we use Naïve Bayes and assume a Gaussian distribution for p(x|y), we can show that p(y=1|X) takes the exact same functional form as logistic regression. What are the differences here? Different ways of training: Naïve Bayes estimates θ by maximizing P(X|y=v, θ), and while doing so assumes conditional independence among the attributes; logistic regression estimates w by maximizing P(y|x, w) and makes no conditional independence assumption.

Comparatively: Naïve Bayes is a generative model (P(x|y)) that makes a strong conditional independence assumption about the data attributes. When the assumptions are OK, Naïve Bayes can use a small amount of training data to estimate a reasonable model. Logistic regression is a discriminative model that directly learns p(y|X); it has fewer parameters to estimate, but they are tied together, which makes learning harder. It makes weaker assumptions but may need a larger number of training examples. Bottom line: if the Naïve Bayes assumption holds and the probabilistic models are accurate (i.e., x is Gaussian given y, etc.), NB would be a good choice; otherwise, logistic regression works better.

Summary. We introduced the concept of generative vs. discriminative methods; given a method that we discussed in class, you need to know which category it belongs to. Logistic regression: assumes that the log odds of y=1 is a linear function of x (i.e., w^T x); the learning goal is to find a weight vector w such that examples with y=1 are predicted to have high P(y=1|x) and vice versa; maximum likelihood estimation is an approach that achieves this, with an iterative algorithm to learn w. Know the similarities and differences between LR and Perceptrons; logistic regression learns a linear decision boundary.