Probabilistic Linear Classifier: Logistic Regression
CS534 - Machine Learning
Three Main Approaches to Learning a Classifier
- Learn a classifier directly: a function f with ŷ = f(x)
- Learn a probabilistic discriminative model, i.e., the conditional distribution p(y|x)
- Learn a probabilistic generative model, i.e., the joint probability distribution p(x, y)
Examples:
- Learn a classifier: Perceptron, LDA projection with a threshold
- Learn a conditional distribution: logistic regression
- Learn the joint distribution: a probabilistic view of Linear Discriminant Analysis (LDA)
Notation Shift
- S = {(x^i, y^i) : i = 1, ..., N} --- superscript i is the example index; N is the total number of examples
- Subscript is the element index within a vector, i.e., x_j^i is the j-th element of the i-th training example
- Class labels are 0 and 1 (not 1 and -1)
Logistic Regression
Given training set D, logistic regression learns the conditional distribution p(y|x). We assume only two classes (0 and 1) and a parametric form, where w is the parameter vector:
  p(y = 1 | x; w) = 1 / (1 + exp(-w·x))
  p(y = 0 | x; w) = exp(-w·x) / (1 + exp(-w·x)) = 1 - p(y = 1 | x; w)
It is easy to show that this is equivalent to
  log [ p(y = 1 | x; w) / p(y = 0 | x; w) ] = w·x
i.e., the log-odds of class 1 is a linear function of x.
Why the Logistic (Sigmoid) Function?
  g(z) = 1 / (1 + exp(-z))
A linear function w·x has range (-∞, ∞); the logistic function squashes that range to (0, 1), so the output can be interpreted as a probability.
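As a quick illustration, here is a minimal Python sketch of the logistic function (the name `sigmoid` is my own choice, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# squashes any real-valued score into (0, 1)
sigmoid(0.0)    # exactly 0.5: a score of zero gives even odds
sigmoid(10.0)   # close to 1
sigmoid(-10.0)  # close to 0
```

Note that g(0) = 0.5, so the decision boundary p(y = 1 | x) = 0.5 corresponds to w·x = 0.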
Logistic Regression Yields a Linear Classifier
Recall that given x we predict ŷ = 1 when the expected loss of predicting 0 is greater than that of predicting 1. For now assume L(0,1) = L(1,0) = 1:
  E_y[L(0, y)] > E_y[L(1, y)]
  ⟺ L(0,1) p(y = 1 | x) > L(1,0) p(y = 0 | x)
  ⟺ p(y = 1 | x) > p(y = 0 | x)
  ⟺ log [ p(y = 1 | x) / p(y = 0 | x) ] > 0
  ⟺ w·x > 0
This assumed L(0,1) = L(1,0); a similar derivation can be done for arbitrary L(0,1) and L(1,0).
Maximum Likelihood Learning
- Recall that the likelihood function is the probability of the data D given the parameters: P(D | w). It is a function of the parameters.
- Maximum likelihood learning finds the parameters that maximize this likelihood function.
- A common trick is to work with the log-likelihood, i.e., take the logarithm of the likelihood function: log P(D | w).
Computing the Likelihood
In our framework, we assume each training example (x^i, y^i) is drawn independently from the same but unknown distribution (the famous i.i.d. assumption), hence we can write
  P(D | w) = ∏_i P(x^i, y^i | w)
Because a joint distribution P(a, b) can be factored as P(a | b) P(b):
  argmax_w log P(D | w) = argmax_w Σ_i log P(x^i, y^i | w)
                        = argmax_w Σ_i [ log P(y^i | x^i, w) + log P(x^i) ]
Further, because P(x^i) does not depend on w:
  argmax_w log P(D | w) = argmax_w Σ_i log P(y^i | x^i, w)
Computing the Likelihood
Recall argmax_w log P(D | w) = argmax_w Σ_i log P(y^i | x^i, w), with
  p(y = 1 | x, w) = g(w·x) = 1 / (1 + e^{-w·x})
  p(y = 0 | x, w) = 1 - g(w·x)
This can be compactly written as
  P(y | x, w) = g(w·x)^y (1 - g(w·x))^{1-y}
We take our learning objective function to be the log-likelihood:
  L(w) = Σ_i [ y^i log g(w·x^i) + (1 - y^i) log(1 - g(w·x^i)) ]
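The objective L(w) translates directly to code. A minimal sketch (the function names and pure-Python loop are illustrative choices, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, X, y):
    """Conditional log-likelihood:
    L(w) = sum_i [ y^i log g(w.x^i) + (1 - y^i) log(1 - g(w.x^i)) ]."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
        total += y_i * math.log(p) + (1 - y_i) * math.log(1.0 - p)
    return total
```

Since each term is the log of a probability, L(w) is always ≤ 0; at w = 0 every example contributes log(0.5).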
Fitting Logistic Regression by Gradient Ascent
Write L(w) = Σ_i l^i(w) with l^i(w) = y^i log p^i + (1 - y^i) log(1 - p^i), where p^i = g(w·x^i). Recall that for g(z) = 1 / (1 + exp(-z)),
  g'(z) = g(z)(1 - g(z))
Applying the chain rule,
  ∂l^i/∂w_j = [ y^i / p^i - (1 - y^i) / (1 - p^i) ] p^i (1 - p^i) x_j^i = (y^i - p^i) x_j^i
So
  ∂L/∂w_j = Σ_{i=1}^N (y^i - g(w·x^i)) x_j^i
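One way to sanity-check the gradient formula Σ_i (y^i - g(w·x^i)) x^i is to compare it against central finite differences of the log-likelihood. A sketch on hypothetical toy data (all names and data are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik(w, X, y):
    """L(w) = sum_i [ y^i log p^i + (1 - y^i) log(1 - p^i) ]."""
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total

def grad(w, X, y):
    """Analytic gradient: sum_i (y^i - g(w.x^i)) x^i."""
    g = [0.0] * len(w)
    for x, t in zip(X, y):
        err = t - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j in range(len(w)):
            g[j] += err * x[j]
    return g

# hypothetical toy data: two features, labels in {0, 1}
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1, 0, 1]
w = [0.1, -0.2]

# central finite differences should agree with the analytic gradient
eps = 1e-6
analytic = grad(w, X, y)
for j in range(len(w)):
    w_plus, w_minus = list(w), list(w)
    w_plus[j] += eps
    w_minus[j] -= eps
    numeric = (loglik(w_plus, X, y) - loglik(w_minus, X, y)) / (2.0 * eps)
    assert abs(numeric - analytic[j]) < 1e-4
```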
Batch Gradient Ascent for LR
Given: N training examples
Let w = (0, 0, 0, ..., 0)
Repeat until convergence:
  d = (0, 0, 0, ..., 0)
  For i = 1 to N do:
    error^i = y^i - g(w·x^i)
    d = d + error^i x^i
  w = w + η d
An online gradient ascent algorithm can be easily constructed.
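The pseudocode above translates directly to Python. This is a sketch under the simplifying assumption of a fixed iteration count rather than a convergence test; the function name and toy data are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    """Batch gradient ascent on the conditional log-likelihood."""
    n_features = len(X[0])
    w = [0.0] * n_features                 # w = (0, 0, ..., 0)
    for _ in range(n_iters):
        d = [0.0] * n_features             # accumulated gradient
        for x_i, y_i in zip(X, y):
            error = y_i - sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
            for j in range(n_features):
                d[j] += error * x_i[j]     # d = d + error^i * x^i
        w = [wj + eta * dj for wj, dj in zip(w, d)]  # w = w + eta * d
    return w

# hypothetical linearly separable toy data; the constant first
# component acts as a bias term
X = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x))) > 0.5 else 0
         for x in X]
```

An online variant would update w after every example instead of accumulating d over the whole pass.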
Connection Between Logistic Regression & the Perceptron Algorithm
If we replace the logistic function with a step function:
  h(x) = 1 if w·x > 0
  h(x) = 0 otherwise
Both algorithms use the same update rule:
  w = w + η (y - h(x)) x
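The shared update rule can be written once, parameterized by the activation h; passing `sigmoid` gives the logistic-regression update and passing `step` gives the perceptron update. A sketch (`update` and `step` are illustrative names):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    # hard threshold: the perceptron's activation
    return 1 if z > 0 else 0

def update(w, x, y, h, eta=0.1):
    """Shared rule: w <- w + eta * (y - h(w.x)) * x."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return [wj + eta * (y - h(z)) * xj for wj, xj in zip(w, x)]
```

The only difference is the error term: the perceptron's error is always in {-1, 0, 1}, while logistic regression's error y - g(w·x) is a real number in (-1, 1), so even correctly classified points make small updates.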
Multi-Class Case
Choose class K to be the reference class and model the log-odds of each other class k versus class K as a linear function of x:
  log [ p(y = k | x) / p(y = K | x) ] = w_k·x,  for k = 1, ..., K-1
Gradient ascent can be applied to simultaneously train all weight vectors w_k.
Multi-Class Case
The conditional probability for class k < K can be computed as
  p(y = k | x) = exp(w_k·x) / (1 + Σ_{l=1}^{K-1} exp(w_l·x))
For the reference class K, the conditional probability is
  p(y = K | x) = 1 / (1 + Σ_{l=1}^{K-1} exp(w_l·x))
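These two formulas can be combined into one helper that returns the full distribution over the K classes (a sketch; `multiclass_probs` is my naming, not the slides'):

```python
import math

def multiclass_probs(ws, x):
    """p(y=k|x) = exp(w_k.x) / (1 + sum_l exp(w_l.x)) for k < K;
    the reference class K gets 1 / (1 + sum_l exp(w_l.x)).
    ws holds the K-1 weight vectors of the non-reference classes."""
    scores = [math.exp(sum(wj * xj for wj, xj in zip(w_k, x))) for w_k in ws]
    denom = 1.0 + sum(scores)
    # last entry is the reference class K
    return [s / denom for s in scores] + [1.0 / denom]
```

With all weight vectors at zero, every score is exp(0) = 1 and the distribution is uniform over the K classes; the probabilities always sum to 1 by construction.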
Summary of Logistic Regression
- Learns the conditional probability distribution p(y | x)
- Local search: begins with an initial weight vector and modifies it iteratively to maximize the log-likelihood of the data
- Online or batch: both online and batch variants of the algorithm exist