Logistic Regression. Steve Kroon


Course notes sections:

Disclaimer: these notes do not explicitly indicate whether values are vectors or scalars, but expect the reader to discern this from the context.

Scenario: supervised classification

We are given training data {(x_i, y_i) : i = 1, ..., n} from some (mixture) distribution, where y_i indicates class membership. Aim: given a new x, predict the corresponding y. This situation is often not deterministic (e.g. predicting gender given height and weight information).

Using class membership probabilities

Given prior probabilities for each class, and a generative model for each class, we can use the maximum likelihood estimate. This is the class the point has the highest probability of being in. To do this, we only need to know which class has the highest P(y|x) at each x. More generally, we might not want to pick the class with the highest probability (e.g. spam classification, cancer diagnosis, extreme sports). Deciding when this is the case, and what to do then, is the subject of decision theory. The theory makes use of a so-called loss function. The key insight is that the actual probabilities of each class are useful beyond just the maximum. However, we still only need to know P(y|x) at each x.

Generative vs discriminative models

We can get class probabilities if we have generative models. (A generative model is a full specification of P(x, y).) The key issue: the more parameters you have to estimate from data, the less sure you are of each estimate.
Since we usually don't actually know the model for each class, we must estimate it from the class data. Two phases: certain assumptions/prior knowledge, such as normality; followed by estimating parameters from the data. If we need the model, there is no problem with this approach. However, if we only want to classify, we don't need to know the marginal distribution P(x), even though generative models provide this information. Discriminative models are specifications of the conditional distribution P(y|x). Since generative models usually have more parameters than discriminative ones, discriminative models often outperform generative models for classification. Note that generative models can be used for tasks discriminative models can't perform.

What should a discriminative model look like?

We don't know a model for P(y|x), and have no intuition yet. To develop an intuition, let us look at what P(y|x) looks like when we do know the models generating the data. Assume we have two classes C1 and C2, with prior probabilities P(C1) and P(C2). Then

\[
P(C_1|x) = \frac{P(C_1, x)}{P(x)}
         = \frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1) + P(x|C_2)P(C_2)}
         = \frac{1}{1 + \exp(-a(x))} = \sigma(a(x))
\]

where we conveniently define

\[
a(x) = \ln \frac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)}
\]

and the logistic function σ(y) = 1/(1 + exp(−y)). Note that σ(y) lies in (0, 1), and that a(x) is the so-called log-odds for class membership of x. (You should see a similar expression turning up in assignment 2.) Also, it is worth verifying that the derivative of σ(y) is σ(y)(1 − σ(y)).

Next, we will assume the classes each have Gaussian distributions, with means µ1 and µ2 and covariance matrices Σ1 and Σ2. What is a(x) then?

\[
a(x) = \ln P(x|C_1) - \ln P(x|C_2) + \ln \frac{P(C_1)}{P(C_2)}
     = -\tfrac{1}{2} \ln |2\pi\Sigma_1| - \tfrac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1)
       + \tfrac{1}{2} \ln |2\pi\Sigma_2| + \tfrac{1}{2}(x - \mu_2)^T \Sigma_2^{-1}(x - \mu_2)
       + \ln \frac{P(C_1)}{P(C_2)}
\]
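The logistic function and the derivative identity σ'(y) = σ(y)(1 − σ(y)) claimed above can be checked numerically. A minimal sketch (not part of the original notes), assuming NumPy, using a central finite difference:

```python
import numpy as np

def sigma(y):
    """Logistic function sigma(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

ys = np.linspace(-5, 5, 11)

# sigma maps into (0, 1), as stated in the notes.
vals = sigma(ys)
assert np.all((vals > 0) & (vals < 1))

# Check sigma'(y) = sigma(y)(1 - sigma(y)) against a central difference.
h = 1e-6
numeric = (sigma(ys + h) - sigma(ys - h)) / (2 * h)
analytic = sigma(ys) * (1 - sigma(ys))
assert np.allclose(numeric, analytic, atol=1e-8)
```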
If we further assume that Σ1 = Σ2 = Σ, we get some cancellation, yielding

\[
a(x) = \ln \frac{P(C_1)}{P(C_2)} - \tfrac{1}{2}\left[(x - \mu_1)^T \Sigma^{-1}(x - \mu_1) - (x - \mu_2)^T \Sigma^{-1}(x - \mu_2)\right]
\]

Multiplying out, we get:

\[
a(x) = [\Sigma^{-1}(\mu_1 - \mu_2)]^T x
     + \left[-\tfrac{1}{2}\mu_1^T \Sigma^{-1}\mu_1 + \tfrac{1}{2}\mu_2^T \Sigma^{-1}\mu_2 + \ln \frac{P(C_1)}{P(C_2)}\right]
     = w^T x + w_0
\]

where these equations define w and w_0. Thus, we find that for 2 classes with equal covariances, but different means, the log-odds is a linear function of the observations. It follows that in this case, if we used the data to directly estimate the means and covariance matrix, we would estimate 2d + d(d+1)/2 parameters, while if we could directly estimate (w, w_0), we would only be estimating d + 1 parameters.

The multivariate normal case

Let us now consider the same problem, but with k classes. Then

\[
P(C_i|x) = P(x|C_i)P(C_i) \Big/ \sum_j P(x|C_j)P(C_j)
\]

We could go the same route as before (dividing the numerator and denominator by P(x|C_i)P(C_i)), but that leads to complications with more than 2 classes. Instead, we shall write a_i(x) = ln [P(x|C_i)P(C_i)], so that P(C_i|x) = exp(a_i(x)) / Σ_j exp(a_j(x)).[2] Again assuming Gaussians with shared covariance, we eventually conclude that a_i(x) = w_i^T x + w_{i0}, where

\[
w_i = \Sigma^{-1}\mu_i
\quad \text{and} \quad
w_{i0} = -\tfrac{1}{2}\mu_i^T \Sigma^{-1}\mu_i + \ln P(C_i)
\]

Comparing the number of parameters, we have kd + d(d+1)/2 for a generative approach versus k(d+1) for the discriminative approach. If we restrict ourselves to using a diagonal covariance matrix in the generative approach, à la Naive Bayes, the number of parameters is reduced to k(d+1). But now there is a higher chance the model is wrong.

Finding w

These examples motivate modelling P(y|x) by a logistic function of the log-odds of the observation, which we model using linear functions (for 2 classes); or a softmax function, using linear functions of the observations as exponents[1]

[1] If we assume different covariance matrices, we get a quadratic function of the observations.
[2] Thus a(x) in the two-class case is a_1(x) − a_2(x).
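The claim that shared-covariance Gaussian classes give a linear log-odds can be verified directly: compute w and w_0 from the formulas above and compare w^T x + w_0 against the log-odds evaluated from the Gaussian densities. A sketch with hypothetical means, covariance, and priors (all values are illustrative assumptions, not from the notes):

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate normal density at x."""
    diff = x - mu
    Sinv = np.linalg.inv(Sigma)
    norm = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
    return np.exp(-0.5 * diff @ Sinv @ diff) / norm

# Hypothetical two-class setup with a shared covariance matrix.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
p1, p2 = 0.4, 0.6
Sinv = np.linalg.inv(Sigma)

# w and w_0 as defined in the derivation above.
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(p1 / p2)

# The log-odds computed from the densities agrees with w^T x + w_0.
x = np.array([0.7, -0.2])
log_odds = (np.log(gauss_pdf(x, mu1, Sigma) * p1)
            - np.log(gauss_pdf(x, mu2, Sigma) * p2))
assert np.isclose(log_odds, w @ x + w0)
```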
(for multiclass problems). More generally, we could use quadratic functions, or even more generally, a linear function of some transformation of the observation. The extension to transformations of the data is in the textbook; we will stick to the linear case here. However, we add a 1 to the feature vector for each observation to get rid of the inconvenient w_0.

Let us try to select w using maximum likelihood on a training set. Thus, we try to identify which selection of w was most likely to generate the labels in the training set! We begin by writing down the likelihood of the training data as a function of w (binary case in notes).[3] We have P(X, Y|w) = P(Y|X, w)P(X|w), but since P(X|w) = P(X), this equals

\[
P(Y|X, w)P(X) = P(X) \prod_i P(y_i|x_i, w)
\]

This factorization assumes that the label of x_i is conditionally independent of other observations and labels, given the observation x_i.[4] For mathematical convenience define t_{ij} = 1 if y_i = C_j, and 0 for the other k − 1 classes.[5] Then the likelihood becomes

\[
P(X) \prod_i \prod_j P(y_i = C_j|x_i, w)^{t_{ij}}
\]

In order to maximize this, we minimize the negative log-likelihood w.r.t. w. This equals

\[
-\ln P(X) - \sum_i \sum_j t_{ij} \ln P(y_i = C_j|x_i, w)
\]

Writing P(y_i = C_j|x_i, w) = exp(a_j(x_i)) / Σ_r exp(a_r(x_i)), we get

\[
-\ln P(X) - \sum_i \sum_j t_{ij} \left[ a_j(x_i) - \ln \sum_r \exp(a_r(x_i)) \right]
\]

where the a_r are linear functions of x_i: a_r(x_i) = w_r^T x_i. To minimize, we take the gradient w.r.t. w_v:[6]

\[
\nabla_{w_v} = -\sum_i \left[ t_{iv} x_i - \frac{\exp(a_v(x_i))\, x_i}{\sum_r \exp(a_r(x_i))} \right]
             = \sum_i \left[ \frac{\exp(a_v(x_i))}{\sum_r \exp(a_r(x_i))} - t_{iv} \right] x_i
\]

For an optimum, all k of these gradients must simultaneously be zero. This is a nonlinear system of k(d+1) equations in k(d+1) unknowns, so we will make use of a numerical optimization technique, Newton-Raphson optimization.

[3] Here X is the observation matrix and Y the vector of labels.
[4] A common setting for supervised learning is assuming IID data, which satisfies this. This assumption keeps things simple, even if often not quite true.
[5] This is known as a 1-of-k encoding.
[6] Note that ln P(X) is constant w.r.t. w, allowing this term to be removed from the optimization problem. With generative models, this is not the case.
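The gradient expression Σ_i (y_iv − t_iv) x_i can be sanity-checked against a finite difference of the negative log-likelihood. A sketch on a small synthetic problem (sizes, seed, and data are assumptions for illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 3, 4                                         # toy problem sizes
X = np.hstack([rng.normal(size=(n, d)), np.ones((n, 1))])  # append 1 for the bias
T = np.eye(k)[rng.integers(0, k, size=n)]                  # 1-of-k encoding
W = rng.normal(size=(k, d + 1))                            # one weight vector per class

def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))           # shift for stability
    return E / E.sum(axis=1, keepdims=True)

def nll(W):
    """Negative log-likelihood, dropping the constant -ln P(X)."""
    A = X @ W.T                                            # a_r(x_i) = w_r^T x_i
    return -np.sum(T * (A - np.log(np.sum(np.exp(A), axis=1, keepdims=True))))

# Analytic gradient: row v is sum_i (y_iv - t_iv) x_i^T.
Y = softmax(X @ W.T)
grad = (Y - T).T @ X

# Finite-difference check of one entry of the gradient.
h = 1e-6
Wp = W.copy(); Wp[1, 2] += h
Wm = W.copy(); Wm[1, 2] -= h
assert np.isclose((nll(Wp) - nll(Wm)) / (2 * h), grad[1, 2], atol=1e-5)
```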
Newton-Raphson for multiclass logistic regression

Recall that we wanted to find the elements of w minimizing the negative log-likelihood

\[
l(w) = -\ln P(X) - \sum_i \sum_j t_{ij}\left[ a_j(x_i) - \ln \sum_r \exp(a_r(x_i)) \right]
\]

with a_r(x_i) = w_r^T x_i. Setting the gradient ∇l(w) to zero directly yielded a large nonlinear system of equations, which we could not solve analytically. To apply logistic regression, we need not only l and ∇l, but also H_l, so we must do further differentiation. To simplify this, let us define y_v(b) = exp(b_v) / Σ_r exp(b_r), with y_{iv} = y_v(a(x_i)). In this notation, the gradient of the negative log-likelihood (w.r.t. w_v) turns out to simply be Σ_i [y_{iv} − t_{iv}] x_i. Another advantage of this definition is to simplify the calculus. Let us first calculate ∇_b y_v. We have

\[
\frac{\partial y_v}{\partial b_v}
= \frac{\exp(b_v)\left(\sum_r \exp(b_r) - \exp(b_v)\right)}{\left(\sum_r \exp(b_r)\right)^2}
= y_v(b)(1 - y_v(b))
\]

similar to the derivative of the logistic function, while for j ≠ v we have

\[
\frac{\partial y_v}{\partial b_j}
= \frac{-\exp(b_v)\exp(b_j)}{\left(\sum_r \exp(b_r)\right)^2}
= -y_v(b)\, y_j(b)
\]

Note that these results can be pooled as ∂y_v/∂b_j = y_v(b)(I_{vj} − y_j(b)), so that we do not need to handle the case j = v separately. Using this, we can find the entries of the Hessian as follows (where x_i(k) denotes the kth component of x_i):[7]

\[
\frac{\partial^2 l}{\partial w_{v_1, d_1}\, \partial w_{v_2, d_2}}
= \frac{\partial}{\partial w_{v_1, d_1}} \sum_i [y_{iv_2} - t_{iv_2}]\, x_i(d_2)
= \sum_i x_i(d_2)\, \frac{\partial y_{v_2}(a(x_i))}{\partial w_{v_1, d_1}}
= \sum_i x_i(d_2) \sum_j y_{v_2}(a(x_i))\left(I_{v_2 j} - y_j(a(x_i))\right) \frac{\partial a_j(x_i)}{\partial w_{v_1, d_1}}
\]

where the last step follows from the chain rule. Now, ∂a_j(x_i)/∂w_{v_1, d_1} is zero for j ≠ v_1, and x_i(d_1) for j = v_1, so that the above expression equals

\[
\sum_i y_{v_2}(a(x_i))\left(I_{v_2 v_1} - y_{v_1}(a(x_i))\right) x_i(d_2)\, x_i(d_1)
\]

so that we can write the block of the Hessian corresponding to w_{v_1} and w_{v_2} as

\[
\sum_i y_{iv_2}\left(I_{v_2 v_1} - y_{iv_1}\right) x_i x_i^T
\]

Now that we can calculate the Hessian and the gradient, we can start with an initial guess (for example, setting all the w's to zero initially), and then apply Newton-Raphson updates. We leave showing that the Hessian is positive semidefinite to the interested reader.

[7] Here w_{v,d_j} refers to the d_j-th component of w_v.
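The full procedure, gradient plus block Hessian plus Newton-Raphson updates, can be sketched as follows. The problem sizes, seed, and the tiny ridge added to the Hessian are assumptions for illustration (the ridge works around the singularity of the raw Hessian discussed in the next section; it is not part of the notes' derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 60, 2, 3                                         # toy problem sizes
X = np.hstack([rng.normal(size=(n, d)), np.ones((n, 1))])  # append 1 for the bias
T = np.eye(k)[rng.integers(0, k, size=n)]                  # 1-of-k targets
m = d + 1

def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def nll(W):
    A = X @ W.T                                            # a_r(x_i) = w_r^T x_i
    return -np.sum(T * (A - np.log(np.sum(np.exp(A), axis=1, keepdims=True))))

W = np.zeros((k, m))                                       # initial guess: all zeros
nll0 = nll(W)
for _ in range(10):
    Y = softmax(X @ W.T)
    g = ((Y - T).T @ X).ravel()                            # stacked per-class gradients
    H = np.zeros((k * m, k * m))
    for v1 in range(k):                                    # block (v1, v2) of Hessian:
        for v2 in range(k):                                # sum_i y_iv2 (I[v1=v2] - y_iv1) x_i x_i^T
            c = Y[:, v2] * ((v1 == v2) - Y[:, v1])
            H[v1*m:(v1+1)*m, v2*m:(v2+1)*m] = (X * c[:, None]).T @ X
    H += 1e-8 * np.eye(k * m)                              # tiny ridge: raw Hessian is singular
    W = W - np.linalg.solve(H, g).reshape(k, m)            # Newton-Raphson update

assert nll(W) < nll0                                       # the objective decreased
```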
Two-class logistic regression

It is worth noticing that the solution to the multiclass problem above is not unique: adding a constant to any component of all the w vectors yields the same solution. Thus, we can assume that the solution vector for one of the classes is the zero vector. This means that we only need to find the w vectors for k − 1 classes, rather than k. In the binary case, this simplifies things considerably. It is left to the reader to verify that after the adjustment mentioned in the previous paragraph (setting w_1 = 0), the softmax function for the class probability of the first class reduces to the logistic function discussed earlier, where w now represents the adjusted weight w_2.[8]

The negative log-likelihood of the observations, as obtained earlier, is

\[
l = -\ln P(X) - \sum_i \sum_j t_{ij} \ln P(y_i = C_j|x_i, w)
\]

In this case, we have P(y_i = C_1|x_i, w) = σ(a(x_i)) and P(y_i = C_2|x_i, w) = 1 − σ(a(x_i)), with a(x) = w^T x (again, an extra feature has been added to the observations to cater for the bias term). For the binary case, it is more convenient to replace the 1-of-k encoding t_{ij} with a binary encoding: t_i = 1 if x_i is in class 1, and 0 otherwise. Then, the negative log-likelihood becomes

\[
-\ln P(X) - \sum_i \left( t_i \ln \sigma(a(x_i)) + (1 - t_i) \ln(1 - \sigma(a(x_i))) \right)
\]

Next we derive the gradient and Hessian of l:

\[
\frac{\partial l}{\partial w_d}
= -\sum_i t_i\, [\sigma(a(x_i))]^{-1} \sigma(a(x_i))(1 - \sigma(a(x_i)))\, x_i(d)
  + \sum_i (1 - t_i)\, [1 - \sigma(a(x_i))]^{-1} \sigma(a(x_i))(1 - \sigma(a(x_i)))\, x_i(d)
\]
\[
= -\sum_i \left[ t_i (1 - \sigma(a(x_i))) - (1 - t_i)\sigma(a(x_i)) \right] x_i(d)
= \sum_i \left( \sigma(a(x_i)) - t_i \right) x_i(d)
\]

so that ∇_w l = Σ_i (σ(a(x_i)) − t_i) x_i. Next,

\[
\frac{\partial^2 l}{\partial w_{d_1}\, \partial w_{d_2}}
= \sum_i x_i(d_2)\, \sigma(a(x_i))[1 - \sigma(a(x_i))]\, x_i(d_1)
\]

so that H_l(w) = Σ_i σ(a(x_i))[1 − σ(a(x_i))] x_i x_i^T. In order to ensure that our optimization finds a minimum, we show that the Hessian matrix is positive semidefinite. First note that for any i, x_i x_i^T is positive semidefinite, since for any u, u^T x_i x_i^T u = (x_i^T u)^T (x_i^T u) = (x_i^T u)^2 ≥ 0.

[8] It should also be easy to verify that the new weight vector w equals the difference of the weight vectors obtained using the multiclass approach, i.e. w_2 − w_1.
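The binary gradient, Hessian, and Newton-Raphson updates are compact enough to show end to end. A sketch on a hand-made, deliberately non-separable 1-D data set (the data values are illustrative assumptions, not from the notes); it also checks the positive-semidefiniteness of the Hessian at each step:

```python
import numpy as np

def sigma(y):
    return 1.0 / (1.0 + np.exp(-y))

# Toy non-separable 1-D data; a column of 1s is appended for the bias term.
x1 = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])
t  = np.array([ 0.0,  0.0,  1.0,  0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([x1, np.ones_like(x1)])

w = np.zeros(2)                                   # initial guess
for _ in range(10):
    p = sigma(X @ w)
    grad = X.T @ (p - t)                          # sum_i (sigma(a(x_i)) - t_i) x_i
    H = (X * (p * (1 - p))[:, None]).T @ X        # sum_i sigma(1-sigma) x_i x_i^T
    assert np.all(np.linalg.eigvalsh(H) >= -1e-12)  # Hessian is PSD
    w = w - np.linalg.solve(H, grad)              # Newton-Raphson update

# At the optimum the gradient vanishes.
assert np.linalg.norm(X.T @ (sigma(X @ w) - t)) < 1e-6
```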
Next we note that since the range of σ is (0, 1), the coefficient of x_i x_i^T is always positive, so that the Hessian is a sum of positive semidefinite matrices, and is thus positive semidefinite.[9] Finally, we can apply Newton-Raphson optimization[10] to the log-likelihood to obtain the weight vector w.

A complication with logistic regression: overfitting

Suppose that a weight vector w leads to a perfect classification of the data set in the binary case, using classification by the class with the highest probability, and where we consider all classes equally likely. In such a case, we say the data set is linearly separable. For this classification, the classification boundary (or decision surface) lies where P(C_1|x) = 0.5. Since P(C_1|x) = 1/(1 + exp(−w^T x)), we must have that w^T x = 0, i.e. the decision surface is a line passing through the origin. Now, consider what happens if we rather classify with w' = 2w. In such a case, the decision boundary and all the point classifications remain the same. However, the likelihood associated with each point now becomes greater, yielding a higher-likelihood solution than the original w. We can continue doubling w repeatedly in this way, leading in the limit to a situation where the predicted probabilities become step functions at the decision boundary. (To clarify this, draw the logistic function as its argument increases.) Although the maths is different, this behaviour manifests to varying degrees even when there are multiple classes, prior probabilities for the classes are unequal, and the classes are not linearly separable. This is a common problem with many machine learning approaches that estimate parameters by optimization, and is known as overfitting. To see why this is a problem, note that your model is now very confident about its classification of future points close to the decision boundary, even though it has never observed data there! Essentially, the only guide available to the algorithm is the data it has been given, and the linear constraint on the decision surface. However, we usually do not expect the probabilities of the classes to change abruptly between 0 and 1.
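The doubling argument is easy to confirm numerically: on separable data classified through the origin, scaling w up leaves the classifications unchanged but strictly increases the log-likelihood. A sketch with a hypothetical 1-D separable data set (the values are illustrative assumptions):

```python
import numpy as np

def sigma(y):
    return 1.0 / (1.0 + np.exp(-y))

# Linearly separable 1-D data with the boundary through the origin:
# class 1 for x > 0, class 0 for x < 0.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
t = np.array([ 0.0,  0.0,  0.0, 1.0, 1.0, 1.0])

def log_lik(w):
    """Log-likelihood of the labels under P(C1|x) = sigma(w * x)."""
    p = sigma(w * x)
    return np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

w = 1.0
assert log_lik(2 * w) > log_lik(w)       # doubling w raises the likelihood,
assert log_lik(4 * w) > log_lik(2 * w)   # and keeps raising it, without bound
```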
Thus, we must find some way of taking this into account in our calculations. There are two major approaches to doing this, which are closely related: first, one can use a prior distribution on the weight vector w, which is then updated by the likelihood calculation to obtain a maximum a posteriori (MAP) estimate; second, one can penalize choices of w leading to undesirable behaviour in our classifier by adding an extra term to the likelihood function; this is known as regularization. The relationship between these approaches is that the size of the penalty, or regularization, term should depend on how likely you think certain values of w are in advance. Thus, the choice of regularization function effectively encodes a prior distribution on the parameters under investigation into the likelihood,

[9] Under fairly general conditions, the Hessian can in fact be shown to be positive definite, by noting that it is a weighted sum of the rank-1 matrices x_i x_i^T. However, we do not go into that here, since the regularization we apply later will easily lead to a positive definite Hessian, in any case.
[10] Because each quadratic approximation step in the Newton-Raphson optimization is effectively a weighted least squares fit to the data (a common approach for estimating parameters in statistics), this procedure is sometimes called iteratively reweighted least squares (IRLS).
so that the optimum of this regularized likelihood function is actually the MAP estimate for the corresponding prior.[11]

Priors and regularization

Let us assume a normal prior distribution on w. We would prefer smaller w, so let us set the mean of the prior to 0. Also, we have no reason to expect that certain components of w must be larger than others, or that they should be correlated, so let us assume a diagonal covariance matrix, with equal entries on the diagonal (i.e. Σ = λI for some λ > 0). Given the data set (X, Y), what is the posterior distribution for w? We have

\[
p(w|X, Y) = \frac{p(w)\, p(X, Y|w)}{p(X, Y)}
\]

where the denominator is not dependent on w. We can maximize this by minimizing the negative logarithm of the numerator,

\[
-\log p(w) - \log p(X, Y|w) = \frac{w^T w}{2\lambda} + l(w) + C
\]

for a constant C, and in this formulation we see that the prior distribution on w has led to the regularization penalty (1/λ) J(w) = w^T w / (2λ). This particular form is very convenient from a calculus point of view, since ∇J(w) = w, and thus H_J(w) = I. The choice of λ determines how strong one wishes the penalty term to be, and must usually be determined empirically. Applying regularization in this context is a straightforward modification of the earlier approach: one still uses Newton-Raphson optimization, but rather than optimizing l(w), one optimizes l(w) + (1/λ) J(w), which has a slightly modified gradient and Hessian.[12]

Choice of λ

Let us next discuss the selection of λ: λ is an example of an algorithm parameter which we can adjust, or tune, in the hope of obtaining good performance for our classifier, although we have no guidance for our selection.[13] One way to get an indication of a good choice is to keep some of our training data aside (let us call this part the validation set), and do parameter estimation on the remaining training set for various choices of λ. Our final choice of λ can then be obtained by comparing the performance of the various classifiers built with differing choices of λ on the validation set. Finally, we might re-estimate the parameters for this choice of λ using the whole original training set for our final classifier.
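The modified Newton-Raphson procedure only changes the gradient by w/λ and the Hessian by I/λ. A sketch on separable data, where unregularized maximum likelihood would push ‖w‖ to infinity but the Gaussian prior keeps it finite (the data values and λ = 1 are illustrative assumptions):

```python
import numpy as np

def sigma(y):
    return 1.0 / (1.0 + np.exp(-y))

# Separable 1-D data with an appended 1 for the bias term.
x1 = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
X = np.column_stack([x1, np.ones_like(x1)])
t = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
lam = 1.0                                          # prior variance lambda (assumed)

w = np.zeros(2)
for _ in range(25):
    p = sigma(X @ w)
    grad = X.T @ (p - t) + w / lam                 # penalized gradient
    H = (X * (p * (1 - p))[:, None]).T @ X + np.eye(2) / lam   # positive definite
    w = w - np.linalg.solve(H, grad)               # Newton-Raphson update

# At convergence: stationary point of the MAP objective, with bounded weights.
p = sigma(X @ w)
assert np.linalg.norm(X.T @ (p - t) + w / lam) < 1e-8
assert np.linalg.norm(w) < 10
```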
[11] Reviewing the math, we see a general rule of thumb that the regularization penalty corresponds roughly to the negative logarithm of the prior distribution, since for MAP estimates, we typically minimize the negative log posterior, which equals the negative log-likelihood plus the negative log-prior.
[12] Note that the modified Hessian is positive definite now, rather than positive semidefinite.
[13] Many other approaches, such as cross-validation, are possible, and finding good approaches to handling parameter tuning is somewhat of an art. Much research has been done in this area, but it is fraught with difficulties.
An alternative view

If we consider l(w) for the binary case, and ignore the constant −ln P(X), we have

\[
-\sum_i \left( t_i \ln \sigma(a(x_i)) + (1 - t_i) \ln(1 - \sigma(a(x_i))) \right)
\]

For each data point, this function calculates the predicted class probability p for the actual class of that point, and adds −ln p to a total. If the probability is close to one, the amount added is small, but for points which are badly misclassified, the amount added can be much larger. This interpretation helps us understand why maximum likelihood approaches overfit: there is pressure to reduce these penalties. However, when regularization is performed, an extra (1/λ) J(w) is added to this function, in such a way that overfitting to reduce the loss function is prevented by a compensating increase in this regularization term. Many other classification techniques can also be formulated in terms of a regularization function combined with a penalty function on the classification of points.
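The per-point penalty −ln p behaves exactly as described: confident correct predictions contribute almost nothing, while badly misclassified points dominate the sum. A tiny sketch with illustrative probabilities (the numbers are assumptions, not from the notes):

```python
import numpy as np

# Predicted probability of each point's actual class, from confident-and-correct
# down to badly misclassified.
p_correct = np.array([0.99, 0.9, 0.5, 0.1, 0.01])
contrib = -np.log(p_correct)             # per-point contribution to the loss

assert contrib[0] < 0.02                 # confident, correct: tiny penalty
assert contrib[-1] > 4.6                 # badly misclassified: large penalty
assert np.all(np.diff(contrib) > 0)      # penalty grows as p shrinks
```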
More informationStudy on CET4 Marks in China s Graded English Teaching
Study on CET4 Marks n Chna s Graded Englsh Teachng CHE We College of Foregn Studes, Shandong Insttute of Busness and Technology, P.R.Chna, 264005 Abstract: Ths paper deploys Logt model, and decomposes
More informationPRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION
PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIIOUS AFFILIATION AND PARTICIPATION Danny CohenZada Department of Economcs, Benuron Unversty, BeerSheva 84105, Israel Wllam Sander Department of Economcs, DePaul
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationNumber of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000
Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from
More informationTrade Adjustment and Productivity in Large Crises. Online Appendix May 2013. Appendix A: Derivation of Equations for Productivity
Trade Adjustment Productvty n Large Crses Gta Gopnath Department of Economcs Harvard Unversty NBER Brent Neman Booth School of Busness Unversty of Chcago NBER Onlne Appendx May 2013 Appendx A: Dervaton
More informationAn Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services
An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsnyng Wu b a Professor (Management Scence), Natonal Chao
More information1 De nitions and Censoring
De ntons and Censorng. Survval Analyss We begn by consderng smple analyses but we wll lead up to and take a look at regresson on explanatory factors., as n lnear regresson part A. The mportant d erence
More informationLecture 3: Annuity. Study annuities whose payments form a geometric progression or a arithmetic progression.
Lecture 3: Annuty Goals: Learn contnuous annuty and perpetuty. Study annutes whose payments form a geometrc progresson or a arthmetc progresson. Dscuss yeld rates. Introduce Amortzaton Suggested Textbook
More informationgreatest common divisor
4. GCD 1 The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no
More informationStatistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
More informationOn Mean Squared Error of Hierarchical Estimator
S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta
More informationFisher Markets and Convex Programs
Fsher Markets and Convex Programs Nkhl R. Devanur 1 Introducton Convex programmng dualty s usually stated n ts most general form, wth convex objectve functons and convex constrants. (The book by Boyd and
More informationLuby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
More informationTHE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek
HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo
More informationCredit Limit Optimization (CLO) for Credit Cards
Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt
More informationLinear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits
Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.
More informationNonlinear data mapping by neural networks
Nonlnear data mappng by neural networks R.P.W. Dun Delft Unversty of Technology, Netherlands Abstract A revew s gven of the use of neural networks for nonlnear mappng of hgh dmensonal data on lower dmensonal
More informationLecture 9: Logit/Probit. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II
Lecture 9: Logt/Probt Prof. Sharyn O Halloran Sustanable Development U96 Econometrcs II Revew of Lnear Estmaton So far, we know how to handle lnear estmaton models of the type: Y = β 0 + β *X + β 2 *X
More informationSimon Acomb NAG Financial Mathematics Day
1 Why People Who Prce Dervatves Are Interested In Correlaton mon Acomb NAG Fnancal Mathematcs Day Correlaton Rsk What Is Correlaton No lnear relatonshp between ponts Comovement between the ponts Postve
More informationSolution: Let i = 10% and d = 5%. By definition, the respective forces of interest on funds A and B are. i 1 + it. S A (t) = d (1 dt) 2 1. = d 1 dt.
Chapter 9 Revew problems 9.1 Interest rate measurement Example 9.1. Fund A accumulates at a smple nterest rate of 10%. Fund B accumulates at a smple dscount rate of 5%. Fnd the pont n tme at whch the forces
More informationLecture 18: Clustering & classification
O CPS260/BGT204. Algorthms n Computatonal Bology October 30, 2003 Lecturer: Pana K. Agarwal Lecture 8: Clusterng & classfcaton Scrbe: Daun Hou Open Problem In HomeWor 2, problem 5 has an open problem whch
More informationFINANCIAL MATHEMATICS. A Practical Guide for Actuaries. and other Business Professionals
FINANCIAL MATHEMATICS A Practcal Gude for Actuares and other Busness Professonals Second Edton CHRIS RUCKMAN, FSA, MAAA JOE FRANCIS, FSA, MAAA, CFA Study Notes Prepared by Kevn Shand, FSA, FCIA Assstant
More information+ + +   This circuit than can be reduced to a planar circuit
MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to
More informationExtending Probabilistic Dynamic Epistemic Logic
Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σalgebra: a set
More informationBagofWords models. Lecture 9. Slides from: S. Lazebnik, A. Torralba, L. FeiFei, D. Lowe, C. Szurka
BagofWords models Lecture 9 Sldes from: S. Lazebnk, A. Torralba, L. FeFe, D. Lowe, C. Szurka Bagoffeatures models Overvew: Bagoffeatures models Orgns and motvaton Image representaton Dscrmnatve
More informationLecture 2: Single Layer Perceptrons Kevin Swingler
Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCullochPtts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses
More informationClustering Gene Expression Data. (Slides thanks to Dr. Mark Craven)
Clusterng Gene Epresson Data Sldes thanks to Dr. Mark Craven Gene Epresson Proles we ll assume we have a D matr o gene epresson measurements rows represent genes columns represent derent eperments tme
More informationTransition Matrix Models of Consumer Credit Ratings
Transton Matrx Models of Consumer Credt Ratngs Abstract Although the corporate credt rsk lterature has many studes modellng the change n the credt rsk of corporate bonds over tme, there s far less analyss
More informationCan Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? ChuShu L Department of Internatonal Busness, Asa Unversty, Tawan ShengChang
More informationEE201 Circuit Theory I 2015 Spring. Dr. Yılmaz KALKAN
EE201 Crcut Theory I 2015 Sprng Dr. Yılmaz KALKAN 1. Basc Concepts (Chapter 1 of Nlsson  3 Hrs.) Introducton, Current and Voltage, Power and Energy 2. Basc Laws (Chapter 2&3 of Nlsson  6 Hrs.) Voltage
More informationAbteilung für Stadt und Regionalentwicklung Department of Urban and Regional Development
Abtelung für Stadt und Regonalentwcklung Department of Urban and Regonal Development Gunther Maer, Alexander Kaufmann The Development of Computer Networks Frst Results from a Mcroeconomc Model SREDscusson
More informationFast Fuzzy Clustering of Web Page Collections
Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng OttovonGuerckeUnversty of Magdeburg Unverstätsplatz, D396 Magdeburg,
More informationGeneral Iteration Algorithm for Classification Ratemaking
General Iteraton Algorthm for Classfcaton Ratemakng by Luyang Fu and Chengsheng eter Wu ABSTRACT In ths study, we propose a flexble and comprehensve teraton algorthm called general teraton algorthm (GIA)
More informationLearning from Multiple Outlooks
Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l
More informationLearning to Classify Ordinal Data: The Data Replication Method
Journal of Machne Learnng Research 8 (7) 39349 Submtted /6; Revsed 9/6; Publshed 7/7 Learnng to Classfy Ordnal Data: The Data Replcaton Method Jame S. Cardoso INESC Porto, Faculdade de Engenhara, Unversdade
More informationSPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:
SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and
More informationInfluence and Correlation in Social Networks
Influence and Correlaton n Socal Networks Ars Anagnostopoulos Rav Kumar Mohammad Mahdan Yahoo! Research 701 Frst Ave. Sunnyvale, CA 94089. {ars,ravkumar,mahdan}@yahoonc.com ABSTRACT In many onlne socal
More informationPERRON FROBENIUS THEOREM
PERRON FROBENIUS THEOREM R. CLARK ROBINSON Defnton. A n n matrx M wth real entres m, s called a stochastc matrx provded () all the entres m satsfy 0 m, () each of the columns sum to one, m = for all, ()
More informationOn the Solution of Indefinite Systems Arising in Nonlinear Optimization
On the Soluton of Indefnte Systems Arsng n Nonlnear Optmzaton Slva Bonettn, Valera Ruggero and Federca Tnt Dpartmento d Matematca, Unverstà d Ferrara Abstract We consder the applcaton of the precondtoned
More informationLogical Development Of Vogel s Approximation Method (LDVAM): An Approach To Find Basic Feasible Solution Of Transportation Problem
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77866 Logcal Development Of Vogel s Approxmaton Method (LD An Approach To Fnd Basc Feasble Soluton Of Transportaton
More informationOn the Optimal Control of a Cascade of HydroElectric Power Stations
On the Optmal Control of a Cascade of HydroElectrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
More informationAn Empirical Study of Search Engine Advertising Effectiveness
An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan RmmKaufman, RmmKaufman
More information