Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions



Synthesis Lectures on Data Mining and Knowledge Discovery

Editor: Robert Grossman, University of Illinois, Chicago

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
Giovanni Seni and John F. Elder, 2010

Modeling and Data Mining in Blogosphere
Nitin Agarwal and Huan Liu, 2009

Copyright 2010 by Morgan & Claypool. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
Giovanni Seni and John F. Elder

ISBN: (paperback)
ISBN: (ebook)
DOI /S00240ED1V01Y200912DMK002

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY, Lecture #2
Series Editor: Robert Grossman, University of Illinois, Chicago
Series ISSN: Print, Electronic

Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions

Giovanni Seni
Elder Research, Inc. and Santa Clara University

John F. Elder
Elder Research, Inc. and University of Virginia

SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY #2

Morgan & Claypool Publishers

ABSTRACT

Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple models into one usually more accurate than the best of its components. Ensembles can provide a critical boost to industrial challenges, from investment timing to drug discovery, and fraud detection to recommendation systems, where predictive accuracy is more vital than model interpretability. Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization, today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods (bagging, random forests, and boosting) to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.

This book is aimed at novice and advanced analytic researchers and practitioners, especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques.¹

The authors are industry experts in data mining and machine learning who are also adjunct professors and popular speakers. Although early pioneers in discovering and using ensembles, they here distill and clarify the recent groundbreaking work of leading academics (such as Jerome Friedman) to bring the benefits of ensembles to practitioners. The authors would appreciate hearing of errors in or suggested improvements to this book, and may be emailed at seni@datamininglab.com and elder@datamininglab.com. Errata and updates will be available from …

KEYWORDS

ensemble methods, rule ensembles, importance sampling, boosting, random forest, bagging, regularization, decision trees, data mining, machine learning, pattern recognition, model interpretation, model complexity, generalized degrees of freedom

¹ R is an open-source language and environment for data analysis and statistical modeling available through the Comprehensive R Archive Network (CRAN). The R system's library packages offer extensive functionality and can be downloaded from cran.r-project.org for many computing platforms. The CRAN web site also has pointers to tutorials and comprehensive documentation. A variety of excellent introductory books are also available; we particularly like Introductory Statistics with R by Peter Dalgaard and Modern Applied Statistics with S by W.N. Venables and B.D. Ripley.

To the loving memory of our fathers, Tito and Fletcher


Contents

Acknowledgments
Foreword by Jaffray Woodriff
Foreword by Tin Kam Ho

1 Ensembles Discovered
  1.1 Building Ensembles
  1.2 Regularization
  1.3 Real-World Examples: Credit Scoring + the Netflix Challenge
  1.4 Organization of This Book

2 Predictive Learning and Decision Trees
  2.1 Decision Tree Induction Overview
  2.2 Decision Tree Properties
  2.3 Decision Tree Limitations

3 Model Complexity, Model Selection and Regularization
  3.1 What is the Right Size of a Tree?
  3.2 Bias-Variance Decomposition
  3.3 Regularization
    Regularization and Cost-Complexity Tree Pruning
    Cross-Validation
    Regularization via Shrinkage
    Regularization via Incremental Model Building
    Example
  3.4 Regularization Summary

4 Importance Sampling and the Classic Ensemble Methods
  4.1 Importance Sampling
    Parameter Importance Measure
    Perturbation Sampling
  4.2 Generic Ensemble Generation
  4.3 Bagging
    Example
    Why it Helps?
  4.4 Random Forest
  4.5 AdaBoost
    Example
    Why the Exponential Loss?
    AdaBoost's Population Minimizer
  4.6 Gradient Boosting
  4.7 MART
  4.8 Parallel vs. Sequential Ensembles

5 Rule Ensembles and Interpretation Statistics
  5.1 Rule Ensembles
  5.2 Interpretation
    Simulated Data Example
    Variable Importance
    Partial Dependences
    Interaction Statistic
  5.3 Manufacturing Data Example
  5.4 Summary

6 Ensemble Complexity
  6.1 Complexity
  6.2 Generalized Degrees of Freedom
  6.3 Examples: Decision Tree Surface with Noise
  6.4 R Code for GDF and Example
  6.5 Summary and Discussion

A AdaBoost Equivalence to FSF Procedure
B Gradient Boosting and Robust Loss Functions
Bibliography
Authors' Biographies


Acknowledgments

We would like to thank the many people who contributed to the conception and completion of this project. Giovanni had the privilege of meeting with Jerry Friedman regularly to discuss many of the statistical concepts behind ensembles. Prof. Friedman's influence is deep. Bart Goethals and the organizers of ACM-KDD07 first welcomed our tutorial proposal on the topic. Tin Kam Ho favorably reviewed the book idea, Keith Bettinger offered many helpful suggestions on the manuscript, and Matt Strampe assisted with R code. The staff at Morgan & Claypool, especially executive editor Diane Cerra, were diligent and patient in turning the manuscript into a book. Finally, we would like to thank our families for their love and support.

Giovanni Seni and John F. Elder
January 2010


Foreword by Jaffray Woodriff

John Elder is a well-known expert in the field of statistical prediction. He is also a good friend who has mentored me about many techniques for mining complex data for useful information. I have been quite fortunate to collaborate with John on a variety of projects, and there must be a good reason that ensembles played the primary role each time.

I need to explain how we met, as ensembles are responsible! I spent my four years at the University of Virginia investigating the markets. My plan was to become an investment manager after I graduated. All I needed was a profitable technical style that fit my skills and personality (that is all!). After I graduated in 1991, I followed where the data led me during one particular caffeine-fueled, double all-nighter. In a fit of crazed trial-and-error brainstorming, I stumbled upon the winning concept of creating one super-model from a large and diverse group of base predictive models.

After ten years of combining models for investment management, I decided to investigate where my ideas fit in the general academic body of work. I had moved back to Charlottesville after a stint as a proprietary trader on Wall Street, and I sought out a local expert in the field. I found John's firm, Elder Research, on the web and hoped that they'd have the time to talk to a data mining novice. I quickly realized that John was not only a leading expert on statistical learning, but a very accomplished speaker popularizing these methods. Fortunately for me, he was curious to talk about prediction and my ideas. Early on, he pointed out that my multiple-model method for investing was described by the statistical prediction term, ensemble.

John and I have worked together on interesting projects over the past decade. I teamed with Elder Research to compete in the KDD Cup. We wrote an extensive proposal for a government grant to fund the creation of ensemble-based research and software. In 2007 we joined up to compete against thousands of other teams on the Netflix Prize, achieving a third-place ranking at one point (thanks partly to simple ensembles). We even pulled a brainstorming all-nighter coding up our user rating model, which brought back fond memories of that initial breakthrough so many years before.

The practical implementations of ensemble methods are enormous. Most current implementations of them are quite primitive and this book will definitely raise the state of the art. Giovanni Seni's thorough mastery of the cutting-edge research and John Elder's practical experience have combined to make an extremely readable and useful book.

Looking forward, I can imagine software that allows users to seamlessly build ensembles in the manner, say, that skilled architects use CAD software to create design images. I expect that

Giovanni and John will be at the forefront of developments in this area, and, if I am lucky, I will be involved as well.

Jaffray Woodriff
CEO, Quantitative Investment Management
Charlottesville, Virginia
January 2010

[Editor's note: Mr. Woodriff's investment firm has experienced consistently positive results, and has grown to be the largest hedge fund manager in the South-East U.S.]

Foreword by Tin Kam Ho

Fruitful solutions to a challenging task have often been found to come from combining an ensemble of experts. Yet for algorithmic solutions to a complex classification task, the utilities of ensembles were first witnessed only in the late 1980s, when the computing power began to support the exploration and deployment of a rich set of classification methods simultaneously. The next two decades saw more and more such approaches come into the research arena, and the development of several consistently successful strategies for ensemble generation and combination. Today, while a complete explanation of all the elements remains elusive, the ensemble methodology has become an indispensable tool for statistical learning. Every researcher and practitioner involved in predictive classification problems can benefit from a good understanding of what is available in this methodology.

This book by Seni and Elder provides a timely, concise introduction to this topic. After an intuitive, highly accessible sketch of the key concerns in predictive learning, the book takes the readers through a shortcut into the heart of the popular tree-based ensemble creation strategies, and follows that with a compact yet clear presentation of the developments in the frontiers of statistics, where active attempts are being made to explain and exploit the mysteries of ensembles through conventional statistical theory and methods. Throughout the book, the methodology is illustrated with varied real-life examples, and augmented with implementations in R-code for the readers to obtain first-hand experience.

For practitioners, this handy reference opens the door to a good understanding of this rich set of tools that holds high promises for the challenging tasks they face. For researchers and students, it provides a succinct outline of the critically relevant pieces of the vast literature, and serves as an excellent summary for this important topic.

The development of ensemble methods is by no means complete. Among the most interesting open challenges are a more thorough understanding of the mathematical structures, mapping of the detailed conditions of applicability, finding scalable and interpretable implementations, dealing with incomplete or imbalanced training samples, and evolving models to adapt to environmental changes. It will be exciting to see this monograph encourage talented individuals to tackle these problems in the coming decades.

Tin Kam Ho
Bell Labs, Alcatel-Lucent
January 2010


CHAPTER 1

Ensembles Discovered

    ...and in a multitude of counselors there is safety.
    Proverbs 24:6b

A wide variety of competing methods are available for inducing models from data, and their relative strengths are of keen interest. The comparative accuracy of popular algorithms depends strongly on the details of the problems addressed, as shown in Figure 1.1 (from Elder and Lee (1997)), which plots the relative out-of-sample error of five algorithms for six public-domain problems. Overall, neural network models did the best on this set of problems, but note that every algorithm scored best or next-to-best on at least two of the six data sets.

Figure 1.1: Relative out-of-sample error (lower is better) of five algorithms (neural network, logistic regression, linear vector quantization, projection pursuit regression, decision tree) on six public-domain problems: Diabetes, Gaussian, Hypothyroid, German Credit, Waveform, and Investment (based on Elder and Lee (1997); John Elder, Elder Research, and Stephen Lee, U. Idaho).

How can we tell, ahead of time, which algorithm will excel for a given problem? Michie et al. (1994) addressed this question by executing a similar but larger study (23 algorithms on 22 data sets) and building a decision tree to predict the best algorithm to use given the properties of a data set.¹ Though the study was skewed toward trees (they were 9 of the 23 algorithms, and several of the (academic) data sets had unrealistic thresholds amenable to trees), the study did reveal useful lessons for algorithm selection (as highlighted in Elder, J. (1996a)).

Still, there is a way to improve model accuracy that is easier and more powerful than judicious algorithm selection: one can gather models into ensembles. Figure 1.2 reveals the out-of-sample accuracy of the models of Figure 1.1 when they are combined four different ways, including averaging, voting, and advisor perceptrons (Elder and Lee, 1997). While the ensemble technique of advisor perceptrons beats simple averaging on every problem, the difference is small compared to the difference between ensembles and the single models. Every ensemble method competes well here against the best of the individual algorithms.

Figure 1.2: Relative out-of-sample error (lower is better) of four ensemble methods (advisor perceptron, AP weighted average, vote, average) on the problems of Figure 1.1 (based on Elder and Lee (1997)); ensemble methods all improve performance.

This phenomenon was discovered by a handful of researchers, separately and simultaneously, to improve classification whether using decision trees (Ho, Hull, and Srihari, 1990), neural networks (Hansen and Salamon, 1990), or math theory (Kleinberg, E., 1990). The most influential early developments were by Breiman, L. (1996) with Bagging, and Freund and Schapire (1996) with AdaBoost (both described in Chapter 4).

One of us stumbled across the marvel of ensembling (which we called "model fusion" or "bundling") while striving to predict the species of bats from features of their echo-location signals (Elder, J., 1996b).² We built the best model we could with each of several very different algorithms, such as decision trees, neural networks, polynomial networks, and nearest neighbors (see Nisbet et al. (2009) for algorithm descriptions). These methods employ different basis functions and training procedures, which causes their diverse surface forms, as shown in Figure 1.3, and often leads to surprisingly different prediction vectors, even when the aggregate performance is very similar.

The project goal was to classify a bat's species noninvasively, by using only its chirps. University of Illinois Urbana-Champaign biologists captured 19 bats, labeled each as one of 6 species, then recorded 98 signals, from which UIUC engineers calculated 35 time-frequency features.³ Figure 1.4 illustrates a two-dimensional projection of the data where each class is represented by a different color and symbol. The data displays useful clustering but also much class overlap to contend with.

Figure 1.4: Sample projection of signals for 6 different bat species.

Each bat contributed 3 to 8 signals, and we realized that the set of signals from a given bat had to be kept together (in either training or evaluation data) to fairly test the model's ability to predict a species of an unknown bat. That is, any bat with a signal in the evaluation data must have no other signals from it in training.

¹ The researchers (Michie et al., 1994, Section 10.6) examined the results of one algorithm at a time and built a C4.5 decision tree (Quinlan, J., 1992) to separate those datasets where the algorithm was "applicable" (where it was within a tolerance of the best algorithm) from those where it was not. They also extracted rules from the tree models and used an expert system to adjudicate between conflicting rules to maximize net information score. The book is online at ac.uk/charles/statlog/whole.pdf
² Thanks to collaboration with Doug Jones and his EE students at the University of Illinois, Urbana-Champaign.
³ Features such as low frequency at the 3-decibel level, time position of the signal peak, and amplitude ratio of 1st and 2nd harmonics.

So, evaluating the performance of a model type consisted of building and cross-validating 19 models and accumulating the out-of-sample results (a "leave-one-bat-out" method).

On evaluation, the baseline accuracy (always choosing the plurality class) was 27%. Decision trees got 46%, and a tree algorithm that was improved to look two steps ahead to choose splits (Elder, J., 1996b) got 58%. Polynomial networks got 64%. The first neural networks tried achieved only 52%. However, unlike the other methods, neural networks don't select variables; when the inputs were then pruned in half to reduce redundancy and collinearity, neural networks improved to 63% accuracy. When the inputs were pruned further to be only the 8 variables the trees employed, neural networks improved to 69% accuracy out-of-sample. (This result is a clear demonstration of the need for regularization, as described in Chapter 3, to avoid overfit.) Lastly, nearest neighbors, using those same 8 variables for dimensions, matched the neural network score of 69%.

Despite their overall scores being identical, the two best models (neural network and nearest neighbor) disagreed a third of the time; that is, they made errors on very different regions of the data. We observed that the more confident of the two methods was right more often than not.

(Their estimates were between 0 and 1 for a given class; the estimate closer to an extreme was usually more correct.) Thus, we tried averaging together the estimates of four of the methods (two-step decision tree, polynomial network, neural network, and nearest neighbor) and achieved 74% accuracy, the best of all. Further study of the lessons of each algorithm (such as when to ignore an estimate due to its inputs clearly being outside the algorithm's training domain) led to improvement reaching 80%. In short, it was discovered to be possible to break through the asymptotic performance ceiling of an individual algorithm by employing the estimates of multiple algorithms. Our fascination with what came to be known as ensembling began.

1.1 BUILDING ENSEMBLES

Building an ensemble consists of two steps: (1) constructing varied models and (2) combining their estimates (see Section 4.2). One may generate component models by, for instance, varying case weights, data values, guidance parameters, variable subsets, or partitions of the input space. Combination can be accomplished by voting, but is primarily done through model estimate weights, with gating and advisor perceptrons as special cases.

Figure 1.3: Example estimation surfaces for five modeling algorithms. Clockwise from top left: decision tree, Delaunay planes (based on Elder, J. (1993)), nearest neighbor, polynomial network (or neural network), kernel.
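These two steps can be made concrete in a few lines of R. The sketch below is illustrative only: the data frames train and test and the two-class factor response y are hypothetical names, and the varied component models are trees fit to bootstrap samples (one way of varying the data values), combined by averaging their class-probability estimates.

    # A minimal sketch of the two ensemble-building steps (assumed data:
    # data frames `train` and `test` with a two-class factor response `y`).
    library(rpart)
    set.seed(42)

    # Step (1): construct varied models, here trees fit to bootstrap samples
    models <- lapply(1:5, function(i) {
      boot <- train[sample(nrow(train), replace = TRUE), ]
      rpart(y ~ ., data = boot, method = "class")
    })

    # Step (2): combine their estimates by averaging the class probabilities
    probs <- sapply(models, function(m) predict(m, newdata = test)[, 2])
    ensemble_estimate <- rowMeans(probs)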

For example, Bayesian model averaging sums estimates of possible models, weighted by their posterior evidence. Bagging (bootstrap aggregating; Breiman, L. (1996)) bootstraps the training data set (usually to build varied decision trees) and takes the majority vote or the average of their estimates (see Section 4.3). Random Forest (Ho, T., 1995; Breiman, L., 2001) adds a stochastic component to create more "diversity" among the trees being combined (see Section 4.4). AdaBoost (Freund and Schapire, 1996) and ARCing (Breiman, L., 1996) iteratively build models by varying case weights (up-weighting cases with large current errors and down-weighting those accurately estimated) and employ the weighted sum of the estimates of the sequence of models (see Section 4.5). Gradient Boosting (Friedman, J., 1999, 2001) extended the AdaBoost algorithm to a variety of error functions for regression and classification (see Section 4.6).

The Group Method of Data Handling (GMDH) (Ivakhnenko, A., 1968) and its descendent, Polynomial Networks (Barron et al., 1984; Elder and Brown, 2000), can be thought of as early ensemble techniques. They build multiple layers of moderate-order polynomials, fit by linear regression,

where variety arises from different variable sets being employed by each node. Their combination is nonlinear since the outputs of interior nodes are inputs to polynomial nodes in subsequent layers. Network construction is stopped by a simple cross-validation test (GMDH) or a complexity penalty. An early popular method, Stacking (Wolpert, D., 1992), employs neural networks as components (whose variety can stem from simply using different guidance parameters, such as initialization weights), combined in a linear regression trained on leave-1-out estimates from the networks.

Models have to be individually good to contribute to ensembling, and that requires knowing when to stop; that is, how to avoid overfit, the chief danger in model induction, as discussed next.

1.2 REGULARIZATION

A widely held principle in Statistical and Machine Learning model inference is that accuracy and simplicity are both desirable. But there is a tradeoff between the two: a flexible (more complex) model is often needed to achieve higher accuracy, but it is more susceptible to overfitting and less likely to generalize well. Regularization techniques "damp down" the flexibility of a model fitting procedure by augmenting the error function with a term that penalizes model complexity. Minimizing the augmented error criterion requires a certain increase in accuracy to "pay" for the increase in model complexity (e.g., adding another term to the model). Regularization is today understood to be one of the key reasons for the superior performance of modern ensembling algorithms.

An influential paper was Tibshirani's introduction of the Lasso regularization technique for linear models (Tibshirani, R., 1996). The Lasso uses the sum of the absolute values of the coefficients in the model as the penalty function, and had roots in work done by Breiman on a coefficient post-processing technique which he had termed Garotte (Breiman et al., 1993). Another important development came with the LARS algorithm by Efron et al. (2004), which allows for an efficient iterative calculation of the Lasso solution. More recently, Friedman published a technique called Path Seeker (PS) that allows combining the Lasso penalty with a variety of loss (error) functions (Friedman and Popescu, 2004), extending the original Lasso paper, which was limited to the least-squares loss.

Careful comparison of the Lasso penalty with alternative penalty functions (e.g., using the sum of the squares of the coefficients) led to an understanding that the penalty function has two roles: controlling the "sparseness" of the solution (the number of coefficients that are non-zero) and controlling the magnitude of the non-zero coefficients ("shrinkage"). This led to development of the Elastic Net (Zou and Hastie, 2005) family of penalty functions, which allows searching for the best shrinkage/sparseness tradeoff according to characteristics of the problem at hand (e.g., data size, number of input variables, correlation among these variables, etc.). The Coordinate Descent algorithm of Friedman et al. (2008) provides fast solutions for the Elastic Net. Finally, an extension of the Elastic Net family to non-convex members, producing sparser solutions (desirable when the number of variables is much larger than the number of observations), is now possible with the Generalized Path Seeker algorithm (Friedman, J., 2008).
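In R, the Lasso and Elastic Net penalties are conveniently available through the glmnet package, which implements the coordinate descent approach. A brief sketch; the simulated x and y here are placeholders rather than data from any example in this book:

    # Lasso and Elastic Net fits via coordinate descent (glmnet package).
    library(glmnet)
    set.seed(1)
    x <- matrix(rnorm(100 * 20), 100, 20)   # 100 cases, 20 inputs (simulated)
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)   # sparse linear target plus noise

    fit_lasso <- glmnet(x, y, alpha = 1)    # alpha = 1: pure Lasso penalty
    fit_enet  <- glmnet(x, y, alpha = 0.5)  # 0 < alpha < 1: Elastic Net mix

    cv <- cv.glmnet(x, y, alpha = 0.5)      # cross-validate penalty strength
    coef(cv, s = "lambda.min")              # sparse coefficient vector

The alpha parameter trades off the sparseness (Lasso) and shrinkage (squared) components of the penalty, mirroring the two roles of the penalty function described above.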

1.3 REAL-WORLD EXAMPLES: CREDIT SCORING + THE NETFLIX CHALLENGE

Many of the examples we show are academic; they are either curiosities (bats) or kept very simple to best illustrate principles. We close Chapter 1 by illustrating that even simple ensembles can work in very challenging industrial applications. Figure 1.5 reveals the out-of-sample results of ensembling up to five different types of models on a credit scoring application. (The output of each model is ranked, those ranks are averaged and re-ranked, and the credit defaulters in a top percentage are counted. Thus, lower is better.) The combinations are ordered on the horizontal axis by the number of models used, and Figure 1.6 highlights the finding that the mean error reduces with increasing degree of combination. Note that the final model with all five component models does better than the best of the single models.

Figure 1.5: Out-of-sample errors (number of defaulters missed; fewer is better) on a credit scoring application when combining one to five different types of models into ensembles. T represents bagged trees; S, stepwise regression; P, polynomial networks; N, neural networks; M, MARS.

The best model, MPN, thus averages the models built by MARS, a polynomial network, and a neural network algorithm. Each model in the collection represents a great deal of work, and it was constructed by advocates of that modeling algorithm competing to beat the other methods. Here, MARS was the best and bagged trees was the worst of the five methods (though a considerable improvement over single trees, as also shown in many examples in Chapter 4).
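The rank-averaging combination described in the parenthetical above takes only a few lines of R. A sketch, assuming a hypothetical list preds holding one numeric score vector per component model:

    # Rank-average combination of model outputs (`preds` is a hypothetical
    # list of score vectors, one vector per component model).
    rank_average <- function(preds) {
      ranks <- sapply(preds, rank)   # rank each model's scores per case
      rank(rowMeans(ranks))          # average the ranks, then re-rank
    }

    set.seed(7)
    preds <- list(runif(10), runif(10), runif(10))  # three dummy models
    rank_average(preds)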

Figure 1.6: Box plot of the number of defaulters missed vs. the number of models in the combination, for Figure 1.5; median (and mean) error decreased as more models are combined.

Most of the ensembling being done in research and applications uses variations of one kind of modeling method, particularly decision trees (as described in Chapter 2 and throughout this book). But one great example of heterogeneous ensembling captured the imagination of the "geek" community recently. The Netflix Prize was a contest that ran for two years, in which the first team to submit a model improving on Netflix's internal recommendation system by 10% would win $1,000,000. Contestants were supplied with entries from a huge movie/user matrix (only 2% non-missing) and asked to predict the ranking (from 1 to 5) of a set of the blank cells. A team one of us was on, Ensemble Experts, peaked at 3rd place at a time when over 20,000 teams had submitted. Moving that high in the rankings using ensembles may have inspired other leading competitors, since near the end of the contest, when the two top teams were extremely close to each other and to winning the prize, the final edge was obtained by weighing contributions from the models of up to 30 competitors. Note that the ensembling techniques explained in this book are even more advanced than those employed in the final stages of the Netflix Prize.

1.4 ORGANIZATION OF THIS BOOK

Chapter 2 presents the formal problem of predictive learning and details the most popular nonlinear method, decision trees, which are used throughout the book to illustrate concepts. Chapter 3 discusses model complexity and how regularizing complexity helps model selection. Regularization techniques play an essential role in modern ensembling. Chapters 4 and 5 are the heart of the book; there, the useful new concepts of Importance Sampling Learning Ensembles (ISLE) and Rule Ensembles developed by J. Friedman and colleagues are explained clearly. The ISLE framework

allows us to view the classic ensemble methods of Bagging, Random Forest, AdaBoost, and Gradient Boosting as special cases of a single algorithm. This unified view clarifies the properties of these methods and suggests ways to improve their accuracy and speed. Rule Ensembles is a new ISLE-based model built by combining simple, readable rules. While maintaining (and often improving) the accuracy of the classic tree ensemble, the rule-based model is much more interpretable. Chapter 5 also illustrates recently proposed interpretation statistics, which are applicable to Rule Ensembles as well as to most other ensemble types. Chapter 6 concludes by explaining why ensembles generalize much better than their apparent complexity would seem to allow. Throughout, snippets of code in R are provided to illustrate the algorithms described.


CHAPTER 2

Predictive Learning and Decision Trees

In this chapter, we provide an overview of predictive learning and decision trees. Before introducing formal notation, consider a very simple data set represented by the following data matrix:

Table 2.1: A simple data set. Each row represents a data point and each column corresponds to an attribute. Sometimes, attribute values could be unknown or missing (denoted by a "?" below).

    TI    PE    Response
    1.0   M2    good
    2.0   M1    bad
    4.5   M5    ?

Each row in the matrix represents an observation or data point. Each column corresponds to an attribute of the observations: TI, PE, and Response, in this example. TI is a numeric attribute, PE is an ordinal attribute, and Response is a categorical attribute. A categorical attribute is one that has two or more values, but there is no intrinsic ordering to the values; e.g., either good or bad in Table 2.1. An ordinal attribute is similar to a categorical one but with a clear ordering of the attribute values. Thus, in this example, M1 comes before M2, M2 comes before M3, etc.

Graphically, this data set can be represented by a simple two-dimensional plot, with the numeric attribute TI rendered on the horizontal axis and the ordinal attribute PE rendered on the vertical axis (Figure 2.1).

When presented with a data set such as the one above, there are two possible modeling tasks:

1. Describe: Summarize existing data in an understandable and actionable way
2. Predict: What is the Response (e.g., class) of a new point? See (Hastie et al., 2009).

More formally, we say we are given "training" data D = {yᵢ, xᵢ₁, xᵢ₂, ..., xᵢₙ}₁ᴺ = {yᵢ, xᵢ}₁ᴺ where

- yᵢ, xᵢⱼ are measured values of attributes (properties, characteristics) of an object
- yᵢ is the "response" (or output) variable

Figure 2.1: A graphical rendering of the data set from Table 2.1 (TI on the horizontal axis, PE levels M1 through M9 on the vertical axis). Numeric and ordinal attributes make appropriate axes because they are ordered, while categorical attributes require color coding the points. The diagonal line represents the best linear boundary separating the blue cases from the green cases.

- xᵢⱼ are the "predictor" (or input) variables
- xᵢ is the input vector made of all the attribute values for the i-th observation
- n is the number of attributes; thus, we also say that the size of x is n
- N is the number of observations
- D is a random sample from some unknown (joint) distribution p(x, y); i.e., it is assumed there is a true underlying distribution out there, and that through a data collection effort, we've drawn a random sample from it.

Predictive Learning is the problem of using D to build a functional model

$$\hat{y} = \hat{F}(x_1, x_2, \ldots, x_n) = \hat{F}(\mathbf{x})$$

which is the best predictor of y given input x. It is also often desirable for the model to offer an interpretable description of how the inputs affect the outputs. When y is categorical, the problem is termed a classification problem; when y is numeric, the problem is termed a regression problem.

The simplest model, or estimator, is a linear model, with functional form

$$\hat{F}(\mathbf{x}) = a_0 + \sum_{j=1}^{n} a_j x_j$$

i.e., a weighted linear combination of the predictors. The coefficients {aⱼ}₀ⁿ are to be determined via a model fitting process such as ordinary linear regression (after assigning numeric labels to the points; i.e., +1 to the blue cases and -1 to the green cases). We use the notation F̂(x) to refer

to the output of the fitting process, an approximation to the true but unknown function F*(x) linking the inputs to the output. The decision boundary for this model, the points where F̂(x) = 0, is a line (see Figure 2.1), or a plane if n > 2. The classification rule simply checks which side of the boundary a given point is at; i.e.,

$$\hat{F}(\mathbf{x}) \geq 0 \Rightarrow \text{blue}, \quad \text{else green}$$

In Figure 2.1, the linear model isn't very good, with several blue points on the (mostly) green side of the boundary.

Decision trees (Breiman et al., 1993; Quinlan, J., 1992) instead create a decision boundary by asking a sequence of nested yes/no questions. Figure 2.2 shows a decision tree for classifying the data of Table 2.1. The first, or root, node splits on variable TI: cases for which TI ≥ 5 follow the left branch and are all classified as blue; cases for which TI < 5 go to the right daughter of the root node, where they are subject to additional split tests.

Figure 2.2: Decision tree example for the data of Table 2.1 (root split TI ≥ 5, followed by splits on PE ∈ {M1, M2, M3} and TI ≥ 2). There are two types of nodes: split and terminal. Terminal nodes are given a class label. When reading the tree, we follow the left branch when a split test condition is met and the right branch otherwise.

At every new node the splitting algorithm takes a fresh look at the data that has arrived at it, and at all the variables and all the splits that are possible. When the data arriving at a given node is mostly of a single class, then the node is no longer split and is assigned a class label corresponding to the majority class within it; these nodes become "terminal" nodes. To classify a new observation, such as the white dot in Figure 2.1, one simply navigates the tree starting at the top (root), following the left branch when a split test condition is met and the right branch otherwise, until arriving at a terminal node. The class label of the terminal node is returned as the tree prediction.
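This navigation is literally a chain of nested yes/no questions, which a few lines of R can make concrete. The sketch below hand-codes the splits of Figure 2.2 as reconstructed above; it is an illustration, not code from the book:

    # The tree of Figure 2.2 as nested yes/no questions.
    classify <- function(TI, PE) {
      if (TI >= 5) return("blue")                    # root split
      if (PE %in% c("M1", "M2", "M3") && TI >= 2)    # right-branch splits
        return("green")
      "blue"
    }

    classify(TI = 1.0, PE = "M2")  # "blue"  (the "good" case of Table 2.1)
    classify(TI = 2.0, PE = "M1")  # "green" (the "bad" case of Table 2.1)

Real induction algorithms, of course, learn these questions from the data, as described in Section 2.1.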

The tree of Figure 2.2 can also be expressed by the following "expert system" rule (assuming green = "bad" and blue = "good"):

    TI ∈ [2, 5] AND PE ∈ {M1, M2, M3} → bad; ELSE → good

which offers an understandable summary of the data (a "descriptive" model). Imagine this data came from a manufacturing process, where M1, M2, M3, etc., were the equipment names of machines used at some processing step, and the TI values represented tracking times for the machines. Then, the model also offers an actionable summary: certain machines used at certain times lead to bad outcomes (e.g., defects). The ability of decision trees to generate interpretable models like this is an important reason for their popularity.

In summary, the predictive learning problem has the following components:

- Data: D = {yᵢ, xᵢ}₁ᴺ

- Model: the underlying functional form sought from the data; e.g., a linear model, a decision tree model, etc. We say the model represents a family F of functions, each indexed by a parameter vector p:

$$\hat{F}(\mathbf{x}) = \hat{F}(\mathbf{x}; \mathbf{p}) \in \mathcal{F}$$

In the case where F are decision trees, for example, the parameter vector p represents the splits defining each possible tree.

- Score criterion: judges the quality of a fitted model. This has two parts:

  Loss function: penalizes individual errors in prediction. Examples for regression tasks include the squared-error loss, L(y, ŷ) = (y − ŷ)², and the absolute-error loss, L(y, ŷ) = |y − ŷ|. Examples for 2-class classification include the exponential loss, L(y, ŷ) = exp(−yŷ), and the (negative) binomial log-likelihood, L(y, ŷ) = log(1 + e^(−yŷ)).

  Risk: the expected loss over all predictions, R(p) = E_{y,x} L(y, F(x; p)), which we often approximate by the average loss over the training data:

$$\hat{R}(\mathbf{p}) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{F}(\mathbf{x}_i; \mathbf{p})) \qquad (2.1)$$

In the case of ordinary linear regression (OLR), for instance, which uses squared-error loss, we have

$$\hat{R}(\mathbf{p}) = \hat{R}(\mathbf{a}) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - a_0 - \sum_{j=1}^{n} a_j x_{ij} \right)^2$$

- Search strategy: the procedure used to minimize the risk criterion; i.e., the means by which we solve

$$\hat{\mathbf{p}} = \arg\min_{\mathbf{p}} \hat{R}(\mathbf{p})$$

In the case of OLR, the search strategy corresponds to direct matrix algebra. In the case of trees, or neural networks, the search strategy is a heuristic iterative algorithm.

It should be pointed out that no model family is universally better; each has a class of target functions, sample size, signal-to-noise ratio, etc., for which it is best. For instance, trees work well when 100s of variables are available but the output vector only depends on a few of them (say < 10); the opposite is true for Neural Networks (Bishop, C., 1995) and Support Vector Machines (Scholkopf et al., 1999). How to choose the right model family then? We can do the following:

- Match the assumptions for a particular model to what is known about the problem, or
- Try several models and choose the one that performs the best, or
- Use several models and allow each subresult to contribute to the final result (the ensemble method).

2.1 DECISION TREE INDUCTION OVERVIEW

In this section, we look more closely at the algorithm for building decision trees. Figure 2.3 shows an example surface built by a regression tree. It is a piece-wise constant surface: there is a region R̂ₘ in input space for each terminal node in the tree, i.e., the (hyper) rectangles induced by the tree cuts. There is a constant associated with each region, which represents the estimated prediction ŷ = ĉₘ that the tree is making at each terminal node. Formally, an M-terminal node tree model is expressed by:

$$\hat{y} = T(\mathbf{x}) = \sum_{m=1}^{M} \hat{c}_m I_{\hat{R}_m}(\mathbf{x})$$

where I_A(x) is 1 if x ∈ A and 0 otherwise. Because the regions are disjoint, every possible input x belongs in a single one, and the tree model can be thought of as the sum over all these regions.

Trees allow for different loss functions fairly easily. The two most used for regression problems are squared-error, where the optimal constant ĉₘ is the mean, and absolute-error, where the optimal constant is the median of the data points within region Rₘ (Breiman et al., 1993).
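A tiny R sketch makes the piecewise-constant form, and the empirical risk of Equation (2.1), concrete; the one-dimensional regions and constants below are invented for illustration:

    # T(x) = sum_m c_m * I(x in R_m): a 3-region, one-dimensional tree model.
    regions <- list(c(-Inf, 2), c(2, 5), c(5, Inf))  # disjoint, cover the line
    c_m     <- c(1.0, 0.4, 1.6)                      # per-region constants

    tree_predict <- function(x)
      sapply(x, function(xi)
        sum(c_m * sapply(regions, function(r) xi >= r[1] && xi < r[2])))

    # Empirical risk (2.1) under squared-error loss
    risk <- function(y, y_hat) mean((y - y_hat)^2)

    x <- c(1.0, 2.0, 4.5, 7.0)   # hypothetical inputs
    y <- c(1.1, 0.5, 0.2, 1.5)   # hypothetical responses
    risk(y, tree_predict(x))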

Figure 2.3: Sample regression tree and corresponding surface in input (x) space (adapted from (Hastie et al., 2001)).

If we choose to use squared-error loss, then the search problem, finding the tree T(x) with lowest prediction risk, is stated:

$$\{\hat{c}_m, \hat{R}_m\}_1^M = \arg\min_{\{c_m, R_m\}_1^M} \sum_{i=1}^{N} \left[ y_i - T(\mathbf{x}_i) \right]^2 = \arg\min_{\{c_m, R_m\}_1^M} \sum_{i=1}^{N} \left[ y_i - \sum_{m=1}^{M} c_m I_{R_m}(\mathbf{x}_i) \right]^2$$

To solve, one searches over the space of all possible constants and regions to minimize average loss. Unrestricted optimization with respect to {Rₘ}₁ᴹ is very difficult, so one universal technique is to restrict the shape of the regions (see Figure 2.4). Joint optimization with respect to {Rₘ}₁ᴹ and {cₘ}₁ᴹ simultaneously is also extremely difficult, so a greedy iterative procedure is adopted (see Figure 2.5). The procedure starts with all the data points being in a single region R and computing a score for it; in the case of squared-error loss this is simply:

$$\hat{e}(R) = \frac{1}{N} \sum_{\mathbf{x}_i \in R} \left( y_i - \text{mean}(\{y_i\}_1^N) \right)^2$$

Then each input variable xⱼ, and each possible test sⱼ on that particular variable for splitting R into R_l (left region) and R_r (right region), is considered, and scores ê(R_l) and ê(R_r) computed. The

Figure 2.4: Examples of invalid and valid regions induced by decision trees. To make the problem of building a tree computationally fast, the region boundaries are restricted to be rectangles parallel to the axes. Resulting regions are simple, disjoint, and cover the input space (adapted from (Hastie et al., 2001)).

Figure 2.5: Forward stagewise additive procedure for building decision trees: starting with a single region (i.e., all given data), at the m-th iteration the best available split is found and applied, and the process repeats.

quality, or improvement, score of the split sⱼ is deemed to be

$$\hat{I}(x_j, s_j) = \hat{e}(R) - \hat{e}(R_l) - \hat{e}(R_r)$$

i.e., the reduction in overall error as a result of the split. The algorithm chooses the variable and the split that improve the fit the most, with no regard to what is going to happen subsequently. And then the original region is replaced with the two new regions, and the splitting process continues iteratively (recursively). Note the data is consumed exponentially: each split leads to solving two smaller subsequent problems. So, when should the algorithm stop? Clearly, if all the elements of the set {xᵢ : xᵢ ∈ R} have the same value of y, then no split is going to improve the score, i.e., reduce the risk; in this case,

we say the region R is "pure." One could also specify a maximum number of desired terminal nodes, maximum tree depth, or minimum node size. In the next chapter, we will discuss a more principled way of deciding the optimal tree size.

This simple algorithm can be coded in a few lines (see the sketch at the end of this chapter). But, of course, to handle real and categorical variables, missing values, and various loss functions takes thousands of lines of code. In R, decision trees for regression and classification are available in the rpart package (rpart).

2.2 DECISION TREE PROPERTIES

As recently as 2007, a KDnuggets poll (Data Mining Methods, 2007) concluded that trees were the method most frequently used by practitioners. This is so because they have many desirable data mining properties. These are as follows:

1. Ability to deal with irrelevant inputs. Since at every node we scan all the variables and pick the best, trees naturally do variable selection. And, thus, anything you can measure, you can allow as a candidate without worrying that they will unduly skew your results. Trees also provide a variable importance score based on the contribution to error (risk) reduction across all the splits in the tree (see Chapter 5).

2. No data preprocessing needed. Trees naturally handle numeric, binary, and categorical variables. Numeric attributes have splits of the form xⱼ < cut_value; categorical attributes have splits of the form xⱼ ∈ {value1, value2, ...}. Monotonic transformations won't affect the splits, so you don't have problems with input outliers. If cut_value = 3 and a value xⱼ is 3.14 or 3,100, it is greater than 3, so it goes to the same side. Output outliers can still be influential, especially with squared-error as the loss.

3. Scalable computation. Trees are very fast to build and run compared to other iterative techniques. Building a tree has approximate time complexity of O(nN log N).

4. Missing value tolerant. Trees do not suffer much loss of accuracy due to missing values. Some tree algorithms treat missing values as a separate categorical value. CART handles them via a clever mechanism termed "surrogate splits" (Breiman et al., 1993); these are substitute splits, in case the first variable is unknown, which are selected based on their ability to approximate the splitting of the originally intended variable. One may alternatively create a new binary variable xⱼ_is_na (not available) when one believes that there may be information in xⱼ's being missing; i.e., that it may not be missing at random.

5. Off-the-shelf procedure: there are only a few tunable parameters. One can typically use them within minutes of learning about them.

6. Interpretable model representation. The binary tree graphic is very interpretable, at least to a few levels.

2.3 DECISION TREE LIMITATIONS

Despite their many desirable properties, trees also suffer from some severe limitations:

1. Discontinuous piecewise-constant model. If one is trying to fit a trend, piecewise constants are a very poor way to do that (see Figure 2.6). In order to approximate a trend well, many splits would be needed, and in order to have many splits, a large data set is required.

Figure 2.6: A 2-terminal node tree approximation to a linear function F*(x): a single split at x ≤ cut_value yields a two-level step function with constants c₁ and c₂.

2. Data fragmentation. Each split reduces training data for subsequent splits. This is especially problematic in high dimensions where the data is already very sparse, and can lead to overfit (as discussed in Chapter 6).

3. Not good for low-interaction target functions F*(x). This is related to point 1 above. Consider that we can equivalently express a linear target as a sum of single-variable functions:

$$F^*(\mathbf{x}) = a_0 + \sum_{j=1}^{n} a_j x_j = \sum_{j=1}^{n} f_j(x_j)$$

i.e., no interactions, an additive model. In order for xⱼ to enter the model, the tree must split on it, but once the root split variable is selected, additional variables enter as products of indicator functions. For instance, R̂₁ in Figure 2.3 is defined by the product of I(x₁ > 22) and I(x₂ > 27).

4. Not good for target functions F*(x) that have dependence on many variables. This is related to point 2 above. Many variables imply that many splits are needed, but then we will run into the data fragmentation problem.

5. High variance caused by the greedy search strategy (local optima); i.e., small changes in the data (say, due to sampling fluctuations) can cause big changes in the resulting tree. Furthermore, errors in upper splits are propagated down to affect all splits below them. As a result, very deep trees might be questionable. Sometimes, the second tree following a data change may have very similar performance to the first; this happens because, typically in real data, some variables are very correlated. So the final estimated values might not differ as much as the apparent difference suggested by looking at the variables in the two trees.

Ensemble methods, discussed in Chapter 4, maintain tree advantages (except perhaps interpretability) while dramatically increasing their accuracy. Techniques to improve the interpretability of ensemble methods are discussed in Chapter 5.
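As promised in Section 2.1, here is a sketch of the core of the tree-growing algorithm: the squared-error split-improvement score, plus the rpart call a practitioner would actually use. The vectors x and y and the data frame my_data are hypothetical:

    # Split-improvement I(x_j, s_j) = e(R) - e(R_l) - e(R_r) under
    # squared-error loss (sums of squares; dividing by N gives e-hat).
    region_score <- function(y) {
      if (length(y) == 0) return(0)
      sum((y - mean(y))^2)
    }

    split_improvement <- function(x, y, cut) {
      region_score(y) -
        region_score(y[x < cut]) -
        region_score(y[x >= cut])
    }

    # Greedy search over all candidate cuts of one variable
    best_cut <- function(x, y) {
      cuts <- sort(unique(x))[-1]
      cuts[which.max(sapply(cuts, split_improvement, x = x, y = y))]
    }

    # In practice, rpart implements the full recursive algorithm:
    library(rpart)
    # fit <- rpart(Response ~ TI + PE, data = my_data, method = "class")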

CHAPTER 3

Model Complexity, Model Selection and Regularization

This chapter provides an overview of model complexity, model selection, and regularization. It is intended to help the reader develop an intuition for what bias and variance are; this is important because ensemble methods succeed by reducing bias, reducing variance, or finding a good tradeoff between the two. We will present a definition for regularization and see three different implementations of it. Regularization is a variance control technique which plays an essential role in modern ensembling. We will also review cross-validation, which is used to estimate the meta-parameters introduced by the regularization process. We will see that finding the optimal value of these meta-parameters is equivalent to selecting the optimal model.

3.1 WHAT IS THE RIGHT SIZE OF A TREE?

We start by revisiting the question of how big to grow a tree: what is its right size? As illustrated in Figure 3.1, the dilemma is this: if the number of regions (terminal nodes) is too small, then the piecewise constant approximation is too crude. That intuitively leads to what is called bias, and it creates error.

Figure 3.1: Representation of a tree model fit for simple 1-dimensional data. From left to right, a linear target function, a 2-terminal node tree approximation to this target function, and a 3-terminal node tree approximation. As the number of nodes in the tree grows, the approximation is less crude but overfitting can occur.

If, on the other hand, the tree is too large, with many terminal nodes, overfitting occurs. A tree can be grown all the way to having one terminal node for every single data point in the training

data.¹ Such a tree will have zero error on the training data; however, if we were to obtain a second batch of data (test data), it is very unlikely that the original tree would perform as well on the new data. The tree will have fitted the noise as well as the signal in the training data, analogous to a child memorizing some particular examples without grasping the underlying concept.

With very flexible fitting procedures such as trees, we also have the situation where the variation among trees, fitted to different data samples from a single phenomenon, can be large. Consider a semiconductor manufacturing plant where, for several consecutive days, it is possible to collect a data sample characterizing the devices being made. Imagine that a decision tree is fit to each sample to classify the defect-free vs. failed devices. It is the same process day to day, so one would expect the data distribution to be very similar. If, however, the trees are not very similar to each other, that is known as variance.

3.2 BIAS-VARIANCE DECOMPOSITION

More formally, suppose that the data we have comes from the "additive error" model:

$$y = F^*(\mathbf{x}) + \varepsilon \qquad (3.1)$$

where F*(x) is the target function that we are trying to learn. We don't really know F*, and because either we are not measuring everything that is relevant, or we have problems with our measurement equipment, or what we measure has noise in it, the response variable we have contains the truth plus some error. We assume that these errors are independent and identically distributed. Specifically, we assume ε is normally distributed, i.e., ε ~ N(0, σ²) (although this is not strictly necessary).

Now consider the idealized aggregate estimator

$$\bar{F}(\mathbf{x}) = E\,\hat{F}_D(\mathbf{x}) \qquad (3.2)$$

which is the average fit over all possible data sets. One can think of the expectation operator as an averaging operator. Going back to the manufacturing example, each F̂ represents the model fit to the data set from a given day. And assuming many such data sets can be collected, F̄ can be created as the average of all those F̂'s.

Now, let's look at what the error of one of these F̂'s is on one particular data point, say x₀, under one particular loss function, the squared-error loss, which allows easy analytical manipulation. The error, known as the Mean Square Error (MSE) in this case, at that particular point is the expectation of the squared difference between the target y and F̂:

$$\begin{aligned}
\text{Err}(\mathbf{x}_0) &= E\left[ \left( y - \hat{F}(\mathbf{x}) \right)^2 \mid \mathbf{x} = \mathbf{x}_0 \right] \\
&= E\left[ \left( F^*(\mathbf{x}_0) - \hat{F}(\mathbf{x}_0) \right)^2 \right] + \sigma^2 \\
&= E\left[ \left( F^*(\mathbf{x}_0) - \bar{F}(\mathbf{x}_0) + \bar{F}(\mathbf{x}_0) - \hat{F}(\mathbf{x}_0) \right)^2 \right] + \sigma^2
\end{aligned}$$

¹ Unless two cases have identical input values and different output values.

The derivation above follows from Equations (3.1) and (3.2) and properties of the expectation operator. Continuing, we arrive at:

$$\begin{aligned}
&= E\left[ \left( \bar{F}(\mathbf{x}_0) - F^*(\mathbf{x}_0) \right)^2 \right] + E\left[ \left( \hat{F}(\mathbf{x}_0) - \bar{F}(\mathbf{x}_0) \right)^2 \right] + \sigma^2 \\
&= \left[ \bar{F}(\mathbf{x}_0) - F^*(\mathbf{x}_0) \right]^2 + E\left[ \left( \hat{F}(\mathbf{x}_0) - \bar{F}(\mathbf{x}_0) \right)^2 \right] + \sigma^2 \\
&= \text{Bias}^2(\hat{F}(\mathbf{x}_0)) + \text{Var}(\hat{F}(\mathbf{x}_0)) + \sigma^2
\end{aligned}$$

The final expression says that the error is made of three components:

- [F̄(x₀) − F*(x₀)]²: known as squared-bias, is the amount by which the average estimator F̄ differs from the truth F*. In practice, squared-bias can't be computed, but it is a useful theoretical concept.

- E[(F̂(x₀) − F̄(x₀))²]: known as variance, is the "spread" of the F̂'s around their mean F̄.

- σ²: the irreducible error, the error that was present in the original data, and cannot be reduced unless the data is expanded with new, more relevant attributes, or the measurement equipment is improved, etc.

Figure 3.2 depicts the notions of squared-bias and variance graphically. The blue shaded area is indicative of the σ of the error. Each data set collected represents different realizations of the truth F*, each resulting in a different y; the spread of these y's around F* is represented by the blue circle. The model family F, or "model space," is represented by the region to the right of the red curve. For a given target realization y, one F̂ is fit, which is the member from the model space F that is closest to y. After repeating the fitting process many times, the average F̄ can be computed. Thus, the orange circle represents variance, the spread of the F̂'s around their mean F̄. Similarly, the distance between the average estimator F̄ and the truth F* represents model bias, the amount by which the average estimator differs from the truth.

Because bias and variance add up to MSE, they act as two opposing forces. If bias is reduced, variance will often increase, and vice versa. Figure 3.3 illustrates another aspect of this tradeoff between bias and variance. The horizontal axis corresponds to model complexity. In the case of trees, for example, model complexity can be measured by the size of the tree. At the origin, minimum complexity, there would be a tree of size one, namely a stump. At the other extreme of the complexity axis, there would be a tree that has been grown all the way to having one terminal node per observation in the data (maximum complexity). For the complex tree, the training error can be zero (it is only non-zero if cases have different response y with all inputs xⱼ the same). Thus, training error is not a useful measurement of model quality, and a different dataset, the test data set, is needed to assess performance. Assuming a test set is available, if for each tree size performance is measured on it, then the error curve is typically U-shaped, as shown. That is, somewhere on the x-axis there is an M where the test error is at its minimum, which corresponds to the optimal tree size for the given problem.
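The decomposition can also be checked numerically. The following R sketch is illustrative only: it assumes a made-up target F*(x) and uses a deliberately crude 1-nearest-neighbor rule to stand in for the fitted model F̂_D, estimating squared-bias and variance at a single point x₀ over many simulated training sets:

    # Monte Carlo check of MSE = Bias^2 + Var + sigma^2 at a point x0,
    # under the additive-error model y = F*(x) + eps of Equation (3.1).
    set.seed(123)
    f_star <- function(x) sin(2 * pi * x)   # hypothetical target F*(x)
    sigma  <- 0.3
    x0     <- 0.25

    fits <- replicate(2000, {
      x <- runif(50)                        # one training sample D
      y <- f_star(x) + rnorm(50, sd = sigma)
      y[which.min(abs(x - x0))]             # crude 1-NN estimate of F*(x0)
    })

    bias2    <- (mean(fits) - f_star(x0))^2
    variance <- var(fits)
    c(bias2 = bias2, variance = variance, irreducible = sigma^2)

With enough replications, the sum bias2 + variance + irreducible closely matches the average squared error of these fits on fresh noisy observations at x₀, as the derivation above requires.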


More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

CHAPTER 14 MORE ABOUT REGRESSION

CHAPTER 14 MORE ABOUT REGRESSION CHAPTER 14 MORE ABOUT REGRESSION We learned n Chapter 5 that often a straght lne descrbes the pattern of a relatonshp between two quanttatve varables. For nstance, n Example 5.1 we explored the relatonshp

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy

Answer: A). There is a flatter IS curve in the high MPC economy. Original LM LM after increase in M. IS curve for low MPC economy 4.02 Quz Solutons Fall 2004 Multple-Choce Questons (30/00 ponts) Please, crcle the correct answer for each of the followng 0 multple-choce questons. For each queston, only one of the answers s correct.

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM

GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM GRAVITY DATA VALIDATION AND OUTLIER DETECTION USING L 1 -NORM BARRIOT Jean-Perre, SARRAILH Mchel BGI/CNES 18.av.E.Beln 31401 TOULOUSE Cedex 4 (France) Emal: jean-perre.barrot@cnes.fr 1/Introducton The

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000 Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from

More information

Intra-year Cash Flow Patterns: A Simple Solution for an Unnecessary Appraisal Error

Intra-year Cash Flow Patterns: A Simple Solution for an Unnecessary Appraisal Error Intra-year Cash Flow Patterns: A Smple Soluton for an Unnecessary Apprasal Error By C. Donald Wggns (Professor of Accountng and Fnance, the Unversty of North Florda), B. Perry Woodsde (Assocate Professor

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

Lecture 5,6 Linear Methods for Classification. Summary

Lecture 5,6 Linear Methods for Classification. Summary Lecture 5,6 Lnear Methods for Classfcaton Rce ELEC 697 Farnaz Koushanfar Fall 2006 Summary Bayes Classfers Lnear Classfers Lnear regresson of an ndcator matrx Lnear dscrmnant analyss (LDA) Logstc regresson

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1.

1.1 The University may award Higher Doctorate degrees as specified from time-to-time in UPR AS11 1. HIGHER DOCTORATE DEGREES SUMMARY OF PRINCIPAL CHANGES General changes None Secton 3.2 Refer to text (Amendments to verson 03.0, UPR AS02 are shown n talcs.) 1 INTRODUCTION 1.1 The Unversty may award Hgher

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background: SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Fixed income risk attribution

Fixed income risk attribution 5 Fxed ncome rsk attrbuton Chthra Krshnamurth RskMetrcs Group chthra.krshnamurth@rskmetrcs.com We compare the rsk of the actve portfolo wth that of the benchmark and segment the dfference between the two

More information

Section 5.4 Annuities, Present Value, and Amortization

Section 5.4 Annuities, Present Value, and Amortization Secton 5.4 Annutes, Present Value, and Amortzaton Present Value In Secton 5.2, we saw that the present value of A dollars at nterest rate per perod for n perods s the amount that must be deposted today

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Regression Models for a Binary Response Using EXCEL and JMP

Regression Models for a Binary Response Using EXCEL and JMP SEMATECH 997 Statstcal Methods Symposum Austn Regresson Models for a Bnary Response Usng EXCEL and JMP Davd C. Trndade, Ph.D. STAT-TECH Consultng and Tranng n Appled Statstcs San Jose, CA Topcs Practcal

More information

Chapter 6. Classification and Prediction

Chapter 6. Classification and Prediction Chapter 6. Classfcaton and Predcton What s classfcaton? What s Lazy learners (or learnng from predcton? your neghbors) Issues regardng classfcaton and Frequent-pattern-based predcton classfcaton Classfcaton

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Abstract. Clustering ensembles have emerged as a powerful method for improving both the

Abstract. Clustering ensembles have emerged as a powerful method for improving both the Clusterng Ensembles: {topchyal, Models jan, of punch}@cse.msu.edu Consensus and Weak Parttons * Alexander Topchy, Anl K. Jan, and Wllam Punch Department of Computer Scence and Engneerng, Mchgan State Unversty

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Portfolio Loss Distribution

Portfolio Loss Distribution Portfolo Loss Dstrbuton Rsky assets n loan ortfolo hghly llqud assets hold-to-maturty n the bank s balance sheet Outstandngs The orton of the bank asset that has already been extended to borrowers. Commtment

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Trade Adjustment and Productivity in Large Crises. Online Appendix May 2013. Appendix A: Derivation of Equations for Productivity

Trade Adjustment and Productivity in Large Crises. Online Appendix May 2013. Appendix A: Derivation of Equations for Productivity Trade Adjustment Productvty n Large Crses Gta Gopnath Department of Economcs Harvard Unversty NBER Brent Neman Booth School of Busness Unversty of Chcago NBER Onlne Appendx May 2013 Appendx A: Dervaton

More information

Financial Mathemetics

Financial Mathemetics Fnancal Mathemetcs 15 Mathematcs Grade 12 Teacher Gude Fnancal Maths Seres Overvew In ths seres we am to show how Mathematcs can be used to support personal fnancal decsons. In ths seres we jon Tebogo,

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

Improved SVM in Cloud Computing Information Mining

Improved SVM in Cloud Computing Information Mining Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

Loop Parallelization

Loop Parallelization - - Loop Parallelzaton C-52 Complaton steps: nested loops operatng on arrays, sequentell executon of teraton space DECLARE B[..,..+] FOR I :=.. FOR J :=.. I B[I,J] := B[I-,J]+B[I-,J-] ED FOR ED FOR analyze

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Return decomposing of absolute-performance multi-asset class portfolios. Working Paper - Nummer: 16

Return decomposing of absolute-performance multi-asset class portfolios. Working Paper - Nummer: 16 Return decomposng of absolute-performance mult-asset class portfolos Workng Paper - Nummer: 16 2007 by Dr. Stefan J. Illmer und Wolfgang Marty; n: Fnancal Markets and Portfolo Management; March 2007; Volume

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany edmund.coersmeer@noka.com,

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information