Data Mining Techniques: Classification and Prediction. Classification vs. Prediction. Induction: Model Construction. Deduction: Using the Model
- Dorthy Fox
- 10 years ago
Transcription
Data Mining Techniques: Classification and Prediction
Mirek Riedewald
Some slides based on presentations by Han/Kamber/Pei, Tan/Steinbach/Kumar, and Andrew Moore

Classification and Prediction Overview
- Introduction
- Decision Trees
- Statistical Decision Theory
- Nearest Neighbor
- Bayesian Classification
- Artificial Neural Networks
- Support Vector Machines (SVMs)
- Prediction
- Accuracy and Error Measures
- Ensemble Methods

Classification vs. Prediction
- Assumption: after data preparation, we have a data set where each record has attributes $X_1, \dots, X_n$ and $Y$.
- Goal: learn a function $f: (X_1, \dots, X_n) \to Y$, then use this function to predict $y$ for a given input record $(x_1, \dots, x_n)$.
  - Classification: $Y$ is a discrete attribute, called the class label. Usually a categorical attribute with a small domain.
  - Prediction: $Y$ is a continuous attribute.
- Called supervised learning, because the true labels ($Y$-values) are known for the initially provided data.
- Typical applications: credit approval, target marketing, medical diagnosis, fraud detection.

Induction: Model Construction
Training data:
NAME | RANK | YEARS | TENURED
Mike | Assistant Prof | 3 | no
Mary | Assistant Prof | 7 | yes
Bill | Professor | 2 | yes
Jim | Associate Prof | 7 | yes
Dave | Assistant Prof | 6 | no
Anne | Associate Prof | 3 | no
The classification algorithm turns the training data into a model (function):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Deduction: Using the Model
Test data:
NAME | RANK | YEARS | TENURED
Tom | Assistant Prof | 2 | no
Merlisa | Associate Prof | 7 | no
George | Professor | 5 | yes
Joseph | Assistant Prof | 7 | yes
Unseen data: (Jeff, Professor, 4). Tenured?
Example of a Decision Tree
Training data:
Tid | Refund | Marital Status | Taxable Income | Cheat
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
Model: decision tree (splitting attributes Refund, MarSt, TaxInc):
Refund?
- Yes: NO
- No: MarSt?
  - Married: NO
  - Single, Divorced: TaxInc?
    - < 80K: NO
    - > 80K: YES

Another Example of Decision Tree
Same training data, different tree:
MarSt?
- Married: NO
- Single, Divorced: Refund?
  - Yes: NO
  - No: TaxInc?
    - < 80K: NO
    - > 80K: YES
There could be more than one tree that fits the same data!

Apply Model to Test Data
Test data:
Refund | Marital Status | Taxable Income | Cheat
No | Married | 80K | ?
- Start from the root of the tree.
- Refund = No, so follow the "No" branch to the MarSt node.
Apply Model to Test Data (continued)
- Marital Status = Married, so follow the "Married" branch, which ends in a leaf labeled NO.
- Assign Cheat to "No".

Decision Tree Induction
- Basic greedy algorithm
  - Top-down, recursive divide-and-conquer
  - At start, all the training records are at the root
  - Training records are partitioned recursively based on split attributes
  - Split attributes are selected based on a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - Pure node (all records belong to the same class)
  - No remaining attributes for further partitioning (majority voting for classifying the leaf)
  - No cases left

Decision Boundary
[Figure: two-dimensional data set over attributes x and y, partitioned by a tree with threshold tests at 0.43, 0.47, and 0.33; each leaf region contains records of a single class.]
- Decision boundary = border between two neighboring regions of different classes.
- For trees that split on a single attribute at a time, the decision boundary is parallel to the axes.

Oblique Decision Trees
[Figure: the test $x + y < 1$ separates Class = + from Class = -.]
- Test condition may involve multiple attributes
- More expressive representation
- Finding the optimal test condition is computationally expensive

How to Specify Split Condition?
- Depends on attribute types
  - Nominal
  - Ordinal
  - Numeric (continuous)
- Depends on number of ways to split
  - 2-way split
  - Multi-way split
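As a rough illustration of how a learned tree is applied, the first example tree (Refund, then MarSt, then TaxInc) can be written as nested tests. The dictionary keys and the function name `classify` are assumptions for this sketch. The slides only label the income branches "< 80K" and "> 80K", so sending exactly 80K to the YES branch is also an assumption.

```python
# Sketch: the example decision tree as nested conditionals.
# Incomes are given in thousands; field names are hypothetical.

def classify(record: dict) -> str:
    """Return the predicted value of Cheat for one record."""
    if record["refund"] == "Yes":
        return "No"
    if record["marital_status"] == "Married":
        return "No"
    # Single or Divorced: split on taxable income.
    # Assumption: an income of exactly 80K follows the "> 80K" branch.
    return "No" if record["taxable_income"] < 80 else "Yes"

# Training records from the example table: (refund, marital status, income, cheat)
training = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

def as_record(refund, status, income):
    return {"refund": refund, "marital_status": status, "taxable_income": income}

training_errors = sum(classify(as_record(r, m, t)) != c for r, m, t, c in training)
test_prediction = classify(as_record("No", "Married", 80))
print(training_errors, test_prediction)
```

The tree reproduces all ten training labels, which is what "fits the data" means in the example.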
Splitting Nominal Attributes
- Multi-way split: use as many partitions as distinct values.
  - CarType: Family | Sports | Luxury
- Binary split: divides values into two subsets; need to find the optimal partitioning.
  - CarType: {Sports, Luxury} vs. {Family}, OR {Family, Luxury} vs. {Sports}

Splitting Ordinal Attributes
- Multi-way split:
  - Size: Small | Medium | Large
- Binary split:
  - Size: {Small, Medium} vs. {Large}, OR {Medium, Large} vs. {Small}
- What about this split?
  - Size: {Small, Large} vs. {Medium}

Splitting Continuous Attributes
- Different options
  - Discretization to form an ordinal categorical attribute
    - Static: discretize once at the beginning
    - Dynamic: ranges found by equal interval bucketing, equal frequency bucketing (percentiles), or clustering
  - Binary decision: $(A < v)$ or $(A \ge v)$
    - Consider all possible splits, choose the best one
- Binary split: Taxable Income > 80K? (Yes | No)
- Multi-way split: Taxable Income? (< 10K | [10K, 25K) | [25K, 50K) | [50K, 80K) | > 80K)

How to Determine Best Split
Before splitting: 10 records of class 0 (C0), 10 records of class 1 (C1).
Test condition | Branch | C0 | C1
Own Car? | Yes | 6 | 4
Own Car? | No | 4 | 6
Car Type? | Family | 1 | 3
Car Type? | Sports | 8 | 0
Car Type? | Luxury | 1 | 7
Student ID? | c1 ... c10 (one branch each) | 1 | 0
Student ID? | c11 ... c20 (one branch each) | 0 | 1
Which test condition is the best?

How to Determine Best Split
- Greedy approach: nodes with homogeneous class distribution are preferred
- Need a measure of node impurity:
  - C0: 5, C1: 5 is non-homogeneous, high degree of impurity
  - C0: 9, C1: 1 is homogeneous, low degree of impurity
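Attribute Selection Measure: Information Gain
- Select the attribute with the highest information gain
- $p_i$ = probability that an arbitrary record in $D$ belongs to class $C_i$, $i = 1, \dots, m$
- Expected information (entropy) needed to classify a record in $D$:
  $\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
- Information needed after using attribute $A$ to split $D$ into $v$ partitions $D_1, \dots, D_v$:
  $\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, \mathrm{Info}(D_j)$
- Information gained by splitting on attribute $A$:
  $\mathrm{Gain}_A(D) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$

Example
- Predict if somebody will buy a computer
- Given data set:
Age | Income | Student | Credit_rating | Buys_computer
≤ 30 | High | No | Bad | No
≤ 30 | High | No | Good | No
31...40 | High | No | Bad | Yes
> 40 | Medium | No | Bad | Yes
> 40 | Low | Yes | Bad | Yes
> 40 | Low | Yes | Good | No
31...40 | Low | Yes | Good | Yes
≤ 30 | Medium | No | Bad | No
≤ 30 | Low | Yes | Bad | Yes
> 40 | Medium | Yes | Bad | Yes
≤ 30 | Medium | Yes | Good | Yes
31...40 | Medium | No | Good | Yes
31...40 | High | Yes | Bad | Yes
> 40 | Medium | No | Good | No

Information Gain Example
- Class P: buys_computer = yes
- Class N: buys_computer = no
- $\mathrm{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$
Age | #yes | #no | I(#yes, #no)
≤ 30 | 2 | 3 | 0.971
31...40 | 4 | 0 | 0
> 40 | 3 | 2 | 0.971
- $\mathrm{Info}_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$
- $\frac{5}{14} I(2,3)$ means that "age ≤ 30" has 5 out of 14 samples, with 2 yes'es and 3 no's. Similar for the other terms.
- Hence $\mathrm{Gain}_{age}(D) = \mathrm{Info}(D) - \mathrm{Info}_{age}(D) = 0.246$
- Similarly, $\mathrm{Gain}_{income}(D) = 0.029$, $\mathrm{Gain}_{student}(D) = 0.151$, $\mathrm{Gain}_{credit\_rating}(D) = 0.048$
- Therefore we choose age as the splitting attribute

Gain Ratio for Attribute Selection
- Information gain is biased towards attributes with a large number of values
- Use gain ratio to normalize information gain:
  $\mathrm{GainRatio}_A(D) = \mathrm{Gain}_A(D) / \mathrm{SplitInfo}_A(D)$
  $\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$
- E.g., $\mathrm{SplitInfo}_{income}(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} \approx 1.557$
- $\mathrm{GainRatio}_{income}(D) = 0.029 / 1.557 \approx 0.019$ (the slide prints 0.926 and 0.031, which do not match the formula as stated)
- The attribute with maximum gain ratio is selected as splitting attribute

Gini Index
- The Gini index, $\mathrm{gini}(D)$, is defined as
  $\mathrm{gini}(D) = 1 - \sum_{i=1}^{m} p_i^2$
- If data set $D$ is split on $A$ into $v$ subsets $D_1, \dots, D_v$, the Gini index $\mathrm{gini}_A(D)$ is defined as
  $\mathrm{gini}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, \mathrm{gini}(D_j)$
- Reduction in impurity:
  $\Delta \mathrm{gini}_A(D) = \mathrm{gini}(D) - \mathrm{gini}_A(D)$
- The attribute that provides the smallest $\mathrm{gini}_A(D)$ (= largest reduction in impurity) is chosen to split the node

Comparing Attribute Selection Measures
- No clear winner (and there are many more)
- Information gain: biased towards multivalued attributes
- Gain ratio: tends to prefer unbalanced splits where one partition is much smaller than the others
- Gini index: biased towards multivalued attributes; tends to favor tests that result in equal-sized partitions and purity in both partitions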
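The measures above are easy to check numerically. The following sketch recomputes information gain, split information, and gain ratio for the buys_computer data set. The tuple encoding of the table and all function names are assumptions made for this example.

```python
from collections import Counter
from math import log2

# (age, income, student, credit_rating, buys_computer), transcribed from the example table
DATA = [
    ("<=30", "high", "no", "bad", "no"), ("<=30", "high", "no", "good", "no"),
    ("31..40", "high", "no", "bad", "yes"), (">40", "medium", "no", "bad", "yes"),
    (">40", "low", "yes", "bad", "yes"), (">40", "low", "yes", "good", "no"),
    ("31..40", "low", "yes", "good", "yes"), ("<=30", "medium", "no", "bad", "no"),
    ("<=30", "low", "yes", "bad", "yes"), (">40", "medium", "yes", "bad", "yes"),
    ("<=30", "medium", "yes", "good", "yes"), ("31..40", "medium", "no", "good", "yes"),
    ("31..40", "high", "yes", "bad", "yes"), (">40", "medium", "no", "good", "no"),
]
COLUMN = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def entropy(labels):
    """Info(D): entropy in bits of the class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def partitions(rows, attr):
    """Class labels of the rows, grouped by the value of attribute attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[COLUMN[attr]], []).append(row[-1])
    return list(groups.values())

def info_gain(rows, attr):
    """Gain_A(D) = Info(D) - Info_A(D)."""
    n = len(rows)
    info_a = sum(len(p) / n * entropy(p) for p in partitions(rows, attr))
    return entropy([row[-1] for row in rows]) - info_a

def split_info(rows, attr):
    """SplitInfo_A(D): entropy of the partition sizes."""
    n = len(rows)
    return -sum(len(p) / n * log2(len(p) / n) for p in partitions(rows, attr))

def gain_ratio(rows, attr):
    return info_gain(rows, attr) / split_info(rows, attr)

for attr in COLUMN:
    print(attr, round(info_gain(DATA, attr), 3), round(gain_ratio(DATA, attr), 3))
```

Small differences in the third decimal place relative to the slide values come from rounding intermediate results.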
Practical Issues of Classification
- Underfitting and overfitting
- Missing values
- Computational cost
- Expressiveness

How Good is the Model?
- Training set error: compare the prediction for each training record with its true value
  - Not a good measure for the error on unseen data. (Discussed soon.)
- Test set error: for records that were not used for training, compare model prediction and true value
  - Use holdout data from the available data set

Training versus Test Set Error
- We'll create a training data set
[Table: 32 records with input bits a, b, c, d, e and output y.]
- Five inputs, all bits, are generated in all 32 possible combinations
- Output y = copy of e, except a random 25% of the records have y set to the opposite of e

Test Data
- Generate test data using the same method: copy of e, 25% inverted; done independently from the previous noise process
- Some y's that were corrupted in the training set will be uncorrupted in the testing set.
- Some y's that were uncorrupted in the training set will be corrupted in the test set.
[Table: the 32 records with the training-data y next to the test-data y.]

Full Tree for The Training Data
[Figure: full tree with Root, branches e=0 and e=1, then a=0 and a=1, and so on down to one leaf per record.]
- 25% of these leaf node labels will be corrupted
- Each leaf contains exactly one record, hence no error in predicting the training data!

Testing The Tree with The Test Set
 | 1/4 of the tree nodes are corrupted | 3/4 are fine
1/4 of the test set records are corrupted | 1/16 of the test set will be correctly predicted for the wrong reasons | 3/16 of the test set will be wrongly predicted because the test record is corrupted
3/4 are fine | 3/16 of the test predictions will be wrong because the tree node is corrupted | 9/16 of the test predictions will be fine
In total, we expect to be wrong on 3/8 of the test set predictions
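The 3/8 figure follows from a short calculation: a prediction of the full tree is wrong exactly when one of the two labels involved (the leaf label or the test label) is corrupted, but not both. A minimal sketch with exact fractions, assuming the two noise processes are independent as stated above (the function name is made up for this example):

```python
from fractions import Fraction

def expected_test_error(noise: Fraction) -> Fraction:
    """Expected test error of the full tree when leaf labels and test labels
    are each flipped independently with probability `noise`."""
    leaf_corrupted_only = noise * (1 - noise)   # wrong because the tree node is corrupted
    test_corrupted_only = (1 - noise) * noise   # wrong because the test record is corrupted
    return leaf_corrupted_only + test_corrupted_only

print(expected_test_error(Fraction(1, 4)))
```

The case where both labels are corrupted (probability 1/16) counts as a correct prediction, since the two flips cancel.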
What's This Example Shown Us?
- Discrepancy between training and test set error
- But more importantly, it indicates that there is something we should do about it if we want to predict well on future data.

Suppose We Had Less Data
[Table: the same 32 records, but the bits a, b, c, d are hidden; only e and y are visible.]
- Output y = copy of e, except a random 25% of the records have y set to the opposite of e

Tree Learned Without Access to The Irrelevant Bits
[Figure: Root with two children, e=0 and e=1.]
- These nodes will be unexpandable
- In about 12 of the 16 records in the e=0 node the output will be 0, so this node will almost certainly predict 0
- In about 12 of the 16 records in the e=1 node the output will be 1, so this node will almost certainly predict 1

Tree Learned Without Access to The Irrelevant Bits
 | Almost certainly none of the tree nodes are corrupted | Almost certainly all are fine
1/4 of the test set records are corrupted | n/a | 1/4 of the test set will be wrongly predicted because the test record is corrupted
3/4 are fine | n/a | 3/4 of the test predictions will be fine
In total, we expect to be wrong on only 1/4 of the test set predictions

Typical Observation
[Figure: training and test error plotted against model complexity; the region where test error rises while training error keeps falling is labeled "Overfitting".]
- Underfitting: when the model is too simple, both training and test errors are large

Overfitting
- Model M overfits the training data if another model M' exists, such that M has smaller error than M' over the training examples, but M' has smaller error than M over the entire distribution of instances.
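Reasons for Overfitting
- Noise
  - Too closely fitting the training data means the model's predictions reflect the noise as well
- Insufficient training data
  - Not enough data to enable the model to generalize beyond idiosyncrasies of the training records
- Data fragmentation (special problem for trees)
  - The number of instances gets smaller as you traverse down the tree
  - The number of instances at a leaf node could be too small to make any confident decision about the class

Avoiding Overfitting
- General idea: make the tree smaller
  - Addresses all three reasons for overfitting
- Prepruning: halt tree construction early
  - Do not split a node if this would result in the goodness measure falling below a threshold
  - Difficult to choose an appropriate threshold, e.g., tree for XOR
- Postpruning: remove branches from a "fully grown" tree
  - Use a set of data different from the training data to decide when to stop pruning
  - Validation data: train tree on training data, prune on validation data, then test on test data

Minimum Description Length (MDL)
[Figure: a table of records with unknown class labels y next to a decision tree over attributes A, B, C that can be used to encode those labels.]
- Alternative to using validation data
  - Motivation: data mining is about finding regular patterns in data; regularity can be used to compress the data; the method that achieves the greatest compression found the most regularity and hence is best
- Minimize Cost(Model, Data) = Cost(Model) + Cost(Data | Model)
  - Cost is the number of bits needed for encoding.
- Cost(Data | Model) encodes the misclassification errors.
- Cost(Model) uses node encoding plus splitting condition encoding.

MDL-Based Pruning Intuition
[Figure: cost plotted against tree size (small to large). Cost(Model) = model size grows with tree size, Cost(Data | Model) = model errors shrinks with tree size, and their sum Cost(Model, Data) has its lowest total cost at the best tree size.]

Handling Missing Attribute Values
- Missing values affect decision tree construction in three different ways:
  - How impurity measures are computed
  - How to distribute an instance with a missing value to child nodes
  - How a test instance with a missing value is classified

Distribute Instances
Records with known Refund value:
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
Class counts after splitting these records on Refund:
 | Refund = Yes | Refund = No
Class = Yes | 0 | 2
Class = No | 3 | 4
Record with missing Refund value:
Tid | Refund | Marital Status | Taxable Income | Class
10 | ? | Single | 90K | Yes
- Probability that Refund = Yes is 3/9
- Probability that Refund = No is 6/9
- Assign the record to the left child with weight = 3/9 and to the right child with weight = 6/9
Class counts after distributing record 10:
 | Refund = Yes | Refund = No
Class = Yes | 0 + 3/9 | 2 + 6/9
Class = No | 3 | 4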
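The fractional assignment can be sketched with exact fractions. The sketch below also evaluates the entropy of the resulting weighted class counts, which is how such weights enter an impurity measure. Variable and function names are assumptions for this example.

```python
from fractions import Fraction
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as (possibly fractional) counts."""
    total = sum(counts)
    probs = [float(c / total) for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)

# Class counts (yes, no) among the 9 records whose Refund value is known.
known = {"Yes": (Fraction(0), Fraction(3)), "No": (Fraction(2), Fraction(4))}
n_known = sum(sum(c) for c in known.values())

# Weight of each branch = fraction of known records that follow it (3/9 and 6/9).
weights = {branch: sum(c) / n_known for branch, c in known.items()}

# Record 10 has class Yes and a missing Refund value: add it fractionally to both children.
children = {b: (yes + weights[b], no) for b, (yes, no) in known.items()}

total = sum(sum(c) for c in children.values())  # 10 records in total
parent_entropy = entropy([3, 7])                # 3 x Class=Yes, 7 x Class=No
children_entropy = sum(float(sum(c) / total) * entropy(c) for c in children.values())
gain = parent_entropy - children_entropy
print(weights, round(children_entropy, 3), round(gain, 3))
```

Using `Fraction` keeps the weights exact (1/3 and 2/3), so the only rounding happens when the logarithms are taken.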
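Computing Impurity Measure
Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | ? | Single | 90K | Yes
- Before splitting: Entropy(Parent) $= -0.3 \log_2 0.3 - 0.7 \log_2 0.7 = 0.881$
- Split on Refund: assume records with missing values are distributed as discussed before
  - 3/9 of record 10 go to Refund = Yes
  - 6/9 of record 10 go to Refund = No
- Entropy(Refund = Yes) $= -\frac{1/3}{10/3}\log_2\frac{1/3}{10/3} - \frac{3}{10/3}\log_2\frac{3}{10/3} \approx 0.469$
- Entropy(Refund = No) $= -\frac{8/3}{20/3}\log_2\frac{8/3}{20/3} - \frac{4}{20/3}\log_2\frac{4}{20/3} \approx 0.971$
- Entropy(Children) $= \frac{1}{3} \cdot 0.469 + \frac{2}{3} \cdot 0.971 \approx 0.804$
- Gain $= 0.881 - 0.804 \approx 0.078$

Classify Instances
New record:
Tid | Refund | Marital Status | Taxable Income | Class
11 | No | ? | 85K | ?
Tree: Refund? (Yes: NO; No: MarSt?), MarSt? (Married: NO; Single, Divorced: TaxInc?), TaxInc? (< 80K: NO; > 80K: YES)
Weighted class counts at the MarSt node:
 | Married | Single | Divorced | Total
Class = No | 3 | 1 | 0 | 4
Class = Yes | 6/9 | 1 | 1 | 2.67
Total | 3.67 | 2 | 1 | 6.67
- Probability that Marital Status = Married is 3.67/6.67
- Probability that Marital Status = {Single, Divorced} is 3/6.67

Tree Cost Analysis
- Finding an optimal decision tree is NP-complete
  - Optimization goal: minimize the expected number of binary tests to uniquely identify any record from a given finite set
- Greedy algorithm
  - O(#attributes * #training_instances * log(#training_instances))
    - At each tree depth, all instances are considered
    - Assume tree depth is logarithmic (fairly balanced splits)
    - Need to test each attribute at each node
  - What about binary splits?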
    - Sort data once on each attribute, use this to avoid re-sorting subsets
    - Incrementally maintain counts for the class distribution as different split points are explored
- In practice, trees are considered to be fast both for training (when using the greedy algorithm) and for making predictions

Tree Expressiveness
- Can represent any finite discrete-valued function
  - But it might not do it very efficiently
- Example: parity function
  - Class = 1 if there is an even number of Boolean attributes with truth value = True
  - Class = 0 if there is an odd number of Boolean attributes with truth value = True
  - For accurate modeling, must have a complete tree
- Not expressive enough for modeling continuous attributes
  - But we can still use a tree for them in practice; it just cannot accurately represent the true function

Rule Extraction from a Decision Tree
- One rule is created for each path from the root to a leaf
  - Precondition: conjunction of all split predicates of nodes on the path
  - Consequent: class prediction from the leaf
- Rules are mutually exclusive and exhaustive
- Example: rule extraction from the buys_computer decision tree
  - IF age = young AND student = no THEN buys_computer = no
  - IF age = young AND student = yes THEN buys_computer = yes
  - IF age = mid-age THEN buys_computer = yes
  - IF age = old AND credit_rating = excellent THEN buys_computer = no
  - IF age = old AND credit_rating = fair THEN buys_computer = yes
[Figure: decision tree with root "age?" (branches <=30, 31..40, >40); the <=30 branch tests "student?" (no / yes), the 31..40 branch is a leaf "yes", and the >40 branch tests "credit rating?" (excellent / fair).]

Classification in Large Databases
- Scalability: classify data sets with millions of examples and hundreds of attributes with reasonable speed
- Why use decision trees for data mining?
  - Relatively fast learning speed
  - Can handle all attribute types
  - Convertible to intelligible classification rules
  - Good classification accuracy, but not as good as newer methods (but tree ensembles are top!)
Scalable Tree Induction
- High cost when the training data at a node does not fit in memory
- Solution 1: special I/O-aware algorithm
  - Keep only the class list in memory, access attribute values on disk
  - Maintain a separate list for each attribute
  - Use a count matrix for each attribute
- Solution 2: sampling
  - Common solution: train the tree on a sample that fits in memory
  - More sophisticated versions of this idea exist, e.g., Rainforest
    - Build tree on a sample, but do this for many bootstrap samples
    - Combine all into a single new tree that is guaranteed to be almost identical to the one trained from the entire data set
    - Can be computed with two data scans

Tree Conclusions
- Very popular data mining tool
  - Easy to understand
  - Easy to implement
  - Easy to use: little tuning, handles all attribute types and missing values
  - Computationally relatively cheap
- Overfitting problem
- Focused on classification, but easy to extend to prediction (future lecture)

Statistical Decision Theory

Theoretical Results
- Trees make sense intuitively, but can we get some hard evidence and a deeper understanding about their properties?
- Statistical decision theory can give some answers
- Need some probability concepts first

Random Variables
- Intuitive version of the definition:
  - Can take on one of possibly many values, each with a certain probability
  - These probabilities define the probability distribution of the random variable
  - E.g., let $X$ be the outcome of a coin toss, then $\Pr(X = \text{heads}) = 0.5$ and $\Pr(X = \text{tails}) = 0.5$; the distribution is uniform
- Consider a discrete random variable $X$ with numeric values $x_1, \dots, x_k$
  - Expectation: $E[X] = \sum_i x_i \Pr(X = x_i)$
  - Variance: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

Working with Random Variables
- $E[X + Y] = E[X] + E[Y]$
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
- For constants $a$, $b$:
  - $E[aX + b] = a\,E[X] + b$
  - $\mathrm{Var}(aX + b) = \mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
- Iterated expectation:
  - $E[Y] = E_X[E_Y[Y \mid X]]$, where $E_Y[Y \mid X = x] = \sum_i y_i \Pr(Y = y_i \mid X = x)$ is the expectation of $Y$ for a given value $x$ of $X$, i.e., it is a function of $X$
  - In general, for any function $f(X, Y)$: $E_{X,Y}[f(X, Y)] = E_X[E_Y[f(X, Y) \mid X]]$
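What is the Optimal Model f(X)?
- Let $X$ denote a real-valued random input variable and $Y$ a real-valued random output variable.
- The squared error of trained model $f(X)$ is $E_{X,Y}[(Y - f(X))^2]$.
- Which function $f(X)$ will minimize the squared error?
- Consider the error for a specific value of $X$ and let $\bar{Y} = E_Y[Y \mid X]$:
$$
\begin{aligned}
E_Y[(Y - f(X))^2 \mid X] &= E_Y[(Y - \bar{Y} + \bar{Y} - f(X))^2 \mid X] \\
&= E_Y[(Y - \bar{Y})^2 \mid X] + (\bar{Y} - f(X))^2 + 2(\bar{Y} - f(X))\,E_Y[Y - \bar{Y} \mid X] \\
&= E_Y[(Y - \bar{Y})^2 \mid X] + (\bar{Y} - f(X))^2
\end{aligned}
$$
- Notice: $E_Y[Y - \bar{Y} \mid X] = E_Y[Y \mid X] - \bar{Y} = 0$

Optimal Model f(X) (cont.)
- The choice of $f(X)$ does not affect $E_Y[(Y - \bar{Y})^2 \mid X]$, but $(\bar{Y} - f(X))^2$ is minimized for $f(X) = \bar{Y} = E_Y[Y \mid X]$.
- Note that $E_{X,Y}[(Y - f(X))^2] = E_X[E_Y[(Y - f(X))^2 \mid X]]$. Hence
  $E_{X,Y}[(Y - f(X))^2] = E_X\big[E_Y[(Y - \bar{Y})^2 \mid X] + (\bar{Y} - f(X))^2\big]$
- Hence the squared error is minimized by choosing $f(X) = E_Y[Y \mid X]$ for every $X$.
- Notice that for minimizing the absolute error $E_{X,Y}[|Y - f(X)|]$, one can show that the best model is $f(X) = \mathrm{median}(Y \mid X)$.

Interpreting the Result
- To minimize mean squared error, the best prediction for input $X = x$ is the mean of the $Y$-values of all training records $(x(i), y(i))$ with $x(i) = x$
  - E.g., assume there are four training records with $x(i) = 5$. The optimal prediction for input $X = 5$ would be estimated as the sum of their four $y$-values divided by 4.
- Problem: to reliably estimate the mean of $Y$ for a given $X = x$, we need sufficiently many training records with $X = x$. In practice, often there is only one or no training record at all for an $X = x$ of interest.
  - If there were many such records with $X = x$, we would not need a model and could just return the average $Y$ for that $X = x$.
- The benefit of a good data mining technique is its ability to interpolate and extrapolate from known training records to make good predictions even for $X$-values that do not occur in the training data at all.
- Classification for two classes: encode as 0 and 1, use squared error as before
  - Then $f(X) = E[Y \mid X = x] = 1 \cdot \Pr(Y = 1 \mid X = x) + 0 \cdot \Pr(Y = 0 \mid X = x) = \Pr(Y = 1 \mid X = x)$
- Classification for $k$ classes: can show that for 0-1 loss (error = 0 if correct class, error = 1 if wrong class predicted) the optimal choice is to return the majority class for a given input $X = x$
  - This is called the Bayes classifier.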
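The two results above (conditional mean for squared error, majority class for 0-1 loss) can be estimated directly from training records whenever enough records share the same input value. The sketch below uses made-up records and function names; nothing in it comes from the slides except the two estimation rules.

```python
from collections import Counter, defaultdict
from statistics import mean

def fit_conditional_mean(records):
    """Estimate E[Y | X = x] by averaging the y-values of all records with that x."""
    by_x = defaultdict(list)
    for x, y in records:
        by_x[x].append(y)
    return {x: mean(ys) for x, ys in by_x.items()}

def fit_majority_class(records):
    """Estimate the Bayes classifier by returning the most frequent class for each x."""
    by_x = defaultdict(Counter)
    for x, label in records:
        by_x[x][label] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in by_x.items()}

# Hypothetical training records (x, y); the values are illustrative only.
numeric_records = [(5, 2.0), (5, 4.0), (5, 6.0), (5, 8.0), (7, 1.0)]
labeled_records = [(5, "a"), (5, "a"), (5, "b"), (7, "b")]

print(fit_conditional_mean(numeric_records))
print(fit_majority_class(labeled_records))
```

Neither estimator can say anything about an input value that never occurs in the training data, which is exactly the problem described above.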
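Implications for Trees
- Since there are not enough, or none at all, training records with $X = x$, the output for input $X = x$ has to be based on records "in the neighborhood"
  - A tree leaf corresponds to a multi-dimensional range in the data space
  - Records in the same leaf are neighbors of each other
- Solution: estimate the mean of $Y$ for input $X = x$ from the training records in the same leaf node that contains input $X = x$
  - Classification: leaf returns the majority class or class probabilities (estimated from the fraction of training records in the leaf)
  - Prediction: leaf returns the average of the $Y$-values or fits a local model
  - Make sure there are enough training records in the leaf to obtain reliable estimates

Bias-Variance Tradeoff
- Let's take this one step further and see if we can understand overfitting through statistical decision theory
- As before, consider two random variables $X$ and $Y$
- From a training set $D$ with $n$ records, we want to construct a function $f(X)$ that returns good approximations of $Y$ for future inputs $X$
  - Make the dependence of $f$ on $D$ explicit by writing $f(X; D)$
- Goal: minimize mean squared error over all $X$, $Y$, and $D$, i.e., $E_{X,D,Y}[(Y - f(X; D))^2]$

Bias-Variance Tradeoff Derivation
$$E_{X,D,Y}[(Y - f(X;D))^2] = E_X\Big[E_D\big[E_Y[(Y - f(X;D))^2 \mid X, D]\big]\Big]$$
Now consider the inner term:
$$
\begin{aligned}
E_D\big[E_Y[(Y - f(X;D))^2 \mid X, D]\big] &= E_D\big[E_Y[(Y - E[Y \mid X])^2 \mid X] + (f(X;D) - E[Y \mid X])^2\big] \\
&= E_Y[(Y - E[Y \mid X])^2 \mid X] + E_D[(f(X;D) - E[Y \mid X])^2]
\end{aligned}
$$
(Same derivation as before for the optimal function $f(X)$. The first term does not depend on $D$, hence $E_D$ of it is the term itself.)
Consider the second term:
$$
\begin{aligned}
E_D[(f(X;D) - E[Y \mid X])^2] &= E_D[(f(X;D) - E_D[f(X;D)] + E_D[f(X;D)] - E[Y \mid X])^2] \\
&= E_D[(f(X;D) - E_D[f(X;D)])^2] + (E_D[f(X;D)] - E[Y \mid X])^2 \\
&\quad + 2\,(E_D[f(X;D)] - E[Y \mid X])\,E_D[f(X;D) - E_D[f(X;D)]] \\
&= E_D[(f(X;D) - E_D[f(X;D)])^2] + (E_D[f(X;D)] - E[Y \mid X])^2
\end{aligned}
$$
(The third term is zero, because $E_D[f(X;D) - E_D[f(X;D)]] = 0$.)
Overall we therefore obtain:
$$E_{X,D,Y}[(Y - f(X;D))^2] = E_X\Big[(E_D[f(X;D)] - E[Y \mid X])^2 + E_D[(f(X;D) - E_D[f(X;D)])^2] + E_Y[(Y - E[Y \mid X])^2 \mid X]\Big]$$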
Bias-Variance Tradeoff and Overfitting
- $(E_D[f(X;D)] - E[Y \mid X])^2$: bias
- $E_D[(f(X;D) - E_D[f(X;D)])^2]$: variance
- $E_Y[(Y - E[Y \mid X])^2 \mid X]$: irreducible error (does not depend on $f$ and is simply the variance of $Y$ given $X$)
- Option 1: $f(X;D) = E[Y \mid X, D]$
  - Bias: since $E_D[E[Y \mid X, D]] = E[Y \mid X]$, bias is zero
  - Variance: $E_D[(E[Y \mid X, D] - E_D[E[Y \mid X, D]])^2] = E_D[(E[Y \mid X, D] - E[Y \mid X])^2]$ can be very large since $E[Y \mid X, D]$ depends heavily on $D$
  - Might overfit!
- Option 2: $f(X;D) = X$ (or another function independent of $D$)
  - Variance: $E_D[(X - E_D[X])^2] = E_D[(X - X)^2] = 0$
  - Bias: $(E_D[X] - E[Y \mid X])^2 = (X - E[Y \mid X])^2$ can be large, because $E[Y \mid X]$ might be completely different from $X$
  - Might underfit!
- Find the best compromise between fitting the training data too closely (option 1) and completely ignoring it (option 2)

Implications for Trees
- Bias decreases as the tree becomes larger
  - A larger tree can fit the training data better
- Variance increases as the tree becomes larger
  - Sample variance affects the predictions of a larger tree more
- Find the right tradeoff as discussed earlier
  - Validation data to find the best pruned tree
  - MDL principle

Nearest Neighbor

Lazy vs. Eager Learning
- Lazy learning: simply stores the training data (or does only minor processing) and waits until it is given a test record
- Eager learning: given a training set, constructs a classification model before receiving new (test) data to classify
- General trend: lazy = faster training, slower predictions
- Accuracy: not clear which one is better!
  - Lazy method: typically driven by local decisions
  - Eager method: driven by global and local decisions

Nearest-Neighbor
- Recall our statistical decision theory analysis: the best prediction for input $X = x$ is the mean of the $Y$-values of all records $(x(i), y(i))$ with $x(i) = x$ (majority class for classification)
- Problem was to estimate $E[Y \mid X = x]$ or the majority class for $X = x$ from the training data
- Solution was to approximate it
  - Use $Y$-values from training records in the neighborhood around $X = x$

Nearest-Neighbor Classifiers
[Figure: an unknown tuple surrounded by stored records of two classes.]
- Requires:
  - Set of stored records
  - Distance metric for pairs of records
    - Common choice: Euclidean, $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
  - Parameter $k$: number of nearest neighbors to retrieve
- To classify a record:
  - Find its $k$ nearest neighbors
  - Determine the output based on the (distance-weighted) average of the neighbors' output
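Definition of Nearest Neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x.]
- The k-nearest neighbors of a record x are the data points that have the k smallest distances to x

1-Nearest Neighbor
[Figure: Voronoi diagram.]

Nearest Neighbor Classification
- Choosing the value of k:
  - k too small: sensitive to noise points
  - k too large: neighborhood may include points from other classes

Effect of Changing k
[Figure: classification results for different values of k. Source: Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning.]

Explaining the Effect of k
- Recall the bias-variance tradeoff
- Small k, i.e., predictions based on few neighbors
  - High variance, low bias
- Large k, e.g., average over the entire data set
  - Low variance, but high bias
- Need to find the k that achieves the best tradeoff
- Can do that using validation data

Experiment
- 50 training points (x, y)
  - x selected uniformly at random
  - y is a function of x plus noise ε, where ε is selected uniformly at random from the range [-0.5, 0.5]
- Test data sets: 500 points from the same distribution as the training data, but ε = 0
- Plot 1: all (x, NN1(x)) for 5 test sets
- Plot 2: all (x, AVG(NN1(x))), averaged over the test data sets
- The same plots are produced for larger values of k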
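The nearest-neighbor prediction and the role of k discussed above can be sketched as follows. This is a brute-force illustration with made-up data; the function names and the unweighted average are choices made for the example (the slides also mention distance-weighted averaging, which is not shown here).

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def k_nearest(train, query, k):
    """Return the k training records (point, output) closest to the query point."""
    return sorted(train, key=lambda record: dist(record[0], query))[:k]

def knn_classify(train, query, k):
    """Majority class among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Unweighted average of the outputs of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(y for _, y in neighbors) / len(neighbors)

# Hypothetical data for illustration.
labeled = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
           ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
numeric = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]

print(knn_classify(labeled, (0.15, 0.1), k=3))
print(knn_regress(numeric, (1.4,), k=2))  # small k: follows the local values
print(knn_regress(numeric, (1.4,), k=4))  # k = all records: global average
```

With k equal to the size of the training set, every query receives the same prediction, which is the low-variance, high-bias extreme described above.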
[Figures: six plots showing the results of the experiment. Each plot repeats the decomposition below.]
- $(E_D[f(X;D)] - E[Y \mid X])^2$: bias
- $E_D[(f(X;D) - E_D[f(X;D)])^2]$: variance
- $E_Y[(Y - E[Y \mid X])^2 \mid X]$: irreducible error (does not depend on $f$ and is simply the variance of $Y$ given $X$)
Scaling Issues
- Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
- Example:
  - Height of a person may vary from 1.5m to 1.8m
  - Weight of a person may vary from 90lb to 300lb
  - Income of a person may vary from $10K to $1M
  - Income difference would dominate record distance

Other Problems
- Problem with the Euclidean measure:
  - High dimensional data: curse of dimensionality
  - Can produce counter-intuitive results: two binary vectors that agree in almost every position and two binary vectors that share no 1s at all can have the same distance, d = 1.4142
  - Solution: normalize the vectors to unit length
- Irrelevant attributes might dominate distance
  - Solution: eliminate them

Computational Cost
- Brute force: O(#trainingRecords)
  - For each training record, compute the distance to the test record, keep it if among the top-k
- Pre-compute the Voronoi diagram (expensive), then search a spatial index of Voronoi cells: if lucky, O(log(#trainingRecords))
- Store training records in a multi-dimensional search tree, e.g., R-tree: if lucky, O(log(#trainingRecords))
- Bulk-compute predictions for many test records using a spatial join between training and test set
  - Same worst-case cost as one-by-one predictions, but usually much faster in practice

Bayesian Classification
- Performs probabilistic prediction, i.e., predicts class membership probabilities
- Based on Bayes' Theorem
- Incremental training
  - Update probabilities as new training records arrive
  - Can combine prior knowledge with observed data
- Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
- $X$ = random variable for data records ("evidence")
- $H$ = hypothesis that specific record $X = x$ belongs to class $C$
- Goal: determine $P(H \mid X = x)$
  - Probability that the hypothesis holds given a record $x$
- $P(H)$ = prior probability
  - The initial probability of the hypothesis
  - E.g., person $x$ will buy a computer, regardless of age, income etc.
- $P(X = x)$ = probability that data record $x$ is observed
- $P(X = x \mid H)$ = probability of observing record $x$, given that the hypothesis holds
  - E.g., given that $x$ will buy a computer, what is the probability that $x$ is in age group 31...40, has medium income, etc.?
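Bayes' Theorem
- Given data record $x$, the posterior probability of a hypothesis $H$, $P(H \mid X = x)$, follows from Bayes' theorem:
  $P(H \mid X = x) = \dfrac{P(X = x \mid H)\,P(H)}{P(X = x)}$
- Informally: posterior = likelihood * prior / evidence
- Among all candidate hypotheses $H$, find the maximally probable one, called the maximum a posteriori (MAP) hypothesis
- Note: $P(X = x)$ is the same for all hypotheses
- If all hypotheses are equally probable a priori, we only need to compare $P(X = x \mid H)$
  - The winning hypothesis is called the maximum likelihood (ML) hypothesis
- Practical difficulties: requires initial knowledge of many probabilities and has high computational cost

Towards Naïve Bayes Classifier
- Suppose there are $m$ classes $C_1, C_2, \dots, C_m$
- Classification goal: for record $x$, find the class $C_i$ that has the maximum posterior probability $P(C_i \mid X = x)$
- Bayes' theorem:
  $P(C_i \mid X = x) = \dfrac{P(X = x \mid C_i)\,P(C_i)}{P(X = x)}$
- Since $P(X = x)$ is the same for all classes, we only need to find the maximum of $P(X = x \mid C_i)\,P(C_i)$

Computing P(X=x | C_i) and P(C_i)
- Estimate $P(C_i)$ by counting the frequency of class $C_i$ in the training data
- Can we do the same for $P(X = x \mid C_i)$?
  - Need a very large set of training data
  - Have $|X_1| \cdot |X_2| \cdots |X_n| \cdot m$ different combinations of possible values for $X$ and $C_i$
  - Need to see every instance $x$ many times to obtain reliable estimates
- Solution: decompose into lower-dimensional problems

Example: Computing P(X=x | C_i) and P(C_i)
- P(buys_computer = yes) = 9/14
- P(buys_computer = no) = 5/14
- P(age > 40, income = low, student = no, credit_rating = bad | buys_computer = yes) = 0?
Age | Income | Student | Credit_rating | Buys_computer
≤ 30 | High | No | Bad | No
≤ 30 | High | No | Good | No
31...40 | High | No | Bad | Yes
> 40 | Medium | No | Bad | Yes
> 40 | Low | Yes | Bad | Yes
> 40 | Low | Yes | Good | No
31...40 | Low | Yes | Good | Yes
≤ 30 | Medium | No | Bad | No
≤ 30 | Low | Yes | Bad | Yes
> 40 | Medium | Yes | Bad | Yes
≤ 30 | Medium | Yes | Good | Yes
31...40 | Medium | No | Good | Yes
31...40 | High | Yes | Bad | Yes
> 40 | Medium | No | Good | No

Conditional Independence
- $X$, $Y$, $Z$ random variables
- $X$ is conditionally independent of $Y$, given $Z$, if $P(X \mid Y, Z) = P(X \mid Z)$
  - Equivalent to: $P(X, Y \mid Z) = P(X \mid Z) \cdot P(Y \mid Z)$
- Example: people with longer arms read better
  - Confounding factor: age
    - A young child has shorter arms and lacks the reading skills of an adult
  - If age is fixed, the observed relationship between arm length and reading skills disappears

Derivation of Naïve Bayes Classifier
- Simplifying assumption: all input attributes are conditionally independent, given the class
  $P(X = (x_1, \dots, x_n) \mid C_i) = \prod_{k=1}^{n} P(X_k = x_k \mid C_i) = P(X_1 = x_1 \mid C_i) \cdot P(X_2 = x_2 \mid C_i) \cdots P(X_n = x_n \mid C_i)$
- Each $P(X_k = x_k \mid C_i)$ can be estimated robustly
  - If $X_k$ is a categorical attribute
    - $P(X_k = x_k \mid C_i)$ = #records in $C_i$ that have value $x_k$ for $X_k$, divided by #records of class $C_i$ in the training data set
  - If $X_k$ is continuous, we could discretize it
    - Problem: interval selection
      - Too many intervals: too few training cases per interval
      - Too few intervals: limited choices for the decision boundary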
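The counting estimates above translate directly into code. The sketch below scores each class with $P(X = x \mid C_i)\,P(C_i)$ under the independence assumption, using the buys_computer table. The tuple encoding and function names are assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter
from math import prod

# (age, income, student, credit_rating, buys_computer), transcribed from the example table
DATA = [
    ("<=30", "high", "no", "bad", "no"), ("<=30", "high", "no", "good", "no"),
    ("31..40", "high", "no", "bad", "yes"), (">40", "medium", "no", "bad", "yes"),
    (">40", "low", "yes", "bad", "yes"), (">40", "low", "yes", "good", "no"),
    ("31..40", "low", "yes", "good", "yes"), ("<=30", "medium", "no", "bad", "no"),
    ("<=30", "low", "yes", "bad", "yes"), (">40", "medium", "yes", "bad", "yes"),
    ("<=30", "medium", "yes", "good", "yes"), ("31..40", "medium", "no", "good", "yes"),
    ("31..40", "high", "yes", "bad", "yes"), (">40", "medium", "no", "good", "no"),
]

def joint_estimate(rows, x, c):
    """Direct estimate of P(X = x | C = c): fraction of class-c records equal to x."""
    in_class = [r for r in rows if r[-1] == c]
    return sum(1 for r in in_class if r[:-1] == x) / len(in_class)

def naive_bayes_scores(rows, x):
    """Score P(X = x | C) * P(C) per class, with per-attribute counting estimates."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for c, n_c in class_counts.items():
        prior = n_c / len(rows)
        likelihood = prod(
            sum(1 for r in rows if r[-1] == c and r[k] == x[k]) / n_c
            for k in range(len(x))
        )
        scores[c] = prior * likelihood
    return scores

x = ("<=30", "medium", "yes", "bad")
scores = naive_bayes_scores(DATA, x)
prediction = max(scores, key=scores.get)
print(scores, prediction)
print(joint_estimate(DATA, (">40", "low", "no", "bad"), "yes"))
```

The last line shows why the decomposition is needed: with only 14 records, the direct estimate for a full attribute combination is zero simply because that combination never occurs among the "yes" records.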
Estimating P(X_k=x_k | C_i) for Continuous Attributes without Discretization
- $P(X_k = x_k \mid C_i)$ is computed based on a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$:
  $g(x, \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
  as
  $P(X_k = x_k \mid C_i) = g(x_k, \mu_{k,C_i}, \sigma_{k,C_i})$
- Estimate $\mu_{k,C_i}$ from the sample mean of attribute $X_k$ for all training records of class $C_i$
- Estimate $\sigma_{k,C_i}$ similarly from the sample

Naïve Bayes Example
- Classes:
  - $C_1$: buys_computer = yes
  - $C_2$: buys_computer = no
- Data sample $x$: age ≤ 30, income = medium, student = yes, and credit_rating = bad
- Training data: the 14-record buys_computer data set (Age, Income, Student, Credit_rating, Buys_computer) used in the information gain example

Naïve Bayesian Computation
- Compute $P(C_i)$ for each class:
  - P(buys_computer = yes) = 9/14 = 0.643
  - P(buys_computer = no) = 5/14 = 0.357
- Compute $P(X_k = x_k \mid C_i)$ for each class:
  - P(age ≤ 30 | buys_computer = yes) = 2/9 = 0.222
  - P(age ≤ 30 | buys_computer = no) = 3/5 = 0.6
  - P(income = medium | buys_computer = yes) = 4/9 = 0.444
  - P(income = medium | buys_computer = no) = 2/5 = 0.4
  - P(student = yes | buys_computer = yes) = 6/9 = 0.667
  - P(student = yes | buys_computer = no) = 1/5 = 0.2
  - P(credit_rating = bad | buys_computer = yes) = 6/9 = 0.667
  - P(credit_rating = bad | buys_computer = no) = 2/5 = 0.4
- Compute $P(X = x \mid C_i)$ using the Naïve Bayes assumption:
  - P(≤ 30, medium, yes, bad | buys_computer = yes) = 0.222 * 0.444 * 0.667 * 0.667 = 0.044
  - P(≤ 30, medium, yes, bad | buys_computer = no) = 0.6 * 0.4 * 0.2 * 0.4 = 0.019
- Compute the final result $P(X = x \mid C_i) \cdot P(C_i)$:
  - P(X = x | buys_computer = yes) * P(buys_computer = yes) = 0.028
  - P(X = x | buys_computer = no) * P(buys_computer = no) = 0.007
- Therefore we predict buys_computer = yes for input x = (age ≤ 30, income = medium, student = yes, credit_rating = bad)

Zero-Probability Problem
- Naïve Bayesian prediction requires each conditional probability to be non-zero (why?)
  $P(X = (x_1, \dots, x_n) \mid C_i) = \prod_{k=1}^{n} P(X_k = x_k \mid C_i) = P(X_1 = x_1 \mid C_i) \cdot P(X_2 = x_2 \mid C_i) \cdots P(X_n = x_n \mid C_i)$
- Example: 1000 records for buys_computer = yes with income = low (0), income = medium (990), and income = high (10)
  - For an input with income = low, the conditional probability is zero
- Use the Laplacian correction (or Laplace estimator) by adding 1 dummy record to each income level
  - Prob(income = low) = 1/1003
  - Prob(income = medium) = 991/1003
  - Prob(income = high) = 11/1003
  - The "corrected" probability estimates are close to their "uncorrected" counterparts, but none is zero

Naïve Bayesian Classifier: Comments
- Easy to implement
- Good results obtained in many cases
  - Robust to isolated noise points
  - Handles missing values by ignoring the instance during probability estimate calculations
  - Robust to irrelevant attributes
- Disadvantages
  - Assumption: class conditional independence, therefore loss of accuracy
  - Practically, dependencies exist among variables
- How to deal with these dependencies?

Probabilities
- Summary of elementary probability facts we have used already and/or will need soon
- Let $X$ be a random variable as usual
- Let $A$ be some predicate over its possible values
  - $A$ is true for some values of $X$, false for others
  - E.g., $X$ is the outcome of a throw of a die, $A$ could be "value is greater than 4"
- $P(A)$ is the fraction of possible worlds in which $A$ is true
  - P(die value is greater than 4) = 2/6 = 1/3
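Axioms
- $0 \le P(A) \le 1$
- $P(\text{True}) = 1$
- $P(\text{False}) = 0$
- $P(A \lor B) = P(A) + P(B) - P(A \land B)$

Theorems from the Axioms
- $0 \le P(A) \le 1$, $P(\text{True}) = 1$, $P(\text{False}) = 0$
- $P(A \lor B) = P(A) + P(B) - P(A \land B)$
- From these we can prove:
  - $P(\text{not } A) = P(\lnot A) = 1 - P(A)$
  - $P(A) = P(A \land B) + P(A \land \lnot B)$

Conditional Probability
- $P(A \mid B)$ = fraction of worlds in which $B$ is true that also have $A$ true
- H = "Have a headache", F = "Coming down with Flu"
- P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2
- "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
[Figure: Venn diagram of the events F and H.]

Definition of Conditional Probability
- $P(A \mid B) = \dfrac{P(A \land B)}{P(B)}$
- Corollary: the Chain Rule
  - $P(A \land B) = P(A \mid B)\,P(B)$

Multivalued Random Variables
- Suppose $X$ can take on more than 2 values
- $X$ is a random variable with arity $k$ if it can take on exactly one value out of $\{v_1, v_2, \dots, v_k\}$
- Thus
  - $P(X = v_i \land X = v_j) = 0$ if $i \ne j$
  - $P(X = v_1 \lor X = v_2 \lor \dots \lor X = v_k) = 1$

Easy Fact about Multivalued Random Variables
- Using the axioms of probability
  - $0 \le P(A) \le 1$, $P(\text{True}) = 1$, $P(\text{False}) = 0$
  - $P(A \lor B) = P(A) + P(B) - P(A \land B)$
- And assuming that $X$ obeys
  - $P(X = v_i \land X = v_j) = 0$ if $i \ne j$
  - $P(X = v_1 \lor X = v_2 \lor \dots \lor X = v_k) = 1$
- We can prove that
  - $P(X = v_1 \lor X = v_2 \lor \dots \lor X = v_i) = \sum_{j=1}^{i} P(X = v_j)$
- And therefore:
  - $\sum_{j=1}^{k} P(X = v_j) = 1$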
Useful Easy-to-prove Facts
- $P(A \mid B) + P(\lnot A \mid B) = 1$
- $\sum_{j=1}^{k} P(X = v_j \mid B) = 1$

The Joint Distribution
- Example: Boolean variables A, B, C
- Recipe for making a joint distribution of $n$ variables:
  1. Make a truth table listing all combinations of values of your variables (it has $2^n$ rows for $n$ Boolean variables).
  2. For each combination of values, say how probable it is.
  3. If you subscribe to the axioms of probability, those numbers must sum to 1.
[Table: truth table with columns A, B, C, Prob, one row per combination of values.]
[Figure: Venn diagram of A, B, C with the row probabilities written into the regions.]

Using the Joint Dist.
- Once you have the JD you can ask for the probability of any logical expression $E$ involving your attributes:
  $P(E) = \sum_{\text{rows matching } E} P(\text{row})$
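Using the Joint Dist.
- P(Poor ∧ Male) is the sum of the probabilities of all rows matching "Poor ∧ Male":
  $P(E) = \sum_{\text{rows matching } E} P(\text{row})$
- P(Poor) is computed in the same way from all rows matching "Poor"
[Table: joint distribution over the attributes gender, hours_worked, and wealth.]

Inference with the Joint Dist.
- $P(E_1 \mid E_2) = \dfrac{P(E_1 \land E_2)}{P(E_2)} = \dfrac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_2} P(\text{row})}$
- E.g., P(Male | Poor) = P(Male ∧ Poor) / P(Poor)

Joint Distributions
- Good news: once you have a joint distribution, you can answer important questions that involve uncertainty.
- Bad news: impossible to create a joint distribution for more than about ten attributes because there are so many numbers needed when you build it.

What Would Help?
- Full independence
  - P(gender = g ∧ hours_worked = h ∧ wealth = w) = P(gender = g) * P(hours_worked = h) * P(wealth = w)
  - Can reconstruct the full joint distribution from a few marginals
- Full conditional independence given class value
  - Naïve Bayes
- What about something between Naïve Bayes and the general joint distribution?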
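Bayesian Belief Networks
- Subset of the variables conditionally independent
- Graphical model of causal relationships
  - Represents dependency among the variables
  - Gives a specification of the joint probability distribution
[Figure: network with nodes X, Y, Z, P.]
- Nodes: random variables
- Links: dependency
- X and Y are the parents of Z, and Y is the parent of P
- Given Y, Z and P are independent
- Has no loops or cycles

Bayesian Network Properties
- Each variable is conditionally independent of its non-descendants in the graph, given its parents
- Naïve Bayes as a Bayesian network:
[Figure: class node with one outgoing link to each input attribute.]

General Properties
- $P(X_1, X_2, X_3) = P(X_1 \mid X_2, X_3)\,P(X_2 \mid X_3)\,P(X_3)$
- $P(X_1, X_2, X_3) = P(X_3 \mid X_1, X_2)\,P(X_2 \mid X_1)\,P(X_1)$
- The network does not necessarily reflect causality
[Figure: two different networks over X1, X2, X3 that encode the same joint distribution.]

Structural Property
- Missing links simplify the computation of $P(X_1, X_2, \dots, X_n)$
- General: $P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_{i-1}, X_{i-2}, \dots, X_1)$
  - Fully connected: link between every pair of nodes
- Given network: $P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$
  - Some links are missing
  - The terms $P(X_i \mid \mathrm{parents}(X_i))$ are given as conditional probability tables (CPT) in the network
- A sparse network allows better estimation of the CPTs (fewer combinations of parent values, hence more reliable to estimate from limited data) and faster computation

Small Example
- S: Student studies a lot for the course
- L: Student learns a lot and gets a good grade
- J: Student gets a great job
- Network: S → L → J
- P(S) = 0.4
- P(L | S) = 0.9, P(L | ~S) = 0.2
- P(J | L) = 0.8, P(J | ~L) = 0.3

Computing P(S | J)
- Probability that a student who got a great job was doing her homework
- P(S | J) = P(S, J) / P(J)
- P(S, J) = P(S, J, L) + P(S, J, ~L)
- P(J) = P(J, S, L) + P(J, S, ~L) + P(J, ~S, L) + P(J, ~S, ~L)
- P(J, L, S) = P(J | L, S) * P(L, S) = P(J | L) * P(L | S) * P(S) = 0.8 * 0.9 * 0.4
- P(J, ~L, S) = P(J | ~L, S) * P(~L, S) = P(J | ~L) * P(~L | S) * P(S) = 0.3 * (1 - 0.9) * 0.4
- P(J, L, ~S) = P(J | L, ~S) * P(L, ~S) = P(J | L) * P(L | ~S) * P(~S) = 0.8 * 0.2 * (1 - 0.4)
- P(J, ~L, ~S) = P(J | ~L, ~S) * P(~L, ~S) = P(J | ~L) * P(~L | ~S) * P(~S) = 0.3 * (1 - 0.2) * (1 - 0.4)
- Putting this all together, we obtain:
  P(S | J) = (0.8 * 0.9 * 0.4 + 0.3 * 0.1 * 0.4) / (0.8 * 0.9 * 0.4 + 0.3 * 0.1 * 0.4 + 0.8 * 0.2 * 0.6 + 0.3 * 0.8 * 0.6) = 0.3 / 0.54 = 0.56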
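The computation of P(S | J) can be reproduced by enumerating the joint distribution that the network defines. This is a minimal sketch of inference by enumeration for this three-node chain only; the helper names are assumptions made for the example.

```python
from itertools import product

# Conditional probability tables of the small example (network S -> L -> J).
P_S = 0.4
P_L_GIVEN_S = {True: 0.9, False: 0.2}
P_J_GIVEN_L = {True: 0.8, False: 0.3}

def bernoulli(p_true: float, value: bool) -> float:
    """Probability of `value` for a Boolean variable that is True with probability p_true."""
    return p_true if value else 1.0 - p_true

def joint(s: bool, l: bool, j: bool) -> float:
    """P(S=s, L=l, J=j) = P(J | L) * P(L | S) * P(S), using the network structure."""
    return (bernoulli(P_J_GIVEN_L[l], j)
            * bernoulli(P_L_GIVEN_S[s], l)
            * bernoulli(P_S, s))

# Marginalize out the variables that are not part of the query.
p_j = sum(joint(s, l, True) for s, l in product([True, False], repeat=2))
p_s_and_j = sum(joint(True, l, True) for l in [True, False])
p_s_given_j = p_s_and_j / p_j
print(round(p_s_and_j, 3), round(p_j, 3), round(p_s_given_j, 2))
```

Enumeration touches every combination of the unobserved variables, so its cost grows exponentially with the number of variables; it is only practical for very small networks like this one.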
22 More Complex Example
T: The lecture started on time. L: The lecturer arrives late. R: The lecture concerns data mining. M: The lecturer is Mike. S: It is snowing.
[Figure: network with links S -> L, M -> L, M -> R, L -> T]
P(S) = 0.3; P(M) = 0.6; P(L | M, S) = 0.05, P(L | M, ~S) = 0.1, P(L | ~M, S) = 0.1, P(L | ~M, ~S) = 0.2; P(R | M) = 0.3, P(R | ~M) = 0.6; P(T | L) = 0.3, P(T | ~L) = 0.8.

Computing with Bayes Net
P(T, ~R, L, ~M, S) = P(T | L) * P(~R | ~M) * P(L | ~M, S) * P(~M) * P(S)

Computing with Bayes Net
P(R | T, ~S) = P(R, T, ~S) / P(T, ~S)
P(R, T, ~S) = P(L, M, R, T, ~S) + P(~L, M, R, T, ~S) + P(L, ~M, R, T, ~S) + P(~L, ~M, R, T, ~S)
Compute P(T, ~S) similarly. Problem: There are now 8 such terms to be computed.

Inference with Bayesian Networks
Can predict the probability for any attribute, given any subset of the other attributes: P(M | L, R), P(T | S, ~M, R) and so on.
Easy case: P(Xi | Xj1, Xj2, ..., Xjk) where parents(Xi) ⊆ {Xj1, Xj2, ..., Xjk}. The answer can be read directly from Xi's CPT.
What if values are not given for all parents of Xi? Exact inference of probabilities in general for an arbitrary Bayesian network is NP-hard. Solutions: probabilistic inference, trading precision for efficiency.

Training Bayesian Networks
Several scenarios:
Network structure known, all variables observable: learn only the CPTs.
Network structure known, some hidden variables: gradient descent (greedy hill-climbing) method, analogous to neural network learning.
Network structure unknown, all variables observable: search through the model space to reconstruct the network topology.
Unknown structure, all hidden variables: no good algorithms known for this purpose.
Ref.: D. Heckerman: Bayesian networks for data mining
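23 Basic Building Block: Perceptron
[Figure: input vector x = (x1, ..., xd), weight vector w = (w1, ..., wd), bias b, weighted sum, activation function, output y]
b is called the bias. For example: $f(\mathbf{x}) = \text{sign}\left(b + \sum_{i=1}^{d} w_i x_i\right)$

Perceptron Decision Hyperplane
Input: {(x1, x2, y), ...}
Output: classification function f(x): if f(x) > 0, return +1; if f(x) ≤ 0, return -1.
Decision hyperplane: b + w·x = 0. [Figure: line b + w1 x1 + w2 x2 = 0 in the (x1, x2) plane]
Note: b + w·x > 0 if and only if w·x > -b, so b represents a threshold for when the perceptron fires.

Representing Boolean Functions
AND with two-input perceptron: b = -0.8, w1 = w2 = 0.5
OR with two-input perceptron: b = -0.3, w1 = w2 = 0.5
m-of-n function: true if at least m out of n inputs are true. All input weights are 0.5, and the threshold weight b is set according to m and n.
Can also represent NAND, NOR. What about XOR?

Perceptron Training Rule
Goal: correct +1/-1 output for each training record.
Start with random weights and a constant η (learning rate).
While some training records are still incorrectly classified, do:
For each training record (x, y):
Let f_old(x) be the output of the current perceptron for x.
Set b := b + Δb, where Δb = η (y - f_old(x)).
For all i, set w_i := w_i + Δw_i, where Δw_i = η (y - f_old(x)) x_i.
Converges to the correct decision boundary if the classes are linearly separable and a small enough η is used.

Gradient Descent
If the training records are not linearly separable, find the best-fit approximation.
Use gradient descent to search the space of possible weight vectors. This is the basis for the Backpropagation algorithm.
Consider the un-thresholded perceptron (no sign function applied), i.e., u(x) = b + w·x.
Measure the training error by squared error: $E(b, \mathbf{w}) = \frac{1}{2} \sum_{(\mathbf{x}, y) \in D} (y - u(\mathbf{x}))^2$, where D = training data.

Gradient Descent Rule
Find the weight vector that minimizes E(b, w) by altering it in the direction of steepest descent.
Set (b, w) := (b, w) + Δ(b, w), where Δ(b, w) = -η ∇E(b, w).
$\nabla E(b, \mathbf{w}) = [\partial E / \partial b, \partial E / \partial w_1, \dots, \partial E / \partial w_n]$ is the gradient, hence
$b := b + \eta \sum_{(\mathbf{x}, y) \in D} (y - u(\mathbf{x}))$
$w_i := w_i + \eta \sum_{(\mathbf{x}, y) \in D} (y - u(\mathbf{x})) \, x_i$
Start with random weights and iterate until convergence. Will converge to the global minimum if η is small enough.
[Figure: error surface E(w1, w2)]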
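The perceptron training rule and the Boolean-function weights above can be tried out with a short sketch. Several choices here are assumptions made for a reproducible example rather than part of the slides: inputs are encoded as 0/1, the weights start at zero instead of random values, and η = 0.5 is used so that every update is an exact integer step.

```python
def predict(b, w, x):
    """Thresholded perceptron: +1 if b + w.x > 0, otherwise -1."""
    u = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if u > 0 else -1


def train_perceptron(records, eta=0.5, max_epochs=100):
    """Perceptron training rule: update b and w after every misclassified record."""
    b, w = 0.0, [0.0] * len(records[0][0])   # assumption: zero start, not random
    for _ in range(max_epochs):
        errors = 0
        for x, y in records:
            delta = eta * (y - predict(b, w, x))   # 0 if the record is correct
            if delta != 0:
                errors += 1
                b += delta
                w = [wi + delta * xi for wi, xi in zip(w, x)]
        if errors == 0:          # every record classified correctly
            break
    return b, w


# AND and OR over 0/1 inputs, with +1/-1 class labels.
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
OR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

b_and, w_and = train_perceptron(AND)
learned_and_ok = all(predict(b_and, w_and, x) == y for x, y in AND)

# The hand-set weights from the slide also work under the 0/1 encoding.
slide_and_ok = all(predict(-0.8, [0.5, 0.5], x) == y for x, y in AND)
slide_or_ok = all(predict(-0.3, [0.5, 0.5], x) == y for x, y in OR)
print(learned_and_ok, slide_and_ok, slide_or_ok)
```

Running the same loop on XOR never reaches zero errors, because no single hyperplane separates its classes.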
24 Gradient Descent Summary
Epoch updating (batch mode): compute the gradient over the entire training set; changes the model once per scan of the entire training set.
Case updating (incremental mode, stochastic gradient descent): compute the gradient for a single training record; changes the model immediately after every single training record.
Case updating can approximate epoch updating arbitrarily closely if η is small enough.
What is the difference between the perceptron training rule and case updating for gradient descent? Error computation on the thresholded vs. the unthresholded function.

Multilayer Feedforward Networks
Use another perceptron to combine the output of the lower layer.
What about linear units only? They can only construct linear functions! A nonlinear component is needed.
sign function: not differentiable (gradient descent!)
Use the sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Perceptron function: $y = \frac{1}{1 + e^{-(b + \mathbf{w} \cdot \mathbf{x})}}$
[Figure: output layer, hidden layer, input layer; plot of 1/(1+exp(-x))]

1-Hidden Layer ANN Example
[Figure: inputs x1, x2; hidden units v1, v2, v3; output Out]
$v_j = g\left(\sum_{k=1}^{N_{INS}} w_{jk} x_k\right)$ for j = 1, 2, 3, and $Out = g\left(\sum_{k=1}^{N_{HID}} W_k v_k\right)$
g is usually the sigmoid function.

Making Predictions
The input record is fed simultaneously into the units of the input layer. The values are then weighted and fed simultaneously to a hidden layer. Weighted outputs of the last hidden layer are the input to the units in the output layer, which emits the network's prediction.
The network is feed-forward: none of the weights cycles back to an input unit or to an output unit of a previous layer.
Statistical point of view: neural networks perform nonlinear regression.

Backpropagation Algorithm
Earlier discussion: gradient descent for a single perceptron using a simple un-thresholded function.
If a sigmoid (or other differentiable) function is applied to the weighted sum, use the complete function for gradient descent.
Multiple perceptrons: optimize over all weights of all perceptrons. Problems: huge search space, local minima.
Backpropagation:
Initialize all weights with small random values.
Iterate many times:
Compute the gradient, starting at the output and working back. Error of hidden unit h: how do we get the true output value? Use the weighted sum of errors of each unit influenced by h.
Update all weights in the network.

Overfitting
When do we stop updating the weights?
Overfitting tends to happen in later iterations: weights are initially small random values; weights that are all similar give a smooth decision surface; surface complexity increases as the weights diverge.
Preventing overfitting: weight decay (decrease each weight by a small factor during each iteration), or use validation data to decide when to stop iterating.
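The two overfitting remedies named above can be sketched as small helper functions. Both are illustrative assumptions rather than the slides' method: `decay` shrinks every weight by a fixed fraction, and `early_stopping_iteration` picks the iteration with the lowest validation error, giving up after a hypothetical `patience` number of iterations without improvement.

```python
def decay(weights, factor=0.01):
    """Weight decay: shrink each weight by a small factor in every iteration."""
    return [w * (1.0 - factor) for w in weights]


def early_stopping_iteration(validation_errors, patience=3):
    """Return the index of the best iteration, scanning validation errors in order.

    Scanning stops once `patience` iterations in a row have failed to improve
    on the best validation error seen so far.
    """
    best_iter, best_err = 0, float("inf")
    for i, err in enumerate(validation_errors):
        if err < best_err:
            best_iter, best_err = i, err
        elif i - best_iter >= patience:
            break
    return best_iter


# Hypothetical validation errors recorded after each training iteration.
errors = [0.90, 0.60, 0.40, 0.35, 0.37, 0.40, 0.45, 0.30]
stop_at = early_stopping_iteration(errors, patience=3)
shrunk = decay([1.0, -2.0], factor=0.1)
print(stop_at, shrunk)
```

In this made-up sequence the scan stops before it sees the later, lower error at the last position, which shows the trade-off in choosing `patience`.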
25 Neural Network Decision Boundary
[Figure] Source: Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning

Backpropagation Remarks
Computational cost: each iteration costs O(|D| * |w|), with |D| training records and |w| weights. The number of iterations can be exponential in n, the number of inputs (in practice often tens of thousands).
Local minima can trap the gradient descent algorithm: convergence is guaranteed to a local minimum, not the global one.
Backpropagation is highly effective in practice. Many variants exist to deal with the local minima issue, e.g., use of case updating.

Defining a Network
1. Decide the network topology: # input units, # hidden layers, # units per hidden layer, # output units (one output unit per class for problems with > 2 classes).
2. Normalize input values for each attribute to [0.0, 1.0]. Nominal/ordinal attributes: one input unit per domain value. For attribute grade with values A, B, C, have 3 inputs that are set to 1,0,0 for grade A, to 0,1,0 for grade B, and to 0,0,1 for C. Why not map it to a single input with domain [0.0, 1.0]?
3. Choose the learning rate η, e.g., 1 / (# training iterations). Too small: takes too long to converge. Too large: might never converge (oversteps the minimum).
4. Bad results on test data? Change the network topology, initial weights, or learning rate; try again.

Representational Power
Boolean functions: each can be represented by a 2-layer network. The number of hidden units can grow exponentially with the number of inputs. Create a hidden unit for each input record, set its weights to activate only for that input, and implement the output unit as an OR gate that only activates for the desired output patterns.
Continuous functions: every bounded continuous function can be approximated arbitrarily closely by a 2-layer network. Any function can be approximated arbitrarily closely by a 3-layer network.

Neural Network as a Classifier
Weaknesses: long training time; many non-trivial parameters, e.g., network topology; poor interpretability (what is the meaning behind learned weights and hidden units?). Note: hidden units are an alternative representation of the input values, capturing their relevant features.
Strengths: high tolerance to noisy data; well-suited for continuous-valued inputs and outputs; successful on a wide array of real-world data; techniques exist for extraction of rules from neural networks.
26 SVM: Support Vector Machines
Newer and very popular classification method.
Uses a nonlinear mapping to transform the original training data into a higher dimension.
Searches for the optimal separating hyperplane (i.e., decision boundary) in the new dimension.
SVM finds this hyperplane using support vectors (essential training records) and margins (defined by the support vectors).

SVM History and Applications
Vapnik and colleagues (1992). Groundwork from Vapnik & Chervonenkis statistical learning theory in the 1960s.
Training can be slow, but accuracy is high: ability to model complex nonlinear decision boundaries (margin maximization).
Used both for classification and prediction.
Applications: handwritten digit recognition, object recognition, speaker identification, benchmarking time-series prediction tests.

Linear Classifiers
f(x, w, b) = sign(w·x + b)
[Figure: data records of two classes, one symbol denotes +1 and the other denotes -1, shown with several candidate separating lines]
How would you classify this data?
27 Linear Classifiers
f(x, w, b) = sign(w·x + b)
Any of these would be fine... but which is best?

Classifier Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data record.

Maximum Margin
Find the maximum margin linear classifier. This is the simplest kind of SVM, called linear SVM or LSVM.
Support vectors are those data points that the margin pushes up against.

Why Maximum Margin?
If we made a small error in the location of the boundary, this gives us the least chance of causing a misclassification.
The model is immune to removal of any non-support-vector data records.
There is some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
Empirically it works very well.

Specifying a Line and Margin
Plus-plane = { x : w·x + b = +1 }
Minus-plane = { x : w·x + b = -1 }
Classify as +1 if w·x + b ≥ 1, and as -1 if w·x + b ≤ -1. What if -1 < w·x + b < 1?
[Figure: plus-plane, classifier boundary, minus-plane]
28 Computing Margin Width
M = margin width.
Plus-plane = { x : w·x + b = +1 }, Minus-plane = { x : w·x + b = -1 }
Goal: compute M in terms of w and b.
Note: vector w is perpendicular to the plus-plane. Consider two vectors u and v on the plus-plane and show that w·(u - v) = 0. Hence w is also perpendicular to the minus-plane.

Computing Margin Width
Choose an arbitrary point x⁻ on the minus-plane. Let x⁺ be the point in the plus-plane closest to x⁻.
Since vector w is perpendicular to these planes, it holds that x⁺ = x⁻ + λw, for some value of λ.

Putting It All Together
We have so far: w·x⁺ + b = +1 and w·x⁻ + b = -1; x⁺ = x⁻ + λw; |x⁺ - x⁻| = M.
Derivation: w·(x⁻ + λw) + b = +1, hence w·x⁻ + b + λ w·w = 1.
This implies λ w·w = 2, i.e., λ = 2 / (w·w).
Since M = |x⁺ - x⁻| = |λw| = λ|w| = λ (w·w)^0.5, we obtain M = 2 (w·w)^0.5 / (w·w) = 2 / (w·w)^0.5.

Finding the Maximum Margin
How do we find w and b such that the margin is maximized and all training records are in the correct zone for their class?
Solution: Quadratic Programming (QP).
QP is a well-studied class of optimization algorithms to maximize a quadratic function of some real-valued variables subject to linear constraints. There exist algorithms for finding such constrained quadratic optima efficiently and reliably.

Quadratic Programming
Find $\arg\max_{\mathbf{u}} \; c + \mathbf{d}^T \mathbf{u} + \frac{\mathbf{u}^T R \mathbf{u}}{2}$ (quadratic criterion)
subject to n additional linear inequality constraints
$a_{11} u_1 + a_{12} u_2 + \dots + a_{1m} u_m \le b_1, \;\dots,\; a_{n1} u_1 + a_{n2} u_2 + \dots + a_{nm} u_m \le b_n$
and subject to e additional linear equality constraints
$a_{(n+1)1} u_1 + \dots + a_{(n+1)m} u_m = b_{n+1}, \;\dots,\; a_{(n+e)1} u_1 + \dots + a_{(n+e)m} u_m = b_{n+e}$

What Are the SVM Constraints?
$M = \frac{2}{\sqrt{\mathbf{w} \cdot \mathbf{w}}}$
What is the quadratic optimization criterion?
Consider n training records (x(k), y(k)), where y(k) = +/- 1.
How many constraints will we have? What should they be?
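The margin derivation on this page (x⁺ = x⁻ + λw with λ = 2 / (w·w), giving M = 2 / (w·w)^0.5) can be checked numerically. The weight vector and bias below are arbitrary illustrative values, not taken from the slides.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def margin_width(w):
    """M = 2 / sqrt(w.w), the distance between the plus-plane and the minus-plane."""
    return 2.0 / math.sqrt(dot(w, w))


# Arbitrary example classifier (assumed values for illustration).
w = [3.0, 4.0]
b = 1.0

# A point on the minus-plane: choose x_minus = t * w with w.x_minus + b = -1.
t = (-1.0 - b) / dot(w, w)
x_minus = [t * wi for wi in w]

# Step along w by lambda = 2 / (w.w) to reach the closest point on the plus-plane.
lam = 2.0 / dot(w, w)
x_plus = [xi + lam * wi for xi, wi in zip(x_minus, w)]

on_minus_plane = dot(w, x_minus) + b      # expected: -1
on_plus_plane = dot(w, x_plus) + b        # expected: +1
distance = math.sqrt(sum((p - m) ** 2 for p, m in zip(x_plus, x_minus)))
print(on_minus_plane, on_plus_plane, distance, margin_width(w))
```

Because M depends on w only through w·w, maximizing the margin is the same as minimizing w·w, which is what makes the problem a quadratic program.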
29 What Are the SVM Constraints?
$M = \frac{2}{\sqrt{\mathbf{w} \cdot \mathbf{w}}}$
What is the quadratic optimization criterion? Minimize w·w.
Consider n training records (x(k), y(k)), where y(k) = +/- 1.
How many constraints will we have? n.
What should they be? For each 1 ≤ k ≤ n:
w·x(k) + b ≥ 1, if y(k) = 1
w·x(k) + b ≤ -1, if y(k) = -1

Problem: Classes Not Linearly Separable
The inequalities for the training records are not satisfiable by any w and b.

Solution 1?
Find minimum w·w, while also minimizing the number of training set errors.
Not a well-defined optimization problem (cannot optimize two things at the same time).

Solution 2?
Minimize w·w + C (# trainSetErrors). C is a tradeoff parameter.
Problems: cannot be expressed as QP, hence finding the solution might be slow; does not distinguish between disastrous errors and near misses.

Solution 3
Minimize w·w + C (distance of error records to their correct place).
This works! But we still need to do something about the unsatisfiable set of inequalities.

What Are the SVM Constraints?
What is the quadratic optimization criterion? Minimize $\frac{1}{2} \mathbf{w} \cdot \mathbf{w} + C \sum_{k=1}^{n} \varepsilon_k$
[Figure: error records with slack distances ε]
Consider n training records (x(k), y(k)), where y(k) = +/- 1.
How many constraints will we have? What should they be? For each 1 ≤ k ≤ n:
w·x(k) + b ≥ 1 - ε_k, if y(k) = 1
w·x(k) + b ≤ -1 + ε_k, if y(k) = -1
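30 Facts About the New Problem Formulation
The original QP formulation had d+1 variables: w1, w2, ..., wd and b.
The new QP formulation has d+1+n variables: w1, w2, ..., wd and b, plus ε1, ε2, ..., εn.
C is a new parameter that needs to be set for the SVM. It controls the tradeoff between paying attention to margin size versus misclassifications.

Effect of Parameter C
[Figure] Source: Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning

An Equivalent QP (The Dual)
Maximize $\sum_{k=1}^{n} \alpha_k - \frac{1}{2} \sum_{k=1}^{n} \sum_{l=1}^{n} \alpha_k \alpha_l \, y(k) \, y(l) \, \mathbf{x}(k) \cdot \mathbf{x}(l)$
Subject to these constraints: $\forall k: 0 \le \alpha_k \le C$ and $\sum_{k=1}^{n} \alpha_k \, y(k) = 0$
Then define: $\mathbf{w} = \sum_{k=1}^{n} \alpha_k \, y(k) \, \mathbf{x}(k)$ and $b = \text{AVG}_{k: 0 < \alpha_k < C} \left( \frac{1}{y(k)} - \mathbf{x}(k) \cdot \mathbf{w} \right)$
Then classify with: f(x, w, b) = sign(w·x + b)

Important Facts
The dual formulation of the QP can be optimized more quickly, but the result is equivalent.
Data records with α_k > 0 are the support vectors. Those with 0 < α_k < C lie on the plus- or minus-plane. Those with α_k = C are on the wrong side of the classifier boundary (have ε_k > 0).
Computation for w and b only depends on those records with α_k > 0, i.e., the support vectors.
The alternative QP has another major advantage, as we will see now.

Easy To Separate
What would SVMs do with this data? [Figure]

Easy To Separate
Not a big surprise. [Figure: positive plane, negative plane]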
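31 Harder To Separate
What can be done about this? [Figure]

Harder To Separate
Non-linear basis functions. Original data: (X, Y). Transformed: (X, X², Y). Think of X² as a new attribute, e.g., X'.

Now Separation Is Easy Again
[Figure]

Corresponding Planes in Original Space
[Figure: region above plus-plane, region below minus-plane]

Common SVM Basis Functions
Polynomial of attributes X1, ..., Xd of certain max degree.
Radial basis function: symmetric around a center, i.e., KernelFunction(|X - c| / kernelWidth).
Sigmoid function of X, e.g., hyperbolic tangent.
Let Φ(x) be the transformed input record. Previous example: Φ(x) = (x, x²).

Quadratic Basis Functions
$\Phi(\mathbf{x}) = \left(1;\; \sqrt{2} x_1, \dots, \sqrt{2} x_d;\; x_1^2, \dots, x_d^2;\; \sqrt{2} x_1 x_2, \sqrt{2} x_1 x_3, \dots, \sqrt{2} x_{d-1} x_d\right)$
The components are, in order: constant term, linear terms, pure quadratic terms, quadratic cross-terms.
Number of terms (assuming d input attributes): (d+2)-choose-2 = (d+2)(d+1)/2 ≈ d²/2.
Why did we choose this specific transformation?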
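32 Dual QP With Basis Functions
Maximize $\sum_{k=1}^{n} \alpha_k - \frac{1}{2} \sum_{k=1}^{n} \sum_{l=1}^{n} \alpha_k \alpha_l \, y(k) \, y(l) \, \Phi(\mathbf{x}(k)) \cdot \Phi(\mathbf{x}(l))$
Subject to these constraints: $\forall k: 0 \le \alpha_k \le C$ and $\sum_{k=1}^{n} \alpha_k \, y(k) = 0$
Then define: $\mathbf{w} = \sum_{k=1}^{n} \alpha_k \, y(k) \, \Phi(\mathbf{x}(k))$ and $b = \text{AVG}_{k: 0 < \alpha_k < C} \left( \frac{1}{y(k)} - \Phi(\mathbf{x}(k)) \cdot \mathbf{w} \right)$
Then classify with: f(x, w, b) = sign(w·Φ(x) + b)

Computation Challenge
Input vector x has d components (its d attribute values). The transformed input vector Φ(x) has about d²/2 components.
Hence computing Φ(x(k))·Φ(x(l)) now costs order d²/2 instead of order d operations (additions, multiplications).
...or is there a better way to do this? Take advantage of properties of certain transformations.

Quadratic Dot Products
$\Phi(\mathbf{a}) \cdot \Phi(\mathbf{b}) = 1 + 2 \sum_{i=1}^{d} a_i b_i + \sum_{i=1}^{d} a_i^2 b_i^2 + 2 \sum_{i=1}^{d} \sum_{j=i+1}^{d} a_i a_j b_i b_j$

Quadratic Dot Products
Now consider another function of a and b:
$(\mathbf{a} \cdot \mathbf{b} + 1)^2 = (\mathbf{a} \cdot \mathbf{b})^2 + 2 \, \mathbf{a} \cdot \mathbf{b} + 1 = \left(\sum_{i=1}^{d} a_i b_i\right)^2 + 2 \sum_{i=1}^{d} a_i b_i + 1 = \sum_{i=1}^{d} (a_i b_i)^2 + 2 \sum_{i=1}^{d} \sum_{j=i+1}^{d} a_i b_i a_j b_j + 2 \sum_{i=1}^{d} a_i b_i + 1$

Quadratic Dot Products
The results of Φ(a)·Φ(b) and of (a·b + 1)² are identical.
Computing Φ(a)·Φ(b) costs about d²/2 operations, while computing (a·b + 1)² costs only about d+2 operations.
This means that we can work in the high-dimensional space (d²/2 dimensions), where the training records are more easily separable, but pay about the same cost as working in the original space (d dimensions).
Savings are even greater when dealing with higher-degree polynomials, i.e., degree q > 2, that can be computed as (a·b + 1)^q.

Any Other Computation Problems?
What about computing w? In the end we need f(x, w, b) = sign(w·Φ(x) + b):
$\mathbf{w} \cdot \Phi(\mathbf{x}) = \sum_{k=1}^{n} \alpha_k \, y(k) \, \Phi(\mathbf{x}(k)) \cdot \Phi(\mathbf{x})$, which can be computed using the same trick as before.
The same trick can be applied again to b, because $\Phi(\mathbf{x}(k)) \cdot \mathbf{w} = \sum_{j=1}^{n} \alpha_j \, y(j) \, \Phi(\mathbf{x}(k)) \cdot \Phi(\mathbf{x}(j))$.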
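The identity behind the quadratic dot product trick, Φ(a)·Φ(b) = (a·b + 1)², can be verified directly. This is a minimal sketch; `phi` builds the quadratic basis expansion (constant, linear, pure quadratic, and cross terms), and the two input vectors are arbitrary illustrative values.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Quadratic basis expansion of x with (d+2)(d+1)/2 components."""
    d = len(x)
    root2 = math.sqrt(2.0)
    constant = [1.0]
    linear = [root2 * xi for xi in x]
    pure_quadratic = [xi * xi for xi in x]
    cross_terms = [root2 * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return constant + linear + pure_quadratic + cross_terms


def quadratic_kernel(a, b):
    """(a.b + 1)^2, computed in the original d-dimensional space."""
    return (dot(a, b) + 1.0) ** 2


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

explicit = dot(phi(a), phi(b))      # about d^2/2 multiplications
implicit = quadratic_kernel(a, b)   # about d + 2 operations
print(len(phi(a)), explicit, implicit)
```

For d = 3 the expansion has 10 components; the gap between the two costs widens quickly as d or the polynomial degree grows.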
33 SVM Kernel Functions
For which transformations, called kernels, does the same trick work?
Polynomial: $K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b} + 1)^q$
Radial-Basis-style (RBF): $K(\mathbf{a}, \mathbf{b}) = \exp\left(-\frac{(\mathbf{a} - \mathbf{b})^2}{2 \sigma^2}\right)$
Neural-net-style sigmoidal: $K(\mathbf{a}, \mathbf{b}) = \tanh(\kappa \, \mathbf{a} \cdot \mathbf{b} - \delta)$
q, σ, κ, and δ are magic parameters that must be chosen by a model selection method.

Overfitting
With the right kernel function, computation in the high-dimensional transformed space is no problem.
But what about overfitting? There seem to be so many parameters...
Usually not a problem, due to the maximum margin approach. Only the support vectors determine the model, hence SVM complexity depends on the number of support vectors, not on the number of dimensions (still, in higher dimensions there might be more support vectors).
Minimizing w·w discourages extremely large weights, which smoothes the function (recall weight decay for neural networks!).

Different Kernels
[Figure] Source: Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning

Multi-Class Classification
SVMs can only handle two-class outputs (i.e., a categorical output variable with arity 2).
With output arity N, learn N SVMs: SVM 1 learns Output==1 vs. Output != 1; SVM 2 learns Output==2 vs. Output != 2; ...; SVM N learns Output==N vs. Output != N.
Predict with each SVM and find out which one puts the prediction the furthest into the positive region.

Why Is SVM Effective on High Dimensional Data?
The complexity of the trained classifier is characterized by the number of support vectors, not by the dimensionality of the data.
If all other training records are removed and training is repeated, the same separating hyperplane would be found.
The number of support vectors can be used to compute an upper bound on the expected error rate of the SVM, which is independent of data dimensionality.
Thus, an SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high.

SVM vs. Neural Network
SVM: relatively new concept; deterministic algorithm; nice generalization properties; hard to train (learned in batch mode using quadratic programming techniques); using kernels, can learn very complex functions.
Neural Network: relatively old; nondeterministic algorithm; generalizes well but doesn't have a strong mathematical foundation; can easily be learned in incremental fashion; to learn complex functions, use a multilayer perceptron (not that trivial).
34 What Is Prediction?
Essentially the same as classification, but the output is continuous, not discrete. Construct a model, then use the model to predict a continuous output value for a given input.
Major method for prediction: regression. There are many variants of regression analysis in the statistics literature; they are not covered in this class.
Neural networks and k-NN can do regression out of the box. SVMs for regression exist. What about trees?

Regression Trees and Model Trees
Regression tree: proposed in the CART system (Breiman et al. 1984). CART: Classification And Regression Trees. Each leaf stores a continuous-valued prediction: the average output value for the training records in the leaf.
Model tree: proposed by Quinlan (1992). Each leaf holds a regression model (a multivariate linear equation).
Training: as for classification trees, but uses variance instead of a purity measure for selecting split predicates.

Classifier Accuracy Measures
[Table: confusion matrix for true class vs. predicted class, with classes buy_computer = yes and buy_computer = no and row and column totals; the counts are not preserved]
For two classes C1 and C2: true class C1 predicted as C1 is a true positive, true class C1 predicted as C2 is a false negative, true class C2 predicted as C1 is a false positive, and true class C2 predicted as C2 is a true negative.
Accuracy of a classifier M, acc(M): percentage of test records that are correctly classified by M.
Error rate (misclassification rate) of M = 1 - acc(M).
Given m classes, CM[i,j], an entry in a confusion matrix, indicates the # of records in class i that are labeled by the classifier as class j.

Precision and Recall
Precision: measure of exactness. t-pos / (t-pos + f-pos)
Recall: measure of completeness. t-pos / (t-pos + f-neg)
F-measure: combination of precision and recall. 2 * precision * recall / (precision + recall)
Note: Accuracy = (t-pos + t-neg) / (t-pos + t-neg + f-pos + f-neg)
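The accuracy, precision, recall, and F-measure formulas above translate directly into code. The confusion-matrix counts used here are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classifier_measures(t_pos, f_pos, f_neg, t_neg):
    """Return accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    total = t_pos + t_neg + f_pos + f_neg
    accuracy = (t_pos + t_neg) / total
    precision = t_pos / (t_pos + f_pos)        # exactness
    recall = t_pos / (t_pos + f_neg)           # completeness
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# Hypothetical counts for illustration.
accuracy, precision, recall, f_measure = classifier_measures(
    t_pos=40, f_pos=10, f_neg=20, t_neg=30)
print(accuracy, precision, recall, f_measure)
```

The sketch assumes at least one predicted positive and one actual positive; otherwise precision or recall would divide by zero and needs a convention of its own.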
35 Limitation of Accuracy
Consider a 2-class problem: number of Class 0 examples = 9990, number of Class 1 examples = 10.
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9 %.
Accuracy is misleading because the model does not detect any class 1 example.
Always predicting the majority class defines the baseline. A good classifier should do better than the baseline.

Cost-Sensitive Measures: Cost Matrix
C(i|j): cost of misclassifying a class j example as class i.
Rows are the actual class, columns the predicted class. Actual Class=Yes: C(Yes|Yes) if predicted Yes, C(No|Yes) if predicted No. Actual Class=No: C(Yes|No) if predicted Yes, C(No|No) if predicted No.

Computing Cost of Classification
[Tables: cost matrix C(i|j) and the confusion matrices of models M1 and M2 (actual class vs. predicted class); the entries and the cost of M1 are not preserved]
Model M1: Accuracy = 80%. Model M2: Accuracy = 90%, Cost = 4255.

Prediction Error Measures
Continuous output: it matters how far off the prediction is from the true value.
Loss function: distance between y and the predicted value y'. Absolute error: |y - y'|. Squared error: (y - y')².
Test error (generalization error): average loss over the test set.
Mean absolute error: $\frac{1}{n} \sum_{i=1}^{n} |y(i) - y'(i)|$
Mean squared error: $\frac{1}{n} \sum_{i=1}^{n} (y(i) - y'(i))^2$
Relative absolute error: $\frac{\sum_{i=1}^{n} |y(i) - y'(i)|}{\sum_{i=1}^{n} |y(i) - \bar{y}|}$
Relative squared error: $\frac{\sum_{i=1}^{n} (y(i) - y'(i))^2}{\sum_{i=1}^{n} (y(i) - \bar{y})^2}$
Squared error exaggerates the presence of outliers.

Evaluating a Classifier or Predictor
Holdout method: the given data set is randomly partitioned into two sets, a training set (e.g., 2/3) for model construction and a test set (e.g., 1/3) for accuracy estimation. Holdout can be repeated multiple times; accuracy = avg. of the accuracies obtained.
Cross-validation (k-fold, where k = 10 is most popular): randomly partition the data into k mutually exclusive subsets, each of approximately equal size. In the i-th iteration, use D_i as the test set and the others as the training set.
Leave-one-out: k folds where k = # of records. Expensive, and often results in high variance of the performance metric.

Learning Curve
Accuracy versus sample size.
Effect of small sample size: bias in the estimate, variance of the estimate.
Helps determine how much training data is needed. Still need to have enough test and validation data to be representative of the distribution.
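The k-fold cross-validation procedure and the mean squared error measure described above can be combined into one small sketch. The model is deliberately trivial (it predicts the mean of the training outputs) and the data are made-up numbers; both are assumptions for illustration, and any real learner could be substituted.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition record indices 0..n-1 into k mutually exclusive folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]   # fold sizes differ by at most 1


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate_mean_predictor(y, k=10, seed=0):
    """Average test MSE over k folds for a model that predicts the training mean."""
    fold_errors = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train) / len(train)          # "training" step
        test = [y[i] for i in test_idx]
        fold_errors.append(mean_squared_error(test, [prediction] * len(test)))
    return sum(fold_errors) / k


# Made-up continuous outputs for 20 records.
y = [float(v) for v in range(20)]
folds = k_fold_indices(len(y), k=5)
cv_mse = cross_validate_mean_predictor(y, k=5)
print(folds, cv_mse)
```

Setting k equal to the number of records turns the same function into leave-one-out evaluation.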
36 ROC (Receiver Operating Characteristic)
Developed in the 1950s for signal detection theory to analyze noisy signals. Characterizes the trade-off between positive hits and false alarms.
The ROC curve plots the T-Pos rate (y-axis) against the F-Pos rate (x-axis).
The performance of each classifier is represented as a point on the ROC curve. Changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point.

ROC Curve
1-dimensional data set containing 2 classes (positive and negative). Any point located at x > t is classified as positive.
[Figure: class distributions with threshold t] At threshold t: TPR = 0.5, FPR = 0.12

ROC Curve
(TPR, FPR): (0,0): declare everything to be negative class. (1,1): declare everything to be positive class. (1,0): ideal.
Diagonal line: random guessing.

Diagonal Line for Random Guessing
Classify a record as positive with fixed probability p, irrespective of attribute values.
Consider a test set with a positive and b negative records.
True positives: p*a, hence true positive rate = (p*a)/a = p.
False positives: p*b, hence false positive rate = (p*b)/b = p.
For every value 0 ≤ p ≤ 1, we get point (p, p) on the ROC curve.

Using ROC for Model Comparison
Neither model consistently outperforms the other: M1 is better for small FPR, M2 is better for large FPR.
Area under the ROC curve: ideal: area = 1; random guess: area = 0.5.

How to Construct a ROC Curve
[Table: record, P(+|x), true class; values not preserved]
Use a classifier that produces posterior probability P(+|x) for each test record x.
Sort records according to P(+|x) in decreasing order.
Apply a threshold at each unique value of P(+|x).
Count the number of TP, FP, TN, FN at each threshold.
TP rate, TPR = TP/(TP+FN). FP rate, FPR = FP/(FP+TN).
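The ROC construction steps above can be sketched as follows. The scores and class labels are hypothetical; `roc_points` applies a threshold at each unique score, and `area_under_curve` uses the trapezoid rule, which is one common way to compute the area and is an assumption here rather than something the slides specify.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per unique threshold, starting at (0, 0).

    scores: P(+|x) for each test record; labels: 1 for positive, 0 for negative.
    A record is classified as positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid rule over consecutive (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical posterior probabilities and true classes for six test records.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points, auc)
```

Recounting TP and FP from scratch at each threshold keeps the sketch short; sorting once and updating the counts incrementally would be the faster choice for large test sets.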
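37 How To Construct A ROC Curve
[Table: class, P(+|x), threshold >=, TP, FP, TN, FN, TPR, FPR; values not preserved]
[Figure: ROC curve, true positive rate vs. false positive rate]

Test of Significance
Given two models: Model M1: accuracy = 85%, tested on 30 instances. Model M2: accuracy = 75%, tested on 5000 instances.
Can we say M1 is better than M2?
How much confidence can we place on the accuracy of M1 and M2?
Can the difference in accuracy be explained as a result of random fluctuations in the test set?

Confidence Interval for Accuracy
Classification can be regarded as a Bernoulli trial. A Bernoulli trial has 2 possible outcomes, correct or wrong for classification. A collection of Bernoulli trials has a Binomial distribution.
Probability of getting c correct predictions if the model accuracy is p (= probability to get a single prediction right): $\binom{n}{c} p^c (1 - p)^{n - c}$
Given c, or equivalently, ACC = c / n and n (# test records), can we predict p, the true accuracy of the model?

Confidence Interval for Accuracy
Binomial distribution for X = number of correctly classified test records out of n: E(X) = pn, Var(X) = p(1-p)n.
Accuracy = X / n: E(ACC) = p, Var(ACC) = p(1-p) / n.
For large test sets (n > 30), the Binomial distribution is closely approximated by a normal distribution with the same mean and variance. ACC has a normal distribution with mean = p, variance = p(1-p)/n:
$P\left(Z_{\alpha/2} < \frac{ACC - p}{\sqrt{p(1-p)/n}} < Z_{1-\alpha/2}\right) = 1 - \alpha$
[Figure: standard normal density, area = 1 - α between Z_{α/2} and Z_{1-α/2}]
Confidence interval for p: $p = \frac{2n \cdot ACC + Z_{\alpha/2}^2 \pm Z_{\alpha/2} \sqrt{Z_{\alpha/2}^2 + 4n \cdot ACC - 4n \cdot ACC^2}}{2 (n + Z_{\alpha/2}^2)}$

Confidence Interval for Accuracy
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances: n = 100, ACC = 0.8.
Let 1-α = 0.95 (95% confidence). From the probability table, Z_{α/2} = 1.96.
[Tables: 1-α vs. Z; N vs. p(lower) and p(upper); values not preserved]

Comparing Performance of Two Models
Given two models M1 and M2, which is better?
M1 is tested on D1 (size = n1), found error rate = e1. M2 is tested on D2 (size = n2), found error rate = e2. Assume D1 and D2 are independent.
If n1 and n2 are sufficiently large, then err1 ~ N(μ1, σ1) and err2 ~ N(μ2, σ2).
Estimate: $\hat{\mu}_i = e_i$ and $\hat{\sigma}_i^2 = \frac{e_i (1 - e_i)}{n_i}$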
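The confidence interval formula for the true accuracy p can be evaluated for the example on this page (ACC = 0.8, n = 100, Z = 1.96). This is a minimal sketch; the function name is illustrative.

```python
import math


def accuracy_confidence_interval(acc, n, z=1.96):
    """Interval for the true accuracy p, given observed accuracy acc on n test records.

    Implements p = (2*n*acc + z^2 +/- z*sqrt(z^2 + 4*n*acc - 4*n*acc^2)) / (2*(n + z^2)).
    """
    center = 2 * n * acc + z * z
    spread = z * math.sqrt(z * z + 4 * n * acc - 4 * n * acc * acc)
    denominator = 2 * (n + z * z)
    return (center - spread) / denominator, (center + spread) / denominator


p_lower, p_upper = accuracy_confidence_interval(acc=0.8, n=100, z=1.96)
print(round(p_lower, 3), round(p_upper, 3))

# The interval narrows as the test set grows.
wide = accuracy_confidence_interval(0.8, 50)
narrow = accuracy_confidence_interval(0.8, 5000)
```

With only 100 test records the interval spans roughly 0.71 to 0.87, which is why an observed 80% accuracy on a small test set says little about small differences between models.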
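38 Testing Significance of Accuracy Difference
Consider the random variable d = err1 - err2.
Since err1 and err2 are normally distributed, so is their difference. Hence d ~ N(d_t, σ_t), where d_t is the true difference.
Estimator for d_t: E[d] = E[err1 - err2] = E[err1] - E[err2] ≈ e1 - e2
Since D1 and D2 are independent, the variance adds up: $\hat{\sigma}_t^2 = \hat{\sigma}_1^2 + \hat{\sigma}_2^2 = \frac{e_1 (1 - e_1)}{n_1} + \frac{e_2 (1 - e_2)}{n_2}$
At (1-α) confidence level, $d_t = E[d] \pm Z_{\alpha/2} \, \hat{\sigma}_t$

An Illustrative Example
Given: M1: n1 = 30, e1 = 0.15. M2: n2 = 5000, e2 = 0.25.
E[d] = |e1 - e2| = 0.1
2-sided test: d_t = 0 versus d_t ≠ 0
$\hat{\sigma}_t^2 = \frac{0.15 (1 - 0.15)}{30} + \frac{0.25 (1 - 0.25)}{5000} = 0.0043$
At 95% confidence level, Z_{α/2} = 1.96: $d_t = 0.100 \pm 1.96 \sqrt{0.0043} = 0.100 \pm 0.128$
The interval contains zero, hence the difference may not be statistically significant.
But: we may reject the null hypothesis (d_t = 0) at a lower confidence level.

Significance Test for K-Fold Cross-Validation
Each learning algorithm produces k models: L1 produces M11, M12, ..., M1k; L2 produces M21, M22, ..., M2k.
Both models are tested on the same test sets D1, D2, ..., Dk.
For each test set, compute d_j = e_{1,j} - e_{2,j}.
For large enough k, d_j is normally distributed with mean d_t and variance σ_t².
Estimate: $\hat{\sigma}_t^2 = \frac{\sum_{j=1}^{k} (d_j - \bar{d})^2}{k (k - 1)}$ and $d_t = \bar{d} \pm t_{1-\alpha, k-1} \, \hat{\sigma}_t$
t-distribution: get the t coefficient t_{1-α,k-1} from a table by looking up the confidence level (1-α) and degrees of freedom (k-1).

Ensemble Methods
Construct a set of classifiers from the training data.
Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers.

General Idea
[Figure: original training data D. Step 1: create multiple data sets D1, D2, ..., Dt-1, Dt. Step 2: build multiple classifiers C1, C2, ..., Ct-1, Ct. Step 3: combine classifiers into C*]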
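The illustrative two-model comparison on this page (M1: n1 = 30, e1 = 0.15; M2: n2 = 5000, e2 = 0.25) can be recomputed with a short sketch. The function name is illustrative, and the 95% level with Z = 1.96 follows the example.

```python
import math


def difference_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for the true error difference d_t of two models.

    The variances of the two independent error estimates add up:
    sigma_t^2 = e1*(1-e1)/n1 + e2*(1-e2)/n2.
    """
    d = abs(e1 - e2)
    variance = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    half_width = z * math.sqrt(variance)
    return d - half_width, d + half_width, variance


low, high, variance = difference_interval(e1=0.15, n1=30, e2=0.25, n2=5000)
contains_zero = low <= 0.0 <= high
print(round(variance, 4), round(low, 3), round(high, 3), contains_zero)
```

Almost all of the variance comes from the model tested on only 30 records, so enlarging that test set is what would tighten the interval.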
39 Why Does It Work?
Consider a 2-class problem. Suppose there are 25 base classifiers. Each classifier has error rate ε = 0.35. Assume the classifiers are independent. Return the majority vote of the 25 classifiers.
Probability that the ensemble classifier makes a wrong prediction: $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25 - i} \approx 0.06$

Base Classifier vs. Ensemble Error
[Figure]

Model Averaging and Bias-Variance Tradeoff
Single model: lowering bias will usually increase variance. A smoother model has lower variance but might not model the function well enough.
Ensembles can overcome this problem:
1. Let the models overfit: low bias, high variance.
2. Take care of the variance problem by averaging many of these models.
This is the basic idea behind bagging.

Bagging: Bootstrap Aggregation
Given a training set with n records, sample n records randomly with replacement.
[Table: original data and the samples for bagging rounds 1, 2, 3; values not preserved]
Train a classifier for each bootstrap sample.
Note: each training record has probability 1 - (1 - 1/n)^n of being selected at least once in a sample of size n.

Bagged Trees
Create k trees from the training data: bootstrap sample, grow large trees.
Design goal: independent models, high variability between models.
Ensemble prediction = average of individual tree predictions (or majority vote): (1/k) tree_1 + (1/k) tree_2 + ... + (1/k) tree_k
Works the same way for other classifiers.

Typical Result
[Figure]
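Two of the probabilities on this page can be computed directly: the chance that a majority vote of independent base classifiers is wrong, and the chance that a record appears at least once in a bootstrap sample. This is a minimal sketch using only the standard library (`math.comb` requires Python 3.8 or later); the value n = 1000 for the bootstrap case is an arbitrary illustration.

```python
import math


def ensemble_error(num_classifiers, error_rate):
    """P(majority vote is wrong) for independent base classifiers with equal error rate."""
    majority = num_classifiers // 2 + 1
    return sum(
        math.comb(num_classifiers, i)
        * error_rate ** i
        * (1 - error_rate) ** (num_classifiers - i)
        for i in range(majority, num_classifiers + 1)
    )


def bootstrap_inclusion_probability(n):
    """P(a given record is selected at least once) when sampling n of n with replacement."""
    return 1 - (1 - 1 / n) ** n


error_25 = ensemble_error(25, 0.35)
included = bootstrap_inclusion_probability(1000)
print(round(error_25, 2), round(included, 3))
```

For large n the inclusion probability approaches 1 - 1/e, about 0.632, so each bootstrap sample leaves out roughly a third of the training records. If the base classifiers' error rate exceeds 0.5, the same formula shows the ensemble doing worse than a single classifier.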
40 Typical Result
[Figures]

Bagging Challenges
Ideal case: all models independent of each other.
Train on independent data samples. Problem: limited amount of training data; the training set needs to be representative of the data distribution. Bootstrap sampling allows creation of many almost independent training sets.
Diversify models, because a similar sample might result in a similar tree.
Random Forest: limit the choice of split attributes to a small random subset of attributes (new selection of subset for each node) when training a tree.
Use different model types in the same ensemble: tree, ANN, SVM, regression models.

Additive Grove
Ensemble technique for predicting continuous output.
Instead of individual trees, train additive models. Prediction of a single Grove model = sum of tree predictions. Prediction of the ensemble = average of individual Grove predictions.
Combines large trees and additive models.
Challenge: how to train the additive models without having the first trees fit the training data too well. The next tree is trained on the residuals of the previously trained trees in the same Grove model. If the previously trained trees capture the training data too well, the next tree is mostly trained on noise.
[Figure: ensemble as the average of k Grove models, each a sum of trees]

Training Groves
[Figure]

Typical Grove Performance
Root mean squared error: lower is better.
Horizontal axis: tree size (fraction of training data when to stop splitting).
Vertical axis: number of trees in each single Grove model.
100 bagging iterations.
[Figure]
41 Boosting
Iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records.
Initially, all n records are assigned equal weights.
Record weights may change at the end of each boosting round.

Boosting
Records that are wrongly classified will have their weights increased. Records that are classified correctly will have their weights decreased.
[Table: original data and the samples for boosting rounds 1, 2, 3; values not preserved]
Assume record 4 is hard to classify. Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds.

Example: AdaBoost
Base classifiers: C1, C2, ..., CT
Error rate (n training records, w_j are weights that sum to 1): $\varepsilon_i = \sum_{j=1}^{n} w_j \, \delta(C_i(x_j) \ne y_j)$
Importance of a classifier: $\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$

AdaBoost Details
Weight update: $w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \cdot \begin{cases} \exp(-\alpha_i) & \text{if } C_i(x_j) = y_j \\ \exp(\alpha_i) & \text{if } C_i(x_j) \ne y_j \end{cases}$ where Z_i is the normalization factor.
Weights are initialized to 1/n. Z_i ensures that the weights add to 1.
If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated.
Final classification: $C^*(x) = \arg\max_{y} \sum_{i=1}^{T} \alpha_i \, \delta(C_i(x) = y)$

Illustrating AdaBoost
[Figure: original data with the initial weights for each data point; data points used for training and new weights in boosting round 1 (α = 1.9459), round 2 (α = 2.9323), and round 3; overall combined classifier]
Note: The numbers appear to be wrong, but they convey the right idea.
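One AdaBoost round, as described above, can be sketched in a few lines: compute the weighted error rate, the classifier importance α, and the renormalized record weights. The labels and base-classifier predictions are hypothetical, with three of ten records misclassified.

```python
import math


def adaboost_round(weights, predictions, labels):
    """Perform one AdaBoost weight update.

    Returns (error_rate, alpha, new_weights), where new_weights sum to 1.
    """
    # Weighted error rate: total weight of the misclassified records.
    error_rate = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Importance of the classifier.
    alpha = 0.5 * math.log((1 - error_rate) / error_rate)
    # Increase weights of misclassified records, decrease the others.
    unnormalized = [
        w * math.exp(alpha if p != y else -alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    z = sum(unnormalized)                       # normalization factor Z_i
    return error_rate, alpha, [w / z for w in unnormalized]


# Hypothetical data: 10 records, the base classifier gets the last 3 wrong.
labels = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
predictions = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
initial = [1 / len(labels)] * len(labels)       # weights initialized to 1/n

error_rate, alpha, updated = adaboost_round(initial, predictions, labels)
misclassified_weight = sum(updated[7:])
print(round(error_rate, 2), round(alpha, 4), round(misclassified_weight, 2))
```

After the update the misclassified records together carry half of the total weight, which is what forces the next base classifier to concentrate on them. The sketch assumes an error rate strictly between 0 and 1; a real implementation would also handle the error-above-50% reset mentioned above.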
42 Bagging vs. Boosting
Analogy. Bagging: diagnosis based on multiple doctors' majority vote. Boosting: weighted vote, based on the doctors' previous diagnosis accuracy.
Sampling procedure. Bagging: records have the same weight; easy to train in parallel. Boosting: weights a record higher if the model predicts it wrong; inherently sequential process.
Overfitting. Bagging is robust against overfitting. Boosting is susceptible to overfitting: make sure individual models do not overfit.
Accuracy is usually significantly better than that of a single classifier. The best boosted model is often better than the best bagged model.
Additive Grove: combines strengths of bagging and boosting (additive models). Shown empirically to make better predictions on many data sets. Training is more tricky, especially when the data is very noisy.

Classification/Prediction Summary
Forms of data analysis that can be used to train models from data and then make predictions for new records.
Effective and scalable methods have been developed for decision tree induction, Naive Bayesian classification, Bayesian networks, rule-based classifiers, Backpropagation, Support Vector Machines (SVM), nearest neighbor classifiers, and many other classification methods.
Regression models are popular for prediction. Regression trees, model trees, and ANNs are also used for prediction.

Classification/Prediction Summary
K-fold cross-validation is a popular method for accuracy estimation, but determining accuracy on a large test set is equally accepted. If test sets are large enough, a significance test for finding the best model is not necessary.
Area under the ROC curve and many other common performance measures exist.
Ensemble methods like bagging and boosting can be used to increase overall accuracy by learning and combining a series of individual models. They often deliver state-of-the-art prediction quality, but are expensive to train, store, and use.
No single method is superior over all others for all data sets. Issues such as accuracy, training and prediction time, robustness, interpretability, and scalability must be considered and can involve trade-offs.
Constrained Cubic Spline Interpolation for Chemical Engineering Applications
Costraed Cubc Sple Iterpolato or Chemcal Egeerg Applcatos b CJC Kruger Summar Cubc sple terpolato s a useul techque to terpolate betwee kow data pots due to ts stable ad smooth characterstcs. Uortuatel
1. The Time Value of Money
Corporate Face [00-0345]. The Tme Value of Moey. Compoudg ad Dscoutg Captalzato (compoudg, fdg future values) s a process of movg a value forward tme. It yelds the future value gve the relevat compoudg
Integrating Production Scheduling and Maintenance: Practical Implications
Proceedgs of the 2012 Iteratoal Coferece o Idustral Egeerg ad Operatos Maagemet Istabul, Turkey, uly 3 6, 2012 Itegratg Producto Schedulg ad Mateace: Practcal Implcatos Lath A. Hadd ad Umar M. Al-Turk
Optimal multi-degree reduction of Bézier curves with constraints of endpoints continuity
Computer Aded Geometrc Desg 19 (2002 365 377 wwwelsevercom/locate/comad Optmal mult-degree reducto of Bézer curves wth costrats of edpots cotuty Guo-Dog Che, Guo-J Wag State Key Laboratory of CAD&CG, Isttute
A New Bayesian Network Method for Computing Bottom Event's Structural Importance Degree using Jointree
, pp.277-288 http://dx.do.org/10.14257/juesst.2015.8.1.25 A New Bayesa Network Method for Computg Bottom Evet's Structural Importace Degree usg Jotree Wag Yao ad Su Q School of Aeroautcs, Northwester Polytechcal
Abraham Zaks. Technion I.I.T. Haifa ISRAEL. and. University of Haifa, Haifa ISRAEL. Abstract
Preset Value of Autes Uder Radom Rates of Iterest By Abraham Zas Techo I.I.T. Hafa ISRAEL ad Uversty of Hafa, Hafa ISRAEL Abstract Some attempts were made to evaluate the future value (FV) of the expected
The Digital Signature Scheme MQQ-SIG
The Dgtal Sgature Scheme MQQ-SIG Itellectual Property Statemet ad Techcal Descrpto Frst publshed: 10 October 2010, Last update: 20 December 2010 Dalo Glgorosk 1 ad Rue Stesmo Ødegård 2 ad Rue Erled Jese
SHAPIRO-WILK TEST FOR NORMALITY WITH KNOWN MEAN
SHAPIRO-WILK TEST FOR NORMALITY WITH KNOWN MEAN Wojcech Zelńsk Departmet of Ecoometrcs ad Statstcs Warsaw Uversty of Lfe Sceces Nowoursyowska 66, -787 Warszawa e-mal: wojtekzelsk@statystykafo Zofa Hausz,
Fractal-Structured Karatsuba`s Algorithm for Binary Field Multiplication: FK
Fractal-Structured Karatsuba`s Algorthm for Bary Feld Multplcato: FK *The authors are worg at the Isttute of Mathematcs The Academy of Sceces of DPR Korea. **Address : U Jog dstrct Kwahadog Number Pyogyag
T = 1/freq, T = 2/freq, T = i/freq, T = n (number of cash flows = freq n) are :
Bullets bods Let s descrbe frst a fxed rate bod wthout amortzg a more geeral way : Let s ote : C the aual fxed rate t s a percetage N the otoal freq ( 2 4 ) the umber of coupo per year R the redempto of
10.5 Future Value and Present Value of a General Annuity Due
Chapter 10 Autes 371 5. Thomas leases a car worth $4,000 at.99% compouded mothly. He agrees to make 36 lease paymets of $330 each at the begg of every moth. What s the buyout prce (resdual value of the
A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis
IEMS Vol. 4, No., pp. 0-08, Jue 005. A Comparatve Study o Medcal Data Classcato Methods Based o Decso Tree ad System Recostructo Aalyss Tzug-I Tag Departmet o Iormato & Electroc Commerce Kaa Uversty, Tawa
RUSSIAN ROULETTE AND PARTICLE SPLITTING
RUSSAN ROULETTE AND PARTCLE SPLTTNG M. Ragheb 3/7/203 NTRODUCTON To stuatos are ecoutered partcle trasport smulatos:. a multplyg medum, a partcle such as a eutro a cosmc ray partcle or a photo may geerate
Optimal replacement and overhaul decisions with imperfect maintenance and warranty contracts
Optmal replacemet ad overhaul decsos wth mperfect mateace ad warraty cotracts R. Pascual Departmet of Mechacal Egeerg, Uversdad de Chle, Caslla 2777, Satago, Chle Phoe: +56-2-6784591 Fax:+56-2-689657 [email protected]
The simple linear Regression Model
The smple lear Regresso Model Correlato coeffcet s o-parametrc ad just dcates that two varables are assocated wth oe aother, but t does ot gve a deas of the kd of relatoshp. Regresso models help vestgatg
Classic Problems at a Glance using the TVM Solver
C H A P T E R 2 Classc Problems at a Glace usg the TVM Solver The table below llustrates the most commo types of classc face problems. The formulas are gve for each calculato. A bref troducto to usg the
Conversion of Non-Linear Strength Envelopes into Generalized Hoek-Brown Envelopes
Covero of No-Lear Stregth Evelope to Geeralzed Hoek-Brow Evelope Itroducto The power curve crtero commoly ued lmt-equlbrum lope tablty aaly to defe a o-lear tregth evelope (relatohp betwee hear tre, τ,
Settlement Prediction by Spatial-temporal Random Process
Safety, Relablty ad Rs of Structures, Ifrastructures ad Egeerg Systems Furuta, Fragopol & Shozua (eds Taylor & Fracs Group, Lodo, ISBN 978---77- Settlemet Predcto by Spatal-temporal Radom Process P. Rugbaapha
Robust Realtime Face Recognition And Tracking System
JCS& Vol. 9 No. October 9 Robust Realtme Face Recogto Ad rackg System Ka Che,Le Ju Zhao East Cha Uversty of Scece ad echology Emal:[email protected] Abstract here s some very mportat meag the study of realtme
CSSE463: Image Recognition Day 27
CSSE463: Image Recogto Da 27 Ths week Toda: Alcatos of PCA Suda ght: roject las ad relm work due Questos? Prcal Comoets Aalss weght grth c ( )( ) ( )( ( )( ) ) heght sze Gve a set of samles, fd the drecto(s)
Real-Time Scheduling Analysis
DOT/FAA/AR-05/7 Real-Tme Scheulg Aalyss Offce of Avato Research a Developmet Washgto, D.C. 059 November 005 Fal Report Ths ocumet s avalable to the U.S. publc through the Natoal Techcal Iformato Servce
ECONOMIC CHOICE OF OPTIMUM FEEDER CABLE CONSIDERING RISK ANALYSIS. University of Brasilia (UnB) and The Brazilian Regulatory Agency (ANEEL), Brazil
ECONOMIC CHOICE OF OPTIMUM FEEDER CABE CONSIDERING RISK ANAYSIS I Camargo, F Fgueredo, M De Olvera Uversty of Brasla (UB) ad The Brazla Regulatory Agecy (ANEE), Brazl The choce of the approprate cable
Credibility Premium Calculation in Motor Third-Party Liability Insurance
Advaces Mathematcal ad Computatoal Methods Credblty remum Calculato Motor Thrd-arty Lablty Isurace BOHA LIA, JAA KUBAOVÁ epartmet of Mathematcs ad Quattatve Methods Uversty of ardubce Studetská 95, 53
Defining Perfect Location Privacy Using Anonymization
Defg Perfect Locato Prvacy Usg Aoymzato Zarr otazer Electrcal a Computer Egeerg Departmet Uversty of assachusetts Amherst, assachusetts Emal: [email protected] Amr Houmasar College of Iformato a Computer
An IG-RS-SVM classifier for analyzing reviews of E-commerce product
Iteratoal Coferece o Iformato Techology ad Maagemet Iovato (ICITMI 205) A IG-RS-SVM classfer for aalyzg revews of E-commerce product Jaju Ye a, Hua Re b ad Hagxa Zhou c * College of Iformato Egeerg, Cha
An Operating Precision Analysis Method Considering Multiple Error Sources of Serial Robots
MAEC Web of Cofereces 35, 02013 ( 2015) DOI: 10.1051/ mateccof/ 2015 3502013 C Owe by the authors, publshe by EDP Sceces, 2015 A Operatg Precso Aalyss Metho Coserg Multple Error Sources of Seral Robots
The analysis of annuities relies on the formula for geometric sums: r k = rn+1 1 r 1. (2.1) k=0
Chapter 2 Autes ad loas A auty s a sequece of paymets wth fxed frequecy. The term auty orgally referred to aual paymets (hece the ame), but t s ow also used for paymets wth ay frequecy. Autes appear may
STATIC ANALYSIS OF TENSEGRITY STRUCTURES
SI NYSIS O ENSEGIY SUUES JUIO ES OE HESIS PESENED O HE GDUE SHOO O HE UNIVESIY O OID IN PI UIEN O HE EQUIEENS O HE DEGEE O SE O SIENE UNIVESIY O OID o m mother for her fte geerost. KNOWEDGENS I wat to
of the relationship between time and the value of money.
TIME AND THE VALUE OF MONEY Most agrbusess maagers are famlar wth the terms compoudg, dscoutg, auty, ad captalzato. That s, most agrbusess maagers have a tutve uderstadg that each term mples some relatoshp
Curve Fitting and Solution of Equation
UNIT V Curve Fttg ad Soluto of Equato 5. CURVE FITTING I ma braches of appled mathematcs ad egeerg sceces we come across epermets ad problems, whch volve two varables. For eample, t s kow that the speed
FINANCIAL MATHEMATICS 12 MARCH 2014
FINNCIL MTHEMTICS 12 MRCH 2014 I ths lesso we: Lesso Descrpto Make use of logarthms to calculate the value of, the tme perod, the equato P1 or P1. Solve problems volvg preset value ad future value autes.
Bayesian Network Representation
Readgs: K&F 3., 3.2, 3.3, 3.4. Bayesa Network Represetato Lecture 2 Mar 30, 20 CSE 55, Statstcal Methods, Sprg 20 Istructor: Su-I Lee Uversty of Washgto, Seattle Last tme & today Last tme Probablty theory
Measuring the Quality of Credit Scoring Models
Measur the Qualty of Credt cor Models Mart Řezáč Dept. of Matheatcs ad tatstcs, Faculty of cece, Masaryk Uversty CCC XI, Edurh Auust 009 Cotet. Itroducto 3. Good/ad clet defto 4 3. Measur the qualty 6
Load Balancing Control for Parallel Systems
Proc IEEE Med Symposum o New drectos Cotrol ad Automato, Chaa (Grèce),994, pp66-73 Load Balacg Cotrol for Parallel Systems Jea-Claude Heet LAAS-CNRS, 7 aveue du Coloel Roche, 3077 Toulouse, Frace E-mal
MDM 4U PRACTICE EXAMINATION
MDM 4U RCTICE EXMINTION Ths s a ractce eam. It does ot cover all the materal ths course ad should ot be the oly revew that you do rearato for your fal eam. Your eam may cota questos that do ot aear o ths
Simple Linear Regression
Smple Lear Regresso Regresso equato a equato that descrbes the average relatoshp betwee a respose (depedet) ad a eplaator (depedet) varable. 6 8 Slope-tercept equato for a le m b (,6) slope. (,) 6 6 8
An Evaluation of Naive Bayesian Anti-Spam Filtering
Proceedgs of the workshop o Mache earg the New Iformato Age, G. Potamas, V. Moustaks ad M. va omere (eds.), th Europea Coferece o Mache earg, Barceloa, pa, pp. 9-7, 2000. A Evaluato of Nave Bayesa At-pam
Dynamic Two-phase Truncated Rayleigh Model for Release Date Prediction of Software
J. Software Egeerg & Applcatos 3 63-69 do:.436/jsea..367 Publshed Ole Jue (http://www.scrp.org/joural/jsea) Dyamc Two-phase Trucated Raylegh Model for Release Date Predcto of Software Lafe Qa Qgchua Yao
Using Phase Swapping to Solve Load Phase Balancing by ADSCHNN in LV Distribution Network
Iteratoal Joural of Cotrol ad Automato Vol.7, No.7 (204), pp.-4 http://dx.do.org/0.4257/jca.204.7.7.0 Usg Phase Swappg to Solve Load Phase Balacg by ADSCHNN LV Dstrbuto Network Chu-guo Fe ad Ru Wag College
Group Nearest Neighbor Queries
Group Nearest Neghbor Queres Dmtrs Papadas Qogmao She Yufe Tao Kyrakos Mouratds Departmet of Computer Scece Hog Kog Uversty of Scece ad Techology Clear Water Bay, Hog Kog {dmtrs, qmshe, kyrakos}@cs.ust.hk
Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems
Fto: A Faster, Permutable Icremetal Gradet Method for Bg Data Problems Aaro J Defazo Tbéro S Caetao Just Domke NICTA ad Australa Natoal Uversty AARONDEFAZIO@ANUEDUAU TIBERIOCAETANO@NICTACOMAU JUSTINDOMKE@NICTACOMAU
Relaxation Methods for Iterative Solution to Linear Systems of Equations
Relaxato Methods for Iteratve Soluto to Lear Systems of Equatos Gerald Recktewald Portlad State Uversty Mechacal Egeerg Departmet [email protected] Prmary Topcs Basc Cocepts Statoary Methods a.k.a. Relaxato
Proceedings of the 2010 Winter Simulation Conference B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds.
Proceedgs of the 21 Wter Smulato Coferece B. Johasso, S. Ja, J. Motoya-Torres, J. Huga, ad E. Yücesa, eds. EMPIRICAL METHODS OR TWO-ECHELON INVENTORY MANAGEMENT WITH SERVICE LEVEL CONSTRAINTS BASED ON
Chapter 3. AMORTIZATION OF LOAN. SINKING FUNDS R =
Chapter 3. AMORTIZATION OF LOAN. SINKING FUNDS Objectves of the Topc: Beg able to formalse ad solve practcal ad mathematcal problems, whch the subjects of loa amortsato ad maagemet of cumulatve fuds are
A PRACTICAL SOFTWARE TOOL FOR GENERATOR MAINTENANCE SCHEDULING AND DISPATCHING
West Ida Joural of Egeerg Vol. 30, No. 2, (Jauary 2008) Techcal aper (Sharma & Bahadoorsgh) 57-63 A RACTICAL SOFTWARE TOOL FOR GENERATOR MAINTENANCE SCHEDULING AND DISATCHING C. Sharma & S. Bahadoorsgh
Common p-belief: The General Case
GAMES AND ECONOMIC BEHAVIOR 8, 738 997 ARTICLE NO. GA97053 Commo p-belef: The Geeral Case Atsush Kaj* ad Stephe Morrs Departmet of Ecoomcs, Uersty of Pesylaa Receved February, 995 We develop belef operators
Using Data Mining Techniques to Predict Product Quality from Physicochemical Data
Usg Data Mg Techques to Predct Product Qualty from Physcochemcal Data A. Nachev 1, M. Hoga 1 1 Busess Iformato Systems, Cares Busess School, NUI, Galway, Irelad Abstract - Product qualty certfcato s sometmes
M. Salahi, F. Mehrdoust, F. Piri. CVaR Robust Mean-CVaR Portfolio Optimization
M. Salah, F. Mehrdoust, F. Pr Uversty of Gula, Rasht, Ira CVaR Robust Mea-CVaR Portfolo Optmzato Abstract: Oe of the most mportat problems faced by every vestor s asset allocato. A vestor durg makg vestmet
Reinsurance and the distribution of term insurance claims
Resurace ad the dstrbuto of term surace clams By Rchard Bruyel FIAA, FNZSA Preseted to the NZ Socety of Actuares Coferece Queestow - November 006 1 1 Itroducto Ths paper vestgates the effect of resurace
The Present Value of an Annuity
Module 4.4 Page 492 of 944. Module 4.4: The Preset Value of a Auty Here we wll lear about a very mportat formula: the preset value of a auty. Ths formula s used wheever there s a seres of detcal paymets
CHAPTER 2. Time Value of Money 6-1
CHAPTER 2 Tme Value of Moey 6- Tme Value of Moey (TVM) Tme Les Future value & Preset value Rates of retur Autes & Perpetutes Ueve cash Flow Streams Amortzato 6-2 Tme les 0 2 3 % CF 0 CF CF 2 CF 3 Show
Three Dimensional Interpolation of Video Signals
Three Dmesoal Iterpolato of Vdeo Sgals Elham Shahfard March 0 th 006 Outle A Bref reve of prevous tals Dgtal Iterpolato Bascs Upsamplg D Flter Desg Issues Ifte Impulse Respose Fte Impulse Respose Desged
A particle swarm optimization to vehicle routing problem with fuzzy demands
A partcle swarm optmzato to vehcle routg problem wth fuzzy demads Yag Peg, Ye-me Qa A partcle swarm optmzato to vehcle routg problem wth fuzzy demads Yag Peg 1,Ye-me Qa 1 School of computer ad formato
Optimal Packetization Interval for VoIP Applications Over IEEE 802.16 Networks
Optmal Packetzato Iterval for VoIP Applcatos Over IEEE 802.16 Networks Sheha Perera Harsha Srsea Krzysztof Pawlkowsk Departmet of Electrcal & Computer Egeerg Uversty of Caterbury New Zealad [email protected]
How To Value An Annuity
Future Value of a Auty After payg all your blls, you have $200 left each payday (at the ed of each moth) that you wll put to savgs order to save up a dow paymet for a house. If you vest ths moey at 5%
ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN
Colloquum Bometrcum 4 ADAPTATION OF SHAPIRO-WILK TEST TO THE CASE OF KNOWN MEAN Zofa Hausz, Joaa Tarasńska Departmet of Appled Mathematcs ad Computer Scece Uversty of Lfe Sceces Lubl Akademcka 3, -95 Lubl
Software Aging Prediction based on Extreme Learning Machine
TELKOMNIKA, Vol.11, No.11, November 2013, pp. 6547~6555 e-issn: 2087-278X 6547 Software Agg Predcto based o Extreme Learg Mache Xaozh Du 1, Hum Lu* 2, Gag Lu 2 1 School of Software Egeerg, X a Jaotog Uversty,
Beta. A Statistical Analysis of a Stock s Volatility. Courtney Wahlstrom. Iowa State University, Master of School Mathematics. Creative Component
Beta A Statstcal Aalyss of a Stock s Volatlty Courtey Wahlstrom Iowa State Uversty, Master of School Mathematcs Creatve Compoet Fall 008 Amy Froelch, Major Professor Heather Bolles, Commttee Member Travs
Software Reliability Index Reasonable Allocation Based on UML
Sotware Relablty Idex Reasoable Allocato Based o UML esheg Hu, M.Zhao, Jaeg Yag, Guorog Ja Sotware Relablty Idex Reasoable Allocato Based o UML 1 esheg Hu, 2 M.Zhao, 3 Jaeg Yag, 4 Guorog Ja 1, Frst Author
Green Master based on MapReduce Cluster
Gree Master based o MapReduce Cluster Mg-Zh Wu, Yu-Chag L, We-Tsog Lee, Yu-Su L, Fog-Hao Lu Dept of Electrcal Egeerg Tamkag Uversty, Tawa, ROC Dept of Electrcal Egeerg Tamkag Uversty, Tawa, ROC Dept of
AN ALGORITHM ABOUT PARTNER SELECTION PROBLEM ON CLOUD SERVICE PROVIDER BASED ON GENETIC
Joural of Theoretcal ad Appled Iformato Techology 0 th Aprl 204. Vol. 62 No. 2005-204 JATIT & LLS. All rghts reserved. ISSN: 992-8645 www.jatt.org E-ISSN: 87-395 AN ALGORITHM ABOUT PARTNER SELECTION PROBLEM
A Hybrid Data-Model Fusion Approach to Calibrate a Flush Air Data Sensing System
AIAA Ifotech@Aerospace - Aprl, Atlata, Georga AIAA -3347 A Hybrd Data-Model Fuso Approach to Calbrate a Flush Ar Data Sesg System Akur Srvastava Rce Uversty, Housto, Texas, 775 Adrew J. Meade Rce Uversty,
Maintenance Scheduling of Distribution System with Optimal Economy and Reliability
Egeerg, 203, 5, 4-8 http://dx.do.org/0.4236/eg.203.59b003 Publshed Ole September 203 (http://www.scrp.org/joural/eg) Mateace Schedulg of Dstrbuto System wth Optmal Ecoomy ad Relablty Syua Hog, Hafeg L,
Lecture 7. Norms and Condition Numbers
Lecture 7 Norms ad Codto Numbers To dscuss the errors umerca probems vovg vectors, t s usefu to empo orms. Vector Norm O a vector space V, a orm s a fucto from V to the set of o-egatve reas that obes three
n. We know that the sum of squares of p independent standard normal variables has a chi square distribution with p degrees of freedom.
UMEÅ UNIVERSITET Matematsk-statstska sttutoe Multvarat dataaalys för tekologer MSTB0 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multvarat dataaalys för tekologer B, 5 poäg.
In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008
I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces
