Inductive Learning Algorithms and Representations for Text Categorization

Susan Dumais, Microsoft Research, One Microsoft Way, Redmond, WA
John Platt, Microsoft Research, One Microsoft Way, Redmond, WA
David Heckerman, Microsoft Research, One Microsoft Way, Redmond, WA
Mehran Sahami, Computer Science Department, Stanford University, Stanford, CA

ABSTRACT
Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

1.1 Keywords
Text categorization, classification, support vector machines, machine learning, information management.

2. INTRODUCTION
As the volume of information available on the Internet and corporate intranets continues to increase, there is growing interest in helping people better find, filter, and manage these resources. Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. Its most widespread application to date has been for assigning subject categories to documents to support text retrieval, routing and filtering. Automatic text categorization can play an important role in a wide variety of more flexible, dynamic and personalized information management tasks as well: real-time sorting of email or files into folder hierarchies; topic identification to support topic-specific processing operations; structured search and/or browsing; or finding documents that match long-term standing interests or more dynamic task-based interests.
Classification technologies should be able to support category structures that are very general, consistent across individuals, and relatively static (e.g., Dewey Decimal or Library of Congress classification systems, Medical Subject Headings (MeSH), or Yahoo!'s topic hierarchy), as well as those that are more dynamic and customized to individual interests or tasks (e.g., email about the CIKM conference). In many contexts (Dewey, MeSH, Yahoo!, CyberPatrol), trained professionals are employed to categorize new items. This process is very time-consuming and costly, thus limiting its applicability. Consequently there is increased interest in developing technologies for automatic text categorization. Rule-based approaches similar to those used in expert systems are common (e.g., Hayes and Weinstein's CONSTRUE system for classifying Reuters news stories, 1990), but they generally require manual construction of the rules, make rigid binary decisions about category membership, and are typically difficult to modify.

Another strategy is to use inductive learning techniques to automatically construct classifiers using labeled training data. Text classification poses many challenges for inductive learning methods since there can be millions of word features. The resulting classifiers, however, have many advantages: they are easy to construct and update, they depend only on information that is easy for people to provide (i.e., examples of items that are in or out of categories), they can be customized to specific categories of interest to individuals, and they allow users to smoothly trade off precision and recall depending on the task.

A growing number of statistical classification and machine learning techniques have been applied to text categorization, including multivariate regression models (Fuhr et al., 1991; Yang and Chute, 1994; Schütze et al., 1995), nearest neighbor classifiers (Yang, 1994), probabilistic Bayesian models (Lewis and Ringuette, 1994), decision trees (Lewis and Ringuette, 1994), neural networks (Wiener et al., 1995; Schütze et al., 1995), and symbolic rule learning (Apte et al., 1994; Cohen and Singer, 1996). More recently, Joachims (1998) has explored the use of Support Vector Machines (SVMs) for text classification with promising results.

In this paper we describe results from experiments using a collection of hand-tagged financial newswire stories from Reuters. We use supervised learning methods to build our classifiers, and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Find Similar, Naïve Bayes, Bayesian Networks, Decision Trees, and Support Vector Machines) in terms of learning speed, real-time classification speed, and classification accuracy. We also explored alternative document representations (words vs. syntactic phrases, and binary vs. non-binary features), and training set size.

3. INDUCTIVE LEARNING METHODS

3.1 Classifiers
A classifier is a function that maps an input attribute vector, x = (x1, x2, x3, ... xn), to a confidence that the input belongs to a class; that is, f(x) = confidence(class). In the case of text classification, the attributes are words in the document and the classes correspond to text categories (e.g., typical Reuters categories include acquisitions, earnings, interest).

Examples of classifiers for the Reuters category interest include:
- if (interest AND rate) OR (quarterly), then confidence(interest category) = 0.9
- confidence(interest category) = 0.3*interest + 0.4*rate + 0.7*quarterly

Some of the classifiers that we consider (decision trees, naïve-Bayes classifier, and Bayes nets) are probabilistic in the sense that confidence(class) is a probability distribution.

3.2 Inductive Learning of Classifiers
Our goal is to learn classifiers like these using inductive learning methods. In this paper we compared five learning methods:
- Find Similar (a variant of Rocchio's method for relevance feedback)
- Decision Trees
- Naïve Bayes
- Bayes Nets
- Support Vector Machines (SVM)

We describe these different models in detail in section 3.4. All methods require only a small amount of labeled training data (i.e., examples of items in each category) as input. This training data is used to learn parameters of the classification model. In the testing or evaluation phase, the effectiveness of the model is tested on previously unseen instances.

Learned classifiers are easy to construct and update. They require only subject knowledge ("I know it when I see it") and not programming or rule-writing skills. Inductively learned classifiers make it easy for users to customize category definitions, which is important for some applications. In addition, all the learning methods we looked at provide graded estimates of category membership, allowing for tradeoffs between precision and recall, depending on the task.

3.3 Text Representation and Feature Selection
Each document is represented as a vector of words, as is typically done in the popular vector representation for information retrieval (Salton & McGill, 1983). For the Find Similar algorithm, tf*idf term weights are computed and all features are used.
For the other learning algorithms, the feature space is reduced substantially (as described below) and only binary feature values are used: a word either occurs or does not occur in a document.

For reasons of both efficiency and efficacy, feature selection is widely used when applying machine learning methods to text categorization. To reduce the number of features, we first remove features based on overall frequency counts, and then select a small number of features based on their fit to categories. Yang and Pedersen (1997) compare a number of methods for feature selection. We used the mutual information measure. The mutual information MI(x, c) between a feature, x, and a category, c, is defined as:

$$MI(x, c) = \sum_{x \in \{0,1\}} \sum_{c \in \{0,1\}} P(x, c) \log \frac{P(x, c)}{P(x)\,P(c)}$$

We select the k features for which mutual information is largest for each category. These features are used as input to the various inductive learning algorithms. For the SVM and decision-tree methods we used k=300, and for the remaining methods we used k=50. We did not rigorously explore the optimum number of features for this problem, but these numbers provided good results on a training validation set so they were used for testing.

3.4 Inductive Learning of Classifiers

Find Similar
Our Find Similar method is a variant of Rocchio's method for relevance feedback (Rocchio, 1971), which is a popular method for expanding user queries on the basis of relevance judgements. In Rocchio's formulation, the weight assigned to a term is a combination of its weight in an original query, and in judged relevant and irrelevant documents:

$$x_j = \alpha\, x_{q,j} + \beta\, \frac{\sum_{i \in rel} x_{i,j}}{n_r} + \gamma\, \frac{\sum_{i \in non\text{-}rel} x_{i,j}}{N - n_r}$$

The parameters α, β, and γ control the relative importance of the original query vector, the positive examples and the negative examples. In the context of text classification, there is no initial query, so α=0. We also set γ=0 so we could easily use available code. Thus, for our Find Similar method the weight of each term is simply the average (or centroid) of its weights in positive instances of the category.

There is no explicit error minimization involved in computing the Find Similar weights. Thus, there is no learning time so to speak, except for taking the sum of weights from positive examples of each category. Test instances are classified by comparing them to the category centroids using the Jaccard similarity measure. If the score exceeds a threshold, the item is classified as belonging to the category.

Decision Trees
A decision tree was constructed for each category using the approach described by Chickering et al. (1997). The decision trees were grown by recursive greedy splitting, and splits were chosen using the Bayesian posterior probability of model structure. We used a structure prior that penalized each additional parameter with probability 0.1, and derived parameter priors from a prior network as described in Chickering et al. (1997) with an equivalent sample size of 10.
A class probability rather than a binary decision is retained at each node.

Naïve Bayes
A naïve-Bayes classifier is constructed by using the training data to estimate the probability of each category given the document feature values of a new instance. We use Bayes theorem to estimate the probabilities:

$$P(C = c_k \mid \vec{x}) = \frac{P(\vec{x} \mid C = c_k)\, P(C = c_k)}{P(\vec{x})}$$

The quantity $P(\vec{x} \mid C = c_k)$ is often impractical to compute without simplifying assumptions. For the Naïve Bayes classifier (Good, 1965), we assume that the features X1, ..., Xn are conditionally independent, given the category variable C. This simplifies the computations, yielding:

$$P(\vec{x} \mid C = c_k) = \prod_i P(x_i \mid C = c_k)$$

Despite the fact that the assumption of conditional independence is generally not true for word appearance in documents, the Naïve Bayes classifier is surprisingly effective.

Bayes Nets
More recently, there has been interest in learning more expressive Bayesian networks (Heckerman et al., 1995) as well as methods for learning networks specifically for classification (Sahami, 1996). Sahami, for example, allows for a limited form of dependence between feature variables, thus relaxing the very restrictive assumptions of the Naïve Bayes classifier. We used a 2-dependence Bayesian classifier that allows the probability of each feature x to be directly influenced by the appearance/non-appearance of at most two other features.

Support Vector Machines (SVMs)
Vapnik proposed Support Vector Machines (SVMs) in 1979 (Vapnik, 1995), but they have only recently been gaining popularity in the learning community. In its simplest linear form, an SVM is a hyperplane that separates a set of positive examples from a set of negative examples with maximum margin (see Figure 1).

Figure 1 Linear Support Vector Machine (figure labels: w, support vectors)

The formula for the output of a linear SVM is $u = \vec{w} \cdot \vec{x} - b$, where $\vec{w}$ is the normal vector to the hyperplane, and $\vec{x}$ is the input vector. In the linear case, the margin is defined by the distance of the hyperplane to the nearest of the positive and negative examples. Maximizing the margin can be expressed as an optimization problem:

$$\text{minimize } \tfrac{1}{2}\,\lVert \vec{w} \rVert^2 \quad \text{subject to } y_i(\vec{w} \cdot \vec{x}_i - b) \ge 1, \ \forall i$$

where $\vec{x}_i$ is the ith training example and $y_i$ is the correct output of the SVM for the ith training example. Of course, not all problems are linearly separable. Cortes and Vapnik (1995) proposed a modification to the optimization formulation that allows, but penalizes, examples that fall on the wrong side of the decision boundary. Additional extensions to non-linear classifiers were described by Boser et al.

SVMs have been shown to yield good generalization performance on a wide variety of classification problems, including handwritten character recognition (LeCun et al., 1995), face detection (Osuna et al., 1997) and most recently text categorization (Joachims, 1998). We used the simplest linear version of the SVM because it provided good classification accuracy, is fast to learn and fast for classifying new instances.

Training an SVM requires the solution of a quadratic programming (QP) problem. Any QP optimization method can be used to learn the weights, $\vec{w}$, on the basis of training examples. However, many QP methods can be very slow for large problems such as text categorization. We used a new and very fast method developed by Platt (1998) which breaks the large QP problem down into a series of small QP problems that can be solved analytically. Additional improvements can be realized because the training sets used for text classification are sparse and binary. Once the weights are learned, new items are classified by computing $\vec{w} \cdot \vec{x}$, where $\vec{w}$ is the vector of learned weights, and $\vec{x}$ is the binary vector representing the new document to classify.

After training the SVM, we fit a sigmoid to the output of the SVM using regularized maximum likelihood fitting, so that the SVM can produce posterior probabilities that are directly comparable between categories.

4. REUTERS DATA SET

4.1 Reuters (ModApte split)
We used the new version of Reuters, the so-called Reuters collection. (This collection is publicly available.) We used the 12,902 stories that had been classified into 118 categories (e.g., corporate acquisitions, earnings, money market, grain, and interest).
The stories average about 200 words in length. We followed the ModApte split in which 75% of the stories (9603 stories) are used to build classifiers and the remaining 25% (3299 stories) to test the accuracy of the resulting models in reproducing the manual category assignments. The stories are split temporally, so the training items all occur before the test items. The mean number of categories assigned to a story is 1.2, but many stories are not assigned to any of the 118 categories, and some stories are assigned to 12 categories. The number of stories in each category varied widely as well, ranging from earnings, which contains 3964 documents, to castor-oil, which contains only one test document. Table 1 shows the ten most frequent categories along with the number of training and test examples in each. These 10 categories account for 75% of the training instances, with the remainder distributed among the other 108 categories.

Category Name    Num Train    Num Test
Earn
Acquisitions
Money-fx
Grain
Crude
Trade
Interest
Ship
Wheat
Corn

Table 1 Number of Training/Test Items

4.2 Summary of Inductive Learning Process for Reuters
Figure 2 summarizes the process we use for testing the various learning algorithms. Text files are processed using Microsoft's Index Server. All features are saved along with their tf*idf weights. We distinguished between words occurring in the Title and Body of the stories. For the Find Similar method, similarity is computed between test examples and category centroids using all these features. For all other methods, we reduce the feature space by eliminating words that appear in only a single document (hapax legomena), then selecting the k words with highest mutual information with each category. These k-element binary feature vectors are used as input to four different learning algorithms. For SVMs and decision trees k=300, and for the other methods, k=50.

Figure 2 Schematic of Learning Process (figure labels: text files, Index Server, word counts per file, feature selection, data set, learning methods [Find Similar, Decision tree, Naïve Bayes, Bayes nets, Support vector machine], test classifier)
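The feature-selection step summarized in Section 4.2 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each document is already a set of words, estimates the probabilities in MI(x, c) by simple document counts without smoothing, and the names `mutual_information`, `select_features` and `min_df` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(n11, n10, n01, n00):
    """MI between a binary feature x and a binary category c.

    n11: docs with x=1, c=1    n10: docs with x=1, c=0
    n01: docs with x=0, c=1    n00: docs with x=0, c=0
    Probabilities are plain relative frequencies (an assumption).
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    mi = 0.0
    for n_xc, n_x, n_c in cells:
        if n_xc > 0:  # a cell with zero count contributes nothing
            mi += (n_xc / n) * math.log(n_xc * n / (n_x * n_c))
    return mi


def select_features(docs, labels, k, min_df=2):
    """Pick the k words with highest MI for one category.

    docs: list of sets of words; labels: list of 0/1 category membership.
    Words occurring in fewer than min_df documents are dropped first,
    which with min_df=2 removes words seen in only a single document.
    """
    n_docs, n_pos = len(docs), sum(labels)
    df, df_pos = Counter(), Counter()
    for words, y in zip(docs, labels):
        for w in words:
            df[w] += 1
            if y:
                df_pos[w] += 1
    scores = {}
    for w, d in df.items():
        if d < min_df:
            continue
        n11 = df_pos[w]
        n10 = d - n11
        n01 = n_pos - n11
        n00 = n_docs - d - n01
        scores[w] = mutual_information(n11, n10, n01, n00)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Note that this measure scores a word highly whether its presence or its absence predicts the category, so words strongly associated with other categories can also be selected.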
A separate classifier is learned for each category. New instances are classified by computing a score and comparing the score with a learned threshold. New instances exceeding the threshold are said to belong to the category. As already mentioned, all classifiers output a graded measure of category membership, so different thresholds can be set to favor precision or recall depending on the application; for Reuters we optimized the average of precision and recall (details below).

All model parameters and thresholds are set to optimize performance on a validation set and are not modified during testing. For Reuters, the training set contains 9603 stories and the test set 3299 stories. In order to decide which models to use we performed initial experiments on a subset of the training data, which we subdivided into 7147 training stories and 2456 validation stories for this purpose. We used this to set the number of features (k), decision thresholds and document representations to use for the final runs. We estimated parameters for these chosen models using the full 9603 training stories and evaluated performance on the 3299 test items. We did not further optimize performance by tuning parameters to achieve optimal performance in the test set.

5. RESULTS

5.1 Training Time
Training times for the 9603 training examples vary substantially across methods. We tested these algorithms on a 266MHz Pentium II running Windows NT. Unless otherwise noted, times are for the 10 largest categories, because they take longest to learn. Find Similar is the fastest learning method (<1 CPU sec/category) because there is no explicit error minimization. The linear SVM is the next fastest (<2 CPU secs/category). These are both substantially faster than Naïve Bayes (8 CPU secs/category), Bayes Nets (~145 CPU secs/category) or Decision Trees (~70 CPU secs/category). In general, performing the mutual-information feature-extraction step takes much more time than any of the inductive learning algorithms. The linear SVM with SMO, for example, takes an average of 0.26 CPU seconds to train a category when averaged over all 118 Reuters categories.
The training speeds for the SVM are particularly impressive, since training speed has been a barrier to its widespread applicability for large problems. Platt's SMO algorithm is roughly 30 times faster than the popular chunking algorithm (Vapnik, 1995) on the Reuters data set.

5.2 Classification Speed for New Instances
In many applications, it is important to quickly classify new instances. All of the classifiers we explored are very fast in this regard: all require less than 2 msec to determine if a new document should be assigned to a particular category. Far more time is spent in pre-processing the text to extract even simple words than is spent in categorization. With the SVM model, for example, we need only compute $\vec{w} \cdot \vec{x}$, where $\vec{w}$ is the vector of learned weights, and $\vec{x}$ is the feature vector for the new instance. Since features are binary, this is just the sum of up to 300 numbers.

5.3 Classification Accuracy
Many evaluation criteria for classification have been proposed. The most popular measures are based on precision and recall. Precision is the proportion of items placed in the category that are really in the category, and recall is the proportion of items in the category that are actually placed in the category. We report the average of precision and recall (the so-called breakeven point) for comparability to earlier results in text classification. In addition, we plot precision as a function of recall in order to understand the relationship among methods at different points along this curve.

Table 2 summarizes micro-averaged breakeven performance for the 5 different learning algorithms for the 10 most frequent categories as well as the overall score for all 118 categories. Support Vector Machines were the most accurate method, averaging 92% for the 10 most frequent categories and 87% over all 118 categories. Accuracy for Decision Trees was 3.6% lower, averaging 88.4% for the 10 most frequent categories. Bayes Nets provided some performance improvement over Naïve Bayes as expected, but the advantages were rather small.
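The evaluation measures defined in Section 5.3 can be illustrated with a short sketch. It assumes the usual convention that micro-averaging pools the per-category counts before computing precision and recall, and it returns 0.0 when a denominator is empty; both choices, and all function names, are assumptions of this example rather than details given in the paper.

```python
def confusion_counts(scores, labels, threshold):
    """Count true positives, false positives and false negatives for one category.

    An item is assigned to the category when its score exceeds the threshold.
    """
    tp = fp = fn = 0
    for score, in_category in zip(scores, labels):
        assigned = score > threshold
        if assigned and in_category:
            tp += 1
        elif assigned:
            fp += 1
        elif in_category:
            fn += 1
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision: fraction of assigned items that are correct.
    Recall: fraction of category members that were assigned."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def micro_average(per_category_counts):
    """Pool (tp, fp, fn) over all categories, then compute precision and recall."""
    tp = sum(c[0] for c in per_category_counts)
    fp = sum(c[1] for c in per_category_counts)
    fn = sum(c[2] for c in per_category_counts)
    return precision_recall(tp, fp, fn)


def average_of_precision_and_recall(precision, recall):
    """The summary score reported in the paper as the breakeven point."""
    return (precision + recall) / 2
```

Because the counts are pooled, large categories such as earnings dominate a micro-averaged score, which is one reason the paper reports the top 10 categories and all 118 categories separately.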
As has previously been reported, all the more advanced learning algorithms increase performance by 15-20% compared with Rocchio-style query expansion (Find Similar).

              Findsim   NBayes   BayesNets   Trees    LinearSVM
earn          92.9%     95.9%    95.8%       97.8%    98.0%
acq           64.7%     87.8%    88.3%       89.7%    93.6%
money-fx      46.7%     56.6%    58.8%       66.2%    74.5%
grain         67.5%     78.8%    81.4%       85.0%    94.6%
crude         70.1%     79.5%    79.6%       85.0%    88.9%
trade         65.1%     63.9%    69.0%       72.5%    75.9%
interest      63.4%     64.9%    71.3%       67.1%    77.7%
ship          49.2%     85.4%    84.4%       74.2%    85.6%
wheat         68.9%     69.7%    82.7%       92.5%    91.8%
corn          48.2%     65.3%    76.4%       91.8%    90.3%
Avg Top 10              81.5%    85.0%       88.4%    92.0%
Avg All Cat   61.7%     75.2%    80.0%       N/A      87.0%

Table 2 Breakeven Performance for 10 Largest Categories, and over all 118 Categories.

Both SVMs and Decision Trees produce very high overall classification accuracy, and are among the best known results for this test collection. Most previous results have used the older Reuters collection, so it is difficult to compare precisely, but 85% is the best micro-averaged breakeven point previously reported (Yang, 1997). Joachims (1998) used the new collection, and our SVM results are more accurate (87% for our linear SVM vs. 84.2% for Joachims' linear SVM and 86.5% for his radial basis function network with gamma equal to 0.8) and far more efficient for both initial model learning and for real-time classification of new instances. It is also worth noting that Joachims chose optimal parameters based on the test data and used only the 90 categories that have at least one training and test item, and our results would improve somewhat if we did the same. Apte et al. (1998) have recently reported accuracies slightly better than ours (87.8%) for a system with 100 decision trees. Their approach involves learning many decision trees using an adaptive resampling approach (boosting) and is much more complex to learn than our one simple linear classifier.

The 92% breakeven point (for the top 10 categories) corresponds roughly to 92% precision at 92% recall. Note, however, that the decision threshold can be varied to produce higher precision (at the cost of lower recall), or higher recall (at the cost of lower precision), as appropriate for different applications. A user would be quite happy with 92% precision for information discovery tasks, but might want additional human confirmation before deleting important email messages with this level of accuracy. Figure 3 shows a representative precision-recall curve for the category grain. The advantages of the SVM can be seen over the entire recall-precision space.

Figure 3 Precision-Recall Curve for Category grain (axes: Precision vs. Recall; curves: LSVM, Decision Tree, Naïve Bayes, Find Similar)

Although we have not conducted any formal tests, the learned classifiers appear to be intuitively reasonable.
For example, the SVM representation for the category interest includes the words prime (.70), rate (.67), interest (.63), rates (.60), and discount (.46) with large positive weights, and the words group (-.24), year (-.25), sees (-.33), world (-.35), and dlrs (-.71) with large negative weights.

5.4 Other Experiments

Sample Size
For an application like Reuters, it is easy to imagine developing a large training corpus of the sort we worked with (e.g., a few categories had more than 1000 positive training instances). For other applications, training data may be much harder to come by. For this reason we examined how many positive training examples were necessary to provide good generalization performance. We looked at performance for the 10 most frequent categories, varying the number of positive instances but keeping the negative data the same. For the linear SVM, using 100% of the training data (7147 stories), the micro-averaged breakeven point is 92%. For smaller training sets we took multiple random samples and report the average score. Using only 10% of the training data, performance is 89.6%, with a 5% sample 86.2%, and with a 1% sample 72.6%. When we get down to a training set with only 1% of the positive examples, most of the categories have fewer than 5 training instances, resulting in somewhat unstable performance for some categories. In general, having 20 or more training instances provides stable generalization performance.

While the number of examples needed per category will vary across applications, we find these results encouraging. In addition, it is important to note that in most categorization scenarios, the distribution of instances varies tremendously across categories: some categories will have hundreds or thousands of instances, and others only a few (a kind of Zipf's law for category size). In such cases, the most popular categories will quickly receive the necessary number of training examples in the normal course of operation.

Simple words vs. NLP-derived phrases
For all the results reported so far, we simply used the default pre-processing provided by Microsoft's Index Server, resulting in single words as index terms. We wanted to explore how NLP analyses might improve classification accuracy. For example, the phrase interest rate is more predictive of the Reuters category interest than is either the word interest or rate. We used NLP analyses in a very simple fashion to aid in the extraction of richer phrases for indexing accuracy (see Lewis and Sparck Jones, 1996 for an overview of related NLP issues). We considered:
- factoids (e.g., Salomon_Brothers_International, April_8)
- multi-word dictionary entries (e.g., New_York, interest_rate)
- noun phrases (e.g., first_quarter, modest_growth)
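To make the phrase-based index terms in the list above concrete, here is a minimal sketch of one way multi-word dictionary entries could be folded into single underscore-joined terms. The paper does not describe its NLP pipeline at this level, so the greedy longest-match strategy, the `merge_phrases` function and the example phrase dictionary are all assumptions made for illustration.

```python
def merge_phrases(tokens, phrases, max_len=3):
    """Replace known multi-word phrases with single underscore-joined terms.

    tokens:  list of words in document order
    phrases: set of word tuples, e.g. {("interest", "rate")}
    max_len: longest phrase length (in words) to look for
    Matching is greedy: at each position the longest matching phrase wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        longest = min(max_len, len(tokens) - i)
        for n in range(longest, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                merged.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so keep the single word as its own term.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical phrase dictionary, using examples quoted in the text.
example_phrases = {("interest", "rate"), ("New", "York")}
example_terms = merge_phrases(
    "the interest rate in New York rose".split(), example_phrases
)
```

In this sketch a matched phrase replaces its component words; whether the single words should also be kept as separate index terms is a design choice the paper does not specify.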
As before, we used tf*idf weights for Find Similar and the mutual information criterion for selecting features for Naïve Bayes and SVM. Unfortunately, the NLP-derived phrases did not improve classification accuracy. For the SVM, the NLP features actually reduced performance on the 118 categories by 0.2%. Because of these initial results, we did not try the NLP-derived phrases for Decision Trees or the more complex 2-dependence Bayesian network, or use NLP features in any of the final evaluations.

Binary vs. 0/1/2 features
We also looked at whether moving to a richer representation than binary features would improve categorization accuracy. To this end, we considered a representation that encoded words as appearing 0, 1, or >=2 times in each document. Initial results using this representation with Decision Tree classifiers did not yield improved performance, so we did not pursue this further.

6. SUMMARY
Very accurate text classifiers can be learned automatically from training examples, as others have shown. The accuracy of our simple linear SVM is among the best reported for the Reuters collection. In addition, the model is very simple (300 binary features per category), and Platt's SMO training method for SVMs provides a very efficient method for learning the classifier: at least 30 times faster than the chunking method for QP, and 35 times faster than the next most accurate classifier (Decision Trees) we examined. Classification of new items is fast as well, since we need only compute the sum of the learned weights for features in the test items.

We found that the simplest document representation (using individual words delimited by white space with no stemming) was at least as good as representations involving more complicated syntactic and morphological analysis. And, representing documents as binary vectors of words, chosen using a mutual information criterion for each category, was as good as finer-grained coding (at least for Decision Trees).

Joachims' (1998) work is similar to ours in its use of SVMs for the purpose of text categorization. Our results are somewhat more accurate than his, but, more importantly, based on a much simpler and more efficient model.
Joachims' best results are obtained using a non-linear radial basis function of 9962 real-valued input features (based on the popular tf*idf term weights). In contrast, we use a single linear function of 300 binary features per category. SVMs work well because they create a classifier which maximizes the margin between positive and negative examples. Other algorithms, such as boosting (Schapire et al., 1998), have been shown to maximize margin and are also very effective at text categorization.

We have also used SVMs for categorizing email messages and Web pages with results comparable to those reported here: SVMs are the most accurate classifier and the fastest to train. We hope to extend the text representation models to include additional structural information about documents, as well as knowledge-based features which have been shown to provide substantial improvements in classification accuracy (Sahami et al., 1998). Finally, we will look at extending this work to automatically classify items into hierarchical category structures.

We believe that inductive learning methods like the ones we have described can be used to support flexible, dynamic, and personalized information access and management in a wide variety of tasks. Linear SVMs are particularly promising since they are both very accurate and fast.

7. REFERENCES
[1] Apte, C., Damerau, F. and Weiss, S. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 1994.
[2] Apte, C., Damerau, F. and Weiss, S. Text mining with decision rules and decision trees. Proceedings of the Conference on Automated Learning and Discovery, CMU, June 1998.
[3] Boser, B. E., Guyon, I. M., and Vapnik, V. A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory, ACM.
[4] Chickering, D., Heckerman, D., and Meek, C. A Bayesian approach for learning Bayesian networks with local structure. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence, 1997.
[5] Cohen, W.W. and Singer, Y. Context-sensitive learning methods for text categorization. In SIGIR 96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996.
[6] Cortes, C., and Vapnik, V. Support vector networks. Machine Learning, 20, 1995.
[7] Fuhr, N., Hartmanna, S., Lustig, G., Schwantner, M., and Tzeras, K. Air/X: A rule-based multi-stage indexing system for large subject fields. In Proceedings of RIAO 91, 1991.
[8] Good, I.J. The Estimation of Probabilities: An Essay on Modern Bayesian Methods. MIT Press, 1965.
[9] Hayes, P.J. and Weinstein, S.P. CONSTRUE/TIS: A system for content-based indexing of a database of news stories. In Second Annual Conference on Innovative Applications of Artificial Intelligence, 1990.
[10] Heckerman, D., Geiger, D. and Chickering, D.M. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning, 20, 1995.
[11] Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Proceedings 10th European Conference on Machine Learning (ECML), Springer Verlag, 1998.
[12] LeCun, Y., Jackel, L. D., Bottou, L., Cortes, C., Denker, J. S., Drucker, H., Guyon, I., Muller, U. A., Sackinger, E., Simard, P. and Vapnik, V. Learning algorithms for classification: A comparison on handwritten digit recognition. Neural Networks: The Statistical Mechanics Perspective, 1995.
[13] Lewis, D.D. An evaluation of phrasal and clustered representations on a text categorization task. In SIGIR 92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 37-50.
[14] Lewis, D.D. and Hayes, P.J. (Eds.) ACM Transactions on Information Systems, Special Issue on Text Categorization, 12(3).
[15] Lewis, D.D. and Ringuette, M. A comparison of two learning algorithms for text categorization. In Third Annual Symposium on Document Analysis and Information Retrieval, 81-93, 1994.
[16] Lewis, D.D. and Sparck Jones, K. Natural language processing for information retrieval. Communications of the ACM, 39(1), January 1996.
[17] Lewis, D.D., Schapire, R., Callan, J.P., and Papka, R. Training algorithms for linear text classifiers. In SIGIR '96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[18] Osuna, E., Freund, R., and Girosi, F. Training support vector machines: An application to face detection. In Proceedings of Computer Vision and Pattern Recognition '97, 1997.
[19] Platt, J. Fast training of SVMs using sequential minimal optimization. To appear in: B. Scholkopf, C. Burges, and A. Smola (Eds.) Advances in Kernel Methods: Support Vector Learning, MIT Press, 1998.
[20] Rocchio, J.J. Jr. Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall, 1971.
[21] Sahami, M. Learning Limited Dependence Bayesian Classifiers. In KDD-96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996.
[22] Sahami, M., Dumais, S., Heckerman, D., Horvitz, E. A Bayesian approach to filtering junk e-mail. AAAI 98 Workshop on Text Categorization, July 1998.
[23] Salton, G. and McGill, M. Introduction to Modern Information Retrieval. McGraw Hill, 1983.
[24] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, to appear, 1998.
[25] Schütze, H., Hull, D. and Pedersen, J.O. A comparison of classifiers and document representations for the routing problem. In SIGIR 95: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995.
[26] Vapnik, V. The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[27] Wiener, E., Pedersen, J.O. and Weigend, A.S. A neural network approach to topic spotting. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR 95), 1995.
[28] Yang, Y. Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 13-22, 1994.
[29] Yang, Y. and Chute, C.G. An example-based mapping method for text categorization and retrieval. ACM Transactions on Information Systems, 12(3), 1994.
[30] Yang, Y. and Pedersen, J.O. A comparative study on feature selection in text categorization. In Machine Learning: Proceedings of the Fourteenth International Conference (ICML 97), 1997.
[31] Yang, Y. An evaluation of statistical approaches to text categorization. CMU Technical Report, CMU-CS, April 1997.
[32] The Reuters collection is available at:
