AN E-MAIL FILTERING AGENT BASED ON SUPPORT VECTOR MACHINES


BULETINUL INSTITUTULUI POLITEHNIC DIN IAŞI
Published by the "Gheorghe Asachi" Technical University of Iaşi
Tomul LVI (LX), Fasc. 3, 2010
Section AUTOMATIC CONTROL and COMPUTER SCIENCE

BY CONSTANTIN LAZURCĂ and FLORIN LEON

Abstract. E-mail filtering has recently become an important issue due to the increasing popularity of electronic mail communication. Therefore, there is a constant need to improve the detection of unsolicited messages, or spam. Many researchers have applied machine learning techniques to filtering spam messages, and these techniques have proven successful. In this paper we present a spam detection agent based on support vector machines (SVM), one of the best classification methods available today. We test several methods of extracting numerical features from text documents, and assess the optimal values of the SVM parameters needed for this classification problem. The best results show a very good classification accuracy of 94%.

Key words: e-mail filtering agent, support vector machines, classification.

Mathematics Subject Classification: 68T50, 68T42.

1. Introduction

Since its early years, the Internet has been most popularly used as a means of communication. From the beginning, users exchanged messages in a system similar to the postal system. Thus the e-mail service developed, with benefits of speed and flexibility that traditional mail does not have. Today's e-mail services are available in a wide range of choices, from servers and clients using protocols such as POP (Post Office Protocol) and SMTP (Simple Mail Transfer Protocol) to complex e-mail services that work directly in the browser.

One way to make a profit from electronic mail is advertising. Many companies are willing to pay considerable amounts of money to ensure that an advertising message about their products reaches a certain number of users. Often the recipients of these messages have not expressed a desire to receive them, and this practice regularly prevents the efficient use of their e-mail service. Briefly stated, unwanted messages are a source of profit for the senders and a source of frustration for the receivers.

Presently, unwanted messages, or spam, total over 80% of all messages sent to the servers of an e-mail service provider. Out of these, 8% of spam messages contain information on pharmaceutical advertising, [...]% contain advertising information on online casinos and 2.3% are attempts to obtain confidential data such as bank details or personal information from the recipient. The majority of all spam, approximately 80%, is automatically sent by computers infected with viruses. Most owners of an infected computer do not know that their device is part of a spamming network. Unwanted messages in 2007 caused damage of over a hundred billion dollars.

Given this considerable sum, e-mail service providers and end users always look for new ways to detect and remove unwanted messages. New detection methods are continuously being developed, but spam senders regularly improve their techniques to overcome them. This creates a continuous battle between spammers and researchers seeking new ways to protect users from the burden of spam.

Spammers have quickly learned how to pass a filter based on content. Thus, researchers reached the conclusion that, in order to be efficient, filters based on artificial intelligence have to be frequently updated with new data. Sometimes even the updates were not frequent enough to prevent all unwanted messages, as senders found techniques to hide their content. Therefore, most filters based on artificial intelligence are accompanied by filters based on black-lists or other methods.

2. Spam Detection

Spam research is a fascinating subject taken separately, but it is equally interesting how it relates to other areas. Most spam filters use at least one method of artificial intelligence. Spam research has revealed deficiencies in current artificial intelligence technologies and has helped to remedy them.

2.1. Content-Based Filtering

By the time spam was becoming a major problem, Microsoft's research division started work in 1997 on developing methods of artificial intelligence that could be trained to filter spam [9].

In this approach, programs are provided with examples of wanted e-mail and examples of spam e-mail. A learning algorithm is then used to automatically find the features that distinguish wanted e-mail from spam messages. After the training process is complete, an incoming message can be classified as having a high probability of being ham (a desired e-mail), a high probability of being spam, or a value in between. The first attempts were relatively simple compared to today's technologies, using the naïve Bayes method based on how often a word or other feature appears in a spam message and in a desired message.
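As a rough illustration of the word-frequency idea behind the naïve Bayes filters mentioned above, the following minimal Python sketch estimates a spam probability from word counts. The function names, the whitespace tokenization and the add-one smoothing are assumptions made for this example; they are not taken from the cited work.

```python
import math
from collections import Counter


def train_naive_bayes(spam_docs, ham_docs):
    """Count how often each word appears in spam and in ham messages."""
    spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
    ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    return spam_counts, ham_counts, vocab, len(spam_docs), len(ham_docs)


def spam_probability(message, model):
    """Return P(spam | message) under the naive independence assumption."""
    spam_counts, ham_counts, vocab, n_spam, n_ham = model
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    v = len(vocab)
    # Start from the class priors, in log space to avoid underflow.
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for w in message.lower().split():
        if w not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing: an assumption of this sketch.
        log_spam += math.log((spam_counts[w] + 1) / (spam_total + v))
        log_ham += math.log((ham_counts[w] + 1) / (ham_total + v))
    # Turn the log-odds into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


model = train_naive_bayes(
    ["buy cheap pills now", "cheap casino offer"],
    ["meeting agenda for monday", "project report attached"],
)
print(spam_probability("cheap pills", model))
print(spam_probability("project meeting", model))
```

A message whose words were seen mostly in spam gets a value close to 1, one whose words were seen mostly in ham gets a value close to 0, and a message made only of unknown words falls back to the class prior.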

Algorithms based on support vector machines can reduce by half the amount of spam passing filters based on the naïve Bayes method. These learning processes may require repeated adjustments of the parameters that influence the results, but a filter based on over [...] messages can now be trained in less than an hour due to the continuous development of technology.

Spam filtering not only benefits from advances in training techniques for artificial intelligence, but also motivates new, interesting research. For example, AI algorithms are typically designed to maximize accuracy (how often their predictions are correct). In practice, however, in spam filtering, fraud detection and many other problems, these systems are set up very conservatively: a message is classified as spam only if the algorithm is almost certain that it is spam. This mismatch has recently led researchers to develop special training methods for these particular situations. A clever technique that can reduce spam by 20% or more, developed by Scott Yih at the Microsoft research center, involves the creation of two filters. The first is trained to identify the difficult cases, and the second is trained only to classify these cases. By focusing on such cases, the overall results are improved [15].

2.2. Filtering by Using the Sender's Address

Filtering techniques based on message content are sometimes too easily defeated because of the numerous ways to hide content. Thus, many researchers in the field of spam filtering have focused on aspects of spam messages which cannot be hidden. The most important of them is the sender of a message, identified by his/her IP address.

Secure Identity

Numerous attempts have been made to introduce cryptographically secure identity to e-mail, including standards such as PGP and S/MIME, but none has been widely adopted. Identity is particularly important in spam filtering. Almost all spam filters have some form of safe lists, allowing users and administrators to identify senders whose e-mails can be trusted.
Traditional cryptographic approaches to identity security resisted most attacks, but they were too difficult to implement for practical reasons.

3. Support Vector Machines

Support vector machines (SVM) are classification systems that use hypotheses constructed in a multidimensional space, driven by an optimization algorithm derived from statistical learning theory [3]. This learning method, developed by Boser, Guyon and Vapnik [1], is based on well-defined mathematical principles. Shortly after it was introduced, it outperformed the majority of other systems, such as the classical multilayer perceptron neural networks, in a wide range of applications. Support vector machines have become popular because of their success in recognizing handwritten digits, with an error of only 1.1% on the test set. Due to their good experimental performance, they are considered by many researchers to be the best current method of classification [3].

In the support vector machine classification problem we are given $l$ examples $(x_1, y_1), \ldots, (x_l, y_l)$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$ for all $i$. The goal is to find a hyperplane and threshold $(w, b)$ that separate the positive and negative examples with maximum margin, while penalizing points inside the margin according to a user-selected regularization parameter $C > 0$. The SVM classification problem can be restated as finding an optimal solution to the following quadratic programming problem:

(1) $\min_{w, b, \zeta} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \zeta_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \quad i = 1, \ldots, l$

This formulation is motivated by the fact that minimizing the norm of $w$ is equivalent to maximizing the margin; the goal of maximizing the margin is in turn motivated by attempts to bound the generalization error via structural risk minimization [8]. Due to the very large number of training vectors and problem dimensions, it was found that a much more efficient way of solving this optimization problem is to address its dual form:

(2) $\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0$

Support vector machines use a function $\Phi$ to first map the examples into a higher dimensional space and then construct a separating hyperplane there. The idea is to transform the data into a new space where it becomes linearly separable. Then, using the hyperplane as a decision function, we can classify unseen data according to the side of the hyperplane on which they lie. Transforming data with $\Phi$ can be expensive in high dimensional spaces. Instead, an SVM employs a kernel function $K$ which gives the dot product of two examples in the higher dimensional space without actually transforming them into that space. This notion, dubbed the kernel trick, allows us to apply the transformation $\Phi$, for purposes of classification, to spaces of very high dimension.

One issue with SVMs is finding an appropriate kernel for the given data. Most research relies on a priori knowledge to select the correct kernel and then tweaks the kernel parameters via machine learning or trial-and-error [11]. We can thus apply a kernel function to the dual form of the optimization problem from Eq. (2), giving:

(3) $\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0$

The choice of the kernel function $K(x_i, x_j)$ and the resultant feature space determines the functional form of the support vectors; thus, different kernels produce different levels of performance. Some commonly used kernels are [4], [5] and [7]:

(4) Linear: $K(x, y) = x \cdot y$
(5) Polynomial: $K(x, y) = (x \cdot y)^d$, where $d$ is the degree of the polynomial
(6) Radial Basis Function (RBF): $K(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$
(7) Sigmoid: $K(x, y) = \tanh(x \cdot y + \theta)$

Sequential Minimal Optimization (SMO) is an algorithm for solving large quadratic programming (QP) optimization problems, widely used for the training of support vector machines. First developed by John C. Platt [7], SMO breaks up large QP problems into a series of smallest possible problems, which are then solved analytically [14].

4. Case Studies

Although support vector machines are a very good classification technique, they have to be configured for each type of problem if the user wants the best results. For each type of classification problem, it is necessary to develop a specialized support vector machine; here we use a radial basis function kernel of the form $K(x, y) = \exp(-\gamma \|x - y\|^2)$ with problem-specific parameters.
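The kernels in Eqs. (4)-(7) and the RBF form with the γ parameter used in the case studies can be written down in a few lines. The following pure-Python sketch is only an illustration: the function names and default parameter values are assumptions, and `decision_value` uses the standard kernelized decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$, which the text above does not state explicitly.

```python
import math


def dot(x, y):
    """Dot product of two equally long sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    # d is the polynomial degree; the default of 2 is an arbitrary choice.
    return dot(x, y) ** d


def rbf_kernel(x, y, gamma=0.5):
    # Same as Eq. (6) with gamma = 1 / (2 * sigma**2).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, theta=0.0):
    return math.tanh(dot(x, y) + theta)


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """Kernelized decision function; its sign gives the predicted class."""
    return sum(
        alpha * label * kernel(sv, x)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    ) + b


# Toy model with two hand-picked support vectors (hypothetical values).
svs, ys, alphas = [[0.0, 0.0], [2.0, 2.0]], [-1, 1], [1.0, 1.0]
print(decision_value([2.0, 2.0], svs, ys, alphas, 0.0, rbf_kernel) > 0)
```

Because the kernel is passed in as a function, the same decision function can be evaluated with any of the four kernels, which mirrors how only $K$ changes between Eq. (2) and Eq. (3).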

4.1. Methodology

Training a support vector machine is a long process requiring a large amount of data and many experiments to achieve optimal results. For this application we chose to train the machine on data freely available from TREC [6], a project supported by the U.S. government, primarily aimed at building a set of text data as a benchmark for classification systems. Gordon V. Cormack founded and coordinated a special collection to evaluate spam filters on real e-mail messages. More importantly, it defines standard measures and collections for future tests. It is based on two collections of e-mail: Synthetic, consisting of a collection of published e-mail combined with a set of recent, carefully modified spam messages, which can be freely distributed so that researchers can test their filters on it; and Private, where researchers submit their code to testers who run it on private collections and return summary results only, thus maintaining confidentiality.

The results in terms of differentiating spam from desired e-mail on the two collections are similar, suggesting that conclusive tests can be performed on the synthetic data set without affecting the validity of the results. Therefore, we decided to use this public data set for training and testing the support vector machine. The current version, which is the one used for the tests presented in this paper, is TREC 2007 and contains over [...] messages.

The documentation provided by Chang and Lin [2] describes the cross-validation method for finding optimal parameters for training a support vector machine. The method involves dividing the data into a number of subsets, usually five, using one part as test data and the others as training data. The method iterates over different values of two parameters: C, which balances the desire to obtain fewer errors on the training set against the desire to create a more general-purpose machine, and γ, which gives a measure of the influence a support vector has on the data space.
Standard heuristics for iterating over the values of these parameters are: for the γ parameter, the values should range from $2^{-15}$ to $2^{3}$, with $2^{2}$ as the (multiplicative) iteration step; for the C parameter, the values should range from $2^{-5}$ to $2^{15}$, with $2^{2}$ as the iteration step.

Thus, for each combination of values of C and γ, a training phase and an evaluation phase are performed, computing the accuracy as the number of vectors correctly classified relative to the total number of vectors, expressed as a percentage. After each training stage the resulting accuracy is compared with the previous values, and the values of C and γ with the highest accuracy are saved. Once the parameters C and γ that give the best results are determined, a final training cycle is run on the whole training data set, and a test set is then used to evaluate the generalization capabilities of the model.
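The parameter search described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the experiments: `evaluate` is a hypothetical callback that trains a classifier for one (C, γ) pair and returns its cross-validated accuracy in percent, and the interleaved fold assignment is an assumption of this sketch.

```python
def exponent_grid(first_exp, last_exp, exp_step=2):
    """Return 2**first_exp, 2**(first_exp + exp_step), ..., up to 2**last_exp."""
    return [2.0 ** e for e in range(first_exp, last_exp + 1, exp_step)]


def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices); each fold is the test part once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy_percent(predicted, actual):
    """Correctly classified vectors relative to all vectors, in percent."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)


def grid_search(evaluate, c_values, gamma_values):
    """Try every (C, gamma) pair and keep the one with the highest accuracy."""
    best_c, best_gamma, best_acc = None, None, float("-inf")
    for c in c_values:
        for gamma in gamma_values:
            acc = evaluate(c, gamma)
            if acc > best_acc:
                best_c, best_gamma, best_acc = c, gamma, acc
    return best_c, best_gamma, best_acc


C_VALUES = exponent_grid(-5, 15)      # 2^-5, 2^-3, ..., 2^15
GAMMA_VALUES = exponent_grid(-15, 3)  # 2^-15, 2^-13, ..., 2^3
print(len(C_VALUES), len(GAMMA_VALUES))
```

With these ranges the grid contains 11 values of C and 10 values of γ, so a full search trains 110 models per cross-validation fold, which is why the later experiments fix γ and sweep only C.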

To perform experiments with the TREC 2007 data set, we needed a processing method to convert text data to numerical values suitable for the support vector machine. The established method for extracting numerical features from text is to use a dictionary of words and count the number of times these words appear in the document. The dictionary containing the words to be counted is loaded, and the documents are divided into words. When a word from the dictionary is found, the corresponding component of the vector used to represent the document is incremented. The result is passed to the support vector machine for classification. In this case, the dictionary of terms is the complete English dictionary with all possible forms of basic terms, totaling over [...] words. Thus, the resulting data belongs to a [...]-dimensional space.

The first approach to classification on the TREC 2007 set used cross-validation with the classical heuristics on a relatively small subset of data. For the first experiment we chose 1000 legitimate messages and 1000 spam messages for the training set, and the same numbers for the test set. After running the algorithm over all parameter values using the heuristics suggested above, we achieved 100% accuracy on the training data, suggesting a perfect classification, but running the trained classifier on the test data resulted in an accuracy of only 64%, which indicated over-fitting of the training data and a loss of generality.

A first finding was that the γ parameter had a negative impact on classification accuracy: a higher value of this parameter resulted in a lower classification accuracy regardless of the value of C. Therefore, we decided that for the following experiments γ would have a very low value and only the values of parameter C would be iterated. It was also noticed that most classification errors were caused by false positives, i.e. the classification of legitimate messages as spam. False negatives, i.e. the wrong classification of spam as legitimate e-mail, were also found.
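The dictionary-based word counting described at the start of this subsection can be sketched in a few lines of Python. The function names, the lower-casing and the letters-only tokenization are assumptions made for illustration; the vector is stored sparsely, as a mapping from dimension index to count, since most dictionary words never occur in a given message.

```python
import re
from collections import Counter


def build_index(dictionary_words):
    """Assign each dictionary word one dimension of the feature space."""
    return {word: i for i, word in enumerate(dictionary_words)}


def word_count_vector(text, index):
    """Return a sparse vector {dimension: count} for one document."""
    counts = Counter()
    # Tokenization rule (an assumption): maximal runs of letters, lower-cased.
    for token in re.findall(r"[a-z]+", text.lower()):
        dim = index.get(token)
        if dim is not None:  # words outside the dictionary are skipped
            counts[dim] += 1
    return dict(counts)


index = build_index(["buy", "cheap", "pills"])
print(word_count_vector("Buy cheap, CHEAP watches!", index))
```

In this toy run, "watches" is not in the three-word dictionary and is therefore skipped, while "cheap" is counted twice.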
False negatives occurred with very long messages, which apparently led the support vector machine to ignore the fragments of text that would normally identify a message as spam. The conclusion drawn from these results was that there was too little training data to construct a sufficiently general model. Therefore, we decided to increase the number of examples to 3000 legitimate messages and 3000 spam messages, for both the training set and the test set.

New methods to extract the numerical features of the text were used, as suggested by Sculley and Wachman [10]. Like the word counting method, they rely on a dictionary of terms, but instead of a dictionary of complete words they use dictionaries of word fragments of 3 or 4 characters. These dictionaries have been built from the complete English dictionary by extracting fragments of 3 or 4 characters and adding them to the new dictionary. Thus, we obtained [...] terms for the dictionary of 4-character fragments and 5776 terms for the dictionary of 3-character fragments. The vectors representing the documents are built by parsing the document and counting the occurrences of dictionary terms in it. This is done in a similar way to the method that uses the complete English dictionary. The document is divided into its component words, but instead of looking up each word in the dictionary as in the classical method, the word is divided into fragments of 3 or 4 characters, which are then looked up in the dictionary.

At first glance this method seems to require more time, because each word is divided into all its possible fragments of 3 or 4 characters, which are then looked up in the dictionary. However, this is not the case. Although every word with more than 3 or 4 characters is divided into all possible fragments, the words with fewer characters than the dictionary terms are ignored, and thus the time required for the parsing stage remains about the same as for the classical method.

Intuitively, one could expect the training time with the 4-character dictionary to be lower than the time required by the classical method, and the training time with the 3-character dictionary to be lower still. However, the opposite is true. A very advantageous aspect of working with support vector machines on data vectors in a space with a large number of dimensions is that the values of many dimensions of a vector are 0, which greatly reduces the computing time. For example, the complete dictionary of English words has [...] terms, therefore the number of dimensions is [...]. But an e-mail contains on average 100 to 1000 different words, which means that the vector representing the document has the value 0 on more than 75% of its dimensions. By decreasing the number of terms, and implicitly the number of dimensions, as is the case when using dictionaries of 3 or 4 characters, one greatly increases the proportion of dimensions with nonzero values in the vector representing the document.
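The fragment-based extraction and the sparsity argument above can be illustrated with a short Python sketch. All names are hypothetical, and the tiny fragment dictionary exists only for the example; the sketch shows how words shorter than the fragment length are ignored and how the fraction of nonzero dimensions can be measured for a document vector.

```python
import re
from collections import Counter


def char_ngrams(word, n):
    """All contiguous fragments of length n; empty if the word is shorter."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_count_vector(text, index, n):
    """Sparse vector {dimension: count} of dictionary fragments in the text."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for gram in char_ngrams(word, n):
            dim = index.get(gram)
            if dim is not None:  # fragments outside the dictionary are skipped
                counts[dim] += 1
    return dict(counts)


def nonzero_fraction(vector, n_dimensions):
    """Share of dimensions with a nonzero value; higher means less sparse."""
    return len(vector) / n_dimensions


# Toy 3-character dictionary (hypothetical, for illustration only).
index = {"spa": 0, "pam": 1, "che": 2, "hea": 3, "eap": 4}
vector = ngram_count_vector("Cheap spam to spam", index, 3)
print(vector, nonzero_fraction(vector, len(index)))
```

A smaller dictionary of short fragments makes almost every dimension likely to be hit by some word, so the vectors become denser, which is consistent with the longer training times reported for the fragment-based methods.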
In our experiments, the method with the shortest computing time was the one that used the full English dictionary, and the least efficient in terms of computing time was the one that used the dictionary of 3-character terms.

4.2. Results

For simplicity we use the following convention: the method that uses the complete English dictionary is called Full Words, the method that uses a dictionary of 4-character terms is called 4-Grams, and the method that uses a dictionary of 3-character terms is called 3-Grams.

For the new data set, built from TREC 2007 with 3000 spam messages and 3000 legitimate messages, we ran a series of successive trainings on the training set and tests on the test set, using several values of the C parameter. The first method used was Full Words, and its results are presented in Fig. 1. Successive trainings were run on the full set of training data, incrementing the C parameter from 50 to 950 with a step of 50.

Fig. 1. Results for the Full Words method on a data set of 6000 messages (accuracy plotted against C).

The following experiment was conducted on the same data set using the 4-Grams method, incrementing the C parameter from 50 to 950 with a step of 50. The results are shown in Fig. 2.

Fig. 2. Results for the 4-Grams method on a data set of 6000 messages (accuracy plotted against C).

A final experiment was performed using the 3-Grams method on the same set of data, but incrementing the value of C from 20 to 80 with a step of 20. The results are shown in Fig. 3.

From all these graphs it can be seen that, in order to achieve a high classification accuracy, the value of the C parameter must be large. This means that the decision boundary of the support vector machine has to be flexible, i.e. misclassified training points must be penalized heavily. It can also be seen that the classification accuracy rises with increasing C, then stays constant, and then declines. The decline is less steep than the initial growth, but noticeable.

In addition, there was a difference in execution times between the machines trained with the three feature extraction methods. Training took 10 min on average using the Full Words method, 12 min using the 4-Grams method and 15 min using the 3-Grams method.

Fig. 3. Results for the 3-Grams method on a data set of 6000 messages (accuracy plotted against C).

Since we wanted to obtain better results, i.e. a higher classification accuracy and a more general model for the support vector machine, we decided to increase the training data set to 5000 legitimate messages and 5000 spam messages. The size of the testing set was increased to the same values. The first experiment was performed with the Full Words method, covering the range from 50 to 950 with a step of 50 for the value of C. The results are shown in Fig. 4.

Fig. 4. Results for the Full Words method on a data set of 10000 messages (accuracy plotted against C).

Fig. 4 shows a plateau over a fairly large range of C values, without a significant decrease. Since we anticipated similar results for the 4-Grams method, we decided to decrease the step to 20 and, based on the results of the experiments carried out on the set of 6000 messages, to restrict the interval to between 20 and 800. The results are shown in Fig. 5.

Fig. 5. Results for the 4-Grams method on a data set of 10000 messages (accuracy plotted against C).

We performed an experiment on the same data set using 3-Grams with the same step for iterating over the C parameter but, considering the sharp decrease observed in the experiment presented in Fig. 3, we decided to limit the maximum value to 400. The results are shown in Fig. 6.

Fig. 6. Results for the 3-Grams method on a data set of 10000 messages (accuracy plotted against C).

This last set of experiments also shows the plateau and the slight decrease of the accuracy for large values of the C parameter. As in the experiments conducted on the data set with 6000 messages, the method that determines the numerical characteristics of a document based on the complete dictionary of English words gives better results than the other methods. In addition, the average computing time of a training cycle on the data set with 10000 messages increased to 20 min for the Full Words method, 30 min for the 4-Grams method and 40 min for the 3-Grams method. The tests were made on a computer equipped with an Athlon AM processor, dual core, each core with a frequency of 2200 MHz, and 2 GB of RAM.

Given the noticeably better results obtained with the Full Words method and its shorter computing time for a training cycle compared with the other methods, we chose to use only this method for the final experiment. The training data set for the last experiment was increased to 10000 legitimate messages and 10000 spam messages, the same changes being made to the test set. The C parameter was iterated over the interval from 50 to 2000 with a step of 50. The results are shown in Fig. 7.

Fig. 7. Results for the Full Words method on a data set of 20000 messages (accuracy plotted against C).

This experiment also shows the same plateau of the accuracy for large values of the C parameter, with a slight decline after reaching the maximum of 94% at C = 550. The computing time for a training cycle with the data set of 20000 messages was 45 min on average.

Therefore, the final model of the support vector machine for the spam detection agent is the one trained on the data set of 20000 messages with the C parameter equal to 550. It has a very good accuracy and, taking into account the large amount of data that was used, it can be considered general enough to be used in real applications.

Table 1 shows an overview of the three methods used to build the numerical characteristics of text documents, for easier comparison.

Table 1. Comparison of Results Obtained with Three Methods of Parsing the Messages of Data Sets with Different Sizes. Columns: text processing method; accuracy for 6000 training instances; C; accuracy for 10000 training instances; accuracy for 20000 training instances. Rows: Full Words, 4-Grams, 3-Grams.

The columns indicate the total number of e-mails used for training and the rows mark the methods used to build the numerical characteristics of the messages. It can clearly be seen that the Full Words method, based on the complete dictionary of the English language, is better than the other methods, based on dictionaries of fragments of 3 or 4 characters.

Another important aspect to be noted is that for the last experiment the amount of training data was doubled, but the accuracy increased by only 0.5%. Therefore, we can say with a high degree of certainty that further increasing the number of messages in the training set will not significantly improve the classification accuracy.

Table 2 presents the average execution times of a training cycle for the experiments performed.

Table 2. Comparison of Time Required for a Training Cycle with Three Methods of Parsing the Messages and Training Sets of Different Sizes. Columns: text processing method; training time for 6000 instances; training time for 10000 instances; training time for 20000 instances. Rows: Full Words, 4-Grams, 3-Grams.

5. Conclusions

Support vector machines are one of the best classification methods currently available (arguably the best). They are successfully applied in industry and research in areas such as text classification, handwritten character identification and gene classification based on protein sequences. SVMs have a solid mathematical background and can produce very good results if properly used.

This article shows that SVMs can be used to obtain good results in the constantly challenging field of filtering unwanted messages, or spam. The results were very good with all three presented methods, but the best were obtained using the complete dictionary of the English language, in terms of both classification accuracy (94%) and computing time.

Acknowledgements. This work was supported by CNCSIS-UEFISCSU, project number PNII-IDEI 36/2008, Behavioural Patterns Library for Intelligent Agents Used in Engineering and Management.

Received: July 8, 2010
"Gheorghe Asachi" Technical University of Iaşi, Department of Computer Engineering
e-mail: fleon@cs.tuiasi.ro

REFERENCES

1. Boser B.E., Guyon I.M., Vapnik V.N., A Training Algorithm for Optimal Margin Classifiers. Proc. of the Fifth Annual Workshop on Comput. Learning Theory, COLT 92, New York, NY, USA, ACM Press, 144-152.
2. Chang C.C., Lin C.J., LIBSVM - A Library for Support Vector Machines. ntu.edu.tw/~cjlin/libsvm, 200.

3. Cristianini N., Shawe-Taylor J., An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge Univ. Press.
4. Herbrich R., Learning Kernel Classifiers: Theory and Algorithms. The MIT Press, Cambridge.
5. Khan F.M., Arnold M.G., Pottenger W.M., Hardware-Based Support Vector Machine Classification in Logarithmic Number Systems. IEEE Internat. Symp. on Circuits a. Syst., Vol. 5.
6. * * * Test Data for Text Retrieval Conferences. National Institute of Standards and Technology.
7. Platt J.C., Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, USA.
8. Pontil M., Rifkin R., Evgeniou T., From Regression to Classification in Support Vector Machines. Europ. Symp. on Artificial Neural Networks, Bruges, Belgium (1999).
9. Sahami M., Dumais S., Heckerman D., Horvitz E., A Bayesian Approach to Filtering Junk E-mail. Learning for Text Categorization, AAAI Workshop, AAAI Techn. Report WS-98-05, Madison, WI.
10. Sculley D., Wachman G.M., Relaxed Online SVMs for Spam Filtering. Proc. of the 30th Ann. Internat. ACM SIGIR Conf. on Research and Develop. in Informat. Retrieval.
11. Sullivan K., Luke S., Evolving Kernels for Support Vector Machine Classification. Proc. of the 9th Ann. Conf. on Genetic and Evolutionary Computation, London, England (2007).
12. Vapnik V.N., The Nature of Statistical Learning Theory. Springer Verlag.
13. Vapnik V.N., Chervonenkis A.Ya., Theory of Pattern Recognition: Statistical Problems of Learning. Moscow, Nauka.
14. * * * Sequential Minimal Optimization. Wikipedia, The Free Encyclopedia, org/wiki/sequential_minimal_optimization.
15. Yih W., Goodman J., Hulten G., Learning at Low False Positive Rates. Proc. of the Third Conf. on E-mail and Anti-Spam, CEAS, Mountain View, CA.

AN E-MAIL MESSAGE FILTERING AGENT BASED ON SUPPORT VECTOR MACHINES

(Summary)

E-mail filtering has recently become an important problem owing to the continuously growing popularity of communication by electronic mail. There is therefore a constant need to improve the detection of unsolicited messages, or spam. Many researchers have applied machine learning techniques to spam filtering, and these have proved successful. This article presents a spam detection agent based on support vector machines (SVM), one of the best classification methods currently available. Several methods of extracting numerical features from text documents are tested, and the optimal values of the SVM parameters needed for this classification problem are evaluated. The best results indicate a very good classification accuracy of 94%.


More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 畫 類 別 : 個 別 型 計 畫 半 導 體 產 業 大 型 廠 房 之 設 施 規 劃 計 畫 編 號 :NSC 96-2628-E-009-026-MY3 執 行 期 間 : 2007 年 8 月 1 日 至 2010 年 7 月 31 日 計 畫 主 持 人 : 巫 木 誠 共 同

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems

Application of Multi-Agents for Fault Detection and Reconfiguration of Power Distribution Systems 1 Applcaton of Mult-Agents for Fault Detecton and Reconfguraton of Power Dstrbuton Systems K. Nareshkumar, Member, IEEE, M. A. Choudhry, Senor Member, IEEE, J. La, A. Felach, Senor Member, IEEE Abstract--The

More information

A Genetic Programming Based Stock Price Predictor together with Mean-Variance Based Sell/Buy Actions

A Genetic Programming Based Stock Price Predictor together with Mean-Variance Based Sell/Buy Actions Proceedngs of the World Congress on Engneerng 28 Vol II WCE 28, July 2-4, 28, London, U.K. A Genetc Programmng Based Stock Prce Predctor together wth Mean-Varance Based Sell/Buy Actons Ramn Rajaboun and

More information

LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING

LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING 1 MS. POOJA.P.VASANI, 2 MR. NISHANT.S. SANGHANI 1 M.Tech. [Software Systems] Student, Patel College of Scence and

More information

Damage detection in composite laminates using coin-tap method

Damage detection in composite laminates using coin-tap method Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea yaeln@kar.re.kr 45 The con-tap test has the

More information

Canon NTSC Help Desk Documentation

Canon NTSC Help Desk Documentation Canon NTSC Help Desk Documentaton READ THIS BEFORE PROCEEDING Before revewng ths documentaton, Canon Busness Solutons, Inc. ( CBS ) hereby refers you, the customer or customer s representatve or agent

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem

Logical Development Of Vogel s Approximation Method (LD-VAM): An Approach To Find Basic Feasible Solution Of Transportation Problem INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME, ISSUE, FEBRUARY ISSN 77-866 Logcal Development Of Vogel s Approxmaton Method (LD- An Approach To Fnd Basc Feasble Soluton Of Transportaton

More information

Calculating the high frequency transmission line parameters of power cables

Calculating the high frequency transmission line parameters of power cables < ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,

More information

When Network Effect Meets Congestion Effect: Leveraging Social Services for Wireless Services

When Network Effect Meets Congestion Effect: Leveraging Social Services for Wireless Services When Network Effect Meets Congeston Effect: Leveragng Socal Servces for Wreless Servces aowen Gong School of Electrcal, Computer and Energy Engeerng Arzona State Unversty Tempe, AZ 8587, USA xgong9@asuedu

More information

An Efficient and Simplified Model for Forecasting using SRM

An Efficient and Simplified Model for Forecasting using SRM HAFIZ MUHAMMAD SHAHZAD ASIF*, MUHAMMAD FAISAL HAYAT*, AND TAUQIR AHMAD* RECEIVED ON 15.04.013 ACCEPTED ON 09.01.014 ABSTRACT Learnng form contnuous fnancal systems play a vtal role n enterprse operatons.

Automated Network Performance Management and Monitoring via One-class Support Vector Machine

R. Zhang, J. Jiang, and S. Zhang, Digital Media & Systems Research Institute, University of Bradford, UK. Abstract: In this

An Interest-Oriented Network Evolution Mechanism for Online Communities

Caihong Sun and Xiaoping Yang, School of Information, Renmin University of China, Beijing 100872, P.R. China {chsun,yang}@ruc.edu.cn Abstract. Online

Overview of monitoring and evaluation

Toolkit to Combat Trafficking in Persons, Tool 10.1: Overview of monitoring and evaluation. Overview: This tool briefly describes both monitoring and evaluation, and the distinction between the two. What is monitoring? Monitoring

RequIn, a tool for fast web traffic inference

Olivier Paul, Jean Etienne Kiba, GET/INT, LOR Department, 9 rue Charles Fourier, 90 Evry, France Olver.aul@nt-evry.fr, Jean-Etenne.Kba@nt-evry.fr Abstract As networked

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns

Sven F. Crone, Jose Guajardo 2, and Richard Weber 2. Lancaster University, Department of Management

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME

August 7 - August 12, 2006 in Baden-Baden, Germany. Vladimir Šimović 1, and Vladimir Šimović 2, PhD. 1 Faculty of Electrical Engineering and Computing, Unska 3, 10000

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Fangfei Chen, Murali Kodialam, T. V. Lakshman. Department of Computer Science and Engineering, The Penn State University; Bell Laboratories, Alcatel-Lucent

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and William F. PUNCH 1. 1 Genetic Algorithms Research and Applications

Inter-Ing 2007. INTERDISCIPLINARITY IN ENGINEERING SCIENTIFIC INTERNATIONAL CONFERENCE, TG. MUREŞ ROMÂNIA, 15-16 November 2007.

UNCERTAINTY REGION SIMULATION FOR A SERIAL ROBOT STRUCTURE MARIUS SEBASTIAN
