Filtering Junk


Filtering Junk E-Mail: A Performance Comparison between Genetic Programming & Naïve Bayes

September

Prepared by: Hooman Katirai, 4A Year Computer Engineering Student, Department of Electrical & Computer Engineering, University of Waterloo, Waterloo, Ontario

Presented to: Dr. Dale Schuurmans, Professor of Computer Science, Logic Programming & Artificial Intelligence Group, University of Waterloo, Waterloo, Ontario

Table of Contents

Summary
Motivation
    Spam is a major problem
    The need for a filter at the e-mail client
Problem To Be Solved
Related Work
Approach
Background
    The Naïve Bayes Approach to Text Classification
        The Generative Naïve Bayes Model
        Training a Naïve Bayes Classifier
        Using a Naïve Bayes Classifier
    The Genetic Programming Approach
        The Fitness Function
        Parse Trees: The Representation of Solutions
        Crossover
        Mutation
        GP Summary
Implementation Issues
    The Training Set and the Test Set
    Feature Selection
    Poor Performance due to lack of precision
    Learning Issues
Experimental Results
Conclusions
Recommendations
References
Appendix 1: Table of Expenses

Table of Figures

Table 1: Contingency Table or Confusion Matrix
Table 2: Numerical Operators
Table 3: Word Operators
Table 4: Recall and Precision for Naïve Bayes and Genetic Programming
Table 5: E-mail Signatures Can Be Harvested as Useful Features
Figure 1: The Reduction of a Simple Tree
Figure 2: Representing the Order of Operations in Parse Trees
Figure 3: A Simple Tree Containing a Feature Detector
Figure 4: The Crossover Operation
Figure 5: The Mutation Operation

1. Summary

This paper describes the application of genetic programming as a novel approach to the problem of filtering junk e-mail. We benchmark our results against the common standard, the Naïve Bayes classifier. While the genetically programmed classifier demonstrated a precision comparable to that of Naïve Bayes, it was slightly outperformed in recall. Since both learning methods gave similar results, it is recommended that a larger study be undertaken to ascertain whether these differences are indeed statistically significant. Further, it is recommended that the performance of these classifiers be tested in a richer feature space more typical of real-world classifiers. Although the genetically programmed classifier greatly outperformed the Naïve Bayes classifier in speed, it is concluded that a more efficient implementation of Naïve Bayes needs to be used in order to provide a fair comparison.

We show that, when left unabated, e-mail signatures (also known as taglines) reduce the value of several important features in junk e-mail detection; however, it is also shown that these e-mail signatures may be harvested as advantageous features if some of their components are removed and noted as a feature. We therefore recommend that a better parser capable of meeting these criteria be implemented. To aid the reader in the theoretical aspects of our work, we have included introductory background for both approaches, including a full derivation of the generative Naïve Bayes model.

2. Motivation

2.1 Spam is a major problem

Unsolicited Bulk E-mail (UBE), commonly known as junk e-mail or spam, is to most users merely a nuisance: an annoying but sometimes unavoidable reality of life in an increasingly internet-centric world. Ask a parent concerned about pornographic materials landing in their children's mailbox and you'll get a very different response. Ask the person who must deal with such distasteful messages on a daily basis and you'll find similar concerns. In short, spam is an unauthorized intrusion into a virtual space, the e-mail box, rented by citizens for their own purposes.

Spam is often distasteful (an AT&T study found that 11 percent of spam contained adult content [3]), but parents and users aren't the only ones concerned. UUNet, one of the larger ISPs, has a team of 6 people with an annual budget of USD $1 million to combat spam [1]. Another ISP, Netcom, estimates that 10% of a customer's bill is devoted to fighting spam [2]. Why are these ISPs spending so much money to fight spam? Because, in addition to offending users, spam accounts for over 30% of the e-mail on major ISPs such as AOL and Mindspring [1]. One third of ISPs have reported system outages caused by spam [4]. Such high volumes of junk e-mail clog network bandwidth and consume large amounts of storage space on file servers, where many users may receive duplicates of the same message [10]. AOL in particular has been hard hit because of its publicly available e-mail directory, which allows spammers to quickly gather a large number of e-mail addresses. Even large ISPs such as Pacific Bell have experienced complete shutdowns in service due to spam [6]. A survey conducted by the respected research firm International Data Corporation (IDC) found that spam was ranked by ISPs as their number two problem [8].

There are other social costs as well. For example, since many UBE mailers obtain their e-mail addresses from Usenet newsgroups, many people have become reluctant to post messages on public forums, reducing the vibrancy of the internet news community.
Briefly, spam is a major problem that needs to be brought under effective control; however, it is difficult for governments to legislate on this issue since the internet is a global phenomenon. Thus users must take the steps required to protect themselves, and this often involves the application of intelligent agents, or softbots, to make decisions on a human's behalf. It is with this purpose, to advance knowledge in the means of constructing such softbots, that this study was undertaken.

2.2 The need for a filter at the e-mail client

There are many techniques that can be used to filter junk e-mail before it is deposited into a user's e-mail account. For example, messages containing well-known spam sites in their relay paths are very likely to be spam themselves. Such messages should be automatically filtered at the mail server level and never be delivered to the user's mailbox. The focus of this paper is the design of a filter to remove UBE when these other methods have failed. Indeed, methods that filter e-mail before it reaches a user are generally limited in their ability to remove most of the spam received¹.

Since we have assumed that all spam filters with 100% precision have already been employed at the router and mail-server levels, it is reasonable to assume that the filter at the e-mail client level will make mistakes from time to time. This gives rise to some important issues. Firstly, users are hesitant, if not unwilling, to employ filters that can potentially remove legitimate e-mail, especially if the filter has been given the ability to delete a message before the user is given an opportunity to view it. Users like to feel in control. Therefore, instead of automatically deleting those messages classified as spam, an e-mail client should, in our opinion, relocate those messages to a special folder which the user can check occasionally to ensure that no legitimate e-mails were misclassified.

Some researchers have proposed that messages with high spam ratings should be filtered. For example, in [10] Sahami et al. propose that messages with a very high junk confidence rating should be automatically removed. To reduce the chances of error, Sahami et al. proposed a high cutoff of 99.9%, but even at this level they note that some mistakes were made. This strengthens our argument that messages should not be deleted but moved to a different folder where the user can rapidly delete them in tandem, although it may be far more likely for a user to delete a legitimate message if it is moved to such a folder. This alleviates the user's fear of missing a legitimate and possibly important message.

Furthermore, we note that it is desirable to make such a filter an automated one. This is because junk e-mail not only takes time to delete, but it sometimes contains offensive content (such as pornography), which makes the cost of viewing it greater than the time needed to sort out the junk [10]. Many e-mail packages, including Microsoft Outlook, allow users to manually create rules by which they can detect and sort junk e-mail.
This alternative to the automated approach is clearly inadequate to the task of filtering junk e-mail, since it assumes the user is savvy enough to create such rules, and since such rules will be unable to adapt to the changes in the nature of UBE over time [10]. Thus it is desirable for the filter to learn directly from a user's mail repository, since such a filter can automatically adapt to the characteristics of the user's junk and legitimate e-mail [10].

¹ An example of such a filter would involve rejecting e-mails from well-known spamming domains using so-called real-time spamming blacklists; however, such filters are limited in their ability to combat spam. This is because most UBE producers show little regard for honest or normal business practices [9]. To reduce the traceability of their bulk e-mail to its source, many UBE senders falsify the e-mail header, including the FROM field, making an e-mail filter that relies on this field of limited value. Spam producers are also known to abuse the relaying feature of the Simple Mail Transport Protocol (SMTP), the protocol used to transport internet e-mail. This feature allows one mail server to relay messages to intermediary mail servers. Spammers abuse this feature by using it to send masses of e-mail using other people's servers without their knowledge. This in effect offloads almost the entire cost of the bulk mailing to the victim's mail server. Further, if the receiver attempts to trace the message back to its origin, it will lead to the victim's mail server and not to the spammer's domain.

3. Problem to be solved

We now provide a more formal definition of the problem to be solved. We wish to create an information filter to autonomously classify e-mails from an input stream of incoming e-mails into two output streams, one representing Unsolicited Bulk E-mail (UBE) and the other representing legitimate messages. We define UBE as unsolicited messages advertising, soliciting or advocating a product, service, web site, viewpoint, get-rich-quick scheme or other fraudulent organization, that were sent in bulk to many users and without the prior consent of the receivers.

4. Related Work

Although there has been much work done in autonomous text categorization over the last few decades, only a small amount of work has gone into the classification of e-mail messages. Fewer still are the papers focussed on the autonomous identification of junk e-mail. We list the ones deemed most relevant here:

1. MAXIM (Lashkari, Metral & Maes 1993) is an e-mail based assistant that uses Machine Based Reasoning (MBR) to predict whether a user would file, delete or read an e-mail, although junk e-mail is never specifically addressed.
2. Magi (Payne 1994) is a mail interface agent that uses decision trees to model a user profile. Magi attempts to automatically route new messages to relevant folders.
3. RIPPER algorithm (Cohen 1996). Cohen suggests new methods for automatically learning rules for classifying e-mail into categories; however, he never specifically addresses the category of junk e-mail in his paper.
4. Genetic Document Classifier (Clack C., Farrington J., Lidwell P. & Yu T. 1997). This was the first published text classifier to use genetic programming. In-bound documents (including e-mails) were sent to a central classifier which autonomously routed them to interested research groups within a large organization.
5. Smokey (Spertus 1997) is an e-mail assistant that detected flames, an internet slang term for hostile or angry messages, usually sent in retaliation for some act or event such as an unwelcome newsgroup posting.
6. Microsoft Outlook 98. A keyword-based junk e-mail filter was introduced in the Outlook 98 beta but was later withdrawn following legal concerns.
7. Bayesian Junk E-mail Filter (Sahami M., Dumais S., Heckerman D. & Horvitz E. 1998). A junk e-mail filter based on an enhanced naïve Bayes classifier. Recall and precision were improved when phrases and header-specific information were added as features.

Our work differs from the others in three important aspects:

1. Our work is specifically focussed on the filtering of junk e-mail.
2. We use a novel approach (Genetic Programming) based on the work of Clack et al.
3. We present an empirical comparison between the genetic programming and naïve Bayes approaches.

5. Approach

We solve this problem using the novel approach of Genetic Programming. Genetic Programming has already been shown to be effective in classifying web pages [27] and in general document classification [23]. To provide a basis for comparison we have also solved this problem using a traditional Naïve Bayes classifier, the most common classifier used in practice today. After delving into the theory behind the Bayesian and Genetic approaches, we examine the specifics of each implementation, which we follow with an analysis of the outcomes and some recommendations.

6. Background

6.1 General Classification Theory

One may conceptually model a document classifier as a deterministic function which maps documents, represented as sequences of word events, to categories. In our case we are concerned with a binary classification task; i.e. we wish to classify e-mails into one of two categories: the discriminating class, i.e. junk, and the default class, non-junk. In filter terminology, if a classifier classifies a document into the discriminating class, the document is said to have been accepted. Conversely, if the document is classified as non-junk, we say the document has been rejected.

A classifier's decision to accept or reject a document is based on the features of the document. In general, text classifiers only use a small subset of the words found in a domain as features. This list of words is called the classifier's vocabulary, and it is sometimes also referred to as a distinguished words list. In both the Naïve Bayes and the genetic programming based classifiers, the features are the individual frequencies of the words in the vocabulary.

An ideal filter will accept all documents belonging to the discriminating class and reject all others. In practice, however, a filter designer is generally faced with the tradeoffs of recall and precision, which are defined as follows.

                     Classifier Accepted    Classifier Rejected
Expert says yes      a                      c
Expert says no       b                      d

Table 1: Contingency Table or Confusion Matrix. Each entry in the table represents the number of documents with the specified outcome; i.e. a is the number of times the classifier accepted a document that belonged to the discriminating class.

$$\text{Recall} = \frac{a}{a + c} \qquad \text{Precision} = \frac{a}{a + b} \tag{1}$$

Recall is the percentage of documents in the discriminating class that were accepted. Precision is the percentage of accepted documents belonging to the discriminating class.

Many classifiers also provide a confidence rating which asserts the degree to which the document belongs to the discriminating class. Often these confidence ratings are expressed as a percentage, where 100% represents a document that completely belongs to the discriminating class and 0% represents a document that is completely irrelevant. Often information filters are designed to accept documents with a confidence above a certain threshold while rejecting documents below that threshold.

6.2 The Naïve Bayes Approach to text classification

Theory

We use a generative probabilistic model to explain the Naïve Bayes classifier. For those unfamiliar with generative probability models we recommend the materials of Brendan Frey at [URL missing].

Definition of Terms

To simplify future explanation we now define some terms. A vocabulary V is an ordered collection of words, i.e. $V = \{v_1, v_2, v_3, \ldots, v_{|V|}\}$. Similar to a human's vocabulary, which represents the words which a human understands, the classifier's vocabulary represents the only words the classifier will use to determine a document's category; i.e. all words in a document which are not in the classifier's vocabulary are ignored. We represent a document D as an ordered collection of word events, written $D = \{w_1, w_2, \ldots, w_{|D|}\}$, where each $w_k$ represents a word from the vocabulary. We write $w_k$ to denote the k-th word in document D. A classifier is a machine, in the mathematical sense, that deterministically returns a class c in $C = \{c_1, c_2, c_3, \ldots, c_{|C|}\}$ given a particular document D and a collection of parameters $\theta$.

The Generative Naïve Bayes Model (adapted from [11])

To generate a document D we first pick the length |D| of the document and then generate a document based on this length. Note this means that we are assuming the document length to be independent of the category and that each word is generated independent of the length.

$$P(D \mid C = c_j; \theta) = P(|D|) \prod_{k=1}^{|D|} P(w_k \mid c_j; \theta; w_1, w_2, \ldots, w_{k-1}) \tag{1}$$

Notice that we require the generation of each word to depend on the words that preceded it. Although this is true in practice, we now relax this requirement by adopting the standard Naïve Bayes assumption, i.e. that

$$P(w_k \mid c_j; \theta; w_1, w_2, \ldots, w_{k-1}) = P(w_k \mid c_j; \theta) \tag{2}$$

The reader may object to the above assumption, noting that it requires each word to be generated independent of its context, and that it further requires each word to be generated independent of its position, an assumption certainly violated in practice. Computational linguists have found, however, that this model produces good results in classifying text documents [11] [12] [13] [14] and detecting junk e-mail [10]. Furthermore, this assumption greatly reduces the number of parameters required in the generative model. Continuing, we substitute (2) into (1) to give:

$$P(D \mid C = c_j; \theta) = P(|D|) \prod_{k=1}^{|D|} P(w_k \mid c_j; \theta) \tag{3}$$
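As an illustration of the Naïve Bayes assumption, the product over words in the generative model can be sketched in Python as a sum of logarithms. This is a minimal sketch under stated assumptions: the function name `log_likelihood`, the example words and their probabilities are hypothetical, and the document-length term P(|D|) is omitted because the model takes it to be the same for every class.

```python
import math

def log_likelihood(document, word_probs):
    """Log of the product of per-word probabilities for one class.

    `document` is a list of word events; `word_probs` maps each vocabulary
    word to its (assumed, illustrative) probability under the class.
    Words outside the vocabulary are ignored, as the text specifies.
    """
    return sum(math.log(word_probs[w]) for w in document if w in word_probs)

# Hypothetical word probabilities for the junk class (they sum to 1).
word_probs_junk = {"free": 0.5, "money": 0.3, "meeting": 0.2}
doc = ["free", "money", "free"]
ll = log_likelihood(doc, word_probs_junk)
```

Exponentiating `ll` recovers the plain product 0.5 × 0.3 × 0.5, which shows that each word contributes one factor regardless of its position or neighbours.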

Training a Naïve Bayes Classifier

To train a naïve Bayes classifier we must estimate the parameters of our model. These parameters are the class word probabilities and the prior class probabilities, written

$$\theta_{v \mid c} = \{ P(v_t \mid c_j; \theta) : v_t \in V,\ c_j \in C \} \tag{4}$$

$$\theta_{c} = \{ P(c_j \mid \theta) : c_j \in C \} \tag{5}$$

where

$$\sum_{t=1}^{|V|} P(v_t \mid c_j; \theta) = 1 \quad \text{and} \quad \sum_{j=1}^{|C|} P(c_j \mid \theta) = 1 \tag{6}$$

and $v_t$ denotes the t-th word in the vocabulary V and $c_j$ denotes the j-th class in C. These parameters are respectively estimated by:

$$\hat{P}(v_t \mid c_j; \hat{\theta}) = \frac{\sum_{D_i \in c_j} N(v_t, D_i)}{\sum_{s=1}^{|V|} \sum_{D_i \in c_j} N(v_s, D_i)} \tag{7}$$

and

$$\hat{P}(c_j \mid \hat{\theta}) = \frac{\text{number of training documents in category } c_j}{|D|} \tag{8}$$

where N(x, y) is the number of occurrences of word x ∈ V in document y, and |D| is the total number of training documents. The document length is not needed as a parameter for classification since we have assumed a uniform distribution for all classes.

To eliminate zero probabilities for infrequently occurring words we apply Laplacian smoothing, which keeps the sum of all word probabilities within a class at 1 while eliminating zero probabilities. Smoothing is necessary to prevent the product term in (3) from going to zero every time a given document contains a vocabulary word that did not occur in the training data of the given class c. We cannot exclude such terms from the product, because doing so assumes a probability of 1, i.e. that the word is omnipresent, an assumption that completely defies our training data. We therefore employ smoothing as an intuitively reasonable means of dealing with this problem.
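The parameter estimates with add-one (Laplacian) smoothing can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: the function name `train_naive_bayes`, the toy documents and the three-word vocabulary are assumptions made for the example.

```python
from collections import Counter

def train_naive_bayes(docs, labels, vocabulary):
    """Estimate smoothed class priors and per-class word probabilities.

    docs: list of documents, each a list of words.
    labels: the class of each document.
    vocabulary: the only words the classifier is allowed to use.
    """
    classes = sorted(set(labels))
    priors, word_probs = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Smoothed prior: (1 + documents in class) / (|C| + |D|).
        priors[c] = (1 + len(class_docs)) / (len(classes) + len(docs))
        # Count vocabulary words only; other words are ignored.
        counts = Counter(w for d in class_docs for w in d if w in vocabulary)
        total = sum(counts.values())
        # Smoothed word probability: (1 + count) / (|V| + total count).
        word_probs[c] = {v: (1 + counts[v]) / (len(vocabulary) + total)
                         for v in vocabulary}
    return priors, word_probs

# Hypothetical toy data.
docs = [["free", "money"], ["free", "offer"], ["meeting", "today"]]
labels = ["junk", "junk", "legit"]
vocabulary = ["free", "money", "meeting"]
priors, word_probs = train_naive_bayes(docs, labels, vocabulary)
```

In this toy run the word "meeting" never occurs in a junk document, yet it still receives a non-zero probability under the junk class, and the word probabilities within each class still sum to 1.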

Applying smoothing to equations (7) and (8) respectively gives:

$$\hat{P}(v_t \mid c_j; \hat{\theta}) = \frac{1 + \sum_{i=1}^{|D|} N(v_t, D_i)\, P(C = c_j \mid D = D_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N(v_s, D_i)\, P(C = c_j \mid D = D_i)} \tag{9}$$

and

$$\hat{P}(c_j \mid \hat{\theta}) = \frac{1 + \sum_{i=1}^{|D|} P(C = c_j \mid D = D_i)}{|C| + |D|} \tag{10}$$

where N(x, y) is the number of occurrences of word x ∈ V in document y; |C| is the number of categories in C; |D| is the total number of training documents; and $P(C = c_j \mid D = D_i) \in \{0, 1\}$. That is, $P(C = c_j \mid D = D_i) = 1$ if document $D_i$ belongs to category $c_j$; otherwise $P(C = c_j \mid D = D_i) = 0$.

Using a Naïve Bayes Classifier

Using a Naïve Bayes classifier involves calculating $P(C = c_j \mid D)$, where $c_j$ is an element of C and D is the document we wish to classify. To calculate this quantity we use Bayes' Rule for conditional probabilities, i.e.

$$P(C = c_j \mid D = D_i; \hat{\theta}) = \frac{P(C = c_j \mid \hat{\theta})\, P(D = D_i \mid C = c_j; \hat{\theta})}{P(D = D_i \mid \hat{\theta})} \tag{11}$$

Substituting (9) and (10) into (11) gives:

$$P(C = c_j \mid D = D_i; \hat{\theta}) = \frac{P(c_j \mid \hat{\theta}) \prod_{k=1}^{|D_i|} P(w_k \mid c_j; \hat{\theta})}{\sum_{n=1}^{|C|} P(c_n \mid \hat{\theta}) \prod_{k=1}^{|D_i|} P(w_k \mid c_n; \hat{\theta})} \tag{12}$$

To classify a document into a category, we simply assign the category for which $P(C \mid D; \hat{\theta})$ is maximized, i.e.

$$\text{Class of } D = \arg\max_{c_j \in C} P(C = c_j \mid D; \hat{\theta}) \tag{13}$$

If a probability is not required, and if every document must be classified, we can incorporate certain simplifications to this model. First, we can ignore the denominator of the probability in equation (13), since it is independent of $c_j$. Second, instead of computing a large product in (12), we can compute the log of the product. The latter is advantageous because it changes a product of many small numbers in the numerator, often too small to be represented by hardware-supported floating point representations, into a sum of reasonably sized numbers. Once this sum has been calculated we can convert it back to its original domain using the inverse log function. A similar tactic can be used to compute the product terms in the denominator. We cannot, however, incorporate the first simplification, since we require a probability or confidence score for each document. The second simplification, computing a sum of logarithms in place of a product, is however achievable.

6.3 The Genetic Programming Approach

Since its inception in 1992 by Koza, Genetic Programming (GP) has found many applications in the field of machine learning. Genetic Programming is a subset of a larger family of techniques known as evolutionary computation. In the evolutionary computation paradigm the programmer does not explicitly write the program that is the outcome of the process. Rather, the programmer creates an evolutionary environment wherein the computer, according to a process laid down by the programmer, evolves the programs which are said to be the outcome of the evolutionary process. Typically this process occurs with minimal or no supervision; however, other configurations which accept more user feedback are also possible.

Genetic programming is an evolutionary computation paradigm where the programs are represented as trees and the evolutionary mechanism is accomplished through two operators, namely crossover and mutation. The genetic programming process was created as an attempt to model, in a manner however crude, the evolutionary process known in biology as natural selection. This analogy to nature will now be explored as a means of familiarizing the reader with the genetic programming method. Once this big picture has been established we will proceed to fill in the details.

The Big Picture: A Comparison Between Nature and Genetic Programming

In Nature, many different species compete to survive and reproduce. In GP, different programs representing possible solutions to a problem compete to survive and reproduce. In Nature, the species best adapted to their environment have the best chance to reproduce.
This is often called the law of the survival of the fittest. In GP, the best solution to the problem we are trying to solve is the solution best adapted to the problem and therefore the most fit. The problem can be anything from recognizing a face to fitting a curve or, in the case of this paper, the classification of a document. In Nature, many different males will try to compete for one female (or vice versa). In GP this analogy continues through a process called tournament selection, where several programs compete with each other to mate with another program. In Nature, the genes of the offspring are a cross between those of its mother and father. In GP this happens too, through the process known as crossover, although in GP a child may have more than two parents. In Nature, genetic code can occur in a child completely independent of its parents. In GP this is carried out through the evolutionary operator known as mutation where, immediately after being born, a child may receive code independent of that of its parents. In Nature, the parents gradually die off, to be eventually replaced by their children. In steady-state GP, which is the technique used in the classifier, the parents always give birth to fraternal twins and they immediately die after so doing. In Nature, populations become increasingly adapted to their environment through the evolutionary process of natural selection. In GP, the population taken as a whole becomes better and better at solving the given problem through the processes of crossover and mutation. Eventually an individual solution will be found that meets a certain standard or criterion. This single solution is said to be the outcome of the genetic programming process.

Genetic programming is best used when no well-known solution to a problem exists. It can only be attempted if some function can be written to quantitatively determine how well any solution solves a given problem. In the few short years since its inception in 1992 it has demonstrated its ability to solve some difficult engineering problems, sometimes evolving better solutions than have been written by humans [14].

It is also important to distinguish the field of genetic programming from that of genetic algorithms. The difference mainly lies in the fact that genetic programming evolves actual programs represented as trees, whereas genetic algorithms evolve bit strings. The extra versatility afforded by this difference needs no expounding.

The Fitness Function

In Nature, some species are better suited to their environment than others. The species best adapted to their environment have the best chance to reproduce. Notice that, carried over to GP, this implies that there is some way of measuring how well any given program solves the problem. This measuring device is called the fitness function. One type of fitness function gives a score that ranges between zero and infinity, with zero representing an optimal solution and larger numbers representing increasingly worse solutions. Thus the closer a candidate solution's fitness is to zero, the better it solves the problem, and the larger its fitness, the worse the solution. When the fitness is measured in this way it is called the standard fitness, which is how we have represented fitness values in the genetic classifier.

How the Fitness Function is Calculated

Calculating the fitness function for a particular program always involves finding the error between the program's answer and some ideal response.
In the case of our classifier, the ideal response is a human's judgement as to whether or not a given document is spam. We represent this human response in numeric form as a percentage denoting the confidence that the e-mail is an Unsolicited Bulk E-mail (UBE). Thus, if the given document is a piece of junk e-mail, the ideal response is 100%, and if an e-mail is a legitimate piece of e-mail, i.e. non-junk, the ideal response is 0%. For a particular document, the error is the difference between the program's answer and the ideal. To both exaggerate the error and force the error to be always positive, we square the error. Thus, for a set of documents, the fitness function can be calculated as the sum of squared errors over all documents or, in the language of mathematics,

$$\text{Fitness} = \sum_{i=1}^{|D|} (v_i - a_i)^2 \tag{14}$$

where |D| is the number of training documents, $v_i$ is the value returned by the classifier for the i-th document, and $a_i$ is equal to 100 for junk documents and 0 for non-junk documents.

This fitness function, however, encourages bias in the classifier towards the category which comprises most of the training documents, especially when the training documents of one class far outnumber those of the others. In such cases one will often find the most fit solutions during the early stages of training to be those solutions that always choose the category with the larger number of examples. Unfortunately, since these solutions will be more fit than other members of the population, they will have a greater chance to mate. Often one will find most of the population to be polluted with the genes of these individuals before any better solutions emerge. This greatly impedes learning. Therefore a fitness function that balances the contributions of the two categories is more desirable. One function with this property is the sum of the mean squared errors of each category, i.e.

$$\text{Fitness} = \frac{1}{|D_{Spam}|} \sum_{i=1}^{|D|} P(C = \text{Spam} \mid D_i)\,(v_i - a_i)^2 + \frac{1}{|D_{NonSpam}|} \sum_{i=1}^{|D|} P(C = \text{NonSpam} \mid D_i)\,(v_i - a_i)^2 \tag{15}$$

where $a_i \in \{0, 100\}$ is the correct answer for the i-th document; $v_i \in \{0 < v_i < 100\}$ is the answer returned by the classifier for the i-th document; $|D_{Spam}|$ is the number of junk e-mail documents; $|D_{NonSpam}|$ is the number of non-junk e-mail documents; $P(C = \text{Spam} \mid D_i) = 1$ if document i is a junk e-mail document and 0 otherwise; and $P(C = \text{NonSpam} \mid D_i) = 1$ if document i is a non-junk e-mail document and 0 otherwise. This fitness function was used after the fitness function in (14) yielded disappointing results.

There are other fitness functions that can solve the given problem. One possibility is Van Rijsbergen's E-measure [25], since it combines precision and recall in a single number, a desirable property since we wish to maximize both. The E-measure accepts a single parameter β which determines the relative weight put on recall and precision.

$$E = 1 - \frac{(\beta^2 + 1)\,P\,R}{\beta^2 P + R} \tag{16}$$

Although designed as an effectiveness measure for information retrieval, this measure can be used as a standard fitness value, since it takes on a value of 0 in the ideal case and increasingly larger values for increasingly worse solutions. This measure also satisfies our criteria if β is chosen to emphasize precision, since it will not bias the learning towards the category with the larger number of training examples. The use of this fitness function did yield some interesting solutions, including one with 100% precision and 40% recall using a parameter of 0.4 (i.e. recall is only 2/5 as important as precision). The performance of equation (15), however, was better in obtaining solutions with both a high precision and recalls greater than 60%. Therefore only equation (15) was ultimately used.
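The three candidate fitness functions discussed in this section can be sketched in Python as below. This is a hedged illustration: the function names and the toy data are assumptions, and the balanced fitness assumes that both categories contain at least one document.

```python
def sum_squared_error(outputs, targets):
    """Equation (14): sum of squared errors over all documents."""
    return sum((v - a) ** 2 for v, a in zip(outputs, targets))

def balanced_fitness(outputs, targets):
    """Equation (15): sum of the per-category mean squared errors.

    Targets are 100 for junk and 0 for non-junk documents.
    """
    junk = [(v - a) ** 2 for v, a in zip(outputs, targets) if a == 100]
    legit = [(v - a) ** 2 for v, a in zip(outputs, targets) if a == 0]
    if not junk or not legit:
        raise ValueError("both categories need at least one document")
    return sum(junk) / len(junk) + sum(legit) / len(legit)

def e_measure(precision, recall, beta):
    """Equation (16): 0 in the ideal case, larger for worse solutions."""
    return 1 - (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)

# Hypothetical skewed training set: nine junk documents, one legitimate.
targets = [100] * 9 + [0]
always_junk = [100] * 10   # a solution that always answers "junk"
undecided = [50] * 10      # a solution that always answers 50%
```

On this skewed toy set, equation (14) scores the always-junk solution as fitter than the undecided one (10000 against 25000), whereas equation (15) reverses the ranking (10000 against 5000), which is the bias the balanced function is meant to remove.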

Parse Trees: The Representation Scheme Of Solutions

In genetic programming each program in a population is represented by a tree. These trees are similar to the parse trees used by compilers to evaluate expressions. The tree structure used is as follows. The terminals, or leaves, of the tree consist solely of numerical constants or words. The non-terminals are of two kinds: word operators and numerical operators. We list them in Tables 3 and 2 respectively.

Table 2: Numerical Operators

Type          Symbols
Arithmetic    + - / *
Relational    =<>>=<
Logical       AND, OR, NOT
Non-Linear    Min, Max, ABS, Square Root

Table 3: Word Operators

Name         Description
Freq(x)      Returns the frequency of word x in the document.
Exists(x)    Returns 1 if word x exists and 0 otherwise.

The use of trees to represent expressions will be illustrated by example. Suppose we desire to represent the expression 2*8. In a parse tree this is a multiplication node whose two leaves are 2 and 8, and the tree reduces to a single number, as shown in Figure 1.

[Fig. 1: The reduction of a simple tree]

Every non-terminal reduces to a single number, and therefore a tree of such entities also reduces to a single number. We use another example to demonstrate order of operations. Figure 2 shows a parse tree representation of (5-3)*2*8, in which the subtree for 5-3 is reduced first and the remaining tree is then simplified to a single number.

[Fig. 2: Representing order of operations in parse trees]
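The reduction of a purely numerical parse tree can be sketched in a few lines of Python. The tuple-based tree encoding and the names `OPS` and `reduce_tree` are assumptions chosen for this illustration; only three of the arithmetic operators from Table 2 are included.

```python
import operator

# A small subset of the arithmetic operators from Table 2.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def reduce_tree(node):
    """Reduce a parse tree to a single number.

    A node is either a numerical constant (a leaf) or a tuple of the form
    (symbol, left_subtree, right_subtree).
    """
    if isinstance(node, (int, float)):
        return node
    symbol, left, right = node
    return OPS[symbol](reduce_tree(left), reduce_tree(right))

simple_tree = ("*", 2, 8)                        # the expression 2*8
ordered_tree = ("*", ("*", ("-", 5, 3), 2), 8)   # the expression (5-3)*2*8
```

Because subtrees are reduced before their parents, the nesting of the tree fixes the order of operations: `reduce_tree(simple_tree)` gives 16 and `reduce_tree(ordered_tree)` gives 32.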

This isn't terribly exciting. The interesting part happens when feature detectors are introduced into the trees. These feature detectors introduce information about the current document into the tree. An example of a feature detector might be the number of times the word "classifier" appears in a document. Figure 3 represents a tree with a feature detector.

[Fig. 3: A Simple Tree Containing a Feature Detector]

This simple tree returns the frequency of the word "classifier" in the current document. Like the other trees shown thus far, this tree also reduces to a single number; however, this number will vary according to the number of times the word "classifier" appears in the given document. Of course, more complex trees can be constructed by incorporating both word and numerical operators. It should also be noted that only word operators can accept words as input and, conversely, only numerical operators can accept numbers as input. It follows that all words will have word operators as their parents.

Further, following Koza's recommendations, we have used closed division in place of standard division. In closed division, a divisor of zero will not result in an error; rather, a large value with the sign of the dividend is returned. For the indeterminate value, i.e. 0/0, a zero is returned. This follows Koza's assertion that the operators in a genetic program should be able to accept all possible values which their descendants may generate.

Crossover

Crossover, in the world of genetic programming, is the equivalent of sexual reproduction.

[Fig. 4: The Crossover Operation]

It is the single most important evolutionary operation in genetic programming. In crossover, two solutions are sexually combined to form new offspring that are hybrids of both parents. The parents are selected from the population through a process called tournament selection, which is described as follows. First, a solution is selected at random. This solution represents the female. Then the genetic program chooses ten other solutions at random. These solutions represent the males. Of these ten, the one with the best fitness is selected to mate with the female. This method simulates biological mating patterns in which two or more members of the same sex compete to mate with a particular member of the opposite sex.

Once the parents have been selected, the creation of offspring by crossover is accomplished by randomly selecting a subtree in each parent and swapping them. This produces two children that contain code from both their parents. This is shown in Figure 4, where the two bolded subtrees in the parents are swapped to create two children.

Mutation

Mutation is also an important feature in genetic programming, because it is the only way that a child can receive genetic code independent of its parents. There are two types of mutation in the classifier. In the first type, a single non-terminal is replaced by another non-terminal, and in the second, one subtree replaces another subtree. Figure 5 below demonstrates both.

[Fig. 5: The Mutation Operation]

We note, however, that mutation must result in a valid tree; that is, the tree must be reducible to a single number. Therefore a word can only replace a word, and a number can only replace another number.

Genetic Programming Summary

The genetic programming process can be summarized as follows:

1. A population of random solutions is generated using some type of random tree generation algorithm.
2. The bestFitness variable is set to the highest possible value.
3. Until a solution is found to satisfy some predetermined stop criteria, or until a certain number of generations have been completed, the following steps are repeated:
   i. Two parents are selected from the population of solutions using tournament selection, and their fitnesses are calculated.
   ii. The solutions represented by these two parents are combined to form two new solutions using the operation called crossover.
   iii. The children may undergo the process of mutation with a certain probability.
   iv. The fitness of each of the two children is evaluated and compared to the bestFitness variable. If a child's fitness is lower than the current value stored in the bestFitness variable, it replaces the value in bestFitness.
   v. If a parent's fitness is equal to bestFitness, it is kept; otherwise it is removed from the population. In this manner the best solution in the population is never killed.
4. The solution with the best fitness is taken as the outcome of the genetic programming cycle.

If there are N members in the population at any given time, a generation is defined as N/2 iterations, rounded up to the next integer.
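Tournament selection and subtree crossover, the two steps at the heart of the loop summarized above, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the classifier's actual code: trees are encoded as nested Python lists of the form [symbol, child, child], only numerical trees are used (so the word/number typing rule never comes into play), and all function names are hypothetical.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node of a tree."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return tree

def crossover(mother, father, rng):
    """Swap one randomly chosen subtree of each parent, giving two children."""
    p1 = rng.choice(list(subtree_paths(mother)))
    p2 = rng.choice(list(subtree_paths(father)))
    s1, s2 = get_subtree(mother, p1), get_subtree(father, p2)
    return replace_subtree(mother, p1, s2), replace_subtree(father, p2, s1)

def tournament_select(population, fitness, rng, size=10):
    """Pick a random female, then the fittest of up to `size` other solutions.

    Lower fitness is better (standard fitness); assumes at least two members.
    """
    female = rng.choice(population)
    others = [p for p in population if p is not female]
    males = rng.sample(others, min(size, len(others)))
    return female, min(males, key=fitness)

rng = random.Random(0)
mother = ["*", ["-", 5, 3], 2]
father = ["+", 8, ["*", 2, 8]]
child1, child2 = crossover(mother, father, rng)
```

The parents are copied rather than modified, so a steady-state loop can decide afterwards which of them to remove from the population. Because crossover only swaps subtrees, the two children together always contain exactly as many nodes as the two parents.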

7.0 Implementation Issues

7.1 The Training Set and the Test Set

Initially we had a collection of 972 unsolicited bulk e-mail documents, which were obtained from a user who had saved his junk e-mail over a period of approximately two years. Approximately one quarter of the spam documents duplicated others in the set, and much work went into the detection and removal of these duplicates. Often the duplicates differed only in small aspects, such as the time at which they were sent or the mail servers through which the message was relayed. In some cases the messages differed only by the addition of extra spaces or newline characters. To eliminate such differences, we removed the message headers using a simple PERL script. From this point onward, finding duplicates was a more straightforward task, and a PERL script was used to this end, resulting in the removal of 271 duplicates. After the removal of duplicates, 701 spam documents remained, and the collection of non-spam documents stood at 102 documents.

We passed these documents through a series of filters. The first removed the HTML tags embedded in some messages. The second removed the 60 most common words in the English language, a common practice in text learning [10][23], since it is felt that these words occur too frequently to be of much discriminating value. Third, we applied stemming, a technique that attempts to reduce the many forms of a word to their root form. For example, an ideal stemming algorithm would convert the words "runs", "running" and "ran" to "run". We used Porter's stemming algorithm as implemented by Frakes in [24] due to its simplicity and high execution speed. We did not use other, more sophisticated stemming algorithms due to their prohibitive time and computing costs. Porter's algorithm takes only a fraction of a second to complete. In our experience, more sophisticated stemming algorithms would take seven to eight seconds to complete on our test machine, a 466 MHz Intel Celeron processor with 256 MB of RAM. Clearly such delays are unacceptable to a user whose machine would be tied up for more than three minutes only to read 18 pieces of mail.

After passing our documents through these filters, we split them into a training set consisting of 671 spam documents and 72 non-spam documents, and a test set of 30 spam documents and 30 non-spam documents. Although legitimate messages actually occur far more frequently for a user than spam, we thought it expedient to raise the number of spam documents in the test set to an equal footing, to facilitate more accurate percentages of UBE recall and precision. The alternative, given our small number of samples, would require that our recall and precision be calculated from a test set containing only a few junk e-mail messages.

7.2 Feature Selection and the Need to Reduce the Classifier's Vocabulary

Similar to a human being's vocabulary, which consists of the words that a human understands, a classifier's vocabulary represents the only words that a classifier will use to determine the class of a document. We created the classifier's vocabulary by first writing a PERL script to create a separate word list for the junk and non-junk categories. Each word list contained the frequency of each word and the number of documents in which each word occurred. The initial count of words yielded over

unique words over both classes, some words consisting solely of punctuation. We further noticed that our set followed a Zipfian distribution (see footnote 2), a common occurrence in document corpora. Furthermore, according to [21], it is a conventional rule of thumb in pattern recognition practice to use five to ten times as many training samples as features for each class in order to estimate the probability distributions. Without feature selection, this rule would require 1.2 million training documents (the number of unique words × 2 classes × 5 samples per feature), so it becomes pertinent to reduce our features as much as possible, both to increase the speed of learning and to reduce the required number of training samples.

Footnote 2: Zipf's law, named after George Zipf, its discoverer, is an observation about the relationship between the ranking of the frequency of an event and the frequency itself. This means that the number of times the second most commonly used word occurs is approximately 1/2 the number of times the most popular word is used; the number of times the third most popular word is used is close to 1/3 the number of times the most popular word is used, and so on. In general, if the most popular word appears N times, the k-th most popular word appears approximately (1/k) * N times [22].

Although we did not perform an exhaustive study, we found 550 features to be adequate. This is consistent with many other practitioners of text learning, who have found that fewer features often yield better performance in text-learning domains [11][17][18], including Sahami et al. [10], who used 500 features for classifying junk e-mail, and Mladenic, who found that systems using only 1-3% of the total words in a category demonstrated little or no loss in performance [20].

Despite papers such as [16], which support the claim that document frequency is faster and generally as effective as other feature selection techniques in the text domain, and despite the author's past experience with web page classification, which seemed to support this assertion, the author noticed very poor results (approximately 50% precision) when document frequency was used as the feature selection criterion. Another common feature selection criterion for text categorization is mutual information with the class variable [10][11][17][18]. It is calculated as follows:

\[
MI(X_i; C) = \sum_{x \in X_i} \sum_{c \in C} P(X_i = x, C = c) \, \log \frac{P(X_i = x, C = c)}{P(X_i = x) \, P(C = c)}
\]

When mutual information was used as the feature selection criterion, the results were found to be much better, and these are the results reported in this paper.

7.3 Poor Performance Caused by Lack of Precision

After we noted poor classification performance in the Naïve Bayes classifier, a closer examination showed that the product terms in equation (12) were sometimes stored as zero. Closer inspection revealed that this happened more often for longer documents. We later realized that the source of the problem was that the product terms in equation (12) were so small that they fell below the range of the double-precision floating point afforded by our C++ compiler. Since longer documents typically had more vocabulary words than shorter ones, the product terms went to zero more frequently for longer documents. To remedy this problem, an infinite-precision floating-point package was used, greatly improving classifier performance, although at the expense of running time. In some cases computation times increased from a few seconds to minutes. In hindsight, a better implementation would have been one that computed the log of equation (12). The advantage of a log-based approach is that it converts a large product of small numbers into a sum of reasonably sized numbers. This circumvents the need for an infinite-precision package, and logs can be computed with little performance penalty when a lookup table is used. Once the sum of the logs has been computed, it can easily be transformed back to its original domain using the inverse log function. Thus a probability can still be computed.

8. Learning Issues

Stopping Criterion and the Prevention of Overfitting

Many iterative learning algorithms suffer from the problem of overfitting, and genetic programming is no exception. In most data sets there will be noise, and it is important for the learning algorithm to generalize based on the signal and not the noise. To prevent overfitting in the genetic classifier during training, we cross-validated the best member of the population after each generation using the recall and precision over the test set. This allowed the operator to observe the generalizability of the decision rule. When these measures began to decline, we stopped the genetic classifier's training.

9. Experimental Results

Since genetic programming is not a deterministic process, its output will vary from run to run. Thus we have provided the mean and standard deviation over six runs, with one run being discarded since it was a degenerate case where learning did not occur. We labeled the GP run with the highest combined sum of recall and precision as the best run. The GP results were produced using a population of 300 trees that were initially generated using an algorithm that randomly generated trees with depths greater than 6. We obtained excellent results using a tournament size of 10 and poor results using a value of 5.

                  Naïve Bayes   GP Best   GP Mean   GP Std. Dev.
Junk Recall       76.67%        70.0%
Junk Precision    95.83%        95.45%

Table 4: Recall and Precision for Naïve Bayes and Genetic Programming

10. Conclusions

Although the precision of the best genetically programmed classifier was comparable to that of the Naïve Bayes classifier, its recall trailed the Naïve Bayes classifier by 6.67 percentage points. Nevertheless, we have shown that it is possible to construct a genetically programmed classifier with reasonable performance. In fact, the two performance measures are so close that we cannot be sure that the differences are statistically significant.
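The log-based computation suggested in Section 7.3 can be illustrated with a short Python sketch. It is not the report's implementation: the function names and the word probabilities are made up for the example, and equation (12) is assumed to be the class prior multiplied by a product of per-word likelihoods. Note one detail the sketch adds: exponentiating the raw log sum directly would underflow again for long documents, so the largest log score is subtracted before exponentiating (the log-sum-exp trick) when normalising to a probability.

```python
import math

def log_joint(doc_words, log_prior, log_likelihood):
    """Return log P(c) + sum of log P(w | c) over the words of the document."""
    return log_prior + sum(log_likelihood[w] for w in doc_words)

def posterior_from_logs(log_joints):
    """Normalise per-class log scores into probabilities without underflow."""
    m = max(log_joints.values())
    total = sum(math.exp(v - m) for v in log_joints.values())
    return {c: math.exp(v - m) / total for c, v in log_joints.items()}

# Made-up numbers for illustration: a 400-word document of one repeated token.
doc = ["free"] * 400
likelihood = {"spam": {"free": 1e-3}, "ham": {"free": 5e-4}}
prior = {"spam": 0.5, "ham": 0.5}

# The naive product underflows to exactly 0.0 in double precision.
naive_product = prior["spam"] * math.prod(likelihood["spam"][w] for w in doc)

# The same quantity in log space is an ordinary, moderately sized number.
scores = {
    c: log_joint(doc, math.log(prior[c]),
                 {w: math.log(p) for w, p in likelihood[c].items()})
    for c in prior
}
posterior = posterior_from_logs(scores)
```

For classification alone, comparing the log scores is enough; the normalisation step is only needed when an actual probability is wanted.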

11. Recommendations

Although both classifiers demonstrated similar results, the Naïve Bayes classifier outperformed the genetically programmed classifier in recall by 6.67 percentage points. It is recommended that more data sets be produced, both to ascertain whether these differences are statistically significant and to demonstrate how well these two learning methods scale to larger numbers of training documents.

Although the two classifiers were not explicitly timed, the genetic classifier's computation time for a document was qualitatively instantaneous, while the Naïve Bayes classifier took a considerable amount of time (sometimes over 40 seconds on the 466 MHz test machine) to output a classification decision. This can be attributed to our use of a high-precision math library to deal with the very small numbers in the product term of equation (12). We cannot, however, regard this as a fair comparison of classification time, since implementations of the Naïve Bayes algorithm exist which are much faster than the one used. It is therefore recommended that a Naïve Bayes classifier based on sums of logs be implemented to facilitate a fairer comparison of classification time.

We have restricted the features in the genetic programming code to closely match those made available to the Naïve Bayesian classifier, to provide a fair basis of comparison between these two learning methods. Like Naïve Bayes, genetic programming can be extended to incorporate many features beyond unigrams (single words), such as n-grams (phrases or syllables of words). Often the winner of TREC, an annual text retrieval competition, is a variant of Naïve Bayes extended to incorporate selected n-grams and other features. It is therefore recommended that more work be done to compare the performance of a genetic classifier against a Bayesian classifier in a richer feature space that incorporates features such as the pair-distances between words and the frequencies of phrases and n-grams.

Description                                                                               Spam   Non-spam
Messages containing signatures                                                            28%    36%
Signatures containing "remove"                                                            86%    0%
Signatures containing a name and/or a title, address                                      0%     72%
Messages with punctuation repeated 3 or more times                                        76%    38%
Messages containing repeated punctuation, with tagline and forwarding messages ignored    36%    2%

Table 5: E-mail signatures can be harvested as useful features

Many e-mails end with a personal signature or tagline. These e-mail signatures often contain repeated punctuation denoting their starting boundary, followed by a person's name, title, address and, in some cases, a toll-free number. Signatures such as these are uncommon in unsolicited bulk e-mail; however, their individual components are frequently found in junk e-mail. In a random sample of 50 spam and 50 non-spam messages, 28% of the spam messages were found to contain a tagline; however, 86% of these taglines contained the word "remove" or "delete", compared to 0% in the non-spam category, demonstrating the utility of this information as a high-precision feature. Furthermore, we note that 72% of the signatures in non-junk messages contained a name, title and address, compared to 0% in the spam category, posing yet another useful feature in the e-mail signature.

We further note, as have Sahami et al. in [10], that spam messages often contain repeated punctuation; for example, one might find the phrase "HUGE SAVINGS!!!!" in a spam message. However, we also note that repeated punctuation is often contained in the e-mail signatures of non-spam messages, diminishing its value as a feature. A look at our random samples revealed that spam messages were only twice as likely to contain repeated punctuation; however, if the repeating punctuation in the e-mail signatures was removed and forwarding headers such as "---- original message ----" were ignored, the spam messages were 18 times more likely to contain repeated punctuation than the non-spam messages, a ninefold increase in feature effectiveness.

Thus we have shown that repeated punctuation is another useful feature for the detection of junk e-mail. We have also shown that the utility of this feature can be greatly increased if the repeating punctuation associated with e-mail signatures is detected and removed. It is therefore recommended that a better parser be constructed to detect and parse e-mail signatures so that their repeating punctuation can be removed and their features extracted. We hypothesize that such an approach will be very helpful in increasing classifier performance.
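As a rough illustration of the recommended parser, the repeated-punctuation feature with signatures and forwarding headers ignored might be sketched as below. Everything here is an assumption made for the example rather than the method used in the study: the regular expressions, the function names, and the heuristic that a signature starts at a line made only of repeated punctuation within the last few lines of the message.

```python
import re

# The same punctuation mark three or more times in a row, e.g. "!!!!" or "----".
REPEATED_PUNCT = re.compile(r"([!?.*\-_=~#$])\1{2,}")
# Forwarding header lines such as "---- original message ----".
FORWARD_HEADER = re.compile(r"^\s*-{2,}\s*original message\s*-{2,}\s*$",
                            re.IGNORECASE)
# Hypothetical signature delimiter: a line made only of one repeated mark.
SIGNATURE_START = re.compile(r"^\s*([\-_=*~#])\1{1,}\s*$")

def strip_signature(lines, max_signature_lines=8):
    """Drop a trailing signature if a delimiter line appears near the end."""
    start = max(0, len(lines) - max_signature_lines)
    for i in range(start, len(lines)):
        if SIGNATURE_START.match(lines[i]):
            return lines[:i]
    return lines

def has_repeated_punctuation(message, clean=True):
    """Return True if the message body contains repeated punctuation.

    With clean=True, forwarding headers and a trailing signature are removed
    first, so that only punctuation in the body itself counts.
    """
    lines = message.splitlines()
    if clean:
        lines = [ln for ln in lines if not FORWARD_HEADER.match(ln)]
        lines = strip_signature(lines)
    return any(REPEATED_PUNCT.search(ln) for ln in lines)
```

Calling the function with `clean=False` gives the raw feature, which makes it easy to compare the two variants on a labeled sample in the way Table 5 does.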

References

[1] Internet Week, May, CMP Media Inc., Manhasset, New York.
[2] New York Times, March.
[3] Cranor, Lorrie & LaMacchia, Brian, "Spam", AT&T Labs Technical Report, March 1998.
[4] Internet Week, May 11, CMP Media Inc., Manhasset, New York, 1998.
[5] Commercial Internet Exchange and Internet Law and Policy Forum, June 1998.
[6] Marshall, Jonathan, "'Spam' Overloads Pac Bell: Flood of junk e-mail knocks out service", San Francisco Chronicle, March.
[7] CNET, The Net, February.
[8] Levitt, Mark & Comiskey, Mike, "Bright Light Focuses on Eliminating Spam", IDC Corporation, July.
[9] Hoffman, P. & Crocker, D., "Unsolicited Bulk E-mail: Mechanisms for Control", Internet Mail Consortium.
[10] Sahami, M. & Dumais, S. & Heckerman, D. & Horvitz, E., "A Bayesian Approach to Filtering Junk E-mail", in Learning for Text Categorization: Papers from the 1998 Workshop, AAAI Technical Report.
[11] Nigam, K. & McCallum, A. & Thrun, S. & Mitchell, T., "Text Classification from Labeled and Unlabeled Documents".
[12] Friedman, Nir & Geiger, Dan & Goldszmidt, Moses, "Bayesian network classifiers", in Machine Learning, Vol. 2.
[13] Lewis, D., "Naive (Bayes) at forty: The independence assumption in information retrieval", in ECML-98: Proceedings of the Tenth European Conference on Machine Learning.
[14] Lewis, D., "Text Representation for intelligent text retrieval: A classification-oriented view", in Paul S. Jacobs, editor, Text-Based Intelligent Systems, Lawrence Erlbaum, NJ.
[15] Banzhaf et al., Genetic Programming: An Introduction, Morgan Kaufmann, San Francisco.
[16] Yang, Y. & Pedersen, J., "A comparative study on feature selection in text categorization", in International Conference on Machine Learning (ICML).
[17] Lewis, D. D., "Feature selection and feature extraction for text categorization", Morgan Kaufmann, San Francisco.
[18] Koller, D. and Sahami, M., "Hierarchically classifying documents using very few words", in International Conference on Machine Learning (ICML).
[19] Jaakkola, T. S. and Haussler, "Exploiting generative models in discriminative classifiers".
[20] Mladenic, D., "Feature subset selection in text-learning", in Proc. of the 10th European Conference on Machine Learning.
[21] Jain, A. & Chandrasekaran, "Dimensionality and sample size considerations in pattern recognition practice", in Handbook of Statistics, Vol. 2 (Krishnaiah, P. and Kanal, L., editors), Amsterdam: North-Holland Publishing Company.
[22] Zipf, G.K., Human Behaviour and the Principle of Least Effort, Addison Wesley.
[23] Clack, C. & Farrington, J. & Lidwell, P. & Yu, T., "Autonomous Document Classification for Business", in Proceedings of the ACM Agents Conference.
[24] Frakes, W., "Stemming Algorithms", in Information Retrieval and Data Structures, Englewood Cliffs, New Jersey: Prentice Hall.
[25] van Rijsbergen, Information Retrieval, Butterworths, London, 2nd edition.
[26] Lewis, D., "Evaluating and optimizing autonomous text classification systems", in SIGIR '95.
[27] Katirai, H., "Genetic Programming and its Application to the Classification of Web Pages", Department of Electrical and Computer Engineering Technical Report, University of Waterloo, Waterloo, Ontario.

Appendix 1: Table of Expenses
[As required by the E&CE 499 Report Outline]

Estimated Expenses: $0
Actual Expenses: $0

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Performance attribution for multi-layered investment decisions

Performance attribution for multi-layered investment decisions Performance attrbuton for mult-layered nvestment decsons 880 Thrd Avenue 7th Floor Ne Yor, NY 10022 212.866.9200 t 212.866.9201 f qsnvestors.com Inna Oounova Head of Strategc Asset Allocaton Portfolo Management

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

The OC Curve of Attribute Acceptance Plans

The OC Curve of Attribute Acceptance Plans The OC Curve of Attrbute Acceptance Plans The Operatng Characterstc (OC) curve descrbes the probablty of acceptng a lot as a functon of the lot s qualty. Fgure 1 shows a typcal OC Curve. 10 8 6 4 1 3 4

More information

Web Spam Detection Using Machine Learning in Specific Domain Features

Web Spam Detection Using Machine Learning in Specific Domain Features Journal of Informaton Assurance and Securty 3 (2008) 220-229 Web Spam Detecton Usng Machne Learnng n Specfc Doman Features Hassan Najadat 1, Ismal Hmed 2 Department of Computer Informaton Systems Faculty

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

Using Content-Based Filtering for Recommendation 1

Using Content-Based Filtering for Recommendation 1 Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, robn@netlnq.nl 2 Unversty of

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

An RFID Distance Bounding Protocol

An RFID Distance Bounding Protocol An RFID Dstance Boundng Protocol Gerhard P. Hancke and Markus G. Kuhn May 22, 2006 An RFID Dstance Boundng Protocol p. 1 Dstance boundng Verfer d Prover Places an upper bound on physcal dstance Does not

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

A spam filtering model based on immune mechanism

A spam filtering model based on immune mechanism Avalable onlne www.jocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):2533-2540 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A spam flterng model based on mmune mechansm Ya-png

More information

Meta-Analysis of Hazard Ratios

Meta-Analysis of Hazard Ratios NCSS Statstcal Softare Chapter 458 Meta-Analyss of Hazard Ratos Introducton Ths module performs a meta-analyss on a set of to-group, tme to event (survval), studes n hch some data may be censored. These

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

Traffic-light a stress test for life insurance provisions

Traffic-light a stress test for life insurance provisions MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information
