Web Spam Detection Using Machine Learning in Specific Domain Features

Journal of Information Assurance and Security 3 (2008)

Hassan Najadat (1), Ismail Hmeidi (2)
Department of Computer Information Systems, Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan
najadat@just.edu.jo (1), hmeidi@just.edu.jo (2)

Abstract: In the last few years, as Internet usage has become the main artery of daily life, the problem of spam has become very serious for the Internet community. Spam pages form a real threat for all types of users. This threat has evolved continuously with no sign of abating. Different forms of spam have witnessed a dramatic increase in both size and negative impact. A large number of e-mails and web pages are considered spam, either in the Simple Mail Transfer Protocol (SMTP) or in search engines. Many technical methods have been proposed to approach the problem of spam. In e-mail spam detection, Bayesian filters are widely and successfully applied to detect and eliminate spam. The assumption that each term in the document contributes to the filtering task equally to other terms, and the disregard of users' feedback, are major shortcomings that we attempt to overcome in this work. We propose an improved Naïve Bayes classifier that gives weight to the information fed by users and takes into consideration the existence of some domain specific features. Our results show that the improved Naïve Bayes classifier outperforms the traditional one in terms of reducing the false positives and the false negatives and increasing the overall accuracy.

Keywords: Web Spam, Naïve Bayes, Term Frequency Matrix (TFM), Confusion Matrix (CM).

1. Introduction

With the increased advancements in Internet applications and the proliferation of information available to the public, the need for efficient search engines that are able to retrieve the most relevant documents that satisfy users' needs becomes evident.
Received August 20,

From an Information Retrieval (IR) perspective, search engines are responsible for retrieving a set of documents ranked in descending order of relevancy [2]. A common problem in this context is that some documents are given a high rank and retrieved as the first (or one of the top) documents by the search engines although they do not truly deserve it [5]. Several reasons explain this problem. One is the extent to which users know exactly what they are searching for; their knowledge is reflected in the retrieved results. Another important reason is the existence of so-called spam web pages: pages that, from the search engines' point of view, seem to be relevant, but in reality contain no useful information for users [5]. In their discussion of web spam, Castillo et al. [4] defined web spam as any attempt to deceive a search engine's relevancy algorithm, or an action performed with the purpose of influencing the ranking of the page.

Detecting web spam is considered one of the most challenging issues facing search engines and web users [11]. Since search engines are the gates to the World Wide Web, it is important to provide the best possible results in answer to users' queries. Some people, known as spammers, try to mislead the search engines by boosting the rank of their web pages and, as a result, capture users' attention. These pages contain little or no useful information that the user expects to find. Search engines need to detect or filter spam pages to provide high quality results to users (i.e. truly relevant pages). For a search engine to be evaluated as efficient, it should not only return as many documents as possible, but also return relevant documents that are spam-free. Currently, many techniques are applied by search engines to fight spam, such as detecting spam web pages through content analysis [11].
This technique is the most popular technique for spam detection currently used by search engines such as Google; nevertheless, it still fails to find all spam web pages. A separate section is devoted to detailing this technique further.

Spam can be very annoying in the context of search engines for several reasons. First, where there are financial advantages from search engines, the existence of spam pages may lower the chance for legitimate web pages to get the revenue that they might earn in the absence of spam. Second, the search engine may return irrelevant results that users do not expect, and therefore a non-trivial portion of time might be spent online wading through such unwanted pages. Finally, the search engine may waste important resources on spam pages; this includes wasting network bandwidth (crawling), CPU cycles (processing), and storage space (indexing) [11].

Microsoft researchers [11] show that some particular top-level domains are more likely to contain spam than others. For example, .biz (Business) has the greatest percentage of spam, with 70% of all pages being spam; the .us domain comes in second place with 35% spam pages. Moreover, pages written in some particular languages are more likely to be spam than those written in other languages; for instance, pages written in French are the most likely to be spam, with 25% of them being spam.

$03.50 Dynamic Publishers, Inc

Spammers have proved adept at adapting to the different formats available for web pages, and they use several spamming techniques to influence the ranking algorithms of search engines. All these techniques are challenging for web spam detection algorithms, especially for the content-based approach. The two main categories of spamming techniques are term spamming and link spamming [11, 3].

In term spamming, many techniques that modify the content of the page are applied. The content includes the document body, the title, meta tags in the HTML header, anchor texts associated with URLs, and page URLs. Spammers can attach their unsolicited content (i.e. spam) to one or more of these fields, resulting in a new page that can pass the spam filter without raising any doubt about its legitimacy. Among all term spamming techniques, the most popular one is body spamming [11], in which terms are included in the document body; an example is to include specific terms such as "Free grant money", "free installation", "Promise you...!", "free preview", etc. Another way of grouping term spamming techniques is based on the type of terms that are added to the text fields: repeating one or a few specific terms, including a large number of unrelated terms, or phrase stitching, wherein sentences or phrases, possibly from different sources, are glued together [11].

In link spamming, spammers insert links between pages that are present for reasons other than merit [13]. Link spam takes advantage of link-based ranking algorithms, such as Google's PageRank algorithm, which gives a higher ranking to a website that is cited by other highly ranked websites.
In correspondence to the aforementioned spamming techniques, many content-based spam filtering techniques have been proposed. Analyzing the content of a particular web page is important because spammers tend to boost the rank of their web pages by applying spamming techniques to this content [11]. During content analysis, the number of words in the page's body and title, the average length of words, the amount of anchor text, and the keywords in meta tags are analyzed to detect abnormalities that are interpreted as spamming attempts.

The rest of this paper is organized as follows. Section 2 gives an overview of the work related to spam detection. The Naïve Bayes classifier is illustrated in Section 3. Section 4 proposes our approach, and our experimental results are shown in Section 5. Section 6 concludes the paper and provides future directions.

2. Related Work

Due to the important role that web pages play in supporting electronic commerce (e-commerce), web pages have become an enticing target for all kinds of traders and marketers to advertise their products for sale and get-rich-on-the-fly schemes [13, 17], and to get information about pornographic web sites. Moreover, they have become an enticing target for spammers to embed their spam content. SpamCon Inc. [1] estimated the cost induced by resource loss and spam filtering associated with a single unsolicited message at 1$ up to 2$; multiplied by the number of spam messages sent and received every day, the one dollar becomes millions.

Because of the serious problems associated with unsolicited spam content, whether in a single e-mail or in a large web page, a number of automated filtering approaches have been proposed in the literature [16, 10]. These filters were used mainly for e-mail spam and then adapted for use in the context of web spam. Early approaches to spam filtering relied mostly on manually constructed pattern-matching rules that need to be tuned to each user's messages [9].
That is, they allow users to hand-build a rule set consisting of logical rules to detect spam e-mails and web pages. However, these approaches are tedious and problematic, since users need to pay full attention just to build the desired set of rules, which not all users are able to build. In addition, it is a time consuming process, since the generated set of rules must be changed or refined periodically as the nature of spam changes. Because of the problems associated with the manual construction of rules, another approach was proposed in [7] to automatically adapt to the changing nature of spam over time and to provide a system that can learn directly from data already stored in the web server databases. These approaches proved successful when applied to general classification tasks, that is, the classification of e-mail as either spam or non-spam based on its text, with no regard to the existence of domain specific features.

Several machine learning algorithms have been proposed for text categorization (classification) [14, 15]. These approaches were investigated for spam filtering, since it can be viewed as a text categorization problem. In [13], a machine learning algorithm was applied for the purpose of spam filtering. In this algorithm, the filter learns to classify documents into fixed classes (i.e. spam and nonspam) based on their content, after being trained on manually classified documents.

As a variation of the rule-based approaches discussed above, a great deal of work in the literature automatically performs content-based classification. Naïve Bayes classifiers [16] were proposed as a good example of approaches that showed satisfactory results in the context of e-mail spam filtering. The authors of [13] trained a Naïve Bayes classifier on manually classified spam and nonspam messages, reporting surprisingly good results in terms of precision and recall. Our work utilizes Naïve Bayes classifiers based on the context of web pages to detect spam pages automatically.

3. The Naïve Bayes Classifier

A Naïve Bayesian classifier is a simple probabilistic classifier based on applying Bayes' theorem with a strong (naive) independence assumption: all variables A_1, A_2, ..., A_n in a given category C are conditionally independent of each other given C [16]. Depending on the precise nature of the probability model, Naïve Bayes classifiers can be trained very efficiently in a supervised learning setting [6, 12]. Besides Naïve Bayes classifiers, a variety of supervised machine learning algorithms such as Support Vector Machines (SVM) and memory-based learning [4, 8] have been successfully applied and showed satisfactory results in the context of spam filtering.

Although the techniques discussed above perform well in some cases, they still have problems in others. For instance, all types of content-based spam filters have false positives; generally, it is more severe to misclassify a legitimate message as spam than to let a spam message pass the filter [4]. In addition, what is classified as spam by these filters may not truly be so, because spam is a relative concept: what might be considered spam by one person may not be so for another. These limitations are driving factors for us to develop a novel technique for spam filtering that uses a user-oriented feedback mechanism in combination with a Naïve Bayes classifier to reduce the false positives and false negatives encountered by traditional classifiers. In addition, special attention is given to specific domain features (terms and patterns) that may be considered spam discriminators.

The classifiers can be applied to spam filtering, viewed as a text classification problem, as follows. Given a set of training documents D = {t_1, t_2, ..., t_n} and a set of classes C = {C_1, C_2, ..., C_n}, the classification problem is to define a mapping f: D → C where each t_i is assigned to one class. Each document t is represented by a vector of words {w_1, w_2, ..., w_n}. The probability of a word w_i of a given document associated with class C can be written, as in [6], as p(w_i | C).
Since each document consists of a large number of words, the Naïve Bayes classifier makes the simplifying assumption that w_1, w_2, ..., w_n are conditionally independent given the category C:

\[ P(D \mid C) = \prod_{i=1}^{n} p(w_i \mid C) \tag{1} \]

The probability that a given document D belongs to a given class C is represented as P(C | D), which can be computed as

\[ P(C \mid D) = \frac{p(C)\, p(D \mid C)}{P(D)} \tag{2} \]

To estimate the probability that a particular document is spam, given that it contains certain words, Bayes' theorem states that it equals the probability of finding those words in spam documents, times the probability that any document is spam, divided by the probability of finding those words in any document [13]:

\[ P(\text{spam} \mid \text{words}) = \frac{p(\text{words} \mid \text{spam})\, p(\text{spam})}{P(\text{words})} \tag{3} \]

4. Web Spam Detection Classifier

Finding a spam web page is viewed as a supervised text classification problem. In supervised classification, the web spam classifier needs to be trained with a set of web pages that were previously classified into two categories, spam and non-spam. Spam is a relative concept: what is considered spam by one user may not be spam for other users. Moreover, what might be spam for a specific user at a particular time might not be so for the same user at a different time. Depending only on the capabilities of the trained classifier, as many traditional Naïve Bayes classifiers do, therefore seems to be of limited benefit. In the training phase, a user-oriented preparation is performed, wherein the web pages residing on the web server are classified as spam or nonspam based on users' feedback and the automated classification by the filter. Attention is given to the general spam that the majority of users agree upon; all the terms contained in the pages classified as general spam are then extracted to form the General Spam Dictionary. This phase is illustrated in Figure 1.

Figure 1. User Oriented Training (diagram labels: general spam, specific spam, and nonspam pages; filter; feedback)

Therefore, a novel user-oriented training mechanism is needed to support the classifier with periodic user feedback to determine whether a page is considered spam or nonspam. In this training scenario, there are two outcomes: web pages that are judged to be spam by the majority of users, which we call General Spam, and web pages that are viewed as spam by one or a few users, which we call Specific Spam. Our attention is focused on the general spam, because it can contribute efficiently to the process of classification. This novel training mechanism is proposed mainly to function on the server side rather than on the client side. It is an effective and promising mechanism for overcoming the previously mentioned problems associated with web servers affected by spam, especially the problem of wasting resources on spam pages (server bandwidth, CPU cycles, and storage space).

The spam detection system consists of three phases: the training phase, the preprocessing phase, and the classification phase. As shown in Figure 2, the preprocessing phase is the cleaning process, which is applied to each web document to extract its body. In stemming and stop word removal, the frequent words that do not contribute efficiently to the classification process are removed from the body of the page. The stemming operation reduces distinct words to their common stem, which is achieved by removing prefixes and suffixes from words. We chose the affix stemming algorithm for this work. Stemming helps reduce the time required for classification since word length is lessened, but it can reduce the accuracy of classification. The vital step in this phase is generating dictionaries, wherein two different lists of words are generated: the frequent terms dictionary, extracted from those web pages that are classified as non-spam, and the special features dictionary. Each entry in the frequent terms dictionary consists of <term, probability of being spam>. In the classification phase, the preprocessed documents are represented by a Term Frequency Matrix (TFM) structure [5] to perform the statistical analysis (i.e. the Bayesian rule).
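As an illustration of how a Term Frequency Matrix can feed the Bayesian rule of equations (1) and (2), the following is a minimal sketch rather than the authors' implementation. The function names `build_tfm` and `classify`, the toy documents, and the use of add-one smoothing and log probabilities are all assumptions introduced here; the paper does not specify them.

```python
import math
from collections import Counter, defaultdict


def build_tfm(labeled_docs):
    """Build a term frequency matrix in one pass over (tokens, label) pairs."""
    tfm = defaultdict(Counter)   # tfm[label][word] -> frequency of word in that class
    doc_counts = Counter()       # number of training documents per class
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        tfm[label].update(tokens)
    return tfm, doc_counts


def classify(tokens, tfm, doc_counts):
    """Return the class with the highest log posterior, following eq. (1) and (2)."""
    vocabulary = {w for counts in tfm.values() for w in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in tfm.items():
        class_total = sum(counts.values())
        # log p(C); P(D) is the same for every class and can be dropped
        score = math.log(doc_counts[label] / total_docs)
        for w in tokens:
            # log p(w | C) with add-one smoothing (an assumption, not from the paper)
            score += math.log((counts[w] + 1) / (class_total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (hypothetical, not taken from the paper's data set)
training = [
    (["free", "money", "win", "free"], "spam"),
    (["win", "prize", "free"], "spam"),
    (["course", "schedule", "lecture"], "nonspam"),
    (["lecture", "notes", "exam", "schedule"], "nonspam"),
]
tfm, doc_counts = build_tfm(training)
print(classify(["free", "prize"], tfm, doc_counts))
print(classify(["exam", "schedule"], tfm, doc_counts))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is a common design choice for this kind of classifier.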
TFM simplifies the calculation of the probability that a word belongs to class j, P(class_j | word), and also improves the efficiency of the Naïve Bayes classifier, which requires only one scan through the entire training dataset. As shown in Figure 3, each intersection of a word and a class row in the Term Frequency Matrix represents the number of times (the frequency) the word appears in class j.

Assuming that each feature contributes to the process of spam filtering equally to every other feature may lead to inadequate results. In other words, under this assumption no particular feature in the text of the web page provides special evidence as to whether the page is spam or nonspam. However, this assumption does not hold in many real situations. For example, experience (and users' feedback) shows that there are many specific features whose existence provides a strong indication of a suspicious message (spam), such as "free money", "congratulation you are winner number ", "$$$$ you win xxx$$$$", and overused punctuation such as "??????". In addition to these discriminating textual features (patterns), web pages contain many non-textual features that indicate whether a page is spam or not, such as the domain type: .edu, .org, .com, .biz, etc. [4]. Microsoft researchers [11] show that 70% of pages with the domain .biz are spam, and that .edu pages rarely (or never) contain spam. To this end, we consider such (textual and non-textual) features to be good discriminators of spam that assist in a correct classification. To achieve this, we maintain a table (or dictionary), called the specific feature dictionary, consisting of all these specific features; each entry in this table has the form <feature, probability>. It then becomes straightforward to incorporate such additional features into our Naïve Bayes model. When a new web page needs to be classified, a list of all its words is generated and checked against the pre-established features dictionary to determine whether it contains one or more words that are known spam discriminators.
In addition to the comparison with the specific features dictionary, the new web page is compared against the other dictionaries (i.e. the general spam and the frequent terms dictionaries). After each comparison, the probability of spam is computed. As a result, we obtain a probability value that takes into account the existence of domain specific features, the words that are judged (by users) to be spam, and the ordinary terms that might be spam in some cases (according to their probability). Figure 4 depicts the Naïve Bayes classifier with user feedback and domain specific features.
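The three dictionary comparisons could be combined along the following lines. This is a hedged sketch based on the procedure of Figure 4 (per-dictionary products summed into a total spam score), not the authors' code. The function names, the dictionary contents, the probability values, the default non-spam probability of 0.5, and the choice to let a dictionary with no matching word contribute 0 are all assumptions made for illustration. Note that the resulting scores are unnormalised and only meaningful when compared with each other.

```python
from math import prod


def dictionary_score(words, dictionary):
    """Product of P(w | spam) over the words found in one dictionary.

    Assumption: a dictionary that matches no word contributes 0 rather
    than the empty product 1.
    """
    matches = [dictionary[w] for w in words if w in dictionary]
    return prod(matches) if matches else 0.0


def classify_page(words, feature_dict, spam_dict, term_dict,
                  nonspam_probs, p_spam, default_nonspam=0.5):
    """Compare the combined spam score with the non-spam score of a page."""
    p1 = dictionary_score(words, feature_dict)   # domain specific features
    p2 = dictionary_score(words, spam_dict)      # general spam terms (user feedback)
    p3 = dictionary_score(words, term_dict)      # frequent terms
    spam_score = (p1 + p2 + p3) * p_spam
    nonspam_likelihood = prod(nonspam_probs.get(w, default_nonspam) for w in words)
    nonspam_score = nonspam_likelihood * (1.0 - p_spam)
    return "spam" if spam_score > nonspam_score else "nonspam"


# Hypothetical dictionaries; every entry and probability below is made up.
feature_dict = {"$$$$": 0.95, ".biz": 0.7}
spam_dict = {"winner": 0.8, "congratulation": 0.85}
term_dict = {"offer": 0.6, "lecture": 0.1}
nonspam_probs = {"winner": 0.1, "offer": 0.4, "lecture": 0.9, "schedule": 0.8}

print(classify_page(["$$$$", "winner", "offer"],
                    feature_dict, spam_dict, term_dict, nonspam_probs, p_spam=0.5))
print(classify_page(["lecture", "schedule"],
                    feature_dict, spam_dict, term_dict, nonspam_probs, p_spam=0.5))
```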

Figure 2. Web Spam detection system structure with Naïve Bayes Classifier

Figure 3. Term Frequency Matrix associated with individual file terms

5. Experimental Results

Our experiments were all performed on the webspam-UK2006 data set [4]. The training dataset consists of 8,1415 web pages. A detailed description of the data set and the criteria for labeling a web page as spam or nonspam can be found in [4]. In our work, a sample of these web pages is taken to evaluate the classification accuracy of our spam detector. To estimate the accuracy of our proposed algorithm, we use a popular accuracy measure in the context of Information Retrieval, namely the Confusion Matrix (CM). A CM contains information about the actual and predicted classifications made by a classification system [6]. Figure 5 shows the confusion matrix for the two classes spam and non-spam. As in [6], TP denotes documents that are actually positive and classified as positive, FN denotes documents that are actually positive and classified as negative, FP denotes documents that are actually negative and classified as positive, and TN denotes documents that are actually negative and classified as negative.
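The four confusion matrix cells can be tallied from paired actual and predicted labels as in the short sketch below. The function name `confusion_matrix` and the example labels are hypothetical, with spam treated as the positive class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="spam"):
    """Count TP, FN, FP and TN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


# Made-up labels for six documents
actual = ["spam", "spam", "spam", "nonspam", "nonspam", "spam"]
predicted = ["spam", "nonspam", "spam", "nonspam", "spam", "spam"]
cm = confusion_matrix(actual, predicted)
print(cm["TP"], cm["FN"], cm["FP"], cm["TN"])  # prints: 3 1 1 1
```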

1) Build the vocabulary table for the entire set of documents.
2) Calculate P(Spam) and P(Non-spam).
3) For each word w in test document D do
   If (word exists in Feature dictionary) then calculate the probability of document D being spam: \( P_1(D \mid \text{spam}) = \prod P(w \mid \text{spam}) \)
   If (word exists in Spam dictionary) then calculate the probability of document D being spam: \( P_2(D \mid \text{spam}) = \prod P(w \mid \text{spam}) \)
   If (word exists in Term dictionary) then calculate the probability of document D being spam: \( P_3(D \mid \text{spam}) = \prod P(w \mid \text{spam}) \)
4) End for loop
5) \( P_{total}(D \mid \text{spam}) = P_1 + P_2 + P_3 \)
6) Calculate the posterior probability of document D: \( P(\text{spam} \mid D) = P_{total}(D \mid \text{spam}) \cdot P(\text{spam}) \), \( P(\text{non-spam} \mid D) = \prod P(w \mid \text{non-spam}) \cdot P(\text{non-spam}) \)
7) If P(spam | D) > P(non-spam | D) then classify document D as spam, else classify document D as not spam.

Figure 4. Spam Detection Procedure

                          Predicted class
                          Spam     Nonspam
Actual class   Spam       TP       FN
               Nonspam    FP       TN

Figure 5. Confusion Matrix

We use the confusion matrix to calculate the sensitivity and specificity measures. Sensitivity is the true positive ratio, that is, the proportion of positive documents that are correctly identified. Specificity is the true negative ratio, that is, the proportion of negative documents that are correctly identified:

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{4} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{5} \]

where TP (true positives) is the number of positive documents that are correctly classified, TN (true negatives) is the number of negative documents that are correctly classified, FP (false positives) is the number of negative documents that are incorrectly classified as positive, and FN (false negatives) is the number of positive documents that are incorrectly classified as negative [6].

We use the above measurements to compute the accuracy, which is defined as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{6} \]

To evaluate the accuracy, the holdout technique is used to produce a true estimate of the classifier's accuracy. The data are partitioned into two separate sets, a training set and a testing set. The training set is used to train the classifier, and the testing set is used to evaluate the accuracy. We ran our classifier on five different samples and calculated the classification accuracy for each run. For instance, for a sample consisting of 238 testing documents (69 belonging to the Nonspam class and 169 to the Spam class), the sensitivity and specificity are 97% and 66% respectively, and the total accuracy is 88%, computed with equation (6). Other experiments were made on samples consisting of 324, 400, 519 and 618 documents. The accuracy results are shown in Figures 6 and 7; these figures also show the effect of stemming the documents before classification on the accuracy results. Figure 7 indicates that using stemmed documents yields 80% on average, while Figure 6 shows an accuracy of 78%.

Figure 6. Accuracy versus Dataset size (Stemming text) (axes: Accuracy (%) versus Data Size)
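To make the three measures concrete, the sketch below evaluates them for a 238-document sample with 169 spam and 69 nonspam pages. The paper reports only the percentages, so the individual cell counts used here (TP = 164, FN = 5, TN = 46, FP = 23) are hypothetical values chosen to land close to the reported 97%, 66% and 88%; the function names are likewise assumptions.

```python
def sensitivity(tp, fn):
    """True positive ratio: share of spam pages that are caught."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative ratio: share of nonspam pages that are kept."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all pages that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical cell counts: 164 + 5 = 169 spam pages, 46 + 23 = 69 nonspam pages
tp, fn, tn, fp = 164, 5, 46, 23

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")
```

With these counts the specificity comes out slightly above the reported 66%, which is expected because the true cell counts are not given in the paper.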

Figure 7. Accuracy versus Dataset size (Unstemming text)

We also evaluate the classifier performance by concentrating on the number of nonspam pages that are wrongly classified as spam, and the number of spam pages that are wrongly classified as nonspam. The first parameter is called the Nonspam Misclassification Rate (HMR), while the second is called the Spam Misclassification Rate (SMR). In the context of spam filtering, the effect of misclassifying a legitimate message as spam is more severe than that of misclassifying a spam message as legitimate. The accuracy is compared for both the Naïve Bayes Classifier with Domain Specific Features (NBCDSF) and the Naïve Bayes Classifier without User Feedback (NBCUF). The results show that the improved Naïve Bayes classifier with user feedback outperforms the NBCUF in terms of increased accuracy and decreased SMR and HMR rates. The results are shown in Figure 8.

Figure 8. Spam Misclassification Rate results (series: NBCUF, NBCDSF; axes: SMR and Accuracy (%) versus Data Size)

6. Conclusion and Future Work

Web spam pages are an annoying problem that many techniques have tried to prevent; among them are Naïve Bayes classifiers, which have proved to be efficient mechanisms for spam filtering. Detecting spam web pages is one of the major challenges that search engines face in their query results. Search engines should return high quality results in response to users' queries. Many search engines require the integration of sound spam detection to eliminate web pages that affect the page ranking algorithm. Several content-based and machine learning techniques have been proposed to detect spam pages. This paper proposed a Naïve Bayes approach that gives weight to users' feedback to improve the training process of the classifier and that considers the existence of some domain specific features that contribute strongly to spam discrimination, that is, their existence provides evidence that the web page is spam.
This approach is proposed mainly to function on the server side, to reduce the overhead associated with spam pages on the web server. Our work involved the Naïve Bayes classifier in discovering unwanted pages. The experimental results showed that the Naïve Bayes classifier provides an average accuracy of 80.2%. In future work, we will develop an application as a plug-in for an open source browser to work as a detector on online web pages and notify users of spam pages they are currently visiting. We will also optimize the performance of our Naïve Bayes classifier by taking into consideration the word position of the domain specific features, which should contribute to better accuracy. In addition, the improvement will include the user-oriented feedback model to satisfy the needs of many more users.

References

[1] S. Atskins, "Size and Cost of the Problem", In the Proceedings of the Fifty-sixth Internet Engineering Task Force (IETF) Meeting, San Francisco, CA, USA.
[2] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Edinburgh Gate, England.
[3] L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi, "Link Analysis for Web Spam Detection", ACM Trans. Web, 2 (1), pp. 1-42.
[4] C. Castillo, D. Donato, L. Becchetti, P. Boldi, S. Leonardi, M. Santini and S. Vigna, "A Reference Collection for Web Spam", SIGIR Forum, (40).
[5] Z. Gyöngyi and H. Garcia-Molina, "Web Spam Taxonomy", In the Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, Stanford University, May.
[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, New York.
[7] I. Koprinska, J. Poon, J. Clark, and J. Chan, "Learning to Classify E-mail", Information Science, 10(177).
[8] C. Lai, "An Empirical Study of Three Machine Learning Methods for Spam Filtering", Knowledge-Based Systems, 3 (20).
[9] C. Lee, Y. Kim, and P. Rhee, "Web Personalization Expert with Combining Collaborative Filtering and Association Rule Mining Technique", Expert Systems with Applications, 3(21).
[10] J. María, G. Cajigas and E. Puertas, "Content Based SMS Spam Filtering", In the Proceedings of the 2006 ACM Symposium on Document Engineering, ACM.
[11] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly, "Detecting Spam Web Pages Through Content Analysis", In the Proceedings of the 15th International Conference on World Wide Web, ACM.
[12] G. Paul, "Better Bayesian Filtering", In the Proceedings of the 2003 Spam Conference, Jan.
[13] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz, "A Bayesian Approach to Filtering Junk E-mail", AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin, AAAI Technical Report WS.
[14] J. Su and H. Zhang, "Full Bayesian Network Classifiers", In the Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.
[15] B. Yu and Z. Xu, "A Comparative Study for Content-Based Dynamic Spam Classification Using Four Machine Learning Algorithms", Knowledge-Based Systems, 4(21).
[16] H. Zhang and D. Li, "Naïve Bayes Text Classifier", In the Proceedings of the IEEE International Conference on Granular Computing.
[17] L. Zhang, J. Zhu, and T. Yao, "An Evaluation of Statistical Spam Filtering Techniques", ACM Transactions on Asian Language Information Processing, 4(l.3).

Authors' Biographies

Hassan M. Najadat is an Assistant Professor in the Department of Computer Information Systems at Jordan University of Science and Technology in Irbid, Jordan. His research interests are centered on data mining, machine learning, and database systems, and he has authored over nine refereed publications in the areas of clustering, classification, association rules, and text mining. He received his PhD in Computer Science from North Dakota State University in Fargo, USA, his MS in Computer Science from the University of Jordan in Amman, Jordan, and his BS in Computer Science from Mu'tah University in Alkarak, Jordan.

Ismail Hmeidi is an Assistant Professor in the Department of Computer Information Systems at Jordan University of Science and Technology in Irbid, Jordan. His research interests are centered on information retrieval, natural language processing, e-learning, and database systems. He has authored over 13 refereed publications in the areas of query expansion, automatic indexing and text categorization. He received his PhD in Computer Science from Illinois Institute of Technology, USA, and his MS and BS in Computer Science from Eastern Michigan University, U.S.A.


More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Web Object Indexing Using Domain Knowledge *

Web Object Indexing Using Domain Knowledge * Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology xc7@njt.edu Brook Wu New Jersey Insttute of Technology wu@njt.edu ABSTRACT Ths

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Detecting Credit Card Fraud using Periodic Features

Detecting Credit Card Fraud using Periodic Features Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg,

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Comparison of Domain-Specific Lexicon Construction Methods for Sentiment Analysis

Comparison of Domain-Specific Lexicon Construction Methods for Sentiment Analysis , pp.152-156 http://d.do.org/10.14257/astl.2016.135.38 Comparson of Doman-Specfc Lecon Constructon Methods for Sentment Analyss Myeong So Km 1, Jong Woo Km 2,3 and Cu Jng 4 1 Department of Mathematcs,

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES

FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

The Use of Analytics for Claim Fraud Detection Roosevelt C. Mosley, Jr., FCAS, MAAA Nick Kucera Pinnacle Actuarial Resources Inc.

The Use of Analytics for Claim Fraud Detection Roosevelt C. Mosley, Jr., FCAS, MAAA Nick Kucera Pinnacle Actuarial Resources Inc. Paper 1837-2014 The Use of Analytcs for Clam Fraud Detecton Roosevelt C. Mosley, Jr., FCAS, MAAA Nck Kucera Pnnacle Actuaral Resources Inc., Bloomngton, IL ABSTRACT As t has been wdely reported n the nsurance

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently.

To manage leave, meeting institutional requirements and treating individual staff members fairly and consistently. Corporate Polces & Procedures Human Resources - Document CPP216 Leave Management Frst Produced: Current Verson: Past Revsons: Revew Cycle: Apples From: 09/09/09 26/10/12 09/09/09 3 years Immedately Authorsaton:

More information

Effective Network Defense Strategies against Malicious Attacks with Various Defense Mechanisms under Quality of Service Constraints

Effective Network Defense Strategies against Malicious Attacks with Various Defense Mechanisms under Quality of Service Constraints Effectve Network Defense Strateges aganst Malcous Attacks wth Varous Defense Mechansms under Qualty of Servce Constrants Frank Yeong-Sung Ln Department of Informaton Natonal Tawan Unversty Tape, Tawan,

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001

Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001 Proceedngs of the Annual Meetng of the Amercan Statstcal Assocaton, August 5-9, 2001 LIST-ASSISTED SAMPLING: THE EFFECT OF TELEPHONE SYSTEM CHANGES ON DESIGN 1 Clyde Tucker, Bureau of Labor Statstcs James

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center

More information

Overview of monitoring and evaluation

Overview of monitoring and evaluation 540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification IDC IDC A Herarchcal Anomaly Network Intruson Detecton System usng Neural Network Classfcaton ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES ECE Department, New Jersey Inst. of Tech.,

More information

AN E-MAIL FILTERING AGENT BASED ON SUPPORT VECTOR MACHINES

AN E-MAIL FILTERING AGENT BASED ON SUPPORT VECTOR MACHINES BULETINUL INSTITUTULUI POLITEHNIC DIN IAŞI Publcat de Unverstatea Tehncă Gheorghe Asach dn Iaş Tomul LVI (LX), Fasc. 3, 200 SecŃa AUTOMATICĂ ş CALCULATOARE AN E-MAIL FILTERING AGENT BASED ON SUPPORT VECTOR

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

ADVERTISEMENT FOR THE POST OF DIRECTOR, lim TIRUCHIRAPPALLI

ADVERTISEMENT FOR THE POST OF DIRECTOR, lim TIRUCHIRAPPALLI ADVERTSEMENT FOR THE POST OF DRECTOR, lm TRUCHRAPPALL The ndan nsttute of Management Truchrappall (MT), establshed n 2011 n the regon of Taml Nadu s a leadng management school n nda. ts vson s "Preparng

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Construction Rules for Morningstar Canada Target Dividend Index SM

Construction Rules for Morningstar Canada Target Dividend Index SM Constructon Rules for Mornngstar Canada Target Dvdend Index SM Mornngstar Methodology Paper October 2014 Verson 1.2 2014 Mornngstar, Inc. All rghts reserved. The nformaton n ths document s the property

More information

Search Efficient Representation of Healthcare Data based on the HL7 RIM

Search Efficient Representation of Healthcare Data based on the HL7 RIM 181 JOURNAL OF COMPUTERS, VOL. 5, NO. 12, DECEMBER 21 Search Effcent Representaton of Healthcare Data based on the HL7 RIM Razan Paul Department of Computer Scence and Engneerng, Bangladesh Unversty of

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign

PAS: A Packet Accounting System to Limit the Effects of DoS & DDoS. Debish Fesehaye & Klara Naherstedt University of Illinois-Urbana Champaign PAS: A Packet Accountng System to Lmt the Effects of DoS & DDoS Debsh Fesehaye & Klara Naherstedt Unversty of Illnos-Urbana Champagn DoS and DDoS DDoS attacks are ncreasng threats to our dgtal world. Exstng

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

Product Quality and Safety Incident Information Tracking Based on Web

Product Quality and Safety Incident Information Tracking Based on Web Product Qualty and Safety Incdent Informaton Trackng Based on Web News 1 Yuexang Yang, 2 Correspondng Author Yyang Wang, 2 Shan Yu, 2 Jng Q, 1 Hual Ca 1 Chna Natonal Insttute of Standardzaton, Beng 100088,

More information

Traffic-light a stress test for life insurance provisions

Traffic-light a stress test for life insurance provisions MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax

More information

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING FORMAL ANALYSIS FOR REAL-TIME SCHEDULING Bruno Dutertre and Vctora Stavrdou, SRI Internatonal, Menlo Park, CA Introducton In modern avoncs archtectures, applcaton software ncreasngly reles on servces provded

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Data Mining from the Information Systems: Performance Indicators at Masaryk University in Brno

Data Mining from the Information Systems: Performance Indicators at Masaryk University in Brno Data Mnng from the Informaton Systems: Performance Indcators at Masaryk Unversty n Brno Mkuláš Bek EUA Workshop Strasbourg, 1-2 December 2006 1 Locaton of Brno Brno EUA Workshop Strasbourg, 1-2 December

More information

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE

AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE AN APPOINTMENT ORDER OUTPATIENT SCHEDULING SYSTEM THAT IMPROVES OUTPATIENT EXPERIENCE Yu-L Huang Industral Engneerng Department New Mexco State Unversty Las Cruces, New Mexco 88003, U.S.A. Abstract Patent

More information

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME August 7 - August 12, 2006 n Baden-Baden, Germany SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME Vladmr Šmovć 1, and Vladmr Šmovć 2, PhD 1 Faculty of Electrcal Engneerng and Computng, Unska 3, 10000

More information

Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets

Improved Mining of Software Complexity Data on Evolutionary Filtered Training Sets Improved Mnng of Software Complexty Data on Evolutonary Fltered Tranng Sets VILI PODGORELEC Insttute of Informatcs, FERI Unversty of Marbor Smetanova ulca 17, SI-2000 Marbor SLOVENIA vl.podgorelec@un-mb.s

More information

How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network

How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network Journal of Computatonal Informaton Systems 7:5 (2011) 1524-1532 Avalable at http://www.jofcs.com Onlne Wreless Mesh Network Traffc Classfcaton usng Machne Learnng Chengje GU 1,, Shuny ZHANG 1, Xaozhen

More information

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger

Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger Enrchng the Knowledge Sources Used n a Maxmum Entropy Part-of-Speech Tagger Krstna Toutanova Dept of Computer Scence Gates Bldg 4A, 353 Serra Mall Stanford, CA 94305 9040, USA krstna@cs.stanford.edu Chrstopher

More information

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters

Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany edmund.coersmeer@noka.com,

More information

A Novel Auction Mechanism for Selling Time-Sensitive E-Services

A Novel Auction Mechanism for Selling Time-Sensitive E-Services A ovel Aucton Mechansm for Sellng Tme-Senstve E-Servces Juong-Sk Lee and Boleslaw K. Szymansk Optmaret Inc. and Department of Computer Scence Rensselaer Polytechnc Insttute 110 8 th Street, Troy, Y 12180,

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Network Security Situation Evaluation Method for Distributed Denial of Service

Network Security Situation Evaluation Method for Distributed Denial of Service Network Securty Stuaton Evaluaton Method for Dstrbuted Denal of Servce Jn Q,2, Cu YMn,2, Huang MnHuan,2, Kuang XaoHu,2, TangHong,2 ) Scence and Technology on Informaton System Securty Laboratory, Bejng,

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry patrck@mcsharry.net www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

Towards a Global Online Reputation

Towards a Global Online Reputation Hu L Unversty of Ottawa 55 Laurer Ave E Ottawa, ON KN 6N5 Canada + (63) 562 5800, 8834 Hl03@uottawa.ca Towards a Global Onlne Reputaton Morad Benyoucef Unversty of Ottawa 55 Laurer Ave E Ottawa, ON KN

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Planning for Marketing Campaigns

Planning for Marketing Campaigns Plannng for Marketng Campagns Qang Yang and Hong Cheng Department of Computer Scence Hong Kong Unversty of Scence and Technology Clearwater Bay, Kowloon, Hong Kong, Chna (qyang, csch)@cs.ust.hk Abstract

More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall
