How To Train A Spam Classifier


International Journal on Artificial Intelligence Tools, Vol. XX, No. X (2006) 20
World Scientific Publishing Company

WORDS VS. CHARACTER N-GRAMS FOR ANTI-SPAM FILTERING

IOANNIS KANARIS, KONSTANTINOS KANARIS, IOANNIS HOUVARDAS, and EFSTATHIOS STAMATATOS
Dept. of Information and Communication Systems Eng., University of the Aegean, Karlovassi, Samos, Greece

Received (Day Month Year)
Revised (Day Month Year)
Accepted (Day Month Year)

The increasing number of unsolicited messages (spam) reveals the need for the development of reliable anti-spam filters. The vast majority of content-based techniques rely on word-based representation of messages. Such approaches require reliable tokenizers for detecting the token boundaries. As a consequence, a common practice of spammers is to attempt to confuse tokenizers using unexpected punctuation marks or special characters within the message. In this paper we explore an alternative low-level representation based on character n-grams which avoids the use of tokenizers and other language-dependent tools. Based on experiments on two well-known benchmark corpora and a variety of evaluation measures, we show that character n-grams are more reliable features than word-tokens despite the fact that they increase the dimensionality of the problem. Moreover, we propose a method for extracting variable-length n-grams which produces optimal classifiers among the examined models under cost-sensitive evaluation.

Keywords: anti-spam filtering, machine learning, n-grams.

1. Introduction

Nowadays, e-mail is one of the cheapest and fastest available means of communication. However, a major problem of any internet user is the increasing volume of unsolicited commercial e-mail, or spam. Spam messages waste both the valuable time of users and important bandwidth of internet connections. Moreover, they are usually associated with annoying material (e.g., pornographic site advertisements) or the distribution of computer viruses.
Hence, there is an increasing need for effective anti-spam filters that either automate the detection and removal of spam messages or inform the user of potential spam messages. Early spam filters were based on blacklists of known spammers and handcrafted rules for detecting typical spam phrases (e.g., "free pics"). Developing such filters is a time-consuming procedure. Moreover, they can easily be fooled by forged addresses or by variations of known phrases that are still readable for a human (e.g., "f*r*e*e"). Hence, new rules have to be incorporated continuously to maintain the effectiveness of the filter.

Recent advances in applying machine learning techniques to text categorization inspired researchers to develop content-based spam filters. In more detail, a collection of both known spam and legitimate (non-spam or ham) messages is used by a supervised learning algorithm (e.g., decision trees, support vector machines, etc.) to develop a model for automatically classifying new, unseen messages into one of these two categories. That way, it is easy to develop personalized filters suitable for either a specific user or a mailing list moderator.

Spam detection is not a typical text categorization task since it has some intriguing characteristics. In particular, both spam and legitimate messages can cover a variety of topics and genres. In other words, neither class is homogeneous. Moreover, the length of messages varies from a couple of text lines to dozens of text lines. In addition, a message may contain grammatical errors and strange abbreviations (sometimes intentionally used by spammers in order to fool anti-spam filters). Therefore, the learning model should be robust in such conditions. Furthermore, besides the content of the body of the messages, useful information can be found in the address of the sender, attachments, etc. Such additional non-textual information can considerably assist the effectiveness of spam filters. [2] Last, but not least, spam detection is a cost-sensitive procedure. In the case of a fully-automated anti-spam filter, the cost of characterizing a legitimate message as spam is much higher than that of letting a few spam messages pass. This crucial fact should be considered when evaluating spam detection approaches.

All supervised learning algorithms require a suitable representation of the messages, usually in the form of an attribute vector. So far, the vast majority of machine learning approaches to spam detection use the bag of words representation, that is, each message is considered as a set of words that occur a certain number of times. [2, 3, 4, 5] Putting it another way, the context information for a word is not taken into account. Word-based text representations require language-dependent tools, such as a tokenizer (to split the message into tokens) and usually a lemmatizer plus a list of stop words (to reduce the dimensionality of the problem). A common practice of spammers is to attempt to confuse tokenizers, using structures such as "f.r.e.e.", "f-r-e-e", "f r e e", etc. Moreover, effective lemmatizers are not available for every natural language, especially for morphologically rich languages. On the other hand, word n-grams, i.e., contiguous sequences of words, have also been examined. [6] Such approaches attempt to take advantage of contextual phrasal information (e.g., "buy now") that distinguishes spam from legitimate messages. However, word n-grams considerably increase the dimensionality of the problem and the results so far are not encouraging.

In this paper, we focus on a different but simple text representation. In particular, each message is considered as a bag of character n-grams, that is, strings of length n. For example, the character 4-grams of the beginning of this paragraph would be (a): In_t, n_th, _thi, this, his_, is_p, s_pa, _pap, pape, aper, etc. Character n-grams are able to capture information on various levels: lexical ("the_", "free"), word-class ("ed_", "ing_"), punctuation mark usage ("!!!", "f.r."), etc. In addition, they are robust to grammatical errors (e.g., the word-tokens "assignment" and "asignment" share the majority of their character n-grams) and to strange usage of abbreviations, punctuation marks, etc. The bag of character n-grams representation is language-independent and does not require any text preprocessing (tokenizer, lemmatizer, or other "deep" NLP tools). It has already been used in several tasks including language identification, [7] authorship attribution, [8] and topic-based text categorization [9] with remarkable results in comparison to word-based representations.

(a) We use _ to denote a single space character.

An important characteristic of character-level n-grams is that they avoid (at least to a great extent) the problem of sparse data that arises when using word-level n-grams. That is, there are far fewer character combinations than word combinations; therefore, fewer n-grams will have zero frequency. On the other hand, the proposed representation still produces a considerably larger feature set in comparison with traditional bag of words representations. Therefore, learning algorithms able to deal with high dimensional spaces should be used. Support Vector Machines (SVM) is a supervised learning algorithm based on the structural risk minimization principle. [10] One of the most remarkable properties of SVMs is that their learning ability is independent of the feature space dimensionality, because they measure the complexity of the hypotheses based on the margin with which they separate the data, instead of the number of features. The application of SVMs to text categorization tasks has shown the effectiveness of this approach when dealing with high dimensional and sparse data.

In this paper, we compare character n-gram representations with traditional word-based representations in the framework of content-based anti-spam filtering. No extra information coming from the address of the sender, attachments, etc. is taken into account.
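The bag of character n-grams described above needs no tokenizer: a sliding window over the raw text is enough. The following is a minimal sketch, not the authors' implementation; the function name `char_ngrams` is an illustrative assumption, and the replacement of spaces by "_" only mirrors the notation used in the 4-gram example.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`.

    Spaces are shown as '_' purely for readability, following the
    notation of the paper's example; no tokenization is performed.
    """
    marked = text.replace(" ", "_")
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


# The 4-grams at the start of the paragraph "In this paper, ..."
print(char_ngrams("In this paper", 4))

# Robustness to misspellings: the two spellings share most 3-grams.
shared = set(char_ngrams("assignment", 3)) & set(char_ngrams("asignment", 3))
print(sorted(shared))
```

Because the window slides over every character, a single misspelled letter only disturbs the few n-grams that overlap it, which is the robustness property the text refers to.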
Experiments on two publicly available corpora using a variety of cost-sensitive evaluation measures provide strong evidence that character n-gram representations produce more effective models than word-based representations. Moreover, we propose a method for extracting variable-length character n-grams and show that this representation produces optimal classifiers among the examined models when considering a cost-sensitive evaluation.

The rest of this paper is organized as follows: Section 2 includes related work on content-based anti-spam filtering. Section 3 describes the character n-gram representation while Section 4 comprises the method for variable-length character n-gram selection. Section 5 gives an overview of the evaluation measures we used and Section 6 describes the corpora and the performed experiments. Finally, Section 7 summarizes the conclusions drawn from this study and indicates future work directions.

2. Related Work

Probably the first study employing machine learning methods for anti-spam filtering was published in the late 1990s. [2] A Bayesian classifier was trained on manually categorized legitimate and spam messages and its performance on unseen cases was remarkable.

Since then, several machine learning algorithms have been tested on this task, including boosting decision trees and support vector machines, [5] memory-based algorithms, [4] and ensembles of classifiers based on stacking. [2] On the other hand, a number of text representations have been proposed, dealing mainly with word tokens and inspired by information retrieval research. One common method is to use binary attributes corresponding to word occurrence. [2, 4] Alternative methods include word (term) frequencies, [6] tf-idf, [5] and word-position-based attributes. [3] The dimensionality of the resulting attribute vectors is usually reduced by removing attributes that correspond to words occurring only a few times. Recent work [3] has shown that the removal of the most frequent words (like "and", "to", etc.) considerably improves the classification accuracy. Another common practice is to use a lemmatizer [3] for converting each word-type to its lemma (e.g., "copies" becomes "copy"). Naturally, the performance of the lemmatizer affects the accuracy of the filter and makes the method language-dependent. Finally, the dimensionality of the attribute vector can be further reduced by applying a feature selection method [4] that ranks the attributes according to their significance in distinguishing between the two classes. Only a predefined number of top-ranked attributes are then used in the learning model.

In addition, word n-grams have also been proposed [6, 3] but, so far, the results are not encouraging. Although such a representation captures phrasal information, which is sometimes particularly crucial, the dimensionality of the problem increases significantly. Moreover, the sparse data problem arises since there are many word combinations with low frequency of occurrence. Recently, language modeling techniques have also been tested for anti-spam filtering [5, 6] with promising results.

In 2005, the Text REtrieval Conference (TREC) expanded its tracks with the addition of the spam track, aiming at providing a standard evaluation of anti-spam filtering approaches.
To this end, a collection of evaluation corpora, [7] both public and private, was compiled and a methodology for filter evaluation was developed. [8] Notably, the TREC spam track focused on the ability of the filter to evolve and improve its performance with use.

A few recent studies attempt to utilize a character-level representation of messages. In Ref. (9) a suffix-tree approach is described which outperforms a traditional Bayesian classifier based on a bag of words representation. On the other hand, a representation based on the combination of character 2-grams and 3-grams is proposed in Ref. (20). However, preliminary results in a categorization task (where many message classes are available) show that approaches based on word-based representations perform slightly better. Finally, one of the best performing participant systems in the TREC 2005 spam track was based on compression models working on the character level. [2]

Research in spam detection was considerably assisted by publicly available benchmark corpora, which allow different approaches to be evaluated on the same testing ground. An important issue is that legitimate messages usually contain personal information of the users which should not become publicly available. One solution is to collect legitimate messages from mailing lists (e.g., Ling-Spam (b)) or directly from users willing to donate them (e.g., SpamAssassin (c)). Another solution is to attempt to obscure information about senders and receivers, [5] or to encode the words of the body of the messages so that they become unreadable. [6] Recently, the publicly-available Enron corpus has been used as a source of legitimate messages.

3. The Bag-of-Character N-grams Approach

First, for a given n, we extract the L most frequent character n-grams of the training corpus. Let <g_1, g_2, ..., g_L> be the ordered list (in decreasing frequency) of the most frequent n-grams of the training corpus. Then, each message is represented as a vector of length L, <x_1, x_2, ..., x_L>, where x_i depends on g_i. In more detail, we examine two representations:

Binary: The value of x_i may be 1 (if g_i is included at least once in the message) or 0 (if g_i is not included in the message).
Term Frequency (TF): The value of x_i corresponds to the frequency of occurrence (normalized by the message length) of g_i in the message.

The produced vectors can be arbitrarily long. On one hand, if L is too small, the messages are not represented adequately. On the other hand, if L is too large, the dimensionality of the problem increases significantly. In the experiments described in the following sections, all the character n-grams that appear more than 3 times in the training corpus are taken into account.

A feature selection method can then be applied to the resulting vectors, so that only the most significant attributes contribute to the classification model. A feature selection method that proved to be quite effective for text categorization tasks is information gain. [4] The information gain of a feature x_i is defined as the expected reduction in entropy by taking x_i as given:

$$IG(C, x_i) = H(C) - H(C \mid x_i) \qquad (1)$$

where C denotes the class of the message (C ∈ {spam, legitimate}) and H(C) is the entropy of C. In other words, IG(C, x_i) is the information gained by knowing x_i.
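The binary and TF representations and the information gain of Eq. (1) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the experiments: all function names are hypothetical, TF is normalized by the number of n-grams in the message (one plausible reading of "normalized by the message length"), and information gain is computed for discrete (e.g., binary) attribute values only.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def most_frequent_ngrams(messages, n, L):
    """The L most frequent n-grams of the corpus, in decreasing frequency."""
    counts = Counter(g for msg in messages for g in char_ngrams(msg, n))
    return [g for g, _ in counts.most_common(L)]


def vectorize(message, vocabulary, n, binary=True):
    """Represent `message` as a vector over `vocabulary` (binary or TF)."""
    counts = Counter(char_ngrams(message, n))
    if binary:
        return [1 if counts[g] > 0 else 0 for g in vocabulary]
    total = sum(counts.values()) or 1  # assumption: normalize by n-gram count
    return [counts[g] / total for g in vocabulary]


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(C, x) = H(C) - H(C | x) for one discrete attribute x."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [c for x, c in zip(values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional
```

An attribute that splits the classes perfectly gets the full class entropy as its gain, while an attribute that is independent of the class gets zero, which is why sorting by this score pushes discriminative n-grams to the top.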
Information gain helps us sort the features according to their significance in distinguishing between spam and legitimate messages. Only the first m most significant attributes are then taken into account. The produced vectors (of length m) of the training set are used to train an SVM classifier. The Weka [23] implementation of SVM was used (default parameters were set in all reported experiments).

(b) Available at:
(c) Available at:

4. Variable-length N-gram Selection

The approach described above is able to provide fixed-length character n-grams. In this paper we also propose a method for extracting variable-length character n-gram features based on an existing approach for extracting multiword terms (i.e., word n-grams of variable length) from texts. The original approach aimed at information retrieval applications. [24] In this study, we slightly modified this approach in order to apply it to character n-grams. The main idea is to compare each n-gram with similar n-grams (either immediately longer or shorter) and keep the dominant n-grams. Therefore, we need a function able to express the "glue" that sticks the characters together within an n-gram. For example, the glue of the n-gram "the_" will be higher than the glue of the n-gram "thea".

4.1. Dominant N-gram Extraction

To extract the dominant character n-grams in a corpus we modified the algorithm LocalMaxs introduced in Ref. (24). It is an algorithm that computes local maxima by comparing each n-gram with similar n-grams. Given that:

g(C) is the glue of n-gram C, that is, the power holding its characters together;
ant(C) is an antecedent of an n-gram C, that is, a shorter string of size n-1;
succ(C) is a successor of C, that is, a longer string of size n+1, i.e., having one extra character on either the left or the right side of C;

the dominant n-grams are selected according to the following rules:

$$\begin{aligned} &\text{if } C.\mathit{length} > 3: && g(C) \ge g(ant(C)) \;\wedge\; g(C) > g(succ(C)), \quad \forall\, ant(C),\, succ(C) \\ &\text{if } C.\mathit{length} = 3: && g(C) > g(succ(C)), \quad \forall\, succ(C) \end{aligned} \qquad (2)$$

In this study, we only consider 3-grams, 4-grams, and 5-grams as candidate n-grams. Note that 3-grams are only compared with successor n-grams. Moreover, 5-grams are only compared with antecedent n-grams. So, it is expected that the proposed algorithm will favor 3-grams and 5-grams over 4-grams.

4.2. Representing the Glue

To measure the glue holding the characters of an n-gram together, various measures have been proposed, including specific mutual information, [25] the φ² measure, [26] etc.
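The selection rules of Eq. (2) do not depend on which glue measure is used, so they can be sketched with the glue values passed in as a plain dictionary. The sketch below is an assumption-laden illustration rather than the modified LocalMaxs code of the paper: the function name, the dictionary interface, and the toy glue values are all hypothetical, and antecedents or successors that are not among the candidates are simply ignored.

```python
def dominant_ngrams(glue, min_len=3, max_len=5):
    """Select dominant n-grams following the rules of Eq. (2).

    `glue` maps each candidate n-gram (min_len..max_len characters)
    to its glue value g(C).
    """
    # Index successors: an (n+1)-gram is a successor of the two n-grams
    # obtained by dropping its last or its first character.
    successors = {}
    for gram in glue:
        if len(gram) > min_len:
            for ant in {gram[:-1], gram[1:]}:
                successors.setdefault(ant, []).append(gram)

    selected = []
    for gram, g in glue.items():
        keep = True
        if len(gram) > min_len:
            # Glue must be at least that of every candidate antecedent.
            ants = [a for a in {gram[:-1], gram[1:]} if a in glue]
            keep = all(g >= glue[a] for a in ants)
        if keep and len(gram) < max_len:
            # Glue must strictly exceed that of every candidate successor.
            keep = all(g > glue[s] for s in successors.get(gram, []))
        if keep:
            selected.append(gram)
    return sorted(selected)


# Toy glue values (hypothetical, for illustration only).
toy_glue = {"the": 0.5, "he_": 0.6, "hea": 0.2, "the_": 0.9, "thea": 0.1}
print(dominant_ngrams(toy_glue))  # ['hea', 'the_']
```

In the toy example "the_" absorbs both "the" and "he_" because its glue is higher than theirs, whereas "thea" is rejected in favor of its antecedents.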
In this study, we adopt the Symmetrical Conditional Probability (SCP) proposed in Ref. (27). The SCP of a bigram xy is the product of the conditional probabilities of each given the other:

$$SCP(x, y) = p(x \mid y)\, p(y \mid x) = \frac{p(x, y)}{p(y)} \cdot \frac{p(x, y)}{p(x)} = \frac{p(x, y)^2}{p(x)\, p(y)} \qquad (3)$$
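As a rough illustration of how such a glue value could be computed, the sketch below evaluates the squared probability of a string divided by the average, over all split points, of the product of the probabilities of its two parts. For a bigram there is a single split point, so the result coincides with Eq. (3). The function name `fair_scp` and the `prob` dictionary (string to estimated probability) are assumptions made for this example; how the probabilities are estimated from a corpus is left open.

```python
def fair_scp(gram, prob):
    """Glue of `gram` from string probabilities.

    `prob` maps a string to its estimated probability and must contain
    `gram` and every prefix/suffix pair obtained by splitting it.
    For a two-character string this reduces to the bigram SCP.
    """
    n = len(gram)
    if n < 2:
        raise ValueError("glue is only defined for strings of length >= 2")
    # Average p(left) * p(right) over the n-1 possible split points.
    avg = sum(prob[gram[:i]] * prob[gram[i:]] for i in range(1, n)) / (n - 1)
    return prob[gram] ** 2 / avg


# Hypothetical probabilities, for illustration only.
p = {"t": 0.5, "h": 0.25, "e": 0.4, "th": 0.2, "he": 0.2, "the": 0.1}
print(fair_scp("th", p))   # bigram case: p(th)^2 / (p(t) * p(h))
print(fair_scp("the", p))  # averaged over the splits t*he and th*e
```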

Given a character n-gram c_1...c_n, a dispersion point defines two subparts of the n-gram. An n-gram of length n contains n-1 possible dispersion points (e.g., if * denotes a dispersion point, then the 3-gram "the" has two dispersion points: "t*he" and "th*e"). Then, the SCP of the n-gram c_1...c_n given the dispersion point c_1...c_{n-1} * c_n is:

$$SCP((c_1 \ldots c_{n-1}), c_n) = \frac{p(c_1 \ldots c_n)^2}{p(c_1 \ldots c_{n-1})\, p(c_n)} \qquad (4)$$

The SCP measure can be easily extended to account for any possible dispersion point (since this measure is based on fair dispersion point normalization, it will be called fairSCP). Hence the fairSCP of the n-gram c_1...c_n is as follows:

$$fairSCP(c_1 \ldots c_n) = \frac{p(c_1 \ldots c_n)^2}{\dfrac{1}{n-1} \sum_{i=1}^{n-1} p(c_1 \ldots c_i)\, p(c_{i+1} \ldots c_n)} \qquad (5)$$

5. Evaluation Measures

5.1. Total Cost Ratio

Two well known measures from the information retrieval community, recall and precision, can describe in detail the effectiveness of a spam detection approach. In more detail, given that n_{S→S} is the number of spam messages correctly recognized, n_{S→L} is the number of spam messages incorrectly categorized as legitimate, and n_{L→S} is the number of legitimate messages incorrectly classified as spam, spam recall and spam precision can be defined as follows:

$$\text{Spam Recall} = \frac{n_{S \to S}}{n_{S \to S} + n_{S \to L}} \qquad (6)$$

$$\text{Spam Precision} = \frac{n_{S \to S}}{n_{S \to S} + n_{L \to S}} \qquad (7)$$

In intuitive terms, spam recall is an indication of filter effectiveness (the higher the recall, the fewer spam messages pass) while spam precision is an indication of filter safety (the higher the precision, the fewer legitimate messages are blocked). However, spam detection is a cost-sensitive classification task: it is much worse to misclassify a legitimate message as spam than vice versa. Therefore, we need an evaluation measure that incorporates an indication of this cost. A cost factor λ is assigned to each legitimate message, that is, each legitimate message is considered as λ messages. [3, 4] In other words, if a legitimate message is misclassified, λ errors occur. A cost-sensitive evaluation measure, the Total Cost Ratio (TCR), can then be defined [3, 4] as follows:

$$TCR = \frac{n_{S \to S} + n_{S \to L}}{\lambda \cdot n_{L \to S} + n_{S \to L}} \qquad (8)$$

The higher the TCR, the better the performance of the approach. In addition, if TCR is lower than 1, then the filter should not be used (the cost of blocking legitimate messages is too high). To be in accordance with previous studies, three cost scenarios were examined:

Low cost scenario (λ=1): This corresponds to an anti-spam filter that lets a message classified as spam reach the mailbox of the receiver along with a warning that the message is probably spam.
Medium cost scenario (λ=9): This corresponds to an anti-spam filter that blocks a message classified as spam and informs the sender to resend the message.
High cost scenario (λ=999): This corresponds to a fully-automated filter that deletes a message classified as spam without notifying either the receiver or the sender.

5.2. ROC Graphs

Receiver Operating Characteristics (ROC) graphs, originating from signal detection theory, [28] are a useful tool for visualizing the performance of classifiers. Since they offer a reliable representation of the classifier properties under imbalanced class distribution and unequal classification error costs, they are especially popular in the machine learning community. [29] A ROC graph depicts relative trade-offs between benefits and costs. In more detail, let n_{L→L} be the number of legitimate messages correctly classified as legitimate, and let the false positive (fp) rate and true positive (tp) rate be defined as follows:

$$\text{fp rate} = \frac{n_{L \to S}}{n_{L \to L} + n_{L \to S}} \qquad (9)$$

$$\text{tp rate} = \frac{n_{S \to S}}{n_{S \to S} + n_{S \to L}} \qquad (10)$$

Then the performance of a classifier is depicted in a two-dimensional space in which the false positive rate is plotted on the x axis and the true positive rate is plotted on the y axis. Alternatively, the two axes may correspond to ham misclassification (= fp rate) and spam misclassification (= 1 - tp rate), respectively. [8] Given that the classifier can assign a probability or score to an instance, a curve is plotted in the ROC space by varying the threshold used to produce binary classification results.
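The measures of this section all derive from the same four confusion counts, which a short sketch makes explicit. The function name `spam_metrics` and the example counts are hypothetical; the counts merely add up to a corpus with 481 spam and 2,412 legitimate messages and are not results from the paper.

```python
def spam_metrics(n_ss, n_sl, n_ls, n_ll, lam=1):
    """Evaluate a filter from its four confusion counts.

    n_ss: spam classified as spam        n_sl: spam classified as legitimate
    n_ls: legitimate classified as spam  n_ll: legitimate classified as legitimate
    lam:  cost factor for misclassifying a legitimate message
    Assumes at least one spam message, one legitimate message, and one
    message classified as spam, so that no ratio divides by zero.
    """
    weighted_errors = lam * n_ls + n_sl
    return {
        "recall": n_ss / (n_ss + n_sl),
        "precision": n_ss / (n_ss + n_ls),
        # A filter without errors has an unbounded TCR.
        "tcr": float("inf") if weighted_errors == 0
               else (n_ss + n_sl) / weighted_errors,
        "fp_rate": n_ls / (n_ll + n_ls),
        "tp_rate": n_ss / (n_ss + n_sl),
    }


# Hypothetical counts: 11 missed spam messages, 2 blocked legitimate ones.
for lam in (1, 9, 999):
    print(lam, round(spam_metrics(470, 11, 2, 2410, lam)["tcr"], 2))
```

With these illustrative counts the same filter is clearly useful for λ=1 and λ=9 but falls below TCR = 1 for λ=999, since two blocked legitimate messages already weigh as much as 1,998 errors.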
An important property of ROC graphs is that they are insensitive to changes in class distribution. [29] That is, if the proportion of spam to ham messages changes, the ROC curve of a classifier remains the same. Note that recall-precision graphs are affected drastically by such changes. For spam filtering applications, this is a crucial factor since the amount of spam messages that reaches a specific mailbox continuously changes. Another interesting property of ROC graphs is that they are able to produce a single value that represents the expected performance. The most common method is to calculate the area under the ROC curve (AUC). The AUC of a classifier corresponds to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. Finally, the operating conditions of a classifier (i.e., different classification error costs) may be translated into iso-performance lines in the ROC space, that is, lines with the same slope. [29] Let λ be the cost of misclassifying a ham message. The slope of the iso-performance lines is λ times the ratio of legitimate to spam messages:

$$slope = \lambda \cdot \frac{n_{L \to L} + n_{L \to S}}{n_{S \to S} + n_{S \to L}} \qquad (11)$$

Lines closer to the upper left corner of the ROC space correspond to better classifiers. Given the ROC curves of a set of classifiers, a classifier may be optimal if and only if it lies on the ROC convex hull (ROCCH). If the operating conditions of the classifier change, the ROCCH remains the same but a different portion of the ROCCH should be examined for identifying the optimal classifier.

6. Experiments

6.1. Corpora and Settings

In this study we rely on two widely-used corpora to evaluate the usefulness of character n-grams for spam filtering. The first corpus is Ling-Spam, consisting of 2,893 e-mails: 481 spam messages and 2,412 legitimate messages taken from postings of a mailing list about linguistics. This corpus has a relatively low spam rate (16%) and the legitimate messages are not as heterogeneous as the messages found in the personal inbox of a specific user. However, it has already been widely used in previous studies [3, 4, 9] and comparison of our results with word-based methods is feasible. Moreover, it provides evidence about the effectiveness of our approach as assistance to mailing list moderators. The "bare" version of this corpus was used (no lemmatizing or stop-word removal was performed) so that accurate character n-gram frequencies could be extracted.
Unfortunately, this corpus was already converted to lower case, so it was not possible to explore the significance of upper case characters. The second corpus is a part of the publicly-available SpamAssassin corpus. The legitimate messages of this corpus were collected from public forums as well as from direct donations by specific users. In more detail, the corpus we used consists of 2,000 ham and 1,500 spam messages. This corpus has been preprocessed in order to remove attachments, headers, and HTML tags. Since it has been constructed by many individual users, the legitimate messages are expected to be more heterogeneous than the messages found in the personal mailbox of a single user. Moreover, since the messages are in their original form, it is possible to explore whether or not case-sensitive character n-grams perform better than case-insensitive character n-grams.

In all the experiments described below, a ten-fold cross-validation procedure was followed. That is, the entire corpus was divided into ten equal parts; in each fold a different part is used as the test set and the remaining parts as the training set. Final results come from averaging the results of each fold. Finally, in all cases, an SVM classifier was used as the learning algorithm with default values (linear kernel, C=1).

[Figure 1: plots of spam recall (%) and spam precision (%) for 3-grams, 4-grams, and 5-grams]
Fig. 1. Spam recall and spam precision of the proposed approach based on character 3-grams, 4-grams, and 5-grams and varying number of attributes on the Ling-Spam corpus. Top: binary attributes. Bottom: TF attributes.

6.2. Results on Ling-Spam

Three sets of experiments were performed based on character 3-gram, 4-gram, and 5-gram representations, respectively. In all three cases, both binary and TF attributes were examined. Moreover, different values of m, the number of attributes left after the feature selection procedure, were tested (m starts from 250 and then varies from 500 to 4000 by 500). For evaluating the performance of the classifiers we use the recall, precision, and TCR measures in order to be able to compare them with reported results of previous word-based studies on the same corpus.
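The experimental protocol described above (ten-fold cross-validation repeated for each value of m) can be outlined as follows. This is only a sketch of the bookkeeping: the names `ten_fold_splits` and `M_VALUES` are hypothetical, the strided assignment of messages to folds is an assumption (the text does not say how the ten parts were formed), and the training and scoring of the SVM are deliberately left out.

```python
def ten_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Items are assigned to folds in a strided fashion, so fold sizes
    differ by at most one and every item is tested exactly once.
    """
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# Values of m examined on Ling-Spam: 250, then 500 to 4000 in steps of 500.
M_VALUES = [250] + list(range(500, 4001, 500))

for m in M_VALUES:
    for train, test in ten_fold_splits(2893):
        # Here one would select the top-m attributes on `train` only,
        # fit the classifier, and score it on `test`; the per-fold
        # results are then averaged.
        pass
```

Selecting the attributes inside each fold, on the training part only, keeps information from the test messages out of the feature set.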

[Figure 2: TCR plots for 3-grams, 4-grams, and 5-grams]
Fig. 2. Results of cost-sensitive evaluation on the Ling-Spam corpus. TCR values for λ=1 (top), λ=9 (middle), and λ=999 (bottom) and varying number of attributes and n-gram length. Left column: binary attributes. Right column: TF attributes.

Table 1. Comparison of cost-sensitive evaluation (λ=1, 9, and 999) of the proposed approach with previously published results on Ling-Spam. Best reported results for spam recall, spam precision, and TCR are given. ST results refer to a sub-corpus of Ling-Spam.

Approach λ Recall Precision TCR
NB % 99.02% 5.4
MBL % 97.39% 7.8
SG % 98.70% 8.60
ST % 100%
Proposed 3, % 99.60%
NB % 99.45% 3.82
MBL % 98.79% 3.64
SG % 98.80% 4.08
ST % 98.89% 9.0
Proposed 9 3, % 99.60% 9.76
NB % 100% 2.86
MBL % 100% 2.49
ST % 100%
Proposed 999 3, % 99.60% 0.25

The results of the application of our approach to Ling-Spam are shown in Fig. 1. As can be seen, for binary attributes, 4-grams seem to provide the most reliable representation (for m>2000). On the other hand, for TF attributes there is no clear winner. More significantly, binary attributes seem to provide better spam precision results while TF attributes are better in terms of spam recall. In most cases, spam recall was higher than 97% while, at the same time, spam precision was higher than 98%. Moreover, a few thousand features are required to get these results. This is in contrast to previous word-based approaches that deal with a limited number (a few hundred) of attributes.

The results of the cost-sensitive evaluation are shown in Fig. 2. In particular, TCR values for 3-grams, 4-grams, and 5-grams are given for varying number of attributes. Results are given for both binary and TF attributes as well as the three evaluation scenarios (λ=1, 9, and 999, respectively). As can be seen, in all three scenarios, a representation based on character 4-grams with binary attributes provides the best results. This holds for a relatively high number of attributes (m>2500). For λ=1 and λ=9 the TCR results are well above 1, indicating the effectiveness of the filter. On the other hand, for λ=999, the TCR results are less than 1, indicating that the filter should not be used at all. However, it is difficult for this scenario to be used in practice.
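Why a precision just below 100% is enough to push TCR under 1 in the high cost scenario can be checked with a little algebra. Writing R for spam recall and P for spam precision, the definitions of recall and precision give n_{S→L} = (1 - R)·N_S and n_{L→S} = R·N_S·(1 - P)/P, where N_S is the number of spam messages, so that TCR = 1 / (λ·R·(1 - P)/P + 1 - R). The sketch below evaluates this expression; the function name is hypothetical and the recall value of 0.97 is only an illustrative figure in the range mentioned in the text.

```python
def tcr_from_rates(recall, precision, lam):
    """TCR expressed through spam recall, spam precision and cost factor.

    Derived from the definitions of recall, precision and TCR;
    requires precision > 0.
    """
    blocked_legit_per_spam = recall * (1 - precision) / precision
    missed_per_spam = 1 - recall
    denominator = lam * blocked_legit_per_spam + missed_per_spam
    return float("inf") if denominator == 0 else 1 / denominator


# Illustrative values: recall 97%, precision 99.60%.
for lam in (1, 9, 999):
    print(lam, round(tcr_from_rates(0.97, 0.996, lam), 2))
```

With P = 0.996 and λ = 999 the term λ·(1 - P)/P is about 4, so the denominator exceeds 1 for any positive recall and TCR stays below 1 however many spam messages are caught.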
Table 1 shows a comparison of the proposed approach with previously published results on the same corpus in terms of spam recall, spam precision, and TCR values. In more detail, best results achieved by three methods using word-based features are reported, namely a Naïve Bayes (NB) classifier, [3] a Memory-Based Learner (MBL), [4] and a Stacked Generalization approach (SG), [2] together with a Suffix Tree (ST) [9] approach based on character-level information. The number of attributes that correspond to the best results of each method is also given. It should be noted that the results for the ST approach refer to a sub-corpus of Ling-Spam with a proportion of spam to legitimate messages approximately equal to that of the entire Ling-Spam corpus (200 spam and 1,000 legitimate messages). Moreover, no results were reported for the SG approach for the high cost scenario.

As concerns the TCR, the proposed approach is by far more effective than word-based approaches for the low and medium cost scenarios. This is because it manages to achieve high spam recall while maintaining spam precision at an equally high level. ST is also quite competitive. This provides extra evidence that character-based representations are better able to capture the characteristics of spam messages. On the other hand, the proposed approach failed to produce a TCR value greater than 1 for the high cost scenario. That is because the precision failed to reach 100%. It must be underlined that previous studies [3, 4] show that TCR is not stable for the high cost scenario and it is common for TCR to exceed 1 only for very specific settings of the filter.

6.3. Results on SpamAssassin

Using the SpamAssassin corpus we are able to evaluate the character n-gram approach on case-sensitive and case-insensitive datasets. Three sets of experiments were performed based on character 3-gram, 4-gram, and 5-gram binary representations, respectively. All character n-grams appearing more than 3 times in the corpus constitute the initial feature set. Then, information gain is applied to this initial feature set to reduce the dimensionality. Different values of m, the number of attributes left after the feature selection procedure, were tested (m varies from 500 to 4,000 by 500) for each character n-gram category. Additionally, traditional word-based models were also constructed for comparative purposes. First, a "bare" bag of words approach and, second, a bag of words approach in combination with a lemmatizer and stop word removal were tested using the TMG toolbox. [30]
For evaluating the performance of the produced classifiers we use ROC graphs since they offer insight into the properties of the produced filters.

Fig. 3 shows the performance of the case-sensitive and case-insensitive character n-gram-based classifiers in terms of the AUC. As can be seen, case-sensitive character n-grams outperform case-insensitive character n-grams. Moreover, the 3-gram model is the most effective for this corpus, followed by 4-grams and 5-grams. This contrasts with the case of Ling-Spam, where 4-grams were found to perform better. Recall that SpamAssassin legitimate messages are not as homogeneous in topic as Ling-Spam messages. Some long character n-grams taken from very common words of the Ling-Spam corpus (e.g., "language", "linguistics", etc.) tend to be the most important for identifying ham messages. There are no such keywords in the SpamAssassin corpus. Therefore, shorter character n-grams prevail in this corpus.

Fig. 4 depicts the performance of the case-sensitive models in comparison to the word-based models. Again, 3-grams are the most effective features. On the other hand, word-based models outperform 4-grams and 5-grams. Finally, the use of a lemmatizer and stop word removal improves the results for low dimensional datasets (m<2,500) while the bare bag of words approach is better for higher dimensionality.
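The AUC values compared here have a direct probabilistic reading: the chance that a randomly chosen spam message receives a higher score than a randomly chosen legitimate one. A minimal sketch of that definition follows; the function name `auc` and the scores are hypothetical, ties are counted as one half (a common convention, assumed here), and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(spam_scores, ham_scores):
    """Area under the ROC curve via pairwise ranking.

    Returns the fraction of (spam, ham) pairs in which the spam message
    is scored higher; ties contribute one half.
    """
    wins = 0.0
    for s in spam_scores:
        for h in ham_scores:
            if s > h:
                wins += 1.0
            elif s == h:
                wins += 0.5
    return wins / (len(spam_scores) * len(ham_scores))


# Hypothetical classifier scores (higher means "more likely spam").
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```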

[Figure 3: AUC plot; series: CS 3-grams, CS 4-grams, CS 5-grams, CI 3-grams, CI 4-grams, CI 5-grams]
Fig. 3. The performance of the case-sensitive (CS) and case-insensitive (CI) character n-gram models on the SpamAssassin corpus.

[Figure 4: AUC plot; series: BOW, BOW+lem, CS 3-grams, CS 4-grams, CS 5-grams]
Fig. 4. The performance of the word-based models (simple bag-of-words and bag-of-words with lemmatizer and stop word removal) and the case-sensitive (CS) character n-gram models on the SpamAssassin corpus.

Fig. 5. Performance of the word-based approaches (simple bag-of-words and bag-of-words with lemmatizer and stop word removal), the case sensitive (CS) 3-gram model and the variable-length (VL) n-gram model on the SpamAssassin corpus (vertical axis: AUC; series: BOW, BOW+lem, CS 3-grams, VL n-grams).

Results with Variable-length N-grams

So far, all the experiments were based on fixed-length character n-grams. In order to test the approach proposed in Section 4 for extracting variable-length n-grams we performed the following experiment. An initial large feature set consisting of case sensitive character n-grams of variable length is extracted from the training corpus. This feature set includes the L most frequent n-grams for certain values of n. That is, for L=1,000, the 1,000 most frequent 3-grams, the 1,000 most frequent 4-grams, and the 1,000 most frequent 5-grams compose the initial feature set. The proposed variable-length feature selection method is applied to this initial feature set. The resulting feature set is used to train an SVM classifier. This procedure was followed for L ranging from 1,000 to 5,000 with a step of 500. Fig. 5 shows the performance (in terms of AUC) of the variable-length character n-grams in comparison with the case sensitive 3-grams and the word-based approaches. Note that the variable-length model is not able to produce a predefined number of attributes. However, by varying the value of L, it is possible to get roughly as many attributes as the other models produce. The variable-length n-gram approach outperforms the word-based approaches and it is quite competitive with the case sensitive 3-gram model. However, the latter is still the best performing model in most of the cases.

Fig. 6. ROC graphs for the word-based models, the case sensitive 3-gram model, and the variable-length n-gram model. The ROC convex hull (ROCCH) indicating optimal performance among the examined models is also depicted (axes: fp rate, tp rate; series: BOW+lem, BOW, CS 3-grams, VL n-grams, ROCCH).

Note that the AUC measure is an overall indication of the effectiveness of the classifier. A closer look at the ROC graphs reveals that the variable-length n-gram model is optimal when considering a cost-sensitive evaluation. In particular, Fig. 6 shows the ROC graphs for the case sensitive 3-gram and the word-based models (m=5,000) as well as the variable-length n-gram model (L=4,500 producing 5,277 selected attributes). The ROCCH is also depicted. As can be seen, the 3-gram model and the variable-length n-gram model dominate the ROCCH. The former is better especially for both very low and very high (not shown in the figure) levels of fp rate while the latter is better for the middle fp rate level. On the other hand, word-based models are outperformed. Recall from Section 5.2 that according to the operating conditions of the classifier, a specific part of the ROCCH corresponds to the optimal classifier. In the framework of a cost-sensitive evaluation, the optimal classifier is identified based on the iso-performance lines (see the equation given earlier). Fig. 7 shows the best iso-performance lines for different costs of misclassifying a legitimate message as spam: λ=1, 9, and 999. In all three cases, the part of the ROCCH taken up by the variable-length n-gram model corresponds to the optimal classifier. Therefore, despite the fact that the 3-gram model is more effective in general, as indicated by the AUC measure, the variable-length n-gram model is optimal in a cost sensitive evaluation.
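The cost-sensitive selection of an operating point described above can be sketched in a few lines. This is an illustrative sketch and not the authors' code. It assumes the standard iso-performance formulation, in which misclassifying a legitimate message as spam costs λ and missing a spam message costs 1, so that an iso-performance line has slope λ·P(legit)/P(spam). The function names, the ROC points, and the spam prior of 0.5 are all hypothetical.

```python
from typing import List, Tuple

# A ROC point is (fp_rate, tp_rate), with spam as the positive class.
RocPoint = Tuple[float, float]


def iso_slope(lam: float, p_spam: float) -> float:
    """Slope of an iso-performance line in ROC space (assumed formulation)."""
    return lam * (1.0 - p_spam) / p_spam


def expected_cost(point: RocPoint, lam: float, p_spam: float) -> float:
    """Expected misclassification cost of one operating point.

    A missed spam costs 1; a legitimate message flagged as spam costs lam.
    """
    fp_rate, tp_rate = point
    return p_spam * (1.0 - tp_rate) + lam * (1.0 - p_spam) * fp_rate


def best_operating_point(points: List[RocPoint], lam: float, p_spam: float) -> RocPoint:
    """Return the ROC point touched by the best iso-performance line.

    Minimising a linear cost over a finite set of ROC points always yields
    a point on the ROC convex hull, so the hull need not be built first.
    """
    return min(points, key=lambda p: expected_cost(p, lam, p_spam))


# Hypothetical ROC points, for illustration only (not the paper's results).
points = [(0.0, 0.90), (0.01, 0.97), (0.05, 0.99), (1.0, 1.0)]
for lam in (1, 9, 999):
    print(lam, iso_slope(lam, 0.5), best_operating_point(points, lam, 0.5))
```

With these made-up points, λ=1 picks the point (0.01, 0.97), while λ=999 picks the point with no false positives, which mirrors how a higher cost for blocking legitimate mail pushes the optimum toward the low fp rate end of the hull.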

Note that the attributes of the case sensitive 3-gram model of the previous experiment were selected using a 22,673 initial feature set (i.e., all character 3-grams that appear more than 3 times in the corpus). On the other hand, the variable-length n-gram attributes were selected using a 13,500 initial feature set (since L=4,500). Fig. 8 shows the composition of this variable-length n-gram model produced for L=4,500 (resulting in 5,277 attributes). The prevailing character n-gram category is 3-grams, followed by 5-grams. However, the contribution of the longer n-grams seems to be of crucial importance for constructing more reliable cost sensitive filters. Interestingly, the common 3-grams of the variable-length n-gram set and the case sensitive 3-gram set are only 926. That is, only 8% of the case sensitive 3-grams are included in the variable-length n-gram set. This fact means that many 3-grams not selected by information gain are now included in the feature set and are helpful for producing more effective classifiers.

Fig. 7. ROC graphs for the case sensitive 3-gram model and the variable-length n-gram model as well as the ROC convex hull. Iso-performance lines indicating the optimal classifier for different cost values (λ=1, 9, and 999) of misclassifying a legitimate message are also depicted (axes: fp rate, tp rate; series: CS 3-grams, VL n-grams, ROCCH).

Fig. 8. The composition of the variable-length n-gram model for L=4,500 (5,277 extracted attributes; categories: 3-grams, 4-grams, 5-grams).

7. Conclusions

In this paper we presented a comparison of words and character n-grams in the framework of content-based anti-spam filtering. A series of experiments using two benchmark corpora and a variety of cost-sensitive evaluation measures provides strong evidence that character-level information is better able to discriminate between spam and legitimate messages. The most important property of the character n-gram approach is that it avoids the use of tokenizers, lemmatizers and other language-dependent tools. Those tools are quite vulnerable to spammers. On the other hand, a model based on character-level information can capture nuances of the spamminess of a message and, more importantly, it is not easy for the spammers to fool it by incorporating punctuation marks or other special symbols within a word.

One corpus used in this study comprised homogeneous legitimate messages, since they were taken from a mailing list about linguistics. The legitimate messages of the other corpus are quite heterogeneous, more than the messages found in the mailbox of a single user, since they consist of messages donated by different users. This difference was reflected in the results of the character n-gram models on these two corpora. When considering heterogeneous legitimate messages, short n-grams (3-grams) produced the best models while longer n-grams (4-grams) were optimal when considering homogeneous legitimate messages. This indicates the ability of the character-level approach to be adapted to the properties of a specific corpus or the mailbox of a specific user.

In addition to character n-grams of fixed length, we also proposed a method for extracting variable-length character n-grams based on an existing approach, originally used for extracting multi-word terms for information retrieval applications. Results of cost-sensitive evaluation indicate that the variable-length n-gram model is more effective in any of the three examined cost scenarios (i.e., low, medium, or high cost). Although the majority of the variable-length n-grams consists of 3-grams, there are only a few common members with the fixed-length 3-gram set. Hence, the information included in the variable-length n-grams is quite different in comparison to the information represented by case sensitive 3-grams.

An interesting future work direction will be to test the character n-gram representation in the framework of on-line evaluation of anti-spam filtering. This implies that the set of significant character n-grams should evolve with use as new messages arrive and are classified by the filter, so as to better capture the properties of spam and legitimate messages.


More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms The Power of Free Brachig i a Geeral Model of Backtrackig ad Dyamic Programmig Algorithms SASHKA DAVIS IDA/Ceter for Computig Scieces Bowie, MD [email protected] RUSSELL IMPAGLIAZZO Dept. of Computer

More information

Extracting Similar and Opposite News Websites Based on Sentiment Analysis

Extracting Similar and Opposite News Websites Based on Sentiment Analysis 202 Iteratioal Coferece o Idustrial ad Itelliget Iformatio (ICIII 202) IPCSIT vol.3 (202) (202) IACSIT Press, Sigapore Extractig Similar ad Opposite ews Websites Based o Setimet Aalysis Jiawei Zhag, Yukiko

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean 1 Social Studies 201 October 13, 2004 Note: The examples i these otes may be differet tha used i class. However, the examples are similar ad the methods used are idetical to what was preseted i class.

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval Chapter 8 Tests of Statistical Hypotheses 8. Tests about Proportios HT - Iferece o Proportio Parameter: Populatio Proportio p (or π) (Percetage of people has o health isurace) x Statistic: Sample Proportio

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows:

Your organization has a Class B IP address of 166.144.0.0 Before you implement subnetting, the Network ID and Host ID are divided as follows: Subettig Subettig is used to subdivide a sigle class of etwork i to multiple smaller etworks. Example: Your orgaizatio has a Class B IP address of 166.144.0.0 Before you implemet subettig, the Network

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 [email protected] Abstract:

More information

Enhancing Oracle Business Intelligence with cubus EV How users of Oracle BI on Essbase cubes can benefit from cubus outperform EV Analytics (cubus EV)

Enhancing Oracle Business Intelligence with cubus EV How users of Oracle BI on Essbase cubes can benefit from cubus outperform EV Analytics (cubus EV) Ehacig Oracle Busiess Itelligece with cubus EV How users of Oracle BI o Essbase cubes ca beefit from cubus outperform EV Aalytics (cubus EV) CONTENT 01 cubus EV as a ehacemet to Oracle BI o Essbase 02

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find

Approximating Area under a curve with rectangles. To find the area under a curve we approximate the area using rectangles and then use limits to find 1.8 Approximatig Area uder a curve with rectagles 1.6 To fid the area uder a curve we approximate the area usig rectagles ad the use limits to fid 1.4 the area. Example 1 Suppose we wat to estimate 1.

More information

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships Biology 171L Eviromet ad Ecology Lab Lab : Descriptive Statistics, Presetig Data ad Graphig Relatioships Itroductio Log lists of data are ofte ot very useful for idetifyig geeral treds i the data or the

More information

Amendments to employer debt Regulations

Amendments to employer debt Regulations March 2008 Pesios Legal Alert Amedmets to employer debt Regulatios The Govermet has at last issued Regulatios which will amed the law as to employer debts uder s75 Pesios Act 1995. The amedig Regulatios

More information

France caters to innovative companies and offers the best research tax credit in Europe

France caters to innovative companies and offers the best research tax credit in Europe 1/5 The Frech Govermet has three objectives : > improve Frace s fiscal competitiveess > cosolidate R&D activities > make Frace a attractive coutry for iovatio Tax icetives have become a key elemet of public

More information

Basic Measurement Issues. Sampling Theory and Analog-to-Digital Conversion

Basic Measurement Issues. Sampling Theory and Analog-to-Digital Conversion Theory ad Aalog-to-Digital Coversio Itroductio/Defiitios Aalog-to-digital coversio Rate Frequecy Aalysis Basic Measuremet Issues Reliability the extet to which a measuremet procedure yields the same results

More information

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks

Security Functions and Purposes of Network Devices and Technologies (SY0-301) 1-800-418-6789. Firewalls. Audiobooks Maual Security+ Domai 1 Network Security Every etwork is uique, ad architecturally defied physically by its equipmet ad coectios, ad logically through the applicatios, services, ad idustries it serves.

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

Installment Joint Life Insurance Actuarial Models with the Stochastic Interest Rate

Installment Joint Life Insurance Actuarial Models with the Stochastic Interest Rate Iteratioal Coferece o Maagemet Sciece ad Maagemet Iovatio (MSMI 4) Istallmet Joit Life Isurace ctuarial Models with the Stochastic Iterest Rate Nia-Nia JI a,*, Yue LI, Dog-Hui WNG College of Sciece, Harbi

More information

On the Periodicity of Time-series Network and Service Metrics

On the Periodicity of Time-series Network and Service Metrics O the Periodicity of Time-series Network ad Service Metrics Joseph T. Lizier ad Terry J. Dawso Telstra Research Laboratories Sydey, NSW, Australia {oseph.lizier, terry..dawso}@team.telstra.com Abstract

More information