Neurocomputing 7 (2013) 135-145
Contents lists available at SciVerse ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Improved competitive learning neural networks for network intrusion and fraud detection

John Zhong Lei a, Ali A. Ghorbani b
a Vesta Corporation, Portland, OR 97229, USA
b Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada E3B 5A3

ARTICLE INFO
Available online 22 August 2013
Keywords: Competitive learning; Fraud detection; Intrusion detection; Supervised/unsupervised clustering; Neural network

ABSTRACT
In this research, we propose two new clustering algorithms, the improved competitive learning network (ICLN) and the supervised improved competitive learning network (SICLN), for fraud detection and network intrusion detection. The ICLN is an unsupervised clustering algorithm, which applies new rules to the standard competitive learning neural network (SCLN). The network neurons in the ICLN are trained to represent the center of the data by a new reward-punishment update rule. This new update rule overcomes the instability of the SCLN. The SICLN is a supervised version of the ICLN. In the SICLN, the new supervised update rule uses the data labels to guide the training process to achieve a better clustering result. The SICLN can be applied to both labeled and unlabeled data and is highly tolerant to missing or delayed labels. Furthermore, the SICLN is capable of reconstructing itself and is thus completely independent of the initial number of clusters. To assess the proposed algorithms, we have performed experimental comparisons on both research data and real-world data in fraud detection and network intrusion detection. The results demonstrate that both the ICLN and the SICLN achieve high performance, and the SICLN outperforms traditional unsupervised clustering algorithms.
2013 Elsevier B.V. All rights reserved.

1. Introduction

Fraud detection and network intrusion detection are extremely critical to e-commerce business. According to U.S. Census Bureau retail e-commerce sales reports, e-commerce in North America has continued to grow 20% or more each year. However, fraud costs e-commerce companies in the U.S.
and Canada an overwhelming loss each year. With the recent growth of e-commerce, credit card fraud has become more prevalent. Based on the survey results in 2009, on average, 1.6% of orders proved to be fraudulent, which is about $3.3 billion. In addition to the direct losses made through fraudulent sales, fraud victims' trust in both the credit card and the retail company decreases, which further increases the loss. It is the intent of the companies and the credit card issuers to detect or prevent fraud as soon as possible. Network intrusions, on the other hand, attack e-commerce companies from their back. Any down time of Web servers or leaks of business or customer information may cause huge losses.

Both the credit card fraud detection and network intrusion detection domains present the following challenges to data mining:
• There are millions of transactions each day. Mining such a massive amount of data requires highly efficient techniques.
• The data are highly skewed. There are many more good events than bad events. Typical accuracy-based mining techniques may generate highly accurate detectors by simply predicting all transactions legitimate, but these detectors cannot detect fraud at all.
• Data labels are not immediately available. Frauds or intrusions are usually discovered only after they have already happened.
• It is hard to track users' behaviors. All types of users (good users, business, and fraudsters) change their behaviors quite often. Finding new or changing patterns is as important as recognizing old patterns.

Corresponding author. E-mail addresses: john.lei@trustvesta.com (J.Z. Lei), ghorbani@unb.ca (A.A. Ghorbani).

In this research we propose two clustering algorithms for fraud detection and network intrusion detection: the improved competitive learning network (ICLN) [20] and the supervised improved competitive learning network (SICLN). The ICLN is an unsupervised clustering algorithm developed from the standard competitive learning network (SCLN) [15]. The SICLN is a supervised clustering algorithm derived from the ICLN. Our goal is to develop advanced machine learning techniques to solve the practical challenges in network intrusion detection and fraud detection.

Fig. 1 is an example of a fraudulent event.
If the credit card information of a card holder is stolen and used for online shopping, it will take a few days for this transaction to appear on the credit card statement, and a few more days or a few months for the real card holder to notice and report it to the bank. It will take a few more days for the bank to send a notice to the retail company. Usually orders are considered good before being reported as frauds. They are deemed to be fraud once the company receives fraud reports. The time gap from the date of an order to the date of the fraud report leads to mislabeled data. Mislabeled data could introduce noise to supervised learning. Waiting for a few months until most of the fraud reports are completed might reduce the mislabeled noise, but at a cost: the frauds would already have happened and the patterns could have changed by the time they are found.

Fig. 1. Fraud report procedure.

The ability to learn from unlabeled data and deal with abnormal data makes clustering a good candidate for network intrusion detection and fraud detection. However, on the other hand, a clustering algorithm may not produce desirable clusters without additional information. Fig. 2 is an example. The clustering result illustrated in Fig. 2(a) is perfect in terms of unsupervised learning. The data points are grouped into clusters based on their natural similarities. However, the actual desirable clustering result is the one in Fig. 2(b) if we know the data labels. Guided by all or a portion of the data labels, a clustering algorithm could achieve this desirable result. Based on the potential to combine the strengths of classification and clustering, a supervised clustering technique is therefore applied in our research.

Fig. 2. Natural clustering result vs. desirable result. (a) Unsupervised clustering when the labels of data points are unknown or unused. (b) Desirable clustering when the labels of data points are known.

2. Background

The techniques for fraud detection and intrusion detection fall into two categories: statistical techniques and data mining techniques. Traditional methods of network intrusion detection are based on the saved patterns of known events. They detect network intrusion by comparing the features of activities to the attack patterns provided by human experts. One of the main drawbacks of the traditional methods is that they cannot detect unknown intrusions. Moreover, human analysis becomes insufficient when the volume of the activities grows rapidly. This leads to the interest in data mining techniques for fraud detection and network intrusion detection [10,19].
Data mining based network intrusion detection techniques can be categorized into misuse detection and anomaly detection [19]. The misuse detection techniques build the patterns of the attacks by supervised learning from the labeled data. The main drawback of the misuse detection techniques is that they cannot detect new attacks that have never occurred in the training data. On the other hand, the anomaly detection techniques establish normal usage patterns. They can detect unseen intrusions by investigating their deviation from the normal patterns.

The artificial neural networks provide a number of advantages in fraud detection and network intrusion detection [5]. The applications of the neural network techniques include both the misuse detection models and the anomaly detection models [18,25]. A multi-layer perceptron (MLP) was used in [13] for anomaly detection. A single hidden layer neural network was used and tested on the Defense Advanced Research Projects Agency (DARPA) 1998 data. The MLP was also applied in [22]. The back-propagation algorithm was used in the learning phase to adapt the weights of the neural network.

As an unsupervised neural network, the self-organizing map (SOM) has been applied in anomaly detection. It implicitly prepares itself to detect any aberrant network activity by learning to characterize the normal behaviors [25]. The SOM was also applied to perform the clustering of network traffic and to detect attacks in [14]. The SOM was designed to learn the characteristics of normal activities in [12]. The variations from normal activities provided an indication of a virus. The unsupervised niche clustering (UNC), a genetic niche technique for unsupervised clustering, was applied to intrusion detection in [21]. Each cluster evolved by the UNC was associated with a membership function that followed a Gaussian shape. Using the normal samples, the UNC generated clusters summarizing the normal space. A hybrid artificial intelligent system is presented in [23]. An unsupervised neural model was embedded in a multi-agent system for network intrusion detection.

Hybrid learning approaches [1,8] integrate different learning and adaptation techniques to overcome
individual limitations. Combining the strengths of two or more approaches could achieve high efficiency. A hybrid model of the SOM and the MLP was proposed in [5]. The SOM was combined with the feed-forward neural network to detect dispersing and possibly collaborative attacks.

Traditional fraud detection approaches face the same problems as traditional methods of network intrusion detection. Fraud detection analysis is conducted by fraud specialists by comparing the fraud activities with normal transactions. Human analysis becomes insufficient as the volume of the transactions grows rapidly. Moreover, traditional fraud detection is not receptive to new or changing patterns.

Data mining based fraud detection techniques are also categorized into misuse detection and anomaly detection. Many commercial data mining tools are available. The following commercial tools are the top level: SAS Enterprise Miner, SPSS Clementine, and IBM DB2 Intelligent Miner. These commercial tools can be used effectively for discovering patterns in data.

There are fewer research reports on fraud detection than on network intrusion detection despite their similarity. This is simply because financial data are usually not open to the public, unlike the KDD99 data for network intrusion detection, and are hard to acquire. Decision trees are used in [6] for fraud detection. Their approach divided the large data set of labeled transactions into smaller subsets. Then it used decision trees to generate classifiers in parallel and combined the resultant base models by meta-learning [7] from the classifiers' behavior to generate a meta-classifier. Brause et al. [3] combined a radial basis function network and rule-based information for credit card fraud detection.

3. Improved competitive learning network

The ICLN is developed from the SCLN. It overcomes the instability of the SCLN and converges faster than the SCLN. Therefore it obtains better performance in terms of computational time.

3.1. The limitation of SCLN

The SCLN consists of two layers of neurons: the distance measure layer and the competitive layer. The structure of the SCLN is shown in Fig. 3. The distance measure layer consists of m weight vectors $W = \{w_1, w_2, \ldots, w_m\}$.
When a training example is presented, the distance measure layer calculates the distance between the weight vectors and the training example. The distances calculated in the distance measure layer become the input of the competitive layer. The competitive layer finds the weight vector closest to the training example. The output of the competitive layer is a 1 × m vector. Each bit of the output vector is either 0 or 1, representing the competitive result of the weight vectors. For example, if neuron $w_j$ won the competition, the output would be a 1 × m vector with $y(j) = 1$ and $y(i) = 0 \; \forall i \neq j$.

The winning weight vector $w_j$ is then rewarded by being moved closer to the training sample. Every time, the winning weight vector moves towards a particular sample. The other, non-winning weight vectors remain unchanged. This process is repeated for all the training samples for many iterations. Eventually each of the weight vectors would converge to the centroid of a cluster.

The update rule of the SCLN is called "winner takes all". That means only the one winning neuron updates itself each time a training example is presented. The winning neuron updates itself to move closer to the training sample once it has won the competition. The update is calculated by the standard competitive learning rule:

$$w_j(\tau + 1) = w_j(\tau) + \eta(\tau)\,(x - w_j(\tau)) \qquad (1)$$

where $w_j$ is the weight vector of the winning neuron j, and $\eta$ is the learning rate. Only one winning neuron updates itself at a time. The essence of competitive learning is illustrated in Fig. 4.

The performance of the SCLN relies on the number of initial neurons and the values of their weight vectors. Once the number of output neurons is set, the number of clusters is also predetermined regardless of the data distribution. On the other hand, different initial weight vectors may lead to different numbers of final clusters because the update function in Eq. (1) only moves the weight vector of the winning neuron toward its local nearby examples. Fig. 5 shows a scenario that reveals the limitations of the SCLN. In this example, two neurons are initialized close to one cluster. Both of them will stay in the clustering result since the SCLN is a reward-only algorithm.
The SCLN clustering result of this example will be four clusters although only three clusters are expected, as shown in Fig. 5(b).

Fig. 3. The SCLN consists of two layers of neurons: the distance measure layer and the competitive layer.

Fig. 4. The principle of the SCLN. (a) Initial weight vectors. (b) Clustering result.
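The winner-takes-all rule of Eq. (1) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `scln_train` and `one_hot_output`, the fixed learning rate, the Euclidean distance, and the random presentation order are all assumptions made for the example.

```python
import numpy as np

def scln_train(X, W, eta=0.1, epochs=50, seed=0):
    """Winner-takes-all training: only the closest weight vector moves (Eq. 1)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = np.array(W, dtype=float)              # copy so the caller's data is untouched
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:  # present examples in random order
            d = np.linalg.norm(W - x, axis=1)  # distance measure layer
            j = int(np.argmin(d))              # competitive layer: pick the winner
            W[j] += eta * (x - W[j])           # reward: move the winner toward x
    return W

def one_hot_output(x, W):
    """Competitive-layer output: y(j) = 1 for the winner, 0 elsewhere."""
    d = np.linalg.norm(np.asarray(W, dtype=float) - np.asarray(x, dtype=float), axis=1)
    y = np.zeros(len(d), dtype=int)
    y[int(np.argmin(d))] = 1
    return y
```

Because no weight vector is ever pushed away, two vectors initialized near the same cluster both keep winning some of its examples, which is the behavior Fig. 5 describes.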
Fig. 5. The drawback of the SCLN. (a) Initial weight vectors. The performance of the SCLN depends heavily on the number of the initial neurons and their initial weight vectors. (b) Clustering result. The lower left cluster is separated into two groups since two weight vectors are initialized close to one cluster.

3.2. New update rules in ICLN

The ICLN changes the SCLN's reward-only rule to a reward-punish rule. The winning neuron updates its weight vector by the same update rule as in Eq. (1). This update process is also called reward, as the winning neuron is updated toward the training example. At the same time, the other neurons also update their weight vectors by

$$w_j(\tau + 1) = w_j(\tau) - \eta_2(\tau)\,K(d(x,j))\,(x - w_j(\tau)) \qquad (2)$$

where $K(d(x,j))$ is a kernel function in which $d(x,j)$ is the distance between neuron j and the input x, and $\eta_2$ is the learning rate. This update process is called punish, as the neurons are updated to move away from the training example.

There are various choices of the kernel function $K(d(x,j))$, such as the inverse distance, the triangular kernel, the quadratic kernel, and the Gaussian kernel [2]. A kernel function attains its maximum value at zero distance, and the value decays as the distance increases. A good kernel function smooths and regulates the updated value.

The effect of the reward-punishment update rules is shown in Fig. 6. The two weight vectors at the lower left of Fig. 5(a) compete against each other when applying the ICLN. The punish rule pushes the losing weight vectors away from the cluster, and one of them will finally be removed from the cluster. Furthermore, since the distances between the training example and all of the weight vectors are always calculated for the competition, using these values to apply punish rules to the losing weight vectors accelerates the clustering process without additional calculations.

Fig. 6. The effect of the ICLN update rules.

3.3. The ICLN algorithm

The ICLN algorithm is outlined in Fig. 7. The ICLN first initializes k neurons. Two methods can be used for the initialization. One is to assign k random training examples to the k neurons. The other is to assign the center point, i.e. the mean, of the whole training data to all k neurons. Center point initialization takes more iterations to converge than random initialization. The number of clusters k is usually set to a number bigger than the expected number of clusters because the ICLN can reduce the number of clusters but cannot add new clusters to the network.

Input: $X = \{x_1, x_2, \ldots, x_n\}$: the input dataset
Output: $W = \{w_1, w_2, \ldots, w_k\}$: the weight vectors
BEGIN
1. Randomly initialize the weight vectors $W = \{w_1, w_2, \ldots, w_k\}$
2. Set up the learning rates $\eta_1$ and $\eta_2$ for the winning neuron and the losing neurons, respectively, with $0 < \eta_2 < \eta_1 < 1$
3. Set up the minimum weight update value $T_\Delta$
4. Select the kernel function: $K(d(x_i, w_j)) = e^{-d(x_i, w_j)}$
5. Set up the maximum number of iterations $M_{epoch}$
repeat
    for $x_i \in X$ do
        for $w_j \in W$ do
            compute the distance $d(x_i, w_j)$
        end for
        $w_{win} = w_a$ if $a = \arg\min_{m=1,\ldots,k} d(w_m, x_i)$
        /* Update $w_{win}$ */
        $w_{win} = w_{win} + \eta_1 (x_i - w_{win})$
        /* Update the other weight vectors $w \in W$, $w \neq w_{win}$ */
        $w = w - \eta_2 K(d(x_i, w))(x_i - w)$
    end for
until ($\Delta w < T_\Delta$) or (#iteration > $M_{epoch}$)
Remove all weight vectors that have no associated input.
END

Fig. 7. Algorithm: the improved competitive learning network.

The ICLN trains the initial weight vectors $W = \{w_1, w_2, \ldots, w_k\}$ by randomly presenting the training examples $X = \{x_1, x_2, \ldots, x_n\}$. When a training example x is presented to the ICLN, the weight vectors compete with each other by comparing their distances, such as the Euclidean distance, to x. Only the one weight vector with the minimum distance to x wins the competition:

$$w_{win}(x) = w_j \quad \text{if } j = \arg\min_{1 \le i \le k} d(w_i, x) \qquad (3)$$

The winning weight vector is then updated by the reward function as in Eq. (1). At the same time, the other weight vectors in the network are updated by the punishment function as in Eq. (2). The result of this reward-punishment rule is that the winning neuron is moved closer to the training example and the other neurons are moved away from the training example.

After the network finishes learning from x, another training example is presented to the network. The network then competes and learns from this new training example. The ICLN iterates the learning process until one of the stopping criteria is satisfied. One criterion is that the maximum update to the weight vectors W is less than a preset minimum update threshold:

$$\max_{1 \le i \le k} \lVert w_i(\tau) - w_i(\tau - 1) \rVert < T_\Delta \qquad (4)$$

where $w_i(\tau)$ is the weight vector $w_i$ in the current iteration, $w_i(\tau - 1)$ is the weight vector $w_i$ in the previous iteration, and $T_\Delta$ is the preset minimum update threshold. The other criterion is that the preset maximum number of iterations has been reached.

4. Supervised improved competitive learning network

The SICLN is a supervised clustering algorithm derived from the ICLN. When data labels are available, the SICLN uses them to guide the clustering procedure.

4.1. The objective function

The SICLN uses an objective function $Obj(X,W)$ to measure the quality of the clustering result. The purpose of the objective function is to minimize the impurity of the resulting clusters while keeping the number of clusters to a minimum. The objective function is defined as

$$Obj(X,W) = a \times Imp(X,W) + b \times Sct(X,W) \qquad (5)$$

where a and b are the weights of impurity and scattering respectively, and $a + b = 1$.
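Eq. (5) can be made concrete with a small numerical sketch. The snippet below is an illustration only, not the authors' code: it assumes the misclassification-rate impurity and a scattering term of the form (k - t)/n (zero when the number of clusters k equals the number of classes t), both of which are discussed in the remainder of this section. The function name `objective` and its arguments are hypothetical.

```python
from collections import Counter

def objective(labels, assignments, n_classes, a=0.5, b=0.5):
    """Obj = a * Imp + b * Sct for one clustering result (weights satisfy a + b = 1)."""
    n = len(labels)
    # Group the labels of the data points by the cluster they were assigned to.
    clusters = {}
    for lab, c in zip(labels, assignments):
        clusters.setdefault(c, []).append(lab)
    k = len(clusters)
    # Impurity: fraction of points that are not in their cluster's most frequent class.
    misclassified = sum(
        len(members) - Counter(members).most_common(1)[0][1]
        for members in clusters.values()
    )
    imp = misclassified / n
    # Scattering (assumed form): grows with the number of clusters beyond the class count.
    sct = (k - n_classes) / n
    return a * imp + b * sct
```

With four points of two classes, a pure two-cluster result scores 0, while putting every point in its own cluster keeps impurity at 0 but is penalized by the scattering term.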
The impurity f the whle result is the weighted average f the impurity f each cluster: (3) (4) where # detes the cut. Similarly, the secd dmiate class Dm 2 C(w) is the class that has mre members tha ther classes except the dmiate class. The misclassificati rate f cluster f weight vectr wi is cmputed as #(x2dmc(wi); x A W,) Misrate(w i ) = H where 9wi9 is the cut f the members f wi. Whe misclassificati rate is chse as the impurity fucti, accrdig t Eqs. (6) ad (7), the impurity is calculated as Ef= 1 wi X Imp(X,W )= #(x2dmc(w t )) Iw - 9 E?= 1 #(x==dmc(wi)) A alterative t misclassificati rate is the GINI impurity measure. The GINI impurity measure was first used i classificati ad regressi trees (CART) [4] ad has bee widely used t determie the purity f split i decisi trees. The GINI f a weight vectr wi is cmputed as Gii(w) = 1- ± j = 1 #(x A c; x A wi) ( H where cj is the class f members f w i frm j = 1; 2;...; t, ad W, is the size f wi. A smaller GINI idicates a lwer impurity. The GINI value reaches its maximum if the members are equal ppulati i each class. By ctrast, GINI value is 0 if all members belg t e class. Whe GINI is chse, accrdig t Eqs. (6) ad (9), the impurity is calculated as Imp(X,W)= T,'i = 11 w, I X (1-Ej = 1 #(x A cj; x A w,) H (7) (8) (9) (10) The secd part f the bjective fucti is the scatterig. A simple chice f the scatterig fucti is t cmpare the umber f clusters ad the umber f data pits: Sct(X,W) = \ (11) where t represets the umber f classes f the data set X, is the umber f data pits, ad i is the umber f clusters. A bigger scatterig idicates a wider spread clusterig. Whe the umber f clusters equals t the umber f data pits, scatterig reach its maximum. By ctrast, if the umber f clusters is the same as the umber f classes, scatterig is 0. A alterative chice is t use the size f each cluster: Imp(X,W ) = Ei = 1 wi x Imp(X,Wi) where is the cut f the data set X ad w i is the cut f the cluster members f W,-. 
e cmm chice f the impurity fucti is the misclassificati rate. If a cluster ctais members that are labeled as classes {c 1,c 2,...,c,}, the misclassificati rate f this cluster is the percetage f members that are t labeled as the dmiate class. The dmiate class f a cluster is defied as the mst frequet class f its members. Fr a data set X = {x 1,...,x } which are labeled as classes C = {c 1,...,c t }, cj is the dmiate class f wi if the cut f members f wi belg t class cj is mre tha thse belg t ay ther class DmC(w) = cj (6) Sct(X,W) = i 1 V ^ x(k-t) (12) t=\ 1 w i 1 ( ) Scatterig is bigger whe the variace f the size f the clusters are bigger. Elimiatig small size clusters will miimize this clusterig fucti. Fr fraud detecti ad itrusi detecti, we chse misclassificati fucti as the impurity fucti sice it is easy t set up by busiess gal. We chse the first scatterig fucti because the alterative chice itet t remve small size clusters but fraud ad itrusi are usually i small size clusters. Cmbiig Eqs. (5), (8) ad (11), the fllwig bjective fucti is chse t evaluate the quality f a clusterig result. if #(x A cj) > #(x2cj; x A w,) 8x A w bj(x,w) = a x Tkk= 1 #(x==dmc(w,)) + b x\ (13)
where a is the weight of impurity, and b is the weight of scattering. A smaller $Obj(X,W)$ indicates a better clustering result. Minimizing $Obj(X,W)$ attains the best result. It means minimizing both impurity and scattering. However, impurity and scattering usually conflict with each other. A decrease of either one leads to an increase of the other.

4.2. The SICLN algorithm

The SICLN is outlined in Fig. 8. It first initializes k neurons. The initialization methods of the SICLN are the same as those of the ICLN. Initialization is not as important here because the network will be reconstructed in training. The SICLN also labels the weight vectors with their member data points. In the SICLN, a weight vector is labeled with the class that has the biggest population among its cluster members. If all members are "unknown", this weight vector will be labeled as "unknown". Fig. 9 illustrates how the initial weight vectors are labeled. $w_1$ and $w_5$ are labeled as "Black" because they have more black point members than gray point members. $w_2$ and $w_4$ are labeled as "Gray" because they have more gray point members than black point members. $w_3$ is labeled as "unknown" because all of its members are missing labels. $w_6$ is labeled as "unknown" because it has no data member.

Input: $X = \{x_1, x_2, \ldots, x_n\}$: the input dataset
Input: $L(X) = \{L_{x_1}, L_{x_2}, \ldots, L_{x_n}\}$: the labels of the dataset
Input: $C = \{c_1, c_2, \ldots, c_t\}$: the classes of the dataset
Output: $W = \{w_1, w_2, \ldots, w_k\}$: the weight vectors
Begin
1 //Initialize//
    1.1 Randomly initialize the weight vectors $W = \{w_1, w_2, \ldots, w_k\}$
    1.2 Set up other parameters such as learning rate, kernel function, objective thresholds, iteration limit
2 //Training//
repeat
    2.1 Identify members of W
    2.2 Label W with their dominant class
    2.3 //Learning//
    for $x_i \in X$ do
        2.3.1 Look for the winning weight vector $w_{win}$ regarding input $x_i$
        2.3.2 Update $w_{win}$ with the reward function
        2.3.3 Update the other $w \neq w_{win}$ with the punishment function
    end for
    2.4 If the objective threshold is satisfied, stop
    2.5 Split weight vector $w \in W$ if the estimated objective function value is lower than before splitting. Construct the new vectors W
    2.6 //Learning (same as 2.3)//
    for $x_i \in X$ do
        2.6.1 Look for the winning weight vector $w_{win}$ regarding input $x_i$
        2.6.2 Update $w_{win}$ with the reward function
        2.6.3 Update the other $w \neq w_{win}$ with the punishment function
    end for
    2.7 Merge weight vectors $w_i$ and $w_j$ if the estimated objective function value is lower than before merging, they are of the same class, and they are closer to each other than to any other weight vectors. Construct the new vectors W
until the objective function is satisfied or the iteration count exceeds the limit
3 Remove all weight vectors that have no member data point.
4 Output $W = \{w_1, w_2, \ldots, w_k\}$
End

Fig. 8. Algorithm outline: supervised improved competitive learning network.

Fig. 9. The SICLN labels the weight vectors with their member data points.

Fig. 10. The reconstruction process of the SICLN.

The learning step of the SICLN is a revised version of that of the ICLN. The output neurons compete to be active. Since labels are available, the SICLN uses the labels to update the weight vectors. In the new rule, only neurons with the same class as the training example, or labeled "unknown", have the right to compete to win. Neurons that are labeled as a different class will lose. When a neuron wins the competition, its weight vector is rewarded by the same update rule as Eq. (1) in the ICLN. The punishment update is also the same as that of the ICLN in Eq. (2).

In the SICLN, when a labeled training example is presented to the network, only the neurons of the same class or of the "unknown" class are able to win and get the reward. However, if an unlabeled training example is presented to the network, all neurons in the network can compete and are either rewarded or punished. In this case, the learning step of the SICLN becomes the same as that of the ICLN. If all training examples in the data set are unlabeled, all training data and weight vectors belong to the "unknown" class. At this point, the SICLN becomes an ICLN.
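The label-restricted competition described above can be sketched as a single learning pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `learning_step`, the use of `None` for "unknown", the kernel exp(-d), the fixed learning rates, and the choice to skip an example that no neuron is allowed to win are all hypothetical.

```python
import numpy as np

UNKNOWN = None  # assumed marker for a missing label or an "unknown" neuron

def learning_step(X, labels, W, w_labels, eta1=0.1, eta2=0.01):
    """One pass of reward-punish updates (Eqs. 1 and 2) with label-restricted winners."""
    W = np.array(W, dtype=float)
    for x, lab in zip(np.asarray(X, dtype=float), labels):
        d = np.linalg.norm(W - x, axis=1)
        if lab is UNKNOWN:
            # Unlabeled example: every neuron may compete, as in the ICLN.
            eligible = np.ones(len(W), dtype=bool)
        else:
            # Labeled example: only same-class or "unknown" neurons may win.
            eligible = np.array([wl is UNKNOWN or wl == lab for wl in w_labels])
        if not eligible.any():
            continue  # assumption: skip examples that no neuron is allowed to win
        win = int(np.argmin(np.where(eligible, d, np.inf)))
        delta = x - W                                   # x - w_j for every neuron
        punish = eta2 * np.exp(-d)[:, None] * delta     # Eq. (2) with K(d) = exp(-d)
        punish[win] = 0.0                               # the winner is not punished
        W -= punish                                     # losers move away from x
        W[win] += eta1 * delta[win]                     # Eq. (1): winner moves toward x
    return W
```

Passing `labels` made entirely of `None` reduces this pass to the unsupervised ICLN step, which mirrors the statement that the SICLN becomes an ICLN when all training examples are unlabeled.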
After the learning step, the SICLN reconstructs a new network based on the trained network. In the reconstruction step, a neuron is split into two new neurons if it contains many members belonging to other classes. On the other hand, two neighboring neurons are merged into one if they belong to the same class. Fig. 10 illustrates the reconstruction step of the SICLN.

The split process starts from the clusters with the maximum impurity values. The estimated after-split impurity lies between the after-split impurity and the best possible impurity. For example, if weight vector $w_s$ is split into two vectors $w_{s1}$ and $w_{s2}$, the after-split impurity is

$$Imp_{after}(X,W) = \frac{\sum_{i=1}^{k} \#(x \notin DomC(w_i))}{n}$$

The best possible impurity is

$$Imp_{best}(X,W) = \frac{\sum_{i=1, i \neq s}^{k} \#(x \notin DomC(w_i)) + \#(x \notin DomC(w_s),\ x \notin Dom_2C(w_s))}{n}$$

The estimated value of the impurity after the split is

$$\widehat{Imp}(X,W) = Imp_{after}(X,W) + \gamma \times (Imp_{best}(X,W) - Imp_{after}(X,W))$$

where $\gamma$ is an estimate factor, $0 < \gamma < 1$. The scattering value after the split is

$$Sct(X,W) = \frac{k + 1 - t}{n}$$

The estimated objective function value is

$$\widehat{Obj}(X,W) = a\,\widehat{Imp}(X,W) + b\,Sct(X,W)$$

If the estimated objective value is smaller than the objective value before the split, the weight vector will be split into two. The median point of the members of the dominant class and the median point of the members of the second dominant class of the neuron are selected to be the new neurons.

The merge process looks for the closest weight vectors of the same class as the candidates. To find two weight vectors that are closer to each other than to any other weight vectors, we use the mutual neighbor distance [17]. The mutual neighbor distance is

$$MND(w_i, w_j) = NN(w_i, w_j) + NN(w_j, w_i) \qquad (14)$$

where $NN(w_i, w_j)$ is the neighbor number of neuron $w_j$ with respect to neuron $w_i$. If $MND(w_1, w_2) = 2$ (e.g. $NN(w_1, w_2) = 1$ and $NN(w_2, w_1) = 1$), $w_1$ and $w_2$ are closer to each other than to any other weight vectors. $w_i$ and $w_j$ are merged if they meet the following conditions: (1) $C(w_i) = C(w_j)$, (2) $MND(w_i, w_j) = 2$, (3) $Obj(X, W_s) < Obj(X, W)$. The new neuron takes the mean of $w_i$ and $w_j$ as its weight vector.

The reconstruction step creates a "new" network by splitting or merging the weight vectors, driven by the objective function. This "new" network replaces the old one to continue the learning step. This learn-reconstruct iteration repeats until one of the following stopping criteria is satisfied: (1) the objective function value satisfies the minimum threshold; (2) the training reaches the maximum number of iterations.

4.3. The SICLN vs. the ICLN

While the ICLN has the capability to cluster data into its natural groups, the SICLN uses labels to guide the clustering process. The ICLN groups data into clusters by gathering nearby data points into the same group. As a supervised clustering algorithm, the SICLN minimizes the impurity of the groups and the number of groups. Fig.
11 shows the improvement from the ICLN to the SICLN. The result of the ICLN is shown in Fig. 11(a). The data are identified in their natural groups without looking at the data labels. Weight vectors $w_2$ and $w_3$ become the cluster centers of two groups of data at the lower left although both groups belong to the same class. On the other hand, weight vector $w_4$ represents the group of data on the upper right, which contains data of two classes. The result of applying the SICLN to the same data is shown in Fig. 11(b). Weight vector $w_4$ is split into $w_b$ and $w_c$, which represent the centers of two groups of data with different classes. Therefore, the purity of the clustering result is higher than that of the ICLN. At the same time, the SICLN attempts to produce fewer clusters while keeping the same level of purity. Weight vectors $w_2$ and $w_3$ are merged into $w_a$. The new weight vector $w_a$ becomes the center of the data group $w_2 + w_3$, which belongs to the same class.

Fig. 11. The ICLN vs. the SICLN. (a) ICLN: clusters data into its natural groups. Same-class data at the lower left are clustered into two groups represented by weight vectors $w_2$ and $w_3$. Data of different classes in the middle are clustered into one group represented by weight vector $w_4$. (b) SICLN: optimizes the purity of the clusters and the number of clusters. Weight vector $w_4$ in (a) is split into $w_b$ and $w_c$ to maximize the purity of the clustering. Weight vectors $w_2$ and $w_3$ are merged into $w_a$ to minimize the number of clusters.

5. Experimental comparisons

In this section, we compare the performance of the SICLN and the ICLN with the k-means and the SOM on three data sets: the Iris data, the KDD 1999 data, and the Vesta transaction data.

5.1. Evaluation metrics

The outputs of a prediction or detection model fall into four categories: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP and TN are correct predictions or detections while FP and FN are incorrect predictions or detections.
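The four outcome counts, and the accuracy, precision and recall ratios that are conventionally built from them, can be tallied as in the sketch below. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption made for the example. The usage note reuses the 1.6% fraud rate quoted in the introduction to show why accuracy alone is misleading on skewed data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary detector; `positive` marks the fraud/attack class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall; a ratio with a zero denominator is reported as 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# 1000 orders of which 16 (1.6%) are fraudulent, and a detector that flags nothing.
y_true = [1] * 16 + [0] * 984
y_pred = [0] * 1000
accuracy, precision, recall = metrics(*confusion_counts(y_true, y_pred))
```

Here the accuracy is 0.984 while the recall is 0: the detector looks accurate yet finds no fraud at all.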
The evaluation metrics include accuracy, precision, recall, and the receiver operating characteristic (ROC) curve. They are calculated as

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$

The three metrics describe the percentage of correct predictions. However, none of them alone can represent the performance of an algorithm. A high accuracy may not represent a better result because the costs of incorrect predictions of positive data and negative data are usually different. A high accuracy result may have a low recall. A high recall may not indicate a good result either because the recall can easily be increased by decreasing the precision. A high precision result may come with a very low recall.

The ROC curve [11] is a graphical plot of the TP rate vs. the FP rate as the threshold of the classification varies. It illustrates the trade-off between TP rate and FP rate.

5.2. Iris data

The Iris data is a data set with 150 random samples of flowers from the iris species setosa, versicolor, and virginica. From each species there are 50 observations for sepal length, sepal width, petal length, and petal width in cm. The data set contains three classes. Each class refers to a type of iris plant. Of the three classes, one class is linearly separable from the other two classes; the latter are not linearly separable from each other.

We initialized the number of clusters of the SICLN, the ICLN and the k-means to 5. Since the neurons in the SOM had to form a rectangle, we selected 6 as its initial number, i.e. 3 × 2. Fig. 12 shows the performance comparison of these algorithms. The SICLN outperforms the others in accuracy. The performance of the ICLN and the SOM are almost identical. The k-means has the lowest accuracy.

To test the SICLN's capability of dealing with missing labels, we masked some of the labels. We randomly masked 100%, 70%, 50%, 30%, and 20% of the labels in the Iris data set and applied the SICLN to them. Fig. 13 shows the performance results. The SICLN becomes the ICLN when 100% of the labels are missing. The result is exactly the same as that of the ICLN. The performance of the SICLN becomes better as the number of available labels increases. When there are enough labeled data to guide the clustering procedure (less than 20% of labels missing in the case of the Iris data), the SICLN reaches its highest performance.
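The label-masking setup can be sketched as follows. This is an illustrative helper, not the procedure used in the experiments: the function name `mask_labels`, the use of `None` for a masked label, and the fixed random seed are assumptions.

```python
import random

def mask_labels(labels, fraction, seed=0):
    """Return a copy of `labels` with a random `fraction` of entries replaced by None."""
    rng = random.Random(seed)
    n_mask = round(fraction * len(labels))
    masked = set(rng.sample(range(len(labels)), n_mask))  # indices to hide
    return [None if i in masked else lab for i, lab in enumerate(labels)]
```

Masking with `fraction=1.0` hides every label, which corresponds to the case in which the SICLN reduces to the ICLN.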
In another experiment we tested the SICLN's ability to adapt to different initial numbers of weight vectors. The SICLN was initialized with 1, 3, 5, 8, 10, and 15 neurons. The SICLN consistently resulted in five clusters.

Fig. 12. Performance comparison on the Iris data (k-means, SOM, ICLN, SICLN).

Fig. 13. Performance of the SICLN on Iris data with missing labels.

5.3. Network intrusion detection: KDD-99 data

The KDD-99 data set was used for the Third International Knowledge Discovery and Data Mining Tools Competition. This data set was acquired from the 1998 DARPA intrusion detection evaluation program. There were 4,898,431 connection records, of which 3,925,650 were attacks. Each data point is a network connection, which is represented by 41 features, including the basic features of the individual connections, the content features suggested by the domain knowledge, and the traffic features computed using a 2-s time window [16]. Each connection is labeled as "normal" or as a particular type of attack: neptune, smurf, ipsweep, or back DoS. The nature of these attacks is described in [24,26].

From this data set, 501,000 records were chosen for our experiment. The selected connections were further split into the training set and the test set, containing 101,000 and 400,000 connections, respectively.

Fig. 14. Performance comparison on KDD99 data.

The performance comparison is shown in Fig. 14. The SICLN is better than the other algorithms in the three evaluation metrics. The ROC curves are illustrated in Fig. 15. They show that all of the algorithms have good performance. In addition, the results of the SICLN show its
capability to distinguish small population classes when we break down the results to the individual class level, as shown in Table 1. The attack types neptune and ipsweep account for only 0.03% and 0.91% of the population in the data set, and they are similar to each other. Although these neptune and ipsweep connections are detected as attacks by all algorithms, k-means, the SOM, and the ICLN are not able to distinguish these two attack types from each other. The SICLN outperformed k-means, the SOM, and the ICLN in terms of misclassification rate. Knowing the type of an attack enables better automatic solutions or treatments. The clustering detail also shows that the SICLN has the capability to distinguish clusters of small population.

We also masked some of the labels to test whether the SICLN can deal with missing labeled data. The results show that when all labels are missing, the SICLN becomes an ICLN. The performance reaches the highest point when more than 70% of the labels are available.

We tested the SICLN's capability to adapt to different initial numbers of weight vectors as well. Starting from an initial number of 1, 5, 10, 15, 20, or 30 neurons, the SICLN consistently converged to 10 clusters.

Fig. 15. ROC curves of SICLN, k-means, SOM, and ICLN on KDD-99 data.

Table 1
Misclassification rate for each individual class.

Class      Popul. (%)   Misclassify rate (%)
                        k-means   SOM     ICLN    SICLN
Normal     77.12        0.42      0.33    0.42    0.32
Neptune    0.03         100       100     100     0
Smurf      21.88        0         0       0       0
Ipsweep    0.91         7         7       7       3.2
Back       0.06         100       100     100     100

5.4. Fraud detection: credit card payment data

The data we used for this experiment are the fraud detection data from Vesta Corporation. The data contain credit card transactions for calling cards of a telecommunication company. Vesta Corporation is an innovator and worldwide leader in virtual commerce with headquarters in Portland, Oregon, USA. The company services most major US telecommunications carriers, including AT&T, T-Mobile, Cricket, Verizon, and Sprint. Fig. 16 shows the data flow for fraud analysis.
The on-line transaction processing (OLTP) servers transfer data to the on-line analytical processing (OLAP) data warehouses, on which data mining tasks for fraud detection and business analysis are performed. Front-end OLTP data are integrated into the data warehouse in the analysis servers through the backup servers on a daily basis. Data mining and analysis tasks are performed on the analysis servers by risk management using Microsoft SQL and SAS Enterprise Miner.

Fig. 16. Data flow of Vesta data for fraud analysis.
The data used in this experiment are a portion of the credit card payment transactions of one of Vesta's telecommunication partners. This data set contains three months of transaction history for 206,541 credit cards, of which 204,078 are normal and 2463 are fraudulent. By combining business knowledge, simple statistics, and statistics measures, 21 variables were selected from the raw data.

As stated in Section 1, data in fraud detection are highly skewed. There are many more good events than fraudulent events. In fraud detection, the recall rate is more important than the overall accuracy and precision. Accuracy alone cannot reflect the quality of the algorithms because simply predicting that all transactions are good events, although equivalent to not detecting fraud at all, can still yield high accuracy. In this case, the native ratio of fraud against normal is around 1.2%. The accuracy can be 98.8% if we simply guess that every transaction is normal. However, our goal is to detect as many frauds as possible while keeping the false positive rate at a certain acceptable level.

Table 2 shows the experimental results on Vesta data. The recall rate of the SICLN is about 20 percentage points higher than the others. The ROC curve is a better tool for performance comparison in this case. Fig. 17 shows the improvement of the SICLN through a comparison of the ROC curves of the SICLN, k-means, the SOM, and the ICLN.

Table 2
Experimental result on Vesta data.

Algorithm   Number of clusters   Accuracy (%)   Recall (%)   Precision (%)
k-means     12                   97.4           60.7         25.7
SOM         13                   97.8           54.8         27.8
ICLN        12                   97.8           57.4         28.4
SICLN       12                   97.4           79.1         28.8

5.5. Discussions

The small size and low dimension of the Iris data make it possible to visualize the learning process of the algorithms. The KDD-99 data and the Vesta data test the algorithms' scalability and capability of dealing with real-world data. The SICLN outperforms the other algorithms on all three of these data sets. The improvement is more noticeable when data points of different classes are very close. Furthermore, the SICLN is completely independent of the initial number of clusters since its reconstruction step is able to rebuild its own structure based on the data labels.
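The baseline quoted in Section 5.4 (98.8% accuracy from labeling every transaction as normal) follows directly from the class counts. The sketch below recomputes it, assuming the standard confusion-matrix definitions of accuracy, precision, and recall; the function name `metrics` and the choice to return `None` for an undefined ratio are illustrative and not taken from the paper.

```python
def metrics(tp, fp, tn, fn):
    """Return (accuracy, recall, precision) from confusion-matrix counts.

    A ratio with a zero denominator is reported as None (an assumption
    made for this example).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return accuracy, recall, precision


# Class counts reported for the Vesta data set.
NORMAL, FRAUD = 204_078, 2_463

# A detector that labels every card "normal" raises no alarms at all,
# so it has no true positives and no false positives.
accuracy, recall, precision = metrics(tp=0, fp=0, tn=NORMAL, fn=FRAUD)
fraud_ratio = FRAUD / NORMAL

print(f"fraud/normal ratio: {fraud_ratio:.1%}")  # 1.2%
print(f"accuracy: {accuracy:.1%}")               # 98.8%
print(f"recall: {recall:.1%}")                   # 0.0%
print(f"precision: {precision}")                 # None (no positive predictions)
```

The trivial detector reaches the quoted accuracy while catching no fraud, which is why the comparison in Table 2 is read mainly through recall.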
Meanwhile, the SICLN's capability of dealing with missing data is also demonstrated in these experiments. The clustering performance of the SICLN improves as the number of available labels increases, and it reaches the highest point when about 70% of the data points are labeled. This feature makes the SICLN an ideal candidate algorithm for fraud detection and network intrusion detection since there is always a certain amount of unlabeled and delay-labeled data in these domains.

6. Conclusion and future work

We have proposed and developed two clustering algorithms: (1) the ICLN, an unsupervised clustering algorithm that improves on the standard competitive learning neural network, and (2) the SICLN, a supervised clustering algorithm that introduces a supervised mechanism to the ICLN.

The ICLN improves the SCLN by changing its update rule from the reward-only rule to the reward-punishment rule. The new update rule increases the stability and speeds up the training process of the ICLN. Furthermore, the number of final clusters of the ICLN is independent of the number of initial network neurons since the redundant neurons will eventually be excluded from the clusters by the punishment rule.

The SICLN is a supervised clustering algorithm derived from the ICLN. The SICLN utilizes labeled data to improve the clustering results. The SICLN modifies the learning rule of the ICLN to train on both labeled and unlabeled data. Furthermore, the SICLN adds the reconstruction step to the ICLN to merge or split the existing weight vectors for the clustering task. The reconstruction step enables the SICLN to become completely independent of the number of initial clusters. An objective function that combines the purity and scattering of the clusters is used in the SICLN to optimize the misclassification rate and the number of clusters.

We compared the performance of the SICLN and the ICLN with k-means and the SOM using three data sets: Iris data, KDD-99 data, and credit card payment data. The ICLN achieves accuracy similar to that of the other traditional unsupervised clustering algorithms.
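To make the contrast between a reward-only rule and a reward-punishment rule concrete, the following sketch performs one update step. It is an illustration under stated assumptions, not the ICLN rule itself: the winner is pulled toward the sample and every losing neuron is pushed away at a small constant rate. The function name `update`, both rate parameters, and the constant punishment rate are hypothetical; the paper's own rule is defined in its earlier sections and may scale the punishment differently.

```python
import math


def update(weights, x, reward_rate=0.1, punish_rate=0.01):
    """Apply one illustrative reward-punishment step in place.

    The closest weight vector (the winner) moves toward x; all other
    weight vectors move away from x. Setting punish_rate=0 gives the
    classic reward-only (winner-take-all) update.
    Returns the index of the winner.
    """
    dists = [math.dist(w, x) for w in weights]
    winner = dists.index(min(dists))
    for j, w in enumerate(weights):
        # Positive rate attracts the winner, negative rate repels losers.
        rate = reward_rate if j == winner else -punish_rate
        weights[j] = [wi + rate * (xi - wi) for wi, xi in zip(w, x)]
    return winner


# Three hypothetical neurons and one sample near the first neuron.
weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
sample = [0.2, 0.1]
winner = update(weights, sample)
print("winner:", winner)
print("weights after one step:", weights)
```

Under this assumed rule, a neuron that never wins is repelled by every sample it sees, which gives an intuition for how redundant neurons could end up excluded from the clusters.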
The SICLN outperforms the other algorithms on all three data sets and exhibits the following advantages: (1) achieves a low misclassification rate in solving classification problems; (2) is able to deal with both labeled and unlabeled data; (3) has the capability to achieve high performance even when part of the data labels are missing; (4) is able to classify highly skewed data; (5) has the capability to identify unseen patterns; (6) is completely independent of the initial number of clusters. The experimental comparison demonstrates that the SICLN has excellent performance in solving classification problems using clustering approaches. The advantages listed above suggest that the SICLN could be an ideal algorithm for fraud detection and network intrusion detection.

Fig. 17. ROC curves of SICLN, k-means, SOM, and ICLN.

The following are the future improvements and directions of this research:

A better estimation method for the reconstruction step may improve the efficiency of the SICLN. If there is a more accurate method to estimate the objective function value, the SICLN could converge faster to the final result.

Further improvement may be made to avoid local optima. Although the reconstruction step, the selection of the learning rate, and the use of weight decay can reduce the chance of the SICLN ending at a local optimum, the current SICLN does not guarantee avoiding local optima. Further research may improve the SICLN in this respect.
The SICLN has the potential to be modified into an incremental training algorithm, although the current SICLN is designed for batch training. An incremental training approach would turn the fraud detection or network intrusion detection system into an automatic adaptive system requiring little or no human interaction.

Introducing fuzzy logic [9] would be a potentially big improvement to the SICLN. The difference between fuzzy clustering and traditional clustering is that the output of fuzzy clustering is the membership function that associates each data point with each cluster. A fuzzy result is helpful for fraud detection and network intrusion detection because it specifies the likelihood of an activity being a fraud or intrusion event. Knowing the possibility of an activity being a fraud or intrusion event can guide the system to perform proper reactions.

Acknowledgments

The authors graciously acknowledge the funding from the Atlantic Canada Opportunity Agency (ACOA) through the Atlantic Innovation Fund (AIF) and through Grant RGPN 227441 from the National Science and Engineering Research Council of Canada (NSERC) to Dr. Ghorbani. The data sets for fraud detection used in the experiments were collected in and provided by Vesta Corporation, United States.

References

[1] A. Abraham, E. Corchado, J.M. Corchado, Hybrid learning machines, Neurocomputing (13-15) (2009) 2729-2730.
[2] C.G. Atkeson, A.W. Moore, S. Schaal, Locally weighted learning, Artificial Intelligence Review 11 (1-5) (1997) 11-73.
[3] R. Brause, T. Langsdorf, M. Hepp, Neural data mining for credit card fraud detection, in: ICTAI, 1999, pp. 103-106.
[4] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Wadsworth International, 1984, pp. 21-28.
[5] J. Cannady, Artificial neural networks for misuse detection, in: Proceedings of the 1998 National Information Systems Security Conference, Arlington, VA, 1998, pp. 443-456.
[6] P.K. Chan, W. Fan, A.L. Prodromidis, S.J. Stolfo, Distributed data mining in credit card fraud detection, IEEE Intelligent Systems 14 (6) (1999) 67-74.
[7] P.K. Chan, S.J.
Stolfo, Experiments in multistrategy learning by meta-learning, in: Proceedings of the Second International Conference on Information and Knowledge Management, 1993, pp. 314-323.
[8] E. Corchado, A. Abraham, A.C.P.L.F. de Carvalho, Hybrid intelligent algorithms and applications, Information Science (14) (2010) 2633-2634.
[9] E. Czogala, J. Leski, Fuzzy and Neuro-fuzzy Intelligent Systems, Physica-Verlag, Heidelberg, 2000, pp. 107-127.
[10] P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, P. Tan, Data mining for network intrusion detection, in: Proceeding NSF Workshop on Next Generation Data Mining, 2002, pp. 21-30.
[11] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000, pp. 517-601.
[12] K. Fox, R. Henning, J. Reed, R. Simonian, A neural network approach towards intrusion detection, in: Proceedings of the 13th National Computer Security Conference, 1990, pp. 125-134.
[13] A.K. Ghosh, A. Schwartzbard, A study in using neural networks for anomaly and misuse detection, in: Proceedings of USENIX Security Symposium, 1999, p. 12.
[14] L. Girardin, An eye on network intruder-administrator shootouts, in: Proceedings of the Workshop on Intrusion Detection and Network Monitoring, 1999, pp. 19-28.
[15] J. Han, M. Kamber, in: Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000, pp. 335-385.
[16] S. Hettich, S.D. Bay, The UCI KDD archive, 1999, <http://kdd.ics.uci.edu>, University of California Department of Information and Computer Science, Irvine, CA, 2004, p. 1.
[17] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264-323.
[18] S.C. Lee, D.V. Heinbuch, Training a neural-network based intrusion detector to recognize novel attacks, IEEE Transactions on Systems, Man & Cybernetics Part A Systems & Humans 31 (4) (2001) 294-299.
[19] W. Lee, S.J. Stolfo, Data mining approaches for intrusion detection, in: SSYM'98: Proceedings of the 7th Conference on USENIX Security Symposium, 1998, p. 6.
[20] J.Z. Lei, A. Ghorbani, Network intrusion detection using an improved competitive learning neural network, in: Second Annual Conference on Communication Networks and Services Research, 2004, pp. 190-197.
[21] E. Leon, O. Nasraoui, J.
Gomez, Network intrusion detection using genetic clustering, in: Genetic and Evolutionary Computation, vol. 3103/2004, 2004, pp. 1312-1313.
[22] R.P. Lippmann, R.K. Cunningham, Improving intrusion detection performance using keyword selection and neural networks, Computer Networks 34 (4) (2000) 597-603.
[23] A. Herrero, E. Corchado, M.A. Pellicer, A. Abraham, MOVIH-IDS: a mobile visualization hybrid intrusion detection system, Neurocomputing (13-15) (2009) 2775-2784.
[24] S. Mukkamala, A. Sung, A. Abraham, Cyber security challenges: designing efficient intrusion detection systems and antivirus tools, in: V. Rao Vemuri, V. Sree Hari Rao (Eds.), Enhancing Computer Security with Smart Technology, CRC Press, USA, 2005.
[25] B.C. Rhodes, J.A. Mahaffey, J.D. Cannady, Multiple self-organizing maps for intrusion detection, in: Proceedings of the 23rd National Information Systems Security Conference, 2000.
[26] E. Skoudis, Counter Hack: A Step-by-step Guide to Computer Attacks and Effective Defenses, 2002.

John Lei is a senior analytic scientist and an analytic platform manager at Vesta Corporation in Portland, Oregon, USA. His research focuses on the development of data mining, machine learning, and statistical modeling techniques for fraud detection and business intelligence in challenging real-world application contexts. John has an M.S. in Computer Science from the University of New Brunswick in Canada and an M.S. in Arts in Economics from Guangxi University in China. John received his B.S. in Computer Science from The University of Electronic Science and Technology of China in 1990.

Ali Ghorbani has held a variety of positions in academia for the past 29 years, including heading up project and research groups and serving as department chair, director of computing services, director of extended learning, and assistant dean. He received his Ph.D. and Master's in Computer Science from the University of New Brunswick and the George Washington University, Washington, DC, USA, respectively. Dr. Ghorbani currently serves as Dean of the Faculty of Computer Science. He holds a UNB Research Scholar position. His current research focus is Web Intelligence, Network and Information Security, Complex Adaptive Systems, and Critical Infrastructure Protection.
He has authored more than 230 reports and research papers in journals and conference proceedings and has edited eight volumes. He has served as General Chair and Program Chair/co-chair for seven International Conferences, and has organized over 10 International Workshops. He has also supervised more than 120 research associates, postdoctoral fellows, and undergraduate and graduate students. Dr. Ghorbani is the founding Director of the Information Security Centre of Excellence at UNB. He is also the coordinator of the Privacy, Security and Trust (PST) network at UNB. Dr. Ghorbani is the co-editor-in-chief of Computational Intelligence, an international journal, and associate editor of the International Journal of Information Technology and Web Engineering and the ISC journal of Information Security. His book, Intrusion detection and Prevention Systems: Concepts and Techniques, was published by Springer in October 2009.