A New Adaptive Ensemble Boosting Classifier for Concept Drifting Stream Data




Kapil K. Wankhade and Snehlata S. Dongre, Members, IACSIT

Abstract - With the emergence of large-volume and high-speed streaming data, the mining of stream data has attracted increasing interest in real applications, including credit card fraud detection, target marketing, network intrusion detection, etc. The major new challenges in stream data mining are: (a) since streaming data may flow in and out indefinitely and at high speed, a stream data mining process is usually expected to scan the data only once, and (b) since the characteristics of the data may evolve over time, it is desirable to incorporate the evolving features of streaming data. This paper introduces a new adaptive ensemble boosting approach for the classification of streaming data with concept drift. The method uses an adaptive sliding window and a Hoeffding tree with naive Bayes adaptive leaves as the base learner. The results show that the proposed algorithm works well in changing environments compared with other ensemble classifiers.

Index Terms - Concept drift, ensemble approach, Hoeffding tree, sliding window, stream data

I. INTRODUCTION

The last twenty years or so have witnessed large progress in machine learning and in its capability to handle real-world applications. Nevertheless, machine learning so far has mostly centered on one-shot data analysis from homogeneous and stationary data, and on centralized algorithms. Most machine learning and data mining approaches assume that examples are independent, identically distributed and generated from a stationary distribution. A large number of learning algorithms also assume that computational resources are unlimited, e.g., that the data fits in main memory. In that context, standard data mining techniques use finite training sets and generate static models. Nowadays we are faced with tremendous amounts of distributed data generated by an ever-increasing number of smart devices. In most cases, this data is transient and may not even be stored permanently.
Our ability to collect data is changing dramatically. Nowadays, computers and small devices send data to other computers. We are faced with distributed sources of detailed data: data that flows continuously, possibly at high speed, generated from non-stationary processes. Examples of data mining applications faced with this scenario include sensor networks, social networks, user modeling, radio frequency identification, web mining, scientific data, financial data, etc.

Manuscript received May 25, 2012; revised June 25, 2012.
K. K. Wankhade is with the Department of Information Technology, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (kaps.wankhade@gmail.com).
S. S. Dongre is with the Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (dongre.sneha@gmail.com).

Most recent learning algorithms [1]-[12] maintain a decision model that continuously evolves over time, taking into account that the environment is non-stationary and that computational resources are limited. The desirable properties of learning systems for efficiently mining continuous, high-volume, open-ended data streams are:
- Require a small, constant time per data example, and use a fixed amount of main memory irrespective of the total number of examples.
- Build a decision model using a single scan over the training data.
- Generate an anytime model that is independent of the order of the examples.
- Ability to deal with concept drift.
- For stationary data, ability to produce decision models nearly identical to the ones a batch learner would obtain.

Ensemble methods are advantageous over single-classifier methods. Ensemble methods combine several models whose individual predictions are aggregated in some manner to form a final prediction. Ensemble learning classifiers often have better accuracy, and they are easier to scale and parallelize than single-classifier methods.

The paper is organized as follows: types of concept drift are discussed in Section II.
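As a concrete illustration of the ensemble idea described above, a minimal weighted majority vote can combine several base models into a final prediction. This is only a sketch of the general principle, not the paper's combination rule; the toy models and weight values below are hypothetical.

```python
# Sketch: combining individual predictions into a final prediction by
# weighted majority vote. Models and weights here are illustrative only.

def weighted_vote(models, weights, x):
    """Aggregate individual predictions into a final prediction."""
    scores = {}
    for model, w in zip(models, weights):
        label = model(x)                      # each base model predicts a label
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)        # label with the highest total weight

# Three toy "models" that each map a feature value to a class label.
models = [lambda x: x > 5, lambda x: x > 3, lambda x: x > 8]
weights = [0.5, 0.3, 0.2]
print(weighted_vote(models, weights, 4))  # -> False (weight 0.7 vs. 0.3)
```

Scaling or parallelizing such an ensemble amounts to training and querying the base models independently, which is one reason ensembles are attractive for streams.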
Section III explains the proposed method, covering boosting, the adaptive sliding window and the Hoeffding tree. Experiments and results are included in Section IV, with conclusions in Section V.

II. TYPES OF CONCEPT DRIFT

Change may come as a result of the changing environment of the problem, e.g., floating probability distributions, migrating clusters of data, loss of old and appearance of new classes and/or features, class label swaps, etc. Fig. 1 shows four examples of simple changes that may occur in a single variable over time. The first plot (Noise) shows changes that are deemed non-significant and are perceived as noise. The classifier should not respond to minor fluctuations, and can use the noisy data to improve its robustness for the underlying stationary distribution. The second plot (Blip) represents a rare event. Rare events can be regarded as outliers in a static distribution. Examples of such events include anomalies in landfill gas emission, fraudulent card transactions, network intrusion and rare medical conditions. Finding a rare event in streaming data can signify the onset of a concept drift, hence methods for online detection of rare events can be a component of the novelty detection paradigm. The last two plots in Fig. 1 (Abrupt and Gradual) show typical examples of the two major types of concept drift represented in a single dimension.

Fig. 1. Types of concept change in streaming data.

III. PROPOSED METHOD

The proposed adaptive ensemble method uses boosting, an adaptive sliding window and a Hoeffding tree to improve performance.

A. Boosting

Boosting is a machine learning meta-algorithm for performing supervised learning. While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight. Because boosting focuses on the misclassified tuples, it risks overfitting the resulting composite model to such data; therefore, the resulting boosted model may sometimes be less accurate than a single model derived from the same data. Bagging is less susceptible to model overfitting. While both can significantly improve accuracy in comparison to a single model, boosting tends to achieve greater accuracy. The reason for the improvement in performance is that boosting generates a hypothesis whose error on the training set is small by combining many hypotheses whose individual errors may be large. The effect of boosting has to do with variance reduction; however, unlike bagging, boosting may also reduce the bias of the learning algorithm.

B. Adaptive Sliding Window

In a data stream environment, data arrives continuously and in huge amounts, so it is impossible to store and process all of it quickly enough. Windowing techniques were introduced to overcome these problems.
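Before turning to the windowing details, the reweighting step described in Section III-A can be sketched as follows. This is an assumed AdaBoost-style update given for illustration, not the exact algorithm of this paper: the weighted error determines the learner's vote weight, then misclassified examples gain weight and correct ones lose weight.

```python
# Illustrative AdaBoost-style boosting round (an assumption for clarity,
# not this paper's exact update rule).
import math

def reweight(weights, predictions, labels):
    """One boosting round: compute weighted error, then reweight examples."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    err = max(min(err, 1 - 1e-10), 1e-10)     # guard against error of 0 or 1
    alpha = 0.5 * math.log((1 - err) / err)   # the weak learner's vote weight
    new = [w * math.exp(alpha if p != y else -alpha)   # up-weight mistakes
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(new)
    return [w / total for w in new], alpha    # renormalize to a distribution

weights = [0.25, 0.25, 0.25, 0.25]
preds   = [1, 0, 1, 1]
labels  = [1, 1, 1, 1]          # only the second example is misclassified
weights, alpha = reweight(weights, preds, labels)
print(weights[1])               # -> 0.5: the misclassified example now
                                #    carries half of the total weight
```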
Window strategies have been used in conjunction with mining algorithms externally to the learning algorithm: the window system monitors the error rate of the current model, which under stable distributions should keep decreasing or at most stabilize; when instead this rate grows significantly, change is declared and the base learning algorithm is invoked to revise or rebuild the model with fresh data. A window is maintained that keeps the most recent examples, and older examples are dropped from the window according to some set of rules. The idea behind the adaptive sliding window [13] is that, whenever two large enough sub-windows of W exhibit distinct enough averages, one can conclude that the corresponding expected values are different, and the older portion of the window is dropped. In other words, W is kept as long as possible while the null hypothesis "µ has remained constant in W" is sustainable up to confidence δ. "Large enough" and "distinct enough" are made precise by choosing an appropriate statistical test for distribution change, which in general involves the value of δ, the lengths of the sub-windows and their contents. At every step, the algorithm outputs the value of µ̂_W as an approximation to µ_W.

C. Hoeffding Tree

The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It typically runs in sublinear time and produces a decision tree nearly identical to that of traditional batch learners. It exploits the idea that a small sample can be enough to choose an optimal splitting attribute, an idea supported mathematically by the Hoeffding bound. Suppose we make N independent observations of a random variable r with range R, where r is an attribute selection measure (in the case of Hoeffding trees, r is the information gain). If we compute the mean r̄ of this sample, the Hoeffding bound states that the true mean of r is at least r̄ − ε with probability 1 − δ, where δ is user-specified and

ε = √( R² ln(1/δ) / (2N) )     (1)

D. Adaptive Ensemble Boosting Classifier

The designed algorithm uses boosting as the ensemble method, together with the sliding window and the Hoeffding tree, to improve ensemble performance. Fig. 2 shows the algorithm for data stream classification. In this algorithm there are m base models, training data D, and initially each base model is assigned a weight w_k equal to 1. The learning algorithm is provided with a series of training datasets {x_i ∈ X : y_i ∈ Y}, i = 1, …, m_t, where t is an index on the changing data; hence x_i is the ith instance obtained from the tth dataset. The algorithm has two inputs: a supervised classification algorithm, the Base classifier, used to train the individual classifiers, and the training data D_t drawn from the current distribution P_t(x, y) at time t. When new data arrives at time t, it is divided into a number of sliding windows W_1, …, W_n. The sliding window is used for change detection and for time and memory management. In this algorithm the sliding window is parameter- and assumption-free in the sense that it automatically detects and adapts to the current rate of change. The window is not maintained explicitly but is compressed using a variant of the exponential histogram technique [14]. The expected values µ̂_W = total/width of the respective windows are calculated. If a change is detected, a change alarm is raised, and using the expected values the algorithm decides whether a sub-window should be dropped. The algorithm then generates one new classifier h_t, which is combined with all previous classifiers to create the composite hypothesis H_t. The decision of H_t serves as the ensemble decision. The error of the composite hypothesis, E_t, is computed on the new dataset, which is then

used to update the distribution weights. Once the distribution is updated, the algorithm calls the Base classifier and asks it to create the tth classifier h_t using data drawn from the current training dataset D_t, i.e., h_t : X → Y. All classifiers generated so far, h_k, k = 1, …, t, are evaluated on the current dataset by computing ε_k, the error of the kth classifier h_k at the tth time step:

ε_k = Σ_{i=1}^{m_t} D_t(i) · [h_k(x_i) ≠ y_i],   for k = 1, 2, …, t     (2)

At the current time step t we therefore have t error measures, one for each classifier generated. The error calculated using equation (2) is then used to update the weight, which is dynamically updated using equation (3):

w'_k = (1 − ε_k) / ε_k     (3)

Using this updated weight, the classifier ensemble is constructed. The performance evaluation of a classifier is important in a concept drifting environment: if the performance of a classifier goes down, the algorithm drops that classifier and adds the next classifier, maintaining the ensemble size. The classifier assigns dynamic sample weights. It keeps a window of length W using only O(log W) memory and O(log W) processing time per item, rather than the O(W) one expects from a naive implementation. It serves as a change detector, since it shrinks the window if and only if there has been significant change in the recent examples, and as an estimator for the current average of the sequence it is reading since, with high probability, older parts of the window with a significantly different average are automatically dropped. The proposed classifier uses the Hoeffding tree as the base learner; because of this, the algorithm works faster and performance increases. The algorithm is lightweight in the sense that it uses little memory: the windowing technique does not store the whole window explicitly but only the statistics required for further computation.

In the proposed algorithm the data structure maintains an approximation of the number of 1s in a sliding window of length W with logarithmic memory and update time. The data structure is adaptive in that it can provide this approximation simultaneously for about O(log W) sub-windows whose lengths follow a geometric law, with no memory overhead with respect to keeping the count for a single window. Keeping exact counts for a fixed window size is impossible in sublinear memory. This problem can be tackled by shrinking or enlarging the window strategically, so that what would otherwise be an approximate count happens to be exact. More precisely, to design the algorithm a parameter M is chosen which controls the amount of memory used, O(M log(W/M)) words.

IV. EXPERIMENTS AND RESULTS

The proposed method has been tested on both real and synthetic datasets. The proposed algorithm has been compared with OzaBag [15], OzaBoost [15], OCBoost [16], OzaBagADWIN [17] and OzaBagASHT [17]. The experiments were performed on a 2.59 GHz Intel Dual Core processor with 3 GB RAM, running Ubuntu 8.04. For testing and comparison, the MOA tool [18] and the Interleaved Test-Then-Train method were used.

Fig. 2. Proposed algorithm.

A. Experiment using Synthetic Data

The synthetic datasets are generated with drifting concepts based on the random radial basis function (RBF) and SEA generators.

Random Radial Basis Function Generator - The RBF generator works as follows: a fixed number of random centroids are generated, each with a random position, a single standard deviation, a class label and a weight. New examples are generated by selecting a center; the displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid, which also determines the class label of the example. This effectively creates a normally distributed hypersphere of examples surrounding each central point with varying densities. Only numeric attributes are generated. Drift is introduced by moving the centroids with constant speed, initialized by a drift parameter.

SEA Generator - This artificial dataset contains abrupt concept drift, first introduced in [5]. It is generated using three attributes, where only the first two attributes are relevant. All three attributes have values between 0 and 10. The points of the dataset are divided into 4 blocks with different concepts. In each block, the classification is done

using f₁ + f₂ ≤ θ, where f₁ and f₂ represent the first two attributes and θ is a threshold value. The most frequent values of θ are 8, 9, 7 and 9.5 for the four data blocks. 10% class noise was then introduced into each block of data by randomly changing the class value of 10% of the instances.

LED Generator - The generated dataset is used to predict the digit displayed on a seven-segment LED display, where each attribute has a 10% chance of being inverted. It has an optimal Bayes classification rate of 74%. The particular configuration of the generator used for the experiments (led) produces 24 binary attributes, 17 of which are irrelevant.

For all the above datasets the ensemble size was set to 10 and the dataset size was set to 1 million instances. All experiments were carried out using the same parameter settings and the prequential method.

Prequential method - In the prequential method the error of a model is computed from the sequence of examples. For each example in the stream, the current model makes a prediction based only on the example's attribute values. The prequential error is computed as an accumulated sum of a loss function between the predicted and observed values.

The comparisons in terms of time (in sec) and memory (in MB) are tabulated in Table I and Table II respectively. The learning curves using the RBF, SEA and LED datasets are depicted in Fig. 3, Fig. 4 and Fig. 5.

Fig. 3. Learning curves using RBF dataset.
Fig. 4. Learning curves using SEA dataset.
Fig. 5. Learning curves using LED dataset.

TABLE I: COMPARISON IN TERMS OF TIME (IN SEC) USING SYNTHETIC DATASETS
TABLE II: COMPARISON IN TERMS OF MEMORY (IN MB) USING SYNTHETIC DATASETS
TABLE III: COMPARISON IN TERMS OF TIME (IN SEC) USING REAL DATASETS

B. Experiment using Real Datasets

In this subsection, the proposed method is tested on real datasets from the UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets.html. The proposed classifier is compared with OzaBag, OzaBagASHT, OzaBagADWIN, OzaBoost and OCBoost. The results are reported in terms of time, accuracy and memory.
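The prequential (Interleaved Test-Then-Train) evaluation described above can be sketched as follows: each arriving example is first used to test the current model, the loss is accumulated, and only then is the example used for training. The simple majority-class "model" below is a hypothetical stand-in for the real learner.

```python
# Sketch of prequential (test-then-train) evaluation with 0-1 loss.
# The majority-class predictor is an illustrative stand-in for a real
# stream classifier.
from collections import Counter

def prequential_error(stream):
    """Accumulated 0-1 loss: predict each label before training on it."""
    counts, losses = Counter(), 0
    for x, y in stream:
        pred = counts.most_common(1)[0][0] if counts else None  # test first
        losses += int(pred != y)                                # accumulate loss
        counts[y] += 1                                          # then train
    return losses / len(stream)

stream = [(0, 'a'), (1, 'a'), (2, 'b'), (3, 'a'), (4, 'a')]
print(prequential_error(stream))  # -> 0.4 (first example and the 'b' are missed)
```

Because every example is tested exactly once before being learned, this scheme needs no holdout set, which is why it is the standard evaluation in MOA-style stream experiments.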
The comparison in terms of time, accuracy and memory is tabulated in Table III, Table IV and Table V.

TABLE IV: COMPARISON IN TERMS OF ACCURACY (IN %) USING REAL DATASETS

TABLE V: COMPARISON IN TERMS OF MEMORY (IN MB) USING REAL DATASETS

V. CONCLUSION

In this paper we have investigated the major issues in classifying large-volume, high-speed and dynamically changing streaming data, and proposed a novel approach, the adaptive ensemble boosting classifier, which uses an adaptive sliding window and a Hoeffding tree to improve performance. Compared with other classifiers, the adaptive ensemble boosting classifier has distinct features: it is dynamically adaptive, uses less memory and processes data quickly. Thus, the adaptive ensemble boosting classifier represents a new methodology for the effective classification of dynamic, fast-growing and large-volume data streams.

REFERENCES

[1] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69-101, 1996.
[2] Y. Freund and R. Schapire, "Experiments with a new boosting algorithm," in Proceedings of the International Conference on Machine Learning, 1996, pp. 148-156.
[3] P. Domingos and G. Hulten, "Mining high-speed data streams," in KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2000, pp. 71-80.
[4] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proc. ACM SIGKDD, San Francisco, CA, USA, 2001, pp. 97-106.
[5] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2001, pp. 377-382.
[6] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept drifting data streams using ensemble classifiers," in KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2003, pp. 226-235.
[7] J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: a new ensemble method for tracking concept drift," in Third International Conference on Data Mining, ICDM '03, IEEE CS Press, 2003, pp. 123-130.
[8] B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan, "Maintaining variance and k-medians over data stream windows," in Proceedings of the 22nd Symposium on Principles of Database Systems, 2003, pp. 234-243.
[9] J. Gama, R. Rocha, and P. Medas, "Accurate decision trees for mining high-speed data streams," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 523-528.
[10] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence - SBIA 2004, 17th Brazilian Symposium on Artificial Intelligence, volume 3171 of Lecture Notes in Computer Science, Springer Verlag, 2004, pp. 286-295.
[11] J. Z. Kolter and M. A. Maloof, "Using additive expert ensembles to cope with concept drift," in Proc. International Conference on Machine Learning (ICML), Bonn, Germany, 2005, pp. 449-456.
[12] G. Cormode, S. Muthukrishnan, and W. Zhuang, "Conquering the divide: continuous clustering of distributed data streams," in ICDE, 2007, pp. 1036-1045.
[13] A. Bifet and R. Gavaldà, "Learning from time-changing data with adaptive windowing," in SIAM International Conference on Data Mining, 2007, pp. 443-449.
[14] M. Datar, A. Gionis, P. Indyk, and R. Motwani, "Maintaining stream statistics over sliding windows," SIAM Journal on Computing, vol. 14, no. 1, pp. 27-45, 2002.
[15] N. Oza and S. Russell, "Online bagging and boosting," in Artificial Intelligence and Statistics, Morgan Kaufmann, 2001, pp. 105-112.
[16] R. Pelossof, M. Jones, I. Vovsha, and C. Rudin, "Online coordinate boosting," http://arxiv.org/abs/0810.4553, 2008.
[17] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà, "New ensemble methods for evolving data streams," in KDD '09, ACM, Paris, 2009, pp. 139-148.
[18] G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive Online Analysis," [Online]. Available: http://sourceforge.net/projects/moa-datastream, 2007.

K. K. Wankhade received the B.E. degree in Information Technology from Swami Ramanand Teerth Marathwada University, Nanded, India in 2007 and the M.Tech. degree in Computer Engineering from the University of Pune, Pune, India in 2010. He is currently working as an Assistant Professor in the Department of Information Technology at G. H. Raisoni College of Engineering, Nagpur, India. He has a number of publications in reputed journals and international conferences such as Springer and IEEE. His research is on data stream mining, machine learning, decision support systems, artificial neural networks and embedded systems. He has published a book titled Data Streams Mining: Classification and Application, LAP Publication House, Germany, 2010. Mr. Kapil K. Wankhade is a member of the IACSIT and IEEE organizations. He is currently working as a reviewer for Springer's Evolving Systems journal and the IJCEE journal.

S. S. Dongre received the B.E. degree in Computer Science and Engineering from Pt. Ravishankar Shukla University, Raipur, India in 2007 and the M.Tech. degree in Computer Engineering from the University of Pune, Pune, India in 2010. She is currently working as an Assistant Professor in the Department of Computer Science and Engineering at G. H. Raisoni College of Engineering, Nagpur, India. She has a number of publications in reputed international conferences such as IEEE and in journals. Her research is on data stream mining, machine learning, decision support systems, ANN and embedded systems. She has published a book titled Data Streams Mining: Classification and Application, LAP Publication House, Germany, 2010. Ms. Snehlata S. Dongre is a member of the IACSIT, IEEE and ISTE organizations.