International Journal of Machine Learning and Computing, Vol. 2, No. 3, June 2012

Efficient Evolutionary Data Mining Algorithms Applied to the Insurance Fraud Prediction

Jenn-Long Liu, Chien-Liang Chen, and Hsing-Hui Yang

Abstract

This study proposes two kinds of Evolutionary Data Mining (EvoDM) algorithms for insurance fraud prediction. One is GA-Kmeans, which combines the K-means algorithm with a genetic algorithm (GA). The other is MPSO-Kmeans, which combines the K-means algorithm with Momentum-type Particle Swarm Optimization (MPSO). The dataset used in this study is composed of 6 attributes with 5000 instances of car insurance claims. These 5000 instances are divided into 4000 training data and 1000 test data. Two different initial cluster centers for each attribute are set by means of (a) selecting the centers randomly from the training set and (b) averaging all data of the training set, respectively. Thereafter, the proposed GA-Kmeans and MPSO-Kmeans are employed to determine the optimal weights and final cluster centers for the attributes, and the accuracy of prediction for the test set is computed based on the optimal weights and final cluster centers. Results show that the two presented EvoDM algorithms significantly enhance the accuracy of insurance fraud prediction compared with the pure K-means algorithm.

Index Terms: Evolutionary data mining, genetic algorithm, insurance fraud prediction, momentum-type particle swarm optimization.

I. INTRODUCTION

This study aims to use two evolutionary data mining (EvoDM) algorithms to evaluate whether a case is insurance fraud or not. Insurance fraud is a behavior in which the beneficiary makes up fake affairs to apply for compensation so as to obtain illegal benefits for himself/herself or for other people. Generally, insurance fraud is characterized by low cost and high profit, and it is an intelligent crime. Moreover, insurance fraud could be an international crime and could happen in any kind of insurance case. Recently, more and more new types of insurance have been offered on the market, so detecting possible fraud events has become more important than ever for a manager or analyst of an insurance company.
This work proposes two kinds of EvoDM algorithms, which combine a clustering algorithm, K-means, with two evolutionary algorithms, Genetic Algorithm (GA) and Momentum-type Particle Swarm Optimization (MPSO). The two proposed EvoDM algorithms are termed GA-Kmeans and MPSO-Kmeans, respectively. This work uses 5000 instances of insurance cases for data mining. The 5000 instances are divided into 4000 instances for the training set and 1000 instances for the test set. Furthermore, this work applies the K-means, GA-Kmeans and MPSO-Kmeans algorithms to classify the cases of the training set as fraud or not, and also evaluates the accuracy of fraud prediction for the test set.

Manuscript received April 5, 2012; revised May 5, 2012. This work was supported in part by the National Science Council of the Republic of China under Grant Number NSC 00-222-E-24-040. The authors are with the Information Management Department, I-Shou University, Kaohsiung 84001, Taiwan (e-mail: jlliu@isu.edu.tw; muffin.chen@gmail.com; nancyyang@ms.adc.com.tw).

II. CRISP-DM

CRISP-DM (Cross Industry Standard Process for Data Mining) is a data mining process model that describes the approaches commonly used by expert data miners to solve problems. CRISP-DM was conceived in late 1996 by SPSS (then ISL), NCR and DaimlerChrysler (then Daimler-Benz). It is also the leading methodology used by data miners. CRISP-DM breaks the process of data mining into six major phases as follows.

A. Business Understanding

This phase focuses on understanding the business project objectives and requirements, converting them into a data mining problem definition, and designing a preliminary plan.

B. Data Understanding

This phase collects initial data and then proceeds with activities to become familiar with the data, identify data quality problems, develop first insights, or detect interesting subsets to form hypotheses about hidden information.

C. Data Preparation

This phase includes the activities needed to construct the final dataset from the original data. These tasks are likely to be performed repeatedly and in no prescribed order. They include table, record and attribute selection, as well as transformation and cleaning of the data to be fed into the modeling tools.

D. Modeling

Here different modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Techniques for the same data mining problem often have specific requirements on the form of the data, which often makes it necessary to go back to the data preparation phase.

E. Evaluation

By this phase, a model that appears to be of high quality from a data analysis perspective has been built. Thoroughly evaluating the model and reviewing the steps performed to construct it are necessary to be sure that it achieves the business objectives. A key objective is to determine whether some important business issue remains undecided. A decision on the use of the data mining results should then be made.
F. Deployment

Completing the model is often not the final goal, even though its purpose is to extract more information from the data. The information gained needs to be organized and presented in a form that the customer can use. This often involves applying working models within an organization's decision making processes. This phase can be simple or complex, depending on the requirements. It is often the customer, rather than the data analyst, who carries out this phase. It is therefore important for the customer to understand which actions need to be carried out in order to make use of the created models.

III. LITERATURE REVIEW

Data mining is a crucial step in the Knowledge Discovery in Databases (KDD) process; it consists of applying data analysis and knowledge discovery algorithms to produce useful patterns (or rules) over the datasets. Although scholars define data mining in several different ways, its purpose is to discover useful knowledge and information from databases. Generally, data mining technologies include (1) Association Rules, (2) Classification, (3) Clustering Analysis, (4) Regression Analysis, (5) Particle Swarm Optimization and (6) Time Series Analysis, and so on [4], [12]. This work proposes two kinds of EvoDM algorithms, which combine a clustering algorithm, K-means, with two evolutionary algorithms, Genetic Algorithm (GA) and Momentum-type Particle Swarm Optimization (MPSO). The following subsections introduce clustering analysis, GA, and MPSO.

A. Clustering Analysis

Clustering analysis is a main method for exploratory data mining and a common technique for statistical data analysis. It can be applied to machine learning, image analysis, pattern recognition, information retrieval, and bioinformatics. The K-means algorithm is one of the most often used clustering algorithms. When the number of clusters is fixed to k, K-means is formally defined as an optimization problem: specify k cluster centers and assign each instance to the cluster whose center is at the smallest distance from the instance [4]. The flowchart of K-means is depicted in Fig. 1.

Fig. 1. Flowchart of K-means algorithm

B. Genetic Algorithm

Genetic Algorithm is a stochastic search algorithm based on the Darwinian principle of natural selection and natural genetics. The selection is biased toward more highly fit individuals, so the average fitness of the population tends to improve from one generation to the next. In general, GA generates an optimal solution by means of reproduction, crossover, and mutation operators [3], [9]. The fitness of the best individual is also expected to improve over time, and the best individual may be selected as a solution after several generations. Generally, the pseudo-code of the GA is as follows:

Procedure: The Hybrid Genetic Algorithm
Begin
    Create initial population randomly;
    do {
        Choose a pair of parents from population;   /* REPRODUCTION */
        children = CROSSOVER(parent1, parent2);
        MUTATION(children);
        Parents ← Children
    } while (stopping criterion not satisfied);
End;

The flowchart of GA is depicted in Fig. 2.

Fig. 2. Flowchart of GA algorithm

C. Particle Swarm Optimization

The PSO algorithm was first introduced by Kennedy and Eberhart [6] in 1995. The concept of PSO is that each individual flies in the search space with a velocity which is dynamically adjusted according to its own flying experience and its companions' flying experience. Each individual is treated as a volume-less particle in the D-dimensional search space. Shi and Eberhart modified the original PSO in 1999 [11]. The equations are expressed as follows:

$$v_i^{k+1} = w\, v_i^{k} + c_1 r_1 \left( Pbest_i^{k} - x_i^{k} \right) + c_2 r_2 \left( Gbest^{k} - x_i^{k} \right) \quad (1)$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}, \quad i = 1, 2, \ldots, N_{particle} \quad (2)$$

where $w$ is the inertia weight, and $c_1$ and $c_2$ are the cognitive and social learning rates, respectively. The random numbers $r_1$ and $r_2$ are uniformly distributed in the range [0, 1]. Equation (1) reveals that a large inertia weight promotes global exploration, whereas a small value promotes a local search. The flowchart of PSO is depicted in Fig. 3.

Fig. 3. Flowchart of PSO algorithm

D. Momentum-type Particle Swarm Optimization

Liu and Lin proposed the MPSO in 2007 [8] to improve the computational efficiency and solution accuracy of Shi and Eberhart's PSO [10]. The original PSO developed by Kennedy and Eberhart [6] supposes that the ith particle flies over a hyperspace, with its position and velocity given by $x_i$ and $v_i$. The best previous position of the ith particle is denoted by $Pbest_i$. The term $Gbest$ represents the best particle with the highest function value in the population. Liu and Lin's MPSO computes the next flying velocity and position of the particle at iteration k + 1 using the following heuristic equations:

$$v_i^{k+1} = \beta \left( \Delta v_i^{k} \right) + c_1 r_1 \left( Pbest_i^{k} - x_i^{k} \right) + c_2 r_2 \left( Gbest^{k} - x_i^{k} \right) \quad (3)$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}, \quad i = 1, 2, \ldots, N_{particle} \quad (4)$$

where $c_1$ and $c_2$ are the cognitive and social learning rates, respectively. The random numbers $r_1$ and $r_2$ are uniformly distributed in the range [0, 1]. The value of $\beta$ is a positive number ($0 \le \beta < 1$) termed the momentum constant, which controls the rate of change of the velocity vector. Equation (3) gives each particle the ability of dynamic self-adaptation in the search space over time. That is, the ith particle can memorize the previous velocity variation state and automatically adjust the next velocity value during movement.

E. C4.5 Algorithm

To evaluate the algorithmic performance of the two presented EvoDM algorithms, this paper also applies two existing algorithms, C4.5 and Naïve Bayes, to the insurance fraud prediction. C4.5 is an algorithm developed by Ross Quinlan to generate a decision tree. C4.5 is an extension of Quinlan's earlier ID3 algorithm. C4.5 first constructs a complete decision tree. Then, at each internal node, it prunes the decision tree according to the defined Predicted Error Rate. The decision trees generated by C4.5 can be used for classification. C4.5 is often referred to as a statistical classifier [13].

F. Naïve Bayes Algorithm

The Naïve Bayes algorithm is a simple probabilistic classifier based on applying Bayes' theorem with strong (naïve) independence assumptions. The main operating principle of the Naïve Bayesian classifier is to learn the central concept of the training samples by classifying them according to the selected properties. The learned categorization is then applied to unclassified data objects to forecast the category of each test example.

IV. EVOLUTIONARY DATA MINING ALGORITHM

In the data mining field, clustering analysis is a very important technology for KDD. This study aims to optimize the clustering of insurance fraud cases with EvoDM algorithms based on the K-means algorithm [4], [12]. In general, the K-means algorithm is a popular method for this kind of clustering problem, but its drawback is that the accuracy of the clustering results needs further improvement. Therefore, the K-means clustering algorithm is combined with genetic algorithms, as in hybrid genetic models [2], [7], to improve the accuracy of prediction. This study proposes two kinds of EvoDM algorithms, GA-based K-means and MPSO-based K-means, which are termed GA-Kmeans and MPSO-Kmeans, respectively.
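As a rough illustration of the momentum-type update in Eqs. (3) and (4), the sketch below minimizes an arbitrary objective over the range [0, 1]. Several details are assumptions rather than the authors' settings: the velocity variation $\Delta v$ is taken as the difference between the two most recent velocities, positions are clamped to the bounds, lower objective values are treated as better, and the names `mpso_minimize` and `sphere` together with the values of `beta`, `c1`, `c2`, the swarm size and the iteration count are hypothetical.

```python
import random


def mpso_minimize(objective, dim, n_particles=20, iters=100,
                  beta=0.3, c1=2.0, c2=2.0, lo=0.0, hi=1.0, seed=0):
    """Minimal momentum-type PSO sketch following Eqs. (3)-(4)."""
    rng = random.Random(seed)
    # Random initial positions inside [lo, hi]; velocities start at zero.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    v_prev = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best (lowest objective value).
    pbest = [p[:] for p in x]
    pbest_f = [objective(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Assumption: delta_v is the previous change in velocity.
                delta_v = v[i][d] - v_prev[i][d]
                new_v = (beta * delta_v
                         + c1 * r1 * (pbest[i][d] - x[i][d])
                         + c2 * r2 * (gbest[d] - x[i][d]))   # Eq. (3)
                v_prev[i][d], v[i][d] = v[i][d], new_v
                # Eq. (4), followed by clamping to the bounds (an assumption).
                x[i][d] = min(hi, max(lo, x[i][d] + new_v))
            f = objective(x[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[i][:], f
    return gbest, gbest_f


def sphere(p):
    """Toy objective with its minimum at (0.5, ..., 0.5)."""
    return sum((pi - 0.5) ** 2 for pi in p)


best_x, best_f = mpso_minimize(sphere, dim=6)
print(best_x, best_f)
```

The toy `sphere` objective only stands in for the clustering error that the six attribute weights would be tuned against; the best value found can never be worse than the best initial particle, because the global best is only replaced by strictly better positions.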
The flowcharts of GA-Kmeans and MPSO-Kmeans are depicted in Figs. 4 and 5. The objective function, $Obj(w)$, for GA-Kmeans and MPSO-Kmeans minimizes the clustering errors between the predicted classification results ($C_{pred}$) and the original ones ($C_{actual}$) over the n training data, in order to determine the optimal weight ($w$) for each attribute, as follows:

$$Obj(w) = \mathrm{Min} \sum_{i=1}^{n} \left| C_{pred}^{(i)} - C_{actual}^{(i)} \right| \quad (5)$$
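To make Eq. (5) concrete, the following sketch evaluates the objective for a given weight vector and a fixed pair of cluster centers. How the weights enter the K-means distance is not spelled out in the text, so a weighted squared Euclidean distance is assumed here; the mapping of cluster index 0/1 to approved/fraud, the function names, and the four toy records are likewise hypothetical.

```python
def weighted_sq_dist(x, center, w):
    """Assumed distance: sum over attributes of w_j * (x_j - c_j)^2."""
    return sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, center))


def predict(x, centers, w):
    """Index of the nearest cluster center (0 = approved, 1 = fraud)."""
    return min(range(len(centers)),
               key=lambda k: weighted_sq_dist(x, centers[k], w))


def objective(w, centers, X, y):
    """Eq. (5): number of training instances whose predicted class is wrong."""
    return sum(abs(predict(x, centers, w) - yi) for x, yi in zip(X, y))


# Toy data with two normalized attributes (hypothetical values).
X = [(0.9, 0.1), (0.8, 0.9), (0.2, 0.2), (0.1, 0.8)]
y = [0, 0, 1, 1]
centers = [(0.6, 0.2), (0.4, 0.8)]

for w in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    errors = objective(w, centers, X, y)
    print(w, errors, f"accuracy = {1 - errors / len(X):.2f}")
```

With these toy values the first attribute alone separates the classes, so the weight vector (1, 0) gives zero errors while (0, 1) and (1, 1) each misclassify two records. This is the kind of signal an evolutionary search over the weights can exploit.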
This work specified six weights ($w_1, w_2, w_3, w_4, w_5, w_6$) for the GA-Kmeans and MPSO-Kmeans algorithms because the dataset has six attributes. All values of $w$ are specified in the range [0, 1].

Fig. 4. Flowchart of GA-Kmeans algorithm

Fig. 5. Flowchart of MPSO-Kmeans algorithm

V. RESULTS & DISCUSSION

A. Dataset Sample

This study uses 5000 instances of insurance claims with six variables [12]. The six variables are age, gender, claim amount, tickets, claim times, and accompanied with attorney. Age is the age of the claimant. Gender is the claimant's gender. Claim amount is the amount of the claim, and tickets is the number of tickets the claimant has received before. Claim times is the number of times the claimant has claimed before. Accompanied with attorney shows whether the claimant is accompanied by an attorney or not. The data types of age, claim amount, tickets and claim times are all numeric. The value of gender is male or female. The value of accompanied with attorney is the lawyer's name or none. Partial data of the original and normalized insurance claim datasets are listed in Tables I and II, respectively. The normalization formulas are presented below.

(1) Age: younger than 20 years old is 0, 20-40 years old is (age-20)/20, 40-60 years old is 1, 60-70 years old is 1-(age-60)/10, older than 70 years old is 0.
(2) Gender: male is 1, female is 0.
(3) Claim amount: Max(1-claim amount/5000, 0).
(4) Tickets: 0 tickets is 1, 1 ticket is 0.6, over 2 tickets is 0.
(5) Claim times: none is 1, one time is 0.5, over 2 times is 0.
(6) Accompanied with attorney: none is 1, others is 0.
(7) Outcome: approved is 0, fraud is 1.

TABLE I: PARTIAL DATA OF ORIGINAL INSURANCE FRAUD DATASET.

Instance  Age  Gender  Claim Amount  Tickets  Claim Times  Attorney  Outcome
1         54   male    2700          0        0            none      approved
2         39   male    1000          0        0            none      approved
3         8    female  1200          0        1            none      approved
4         42   female  1800          1        0            none      approved
5         8    male    5000          0        3            Gold      fraud
6         51   female  1900          1        0            none      approved
7         44   male    2300          0        0            none      approved
8         23   female  4000          3        2            Smith     approved
9         34   female  2500          0        0            none      approved
10        56   male    2500          0        0            none      approved

TABLE II: PARTIAL DATA OF NORMALIZED INSURANCE FRAUD DATASET.

Instance  Age   Gender  Claim Amount  Tickets  Claim Times  Attorney  Outcome
1         1     1       0.46          1        1            0         0
2         0.95  1       0.8           1        1            0         0
3         0     0       0.76          1        0.5          0         0
4         1     0       0.64          0.6      1            0         0
5         0     1       0             1        0            1         1
6         1     0       0.62          0.6      1            0         0
7         1     1       0.54          1        1            0         0
8         0.15  0       0.2           0        0            1         0
9         0.7   0       0.5           1        1            0         0
10        1     1       0.5           1        1            0         0

B. Case 1: Initial Cluster Centers are Selected Randomly from Training Set

Table III lists the accuracy of the three different algorithms for Case 1, in which the initial cluster centers are selected randomly from the training set. The accuracy obtained by GA-Kmeans is the same as that of MPSO-Kmeans. Also, the solutions obtained using the two EvoDM algorithms are clearly better than that of K-means. Table IV lists the optimal weights of the six attributes computed by GA-Kmeans and MPSO-Kmeans. The attributes of claim amount, claim times and attorney were more significant than the other attributes for determining the clusters.
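The normalization rules (1) to (7) of Section V-A can be sketched as a small function. This is only an illustration: the function names are hypothetical, "over 2" is read as "2 or more" for tickets and claim times, and rule (6) is coded as printed in the text (none maps to 1), whereas Table II may use a different coding for the attorney column.

```python
def normalize_age(age):
    """Rule (1): piecewise-linear score that equals 1 between 40 and 60."""
    if age < 20 or age > 70:
        return 0.0
    if age < 40:
        return (age - 20) / 20
    if age <= 60:
        return 1.0
    return 1 - (age - 60) / 10


def normalize_record(age, gender, claim_amount, tickets, claim_times,
                     attorney, outcome):
    """Apply rules (1)-(7) to one raw insurance claim record."""
    return [
        normalize_age(age),                              # rule (1)
        1.0 if gender.lower() == "male" else 0.0,        # rule (2)
        max(1 - claim_amount / 5000, 0.0),               # rule (3)
        {0: 1.0, 1: 0.6}.get(tickets, 0.0),              # rule (4), 2+ -> 0 assumed
        {0: 1.0, 1: 0.5}.get(claim_times, 0.0),          # rule (5), 2+ -> 0 assumed
        1.0 if attorney.lower() == "none" else 0.0,      # rule (6) as printed
        1 if outcome.lower() == "fraud" else 0,          # rule (7)
    ]


# First instance of Table I: 54, male, 2700, 0 tickets, 0 claims, none, approved.
row = normalize_record(54, "male", 2700, 0, 0, "none", "approved")
print([round(v, 2) for v in row])
```

Applied to the first instance of Table I, the age, gender and claim amount entries come out as 1, 1 and 0.46, matching the first three attribute columns of Table II.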
TABLE III: COMPARISON OF PREDICTION RESULTS OF CASE 1.

Data set      K-means only  GA-Kmeans  MPSO-Kmeans
Training set  35.62%        85.20%     85.20%
Test set      37.90%        86.32%     86.32%

TABLE IV: OPTIMAL WEIGHTS OF CASE 1 COMPUTED BY PRESENTED EVODM ALGORITHMS.

Weights for 6 attributes  GA-Kmeans  MPSO-Kmeans
w1 (Age)                  0.08937    0.06027
w2 (Gender)               0.0308     0.
w3 (Claim Amount)         0.94993    0.46535
w4 (Tickets)              0.0052     0.04573
w5 (Claim times)          0.63839    0.6703
w6 (Attorney)             0.54930    0.9

C. Case 2: Initial Cluster Centers are Determined by Averaging Training Set

Table V lists the accuracy of the three different algorithms for Case 2, in which the initial centers are obtained by averaging the whole training set for each attribute. The overall accuracy of the three algorithms for this case was higher than that of the previous one. Computational results also showed that the accuracy of the two presented EvoDM algorithms was better than that of the K-means algorithm. Moreover, Table VI lists the optimal weights of the six attributes obtained using the GA-Kmeans and MPSO-Kmeans algorithms. The attributes of claim amount and attorney were relatively more significant than the other attributes for determining the clusters. Accordingly, the two presented EvoDM algorithms not only achieve high prediction accuracy, but also determine the significant attributes automatically from all attributes based on the evaluated weights. This attribute information is most useful for a manager or staff member who has the authority to approve or reject a claim when a client submits a settlement of claims involving insurance cases.

TABLE V: COMPARISON OF PREDICTION RESULTS OF CASE 2.

Data set      K-means only  GA-Kmeans  MPSO-Kmeans
Training set  88.30%        97.60%     97.60%
Test set      89.72%        96.50%     96.50%

TABLE VI: OPTIMAL WEIGHTS OF CASE 2 COMPUTED BY PRESENTED EVODM ALGORITHMS.

Weights for 6 attributes  GA-Kmeans  MPSO-Kmeans
w1 (Age)                  0.09542    0.8947
w2 (Gender)               0.40204    0.3705
w3 (Claim Amount)         0.94579    0.9
w4 (Tickets)              0.7894     0.26487
w5 (Claim times)          0.09067    0.0202
w6 (Attorney)             0.968      0.69686

D. Confusion Matrix

Table VII lists the confusion matrices of the four different algorithms for the training set. The overall accuracy of the four algorithms was very high (over 96%). Although the accuracy of C4.5 is as high as 98.5%, it cannot classify any fraud case correctly. Naïve Bayes correctly predicts 12 fraud cases. Both EvoDM algorithms classify one more fraud case correctly than Naïve Bayes. Table VIII lists the confusion matrices of the four algorithms for the test set. The accuracies of all four algorithms are over 96%. C4.5 cannot correctly predict any fraud case. The accuracy of Naïve Bayes is a little higher than that of the EvoDM algorithms. The EvoDM algorithms correctly predict 5 fraud cases, whereas Naïve Bayes correctly predicts 3.

TABLE VII: CONFUSION MATRIX OF C4.5, NAÏVE BAYES, AND EVO-DM ALGORITHMS FOR TRAINING SET.

Algorithm    C4.5        Naïve Bayes  GA-Kmeans   MPSO-Kmeans
Classified   a     b     a     b      a     b     a     b
a=approved   3940  0     3896  44     3891  49    3891  49
b=fraud      60    0     48    12     47    13    47    13
Accuracy     98.5%       96.8%        97.6%       97.6%

TABLE VIII: CONFUSION MATRIX OF C4.5, NAÏVE BAYES, AND EVO-DM ALGORITHMS FOR TEST SET.

Algorithm    C4.5        Naïve Bayes  GA-Kmeans   MPSO-Kmeans
Classified   a     b     a     b      a     b     a     b
a=approved   978   0     965   13     960   18    960   18
b=fraud      22    0     19    3      17    5     17    5
Accuracy     97.8%       96.8%        96.5%       96.5%

VI. CONCLUSION

This study applied the K-means algorithm and two EvoDM algorithms, GA-Kmeans and MPSO-Kmeans, to insurance fraud prediction. The two EvoDM algorithms are hybrids that incorporate the K-means algorithm with GA and MPSO, respectively. Two conditions for the initial cluster centers were studied to check the robustness of the algorithms. In our computational results, the accuracy of test set prediction obtained using the GA-Kmeans and MPSO-Kmeans algorithms was 86.32% for Case 1, in which the initial cluster centers were selected randomly from the training set, whereas the accuracy obtained using the K-means algorithm was only 37.9%. From the weight distribution of Case 1, the attributes of claim amount, claim times and attorney were relatively important in judging insurance fraud. Furthermore, this work changed the initial cluster centers, in what is termed Case 2, by averaging all the training data for each attribute. The accuracy of test set prediction obtained using the GA-Kmeans and MPSO-Kmeans algorithms for Case 2 was significantly enhanced to 96.5%, while the accuracy obtained using the K-means algorithm was 89.72%. From the weight distribution of Case 2, the attributes of claim amount and attorney were relatively important in judging insurance fraud. Accordingly, the accuracy of insurance fraud prediction can be enhanced by using the two presented EvoDM algorithms.
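The confusion matrices of Section V-D can also be summarized numerically. The sketch below recomputes the accuracy and the share of fraud cases detected (recall) from the test-set counts of Table VIII; the off-diagonal counts are read so that each row sums to the 978 approved and 22 fraud test cases, and the helper name `summarize` is hypothetical.

```python
def summarize(matrix):
    """Return (accuracy, fraud recall) for a 2x2 confusion matrix.

    Rows are actual classes and columns are predicted classes,
    both in the order [approved, fraud].
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    fraud_recall = tp / (tp + fn)
    return accuracy, fraud_recall


# Test-set counts as read from Table VIII.
test_set = {
    "C4.5": [[978, 0], [22, 0]],
    "Naive Bayes": [[965, 13], [19, 3]],
    "GA-Kmeans": [[960, 18], [17, 5]],
    "MPSO-Kmeans": [[960, 18], [17, 5]],
}

for name, matrix in test_set.items():
    acc, rec = summarize(matrix)
    print(f"{name:12s} accuracy = {acc:.1%}  fraud recall = {rec:.1%}")
```

Under these counts C4.5 detects none of the 22 fraud cases despite its 97.8% accuracy, Naïve Bayes detects 3 (about 13.6%), and the two EvoDM algorithms detect 5 (about 22.7%).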
The main purpose of insurance fraud prediction is to find the fraud cases correctly. Normally, the probability of fraud cases is so small that the accuracy remains high even if fraud cases are misjudged. As listed in Tables VII and VIII, even though the C4.5 algorithm cannot predict any fraud case correctly, its prediction accuracy is still 97.8% or higher. Although GA-Kmeans and MPSO-Kmeans are not the best in prediction accuracy, they find more fraud cases than the others.

REFERENCES

[1] W. H. Au, K. C. C. Chan, and X. Yao, A Novel Evolutionary Data Mining Algorithm with Applications to Churn Prediction, IEEE Transactions on Evolutionary Computation, vol. 7, pp. 532-545, Dec. 2003.
[2] A. Brabazon and P. Keenan, A Hybrid Genetic Model for the Prediction of Corporate Failure, Computational Management Science, vol. 1, no. 3, pp. 293-310, Oct. 2004.
[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley, 1989.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[5] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, John Wiley & Sons, 2002.
[6] J. Kennedy and R. Eberhart, Particle Swarm Optimization, in Proc. IEEE Int. Conf. on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, vol. 4, Nov. 1995, pp. 1942-1948.
[7] P. C. Lin and J. S. Chen, A Genetic-Based Hybrid Approach to Corporate Failure Prediction, International Journal of Electronic Finance, vol. 2, no. 2, pp. 241-255, Mar. 2008.
[8] J. L. Liu and J. H. Lin, Evolutionary Computation of Unconstrained and Constrained Problems Using a Novel Momentum-type Particle Swarm Optimization, Engineering Optimization, vol. 39, no. 3, pp. 287-305, Apr. 2007.
[9] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer-Verlag, 1999.
[10] Y. Shi and R. Eberhart, A Modified Particle Swarm Optimizer, in Proc. of IEEE International Conference on Evolutionary Computation (ICEC), pp. 69-73, May 1998.
[11] Y. Shi and R. Eberhart, Empirical study of particle swarm optimization, in Proceedings of the 1999 Congress on Evolutionary Computation, July 1999, pp. 1945-1950.
[12] D. Olson and Y. Shi, Introduction to Business Data Mining, McGraw-Hill Education, 2008.
[13] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.