Efficient Evolutionary Data Mining Algorithms Applied to the Insurance Fraud Prediction




International Journal of Machine Learning and Computing, Vol. 2, No. 3, June 2012

Jenn-Long Liu, Chien-Liang Chen, and Hsing-Hui Yang

Abstract

This study proposes two kinds of Evolutionary Data Mining (EvoDM) algorithms for insurance fraud prediction. One is GA-Kmeans, which combines the K-means algorithm with a genetic algorithm (GA). The other is MPSO-Kmeans, which combines the K-means algorithm with Momentum-type Particle Swarm Optimization (MPSO). The dataset used in this study is composed of 6 attributes with 5000 instances of car insurance claims. These 5000 instances are divided into 4000 training data and 1000 test data. Two different initial cluster centers for each attribute are set by (a) selecting the centers randomly from the training set and (b) averaging all data of the training set, respectively. Thereafter, the proposed GA-Kmeans and MPSO-Kmeans are employed to determine the optimal weights and final cluster centers for the attributes, and the prediction accuracy for the test set is computed from these optimal weights and final cluster centers. Results show that the two presented EvoDM algorithms significantly enhance the accuracy of insurance fraud prediction compared with the pure K-means algorithm.

Index Terms: Evolutionary data mining, genetic algorithm, insurance fraud prediction, momentum-type particle swarm optimization.

I. INTRODUCTION

This study aims to use two evolutionary data mining (EvoDM) algorithms to evaluate whether a case is an insurance fraud or not. Insurance fraud is a behavior in which the beneficiary fabricates events to apply for compensation so that he or she can obtain illegal benefits for himself or herself or for other people. Generally, insurance fraud is characterized by low cost and high profit, and it is an intelligent crime. Moreover, insurance fraud can be an international crime and can happen in any kind of insurance case. Recently, more and more new types of insurance have been offered on the market, so detecting possible fraud events has become more important than ever for managers and analysts at insurance companies.
This work proposes two kinds of EvoDM algorithms, which combine a clustering algorithm, K-means, with two evolutionary algorithms, Genetic Algorithm (GA) and Momentum-type Particle Swarm Optimization (MPSO). The two proposed EvoDM algorithms are termed GA-Kmeans and MPSO-Kmeans, respectively. This work uses 5000 instances of insurance cases for data mining. The 5000 instances are divided into 4000 instances for the training set and 1000 instances for the test set. Furthermore, this work applies the K-means, GA-Kmeans, and MPSO-Kmeans algorithms to identify fraud in the training set and to evaluate the accuracy of fraud prediction for the test set.

Manuscript received April 5, 2012; revised May 5, 2012. This work was supported in part by the National Science Council of the Republic of China under Grant Number NSC 100-2221-E-214-040. The authors are with the Information Management Department, I-Shou University, Kaohsiung 84001, Taiwan (e-mail: jlliu@isu.edu.tw; muffin.chen@gmail.com; nancyyang@ms.adc.com.tw).

II. CRISP-DM

CRISP-DM (Cross Industry Standard Process for Data Mining) is a data mining process model that describes the approaches commonly used by expert data miners to solve problems. CRISP-DM was conceived in late 1996 by SPSS (then ISL), NCR, and DaimlerChrysler (then Daimler-Benz). It is the leading methodology used by data miners. CRISP-DM breaks the data mining process into six major phases, as follows.

A. Business Understanding
This phase focuses on understanding the business project objectives and requirements, converting them into a data mining problem definition, and designing a preliminary plan.

B. Data Understanding
This phase collects initial data and then proceeds with activities to become familiar with the data, identify data quality problems, develop first insights, or detect interesting subsets to form hypotheses about hidden information.

C. Data Preparation
This phase includes the activities needed to construct the final dataset from the original data. These tasks are likely to be performed repeatedly and in no prescribed order. They include table, record, and attribute selection, as well as the transformation and cleaning of data for the modeling tools.

D. Modeling
Here different modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Techniques for the same data mining problem often have specific requirements on the form of the data, which often makes it necessary to go back to the data preparation phase.

E. Evaluation
By this phase, a model that appears to have high quality from a data analysis perspective has been built. Thoroughly evaluating the model and reviewing the steps performed to construct it are necessary to be certain that it achieves the business objectives. A key objective is to determine whether some important business issue has not yet been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be made.

F. Deployment
Completing a model is often not the final goal, even though its purpose is to extract more information from the data. The information gained from the original data needs to be further organized and presented in a form that the customer can use. This often includes applying working models within an organization's decision-making processes. This phase can be simple or complex, depending on the requirements. It is often the customer, rather than the data analyst, who carries out this phase. It is important for the customer to understand which actions need to be carried out in order to make use of the created models.

III. LITERATURE REVIEW

Data mining is a crucial step in the Knowledge Discovery in Databases (KDD) process, which consists of applying data analysis and knowledge discovery algorithms to produce useful patterns (or rules) over the datasets. Although scholars define data mining in several different ways, its purpose is to discover useful knowledge and information from databases. Generally, data mining technologies include (1) Association Rules, (2) Classification, (3) Clustering Analysis, (4) Regression Analysis, (5) Particle Swarm Optimization, and (6) Time Series Analysis, among others [4], [12]. This work proposes two kinds of EvoDM algorithms, which combine a clustering algorithm, K-means, with two evolutionary algorithms, Genetic Algorithm (GA) and Momentum-type Particle Swarm Optimization (MPSO). The following subsections introduce clustering analysis, GA, PSO, and MPSO.

A. Clustering Analysis
Clustering analysis is a main method for exploratory data mining and a common technique for statistical data analysis. It can be applied to machine learning, image analysis, pattern recognition, information retrieval, and bioinformatics. The K-means algorithm is one of the most frequently used clustering methods. When the number of clusters is fixed to k, K-means is formally defined as an optimization problem: specify k cluster centers and assign each instance to the cluster whose center is at the smallest distance from the instance [4]. The flowchart of K-means is depicted in Fig. 1.

Fig. 1. Flowchart of K-means algorithm

B. Genetic Algorithm
The genetic algorithm is a stochastic search algorithm based on the Darwinian principle of natural selection and natural genetics. The selection is biased toward more highly fit individuals, so the average fitness of the population tends to improve from one generation to the next. In general, GA generates an optimal solution by using reproduction, crossover, and mutation operators [3], [9]. The fitness of the best individual is also expected to improve over time, and the best individual may be selected as a solution after several generations. Generally, the pseudo-code of the GA is as follows:

Procedure: The Hybrid Genetic Algorithm
Begin
    Create initial population randomly;
    do {
        Choose a pair of parents from population;   /* REPRODUCTION */
        children = CROSSOVER(parent1, parent2);
        MUTATION(children);
        Parents <- Children;
    } while (stopping criterion not satisfied);
End;

The flowchart of GA is depicted in Fig. 2.

Fig. 2. Flowchart of GA algorithm

C. Particle Swarm Optimization
The PSO algorithm was first introduced by Kennedy and Eberhart [6] in 1995. The concept of PSO is that each individual flies in the search space with a velocity that is dynamically adjusted according to its own flying experience and its companions' flying experience. Each individual is treated as a volume-less particle in the D-dimensional search space. Shi and Eberhart modified the original PSO in 1999 [11]. The equations are expressed as follows:

$$v_i^{k+1} = w v_i^k + c_1 r_1 (Pbest_i^k - x_i^k) + c_2 r_2 (Gbest^k - x_i^k) \quad (1)$$
$$x_i^{k+1} = x_i^k + v_i^{k+1}, \quad i = 1, 2, \ldots, N_{particle} \quad (2)$$

where $c_1$ and $c_2$ are the cognitive and social learning rates, respectively. The random functions $r_1$ and $r_2$ are uniformly distributed in the range [0, 1]. Equation (1) reveals that a large inertia weight $w$ promotes global exploration, whereas a small value promotes a local search. The flowchart of PSO is depicted in Fig. 3.

Fig. 3. Flowchart of PSO algorithm (initialize the position and speed of each particle; calculate the problem function to find the extreme values of the individual and the group; renew the speed and position of each particle with the two update formulas; repeat until the stop condition is satisfied; output the best parameters)

D. Momentum-type Particle Swarm Optimization
Liu and Lin proposed an MPSO in 2007 [8] to improve the computational efficiency and solution accuracy of Shi and Eberhart's PSO [10]. The original PSO developed by Kennedy and Eberhart [6] supposes that the ith particle flies over a hyperspace, with its position and velocity given by $x_i$ and $v_i$. The best previous position of the ith particle is denoted by $Pbest_i$. The term $Gbest$ represents the best particle, with the highest function value, in the population. Liu and Lin's MPSO determines the next flying velocity and position of the particle at iteration k + 1 using the following heuristic equations:

$$v_i^{k+1} = \beta (\Delta v_i^k) + c_1 r_1 (Pbest_i^k - x_i^k) + c_2 r_2 (Gbest^k - x_i^k) \quad (3)$$
$$x_i^{k+1} = x_i^k + v_i^{k+1}, \quad i = 1, 2, \ldots, N_{particle} \quad (4)$$

where $c_1$ and $c_2$ are the cognitive and social learning rates, respectively. The random functions $r_1$ and $r_2$ are uniformly distributed in the range [0, 1]. The value of $\beta$ is a positive number ($0 \le \beta < 1$) termed the momentum constant, which controls the rate of change in the velocity vector. Equation (3) gives each particle the ability of dynamic self-adaptation in the search space over time. That is, the ith particle can memorize the previous velocity variation state and automatically adjust the next velocity value during movement.

E. C4.5 Algorithm
To evaluate the performance of the two presented EvoDM algorithms, this paper also applied two existing algorithms, C4.5 and Naïve Bayes, to the insurance fraud prediction. C4.5 is an algorithm developed by Ross Quinlan to generate a decision tree. It is an extension of Quinlan's earlier ID3 algorithm. C4.5 first constructs a complete decision tree. Then, at each internal node, it prunes the decision tree according to the defined predicted error rate. The decision trees generated by C4.5 can be used for classification. C4.5 is often referred to as a statistical classifier [13].

F. Naïve Bayes Algorithm
The Naïve Bayes algorithm is a simple probabilistic classifier based on applying Bayes' theorem with strong (naïve) independence assumptions. The main operating principle of the Naïve Bayesian classifier is to learn and memorize the central concept of the training samples by classifying them according to the selected properties. The learned categorization concept is then applied to the unclassified data objects to forecast their category, which yields the target class of the test example.

IV. EVOLUTIONARY DATA MINING ALGORITHM

In the data mining field, clustering analysis is a very important technology for KDD. This study aims to optimize insurance fraud clustering with EvoDM algorithms based on the K-means algorithm [4], [12]. In general, the K-means algorithm is a popular method for solving this kind of clustering problem, but its drawback is that the accuracy of the clustering results needs to be further improved. Therefore, the K-means clustering algorithm is combined with evolutionary algorithms, following hybrid genetic models [2], [7], to improve the accuracy of prediction. This study proposes two kinds of EvoDM algorithms, GA-based K-means and MPSO-based K-means, which are termed GA-Kmeans and MPSO-Kmeans, respectively.
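As an illustration of Equations (3) and (4), the following is a minimal sketch of one momentum-type velocity and position update for a whole swarm. It assumes that the velocity variation is the difference between the two most recent velocities, which the text does not spell out. The function name `mpso_step`, the default parameter values, and the clamping of positions to [0, 1] (chosen because the attribute weights are restricted to that range) are illustrative assumptions, not the authors' implementation.

```python
import random

def mpso_step(x, v, v_prev, pbest, gbest, beta=0.5, c1=2.0, c2=2.0, rng=random):
    """One MPSO-style update for every particle (Equations (3) and (4)).

    x, v, v_prev, pbest: lists of per-particle vectors (lists of floats).
    gbest: best position found so far by the whole swarm.
    Assumption: delta_v = v - v_prev (velocity variation of the last step).
    """
    new_x, new_v = [], []
    for xi, vi, vpi, pi in zip(x, v, v_prev, pbest):
        xi_next, vi_next = [], []
        for d in range(len(xi)):
            r1, r2 = rng.random(), rng.random()    # uniform random factors
            delta_v = vi[d] - vpi[d]               # assumed meaning of delta v
            vel = (beta * delta_v
                   + c1 * r1 * (pi[d] - xi[d])     # cognitive term
                   + c2 * r2 * (gbest[d] - xi[d])) # social term
            pos = min(1.0, max(0.0, xi[d] + vel))  # keep weights in [0, 1]
            vi_next.append(vel)
            xi_next.append(pos)
        new_v.append(vi_next)
        new_x.append(xi_next)
    return new_x, new_v
```

A particle that already sits on both its personal best and the global best, with an unchanged velocity, receives a zero update, which is a quick sanity check of the three terms.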
The flowcharts of GA-Kmeans and MPSO-Kmeans are depicted in Figs. 4 and 5. The objective function, Obj(w), for GA-Kmeans and MPSO-Kmeans minimizes the clustering errors between the predicted classification results ($C_{pred}$) and the original ones ($C_{actual}$) over the n training data, in order to determine the optimal weight (w) for each attribute, as follows:

$$Obj(w) = \min \sum_{i=1}^{n} \left| C_{pred}^{(i)} - C_{actual}^{(i)} \right| \quad (5)$$
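To make Equation (5) concrete, the sketch below evaluates the objective for a given weight vector by counting the training instances whose predicted cluster differs from the actual outcome. It assumes a weighted squared Euclidean distance, two given cluster centers labeled 0 (approved) and 1 (fraud), and hypothetical helper names; the paper does not specify these details, so this is only one plausible reading.

```python
def weighted_sq_distance(row, center, weights):
    """Weighted squared Euclidean distance between an instance and a center."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, row, center))

def predict_cluster(row, centers, weights):
    """Index of the nearest center; here 0 = approved, 1 = fraud (assumed)."""
    distances = [weighted_sq_distance(row, c, weights) for c in centers]
    return distances.index(min(distances))

def objective(weights, data, labels, centers):
    """Equation (5): number of instances whose predicted cluster differs
    from the actual outcome (lower is better)."""
    return sum(abs(predict_cluster(row, centers, weights) - y)
               for row, y in zip(data, labels))

# Toy, made-up data with two attributes (not taken from the paper).
centers = [[1.0, 1.0], [0.0, 0.0]]
data = [[1.0, 1.0], [0.9, 0.2]]
labels = [0, 1]
print(objective([1.0, 1.0], data, labels, centers))  # equal weights
print(objective([0.0, 1.0], data, labels, centers))  # ignore first attribute
```

In this toy case the second instance is misclassified with equal weights and classified correctly once the first attribute is given zero weight, which is the kind of effect the evolutionary search over w is meant to exploit.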

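A GA search over weight vectors, in the spirit of the pseudo-code in Section III-B, might look like the following sketch. The tournament selection, one-point crossover, Gaussian mutation, elitism, and all parameter values are assumptions made for illustration; the paper does not state which operators or settings were used. The `fitness` argument stands for any function to be minimized, such as the objective of Equation (5).

```python
import random

def ga_minimize(fitness, n_weights=6, pop_size=20, generations=50,
                mutation_rate=0.1, seed=0):
    """Minimize `fitness` over weight vectors in [0, 1]^n_weights."""
    rng = random.Random(seed)
    # Create the initial population randomly.
    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]

    def tournament():
        # Pick two random individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]                      # elitism: keep the best so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()   # reproduction
            cut = rng.randrange(1, n_weights)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_weights):            # mutation, clipped to [0, 1]
                if rng.random() < mutation_rate:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = children                            # Parents <- Children
        best = min(pop, key=fitness)
    return best, fitness(best)
```

Because the best individual is copied unchanged into each new generation, the returned fitness can never be worse than that of the best individual in the initial population.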
Fig. 4. Flowchart of GA-Kmeans algorithm

Fig. 5. Flowchart of MPSO-Kmeans algorithm (initialize the position and speed of each particle; compute the K-means result to set the objective formula for momentum-type PSO; calculate the problem function to find the extreme values of the individual and the group; renew the speed and position of each particle; repeat until the stop condition is satisfied; output the best parameters)

V. RESULTS & DISCUSSION

A. Dataset Sample
This study uses 5000 instances of insurance claims with six variables [12]. The six variables are age, gender, claim amount, tickets, claim times, and accompanied with attorney. Age is the age of the claimant. Gender is the claimant's gender. Claim amount is the amount of the claim, and tickets is the number of tickets the claimant has received before. Claim times is the number of times the claimant has claimed before. Accompanied with attorney shows whether or not the claimant is accompanied by an attorney. The data types of age, claim amount, tickets, and claim times are all numeric. The value of gender is male or female. The value of accompanied with attorney is the lawyer's name or none. Partial data of the original and normalized insurance claim datasets are listed in Tables I and II, respectively. The normalization formulas are presented below.

(1) Age: younger than 20 years old is 0, 20-40 years old is (age-20)/20, 40-60 years old is 1, 60-70 years old is 1-(age-60)/10, older than 70 years old is 0.
(2) Gender: male is 1, female is 0.
(3) Claim amount: Max(1 - claim amount/5000, 0).
(4) Tickets: 0 tickets is 1, 1 ticket is 0.6, 2 or more tickets is 0.
(5) Claim times: none is 1, one time is 0.5, 2 or more times is 0.
(6) Accompanied with attorney: none is 1, others is 0.
(7) Outcome: approved is 0, fraud is 1.

This work specifies six weights (w1, w2, w3, w4, w5, w6) for the GA-Kmeans and MPSO-Kmeans algorithms, one for each of the six attributes of the dataset. All values of w are specified in the range [0, 1].

TABLE I: PARTIAL DATA OF ORIGINAL INSURANCE FRAUD DATASET.
Instance  Age  Gender  Claim Amount  Tickets  Claim Times  Attorney  Outcome
1         54   male    2700          0        0            none      approved
2         39   male    1000          0        0            none      approved
3         18   female  1200          0        1            none      approved
4         42   female  1800          1        0            none      approved
5         18   male    5000          0        3            Gold      fraud
6         51   female  1900          1        0            none      approved
7         44   male    2300          0        0            none      approved
8         23   female  4000          3        2            Smith     approved
9         34   female  2500          0        0            none      approved
10        56   male    2500          0        0            none      approved

TABLE II: PARTIAL DATA OF NORMALIZED INSURANCE FRAUD DATASET.
Instance  Age   Gender  Claim Amount  Tickets  Claim Times  Attorney  Outcome
1         1     1       0.46          1        1            0         0
2         0.95  1       0.8           1        1            0         0
3         0     0       0.76          1        0.5          0         0
4         1     0       0.64          0.6      1            0         0
5         0     1       0             1        0            1         1
6         1     0       0.62          0.6      1            0         0
7         1     1       0.54          1        1            0         0
8         0.15  0       0.2           0        0            1         0
9         0.7   0       0.5           1        1            0         0
10        1     1       0.5           1        1            0         0

B. Case 1: Initial Cluster Centers are Selected Randomly from Training Set
Table III lists the accuracy of the three algorithms for Case 1, in which the initial cluster centers are selected randomly from the training set. The accuracy obtained by GA-Kmeans is the same as that of MPSO-Kmeans. Clearly, the solutions obtained using the two EvoDM algorithms were better than that of K-means. Table IV lists the optimal weights of the six attributes computed by GA-Kmeans and MPSO-Kmeans. The attributes claim amount, claim times, and attorney were more significant than the other attributes for determining the clusters.
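Returning to the normalization rules (1) to (6) listed in Section V-A, they can be sketched as a single function. This is a minimal sketch that follows the rules as stated in the list; the function name and argument names are hypothetical, and "over 2" is read as "2 or more". Rule (6) is implemented as written (none is 1), although the attorney column of Table II appears to use the opposite coding.

```python
def normalize_claim(age, gender, claim_amount, tickets, claim_times, attorney):
    """Map one raw claim record to six attribute scores in [0, 1]."""
    # (1) Age: piecewise-linear score that peaks between 40 and 60.
    if age < 20 or age > 70:
        age_score = 0.0
    elif age < 40:
        age_score = (age - 20) / 20
    elif age <= 60:
        age_score = 1.0
    else:
        age_score = 1 - (age - 60) / 10
    # (2) Gender: male is 1, female is 0.
    gender_score = 1.0 if gender.lower() == "male" else 0.0
    # (3) Claim amount: Max(1 - amount/5000, 0).
    claim_score = max(1 - claim_amount / 5000, 0.0)
    # (4) Tickets and (5) claim times: lookup tables, 0 for 2 or more.
    ticket_score = {0: 1.0, 1: 0.6}.get(tickets, 0.0)
    times_score = {0: 1.0, 1: 0.5}.get(claim_times, 0.0)
    # (6) Attorney: none is 1, any named attorney is 0 (as stated in the list).
    attorney_score = 1.0 if attorney.lower() == "none" else 0.0
    return [age_score, gender_score, claim_score,
            ticket_score, times_score, attorney_score]

# First instance of Table I: a 54-year-old male claiming 2700.
scores = normalize_claim(54, "male", 2700, 0, 0, "none")
print(round(scores[2], 2))  # claim amount score, 0.46 as in Table II
```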

TABLE III: COMPARISON OF PREDICTION RESULTS OF CASE 1.
Data set       Clustering (K-means only)   GA-Kmeans   MPSO-Kmeans
Training set   35.62%                      85.20%      85.20%
Test set       37.90%                      86.32%      86.32%

TABLE IV: OPTIMAL WEIGHTS OF CASE 1 COMPUTED BY PRESENTED EVODM ALGORITHMS.
Weights for 6 attributes   GA-Kmeans   MPSO-Kmeans
w1 (Age)                   0.08937     0.06027
w2 (Gender)                0.0308      0.
w3 (Claim Amount)          0.94993     0.46535
w4 (Tickets)               0.0052      0.04573
w5 (Claim times)           0.63839     0.6703
w6 (Attorney)              0.54930     0.9

C. Case 2: Initial Cluster Centers are Determined by Averaging Training Set
Table V lists the accuracy of the three algorithms for Case 2, in which the initial centers are obtained by averaging each attribute over the whole training set. The overall accuracy of the three algorithms in this case was higher than in the previous one. Computational results also showed that the accuracy of the two presented EvoDM algorithms was better than that of the K-means algorithm. Moreover, Table VI lists the optimal weights of the six attributes obtained using the GA-Kmeans and MPSO-Kmeans algorithms. The attributes claim amount and attorney were relatively more significant than the other attributes for determining the clusters. Accordingly, the two presented EvoDM algorithms not only achieve high prediction accuracy but also determine the significant attributes automatically from all attributes based on the evaluated weights. This attribute information is most useful for a manager or staff member who has the authority to decide whether or not to approve a settlement when a client submits an insurance claim.

TABLE V: COMPARISON OF PREDICTION RESULTS OF CASE 2.
Data set       Clustering (K-means only)   GA-Kmeans   MPSO-Kmeans
Training set   88.30%                      97.60%      97.60%
Test set       89.72%                      96.50%      96.50%

TABLE VI: OPTIMAL WEIGHTS OF CASE 2 COMPUTED BY PRESENTED EVODM ALGORITHMS.
Weights for 6 attributes   GA-Kmeans   MPSO-Kmeans
w1 (Age)                   0.09542     0.8947
w2 (Gender)                0.40204     0.3705
w3 (Claim Amount)          0.94579     0.9
w4 (Tickets)               0.7894      0.26487
w5 (Claim times)           0.09067     0.0202
w6 (Attorney)              0.968       0.69686

D. Confusion Matrix
Table VII lists the confusion matrices of the four algorithms for the training set. The overall accuracy of the four algorithms was very high (over 96%). Although the accuracy of C4.5 is as high as 98.5%, it does not classify any fraud case correctly. Naïve Bayes correctly predicts 12 fraud cases. Both EvoDM algorithms classify one more fraud case correctly than Naïve Bayes. Table VIII lists the confusion matrices of the four algorithms for the test set. The accuracies of all four algorithms are over 96%. C4.5 cannot correctly predict any fraud case. The accuracy of Naïve Bayes is a little higher than that of the EvoDM algorithms. The EvoDM algorithms correctly predict 5 fraud cases, and Naïve Bayes correctly predicts 3.

TABLE VII: CONFUSION MATRIX OF C4.5, NAÏVE BAYES, AND EVO-DM ALGORITHMS FOR TRAINING SET.
(rows: actual class; columns: predicted class a or b)
Algorithm      C4.5          Naïve Bayes   GA-Kmeans     MPSO-Kmeans
               a      b      a      b      a      b      a      b
a = approved   3940   0      3896   44     3891   49     3891   49
b = fraud      60     0      48     12     47     13     47     13
accuracy       98.5%         96.8%         97.6%         97.6%

TABLE VIII: CONFUSION MATRIX OF C4.5, NAÏVE BAYES, AND EVO-DM ALGORITHMS FOR TEST SET.
(rows: actual class; columns: predicted class a or b)
Algorithm      C4.5          Naïve Bayes   GA-Kmeans     MPSO-Kmeans
               a      b      a      b      a      b      a      b
a = approved   978    0      965    13     960    18     960    18
b = fraud      22     0      19     3      17     5      17     5
accuracy       97.8%         96.8%         96.5%         96.5%

VI. CONCLUSION

This study applied the K-means algorithm and two EvoDM algorithms, GA-Kmeans and MPSO-Kmeans, to insurance fraud prediction. The two EvoDM algorithms are hybrids that incorporate the K-means algorithm with GA and MPSO, respectively. Two conditions for the initial cluster centers were studied to check the robustness of the algorithms. From our computational results, the accuracy of test set prediction obtained using the GA-Kmeans and MPSO-Kmeans algorithms was 86.32% for Case 1, in which the initial cluster centers were selected randomly from the training set, whereas the accuracy obtained using the K-means algorithm was only 37.9%.
From the weight distribution of Case 1, the attributes claim amount, claim times, and attorney were relatively important in judging insurance fraud. Furthermore, this work changed the initial cluster centers, in what is termed Case 2, by averaging each attribute over all the training data. The accuracy of test set prediction obtained using the GA-Kmeans and MPSO-Kmeans algorithms for Case 2 was significantly enhanced to 96.5%, while the accuracy obtained using the K-means algorithm was 89.72%. From the weight distribution of Case 2, the attributes claim amount and attorney were relatively important in judging insurance fraud. Accordingly, the accuracy of insurance fraud prediction can be enhanced by using the two presented EvoDM algorithms.
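The counts in the confusion matrix of Table VIII can be turned into accuracy and fraud recall with a few lines of arithmetic. The sketch below uses the test-set counts for C4.5 and for the EvoDM algorithms, assuming that each row of the table sums to the C4.5 row totals (978 approved and 22 fraud cases); the function name is hypothetical.

```python
def accuracy_and_fraud_recall(aa, ab, ba, bb):
    """aa: approved predicted approved, ab: approved predicted fraud,
    ba: fraud predicted approved,    bb: fraud predicted fraud."""
    total = aa + ab + ba + bb
    accuracy = (aa + bb) / total
    # Share of actual fraud cases that were flagged as fraud.
    recall = bb / (ba + bb) if (ba + bb) else 0.0
    return accuracy, recall

c45 = accuracy_and_fraud_recall(978, 0, 22, 0)
evodm = accuracy_and_fraud_recall(960, 18, 17, 5)
print(f"C4.5:  accuracy={c45[0]:.3f}, fraud recall={c45[1]:.3f}")
print(f"EvoDM: accuracy={evodm[0]:.3f}, fraud recall={evodm[1]:.3f}")
```

With so few fraud cases, a classifier that labels every claim as approved still scores a high accuracy while detecting no fraud at all, which is why the fraud recall is worth reporting next to the accuracy.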

The main purpose of insurance fraud prediction is to find the fraud cases correctly. Normally, the proportion of fraud cases is so small that the accuracy remains high even if fraud cases are misjudged. As listed in Tables VII and VIII, even though the C4.5 algorithm cannot predict any fraud case correctly, its prediction accuracy is still 97.8% or higher. Although GA-Kmeans and MPSO-Kmeans are not the best in prediction accuracy, they find more fraud cases than the other algorithms.

REFERENCES

[1] W. H. Au, K. C. C. Chan, and X. Yao, "A Novel Evolutionary Data Mining Algorithm with Applications to Churn Prediction," IEEE Transactions on Evolutionary Computation, vol. 7, pp. 532-545, Dec. 2003.
[2] A. Brabazon and P. Keenan, "A Hybrid Genetic Model for the Prediction of Corporate Failure," Computational Management Science, vol. 1, no. 3, pp. 293-310, Oct. 2004.
[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley, 1989.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[5] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, John Wiley & Sons, 2002.
[6] J. Kennedy and R. Eberhart, "Particle Swarm Optimization," in Proc. IEEE Int. Conf. on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, vol. 4, Nov. 1995, pp. 1942-1948.
[7] P. C. Lin and J. S. Chen, "A Genetic-Based Hybrid Approach to Corporate Failure Prediction," International Journal of Electronic Finance, vol. 2, no. 2, pp. 241-255, Mar. 2008.
[8] J. L. Liu and J. H. Lin, "Evolutionary Computation of Unconstrained and Constrained Problems Using a Novel Momentum-type Particle Swarm Optimization," Engineering Optimization, vol. 39, no. 3, pp. 287-305, Apr. 2007.
[9] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer-Verlag, 1999.
[10] Y. Shi and R. Eberhart, "A Modified Particle Swarm Optimizer," in Proc. of IEEE International Conference on Evolutionary Computation (ICEC), pp. 69-73, May 1998.
[11] Y. Shi and R. Eberhart, "Empirical study of particle swarm optimization," in Proceedings of the 1999 Congress on Evolutionary Computation, July 1999, pp. 1945-1950.
[12] D. Olson and Y. Shi, Introduction to Business Data Mining, McGraw-Hill Education, 2008.
[13] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.