Detecting Credit Card Fraud using Periodic Features

Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg, Luxembourg Emal: al.bahnsen@gmal.com, djamla.aouada@un.lu, aleksandar.stojanovc@rwth-aachen.de, bjorn.ottersten@un.lu Abstract When constructng a credt card fraud detecton model, t s very mportant to extract the rght features from transactonal data. Ths s usually done by aggregatng the transactons n order to observe the spendng behavoral patterns of the customers. In ths paper we propose to create a new set of features based on analyzng the perodc behavor of the tme of a transacton usng the von Mses dstrbuton. Usng a real credt card fraud dataset provded by a large European card processng company, we compare state-of-the-art credt card fraud detecton models, and evaluate how the dfferent sets of features have an mpact on the results. By ncludng the proposed perodc features nto the methods, the results show an average ncrease n savngs of 3%. The aforementoned card processng company s currently ncorporatng the methodology proposed n ths paper nto ther fraud detecton system. Keywords Fraud detecton; von Mses dstrbuton; Costsenstve learnng I. INTRODUCTION Credt card fraud has been a growng problem worldwde. Durng 202 the total level of fraud reached.33 bllon Euros n the Sngle Euro Payments Area, whch represents an ncrease of 4.8% compared wth 20 []. Moreover, payments across non tradtonal channels (moble, nternet,...) accounted for 60% of the fraud, whereas t was 46% n 2008. Ths opens new challenges as new fraud patterns emerge, and current fraud detecton systems are less successful n preventng these frauds. Furthermore, fraudsters constantly change ther strateges to avod beng detected, somethng that makes tradtonal fraud detecton tools, such as expert rules, nadequate [2]. The use of machne learnng n fraud detecton has been an nterestng topc n recent years. Dfferent detecton systems that are based on machne learnng technques have been successfully used for ths problem, n partcular: neural networks [3], Bayesan learnng [4], artfcal mmune systems [5], hybrd models [6], support vector machnes [7], peer group analyss [8], onlne learnng [9] and socal network analyss [2]. When constructng a credt card fraud detecton model, t s very mportant to use those features that allow accurate classfcaton. Typcal models only use raw transactonal features, such as tme, amount, place of the transacton. However, these approaches do not take nto account the spendng behavor of the customer, whch s expected to help dscover fraud patterns [5]. A standard way to nclude these behavoral spendng patterns s proposed n [0], where Whtrow et al. proposed a transacton aggregaton strategy n order to take nto account a customer spendng behavor. The computaton of the aggregated features conssts n groupng the transactons made durng the last gven number of hours, frst by card or account number, then by transacton type, merchant group, country or other, followed by calculatng the number of transactons or the total amount spent on those transactons. In ths paper, we propose a new set of features based on analyzng the tme of a transacton. The logc behnd t s that a customer s expected to make transactons at smlar hours. We, hence, propose a new method for creatng features based on the perodc behavor of a transacton tme, usng the von Mses dstrbuton []. In partcular, these new tme features should estmate f the tme of a new transacton s wthn the confdence nterval of the prevous transacton tme. Furthermore, usng a real credt card fraud dataset provded by a large European card processng company, we compare the dfferent sets of features (raw, aggregated and perodc), usng two knds of classfcaton algorthms; cost-nsenstve [2] and example-dependent cost-senstve [3]. The results show an average ncrease n the savngs of 3% by usng the proposed perodc features. Addtonally, the outcome of ths paper s beng currently used to mplement a state-of-the-art fraud detecton system, that wll help to combat fraud once the mplementaton stage s fnshed. The remander of the paper s organzed as follows. In Secton 2, we dscuss current approaches to create the features used n fraud detecton models. Then, n Secton 3, we present our proposed methodology to create perodc features. Afterwards, the expermental setup and the results are gven n Sectons 4 and 5. Fnally, conclusons and dscussons of the paper are presented n Secton 6. II. TRANSACTION AGGREGATION STRATEGIES When constructng a credt card fraud detecton algorthm, the ntal set of features (raw features) nclude nformaton regardng ndvdual transactons. It s observed throughout the lterature, that regardless of the study, the set of raw features s qute smlar. Ths s because the data collected durng a credt card transacton must comply wth nternatonal fnancal reportng standards. In TABLE I, the typcal credt card fraud detecton raw features are summarzed. Several studes use only the raw features n carryng ther analyss [3], [4]. However, as noted n [4], a sngle transacton nformaton s not suffcent to detect a fraudulent transacton, snce usng only the raw features leaves behnd mportant nformaton such as the consumer spendng behavor, whch s usually used by commercal fraud detecton systems [0]. To deal wth ths, n [5], a new set of features were proposed such that the nformaton of the last transacton made wth the same credt card s also used to make a predcton.

TABLE I. SUMMARY OF TYPICAL RAW CREDIT CARD FRAUD DETECTION FEATURES Attrbute name Descrpton Transacton ID Transacton dentfcaton number Tme Date and tme of the transacton Account number Identfcaton number of the customer Card number Identfcaton of the credt card Transacton type e. Internet, ATM, POS,... Entry mode e. Chp and pn, magnetc strpe,... Amount Amount of the transacton n Euros Merchant code Identfcaton of the merchant type Merchant group Merchant group dentfcaton Country Country of trx Country 2 Country of resdence Type of card e. Vsa debt, Mastercard, Amercan Express... Gender Gender of the card holder Age Card holder age Bank Issuer bank of the card The objectve, s to be able to detect very dssmlar contnuous transactons wthn the purchases of a customer. The new set of features nclude: tme snce the last transacton, prevous amount of the transacton, prevous country of the transacton. Nevertheless, these features do not take nto account consumer behavor other than the last transacton made by a clent, ths leads to havng an ncomplete profle of customers. A more compressve way to take nto account a customer spendng behavor s to derve some features usng a transacton aggregaton strategy. Ths methodology was ntally proposed n [0]. The dervaton of the aggregaton features conssts n groupng the transactons made durng the last gven number of hours, frst by card or account number, then by transacton type, merchant group, country or other, followed by calculatng the number of transactons or the total amount spent on those transactons. Ths methodology has been used by a number of studes [7] [9], [5] [8]. When aggregatng a customer transactons, there s an mportant queston on how much to accumulate, n the sense that the margnal value of new nformaton may dmnsh as tme passes. [0] dscuss that aggregatng 0 transactons s not lkely to be more nformatve than aggregatng 00 transactons. Indeed, when tme passes, nformaton lose ther value, n the sense that a customer spendng patterns are not expected to reman constant over the years. In partcular, Whtrow et al. defne a fxed tme frame to be 24, 60 or 68 hours. Let S be a set of N transactons,.e., N = S, where each transacton s represented by the feature vector x = [x, x2,..., xk ], where k s the number of features, and labelled usng the class label y {0, }. Then, the process of aggregatng features conssts n selectng those transactons that were made n the prevous t p hours, for each transacton n the dataset S, { S agg T RX agg (S,, t p ) = x amt ( l x d l = x d ) ( hours(x tme, x tme l ) < t p ) } N l=, () where T RX agg s a functon that creates a subset of S assocated wth a transacton wth respect to the tme frame t p, N = S, beng the cardnalty of a set, x tme s the tme of transacton, x amt s the amount of transacton, the customer dentfcaton number of transacton, and x d TABLE II. EXAMPLE CALCULATION OF AGGREGATED FEATURES. WHERE, x a IS THE NUMBER OF TRANSACTIONS IN THE LAST 24 HOURS AND x a2 IS THE SUM OF THE TRANSACTIONS AMOUNTS IN THE SAME TIME PERIOD. Raw features Agg. features TrxId CardId Tme Type Country Amt. x a x a2 0/0 8:20 POS Lux 250 0 0 2 0/0 20:35 POS Lux 400 250 3 0/0 22:30 ATM Lux 250 2 650 4 02/0 00:50 POS Ger 50 3 900 5 02/0 9:8 POS Ger 00 3 700 6 02/0 23:45 POS Ger 50 2 50 7 03/0 06:00 POS Lux 0 3 400 hours(t, t 2 ) s a functon that calculates the number of hours between the tmes t and t 2. Afterwards the feature number of transactons and amount of transactons n the last t p hours are calculated as: x a = S agg, (2) and respectvely. x a2 = x amt S agg x amt, (3) To further clarfy how the aggregated features are calculated we show an example. Consder a set of transactons made by a clent between the frst and thrd of January of 205, as shown n TABLE II. Then we estmate the aggregated features (x a and x a2 ) by settng t p = 24 hours. Moreover, the total number of aggregated features can grow qute quckly, as t p can have several values, and the combnaton of combnaton crtera can be qute large as well. In [7], we used a total of 280 aggregated features. In partcular we set the dfferent values of t p to:, 3, 6, 2, 8, 24, 72 and 68 hours. Then calculate the aggregated features usng () wth the followng groupng crtera: country, type of transacton, entry mode, merchant code and merchant group. III. PROPOSED PERIODIC FEATURES When usng the aggregated features, there s stll some nformaton that s not completely captured by those features. In partcular we are nterested n analyzng the tme of the transacton. The logc behnd ths, s that a customer s expected to make transactons at smlar hours. The ssue when dealng wth the tme of the transacton, specfcally, when analyzng a feature such as the mean of transactons tme, s that t s easy to make the mstake of usng the arthmetc mean. Indeed, the arthmetc mean s not a correct way to average tme because, as shown n Fg., t does not take nto account the perodc behavor of the tme feature. For example, the arthmetc mean of transacton tme of four transactons made at 2:00, 3:00, 22:00 and 23:00 s 2:30, whch s counter ntutve snce no transacton was made close to that tme. We propose to overcome ths lmtaton by modelng the tme of the transacton as a perodc varable, n partcular usng the von Mses dstrbuton []. The von Mses dstrbuton, also known as the perodc normal dstrbuton, s a dstrbuton of a wrapped normal dstrbuted varable across a crcle. The von Mses probablty dstrbuton of a set of examples

Fg.. Analyss of the tme of a transacton usng a 24 hour clock. The arthmetc mean of the transactons tme (dashed lne) do not accurately represents the actual tmes dstrbuton. Fg. 2. Ftted von Mses dstrbuton ncludng the perodc mean (dashed lne) and the probablty dstrbuton (purple area). D = {t, t2,, tn } for a gven angle tx s gven by f (tx µvm, σvm ) = e σvm cos(tx µvm ) 2πI0 σvm (4) where I0 (κ) s the modfed Bessel functon of order 0, and µvm and σvm are the perodc mean and perodc standard devaton, respectvely. In Appendx A we present the calculaton of µvm and σvm. In partcular, we are nterested n calculatng a confdence nterval (CI) for the tme of a transacton. For dong that, ntally we select a set of transactons made by the same clent n the last tp hours (Sper ) as: tme d Sper T RXvM (S,, tp ) = xl xd l = x N tme hours(xtme, x ) < t. (5) p l l= Afterwards, the probablty dstrbuton functon of the tme of the set of transactons s calculated as:. (6) xtme vonmses µ (S ), vm per σvm (Sper ) In Fg. 2, the von Mses dstrbuton calculaton for the earler example s shown. It s observed that the arthmetc mean s dfferent from the perodc mean, the latter beng a more realstc representaton of the actual transactonal tmes. Then, usng the estmated dstrbuton, a new set of features can be extracted,.e., a bnary feature (xp ) f a new transacton tme s wthn the confdence nterval range wth probablty α. An example s presented n Fg. 3. Furthermore, other features can be calculated, as the confdence nterval range can be calculated for several values of α, and also the tme perod can have an arbtrary sze. Fg. 3. Expected tme of a transacton (green area). Usng the confdence nterval, a transacton can be flag normal or suspcous, dependng whether or not the tme of the transacton s wthn the confdence nterval. Addtonally, followng the same example presented n TABLE II, we calculate a feature xp, as a bnary feature that takes the value of one f the current tme of the transacton s wthn the confdence nterval of the tme of the prevous transactons wth a confdence of α = 0.9. The example s shown n TABLE III, where the arthmetc and perodc means dffer, as for the last transacton. Moreover, the new feature helps to get a better understandng of when a customer s expected to make transactons. Fnally, when calculatng the perodc features, t s mportant to use longer tme frames tp, snce f the dstrbuton s calculated usng only a couple of transactons t may not be as relevant of a customer behavor patterns, compared aganst usng a full year of transactons. Evdently, f tc s less than 24 hours, any transacton made afterwards wll not be expected

TABLE III. EXAMPLE CALCULATION OF PERIODIC FEATURES. WHERE x p IS A BINARY FEATURE THAT INFORMS WHENEVER A TRANSACTION IS BEING MADE WITHIN THE CONFIDENCE INTERVAL OF THE TIME OF THE TRANSACTIONS. Raw features Arthmetc Perodc features Id Tme mean mean Confdence nterval x p 0/0/5 8:20 2 0/0/5 20:35 3 0/0/5 22:30 9:27 9:27 5:45-23:0 True 4 02/0/5 00:50 20:28 20:28 7:54-23:03 False 5 02/0/5 9:8 6:34 22:34 8:5-00:7 True 6 02/0/5 23:45 6:9 2:07 5:2-02:52 True 7 03/0/5 06:00 8:33 22:33 7:9-0:46 False to be wthn the dstrbuton of prevous transactonal tmes. To avod ths, we recommend usng at least the prevous 7 days of transactonal nformaton, therefore, havng a better understandng of ts behavoral patterns. IV. EXPERIMENTAL SETUP In ths secton, frst the dataset used for the experments s descrbed. Second, the evaluaton measure used to compare the algorthms s shown. Lastly, the algorthms used n ths paper are brefly explaned. A. Database For ths paper we used a dataset provded by a large European card processng company. The dataset conssts of fraudulent and legtmate transactons made wth credt and debt cards between January 202 and June 203. The total dataset contans 20,000,000 ndvdual transactons, each one wth 27 attrbutes, ncludng a fraud label ndcatng whenever a transacton s dentfed as fraud. Ths label was created nternally n the card processng company, and can be regarded as hghly accurate. In the dataset only 40,000 transactons were labeled as fraud, leadng to a fraud rato of 0.025%. Furthermore, usng the methodologes for feature extracton descrbed n Secton II, we estmate a total of 293 features. Also, for the experments, a smaller subset of transactons wth a hgher fraud rato, correspondng to transactons made wth magnetc strpe, s selected. Ths dataset contans 236,735 transactons and a fraud rato of.50%. In ths dataset, the total fnancal losses due to fraud are 895,54 Euros. Ths dataset was selected because t s the one where most frauds occur. The total dataset s dvded nto 3 subsets: tranng, valdaton and testng. Each one contanng 50%, 25% and 25% of the transactons respectvely. TABLE IV summarzes the dfferent datasets. B. Evaluaton measure When evaluatng a credt card fraud detecton model, typcally a standard bnary classfcaton measure, such as msclassfcaton error, recever operatng characterstc (ROC), Kolmogorov-Smrnov (KS) or F Score statstcs, s used [9], [9], [20]. However, these measures may not be the most approprate evaluaton crtera when evaluatng fraud detecton models, because they tactly assume that msclassfcaton errors carry the same cost, smlarly wth the correct classfed transactons. Ths assumpton does not hold n practce, when wrongly predctng a fraudulent transacton as legtmate TABLE IV. SUMMARY OF THE DATASETS Set Transactons %Frauds Cost Total 236,735.50 895,54 Tranng 94,599.5 358,078 Valdaton 70,90.53 274,90 Testng 7,226.45 262,67 TABLE V. CREDIT CARD FRAUD COST MATRIX [7] Predcted Postve c = Predcted Negatve c = 0 Actual Postve Actual Negatve y = y = 0 C T P = C a C F P = C a C F N = Amt C T N = 0 carres a sgnfcantly dfferent fnancal cost than the nverse case. In order to take nto account the dfferent costs of fraud detecton durng the evaluaton of an algorthm, n [7], we proposed a cost matrx [3] that takes nto account the actual example-dependent fnancal costs. In TABLE V, the cost matrx s presented, where the predcton of the algorthm c s a functon of the k features of transacton, x = [x, x2,..., xk ], y s the true class of the transacton, and the cost assocated wth two types of correct classfcaton, namely, true postves C T P, and true negatves C T N ; and the two types of msclassfcaton errors, namely, false postves C F P, and false negatves C F N, are presented. Moreover, our cost matrx defnes the cost of a false negatve to be the amount of the transacton Amt, and the costs of false postve and true postve to be the admnstratve cost C a related to analyzng the transacton and contactng the card holder. Afterwards, usng the example-dependent cost matrx, a cost measure s calculated takng nto account the actual costs of each transacton. Let S be a set of N transactons,.e., N = S, where each transacton s represented by the augmented feature vector x = [x, C T P, C F P, C F N, C T N ], and labelled usng the class label y {0, }. A classfer f whch generates the predcted label c for each transacton, s traned usng the set S. Then the cost of usng f on S s calculated by Cost(f(S)) = N y ( c )Amt + c C a. (7) = Lastly, n order to have a measure that s easy to nterpret, we used the fnancal savngs as we defned n [2]. In partcular, the savngs measure we proposed a savngs measure that compare the cost of an algorthm versus the cost of usng no algorthm at all. In the case of credt card fraud the cost of usng no algorthm s equal to the sum of the amounts of the fraudulent transactons N = y Amt. Then, the savngs are calculated as: Savngs(f(S)) = N = y c Amt c C a N = y Amt. (8) In other words, the sum of the amounts of the corrected predcted fraudulent transactons mnus the admnstratve cost ncurred n detectng them, dvded by the sum of the amounts of the fraudulent transactons.

Fg. 4. Comparson of the dfferent algorthms, traned wth only the raw features (raw), only the aggregated features (agg) and both (raw + agg). In average, by usng both the raw and the aggregated features the savngs are doubled. Fg. 5. Comparson of the proposed perodc (per) set of features. It s observed, that when the new set of features are combned wth the aggregated features, an addtonal ncrease of savngs of 6.4% s made. C. Algorthms For the experments we used three cost-nsenstve classfcaton algorthms: decson tree (DT ), logstc regresson (LR) and a random forest (RF ), usng the mplementaton of Scktlearn [22]. Furthermore, n prevous works, we have shown the need to use algorthms that take nto account the dfferent costs assocated wth fraud detecton [8]. In partcular we also used three cost-senstve algorthms, namely, Bayes mnmum rsk (BM R) [7], [8], cost-senstve logstc regresson (CSLR) [23] and cost-senstve decson tree (CSDT ) [2]. The BM R s a decson model based on quantfyng tradeoffs between varous decsons usng probabltes and the costs that accompany such decsons. In the case of credt card fraud detecton, a transacton s classfed as fraud f C a Amt ˆp, and as legtmate f false. Where ˆp s the estmated probablty of a transacton beng fraud gven x. The other two cost-senstve methods, CSLR and CSDT, are based on ntroducng the example-dependent costs nto a logstc regresson and a decson tree algorthm, by changng the objectve functon of the models to one that s cost-senstve. For a further dscusson see [23] and [2], respectvely. The mplementaton of the cost-senstve algorthms s done usng the CostCla lbrary. Moreover, each algorthm was traned, usng the dfferent sets of features: raw features (raw) as shown n TABLE I, aggregated features (agg) usng equatons (2) and (3), and the perodc features (per) as descrbed n Secton III. V. RESULTS Frst we evaluate the savngs of the dfferent algorthms usng only the raw features (raw), only the aggregated features (agg) and both (raw + agg). The results are shown n Fg. 4. Note that all the algorthms generate savngs,.e., no algorthm performs worse than usng no algorthm at all. The CSDT algorthm s the one that performs best, n partcular when usng both the raw and aggregated features. When analyzng the results usng the dfferent set of features, the aggregated features perform better than usng only the raw features n Fg. 6. Comparson of the average ncrease n savngs when ntroducng each set of features compared wth the results of usng only the raw set of features. all the cases. Ths confrms the ntuton of the need of usng the customer behavor patterns n order to dentfy fraudulent transactons. On average, by usng both the raw and the aggregated features the savngs are doubled. Then, we evaluate the results of the perodc set of features. In Fg. 5, the results are shown. The new set of perodc features ncrease the savngs by an addtonal 3%. The algorthm wth the hghest savngs s the CSDT, closely followed by CSLR. Smlarly to usng the extended aggregated features, the perodc features do not perform well when used only wth raw features. It s when combned the set of aggregated features that an ncrease n savngs s found. Fnally, n Fg. 6, we compare the average ncrease n savngs when ntroducng each set of features compared wth the results of usng only the set of raw features. Frst, the aggregated features gve an average ncrease n savngs of 20%. As prevously shown, n order to mprove the results, these new features need to be combned wth the aggregated features n order to ncrease savngs. Lastly, when combnng the prevous features wth the perodc features, the results ncrease by 287% compared wth usng raw features only. https://gthub.com/albahnsen/costsenstveclassfcaton

VI. CONCLUSION AND DISCUSSION In ths paper we have shown the mportance of usng features that analyze the consumer behavor of ndvdual card holders when constructng a credt card fraud detecton model. We show that by preprocessng the data n order to nclude the recent consumer behavor, the performance ncreases by more than 200% compared to usng only the raw transacton nformaton. Moreover, we extended the current approaches to analyze the consumer behavor by proposng a new method to analyze the perodc behavor of the tme of a transacton usng the von Mses dstrbuton. The new proposed set of features ncreases the performance by 287%. However, because ths study was done usng a dataset from a fnancal nsttuton, we were not able to deeply dscuss the specfc features created, and the ndvdual mpact of each feature. Nevertheless, our framework s ample enough to be recreated wth any knd of transactonal data. Furthermore, when mplementng ths framework on a producton fraud detecton system, questons regardng response and calculaton tme of the dfferent features should be addressed. In partcular, snce there s no lmt on the number of features that can be calculated, a system may take too long to make a decson based on the tme spent recalculatng the features wth each new transacton. ACKNOWLEDGMENT Fundng for ths research was provded by the Fonds Natonal de la Recherche, Luxembourg, grant number AFR-PhD-5942749. REFERENCES [] European Central Bank, Thrd report on card fraud, European Central Bank, Tech. Rep., 204. [2] V. Van Vlasselaer, C. Bravo, O. Caelen, T. Elass-Rad, L. Akoglu, M. Snoeck, and B. Baesens, APATE: A Novel Approach for Automated Credt Card Transacton Fraud Detecton usng Network- Based Extensons, Decson Support Systems, vol. 75, pp. 38 48, 205. [3] R. Brause, T. Langsdorf, and M. Hepp, Neural data mnng for credt card fraud detecton, Proceedngs th Internatonal Conference on Tools wth Artfcal Intellgence, pp. 03 06, 999. [4] S. Pangrah, A. Kundu, S. Sural, and A. Majumdar, Credt card fraud detecton: A fuson approach usng Dempster Shafer theory and Bayesan learnng, Informaton Fuson, vol. 0, no. 4, pp. 354 363, Oct. 2009. [5] S. Bachmayer, Artfcal Immune Systems, Artfcal Immune Systems, vol. 532, pp. 9 3, 2008. [6] M. Krvko, A hybrd model for plastc card fraud detecton systems, Expert Systems wth Applcatons, vol. 37, no. 8, pp. 6070 6076, Aug. 200. [7] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, Data mnng for credt card fraud: A comparatve study, Decson Support Systems, vol. 50, no. 3, pp. 602 63, Feb. 20. [8] D. J. Weston, D. J. Hand, N. M. Adams, C. Whtrow, and P. Juszczak, Plastc card fraud detecton usng peer group analyss, Advances n Data Analyss and Classfcaton, vol. 2, no., pp. 45 62, Mar. 2008. [9] A. D. Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and G. Bontemp, Learned lessons n credt card fraud detecton from a practtoner perspectve, Expert Systems wth Applcatons, vol. 4, no. 0, pp. 495 4928, Aug. 204. [0] C. Whtrow, D. J. Hand, P. Juszczak, D. J. Weston, and N. M. Adams, Transacton aggregaton as a strategy for credt card fraud detecton, Data Mnng and Knowledge Dscovery, vol. 8, no., pp. 30 55, Jul. 2008. [] N. I. Fsher, Statstcal Analyss of Crcular Data, 996, vol. 9. [2] C. M. Bshop, Pattern Recognton and Machne Learnng, ser. Informaton scence and statstcs. Sprnger, 2006, vol. 4, no. 4. [3] C. Elkan, The Foundatons of Cost-Senstve Learnng, n Seventeenth Internatonal Jont Conference on Artfcal Intellgence, 200, pp. 973 978. [4] R. Bolton and D. J. Hand, Unsupervsed proflng methods for fraud detecton, n Credt Scorng and Credt Control VII, 200. [5] D. Tasouls and N. Adams, Mnng nformaton from plastc card transacton streams, n Proceedngs n 8th Internatonal Conference on Computatonal Statstcs, 2008. [6] S. Jha, M. Gullen, and J. Chrstopher Westland, Employng transacton aggregaton strategy to detect credt card fraud, Expert Systems wth Applcatons, vol. 39, no. 6, pp. 2 650 2 657, 202. [7] A. Correa Bahnsen, A. Stojanovc, D. Aouada, and B. Ottersten, Cost Senstve Credt Card Fraud Detecton Usng Bayes Mnmum Rsk, n 203 2th Internatonal Conference on Machne Learnng and Applcatons. Mam, USA: IEEE, Dec. 203, pp. 333 338. [8], Improvng Credt Card Fraud Detecton wth Calbrated Probabltes, n Proceedngs of the fourteenth SIAM Internatonal Conference on Data Mnng, Phladelpha, USA, 204, pp. 677 685. [9] R. J. Bolton, D. J. Hand, F. Provost, and L. Breman, Statstcal Fraud Detecton: A Revew, Statstcal Scence, vol. 7, no. 3, pp. 235 255, 2002. [20] D. J. Hand, C. Whtrow, N. M. Adams, P. Juszczak, and D. J. Weston, Performance crtera for plastc card fraud detecton tools, Journal of the Operatonal Research Socety, vol. 59, no. 7, pp. 956 962, May 2007. [2] A. Correa Bahnsen, D. Aouada, and B. Ottersten, Example- Dependent Cost-Senstve Decson Trees, Expert Systems wth Applcatons, vol. 42, no. 9, pp. 6609 669, 205. [22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Mchel, B. Thron, O. Grsel, M. Blondel, P. Prettenhofer, R. Wess, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Sckt-learn: Machne learnng n Python, Journal of Machne Learnng Research, vol. 2, pp. 2825 2830, 20. [23] A. Correa Bahnsen, D. Aouada, and B. Ottersten, Example-Dependent Cost-Senstve Logstc Regresson for Credt Scorng, n 204 3th Internatonal Conference on Machne Learnng and Applcatons. Detrot, USA: IEEE, 204, pp. 263 269. APPENDIX The von Mses dstrbuton, also known as the perodc normal dstrbuton, s a dstrbuton of a wrapped normal dstrbuted varable across a crcle []. The von Mses dstrbuton of a set of examples D = {t, t 2,, t N } s defned as D vonmses (µ vm, /σ vm ), where µ vm and σ vm are the perodc mean and perodc standard devaton, respectvely, and are calculated as follows [2] µ vm (D) = 2 tan φ (, (9) ψ2 + φ 2 + φ) and where φ = ( ) σ vm (D) = ln ( N φ) 2 ( + N ψ) 2, (0) t j D sn(t j ) and ψ = cos(t j ). t j D