CARD FRAUD DETECTION USING LEARNING MACHINES




BULETINUL INSTITUTULUI POLITEHNIC DIN IAŞI
Publicat de Universitatea Tehnică "Gheorghe Asachi" din Iaşi
Tomul LX (LXIV), Fasc. 2, 2014
Secţia AUTOMATICĂ şi CALCULATOARE

CARD FRAUD DETECTION USING LEARNING MACHINES

BY ARMAND EUGEN PĂSĂRICĂ

Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest
Corresponding author; e-mail: armand.pasarica@yahoo.com

Received: September 2014
Accepted for publication: October 8, 2014

Abstract. Searching "card fraud" on the Internet returns approximately 80 million results. According to the ECB, total card fraud in Europe reached 1.26 billion euro in 2010. The ingenuity of thieves has taken highly sophisticated forms. Modelling this behaviour mathematically requires a classification method, derived from a supervised learning algorithm, that can separate the class of fraudulent transactions with a high degree of accuracy. By its definition, the technique of Support Vector Machines rests on two strong hypotheses: margin optimization and kernel representation. I therefore chose SVM techniques with non-linear kernels. We propose the Gaussian kernel function, which measures similarities between features in a new linear space, as the best approach to detecting fraud patterns.

Key words: card fraud behaviour; SVM; non-linear kernels; Cover's theorem; LIBSVM.

2010 Mathematics Subject Classification: 68T05, 62H30, 93E35.

1. Introduction

Payment cards are the most commonly used electronic payment instrument. A payment card allows its holder to access funds and to make payments either on credit (credit card) or by debiting an account (debit card). A total of 14.2 million cards had been issued in Romania by the end of 2013, according to card and terminal statistics published by the National Bank of Romania. With the development of information technology and globalization, up to 85 percent of retail transactions are made by card. The most widely used techniques for reading card information in order to forge or misuse cards are skimming, in which the magnetic stripe of a traditional card is replicated, and digital pickpocketing, when the cards use RFID (radio frequency identification). Unfortunately, current solutions do not satisfy the demand for a best approach to card fraud detection, because no single best approach to this problem exists. Some encouraging results come from Stolfo, Fan and Lee in "Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results" and from Soheila Ehramikar in the thesis "The Enhancement of Credit Card Fraud Detection Systems Using Machine Learning".

2. Theoretical Foundation of a Learning Machine and Its Implementations in Economics and Industry

The following set of definitions represents the foundation of a learning machine:

Definition 1: Tom Mitchell gave in 1988 the condition for a learning problem to be well defined: a software program has learned from experience E to perform the job J with measured performance P if

∫₀ᵀ P′(J, E)(t) dt > 0,

i.e. its measured performance improves over time.

Definition 2: The job of a supervised learning machine is to maximize the quality of the output, either by maximizing P(Y|X) or by minimizing the empirical risk at the end of a finite number of iterative training epochs (Huang, Kecman).

Definition 3: Each algorithm has both strengths and weaknesses; there is no algorithm good at all classification or regression problems, but for a given problem there is only one asymptotically convergent algorithm that maximizes the quality of the output Y given the set of input features X: P(Y|X).
Below is the general flow chart of a general supervised machine learning system, which is a cybernetic system with a negative feedback response, because its purpose is to reduce the magnitude of changes with the help of control variables such as the learning rate, the regularization parameter, etc.

Lemma 1: As an effect of the previous statements, a well-defined learning machine must produce a stable system, ready to be used for prediction in practice for a huge range of problems.

The field of intelligent machines, adaptable or designed to learn, is one of the most interesting and complex areas today. It is a branch of artificial intelligence that deals with the study and construction of systems that can learn from data. For example, whenever a check is issued in favour of a person who has an account at a bank, algorithms in the software that reads the payment instrument identify the correct signature and payment amount without the holder even knowing it. When a credit card is used for online shopping, the fraud-prevention programs of the bank's risk department, which know the consumer's behaviour, react when the card is stolen or the activity is suspicious. The core of machine learning is the study of the representation and generalization of data. A machine learning problem viewed as a cybernetic system is presented below (Fig. 1):

Fig. 1 The logic behind machine learning.

Currently, the area of ML poses some of the biggest challenges in modern history. Google searches on the Internet return an impressive number of books, research studies, articles and forums on this study area, which is also called machine learning or statistical learning (Vapnik, 1998) in many other papers. Machine learning is interconnected with many other disciplines and has an impact on many industries and sectors, including: financial institutions (card fraud detection, analysis and prediction of stock market indexes); marketing (customer segmentation, analysis of consumer preferences, clustering); voice and face detection, OCR, biotechnology and medicine; robotics and nanotechnology, self-driving systems; IT (software engineering, sequence pattern mining, information retrieval, adaptive websites).

3. Support Vector Machines and Their Advantages

In the machine learning area, the Support Vector Machine is a supervised learning model that analyzes data and recognizes patterns, used mainly for solving linear and non-linear classification problems. The main goal of an SVM model is to perform non-linear classification with high performance, and for this purpose it uses kernel functions. The theoretical basis of SVM was founded by Vladimir N. Vapnik, and the idea of the soft margin optimizer was developed together with Corinna Cortes. Other implementations use the KKT conditions or Least Squares Support Vector Machines; Suykens and Vandewalle proposed the LS-SVM method. SVM is one of the most widely used machine learning techniques in academia as well as in industry. The best applications of SVM are in image processing and voice recognition, where benchmark tests have shown that the probability of correct classification exceeds 95%, making it a model superior to neural network algorithms such as back-propagation. The major inconveniences are: a) the very large set of training data required to achieve this accuracy; b) the correct identification of the kernel function. These issues lead to a huge computational effort and, as a consequence, to very large IT hardware requirements.
It is important to explain why support vector classification appears to be more accurate than other existing classification methods. A strictly linear separation in the input space X is practically impossible to achieve with classical discriminative techniques (such as Bayes classifiers or logistic regression). On the other hand, an appropriate non-linear classifier is extremely difficult to find directly from the data and will generalize poorly.

Support Vector Machines map the initial feature data into another space and then select an optimal linear classifier in that space. It is well established that posing classification purely as minimization of the empirical risk harms generalization capacity. The support vector technique puts the accent on finding the best geometric boundary between the data clouds, determined by the support vectors of each group (practically, the border will be placed as far as possible from the worst training examples of each group). Therefore, the risk has an element of empirical risk minimization and also an element of regularization (DasGupta, 2011):

R(h) = E[L(h(x), y)] = ∫ L(h(x), y) dP(x, y),   R_emp(h) = (1/m) Σ_{i=1..m} L(h(x^(i)), y^(i)).

Choosing the most appropriate kernel depends strongly on the problem and, most importantly, fine-tuning its parameters can become extremely difficult. A technique for automatically selecting the best kernel has not yet been mathematically established. The motivation behind the choice of a particular kernel may be very intuitive, and depends directly on the kind of problem to be learned. It is recommended first to apply a dimensionality reduction technique (e.g., the PCA method) to the attributes and to normalize the data, in order to plot them where technically feasible. This gives at least some guidance about the kernel structure. If that is not possible, then the accuracy or an indicator like the F1 score will have the greatest impact on choosing the right form of the kernel.

4. Research Methodology

SVM is based on a convex optimization problem and, as a consequence, will always find the global minimum point if the training set is separable by either a linear or a non-linear hyper-plane. A necessary condition for this is that the kernel function be properly chosen.
By contrast, for neural networks the BP (back-propagation) algorithm can in some situations face a non-convex optimization problem even when the training set is separable, and may consequently get stuck in a local minimum point. To motivate this more robust generalization behaviour, I will present an instructive view of SVM, starting from the logistic regression model. First, we define the following variables in order to build a valid and robust ML model:

For a fixed observation k ∈ {1, 2, ..., m} we have the training pair (x^(k), y^(k)) of vectors:

x^(k) = (x_1^(k), x_2^(k), ..., x_n^(k)),   y^(k) ∈ {0, 1}.

So x is the input matrix with m observations and n features (e.g., amount, transaction status, etc.) and y is the outcome vector, which must be binary: a positive example (1) reflects a fraudulent transaction and 0 means the transaction is legitimate. The vector of parameters θ must be learned such that my hypothesis h_θ, a continuous function of the variable x and the parameter θ, predicts/classifies whether a transaction is fraudulent or not. One of the constraints of SVM is that the cost function J(θ) must be continuous and, because ψ(θ) is built on the basis of the hypothesis h_θ, I want to predict the output as a continuous variable, so I make the transformation y: {0, 1} → [0, 1].

Now let us define the hypothesis h_θ, the function that will classify as accurately as possible the outcome for a card holder: fraudulent or not. The hypothesis should be continuous and must output values between 0 and 1. A convenient choice is the logistic function, which is continuous and outputs values between 0 and 1. So I construct the hypothesis h_θ, which simulates the fraud output, as h_θ: X → Y, where the purpose is to find a function such that h_θ(x_1, x_2, ..., x_n) ≈ y^(i) for each training example.

Let us take g: ℝ → ℝ continuous, where:

g(z) = 1 / (1 + e^(−z)),   z = θ^T x = θ_0 x_0 + θ_1 x_1 + ... + θ_n x_n = Σ_{j=0..n} θ_j x_j,

h_θ(x) = 1 / (1 + e^(−Σ_{j=0..n} θ_j x_j)).

We notice that h_θ(x) converges rapidly to 0 and 1.

Fig. 2 The plot of the logistic function.
Source: MATLAB: fplot(@(k) 1/(1+exp(-k)),[-10 10]).

if y = 1, then h_θ(x) ≈ 1 ⇔ θ^T x ≫ 0;
if y = 0, then h_θ(x) ≈ 0 ⇔ θ^T x ≪ 0.

The previous relations reflect how the positive and negative examples are geometrically separated by the same separating hyperplane. On the other hand:

P(y = 1 | x; θ) = h_θ(x),   P(y = 0 | x; θ) = 1 − h_θ(x).

The probability that the output is classified as positive, conditioned on the feature vector x and parameterized by θ, is written as h_θ(x), and the probability that the output is classified as negative, conditioned on x and parameterized by θ, is given by 1 − h_θ(x).
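As an aside, the logistic function g and the hypothesis h_θ above can be sketched in a few lines of Python (a minimal illustration, not the paper's MATLAB code; the parameter and feature values below are hypothetical):

```python
import math

def g(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x); x[0] = 1 plays the intercept role."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return g(z)

# g converges rapidly to 0 and 1 away from the origin, as Fig. 2 shows:
print(g(0))                              # 0.5
print(g(10) > 0.9999, g(-10) < 0.0001)   # True True
```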

These two relations can be written as a single expression according to a Bernoulli distribution:

P(y | x; θ) = h_θ(x)^y (1 − h_θ(x))^(1−y).

Supposing that the m training examples were generated independently, then according to the method of maximum-likelihood estimation (MLE) the likelihood of the training set is:

L(θ) = P(Y | X; θ) = Π_{i=1..m} P(y^(i) | x^(i); θ) = Π_{i=1..m} h_θ(x^(i))^(y^(i)) (1 − h_θ(x^(i)))^(1−y^(i)),

l(θ) = log L(θ) = Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ],

θ̃ = argmax_θ l(θ) = argmin_{(θ_1, θ_2, ..., θ_n)} { − Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] }.

The cost function for one training example (x^(1), y^(1)) can then be written as:

ψ(θ) = −[ y^(1) log h_θ(x^(1)) + (1 − y^(1)) log(1 − h_θ(x^(1))) ],

or:

ψ(θ) = y^(1) log(1 + e^(−θ^T x^(1))) + (1 − y^(1)) log(1 + e^(θ^T x^(1)));

if y^(1) = 0:  ψ(θ) = log(1 + e^(θ^T x^(1)));
if y^(1) = 1:  ψ(θ) = log(1 + e^(−θ^T x^(1))).

So the cost function is continuous and convex and consequently admits a global minimum point (Fig. 3).
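The closed forms of the one-example cost derived above can be checked numerically (a Python sketch with hypothetical θ and x; not part of the paper's experiments):

```python
import math

def h(theta, x):
    """Logistic hypothesis h_theta(x) = 1 / (1 + e^(-theta^T x))."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, x, y):
    """psi(theta) = -[y log h(x) + (1 - y) log(1 - h(x))] for one example."""
    p = h(theta, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

theta = [0.5, -1.2, 2.0]   # hypothetical parameters
x = [1.0, 0.3, 0.7]        # x[0] = 1 is the intercept feature

# The two branch formulas agree with the general expression:
z = sum(t * xi for t, xi in zip(theta, x))
assert abs(cost(theta, x, 1) - math.log(1 + math.exp(-z))) < 1e-12
assert abs(cost(theta, x, 0) - math.log(1 + math.exp(z))) < 1e-12
```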

Fig. 3 The plot of the cost function for one training example.
Source: Output MATLAB.

For the general case, where m ≥ 2 and m > n, the overall cost function is given by the following relation (Flemming, 2011). The regularization term is practically a compromise between how perfectly I want to fit the data and how much importance I wish to allocate to each feature x_j through its parameter θ_j:

ψ(θ) = min_{(θ_1, θ_2, ..., θ_n)} − (1/m) Σ_{i=1..m} [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/(2m)) Σ_{j=1..n} θ_j².

We notice that the parameters (θ_1, θ_2, ..., θ_n), which must be estimated by minimizing the function J(θ), are not affected by the factor 1/m. With the following notations the cost function ψ(θ) is simplified:

C = 1/λ,   c_1^(i) = −log h_θ(x^(i)),   c_2^(i) = −log(1 − h_θ(x^(i))),

ψ(θ) = min_{(θ_1, θ_2, ..., θ_n)} C Σ_{i=1..m} [ y^(i) c_1^(i) + (1 − y^(i)) c_2^(i) ] + (1/2) Σ_{j=1..n} θ_j².
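The regularized overall cost above can be sketched as follows (a hedged Python illustration with hypothetical toy data; by convention the intercept θ_0 is left out of the penalty term):

```python
import math

def h(theta, x):
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost(theta, X, Y, lam):
    """J(theta) = -(1/m) sum[y log h + (1-y) log(1-h)] + (lam/(2m)) sum_{j>=1} theta_j^2."""
    m = len(X)
    fit = 0.0
    for x, y in zip(X, Y):
        p = h(theta, x)
        fit += y * math.log(p) + (1 - y) * math.log(1 - p)
    reg = (lam / (2 * m)) * sum(t * t for t in theta[1:])
    return -fit / m + reg

# Hypothetical toy data: intercept plus two features.
X = [[1.0, 0.2, 0.1], [1.0, 0.9, 0.8], [1.0, 0.5, 0.4]]
Y = [0, 1, 0]
theta = [0.1, 1.0, -0.5]

# A larger lambda (smaller C = 1/lambda) increases the penalty on theta:
print(regularized_cost(theta, X, Y, lam=0.0))
print(regularized_cost(theta, X, Y, lam=1.0))
```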

But the idea of SVM is to add an extra safety condition, for a more robust decision boundary. So we require:

if y^(i) = 1:  θ^T x^(i) ≥ 1  instead of  θ^T x^(i) ≥ 0;
if y^(i) = 0:  θ^T x^(i) ≤ −1  instead of  θ^T x^(i) < 0.

Further, I will analyse the following two regimes of the regularization parameter: C → ∞ vs. C → 0.

I) If the regularization parameter C is very large, then for ψ(θ) to be minimal we need t ≈ 0, where

t = Σ_{i=1..m} [ y^(i) c_1^(i) + (1 − y^(i)) c_2^(i) ].

So the optimization model becomes:

ψ(θ) = min_{(θ_1, θ_2, ..., θ_n)} (1/2) Σ_{j=1..n} θ_j² = min (1/2) ‖θ‖²,

subject to:
θ^T x^(i) = p^(i) ‖θ‖ ≥ 1  for  y^(i) = 1,
θ^T x^(i) = p^(i) ‖θ‖ ≤ −1  for  y^(i) = 0,

where p^(i) is the projection of x^(i) onto the vector θ. From this one can see the logic behind SVM, i.e. why the Support Vector Machine is a large-margin classifier. We have thus shown that the SVM model is a convex optimization problem and, according to the KKT theorem, admits a global minimum point.

Below is an example of linear classification with regularization parameter C = 10 000. I want to give importance even to a bad example, so I chose a very high value for C. As a consequence the decision boundary fits the training set perfectly, but the model is not robust for prediction.

Fig. 4 SVM linear classification model with high variance (overfitting).
Source: Output MATLAB.
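The projection identity θ^T x = p·‖θ‖ used above can be verified numerically (a small Python sketch with hypothetical vectors): since the constraint forces p·‖θ‖ ≥ 1, minimizing ‖θ‖² pushes the projections p, and hence the margin, to be large.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def projection(x, theta):
    """Signed length p of the projection of x onto theta: theta^T x = p * ||theta||."""
    dot = sum(t * xi for t, xi in zip(theta, x))
    return dot / norm(theta)

theta = [3.0, 4.0]    # hypothetical separator normal, ||theta|| = 5
x_pos = [2.0, 1.0]    # a hypothetical positive example

p = projection(x_pos, theta)
dot = sum(t * xi for t, xi in zip(theta, x_pos))
assert abs(dot - p * norm(theta)) < 1e-12   # theta^T x = p * ||theta||
print(p)   # 2.0
```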

II) If the regularization parameter C is small, then the decision boundary is much more robust in the presence of outliers, and the optimization model is derived directly from the general one:

ψ(θ) = min_{(θ_1, θ_2, ..., θ_n)} C Σ_{i=1..m} [ y^(i) c_1^(i) + (1 − y^(i)) c_2^(i) ] + (1/2) Σ_{j=1..n} θ_j²,

subject to:
θ^T x^(i) ≥ 1  for  y^(i) = 1,
θ^T x^(i) ≤ −1  for  y^(i) = 0.

This model is still a quadratic convex optimization problem and admits a global minimum point. Even if the data are not linearly separable, this optimization model still yields a global minimum point, and this is one of the greatest strengths of the algorithm. The model with C = 10 is more robust because its decision boundary has the largest margin, and it should be used in prediction despite having a testing accuracy of 91.66% (11/12).

Fig. 5 SVM linear classification model with high bias (underfitting).
Source: Output MATLAB.

Now I have all the necessary conditions to solve this optimization problem for the parameters θ. The logic behind Cover's theorem is that data which are not linearly separable in 2D become linearly separable in 3D after a function transformation, called the kernel function. But not all functions create valid kernels. To be valid, a kernel must satisfy the technical condition stated by Mercer's theorem. This theorem provides the conditions needed to make sure SVM package optimizations run correctly and do not diverge (Cover, 1965).
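The same idea can be shown one dimension lower, with hypothetical data of my own (not from the paper): labels that alternate on a line cannot be split by any single threshold in 1D, but the non-linear map φ(x) = (x, x²) makes them separable by a threshold on the new coordinate x².

```python
# Hypothetical 1-D data: positives at the extremes, negatives in the middle.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, 0, 0, 1]

def separable_1d(xs, ys):
    """True if some threshold t splits the points into two pure classes."""
    for t in sorted(xs):
        left = {y for x, y in zip(xs, ys) if x <= t}
        right = {y for x, y in zip(xs, ys) if x > t}
        if len(left) <= 1 and len(right) <= 1:
            return True
    return False

assert not separable_1d(xs, ys)   # no linear separation in 1-D

# After phi(x) = (x, x^2), thresholding x^2 at 2 separates the classes:
assert all((x * x > 2) == (y == 1) for x, y in zip(xs, ys))
```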

Cover's theorem is a statement in computational learning theory and one of the primary theoretical motivations for the use of non-linear kernel methods in machine learning applications. The theorem states that, given a set of training data that is not linearly separable, with high probability it can be transformed into a linearly separable set by projecting it into a higher-dimensional space via some non-linear transformation. A complex pattern-classification problem, cast non-linearly in a low-dimensional space, is more likely to be linearly separable in a high-dimensional space (Fig. 6). The theoretical foundations are found in the theory of Hilbert spaces (Paulsen, 2009).

Fig. 6 Exemplification of Cover's theorem.
Source: Output VISIO.

Cortes and Vapnik proposed in 1995 the following soft margin optimizer model:

min_{(θ, ξ_i, b)} (1/2) θ^T θ + C Σ_{i=1..l} ξ_i,

subject to:
y^(i) (θ^T φ(x^(i)) + b) ≥ 1 − ξ_i,   ξ_i ≥ 0,   i = 1, 2, ..., l.

Furthermore, k(x_i, x_j) = φ(x_i)^T φ(x_j) is called the kernel function. The following four types of kernel functions are the subject of our card fraud study:

linear:  k(x_i, x_j) = x_i^T x_j;
polynomial:  k(x_i, x_j) = (γ x_i^T x_j + r)^d,  γ > 0;
Gaussian (RBF):  k(x_i, x_j) = e^(−γ ‖x_i − x_j‖²),  γ > 0;
sigmoid:  k(x_i, x_j) = tanh(γ x_i^T x_j + r).
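The four kernel functions listed above can be written directly (a hedged Python sketch; the vectors and the parameter values γ, r, d are hypothetical defaults, not the values used in the paper's experiments):

```python
import math

def k_linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def k_poly(x, z, gamma=1.0, r=1.0, d=3):
    return (gamma * k_linear(x, z) + r) ** d

def k_rbf(x, z, gamma=1.0):
    """Gaussian kernel: exp(-gamma * ||x - z||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)

def k_sigmoid(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * k_linear(x, z) + r)

x, z = [1.0, 2.0], [3.0, 0.5]
print(k_linear(x, z))   # 4.0
print(k_rbf(x, x))      # 1.0: the Gaussian kernel of any point with itself
```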

We must first decide on the best form of the kernel and then on the selection of the parameters (C, γ, r, d).

5. Machine Learning Design and the Implementation

Typically, in a commercial bank there are two types of back-office applications available to prevent card fraud: Online Fraud Monitoring (OLMA) and Off-line Fraud Monitoring (OFMA). The first gets data through a pipeline process from the live database, and if one or more triggers are activated a pop-up message shows up on the screen: possible card fraud attempt. The card holder must be notified immediately, and the card can be blocked at the customer's request. The second application is part of the End of Day process and is actually a program that runs a MATLAB routine called svmlab.exe and generates a separate report with all potentially fraudulent cards. The report must be further analyzed by the Risk Department during the next working day. The current paper refers only to OFMA. After issuing the requirements received from the Card Fraud Department, the following phases were rolled out to implement the action plan against fraudulent cards (Fig. 7).

Fig. 7 The flow chart of implementing the card fraud detection system at XXXBank.
Source: Output VISIO.

Phase 1: First, the requirement comes from the Cards Department of one of the top 5 Romanian banks. They would like to build an intelligent system that should be able to learn and correctly classify whether a transaction is going to be fraudulent or not. After a long period of monitoring fraudulent transactions, the card specialists of XXX BANK noticed that the attributes in Table 1 influence the behaviour of a fraudulent card and could provide a pattern for recognizing frauds. The records are indexed by the unique key given by the PAN number, which consists of 16 digits (e.g. 4256 0343 1226 0369).

Phase 2: When the trigger called txn_in is activated, which means that at least one PIN code has been entered into the card system, the record with the attributes of Table 1 is picked up from the primary table all_tran.dbn and inserted into a new table named cards.dbn. This is a massive table containing around 10 000 rows daily.

Phase 3: Normalizing the data improves the accuracy of the classification; the data attributes were adjusted using the following formula:

x̃_j^(i) = (x_j^(i) − μ_j) / (x_j^(max) − x_j^(min)).

Phase 4: The normalized data are further split into 3 sets: a Training Set, a Cross-Validation Set and a Testing Set. The Training Set holds 70% of the total data and is used to estimate the parameters of the model. The Cross-Validation Set holds 15% of the total data and is used to fine-tune and confirm the parameters estimated on the Training Set. The Testing Set holds the remaining 15% and is designated to test and validate the final model before implementation in production.
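Phases 3 and 4 can be sketched as follows (a hedged Python illustration of min-max mean scaling and the 70/15/15 split; the amounts and the fixed seed are my own hypothetical choices, not the bank's data):

```python
import random

def normalize(column):
    """Phase 3: x~ = (x - mean) / (max - min), applied per attribute."""
    mu = sum(column) / len(column)
    rng = max(column) - min(column)
    return [(x - mu) / rng for x in column]

def split(rows, seed=0):
    """Phase 4: shuffle, then cut 70% training / 15% cross-validation / 15% testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n = len(rows)
    a, b = n * 70 // 100, n * 85 // 100   # integer cut points avoid float rounding
    return rows[:a], rows[a:b], rows[b:]

amounts = [120.0, 45.0, 9800.0, 310.0, 77.0]   # hypothetical txn amounts
scaled = normalize(amounts)
train, cv, test_set = split(list(range(100)))
print(len(train), len(cv), len(test_set))   # 70 15 15
```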
Table 1
The Attributes/Variables of the SVM Model

Field | Name of the field | Type | Codification
1 | The amount of the txn | numerical | 1..10 000
2 | Number of txn per window | numerical | {3, 4, 5}
3 | The country where the txn was made (country code) | qualitative | 1..100
4 | The time when the txn was made | numerical | HH:MM:SS
5 | The channel where the txn was processed | qualitative | 1 = ATM; 2 = POS; 3 = Internet
6 | The type of the card: smart card or not | qualitative | {0, 1}
7 | The type of operation | qualitative | 1 = purchase at POS; 2 = withdrawal at ATM; 3 = Internet txn
8 | The number of unsuccessful txn authorizations | qualitative | {1, 2, 3}
9 | The number of consecutive low-value txn | qualitative | {1, 2, 3, 4}
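A record with the nine attributes of Table 1 can be represented as follows (a hedged Python sketch; the field names, the helper `features`, and the sample PAN are hypothetical illustrations, not the bank's actual schema):

```python
from dataclasses import dataclass

@dataclass
class CardTransaction:
    """One row of the cards table, keyed by the 16-digit PAN (Table 1)."""
    pan: str               # unique key; "4111111111111111" is a well-known test PAN
    amount: float          # field 1: 1..10 000
    txn_per_window: int    # field 2: {3, 4, 5}
    country_code: int      # field 3: 1..100
    txn_time: str          # field 4: "HH:MM:SS"
    channel: int           # field 5: 1 = ATM, 2 = POS, 3 = Internet
    smart_card: int        # field 6: {0, 1}
    operation: int         # field 7: 1 = POS purchase, 2 = ATM withdrawal, 3 = Internet
    failed_auths: int      # field 8: {1, 2, 3}
    low_value_streak: int  # field 9: {1, 2, 3, 4}

def features(t):
    """Numeric feature vector for the classifier; HH:MM:SS becomes seconds."""
    hh, mm, ss = (int(p) for p in t.txn_time.split(":"))
    seconds = hh * 3600 + mm * 60 + ss
    return [t.amount, t.txn_per_window, t.country_code, seconds,
            t.channel, t.smart_card, t.operation, t.failed_auths,
            t.low_value_streak]

t = CardTransaction("4111111111111111", 250.0, 3, 40, "02:13:45", 3, 1, 3, 2, 4)
print(len(features(t)))   # 9
```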

The input file was further imported and processed in MATLAB using the LIBSVM library. Table 2 reflects the results of applying the LIBSVM module in MATLAB with different kernel functions. The number of records was set to 10 000. In trial 1 all the variables were generated randomly from a uniform distribution; in trial 2 the output variable was generated from a Weibull distribution, while the input variables were generated from a uniform distribution.

Table 2
The Accuracy in Card Fraud Detection Using Various Types of Kernel Function
Source: Output MATLAB

Type of kernel | Accuracy (Trial 1) | Iterations | Accuracy (Trial 2) | Iterations
Linear | 51.56% (5156/10000) | 820 | 51.10% (5110/10000) | 30660
Polynomial (d = 3) | 53.39% (5339/10000) | 3802 | 54.66% (5466/10000) | 56860
Logistic | 50.12% (5012/10000) | 3328 | 49.34% (4934/10000) | 3288
Gaussian (γ = 1) | 68.08% (6808/10000) | 9057 | 90.10% (9010/10000) | 10546
Gaussian (γ = 10) | 92.38% (9238/10000) | 438 | 99.99% (9999/10000) | 3436
Gaussian (γ = 100) | 99.96% (9996/10000) | 2739 | 100% (10000/10000) | 475
Gaussian (γ = 1000) | 100% (10000/10000) | 232 | 100% (10000/10000) | 4920

LIBSVM is an open-source machine learning library, developed at National Taiwan University and written in C++. LIBSVM implements the SMO algorithm for various types of kernel, outputting a very accurate classification on massive data structures. LIBLINEAR implements linear SVMs and logistic regression models trained using a coordinate descent algorithm (Chang & Lin, 2011).

The MATLAB source code that uses a radial kernel to train and then classify the frauds is the following:

    label = data(:,1);
    feature = data(:,2:end);
    gama = 10;
    model = svmtrain(label, feature, sprintf('-s 0 -t 2 -g %g', gama));
    [predicted_label, accuracy, decision_values] = svmpredict(label, feature, model);

    optimization finished, #iter = 3436
    nu = 0.974392
    obj = -503.89262, rho = 0.07280
    nSV = 10000, nBSV = 4950
    Total nSV = 10000
    Accuracy = 99.99% (9999/10000) (classification)

Fig. 8 Screen output of SVM training using LIBSVM.
Source: Output MATLAB.

6. Conclusions

1. Using the specific SVM technology with the proposed Gaussian kernel with γ = 100 on a set of normalized data yields an accuracy in screening potentially fraudulent cards of around 99.96%. This accuracy is clearly superior to that of other non-linear kernels such as the sigmoid, polynomial or exponential ones. On a different data set, the optimal parameter γ may vary.

2. Due to its exceptional performance, we further propose implementing the SVM technique with a Gaussian kernel, as it is superior to other standard classification methods such as neural networks, logistic regression or Bayesian analysis.

3. Based on this study, we recommend that this model be further applied in card fraud monitoring activity at any retail business.

4. In addition to this technique, we propose similar procedures based on Gaussian kernels for other outlier detection problems, such as fake checks, counterfeit banknotes and so on.

REFERENCES

Flemming J., Generalized Tikhonov Regularization: Basic Theory and Comprehensive Results on Convergence Rates. Dissertation, Chemnitz, October 28, 2011.

Chang Chih-Chung, Lin Chih-Jen, LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, National Taiwan University, 2011.

Cover T.M., Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, 1965.

DasGupta A., Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics. Springer, 2011.

Huang T.-M., Kecman V., Kopriva I., Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-Supervised, and Unsupervised Learning. Springer-Verlag, Berlin, 2006.

Paulsen V., An Introduction to the Theory of Reproducing Kernel Hilbert Spaces. 2009.

Vapnik V., Statistical Learning Theory. Wiley-Interscience, 1998.

BANK CARD FRAUD DETECTION USING SVM

(Abstract)

The importance of detecting bank card fraud is considerable: first of all, it avoids significant financial losses and can save the reputation of banks and merchants alike; numerous studies, articles and books have been devoted to it, yet the subject is still open and has no accepted stable solution. The paper presents the significance of supervised learning, which is practically a cybernetic process with a negative feedback loop, its self-regulation being achieved through the continuous minimization of the (training and testing) errors. The decision to use the Support Vector Machine technique in outlier classification problems is based on the numerous benchmark tests that have shown its superiority over other classes of algorithms: neural or Bayesian networks. The paper explains why the idea of minimizing the empirical risk is equivalent to minimizing the geometric distance, which is in fact the central idea of SVM (Vapnik), while also representing an element of regularization. This is why the paper shows how generalized Tikhonov regularization represents a good compromise between bias and variance.

The methodological approach involves constructing the cost function starting from the sigmoid function and maximum-likelihood estimation, and proving that it is convex. The experience of the Card Fraud Monitoring department of ING BANK was of real help in selecting the variables that are truly useful in building the model. Finally, the libsvm package was used in MATLAB for the various numerical simulations, owing to its ease in handling large volumes of data and its exceptional flexibility in customizing the kernel function parameters. It was shown numerically that, using the SVM technique with a Gaussian kernel function, the accuracy in the learning phase exceeds 90% correct classification on randomly generated samples. In conclusion, we confidently propose the use of this model in the fight against bank card fraud.