Statistics in Retail Finance. Chapter 7: Fraud Detection in Retail Credit




Overview >
Detection of fraud remains an important issue in retail credit. Methods similar to scorecard development may be employed, but there are some problems specific to this application area. In this chapter we discuss:-
- Types of fraud and size of the problem.
- Automated fraud detection.
- Two-class and one-class classifiers for fraud detection.
- Parzen density estimation.
- Evaluation issues for fraud detection.

References >
There is not much material on fraud detection in retail finance. The following sources should be useful.
- Fraud The Facts (2012). Financial Fraud Action UK report (http://www.financialfraudaction.org.uk/download.asp?file=2699).
- Anderson R (2007). The Credit Scoring Toolkit: theory and practice for retail credit risk management and decision automation. NY: OUP.
- Hit 'em where it hurts: Using analytics to lock up fraudsters. SAS white paper, 2012.
- Dorronsoro JR, Ginel F, Sanchez C and Santa Cruz C (1997). Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks, Vol. 8, no. 4, July 1997.
- Juszczak P, Adams NM, Hand DJ, Whitrow C and Weston DJ (2008). Off-the-peg and bespoke classifiers for fraud detection. Computational Statistics and Data Analysis 52, 4521-4532.

Types of fraud >
- Theft fraud. A credit card is physically stolen or lost and used by someone other than the card holder.
- Card mail non-receipt fraud. A type of theft, but before the genuine card holder gets the card.
- Counterfeit fraud. A credit card is physically faked and used.
- Application fraud. An individual applies for credit deliberately using false information.
- Bankruptcy fraud. A person receives and uses credit knowing that they will be personally bankrupt in future.

- Behavioural fraud / Card-not-present (CNP) fraud. Credit card details are taken and used remotely by someone other than the card holder. Common in telephone sales, internet commerce and mail order.
Example of real fraud: http://www.bbc.co.uk/news/uk-england-somerset-20505489

Cost and detection of fraud >
The loss due to credit card fraud increases strongly with the length of time between the start of the fraud and the point at which it is detected and the credit is stopped.
When is fraud detected?
- For stolen or lost cards, a card can be stopped as soon as it is reported missing.
- For application and bankruptcy fraud, a problem may only become apparent when payments become due and are not met. For a personal loan, the whole amount could be lost.
- Counterfeit and behavioural fraud may only be detected when a customer spots an anomalous transaction on his/her account statement and reports this to the bank.
- Analytic methods in banks can be used to detect fraudulent behaviour.

Size of the fraud problem >
Cost of retail credit fraud in UK (2001 to 2011).
[Figure: annual fraud losses in millions (vertical axis, 0 to 700) for 2001 to 2011, by category: mail non-receipt, card ID theft, lost/stolen, counterfeit and card-not-present. Source: FFA UK (2012)]
Note: In 2004, chip-and-pin was introduced and this has been quoted as part of the reason for the reduction in fraud losses from 2008.

Automated fraud detection >
Automated methods are applied to detect behavioural fraud. The main issue here is the timeliness of the detection, to shorten the amount of time the fraud is operating.
Usually automated methods generate fraud alerts that are followed up manually. Note that not all fraud alerts will turn out to be genuine fraud; many will be false alarms.
This is a type of classification problem, to distinguish between legitimate transactions ($y = 1$) and fraudulent transactions ($y = 0$).

Special considerations for fraud detection >
There are some special problems for fraud detection:
1. Need to process millions of transactions in real time.
2. Highly imbalanced classification problem. The ratio of fraudulent to legitimate transactions is typically less than 1:1000.
3. Nature of fraud is reflexive. That is, fraudsters adapt to the detection methods applied by banks to stop them.
However, unlike application model development, there is less need to build an explanatory model, therefore complex structured non-linear models can be considered.

Automated fraud detection methods >
There are four categories of methods:-
1. Business rules
2. Predictive models
3. Anomaly detection
4. Social network analysis

Method 1: Business rules >
The simplest approach is to use expert knowledge to implement business knowledge of fraudulent behaviour as part of a computer-based expert system.
A typical rule is:- Generate a fraud alert if a credit card is used abroad and it has not been used in that country in the past year and the credit card holder has not told the bank they will be visiting that country.
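The typical rule above can be written as a short function. This is only a minimal sketch in R: the function name `abroad_alert` and all of its arguments are hypothetical, chosen for illustration, and do not come from any real fraud system.

```r
# Hypothetical business rule: raise an alert if the card is used abroad,
# in a country where it has not been used in the past year, and the
# holder has not told the bank about a visit to that country.
abroad_alert <- function(txn_country, home_country,
                         countries_last_year, notified_countries) {
  used_abroad      <- txn_country != home_country
  not_used_before  <- !(txn_country %in% countries_last_year)
  no_travel_notice <- !(txn_country %in% notified_countries)
  used_abroad && not_used_before && no_travel_notice
}

# Card held in GB, used in GB and ES during the past year.
abroad_alert("FR", "GB", c("GB", "ES"), character(0))  # new country, no notice
abroad_alert("ES", "GB", c("GB", "ES"), character(0))  # used there before
abroad_alert("FR", "GB", c("GB", "ES"), "FR")          # bank was notified
```

Only the first call satisfies all three conditions of the rule; each of the other two fails one condition.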

Method 2: Predictive models >
We treat fraud detection as a classification problem and use a two-class classifier. The result is a fraud scorecard. Usually the fraud score is used with low scores indicating a higher level of fraud risk and high scores indicating a lower level of fraud risk.
- Choose a classifier based on a model with functional form $f$, such that $s = f(x, \beta)$ is the fraud score for a transaction $x$ and some model parameters $\beta$.
- Estimate $\beta$ based on training data of past transactions that included fraud.

- To deal with the high imbalance between classes, a simple filter can be applied first to detect and remove obviously legitimate transactions and so increase the ratio of fraudulent to legitimate transactions in the training data.
  o For example, inactive accounts and low value or repeated transactions could be removed.
- Research results and past experience show that models based on linear combinations of predictor variables, such as OLS and logistic regression, are not sufficient.
- Non-linear classifiers such as artificial neural networks (ANN) are effective and used in practice (eg SAS fraud tools).
- We do not have the scope to present ANNs in this course.
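The effect of such a filter on the class ratio can be illustrated with simulated data. This is a sketch under stated assumptions: the column names `amount` and `fraud`, the cut-off of 5 units and the simulated absence of fraud among low-value transactions are all invented for the example.

```r
set.seed(1)
n <- 100000
# Simulated transactions; column names and distributions are assumptions.
txn <- data.frame(
  amount = rexp(n, rate = 1 / 40),
  fraud  = rbinom(n, 1, 0.001)
)
# For illustration only: in this simulation no fraud occurs among
# low-value transactions, so they count as "obviously legitimate".
txn$fraud[txn$amount < 5] <- 0

ratio_before <- mean(txn$fraud)
filtered     <- txn[txn$amount >= 5, ]   # filter out low-value transactions
ratio_after  <- mean(filtered$fraud)

c(before = ratio_before, after = ratio_after)
```

The filter removes only legitimate cases here, so every fraud is kept while the proportion of fraud in the training data rises. With real data the filter would need checking to confirm that it discards few or no frauds.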

We can expect to have good results for types of fraud that are the same as the ones in the training data. This is because the two-class classifier is a model of the fraudulent behaviour observed.
However, it is not expected to perform well if new types of fraud emerge over time. They will not have been modelled.

Method 3: Anomaly detection >
An alternative to predictive modelling is to model only the legitimate transactions and then report anomalies in new cases as potentially fraudulent transactions.
- This method has the advantage that fraud is not explicitly modelled, so in principle it should be adaptable to new types of fraud that emerge.
- Additionally, the highly unbalanced nature of the data is not a problem since the model is based only on the legitimate transactions.
- The one major problem is that it will not be sensitive to frauds which appear very similar to legitimate transactions.
One-class classifiers are used to build a model of legitimate transactions. Typically these work by modelling the probability density function (PDF) over the predictor variables for legitimate transactions. In this chapter we will use the common Parzen density estimator.

Anomaly detection process >
A typical anomaly detection process is given as follows:-
1. Use an outlier detector to remove extreme cases from the training data (these may be errors, genuine outliers or fraudulent transactions).
2. Let $(x_1, \ldots, x_n)$ be a training sequence of legitimate transactions (with outliers removed).
3. Denote outcome by $y \in \{0, 1\}$ where 1 denotes a legitimate transaction and 0 a fraudulent one.
4. Estimate the PDF $\hat{p}(x; h)$ where $h$ is an estimation parameter.
5. A classification decision on a new observation $x$ is made as $\hat{y} = I(\hat{p}(x; h) > \theta)$ for some threshold on the density, $\theta$.

The threshold $\theta$ can be set based on the (sensible) strategy of controlling the fraction $\alpha$ of legitimate cases to be classified as anomalous, based on training data. This controls the false alert rate and can also be informed by how many alerts can be followed up manually, which is constrained by business resources (eg how many staff are employed to do follow-up).
We write this as the optimization task
$$\min \theta \quad \text{subject to} \quad \frac{1}{n} \sum_{i=1}^{n} I(\hat{p}(x_i; h) > \theta) \le (1 - \alpha)$$
where $\hat{p}$ is the estimated density and $x_1, \ldots, x_n$ are the legitimate training transactions.
Note: The inequality is used here only for cases where the sum does not give an exact value of $(1 - \alpha)$. Because $\theta$ is minimized, the sum always gives a value as close to $(1 - \alpha)$ as possible.
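The threshold choice can be sketched in R for one predictor variable. Everything specific here is an assumption made for illustration: the simulated training data, the hand-picked bandwidth `h`, the target fraction `alpha` of legitimate cases to flag, the symbol `theta` for the density threshold, and the helper names `parzen` and `classify`. Training points are scored with a density estimate that includes their own contribution, which is a simplification.

```r
set.seed(42)
x_train <- rnorm(500)   # simulated legitimate transactions (one variable)
h       <- 0.3          # bandwidth, chosen by hand for this sketch
alpha   <- 0.05         # target fraction of legitimate cases flagged

# Gaussian-kernel density estimate evaluated at each point of x_new.
parzen <- function(x_new, x_train, h) {
  sapply(x_new, function(x) mean(dnorm((x - x_train) / h)) / h)
}

p_train <- parzen(x_train, x_train, h)

# Smallest threshold such that at most a fraction (1 - alpha) of the
# training densities lie strictly above it: the k-th smallest density.
k     <- ceiling(alpha * length(p_train))
theta <- sort(p_train)[k]

# 1 = classified legitimate, 0 = flagged as anomalous.
classify <- function(x_new) as.integer(parzen(x_new, x_train, h) > theta)

mean(p_train > theta)   # fraction of training cases not flagged
classify(c(0, 6))       # a typical point and a point far from the data
```

A point near the centre of the training data has a high estimated density and passes, while a point far from all training data falls below the threshold and is flagged. In practice `alpha` would be tuned to the number of alerts that staff can follow up.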

Parzen density estimator >
We could base the estimate on just the empirical frequency,
$$\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i = x)$$
but
1. This only works for univariate data and
2. It is a somewhat crude estimator of the underlying PDF.
Instead we use a Parzen estimator that smooths over a multivariate sample to generate a distribution:
$$\hat{p}(x; h) = \frac{1}{n h^m} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
where $K$ is some kernel which is symmetric, $K(u) = K(-u)$, and integrates to 1, $\int K(u)\,du = 1$, $h$ is a bandwidth parameter and $m$ is the dimensionality of $x$ (ie the number of predictor variables).

For any point in the variable space, $x$, each value $x_i$ in the training sequence contributes to the estimate, but its contribution is weighted by its distance from $x$, given by $x - x_i$. The bandwidth $h$ controls the scaling of that distance within the kernel function.
A typical kernel function is the multivariate normal distribution:
$$K(u) = (2\pi)^{-m/2} \exp\left(-\tfrac{1}{2} u^\top u\right)$$
In the R statistical language, the function density implements Parzen density estimation for univariate data.
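To make the Parzen formula concrete, the following sketch evaluates a Gaussian-kernel estimate directly for one variable and compares it with R's built-in `density` function at the same bandwidth. The helper name `parzen_1d`, the simulated mixture data and the evaluation grid are assumptions made for this illustration.

```r
set.seed(7)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))   # simulated mixture data
h <- 0.5                                      # bandwidth

# Parzen estimate with a Gaussian kernel for one variable (m = 1):
# (1 / (n h)) * sum of K((x0 - x_i) / h), with K the standard normal PDF.
parzen_1d <- function(x0, x, h) {
  sapply(x0, function(u) sum(dnorm((u - x) / h)) / (length(x) * h))
}

grid   <- seq(-6, 6, length.out = 512)
p_hand <- parzen_1d(grid, x, h)

# Built-in estimate on the same grid with the same bandwidth.
d <- density(x, bw = h, kernel = "gaussian", from = -6, to = 6, n = 512)

max_diff <- max(abs(p_hand - d$y))               # discrepancy between the two
area     <- sum(p_hand) * (grid[2] - grid[1])    # approximate integral

c(max_diff = max_diff, area = area)
```

The two estimates should agree closely but not exactly, because `density` uses a binned approximation computed with a fast Fourier transform. The approximate integral being close to 1 is a numerical check that the estimate behaves like a PDF.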

Exercise 9.1 Prove that ( )

Example 9.1. This R code demonstrates Parzen density estimation and the use of bandwidth. The example simulates 200 observations from a mixture of two normal distributions.

    # 100 draws from N(-2, 1) and 100 draws from N(2, 1)
    x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
    par(mfrow = c(2, 2))   # arrange the four plots in a 2 x 2 grid
    hist(x)
    # density estimates with increasing bandwidth
    plot(density(x, bw = 0.1), main = "density estimate")
    plot(density(x, bw = 0.5), main = "density estimate")
    plot(density(x, bw = 1.5), main = "density estimate")

The following output is produced:
[Figure: histogram of x and three density estimates, with bw = 0.1, 0.5 and 1.5]

Method 4: Social network analysis >
Very recently banks have been accessing publicly available social network data. This allows them to determine transactions that have some association with other individuals or accounts that are known to be fraudulent or suspect. This would reduce the fraud score of such transactions.
Statistical methods that are evolving to deal with this data:-
  o Social network analysis,
  o Dynamic network analysis.
This is a very new area and we will not investigate these topics further in this course.

Available data for fraud detection >
- Accounts data: including type of account, application details and aggregate behavioural characteristics.
- Transaction data: including spending and repayment patterns.
- Personal data: data the bank has about the person holding the account, some of which may have been provided by a credit bureau.
- Location data: information about where the transaction was performed and where the borrower lives.

Evaluation >
Although essentially a classification problem, the fraud problem has some characteristics that make evaluation of performance slightly different:
1. The timeliness of detection has an effect on the cost of the fraud.
2. The cost of monitoring automated fraud alerts is important.
3. It is necessary to ensure false alerts are kept to a minimum in order not to upset or alienate legitimate customers.
At the moment there is no clear agreement about the best performance measure. As with scorecard development, we typically base measures on the two CDFs:
$$F_y(c) = P(S \le c \mid Y = y)$$
for some value $c$ of the fraud score $S$ (remember lower value means more risk of fraud), and for each outcome $y \in \{0, 1\}$ (remember $y = 1$ means legitimate).

Thus, plotting $F_0(c)$ against $F_1(c)$, the score CDFs for fraudulent and legitimate transactions respectively, gives the receiver-operating characteristics (ROC) curve and the area under the ROC curve (AUC) as a classification performance measure:
$$AUC = \int F_0(c)\, dF_1(c)$$
However, the ROC curve and AUC do not take into account the special points (1) to (3) given above. We consider a measure based on these terms:
- The false alarm rate is given by $F_1(c)$.
- The undetected fraud rate is given by $1 - F_0(c)$.
- The alert rate, which is linked to the monitoring cost, is $F(c) = P(S \le c)$.
Notice that $F(c) = P(Y=0)\,F_0(c) + P(Y=1)\,F_1(c)$.
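These rates can be computed from empirical CDFs. The sketch below assumes an alert is raised when the fraud score is at or below a cut-off, with outcome 1 for legitimate and 0 for fraud; the simulated scores, the 1% fraud rate, the cut-off value and all variable names are assumptions for illustration only.

```r
set.seed(3)
n <- 20000
y <- rbinom(n, 1, 0.99)   # 1 = legitimate, 0 = fraud
# Simulated fraud scores: frauds tend to score lower than legitimate cases.
s <- ifelse(y == 1, rnorm(n, 2, 1), rnorm(n, 0, 1))

F0    <- ecdf(s[y == 0])   # score CDF for fraudulent transactions
F1    <- ecdf(s[y == 1])   # score CDF for legitimate transactions
F_all <- ecdf(s)           # score CDF over all transactions

cutoff <- 0.5              # alert when score <= cutoff
false_alarm <- F1(cutoff)       # legitimate cases that raise an alert
undetected  <- 1 - F0(cutoff)   # frauds that raise no alert
alert_rate  <- F_all(cutoff)    # all cases that raise an alert

# The alert rate decomposes by class, weighted by the class proportions.
p0 <- mean(y == 0)
p1 <- mean(y == 1)
decomposition_holds <- isTRUE(
  all.equal(alert_rate, p0 * F0(cutoff) + p1 * F1(cutoff))
)

# AUC estimated as the proportion of (fraud, legitimate) pairs in which
# the fraud has the lower score.
auc <- mean(outer(s[y == 0], s[y == 1], "<"))

c(false_alarm = false_alarm, undetected = undetected,
  alert_rate = alert_rate, auc = auc)
```

Because legitimate transactions dominate the data, the alert rate stays close to the false alarm rate, which is why the monitoring cost is driven mostly by false alarms.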

Performance curve >
The performance curve is an alternative to the ROC curve.
- Plot the alert rate $F(c)$ against the undetected fraud rate $1 - F_0(c)$.
  o This plots monitoring cost (point 2) against proportion of frauds not detected.
  o Also, since $F(c) \ge P(Y=1)\,F_1(c)$ and $P(Y=1) \approx 1$, this also shows some control on false alarms (point 3).
- The point $(0, P(Y=0))$ is the perfect performance: all frauds detected at minimal possible cost.
- The line must pass through $(1, 0)$, when no frauds are detected since no detection is performed.
- The performance given by a random classifier is where $F_0(c) = F_1(c)$. Hence this is the diagonal from (0,1) to (1,0).

- Best performance is given by curves below this line, so the area under the performance curve is a penalty measure:
$$\int F(c)\, dF_0(c)$$
where $F(c)$ is the alert rate and $F_0(c)$ is the score CDF for fraudulent transactions.
- The x-axis is called a timeline since it captures an aspect of detection over time (point 1).
  o Basically as frauds are detected this increases the proportion of undetected frauds left in the data, so over time we expect to move along the x-axis.
  o This is similar to performance curves in engineering (eg stress versus performance curves).
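A performance curve and the area under it can be sketched as follows. The data are simulated and the alerting convention (alert when the score is at or below a cut-off, outcome 0 for fraud) is assumed; the variable names are hypothetical and the area is computed with the trapezium rule rather than any method given in the source.

```r
set.seed(3)
n <- 20000
y <- rbinom(n, 1, 0.99)   # 1 = legitimate, 0 = fraud
s <- ifelse(y == 1, rnorm(n, 2, 1), rnorm(n, 0, 1))   # simulated scores

F0    <- ecdf(s[y == 0])   # score CDF for fraudulent transactions
F_all <- ecdf(s)           # score CDF over all transactions

cuts       <- sort(unique(s))
undetected <- c(1, 1 - F0(cuts))    # x-axis, starting at the point (1, 0)
alert_rate <- c(0, F_all(cuts))     # y-axis

plot(undetected, alert_rate, type = "l",
     xlab = "Undetected fraud rate", ylab = "Alert rate",
     main = "Performance curve")
abline(a = 1, b = -1, lty = 2)      # random classifier: (0,1) to (1,0)

# Area under the performance curve by the trapezium rule.
# undetected decreases along the curve, so -diff() gives positive widths.
widths  <- -diff(undetected)
heights <- (head(alert_rate, -1) + tail(alert_rate, -1)) / 2
area    <- sum(widths * heights)
area
```

A random classifier would give an area of one half, so a value well below that indicates that frauds are being found at a lower monitoring cost.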

Cost-based evaluation >
The financial cost of fraud can be estimated directly.
- Based on history of past fraud or total exposure of account at time of fraud.
- This is based on past accounting data for those cases that have been correctly detected in the past.

Example 9.2 This is an example of a comparison between a one-class classifier, using a Parzen density estimator, and a two-class classifier.
- Uses the performance curve as an evaluation method.
- Based on Juszczak et al (2008).
- Data set: 11,383 accounts with 646,729 transactions, of which 3,217 (28.3%) are fraudulent accounts and 18,501 (2.9%) are fraudulent transactions.
- Transaction records over a 6 month period.
- Use Parzen density estimator as one-class classifier.

Outcome of model build and test on hold-out sample:-
[Figure: Performance curve, F(c) (vertical axis) against 1-F0(c) (horizontal axis)]
Now consider forecasts over time and in comparison with a comparable two-class classifier (in this case a density-based Parzen classifier).

Fixing ( )=0.2 and plotting cost against forecast ahead months.
[Figure: Cost F(c) (vertical axis, 0 to 0.2) against forecast ahead months (2 to 6), for the one-class and two-class classifiers]
This shows that initially the two-class classifier gives slightly better performance. However, its performance deteriorates over time in comparison to the one-class classifier, which is more robust. Our hypothesis is that the two-class classifier is not sensitive to new types of fraud.

Exercise 9.2 Suppose and ( ) { ( ) for { }. Let ( ) be a sequence of instances of, which correspond to legitimate transactions.
1. Show that is a kernel function for Parzen density estimation for random variable with bandwidth.
2. Using, compute the threshold that gives a false positive rate up to.

Review of Chapter 9 >
In this chapter we have investigated:-
- Types of fraud and size of the problem.
- Automated fraud detection.
- Two-class and one-class classifiers for fraud detection.
- Parzen density estimation.
- Evaluation issues for fraud detection.