Statistics in Retail Finance. Chapter 7: Fraud Detection in Retail Credit
- Karen Barber
Overview
Detection of fraud remains an important issue in retail credit. Methods similar to scorecard development may be employed, but there are some problems specific to this application area. In this chapter we discuss:
- Types of fraud and size of the problem.
- Automated fraud detection.
- Two-class and one-class classifiers for fraud detection.
- Parzen density estimation.
- Evaluation issues for fraud detection.
References
There is not much material on fraud detection in retail finance. The following sources should be useful:
- Financial Fraud Action UK (2012). Fraud The Facts. Financial Fraud Action UK report.
- Anderson R (2007). The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. NY: OUP.
- SAS (2012). Hit 'em where it hurts: Using analytics to lock up fraudsters. SAS white paper.
- Dorronsoro JR, Ginel F, Sanchez C and Santa Cruz C. Neural fraud detection in credit card operations. IEEE Transactions on Neural Networks, Vol. 8, No. 4, July.
- Juszczak P, Adams NM, Hand DJ, Whitrow C and Weston DJ (2008). Off-the-peg and bespoke classifiers for fraud detection. Computational Statistics and Data Analysis 52.
Types of fraud
- Theft fraud. A credit card is physically stolen or lost and used by someone other than the card holder.
- Card mail non-receipt fraud. A type of theft, but occurring before the genuine card holder receives the card.
- Counterfeit fraud. A credit card is physically faked and used.
- Application fraud. An individual applies for credit deliberately using false information.
- Bankruptcy fraud. A person receives and uses credit knowing that they will become personally bankrupt in the future.
- Behavioural fraud / card-not-present (CNP) fraud. Credit card details are taken and used remotely by someone other than the card holder. This is common in telephone sales, internet commerce and mail order.
[Figure: example of a real fraud]
Cost and detection of fraud
The loss due to credit card fraud increases strongly with the length of time between the start of the fraud and its detection, when the credit is stopped. When is fraud detected?
- For stolen or lost cards, a card can be stopped as soon as it is reported missing.
- For application and bankruptcy fraud, a problem may only become apparent when payments fall due and are not met. For a personal loan, the whole amount could be lost.
- Counterfeit and behavioural fraud may only be detected when customers spot an anomalous transaction on their account statement and report it to the bank.
Banks can use analytic methods to detect fraudulent behaviour.
Size of the fraud problem
[Figure: cost of retail credit fraud in the UK, 2001 to 2011 (million), by type: mail non-receipt, card ID theft, lost/stolen, counterfeit, card-not-present. Source: FFA UK (2012).]
Note: chip-and-PIN was introduced in 2004, and this has been quoted as part of the reason for the reduction in fraud losses from …
Automated fraud detection
Automated methods are applied to detect behavioural fraud. The main issue here is the timeliness of detection, to shorten the time during which the fraud is operating. Usually, automated methods generate fraud alerts that are followed up manually. Note that not all fraud alerts will turn out to be genuine fraud; many will be false alarms.
This is a type of classification problem: to distinguish between legitimate transactions ($y = 1$) and fraudulent transactions ($y = 0$).
Special considerations for fraud detection
There are some special problems for fraud detection:
1. The need to process millions of transactions in real time.
2. A highly imbalanced classification problem. The ratio of fraudulent to legitimate transactions is typically less than 1:…
3. The nature of fraud is reflexive. That is, fraudsters adapt to the detection methods that banks apply to stop them.
However, unlike application model development, there is less need to build an explanatory model, so complex, structured non-linear models can be considered.
Automated fraud detection methods
There are four categories of methods:
1. Business rules
2. Predictive models
3. Anomaly detection
4. Social network analysis
Method 1: Business rules
The simplest approach is to encode expert knowledge of fraudulent behaviour as rules in a computer-based expert system. A typical rule is:
Generate a fraud alert if a credit card is used abroad, and it has not been used in that country in the past year, and the card holder has not told the bank they will be visiting that country.
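As a minimal sketch, the rule above can be written as a vectorised R function over a table of transactions. The data frame and its column names (`country`, `home_country`, `used_in_country_past_year`, `travel_notified`) are hypothetical; a real expert system would derive these fields from account and transaction data.

```r
# Hypothetical transactions; all column names are illustrative only.
txn <- data.frame(
  card_id                   = c("A", "B", "C"),
  country                   = c("FR", "FR", "GB"),
  home_country              = c("GB", "GB", "GB"),
  used_in_country_past_year = c(FALSE, TRUE, FALSE),
  travel_notified           = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Business rule: alert if the card is used abroad, has not been used in that
# country in the past year, and the holder has not notified the bank of travel.
abroad_rule <- function(txn) {
  abroad <- txn$country != txn$home_country
  abroad & !txn$used_in_country_past_year & !txn$travel_notified
}

txn$alert <- abroad_rule(txn)
print(txn[, c("card_id", "alert")])
```

Only card A raises an alert: card B was used in that country within the past year, and card C is a domestic transaction.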
Method 2: Predictive models
We treat fraud detection as a classification problem and use a two-class classifier. The result is a fraud scorecard. Usually the fraud score is used with low scores indicating a higher level of fraud risk and high scores indicating a lower level of fraud risk.
- Choose a classifier based on a model with functional form $f$, such that $s = f(\mathbf{x}; \theta)$ for a transaction $\mathbf{x}$ and some model parameters $\theta$.
- Estimate $\theta$ based on training data of past transactions that included fraud.
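A minimal sketch of a two-class fraud scorecard on simulated data. Logistic regression (`glm`) is used here only because it ships with base R; the notes go on to say that linear models are usually not sufficient in practice. The predictors `x1`, `x2` and the simulation settings are assumptions. The score is the estimated $P(y = 1 \mid \mathbf{x})$, so, following the convention above, low scores mean high fraud risk.

```r
set.seed(1)
n  <- 2000
x1 <- rnorm(n)   # hypothetical predictor, eg a standardised amount
x2 <- rnorm(n)   # hypothetical predictor, eg a standardised time-of-day feature
# Simulated outcome: y = 1 legitimate, y = 0 fraud; fraud is more likely when x1 is large
y <- rbinom(n, size = 1, prob = plogis(3 - 1.5 * x1 + 0.5 * x2))
train <- data.frame(y, x1, x2)

# Estimate the model parameters theta on past (training) transactions
fit <- glm(y ~ x1 + x2, family = binomial, data = train)

# Fraud score s = f(x; theta): estimated probability of being legitimate
score <- predict(fit, newdata = train, type = "response")
tapply(score, train$y, mean)   # mean score by outcome
```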
To deal with the high imbalance between classes, a simple filter can be applied first to detect and remove obviously legitimate transactions, and so increase the ratio of fraudulent to legitimate transactions in the training data.
- For example, inactive accounts and low-value or repeated transactions could be removed.
Research results and past experience show that models based on linear combinations of predictor variables, such as OLS and logistic regression, are not sufficient. Non-linear classifiers such as artificial neural networks (ANNs) are effective and used in practice (eg SAS fraud tools). We do not have the scope to present ANNs in this course.
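The pre-filtering step can be sketched as follows. The column names, the simulated data and the cut-off `min_amount = 20` are all hypothetical; the point is only to show how removing obviously legitimate transactions raises the fraud-to-legitimate ratio. In this toy example every fraud survives the filter by construction; with real data a filter can also discard some frauds, so its rules need checking.

```r
set.seed(2)
n <- 10000
txns <- data.frame(
  amount         = rexp(n, rate = 1 / 50),  # hypothetical transaction amounts
  account_active = runif(n) > 0.1,          # hypothetical account activity flag
  repeated       = runif(n) < 0.3,          # hypothetical "repeat purchase" flag
  fraud          = FALSE
)
# Plant 20 frauds among larger, non-repeated transactions on active accounts
candidates <- which(txns$account_active & !txns$repeated & txns$amount > 20)
txns$fraud[sample(candidates, 20)] <- TRUE

# Filter out obviously legitimate transactions (inactive, low-value or repeated)
prefilter <- function(d, min_amount = 20) {
  d[d$account_active & !d$repeated & d$amount >= min_amount, ]
}

fraud_ratio <- function(d) sum(d$fraud) / sum(!d$fraud)
filtered <- prefilter(txns)
c(before = fraud_ratio(txns), after = fraud_ratio(filtered))
```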
We can expect good results for types of fraud that are the same as those in the training data, because the two-class classifier is a model of the fraudulent behaviour observed. However, it is not expected to perform well if new types of fraud emerge over time, since they will not have been modelled.
Method 3: Anomaly detection
An alternative to predictive modelling is to model only the legitimate transactions, then report anomalous new cases as potentially fraudulent transactions.
- This method has the advantage that fraud is not explicitly modelled, so in principle it should adapt to new types of fraud as they emerge.
- Additionally, the highly imbalanced nature of the data is not a problem, since the model is based only on the legitimate transactions.
- The one major problem is that it will not be sensitive to frauds that look very similar to legitimate transactions.
One-class classifiers are used to build a model of legitimate transactions. Typically these work by modelling the probability density function (PDF) over the predictor variables for legitimate transactions. In this chapter we will use the common Parzen density estimator.
Anomaly detection process
A typical anomaly detection process is as follows:
1. Use an outlier detector to remove extreme cases from the training data (these may be errors, genuine outliers or fraudulent transactions).
2. Let $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ be a training sequence of legitimate transactions (with outliers removed).
3. Denote the outcome by $y \in \{0, 1\}$, where 1 denotes a legitimate transaction and 0 a fraudulent one.
4. Estimate the PDF $\hat{p}(\mathbf{x} \mid \theta)$, where $\theta$ is an estimation parameter.
5. A classification decision on a new observation $\mathbf{x}$ is made as $\hat{y} = I\big(\hat{p}(\mathbf{x} \mid \theta) \ge t\big)$ for some threshold $t$ on the density, where $I(\cdot)$ is the indicator function.
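The five steps can be sketched in R for a single predictor. Everything here is an assumption for illustration: the simulated legitimate data, the crude quantile-based outlier rule in step 1, the Gaussian kernel with Silverman's rule-of-thumb bandwidth (`bw.nrd0`) playing the role of $\theta$, and the fixed threshold `thr`. The helper `parzen_1d` is hypothetical and written out directly rather than taken from a package.

```r
set.seed(3)
# Step 2: training sequence of legitimate transactions (simulated, one predictor)
x_train <- rnorm(500)

# Step 1: crude outlier removal, dropping the most extreme 1% (an assumed rule)
lims    <- quantile(x_train, c(0.005, 0.995))
x_train <- x_train[x_train >= lims[1] & x_train <= lims[2]]

# Step 4: Gaussian Parzen estimate of the PDF with bandwidth h
parzen_1d <- function(z, x, h) {
  vapply(z, function(zz) mean(dnorm((zz - x) / h)) / h, numeric(1))
}
h <- bw.nrd0(x_train)

# Steps 3 and 5: y_hat = 1 (legitimate) if the estimated density is at least thr, else 0
thr <- 0.01
classify <- function(z) as.integer(parzen_1d(z, x_train, h) >= thr)
classify(c(0, 0.5, 6))
```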
The threshold $t$ can be set based on the (sensible) strategy of controlling the fraction $\alpha$ of legitimate cases in the training data that are classified as anomalous. This controls the false alert rate, and $\alpha$ can also be informed by how many alerts can be followed up manually, which is constrained by business resources (eg how many staff are employed to do follow-up). We write this as the optimization task
$$\min_t \left[ \alpha - \frac{1}{n}\sum_{i=1}^{n} I\big(\hat{p}(\mathbf{x}_i \mid \theta) < t\big) \right] \quad \text{subject to} \quad \frac{1}{n}\sum_{i=1}^{n} I\big(\hat{p}(\mathbf{x}_i \mid \theta) < t\big) \le \alpha.$$
Note: the inequality is needed only for cases where the sum cannot give a value of exactly $\alpha$ (it changes in steps of $1/n$). Because the difference is minimized, the sum always gives a value as close to $\alpha$ as possible.
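One way to solve this task is to sort the training densities and take the $(k+1)$-th smallest value as $t$, where $k = \lfloor \alpha n \rfloor$: at most $k$ training cases then fall strictly below $t$, and any larger $t$ would flag at least $k + 1 > \alpha n$ of them. The function name `choose_threshold`, the simulated data and the in-sample density evaluation are assumptions of this sketch (a leave-one-out estimate would give less optimistic training densities).

```r
# Largest threshold t such that at most floor(alpha * n) training cases have
# density strictly below t (these would be flagged as anomalous).
choose_threshold <- function(dens, alpha) {
  stopifnot(alpha >= 0, alpha < 1)
  d <- sort(dens)
  k <- floor(alpha * length(d) + 1e-9)  # small tolerance guards against rounding, eg 0.29 * 100
  d[k + 1]
}

set.seed(4)
x_train <- rnorm(1000)                  # simulated legitimate transactions
h       <- bw.nrd0(x_train)
# In-sample Gaussian Parzen density at each training point
dens <- vapply(x_train, function(z) mean(dnorm((z - x_train) / h)) / h, numeric(1))

thr <- choose_threshold(dens, alpha = 0.02)
mean(dens < thr)                        # fraction of legitimate training cases flagged
```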
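Parzen density estimator
We could base the estimate on just the empirical frequency,
$$\hat{p}(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} I(\mathbf{x}_i = \mathbf{x}),$$
but
1. this only works for univariate data, and
2. it is a somewhat crude estimator of the underlying PDF.
Instead we use a Parzen estimator, which smooths over a multivariate sample to generate a distribution:
$$\hat{p}(\mathbf{x}) = \frac{1}{n h^d}\sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right),$$
where $K$ is some kernel which is symmetric, $K(\mathbf{u}) = K(-\mathbf{u})$, and integrates to 1, $\int K(\mathbf{u})\,d\mathbf{u} = 1$; $h$ is a bandwidth parameter and $d$ is the dimensionality of $\mathbf{x}$ (ie the number of predictor variables).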
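To see why the empirical frequency is crude for continuous predictors, compare it with a Parzen estimate at a few new points. This sketch assumes simulated standard normal training data, a Gaussian kernel (the typical choice described next in these notes) and a hand-picked bandwidth $h = 0.4$; the helper names `p_empirical` and `p_parzen` are hypothetical.

```r
set.seed(5)
x_train <- rnorm(200)   # simulated continuous predictor for legitimate transactions

# Empirical frequency: proportion of training values exactly equal to z
p_empirical <- function(z, x) vapply(z, function(zz) mean(x == zz), numeric(1))

# Parzen estimate with a Gaussian kernel and bandwidth h (d = 1)
p_parzen <- function(z, x, h) {
  vapply(z, function(zz) mean(dnorm((zz - x) / h)) / h, numeric(1))
}

z_new <- c(-1, 0, 1)
p_empirical(z_new, x_train)   # all zero: no training value equals a new point exactly
round(p_parzen(z_new, x_train, h = 0.4), 3)
```

The empirical estimate is zero everywhere except at the 200 observed values, so every new transaction would look anomalous; the Parzen estimate spreads each observation's contribution over its neighbourhood.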
For any point $\mathbf{x}$ in the variable space, each value $\mathbf{x}_i$ in the training sequence contributes to the estimate, but its contribution is weighted by its distance from $\mathbf{x}$, given by $\mathbf{x} - \mathbf{x}_i$. The bandwidth $h$ controls the scaling of that distance within the kernel function. A typical kernel function is the multivariate normal density:
$$K(\mathbf{u}) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\mathbf{u}^{\top}\mathbf{u}\right).$$
In the R statistical language, the function density implements (univariate) Parzen density estimation.
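A sketch of the multivariate Parzen estimator with this Gaussian kernel, followed by a check that in one dimension it agrees with R's `density()`, whose `bw` argument is the standard deviation of the Gaussian kernel and so plays the role of $h$. The function name `parzen` and the simulated data are assumptions; using a single bandwidth for all predictors presumes they are on comparable scales (eg standardised). `density()` bins the data and uses the FFT, so its values agree with the direct sum only approximately.

```r
# Parzen estimate at a single point x (length d) from training matrix X (n x d):
# p_hat(x) = 1 / (n h^d) * sum_i K((x - x_i) / h),  K(u) = (2 pi)^(-d/2) exp(-u'u / 2)
parzen <- function(x, X, h) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  U <- sweep(X, 2, x) / h   # rows are (x_i - x) / h; the sign is irrelevant since K is symmetric
  k <- (2 * pi)^(-d / 2) * exp(-0.5 * rowSums(U^2))
  sum(k) / (n * h^d)
}

set.seed(6)
# Two-dimensional example: density is high near the bulk of the data, low far away
X2 <- cbind(rnorm(300), rnorm(300))
parzen(c(0, 0), X2, h = 0.5)
parzen(c(4, 4), X2, h = 0.5)

# One-dimensional check against density() on its own evaluation grid
x1   <- rnorm(200)
est  <- density(x1, bw = 0.4, kernel = "gaussian")
mine <- vapply(est$x, function(z) parzen(z, x1, h = 0.4), numeric(1))
max(abs(est$y - mine))   # small, but not zero, because density() uses an FFT approximation
```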
Exercise 9.1
Prove that $\int \hat{p}(\mathbf{x})\,d\mathbf{x} = 1$.
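Example 9.1
This R code demonstrates Parzen density estimation and the effect of the bandwidth. The example simulates 200 observations from a mixture of two normal distributions.
```r
set.seed(1)                     # for reproducibility
# 100 draws from N(-2, 1) and 100 draws from N(2, 1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
par(mfrow = c(2, 2))            # 2 x 2 grid of plots
hist(x)
# Gaussian-kernel density estimates with increasing bandwidth bw
plot(density(x, bw = 0.1), main = "density estimate, bw = 0.1")
plot(density(x, bw = 0.5), main = "density estimate, bw = 0.5")
plot(density(x, bw = 1.5), main = "density estimate, bw = 1.5")
```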
The following output is produced:
[Figure: histogram of x, and density estimates with bw = 0.1, 0.5 and 1.5]
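The bandwidth in Example 9.1 is set by hand. Base R's stats package also provides data-driven bandwidth selectors; the sketch below applies four of them to the same kind of simulated mixture. Which selector suits fraud data is not addressed in these notes, so treat this only as a way of exploring the choice of $h$.

```r
set.seed(7)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
bws <- c(
  nrd0 = bw.nrd0(x),  # Silverman's rule of thumb (the default used by density())
  nrd  = bw.nrd(x),   # Scott's variation of the rule of thumb
  ucv  = bw.ucv(x),   # unbiased (least-squares) cross-validation
  SJ   = bw.SJ(x)     # Sheather-Jones plug-in method
)
round(bws, 3)
# Any of these can be passed to density(), eg density(x, bw = "SJ")
```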
Method 4: Social network analysis
Very recently, banks have begun accessing publicly available social network data. This allows them to identify transactions that have some association with other individuals or accounts known to be fraudulent or suspect, which would reduce the fraud score (ie raise the assessed risk) of such transactions. Statistical methods are evolving to deal with this data:
- Social network analysis,
- Dynamic network analysis.
This is a very new area and we will not investigate these topics further in this course.
Available data for fraud detection
- Accounts data. Including type of account, application details and aggregate behavioural characteristics.
- Transaction data. Including spending and repayment patterns.
- Personal data. Data the bank has about the person holding the account, some of which may have been provided by a credit bureau.
- Location data. Information about where the transaction was performed and where the borrower lives.
Evaluation
Although fraud detection is essentially a classification problem, it has some characteristics that make the evaluation of performance slightly different:
1. The timeliness of detection has an effect on the cost of the fraud.
2. The cost of monitoring automated fraud alerts is important.
3. False alerts must be kept to a minimum so as not to upset or alienate legitimate customers.
At the moment there is no clear agreement about the best performance measure. As with scorecard development, measures are typically based on the two CDFs
$$F_y(c) = P(s \le c \mid y), \qquad y \in \{0, 1\},$$
for some fraud score $s$ (remember a lower value means more risk of fraud), one for each outcome $y$ (remember $y = 1$ means legitimate).
Thus, plotting $F_0(c)$ against $F_1(c)$ gives the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) serves as a classification performance measure:
$$\text{AUC} = \int_{-\infty}^{\infty} F_0(c)\, dF_1(c).$$
However, the ROC curve and AUC do not take into account the special points (1) to (3) given above. We consider a measure based on these terms:
- The false alarm rate is given by $F_1(c)$.
- The undetected fraud rate is given by $1 - F_0(c)$.
- The alert rate, which is linked to the monitoring cost, is $F(c) = P(s \le c)$.
Notice that $F(c) = P(y = 0)\,F_0(c) + P(y = 1)\,F_1(c)$.
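These quantities are easy to compute empirically from a set of scored transactions. The sketch below uses simulated scores (an assumption: 5% fraud, with frauds scoring lower on average), computes the false alarm, undetected fraud and alert rates at a threshold $c$, checks the identity $F(c) = P(y = 0)F_0(c) + P(y = 1)F_1(c)$, and estimates the AUC as the probability that a randomly chosen fraud scores below a randomly chosen legitimate transaction, counting ties as one half. The function name `rates` is hypothetical.

```r
set.seed(8)
y <- c(rep(1, 950), rep(0, 50))                     # 1 = legitimate, 0 = fraud
s <- c(rnorm(950, mean = 1), rnorm(50, mean = -1))  # low score = high fraud risk

rates <- function(threshold, s, y) {
  F0 <- mean(s[y == 0] <= threshold)   # proportion of frauds alerted
  F1 <- mean(s[y == 1] <= threshold)   # false alarm rate
  Fc <- mean(s <= threshold)           # alert rate
  c(false_alarm = F1,
    undetected  = 1 - F0,
    alert       = Fc,
    mixture     = mean(y == 0) * F0 + mean(y == 1) * F1)  # should equal the alert rate
}
r <- rates(0, s, y)
round(r, 4)

# Empirical AUC: P(s_fraud < s_legit) + 0.5 * P(tie), ie the area under F0 against F1
auc <- mean(outer(s[y == 0], s[y == 1], "<")) +
  0.5 * mean(outer(s[y == 0], s[y == 1], "=="))
auc
```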
Performance curve
The performance curve is an alternative to the ROC curve. Plot $F(c)$ against $1 - F_0(c)$.
- This plots the monitoring cost (point 2) against the proportion of frauds not detected.
- Also, since $F(c) \ge P(y = 1)\,F_1(c)$ and $P(y = 1) \approx 1$, this also gives some control over false alarms (point 3).
The point $(0, P(y = 0))$ represents perfect performance: all frauds detected at the minimal possible cost. The curve must pass through $(1, 0)$, where no frauds are detected since no detection is performed. The performance given by a random classifier is where $F_0(c) = F(c)$. Hence this is the diagonal from (0,1) to (1,0).
Better performance is given by curves below this line, and the area under the performance curve is a penalty measure (smaller is better):
$$\int_{-\infty}^{\infty} F(c)\, dF_0(c).$$
The x-axis is called a timeline since it captures an aspect of detection over time (point 1).
- Basically, as frauds are detected, this increases the proportion of undetected frauds left in the data, so over time we expect to move along the x-axis.
- This is similar to performance curves in engineering (eg stress versus performance curves).
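A sketch of the empirical performance curve and its area, again on simulated scores (an assumption). Each threshold $c$ gives a point $(1 - F_0(c), F(c))$; the point $(1, 0)$ for no alerts is added, and the area is computed by the trapezoidal rule. A random classifier gives an area of about 0.5, and since $F(c) \ge P(y = 0)F_0(c)$, no classifier can do better than $P(y = 0)/2$. The function name `performance_curve` is hypothetical.

```r
set.seed(9)
y <- c(rep(1, 950), rep(0, 50))                     # 1 = legitimate, 0 = fraud
s <- c(rnorm(950, mean = 1), rnorm(50, mean = -1))  # low score = high fraud risk

performance_curve <- function(s, y) {
  cs <- sort(unique(s))
  undetected <- vapply(cs, function(cc) 1 - mean(s[y == 0] <= cc), numeric(1))  # 1 - F0(c)
  alert      <- vapply(cs, function(cc) mean(s <= cc), numeric(1))              # F(c)
  # Prepend (1, 0): with no alerts, no frauds are detected
  data.frame(undetected = c(1, undetected), alert = c(0, alert))
}

pc <- performance_curve(s, y)
# Trapezoidal area: points run from undetected = 1 down to undetected = 0
area <- sum(-diff(pc$undetected) * (head(pc$alert, -1) + tail(pc$alert, -1)) / 2)
area   # penalty measure: smaller is better
# plot(pc$undetected, pc$alert, type = "l", xlab = "1 - F0(c)", ylab = "F(c)")
```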
Cost-based evaluation
The financial cost of fraud can be estimated directly, based on the history of past fraud or on the total exposure of the account at the time of fraud. This relies on past accounting data for those cases that have been correctly detected in the past.
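A toy version of cost-based evaluation, assuming (this is not specified in the notes) that each missed fraud costs its account exposure and each alert costs a fixed amount to follow up. The function `total_cost`, the exposures and `cost_per_alert` are all hypothetical; in practice the exposures would come from past accounting data.

```r
# Cost at threshold c: losses on frauds that are not alerted plus follow-up cost of all alerts
total_cost <- function(threshold, s, y, exposure, cost_per_alert) {
  alert       <- s <= threshold                   # low score = alert
  missed_loss <- sum(exposure[y == 0 & !alert])   # fraud losses not prevented
  monitoring  <- cost_per_alert * sum(alert)      # manual follow-up cost
  missed_loss + monitoring
}

# Four hypothetical transactions: two frauds (y = 0) with exposures 100 and 300
s        <- c(0.1, 0.5, 0.9, 0.2)
y        <- c(0, 1, 1, 0)
exposure <- c(100, 0, 0, 300)
costs <- sapply(c(0.15, 0.3, 1), total_cost,
                s = s, y = y, exposure = exposure, cost_per_alert = 5)
costs
# 305 (misses the 300 fraud), 10 (catches both with two alerts), 20 (alerts on everything)
```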
Example 9.2
This example compares a one-class classifier, using the Parzen density estimator, with a two-class classifier, using the performance curve as the evaluation method. It is based on Juszczak et al (2008).
- Data set: 11,383 accounts with 646,729 transactions, including 3,217 (28.3%) fraudulent accounts and 18,501 (2.9%) fraudulent transactions.
- Transaction records over a 6-month period.
- The Parzen density estimator is used as the one-class classifier.
Outcome of model build and test on a hold-out sample:
[Figure: performance curve on the hold-out sample]
Now consider forecasts over time, in comparison with a comparable two-class classifier (in this case a density-based Parzen classifier).
[Figure: cost F(c) against the number of months forecast ahead, for the one-class and two-class classifiers]
Fixing $1 - F_0(c) = 0.2$ and plotting cost against the number of months forecast ahead shows that initially the two-class classifier gives slightly better performance. However, its performance deteriorates over time in comparison with the one-class classifier, which is more robust. Our hypothesis is that the two-class classifier is not sensitive to new types of fraud.
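A toy illustration of this hypothesis, with everything simulated and every setting an assumption: legitimate transactions come from $N(0,1)$, the frauds in the training data from $N(3,1)$, and a new type of fraud appears near $-3$. The two-class classifier is sketched as a density-based Parzen classifier giving the estimated $P(y = 1 \mid x)$; the one-class classifier uses only the Parzen density of legitimate transactions. All function names are hypothetical.

```r
set.seed(10)
parzen_1d <- function(z, x, h) {
  vapply(z, function(zz) mean(dnorm((zz - x) / h)) / h, numeric(1))
}
x_legit <- rnorm(900, mean = 0)   # legitimate training transactions
x_fraud <- rnorm(100, mean = 3)   # frauds of the type seen in training
h1  <- bw.nrd0(x_legit)
h0  <- bw.nrd0(x_fraud)
pi1 <- 0.9                        # prior P(y = 1) in the training data
pi0 <- 0.1                        # prior P(y = 0)

# Two-class score: estimated P(y = 1 | x); low score = high fraud risk.
# (Returns NaN far from all data, where both density estimates underflow to 0.)
two_class_score <- function(z) {
  p1 <- pi1 * parzen_1d(z, x_legit, h1)
  p0 <- pi0 * parzen_1d(z, x_fraud, h0)
  p1 / (p1 + p0)
}
# One-class model: density of legitimate transactions alone
one_class_density <- function(z) parzen_1d(z, x_legit, h1)

z <- c(0, 3, -3)                  # typical legitimate, known fraud type, new fraud type
round(two_class_score(z), 3)      # the new fraud type at -3 still looks legitimate
round(one_class_density(z), 4)    # but its density is low, so a one-class rule can flag it
```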
Exercise 9.2
Suppose … and … for … . Let … be a sequence of instances of …, which correspond to legitimate transactions.
1. Show that … is a kernel function for Parzen density estimation for the random variable … with bandwidth … .
2. Using …, compute the threshold that gives a false positive rate of up to … .
Review of Chapter 9
In this chapter we have investigated:
- Types of fraud and size of the problem.
- Automated fraud detection.
- Two-class and one-class classifiers for fraud detection.
- Parzen density estimation.
- Evaluation issues for fraud detection.
More informationBond valuation and bond yields
RELEVANT TO ACCA QUALIFICATION PAPER P4 AND PERFORMANCE OBJECTIVES 15 AND 16 Bond valuation and bond yields Bonds and their variants such as loan notes, debentures and loan stock, are IOUs issued by governments
More informationUsing Analytics to detect and prevent Healthcare fraud. Copyright 2010 SAS Institute Inc. All rights reserved.
Using Analytics to detect and prevent Healthcare fraud Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Introductions International Fraud Trends Overview of the use of Analytics in Healthcare
More informationModern Fraud Prevention from a Bank s Point of View
Modern Fraud Prevention from a Bank s Point of View Extract from an interview between Alexey Golenishev, Payment Schemes Relationships, Head of Department, Alfa-Bank and PLUS Magazine #8 [148] September
More informationSAMPLE SELECTION BIAS IN CREDIT SCORING MODELS
SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS John Banasik, Jonathan Crook Credit Research Centre, University of Edinburgh Lyn Thomas University of Southampton ssm0 The Problem We wish to estimate an
More informationWhite Paper. Predictive Modeling for True-Name Fraud An Equifax Analytical Services Research Paper
White Paper Predictive Modeling for True-Name Fraud An Equifax Analytical Services Research Paper Dave Whitin, Consultant Michiko Wolcott, Statistician September 2006 Table of contents Executive summary...................................
More informationPredictive time series analysis of stock prices using neural network classifier
Predictive time series analysis of stock prices using neural network classifier Abhinav Pathak, National Institute of Technology, Karnataka, Surathkal, India abhi.pat93@gmail.com Abstract The work pertains
More informationThe New Reality of Synthetic ID Fraud How to Battle the Leading Identity Fraud Tactic in The Digital Age
How to Battle the Leading Identity Fraud Tactic in The Digital Age In the 15 years since synthetic identity fraud emerged as a significant threat, it has become the predominant tactic for fraudsters. The
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC
More informationUsing kernel methods to visualise crime data
Submission for the 2013 IAOS Prize for Young Statisticians Using kernel methods to visualise crime data Dr. Kieran Martin and Dr. Martin Ralphs kieran.martin@ons.gov.uk martin.ralphs@ons.gov.uk Office
More informationData Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
More informationApplication of Hidden Markov Model in Credit Card Fraud Detection
Application of Hidden Markov Model in Credit Card Fraud Detection V. Bhusari 1, S. Patil 1 1 Department of Computer Technology, College of Engineering, Bharati Vidyapeeth, Pune, India, 400011 Email: vrunda1234@gmail.com
More informationTHE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell
THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationBinBase.com REPORT: credit card fraud
BinBase.com REPORT: credit card fraud Whether you are a security specialist, an e-commerce web developer, or an online merchant, a knowledge of how credit card fraud works and what you can do to prevent
More informationCONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
More informationWhen Your Child s Identity Is Stolen
When Your Child s Identity Is Stolen Consumer Information Sheet 3B May 2015 What Is Child Identity Theft? Adults are not the only targets of identity theft. In fact, children under the age of 18 can also
More informationTake Charge of Credit Cards Note Taking Guide
2.4.1.L1 Note taking guide Take Charge of Credit Cards Note Taking Guide Total Points Earned Total Points Possible Percentage What is credit? A credit card is a form of credit! What is interest? What is
More informationRecognize the many faces of fraud
Recognize the many faces of fraud Detect and prevent fraud by finding subtle patterns and associations in your data Contents: 1 Introduction 2 The many faces of fraud 3 Detect healthcare fraud easily and
More informationPredictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francis@data-mines.cm
More informationChapter 4: Vector Autoregressive Models
Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...
More informationCredit Card Fraud Detection using Hidden Morkov Model and Neural Networks
Credit Card Fraud Detection using Hidden Morkov Model and Neural Networks R.RAJAMANI Assistant Professor, Department of Computer Science, PSG College of Arts & Science, Coimbatore. Email: rajamani_devadoss@yahoo.co.in
More informationUsing reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management
Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationPredictive Analytics Modeling Methodology Document
Predictive Analytics Modeling Methodology Document Campaign Response Modeling 17 October- 2012 Version details Version number Date Author Reviewer name 1.0 16 October- 2012 Vikash chandra CONTENTS 1. TRAINING
More informationA Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
More informationDECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING
DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four
More informationVirtual Site Event. Predictive Analytics: What Managers Need to Know. Presented by: Paul Arnest, MS, MBA, PMP February 11, 2015
Virtual Site Event Predictive Analytics: What Managers Need to Know Presented by: Paul Arnest, MS, MBA, PMP February 11, 2015 1 Ground Rules Virtual Site Ground Rules PMI Code of Conduct applies for this
More informationImproving Credit Card Fraud Detection with Calibrated Probabilities
Improving Credit Card Fraud Detection with Calibrated Probabilities Alejandro Correa Bahnsen, Aleksandar Stojanovic, Djamila Aouada and Björn Ottersten Interdisciplinary Centre for Security, Reliability
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More information2.1 The Present Value of an Annuity
2.1 The Present Value of an Annuity One example of a fixed annuity is an agreement to pay someone a fixed amount x for N periods (commonly months or years), e.g. a fixed pension It is assumed that the
More informationSome Statistical Applications In The Financial Services Industry
Some Statistical Applications In The Financial Services Industry Wenqing Lu May 30, 2008 1 Introduction Examples of consumer financial services credit card services mortgage loan services auto finance
More information