COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Similar documents
Customer and Business Analytic

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here.

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Using multiple models: Bagging, Boosting, Ensembles, Forests

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Azure Machine Learning, SQL Data Mining and R

Mining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group

Contents. List of Figures. List of Tables. List of Examples. Preface to Volume IV

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

Learning outcomes. Knowledge and understanding. Competence and skills

Data Mining + Business Intelligence. Integration, Design and Implementation

Using Data Mining for Mobile Communication Clustering and Characterization

Model Deployment. Dr. Saed Sayad. University of Toronto

E-commerce Transaction Anomaly Classification

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

PROGRAM DIRECTOR: Arthur O Connor Contact: URL : THE PROGRAM Careers in Data Analytics Admissions Criteria CURRICULUM Program Requirements

An Introduction to Data Mining

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Predictive Modeling and Big Data

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Microsoft Azure Machine learning Algorithms

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Predictive modelling around the world

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Data Mining. Nonlinear Classification

Classification of Bad Accounts in Credit Card Industry

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Supervised Learning (Big Data Analytics)

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Big Data Analytics and Optimization

Machine learning for algo trading

KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

NEURAL NETWORKS A Comprehensive Foundation

HT2015: SC4 Statistical Data Mining and Machine Learning

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

life science data mining

Forecasting and Hedging in the Foreign Exchange Markets

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

MS1b Statistical Data Mining

How To Make A Credit Risk Model For A Bank Account

The Predictive Data Mining Revolution in Scorecards:

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Big Data and Marketing

How To Identify A Churner

Random Forest Based Imbalanced Data Cleaning and Classification

The Data Mining Process

Predictive Dynamix Inc

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

Machine Learning Capacity and Performance Analysis and R

Operationalising Predictive Insights

Contents. Dedication List of Figures List of Tables. Acknowledgments

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Banking Analytics Training Program

Advanced Database Marketing Innovative Methodologies and Applications for Managing Customer Relationships

Data Mining Applications in Higher Education

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

How To Detect Fraud With Reputation Features And Other Features

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Programming Using Python

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

STA 4273H: Statistical Machine Learning

Decision Support Systems

Support Vector Machines with Clustering for Training with Very Large Datasets

Machine Learning with MATLAB David Willingham Application Engineer

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

An Overview of Knowledge Discovery Database and Data mining Techniques

Principles of Data Mining by Hand&Mannila&Smyth

Chapter ML:XI (continued)

MSCA Introduction to Statistical Concepts

Data Warehousing and Data Mining in Business Applications

DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Maschinelles Lernen mit MATLAB

Business Analytics and Credit Scoring

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Pentaho Data Mining Last Modified on January 22, 2007

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

ANALYTICS CENTER LEARNING PROGRAM

Make Better Decisions Through Predictive Intelligence

Monday Morning Data Mining

Leveraging Ensemble Models in SAS Enterprise Miner

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Data are everywhere. IBM projects that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90

Java Modules for Time Series Analysis

Course Syllabus. Purposes of Course:

New Work Item for ISO Predictive Analytics (Initial Notes and Thoughts) Introduction

Social Media Mining. Data Mining Essentials

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Transcription:

Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for Fraud Detection 15 Data-Driven Fraud Detection 17 Fraud-Detection Techniques 19 Fraud Cycle 22 The Fraud Analytics Process Model 26 Fraud Data Scientists 30 A Fraud Data Scientist Should Have Solid Quantitative Skills 30 A Fraud Data Scientist Should Be a Good Programmer 31 A Fraud Data Scientist Should Excel in Communication and Visualization Skills 31 A Fraud Data Scientist Should Have a Solid Business Understanding 32 A Fraud Data Scientist Should Be Creative 32 A Scientific Perspective on Fraud 33 References 35 COPYRIGHTED MATERIAL Chapter 2 Data Collection, Sampling, and Preprocessing 37 Introduction 38 Types of Data Sources 38 Merging Data Sources 43 Sampling 45 Types of Data Elements 46 ix

x CONTENTS Visual Data Exploration and Exploratory Statistical Analysis 47 Benford s Law 48 Descriptive Statistics 51 Missing Values 52 Outlier Detection and Treatment 53 Red Flags 57 Standardizing Data 59 Categorization 60 Weights of Evidence Coding 63 Variable Selection 65 Principal Components Analysis 68 RIDITs 72 PRIDIT Analysis 73 Segmentation 74 References 75 Chapter 3 Descriptive Analytics for Fraud Detection 77 Introduction 78 Graphical Outlier Detection Procedures 79 Statistical Outlier Detection Procedures 83 Break-Point Analysis 84 Peer-Group Analysis 85 Association Rule Analysis 87 Clustering 89 Introduction 89 Distance Metrics 90 Hierarchical Clustering 94 Example of Hierarchical Clustering Procedures 97 k-means Clustering 104 Self-Organizing Maps 109 Clustering with Constraints 111 Evaluating and Interpreting Clustering Solutions 114 One-Class SVMs 117 References 118 Chapter 4 Predictive Analytics for Fraud Detection 121 Introduction 122 Target Definition 123 Linear Regression 125 Logistic Regression 127 Basic Concepts 127 Logistic Regression Properties 129 Building a Logistic Regression Scorecard 131

CONTENTS xi Variable Selection for Linear and Logistic Regression 133 Decision Trees 136 Basic Concepts 136 Splitting Decision 137 Stopping Decision 140 Decision Tree Properties 141 Regression Trees 142 Using Decision Trees in Fraud Analytics 143 Neural Networks 144 Basic Concepts 144 Weight Learning 147 Opening the Neural Network Black Box 150 Support Vector Machines 155 Linear Programming 155 The Linear Separable Case 156 The Linear Nonseparable Case 159 The Nonlinear SVM Classifier 160 SVMs for Regression 161 Opening the SVM Black Box 163 Ensemble Methods 164 Bagging 164 Boosting 165 Random Forests 166 Evaluating Ensemble Methods 167 Multiclass Classification Techniques 168 Multiclass Logistic Regression 168 Multiclass Decision Trees 170 Multiclass Neural Networks 170 Multiclass Support Vector Machines 171 Evaluating Predictive Models 172 Splitting Up the Data Set 172 Performance Measures for Classification Models 176 Performance Measures for Regression Models 185 Other Performance Measures for Predictive Analytical Models 188 Developing Predictive Models for Skewed Data Sets 189 Varying the Sample Window 190 Undersampling and Oversampling 190 Synthetic Minority Oversampling Technique (SMOTE) 192 Likelihood Approach 194 Adjusting Posterior Probabilities 197 Cost-sensitive Learning 198 Fraud Performance Benchmarks 200 References 201

xii CONTENTS Chapter 5 Social Network Analysis for Fraud Detection 207 Networks: Form, Components, Characteristics, and Their Applications 209 Social Networks 211 Network Components 214 Network Representation 219 Is Fraud a Social Phenomenon? An Introduction to Homophily 222 Impact of the Neighborhood: Metrics 227 Neighborhood Metrics 228 Centrality Metrics 238 Collective Inference Algorithms 246 Featurization: Summary Overview 254 Community Mining: Finding Groups of Fraudsters 254 Extending the Graph: Toward a Bipartite Representation 266 Multipartite Graphs 269 Case Study: Gotcha! 270 References 277 Chapter 6 Fraud Analytics: Post-Processing 279 Introduction 280 The Analytical Fraud Model Life Cycle 280 Model Representation 281 Traffic Light Indicator Approach 282 Decision Tables 283 Selecting the Sample to Investigate 286 Fraud Alert and Case Management 290 Visual Analytics 296 Backtesting Analytical Fraud Models 302 Introduction 302 Backtesting Data Stability 302 Backtesting Model Stability 305 Backtesting Model Calibration 308 Model Design and Documentation 311 References 312 Chapter 7 Fraud Analytics: A Broader Perspective 313 Introduction 314 Data Quality 314 Data-Quality Issues 314 Data-Quality Programs and Management 315 Privacy 317 The RACI Matrix 318 Accessing Internal Data 319

CONTENTS xiii Label-Based Access Control (LBAC) 324 Accessing External Data 325 Capital Calculation for Fraud Loss 326 Expected and Unexpected Losses 327 Aggregate Loss Distribution 329 Capital Calculation for Fraud Loss Using Monte Carlo Simulation 331 An Economic Perspective on Fraud Analytics 334 Total Cost of Ownership 334 Return on Investment 335 In Versus Outsourcing 337 Modeling Extensions 338 Forecasting 338 Text Analytics 340 The Internet of Things 342 Corporate Fraud Governance 344 References 346 About the Authors 347 Index 349