Dan French, Founder & CEO, Consider Solutions

CONSIDER SOLUTIONS
Mission: Solutions for World Class Finance
Footprint
- Financial Control & Compliance
- Risk Assurance
- Process Optimization

CLIENTS

CONTEXT
The typical organization loses the equivalent of 5% of its revenues to fraud and waste each year.
Source: Global Economic Crime Survey, PwC

AGENDA
- Introduction
- Challenge for Information Systems Audit & Assurance
- The Role of Controls & Risk Monitoring (Data Analytics)
- Machine Learning: The Next Generation
- Evolution
- The Future of Controls & Audit Roles?
- Q&A

CHALLENGE FOR IS AUDIT & ASSURANCE

THE STANDARDISATION & CONTROL MYTH
We invest heavily in ERP implementation to drive:
- Process standardisation
- Business efficiency
- Economies of scale
However, only some of that value is released. Businesses implement standard systems and achieve a standard data input process, NOT a standard business process.

ERP-ENABLED STANDARDISATION: AN EXAMPLE
The ERP is configured to allow a goods received note (GRN) only if a purchase order (PO) exists. However:
- A truck drops off a shipment, but no PO exists.
- The warehouse calls Purchasing to create a PO.
- Purchasing creates a PO for the shipment.
- A GRN is created against the PO.
The first-time-match KPI looks good despite the process breakdown!
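Because the system records look valid, this kind of breakdown only shows up in the timing of the records. The sketch below is a hypothetical "retrospective PO" test, assuming each record carries PO creation, GRN and invoice timestamps (the `PurchaseRecord` fields and the 24-hour `min_lead` threshold are illustrative assumptions, not taken from the deck). It flags POs raised after the supplier's invoice date, or receipted implausibly soon after being raised.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PurchaseRecord:  # hypothetical record layout
    po_id: str
    po_created: datetime
    grn_created: datetime              # goods received note timestamp
    invoice_date: Optional[datetime]   # supplier invoice date, if already received


def flag_retrospective_pos(records: List[PurchaseRecord],
                           min_lead: timedelta = timedelta(hours=24)) -> List[str]:
    """Return ids of POs that look as if they were raised after the goods arrived."""
    flagged = []
    for r in records:
        # The supplier invoiced before the PO existed.
        invoice_first = r.invoice_date is not None and r.invoice_date < r.po_created
        # Goods were receipted implausibly soon after the PO was raised.
        too_fast = r.grn_created - r.po_created < min_lead
        if invoice_first or too_fast:
            flagged.append(r.po_id)
    return flagged


SAMPLE = [
    # Normal: PO raised a week before delivery.
    PurchaseRecord("PO-1", datetime(2024, 3, 1, 9), datetime(2024, 3, 8, 14), datetime(2024, 3, 9)),
    # The slide's scenario: truck arrives, PO raised, GRN booked 20 minutes later.
    PurchaseRecord("PO-2", datetime(2024, 3, 5, 10), datetime(2024, 3, 5, 10, 20), None),
    # Invoice dated before the PO was created.
    PurchaseRecord("PO-3", datetime(2024, 3, 12, 9), datetime(2024, 3, 20, 9), datetime(2024, 3, 10)),
]

print(flag_retrospective_pos(SAMPLE))  # ['PO-2', 'PO-3']
```

A fixed lead-time rule like this is exactly the "yes/no red flag logic" the deck later criticises, so in practice it would be one input among several rather than a verdict.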

DATA ANALYTICS: IDENTIFY & PREDICT EXCEPTIONS

BUSINESS PERFORMANCE AND RISK MANAGEMENT
Two sides of the same coin. For example:
- Risk (KRI): credit check, payment terms, delivery quantity and quality
- Performance (KPI): DSO (days sales outstanding)
Exceptions provide a roadmap for diagnosis and improvement.
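For reference, DSO is conventionally computed as period-end receivables divided by credit sales for the period, scaled by the number of days in the period. This is the standard textbook definition, not a formula stated in the deck:

```latex
\mathrm{DSO} = \frac{\text{Accounts receivable at period end}}{\text{Credit sales in period}} \times \text{Days in period}
```

For example, receivables of 1,000,000 against quarterly credit sales of 6,000,000 give DSO = (1,000,000 / 6,000,000) × 90 = 15 days. Risk indicators such as relaxed credit checks or extended payment terms tend to show up as a rising DSO, which is the "two sides of the same coin" link.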

DATA ANALYTICS: IDENTIFY EXCEPTIONS
- Purchase to Pay: duplicate payments; retrospective POs; changing payment terms; same bank account usage
- Order to Cash: price changes; undelivered orders; exceptional customer credits/returns; payment terms
- Fixed Assets: inappropriate asset depreciation periods; misclassified capital equipment
- Financial Close: postings into prior closed periods; manual payments
- Travel Expenses: duplicate claims; suspicious claims; ineligible item claims; repeating amounts
- Trading Relationships: OFAC restrictions; Sunshine Act disclosures
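As an illustration of the first Purchase to Pay test, the sketch below groups payments by vendor, amount and a normalised invoice reference, and reports pairs paid within a time window. The field names, the normalisation rule and the 30-day `window_days` are assumptions chosen for the example, not a specification from the deck.

```python
import re
from collections import defaultdict
from datetime import date


def normalise_invoice_ref(ref):
    """Lower-case, drop punctuation/spaces and leading zeros: 'INV-0042' -> 'inv42'."""
    cleaned = re.sub(r"[^0-9a-z]", "", ref.lower())
    return re.sub(r"(?<![0-9])0+(?=[0-9])", "", cleaned)


def find_duplicate_payments(payments, window_days=30):
    """Return (earlier_id, later_id) pairs that look like the same invoice paid twice."""
    groups = defaultdict(list)
    for p in payments:
        key = (p["vendor"], round(p["amount"], 2), normalise_invoice_ref(p["invoice_ref"]))
        groups[key].append(p)
    duplicates = []
    for items in groups.values():
        items.sort(key=lambda p: p["paid_on"])
        # Compare each payment with the next one in the same group.
        for earlier, later in zip(items, items[1:]):
            if (later["paid_on"] - earlier["paid_on"]).days <= window_days:
                duplicates.append((earlier["id"], later["id"]))
    return duplicates


PAYMENTS = [  # hypothetical sample data
    {"id": 1, "vendor": "V100", "amount": 1250.00, "invoice_ref": "INV-0042", "paid_on": date(2024, 1, 10)},
    {"id": 2, "vendor": "V100", "amount": 1250.00, "invoice_ref": "inv 42", "paid_on": date(2024, 1, 24)},
    {"id": 3, "vendor": "V100", "amount": 980.00, "invoice_ref": "INV-0043", "paid_on": date(2024, 1, 24)},
    {"id": 4, "vendor": "V200", "amount": 1250.00, "invoice_ref": "INV-0042", "paid_on": date(2024, 1, 11)},
]

print(find_duplicate_payments(PAYMENTS))  # [(1, 2)]
```

Payment 4 shares the amount and reference but belongs to a different vendor, so it is not paired. A cross-vendor variant would drop `vendor` from the grouping key.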

DATA ANALYTICS: WHAT WE HAVE LEARNED SO FAR
Current approaches are powerful but not sufficiently effective:
- Programmatic: the rules for known anomalies must be known in advance
- Yes/no red-flag logic
- High proportion of false positives
- Periodic data sampling
- Inability to ask complex questions of the data
- Little or no context to the results
- Susceptible to human bias and error
- Need for cross-discipline business/technical skills
- Average detection time is too long (if fraud is detected at all)
- High level of effort and investment required to implement and sustain exception analytics
There is a big gap between average and best practice, and best practice is expensive in the current paradigm.

RESEARCH
The guiding principle is to identify techniques that provide:
Precision
- Complex questions, to significantly reduce false positives
- Less reliance on human interpretation
- Discovery of previously unknown anomalies
Timeliness
- Fast time to detection after initial occurrence
- Speed of analysis
Usability
- No need for specialist or ongoing scripting/programming skills
- Transparency of results: easy to understand what you have
Efficiency
- A radically cheaper approach, to democratise analytics
- Radically faster processing on cheap cloud computing

RESEARCH: NEW TECHNIQUES
Artificial Intelligence / Machine Learning
- Instance-based learning: K-Star
- Bayesian learning: Naive Bayes, Bayesian Network
- Functions: Support Vector Machines (SVM)
- Time series analysis: Kalman Filter, Peer Group Analysis (PGA)
- Decision trees: Random Forest
- Deep learning: Recurrent Neural Network (RNN), Feed Forward Neural Network (FFNN)

MACHINE LEARNING: UNSUPERVISED APPROACH
Unsupervised learning can be used to model normal behaviour and discover anomalies. When several anomalies occur in the same area, that may be grounds for suspicion. For example, a supplier with:
- Unusually sporadic payments
- Payments always processed at the end of the day
- Payments processed by a user who normally deals with one-time suppliers
Together, these are flagged for further investigation.
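The "several anomalies in the same area" idea can be sketched with a very simple model of normal behaviour: the median and median absolute deviation (MAD) of each feature across suppliers, scored with the Iglewicz-Hoaglin modified z-score. The three feature names and the sample profiles are hypothetical stand-ins for the slide's examples, and the 3.5 threshold and `min_hits=2` rule are assumptions; the deck does not say which unsupervised model the research used.

```python
from statistics import median


def modified_z_scores(values):
    """Iglewicz-Hoaglin modified z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: most values identical, so any deviation is extreme.
        return [0.0 if v == med else float("inf") for v in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_cooccurring_anomalies(profiles, threshold=3.5, min_hits=2):
    """Flag suppliers that are anomalous on at least `min_hits` features at once."""
    names = list(profiles)
    features = list(profiles[names[0]])
    hits = {name: [] for name in names}
    for feature in features:
        scores = modified_z_scores([profiles[name][feature] for name in names])
        for name, z in zip(names, scores):
            if abs(z) > threshold:
                hits[name].append(feature)
    return {name: found for name, found in hits.items() if len(found) >= min_hits}


# Hypothetical features: variability of days between payments, share of payments
# processed at end of day, share processed by the one-time-supplier user.
SUPPLIER_PROFILES = {
    "S1": {"payment_gap_cv": 0.20, "end_of_day_share": 0.10, "one_time_user_share": 0.00},
    "S2": {"payment_gap_cv": 0.25, "end_of_day_share": 0.12, "one_time_user_share": 0.05},
    "S3": {"payment_gap_cv": 0.22, "end_of_day_share": 0.08, "one_time_user_share": 0.02},
    "S4": {"payment_gap_cv": 0.18, "end_of_day_share": 0.15, "one_time_user_share": 0.00},
    "S5": {"payment_gap_cv": 0.30, "end_of_day_share": 0.11, "one_time_user_share": 0.04},
    "S6": {"payment_gap_cv": 0.21, "end_of_day_share": 0.09, "one_time_user_share": 0.03},
    "S7": {"payment_gap_cv": 1.40, "end_of_day_share": 0.95, "one_time_user_share": 0.90},
    "S8": {"payment_gap_cv": 0.90, "end_of_day_share": 0.10, "one_time_user_share": 0.01},
}

print(flag_cooccurring_anomalies(SUPPLIER_PROFILES))
```

S8 is unusual on one feature only and is not flagged; S7 is unusual on all three at once, which is the pattern the slide describes.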

MACHINE LEARNING: SUPERVISED APPROACH
Supervised learning can be used to label and classify known exceptions for certain fraud schemes, map these scheme models to new data, and infer or predict new exceptions.
[Figure: models of known fraud schemes (Scheme A, Scheme B, Scheme C) feed a classifier, which assigns scheme labels to a database of new transactions.]
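Naive Bayes, one of the techniques on the research list, is a compact way to illustrate "learn from labelled schemes, then classify new transactions". The sketch below is a categorical Naive Bayes with Laplace smoothing written from scratch; the three yes/no features and the scheme labels are hypothetical, and the deck does not state which supervised model produced its results.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
        self.vocab = defaultdict(set)              # feature -> values seen in training
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(label, feature)][value] += 1
                self.vocab[feature].add(value)
        return self

    def log_posteriors(self, row):
        """Unnormalised log P(label | row) for every label."""
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for feature, value in row.items():
                count = self.value_counts[(label, feature)][value]
                k = len(self.vocab[feature]) + 1  # +1 reserves mass for unseen values
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical labelled history: two known schemes plus normal transactions.
TRAIN = [
    ({"bank_changed": "yes", "po_after_invoice": "no", "round_amount": "yes"}, "scheme_A"),
    ({"bank_changed": "yes", "po_after_invoice": "no", "round_amount": "no"}, "scheme_A"),
    ({"bank_changed": "no", "po_after_invoice": "yes", "round_amount": "no"}, "scheme_B"),
    ({"bank_changed": "no", "po_after_invoice": "yes", "round_amount": "yes"}, "scheme_B"),
    ({"bank_changed": "no", "po_after_invoice": "no", "round_amount": "no"}, "normal"),
    ({"bank_changed": "no", "po_after_invoice": "no", "round_amount": "no"}, "normal"),
    ({"bank_changed": "no", "po_after_invoice": "no", "round_amount": "yes"}, "normal"),
]
model = CategoricalNaiveBayes().fit([r for r, _ in TRAIN], [y for _, y in TRAIN])

new_txn = {"bank_changed": "yes", "po_after_invoice": "no", "round_amount": "yes"}
print(model.predict(new_txn))  # scheme_A
```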

DEEP LEARNING: COMPREHENSION
[Figure: successive layers of abstraction built up from raw pixels.]

DEEP LEARNING: RECURRENT NEURAL NETWORKS
- A deep learning method that learns sequentially
- Can be used to comprehend audio, text or video, or to predict time series
- Promising initial results when used to predict sequential data for outlier detection: the best outlier detector tested
- Shakespeare generator: given the complete works of Shakespeare, an RNN can be trained to predict characters and words in a sequence

RNN: SHAKESPEARE
This text was generated one character at a time. It shows that the network has learned:
- How to put characters together to make (Shakespearian) English
- Simple grammar
- The structure of how plays are written

RNN: UNCHARACTERISTIC INVOICES
The RNN ingests a sequence of invoices for a specific vendor and develops a model of what the next invoice will look like, given:
- What it has learned about invoices in general
- What it has learned about this vendor specifically
By comparing the RNN's prediction with the actual next invoice, we can flag invoices that are uncharacteristic for this vendor.
[Figure: Vendor X's invoice sequence feeds the RNN, whose prediction is compared with the actual invoice.]
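The predict-then-compare logic does not depend on the model being an RNN. To show just that logic, the sketch below swaps in a deliberately simple predictor: "invoices in general" is a global mean amount, "this vendor specifically" is an exponentially weighted moving average, and the vendor's own history is trusted more as it grows. This is not an RNN and not the research's model. It uses a single attribute (amount), and the `smoothing`, `prior_strength` and `threshold` values are assumptions.

```python
def surprise_scores(amounts, global_mean, smoothing=0.3, prior_strength=2.0):
    """Relative prediction error for each invoice after the first (aligned to amounts[1:])."""
    vendor_level = amounts[0]  # vendor-specific estimate (EWMA of past amounts)
    scores = []
    for n, actual in enumerate(amounts[1:], start=1):
        # Weight on the vendor's own history grows with the number of invoices seen.
        w = n / (n + prior_strength)
        predicted = w * vendor_level + (1 - w) * global_mean
        scores.append(abs(actual - predicted) / predicted)
        vendor_level = smoothing * actual + (1 - smoothing) * vendor_level
    return scores


def flag_uncharacteristic(amounts, global_mean, threshold=1.0, **kwargs):
    """Indices (into `amounts`) of invoices whose surprise score exceeds `threshold`."""
    scores = surprise_scores(amounts, global_mean, **kwargs)
    return [i for i, s in enumerate(scores, start=1) if s > threshold]


VENDOR_X = [1000, 1020, 990, 1010, 1005, 4800, 1000]  # hypothetical invoice amounts
print(flag_uncharacteristic(VENDOR_X, global_mean=1500.0))  # [5]
```

An RNN replaces the two hand-built estimates with a learned sequence model over many invoice attributes, but the flagging step (compare prediction with actual, score the gap) stays the same.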

EXAMPLE #1: FRAUDULENT INVOICING
The perpetrator submitted fictitious invoices from a real supplier but changed the bank account to their own. These invoices were processed alongside genuine invoices paid to that company. The deception was not detected by conventional methods and only came to light when the perpetrator's bank notified the authorities because of unusually high-value transactions passing through the account.
Based on this, our research modelled a scheme that looks for a small increase in transactions per month coinciding with a change in bank account details, using a data set of 50,098 invoices.
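The modelled scheme (a new bank account appearing in the same month as a rise in invoice volume) can be expressed as a rule, sketched below. It compares each month's invoice count for a vendor with the mean of that vendor's earlier months and reports months where a previously unseen bank account appears alongside an uplift of at least `min_uplift` invoices. The data layout, the mean baseline and `min_uplift=1` are assumptions for illustration; the research combined this with time series anomaly detection rather than using a bare rule.

```python
from collections import defaultdict
from statistics import mean


def flag_bank_change_with_uplift(invoices, min_uplift=1):
    """Return (vendor, month, new_accounts) where a new account coincides with more invoices."""
    by_vendor = defaultdict(lambda: defaultdict(list))  # vendor -> month -> [accounts]
    for inv in invoices:
        by_vendor[inv["vendor"]][inv["month"]].append(inv["bank_account"])
    alerts = []
    for vendor, months in by_vendor.items():
        known_accounts, monthly_counts = set(), []
        for month in sorted(months):  # "YYYY-MM" strings sort chronologically
            accounts = months[month]
            new_accounts = set(accounts) - known_accounts
            if monthly_counts and new_accounts:
                baseline = mean(monthly_counts)
                if len(accounts) - baseline >= min_uplift:
                    alerts.append((vendor, month, sorted(new_accounts)))
            known_accounts.update(accounts)
            monthly_counts.append(len(accounts))
    return alerts


MONTHS = ("2023-01", "2023-02", "2023-03", "2023-04", "2023-05", "2023-06")
INVOICES = (
    # V1: two genuine invoices a month, then one extra per month to a new account.
    [{"vendor": "V1", "month": m, "bank_account": "GB-111"} for m in MONTHS for _ in range(2)]
    + [{"vendor": "V1", "month": m, "bank_account": "GB-999"} for m in ("2023-05", "2023-06")]
    # V2: a legitimate bank change with no change in volume.
    + [{"vendor": "V2", "month": m, "bank_account": acct}
       for m, acct in (("2023-01", "DE-1"), ("2023-02", "DE-1"), ("2023-03", "DE-2"))
       for _ in range(2)]
)

print(flag_bank_change_with_uplift(INVOICES))  # [('V1', '2023-05', ['GB-999'])]
```

V2's bank change alone is not flagged, consistent with the point that a different bank account is not significant in isolation.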

In isolation, payments to different bank accounts are not a significant indicator.

Varying invoice amounts are also not significant.

The actual anomalous data is unremarkable.

Using time series anomaly detection with the relevant attributes, the false invoices scored very highly compared with all other invoices and were easily detected: 7 invoices out of a data set of 50,098, with detection occurring 4 months after the first invoice. Also significant, no false positives were identified.
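The deck does not say which time series method produced this result; the research slide lists the Kalman filter among the candidate techniques. As a hedged illustration, the sketch below runs a one-dimensional local-level Kalman filter over a vendor's monthly invoice counts and scores each month by its standardised one-step-ahead prediction error (innovation). The noise variances `q` and `r`, the initial variance `p0`, the threshold of 3 and the series itself are all assumptions.

```python
import math


def kalman_innovation_scores(series, q=0.05, r=2.0, p0=1.0):
    """Standardised innovations of a local-level Kalman filter.

    Model: level_t = level_{t-1} + noise(var q); observation_t = level_t + noise(var r).
    Returns one score per observation after the first (aligned to series[1:]).
    """
    level, p = series[0], p0
    scores = []
    for y in series[1:]:
        p_pred = p + q              # predict: level carries over, uncertainty grows
        s = p_pred + r              # variance of the one-step-ahead prediction error
        innovation = y - level
        scores.append(innovation / math.sqrt(s))
        k = p_pred / s              # Kalman gain
        level += k * innovation     # update the level estimate
        p = (1 - k) * p_pred        # update its variance
    return scores


def flag_months(series, threshold=3.0, **kwargs):
    """Indices (into `series`) whose absolute standardised innovation exceeds threshold."""
    scores = kalman_innovation_scores(series, **kwargs)
    return [i for i, z in enumerate(scores, start=1) if abs(z) > threshold]


MONTHLY_INVOICE_COUNTS = [20, 21, 19, 20, 22, 20, 21, 27, 28]  # hypothetical
print(flag_months(MONTHLY_INVOICE_COUNTS))  # [7, 8]
```

Because the filter adapts its level after each observation, a sustained step change scores highly at first and then fades. That is a reason to combine it with the bank-account attribute rather than rely on volume alone.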

DEEP LEARNING: FEED FORWARD NEURAL NETWORK (FFNN)
- Used for classification and regression on static data
- Classification of policy-based schemes
- Effective at predicting expense claim fraud

EXAMPLE #2: UK MPs' EXPENSE CLAIMS
UK MPs' expense claims were analysed using machine learning and classification technology with respect to expense date, category, type, cost, description, and each MP's expense history compared with the average expense cost per category.
Trained on MP expense claims, 2010-2013 (positive labels from the Legg Report):
- 677,066 claimed expense items
- 3,268 repaid expense items
Analysed MP expense claims, 2013 to present:
- 77,065 claimed expense items
- 206 repaid expense items (Legg Report)
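The comparison features on this slide (an MP's claim against the category average, and against that MP's own history) can be built before any classifier sees the data. The sketch below is one hypothetical way to do it: `cost_vs_category_avg` divides each cost by the category mean, and `cost_vs_own_history` divides it by the mean of the same MP's earlier claims in that category, so a claim never contributes to its own baseline. Field names are assumptions, and the research's exact feature definitions are not given in the deck.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def add_relative_cost_features(claims):
    """Return claims (in date order) enriched with two relative-cost features."""
    by_category = defaultdict(list)
    for c in claims:
        by_category[c["category"]].append(c["cost"])
    category_avg = {cat: mean(costs) for cat, costs in by_category.items()}

    history = defaultdict(list)  # (mp, category) -> costs claimed earlier
    enriched = []
    for c in sorted(claims, key=lambda c: c["date"]):
        prior = history[(c["mp"], c["category"])]
        enriched.append({
            **c,
            "cost_vs_category_avg": c["cost"] / category_avg[c["category"]],
            # None until the MP has an earlier claim in this category.
            "cost_vs_own_history": c["cost"] / mean(prior) if prior else None,
        })
        prior.append(c["cost"])
    return enriched


CLAIMS = [  # hypothetical claims, not real MP data
    {"mp": "MP1", "category": "Travel", "cost": 100.0, "date": date(2014, 1, 5)},
    {"mp": "MP2", "category": "Travel", "cost": 120.0, "date": date(2014, 1, 9)},
    {"mp": "MP1", "category": "Travel", "cost": 380.0, "date": date(2014, 2, 2)},
]

for row in add_relative_cost_features(CLAIMS):
    print(row["mp"], row["cost_vs_category_avg"], row["cost_vs_own_history"])
```

MP1's second claim is 1.9 times the category average but 3.8 times MP1's own previous claim, which is the kind of contrast the two features are meant to expose to the classifier.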

ALL CLAIMED EXPENSES IN GREEN, REPAYMENTS IN RED = NEEDLE IN A HAYSTACK

REPAYMENTS HIGHLIGHTED

THRESHOLD > 15% REPAYMENT LIKELIHOOD

THRESHOLD > 25% REPAYMENT LIKELIHOOD

THRESHOLD > 40% REPAYMENT LIKELIHOOD
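Each threshold slide trades the number of items flagged against the number of actual repayments captured. The sketch below computes that trade-off at the three thresholds shown, given a predicted repayment likelihood and a repaid/not-repaid label per claim. The scores and labels are invented for illustration and are not the MP results.

```python
def threshold_report(scores, labels, thresholds=(0.15, 0.25, 0.40)):
    """For each threshold: items flagged, repayments captured, precision and recall."""
    total_positive = sum(labels)
    report = []
    for t in thresholds:
        flagged = [y for p, y in zip(scores, labels) if p > t]
        captured = sum(flagged)
        report.append({
            "threshold": t,
            "flagged": len(flagged),
            "captured": captured,
            "precision": captured / len(flagged) if flagged else 0.0,
            "recall": captured / total_positive if total_positive else 0.0,
        })
    return report


# Hypothetical model outputs (repayment likelihood) and true labels (1 = repaid).
SCORES = [0.05, 0.10, 0.18, 0.22, 0.30, 0.35, 0.45, 0.60, 0.08, 0.12]
LABELS = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]

for row in threshold_report(SCORES, LABELS):
    print(row)
```

Raising the threshold from 15% to 40% here halves recall but removes every false positive, which is the choice an auditor makes when deciding how many flagged claims can realistically be reviewed.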

COMPARISON OF REPAYMENTS AND REPAYMENT PREDICTION OF A SPECIFIC MP OVER TIME

MACHINE LEARNING APPROACH
Subject domains are organised as Themes & Schemes. A multi-layered hierarchical process creates features that are interpreted by a machine learning engine:
- Feature creation: discovery of relationships between features, and composite relationship inferences
- Behaviour profiles: for example, how a certain organisation or person completes a document
- Smart feature-based rules
- Automated feedback for supervised classifiers, acting in ensemble with their unsupervised cousins
- Low-cost, high-performance computing

MACHINE LEARNING APPROACH
[Figure: processing pipeline with a feedback loop]
Source Data -> Data Abstraction -> Feature Creation -> Classification (Anomaly Detection Engine, ADE) -> Intelligent Scoring Algorithm -> Results
- Feature Creation draws on machine-generated features (pattern recognition, behaviour profiling, time series, peer group, ...) and domain expertise (conventional indicators).
- The ADE combines supervised methods (deep learning, neural networks, support vector machines, ...), unsupervised methods and feature-based smart rules.
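The "Intelligent Scoring Algorithm" is not described in the deck. As a hypothetical stand-in, the sketch below blends three detector outputs that live on different scales: a supervised probability, an unsupervised anomaly score and a count of rule hits. It rank-normalises the first two to [0, 1], caps and scales the rule count, and takes a weighted sum. The weights `(0.5, 0.3, 0.2)` and `max_rules=3` are arbitrary assumptions.

```python
def rank_normalise(values):
    """Map raw scores to [0, 1] by rank so differently scaled detectors are comparable.

    Ties receive distinct ranks in input order (a simplification).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(values) - 1) if len(values) > 1 else 0.0
    return ranks


def ensemble_scores(supervised, unsupervised, rule_hits, weights=(0.5, 0.3, 0.2), max_rules=3):
    """Weighted blend of supervised, unsupervised and rule-based evidence per transaction."""
    sup = rank_normalise(supervised)
    uns = rank_normalise(unsupervised)
    rules = [min(h, max_rules) / max_rules for h in rule_hits]
    w_sup, w_uns, w_rule = weights
    return [w_sup * a + w_uns * b + w_rule * c for a, b, c in zip(sup, uns, rules)]


# Hypothetical outputs for four transactions.
SUPERVISED = [0.10, 0.80, 0.30, 0.05]   # classifier probabilities
UNSUPERVISED = [1.2, 6.5, 0.4, 2.0]     # anomaly scores (unbounded)
RULE_HITS = [0, 2, 1, 0]                # red-flag rules triggered

scores = ensemble_scores(SUPERVISED, UNSUPERVISED, RULE_HITS)
print(sorted(range(len(scores)), key=lambda i: -scores[i]))  # [1, 2, 0, 3]
```

Rank normalisation avoids one detector dominating simply because its raw scores are larger; a feedback loop could then tune the weights from investigated outcomes.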

CURRENT RESEARCH: P2P/AP
Based on a Risk Data Matrix, we analyse and risk-rate the data using an ensemble of the latest artificial intelligence and machine learning techniques in concert with some traditional red-flag indicators. For example:
- Complex multi-dimensional analysis across business process data
- Changes in behaviour of people entering invoices/payments
- Changes in patterns of invoices/payments over time
- Dissimilarity of invoices submitted by the same vendor
- Dissimilarity of payments made to the same vendor
- Unusual invoiced items and quantities based on previous history
- Unusual expense spending patterns
- Unusual variances for an expense item
- Validation against external data sources
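"Dissimilarity of invoices submitted by the same vendor" needs a distance measure. One simple, hypothetical choice is sketched below: treat each invoice as a set of line-item descriptions and take the Jaccard distance to the most similar previous invoice from that vendor (0 means identical to something seen before, 1 means nothing in common). The item names are invented, and treating a vendor with no history as maximally unfamiliar is an assumption.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two collections of items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


def invoice_dissimilarity(history, new_items):
    """Distance from a new invoice to the closest previous invoice of the same vendor."""
    if not history:
        return 1.0  # assumption: no history means maximally unfamiliar
    return min(jaccard_distance(new_items, past) for past in history)


VENDOR_HISTORY = [["toner", "paper"], ["paper", "staples"]]  # hypothetical past invoices

print(invoice_dissimilarity(VENDOR_HISTORY, ["paper", "toner"]))       # 0.0
print(invoice_dissimilarity(VENDOR_HISTORY, ["laptop", "gift card"]))  # 1.0
```

Using the nearest previous invoice rather than the union of all past items keeps the score stable as a vendor's history grows.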

THEMES AND SCHEMES
Vendors
- Duplicate: exact & fuzzy
- Dormant: 12, 24, 36 months
- Sanction list
- Vendor activity with no existing vendor master data
Invoices
- Duplicate: exact & near match
- Top 10 invoice activity
Payments
- Duplicate
- Unusual bank accounts and cross-vendor duplicates
- Payments to vendors after a period of inactivity
- Invoice-payment period outliers
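Two of the vendor schemes can be sketched with the Python standard library. Fuzzy duplicates use `difflib.SequenceMatcher.ratio()` on normalised names, where the list of legal-form suffixes stripped and the 0.9 cut-off are assumptions. Dormancy is banded at the 12, 24 and 36 months listed above by counting whole calendar months since the vendor's last activity. Vendor names and dates are hypothetical.

```python
import re
from datetime import date
from difflib import SequenceMatcher

LEGAL_SUFFIXES = {"ltd", "limited", "inc", "plc", "llc", "co", "the"}  # assumption


def normalise_name(name):
    """Lower-case, replace punctuation with spaces and drop common legal-form words."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in name.split() if w not in LEGAL_SUFFIXES)


def fuzzy_duplicate_vendors(vendors, min_ratio=0.9):
    """Return (id_a, id_b, ratio) for vendor pairs whose normalised names are near-identical."""
    items = list(vendors.items())
    pairs = []
    for i, (id_a, name_a) in enumerate(items):
        for id_b, name_b in items[i + 1:]:
            ratio = SequenceMatcher(None, normalise_name(name_a), normalise_name(name_b)).ratio()
            if ratio >= min_ratio:
                pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs


def dormancy_band(last_activity, as_of, bands=(36, 24, 12)):
    """Largest dormancy band (in months) reached since last activity, else None."""
    months = (as_of.year - last_activity.year) * 12 + (as_of.month - last_activity.month)
    for band in bands:
        if months >= band:
            return band
    return None


VENDORS = {
    "V1": "Acme Supplies Ltd",
    "V2": "ACME Supplies Limited",
    "V3": "Acme Suplies Ltd.",
    "V4": "Bolt Logistics plc",
}

print(fuzzy_duplicate_vendors(VENDORS))
print(dormancy_band(date(2021, 3, 15), date(2024, 4, 1)))  # 36
```

A payment to a vendor whose `dormancy_band` is not None corresponds to the "payments to vendors after a period of inactivity" scheme.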

EARLY RESEARCH RESULTS

EVOLUTION: INEVITABLE, INEXORABLE
- Systematic exception monitoring
- Machine learning analytics
- Ad hoc exception assessments
- Spreadsheet-based analysis
- Manual "by eye" sampling

FUTURE OF CONTROLS & AUDIT ROLES?
Still early days, but...
- Less separation between IT and business focus?
- Understanding answers vs. framing questions?
- Data science opportunity
- Increasing focus on genuine business value: Risk -> Diagnosis -> Root Cause Analysis -> Improvement

THE FUTURE OF CONTROL & AUDIT ROLES?
Business Performance & Risk Management: Business Assurance
Two sides of the same coin. For example:
- Risk (KRI): credit check, payment terms, delivery quantity & quality
- Performance (KPI): DSO

REVIEW
- Introduction
- Challenge for Information Systems Audit & Assurance
- The Role of Controls & Risk Monitoring (Data Analytics)
- Machine Learning: The Next Generation
- Evolution
- The Future of Controls & Audit Roles?
- Q&A

DISCUSSION
Dan French, Founder & CEO, Consider Solutions
dfrench@consider.biz
Eliminating Error, Waste & Fraud: Data Science advancing World Class Finance
www.consider.biz/thinking/
@consider_ations #worldclassfinance