Dan French Founder & CEO, Consider Solutions
CONSIDER SOLUTIONS Mission Solutions for World Class Finance Footprint Financial Control & Compliance Risk Assurance Process Optimization
CLIENTS
CONTEXT The typical organization loses the equivalent of 5% of its revenues to fraud & waste each year Source: Global Economic Crime Survey; PwC
AGENDA Introduction Challenge for Information Systems Audit & Assurance The Role of Controls & Risk Monitoring (Data Analytics) Machine Learning The Next Generation Evolution The Future of Controls & Audit Roles? Q&A
CHALLENGE FOR IS AUDIT & ASSURANCE
THE STANDARDISATION & CONTROL MYTH We invest heavily in ERP implementation to drive: Process standardisation Business efficiency Economies of scale However, only some of the value gets released Businesses implement standard systems and achieve A standard data input process NOT A standard business process
ERP ENABLED STANDARDISATION EXAMPLE ERP is configured to only allow GRN if PO exists, however Truck drops off shipment, but no PO exists Warehouse calls up Purchasing to create a PO Purchasing creates PO for Shipment GRN is created against PO First time match KPI looks good despite process breakdown!
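The breakdown above leaves a trace in the data even though the first time match KPI looks clean: the PO is raised at, or just before, the moment the goods are received. A minimal sketch of a check for this follows. The record layout, the PO numbers and the one-day lead time threshold are all assumptions for illustration, not taken from any particular ERP.

```python
from datetime import date

# Hypothetical records: (po_number, po_created, goods_received)
receipts = [
    ("PO-1001", date(2024, 3, 1), date(2024, 3, 12)),
    ("PO-1002", date(2024, 3, 14), date(2024, 3, 14)),  # PO raised on delivery day
    ("PO-1003", date(2024, 3, 5), date(2024, 3, 20)),
]

MIN_LEAD_DAYS = 1  # assumed threshold; tune to the real ordering lead time


def retrospective_pos(records, min_lead_days=MIN_LEAD_DAYS):
    """Return PO numbers whose PO-to-receipt lead time is below the threshold."""
    flagged = []
    for po_number, po_created, goods_received in records:
        lead_days = (goods_received - po_created).days
        if lead_days < min_lead_days:
            flagged.append(po_number)
    return flagged


suspect_pos = retrospective_pos(receipts)
print(suspect_pos)
```

A same-day PO is not proof of a retrospective PO, only a candidate exception to review.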
DATA ANALYTICS IDENTIFY & PREDICT EXCEPTIONS
BUSINESS PERFORMANCE AND RISK MANAGEMENT Two sides of the same coin For example: Risk KRI Credit check Payment terms Delivery quantity and quality Performance KPI DSO Exceptions provide a roadmap for diagnosis and improvement
DATA ANALYTICS IDENTIFY EXCEPTIONS Purchase to Pay Order to Cash Duplicate Payments Retrospective POs Changing payment terms Same Bank Account usage Fixed Assets Inappropriate asset depreciation periods Misclassified capital equipment Financial Close Postings into prior closed periods Manual payments Price Changes Undelivered orders Exceptional customer credits/returns Payment terms Travel Expenses Duplicate claims Suspicious claims Ineligible items claims Repeating amounts Trading Relationships OFAC restrictions Sunshine Act disclosures
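As one illustration of the exception tests listed above, a duplicate payment check can be sketched as a grouping on vendor, normalised invoice reference and amount. The field names, sample records and normalisation rule are hypothetical; a production test would usually add further near-match logic.

```python
from collections import defaultdict

# Hypothetical payment records: (payment_id, vendor_id, invoice_ref, amount)
payments = [
    ("P1", "V100", "INV-0042", 1250.00),
    ("P2", "V100", "inv 0042", 1250.00),  # same invoice, reference keyed differently
    ("P3", "V200", "INV-0042", 1250.00),  # different vendor, not a duplicate
    ("P4", "V100", "INV-0043", 980.50),
]


def normalise_ref(ref):
    """Keep only letters and digits, upper-cased, so 'inv 0042' matches 'INV-0042'."""
    return "".join(ch for ch in ref.upper() if ch.isalnum())


def duplicate_payments(records):
    """Group payment ids that share vendor, normalised reference and amount."""
    groups = defaultdict(list)
    for payment_id, vendor_id, invoice_ref, amount in records:
        key = (vendor_id, normalise_ref(invoice_ref), round(amount, 2))
        groups[key].append(payment_id)
    return [ids for ids in groups.values() if len(ids) > 1]


duplicates = duplicate_payments(payments)
print(duplicates)
```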
DATA ANALYTICS WHAT WE HAVE LEARNED SO FAR Current approaches are powerful but not sufficiently effective: Programmatic: need to know the rules for known anomalies Yes / no red flag logic High proportion of false positives Periodic data sampling Inability to ask complex questions of the data Little or no context to the results Susceptible to human bias and error Need for cross-discipline business / technical skills Average detection time is too long (if detected at all) High level of effort and investment required to implement & sustain exception analytics There is a big gap between average and best practice Best practice is expensive in current paradigm
RESEARCH Guiding principles are to identify techniques that will provide: Precision (complex questions to significantly reduce false positives; less reliance on human interpretation; discover previously unknown anomalies). Timeliness (fast time to detection after initial occurrence; speed of analysis). Usability (eliminate need for specialist / on-going scripting or programming skills; transparency of results, easy to understand what you have). Efficiency (radically cheaper approach to democratise analytics; radically faster processing on cheap cloud computing).
RESEARCH NEW TECHNIQUES Artificial Intelligence Machine Learning Instance Based Learning K-Star Bayesian Learning Naive Bayes Bayesian Network Functions Support Vector Machines (SVM) Time Series Analysis Kalman Filter Peer Group Analysis (PGA) Decision Tree Random Forest Deep Learning Recurrent Neural Network (RNN) Feed Forward Neural Network (FFNN)
MACHINE LEARNING: UNSUPERVISED APPROACH Unsupervised learning can be used to model normal behaviour and discover anomalies. When several of these anomalies occur in the same area, it may be grounds for suspicion. Supplier with unusually sporadic payments Payments always processed at end of day By user who normally deals with one time suppliers Flag for further investigation
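The idea of several weak anomalies coinciding on one supplier can be sketched as follows. Normal behaviour is modelled here with a robust z-score (median and median absolute deviation), which is a simple stand-in and not necessarily the technique used in the research. The supplier profiles, the 3.5 cutoff and the "two or more anomalies" rule are assumptions for illustration.

```python
from statistics import median

# Hypothetical supplier profiles:
# (std dev of days between payments, share of payments made at end of day,
#  share processed by a user who mostly handles one-time suppliers)
profiles = {
    "S1": (2.0, 0.10, 0.00),
    "S2": (3.0, 0.15, 0.05),
    "S3": (2.5, 0.05, 0.00),
    "S4": (3.5, 0.20, 0.10),
    "S5": (2.0, 0.10, 0.05),
    "S6": (21.0, 0.95, 0.90),
    "S7": (2.8, 0.90, 0.05),
}

CUTOFF = 3.5        # assumed cutoff on the robust z-score
MIN_ANOMALIES = 2   # assumed number of co-occurring anomalies needed to flag


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def co_occurring_anomalies(supplier_profiles, cutoff=CUTOFF):
    """Count, per supplier, how many features are anomalous versus the peer group."""
    names = list(supplier_profiles)
    counts = {name: 0 for name in names}
    n_features = len(next(iter(supplier_profiles.values())))
    for f in range(n_features):
        scores = robust_z_scores([supplier_profiles[name][f] for name in names])
        for name, score in zip(names, scores):
            if abs(score) > cutoff:
                counts[name] += 1
    return counts


anomaly_counts = co_occurring_anomalies(profiles)
flagged_suppliers = [n for n, c in anomaly_counts.items() if c >= MIN_ANOMALIES]
print(flagged_suppliers)
```

In this sample S7 is unusual on one feature only and is not flagged, while S6 is unusual on all three and is passed on for investigation.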
MACHINE LEARNING: SUPERVISED APPROACH Supervised learning can be used to label and classify known exceptions for certain fraud schemes, map these scheme models to new data, and infer / predict new exceptions. [Figure: a classifier trained on Scheme A, Scheme B and Scheme C assigns a fraud scheme to each transaction ID (720424-720430) in a database of new transactions]
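A minimal sketch of the supervised idea follows: learn one model per labelled scheme, then map new transactions onto the nearest scheme or onto none. A nearest-centroid classifier is used purely as a simple stand-in for the classifiers named in this talk. The two-dimensional feature vectors, transaction IDs and distance limit are hypothetical.

```python
from math import dist

# Hypothetical labelled examples: scheme -> feature vectors of known exceptions
training = {
    "A": [(1.0, 0.1), (1.2, 0.2)],
    "B": [(5.0, 5.0), (5.5, 4.5)],
    "C": [(9.0, 0.5), (8.5, 1.0)],
}

MAX_DISTANCE = 2.0  # assumed limit; beyond it no scheme is assigned


def fit_centroids(labelled):
    """Average the feature vectors of each scheme into one centroid."""
    centroids = {}
    for scheme, points in labelled.items():
        centroids[scheme] = tuple(sum(col) / len(col) for col in zip(*points))
    return centroids


def classify(features, centroids, max_distance=MAX_DISTANCE):
    """Return the nearest scheme, or None when nothing is close enough."""
    scheme, centroid = min(centroids.items(), key=lambda kv: dist(features, kv[1]))
    return scheme if dist(features, centroid) <= max_distance else None


centroids = fit_centroids(training)

# Hypothetical new transactions: id -> feature vector
new_transactions = {"T1": (8.8, 0.7), "T2": (3.0, 9.0), "T3": (1.0, 0.3)}
labels = {tid: classify(vec, centroids) for tid, vec in new_transactions.items()}
print(labels)
```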
DEEP LEARNING - COMPREHENSION Raw pixels Abstraction
DEEP LEARNING: RECURRENT NEURAL NETWORKS Deep learning method which learns sequentially Can be used to comprehend audio, text, video or predict time series Promising initial results when predicting sequential data for outlier detection; best outlier detector tested Given the complete works of Shakespeare, an RNN can be trained to predict characters & words in a sequence Shakespeare generator
RNN: SHAKESPEARE This was generated a character at a time. It shows the network has: Learned how to put characters together to make (Shakespearian) English Learned simple grammar Learned the structure of how plays are written
RNN: UNCHARACTERISTIC INVOICES The RNN ingests a sequence of invoices for a specific vendor Develops a model about what the next invoice will look like given: What it has learned about invoices in general What it has learned about this vendor specifically By comparing the RNN's prediction to the actual next invoice we can flag invoices which are uncharacteristic for this vendor. [Figure: Vendor X invoices compared against the RNN's predictions]
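The predict-then-compare loop can be sketched without a neural network. In the sketch below the predictor is just the mean of the vendor's earlier invoice amounts, which is a deliberately crude stand-in for an RNN and captures none of what an RNN learns about invoices in general. The amounts, the minimum history length and the threshold are assumptions.

```python
from statistics import mean, stdev

MIN_HISTORY = 3   # assumed number of invoices needed before predicting
THRESHOLD = 3.0   # assumed limit on the deviation score


def uncharacteristic_invoices(amounts, min_history=MIN_HISTORY, threshold=THRESHOLD):
    """Walk the sequence, predict each invoice from history, flag large misses."""
    history = []
    flagged = []
    for index, amount in enumerate(amounts):
        if len(history) >= min_history:
            predicted = mean(history)
            spread = stdev(history)
            # With zero spread no score can be computed, so nothing is flagged.
            score = abs(amount - predicted) / spread if spread > 0 else 0.0
            if score > threshold:
                flagged.append((index, amount))
                continue  # keep flagged invoices out of the learned history
        history.append(amount)
    return flagged


# Hypothetical invoice amounts for one vendor, in date order
vendor_x = [1000, 1050, 980, 1020, 990, 4800, 1010]
flagged_invoices = uncharacteristic_invoices(vendor_x)
print(flagged_invoices)
```

Keeping flagged invoices out of the history is a design choice: it stops one outlier from widening the spread and masking the next one.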
EXAMPLE #1 FRAUDULENT INVOICING The perpetrator submitted fictitious invoices from a real supplier, but changed the bank account to be their own. These invoices were processed alongside genuine invoices paid to that company. The deception was not detected by conventional methods and only came to light when the perpetrator's bank notified authorities because of unusually high value transactions passing through the account. Based on this, our research modelled a scheme to look for a small increase in transactions per month which coincided with a change in bank account details, based on a data set of 50,098 invoices.
EXAMPLE #1 FRAUDULENT INVOICING In isolation, payments to different bank accounts are not a significant indicator:
EXAMPLE #1 FRAUDULENT INVOICING Varying invoice amounts are also not significant:
EXAMPLE #1 FRAUDULENT INVOICING The actual anomalous data is unremarkable:
EXAMPLE #1 FRAUDULENT INVOICING Using time series anomaly detection with the relevant attributes, the false invoices scored very highly compared to all other invoices and were easily detected: 7 invoices from a data set of 50,098, with detection occurring 4 months after the first invoice. Also significant was that no false positives were identified.
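The scheme in this example combines two signals that are weak on their own: a small rise in monthly invoice volume and the first appearance of a new bank account. A rule-based sketch of that combination follows. It is a simplification and not the time series anomaly detection used in the research, and the invoice records and account names are hypothetical.

```python
from collections import Counter

# Hypothetical invoices for one vendor: (month, bank_account)
invoices = (
    [("2024-01", "ACC-1")] * 4
    + [("2024-02", "ACC-1")] * 4
    + [("2024-03", "ACC-1")] * 4
    + [("2024-04", "ACC-1")] * 4
    + [("2024-04", "ACC-9")]          # one extra invoice, paid to a new account
    + [("2024-05", "ACC-1")] * 4
)


def flag_months(records):
    """Flag months where volume exceeds the prior average AND a new account appears."""
    counts = Counter(month for month, _ in records)
    months = sorted(counts)
    seen_accounts = set()
    flagged = []
    for i, month in enumerate(months):
        accounts = {acct for m, acct in records if m == month}
        new_accounts = accounts - seen_accounts
        if i > 0:  # the first month has no baseline to compare against
            baseline = sum(counts[m] for m in months[:i]) / i
            if counts[month] > baseline and new_accounts:
                flagged.append((month, sorted(new_accounts)))
        seen_accounts |= accounts
    return flagged


flagged_months = flag_months(invoices)
print(flagged_months)
```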
DEEP LEARNING: FEED FORWARD NEURAL NETWORK (FFNN) Used for classification and regression on static data Classification of policy based schemes Effective at predicting expense claim fraud
EXAMPLE #2 UK MPs' EXPENSE CLAIMS UK MPs' expense claims were analysed using Machine Learning and Classification technology with respect to: Expense Date, Category, Type, Cost, Description and individual MP's expense history compared to average expense cost per category Trained on MP Expense Claims 2010 to 2013 Positive labels coming from the Legg report 677,066 claimed expense items 3,268 repaid expense items Analysed MP Expense Claims 2013 to present 77,065 claimed expense items 206 repaid expense items (Legg Report)
ALL CLAIMED EXPENSES IN GREEN REPAYMENTS IN RED = NEEDLE IN A HAYSTACK
REPAYMENTS HIGHLIGHTED
THRESHOLD > 15% REPAYMENT LIKELIHOOD
THRESHOLD > 25% REPAYMENT LIKELIHOOD
THRESHOLD > 40% REPAYMENT LIKELIHOOD
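Raising the repayment likelihood threshold trades coverage for precision: fewer claims are flagged, and a larger share of those flagged are genuine repayments. A small sketch of that trade-off follows, using the three thresholds above. The scored claims are invented for illustration and are not the MP expense data.

```python
# Hypothetical classifier output: (predicted repayment likelihood, actually repaid)
scored_claims = [
    (0.05, False), (0.10, False), (0.18, False), (0.22, True), (0.30, False),
    (0.35, True), (0.45, True), (0.60, True), (0.12, False), (0.08, True),
]


def precision_recall(claims, threshold):
    """Return (number flagged, precision, recall) for one likelihood threshold."""
    flagged = [repaid for likelihood, repaid in claims if likelihood > threshold]
    true_positives = sum(flagged)
    total_repaid = sum(repaid for _, repaid in claims)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_repaid if total_repaid else 0.0
    return len(flagged), precision, recall


results = {t: precision_recall(scored_claims, t) for t in (0.15, 0.25, 0.40)}
for threshold, (n_flagged, precision, recall) in results.items():
    print(f"> {threshold:.0%}: flagged={n_flagged} "
          f"precision={precision:.2f} recall={recall:.2f}")
```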
COMPARISON OF REPAYMENTS AND REPAYMENT PREDICTION OF A SPECIFIC MP OVER TIME
MACHINE LEARNING APPROACH Subject domains organised as Themes & Schemes A multi-layered hierarchical process to create features that are interpreted by a machine learning engine: Feature creation: discovery of relationships between features and composite relationship inferences Behaviour profiles: for example, how a certain organisation / person completes a document Smart feature-based rules Automated feedback for supervised classifiers to act in ensemble with their unsupervised cousins Low cost, high performance computing
MACHINE LEARNING APPROACH [Diagram: Source Data -> Data Abstraction -> Feature Creation (Machine Generated: Pattern Recognition, Behaviour Profiling, Time Series, Peer Group, ...; Domain Expertise: Conventional indicators) -> Classification by the Anomaly Detection Engine (ADE) (Supervised: Deep Learning, Neural Network, Support Vector Machines, ...; Unsupervised; Feature Based Smart Rules) -> Intelligent Scoring Algorithm -> Results, with a Feedback loop]
CURRENT RESEARCH P2P/AP Based on a Risk Data Matrix, analyse and risk rate the data using an ensemble of the latest artificial intelligence and machine learning techniques in concert with some traditional red flag indicators. For example: Complex multi dimensional analysis across business process data Changes in behaviour of people entering invoices / payments Changes in patterns of invoices / payments over time Dissimilarity of invoices submitted by same vendor Dissimilarity of payments made to same vendor Unusual invoiced items and quantities based on previous history Unusual expense spending patterns Unusual variances for an expense item Validation against external data sources
THEMES AND SCHEMES Vendors Duplicate Exact & Fuzzy Dormant 12, 24, 36 months Sanction List Vendor activity with no existing vendor master data Invoices Duplicate Exact & Near Match Top 10 Invoice Activity Payments Duplicate Unusual bank accounts and cross-vendor duplicates Payments to Vendors after period of inactivity Invoice-Payment period outliers
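The fuzzy duplicate vendor scheme can be sketched with the standard library's difflib.SequenceMatcher, which scores the similarity of two strings between 0 and 1. The vendor names and the 0.85 similarity threshold are assumptions for illustration. A real test would typically also compare addresses, tax numbers and bank details.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical vendor master records: vendor_id -> name
vendors = {
    "V1": "Acme Supplies Ltd",
    "V2": "ACME Supplies Limited",
    "V3": "Globex Corp",
    "V4": "Initech GmbH",
}

SIMILARITY_THRESHOLD = 0.85  # assumed; lower values catch more but add noise


def fuzzy_duplicate_vendors(records, threshold=SIMILARITY_THRESHOLD):
    """Return pairs of vendor ids whose lower-cased names are nearly identical."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b))
    return pairs


duplicate_vendors = fuzzy_duplicate_vendors(vendors)
print(duplicate_vendors)
```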
EARLY RESEARCH RESULTS
EVOLUTION INEVITABLE, INEXORABLE Systematic exception monitoring Machine learning analytics Ad hoc exception assessments Spreadsheet based analysis Manual by eye sampling
FUTURE OF CONTROLS & AUDIT ROLES? Still early days but... Less Separation between IT & Business focus? Understanding answers vs framing questions? Data Science opportunity Increasing focus on genuine business value Risk -> Diagnosis -> Root Cause Analysis -> Improvement
THE FUTURE OF CONTROL & AUDIT ROLES? BUSINESS PERFORMANCE & RISK MANAGEMENT Business Assurance Two sides of the same coin For example Risk KRI Credit check Payment terms Delivery quantity & quality Performance KPI DSO
REVIEW Introduction Challenge for Information Systems Audit & Assurance The Role of Controls & Risk Monitoring (Data Analytics) Machine Learning The Next Generation Evolution The Future of Controls & Audit Roles? Q&A
DISCUSSION Dan French, Founder & CEO Consider Solutions dfrench@consider.biz Eliminating Error, Waste & Fraud: Data Science advancing World Class Finance www.consider.biz/thinking/ @consider_ations #worldclassfinance