Why is Internal Audit so Hard?

Why is Internal Audit so Hard? (2014)


Waste, abuse, and fraud.

Waves of Change
1st Wave: Personal computers and electronic spreadsheets. The end of hand calculation.

2nd Wave: ERPs
ERPs put all our data in one place, enabling database analysis, and open the Age of Rules.

The 2nd Wave Also Opens the Age of CAATs (Computer-Assisted Audit Techniques)
Beginner's CAATs:
- Basic database manipulation: join, summarize, append, stratify, sample, extract
- Basic testing: duplicates, gaps
Intermediate CAATs: automate our rules, with (limited) automated testing, for example in purchase-to-pay:
- P.O. with blank / zero amount
- Split P.O.s
- Duplicate invoices
- Invoice amount paid > goods received
- Invoices with no matching receiving report
- Multiple invoices for the same P.O. and date
- Pattern of sequential invoices from a vendor
- Non-approved vendors
- Employee and vendor with the same name, address, bank, etc.
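The intermediate purchase-to-pay rules are deterministic filters over the invoice file. A minimal sketch of four of them, assuming a hypothetical invoice record whose field names are illustrative only (not any CAAT product's schema):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Invoice:
    # Hypothetical invoice record; field names are illustrative only.
    invoice_id: str
    vendor_id: str
    po_number: str
    invoice_date: str      # ISO date string, e.g. "2014-03-11"
    amount: float
    received_value: float  # value of goods received against the P.O.

def duplicate_invoices(invoices):
    """Duplicate invoices: same vendor, amount and invoice date."""
    groups = defaultdict(list)
    for inv in invoices:
        groups[(inv.vendor_id, inv.amount, inv.invoice_date)].append(inv.invoice_id)
    return [ids for ids in groups.values() if len(ids) > 1]

def paid_exceeds_received(invoices):
    """Invoice amount paid > goods received."""
    return [inv.invoice_id for inv in invoices if inv.amount > inv.received_value]

def multiple_invoices_same_po_and_date(invoices):
    """Multiple invoices for the same P.O. and date."""
    groups = defaultdict(list)
    for inv in invoices:
        groups[(inv.po_number, inv.invoice_date)].append(inv.invoice_id)
    return [ids for ids in groups.values() if len(ids) > 1]

def employee_vendor_matches(employees, vendors, key="bank_account"):
    """Employee and vendor with the same value for `key` (e.g. bank account, address)."""
    by_value = defaultdict(list)
    for emp in employees:
        by_value[emp[key]].append(emp["name"])
    return [(emp_name, vendor["name"])
            for vendor in vendors
            for emp_name in by_value.get(vendor[key], [])]
```

Because every rule is an exact comparison, the sketch also shows the tight-criteria problem discussed under fuzzy logic: an invoice re-keyed with a one-character change in its date slips past `duplicate_invoices`.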

3rd Wave: Predictive Analytics
Predictive analytics focuses our attention on important or suspect transactions. It comes in many different flavors:
- Each somewhat more sophisticated
- Each making audit work more accurate and our lives easier
GTAG 16 (2011): "The use of data analysis can significantly reduce audit risk by honing the risk assessment and stratifying the population."
[Figure: timeline adding the 3rd wave, predictive analytics: sophisticated statistical insights; true predictive & continuous audit.]

5 Levels of Predictive Analytics
1. Statistical Insights
2. Fuzzy Logic
3. Clustering
4. Predictive Modeling
5. Big Data Analytics

Statistical Insights: Benford's Law
The most famous name in forensic accounting does not belong to an accountant. Frank Benford (1883-1948), an American physicist, published a paper titled "The Law of Anomalous Numbers" in 1938, at the age of 55.
Benford's Law is a statement about how often digits occur, most famously as the first digit, in lists of data. It is useful in detecting fraudulent invoices and other numbered documents.
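For reference, the standard first-digit form of the law gives the expected proportion of values whose leading digit D1 equals d:

```latex
P(D_1 = d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9
```

So about 30.1% of values should begin with 1 (log10 2 ≈ 0.301) and only about 4.6% with 9 (log10(10/9) ≈ 0.046).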

Benford's Law: Distribution of 1st Digits
[Figure: Benford's distribution compared with the observed distribution of first digits.]

Which to Investigate?
For distributions that appear to be anomalous:
1. Calculate the Kolmogorov-Smirnov distance between each vendor's first-digit distribution and the ideal Benford distribution.
2. Investigate those with the largest scores.
Benford's first-digit distribution follows a logarithmic pattern and applies to a surprising number of datasets, including country populations, Twitter users by follower count, and many more. See testingbenfordslaw.com for more examples.
The Kolmogorov-Smirnov distance is the greatest absolute difference between the two cumulative distribution functions (CDFs).
Source (graph): Pivotal, Inc., Machine Learning for Forensic Accounting, 2013.
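The two steps can be sketched in plain Python. This is a minimal illustration, assuming payments arrive as hypothetical (vendor, amount) pairs; in practice you would also require a minimum number of payments per vendor, because first-digit distributions built from a handful of invoices are noisy.

```python
import math
from collections import Counter, defaultdict

# Expected Benford first-digit probabilities, P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(amount):
    """Leading non-zero digit of a positive amount; None for zero or negatives."""
    if amount <= 0:
        return None
    return int(f"{amount:.10e}"[0])  # scientific notation starts with the leading digit

def first_digit_distribution(amounts):
    """Observed share of each first digit 1..9."""
    digits = [d for d in (first_digit(a) for a in amounts) if d is not None]
    if not digits:
        return [0.0] * 9
    counts = Counter(digits)
    return [counts[d] / len(digits) for d in range(1, 10)]

def ks_distance(observed, expected=BENFORD):
    """Greatest absolute difference between the two cumulative distributions."""
    cum_obs = cum_exp = best = 0.0
    for o, e in zip(observed, expected):
        cum_obs += o
        cum_exp += e
        best = max(best, abs(cum_obs - cum_exp))
    return best

def rank_vendors(payments):
    """Step 1 per vendor, then step 2: sort by KS distance, largest first."""
    by_vendor = defaultdict(list)
    for vendor, amount in payments:
        by_vendor[vendor].append(amount)
    scores = {v: ks_distance(first_digit_distribution(a)) for v, a in by_vendor.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```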

Fuzzy Logic: Duplicate Invoice Detection
Problem: deterministic rules expect key information to be exactly the same:
- Vendor name
- Address
- Phone
- Invoice amount
- Date
- Bank account
- TIN
If the criteria are kept tight, there are too many false negatives (missed duplicates). If the criteria are loosened, too many false positives result, leaving too many items to investigate.

Fuzzy Matching Using Natural Language Processing
Vendors are considered close matches when their vendor names, remit-to vendor, address and phone, or other text fields of your choosing are identical or sufficiently similar.
Steps in natural language processing (NLP):
1. Tokenize the vendor names.
2. Remove stop words (e.g., of, and, the) and special characters.
3. Process synonyms and abbreviations.
4. Calculate the tf-idf (term frequency-inverse document frequency) weight for each word.
5. Calculate the cosine similarity between documents to identify close matches.
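A minimal, dependency-free sketch of the five steps, assuming a tiny hand-made stop-word list and abbreviation map (both hypothetical; a real vendor master needs far larger ones). The idf uses a common smoothed form, log(N/df) + 1, so a term shared by every name still carries some weight.

```python
import math
import re
from collections import Counter

# Hypothetical lists for illustration only.
STOP_WORDS = {"of", "and", "the"}
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated", "co": "company", "intl": "international"}

def tokenize(name):
    """Steps 1-3: tokenize, drop special characters and stop words, expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())  # punctuation is discarded here
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [ABBREVIATIONS.get(t, t) for t in tokens]

def tfidf_vectors(names):
    """Step 4: tf-idf weight for every term of every name."""
    docs = [Counter(tokenize(n)) for n in names]
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]

def cosine(a, b):
    """Step 5: cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def close_matches(names, threshold=0.8):
    """All pairs of names whose similarity meets a (hypothetical) threshold."""
    vecs = tfidf_vectors(names)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                pairs.append((names[i], names[j], round(sim, 3)))
    return pairs
```

For example, "Acme Corp.", "ACME Corporation" and "The Acme Corporation" all reduce to the tokens acme and corporation, so they pair with each other, while "Globex Intl Inc" matches none of them.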

Fuzzy Matching in Numerical Strings
Numerical values (strings) are considered close when:
- Invoice IDs: the edit distance is small.
- Dates: they are the same, are within 7 days of each other, or are inverted (3/11/14 vs. 11/3/14).
- Payments: the amounts are identical, or the edit distance is small.
- TINs, bank accounts, other numerics: the edit distance is small.
The edit distance counts substitutions, additions, deletions, and transpositions, and is calculated as the Damerau-Levenshtein distance.
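A sketch of these comparisons, under stated assumptions: the edit distance below is the optimal string alignment (restricted Damerau-Levenshtein) variant, which differs from the unrestricted distance only when a transposed pair is edited again; the 7-day window comes from the slide, while the one-edit threshold is a hypothetical default.

```python
from datetime import date

def osa_distance(a, b):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance:
    minimum substitutions, additions, deletions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # add all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # addition
                          d[i - 1][j - 1] + cost)  # substitution (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def numeric_strings_close(a, b, max_edits=1):
    """Invoice IDs, TINs, bank accounts: close when the edit distance is small."""
    return osa_distance(a, b) <= max_edits

def dates_close(d1, d2, window_days=7):
    """Same date, within `window_days`, or day and month inverted (3/11 vs 11/3)."""
    if abs((d1 - d2).days) <= window_days:
        return True
    try:
        swapped = d1.replace(month=d1.day, day=d1.month)
    except ValueError:  # a day above 12 cannot be read as a month
        return False
    return swapped == d2
```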

Fuzzy Matching
Use as many features of the invoice as desired; you are not limited to 3 dimensions.
1. Determine the best distance metric for each dimension (some are text-based, others numerical strings).
2. Calculate the distance between invoices.
3. Adjust the measurement values to yield the best true-positive result.
4. Investigate any pair of invoices whose distance is within your threshold.
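A sketch of steps 1 to 4, assuming a hypothetical invoice record with four dimensions. For brevity it uses difflib's `SequenceMatcher` ratio as a stand-in text metric for both vendor names and invoice IDs (a fuller version would plug in the tf-idf cosine and edit-distance measures described on the previous slides), and the weights and threshold are placeholders to be tuned in step 3.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations

@dataclass
class Invoice:
    # Hypothetical invoice record with four dimensions.
    invoice_id: str
    vendor_name: str
    amount: float
    invoice_date: date

def text_distance(a, b):
    """Stand-in text metric: 0 = identical, 1 = nothing in common."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def amount_distance(a, b):
    """Relative difference between two amounts, capped at 1."""
    return min(abs(a - b) / max(abs(a), abs(b), 1e-9), 1.0)

def date_distance(a, b, scale_days=30):
    """Days apart as a fraction of `scale_days`, capped at 1."""
    return min(abs((a - b).days) / scale_days, 1.0)

# Step 1 picks a metric per dimension; step 3 tunes these placeholder weights.
WEIGHTS = {"invoice_id": 0.3, "vendor_name": 0.3, "amount": 0.3, "invoice_date": 0.1}

def invoice_distance(x, y, weights=WEIGHTS):
    """Step 2: weighted average of the per-dimension distances (0 = identical)."""
    parts = {
        "invoice_id": text_distance(x.invoice_id, y.invoice_id),
        "vendor_name": text_distance(x.vendor_name, y.vendor_name),
        "amount": amount_distance(x.amount, y.amount),
        "invoice_date": date_distance(x.invoice_date, y.invoice_date),
    }
    return sum(weights[k] * v for k, v in parts.items()) / sum(weights.values())

def pairs_to_investigate(invoices, threshold=0.15):
    """Step 4: every pair whose combined distance is within the threshold."""
    flagged = []
    for x, y in combinations(invoices, 2):
        dist = invoice_distance(x, y)
        if dist <= threshold:
            flagged.append((x.invoice_id, y.invoice_id, round(dist, 3)))
    return flagged
```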

Clustering: Identify Invoice Anomalies with Vendor Baselining
Vendors tend to have patterns in their billing, but may have more than one pattern based on service, ordering business unit, specific users, delivery address, etc. There may be multiple normal behaviors. Identify the true outliers for investigation by:
1. Featurizing the invoices (see Fuzzy Logic).
2. Running a clustering algorithm such as K-Means.
3. Identifying clusters with low populations and low density as potential anomalies.
Example, Vendor A, with three normal billing patterns:
- Payments ~$1,000 to $5,000; Bus. Unit: Bldg Maintenance; Users: Loc 1, Loc 2, Loc 3; paid by ACH to address ABC
- Payments >$100,000; Bus. Unit: Construction; Users: Loc 4; paid by ACH to address DEF
- Payments <$700; Bus. Unit: Security; Users: Loc Z; paid by check to address GHI
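The recipe can be sketched without external libraries. This is an illustration under stated assumptions: each invoice is a hypothetical dict with `amount` and `method`, featurized as log10(amount) plus a 0/1 check flag; k-means is seeded deterministically with farthest-first selection; and only the low-population half of the slide's criterion is implemented (clusters with fewer than `min_size` members), not density.

```python
import math

def featurize(invoice):
    """Hypothetical features: log10(amount) and a 0/1 flag for payment by check."""
    return (math.log10(invoice["amount"]), 1.0 if invoice["method"] == "check" else 0.0)

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest center, move centers to the mean."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers

def small_clusters(labels, min_size=2):
    """Indices of invoices in clusters with fewer than `min_size` members."""
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels) if sizes[lab] < min_size]
```

For example, with six made-up $1,200 to $4,800 ACH invoices, three $120,000 to $150,000 ACH invoices, four $380 to $620 check payments and a single $20,000 check, `kmeans(points, k=4)` gives the $20,000 check a cluster of its own and `small_clusters` returns its index.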

Predictive Modeling: Time Travel in the 21st Century

Type 1: Prediction by Scoring
The ML system continuously monitors your financial system and scores each item from 1 to 100; examine only the high-scoring items.
- Current you: examine lots of possible FWA (fraud, waste, and abuse) invoices every month.
- Future you: do this once, so the machine learning system learns what FWA is; from then on, examine only the high-scoring items.
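Scoring needs a model trained once on invoices already labeled FWA or not. The sketch below swaps in logistic regression (one of the regression cores listed later in the deck) as the learner; the 0/1 features (say, round amount, new vendor, weekend posting) and labels are hypothetical, and the 1 to 100 scale simply rescales the predicted probability.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss.
    X: list of feature lists; y: 1 = known FWA, 0 = clean (hypothetical labels)."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(m):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fwa_score(x, w, b):
    """Rescale the predicted FWA probability onto the slide's 1-100 scale."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 + round(99 * p)
```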

Type 2: Prediction by Actual Value
Example from insurance: historical data from many sources ($ premium, SIC code, # employees, address, $ sales, and claim files, records N1 to N100) is combined to train the ML system to predict the correct $ premium.

Predicted premium   Actual premium paid   Variance
$10,254             $9,946                -3%
$25,687             $26,971                5%
$5,621              $5,452                -3%
$96,321             $98,247                2%
$85,741             $72,880               -18%

Variance is (actual - predicted) / actual. Investigate the outliers.
Accuracy can be very high, in the range of 90% to 98%, depending on the historical data used.
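The last step, flagging outliers, needs only the predicted and actual figures. The sketch below reproduces the table's variance column from its five rows, assuming variance = (actual - predicted) / actual (the definition that matches every row), and flags rows beyond a hypothetical 10% threshold.

```python
# Rows from the slide's table: (predicted premium, actual premium paid).
ROWS = [(10_254, 9_946), (25_687, 26_971), (5_621, 5_452), (96_321, 98_247), (85_741, 72_880)]

def variance(predicted, actual):
    """(actual - predicted) / actual: the definition that reproduces the table."""
    return (actual - predicted) / actual

def outliers(rows, threshold=0.10):
    """Rows whose absolute variance exceeds the (hypothetical) threshold,
    returned with the variance in whole percent."""
    return [(p, a, round(variance(p, a) * 100))
            for p, a in rows
            if abs(variance(p, a)) > threshold]
```

With these rows, only the $85,741 predicted / $72,880 actual policy (-18%) is flagged.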

So What is a Machine Learning System?
ML mathematical cores: regression, K-Means, Bayesian classifiers, decision trees (CART / CHAID), support vector machines, artificial neural nets (ANN), genetic programs.
Systems (very partial list):
- Advanced CAATs: Pivotal; Oversight (as a service); EMC (proprietary)
- General purpose: SAS, IBM SPSS, RapidMiner
- Open source / do it yourself: PSPP, Weka, R, Python

4th Wave: Big Data Analytics
Big data analytics:
- Addresses new concerns regarding social media and other risks from text- and image-based sources.
- Continues to improve the accuracy of predictive analytics, further reducing false positives and false negatives.
- Allows true continuous audit of even the largest enterprises as computation costs drop to fractions of previous investments.

Got Big Data?
- Volume: high (terabytes or petabytes), with very long retrieval and processing times.
- Variety: structured, unstructured, and semistructured, all at once.
- Velocity: batch, near time, real time, streams.

It's Really About Big Data Technology
[Figure: search & retrieve; the database. Source: EMC]

What are Big Data Analytics?
1st: The haystack gets a lot bigger. Beyond traditional structured data, it now includes unstructured data: documents, email, web content, and social media.
2nd: Thanks to Hadoop and massively parallel processing (MPP), query and retrieval times are short, and the cost of even massive storage is very low.
3rd: Many predictive modeling techniques can be applied to both structured and unstructured data, so models become more accurate.
4th: New techniques for unstructured data are based on NLP, such as sentiment analysis.

Focus on Social Media Risks*
*Risk also arises from other types of unstructured and semi-structured data: email, internal documents, and images stored centrally or on users' machines.

Social Media Risks
[Figure: bar chart rating ten example social media posts; the ratings shown are 7.3, 6.9, 6.6, 6.1, 5.6, 5.5, 4.9, 4.9, 4.0, and 2.9.]
Example posts:
- "They gave me financial aid, then I cancelled all my classes and kept the money."
- "Sit in at the Chancellor's Office at 3:00."
- "Joe sold me the answers to tomorrow's test."
- "Can't believe how much I made on eBay today."
- "I'll fix them. I put a virus on the lab computer."
- "Professor X is such a perv."
- "The instructor said I could make money after school fixing cars in the auto shop."
- "I just downloaded a bunch of student financial data from the finance system."
- "I found out they're cutting my budget. I'm going to the union before this gets out."
- "Did you hear we're losing accreditation? Don't sign up next term."
Source: 2014 Internal Audit Capabilities and Needs Survey Report, Protiviti
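As a very rough illustration of screening posts like these, a keyword lexicon can tag each post with risk categories. This is a deliberately simple stand-in, not sentiment analysis or a trained classifier, and the terms and categories below are hypothetical.

```python
import re

# Hypothetical risk lexicon: phrase -> risk category.
RISK_TERMS = {
    "virus": "IT security",
    "downloaded": "data exfiltration",
    "answers": "academic integrity",
    "financial aid": "financial fraud",
    "kept the money": "financial fraud",
    "accreditation": "reputation",
    "union": "labor relations",
    "sit in": "protest",
}

def flag_post(text):
    """Sorted risk categories whose phrases appear as whole words in the post."""
    lowered = text.lower()
    hits = {category for term, category in RISK_TERMS.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)}
    return sorted(hits)
```

A post such as the eBay one slips through a fixed word list entirely, which is one reason to prefer the NLP-based techniques the deck mentions.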

You Don't Need to be a Data Scientist, Just a Smart Tool User
The Age of Smart CAATs
[Figure: timeline of the four waves. 1st: personal computers, electronic spreadsheets, the end of hand calculation. 2nd: ERPs, data in one place, database analysis, the Age of Rules. 3rd: predictive analytics, statistical insights, true predictive & continuous audit. 4th: social media, text, and image; improved accuracy; cost-effective continuous audit.]

Questions
Contact information: Bill Vorhies, President & Chief Data Scientist, Data-Magnum
Bill@Data-Magnum.com | www.data-magnum.com | 818.257.2035
"I shall find a way or make one." Admiral Robert Peary
Big Data & Predictive Analytics