The Application of Exploratory Data Analysis (EDA) in Auditing



Similar documents
Why do statisticians "hate" us?

Appendix B Checklist for the Empirical Cycle

How is Big Data Different? A Paradigm Shift

Data, Measurements, Features

Now we begin our discussion of exploratory data analysis.

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

Data Mining Classification: Decision Trees

Data Warehousing and Data Mining for improvement of Customs Administration in India. Lessons learnt overseas for implementation in India

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Data Mining for Fun and Profit

Introduction. A. Bellaachia Page: 1

A Statistical Text Mining Method for Patent Analysis

Data Mining: Overview. What is Data Mining?

Data Warehouse design

Qi Liu Rutgers Business School ISACA New York 2013

Brillig Systems Making Projects Successful

Application in Predictive Analytics. FirstName LastName. Northwestern University

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Machine Learning in Hospital Billing Management. 1. George Mason University 2. INOVA Health System

Chapter ML:XI. XI. Cluster Analysis

Dr. Tarek A. Tutunji Philadelphia University, Jordan. Engineering Skills, Philadelphia University

Database Marketing, Business Intelligence and Knowledge Discovery

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Classification and Prediction

Exploratory Data Analysis

Advanced Analytics for Audit Case Selection

By: Omar AL-Rawajfah, RN, PhD

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

DESCRIPTIVE RESEARCH DESIGNS

Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics

The SAS Analytics Shootout Annual Student Competition

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

ADVANCED DATA VISUALIZATION

Azure Machine Learning, SQL Data Mining and R

PERCEIVED VALUE OF BENEFITS FOR PROJECT MANAGERS COMPENSATION. Răzvan NISTOR 1 Ioana BELEIU 2 Marius RADU 3

Building A Smart Academic Advising System Using Association Rule Mining

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Using Data Mining for Mobile Communication Clustering and Characterization

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

TIETS34 Seminar: Data Mining on Biometric identification

Data Mining/Fraud Detection. April 28, 2014 Jonathan Meyer, CPA KPMG, LLP

Data Mining for Knowledge Management. Classification

Screen Design : Navigation, Windows, Controls, Text,

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

College of Health and Human Services. Fall Syllabus

Analysis Issues II. Mary Foulkes, PhD Johns Hopkins University

Audit Data Analytics. Bob Dohrer, IAASB Member and Working Group Chair. Miklos Vasarhelyi Phillip McCollough

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Preprocessing Web Logs for Web Intrusion Detection

Umbrella & Excess Liability - Understanding & Quantifying Price Movement

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Protein Protein Interaction Networks

Implementation Date Fall Marketing Pathways

Distributed forests for MapReduce-based machine learning

Research Methods: Qualitative Approach

2014/02/13 Sphinx Lunch

Statistics Review PSY379

Decision Trees from large Databases: SLIQ

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Testing Multiple Secondary Endpoints in Confirmatory Comparative Studies -- A Regulatory Perspective

T Non-discriminatory Machine Learning

Data Mining and Machine Learning in Bioinformatics

PSPSOHS602A Develop OHS information and data analysis and reporting and recording processes

A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, HIKARI Ltd,

Exploratory Data Analysis

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Asset/liability Management for Universal Life. Grant Paulsen Rimcon Inc. November 15, 2001

Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

Application of Predictive Analytics to Higher Degree Research Course Completion Times

Qualitative vs Quantitative research & Multilevel methods

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

>

PSYCHOLOGY PROGRAM LEARNING GOALS AND OUTCOMES BY COURSE LISTING

DATA IN A RESEARCH. Tran Thi Ut, FEC/HSU October 10, 2013

Data Mining with R. Decision Trees and Random Forests. Hugh Murrell

Regulatory Pathways for Licensure and Use of Ebola Virus Vaccines During the Current Outbreak FDA Perspective

How To Perform An Ensemble Analysis

Machine Learning. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Machine Learning Term 2012/ / 34

A Review of Anomaly Detection Techniques in Network Intrusion Detection System

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Data Driven Security Framework to Success

Sampling Biases in IP Topology Measurements

A Multitier Fraud Analytics and Detection Approach

Data Mining: Foundation, Techniques and Applications

Lecture 1: Review and Exploratory Data Analysis (EDA)

3. Dataset size reduction. 4. BGP-4 patterns. Detection of inter-domain routing problems using BGP-4 protocol patterns P.A.

Transcription:

The Application of Exploratory Data Analysis (EDA) in Auditing Qi Liu Ph.D. Candidate (A.B.D.) Dept. of Accounting & Information Systems Rutgers University 28 th WCARS November 9, 2013

Outline Introduction An overview of EDA concept EDA in Auditing An application of EDA in auditing A credit card retention case Future Research 2

Introduction Motivation Audit is a data intensive process; data analysis plays an important role in audit process. Current data analysis approaches used in auditing process focus on validating predefined audit objectives, which can not discover unaware risks from the data. EDA is often linked to detective work, and one of its objectives is to identify outliers. Even though some EDA techniques have been used in some auditing procedures, EDA has never been systematically employed in auditing. Contribution This research contributes to the auditing literature by taking the first cut to use exploratory data analysis in auditing and illustrate a real-world application in audit process. 3

Definition of EDA Exploratory data analysis (EDA) is a data analysis approach emphasizing on pattern recognition and hypothesis generation. 4

EDA vs CDA Confirmatory Data Analysis (CDA) is a widely used data analysis approach emphasizing on experimental design, significance testing, estimation, and prediction (Good, 1983). Exploratory Data Analysis (EDA) Confirmatory Data Analysis (CDA) Reasoning Type Inductive Deductive Goal Applied Data Techniques Pattern Recognition and Hypothesis generation Observation Data (data collected without well-defined hypothesis) Descriptive Statistics, Data Visualization, Clustering Analysis, Process Mining Advantages No assumptions required Promotes deeper understanding of the data Disadvantages No conclusive answers Difficult to avoid bias produced by overfitting Estimation, Modeling, Hypothesis testing Experimental data (data collected through formally designed experiments) Traditional statistical techniques of inference, significance, and confidence Precise Well-established theory and methods Required unrealistic assumptions Difficult to notice unexpected results 5

Current Applications of EDA Since 1980s, EDA has been applied to diversified disciplines such as interior design, marketing, industrial engineering, and geography (Chen et al., 2011; Nayaka and Yano, 2010; Koschat and Sabavala, 1994; Wesley et al., 2006; De Mast and Trip, 2007, 2009). A framework to apply EDA in practical problem solving issues include: (1) display the data; (2) identify salient features; (3) interpret salient features (De Mast and Kemper, 2009). 6

Framework to apply EDA in auditing Step 1 Display the distribution of related fields Step 2 Identify salient features from the distribution Step 3 Perform CDA to test possible explanations Step 4 Identify suspicious cases Step 5 Explore the causes of abnormal cases Step 6 Perform CDA to confirm the relationship Step 7 Report the risks and recommend improvement suggestions 7 Add a new audit objective

Purpose Credit Card Retention Case Demonstrate the benefits of applying EDA in audit process Provide a real example to support the proposed guidelines Scenario: Clients call the bank asking for a reduction of their card fees. Bank representatives offer discounts to clients to retain their accounts. Objectives: identify the situations of loss of revenue in the negotiation of fees caused by bank representatives, as b: bank representatives offer higher discounts than allowed bank representatives usually offer the highest allowable discounts without putting enough efforts to negotiate lower discounts bank representatives offer discounts without any negotiation with the clients 8

Data Description Data (Retention Dataset) Each record represents a customer call 195,694 records 162 fields Time frame: January, 2012 Selected Attributes Original fee (VLR_ANUIDADE_G) Actual fee (_Valor da Anuidade de Saída) Agent identification (Funcional do Agente) Supervisor identification (Funcional do Supervisor) Location of the customer service center (Polo de Atendimento) Call duration (Tempo de Atendimento de Retenti) 9

Data Preprocess Discount Calculation Methodology Applied EDA techniques Descriptive Statistics Data Visualization Data Transformation 10

Discount Results Analysis (1/8) Policy-violating bank representatives and negative discounts Bank policy allows bank representatives to offer discounts up to 100% of the annuity to retain the customer >100% 90%-100% 80%-90% 70%-80% 60%-70% 50%-60% 40%-50% 30%-40% 20%-30% 10%-20% 0-10% <0 0% 1.21% 0.69% 0.59% 0.15% 5.71% 4.60% Descriptive statistics of discounts 11.51% 15.14% 13.39% 15.66% 31.34% 0.00% 5.00% 10.00% 15.00% 20.00% 25.00% 30.00% 35.00% Frequency distribution of discounts 11

Results Analysis (2/8) Policy-violating bank representatives and negative discounts C o u n t 200 180 160 140 120 100 80 60 40 20 0 27 0 0 0 0 0 7 50 12 0 190 Discount Distribution of negative discounts 12

Results Analysis (3/8) Policy-violating bank representatives and negative discounts Relationships between negative discounts and original and actual fees New Audit Objective: Actual fees are recorded correctly. Original fees reflect the number of cards in an account. 13

861563 909547 906542 905572 906661 952382 905077 905228 906332 912521 904487 912666 999597 900132 900722 906704 985641 870987 952390 908397 908914 901927 905355 909339 900078 911786 913974 911927 907964 911585 908586 909602 923951 997981 907972 909012 921708 915361 950488 904699 990559 998099 911887 911854 905720 907871 914912 911906 908585 Results Analysis (4/8) Effortless bank representatives and inactive representatives Bank representatives who always offer 100% discounts should be considered not putting enough effort to negotiate with the clients for a lower discount. F r e q u e n c y 700 600 500 400 300 200 100 0 Bank Representatitves ID Frequency in total Frequency in 100% discount subset Distribution of bank representatives offered 100% discounts in the whole retention data and the 100% discount subset 14

Results Analysis (5/8) Effortless bank representatives and inactive representatives Descriptive statistics of frequency distribution of bank representatives Active representatives 65% 35% Inactive representatives 32% Supervisor 3% Distribution of bank representatives 15

Results Analysis (6/8) Effortless bank representatives and inactive representatives P e r c e n t a g e 35.00% 30.00% 25.00% 20.00% 15.00% 10.00% 5.00% 0.00% 25.97% 11.28% Frequency of active representatives Frequency of inactive representatives Comparison of active and inactive representatives on frequency distribution of discounts 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% 47.33% 85.95% 12.30% 7.84% 26.47% 13.90% 4.32% 1.89% Sao Paulo Rio dejaneiro Salvador Nao Especificado Distribution of bank representatives Frequency of active representatives Frequency of inactive representatives 16

Results Analysis (7/8) Non-negotiation bank representatives and short calls Bank representatives who offer a discount without negotiation usually related to short call duration Descriptive statistics of call duration Frequency distribution of call duration less than 600 seconds 17

Results Analysis (8/8) Non-negotiation bank representatives and short calls One possible explanation for these unreasonable short calls is that these calls are forced to terminate due to bad network connection. 80 A v e r a g e d i s o c u n t s 75 70 65 60 55 0 45 100 200 300 400 500 600 Call duration Relationship between call duration and discounts New Audit Objective: All the discounts are given in calls long enough to offer discounts. 18

Future research directions Demonstrate the application of EDA in the audit of financial statement related business cycle. Demonstrate the application of EDA in other types of auditing. Extend current framework to continuous auditing environment. Explore the application of other EDA technologies in auditing. Explore the most suitable EDA techniques for each audit procedure. 19

Thank You! 20