Predictive Models for Enhanced Audit Selection: The Texas Audit Scoring System

Similar documents
Improving tax administration with data mining

Data Mining Applications in Higher Education

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Centralized da Audit Selection and Audit Case Management

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

An Introduction to Advanced Analytics and Data Mining

CS590D: Data Mining Chris Clifton

Advanced Analytics for Audit Case Selection

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG

Data Mining with SAS. Mathias Lanner Copyright 2010 SAS Institute Inc. All rights reserved.

Overview, Goals, & Introductions

Banking Analytics Training Program

Treasury Inspector General Tax Administration (TIGTA)

Data Mining: Overview. What is Data Mining?

SAP BusinessObjects Predictive Analysis. Transforming the Future with Insight Today

Data Mining Algorithms Part 1. Dejan Sarka

Using Data Mining to Detect Insurance Fraud

Real World Application and Usage of IBM Advanced Analytics Technology

Nagarjuna College Of

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Chapter 7: Data Mining

Easily Identify the Right Customers

Using Data Mining to Detect Insurance Fraud

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

Global Headquarters: 5 Speen Street Framingham, MA USA P F

Data Mining for Fun and Profit

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Revenue and Sales Reporting (RASR) Business Intelligence Platform

Azure Machine Learning, SQL Data Mining and R

Optimizing Case Management with Predictive Tax Compliance

Affinity Marketing: Turning Insight Into Profit Through Business Intelligence Visualization

Master of Science in Marketing Analytics (MSMA)

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

TDWI Best Practice BI & DW Predictive Analytics & Data Mining

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Data Mining Techniques in CRM

WHITE PAPER. The Five Fundamentals of a Successful FCR Program

Use Data Strategy and Customer Analytics to Drive Business Decisions. Alison Shaffer August 26, 2010

Texas Comptroller of Public Accounts

PREDICTIVE ANALYTICS IN HIGHER EDUCATION NOVEMBER 6, 2014

Data Mining Applications in Fund Raising

CRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining

Prerequisites. Course Outline

Inside the SC DOR Data Warehouse Project

Pentaho Data Mining Last Modified on January 22, 2007

Data Mining - The Next Mining Boom?

The Business in Business Intelligence. Bryan Eargle Database Development and Administration IT Services Division

GE Capital The Net Promoter Score: A low-cost, high-impact way to analyze customer voices

11/17/2015. Learning Objectives. What Is Data Mining? Presentation. At the conclusion of this presentation, the learner will be able to:

ISSN: (Online) Volume 3, Issue 7, July 2015 International Journal of Advance Research in Computer Science and Management Studies

AA Automated Attendant is a device connected to voice mail systems that answers and may route incoming calls or inquiries.

Enhancing Sales and Operations Planning with Forecasting Analytics and Business Intelligence WHITE PAPER

From Chaos to Clarity.

AMIS 7640 Data Mining for Business Intelligence

The Role of Feedback Management in Becoming Customer Centric

Predictive Modeling Techniques in Insurance

Survey Analysis: Data Mining versus Standard Statistical Analysis for Better Analysis of Survey Responses

Introduction to Data Mining

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

MBA Data Mining & Knowledge Discovery

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Foundations of Business Intelligence: Databases and Information Management

Database Marketing simplified through Data Mining

OFFLINE NETWORK INTRUSION DETECTION: MINING TCPDUMP DATA TO IDENTIFY SUSPICIOUS ACTIVITY KRISTIN R. NAUTA AND FRANK LIEBLE.

Using predictive analytics to maximise the value of charity donors

Building In-Database Predictive Scoring Model: Check Fraud Detection Case Study

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Introduction to Function Points

Predictive Analytics in the Public Sector: Using Data Mining to Assist Better Target Selection for Audit

Maximize Sales and Margins with Comprehensive Customer Analytics

Business Intelligence. Tutorial for Rapid Miner (Advanced Decision Tree and CRISP-DM Model with an example of Market Segmentation*)

The Data Mining Process

Learning Analytics: Targeting Instruction, Curricula and Student Support

Business Intelligence and Decision Support Systems

Database Marketing, Business Intelligence and Knowledge Discovery

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

Data Mining: An Introduction

from Larson Text By Susan Miertschin

Workforce Planning & Analytics: Advancing Your Organization s Capability

Hurwitz ValuePoint: Predixion

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Revenue en e Solutions, Inc. Top Ten Compliance Programs to Drive Revenue

Easily Identify Your Best Customers

Self-Service Big Data Analytics for Line of Business

Transcription:

Predictive Models for Enhanced Audit Selection: The Texas Audit Scoring System FTA TECHNOLOGY CONFERENCE 2003 Bill Haffey, SPSS Inc. Daniele Micci-Barreca, Elite Analytics LLC Agenda ß Data Mining Overview ß Audit Selection Problem ß Data Mining for Audit Selection Texas Audit Select System ß Predictive Models ß Results ß ß Final Remarks 1

Data Mining Overview Data Mining... is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. (Gartner Group) Mature, widely accepted in: Commercial (e.g., CRM, credit-scoring, and etail ) Public sector (e.g., payment error detection, law enforcement, logistics, and homeland security) Data Mining Overview Cross Industry Standard Process for Data Mining (CRISP-DM) launched 1996 200 members across vendor, integrator/consultant, and user communities application/tool/vendor neutral focus on business issues, guidance framework, and comfort factor www.crisp-dm.org 2

Data Mining Overview - Modeling Data driven Supervised techniques modeling relationships between known targets and potentially relevant inputs using training data for example, Neural Networks Unsupervised techniques discovering patterns (associations, sequences, segments) for example, Clustering Clusters/groups/segments of similar returns small clusters can be interesting Audit Selection Challenge Problem: Maximizing tax collection and compliance with limited auditing resources. Texas Statistics: Active Sales Tax Taxpayers: 761,434 Available Auditors: 380 Yearly Audits Completed: ~5,200 % of active taxpayers audited yearly: 0.7% 3

Finding the golden needle A small percentage of audits account for a very large proportion of assessments Total assessed tax adjustments from audits: Approximately $90M (2002) 40% from the top 0.5% of completed audits One fifth of resources spent on no-change audits Percentage of no-change audits: 35-40% No-change audits account for 15-20% of hours Audit Selection Strategies Traditional audit selection strategies: Top Contributors Focus on reported tax dollars, large businesses Texas Priority 1 program Prior Audits Focus on prior audit outcome Texas Prior Productive program 4

Why Data Mining? Traditional audit selection criteria typically leverage a single metric: Total tax reported Prior audit results Percent deductions, SIC, Age of the business, etc. Difficult to identify and leverage patterns from audit outcome data: Which metrics are more relevant? Profiling the golden needles to find more of those Multi-Dimensional Approach Data Mining can leverage dozens of taxpayer metrics and characteristics for enhanced audit selection. Gross Sales TAXPAYER PROFILE Deductions SIC Wages Other Tax Types Prior Audits Years in Business Other TP characteristics 5

Leveraging Audit Results Analytical Models Data Mining Closed-Loop Data-Driven Process Audit Selection Audits Texas Audit Select Scoring System Designed for Sales Tax audit selection Based on a predictive model estimating final tax adjustment Score Scaled between 1 and 1000 All active sales tax accounts scored yearly Score presented in Audit Select application In use since mid-2000 6

What is a Predictive Model? A representation of some target function that maps a set of input feature to one or more target variables Input 1 Input 2 Input 3 Input 10 Model Target Inputs: Gross Sales, Deductions, SIC, Prior Audits, Target: Final Tax Adjustment, Change vs. No-Change Predictive models are trained using an historical data set, for which the inputs and the target variable are known. Main Model Inputs Historical Audits Outcome SCORE Training Data Model Sales Tax Filings Other Tax Filings Employment Records Audit History Business Information MODEL INPUTS 7

Modeling Audit Outcomes Tax Filings Taxpayer Information Historical Audits Employment Data Modeling Tools Taxpayer Profile at Audit Gross Sales Deductions SIC Wages Other Tax Prior Audits Years in Business Other TP Audit Outcome ($) Model Scoring Taxpayers Tax Filings Taxpayer Information Current Taxpayer Profile Gross Sales Deductions SIC Wages Other Tax Prior Audits Years in Business Other TP Employment Data Score 0 1000 8

ADS Environment Audit Select Application Data Warehouse Mainframe External Data Sources Taxpayer Scores Data Mining Server Tax Adjustment Vs. Score High Score region (800 to 1000) produce higher than average tax adjustments 9

Finding New Leads Score Distribution Right-tail Detail Over one third of current taxpayers scoring above 750 had no prior audits Score Vs. Priority One Top 9% scoring audits vs. P1Audits (9% of total) Only 36% of top 9% scoring audits were P1 audits 10

Score-Enhanced P1 Selection Strategy Score 0 600 2 State Contribution Ranking 45% to 65% & Score < 600 1 25% of P1 Audits Avg. Hourly Yield = $255 P1 Avg. Hourly Yield = $512 Avg. Hourly Yield = $377 1000 Keep Priority-1 Status 40% 65% 3 New Priority-1 s % of Total Reported Tax Use scores to refine the Priority 1 selection strategy Replace Audits in Region-1 with Audits in Region-3 Score Vs. Prior Productive Average tax adjustment for PP audits: $17,700 (median $4,100) Top-ranking 20% of the audits based on of the Score, the average tax adjustment is $27,000 (median $6,500) 11

Challenges Much more difficult to predict no-change outcome than outcome scale Score relevance/accuracy limited by prior audit selection criteria Selection Bias Problem End-user education: selectors need training to gain confidence in the score Score does not replace human judgment, its just a selection tool Current Developments Score combines two predictive models Audit Likelihood model: models the selector judgment Audit Outcome model: predicts outcome High score if: 1) Taxpayer similar to other previously audited taxpayers, and 2) Likely to result in large tax adjustment Does the taxpayer fit the selection profile? If selected, what will be the likely tax adjustment? Audit Likelihood Score Audit Outcome Score Combined Score 12

Ongoing and Future Work Audit Select Model Enhancement Additional Tax Payer Factors IRS Data, Return amendments, etc. Better Modeling Algorithms Taxpayer Segmentation Models by Segment Franchise Tax Scoring Model Enforcement Models Tax Affinity Non-Filers Summary Data mining has great potential in tax compliance: Translates mountains of data into actionable information, predictions and categorization Improves efficiency of resource-intensive tasks, such as auditing Supports decision making and helps uncover more golden needles 13

Acknowledgments Texas Comptroller s Office, Audit Division Lisa McCormack Rose Orozco Eddie Coats Elite Analytics Dr. Satheesh Ramachandran Thanks Bill Haffey, SPSS Inc. bhaffey@spss.com Daniele Micci-Barreca, Elite Analytics LLC daniele@eliteanalytics.com 14