Analytics and Risk Examples from Research & Analytics Branch
Duncan Cleary
dcleary@revenue.ie
http://www.linkedin.com/in/duncancleary
Research & Analytics Branch
DATA - INFORMATION - KNOWLEDGE

Revenue's Business Context
To serve the community by fairly and efficiently collecting taxes and duties and implementing Customs controls.
www.revenue.ie
Total Receipts: €31.5 billion (2010, net)

Analysis of REAP
Research & Analytics Branch
Conducts analyses to transform data into information, primarily using SAS software.
Evidence-based projects, predictive analytics, segmentation, forecasting etc., using data from Revenue and other sources.
Enables Revenue to make better use of its data and provides an improved understanding of the taxpayer population.
The results are used to better target services to customers and to improve compliance.

Target
Rules

Not a duck
Not a duck
Rules combined are better
[Figure: individual rules combined, shown as + + =]

But where are the ducks?
Case Study: Use of Predictive Analytics in Revenue
Revenue's Risk system uses ~300 business rules to quantify risk.
Goal: use predictive analytics to extend from risk to predicting likelihood of yield, if audited.
Pilot the model and, if successful, bring it to production.
Show how analytics can assist development of effective business strategies for Revenue.
Optimise use of Revenue resources.

Data and Variables
Considerable effort at the Data Integration stage (using SAS DI Studio; scalable, semi-automated).
Data Quality! Risk system data is opportunistic.
Business context and understanding.
Rules that fire / don't fire: binary and frequency.
Derived variables, such as monetary risk and behaviour scores created by the risk system.
Target variables: audit outcomes (e.g. yield).
Demographic variables, geography, sector etc.
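The "rules that fire / don't fire, binary and frequency" variables can be illustrated with a minimal Python sketch. The rule identifiers, case identifiers and log layout below are hypothetical; the slides do not describe how the Risk system stores rule firings, and the actual integration was done in SAS DI Studio.

```python
from collections import Counter

# Hypothetical rule-fire log: one (case_id, rule_id) pair per firing.
rule_fires = [
    ("case_a", "R001"), ("case_a", "R001"), ("case_a", "R017"),
    ("case_b", "R017"),
]
# Hypothetical rule list; the real Risk system has roughly 300 rules.
all_rules = ["R001", "R017", "R042"]


def rule_features(fires, rules):
    """Build a binary (fired / not fired) and a frequency variable per rule."""
    counts = Counter(fires)  # (case, rule) -> number of firings
    cases = sorted({case for case, _ in fires})
    features = {}
    for case in cases:
        row = {}
        for rule in rules:
            n = counts[(case, rule)]  # Counter returns 0 for unseen pairs
            row[f"{rule}_fired"] = 1 if n > 0 else 0
            row[f"{rule}_count"] = n
        features[case] = row
    return features


features = rule_features(rule_fires, all_rules)
```

Each case ends up with two variables per rule, which can then sit alongside the derived, demographic and target variables in a modelling table.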
Help for finding ducks

SAS Credit Scoring Module
Banking analogy: the likelihood of a case defaulting on a loan, based on its profile and the profiles of cases that have defaulted in the past.
Credit scoring techniques are applied in this model, where the likelihood of a case yielding is based on its profile and the profiles of cases that have yielded in the past.
The model creates a scorecard and a probability of yield for the case base.
Training the assistant

Results
SAS Credit Scoring Module in SAS Enterprise Miner.
Target: any yield over 2500 = 1; under 2500 = 0.
Cut-off point, e.g. p = 0.65: misclassification of 23% (77% hit rate).
Number of cases: can continue to select until the quota is filled, in order of decreasing probability of yield.
Scorecard can be used to assess cases.
[Figure: chart of Yielding Cases and Non Yielding Cases by predicted probability band, from 0.05-0.10 up to 0.95-1.00; vertical axis from 0 to 2000.]
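The target definition, cut-off and quota selection above can be sketched in Python as follows. The five cases and their probabilities are invented for illustration only. The sketch assumes that "hit rate" means one minus the misclassification rate, which is consistent with the 23% and 77% figures summing to 100% but is not stated explicitly on the slide.

```python
YIELD_THRESHOLD = 2500  # from the slide: yield over 2500 -> 1, otherwise 0
CUTOFF = 0.65           # example cut-off probability from the slide


def binary_target(audit_yield):
    """Binarise an audit outcome as on the slide."""
    return 1 if audit_yield > YIELD_THRESHOLD else 0


def misclassification_rate(cases, cutoff=CUTOFF):
    """Share of cases whose predicted class differs from the observed target."""
    wrong = 0
    for case in cases:
        predicted = 1 if case["p_yield"] >= cutoff else 0
        if predicted != binary_target(case["audit_yield"]):
            wrong += 1
    return wrong / len(cases)


def select_for_quota(cases, quota):
    """Pick cases in order of decreasing probability until the quota is filled."""
    ranked = sorted(cases, key=lambda c: c["p_yield"], reverse=True)
    return ranked[:quota]


# Hypothetical audited cases with model probabilities (illustrative only).
cases = [
    {"id": 1, "p_yield": 0.92, "audit_yield": 12000},
    {"id": 2, "p_yield": 0.81, "audit_yield": 0},
    {"id": 3, "p_yield": 0.70, "audit_yield": 3100},
    {"id": 4, "p_yield": 0.66, "audit_yield": 2600},
    {"id": 5, "p_yield": 0.40, "audit_yield": 500},
]

error = misclassification_rate(cases)  # assumed: hit rate = 1 - error
top_three = [c["id"] for c in select_for_quota(cases, 3)]
```

In this toy data only case 2 is misclassified, so the error is one in five; the real figures come from the Enterprise Miner model, not from this sketch.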
Scorecard Extract
All cases are assigned a score based on their profile as per the model.
Cut-offs can be set to increase likelihood.
The fewer points a case scores, the more likely it is to yield if audited.
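How a scorecard of this kind assigns points can be sketched as below. The attributes, bins and point values are entirely hypothetical and are not taken from the Revenue scorecard; only the direction (fewer points means more likely to yield) comes from the slide.

```python
# Hypothetical scorecard: attribute -> {bin label: points}. Illustrative only.
SCORECARD = {
    "sector": {"sector_a": 10, "sector_b": 25, "sector_c": 40},
    "rules_fired": {"0-2": 45, "3-5": 20, "6+": 5},
}


def score_case(profile, scorecard=SCORECARD):
    """Sum the points for the bin each attribute of the case falls into."""
    return sum(scorecard[attr][profile[attr]] for attr in scorecard)


def flag_for_audit(profile, cutoff_points):
    """Flag a case whose score is at or below the cut-off.

    Per the slide, fewer points means a higher likelihood of yield.
    """
    return score_case(profile) <= cutoff_points


example = {"sector": "sector_a", "rules_fired": "6+"}
example_score = score_case(example)
example_flag = flag_for_audit(example, cutoff_points=30)
```

Lowering `cutoff_points` selects fewer cases but with a higher expected likelihood of yield, which is the trade-off the cut-off controls.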
Extending the Model: Reduce Misclassification
Unseen data scored, i.e. cases that have not been audited in the period.
List of cases with scores based on propensity to yield according to the model.
Cut-off set high (e.g. 0.70 probability).

Hits vs. Misses in Pilot Region
3:1
An auditor in the field
So what next?
Operationalise this approach.
Develop more models for the business (e.g. Yield, Sectoral, Regional, Liquidation, Phoenix Directors, Real Time Risk, etc.).
Evaluate models through field testing, in co-operation with Revenue Regions.
Extract more value from the data and information we already have; better training data.
Make analytics more central to how Revenue performs its work.
Other Work
Audit Yield Models: 2 stage; monetary risk as a target, cost of audit
Liquidation Models
Balloon Payments
Phoenix Directors / Group Risk
Model Evaluation
Real Time or Look Back
Prevent and Detect, Customs, VAT, Excise
Customer Segmentation

Two Stage Model: Test
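One way a two-stage approach with a monetary target and a cost of audit could be combined is sketched below. This is an assumption for illustration: the slides name the ingredients but not the formula, and the field names, amounts and costs here are hypothetical. Stage 1 is taken to give a probability of yield, stage 2 a predicted amount if the case yields, and the expected net value is the product of the two minus the audit cost.

```python
def expected_net_value(p_yield, predicted_amount, audit_cost):
    """Assumed combination: stage-1 probability x stage-2 amount, less cost."""
    return p_yield * predicted_amount - audit_cost


def rank_by_net_value(cases):
    """Order cases from highest to lowest expected net value."""
    return sorted(
        cases,
        key=lambda c: expected_net_value(
            c["p_yield"], c["predicted_amount"], c["audit_cost"]
        ),
        reverse=True,
    )


# Hypothetical scored cases (illustrative figures only).
cases = [
    {"id": "a", "p_yield": 0.50, "predicted_amount": 10000, "audit_cost": 3000},
    {"id": "b", "p_yield": 0.75, "predicted_amount": 2000, "audit_cost": 3000},
    {"id": "c", "p_yield": 0.25, "predicted_amount": 40000, "audit_cost": 3000},
]

ranking = [c["id"] for c in rank_by_net_value(cases)]
```

In this toy data the case with the lowest probability ranks first because its predicted amount is large, which is the kind of reordering a monetary target can introduce compared with ranking on probability alone.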
Two Stage Model: Test (cont'd)

Liquidation Model Assessment: Probability Distributions
SNA: Social Network Analysis

Directors Linked to Associated Companies
2 directors with many companies in common.
One director red according to REAP.
Other director and most companies green.
Methods of propagating content (risk) through a network.
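A simple way to propagate risk through a director-company network is sketched below. It is one illustrative method, not necessarily the one used by the Branch: each node inherits a damped share of the highest risk among its direct links, repeated for a fixed number of rounds. The node names, the damping factor and the number of rounds are all assumptions.

```python
def propagate_risk(edges, seed_risk, damping=0.5, rounds=2):
    """Spread risk over a director-company graph.

    edges: list of (director, company) links.
    seed_risk: {node: risk in [0, 1]} for nodes with known risk (e.g. "red").
    Each round, a node takes the larger of its current risk and
    damping x the highest risk among its direct neighbours.
    """
    neighbours = {}
    for director, company in edges:
        neighbours.setdefault(director, set()).add(company)
        neighbours.setdefault(company, set()).add(director)

    risk = {node: seed_risk.get(node, 0.0) for node in neighbours}
    for _ in range(rounds):
        updated = {}
        for node, linked in neighbours.items():
            inherited = damping * max(risk[n] for n in linked)
            updated[node] = max(risk[node], inherited)
        risk = updated
    return risk


# Hypothetical network: two directors with companies in common.
edges = [
    ("dir_red", "co_1"), ("dir_red", "co_2"),
    ("dir_green", "co_1"), ("dir_green", "co_2"), ("dir_green", "co_3"),
]
risk = propagate_risk(edges, seed_risk={"dir_red": 1.0})
```

After two rounds the shared companies carry half the red director's risk and the second director a quarter, while a company linked only to the second director has not yet been reached; more rounds would extend the reach further.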
Identifying Risky Cases
Cases flagged by predictive model:
Cases flagged by algorithm:

Customer Segmentation for Risk
Questions?
Dr. Duncan Cleary
Revenue, Planning Division, Research & Analytics Branch
t: 00353-(0)1-4251414
e: dcleary@revenue.ie
http://ie.linkedin.com/in/duncancleary