Demystifying Big Data Analytics Practical approaches to business intelligence and forensic analytics 9 May 2013
Discussion topics
Big Data & Big Data Analytics
Current fraud risks - industry research
Components of an effective anti-fraud analytics program
Advanced email analytics to detect rogue employee behavior
Forensic analytics technology framework beyond rules-based tests
Emerging technologies
Page 2
Big Data and Big Data Analytics Page 3
The big question is What is Big Data? Big Data characteristics Big data represents data sets that can no longer be easily managed or analyzed with traditional or common management tools, methods, and infrastructure. High Volume Page 4
Big Data has arrived In this decade, the digital universe will grow 44x from 0.9 zettabytes to 35.2 zettabytes. Facebook Electronic Payments Video Rendering Social Media Mobile Sensors PayPal Video Surveillance Medical Imaging Smart grids Geophysical exploration Gene Sequencing Page 5
Enterprises are looking to leverage Big Data to:
Better understand their customers' needs and further their business growth. eBay needs insight into what customers want in order to improve the customer experience and increase traffic on the site. Yahoo adopted Big Data to connect what users are looking for with what advertisers are trying to sell to them.
Optimize business decisions through data-driven insights. The auto industry has used GPS systems to gather information on customer driving habits and then improve its products.
Make money by selling insights from Big Data. Polk, a household name in the auto industry, sells online subscriptions to data about vehicle sales and ownership to automakers, parts suppliers, dealers, advertising firms, and insurance companies.
Page 6
What is Big Data analytics? Transform data to information, information to insight and insight to intelligence Analysis is based on a large population of transactions instead of sample Process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information Act of transforming data with the aim of extracting useful information and facilitating the achievement of factual conclusions Extracting the nuggets of gold hidden under mountains of data Page 7
The Analytics Value Chain: Manage DATA
Manage DATA relevant data Perform ANALYTICS insights rules/algorithms Drive DECISIONS
The primary goal in managing data should be to cut through the Big Data hype and leverage the relevant data needed to drive better business decisions.
Big Data vs. Not So Big Data:
Volume: Terabytes/Petabytes vs. Megabytes/Gigabytes
Variety: Unstructured (text, voice, video) vs. Structured / Relational
Velocity: Data in Motion (Streaming) vs. Data at Rest
Veracity: Untrusted / Uncleansed vs. Trusted / Cleansed
Page 8
The Analytics Value Chain: Perform Analytics
Manage DATA relevant data Perform ANALYTICS insights rules/algorithms Drive DECISIONS
Clients must be adept across a continuum of analytical techniques (axis: Mathematical Complexity).
Advanced Analytics
Prescriptive Analytics: Determine WHICH decision and/or action will produce the most effective result against a specific set of objectives and constraints.
Predictive Analytics: Leverage past data to understand the underlying relationship between data inputs and outputs, to understand WHY something happened or to predict WHAT will happen in the future across various scenarios.
Business Intelligence
Descriptive Analytics: Mine past data to report, visualize, and understand WHAT has already happened, after the fact or in real time.
Page 9
The Analytics Value Chain: Drive Decisions Manage DATA relevant data Perform ANALYTICS insights rules/algorithms Drive DECISIONS Last year, IBM surveyed 4,500 clients who said that they use analytics to drive decisions in three primary areas: Customer - to grow revenue and provide personalized services / products Operations - to reduce costs and improve service reliability Finance - to optimize investments across revenue/cost tradeoffs and to identify fraud, waste, and abuse Page 10
Current fraud risks Page 11
Risks by category Page 12
How is fraud detected? 50.3% by tip or accident Source: ACFE 2010 Report to the Nations on Occupational Fraud Source: ACFE 2012 Report to the Nations Page 13
Anti-Fraud Controls Page 14
2012 Ernst & Young Global Fraud Survey 39% of respondents say that bribery & corruption practices occur frequently in their countries 15% of CFOs surveyed said they would be willing to make cash payments to win business 20% of CFOs surveyed said that they are willing to make personal gifts to win business 372 global CFOs surveyed Page 15
Components of an effective anti-fraud & corruption compliance program
Elements of a successful corporate anti-fraud, bribery and corruption program:
Setting the Proper Tone: Code of Ethics
Proactive: Fraud and Corruption Risk Assessment; Controls Monitoring and Analytics; Prevention Policies; Communication and Training
Reactive: Incident Response Plan
Management Ownership and Involvement
Anti-fraud, bribery and corruption key activities: Corporate compliance assessment; Corporate compliance design; Gap analysis; Future state design session; Discovery response planning; Records and information management; Who owns fraud?; Assign roles and responsibilities; Fraud and risk committee formulation; Customized training; Corporate governance; Design sessions; Corporate anti-fraud road map; FCPA / anti-bribery compliance assessments; Fraud risk assessment; Targeted anti-fraud analytics; Anti-bribery and corruption analytics; M&A Due Diligence; 3rd Party Due Diligence; Vendor Risk profiling; Vendor Vetting (Level I, II, III background checks); Investigations; Fraud response planning; Forensic data analytics; Discovery and document review
Page 16
Start with the Fraud Tree
New tools and methodologies are required for monitoring corruption schemes.
Fraud tree:
Corruption (traditional focus of legal and compliance; increased use of audit resources): Conflicts of interest; Bribery and corruption / FCPA; Illegal gratuities; Bid-rigging / procurement
Fraudulent statements (traditional focus of external audit): Revenue recognition; GAAP; Reserves; Non financial
Asset misappropriation (traditional focus of internal audit): Cash larceny; Theft of other assets (inventory / AR / fixed assets); Fake vendor; Payroll fraud; T&E fraud; Theft of data
Page 17
Forensic analytics maturity model
Beyond traditional rules-based queries: consider all four quadrants
Axes: detection rate (low to high); false positive rate (high to low)
Structured data:
Traditional rules-based queries & analytics: matching, grouping, ordering, joining, filtering
Statistical-based analysis: anomaly detection, clustering, risk ranking
Unstructured data:
Traditional keyword searching: keyword search
Data visualization & text mining: data visualization, drill-down into data, text mining
Page 18
Fraud risk is in all datasets
When considering enterprise risk, all sources of data should be addressed.
A Gartner study shows that 80% of enterprise data is unstructured in nature.
Most internal audit procedures focus on the 20% of data that is structured.
Structured data (20%): CRM, databases, accounting systems
Unstructured data (80%): text, graphics, email, presentations and spreadsheets
Few organizations have the methodologies or technologies to efficiently address unstructured data.
Source: Gartner Research
Page 19
Unstructured forensic analytics Page 20
Every second:
5 babies are born
1.4 million spam e-mails
48 minutes of video are uploaded to YouTube
6 people log on for the first time
5 new broadband subscribers
50 mobile phones sold
34,000 Google searches
200,000 text messages are sent
2 new blogs created
9 PCs sold
>1 new domains registered
60 people visit an online dating site
Page 21
Message Frequency Analyze communication over time to identify gaps in data set Understand message counts across the population (View email and/or instant messaging frequency and consistency relative to the population.) Over which time period? When communications occur Communication spikes around key business events Page 22
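The gap and spike checks described on this slide can be sketched in a few lines. This is a minimal illustration, not the tooling used in the deck: the function names, the three-day gap threshold, and the "mean plus two standard deviations" spike rule are all assumptions chosen for the example.

```python
import statistics
from collections import Counter
from datetime import date, timedelta

def daily_counts(sent_dates):
    """Count messages per calendar day, filling days with no messages with zero."""
    counts = Counter(sent_dates)
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    filled = {}
    while day <= last:
        filled[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return filled

def find_gaps(filled, min_days=3):
    """Return (start, end) for runs of at least min_days consecutive silent days."""
    gaps, run = [], []
    for day, n in filled.items():
        if n == 0:
            run.append(day)
        else:
            if len(run) >= min_days:
                gaps.append((run[0], run[-1]))
            run = []
    return gaps

def find_spikes(filled, k=2.0):
    """Return days whose count exceeds mean + k * population standard deviation."""
    values = list(filled.values())
    if len(values) < 2:
        return []
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    if sd == 0:
        return []
    return [day for day, n in filled.items() if n > mean + k * sd]

# Hypothetical sent dates for one custodian
sent = [date(2013, 1, 1), date(2013, 1, 1), date(2013, 1, 2),
        date(2013, 1, 7), date(2013, 1, 8)]
print(find_gaps(daily_counts(sent)))
```

A gap may point to missing data in the collection rather than to silence by the custodian, so flagged ranges are a prompt to check the data set, and spikes are a prompt to look for a nearby business event.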
Keyword Search Summary Analyze keyword hits by term, custodian, and date Analyze effectiveness of keywords. Understand the effect of keyword hits by custodian and timeframe to prioritize review and analyze keyword hits. Page 23
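A keyword hit summary by term, custodian, and date can be approximated as below. This is a rough sketch under stated assumptions: the message layout (dicts with `custodian`, `date`, `text`), the monthly bucketing, and the sample terms are hypothetical, and a real review platform would do its own indexing.

```python
import re
from collections import Counter
from datetime import date

def keyword_hits(messages, terms):
    """Count whole-phrase, case-insensitive hits keyed by (term, custodian, 'YYYY-MM')."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in terms}
    hits = Counter()
    for msg in messages:
        month = msg["date"].strftime("%Y-%m")
        for term, pattern in patterns.items():
            n = len(pattern.findall(msg["text"]))
            if n:
                hits[(term, msg["custodian"], month)] += n
    return hits

# Hypothetical messages and terms
messages = [
    {"custodian": "A", "date": date(2013, 3, 4),
     "text": "Please book the special payment today. Special payment approved."},
    {"custodian": "B", "date": date(2013, 4, 1), "text": "No issues here"},
]
for key, count in keyword_hits(messages, ["special payment", "off the books"]).items():
    print(key, count)
```

Terms that never appear in the result, or that hit on nearly every custodian, are candidates for removal or refinement when assessing keyword effectiveness.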
Link analysis: who is talking to whom?
The first 48 hours: Live server log files pulled in quickly for early case assessments
Understanding a complex organization's true organization chart: Identification of relationships, versus activities, amongst actors
Triage of custodians and communications for traditional review and additional analytics: Rapidly identify and point to communications of highest interest
Sample analytics criteria:
1. Private communications where 90% of all communications are outbound
2. Private communications where content is forwarded outbound more than 35% of the time
3. Private communications where attachments are sent outbound more than 35% of the time
Page 24
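The three sample criteria can be expressed as ratios per custodian and outside contact. The slide does not define its denominators, so this sketch assumes one reading: the outbound share is measured against all messages exchanged with that contact, and the forwarded and attachment shares are measured against outbound messages only. The message fields and function name are hypothetical.

```python
from collections import defaultdict

def triage_pairs(messages, custodians,
                 outbound_min=0.90, forward_min=0.35, attach_min=0.35):
    """Flag (custodian, outside contact) pairs that meet any sample criterion."""
    stats = defaultdict(lambda: {"total": 0, "out": 0, "fwd": 0, "att": 0})
    for msg in messages:
        sender, recipient = msg["sender"], msg["recipient"]
        if sender in custodians and recipient not in custodians:
            s = stats[(sender, recipient)]          # outbound to an outside contact
            s["total"] += 1
            s["out"] += 1
            s["fwd"] += int(msg.get("is_forward", False))
            s["att"] += int(msg.get("has_attachment", False))
        elif recipient in custodians and sender not in custodians:
            stats[(recipient, sender)]["total"] += 1  # inbound from an outside contact
    flagged = {}
    for pair, s in stats.items():
        reasons = []
        if s["out"] / s["total"] >= outbound_min:
            reasons.append("mostly outbound")
        if s["out"] and s["fwd"] / s["out"] > forward_min:
            reasons.append("forwarded content")
        if s["out"] and s["att"] / s["out"] > attach_min:
            reasons.append("attachments")
        if reasons:
            flagged[pair] = reasons
    return flagged

# Hypothetical traffic: ten outbound messages, four of them forwards
custodians = {"alice@corp.example"}
messages = [{"sender": "alice@corp.example", "recipient": "x@mail.example",
             "is_forward": i < 4} for i in range(10)]
print(triage_pairs(messages, custodians))
```

Messages between two custodians, or between two outside addresses, are ignored here because the criteria concern private, outbound communication.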
Fraud Triangle analytics Applying the theory to electronic communications Over 3,000 fraudulent terms/phrases in a dozen languages Page 25
Interactive dashboard for Email Page 26
Emotional Tone Analysis Identify Derogatory, Surprised, Secretive, Worried communications Page 27
Combine analytics to surface issues Unstructured communications data is organized and risk scored for analysis and remediation. Email & TXT Messages Transactions Voice Mail & Instant Messages Analysis Platform Document management Employee Hard Drives Page 28
Custodian Risk Ranking
Scored by custodian and time period based on multiple criteria:
1. Behavioral Keywords: Percentage of EY-ACFE opportunity-focused behavioral term hits for that week in ESI sent or received by the custodian in focus. Scaling: 3
2. Behavioral Keywords: Percentage of EY-ACFE rationalization-focused behavioral term hits for that week in ESI sent or received by the custodian in focus. Scaling: 3
3. Behavioral Keywords: Percentage of EY-ACFE incentive-pressure-focused behavioral term hits for that week in ESI sent or received by the custodian in focus. Scaling: 4
4. User Activity: Percentage of instances within that week where the custodian sends or receives ESI involving those outside of peer group, as identified through hierarchies. Scaling: 2
5. User Activity: Percentage of instances within that week where the custodian sends or receives ESI involving those outside of superiors, as identified through hierarchies. Scaling: 2
6. Alias Clustering: Percentage of instances within that week where the custodian sends or receives ESI involving at least one (1) of their identified communicative aliases. Scaling: 3
7. Emotive Tone: Percentage of instances within that week where the custodian sends or receives ESI with negative emotions identified through linguistic analyses. Scaling: 5
Custodian, period: C1 C2 C3 C4 C5 C6 C7 | Score
Scaling: 3 3 4 2 2 3 5
A, Week 1: 1 3 3 4 6 2 3 | 45
A, Week 2: 2 2 4 5 3 4 2 | 37
Page 29
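One simple way to combine the seven criteria is a weighted sum that uses the scaling factors as weights. The slide does not state how its Score column is derived, so the aggregation below is an assumption and should not be expected to reproduce those figures; the custodian names and criterion values are hypothetical.

```python
# Scaling factors for criteria C1..C7, taken from the slide
SCALING = {"C1": 3, "C2": 3, "C3": 4, "C4": 2, "C5": 2, "C6": 3, "C7": 5}

def weighted_score(criteria):
    """Assumed aggregation: sum of criterion value times its scaling factor."""
    return sum(SCALING[name] * value for name, value in criteria.items())

def rank(rows):
    """Sort (custodian, week, criteria) rows from highest to lowest score."""
    return sorted(rows, key=lambda row: weighted_score(row[2]), reverse=True)

# Hypothetical criterion values for two custodians in the same week
rows = [
    ("Y", "Week 1", {"C1": 1, "C2": 1, "C3": 1, "C4": 1, "C5": 1, "C6": 1, "C7": 1}),
    ("X", "Week 1", {"C1": 2, "C2": 0, "C3": 1, "C4": 3, "C5": 0, "C6": 1, "C7": 4}),
]
for custodian, week, criteria in rank(rows):
    print(custodian, week, weighted_score(criteria))
```

Keeping the weights in one table makes it easy to adjust them and re-rank when reviewers find that a criterion produces too many false positives.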
Rogue employee analytics Risk Scoring Model peer stratification dashboard review Peer Stratification Dots represent clusters of high risk communications that can be reviewed by clicking. Detail-Level View Page 30
Structured forensic analytics Page 31
Integrated sampling approach
Identifying a sample set of 50 payments for review, from 1.4 million in scope.
Process: raw ERP system tables, then data model design, then the EY Payment Sample Selection Tool, which outputs X riskiest payments for detailed manual review and 0.5X randomly selected payments for contextual review.
Structured Attribute Analysis: payments related to sensitive field changes, by riskiest users; payments related to sensitive field changes, by riskiest fields; duplicative invoices; invoice completeness; round payment amounts
Clustering: payments to high amount, low frequency vendors; payments to low amount, high frequency vendors; Benford's Law deviants; payments to statistically anomalous vendors; fuzzy matching between employees and vendors
Text Mining and Analysis: identification and extraction of entities (geographies, proper nouns, addresses, telephone numbers); top concept extraction; identification of hits against EY-ACFE bribery and corruption keywords
Requirements:
Minimal BAU disruption: EY received raw SAP tables and rapidly re-constructed change activity and payments.
Transparency: Adjust weightings on-the-fly to reduce false-positives.
Sustainability: Architecture to support feeding in additional data, as needed.
Page 32
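Three of the tests named on this slide (Benford's Law deviation, round payment amounts, and fuzzy matching between employees and vendors) are simple enough to sketch with the Python standard library. This is an illustrative sketch, not the EY tool: the function names, the 1,000 rounding unit, and the 0.85 similarity threshold are assumptions.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Benford's Law: expected share of leading digit d is log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_deviation(amounts):
    """Observed minus expected share of each leading digit (1-9)."""
    # Scientific notation puts the leading non-zero digit first, e.g. '1.2345e+03'
    digits = [int(f"{abs(a):e}"[0]) for a in amounts if a]
    if not digits:
        raise ValueError("no non-zero amounts to test")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) - BENFORD[d] for d in range(1, 10)}

def is_round(amount, unit=1000):
    """True for positive amounts that are exact multiples of unit."""
    return amount > 0 and amount % unit == 0

def fuzzy_matches(employees, vendors, threshold=0.85):
    """Return (employee, vendor, similarity) for names that look alike."""
    matches = []
    for emp in employees:
        for ven in vendors:
            ratio = SequenceMatcher(None, emp.lower(), ven.lower()).ratio()
            if ratio >= threshold:
                matches.append((emp, ven, round(ratio, 2)))
    return matches

# Hypothetical payment amounts and names
amounts = [100, 150, 1999, 2000, 35]
print(benford_deviation(amounts)[1])
print([a for a in amounts if is_round(a)])
print(fuzzy_matches(["John A. Smith"], ["John A Smith", "Acme Supplies"]))
```

Benford's Law is only meaningful on large populations of naturally occurring amounts, so the five-value sample here shows the mechanics only. The pairwise name comparison is quadratic and would need blocking or indexing at the 1.4 million payment scale mentioned on the slide.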
Transaction Risk Scoring Analyzing multiple tests for each transaction Review breaches on targeted analytics Filter by selected analytics Page 33
Beyond rules-based tests Integrate statistical, visual and text mining techniques to identify patterns of high risk or rogue employee activities. Page 34
Focus on the payment text descriptions
What if you saw these terms used as justification for payments to third parties?
<blank description>
Friend fee
Donation
Government fee
Special commission
One time payment
Special payment
Commission to the customer
Incentive payment
Pay on behalf of
Goodwill payment
Consulting fee
Team building expense
Volume contract incentive
Processing fee
Nobody calls it "bribe expense"
Page 35
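Screening payment descriptions for the terms listed on this slide can be sketched as a substring check after light normalization. The term list comes from the slide; the function name, the hyphen and whitespace normalization, and the sample descriptions are assumptions for illustration.

```python
import re

# Terms taken from the slide, lower-cased for matching
SUSPECT_TERMS = [
    "friend fee", "donation", "government fee", "special commission",
    "one time payment", "special payment", "commission to the customer",
    "incentive payment", "pay on behalf of", "goodwill payment",
    "consulting fee", "team building expense", "volume contract incentive",
    "processing fee",
]

def flag_description(text):
    """Return the suspect terms found in a payment description.

    A missing or whitespace-only description is itself flagged.
    """
    if text is None or not text.strip():
        return ["<blank description>"]
    # Treat hyphens and runs of whitespace as single spaces before matching
    normalized = re.sub(r"[\s\-]+", " ", text.lower())
    return [term for term in SUSPECT_TERMS if term in normalized]

# Hypothetical payment descriptions
for description in ["One-time payment to agent", "   ", "Office supplies"]:
    print(repr(description), flag_description(description))
```

Substring matching is deliberately loose (for example, "donations" also hits "donation"), which favors recall over precision; hits are a starting point for review rather than a finding.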
Text analytics in transactional data Page 36
Interactive dashboard: Expense review interface (who, what, where, why, how and how much?) Page 37
Emerging technologies Page 38
Search-around functionality Rapidly build out networks of interest and tie in multiple data sources Easily find entities, documents, events, etc. that are directly related to your selection Page 39
A more human way to look at data Data points are represented as objects, with logical relationships Graphical representation of relationships between seemingly discrete entities Epicenters of activity become immediately discernable View supporting documents as dynamic objects Page 40
EY resources Page 41
Ernst & Young resources
The guide to investigating business fraud
Corruption or compliance: weighing the costs, the Ernst & Young 10th global fraud survey
Best practices for a global FCPA program, Compliance Week
Keep it clean: the role of policies and training in compliance with anti-corruption laws, Supply Chain Quarterly
Exposing the iceberg: detecting fraud through email, FRAUD Magazine
Fraud triangle analytics: applying the theory, FRAUD Magazine
Page 42
Ernst & Young resources
Staying ahead of corruption liabilities, ACG Mergers & Acquisitions
Detecting financial statement fraud, Ernst & Young white paper
Demonstrating the effectiveness of your compliance program, Compliance Week
Acquisitions in emerging markets: know the risks and how to address them, Financier Worldwide
Accounting for Words, Internal Auditor
Breaking the Status Quo in E-Mail Review, FRAUD Magazine
Page 43
Components of an effective corruption compliance program Page 44
Contacts James Walton Senior Manager Advisory Services Enterprise Intelligence Dallas, TX (214) 969-0777 james.walton@ey.com Dave Rogers Senior Manager Fraud Investigation & Dispute Services Dallas, TX (214) 969-8037 dave.rogers@ey.com Page 45
Ernst & Young Assurance Tax Transactions Advisory About Ernst & Young Ernst & Young is a global leader in assurance, tax, transaction and advisory services. Worldwide, our 152,000 people are united by our shared values and an unwavering commitment to quality. We make a difference by helping our people, our clients and our wider communities achieve their potential. Ernst & Young refers to the global organization of member firms of Ernst & Young Global Limited, each of which is a separate legal entity. Ernst & Young Global Limited, a UK company limited by guarantee, does not provide services to clients. For more information about our organization, please visit www.ey.com. Ernst & Young LLP is a client-serving member firm of Ernst & Young Global Limited operating in the US. 2012 Ernst & Young LLP. All Rights Reserved. 1206-1369289 This publication contains information in summary form and is therefore intended for general guidance only. It is not intended to be a substitute for detailed research or the exercise of professional judgment. Neither Ernst & Young LLP nor any other member of the global Ernst & Young organization can accept any responsibility for loss occasioned to any person acting or refraining from action as a result of any material in this publication. On any specific matter, reference should be made to the appropriate advisor. Page 46