Taking Data Analytics to the Next Level: Implementing and Supporting Big Data Initiatives
What Is Big Data and How Is It Applicable to Anti-Fraud Efforts?
Definition
Gartner: Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Why Big Data? Fact Gathering on an Investigation or Proactive Compliance Program
Interviews: draw on both document analysis and financial and operational analysis.
Financial and operational analysis (structured data): sales records; payment or expense details; selected general ledger accounts; financial reports and analysis.
Document analysis (unstructured data): email and user documents; social media; corporate document repositories; news feeds and research.
IBM Projection: Massive Explosion of Data
The Dawn of Big Data: the uncertainty of new information is growing alongside its complexity.
MapReduce
MapReduce is built on the proven concept of divide and conquer: it's much faster to break a massive task into smaller chunks and process them in parallel. In 2004, Google applied the power of parallel, distributed computing to digest the enormous amounts of data produced by its daily operations; the result was a group of technologies and architectural design philosophies known as MapReduce.
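The divide-and-conquer idea can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not Hadoop or Google's implementation: the function names are hypothetical, a thread pool stands in for a cluster of mapper machines (the point is the structure, not the speed), and a dictionary stands in for the framework's shuffle step, which groups every emitted value by key before the reducers run.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_task(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(chunks):
    # Chunks are independent, so map tasks can run concurrently;
    # a thread pool stands in for a cluster of worker machines.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_task, chunks))
    groups = shuffle(chain.from_iterable(mapped))
    return dict(reduce_task(key, values) for key, values in groups.items())


if __name__ == "__main__":
    emails = ["wire transfer approved", "Urgent wire transfer", "transfer pending"]
    print(map_reduce(emails))
    # {'wire': 2, 'transfer': 3, 'approved': 1, 'urgent': 1, 'pending': 1}
```

Because each mapper sees only its own chunk and each reducer sees only one key, either phase can be spread across as many workers as the data requires.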
Hadoop
Hadoop, an implementation of MapReduce, was created by Doug Cutting and is written in Java. After its creation, Hadoop was turned over to the Apache Software Foundation, where it is now maintained as an open-source, top-level project with a global community of contributors. Early deployments included some of the most well-known, technologically advanced organizations, such as Yahoo, Facebook, and LinkedIn.
Pig and Hive
To build applications on Hadoop, developers normally use a programming interface such as Java, Pig, or Hive:
Pig: a specialized, higher-level MapReduce language
Hive: a specialized, SQL-based MapReduce language
Many other programming interfaces exist.
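Hadoop Streaming, one of those other interfaces, lets any program that reads standard input and writes standard output act as a mapper or reducer, and the framework sorts mapper output by key before it reaches the reducer. The sketch below assumes a hypothetical CSV of vendor,amount payment lines and totals payments per vendor, roughly what a Hive query such as SELECT vendor, SUM(amount) FROM payments GROUP BY vendor would express. The script name, the map/reduce argument, and the input layout are all assumptions.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: turn each 'vendor,amount' CSV line into 'vendor<TAB>amount'.

    Assumes a simple two-column CSV with no quoted commas.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        vendor, amount = line.strip().split(",")
        yield f"{vendor}\t{amount}"


def reducer(lines):
    """Reduce step: total the amounts per vendor.

    Relies on the input being sorted so that equal keys are adjacent,
    which Hadoop Streaming guarantees between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for vendor, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(float(amount) for _, amount in group)
        yield f"{vendor}\t{total:.2f}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: python payments.py map   (or: reduce)
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

Outside a cluster, the same flow can be approximated with a shell pipe that sorts between the stages, for example: cat payments.csv | python payments.py map | LC_ALL=C sort | python payments.py reduce (file names hypothetical).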
IBM Survey: Big Data Sources
Transactions: 88%
Log data: 73%
Events: 59%
Emails: 57%
Social media: 43%
External feeds: 42%
Sensors: 42%
Free-form text: 41%
RFID scans or POS data: 41%
Geospatial: 40%
Audio: 38%
Still images/video: 34%
Source: IBM report, Analytics: The Real-World Use of Big Data
IBM Survey: Big Data Analytics Activities
Query and reporting: 91%
Data mining: 77%
Data visualization: 71%
Predictive modeling: 67%
Optimization: 65%
Simulation: 56%
Natural language text: 52%
Geospatial analytics: 43%
Streaming analytics: 35%
Video analytics: 26%
Voice analytics: 25%
Source: IBM report, Analytics: The Real-World Use of Big Data
Fraud Detection Requires a Comprehensive Approach
For fraud detection, analytics can, and should, be applied in every direction the platform supports.
Diagram: an analytics platform at the center, surrounded by the activities Visualize, Score, Report, Mine, Decide, Analyze, Forecast, Plan, Collaborate, Survey, Simulate, Predict, Govern, Model, and Discover.
Recall the Forensic Analytics Maturity Model
The figure, labeled "This Is Big Data," arranges the approaches below by detection rate (low to high) and false positive rate (high to low).
Structured data:
Traditional rules-based queries and analytics: matching, grouping, ordering, joining, filtering
Statistical-based analysis: anomaly detection, clustering, risk ranking
Unstructured data:
Traditional keyword searching: keyword search
Data visualization and text mining: data visualization, drill-down, text mining
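To make the two structured-data approaches concrete, the sketch below runs a traditional rules-based test (matching payments on vendor and amount to surface possible duplicates) and a statistical test (flagging amounts whose z-score exceeds a threshold) over the same hypothetical payment records. The data, the duplicate rule, and the 2.0 threshold are illustrative assumptions, not methods prescribed by the maturity model.

```python
from statistics import mean, stdev

# Hypothetical payment records: (payment_id, vendor, amount)
PAYMENTS = [
    ("P1", "Acme", 1200.0), ("P2", "Acme", 1150.0), ("P3", "Globex", 980.0),
    ("P4", "Initech", 1210.0), ("P5", "Acme", 1200.0), ("P6", "Globex", 1050.0),
    ("P7", "Initech", 1100.0), ("P8", "Acme", 990.0), ("P9", "Globex", 1020.0),
    ("P10", "Umbrella", 9800.0),
]


def duplicate_payments(records):
    """Rules-based test: match payments that share a vendor and an amount."""
    first_seen = {}
    matches = []
    for payment_id, vendor, amount in records:
        key = (vendor, amount)
        if key in first_seen:
            matches.append((first_seen[key], payment_id))
        else:
            first_seen[key] = payment_id
    return matches


def zscore_outliers(records, threshold=2.0):
    """Statistical test: flag amounts more than `threshold` standard deviations from the mean."""
    amounts = [amount for _, _, amount in records]
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical: nothing stands out
    return [pid for pid, _, amount in records if abs(amount - mu) / sigma > threshold]


print(duplicate_payments(PAYMENTS))  # [('P1', 'P5')]
print(zscore_outliers(PAYMENTS))     # ['P10']
```

A single extreme value also inflates the standard deviation it is measured against, so robust alternatives such as the median absolute deviation are often preferred in practice.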
Big Data and Anti-Fraud
Structured and unstructured data is organized and risk-scored for analysis and remediation.
Diagram: email and instant messages, transactional data, 3rd-party data feeds, ERP systems, and social media all feed a central analysis platform.
A More Human Way to Look at Data
Data Points Are Represented as Objects, With Logical Relationships
Relationships between seemingly discrete entities are represented graphically.
Epicenters of activity become immediately discernible.
Supporting documents can be viewed as dynamic objects.
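One simple way to surface epicenters in such a graph is to rank entities by how many distinct connections they have (degree centrality). The sketch below is a hypothetical illustration in plain Python: the entities and link types are invented, and degree is only one of several possible centrality measures.

```python
from collections import defaultdict

# Hypothetical relationships pulled from several sources: (entity, entity, link type)
LINKS = [
    ("Employee A", "Vendor X", "payment approval"),
    ("Employee A", "Employee B", "email"),
    ("Employee A", "Vendor Y", "payment approval"),
    ("Employee B", "Vendor X", "email"),
    ("Employee C", "Vendor Z", "payment approval"),
    ("Employee A", "Invoice 1042", "supporting document"),
]


def build_graph(links):
    """Undirected graph: each entity maps to the set of entities it is linked to."""
    graph = defaultdict(set)
    for left, right, _link_type in links:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def epicenters(graph, top_n=2):
    """Rank entities by degree (number of distinct connections), ties broken by name."""
    ranked = sorted(graph, key=lambda entity: (-len(graph[entity]), entity))
    return [(entity, len(graph[entity])) for entity in ranked[:top_n]]


print(epicenters(build_graph(LINKS)))  # [('Employee A', 4), ('Employee B', 2)]
```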
Search-Around Functionality
Rapidly Build Networks of Interest and Tie In Multiple Data Sources
Easily find entities, documents, events, and other items that are directly related to your selection.
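Search-around behaves like a breadth-first traversal outward from the selected entity. In the minimal sketch below, the graph, the entity names, and the max_hops parameter are hypothetical; max_hops=1 returns only directly related items, and larger values grow a wider network of interest.

```python
from collections import deque

# Hypothetical network: each entity maps to the entities it is directly linked to
GRAPH = {
    "Employee A": {"Vendor X", "Email 17"},
    "Vendor X": {"Employee A", "Bank Account 9"},
    "Email 17": {"Employee A", "Employee B"},
    "Employee B": {"Email 17"},
    "Bank Account 9": {"Vendor X"},
}


def search_around(graph, selection, max_hops=1):
    """Breadth-first search from `selection`; returns {entity: hops away} up to max_hops."""
    hops = {selection: 0}
    queue = deque([selection])
    while queue:
        entity = queue.popleft()
        if hops[entity] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(entity, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[entity] + 1
                queue.append(neighbor)
    del hops[selection]  # report related items, not the selection itself
    return hops


print(sorted(search_around(GRAPH, "Employee A").items()))
# [('Email 17', 1), ('Vendor X', 1)]
print(sorted(search_around(GRAPH, "Employee A", max_hops=2).items()))
# [('Bank Account 9', 2), ('Email 17', 1), ('Employee B', 2), ('Vendor X', 1)]
```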
Geocoding and Heat Maps
Identify Global Epicenters of Activity, as Well as Anomalies
Hotspots of activity are easily identified.
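A heat map of geocoded activity can be approximated by snapping each latitude/longitude pair to a fixed grid and counting points per cell; the densest cells are the hotspots. This sketch assumes hypothetical coordinates and a 1-degree grid, and it ignores the fact that degree-based cells shrink in area away from the equator.

```python
import math
from collections import Counter

# Hypothetical geocoded transactions: (latitude, longitude)
POINTS = [
    (40.71, -73.95), (40.75, -73.99), (40.60, -73.80),  # New York area
    (51.51, -0.13), (51.49, -0.10),                     # London area
    (-33.87, 151.21),                                   # Sydney
]


def grid_cell(lat, lon, cell_deg=1.0):
    """Snap a coordinate to the south-west corner of its cell_deg x cell_deg grid cell."""
    return (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)


def heat_map(points, cell_deg=1.0):
    """Count points per grid cell; high counts are hotspots, isolated counts may merit review."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


counts = heat_map(POINTS)
print(counts.most_common(1))  # [((40.0, -74.0), 3)]
```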
Employee-Risk Ranking
Scored by Custodian and Time Period Based on Multiple Criteria
1. Keywords (scaling: 3): Percentage of EY-ACFE Fraud Triangle keywords around pressure, opportunity, and rationalization in email and IM communications.
2. T&E analysis (scaling: 3): Ranking of T&E out-of-compliance hits and overall email scoring.
3. Sales activity (scaling: 4): Ranking of sales activity, field notes, and sales returns and allowances.
4. User activity (scaling: 2): Percentage of instances within that week in which the custodian sends or receives electronically stored information (ESI) involving people outside the peer group, as identified through hierarchies.
5. 3rd-party risk (scaling: 2): Instances where the employee is linked to high-risk 3rd parties (e.g., customers, vendors, state-owned entities), as determined by hits on OFAC, sanctions, PEP, or adverse media lists, whether in email, T&E, or sales activity.
6. Alias clustering (scaling: 3): Percentage of instances within that week in which the custodian sends or receives ESI involving at least one of the custodian's identified communicative aliases.
7. Emotive tone (scaling: 5): Percentage of instances in which the employee sends or receives ESI expressing negative emotions (angry, frustrated, secretive, etc.), as identified through linguistic analysis.
Example for custodian A (C1 to C7 are the criteria above):
Custodian    C1  C2  C3  C4  C5  C6  C7  Score
Scaling       3   3   4   2   2   3   5
A, Week 1     1   3   3   4   6   2   3     37
A, Week 2     2   2   4   5   3   4   2     45
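Each criterion carries a scaling weight, but the combining formula behind the score column is not stated. The sketch below assumes, hypothetically, that a custodian-week score is the sum of each criterion score multiplied by its scaling weight, then ranks custodian-weeks by that score. The criterion keys and custodians B and C are invented, and the sketch is not meant to reproduce the table's scores.

```python
# Scaling weights for criteria 1-7, keyed by hypothetical criterion names
SCALING = {
    "keywords": 3,
    "t_and_e": 3,
    "sales_activity": 4,
    "user_activity": 2,
    "third_party_risk": 2,
    "alias_clustering": 3,
    "emotive_tone": 5,
}


def risk_score(criterion_scores, scaling=SCALING):
    """Assumed formula: sum of each criterion score times its scaling weight."""
    missing = scaling.keys() - criterion_scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(criterion_scores[name] * weight for name, weight in scaling.items())


def rank(weekly_scores):
    """Order (custodian, week) pairs from highest to lowest assumed risk score."""
    scored = [(key, risk_score(scores)) for key, scores in weekly_scores.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


baseline = dict.fromkeys(SCALING, 1)  # every criterion scored 1
weekly = {
    ("B", "Week 1"): baseline,
    ("C", "Week 1"): {**baseline, "emotive_tone": 3},  # hypothetical spike in negative tone
}
print(rank(weekly))  # [(('C', 'Week 1'), 32), (('B', 'Week 1'), 22)]
```

Because emotive tone carries the largest weight (5), a modest rise in that one criterion is enough to move custodian C to the top of the ranking.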
Employee-Risk Scoring
Screens shown: risk scoring model, peer stratification, dashboard review, and detail-level view.
In the peer stratification view, dots represent clusters of high-risk communications that can be reviewed by clicking.
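Peer stratification can be read as judging each employee against their own peer group rather than the whole population. As a hypothetical sketch (the names, groups, and the 1.5 standard-deviation cutoff are assumptions), the code below flags employees whose risk score sits well above their peer group's mean.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical weekly risk scores and peer groups (e.g., derived from hierarchies)
SCORES = {"Ana": 30, "Ben": 32, "Cy": 31, "Dee": 60, "Eve": 80, "Fay": 78, "Gus": 82}
PEER_GROUP = {"Ana": "Sales", "Ben": "Sales", "Cy": "Sales", "Dee": "Sales",
              "Eve": "Finance", "Fay": "Finance", "Gus": "Finance"}


def peer_outliers(scores, peer_group, k=1.5):
    """Flag employees scoring more than k standard deviations above their peer-group mean."""
    by_group = defaultdict(list)
    for employee, score in scores.items():
        by_group[peer_group[employee]].append(score)
    stats = {group: (mean(vals), pstdev(vals)) for group, vals in by_group.items()}
    flagged = []
    for employee, score in scores.items():
        mu, sigma = stats[peer_group[employee]]
        if sigma > 0 and (score - mu) / sigma > k:
            flagged.append(employee)
    return flagged


print(peer_outliers(SCORES, PEER_GROUP))  # ['Dee']
```

Here Dee is flagged even though every Finance employee has a higher raw score, because Dee stands out within Sales while the Finance scores are typical for their own group.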
Course Recap
Discussion
Contacts
Vincent Walden, CFE, CPA
Ernst & Young LLP
Partner, Assurance Services
Fraud Investigation & Dispute Services
New York, NY
(212) 773-3643
vincent.walden@ey.com