Taking Data Analytics to the Next Level

Implementing and Supporting Big Data Initiatives

What Is Big Data and How Is It Applicable to Anti-Fraud Efforts?

Definition
Gartner: "Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

Why Big Data?
Fact gathering on an investigation or a proactive compliance program draws on three sources:
Interviews, which draw on the findings of the document analysis and of the financial and operational analysis.
Financial and operational analysis (structured data): sales records; payment or expense details; selected general ledger accounts; financial reports and analysis.
Document analysis (unstructured data): email and user documents; social media; corporate document repositories; news feeds and research.

IBM Projection: Massive Explosion of Data
The Dawn of Big Data: the uncertainty of new information is growing alongside its complexity.

MapReduce
MapReduce is built on the proven concept of divide and conquer: it's much faster to break a massive task into smaller chunks and process them in parallel. In 2004, Google applied parallel, distributed computing to digest the enormous amounts of data produced by its daily operations; the resulting group of technologies and architectural design philosophies is known as MapReduce.
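The divide-and-conquer idea can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce` are hypothetical, and threads merely stand in for the cluster nodes that would run the map tasks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text lines."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped):
    """Shuffle: group the emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(lines, n_chunks=4):
    # Divide: deal the input lines round-robin into n_chunks smaller chunks.
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    # Conquer: run the map step on every chunk concurrently (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    docs = ["wire transfer approved", "wire transfer pending", "invoice approved"]
    print(map_reduce(docs))
    # {'wire': 2, 'transfer': 2, 'approved': 2, 'pending': 1, 'invoice': 1}
```

Word counting is the customary teaching example; in an investigation the same pattern could, for instance, count how often each vendor name appears across millions of documents.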

Hadoop
Hadoop, an implementation of MapReduce, was created by Doug Cutting and is written in Java. It was later turned over to the Apache Software Foundation and is now maintained as an open-source, top-level project with a global community of contributors. Early deployments include some of the most well-known, technologically advanced organizations, such as Yahoo, Facebook, and LinkedIn.

Pig and Hive
To build applications on Hadoop, developers normally use a popular programming interface such as Java, Pig, or Hive:
Pig: a specialized, higher-level MapReduce language.
Hive: a specialized, SQL-based MapReduce language.
Many other programming interfaces exist.

IBM Survey: Big Data Sources
Transactions: 88%
Log data: 73%
Events: 59%
Emails: 57%
Social media: 43%
External feeds: 42%
Sensors: 42%
Free-form text: 41%
RFID scans or POS data: 41%
Geospatial: 40%
Audio: 38%
Still images/video: 34%
Source: IBM report, Analytics: The Real-World Use of Big Data

IBM Survey: Big Data Analytics Activities
Query and reporting: 91%
Data mining: 77%
Data visualization: 71%
Predictive modeling: 67%
Optimization: 65%
Simulation: 56%
Natural language text: 52%
Geospatial analytics: 43%
Streaming analytics: 35%
Video analytics: 26%
Voice analytics: 25%
Source: IBM report, Analytics: The Real-World Use of Big Data

Fraud Detection Requires a Comprehensive Approach
For fraud detection, any analytic direction can, and should, be pursued when applying analytics to our platform.
[Figure: an analytics platform surrounded by the capabilities Visualize, Score, Report, Mine, Decide, Analyze, Forecast, Plan, Collaborate, Survey, Simulate, Predict, Govern, Model, and Discover.]

Recall the Forensic Analytics Maturity Model
[Figure: analytic approaches plotted by detection rate (low to high) against false positive rate (high to low), split into structured and unstructured data; one region of the chart is labeled "This Is Big Data".]
Structured data:
Traditional rules-based queries and analytics: matching, grouping, ordering, joining, filtering.
Statistical-based analysis: anomaly detection, clustering, risk ranking.
Unstructured data:
Traditional keyword searching: keyword search.
Data visualization and text mining: data visualization, drill-down, text mining.
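The difference between the two structured-data approaches can be illustrated with a small Python sketch. The payment records, the approved-vendor list, and the 2.0 standard-deviation threshold are all hypothetical: the rules-based query matches each record against a reference list, while the statistical test (a plain z-score, chosen here for simplicity) flags values that sit far from the rest of the population.

```python
from statistics import mean, pstdev

# Hypothetical payment records: (payment_id, vendor, amount).
PAYMENTS = [
    ("P1", "Acme", 120), ("P2", "Acme", 135), ("P3", "Globex", 110),
    ("P4", "Globex", 125), ("P5", "Initech", 130), ("P6", "Initech", 980),
    ("P7", "Umbrella", 118), ("P8", "Acme", 122),
]
# Hypothetical vendor master used for matching.
APPROVED_VENDORS = {"Acme", "Globex", "Initech"}


def rules_based(payments, approved):
    """Traditional rule: match each payment to the vendor master and keep
    the payments whose vendor is not on the approved list."""
    return [pid for pid, vendor, _ in payments if vendor not in approved]


def z_score_outliers(payments, threshold=2.0):
    """Statistical test: flag amounts more than `threshold` population
    standard deviations away from the mean amount."""
    amounts = [amount for _, _, amount in payments]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [pid for pid, _, amount in payments if abs(amount - mu) / sigma > threshold]


print(rules_based(PAYMENTS, APPROVED_VENDORS))  # ['P7']
print(z_score_outliers(PAYMENTS))               # ['P6']
```

Each method catches a different record: the rule finds the unapproved vendor but not the inflated payment, and the z-score finds the inflated payment but not the unapproved vendor.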

Big Data and Anti-Fraud
Structured and unstructured data are organized and risk-scored for analysis and remediation.
[Figure: email and instant messages, transactional data, third-party data feeds, ERP systems, and social media feed into a central analysis platform.]

A More Human Way to Look at Data
Data points are represented as objects with logical relationships.
A graphical representation shows the relationships between seemingly discrete entities.
Epicenters of activity become immediately discernible.
Supporting documents can be viewed as dynamic objects.
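Representing data points as connected objects amounts to building a graph. The Python sketch below uses hypothetical entity names, stores each relationship as an undirected edge, and treats the entities with the most direct relationships (highest degree) as candidate epicenters. Degree is only one simple way to rank nodes and is an assumption here, not the method of any particular tool.

```python
from collections import defaultdict

# Hypothetical relationships extracted from email, payment, and vendor records.
LINKS = [
    ("employee:Lee", "vendor:Northwind"),
    ("employee:Lee", "email:msg-17"),
    ("employee:Diaz", "vendor:Northwind"),
    ("vendor:Northwind", "bank:ACC-991"),
    ("employee:Park", "email:msg-17"),
    ("vendor:Contoso", "bank:ACC-204"),
]


def build_graph(links):
    """Store every entity as a node and every relationship as an undirected edge."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def epicenters(graph, top=2):
    """Rank entities by degree (number of direct relationships), highest first.
    Python's sort is stable, so ties keep their insertion order."""
    return sorted(graph, key=lambda node: len(graph[node]), reverse=True)[:top]


print(epicenters(build_graph(LINKS)))  # ['vendor:Northwind', 'employee:Lee']
```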

Search-Around Functionality
Rapidly build networks of interest and tie in multiple data sources.
Easily find entities, documents, events, etc. that are directly related to your selection.
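Search-around can be modeled as a breadth-first search outward from the selected entity. In this Python sketch the links are hypothetical; `hops=1` returns only directly related entities, as described above, while larger values (an assumption here) widen the network of interest one ring at a time.

```python
from collections import defaultdict, deque

# Hypothetical relationships, e.g. rows of a link table in an analysis platform.
LINKS = [
    ("employee:Lee", "vendor:Northwind"),
    ("employee:Lee", "email:msg-17"),
    ("employee:Diaz", "vendor:Northwind"),
    ("vendor:Northwind", "bank:ACC-991"),
    ("employee:Park", "email:msg-17"),
    ("vendor:Contoso", "bank:ACC-204"),
]


def build_graph(links):
    """Store every entity as a node and every relationship as an undirected edge."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def search_around(graph, start, hops=1):
    """Breadth-first search: return every entity within `hops` links of `start`."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the requested radius
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return {node for node, d in depth.items() if d > 0}


graph = build_graph(LINKS)
print(sorted(search_around(graph, "employee:Diaz")))          # ['vendor:Northwind']
print(sorted(search_around(graph, "employee:Diaz", hops=2)))  # ['bank:ACC-991', 'employee:Lee', 'vendor:Northwind']
```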

Geocoding and Heat Maps
Identify global epicenters of activity, as well as anomalies.
Hotspots of activity are easily identified.
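A basic heat map can be produced by snapping each geocoded event to a grid cell and counting the events in each cell. In this Python sketch the coordinates, the one-degree cell size, and the `min_count` cutoff are hypothetical choices.

```python
import math
from collections import Counter

# Hypothetical geocoded events as (latitude, longitude) pairs.
EVENTS = [
    (40.71, -73.99), (40.72, -73.98), (40.73, -73.97),  # three events near New York
    (51.50, -0.12), (51.51, -0.13),                      # two events near London
    (-33.87, 151.21),                                    # one isolated event near Sydney
]


def heat_map(events, cell_deg=1.0):
    """Snap each point to a cell_deg-by-cell_deg grid cell and count points per cell."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in events
    )


def hotspots(events, cell_deg=1.0, min_count=2):
    """Return (cell, count) pairs holding at least min_count events, densest first."""
    counts = heat_map(events, cell_deg)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_count]


print(hotspots(EVENTS))  # [((40, -74), 3), ((51, -1), 2)]
```

Cells holding a single event, such as the one near Sydney here, are a natural place to start looking for anomalies.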

Employee-Risk Ranking
Employees are scored by custodian and time period based on multiple criteria:
1. Keywords: percentage of EY-ACFE Fraud Triangle keywords around pressure, opportunity, and rationalization in email and IM communications. Scaling: 3
2. T&E analysis: ranking of T&E out-of-compliance hits and overall email scoring. Scaling: 3
3. Sales activity: ranking of sales activity, field notes, and sales returns and allowances. Scaling: 4
4. User activity: percentage of instances within that week where the custodian sends or receives ESI involving people outside their peer group, as identified through hierarchies. Scaling: 2
5. Third-party risk: instances where the employee is linked to high-risk third parties (e.g., customers, vendors, state-owned entities) as determined by hits on OFAC, sanctions, PEP, or adverse media lists, whether in email, T&E, or sales activity. Scaling: 2
6. Alias clustering: percentage of instances within that week where the custodian sends or receives ESI involving at least one (1) of their identified communicative aliases. Scaling: 3
7. Emotive tone: percentage of instances where the employee sends or receives ESI with negative emotions (angry, frustrated, secretive, etc.) identified through linguistic analyses. Scaling: 5
Sample scoring table (columns: Custodian; C1 to C7; Scaling C1 to C7; Score). The extracted rows are incomplete and read as follows:
A, Week 1: 1 3 3 4 6 2 3 | 3 3 4 2 2 3 5
A, Week 2: 2 2 4 5 3 4 2 | 37 45
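The slide lists a scaling factor for each criterion but does not state how the factors are combined into a score. The Python sketch below assumes the simplest reading, a weighted sum in which each criterion value is multiplied by its scaling factor. The custodians and criterion values are hypothetical, and the sketch does not attempt to reproduce the sample scores in the table.

```python
# Scaling factors per criterion, as listed on the slide.
# ASSUMPTION: the score is a weighted sum; the slide does not give the formula.
SCALING = {
    "keywords": 3,
    "t_and_e": 3,
    "sales_activity": 4,
    "user_activity": 2,
    "third_party_risk": 2,
    "alias_clustering": 3,
    "emotive_tone": 5,
}


def risk_score(criteria, scaling=SCALING):
    """Multiply each criterion value by its scaling factor and sum;
    criteria missing from the input count as 0."""
    return sum(criteria.get(name, 0) * factor for name, factor in scaling.items())


def rank_custodians(weekly, scaling=SCALING):
    """Return ((custodian, week), score) pairs sorted from highest to lowest score."""
    scored = [(key, risk_score(values, scaling)) for key, values in weekly.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical criterion values for three custodians in one week.
WEEKLY = {
    ("X", "Week 1"): {"keywords": 2, "emotive_tone": 1},
    ("Y", "Week 1"): {"sales_activity": 1, "third_party_risk": 3},
    ("Z", "Week 1"): {"t_and_e": 1},
}

for (custodian, week), score in rank_custodians(WEEKLY):
    print(custodian, week, score)
# X Week 1 11
# Y Week 1 10
# Z Week 1 3
```

A weighted sum keeps each criterion's contribution easy to explain to a reviewer; a production model might instead normalize or rank the raw values before weighting them.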

Employee-Risk Scoring
[Figure panels: Risk Scoring Model; Peer Stratification; Dashboard Review; Detail-Level View.]
Peer stratification: dots represent clusters of high-risk communications that can be reviewed by clicking.
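Peer stratification compares each employee with a peer group rather than with the whole organization. As a hypothetical Python sketch, the function below groups risk scores by peer group and flags anyone more than `k` population standard deviations above their group's mean; the names, groups, scores, and `k = 1.5` are illustrative assumptions, not values from the deck.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical weekly risk scores: (employee, peer_group, score).
SCORES = [
    ("Ana", "sales", 12), ("Ben", "sales", 14), ("Cho", "sales", 13), ("Dev", "sales", 31),
    ("Eli", "finance", 30), ("Fay", "finance", 28), ("Gus", "finance", 32),
]


def stratify(scores, k=1.5):
    """Flag employees scoring more than k population standard deviations
    above the mean of their own peer group."""
    groups = defaultdict(list)
    for _, group, score in scores:
        groups[group].append(score)
    flagged = []
    for name, group, score in scores:
        values = groups[group]
        mu, sigma = mean(values), pstdev(values)
        if sigma > 0 and (score - mu) / sigma > k:
            flagged.append(name)
    return flagged


print(stratify(SCORES))  # ['Dev']
```

Dev is flagged with a score of 31, while Gus, with 32, is not, because Gus's finance peers score similarly high.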

Course Recap
Discussion

Contacts
Vincent Walden, CFE, CPA
Ernst & Young LLP
Partner, Assurance Services
Fraud Investigation & Dispute Services
New York, NY
(212) 773-3643
vincent.walden@ey.com