ADVANCED ANALYTICS AND FRAUD DETECTION: THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE

Big Data

Big Data: What Tax Agencies Are, or Will Be, Seeing!
- Large and increasing data volumes
- New and emerging data types and sources
- New multi-structured data types with unknown relationships, which require processing the data regardless of size to discover insights (examples: web logs, sensor networks, social networks, text)
- Increased reporting requirements, such as merchant card reporting (Form 1099-K) and cost basis reporting on securities sales (Form 1099-B)

Key points:
- Analyze all the data, not just random samples
- Process data fast enough to detect and prevent fraud
- Keep a single repository of the data

More's Law (as in more data): we are now looking at zettabytes (1 ZB = 1 trillion gigabytes).
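
The unit arithmetic behind the zettabyte figure can be checked with a short Python sketch. It assumes decimal (SI) units, where each prefix step is a factor of 1,000; in binary units (ZiB and GiB) the ratio would instead be 2**40, slightly larger.

```python
# Decimal (SI) storage units: each prefix step is a factor of 1,000.
GIGABYTE = 10**9    # bytes
ZETTABYTE = 10**21  # bytes


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert whole zettabytes to gigabytes using SI (power-of-ten) units."""
    return zb * ZETTABYTE // GIGABYTE


print(zettabytes_to_gigabytes(1))  # 1000000000000, i.e. one trillion gigabytes
```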

Big Data Challenges Are More Than Data Size: The Four Axes of Big Data
CIOs face significant challenges in addressing the issues surrounding big data. New technologies and applications are emerging and should be investigated to understand their potential value.
Source: "CEO Advisory: Big Data Equals Big Opportunity," Gartner, 31 March 2011.

Data in a Tax Agency: Structured and Unstructured Data
Examples: audit leads, nexus, payments, seller/retailer data, big-box retailers/corporations, social media

Data in a Tax Agency: Structured and Unstructured Data
Examples: call center data, web logs, audit leads, nexus, payments, work papers, customs data, case notes, correspondence and emails

Leveraging Data for Taxpayer Education, Compliance and Service Enhancement
Humans are social by nature; social media is just an enabler.
Untapped social network data is EVERYWHERE!
- Existing consumer/taxpayer transaction data and interaction data
- You are not limited to Twitter and Facebook feeds to obtain taxpayer (TP) behavior and data
What if you could determine, by applying text analytics, that a taxpayer who claimed no income in 2011 bought three motorcycles in 2011?
What if you could be notified that a taxpayer had claimed, on a blog, on Facebook, etc., that he cheated your tax department?
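
The first "what if" can be sketched as a simple cross-check in Python. Everything here is hypothetical: the taxpayer IDs, the posts, and the assumption that posts have already been linked to taxpayers. The keyword regex is a deliberately crude stand-in for real text analytics (tokenization, stemming, entity extraction), so treat this as an illustration of the idea, not a detection method.

```python
import re

# Hypothetical inputs: reported income per taxpayer ID, and public posts that
# are assumed to be already linked to those IDs (linking is out of scope here).
reported_income = {"TP001": 0, "TP002": 85000}
posts = [
    ("TP001", "Just bought three motorcycles this summer!"),
    ("TP002", "Bought a new bike for the commute."),
]

# Crude keyword pattern for purchase mentions: the verb plus the next two words.
PURCHASE = re.compile(r"\b(?:bought|purchased)\s+\w+\s+\w+", re.IGNORECASE)


def flag_mismatches(income, texts):
    """Return (taxpayer_id, phrase) pairs where a purchase is mentioned but
    the reported income is zero (a missing return is treated as zero)."""
    flags = []
    for tp_id, text in texts:
        match = PURCHASE.search(text)
        if match and income.get(tp_id, 0) == 0:
            flags.append((tp_id, match.group(0)))
    return flags


print(flag_mismatches(reported_income, posts))  # [('TP001', 'bought three motorcycles')]
```

Treating a missing return as zero reported income is a design choice; an agency might prefer to route those cases to a separate review queue.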

Statistical Modeling
- The most powerful method is to use statistical models to assess fraud risk.
- To build a predictive model, you need to identify some historical known cases.
- Clustering can also be used to find cases with similar characteristics. This won't predict fraud, but it can identify unusual groupings of cases.
- Various modeling options exist:
  - Cluster analysis can help find cases that have similar profiles.
  - Decision trees can help identify drivers of fraud and high-risk cases.
  - Response modeling can provide rankings of overall fraud risk.
[Figure: cluster analysis plot of Transactions vs. Login Time, showing three clusters labeled C1, C2 and C3]
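
The cluster analysis on this slide can be sketched with plain k-means (Lloyd's algorithm) in Python. The nine (login time, transactions) points are made-up, standardized values chosen to form three groups like C1, C2 and C3 on the slide; the farthest-first initialization is one common deterministic choice, assumed here so the result is repeatable. As the slide notes, the output is a grouping of similar cases, not a fraud prediction.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=20):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical standardized (login_time, transactions) pairs.
data = [(-1.2, -1.1), (-1.0, -1.3), (-1.1, -0.9),
        (0.1, 0.0), (0.0, 0.2), (-0.1, -0.1),
        (1.2, 1.1), (1.0, 1.3), (1.1, 1.0)]
centroids, labels = kmeans(data, k=3)
print(labels)
```

Cases that land in a small or distant cluster are candidates for review; the decision-tree and response-model options on the slide would additionally need labeled historical cases.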

One Analytic Data Solution
- Strategic & Operational Intelligence (ad hoc/OLAP, predictive analytics, spatial/temporal, active execution): SQL analytics on the Teradata Integrated Data Warehouse, fed by structured data (CRM, SCM, ERP, transactions, third-party data)
- Big Data Insight (pattern analysis, path analysis, graph analysis): SQL-MapReduce analytics on the Aster Data Analytic Platform, fed by multi-structured data (web logs, text, social media, machine data)

In-Database Analytic Processing: Enabling Better, Faster Insight
- Reporting and OLAP
- Advanced analytics
- Advanced visualization
- Text analytics
- Parallel performance

Who is Teradata?
- Global leader in enterprise data warehousing
- Headquartered in Ohio; 9,200+ associates
- Analytic solutions and consulting services
- The leader in Gartner's Leaders Quadrant since 1999
- U.S. publicly traded software company; S&P 500 member, listed on the NYSE as TDC
- Founded in 1979; public launch in 2007
- Global presence and world-class customer list: more than 1,300 customers and more than 2,500 installations
- 28 federal and state partners
Teradata Tax Team: deep tax domain expertise in compliance, customer service and business intelligence
Extended Appliance Family, launched 2008: simple, powerful, affordable!

GARTNER MAGIC QUADRANT FOR DATA WAREHOUSE DBMS, 2012: Teradata is THE Leader and has been since 1999!
Source: Magic Quadrant for Data Warehouse Database Management Systems, Mark Beyer, Donald Feinberg, Merv Adrian, Roxanne Edjlali, 2/6/12

Teradata Workload-Specific Platform Family
- 560 Data Mart Appliance (scales up to 12TB): test/development or smaller data marts
- 1650 Extreme Data Appliance (up to 186PB): analytical archive, deep-dive analytics
- 2690 Data Warehouse Appliance (up to 315TB): strategic intelligence, decision support system, fast scan
- 4600 Extreme Performance Appliance (up to 18TB): operational intelligence, lower volume, high performance
- 66XX Active Enterprise Data Warehouse (up to 92PB): strategic & operational intelligence, real-time update, active workloads
- Aster MapReduce Appliance (up to 5PB): discovery platform for big data analytics, with embedded SQL-MapReduce for new data types and sources

The Teradata Difference: Scalability Across Multiple Dimensions
- Teradata can scale simultaneously across multiple dimensions, driven by business!
- The competition scales one dimension at the expense of others, limited by technology!
Dimensions: data volume (raw, user data), workload management, query concurrency, data freshness, query complexity, query freedom, schema sophistication, query data volume.
8/14/2012 Teradata Confidential

Teradata Database: The Foundation
Automatic built-in functionality and easy Set & Go optimization options:
- Fast query performance: a "parallel everything" design and the smart Teradata optimizer enable fast query execution across platforms
- Quick time to value: simple setup steps with automatic, hands-off distribution of data, along with integrated load utilities, result in rapid installations
- Simple to manage: DBAs never have to set parameters, manage table space, or reorganize data
- Responsive to business change: a fully parallel MPP shared-nothing architecture scales linearly across data, users and applications, providing consistent and predictable performance and growth
- Powerful, embedded analytics: in-database data mining, virtual OLAP/cubes, and pre-built and custom application objects (user-defined functions) drive efficient and differentiated business insight
- Advanced workload management: options by user, application, time of day and CPU exceptions
- Intelligent scan elimination: Set & Go options reduce full-file scanning (Primary, Secondary, Multi-level Partitioned Primary, Aggregate Join Index, Sync Scan)
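
The hands-off data distribution and shared-nothing scaling described above rest on hashing each row's primary-index value to a parallel unit. The Python sketch below illustrates that idea only: MD5 is a stand-in hash, the four units are hypothetical, and none of this is Teradata's actual hashing algorithm or API.

```python
import hashlib
from collections import defaultdict

NUM_UNITS = 4  # hypothetical number of shared-nothing parallel units


def unit_for(index_value, num_units=NUM_UNITS):
    """Map a primary-index value to a unit with a stable hash (MD5 stand-in)."""
    digest = hashlib.md5(str(index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_units


def distribute(rows, key, num_units=NUM_UNITS):
    """Place each row on the unit chosen by hashing its primary-index column."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[key], num_units)].append(row)
    return units


def parallel_total(units, field):
    """Each unit sums only its own rows; the partial sums are then combined."""
    partials = [sum(row[field] for row in unit_rows) for unit_rows in units.values()]
    return sum(partials)


rows = [{"taxpayer_id": f"TP{i:03d}", "amount": i * 10} for i in range(1, 101)]
units = distribute(rows, "taxpayer_id")
print({unit: len(unit_rows) for unit, unit_rows in sorted(units.items())})
print(parallel_total(units, "amount"))  # 50500
```

Because the same key always hashes to the same unit, all rows for one taxpayer are co-located. Plain modulo hashing moves most rows when the unit count changes, so production systems typically add an indirection such as a bucket-to-unit map.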

Analytical Ecosystem: The Ecosystem Is the Warehouse
[Diagram: Teradata 560, 1650, 2650 and 66XX platforms alongside Aster Data SQL-MapReduce]

Teradata Aster Unified Big Data Architecture for the Enterprise
- Users: engineers, data scientists, quants, business analysts
- Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, visualization, etc.
- Platforms: discovery platform, integrated data warehouse, and a capture, store, refine layer
- Data sources: audio/video, images, text, web & social, machine logs, CRM, SCM, ERP

Aster SQL-MapReduce: What Is It, and Why Is It Important to In-Database Analytics?
- Patented framework for advanced analytics that are hard to express in SQL
  - Couples SQL (relational) with MapReduce (SQL-MapReduce); it is invoked from SQL and automatically parallelized
  - Includes a library of pre-packaged analytic modules
- Architecture for diverse, embedded analytics processing
  - Supports custom analytics written in a variety of languages, e.g. Java
- Combines SQL and visual tools
  - Makes MapReduce accessible from SQL and SQL-based tools (standard BI tools)
[Diagram: applications issuing SQL and SQL-MapReduce calls to an Aster Data nCluster]
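
The core SQL-MapReduce idea, a function invoked from SQL that runs independently on each partition of its input, can be sketched in Python. The helper run_partitioned is hypothetical and only mimics the shape of a call like SELECT * FROM f(ON t PARTITION BY k ORDER BY c); it is not the Aster API. A thread pool stands in for the cluster's parallel workers, so it shows partition independence rather than real speedup (CPU-bound Python threads are limited by the GIL).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def run_partitioned(rows, partition_key, order_key, func, workers=4):
    """Group rows by partition_key, sort each group by order_key, and apply
    func to every group independently; func returns a list of output rows."""
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    groups = [list(g) for _, g in groupby(ordered, key=itemgetter(partition_key))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, groups))  # map preserves group order
    return [out_row for group_rows in results for out_row in group_rows]


def first_and_count(group):
    """Example per-partition function: first event and event count."""
    return [{"id": group[0]["id"], "first": group[0]["event"], "count": len(group)}]


events = [
    {"id": "B", "ts": 2, "event": "login"},
    {"id": "A", "ts": 5, "event": "payment"},
    {"id": "A", "ts": 1, "event": "login"},
    {"id": "B", "ts": 1, "event": "inquiry"},
]
print(run_partitioned(events, "id", "ts", first_and_count))
# [{'id': 'A', 'first': 'login', 'count': 2}, {'id': 'B', 'first': 'inquiry', 'count': 2}]
```

Because no partition reads another partition's rows, the per-partition calls can be spread across as many workers as the platform provides.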

Ease of Development and Reuse: Analytic Foundation, 50+ Out-of-the-Box Modules
Modules:
- Path analysis: discover patterns in rows of sequential data
- Statistical analysis: high-performance processing of common statistical calculations
- Relational analysis: discover important relationships among data
Business-ready SQL-MapReduce functions:
- nPath: complex sequential analysis for time series analysis and behavioral pattern analysis
- Sessionization: identifies sessions from time series data in a single pass over the data
- Attribution: operator to help ad networks and websites distribute credit
- Histogram: function for generating histograms
- Decision trees: native implementation of parallel random forests
- Approximate percentiles and distinct counts: calculate percentiles and counts within a specified variance
- Correlation: characterizes the strength of the relationship between different data fields
- Regression: performs linear or logistic regression between an output variable and a set of input variables
- Averages: calculate moving, weighted, exponential or volume-weighted averages over a window of data
- Graph analysis: finds the shortest path from a given node to all other nodes in a graph
- Tokenization: splits strings into individual words to assist text processing
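
Sessionization can be sketched in Python as a single pass over time-ordered events: a new session starts whenever the gap since the previous event exceeds a timeout. The 30-minute timeout and the timestamps (in seconds) are assumptions for illustration, and the function below is not the interface of the packaged Sessionization function.

```python
def sessionize(timestamps, timeout):
    """Return a session number for each timestamp (ascending order).

    A new session begins when the gap from the previous event is greater
    than `timeout`; the data is scanned exactly once.
    """
    sessions = []
    session_id = 0
    previous = None
    for ts in timestamps:
        if previous is not None and ts - previous > timeout:
            session_id += 1
        sessions.append(session_id)
        previous = ts
    return sessions


# Hypothetical web-log timestamps (seconds) for one user, 30-minute timeout.
print(sessionize([0, 30, 50, 4000, 4020, 9000], timeout=1800))  # [0, 0, 0, 1, 1, 2]
```

In a partitioned setting this would run once per user, with the events of each user ordered by timestamp.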

Ease of Development and Reuse: Analytic Foundation, 50+ Out-of-the-Box Modules
Modules:
- Text analysis: derive patterns in textual data
- Cluster analysis: discover natural groupings of data points
- Data transformation: transform data for more advanced analysis
SQL-MapReduce analytic functions:
- Text processing: counts occurrences of words, identifies roots, and tracks relative positions of words and multi-word phrases
- Text partition: analyzes text data over multiple rows
- Levenshtein distance: computes the distance between two words
- k-means: clusters data into a specified number of groupings
- Canopy: partitions data into overlapping subsets within which k-means is performed
- Minhash: buckets high-dimensional items for cluster analysis
- Basket analysis: creates configurable groupings of related items from transaction records in a single pass
- Collaborative filter: predicts the interests of a user by collecting interest information from many users
- Unpack: extracts nested data for further analysis
- Pack: compresses multi-column data into a single column
- Antiselect: returns all columns except the specified columns
- Multicase: case statement that supports row matches for multiple cases
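
Levenshtein distance is the minimum number of single-character insertions, deletions or substitutions needed to turn one string into another. A common way to compute it is the Wagner-Fischer dynamic program, sketched here in Python with two rolling rows. In a tax setting it could, for example, help match slightly misspelled taxpayer or business names, though that use is an assumption, not something the slide states.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (Wagner-Fischer, two rows of memory)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```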

Unified Big Data Architecture for the Enterprise
- Users: engineers, data scientists, quants, business analysts
- Tools and languages: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, visualization, etc.
- Platforms: discovery platform, integrated data warehouse
- Data sources: audio/video, images, text, web & social, machine logs, CRM, SCM, ERP

Aster SQL-MapReduce and Hadoop MapReduce
Aster SQL-MapReduce:
- Customized MapReduce deployed via SQL-MR and BI and visualization tools
- Easy-to-manage database
- 50+ packaged SQL-MapReduce analytics
- SQL, the language of business
- Integrated development environment (IDE)
Hadoop MapReduce:
- Customized MapReduce deployed via application code and people
- File system
- Batch processing
- Requires lots of coding

Aster SQL-MapReduce and Hadoop: An nPath Example
On the Aster SQL-MapReduce side, the analysis is a single nPath query that, for each sba_id ordered by datestamp, finds one or more repetitions of an other event immediately followed by a reverse-fee event:

    SELECT *
    FROM npath (
      ON ( ... )               -- input query not shown on the slide
      PARTITION BY sba_id
      ORDER BY datestamp
      MODE (NONOVERLAPPING)
      PATTERN ('(OTHER_EVENT FEE_EVENT)+')
      SYMBOLS (
        event LIKE '%REVERSE FEE%' AS FEE_EVENT,
        event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT
      )
      RESULT ( ... )           -- result columns not shown on the slide
    ) n;

On the Hadoop MapReduce side, the same analysis would be customized MapReduce deployed via application code and people, running as batch processing over the file system and requiring lots of coding.
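
To make the nPath query's semantics concrete, here is a rough Python approximation: partition by sba_id, order by datestamp, map each row to a symbol using the same "contains REVERSE FEE" test, and find non-overlapping matches of the pattern. It is an illustrative sketch only; the sample rows are hypothetical, and it does not reproduce nPath's full matching rules or its RESULT clause, which the slide omits.

```python
import re
from itertools import groupby
from operator import itemgetter


def symbol(event):
    """SYMBOLS clause: 'F' for FEE_EVENT, 'O' for OTHER_EVENT (case-sensitive)."""
    return "F" if "REVERSE FEE" in event else "O"


def fee_pattern_matches(rows):
    """Return {sba_id: [matched runs]} for the pattern (OTHER_EVENT FEE_EVENT)+."""
    matches = {}
    ordered = sorted(rows, key=itemgetter("sba_id", "datestamp"))  # PARTITION BY / ORDER BY
    for sba_id, group in groupby(ordered, key=itemgetter("sba_id")):
        events = list(group)
        symbols = "".join(symbol(r["event"]) for r in events)
        # re.finditer yields non-overlapping matches, analogous to MODE (NONOVERLAPPING).
        runs = [events[m.start():m.end()] for m in re.finditer(r"(?:OF)+", symbols)]
        if runs:
            matches[sba_id] = runs
    return matches


rows = [
    {"sba_id": 1, "datestamp": 1, "event": "PAYMENT"},
    {"sba_id": 1, "datestamp": 2, "event": "REVERSE FEE"},
    {"sba_id": 1, "datestamp": 3, "event": "PAYMENT"},
    {"sba_id": 1, "datestamp": 4, "event": "REVERSE FEE"},
    {"sba_id": 2, "datestamp": 1, "event": "PAYMENT"},
    {"sba_id": 2, "datestamp": 2, "event": "PAYMENT"},
]
for sba_id, runs in fee_pattern_matches(rows).items():
    print(sba_id, [[r["datestamp"] for r in run] for run in runs])  # 1 [[1, 2, 3, 4]]
```

The regex quantifier is greedy, so the two back-to-back payment/reverse-fee pairs for sba_id 1 come back as one run.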

Teradata Aster Solutions
- Teradata Aster Software Only
  - Purpose: complex, high-speed analytics for emerging big data
  - Scalability: flexible
  - Sub-segment: massively parallel software solution with embedded SQL-MapReduce analytics for new data types and sources
- Teradata Aster Cloud Edition
  - Purpose: Teradata Aster nCluster for Amazon Web Services, AppNexus, Dell's Data Cloud and Terremark
  - Scalability: elastic
  - Sub-segment: on-demand extreme scaling with no downtime and always-on data cloud availability for high-performance, next-generation analytics for big data
- Aster MapReduce Appliance
  - Purpose: integrated discovery platform
  - Scalability: up to 5PB
  - Sub-segment: embedded SQL-MapReduce analytics on Teradata hardware

Value Proposition: Comparing the Aster Appliance vs. Aster Software-Only
- Appliance: the customer wants a ready-to-run integrated solution with Teradata server management and Teradata support
- Software-only: the customer wants to use commodity hardware or to run in the cloud
Who supports what (appliance / software-only):
- Hardware: Teradata / customer
- Software: Teradata / Teradata
- OS: Teradata / customer
- Network: Teradata / customer
- Setup: Teradata / customer
- Issues: Teradata / customer

Thank You! What will you do differently TOMORROW? Questions?