ANALYTICS MODERNIZATION TRENDS, APPROACHES, AND USE CASES. Copyright 2013, SAS Institute Inc. All rights reserved.





STUNNING FACT
Making the Modern World: Materials and Dematerialization - Vaclav Smil

Trends in Platforms
[Charts: cost per terabyte by platform (Hadoop, Microsoft PDW, Oracle, Greenplum, Teradata, Vertica); cost per gigabyte, 2009 versus today]

COST OF STORAGE, MEMORY, COMPUTING
- In 2000 a GB of disk cost $17; today < $0.07
- In 2000 a GB of RAM cost $1,800; today < $1
- In 2009 a TB of RDBMS cost $70K; today < $20K
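The price points above imply rough reduction factors, which the short Python sketch below works out. The dictionary labels and the helper name `reduction_factor` are illustrative only; because the slide gives the current prices as upper bounds ("<"), each computed factor should be read as a lower bound.

```python
# Price points quoted on the slide, in US dollars: (earlier price, current price).
# The current prices are upper bounds ("<"), so each factor below is a lower bound.
prices = {
    "GB of disk (2000 vs. today)": (17.0, 0.07),
    "GB of RAM (2000 vs. today)": (1800.0, 1.0),
    "TB of RDBMS (2009 vs. today)": (70_000.0, 20_000.0),
}


def reduction_factor(then: float, now: float) -> float:
    """Return how many times cheaper the resource has become."""
    return then / now


factors = {name: reduction_factor(then, now) for name, (then, now) in prices.items()}

for name, factor in factors.items():
    print(f"{name}: at least {factor:,.1f}x cheaper")
```

Disk and memory fell by two to three orders of magnitude, while the RDBMS figure moved by a single-digit factor over a shorter period.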

Shift in Mindset
Scarcity:
- Technology constrained
- Process-centric
- Focus on cost control
- Everything is forbidden unless it is permitted
Abundance:
- Technology empowered
- Discovery-centric
- Focus on value
- Everything is permitted unless it is forbidden

ADVANCED ANALYTICS
- TEXT ANALYTICS: Finding treasures in unstructured data like social media or survey tools that could uncover insights about consumer sentiment
- FORECASTING: Leveraging historical data to drive better insight into decision-making for the future
- INFORMATION MANAGEMENT
- OPTIMIZATION: Analyze massive amounts of data in order to accurately identify areas likely to produce the most profitable results
- DATA MINING: Mine transaction databases for data of spending patterns that indicate a stolen card
- STATISTICS

CURRENT TRENDS IN ANALYTICS
- Complex business problems are driving analytics innovation
- Speed will be of the essence
- Leverage analytics to unlock the information contained in unstructured data
- Operationalizing analytics

VALUE OF BIG DATA
[Figure, built up over several slides: a two-by-two grid with "Questions We Know" versus "Questions We Don't Know" on one axis and "Value of Data Known" versus "Value of Data Unknown" on the other. Example questions shown on the grid: "Sales are down!", "Why?", "What's the cause?", "How should we adjust?". Later builds overlay "+" and "EDW" labels and add the roles Data Scientist, Statistician, Manager, and Business Analyst.]

SAS ON HADOOP

SAS BIG DATA STRATEGY SAS AREAS


SAS WITHIN THE HADOOP ECOSYSTEM
[Layered architecture diagram, top to bottom]
- User types: SAS User, Next-Gen SAS User
- User Interface: SAS Enterprise Guide, SAS Data Integration, SAS Enterprise Miner, SAS Visual Analytics, SAS In-Memory Statistics for Hadoop
- Metadata: SAS Metadata
- Data Access: Base SAS & SAS/ACCESS to Hadoop, SAS/ACCESS to Impala, In-Memory Data Access
- Data Processing: Pig, Hive, MapReduce, Impala, SAS Embedded Process Accelerators, SAS LASR Analytic Server, SAS High-Performance Analytic Procedures (MPI based)
- File System: HDFS

SAS/ACCESS TO HADOOP
[Diagram: the SAS server sends HiveQL to Hadoop]

SAS/ACCESS TO CLOUDERA IMPALA
General-purpose SQL query engine:
- should work both for analytical and transactional workloads
- will support queries that take from milliseconds to hours
- low latency response, 10-100x faster than Hive
Runs directly within Hadoop:
- deploy on existing Hadoop clusters
- reads widely used Hadoop file formats
- talks to widely used Hadoop storage managers
- runs on same nodes that run Hadoop processes
High performance:
- C++ instead of Java
- runtime code generation
- completely new execution engine that does not build on MapReduce

SAS / EMBEDDED PROCESS
[Diagram: the SAS server pushes SAS DATA step and DS2 code to Hadoop]
- SAS/Scoring Accelerator for Hadoop
- SAS/Code Accelerator for Hadoop (2014 Q3)
- SAS/Data Quality Accelerator for Hadoop (2014 Q3)
- SAS/Data Director* (Name TBD 2014 Q3)

    proc ds2;
      /* thread ~ equivalent to a mapper */
      thread map_program;
        method run();
          set dbmslib.intab;
          /* program statements */
        end;
      endthread;
      run;

      /* program wrapper */
      data hdf.data_reduced;
        dcl thread map_program map_pgm;
        method run();
          set from map_pgm threads=n;
          /* reduce steps */
        end;
      enddata;
      run;
    quit;
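For readers unfamiliar with DS2, the thread-plus-wrapper pattern on this slide can be loosely compared to a map step followed by a reduce step. The Python sketch below is only an analogy under stated assumptions: the `amount` column, the filtering rule, and the sample partitions are hypothetical, and nothing here reflects how the SAS Embedded Process actually executes inside Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def map_program(partition):
    """Mapper analogue of the DS2 thread: process one partition of input rows."""
    # Hypothetical "program statements": sum the positive amounts in this partition.
    return sum(row["amount"] for row in partition if row["amount"] > 0)


def run(partitions, threads=4):
    """Wrapper analogue of the DS2 data program: fan out to threads, then reduce."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(map_program, partitions))
    # Reduce step: combine the per-thread partial results.
    return sum(partials)


# Hypothetical input table, already split into partitions (much as HDFS blocks would be).
intab = [
    [{"amount": 10}, {"amount": -3}],
    [{"amount": 5}, {"amount": 7}],
]
total = run(intab, threads=2)
print(total)
```

As in the DS2 program, the per-partition logic lives in one unit and the combining logic in another, so the degree of parallelism (`threads`) can change without touching either.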

SAS DATA LOADER
What directive do you want to perform?
- Saved Directives: Open a previously created directive to run, view, or edit.
- Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
- Chain Directives Together: Run a number of directives in a specific order.
- Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
- Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
- Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
- Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
- Transform Data in Hadoop: Transform the data in a Hadoop data file.
- Verify Mailing Address: Check the validity of the mailing address data in a table.
- Profile Data: Create a report profiling the data in a table.
- Generate Business Rules: Analyze data in a table and generate business rules.
- Send Data for Remediation: Select data to send to the remediation queue for further action.

SAS / HIGH PERFORMANCE ANALYTICS
[Diagram: the SAS server runs SAS HPA procedures on Hadoop]
- SAS High-Performance Statistics
- SAS High-Performance Data Mining
- SAS High-Performance Text Mining
- SAS High-Performance Econometrics
- SAS High-Performance Forecasting
- SAS High-Performance Optimization

SAS / HIGH PERFORMANCE ANALYTICS
Workflow stages: Prepare, Explore / Transform, Model
Procedures: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPCORR, HPREDUCE, HPIMPUTE, HPBIN, HPLOGISTIC, HPREG, HPNEURAL, HPNLIN, HPCOUNTREG, HPMIXED, HPSEVERITY, HPFOREST, HPSVM, HPDECIDE, HPQLIM, HPLSO, HPSPLIT, HPTMINE, HPTMSCORE

IN-MEMORY (LASR BASED) SOLUTIONS ON HADOOP
[Diagram: SAS analytic Hadoop environment. Web clients and applications (Data Director*, Visual Analytics, Visual Statistics, In-Memory Statistics, Visual Scenario Designer) run in memory on the SAS LASR Analytic Server on top of Hadoop. Data sources shown: ERP, SCM, CRM, images, audio and video, machine logs, text, web and social.]

SAS VISUAL ANALYTICS
- Interactive exploration, dashboards and reporting
- Auto-charting automatically picks the best graph
- Forecasting, scenario analysis, decision trees and other analytic visualizations
- Text analysis and content categorization
- Feature-rich mobile apps for iPad and Android


SAS VISUAL STATISTICS
- Interactive, visual application for statistical modeling and classification
- Multiple methods: Logistic, Regression, GLM, Trees, Forest, Clustering and more
- Model comparison and assessment
- Group BY processing


SUMMARY: SAS ON HADOOP OPTIONS
- SAS/ACCESS for Hadoop
- SAS Accelerators (Scoring, Code, Data Quality)
- High Performance Analytics
- Visual Analytics
- Visual Statistics
- In-Memory Statistics for Hadoop (coding for data scientists)

USE CASES

A LARGE CANADIAN BANK
- ~20 million customers
- ~50 countries
- ~85,000 employees
Customer Pain: Building good models takes too long!

ITERATIVE APPROACH: TUNING MULTIPLE CUSTOMER STATE DATA MARTS
[Diagram labels: Multiple Iterations; Segmented Models; Opportunity; Acquisition; Baseline Models]

MODELING RESULTS
- Projected profit increase to client (cumulative): $6 million
- POC objective was to build 3 models; SAS modelers built 10 models
- POC validated and monetized the business impact of High Performance Data Mining and SAS Data Management
- Better results
- Increased productivity
- Customer proceeds with HPA and Data Management

LARGE TELECOMMUNICATIONS COMPANY (AP REGION)
Wireless Group
- 70+ million subscribers, 50+ million active
- 98% are pre-paid, the rest are post-paid
- With at least 20 SMS per day per active subscriber, more than 1B SMS are processed daily
Wireline and Broadband
- ~2 million subscribers for residential/individual lines
- ~200,000 for commercial business
Customer Pain Point
- Volume of data and complexity of requirements have outgrown the legacy infrastructure
- Processing time limits creativity and fine-grained campaigns
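The daily SMS volume follows directly from the subscriber figures. A quick check in Python, using the lower bounds quoted on the slide; the per-second rate additionally assumes, purely for illustration, that traffic is spread evenly over 24 hours:

```python
active_subscribers = 50_000_000   # "50+ mil are active" (lower bound from the slide)
sms_per_subscriber_per_day = 20   # "at least 20 SMS/Day/Active Subscriber"

# Minimum number of SMS processed per day.
daily_sms = active_subscribers * sms_per_subscriber_per_day
print(f"{daily_sms:,} SMS per day at minimum")

# Average sustained rate under the even-spread assumption (not stated on the slide).
per_second = daily_sms / (24 * 60 * 60)
print(f"about {per_second:,.0f} SMS per second on average")
```

Real traffic peaks well above the average, so the even-spread figure understates what the infrastructure must absorb at busy hours.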

BAU TO SAS HPA
[Architecture diagram. Labels: Planning; High Performance Analytics; Data Sources (Exadata, other sources, legacy sources, RDBMS, SAS datasets); Grid Computing; In-Memory Analytics; SAS 9.3 and SAS 9.4; SAS DI-EG, SAS CM, SAS HPAS/EM, SAS VA; High Performance Switch; SAN storage]

32 X SERVERS (LOGISTIC REGRESSION)

Configuration                         Workflow Step                 CPU Runtime    Ratio
Client, 24 cores                      Explore (100K)                00:01:07:17      4.2
                                      Partition                     00:07:54:04     19.5
                                      Impute                        00:01:19:84      7.7
                                      Transform                     00:09:45:01     13.2
                                      Logistic Regression (Step)    04:09:21:61    131.5
                                      Total                         04:29:27:67    106.1
HPA Appliance, 32 x 24 = 768 cores    Explore                       00:00:15:81
                                      Partition                     00:00:21:52
                                      Impute                        00:00:21:47
                                      Transform                     00:00:44:28
                                      Logistic Regression           00:01:37:99
                                      Total                         00:02:21:07

Acceleration by factor 106!

32 X SERVERS (NEURAL NET)

Configuration                         Workflow Step    CPU Runtime    Ratio
Client, 24 cores                      Explore          00:01:07:17      4.2
                                      Partition        01:01:09:31    170.5
                                      Impute           00:02:45:81      7.7
                                      Transform        01:26:06:22    116.7
                                      Neural Net       18:21:28:54    478.9
                                      Total            20:52:37:05    313
HPA Appliance, 32 x 24 = 768 cores    Explore          00:00:15:81
                                      Partition        00:00:21:52
                                      Impute           00:00:21:47
                                      Transform        00:00:44:28
                                      Neural Net       00:02:17:40
                                      Total            00:04:00:48

Acceleration by factor 322!
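The Ratio column appears to be the client runtime divided by the appliance runtime for the same step. The Python sketch below shows that calculation for four steps of the neural-net run above. It assumes the timestamps read as hours:minutes:seconds:hundredths, which the slide does not state; the function names are illustrative.

```python
def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS:hh' to seconds, assuming the last field is hundredths of a second."""
    hours, minutes, seconds, hundredths = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + hundredths / 100


def speedup(client: str, appliance: str) -> float:
    """Ratio of client runtime to appliance runtime for one workflow step."""
    return to_seconds(client) / to_seconds(appliance)


# (client runtime, appliance runtime) pairs copied from the neural-net run.
steps = {
    "Explore": ("00:01:07:17", "00:00:15:81"),
    "Partition": ("01:01:09:31", "00:00:21:52"),
    "Impute": ("00:02:45:81", "00:00:21:47"),
    "Transform": ("01:26:06:22", "00:00:44:28"),
}

ratios = {step: round(speedup(c, a), 1) for step, (c, a) in steps.items()}
for step, ratio in ratios.items():
    print(f"{step}: {ratio}x")
```

Under this reading, the four computed values match the ratios printed on the slide for these steps.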

THANK YOU sas.com