Big Data, Start Small! Dr. Frank Säuberlich, Director Advanced Analytics (Teradata International) 26th May 2015




Agenda
Introduction
Big Data and the Emergence of the Logical Data Warehouse Architecture
Starting Small: Example Deployments
Lessons Learned
Summary & Conclusions
Image source: Steven Dyer on www.photoshopcreative.co.uk

It's a Data Revolution: Human Generated, Business Generated, Machine Generated, Interaction Generated.

The data-driven economy is emerging. 1990s: Internet Economy. Now: Data Economy.

The data-driven business puts data at the center.

Data rich, insight poor.

Required Capabilities: The Logical Data Warehouse, a.k.a. UDA

Analysts agree the Logical Data Warehouse is the future of Enterprise Analytical Architecture, even if they can't agree what to call it: Gartner says "Logical Data Warehouse", Forrester says "Enterprise Data Hub".
"We will abandon the old models based on the desire to implement for high-value analytic applications."
"Raw data in an affordable distributed data hub. Firms that get this concept realise all data does not need first-class seating."

Big Data, Starting Small: Example Deployments
Advanced Telco Churn Analysis; Portfolio Optimization; Predictive Maintenance.
A common misconception is that you can't start a Big Data project until you have invested tens of millions of dollars in a fully integrated Logical Data Warehouse, including a petabyte-scale Hadoop cluster and so on. The remainder of this presentation demonstrates that this is not the case and identifies key learnings from customers who have started small with Big Data.

Big Data: small. Telecommunications: Advanced Churn Analysis
Asian mobile telecommunications operator with more than 6M subscribers.
Network and customer service problems had tarnished the company's reputation.
Urgent requirement to supplement the analytics provided by the existing CDR Data Warehouse with analytics that directly measure network performance and its impact on customer churn.

Big Data: small
Inputs: call drop-outs; data drop-outs in web (PDP) sessions; level of call quality (voice and data speed); 3G-to-2G drop-down and length of time on 2G; sentiment analysis from call center records (Didata).
Analyses and outputs: Sessionization; nPath analysis; path to churn; customer experience score; propensity-to-churn model; propensity to churn.
The network data schema is evolving rapidly, so a flexible information model is critical.
Sessionization: pre-packaged SQL-MapReduce function that identifies sessions from time series data in a single pass over the data.
nPath: pre-packaged SQL-MapReduce function for finding sequences of events.
Network data is first correlated in Aster and then stored in Hadoop, to optimise retention costs.
Integration with the EDW for customer reference / profitability data.
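To make the sessionization idea concrete, here is a minimal Python sketch. It is not the Aster SQL-MapReduce function itself; the record layout (subscriber, timestamp in seconds, event name), the 30-minute timeout and the event names are all assumptions chosen for illustration. After sorting, sessions are assigned in a single pass over the events.

```python
from collections import defaultdict


def sessionize(events, timeout_seconds=1800):
    """Assign a session number to each (subscriber, timestamp, event) record.

    A new session starts whenever the gap to the same subscriber's previous
    event exceeds timeout_seconds (a hypothetical default of 30 minutes).
    """
    last_seen = {}                   # subscriber -> timestamp of previous event
    session_no = defaultdict(int)    # subscriber -> current session number
    out = []
    # Sort by subscriber, then time, so one pass is enough.
    for sub, ts, ev in sorted(events, key=lambda e: (e[0], e[1])):
        if sub in last_seen and ts - last_seen[sub] > timeout_seconds:
            session_no[sub] += 1
        last_seen[sub] = ts
        out.append((sub, session_no[sub], ts, ev))
    return out


# Illustrative data only; event names are made up.
events = [
    ("A", 0, "call_drop"),
    ("A", 60, "data_drop"),
    ("A", 4000, "3g_to_2g"),
    ("B", 10, "call_drop"),
]
sessions = sessionize(events)
for row in sessions:
    print(row)
```

The per-session event sequences produced this way are the kind of input a sequence-matching step (such as nPath) would then search for paths to churn.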

Big Data: small
Project Delivery: 3 phases, each of circa 3 months.
Project Effort/Investment: H/W & S/W; 8 PS resources.
Business Value: the operator is able to understand network performance from a customer perspective for the first time, giving improved customer service, reduced churn and reduced revenue leakage from false complaints.

Big Data: smaller. Banking: Portfolio Optimization
Large retail bank that had been forced to foreclose on a very large number of residential mortgages and, as a result, had a very large property portfolio to dispose of.
Needed to understand market conditions and competitor pricing much better in order to ensure a rapid, but orderly and efficient, disposal of these assets.

Big Data: smaller
Project Delivery: initial discovery PoC to demonstrate key concepts was delivered in 3 weeks by a small team of key Teradata and bank staff.
Project Effort/Investment: H/W & S/W; 3 PS resources.
Business Value: the first use case alone has an estimated $2M impact on bottom-line profitability.

Big Data: smallest. Manufacturing Industry: Predictive Maintenance for Trains
Large European train operator wanted to leverage engine sensor data to predict train failure.
Started with a small training set consisting of roughly one million sensor log observations and several thousand engineers' reports describing failure / fix.
Relevant data and their preparation: several million train sensor observations and several thousand engineers' reports.

Exploring the data using path and graph analytics
Affinity graph: which components fail in combination (within the same train); identifies candidates for failure prediction.
Sankey diagram: exploring the path to failure (testing different categorizations of sensor readings as events).
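The affinity-graph idea can be sketched as counting how often pairs of components fail on the same train; each pair count is an edge weight in the graph. This is only an illustration in plain Python, not the graph analytics used in the project, and the train IDs and component names are invented.

```python
from collections import Counter
from itertools import combinations


def cofailure_counts(failures):
    """Count component pairs that failed on the same train.

    failures: iterable of (train_id, component) records (hypothetical layout).
    Returns a Counter keyed by alphabetically ordered component pairs.
    """
    by_train = {}
    for train_id, component in failures:
        by_train.setdefault(train_id, set()).add(component)

    pairs = Counter()
    for components in by_train.values():
        # Each unordered pair on one train contributes one unit of edge weight.
        for a, b in combinations(sorted(components), 2):
            pairs[(a, b)] += 1
    return pairs


# Illustrative data only.
failures = [
    ("T1", "gearbox"), ("T1", "coolant_pump"),
    ("T2", "gearbox"), ("T2", "coolant_pump"), ("T2", "brakes"),
    ("T3", "brakes"),
]
edges = cofailure_counts(failures)
print(edges.most_common(1))
```

Pairs with the highest counts are the natural first candidates to examine as joint predictors of failure.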

Using our understanding to build a predictive model with high quality
Having profiled the predictive variables in this way, we have built a decision tree algorithm to predict engine failures.
Decision tree nodes shown (failure percentage): Node 0: 3.55%; Node 1: 3.41%; Node 2: 3.20%; Node 269: 15.98%; Node 286: 46.32%; Node 287: 0.00%; Node 288: 100.00%.
Split conditions shown: gear power output low daily percentage <= 0.44; coolant temperature high daily pct <= 0.204; gear oil temp high daily percentage <= 0.256; engine temperature high daily pct <= 0.222; gear power output low daily percentage <= 0.628.
Confusion matrix on training and test (holdout) data sets (rows: actual, columns: prediction):
Actual no failure: 99% predicted no failure, 1% predicted failure, on both data sets.
Actual failure: 13% predicted no failure and 87% predicted failure on one data set; 16% and 84% on the other.
High degree of accuracy of the predictive model.
Very similar results on training and test (holdout) data sets (no overfitting).
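For readers who want to reproduce this kind of table, the sketch below shows one way to compute a row-normalised confusion matrix (percentages per actual class). The labels and the toy predictions are assumptions for illustration and are not the project's data. Computing the table on both the training and the holdout set and comparing the rows is the overfitting check described above.

```python
from collections import Counter


def confusion_percentages(actual, predicted, labels=("no failure", "failure")):
    """Return {actual_label: {predicted_label: percent}} with rows summing to 100."""
    counts = Counter(zip(actual, predicted))
    result = {}
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        result[a] = {
            p: (100.0 * counts[(a, p)] / row_total if row_total else 0.0)
            for p in labels
        }
    return result


# Toy example: 8 actual non-failures, 4 actual failures.
actual = ["no failure"] * 8 + ["failure"] * 4
predicted = ["no failure"] * 7 + ["failure"] + ["failure"] * 3 + ["no failure"]
matrix = confusion_percentages(actual, predicted)
for label, row in matrix.items():
    print(label, row)
```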

Big Data: smallest
Project Delivery: first-cut model delivered on a PoC basis in only 2 weeks.
Project Effort/Investment: no up-front investment in H/W & S/W (PoC); 2 PS resources.
Business Value: improved availability through significant reduction of unplanned downtime; reduced labour costs (quicker root cause analysis, improved first-time fix rate, etc.); improved utilisation (more mileage, same trains).

Lessons learned from early deployments
#1 By themselves, a big bucket of data and some fancy analytic technology add no value; start with a business problem, not with a technology (ours or anybody else's).
#2 New Big Analytics is often additive. In many cases, Big Analytics extends and enhances existing analyses and business processes rather than replacing them.
#3 Old business process + expensive new technology = expensive old business process. The objective is not merely to gain insight; the objective is to operationalise that insight so that we change the way we do business.
#4 The time-consuming and expensive part of a traditional Business Intelligence & Analytics project is data integration; maybe we just shouldn't bother?
#5 The failure rate for Analytic Exploration & Discovery is high, so cycle times are critical.

Summary & conclusions

The Logical Data Warehouse is the industry's adaptation to Big Data
How will you deploy? How many / which platforms will you need? How will you integrate them? And which data need to be centralised and integrated?
From the Enterprise Data Warehouse Era to the Logical Data Warehouse (a.k.a. Unified Data Architecture) Era:
1 Multi-structured data
2 Interaction / observation analytics
3 Flat / falling IT budgets, exploding data volumes
4 Agile Exploration & Discovery
5 Operationalisation
Callout: "Give me integrated, high quality data."
Centralise and integrate the data that are widely reused and shared, but integrate all of the analytics.

But equally, don't wait until you have deployed a full Logical Data Warehouse to start your Big Data journey. Exploration & Discovery technology and processes can deliver value for you now and inform how you build out your Logical Data Warehouse.

Thank you very much! Frank Säuberlich frank.saeuberlich@teradata.com