April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.

Transcription:

April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!

Copyright 2000-2016 TIBCO Software Inc. Analyse and Act on Critical Business Moments

Key Take-Aways
Insights are hidden in Historical Data on Big Data Platforms
Machine Learning and Big Data Analytics find these Insights by building Analytics Models
Event Processing uses these Models (without Rebuilding) to take Action in Real Time

Agenda
1) Machine Learning and Big Data Analytics
2) Analysis of Historical Data
3) Real Time Processing
4) Live Demo

Machine Learning
Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
Source: http://www.sas.com

10 Examples of Machine Learning
Spam Detection
Credit Card Fraud Detection
Digit Recognition
Speech Understanding
Face Detection
Shape Detection
Product Recommendation
Medical Diagnosis
Stock Trading
Customer Segmentation
Source: http://machinelearningmastery.com/practical-machine-learning-problems/

Analytics Maturity Model (figure)
Analytics types: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing
Stages: Measure, Diagnose, Predict, Optimize, Alert, Automate
Analytics Maturity: from Immediate Value to the Organization to Long-Term Competitive Advantage
A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases

Agenda
1) Machine Learning and Big Data Analytics
2) Analysis of Historical Data
3) Real Time Processing
4) Live Demo

Analytical Pipeline

Data Acquisition

Data Munging / Wrangling / Mash-up

Data Munging - Transformations

    cust_id  dept  sku    dollar  gift   date
1   104      C     12003    2.40  FALSE  2016-10-17
2   105      A     12005   62.85  FALSE  2016-10-17
3   102      C     12007   69.23  TRUE   2016-10-17
4   104      B     12004    9.33  FALSE  2016-10-18
5   105      C     12010   14.16  TRUE   2016-10-18
6   101      B     12003   90.43  FALSE  2016-10-19
7   103      C     12005   90.97  FALSE  2016-10-19
n

    cust_id  A      B      C      total   # orders  first_date  last_date
1   100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2   101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3   102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4   103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
5   104      31.34   9.33   2.40   43.06  4         2016-10-17  2016-10-20
6   105      62.85   0.00  56.00  118.85  3         2016-10-17  2016-10-20
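The transformation on this slide (one row per order line turned into one row per customer) could be sketched roughly as follows. This is a minimal illustration in plain Python, not the tooling used in the talk; the function name `summarize` and the dictionary keys are assumptions made for the example. It uses only the seven order lines visible on the slide, so its totals will not match the summary table, which was evidently computed from more rows.

```python
# The seven order lines shown on the slide:
# (cust_id, dept, sku, dollar, gift, date)
transactions = [
    (104, "C", 12003, 2.40, False, "2016-10-17"),
    (105, "A", 12005, 62.85, False, "2016-10-17"),
    (102, "C", 12007, 69.23, True, "2016-10-17"),
    (104, "B", 12004, 9.33, False, "2016-10-18"),
    (105, "C", 12010, 14.16, True, "2016-10-18"),
    (101, "B", 12003, 90.43, False, "2016-10-19"),
    (103, "C", 12005, 90.97, False, "2016-10-19"),
]


def summarize(rows, depts=("A", "B", "C")):
    """Pivot order lines into one summary record per customer (hypothetical helper)."""
    summary = {}
    for cust_id, dept, _sku, dollar, _gift, date in rows:
        rec = summary.setdefault(cust_id, {
            **{d: 0.0 for d in depts},          # spend per department
            "total": 0.0,
            "orders": 0,
            "first_date": date,
            "last_date": date,
        })
        rec[dept] += dollar
        rec["total"] += dollar
        rec["orders"] += 1
        # ISO-formatted dates compare correctly as strings.
        rec["first_date"] = min(rec["first_date"], date)
        rec["last_date"] = max(rec["last_date"], date)
    return summary


customer_summary = summarize(transactions)
for cust_id in sorted(customer_summary):
    print(cust_id, customer_summary[cust_id])
```

The same reshaping is a group-by plus pivot in most data tools; the loop above only makes the individual steps explicit.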

Exploratory Data Analysis

Exploratory Data Analysis
"The greatest value of a picture is when it forces us to notice what we never expected to see." John W. Tukey, 1977

Visual Analytics - Interactive Brush-Linked

Which picture represents a model?
A model is a simplification of the truth that helps you with decision making.

Model Building
Supervised Models (known, labeled responses)
Regression (for example Linear Regression)
Categorical (for example Random Forest)
Unsupervised Models (no labeled responses)
Clustering (for example k-means clustering)
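To make the distinction concrete, here is a small sketch of one supervised and one unsupervised technique from the slide, written in plain Python. The data is made up for illustration and the function names `fit_line` and `kmeans_1d` are assumptions, not anything shown in the talk. Linear regression learns from labeled responses (`ys`); k-means receives no labels and only groups the values.

```python
def fit_line(xs, ys):
    """Supervised: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: Lloyd's k-means algorithm on scalar values."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                  # labeled responses, generated as y = 1 + 2x
intercept, slope = fit_line(xs, ys)

values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]    # no labels, two visible groups
centers = kmeans_1d(values, centers=[0.0, 10.0])

print(intercept, slope)
print(centers)
```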

Model Building

Model Building: Employees who write longer emails earn higher salaries!

Model Improvement

Model Improvement: Managers, Staff

Model Validation
How is the IQ of a kid related to the IQ of his / her mum?

What tools do Data Scientists use?

Alternatives for Data Scientists (not a complete list)
Figure categories: Tooling, Open Source, Closed Source, R, Source Code

R Language
R is well known as the most popular programming language among data scientists for modeling, and its popularity is still growing. It is developing very rapidly and has a very active community.

R with Revolution Analytics (now Microsoft)
Open Source
GPL License (including its restrictions)
http://www.revolutionanalytics.com/webinars/introducing-revolution-r-open-enhanced-open-source-r-distribution-revolution-analytics

TERR - TIBCO's Enterprise Runtime for R
TIBCO has rewritten R as a Commercial Compute Engine
Latest statistics scripting engine: S -> S-PLUS -> R -> TERR
Runs R code including CRAN packages
Engine internals rebuilt from scratch at low-level
Redesigned data objects, memory management
High performance + Big Data
TERR is licensed from TIBCO
TERR installs (free) with Spotfire Analyst / Desktop + other TIBCO products
Spotfire Server can manage all TERR / R scripts, artifacts for reuse
Standalone Developer Edition
Supported by TIBCO
No GPL license issues

Spark MLlib
MLlib is Spark's machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.
You can even combine the MLlib module with the R language.

H2O
An Extensible Open Source Platform for Analytics
Best of Breed Open Source Technology
Easy-to-use WebUI and Familiar Interfaces
Data Agnostic Support for all Common Database and File Types
Massively Scalable Big Data Analysis
Real-time Data Scoring (Nanofast Scoring Engine)
http://www.h2o.ai/

TIBCO Spotfire with R / TERR Integration
Let the business user leverage Analytic Models (created by the Data Scientist) to find insights!
Example: Customer Churn with Random Forest Algorithm
Select variables for the model
Behind the "refresh model" button lives a random forest algorithm: it requires no a priori assumptions at all, it just always works
The business user doesn't need to know what random forest is to be empowered by it

TIBCO Spotfire with H2O Integration
Example: Predictive Analytics for Manufacturing ("scrap parts as early as possible")

SaaS Machine Learning
Managed SaaS service for building ML models and generating predictions
Integrated into the corresponding cloud ecosystem
Easy to use, but limited feature set and potential latency issues if combined with external data or applications
http://docs.aws.amazon.com/machine-learning/latest/dg/tutorial.html

PMML (Predictive Model Markup Language)
XML-based de facto standard to represent predictive analytic models
Developed by the Data Mining Group (DMG)
Easily share models between PMML compliant applications (e.g. between model creation and deployment for operations)
http://www.ibm.com/developerworks/library/ba-ind-pmml1/
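As a rough illustration of how a PMML document lets a deployment side score events without rebuilding the model, the sketch below parses a tiny hand-written PMML 4.2 regression model and applies it to one event. The field names (`temperature`, `vibration`, `risk`), the coefficients, and the helper names are all made up for this example, and the reader handles only numeric predictors in a single regression table; real PMML consumers support far more of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document: field names and coefficients are invented.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="illustrative linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="vibration"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="temperature" coefficient="0.01"/>
      <NumericPredictor name="vibration" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_regression(pmml_text):
    """Read the intercept and numeric coefficients from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, event):
    """Apply the loaded model to one event (a dict of field name to value)."""
    intercept, coefficients = model
    return intercept + sum(c * event[name] for name, c in coefficients.items())


model = load_regression(PMML_DOC)
prediction = score(model, {"temperature": 80.0, "vibration": 0.25})
print(prediction)
```

The point of the exchange format is that the model is loaded once and then only evaluated per event; training stays with whichever tool produced the file.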

Agenda
1) Machine Learning and Big Data Analytics
2) Analysis of Historical Data
3) Real Time Processing
4) Live Demo

Streaming Analytics (figure)
Event Streams along a time axis (1 to 9)
Continuous Queries
Sliding Windows
Filter, Aggregation, Correlation
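The building blocks named on this slide (filter, sliding window, aggregation, evaluated continuously as events arrive) can be sketched in a few lines of plain Python. This is a generic illustration, not StreamBase code; the function name, the threshold, and the sample events are assumptions. The window here covers the interval (t - window, t] for the current event time t.

```python
from collections import deque


def sliding_average(events, window, threshold):
    """Yield (timestamp, average) for every event that passes the filter.

    events: iterable of (timestamp, value) pairs in time order.
    """
    in_window = deque()
    total = 0.0
    for ts, value in events:
        if value < threshold:            # filter: drop uninteresting events
            continue
        in_window.append((ts, value))
        total += value
        # Slide the window: evict events that are `window` or more time units old.
        while in_window and in_window[0][0] <= ts - window:
            _, old_value = in_window.popleft()
            total -= old_value
        yield ts, total / len(in_window)  # aggregation over the current window


# Hypothetical event stream: (time, reading).
stream = [(1, 10.0), (2, 20.0), (3, 1.0), (4, 30.0), (5, 40.0)]
results = list(sliding_average(stream, window=3, threshold=5.0))
print(results)
```

Because the function is a generator, it emits a result per incoming event instead of waiting for the stream to end, which is the essence of a continuous query.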

Operational Intelligence in Action
Machine-to-Machine Automation: Automated action based on models of history combined with live context and business rules
The Challenge: Create, understand, and deploy algorithms & rules that automate key business reactions
Actions by Operations: Human decisions in real time informed by up-to-date information
The Challenge: Empower operations staff to see and seize key business moments

Alternatives for Stream Processing (not a complete list!)
Figure categories: PRODUCT, FRAMEWORK, OPEN SOURCE, CLOSED SOURCE
Named entry: Microsoft Azure Stream Analytics

Comparison of Stream Processing Frameworks and Products
Slide Deck from JavaOne 2016: http://www.kai-waehner.de/blog/2016/10/25/comparison-of-stream-processing-frameworks-and-products/

Visual Development of Streaming Analytics
Streaming Operators, Connectivity, Visual Development, Testing & Simulation, Mature Tooling / Support, Middleware Integration

Live Datamart
Ad-hoc continuous query, Alerts, Dynamic aggregation, Action, Live visualization

How to apply analytic models to real time processing without rebuilding them?

Real Time Closed Loop: Understand, Anticipate, Act
Streaming Analytics to operationalize insights and patterns in real time without rebuilding the models
Diagram labels: TERR, Spark MLlib, MATLAB, Open Source R, SAS, H2O, PMML, Stream Processing

TIBCO StreamBase + R / TERR

TIBCO StreamBase + H2O

TIBCO StreamBase + PMML

Closed Loop: Automatically Recompute (and Improve) the Analytic Model
Compute your performance metric
Spot not good enough performance
Recompute model
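The three steps of the closed loop could be sketched as below. This is a deliberately tiny stand-in, assuming a one-parameter model (a line through the origin), mean absolute error as the performance metric, and invented data and thresholds; the class name `ClosedLoopModel` is hypothetical and none of this reflects how the TIBCO products implement the loop.

```python
from collections import deque


class ClosedLoopModel:
    """Scores events, tracks recent error, and refits when the error degrades."""

    def __init__(self, history, window=5, max_error=1.0):
        self.recent = deque(maxlen=window)   # most recent (x, y) observations
        self.max_error = max_error
        self.retrain_count = 0
        self.slope = self._fit(history)

    @staticmethod
    def _fit(pairs):
        # Least-squares slope for a line through the origin: y = slope * x.
        # Assumes at least one x is non-zero.
        pairs = list(pairs)
        return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

    def predict(self, x):
        return self.slope * x

    def observe(self, x, y):
        """Record the true outcome and recompute the model if it has degraded."""
        self.recent.append((x, y))
        if len(self.recent) < self.recent.maxlen:
            return
        # 1) Compute the performance metric (mean absolute error).
        mae = sum(abs(self.predict(a) - b) for a, b in self.recent) / len(self.recent)
        # 2) Spot performance that is not good enough.
        if mae > self.max_error:
            # 3) Recompute the model on the most recent data.
            self.slope = self._fit(self.recent)
            self.retrain_count += 1


# Hypothetical scenario: the model was built on y = 2x, then the process drifts to y = 3x.
loop = ClosedLoopModel(history=[(1, 2.0), (2, 4.0), (3, 6.0)])
initial_slope = loop.slope
for x in range(1, 6):
    loop.observe(x, 3.0 * x)
print(initial_slope, loop.slope, loop.retrain_count)
```

In practice the recompute step would typically hand the data back to the model-building environment; the loop above only shows where the metric check and the trigger sit.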

Agenda
1) Machine Learning and Big Data Analytics
2) Analysis of Historical Data
3) Real Time Processing
4) Live Demo

"An outage on one well can cost $10M per hour. We have 20-100 outages per year." - Drilling operations VP, major oil company

Predictive Analytics (Fault Management)
Temporal analytic: If vibration spike is followed by temp spike then voltage spike [within 4 hours] then flag high severity alert.
Figure series: Voltage, Temperature, Vibration, Device history
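The temporal rule on this slide can be expressed as a small state machine. The sketch below is one possible reading, assuming that spikes have already been detected upstream, that events for a single device arrive in time order, and that the whole sequence must complete within 4 hours of the vibration spike (the slide does not say where the window starts). The function and variable names are hypothetical.

```python
WINDOW_HOURS = 4.0


def detect_alerts(spikes):
    """Return the times (in hours) at which a high severity alert is flagged.

    spikes: iterable of (timestamp_hours, sensor) in time order, where sensor
    is "vibration", "temperature" or "voltage".
    """
    alerts = []
    await_temp = None      # time of the latest vibration spike awaiting a temp spike
    await_voltage = None   # vibration time of a vibration+temp pair awaiting voltage
    for ts, sensor in spikes:
        # Expire partial matches that can no longer complete within the window.
        if await_temp is not None and ts - await_temp > WINDOW_HOURS:
            await_temp = None
        if await_voltage is not None and ts - await_voltage > WINDOW_HOURS:
            await_voltage = None

        if sensor == "vibration":
            await_temp = ts                      # (re)start the pattern
        elif sensor == "temperature" and await_temp is not None:
            await_voltage, await_temp = await_temp, None
        elif sensor == "voltage" and await_voltage is not None:
            alerts.append(ts)                    # full sequence seen in time
            await_voltage = None
    return alerts


# Hypothetical spike sequences for one device.
in_time = detect_alerts([(0.0, "vibration"), (1.0, "temperature"), (3.5, "voltage")])
too_late = detect_alerts([(0.0, "vibration"), (1.0, "temperature"), (5.0, "voltage")])
wrong_order = detect_alerts([(0.0, "temperature"), (1.0, "vibration"), (2.0, "voltage")])
print(in_time, too_late, wrong_order)
```

Only the most recent vibration spike is tracked at each stage, since a later start always leaves at least as much of the window remaining as an earlier one.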

Complete Big Data Architecture
Diagram labels: SENSOR DATA, TRANSACTIONS, MACHINE DATA, SOCIAL DATA, MESSAGE BUS, Integration Bus, Event Server, Streaming Analytics, Action, Aggregate, Correlate, Rules, Stream Processing, Live Monitoring, Continuous query processing, Alerts, Manual action / escalation, Live UI, Operational Analytics, Operations, Internal Data, Cleansed Data, History, Data Storage, Big Data, HISTORICAL ANALYSIS, Data Discovery, Data Sheets, BI, Machine Learning, Data Scientists, API, Enterprise Service Bus, SOA, ERP, MDM, DB, WMS

Demo Environment
Diagram labels: CSV (Batch), JSON (Real Time), XML (Real Time), Flume, HDFS, StreamBase, Streaming Analytics, Action, Aggregate, Correlate, Rules, Live Datamart, Continuous query processing, Alerts, Manual action / escalation, Live UI, Operational Analytics, Operations, Internal Data, HISTORICAL ANALYSIS, R / TERR, H2O, PMML, Spotfire, Data Scientists, Oracle RDBMS, Avro, Parquet, Hadoop (Cloudera), TIBCO Fast Data Platform

Live Demo TIBCO Spotfire + StreamBase + TERR + Live Datamart

Key Take-Aways
Insights are hidden in Historical Data on Big Data Platforms
Machine Learning and Big Data Analytics find these Insights by building Analytics Models
Event Processing uses these Models (without Rebuilding) to take Action in Real Time

Questions? Please contact me! Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!