April 2016, JPoint, Moscow, Russia
How to Apply Big Data Analytics and Machine Learning to Real Time Processing
Kai Wähner
kwaehner@tibco.com
@KaiWaehner
www.kai-waehner.de
LinkedIn / Xing: Please connect!
Copyright 2000-2016 TIBCO Software Inc. Analyse and Act on Critical Business Moments
Key Take-Aways
- Insights are hidden in Historical Data on Big Data Platforms
- Machine Learning and Big Data Analytics find these Insights by building Analytics Models
- Event Processing uses these Models (without Rebuilding) to take Action in Real Time
Agenda
1) Machine Learning and Big Data Analytics
2) Analysis of Historical Data
3) Real Time Processing
4) Live Demo
Machine Learning
"Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look."
Source: http://www.sas.com
10 Examples of Machine Learning
- Spam Detection
- Credit Card Fraud Detection
- Digit Recognition
- Speech Understanding
- Face Detection
- Shape Detection
- Product Recommendation
- Medical Diagnosis
- Stock Trading
- Customer Segmentation
Source: http://machinelearningmastery.com/practical-machine-learning-problems/
Analytics Maturity Model
[Figure: analytics maturity, ranging from immediate value to the organization to long-term competitive advantage]
- Analytics: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing
- Steps: Measure, Diagnose, Predict, Optimize, Alert, Automate
A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases.
Agenda: 2) Analysis of Historical Data
Analytical Pipeline
Data Acquisition
Data Munging / Wrangling / Mash-up
Data Munging - Transformations

Transactions (one row per order):
     cust_id  dept  sku    dollar  gift   date
1    104      C     12003   2.40   FALSE  2016-10-17
2    105      A     12005  62.85   FALSE  2016-10-17
3    102      C     12007  69.23   TRUE   2016-10-17
4    104      B     12004   9.33   FALSE  2016-10-18
5    105      C     12010  14.16   TRUE   2016-10-18
6    101      B     12003  90.43   FALSE  2016-10-19
7    103      C     12005  90.97   FALSE  2016-10-19
n

Summary (one row per customer):
     cust_id  A      B      C      total   # orders  first_date  last_date
1    100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2    101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3    102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4    103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
5    104      31.34   9.33   2.40   43.06  4         2016-10-17  2016-10-20
6    105      62.85   0.00  56.00  118.85  3         2016-10-17  2016-10-20
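The transformation from order rows to one summary row per customer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize` and the dictionary layout are hypothetical, and the input is limited to the seven sample orders visible on the slide, so the resulting numbers differ from the slide's summary table, which was computed over the full data set.

```python
from collections import defaultdict
from datetime import date

# The seven sample orders shown on the slide:
# (cust_id, dept, sku, dollar, gift, date)
TRANSACTIONS = [
    (104, "C", 12003, 2.40, False, date(2016, 10, 17)),
    (105, "A", 12005, 62.85, False, date(2016, 10, 17)),
    (102, "C", 12007, 69.23, True, date(2016, 10, 17)),
    (104, "B", 12004, 9.33, False, date(2016, 10, 18)),
    (105, "C", 12010, 14.16, True, date(2016, 10, 18)),
    (101, "B", 12003, 90.43, False, date(2016, 10, 19)),
    (103, "C", 12005, 90.97, False, date(2016, 10, 19)),
]


def summarize(transactions, depts=("A", "B", "C")):
    """Pivot order rows into one summary row per customer (hypothetical helper)."""
    spend = defaultdict(lambda: dict.fromkeys(depts, 0.0))
    dates = defaultdict(list)
    for cust_id, dept, _sku, dollar, _gift, day in transactions:
        # Accumulate spend per department; tolerate departments not in `depts`.
        spend[cust_id][dept] = spend[cust_id].get(dept, 0.0) + dollar
        dates[cust_id].append(day)

    summary = {}
    for cust_id in sorted(spend):
        row = {dept: round(value, 2) for dept, value in spend[cust_id].items()}
        row["total"] = round(sum(spend[cust_id].values()), 2)
        row["orders"] = len(dates[cust_id])
        row["first_date"] = min(dates[cust_id])
        row["last_date"] = max(dates[cust_id])
        summary[cust_id] = row
    return summary


if __name__ == "__main__":
    for cust_id, row in summarize(TRANSACTIONS).items():
        print(cust_id, row)
```

In practice this step is usually done with a data frame library or in the analytics tool itself; the plain-Python version only makes the group-by and pivot logic explicit.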
Exploratory Data Analysis
"The greatest value of a picture is when it forces us to notice what we never expected to see." John W. Tukey, 1977
Visual Analytics - Interactive Brush-Linked
Which picture represents a model?
A model is a simplification of the truth that helps you with decision making.
Model Building
- Supervised models: known, labeled responses
  - Regression (for example, linear regression)
  - Categorical, i.e. classification (for example, random forest)
- Unsupervised models: no labeled responses
  - Clustering (for example, k-means clustering)
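As a minimal sketch of the supervised regression case, the following fits a straight line to labeled data with ordinary least squares. The function name `fit_line` and the toy data are made up for illustration (the points lie exactly on y = 2x + 1); they are not taken from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical labeled training data: known inputs with known responses.
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept)          # 2.0 1.0
    print(slope * 6 + intercept)     # prediction for an unseen input: 13.0
```

The "labeled responses" of the slide are the `ys`: the model is learned from pairs of input and known outcome, and is then applied to inputs whose outcome is not yet known.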
Model Building
Employees who write longer emails earn higher salaries!
Model Improvement
[Figure labels: Managers, Staff]
Model Validation
How is the IQ of a kid related to the IQ of his / her mum?
What tools do Data Scientists use?
Alternatives for Data Scientists (not a complete list)
[Logo chart: tooling vs. source code, open source vs. closed source; for example R]
R Language
R is widely regarded as one of the most popular programming languages used by data scientists for modeling, and its popularity is still growing. It is developing very rapidly and has a very active community.
R with Revolution Analytics (now Microsoft)
- Open Source
- GPL License (including its restrictions)
http://www.revolutionanalytics.com/webinars/introducing-revolution-r-open-enhanced-open-source-r-distribution-revolution-analytics
TERR - TIBCO's Enterprise Runtime for R
- TIBCO has rewritten R as a commercial compute engine
- Latest statistics scripting engine: S → S-PLUS → R → TERR
- Runs R code, including CRAN packages
- Engine internals rebuilt from scratch at a low level
- Redesigned data objects and memory management
- High performance + Big Data
- TERR is licensed from TIBCO
- Installs (free) with Spotfire Analyst / Desktop and other TIBCO products
- Spotfire Server can manage all TERR / R scripts and artifacts for reuse
- Standalone Developer Edition
- Supported by TIBCO
- No GPL license issues
Spark MLlib
MLlib is Spark's machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.
You can even combine the MLlib module with the R language.
H2O
An Extensible Open Source Platform for Analytics
- Best of Breed Open Source Technology
- Easy-to-use WebUI and Familiar Interfaces
- Data Agnostic Support for all Common Database and File Types
- Massively Scalable Big Data Analysis
- Real-time Data Scoring (Nanofast Scoring Engine)
http://www.h2o.ai/
TIBCO Spotfire with R / TERR Integration
Let the business user leverage analytic models (created by the data scientist) to find insights!
Example: Customer churn with the random forest algorithm
- Select variables for the model
- Behind the "refresh model" button lives a random forest algorithm
- Random forest requires no a priori assumptions at all; it just always works
- The business user doesn't need to know what random forest is to be empowered by it
TIBCO Spotfire with H2O Integration
Example: Predictive Analytics for Manufacturing ("scrap parts as early as possible")
SaaS Machine Learning
- Managed SaaS service for building ML models and generating predictions
- Integrated into the corresponding cloud ecosystem
- Easy to use, but limited feature set and potential latency issues if combined with external data or applications
http://docs.aws.amazon.com/machine-learning/latest/dg/tutorial.html
PMML (Predictive Model Markup Language)
- XML-based de facto standard to represent predictive analytic models
- Developed by the Data Mining Group (DMG)
- Easily share models between PMML compliant applications (e.g. between model creation and deployment for operations)
http://www.ibm.com/developerworks/library/ba-ind-pmml1/
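To make the idea of sharing a model between creation and deployment concrete, here is a minimal sketch that reads a small PMML regression model and scores one record with Python's standard XML parser. The PMML document, the field names (`vibration`, `temperature`, `risk`), and the helper functions are hypothetical, and the code handles only a plain linear `RegressionTable` with `NumericPredictor` elements. A real deployment would rely on a PMML-compliant scoring engine rather than hand-written parsing.

```python
import xml.etree.ElementTree as ET

# PMML 4.2 namespace as defined by the Data Mining Group.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hypothetical toy model, as a model-building tool might export it.
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hypothetical toy model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="vibration"/>
      <MiningField name="temperature"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="vibration" coefficient="2.0"/>
      <NumericPredictor name="temperature" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_model(pmml_text):
    """Extract intercept and coefficients from the first RegressionTable."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one record (a dict of field name -> value)."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


if __name__ == "__main__":
    model = load_linear_model(PMML_DOC)
    print(score(model, {"vibration": 1.0, "temperature": 4.0}))  # 3.5
```

The point of the exchange format is visible here: the scoring side needs only the XML document, not the tool or the training data that produced it.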
Agenda: 3) Real Time Processing
Streaming Analytics
[Figure: event streams along a time axis, events 1 to 9]
- Continuous Queries
- Sliding Windows
- Filter
- Aggregation
- Correlation
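The building blocks named above (continuous query, sliding window, filter, aggregation) can be sketched as a small Python generator. This is an illustration only, not how any particular streaming product implements them; the event layout `(timestamp, sensor, value)` and the function name `continuous_query` are assumptions made for the example.

```python
from collections import deque


def continuous_query(stream, sensor, window_seconds):
    """For each matching event, emit the average over a sliding time window.

    stream: iterable of (timestamp_seconds, sensor_name, value), ordered by time.
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    for timestamp, name, value in stream:
        if name != sensor:          # filter: ignore other sensors
            continue
        window.append((timestamp, value))
        # Slide the window: evict events older than window_seconds.
        while window[0][0] <= timestamp - window_seconds:
            window.popleft()
        # Aggregation: average of the values still in the window.
        yield timestamp, sum(v for _, v in window) / len(window)


if __name__ == "__main__":
    events = [
        (1, "temp", 10.0),
        (2, "vib", 99.0),
        (3, "temp", 20.0),
        (6, "temp", 30.0),
    ]
    for result in continuous_query(events, sensor="temp", window_seconds=5):
        print(result)
```

Unlike a database query, the query here is registered once and produces a new result whenever a new event arrives, which is what "continuous" refers to.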
Operational Intelligence in Action
- Machine-to-Machine Automation: Automated action based on models of history combined with live context and business rules
  The Challenge: Create, understand, and deploy algorithms & rules that automate key business reactions
- Actions by Operations: Human decisions in real time informed by up-to-date information
  The Challenge: Empower operations staff to see and seize key business moments
Alternatives for Stream Processing (not a complete list!)
[Logo chart: frameworks vs. products, open source vs. closed source; for example Microsoft Azure Stream Analytics]
Comparison of Stream Processing Frameworks and Products
Slide Deck from JavaOne 2016: http://www.kai-waehner.de/blog/2016/10/25/comparison-of-stream-processing-frameworks-and-products/
Visual Development of Streaming Analytics
- Streaming Operators
- Connectivity
- Visual Development
- Testing & Simulation
- Mature Tooling / Support
- Middleware Integration
Live Datamart
- Ad-hoc continuous query
- Alerts
- Dynamic aggregation
- Action
- Live visualization
How to apply analytic models to real time processing without rebuilding them?
Real Time Closed Loop: Understand, Anticipate, Act
Streaming Analytics to operationalize insights and patterns in real time without rebuilding the models
[Figure: analytic models from TERR, Spark MLlib, MATLAB, Open Source R, SAS, H2O, and PMML applied in Stream Processing]
TIBCO StreamBase + R / TERR
TIBCO StreamBase + H2O
TIBCO StreamBase + PMML
Closed Loop: Automatically Recompute (and Improve) the Analytic Model
1) Compute your performance metric
2) Spot performance that is not good enough
3) Recompute the model
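The three steps of the closed loop can be sketched as follows. Everything here is a simplifying assumption made for illustration: the class name `ClosedLoopModel`, the choice of mean absolute error over the most recent observations as the performance metric, the fixed threshold, and the trivial "predict the mean" training function. A real setup would plug in the actual model training (for example in R / TERR or H2O) and a metric that fits the use case.

```python
from collections import deque


def train_mean(rows):
    """Toy training function: always predict the mean of the observed responses."""
    mean = sum(y for _, y in rows) / len(rows)
    return lambda x: mean


class ClosedLoopModel:
    def __init__(self, train, window=3, max_error=2.0):
        self.train = train                    # rows -> predict function
        self.history = deque(maxlen=window)   # most recent labeled observations
        self.max_error = max_error            # threshold for "not good enough"
        self.predict = None
        self.retrain_count = 0

    def _mean_abs_error(self):
        # Step 1: compute the performance metric on recent observations.
        errors = [abs(self.predict(x) - y) for x, y in self.history]
        return sum(errors) / len(errors)

    def observe(self, x, y):
        """Feed one labeled observation and retrain if performance degraded."""
        self.history.append((x, y))
        # Step 2: spot performance that is not good enough.
        if self.predict is None or self._mean_abs_error() > self.max_error:
            # Step 3: recompute the model on the recent data.
            self.predict = self.train(list(self.history))
            self.retrain_count += 1


if __name__ == "__main__":
    model = ClosedLoopModel(train_mean)
    # Hypothetical data: the response level shifts from 10 to 20.
    for x, y in enumerate([10, 10, 10, 20, 20, 20, 20, 20]):
        model.observe(x, y)
    print(model.predict(0), model.retrain_count)
```

Once the response level shifts, the recent error rises above the threshold and the model is recomputed until its predictions match the new level again.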
Agenda: 4) Live Demo
"An outage on one well can cost $10M per hour. We have 20-100 outages per year."
- Drilling operations VP, major oil company
Predictive Analytics (Fault Management)
Temporal analytic: If a vibration spike is followed by a temperature spike and then a voltage spike [within 4 hours], then flag a high severity alert.
[Figure: device history of voltage, temperature, and vibration]
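The temporal rule can be sketched as a small sequence matcher over a stream of spike events. The event layout `(timestamp, kind)`, the function name `detect_alerts`, and the interpretation that the whole sequence must complete within 4 hours of the vibration spike are assumptions made for this illustration; the slide does not specify where the 4-hour window starts. Other spikes may occur in between without breaking a match.

```python
from datetime import datetime, timedelta

# Order of spikes that makes up the fault pattern.
PATTERN = ("vibration", "temperature", "voltage")


def detect_alerts(spikes, window=timedelta(hours=4)):
    """Return (start, end) timestamps for every completed pattern.

    spikes: iterable of (timestamp, kind) ordered by time.
    """
    alerts = []
    partial = []  # open matches as (start_timestamp, index_of_next_expected_kind)
    for timestamp, kind in spikes:
        # Drop open matches whose time window has expired.
        partial = [(s, i) for s, i in partial if timestamp - s <= window]
        still_open = []
        for start, index in partial:
            if kind != PATTERN[index]:
                still_open.append((start, index))      # keep waiting
            elif index + 1 == len(PATTERN):
                alerts.append((start, timestamp))      # pattern complete
            else:
                still_open.append((start, index + 1))  # advance one step
        partial = still_open
        if kind == PATTERN[0]:
            partial.append((timestamp, 1))             # a new match begins
    return alerts


if __name__ == "__main__":
    t0 = datetime(2016, 4, 1, 8, 0)
    hours = lambda n: timedelta(hours=n)
    spikes = [
        (t0, "vibration"),
        (t0 + hours(1), "temperature"),
        (t0 + hours(3), "voltage"),        # completes within 4 hours: alert
        (t0 + hours(10), "vibration"),
        (t0 + hours(11), "temperature"),
        (t0 + hours(15), "voltage"),       # 5 hours after vibration: no alert
    ]
    for start, end in detect_alerts(spikes):
        print("HIGH SEVERITY ALERT", start, end)
```

In a streaming product this kind of rule is typically expressed with a pattern-matching operator rather than hand-written state handling; the sketch only shows the state such an operator has to keep.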
Complete Big Data Architecture
[Architecture diagram; labels grouped by area]
- Data sources: Sensor Data, Transactions, Machine Data, Social Data, Internal Data
- Integration: Message Bus, Integration Bus, API, Enterprise Service Bus, SOA, ERP, MDM, DB, WMS
- Streaming Analytics (Event Server): Aggregate, Correlate, Rules, Stream Processing, Action
- Operational Analytics (Live Monitoring): Continuous query processing, Alerts, Manual action and escalation, Live UI, Operations
- Data Storage: Cleansed Data, History, Big Data
- Historical Analysis: Data Discovery, Data Sheets, BI, Machine Learning, Data Scientists
Demo Environment
[Architecture diagram of the TIBCO Fast Data Platform; labels grouped by area]
- Inputs: CSV (Batch), JSON (Real Time), XML (Real Time), Internal Data
- Streaming Analytics (StreamBase): Aggregate, Correlate, Rules, Action
- Operational Analytics (Live Datamart): Continuous query processing, Alerts, Manual action and escalation, Live UI, Operations
- Storage: Flume, HDFS, Avro, Parquet, Oracle RDBMS, Hadoop (Cloudera)
- Historical Analysis: Spotfire, R / TERR, H2O, PMML, Data Scientists
Live Demo TIBCO Spotfire + StreamBase + TERR + Live Datamart
Key Take-Aways
- Insights are hidden in Historical Data on Big Data Platforms
- Machine Learning and Big Data Analytics find these Insights by building Analytics Models
- Event Processing uses these Models (without Rebuilding) to take Action in Real Time
Questions? Please contact me!
Kai Wähner
kwaehner@tibco.com
@KaiWaehner
www.kai-waehner.de
LinkedIn / Xing: Please connect!