Operationalising Predictive Insights to Impact the Bottom Line
Ali Rahim, Advanced Analytics Product Manager
Agenda
1. Predictive Analytics
2. Why RStat
3. Step through the Predictive process
4. RStat Roadmap
5. Operationalise RStat Results
Predictive Analytics Overview
WHAT IS PREDICTIVE ANALYTICS?
Predictive Analytics (PA) helps one to:
- Discover/understand what's going on
- Predict what's going to happen
- Improve overall decision making
- Improve business processes
- Create a competitive edge!
Predictive Analytics IS a key business process:
- Learning from experience
- User-centric, interactive
- Leverages analysis technologies and computing power
- An information-based approach to decision making
- Results are mainly used in a forward-looking style
- Analysis of historical and current transactional data to identify risks and opportunities related to future, or otherwise unknown, events
Copyright 2007, Information
Predictive Analytics vs. Business Intelligence
Business Intelligence | Predictive Analytics
User driven | Data driven
Rear view | Forward view
Manual methods | Automated methods
All attributes are equally important | A few attributes are the keys
Reportable info | Actionable info
Experience-driven | Data-driven
PREDICTIVE ANALYTICS SUMMARY
Organizations use predictive applications to:
- Reduce marketing/operational costs
- Increase sales
- Improve cross-sell/up-sell campaigns
- Increase retention/loyalty
- Detect and prevent fraud
- Identify credit risks
- Acquire new customers
ROI is realized when:
- Decision-making is improved with forward-looking views of likely behavior
- Results are widely distributed to end users where decisions are made
Misconceptions of Predictive Analytics
1. It's new. It actually dates back decades: Fisher's discriminant analysis (1936) was applied by Durand (1941) to separate good loans from bad ones, laying the groundwork for credit scoring.
2. It produces perfect predictions. Accuracy depends on the data, and models are estimates.
3. It's a push-button solution. Tools cannot provide everything; analysts should select the technique based on the business context.
4. Build it and forget it. Every model depends on the data it was trained on, and that data covers a fixed time window, so models can become outdated. Regular refreshes are required, and how often varies by customer, industry and business case.
Why RStat?
WEBFOCUS ANALYTIC ENVIRONMENT
We provide an integrated set of components to address predictive analytics.
WEBFOCUS ANALYTIC ENVIRONMENT
WebFOCUS - Data Access and Preparation (usually the bulk of a Predictive Analytics implementation project)
Data Access
- Native access to 300 data sources, with no requirement to move all data to a warehouse or to build a warehouse
Data Preparation
- Merging, filtering, aggregating, deriving, transforming, sampling, improving data quality
- Good predictive models require complete and relevant data
WEBFOCUS ANALYTIC ENVIRONMENT
WebFOCUS RStat - Predictive Analytics
- GUI approach to predictive model building; no code or syntax required
- Variety of techniques to discover patterns in historical data
- A model is trained to predict future behaviors and is consumed by WebFOCUS end-user applications
- The R language can be used to supplement RStat as needed
Commonly used algorithms
- Regression, Decision Tree, Survival, Market Basket Analysis
Variety of evaluation techniques to test models before deployment
- Accuracy, Lift, ROC, Predicted vs. Observed
PREDICTIVE MODEL DEVELOPMENT COMPONENTS
WEBFOCUS DEVELOPMENT IDE AND RSTAT
WebFOCUS RStat Value Proposition
Integrated Platform
- Integrated with the WebFOCUS BI Platform
- Allows for easy data access and data preparation
- Deploys results to non-technical business end users automatically
- Single server for BI and PA, eliminating additional costs
Low Total Cost of Ownership
- Based on the open-source R statistical language
- The R language is not required for deployment
- User-friendly interface: advanced analytics without coding or syntax
- Good exploratory and graphing capabilities, covering the most commonly used predictive and exploratory modeling techniques
- Extends very broadly through R packages: 6,500 packaged extensions provide instant access to more models
Quick Time to Market
- Openness, low TCO and usability combine for a quick time to market and high value for our customers
COMPREHENSIVE ANALYTIC FUNCTIONALITY
DATA PREPARATION AND EXPLORATORY CAPABILITIES
- Train and test partitioning of the data
- Sample seed for replication or revision of partitions
- Radio buttons for defining variable roles: input, target, ID, ignore
- Descriptive statistics: summary statistics, distributions, correlations
- Variable reduction via principal components analysis
- Visualizations: box, bar and dot plots, histograms, Benford and mosaic charts
- Hypothesis testing: t-tests, F-tests
- Data transformations: normalization, missing value imputation, binning, cleanup
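RStat handles train/test partitioning and the sample seed through its GUI. The idea behind a seeded, replicable partition can be sketched in Python as below; the function name `train_test_split` and the 70/30 default are illustrative assumptions, not RStat's implementation.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Split rows into (train, test) after a shuffle driven by a fixed seed.

    Calling it again with the same rows, fraction and seed returns the same
    partition, which is what makes a partition replicable.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG seeded for replication
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical customer IDs standing in for a real data set.
customers = list(range(1, 11))
train, test = train_test_split(customers, train_fraction=0.7, seed=42)
print(len(train), len(test))  # 7 3
```

Changing the seed revises the partition; keeping it lets a colleague rebuild exactly the same training and test sets.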
COMPREHENSIVE ANALYTIC FUNCTIONALITY
MODEL BUILDING AND MODEL TESTING CAPABILITIES
Supervised modeling techniques for classification and prediction
- Decision Trees
- Boosting
- Random Forests
- Regression: Linear, GLM, Logistic, Poisson and Multinomial
- Support Vector Machines
- Feed Forward Neural Network
- Survival Analysis: Cox PH and Parametric
Unsupervised modeling techniques for exploratory work
- Clustering: K-means and Hierarchical clustering for grouping records
- Association Rules: apriori algorithm for finding co-occurrences of items
Model evaluation techniques
- Error matrix, risk chart, lift chart, ROC curve, precision and sensitivity charts, predicted vs. observed charts
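Two of the evaluation measures listed above, ROC and lift, can be computed from held-out labels and model scores. The Python sketch below is a minimal illustration on hypothetical data; the function names are assumptions and this is not how RStat computes its charts internally. AUC is computed here as the probability that a random positive outscores a random negative, which equals the area under the ROC curve.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive record scores higher than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def lift_at(labels, scores, fraction):
    """Response rate among the top `fraction` of records (ranked by score),
    divided by the overall response rate."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical held-out labels (1 = responded) and model scores.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(roc_auc(labels, scores), 3))       # 0.952
print(round(lift_at(labels, scores, 0.2), 2))  # 3.33
```

A lift of 3.33 in the top 20% means that slice of records responds at roughly three times the overall rate, which is the effect targeted marketing relies on.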
WEBFOCUS ANALYTIC ENVIRONMENT
WebFOCUS - Delivering results to end users for decision making
Information Delivery
- Dashboards, core reporting, charts, scorecards, maps, queries, active reports, OLAP, mobile, feeding a downstream system
Consumption of the predictive results as a scoring function (derived fields)
- WebFOCUS seamlessly consumes and delivers the predicted results into the end-user application in any form needed
- End-user-friendly output
Step through the Predictive process
Current State Business Dilemma
Throwing the Net out there
- Contact (marketing) 300k prospects
- Customer acquisition cost: $100 per prospect
- Average sales per response: $250
- Response: 29% of prospects receiving the catalog
Under these conditions, a catalog campaign will not be profitable: it would actually lose $8.25 million ($21.75 million in sales minus $30 million in acquisition costs).
# of Prospects | 300,000
Cost per Prospect | $100.00
Total Cost for Acquisition | $30,000,000.00
Sales $ per Response | $250.00
% Responded | 29%
Total Responses | 87,000
Total Sales | $21,750,000.00
Total Profit | $(8,250,000.00)
With Predictive Model
BULLSEYE: Target Marketing the Prospects
- Target the 20% of prospects most likely to respond, based on model output
- Likely to make a $9 million profit (assuming every targeted prospect responds, as in the table below)
- A realized gain of $17.25 million over the untargeted campaign
# of Targeted Prospects | 60,000
Cost per Prospect | $100.00
Total Cost for Acquisition | $6,000,000.00
Sales $ per Response | $250.00
% Responded | 100%
Total Responses | 60,000
Total Sales | $15,000,000.00
Total Profit | $9,000,000.00
Bottom Line Impact | $17,250,000.00
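The arithmetic behind the two campaign slides can be checked with a few lines of Python. This is a sketch that uses only the slides' figures; the 100% response rate for the targeted group is the slides' idealized assumption, and `campaign_profit` is a hypothetical helper.

```python
def campaign_profit(prospects, cost_per_prospect, sales_per_response, response_pct):
    """Profit (negative = loss) of contacting `prospects` people.

    response_pct is a whole-number percentage; integer arithmetic keeps the
    dollar figures exact (responses are rounded down if not a whole number).
    """
    responses = prospects * response_pct // 100
    total_cost = prospects * cost_per_prospect
    total_sales = responses * sales_per_response
    return total_sales - total_cost


# Figures taken from the two campaign slides.
mass_mailing = campaign_profit(300_000, 100, 250, 29)  # contact everyone
targeted = campaign_profit(60_000, 100, 250, 100)      # top 20% by model score
print(mass_mailing)             # -8250000
print(targeted)                 # 9000000
print(targeted - mass_mailing)  # 17250000
```

With $100 spent per prospect and $250 in sales per response, a campaign only breaks even once the response rate reaches 100 / 250 = 40%, which is why raising the response rate through targeting matters more than contacting more people.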
RStat ROADMAP
Q4 2014 - RSTAT 1.6 RELEASE (OCT. 2014)
- Built on R 3.0.3 (32-bit and 64-bit)
- Random Forest: C file export returns class or regression values; common uses are variable selection, classification and regression
- Ada Boost: C file export returns class or probability for binary trees
- MODEL Tab: model selection paradigm changed to handle multiple models simultaneously
- Cross Tab: a contingency table for cross-classifying factors
- Enhanced correlation graphics with an advanced graphics option
- Chi Square GUI: independence test and goodness-of-fit test for nominal-level data
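The Cross Tab and Chi Square independence test work together: the cross tab counts how two nominal factors co-occur, and the test asks whether those counts differ from what independence would predict. Below is a minimal Python sketch for a hypothetical 2x2 table of contract type versus churn; it applies no continuity correction, whereas R's `chisq.test` applies Yates' correction to 2x2 tables by default, so the two statistics would differ.

```python
import math


def chi_square_independence_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("table must be 2x2")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two factors were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, and for 1 df the upper tail
    # P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical cross tab: rows = contract type, columns = churned / stayed.
cross_tab = [[10, 20],
             [30, 40]]
stat, p = chi_square_independence_2x2(cross_tab)
print(round(stat, 3))  # 0.794
```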
NEW ADDITIONS IN RSTAT 1.6
- EWKM: Entropy Weighted K-Means functionality included
- Regression plots: outliers labeled, quantile plot of residuals, and a residuals vs. leverage plot that shows observations with undue influence on the regression
- Density plot: a graphical representation of kurtosis and skewness
- Association rules plot: a graph representing the rules
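The density plot summarizes shape through skewness and kurtosis. As a minimal Python sketch, these are the plain moment-based estimators; RStat or the underlying R packages may use small-sample-corrected variants, so their reported values can differ slightly.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis.

    These are the simple (biased) moment estimators; statistical packages
    may apply small-sample corrections and report slightly different values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("values must not all be identical")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5              # > 0 means a longer right tail
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a normal distribution
    return skewness, excess_kurtosis


print(skewness_kurtosis([1, 2, 3, 4, 5]))  # (0.0, -1.3)
```

A symmetric, flat sample such as 1 to 5 has zero skewness and negative excess kurtosis (lighter tails than a normal curve).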
2015-2016 - ROADMAP
- RStat integration with InfoAssist and InfoDiscovery
RStat 2.0
- SVM C routine; completion of C routines for RStat models
- TimeSeries forecasting: ARIMA, regression, exponential smoothing
- Bayesian Networks: represent causality maps linking measured and target variables
- FPC Cluster Boot to test the stability of clusters
- Data partitioning: train, test, validate
- Numeric prediction intervals: for a numeric prediction (for example, a value of 150), calculate and show a range (+/- CI)
- Anomaly detection technique
- Confirmatory Factor Analysis
- RStat Adapter Web Console deployments
2016 - INNOVATIONS
Web-based RStat GUI (RStat Web BETA)
- Web-based model development
- Algorithms to be included: Decision Tree, Regression Models, Clustering, TimeSeries
- Data wrangling
Machine Learning for InfoDiscovery
- Clustering (K-Means and EWKM)
- ARIMA forecasting
Operationalise RStat Results
- Applications of Predictive Analytics
- Displaying Predictive Output
Horizontal Applications of Predictive Analytics
Marketing / CRM: offer and promotion targeting, customer segmentation, improve response rates, cross- and up-selling, customer retention, reduce campaign costs, predict customer lifetime value, customer acquisition
Fraud / Risk: fraud detection and prevention, credit risk, collections and recovery, patient outcomes, claims analysis
Process improvement: quality improvement, warranty analysis, time to failure, resource allocation, demand forecasting
Business Initiatives That Predictive Analytics Can Address
Sales, Marketing and CRM
- It's very expensive to acquire new customers; there must be a better way.
- If I understood who my best customers are, I could target more like them.
- I wish I knew which of my customers were interested in which offers, instead of offering all products to all customers.
- Response rates to our campaigns are low and declining; how can we better target our customers?
- I wish I knew which customers were most likely to churn so I could retain them.
- How can I provide better service to my customers by understanding their needs and guiding my interactions?
Business Initiatives That Predictive Analytics Can Address
Fraud
- How can I predict fraudulent activity and at the same time avoid investigating 100% of my data?
Risk
- I want to approve and price my prospects for insurance coverage appropriately.
- I want to approve my prospects for loans or credit to maximize profit and minimize my risk.
Process Improvement
- How can I use my process data to uncover the root cause of defects?
- How can I better predict the time until some event (failure, attrition, churn) occurs?
Financial Services Applications of Predictive Analytics
Growth
- Acquisition targeting
- Organic growth: cross-selling, up-selling, retention (churn)
- Promotion targeting: who to target, which offer, which channel, what time
- Customer segmentation: groupings of like customers
- Predicting customer lifetime value
Profitability
- Inter-department analysis of promoting products to low-risk customers
- Collections and recovery
Managing risk
- Credit approvals
- Predicting credit risk
- Anti-money laundering
- Fraud detection / prevention
Insurance Applications of PA
Growth
- Acquisition targeting
- Organic growth: cross-selling, up-selling, retention (churn)
- Customer segmentation: groupings of like customers
- Predicting customer lifetime value
- Price optimization
Profitability
- Inter-department analysis of promoting products to low-risk customers
Managing risk
- Pricing / underwriting of policies
- Predicting claim risk and severity
- Fraudulent claim detection / prevention
Claims processing
- Claim-to-agent routing
- Fast-tracking claims
- Subrogation modeling
- Early total loss
Healthcare Applications of Predictive Analytics
Patient care
- Predict which patients will develop chronic conditions
- Predict which patients will respond to which treatments
- Predict overall survival based on treatments
- Predict lengths of stay: in the hospital, in intensive care, in recovery
Operations
- Predict likely readmissions
- Fraud detection and prevention
- Predict patient volumes
- Optimize the master schedule
Retail Applications of Predictive Analytics
Growth
- Acquisition targeting
- Organic growth: cross-selling, up-selling, loyalty programs
- Customer segmentation: groupings of like customers
- Promotion targeting: who to target, which offer, which channel, what time
- Price optimization
- Product demand predictions (supply chain)
- Fraud (shrinkage) prevention
If customers are tracked (loyalty card), this is done at a customer level; if transactions are anonymous, the results are deployed at an aggregated level (store, territory):
- Item placement on shelves, weekly flyer arrangement, displays
Higher Education Applications of Predictive Analytics
Enrollment goals
- Who to admit
- Who will enroll
Student segmentation
- Groupings of like students
Student performance
- Who is at risk of dropping out
- Graduation rates
- Align programs with students' needs
Funding programs
- Targeting alumni for donations
Manufacturing Applications of PA
Quality improvements
- Predicting the time to failure
- Root cause analysis of defects
Warranty analysis
- Determining the length of a warranty to place on a part or system
- Scrap or repair disposition for parts
Predicting product demand
- Inventory management
- Predicting machine maintenance
WebFOCUS RStat Predictive Churn Dashboard
WebFOCUS Dashboard Displaying Predictive Output Active report and graphical output of predicted patient volumes for Healthcare
WebFOCUS Dashboard Displaying Predictive Output Graphical output of predicted loan defaults and non-defaults
WebFOCUS Dashboard Displaying Predictive Output Active report of product affinities and recommendations for cross selling
WebFOCUS Dashboard Displaying Predictive Output Active report and graphical output of predicted students at risk
WebFOCUS Dashboard Displaying Predictive Output Active report and graphical output of predicted failures of manufactured parts
Predictive Crime Analytics
WebFOCUS Dashboard Displaying Predictive Output GIS, active report and graphical output of predicted responses to a marketing campaign
3i Health Solutions: Predictive Readmissions Performance
Predict and score the patients most likely to be readmitted, based on previous clinical history, demographics, consumer habits and clinical factors
Predictive Application: displays value, churn likelihood and offers for inbound and outbound targeting
OUTPUT FORMAT AND ANALYSIS
Extending WebFOCUS Predictive Platform Leveraging R - R-Script + WF Dialog Manager
SCIENTIFIC QUERY DATA PREP, DATA WRANGLE, STATISTICAL SCRIPTS, VISUALIZATION AND OUTPUT DISTRIBUTION
WRAP-UP Thank you for your time today! For additional information or if you have any questions, please contact your local Information Builders Account Executive
RStat Screenshots: Churn Demo
Historical Report of Telco Customers Demographic, Account, Sentiment and Churn Data
Launch RStat: assign the attribute roles in the Data tab
Explore Tab Summary and descriptive statistics of the data set
Explore Tab Histogram and bar chart for visualizations
Explore Tab Correlation analysis with visualization
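The correlation analysis on the Explore tab is built on pairwise correlation coefficients. A minimal Python sketch of a Pearson correlation matrix follows; the column names and values are hypothetical stand-ins for the telco churn data, and RStat may offer other correlation methods as well.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant column


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns from a churn data set.
data = {
    "tenure_months": [2, 10, 24, 36, 48],
    "support_calls": [9, 6, 4, 2, 1],
    "sentiment": [-0.8, -0.3, 0.1, 0.4, 0.6],
}
matrix = correlation_matrix(data)
print(round(matrix[("tenure_months", "support_calls")], 2))
```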
Test Tab T-test of the average sentiment score for churners vs. non-churners
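The t-test compares average sentiment for churners and non-churners. Below is a minimal Python sketch of Welch's unequal-variance t statistic; whether RStat runs Welch's or the pooled-variance variant is an assumption here (R's `t.test` defaults to Welch), and the sentiment scores are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances. The p-value would come from a t
    distribution with `df` degrees of freedom (e.g. scipy.stats.t.sf).
    """
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    va = statistics.variance(sample_a) / len(sample_a)  # squared std. error of mean_a
    vb = statistics.variance(sample_b) / len(sample_b)  # squared std. error of mean_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df


# Hypothetical sentiment scores (-1 = very negative, +1 = very positive).
churners = [-0.6, -0.4, -0.5, -0.2, -0.7]
stayers = [0.3, 0.1, 0.4, -0.1, 0.2, 0.5]
t_stat, df_welch = welch_t(churners, stayers)
print(round(t_stat, 2), round(df_welch, 1))
```

A strongly negative t statistic indicates churners have lower average sentiment, which is consistent with sentiment turning out to be a useful predictor later in the demo.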
Transform Tab Revise the data via rescale, impute, recode and cleanup
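Two of the Transform tab operations, imputation and rescaling, are simple to express directly. The Python sketch below shows mean imputation followed by min-max rescaling to [0, 1]; these are common choices picked for illustration, and the tab may offer other imputation and rescaling methods. The column values are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def rescale_min_max(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


monthly_charges = [10, None, 30, 20]  # hypothetical column with a gap
print(rescale_min_max(impute_mean(monthly_charges)))  # [0.0, 0.5, 1.0, 0.5]
```

Imputing before rescaling matters: rescaling needs a complete column to find its minimum and maximum.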
Cluster Tab KMeans and Hierarchical algorithms for clustering
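K-means groups records by repeatedly assigning each one to its nearest centroid and moving each centroid to the mean of its members. The Python sketch below is plain Lloyd's k-means on hypothetical two-feature customer points; it is not the EWKM variant or the hierarchical method that RStat also offers.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels) where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points

    def nearest(p):
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members
            else centroids[i]  # keep the old centroid if a cluster empties
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


# Hypothetical (x, y) features for six customers.
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Results depend on the starting centroids, which is why the sketch takes a seed, much as RStat exposes a seed for replicable runs.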
Associate Tab Discovers affinity rules for co-occurrence of items
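Affinity rules are scored by support (how often the items appear together), confidence (how often the consequent appears given the antecedent) and lift (confidence relative to the consequent's baseline rate). The Python sketch below enumerates single-item rules by brute force on hypothetical baskets; it is not the apriori algorithm itself, which prunes the search to scale to larger item sets.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Single-item rules A -> C as (A, C, support, confidence, lift).

    Brute-force enumeration of every ordered item pair; apriori prunes the
    search space for larger item sets, which this sketch does not do.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets containing every item in itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, c in permutations(items, 2):
        s = support({a, c})
        if s < min_support:
            continue
        confidence = s / support({a})
        if confidence >= min_confidence:
            lift = confidence / support({c})
            rules.append((a, c, s, confidence, lift))
    return rules


# Hypothetical shopping baskets.
baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
for a, c, s, conf, lift in pair_rules(baskets):
    print(f"{a} -> {c}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 (for example, butter -> bread) marks a pair that co-occurs more often than chance, the kind of rule that feeds cross-sell recommendations.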
Model Tab Decision Tree output to predict churn, sentiment data excluded
Model Tab Decision Tree output to predict churn, sentiment data included
Evaluate Tab Displays model accuracy; the model with sentiment data performs much better
- Model error rate = 44% without sentiment data
- Model error rate = 7% with sentiment data
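An error rate like the ones above is the share of test-partition records the model misclassifies, and the error matrix breaks those mistakes down by actual and predicted class. The Python sketch below shows the computation on hypothetical labels; it does not reproduce the 44% and 7% figures, which come from the demo data.

```python
from collections import Counter


def error_matrix(actual, predicted):
    """Counts of (actual, predicted) label pairs, i.e. a confusion matrix."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Share of records whose predicted label differs from the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical held-out (test partition) labels.
actual = ["churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "stay", "stay"]
print(error_rate(actual, predicted))                      # 0.4
print(error_matrix(actual, predicted)[("stay", "stay")])  # 2
```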
Model Deployment Map the attributes to the model parameters required
Model Deployment Finished computed model scoring attribute Churn Prediction
Deployed Scoring Model for Customer Churn Report displays customers predicted to churn and a likelihood
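In the deployed report, the model is evaluated per record as a derived scoring field, and the report lists customers flagged as likely churners. The Python sketch below mimics that pattern only; the rules, attribute names and 0.5 threshold are hypothetical and are neither the output of an actual RStat model nor WebFOCUS scoring syntax.

```python
def churn_likelihood(record):
    """Hypothetical hand-written rules standing in for an exported tree model."""
    if record["sentiment"] < -0.2:
        return 0.85 if record["tenure_months"] < 12 else 0.60
    return 0.10


# Hypothetical customer records; the attribute names are assumptions.
customers = [
    {"id": 1, "tenure_months": 6, "sentiment": -0.5},
    {"id": 2, "tenure_months": 30, "sentiment": -0.4},
    {"id": 3, "tenure_months": 3, "sentiment": 0.3},
]

THRESHOLD = 0.5  # hypothetical cut-off for flagging a customer as a churner
for customer in customers:
    # Derived fields computed per record, like a scoring attribute in a report.
    customer["churn_likelihood"] = churn_likelihood(customer)
    customer["churn_prediction"] = customer["churn_likelihood"] >= THRESHOLD

# Report view: predicted churners, most likely first.
at_risk = sorted((c for c in customers if c["churn_prediction"]),
                 key=lambda c: c["churn_likelihood"], reverse=True)
for c in at_risk:
    print(c["id"], c["churn_likelihood"])
```

Because scoring is just a function of each record's attributes, the same step can run inside any report, dashboard or downstream feed that has those attributes mapped in.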