An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics



Thank You Sponsors
Empower users with new insights through familiar tools while balancing the need for IT to monitor and manage user-created content. Deliver access to all data types across structured and unstructured sources. www.microsoft.com/bi
Hortonworks develops, distributes and supports the only 100% Open Source distribution of Apache Hadoop architected, built and tested for enterprise deployments. http://hortonworks.com/

Dean Abbott
Co-founder and Chief Data Scientist at SmarterHQ, based in Indianapolis, Indiana
President of Abbott Analytics in San Diego, California
Internationally recognized data mining and predictive analytics expert with over two decades of experience
Author of Applied Predictive Analytics (Wiley, 2014); co-author of IBM SPSS Modeler Cookbook (Packt Publishing, 2013)
Advisory board member and instructor for the UC Irvine Predictive Analytics Certificate Program and the UC San Diego Data Mining Certificate Program

Speaker Social Media
@deanabb
abbottanalytics.blogspot.com/
http://www.linkedin.com/in/deanabbott/
www.abbottanalytics.com/

The Analyst's Journey
Gain critical business and data analytics skills
Uncover insights and provide value to your organization
Put your knowledge to use immediately
REGISTER TODAY passbaconference.com


What do Predictive Modelers do? The CRISP-DM Process Model
CRoss-Industry Standard Process Model for Data Mining
Describes Components of Complete Data Mining Cycle from the Project Manager's Perspective
Shows Iterative Nature of Data Mining
[Figure: CRISP-DM cycle with Data at the center, linking Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment]

CRISP-DM: Business Understanding
Steps:
  Ask Relevant Business Questions
  Determine Data Requirements to Answer Business Question
  Translate Business Question into Appropriate Data Mining Approach
  Determine Project Plan for Data Mining Approach
Tasks and outputs:
  Define Business Objectives: Background; Business Objectives; Business Success Criteria
  Assess Situation: Inventory of Resources; Requirements, Assumptions, Constraints; Risks and Contingencies; Terminology; Costs and Benefits
  Determine Data Mining Objectives: Data Mining Goals; Data Mining Success Criteria
  Produce Project Plan: Project Plan; Initial Assessment of Tools & Techniques

Objectives
Business objective: A random test mailing to the NRA's house file achieved an 11% response rate. To be profitable, the model must find a population with a minimum response rate of 13.5%.
Modeling objective: Develop a binary outcome model that rank-orders the current database by propensity to respond to a traditional mailing, optimizing at a cumulative average response rate of >= 13.5%.
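
The modeling objective above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the toy data are assumptions, and the toy cutoff of 25% stands in for the 13.5% threshold so that the small example has a visible answer. Records are ranked by model score, the cumulative response rate is computed at each mailing depth, and the deepest depth that still meets the minimum rate is reported.

```python
from typing import List, Optional


def cumulative_response_rates(scores: List[float], responded: List[int]) -> List[float]:
    """Cumulative response rate at each mailing depth, best-scored records first."""
    # Rank records from highest to lowest model score (propensity to respond).
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    rates = []
    responders = 0
    for depth, (_, outcome) in enumerate(ranked, start=1):
        responders += outcome
        rates.append(responders / depth)
    return rates


def deepest_profitable_depth(rates: List[float], minimum_rate: float = 0.135) -> Optional[int]:
    """Largest number of top-ranked records whose cumulative rate is >= minimum_rate."""
    best = None
    for depth, rate in enumerate(rates, start=1):
        if rate >= minimum_rate:
            best = depth
    return best


# Toy data, invented for illustration: 10 scored records, 1 = responded.
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

toy_rates = cumulative_response_rates(toy_scores, toy_outcomes)
toy_depth = deepest_profitable_depth(toy_rates, minimum_rate=0.25)
print(toy_depth)  # 8
```

With the default minimum_rate of 0.135, the same function would return how deep into a ranked validation file one could mail while keeping the cumulative response rate at or above 13.5%.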

CRISP-DM Step 2: Data Understanding
Steps:
  Collect initial data
    Internal data: historical customer behavior, results from previous experiments
    External data: demographics & census, other studies and government research
    Extract superset of data (rows and columns) to be used in modeling
    Identify form of data repository: multiple vs. single table, flat file vs. database, local copy vs. data mart
  Perform Preliminary Analysis
    Characterize Data (describe, explore, verify)
    Condition Data
Tasks and outputs:
  Collect Initial Data: Initial Data Collection Report
  Describe Data: Data Description Report
  Explore Data: Data Exploration Report
  Verify Data Quality: Data Quality Report

Source Data
Business partner provided data that summarizes transactional data for every active NRA member: 49 independent variables.
TN Marketing enhanced the database with demographic data: 18 appended variables.
I-Miner was used to derive new variable features and transformations of pre-existing data points: 79 derived variables.

CRISP-DM Step 3: Data Preparation (Conditioning)
Steps:
  Fix Data Problems
  Create Features
Tasks and outputs:
  Select Data: Rationale for Inclusion/Exclusion
  Clean Data: Data Cleaning Report
  Construct Data: Derived Attributes; Generated Records
  Integrate Data: Merged Data
  Format Data: Reformatted Data

Data Preparation
Key transformations:
  Date Features
  Filling missing data
    Use Distribution when possible for numeric fields
    Use Constant for categoricals
    For numeric data with both in-house and third-party versions, use in-house when available, and if not, use third party
  Binning and Binarization
    Reduce the number of values in nominal variables that have many poorly populated values
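
A rough sketch of these transformations in plain Python follows. All function names and sample values are hypothetical. In particular, "Use Distribution" is interpreted here as drawing a replacement at random from the observed values of the field, which is an assumption about what the slide means rather than a confirmed description of the project.

```python
import random
from collections import Counter
from typing import List, Optional


def fill_numeric_from_distribution(values: List[Optional[float]],
                                   rng: random.Random) -> List[Optional[float]]:
    """Replace None with a random draw from the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to draw from; leave the gaps in place
    return [v if v is not None else rng.choice(observed) for v in values]


def fill_categorical_with_constant(values: List[Optional[str]],
                                   constant: str = "MISSING") -> List[str]:
    """Replace None in a categorical field with a single constant label."""
    return [v if v is not None else constant for v in values]


def prefer_in_house(in_house: List[Optional[float]],
                    third_party: List[Optional[float]]) -> List[Optional[float]]:
    """Use the in-house value when available, otherwise the third-party value."""
    return [a if a is not None else b for a, b in zip(in_house, third_party)]


def group_rare_levels(values: List[str], min_count: int,
                      other: str = "OTHER") -> List[str]:
    """Collapse poorly populated nominal values into one catch-all level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Toy values, invented for illustration.
rng = random.Random(0)
print(fill_numeric_from_distribution([34, None, 51, 42], rng))
print(fill_categorical_with_constant(["M", None, "F"]))
print(prefer_in_house([50000, None, None], [48000, 61000, None]))
print(group_rare_levels(["CA", "CA", "TX", "WY", "VT"], min_count=2))
```

Drawing from the observed values keeps the shape of the field's distribution, whereas filling with a single mean or median would create a spike at one value.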

Data Size
[Figure: data size at three stages: original data; data after data cleanup and feature creation; data after further cleanup and adding interaction terms]

CRISP-DM Step 4: Modeling
Steps:
  Algorithm Selection
  Sampling
  Algorithms
  Model Ranking
Tasks and outputs:
  Select Modeling Techniques: Modeling Techniques; Modeling Assumptions
  Generate Test Design: Test Design
  Build Model: Parameter Settings; Models; Model Description
  Assess Model: Model Assessment; Revised Parameter Settings

Sampling
Randomly split the 21,557 records into two data sets, training and validation
Build response model on training data set: 10,778 records
Validate model by scoring validation data set: 10,779 records
Ideally, have a third, held-out data set to provide a final assessment of the models
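
The split described above might look like this in Python. The function name, the seed, and the use of integer record IDs are assumptions for illustration; only the record counts come from the slide.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(records: Sequence[int], seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle the records reproducibly, then cut them into two halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2  # the extra record of an odd total goes to validation
    return shuffled[:midpoint], shuffled[midpoint:]


# Stand-in for the 21,557 modeling records: one integer ID per record.
record_ids = list(range(21_557))
training, validation = split_train_validation(record_ids)
print(len(training), len(validation))  # 10778 10779
```

Fixing the seed makes the split repeatable, so models built at different times are compared on the same validation records. A third held-out set could be carved out the same way by cutting the shuffled list at two points instead of one.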

Classifiers Find Different Decision Boundaries
[Figure: panels showing Actual Data and the decision boundaries found by 11-Nearest Neighbor, Neural Network, Naïve Bayes, Logistic Regression, and Decision Tree]

Assess Models: ROC Curves

CRISP-DM Step 6: Deployment
Steps:
  How to deploy model? Software, source code, in database
  How often, when to update model
  Report results
  Lessons learned
Tasks and outputs:
  Plan Deployment: Deployment Plan
  Plan Monitoring and Maintenance: Monitoring & Maintenance Plan
  Produce Final Report: Final Report; Final Presentation
  Review Project: Experience Documentation

Model Results after Deployment
Scored over 2,100,000 prospects
Actual results from the rollout: average response rate = 13.67%
Significant gross revenue generated for business partner

What do We Call What We Do?

What is Predictive Analytics? Simple Definitions
Data-driven analysis for [large] data sets
  Data-driven to discover input combinations
  Data-driven to validate models
OR
Discovering interesting patterns in data automatically from the data
  Input variables are selected automatically
  Input combinations are discovered automatically

Customer Analytics: BI vs. PA
Customer Analytics: Business Intelligence
  What were the e-mail open, click-through, and response rates?
  Which regions/states/zips had the highest response rates?
  Which products had the highest/lowest click-through rates?
  How many repeat purchasers were there last month?
  How many new subscriptions to the loyalty program were there?
  What is the average spend of those who belong to the loyalty program? Those who aren't a part of the loyalty program? Is this a significant difference?
  How many visits to the store/website did a person have?
Customer Analytics for Predictive Analytics
  What is the likelihood an e-mail will be opened?
  What is the likelihood a customer will click through a link in an e-mail?
  Which product is a customer most likely to purchase if given the choice?
  How many e-mails should the customer receive to maximize the likelihood of a purchase?
  What is the best product to up-sell to the customer after they purchase a product?
  What is the visit volume expected on the website next week?
  What is the likelihood a product will sell out if it is put on sale?
  What is the estimated customer lifetime value (CLV) of each customer?

Predictive Analytics vs. Data Science
Predictive Analytics and Data Mining have always covered the same ground except for:
  Big data-centricity
    Advanced database technology (to handle big data)
      Hadoop
      Other NoSQL (MongoDB, Cassandra)
  Programming language-centricity (not listed)
    R, Python

What Degree Does it Take to Be a Predictive Modeler?
Highest Degree: 7 PhDs, 1 Masters, 2 Bachelors
You don't need an advanced degree to be a great practitioner!
Max. Degree        Count
Math               2
Computer Science   2
Social Science     2
Statistics         1
Economics          1
Machine Learning   1
Engineering        1
http://www.deep-data-mining.com/2013/05/the-10-most-influential-peoplein-data-analytics.html

Questions?


Like What You Heard? Dean will be presenting at BAC 2015!
Pre-Conference (full day): An Overview of Predictive Analytics for Practitioners
Breakout Sessions (60 mins):
  Starting Your First Predictive Analytics Project
  What Skills Do Predictive Modelers Need?

