Measuring the propensity to purchase: creating and interpreting the gain chart. Ricco RAKOTOMALALA

Tutoriels Tanagra - http://tutoriels-data-mining.blogspot.fr/

Customer targeting process

Goal: promoting a new product to customers.
- Direct marketing: seek out the most receptive customers (responders, buyers).
- The budget is limited.
- Do not solicit hostile customers.

Tools:
- A customer database.
- A target variable that identifies the buyers (positive individuals, +) and the non-buyers (negative individuals, -). This variable is not available initially.
- A learning method that assigns a score to each individual (a probability of being positive, i.e. a propensity to purchase).

Process: apply the score to the database, sort the individuals according to their propensity, and actually solicit the customers with the highest propensity.

Two evaluation criteria (the baseline is to select individuals at random):
- the rate of return: the proportion of + among the targeted individuals;
- the recall: the proportion of all + that are recovered, i.e. the market share.

Note: the approach can be applied to any domain where we want to target a subset of the population (screening campaigns in medicine, etc.).
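The two evaluation criteria can be computed directly from the responses of the targeted individuals. The sketch below assumes that responses are coded as booleans (True for +, False for -); the function names and the toy data are illustrative only.

```python
def rate_of_return(targeted):
    """Proportion of + among the targeted individuals."""
    return sum(targeted) / len(targeted)


def recall(targeted, total_positives):
    """Proportion of all + that fall inside the target (market share)."""
    return sum(targeted) / total_positives


# Toy example: 10 targeted customers, 3 of them buy; 5 buyers exist overall.
targeted = [True, False, True, False, False, True, False, False, False, False]
print(rate_of_return(targeted))  # 0.3
print(recall(targeted, 5))       # 0.6
```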

Overall outline

[Figure: extracts of the customer database (columns Title, Insurance, Children, Wages), of the test-mailing sample with its response column (+/-), and of the database scored and sorted by decreasing SCORE.]

- Customer database: 200,000 customers.
- 2,000 customers were solicited in a test mailing (random sample); 100 of them responded positively, i.e. 100 / 2,000 = 5% (baseline rate of return).
- These 2,000 customers are split into a train sample (1,000) and a test sample (1,000).
- Score function S(X): a binary classifier, learned on the train sample, that assigns a score to each individual.

Steps:
(1) Apply the score function to the database.
(2) Sort according to the score.
(3) Target the individuals with a high score.
(4) Evaluate the performance (expected buyers for a given number of solicited customers) with the gain chart.

Potential of buyers (+): 5% of 200,000 = 10,000 positive customers.
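Steps (1) to (3) reduce to scoring, sorting and cutting the sorted list. The sketch below assumes each customer is a Python dict; `score` is a hypothetical rule standing in for the classifier S(X) that would be learned on the train sample, and `select_target` is an illustrative name. It also recomputes the baseline figures of this slide.

```python
def score(customer):
    # Hypothetical rule standing in for the classifier S(X) learned on the train sample.
    return 0.9 if customer["insurance"] == "No" else 0.2


def select_target(database, target_size):
    # (1) apply the score function to every customer
    scored = [(score(c), c) for c in database]
    # (2) sort by decreasing score (stable sort: ties keep their database order)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # (3) keep the `target_size` customers with the highest scores
    return [c for _, c in scored[:target_size]]


# Baseline figures from the test mailing
baseline_rate = 100 / 2_000                        # 5% responded positively
potential_buyers = round(baseline_rate * 200_000)  # expected buyers in the database

database = [
    {"id": 1, "insurance": "Yes"},
    {"id": 2, "insurance": "No"},
    {"id": 3, "insurance": "No"},
]
print([c["id"] for c in select_target(database, 2)])  # [2, 3]
print(baseline_rate, potential_buyers)                 # 0.05 10000
```

Step (4) is what the gain chart covers: it tells how many buyers such a target is expected to contain.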

How to build the gain chart (also called the cumulative lift curve) from a labeled sample?

The individuals are sorted in descending order according to the score, from 1.000 for the first cases down to 0.000 for the last one. (The score is often an estimate of the probability of being positive, but it may be any value that reflects the propensity to be positive; only the rank order it induces matters for the gain chart.)

Example with N = 30 individuals, of which N(+) = 15 are positive:

i    Response   Target size (i/N)   Recall (TPR)
1    +          0.033               0.067
2    +          0.067               0.133
3    +          0.100               0.200
4    +          0.133               0.267
5    +          0.167               0.333
6    +          0.200               0.400
7    -          0.233               0.400
8    +          0.267               0.467
9    +          0.300               0.533
10   +          0.333               0.600
11   +          0.367               0.667
12   +          0.400               0.733
13   +          0.433               0.800
14   +          0.467               0.867
15   -          0.500               0.867
16   +          0.533               0.933
17   -          0.567               0.933
18   -          0.600               0.933
19   -          0.633               0.933
20   -          0.667               0.933
21   +          0.700               1.000
22   -          0.733               1.000
23   -          0.767               1.000
24   -          0.800               1.000
25   -          0.833               1.000
26   -          0.867               1.000
27   -          0.900               1.000
28   -          0.933               1.000
29   -          0.967               1.000
30   -          1.000               1.000

Relative cumulative number of cases (target size) = i / N
TPR (true positive rate, recall) = N(+ among the i first cases) / N(+)

[Figure: gain chart of this example; x-axis: relative target size; y-axis: true positive rate (recall).]
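The two formulas translate into a single pass over the sorted responses. This is a minimal sketch, assuming the responses are already sorted by decreasing score and coded as booleans; `gain_chart` is an illustrative name. The sequence below is the response column of the 30-case example.

```python
def gain_chart(responses):
    """Return the gain chart points (i/N, N(+ among the i first cases)/N(+)).

    responses: True for +, False for -, already sorted by decreasing score.
    The curve starts at (0, 0), before any case is targeted."""
    n = len(responses)
    n_pos = sum(responses)
    points = [(0.0, 0.0)]
    found = 0
    for i, is_pos in enumerate(responses, start=1):
        found += is_pos  # cumulative number of + among the i first cases
        points.append((i / n, found / n_pos))
    return points


# Responses of the 30-case example, in score order ("+" = positive).
example = [c == "+" for c in "++++++-+++++++-+----+---------"]
points = gain_chart(example)
for i in (6, 15, 21):
    print(i, round(points[i][0], 3), round(points[i][1], 3))
# 6 0.2 0.4
# 15 0.5 0.867
# 21 0.7 1.0
```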

How to interpret the gain chart on the test sample?

[Figure: gain chart on the test sample; x-axis: size of the target in %; y-axis: proportion of + recovered in %.]

- 1,000 cases in the test sample, of which 50 (5%) are positive: 100% of + = 50 cases, and 100% of the target = 1,000 cases.
- The dataset is sorted in descending order according to the score.
- Targeting (soliciting in priority the cases with a high score): with a target size of 50% (the 500 first cases of the sample), 80% of the + are recovered (40 positive cases).
- No targeting (selecting cases at random): with a target size of 50% (500 cases of the sample), 50% of the + are recovered (25 positive cases).
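Reading a point on the chart converts into counts by multiplying the recall by N(+); under random selection the recall simply equals the target size. A minimal sketch with the test-sample figures; `recovered_positives` is an illustrative name, and the 0.80 recall is the value read on the chart, not computed here.

```python
def recovered_positives(recall, n_pos):
    """Number of + recovered, given a recall (proportion of + recovered)."""
    return round(recall * n_pos)


n_test, n_pos_test = 1_000, 50     # test sample: 1,000 cases, 50 positives
target_fraction = 0.50             # solicit half of the sample
target_cases = round(target_fraction * n_test)

with_score = recovered_positives(0.80, n_pos_test)            # recall read on the gain chart
at_random = recovered_positives(target_fraction, n_pos_test)  # at random, recall = target size
print(target_cases, with_score, at_random)  # 500 40 25
```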

How to transpose the reading of the gain chart to the customer database?

[Figure: the same gain chart read on the customer database; x-axis: size of the target in %; y-axis: proportion of + recovered in %.]

- 200,000 cases in the customer database: 100% of the target = 200,000 cases.
- We do not know who the positives are, but we expect about 5% of them to be positive, i.e. about 10,000 cases (100% of + = 10,000 cases).
- The dataset is sorted in descending order according to the score.
- Targeting (soliciting in priority the cases with a high score): with a target size of 50% (the 100,000 first cases of the database), 80% of the + are recovered (8,000 positive cases).
- No targeting (selecting cases at random): with a target size of 50% (100,000 cases of the database), 50% of the + are recovered (5,000 positive cases).
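The transposition keeps the recall read on the test sample and rescales it by the expected number of buyers in the database. A minimal sketch, assuming (as the slide does) that the 5% positive rate and the recall observed on the test sample both carry over to the whole database; `expected_buyers` is an illustrative name.

```python
def expected_buyers(recall, positive_rate, db_size):
    """Expected number of buyers reached in the database.

    Assumes the positive rate and the recall measured on the test sample
    both hold for the whole customer database."""
    potential = positive_rate * db_size  # expected number of buyers overall
    return round(recall * potential)


db_size, positive_rate = 200_000, 0.05
print(expected_buyers(0.80, positive_rate, db_size))  # 8000: top 50% by score
print(expected_buyers(0.50, positive_rate, db_size))  # 5000: 50% chosen at random
```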

By fixing the target size (costs), how many positive instances (benefit) will be obtained?

[Figure: gain chart read at a target size of 20%.]

- We specify the budget of the campaign, e.g. 40,000 prospects, i.e. 40,000 mailings (20% of the database).
- With targeting, 38% of the + are recovered, i.e. 0.38 x 10,000 = 3,800 positive cases.
- At random, 20% of the + are recovered, i.e. 0.20 x 10,000 = 2,000 positive cases.
- We therefore obtain 1,800 additional buyers.

Conclusion:
- Rate of return: 3,800 / 40,000 = 9.5% (versus 5% if we select the customers at random).
- Market share: 3,800 / 10,000 = 38%; 6,200 buyers remain unsolicited.
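All the figures of this slide follow from the budget and the recall read on the chart at the corresponding target size. A minimal sketch; `campaign_summary` and its dictionary keys are illustrative names, and the 0.38 recall is the value read on the chart.

```python
def campaign_summary(budget, db_size, potential, recall):
    """Summarize a campaign whose size (budget) is fixed.

    recall: proportion of + recovered, read on the gain chart
    at the target size budget / db_size."""
    target_fraction = budget / db_size
    buyers = round(recall * potential)
    buyers_at_random = round(target_fraction * potential)  # at random, recall = target size
    return {
        "buyers": buyers,
        "additional_vs_random": buyers - buyers_at_random,
        "rate_of_return": buyers / budget,
        "market_share": buyers / potential,
        "unsolicited_buyers": potential - buyers,
    }


summary = campaign_summary(budget=40_000, db_size=200_000, potential=10_000, recall=0.38)
print(summary)
```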

By fixing the objective, how many customers must be solicited?

[Figure: gain chart read at a recall of 50%.]

- We specify the number of buyers we must obtain, e.g. 5,000 buyers, i.e. 50% of the potential buyers (5,000 / 10,000).
- We must send mailings to the 27% of customers with the highest scores, i.e. 0.27 x 200,000 = 54,000 individuals.
- At random, we would have to send 100,000 mailings to reach this objective.

Conclusion:
- We save 46,000 mailings.
- Rate of return: 5,000 / 54,000 ≈ 9.26% (versus 5% if we select the customers at random).
- Market share: 5,000 / 10,000 = 50%; this is given in this context.
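Here the chart is read the other way round: from a recall on the Y-axis to a target size on the X-axis. The sketch below assumes the curve is known only through a few points and interpolates linearly between them (an approximation); the points are the ones read on this document's charts (20% gives 38%, 27% gives 50%, 50% gives 80%) plus the two end points, and `required_target_fraction` is an illustrative name.

```python
def required_target_fraction(points, objective):
    """Smallest target fraction whose recall reaches `objective`.

    points: (target fraction, recall) pairs of the gain chart, sorted by
    target fraction; linear interpolation between consecutive points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 >= objective:
            if y1 == y0:
                return x0
            return x0 + (objective - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("objective not reachable")


# Points read on the gain charts of this document, plus the end points.
points = [(0.0, 0.0), (0.20, 0.38), (0.27, 0.50), (0.50, 0.80), (1.0, 1.0)]
db_size, objective_buyers, potential = 200_000, 5_000, 10_000

fraction = required_target_fraction(points, objective_buyers / potential)
mails = round(fraction * db_size)
mails_at_random = round(objective_buyers / potential * db_size)  # at random, recall = target size
print(mails, mails_at_random - mails, round(objective_buyers / mails, 4))  # 54000 46000 0.0926
```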

Conclusion: no targeting (selecting cases at random) versus perfect targeting (all the positives have a higher score than the negatives)

[Figure: gain chart with the two reference curves; x-axis: relative target size; y-axis: true positive rate (recall).]

- Perfect targeting: no negative individual has a higher score than a positive one. The curve reaches Y = 1 at X = N(+)/N.
- Targeting at random: the score is not effective and may be considered a random value; the curve is the diagonal (recall = target size).
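Both reference curves have a closed form, which makes it easy to draw them next to an actual gain chart. A minimal sketch using the 5% positive rate of the customer database; the function names are illustrative.

```python
def perfect_recall(target_fraction, positive_rate):
    """Recall of a perfect score: every + is ranked before every -.

    The curve rises linearly and reaches 1 at target_fraction = N(+)/N."""
    return min(1.0, target_fraction / positive_rate)


def random_recall(target_fraction):
    """Expected recall when cases are selected at random: the diagonal."""
    return target_fraction


positive_rate = 0.05  # N(+)/N in the customer database
for x in (0.01, 0.05, 0.20, 0.50):
    print(x, round(perfect_recall(x, positive_rate), 3), random_recall(x))
```

Any useful score produces a curve lying between these two.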
