Measuring the propensity to purchase
Creating and interpreting the gain chart
Ricco RAKOTOMALALA
Tutoriels Tanagra - http://tutoriels-data-mining.blogspot.fr/

Customer targeting process: promoting a new product to customers

Goal: promote a new product through direct marketing.
- Seek the most receptive customers (responders, buyers).
- The budget is limited.
- Do not solicit hostile customers.

Tools:
- A customer database.
- A target variable that identifies the buyers (positive individuals, +) and the non-buyers (negative individuals, -). This variable is not available initially.
- A learning method that assigns a score (a probability of being positive, a propensity to purchase) to each individual.
- Applying the score to the database, then sorting the individuals by propensity.
- Soliciting only the customers with a high propensity.

Two evaluation criteria (the baseline is to select individuals at random):
- The rate of return: the proportion of + among the targeted individuals.
- The recall: the proportion of all + that are recovered, i.e. the market share.

Note: the approach applies to any domain where we want to target a subset of a population (screening campaigns in medicine, etc.).

Overall outline

Customer database: 220,000 customers.
- 20,000 customers are solicited in a test mailing (random sample). 1,000 of them respond positively: 1,000 / 20,000 = 5% (baseline rate of return). These labeled customers are split into a train sample (10,000) and a test sample (10,000).
- The 200,000 remaining customers have no response label.

[Tables: extracts of the customer database (columns Title, Insurance, Children, Wages), of the labeled test-mailing sample (same columns plus Response, + or -), and of the scored database sorted by decreasing Score.]

Score function: a binary classifier that assigns a score to each individual.
Gain chart: evaluates the performance of the targeting.

(1) Apply the score function to the database.
(2) Sort according to the score.
(3) Target the individuals with a high score.
(4) Evaluate the performance (expected buyers for a given number of solicited customers) with the gain chart.

Potential buyers (+): 5% of 200,000 = 10,000 positive customers.

[Figure: gain chart, both axes graduated from 0 to 100.]
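Steps (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the `select_targets` helper, and the `toy_score` function with its coefficients are invented for the example. They are not the classifier used in the slides; in practice the score would come from a model learned on labeled data.

```python
def select_targets(customers, score_fn, n_targets):
    """Score every customer, sort by descending score, keep the top n_targets."""
    scored = [(score_fn(c), c) for c in customers]        # (1) apply the score function
    scored.sort(key=lambda pair: pair[0], reverse=True)   # (2) sort by descending score
    return scored[:n_targets]                             # (3) target the highest scores


# Hypothetical records; the field names mirror some of the slide's columns.
customers = [
    {"id": 1, "insurance": "No", "children": 2},
    {"id": 2, "insurance": "Yes", "children": 0},
    {"id": 3, "insurance": "No", "children": 0},
    {"id": 4, "insurance": "Yes", "children": 3},
]


def toy_score(customer):
    """Hypothetical propensity in [0, 1]; NOT a fitted model."""
    base = 0.2 if customer["insurance"] == "Yes" else 0.6
    return min(1.0, base + 0.1 * customer["children"])


targets = select_targets(customers, toy_score, n_targets=2)
target_ids = [c["id"] for _, c in targets]
print(target_ids)
```

Step (4), the evaluation, needs labeled cases and is the subject of the following slides.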

How to build the gain chart (also called the cumulative lift curve) from a labeled sample?

Sort the cases in descending order of score. The score is often an estimate of the probability of being positive, but it may be any value that reflects the propensity to be positive.

N = 30, N(+) = 15
Target size (relative cumulative number of cases) = i / N
Recall, or TPR (true positive rate) = N(+ among the first i cases) / N(+)

   i   Response   Score   Target size   Recall (TPR)
   0                         0.000         0.000
   1   positive   1.000      0.033         0.067
   2   positive   1.000      0.067         0.133
   3   positive   0.999      0.100         0.200
   4   positive   0.999      0.133         0.267
   5   positive   0.998      0.167         0.333
   6   positive   0.992      0.200         0.400
   7   negative   0.987      0.233         0.400
   8   positive   0.987      0.267         0.467
   9   positive   0.974      0.300         0.533
  10   positive   0.969      0.333         0.600
  11   positive   0.953      0.367         0.667
  12   positive   0.952      0.400         0.733
  13   positive   0.942      0.433         0.800
  14   positive   0.825      0.467         0.867
  15   negative   0.772      0.500         0.867
  16   positive   .59        0.533         0.933
  17   negative   .57        0.567         0.933
  18   negative   .37        0.600         0.933
  19   negative   0.294      0.633         0.933
  20   negative   .19        0.667         0.933
  21   positive   0.073      0.700         1.000
  22   negative   0.035      0.733         1.000
  23   negative   0.024      0.767         1.000
  24   negative   0.016      0.800         1.000
  25   negative   0.015      0.833         1.000
  26   negative   0.009      0.867         1.000
  27   negative   0.004      0.900         1.000
  28   negative   0.003      0.933         1.000
  29   negative   0.002      0.967         1.000
  30   negative   0.000      1.000         1.000

[Figure: gain chart. X-axis: relative target size; Y-axis: true positive rate (recall); both from 0 to 1.]
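The construction can be sketched in Python as follows. The `gain_chart` function name is an assumption, not something from the slides. The labels reproduce the 30-case example in its sorted order (15 positives); the scores are placeholder ranks chosen only to preserve that order, not the slide's actual score values.

```python
def gain_chart(labels, scores, positive="+"):
    """Return the gain chart as a list of (target size, recall) points."""
    # Sort case indices by descending score; sorted() is stable, so ties
    # keep their input order.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    n = len(labels)
    n_pos = sum(1 for y in labels if y == positive)
    points = [(0.0, 0.0)]            # empty target: nothing recovered
    found = 0
    for i, k in enumerate(order, start=1):
        if labels[k] == positive:
            found += 1
        # target size = i / N, recall = N(+ among the first i) / N(+)
        points.append((i / n, found / n_pos))
    return points


# Labels of the 30-case example, already in descending score order.
labels = list("++++++" + "-" + "+++++++" + "-" + "+" + "----" + "+" + "---------")
# Placeholder scores (assumption): any strictly decreasing values would do.
scores = [30 - i for i in range(30)]

points = gain_chart(labels, scores)
for size, recall in points[:8]:
    print(f"{size:.3f}  {recall:.3f}")
```

Plotting the points with target size on the X-axis and recall on the Y-axis gives the gain chart.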

How to interpret the gain chart on the test sample?

10,000 cases in the test sample; 500 (5%) are positive. The dataset is sorted in descending order of score.

[Figure: gain chart. X-axis: size of the target in %, where 100% of the target = 10,000 cases. Y-axis: proportion of + recovered in %, where 100% of + = 500 cases.]

- Targeting: solicit first the cases with a high score. Target size = 50% (the first 5,000 cases of the sample): 80% of + are recovered (400 positive cases).
- No targeting: select cases at random. Target size = 50% (5,000 cases of the sample): 50% of + are recovered (250 positive cases).

How to transpose the reading of the gain chart to the customer database?

200,000 cases in the customer database. We do not know which ones are positive, but we expect about 5% of them to be, i.e. about 10,000 cases. The dataset is sorted in descending order of score.

[Figure: gain chart. X-axis: size of the target in %, where 100% of the target = 200,000 cases. Y-axis: proportion of + recovered in %, where 100% of + = 10,000 cases.]

- Targeting: solicit first the cases with a high score. Target size = 50% (the first 100,000 cases of the database): 80% of + are recovered (8,000 positive cases).
- No targeting: select cases at random. Target size = 50% (100,000 cases of the database): 50% of + are recovered (5,000 positive cases).
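The same reading rule applies to the test sample and to the customer database: only the number of cases changes. A minimal sketch, assuming the figures of the example (10,000 test cases, 200,000 customers, 5% of positives, 80% recall at a 50% target size); the `expected_buyers` helper is a hypothetical name introduced for the illustration.

```python
def expected_buyers(n_cases, positive_rate, target_size, recall):
    """Translate one point of the gain chart into numbers of cases."""
    n_pos = n_cases * positive_rate              # expected number of positives
    return {
        "solicited": target_size * n_cases,      # cases contacted
        "with_targeting": recall * n_pos,        # positives recovered using the score
        "at_random": target_size * n_pos,        # positives recovered with no targeting
    }


# Point read on the gain chart: target size 50%, recall 80%.
test_sample = expected_buyers(10_000, 0.05, target_size=0.50, recall=0.80)
database = expected_buyers(200_000, 0.05, target_size=0.50, recall=0.80)

print(test_sample)
print(database)
```

This transposition relies on the assumption that the test sample is representative of the database, so that the proportion of positives and the shape of the curve carry over.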

With a fixed target size (cost), how many positive instances (benefit) will be obtained?

We specify the budget of the campaign, e.g. 40,000 prospects, i.e. 20% of the database.

[Figure: gain chart read at a target size of 20%. Budget: 40,000 mailings.]

- With targeting, 38% of + are recovered, i.e. 0.38 x 10,000 = 3,800 +.
- At random, 20% of + are recovered, i.e. 0.2 x 10,000 = 2,000 +.
- Targeting therefore finds 1,800 additional buyers.

Conclusion:
- Rate of return: 3,800 / 40,000 = 9.5%, against 5% if we select the customers at random.
- Market share: 3,800 / 10,000 = 38%; 6,200 buyers remain unsolicited.
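The arithmetic of this slide can be checked with a short sketch. It assumes the figures of the example (200,000 customers, 5% of positives, 38% recall read on the gain chart at a 20% target size); the `campaign_for_budget` name is hypothetical.

```python
def campaign_for_budget(budget, db_size, positive_rate, recall_at_budget):
    """Expected outcome of a campaign limited to `budget` mailings."""
    n_pos = db_size * positive_rate                   # expected buyers in the database
    buyers = recall_at_budget * n_pos                 # buyers reached with targeting
    buyers_random = (budget / db_size) * n_pos        # buyers reached at random
    return {
        "buyers": buyers,
        "additional_buyers": buyers - buyers_random,
        "rate_of_return": buyers / budget,
        "market_share": buyers / n_pos,
        "unsolicited_buyers": n_pos - buyers,
    }


# Recall of 0.38 is the value read on the gain chart for a 20% target size.
result = campaign_for_budget(40_000, 200_000, 0.05, recall_at_budget=0.38)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```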

With a fixed objective, how many customers must be solicited?

We specify the number of buyers we must obtain, e.g. 5,000 buyers, i.e. 50% of the potential buyers (5,000 / 10,000).

[Figure: gain chart read at a recall of 50%.]

- With targeting, we must mail the 27% of customers with the highest scores, i.e. 0.27 x 200,000 = 54,000 individuals.
- At random, we would need 100,000 mails to reach the same objective.

Conclusion:
- We save 46,000 mails.
- Rate of return: 5,000 / 54,000 = 9.26%, against 5% if we select the customers at random.
- Market share: 5,000 / 10,000 = 50%; this is a given in this context.
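Reading the chart "from the Y-axis" amounts to inverting the curve. A minimal sketch, assuming the curve is known only through the few points quoted in the example, (20%, 38%), (27%, 50%) and (50%, 80%), plus the two end points, and that linear interpolation between them is acceptable. The `target_size_for_recall` helper is a hypothetical name.

```python
def target_size_for_recall(points, recall):
    """Smallest target size reaching `recall`, by linear interpolation.

    `points` is a list of (target size, recall) pairs sorted by target size,
    with non-decreasing recall.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= recall <= y1:
            if y1 == y0:                      # flat segment: its left end is enough
                return x0
            return x0 + (recall - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("recall must lie between 0 and 1")


# Points of the gain chart quoted in the example (assumed exact readings).
curve = [(0.0, 0.0), (0.20, 0.38), (0.27, 0.50), (0.50, 0.80), (1.0, 1.0)]

db_size, n_pos, objective = 200_000, 10_000, 5_000
recall_needed = objective / n_pos                     # 0.5
size = target_size_for_recall(curve, recall_needed)

mails_targeted = round(size * db_size)
mails_random = round(recall_needed * db_size)         # random curve: recall = target size
mails_saved = mails_random - mails_targeted
rate_of_return = objective / mails_targeted

print(mails_targeted, mails_random, mails_saved, round(rate_of_return, 4))
```

With a gain chart computed on a full test sample, the same function can be applied to all of its points rather than to a handful of readings.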

Conclusion: no targeting (selecting cases at random) and perfect targeting (all the positives have a higher score than the negatives)

[Figure: gain chart with the two reference curves. X-axis: relative target size; Y-axis: true positive rate (recall); both from 0 to 1.]

- Perfect targeting: no negative individual has a higher score than a positive one. The curve reaches Y = 1 at X = N(+)/N.
- Targeting at random: the score is not informative and may be considered a random value. The curve is the diagonal, where the recall equals the target size.
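The two reference curves have simple closed forms, sketched below. The function names are hypothetical; `positive_rate` stands for N(+)/N, which is 15/30 = 0.5 in the 30-case example.

```python
def perfect_recall(target_size, positive_rate):
    """Recall under perfect targeting: every solicited case is positive
    until all positives are found, at target size N(+)/N."""
    return min(1.0, target_size / positive_rate)


def random_recall(target_size):
    """Recall with no targeting: the diagonal of the gain chart."""
    return target_size


positive_rate = 15 / 30
for size in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(size, perfect_recall(size, positive_rate), random_recall(size))
```

The gain chart of a real score function lies between these two curves; the closer it is to the perfect curve, the better the targeting.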