A Basic Guide to Modeling Techniques for All Direct Marketing Challenges
  Allison Cornia, Database Marketing Manager, Microsoft Corporation
  C. Olivia Rud, Executive Vice President, Data Square, LLC
Overview
  Types and Uses of Models
    Descriptive
      Segmentation
      Profiling
    Predictive
      Regression
      Trees
      Neural Networks
      Genetic Algorithms
      Association Rules
      Latent Class Variable
  Implementing Models
  Why Good Models Fail
    Scoring errors
    Backend failures
Segmentation Analysis
  Segmentation analysis groups customers with like characteristics.
  Can be market driven: the analyst determines the segments.
  Can be data driven: the data determine the segments (clustering).
                 MALE                  FEMALE
  Age      M    S    D    W      M    S    D    W
  < 40    55   58   54   45     35   52   46   31
  40-49   57   60   58   55     40   54   49   37
  50-59   61   63   60   58     46   55   51   42
  60+     58   60   61   55     44   46   36   27
Profiling: Credit Card Customers
  Axes: Risk (High to Low) and Revenue (Low to High)
  Segment             Avg Balance  Avg APR  Avg Tenure  Avg Charge-off  Avg Profits
  Low Potential       $1,089       12.3%    2.4 years   $111            $8
  Good Potential      $549         8.4%     1.2 years   $29             $33
  Cautious Potential  $5,315       15.8%    2.8 years   $584            $239
  Best Customer       $3,288       13.7%    3.7 years   $102            $440
Profiling: Credit Card Customers
  Axes: Risk (High to Low) and Revenue (Low to High)
  Low Potential
    Charge Annual Fee
    Increase APR
    Low Priority Service
    No Solicitations
  Good Potential
    Good Customer Service
    Decrease APR
    Offer Balance Transfers
    Offer Cardholder Benefits
  Cautious Potential
    Charge Annual Fee
    Increase APR
    Monitor Payment Behavior
    Offer Secured Loan
  Best Customer
    Best Customer Service
    No Annual Fee
    Automatic Line Increase
    Offer Cardholder Benefits
Association Rules
  Rules are derived from past behavior, such as movement on a Website or purchase groupings.
  Used to enhance Website structure and modify Web traffic.
  Used to make real-time targeted offers.
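To make "rules derived from purchase groupings" concrete, here is a minimal sketch that scores pairwise rules by support, confidence, and lift. The baskets, item names, the `pair_rules` helper, and the 0.4 support threshold are all invented for illustration; they are not from the presentation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, invented for illustration only.
baskets = [
    {"printer", "paper", "ink"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"printer", "paper", "ink", "pens"},
    {"ink", "paper"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for rules over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. baseline P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A rule with lift above 1 means the right-hand item is bought more often alongside the left-hand item than on its own, which is the kind of signal a real-time offer could use.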
Linear Regression
  Uses continuous values to predict a continuous value.
  Explains variation in data using ordinary least squares (OLS).
  Useful in predicting:
    amount of sale ~ advertising, cost, demographics
    charge-off dollars ~ balance, financial risk profile, demographics
    amount of claim ~ age, health risk profile, geography
    dollar balance ~ financial risk profile, action to account, market pressure
    average profitability ~ financial risk profile, price sensitivity, demographics
Simple Linear Regression
  Advertising   Sales
  $120          $1,503
  $160          $1,755
  $205          $2,971
  $210          $1,682
  $225          $3,497
  $230          $1,998
  $290          $4,528
  $315          $2,937
  $375          $3,622
  $390          $4,402
  $440          $3,844
  $475          $4,470
  $490          $5,492
  $550          $4,398
  [Scatter plot: Sales ($0 to $6K) against Advertising ($0 to $600)]
Simple Linear Regression
  Goal: characterize the relationship between advertising and sales.
  Result: an equation that predicts sales dollars based on advertising dollars spent.
  Method: minimize squared error.
  Sales = B_0 + B_1 × Advertising
  [Plot: fitted line through the scatter of Sales ($0 to $6K) against Advertising ($0 to $600)]
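As an illustration of fitting Sales = B_0 + B_1 × Advertising, the sketch below applies the closed-form least-squares formulas to the 14 advertising and sales pairs from the data slide. The function name `fit_simple_ols` and the $400 example budget are assumptions made for this example.

```python
# Advertising and sales figures from the Simple Linear Regression data slide.
advertising = [120, 160, 205, 210, 225, 230, 290, 315, 375, 390, 440, 475, 490, 550]
sales = [1503, 1755, 2971, 1682, 3497, 1998, 4528, 2937, 3622, 4402, 3844, 4470, 5492, 4398]

def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared errors of y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope: sales dollars per advertising dollar
    b0 = mean_y - b1 * mean_x    # intercept: line passes through the means
    return b0, b1

b0, b1 = fit_simple_ols(advertising, sales)
print(f"Sales = {b0:.2f} + {b1:.4f} * Advertising")

# Predicted sales for a hypothetical $400 advertising spend.
print(f"Predicted sales at $400: {b0 + b1 * 400:.0f}")
```

A useful sanity check on any least-squares fit with an intercept is that the residuals sum to zero.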
Multiple Linear Regression
  Minimizes squared error in N-dimensional space.
  Credit card balances ~ payment amount, years, gender (0/1)
  Balances = 2.1774 + 0.0966 × Payment + 1.2494 × Months + 0.4412 × Gender
Logistic Regression
  Uses continuous values to predict the probability of a discrete outcome.
  Estimates coefficients iteratively using the method of maximum likelihood.
  Useful in predicting probability of:
    response to loan offer ~ financial risk profile, demographics
    response to insurance offer ~ health risk profile, demographics
    activation ~ financial risk profile, demographics, market pressure
    charge-off ~ balance, financial risk profile, demographics
    claim ~ health risk profile, demographics
    fraud ~ financial risk profile, account activity
    account closure ~ account activity, market pressure
Logistic Regression
  Predicts the probability of an event occurring using a function of linear predictors.
  p = probability of the event occurring
  p/(1-p) = odds of the event occurring
  Log of the odds: log(p/(1-p)) is a linear function of the predictors.
  Uses an S-shaped curve (bounded by 0 and 1) instead of a linear function to fit the data.
  log(p/(1-p)) = B_0 + B_1 X_1 + B_2 X_2 + ... + B_n X_n
  p = 1 / (1 + e^(-(B_0 + B_1 X_1 + B_2 X_2 + ... + B_n X_n)))
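The two logistic equations are inverses of each other, which a short sketch can show. The coefficient and predictor values below are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """p = 1 / (1 + e^-(B_0 + B_1 X_1 + ... + B_n X_n))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

def log_odds(p):
    """log(p / (1 - p)), the quantity that is linear in the predictors."""
    return math.log(p / (1.0 - p))

# Hypothetical model: B_0 = -3.0, B_1 = 0.8, B_2 = 0.05 (not from the slides).
p = logistic_probability(-3.0, [0.8, 0.05], [1.0, 10.0])
print(f"probability = {p:.4f}")
print(f"log odds    = {log_odds(p):.4f}")  # recovers -3.0 + 0.8 + 0.5 = -1.7
```

However large or small the linear predictor becomes, the resulting probability stays strictly between 0 and 1, which is why the S-shaped curve is preferred over a straight line here.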
Classification Trees
  Mailed: 10,000, Resp Rate 2.6%
    Male: 4,677, Resp Rate 3.2%
      <$30K: 1,290, Resp Rate 1.7%
      $30K-$45K: 2,106, Resp Rate 3.6%
      >$45K: 1,281, Resp Rate 4.1%
    Female: 5,323, Resp Rate 2.1%
      Age < 40: 3,112, Resp Rate 0.7%
      Age >= 40: 2,211, Resp Rate 4.3%
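A quick way to read the tree is to roll the leaf segments back up to their parent nodes. The sketch below assumes the income splits sit under Male and the age splits under Female, which is what the mailed counts imply (1,290 + 2,106 + 1,281 = 4,677 and 3,112 + 2,211 = 5,323). The dictionary layout and the `rollup` helper are illustrative choices.

```python
# Leaf segments from the classification tree: (number mailed, response rate).
tree = {
    "Male": {
        "<$30K": (1290, 0.017),
        "$30K-$45K": (2106, 0.036),
        ">$45K": (1281, 0.041),
    },
    "Female": {
        "Age < 40": (3112, 0.007),
        "Age >= 40": (2211, 0.043),
    },
}

def rollup(leaves):
    """Return (total mailed, weighted response rate) for a set of leaves."""
    mailed = sum(n for n, _ in leaves.values())
    responders = sum(n * rate for n, rate in leaves.values())
    return mailed, responders / mailed

for branch, leaves in tree.items():
    mailed, rate = rollup(leaves)
    print(f"{branch}: mailed={mailed:,} response rate={rate:.1%}")

# Rank leaves by response rate to see which segments to mail first.
ranked = sorted(
    ((branch, leaf, rate) for branch, leaves in tree.items()
     for leaf, (_, rate) in leaves.items()),
    key=lambda item: item[2],
    reverse=True,
)
print("Best segment:", ranked[0])
```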
Decision Trees
  Decision node: Issue Loan?
    Yes (chance node)
      97% x $728 (Interest) = $706 profit
      3% x $4,872 (Loss) = ($146) profit
    No: $0 profit
  Allows you to quantify the best action.
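The arithmetic behind the tree can be sketched as an expected-value comparison between the two branches of the decision node. The probabilities and dollar amounts are the ones on the slide; the function and variable names are illustrative.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over the branches of a chance node."""
    return sum(probability * payoff for probability, payoff in outcomes)

# Chance node for "Yes": 97% earn $728 in interest, 3% lose $4,872.
issue_loan = expected_value([(0.97, 728.0), (0.03, -4872.0)])
# "No" branch: no loan, no profit.
decline = 0.0

best_action = max(
    [("Issue loan", issue_loan), ("Decline", decline)],
    key=lambda action: action[1],
)
print(f"Expected profit if issued: ${issue_loan:.2f}")
print(f"Best action: {best_action[0]}")
```

The two branch values on the slide, $706 and ($146), net to an expected profit of about $560 per loan, so issuing the loan beats the $0 alternative.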
Neural Networks
  amount of sale ~ advertising, cost, demographics
  charge-off dollars ~ balance, financial risk profile, demographics
  amount of claim ~ age, health risk profile, geography
  dollar balance ~ financial risk profile, action to account, market pressure
  average profitability ~ financial risk profile, price sensitivity, demographics
Artificial Neural Networks
  Multiple hidden nodes
  Each node is a linear transformation of the output from the previous node.
  Structure is too complex to interpret weights.
  Layers
    Input layer
    Hidden layer
    Output layer
  Stopping rules
    Error threshold
    Time limit
    Change in error
Artificial Neural Networks
  Advantages
    Handles non-linearity
    Handles interactions
    Considered very accurate
    Useful for complex optimization
  Disadvantages
    Not interpretable
    CPU intensive
    Poor handling of missing data
    Sensitive to input variable selection
    Explodes categorical data
    Risk of over-fitting -> not robust
Genetic Algorithms
  Based on Darwin's Principle of Survival of the Fittest.
  Genetic Operators
    Reproduction (Copying)
    Mating (Crossover)
    Mutation (Altering)
  Process starts with an initial population of random models.
  Models with poor performance (fitness) die out; they are deleted.
Genetic Algorithms Methodology
  Fitness of the new population improves by:
    1. Copying good models.
    2. Mating good models to create better offspring models with improved fitness.
    3. Altering good models to create mutants with improved fitness.
    4. Repeating steps 1-3 until stopping rules are met.
  The best-evolved model is the solution.
Genetic Algorithms
  Models are composed of:
    Functions
      arithmetic (+, -, ×, ÷)
      mathematical (log, exp, max, ...)
      trigonometric (sin, cos, tan, arcsin, ...)
      logical (and, or, not, gt, lt, eq, ...)
      conditional (if-then-else)
    Variables
      independent variables
      numeric values (constants, random numbers)
GAs - Initialize Random Model
  Objective: predict response
  Let the function set consist of +, -, ×, ÷, exp
  Let the variable set consist of X1, X2, b
  [Pie chart: the five functions in equal 20% slices]
GAs - Initialize Random Model
  Models are displayed in trees.
  Repeat M times.
  [Figure: example tree for Response built from +, b and X1, beside a pie chart of eight equal 12.5% slices covering the functions and variables]
GAs - Generate M Models
  [Figure: two example model trees for Response]
  Y = b - exp(X1)
  Y = X1X2
GAs - Compare Fitness
  Model     Equation          Fitness Value (r-square)   PTF
  Model 1   Y = X1(b + X2)    0.61                       0.29
  Model 2   Y = b - exp(X1)   0.55                       0.26
  Model 3   Y = X1 - X2       0.48                       0.23
  Model 4   Y = X1X2          0.36                       0.17
  Model 5   Y = b + X1        0.12                       0.06
  Total                       2.12                       1.00
  [Pie chart of PTF: M1 29%, M2 26%, M3 23%, M4 17%, M5 6%]
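The PTF figures are consistent with each model's fitness divided by the total fitness of the population (for example, 0.61 / 2.12 = 0.29). The sketch below reproduces that calculation from the r-square values on the slide; reading PTF as a share of total fitness is an inference from the numbers, not a definition given in the presentation.

```python
# Fitness values (r-square) for the five models on the slide.
fitness = {
    "Model 1": 0.61,
    "Model 2": 0.55,
    "Model 3": 0.48,
    "Model 4": 0.36,
    "Model 5": 0.12,
}

total_fitness = sum(fitness.values())

# Each model's share of total fitness (assumed meaning of PTF).
ptf = {name: value / total_fitness for name, value in fitness.items()}

for name, share in ptf.items():
    print(f"{name}: fitness={fitness[name]:.2f} PTF={share:.2f}")
print(f"Total fitness: {total_fitness:.2f}")
```

Used as selection weights, these shares give Model 1 roughly five times the chance of being copied or mated that Model 5 has.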
Genetic Methodology
  Fitness improves by:
    Copying models based on PTF
    Mating models based on PTF
    Altering models based on PTF
  Continue the above until stopping rules are met.
  The best-evolved model is the solution.
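The copy, mate, and alter loop can be sketched in a few lines. This is a deliberately simplified illustration, not the method used in the presentation: models here are (b0, b1) coefficient pairs for a straight line rather than expression trees, fitness is 1 / (1 + mean squared error) rather than r-square, and the operator probabilities, population size, and fixed generation count used as the stopping rule are all assumed values.

```python
import random

def fitness(model, data):
    """Higher is better: 1 / (1 + mean squared error) of y = b0 + b1 * x."""
    b0, b1 = model
    mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data) / len(data)
    return 1.0 / (1.0 + mse)

def evolve(data, pop_size=30, generations=60, seed=0):
    """Evolve (b0, b1) models by copying, mating and altering."""
    rng = random.Random(seed)
    # Start with an initial population of random models.
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, data))
    for _ in range(generations):  # stopping rule: fixed number of generations
        scores = [fitness(m, data) for m in population]

        def pick():
            # Selection proportional to fitness, i.e. weighted by share of total.
            return rng.choices(population, weights=scores, k=1)[0]

        next_population = [best]  # always carry the best model forward
        while len(next_population) < pop_size:
            roll = rng.random()
            if roll < 0.3:    # copy a good model unchanged
                child = pick()
            elif roll < 0.8:  # mate: intercept from one parent, slope from another
                parent_a, parent_b = pick(), pick()
                child = (parent_a[0], parent_b[1])
            else:             # alter: small random change to a good model
                parent = pick()
                child = (parent[0] + rng.gauss(0, 0.5), parent[1] + rng.gauss(0, 0.5))
            next_population.append(child)
        population = next_population
        best = max(population, key=lambda m: fitness(m, data))
    return best

# Hypothetical data generated from y = 1 + 2x, for illustration only.
data = [(x, 1.0 + 2.0 * x) for x in range(10)]
best_model = evolve(data)
print("Best evolved model:", best_model, "fitness:", fitness(best_model, data))
```

Because the best model is always carried into the next generation, the best fitness in the population can never get worse from one generation to the next.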
Latent Class Models
  Used more in academic circles
  Software only allowed small data sets and a small number of variables
  LatentGOLD, developed by Statistical Innovations (Jay Magidson, inventor of CHAID)
    Scalable software
    Disparate sources of data
3 Kinds of Latent Class Models
  Traditional
    Applications in scaling and classification
  Factor
    Applications in exploratory and confirmatory factor analysis
  Regression
    Used for prediction and explanation when the population is not homogeneous
Traditional LCM vs. LC Factor
  Traditional Latent Class Models identify classes, which group together persons who share similar interests, values, characteristics, or behavior.
  Latent Class Factor Models identify factors, which group together variables sharing a common source of variation.
Implementing Models
  How do we select based on model results?
  What is the impact to the bottom line?
Gains Table
  Decile  Accounts  Actives  Predicted  Actual  Cum Actual  Lift  Cum Lift
  1       48,342    4,891    10.35%     10.12%  10.12%      3.57  3.57
  2       48,343    3,945    8.44%      8.16%   9.14%       2.88  3.22
  3       48,342    2,783    5.32%      5.76%   8.01%       2.03  2.83
  4       48,342    1,151    2.16%      2.38%   6.60%       0.84  2.33
  5       48,343    519      1.03%      1.07%   5.50%       0.38  1.94
  6       48,342    269      0.48%      0.56%   4.67%       0.20  1.65
  7       48,342    112      0.31%      0.23%   4.04%       0.08  1.43
  8       48,343    25       0.06%      0.05%   3.54%       0.02  1.25
  9       48,342    5        0.01%      0.01%   3.15%       0.00  1.11
  10      48,342    1        0.00%      0.00%   2.83%       0.00  1.00
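The lift columns can be recomputed from the counts alone. The sketch below treats the figure that follows the account total in each row as the number of active accounts (4,891 / 48,342 = 10.12% matches the actual rate in the first row) and defines lift as a group's rate divided by the overall rate. The function name and dictionary keys are illustrative.

```python
# Accounts and actives per decile, copied from the gains table.
accounts = [48342, 48343, 48342, 48342, 48343, 48342, 48342, 48343, 48342, 48342]
actives = [4891, 3945, 2783, 1151, 519, 269, 112, 25, 5, 1]

def gains_table(accounts, actives):
    """Return per-decile actual rate, cumulative rate, lift and cumulative lift."""
    overall_rate = sum(actives) / sum(accounts)
    rows = []
    cum_accounts = 0
    cum_actives = 0
    for decile, (n, k) in enumerate(zip(accounts, actives), start=1):
        cum_accounts += n
        cum_actives += k
        rate = k / n
        cum_rate = cum_actives / cum_accounts
        rows.append({
            "decile": decile,
            "actual": rate,
            "cum_actual": cum_rate,
            "lift": rate / overall_rate,          # decile rate vs. overall rate
            "cum_lift": cum_rate / overall_rate,  # top-N-deciles rate vs. overall
        })
    return rows

rows = gains_table(accounts, actives)
for row in rows:
    print(f"{row['decile']:>2}  actual={row['actual']:.2%}  "
          f"cum={row['cum_actual']:.2%}  lift={row['lift']:.2f}  "
          f"cum_lift={row['cum_lift']:.2f}")
```

Cumulative lift always ends at 1.00 in the last decile, because mailing everyone yields exactly the overall rate.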
Gains Chart
  [Chart: Percent Active (0% to 100%) against Percent Mailed (0% to 100%)]
Modeling Lifetime Value
  Predict probability of activation for a life insurance offer using logistic regression, neural networks, genetic algorithms.
  Use the probability to calculate Lifetime Value (LTV) for a life insurance prospect over a five-year period.
  LTV = Pr(Activation) × Risk × (Product Profitability + Cross Sales) × Lapse Indicator - Marketing Expense
    Activation: probability given by a model
    Risk: indices in a matrix of gender, marital status, and age group
    Product Profitability: present value of product-specific 5-year profit measure
    Cross Sales: additional net revenues for five years following activation
    Lapse Indicator: adjustment based on payment method
    Marketing Expense: cost of package, postage & processing
Lifetime Value
  LTV = Pr(Active) × Risk Index × (Cross Sell Profit + Product Profitability) × Lapse Indicator - Marketing Expense
  Decile  Number  Active Rate  Cross Sell  Risk Index  Lapse Indicator  Product Profitability  Average LTV  Average Cum LTV  Sum Cum LTV
  1       96,685  10.36%       $120        0.94        0.99             $553                   $64.76       $64.76           $6,261,266
  2       96,685  8.63%        $104        0.99        1.00             $553                   $55.35       $60.06           $11,612,984
  3       96,685  5.03%        $105        0.96        0.99             $553                   $30.99       $50.37           $14,609,591
  4       96,685  1.94%        $107        0.93        0.97             $553                   $11.13       $40.56           $15,685,475
  5       96,685  0.96%        $98         1.01        0.99             $553                   $5.53        $33.55           $16,220,346
  6       96,685  0.28%        $101        1.02        1.00             $553                   $1.09        $28.14           $16,325,522
  7       96,685  0.11%        $97         1.03        1.01             $553                   ($0.04)      $24.12           $16,321,311
  8       96,685  0.08%        $98         0.99        1.01             $553                   ($0.26)      $21.07           $16,295,747
  9       96,685  0.01%        $94         1.04        1.02             $553                   ($0.75)      $18.64           $16,223,586
  10      96,685  0.00%        $95         1.09        1.02             $553                   ($0.78)      $16.70           $16,148,199
  How many deciles do you mail?
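One way to answer the mailing-depth question is to accumulate total LTV decile by decile and stop where the running total peaks. The sketch below uses the average LTV figures and the 96,685 names per decile from the table. The $0.78 marketing expense in the usage line is an assumption, inferred from the last decile, where a 0.00% active rate leaves an average LTV of ($0.78); the function names are illustrative.

```python
def lifetime_value(p_active, risk, cross_sell, product_profit, lapse, marketing_expense):
    """LTV = Pr(Active) * Risk * (Cross Sell + Product Profit) * Lapse - Expense."""
    return p_active * risk * (cross_sell + product_profit) * lapse - marketing_expense

# Average LTV per decile and names per decile, copied from the table.
names_per_decile = 96685
average_ltv = [64.76, 55.35, 30.99, 11.13, 5.53, 1.09, -0.04, -0.26, -0.75, -0.78]

def best_mailing_depth(average_ltv, names_per_decile):
    """Return (number of deciles to mail, total LTV) maximizing cumulative LTV."""
    best_depth, best_total, running = 0, 0.0, 0.0
    for depth, ltv in enumerate(average_ltv, start=1):
        running += ltv * names_per_decile
        if running > best_total:
            best_depth, best_total = depth, running
    return best_depth, best_total

depth, total = best_mailing_depth(average_ltv, names_per_decile)
print(f"Mail the top {depth} deciles for a total LTV of about ${total:,.0f}")

# With no expected activations, LTV is just the (assumed) cost of the mail piece.
print(lifetime_value(0.0, 1.09, 95, 553, 1.02, 0.78))
```

On these figures the cumulative total peaks at decile 6, the last decile with a positive average LTV; every decile mailed beyond it reduces the total.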
Why Good Models Fail (Allison's Top Ten for Troubleshooting)
  1. Check the phones; make sure the site is functioning properly
  2. Track the mail
  3. Listen in on the call center
  4. Implementation issues
       Programming errors
       Inverted scoring
  5. Did they pull the right group?
More of Why Good Models Fail
  6. Practice crop rotation
  7. External validity
  8. Internal validity
  9. Bad ingredients make for bad models
  10. Old models, like old horses, have to be put out to pasture
All models are wrong, but some are useful. George Box
C. Olivia Rud Executive Vice President DataSquare, LLC 733 Summer St. Stamford, CT 06901 203 964-9733 x103 Olivia@datasquare.com Specializing in Data Mining, Statistical Modeling and Marketing Strategy for Marketing, Risk and Customer Relationship Management
Allison Cornia Database Marketing Manager CRM/Home & Retail Division Microsoft Corporation One Microsoft Way Redmond, WA 98052 425-882-8080 Allisonc@microsoft.com