Successfully Implementing Predictive Analytics in Direct Marketing




John Blackwell and Tracy DeCanio, The Nature Conservancy, Arlington, VA

ABSTRACT

An analysis of the benefits of modeling in direct marketing, as well as a review of the dangers associated with common data mining mistakes. The presentation reviews specific examples of how SAS data mining products have been successfully used at one of the world's largest non-profit organizations to dramatically improve ROI. Specifically, this session focuses on: 1) measuring the true value of predictive models; 2) preparing data for modeling and avoiding pitfalls; 3) building models and selecting the best for deployment; 4) the importance of industry expertise and questioning what the data are "saying"; and 5) assessing your organization's readiness to build analytics capacity in-house.

INTRODUCTION

As organizations scale back marketing budgets, many are foregoing the expense associated with predictive analytics. However, analytics should be framed not only as a way to be ready to bounce back when the economy rebounds, but also as a source of short-term benefit despite the current climate. While it is true that data mining might not be able to completely reverse decreases in revenue, it can have an immediate positive return on investment. This paper presents a case study of one of the world's largest fundraising organizations, which used analytics several years ago to mitigate the impact of a negative economic environment and continues to do so today in order to raise money efficiently.

BACKGROUND

The Nature Conservancy ("TNC") is the world's largest environmental non-profit, with a mission to preserve the plants, animals and natural communities that represent the diversity of life on Earth by protecting the lands and waters they need to survive. TNC funds its mission primarily through fundraising, from which it generates about five hundred million dollars a year from its 3 million members. While data mining is used throughout the Conservancy, this paper focuses on its application in direct-mail campaigns.

BEFORE ANALYTICS

Prior to 2003, TNC's decisions about whom to mail and how much to ask for were made mostly through gut feel and rules of thumb that had often not been tested empirically. While the organization was mostly successful, events in 2003, including a weaker economy and negative press, put downward pressure on TNC's ability to fundraise. Indeed, the steady growth of the membership base came to an abrupt halt. To maintain its reputation and continue its mission, The Nature Conservancy needed to increase its revenues through increased efficiencies. The organization has met the challenge by making fact-based decisions with an emphasis on analytics. Today, when TNC plans a direct mail solicitation, it determines whom to mail based on which donors are most likely to respond with the largest gifts. The following SAS Enterprise Miner diagram shows a typical development flow for a response model:

[Figure: SAS Enterprise Miner diagram of a typical response model development flow]

The value behind the ensemble model is discussed in more detail later in the paper. The score from the above model (using a binary target for response) is then multiplied by a predicted gift amount generated through a separate model. The resulting product is a prediction of gross revenue per mailing. The marketing team then decides whom to mail based on a chart summarizing the model results, such as the sample below. The chart ranks the donors in deciles according to their model score and shows the predicted response rate, revenue per piece mailed, profit, and the cost associated with raising each dollar.

Model Rank  Size of Group  Resp. Rate  Rev. Per Piece  Profit     Cost Per $ Raised
1           96,715         6.9%        $7.46           $668,961   $0.07
2           96,740         5.4%        $2.67           $206,185   $0.20
3           96,294         4.4%        $1.70           $111,441   $0.32
4           96,778         3.8%        $1.24           $68,075    $0.43
5           97,917         3.6%        $0.98           $42,595    $0.55
6           95,817         2.9%        $0.75           $19,859    $0.72
7           104,412        2.4%        $0.55           $594       $0.99
8           89,794         1.8%        $0.41           ($11,857)  $1.32
9           95,245         1.5%        $0.30           ($22,643)  $1.79
10          97,424         1.3%        $0.20           ($32,891)  $2.67
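To make the scoring step concrete, it can be sketched in a few lines of base SAS. This is an illustrative sketch rather than TNC's production code; the data set and variable names (resp_scores, gift_scores, account_id, p_response, pred_gift) are hypothetical stand-ins for the outputs of the two models.

/* Combine the two model scores into expected revenue per piece.
   Assumes both score data sets are sorted by account_id.        */
data scored;
   merge resp_scores(keep=account_id p_response)   /* P(response) from the binary response model  */
         gift_scores(keep=account_id pred_gift);   /* predicted gift amount from the second model */
   by account_id;
   exp_rev_per_piece = p_response * pred_gift;     /* predicted gross revenue per piece mailed    */
run;

/* Rank donors into the ten model-score groups shown in the table */
proc rank data=scored out=ranked groups=10 descending;
   var exp_rev_per_piece;
   ranks model_rank;   /* 0 = best group; add 1 to match the table's 1-10 scale */
run;

Note that Cost Per $ Raised is simply the cost per piece divided by the revenue per piece; the table implies a mailing cost of roughly $0.53 per piece (for example, rank 1: $7.46 × $0.07 ≈ $0.52), which is why any group returning less than that per piece shows a cost per dollar raised above $1.00 and a negative profit.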

As illustrated above, the model is able to identify a significant portion of the available audience for any given mailing that is not profitable to mail (wherever the cost per dollar raised is greater than one dollar), in addition to identifying which donors will be the most profitable.

MAKING THE CASE

An initial hurdle in making the case for predictive modeling was convincing management that it would benefit not only the organization but also the membership population. Charitable giving is highly seasonal: while many people prefer to give at the end of the year, some are more likely to give around tax time and others over the summer months. It was clear that we had to show that modeling could help us better understand the seasonality of subsets of the membership population and could actually decrease the amount of direct mail sent without a corresponding decrease in revenue. For example, a model could allow us to target our audience when they were ready to give, as opposed to sending an expensive, steady stream of solicitations throughout the year.

PROVING THE VALUE

In their book Competing on Analytics, Davenport and Harris refer to the detour that some organizations need to take in order to prove the value of fact-based decisions before analytics can be fully implemented. At The Nature Conservancy there was much skepticism about the notion that a model could make better predictions than current business practices. For the first few campaigns that were modeled, we randomly split the available audience into two segments and allowed the marketing team to apply their usual selection criteria to the first half. For the second half, we used a model to select the accounts to be mailed (the random split is sketched at the end of this section). At the completion of each campaign the results were analyzed and shown to the decision-makers. In all cases the modeled select significantly out-performed the standard select. The chart below shows how modeling boosted revenue for monthly solicitation campaigns in the first year after the detour. The bars show the dollar amount per piece sent for the model and standard selects, and the line represents the percentage increase in total revenue per piece mailed associated with the model select.

[Chart: revenue per piece for the model and standard selects, and the model select's percentage increase in revenue per piece, by monthly campaign from July through June]

While needing to prove the value of modeling took more time and resulted in foregone revenue, the results were definitive and allowed us to get the buy-in from management that we needed to expand the scope of analytics within the organization.
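The random split itself is simple to reproduce. A minimal sketch, assuming the candidate audience sits in a data set named audience (a hypothetical name), uses PROC SURVEYSELECT to flag a random half of the records for the model select:

/* Randomly split the available audience 50/50.
   OUTALL keeps every record and adds a Selected flag. */
proc surveyselect data=audience out=split_audience
                  samprate=0.5 seed=2003 outall;
run;

/* Selected = 1 -> accounts chosen from the model-scored half     */
/* Selected = 0 -> accounts left to the marketing team's criteria */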

AVOIDING PITFALLS

TOO MUCH DATA?

The massive amount of data now available about an organization's customers is one reason predictive modeling has been so effective. However, if not treated with caution, it can in some cases lead to a weaker model. Vendors that deal in customer data appends will often tout the hundreds of demographic and psychographic variables that can be appended to an individual's database record. In addition, the low cost of storage allows most companies to retain hundreds of data points associated with a customer relationship over time, all of which can be used in model development. However, there is always the danger that correlations of marketing data in a development data set are merely noise and not predictors of future behavior.

There are some well-known examples of chance correlations associated with political events. For example, from 1952 to 1976, whenever an American League team won the World Series, a Republican won the U.S. Presidency. A similar, more recent example involving the Washington Redskins held true until the election of 2004. Examples such as these are often cited as curiosities and are not taken seriously. However, the fact that a correlation is plausible does not mean it will provide any more value than the examples above. While data is the backbone of a strong predictive model in direct marketing, it can also weaken models when an apparent correlation does not hold up. An article in the Economist on August 18, 2007 effectively sums up the problem: "As the old saw has it, garbage in, garbage out. The difficulty comes when you do not know what garbage looks like." How do you know which variables are truly predictive and which are just correlated by chance in a development data set? While you can never identify a true predictor with absolute certainty, there are several approaches you can use to avoid using noise variables and to lessen their impact if they are selected by a modeling algorithm. The following approach can help to identify which variables are more likely than others to be meaningless and to lessen the impact that the "garbage" has on a model.

A CASE STUDY

As an illustration, I created a data set that contained a binary dependent variable and about 50 potential independent variables taken from the marketing database. To provide an example, 100 random variables were appended to the actual data. The data were split into development, validation and test samples, and an initial model was built allowing only the random variables as inputs. When run through a stepwise logistic regression and an entropy-reduction decision tree, each model selected between four and six variables. This illustrates that even with high significance levels for variable selection, when the number of candidate variables is large enough, some will be selected even when there is nothing but noise.
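A minimal sketch of this setup, using hypothetical data set and variable names (dev for the development sample, responded for the binary target): a DATA step appends 100 uniform random variables, and PROC LOGISTIC then runs a stepwise selection on those noise inputs alone.

/* Append 100 pure-noise candidate predictors to the development data */
data dev_noise;
   set dev;
   array rnd{100} rnd1-rnd100;
   do i = 1 to 100;
      rnd{i} = ranuni(12345);   /* uniform(0,1) random values */
   end;
   drop i;
run;

/* Stepwise selection restricted to the noise variables only.
   Even at these entry/stay significance levels, a handful of
   noise variables will typically survive the search.          */
proc logistic data=dev_noise descending;
   model responded = rnd1-rnd100
         / selection=stepwise slentry=0.05 slstay=0.05;
run;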
In the next scenario the same models were run using both the random and the real input variables. In this case both models continued to select two of the random variables. However, the overall fit statistics on the test partition (shown below) are strong and do not suggest any cause for concern.

Model          Test: ROC Index  Test: Gini Coefficient  Test: Kolmogorov-Smirnov Statistic
Decision Tree  0.709            0.419                   0.289
Regression     0.715            0.429                   0.329

When the models are re-run without the random variables, the fit statistics on the test partition improve. In the case of the logistic regression, the same non-random variables are selected; however, the higher ROC index on the test data suggests stronger coefficients. In the case of the decision tree, the random variables that had been selected entered as the second-to-last split, and removing them allowed the tree to grow differently, resulting in a stronger model. Since at any depth a decision tree looks only for the next most significant split, without regard to the best possible combination of splits, if it is confused at one level by noise, truly predictive variables might never be selected.
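For reference, these fit statistics are straightforward to reproduce outside of Enterprise Miner. The Gini coefficient is just a rescaling of the ROC index (Gini = 2 × ROC − 1; for the tree above, 2 × 0.709 − 1 ≈ 0.419), and the Kolmogorov-Smirnov statistic is the maximum separation between the cumulative score distributions of responders and non-responders, which PROC NPAR1WAY reports with the EDF option. The data set and variable names here are hypothetical:

/* KS statistic on the scored test partition */
proc npar1way data=test_scored edf;
   class responded;    /* 1 = responder, 0 = non-responder */
   var p_response;     /* model score being evaluated      */
run;

The improved fit statistics, with the random variables removed, are shown below.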

Model          Test: ROC Index  Test: Gini Coefficient  Test: Kolmogorov-Smirnov Statistic
Decision Tree  0.738            0.477                   0.356
Regression     0.742            0.483                   0.360

The above examples serve to show that it is easy for noise variables to be selected by models, and that the emphasis in evaluating any candidate predictor should be less about deciding whether a split is plausible and more about questioning whether the relationship is real. While it may be impractical to fully analyze every candidate variable for every model and still meet deadlines, one way to guard against noise is to perform in-depth analysis on new variables appended to the file, as well as on variables that are selected by only one particular modeling technique.

Similarly, merely questioning the data and treating all relationships with scientific skepticism can be useful. A fairly well-known example of the pitfalls of not questioning what the data are saying is the apparent correlation between SAT scores by state and the amount of money spent on education:

[Chart: average SAT score by state versus state spending on education]

The chart shows that, on balance, the more a state spends on education the lower its test scores are. However, someone familiar with the subject would be able to explain that the correlation is not meaningful: the states that spend less on education are over-represented among states where students take the ACT rather than the SAT, and in those states the students who do take the SAT are often those applying to competitive East Coast schools that require it, making the data useless as a display of the effect of money on education. The more familiar analysts are with the industry and the data they are analyzing, the less likely they are to be fooled by a variable that merely looks predictive, as in the case above. In the SAT example, perhaps the result is counter-intuitive enough that most people would question it; had the data shown the reverse trend, however, most people would probably never question the apparent relationship.

Another effective approach to building strong models in spite of the preponderance of data is to lessen the impact of any one variable selected by a model by averaging, or ensembling, the scores of several models. Within any individual model the emphasis is on satisfying a certain threshold using as few inputs as possible. However, we have found that most of the strongest models we have created use a large number of input variables and lessen the impact any one of them can have through averaging. For example, a simplified yet illustrative model flow is shown below:

[Figure: simplified SAS Enterprise Miner flow in which partitioned development data feeds Neural Network, Regression and Decision Tree nodes, which in turn feed an Ensemble node]

This flow shows the development data being partitioned and fed through a neural network, a regression and a decision tree node before the ensemble node averages the scores of all three individual models to create a new model. Since the three models do not select the same set of independent variables, the total number of inputs is larger; however, the averaging helps to ensure that no one variable has a particularly large impact on the final score. This is particularly true for variables with weaker correlations to the target, since these are less likely to be selected by all three modeling techniques, and it is among these weaker yet important inputs that noise is most likely to be found. When the data set that combined the random and non-random variables was run through the model diagrammed above, the power of the ensemble model was significantly higher than that of any of the individual models. Random variables were still selected as predictors; however, their influence was reduced. There are several reasons this could be the case, including the fact that the different models' treatments of the random variables largely cancel each other out when combined. The strength of ensembling also holds for the data set without the random variables. As displayed in the ROC chart below, the green line represents the averaged model and the others show the individual models.

[Chart: ROC curves for the ensemble (averaged) model, shown in green, and the three individual models]
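In base SAS terms, the averaging performed by the Ensemble node amounts to the sketch below. The data set and variable names are hypothetical; p_nn, p_reg and p_tree stand for the posterior probabilities produced by the neural network, the regression and the decision tree, respectively.

/* Average the scores of the three individual models.
   Assumes each scored data set is sorted by account_id. */
data ensemble_scored;
   merge nn_scored  (keep=account_id p_resp rename=(p_resp=p_nn))
         reg_scored (keep=account_id p_resp rename=(p_resp=p_reg))
         tree_scored(keep=account_id p_resp rename=(p_resp=p_tree));
   by account_id;
   p_ensemble = mean(p_nn, p_reg, p_tree);   /* simple average of the three scores */
run;

Because a noise variable is unlikely to be selected by all three algorithms, its contribution to the averaged score is diluted by the models that ignored it, which is the cancellation effect described above.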

CONCLUSIONS

While there are many pitfalls associated with modeling in direct marketing, they can be avoided through careful analysis of the data combined with modeling techniques that do not place too much weight on variables that may in fact be noise. Predictive analytics will almost always enable better business decisions. While customer or donor behavior may have changed with the current economic climate, the value provided by a model will still exceed that of decisions driven by instinct or gut feel. When organizations try to cut the expense associated with analytics, it may be incumbent on the analysts to once again take the detour described by Davenport and Harris and prove the value of analytics (even if this has been done previously) by setting up regular experiments comparing the performance of modeled selections for mailings against a non-modeled select.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

John Blackwell
The Nature Conservancy
4245 N. Fairfax Drive
Arlington, VA 22310
Work Phone: 703-841-8786
Email: jblackwell@tnc.org
Web: www.nature.org

Tracy DeCanio
The Nature Conservancy
4245 N. Fairfax Drive
Arlington, VA 22310
Work Phone: 703-841-4581
Email: tdecanio@tnc.org
Web: www.nature.org