THE PREDICTIVE MODELLING PROCESS
Sally Carey, Datamine Ltd

Models are used extensively in business and have an important role to play in sound decision making. This paper is intended for people who need to understand the process for developing predictive models because they interface in some way with technical analysts. This could be as a business user who interacts with the analytics team, a person involved in the preparation of data sets, or a user of the outputs from a modelling process. This paper does not instruct people on how to build models, but covers the steps involved and the practical issues to consider.

WHAT IS A MODEL?

A mathematical model is an expression of relationships between variables, frequently in the form of an equation. An equation is an expression that contains an equals sign (=). To many, the concept of an equation, or model, seems more complex than it actually is. The model itself is not complex; the complexity arises from the process and work required to generate a good model.

Consider the following: R = P x Q, where R is the revenue, P is the price and Q is the quantity sold. This equation can be used to calculate revenue by multiplying the price by the quantity sold. The equation defines mathematically the relationship between the two variables Q and R. That is, given the price of a product, we can calculate R for any value of Q. This is the basis for mathematical models.

For a fixed price, equations such as the one above can be represented by a straight line on a graph and are referred to as linear equations. In real life, relationships between variables are more complex. To represent this complexity in equations, variables are squared, cubed, rooted and so on. These equations are referred to as non-linear, since they cannot be represented by a straight line.

This paper focuses on the approach required to develop sound predictive models.
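The difference between a linear and a non-linear equation can be illustrated with a short sketch. The first function is the revenue equation R = P x Q from the text. The second is a hypothetical example, not taken from this paper: it assumes the unit price falls slightly with every unit sold (the `discount_per_unit` parameter is invented for illustration), which introduces a squared term in Q.

```python
def revenue(price: float, quantity: float) -> float:
    """Linear model from the text: R = P x Q."""
    return price * quantity


def revenue_with_volume_discount(
    price: float, quantity: float, discount_per_unit: float = 0.001
) -> float:
    """Hypothetical non-linear model: R = (P - d x Q) x Q.

    The unit price is assumed to drop by discount_per_unit for each
    unit sold, so revenue no longer grows in a straight line with Q.
    """
    return (price - discount_per_unit * quantity) * quantity


if __name__ == "__main__":
    for q in (100, 200, 400):
        print(q, revenue(10.0, q), revenue_with_volume_discount(10.0, q))
```

Doubling the quantity doubles revenue in the linear case, but not in the non-linear one.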
Predictive modelling is the process of using past data to determine statistically what is likely to happen in the future, on the assumption that past trends will continue to apply. One application is forecasting: predicting what will happen in the short to medium term in order to take pre-emptive measures such as allocating resources. Another use of predictive modelling is to identify the individuals who are most likely to respond positively to some intervention. They can then be preferentially targeted to achieve maximum effect. Applications in the marketing arena include customer acquisition, customer retention and cross-selling.

A key aspect of predictive modelling is regular feedback: the model is updated to reflect current conditions and to maximise efficiency, so that it continues to predict outcomes for the customer base accurately.

STEPS IN PREDICTIVE MODEL DEVELOPMENT

PLAN                    BUILD                IMPLEMENT
Define the objective    Build model          Apply model
Create data sets        Calculate a score    Rank customers
                        Validate model       Drive initiatives

Figure 1: steps in predictive model development

Define the objective

A clear, specific objective for the model is required. Each model is developed for a specific purpose and cannot be used effectively in another situation. For example, a model that predicts home loan customer churn cannot be used to predict credit card churn. A clearly defined model objective states the event or action that the model is to predict and the period in which it is expected to happen. For example, the objective could be to predict which customers are likely to churn their credit card within the next month.
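To show how an objective of this kind pins down both the event and the period, the sketch below turns "churn within the next month" into a yes/no flag per customer. It is only an illustration: the function name, the snapshot date and the choice of 30 days to stand for "a month" are assumptions, not part of this paper.

```python
from datetime import date, timedelta
from typing import Optional


def churn_flag(
    snapshot: date, churn_date: Optional[date], window_days: int = 30
) -> int:
    """Return 1 if the customer churned within window_days after the
    snapshot date, otherwise 0.

    window_days=30 is an assumed stand-in for "the next month".
    Customers with no churn date, or who churned outside the window,
    are flagged 0.
    """
    if churn_date is None:
        return 0
    window_end = snapshot + timedelta(days=window_days)
    return int(snapshot < churn_date <= window_end)


if __name__ == "__main__":
    snapshot = date(2024, 1, 31)  # hypothetical date the data was extracted
    print(churn_flag(snapshot, date(2024, 2, 15)))  # inside the window
    print(churn_flag(snapshot, date(2024, 3, 15)))  # outside the window
    print(churn_flag(snapshot, None))               # never churned
```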

Create data sets

An important step in the modelling process is the creation of the data set to use for model development. Broadly, the data set covers behavioural, demographic, geographic and external information (e.g. competitor information or weather). Variables that are not included in the data set cannot form part of the prediction. Variables cover both static fields, such as income, and triggers, such as a change in spend. Both technical and business people need to be involved in the decisions regarding the contents of the data set. The focus needs to be on behavioural information, as this is more powerful for predictions than demographic data.

Consideration needs to be given to which customers to exclude from the model build process. Customers need to be excluded if they are going to impair the performance of the model. Potential exclusions include bad debt customers, staff, and new customers, who have insufficient history. The actual exclusions applied depend on the specific purpose of the model. For example, if the model is to predict which customers are likely to become bad debts, bad debt customers would be included in the model!

Build statistical model

The model will be built using a sample from the data set created. This is the part that can be left with the technical analysts. The resulting model will contain only a subset of the original list of variables considered. This is to be expected: some of the variables considered will be correlated with each other (for example, floor number and height of building), while others will have been discarded because they add little or nothing to the model's predictive power. The building of statistical models is the domain of statisticians, and the technical aspects are not covered in this paper.

Calculate a score

The model developed will be an equation that, when applied to the customer base, allocates a score to each customer. The score represents a customer's likelihood of doing whatever the model is predicting, for example a customer's likelihood of churning within a month. A score can also be a predicted value, such as a customer's expected order value.
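As a minimal sketch of "an equation that allocates a score", the example below applies a logistic equation to two customers. The paper does not say which type of model is used, so the logistic form, the variable names and every coefficient here are assumptions made purely for illustration.

```python
import math

# Hypothetical coefficients; a real model's values come from the model build.
INTERCEPT = -2.0
COEFFICIENTS = {
    "spend_change_pct": -0.03,           # falling spend raises the churn score
    "months_since_last_purchase": 0.25,  # longer inactivity raises it too
}


def churn_score(customer: dict) -> float:
    """Apply the (assumed) logistic equation and return a score in (0, 1)."""
    z = INTERCEPT + sum(
        coefficient * customer[name] for name, coefficient in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    active = {"spend_change_pct": 10.0, "months_since_last_purchase": 1.0}
    lapsing = {"spend_change_pct": -40.0, "months_since_last_purchase": 6.0}
    print(round(churn_score(active), 3), round(churn_score(lapsing), 3))
```

The lapsing customer receives the higher score, which is all that the later ranking step relies on.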

Validate model

Typically, models are validated against a hold-out group. This group contains customers that have not been included in the development of the model. As such, they represent a group of previously unseen customers that is representative of the customer base. To achieve an accurate prediction of lift, the hold-out group must not include customers who, for one reason or another, have been excluded from the model development process.

Figure 2 shows that by targeting the top 20% of customers identified by the model, almost 40% of the targets the model is aimed at are actually identified. If left to random selection, only 20% of the targets would be found in 20% of the base.

Apply the model

The model will be built on a subset of data. Once the model is complete and has been validated, it is run over the customer base.
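The hold-out check described under "Validate model" can be sketched as follows. The function and variable names are hypothetical and the ten-customer hold-out group is invented so that the arithmetic is easy to follow; it is not data from this paper.

```python
def capture_rate(scores, outcomes, top_fraction=0.2):
    """Share of all actual targets found in the top-scored fraction.

    scores   -- model score per hold-out customer
    outcomes -- 1 if the customer actually did the predicted thing, else 0
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    captured = sum(outcome for _, outcome in ranked[:cutoff])
    total = sum(outcomes)
    return captured / total if total else 0.0


def lift(scores, outcomes, top_fraction=0.2):
    """Capture rate relative to random selection of the same fraction."""
    return capture_rate(scores, outcomes, top_fraction) / top_fraction


if __name__ == "__main__":
    # Invented hold-out group: 10 customers, 5 of whom are actual targets.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
    print(capture_rate(scores, outcomes), lift(scores, outcomes))
```

In this invented example the top 20% of customers contain 2 of the 5 targets (40%), twice what random selection would be expected to find.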

Rank

These scores allow a customer base to be ranked in order of the predicted score, such as from highest expected order value to lowest, or from most likely to least likely to churn. In reality, some customers will churn and some will not. Therefore, in absolute terms, these predictions will not be accurate! The ranked list nevertheless provides a superb base upon which to vary the treatment, and therefore the level of service or marketing spend, across groups of customers.

Other useful considerations

A key issue is how often to run the model across the base and score the customers. This will depend on the actions being taken as a result of the model and on the speed of change within the customer base and market. For example, telecommunications is a faster-moving industry than insurance.

Once a model has been applied and actions are being taken, the interactions need to be tracked and managed. For example, control files will be required to test initiatives and to test the model's performance. The results will help determine how frequently the model needs to be refreshed. As a rule of thumb, a model needs to be reviewed, and possibly rebuilt, annually.
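Ranking and varying the treatment by group, as described under "Rank", might look like the sketch below. The three tier names and the equal-sized split are assumptions for illustration; the paper does not prescribe how many groups to use or where to draw the boundaries.

```python
def rank_customers(scores: dict) -> list:
    """Return customer ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def assign_tiers(ranked_ids: list, tier_names=("high", "medium", "low")) -> dict:
    """Split the ranked list into equal-sized tiers (an assumed scheme)."""
    n = len(ranked_ids)
    tiers = {}
    for position, customer_id in enumerate(ranked_ids):
        tier_index = min(position * len(tier_names) // n, len(tier_names) - 1)
        tiers[customer_id] = tier_names[tier_index]
    return tiers


if __name__ == "__main__":
    # Invented scores for six customers.
    scores = {"A": 0.1, "B": 0.8, "C": 0.5, "D": 0.3, "E": 0.9, "F": 0.2}
    ranked = rank_customers(scores)
    print(ranked)
    print(assign_tiers(ranked))
```

Each tier can then be given a different level of service or marketing spend.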

MEASURING EFFECTIVENESS

The following provides a framework for monitoring the effectiveness of the model and communications. It is recommended that this is discussed, and its implications for implementation addressed, during the model build process.

Customer Base
    High Score Customers
        90% for programme
        10% control (tests the effectiveness of the communications)
    Lower Score Customers
        10% control group 2 (tests the accuracy of the model)

BUSINESS USER KEY INVOLVEMENT

To summarise, the business user needs to be involved in the modelling process at the following key points:

- Communicating and verifying the purpose of the model
- Contributing to the list of variables for consideration in the model
- Identifying exclusions from the model build
- Organising the implementation of model-related communications and testing
- Socialising and on-boarding the use of models within the business
- Being a sounding board and supporting the analysts with their endeavours!
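Returning to the measurement framework under "Measuring effectiveness", the sketch below shows one way the programme and control groups could be drawn from a ranked customer list. The 10% control shares come from the framework; the 20% cut-off for "high score" customers, the group labels and the fixed random seed are assumptions made for illustration.

```python
import random


def assign_groups(ranked_ids, high_share=0.2, control_share=0.1, seed=42):
    """Assign each customer to a programme or control group.

    ranked_ids    -- customer ids ordered from highest to lowest score
    high_share    -- assumed fraction of the base treated as "high score"
    control_share -- fraction of each segment held back as a control
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    cutoff = round(len(ranked_ids) * high_share)
    high, lower = ranked_ids[:cutoff], ranked_ids[cutoff:]

    # Control 1: high scorers who receive no communication.
    control_1 = set(rng.sample(high, round(len(high) * control_share)))
    # Control group 2: lower scorers tracked to check the model's ranking.
    control_2 = set(rng.sample(lower, round(len(lower) * control_share)))

    groups = {}
    for customer_id in high:
        groups[customer_id] = "control_1" if customer_id in control_1 else "programme"
    for customer_id in lower:
        groups[customer_id] = "control_2" if customer_id in control_2 else "untreated"
    return groups


if __name__ == "__main__":
    groups = assign_groups(list(range(1000)))
    for label in ("programme", "control_1", "control_2", "untreated"):
        print(label, sum(1 for g in groups.values() if g == label))
```

Comparing the programme group with control 1 indicates whether the communications worked, while comparing control 1 with control group 2 indicates whether high scorers really behave differently from lower scorers.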

BRIEF AUTHOR BIOGRAPHY

Sally Carey

Sally is Director of Datamine Ltd, a New Zealand based analytics consultancy that moves its clients beyond guesswork. Sally has over 25 years of experience in B to B and B to C marketing and in using quantitative approaches for business decision making. Sally has an MBA from Bradford University (UK) and is a Fellow of the Institute of Direct & Digital Marketing (UK). Sally believes that extraordinary results are achieved by a combination of analysis and intuition, results that some clients have even referred to as magic.

Key words: analytics, predictive modelling, process, model, equation

### Acquiring customers profitably. With Credit Bureau Scores

Uncover the true face of new customers before it's too late

In the current climate, characterized by tough competition and economic slowdowns, identifying