WHITE PAPER

Using Segmented Models for Better Decisions

Summary

Experienced modelers readily understand the value to be derived from developing multiple models based on population segment splits, rather than a single model for the entire population, in any model development project. Analysis of data and model development on subpopulations reveals unique predictive patterns that build greater precision into the segmented model. However, experienced modelers are also aware of the pitfalls in segmentation analysis. It can be extremely time-consuming, and can result in over-fitted segmentation trees with excessive leaf nodes that may or may not add value in predicting the target of interest. A common response to over-fitting, selecting the best and second-best splits to grow the segmentation tree, limits the possibility of finding the most promising initial splits, and splits at each node.

Segmented models are collections of models, where each model is developed for a different portion of the total population. Since the relationship between predictors and the target is often different in each subpopulation, building a segmented scorecard frequently results in more powerful predictions than a single scorecard. Hence segmented models are widely recognized as a highly effective way to increase predictive power and capture interaction effects, while maintaining a completely sustainable and tractable scoring formula. However, the traditional process for deriving such systems is often too laborious to justify the effort. Simple decision tree growth algorithms are ill-suited to the task.

This paper describes a solution that intelligently automates the search for optimal segmented models, and allows the analyst to quickly discover, compare, engineer and document the most suitable segmented models for their predictive modeling application.
Based on experience in developing thousands of segmented models, FICO developed a segmentation search algorithm within the Segmented Scorecard module of FICO Model Builder that enables modelers to combine pure machine learning with their own domain knowledge. With that, modelers can efficiently test innumerable combinations of segmentation variables, split points and split sequencing, and quickly derive the best possible segmentation schemes. Further, the tool provides unique collaboration features to accelerate the analysis, testing, documentation and implementation of the final segmented model, even when its segmentation scheme is defined or partially defined a priori.

The Motivation for Segmented Model Development

The process of predictive model development requires the identification of features in the data that exhibit clear and interpretable patterns with respect to the target variable of interest. For many predictive modeling applications it is common to develop a single predictive model for the entire population. However, segmented models can often produce better decision-enabling predictions through higher precision, lower bias, or both than a single, standalone model.

With a segmented model, a segmentation tree is used to define nested population splits, and individual predictive models are developed for the leaf nodes of that segmentation tree. Each leaf node is commonly referred to as a subpopulation, and the predictive model, typically a scorecard, for each subpopulation is unique to that population, including unique choices of design constraints (fine and coarse binnings, user-defined restrictions, variable selections, etc.). Other training elements (e.g., target variable, sample weight, reason code list) are generally identical across the subpopulations. Typically the individual leaf-node models are scaled (or aligned) in such a manner that they all predict on a coherent, shared scale. By aligning all leaf node models to a single scale, score interpretation is possible without any reference to the subpopulation from which it arose. Thus, the scaling objective is another training element common to all leaf node models.

FIGURE 1: SINGLE VS. SEGMENTED SCORECARD
[Figure: a single scorecard (ScorecardPortfolio) shown beside a segmented scorecard that splits on Income (High, Low) and then on Geography (East, West), giving three scorecards: ScorecardL, ScorecardHE and ScorecardHW.]
Single scorecard vs. segmented scorecard: a collection of three scorecards works as a hybrid of models to form a cohesive system that predicts more accurately than the single scorecard on the left.

June 2014. © 2014 Fair Isaac Corporation. All rights reserved.

Consider, for example, an analyst charged with the task of ranking a population of prospective customers by the likelihood that each will respond to an offer for a new low-interest credit card. One of the key predictors of responsiveness to this offer is the age of our prospective customers. As a standalone predictor, age might exhibit the following pattern:

FIGURE 2: PROBABILITY OF RESPONSE VS. AGE
[Figure: P(Response), on an axis from 2% to 12%, plotted against Age, on an axis from 15 to 95.]

That is, when the entire population is ranked by age, the analyst notices that younger prospects are more likely to respond, and older prospects are less likely to respond. This discovery could be leveraged to accomplish the task at hand by including age in a predictive model (e.g., a scorecard) such that younger prospects receive a higher score (i.e., indicating higher response likelihood) and older prospects receive a lower score. To build a good model, the analyst will continue searching the data to find other predictors in addition to age that also exhibit useful patterns with respect to the target, and contribute favorably to the total power of the scorecard.

FIGURE 3: PROBABILITY OF RESPONSE VS. AGE, SPLIT BY INCOME
[Figure: two panels, "Lower income sub-segment" and "Higher income sub-segment", each plotting P(Response), on an axis from 2% to 12%, against Age, on an axis from 15 to 95.]

During the course of this investigation, however, the analyst also notices that not all younger prospects are highly likely to respond to this particular offer.
In fact, the nice, intuitive response pattern shown above begins to look quite different when the population is split into subsegments representing different levels of income.
It turns out that the relation of response to age is much stronger in the lower income subsegment when compared to the overall population. In fact, it appears that age becomes almost useless as a predictor of response when income is high. Rather than build a single model for the entire population, this discovery leads the analyst to build a separate model in each of these subpopulations created by splitting on income, with age included as a predictor in the lower income scorecard but not in the higher income scorecard. As the analyst continues with her analysis in these two subpopulations, she discovers some additional variables that exhibit different predictive patterns across the two groups. This leads her to develop a segmented model that yields much more accurate estimates of responsiveness than can be accomplished with a single model trained from the total population. The segmented model is a strong predictor because it is able to capture the heterogeneous nature of population subsets. For example, in credit card account management, the spending habits (and more importantly, risk patterns connected to those spending habits) are known to differ between revolvers (card holders carrying a balance each month) versus transactors (card holders paying their statements in full each month). For auto insurance providers, the relevance of different predictors may change by age group. For example, academic measures such as grade point averages might be useful in predicting driving risk for new drivers, but are entirely irrelevant for more experienced drivers. In some cases, we find the oft-imagined (but rarely encountered) perfect interaction, where the pattern of a good predictor is even reversed along two sides of a subpopulation split. For subpopulation A, the variable is ascending in risk, while for subpopulation B, it is descending in risk. However, this is quite rare. 
More commonly, we see partial reversals where the bottom of a u-shaped pattern (or the top of an n-shaped pattern) varies by subpopulation, and thus there are local areas of a predictor where one subpopulation sees a rising pattern, and the other still sees a declining pattern.

Successful segmentation schemes create predictive lift by locating and capturing the additional signal arising from three areas that might vary across well-designed subpopulations:

1. Differences in available information (selected, relevant predictors). An example of this is a split on no delinquency versus any delinquency. On one side of that split you will have models that include measures (severity, frequency and recency) of delinquency, and on the other side, by definition, you will have no such predictors.

2. Differences in predictive magnitude (relative strengths of predictors). A possible example of differences in predictive magnitude is a split on driver age. For very young drivers, the grade point average (GPA) is a strong predictor, but for post-high school and post-college age drivers GPA is less of a predictor (if any at all).

3. Differences in predictive pattern (slope or peak of in-common predictors). In a credit scoring model, on both sides of a split on modest versus no delinquency the revolving credit utilization will be predictive, but it might be more predictive on the no delinquency side.

Figure 4 illustrates cases [1] and [2]. Figure 3 is an example of [3].
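The kind of check behind Figure 3 can be sketched in a few lines of Python. This is only an illustration, not part of FICO Model Builder: the function name `response_rate_by_bin`, the field names, the age bands and all counts are made up for the example. It tabulates the response rate per age band within each income segment and compares how much the rate varies with age in each segment.

```python
from collections import defaultdict

def response_rate_by_bin(records, segment_key, bin_key, target_key="responded"):
    """Return {segment: {bin: response rate}} from a list of dict records."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [responders, total]
    for rec in records:
        cell = counts[rec[segment_key]][rec[bin_key]]
        cell[0] += rec[target_key]
        cell[1] += 1
    return {
        seg: {b: resp / total for b, (resp, total) in bins.items()}
        for seg, bins in counts.items()
    }

def make_cell(income, age_band, responders, total):
    """Build `total` synthetic prospects, `responders` of whom responded."""
    return [
        {"income": income, "age_band": age_band, "responded": 1 if i < responders else 0}
        for i in range(total)
    ]

# Hypothetical counts: age matters in the low-income segment, barely in the high one.
records = (
    make_cell("low", "18-34", 12, 100)
    + make_cell("low", "35-54", 7, 100)
    + make_cell("low", "55+", 3, 100)
    + make_cell("high", "18-34", 5, 100)
    + make_cell("high", "35-54", 5, 100)
    + make_cell("high", "55+", 4, 100)
)

rates = response_rate_by_bin(records, "income", "age_band")
# Range of response rates across age bands: a crude measure of how useful age is.
spread = {seg: max(by_age.values()) - min(by_age.values()) for seg, by_age in rates.items()}

for seg, by_age in rates.items():
    print(seg, by_age, "spread:", round(spread[seg], 2))
```

A large spread in one segment and a small one in the other is the signal that a split on income may let the leaf-node scorecards use age differently.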
FIGURE 4: WHY BUILD SEGMENTED SCORECARDS? Predictors and Their Patterns Can Vary Across Subpopulations
[Figure: usage and signal strength of each predictor in the Dirty and Clean segments of a split on Past Payment History. Predictors shown: highest utilization; net fraction revolving burden; length of credit history; number of inquiries in last 12 months; months since last 30+ days past due; severity of worst delinquency ever.]

The Shortcomings of the Traditional Approach to Segmented Model Development

In the absence of an automated approach, some analysts use a manual search process. Typically, this involves starting with the entire population and building a scorecard to use as a baseline. Then, the search begins with the modeler using business knowledge to split the entire population into two segments. The modeler then builds a model for each segment to see whether that split yields a segmented model that is more predictive than the baseline scorecard. If the baseline is better, the modeler tries a different initial split, again based on business expertise. If the segmented model is better, the modeler continues, using business knowledge again to select a variable and a split threshold to divide one of the two populations, thus creating three segments. The modeler builds a model for each segment and compares the result to the baseline and to the segmented model with two segments. The modeler continues recursively, until either predictive power stops improving, or the modeler runs out of time or energy.

Another common practice seen in the industry is to begin with decision tree techniques such as CHAID or C&RT to develop a decision tree model that predicts the target of interest. Techniques such as cross validation are used to prune the tree back and reduce over-fitting. As you can see in the example in Figure 5, even once the tree is pruned it may have more segments than you would like for developing a segmented model.
FIGURE 5: CHAID TREE PREDICTING CREDIT RISK

AutoLoanRisk5, 87.4%: Good (100000)
  CbUtilization <= 75.00%, 91.1%: Good (81488)
    Occupation={Banker Doctor Executive Lawyer}, 97.4%: Good (29227)
    Occupation={Manager Other}, 87.5%: Good (52261)
      FinanceCompany={Citizens Liberty}, 92.2%: Good (37215)
        FICOScore <= 502, 84.3%: Good (13492)
          CbUtilization <= 24.98%, 90.2%: Good (7927)
          CbUtilization > 24.98%, 75.9%: Good (5565)
            CbInquiriesLast5Months <= 8, 77.9%: Good (5115)
            CbInquiriesLast5Months > 8, 53.6%: Good (450)
        502 < FICOScore <= 595, 93.2%: Good (9682)
        FICOScore > 595, 99.0%: Good (14041)
      FinanceCompany={EFinance National}, 76.1%: Good (15046)
  CbUtilization > 75.00%, 71.3%: Good (18512)
    FICOScore <= 502, 48.0%: Good (6787)
      CbInquiriesLast5Months <= 6, 54.5%: Good (4668)
        LoanToValueRatio <= 83, 71.1%: Good (1201)
        LoanToValueRatio > 83, 48.8%: Good (3467)
      CbInquiriesLast5Months > 6, 33.6%: Good (2119)
    502 < FICOScore <= 600, 70.3%: Good (3691)
    FICOScore > 600, 91.4%: Good (8034)

Of course, the fitting objective of CHAID and C&RT trees is to attain purity in the leaf nodes with respect to the target, and to make the tree itself a direct predictor of the target variable. While these are familiar tree-growing algorithms, and they provide useful guidance, they have no specific abilities to ensure the tree will represent a good segmentation scheme for building a segmented model. The goal of the segmentation tree, by contrast, is not to predict the outcome directly, but rather to discover distinct populations on which to develop models that will result in the most powerful scoring system, based on variations of the predictors and their patterns across leaf models. Given this distinction, how can we make best use of CHAID and C&RT tools?

Rather than simply proceeding to develop a segmented model based on the best tree that comes directly out of a CHAID or C&RT algorithm, it is common to use the insights from the algorithm to develop a handful of candidate segmentations to consider.
In the example shown in Figure 6, we investigated two segmentations, selecting the best and second best initial split using C&RT. Looking at the first candidate, we can see that the tree is effective in finding subpopulations that have high and low bad rates respectively (51.98% vs 2.65%). The second candidate is not as effective at finding subpopulations with high and low bad rates, although that does not necessarily mean it is an inferior tree for building a segmented model. Indeed, a poorly predicting tree can be an excellent segmentation tree. This is because finding a tree with purity in the leaf nodes is a very different goal than finding a tree that identifies interactions that will allow productive leaf-node modeling!

FIGURE 6: WHICH SEGMENTATION?

Candidate 1
AutoLoansRisk5, 12.60% (100000)
  CbUtilization <= 75.003, 8.94% (81488)
    Occupation=(Banker Doctor Executive Lawyer), 2.65% (29227)
    Occupation=(Manager Other), 12.46% (52261)
  CbUtilization > 75.003, 28.70% (18512)
    FICOScore <= 502.5745, 51.98% (6787)
    FICOScore > 502.5745, 15.22% (11725)

Candidate 2
AutoLoansRisk5, 12.60% (100000)
  FICOScore <= 500.0055, 26.58% (21956)
    LoanToValueRatio <= 89.9335, 18.53% (9825)
    LoanToValueRatio > 89.9335, 33.11% (12131)
  FICOScore > 500.0055, 8.66% (78044)
    Income <= 5009.24, 19.14% (9453)
    Income > 5009.24, 7.22% (68591)

A Much-Improved Approach to Segmented Scorecard Development

FICO has been a worldwide leader in the development of highly effective and robust segmented model systems for more than 25 years. We have developed unique technologies that help streamline the search process and identify the combination of splitters and split points to define an overall segmentation tree that is best for segmented modeling. Several years ago, we designed the Adaptive Random Trees technology to use a global procedure for finding a great segmented model. After extensive research, our analysts have furthered their thinking on segmentation modeling and have come up with an even better method for searching thousands of combinations of segmentations and models to find the best segmented model.

The Segmented Scorecard module in FICO Model Builder searches the vast space of potential segmentation trees to identify an optimized segmented model system. Based on user-specified splitter variables, the module's search algorithm analyzes thousands of candidate splits using scorecards to calculate precise performance estimates, ensuring only the most powerful splits are chosen. It then ranks and returns the best segmented model for further refinement and allows for immediate tuning of the subpopulation models.
Even if you plan to use existing segmentation schemes, the Segmented Scorecard module helps you build better model suites in less time by centralizing the definition and management of all leaf node models, enabling collaborative, parallel development, and providing a complete deployment of the modeling system without manual recoding.

How FICO Segmented Models Work

Rather than using inspiration from decision tree analysis to manually construct a handful of candidate segmentations, FICO's best practice solution begins by considering every possible initial split and then developing scorecards with automated binning and variable selections for each leaf node. The true segmented model divergence is computed to identify the best splits as the trees are grown. While a greedy algorithm would immediately focus on building out the two-way split that resulted in the best two-leaf segmented model, FICO looks at a diverse handful of the best initial splits and proceeds recursively to identify the best segmented models with three leaves, four leaves, etc., until the analyst's stopping criteria are met. These are often based on a maximum tree size (number of leaves), or on minimum count requirements for each leaf node. This approach keeps track of not just the best split at each node, but rather a number of promising splits.

A key principle of FICO's patented Segmented Scorecard module is the automated training of scorecards to quantify the true performance of each candidate split as the tree is recursively grown. An additional benefit is obtained from retaining the best N splits at each node for further growth, creating a variety of similarly performing but differently structured trees, and avoiding the pitfalls frequently associated with greedy searches. The algorithm retains the most promising candidate segmented models. To streamline review, when the algorithm terminates, comparison reports document summarized and detailed information for the most predictive segmented models.
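The text does not spell out how divergence is computed. A minimal sketch, assuming the definition commonly used for scorecards (squared difference of the mean scores of goods and bads, divided by the average of their score variances), is shown below. The function name `divergence` and all score values are hypothetical. Because leaf-node scores are aligned to a shared scale, the scores from every leaf of a segmented model can be pooled and compared directly against the baseline scorecard.

```python
from statistics import mean, pvariance

def divergence(good_scores, bad_scores):
    """Assumed definition: (mean_good - mean_bad)^2 / average score variance."""
    pooled_variance = (pvariance(good_scores) + pvariance(bad_scores)) / 2
    return (mean(good_scores) - mean(bad_scores)) ** 2 / pooled_variance

# Made-up scores on the same validation records from two competing systems.
baseline_good = [620.0, 640.0, 660.0, 680.0]
baseline_bad = [600.0, 620.0, 640.0, 660.0]

# Scores pooled across all leaf-node scorecards of a candidate segmented model.
segmented_good = [630.0, 650.0, 670.0, 690.0]
segmented_bad = [590.0, 610.0, 630.0, 650.0]

baseline_div = divergence(baseline_good, baseline_bad)
segmented_div = divergence(segmented_good, segmented_bad)
lift = segmented_div - baseline_div  # value added by the candidate segmentation

print(f"baseline={baseline_div:.2f} segmented={segmented_div:.2f} lift={lift:.2f}")
```

In a search, the lift of each candidate split over its parent would play the role of the net increase in the objective function; other measures of separation could be substituted without changing the structure.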
Data Requirements of FICO Segmented Models

Generally, developing a segmented model has data requirements similar to those of developing a single model for the entire dataset. The dataset should contain a target variable to predict (e.g., has the account ever been 60+ days past due), a set of candidate predictor variables, and optionally a sample weight for each observation. In addition to these general requirements, it is important that the dataset have enough records, such that when the data is partitioned into segments, each segment has enough examples to build a robust model. If the target is a binary outcome, then each segment must have enough examples of each of the outcomes (e.g., goods and bads) to build a model.

Suppose that the modeler wants to have at least 500 examples of good customers and at least 500 examples of bad customers for building a model for a single segment. Assume also that the modeler will use bootstrap validation so that all 500 examples of the good and bad customers can be used for training the model for that segment. Then, for a segmented model with two segments, at least 2,000 records would be required. Since segmented models typically have anywhere between 5 and 50 segments, the number of records required for segmented model development could be well above 10,000. If the modeler wishes to use 50% of the data for training and 50% for testing, rather than bootstrap, then that requirement doubles. The key point is that the modeler must consider the number of records available for training in each segment and potentially acquire a much larger sample for developing a segmented model than for building a single model.

User Inputs

In addition to providing the data, segmented modeling allows the analyst to provide input to guide the development of the segmented model. The first key input that an analyst can provide is a scorecard developed on the entire dataset.
This serves as a benchmark against which to compare all of the segmented models that will be developed and evaluated.

As discussed above, the analyst may wish to constrain the segmentation to ensure a particular minimum number of examples will exist in each segment. This can be specified in terms of the total examples in the segment, the required number of examples of the rare outcome (e.g., the number of bads), or both. Adding these constraints ahead of time will ensure that the resulting candidate segmentations will be useful for segmented model development.

In order to search for a segmentation, the algorithm will need to know the candidate list of variables to use as splitters. The list of splitters can include the same candidate variables that will be used in the models developed for each segment, or can include different candidate variables. For each splitter, the algorithm will also need to know the desired potential split points. The split points are defined in a binning library so that the analyst can select split points that are at the desirable level of granularity. For example, when segmenting on a variable like Income, it would make sense to constrain the split points to values like $30k, $40k, $50k, ... $100k, $200k, etc. so that, first, each potential split contains sufficient examples to build a model, and second, the split points are round numbers that are presentable to a model reviewer rather than a figure such as $30,531, which would make the resulting segmentation difficult to interpret. The binning functionality and rounding rules in FICO Model Builder allow the analyst to quickly define meaningful candidate split points for each splitter variable.
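The record-count reasoning in the data requirements discussion, and the minimum-count constraint on candidate split points, can be illustrated with a short Python sketch. The names `required_records` and `feasible_split_points`, and the sample incomes and outcomes, are hypothetical and not part of FICO Model Builder. The first function gives only a lower bound: it assumes every segment exactly meets its minimum counts, which real data with a rare outcome will seldom do.

```python
import math

def required_records(n_segments, min_goods, min_bads, train_fraction=1.0):
    """Lower bound on total records if every segment just meets its minimums.

    train_fraction=1.0 models bootstrap validation (all records train);
    train_fraction=0.5 models a 50/50 train/test split.
    """
    per_segment = min_goods + min_bads
    return math.ceil(n_segments * per_segment / train_fraction)

def feasible_split_points(values, is_bad, candidate_points, min_total, min_bads):
    """Keep the candidate split points whose two sides both meet the minimums."""
    feasible = []
    for point in candidate_points:
        left = [bad for v, bad in zip(values, is_bad) if v <= point]
        right = [bad for v, bad in zip(values, is_bad) if v > point]
        if all(len(side) >= min_total and sum(side) >= min_bads for side in (left, right)):
            feasible.append(point)
    return feasible

print(required_records(2, 500, 500))                      # two segments, bootstrap
print(required_records(2, 500, 500, train_fraction=0.5))  # two segments, 50/50 split

# Made-up incomes with a bad flag, and round candidate split points.
incomes = [25_000, 32_000, 38_000, 45_000, 52_000, 61_000, 75_000, 120_000]
bads = [1, 0, 1, 0, 1, 0, 0, 1]
feasible = feasible_split_points(
    incomes, bads, [30_000, 40_000, 50_000, 100_000], min_total=2, min_bads=1
)
print(feasible)
```

In this toy sample the $30k and $100k candidates are dropped because one side of each split would hold a single record.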
With the candidate split points defined, the analyst may wish to provide business guidance to the search process. This can be done by providing a partial ordering on the splitter variables, defining starting trees to grow out further, or by manually defining fixed segmentation trees for comparison. First, sometimes business users want to specify in which order the variables are used in the segmented model. For example, the user might require that the first split be on Product Type, the second split be on Risk Score, or on Attrition Score, and that the algorithm is then free to choose any variables for the subsequent splits. It is also common that businesses will want to define the first few splits in a segmented model. For example, the first split might be mandated to be on the distinction between transactors and revolvers. Then the algorithm might be asked to consider making additional splits for the transactors, for the revolvers, or for both. Finally, the user may wish to specify one or more champion trees that are to be evaluated by automatically building models for each segment they define without any modification to the given champion trees. Once a segmentation is defined, it will be evaluated by building a model for each segment. That process can also be controlled. The analyst may desire to specify the candidate variables to be used in the model that will be automatically built for each segment. If the model is a scorecard, the analyst may also desire to define the initial binning of each candidate variable. With guidance provided to the algorithm, the system can proceed to identify the optimized segmented model. Search Algorithm The following diagram (figure 7) illustrates the search for the optimized set of segmented models. The algorithm will return the top segmented model found. 
FICO Model Builder will also provide a comparison of the structure and performance of the top models, an analysis of the segmentation logic, segment statistics, variable usage in each segment and details regarding the variables and bins in each segment's model.

There are a number of key points to notice when reviewing the algorithm:

First, the algorithm starts with a population of trees where each tree splits on a different candidate splitter variable. The algorithm then attempts to grow out each of the trees. This is important to avoid a greedy result: even though the best first split might be on Variable A, a better result may be obtained by splitting on Variable B and making subsequent splits.

Second, the quality of a segmented model is evaluated not based on how well the tree predicts, but on how well the segmented model predicts when the tree is used to route data to segment-level models that are specialized for the segment.

Third, the algorithm returns not only the best segmented model, but the top segmented models along with segmented models for any segmentations manually entered by the analyst. This enables the analyst to understand performance with respect to alternative segmented models that vary in terms of complexity, palatability and model performance.
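The second point, that the tree only routes records while the segment-level models do the predicting, can be made concrete with a small sketch. It mirrors the three-scorecard layout of Figure 1, but the data structure, the $50,000 income threshold, the predictor names and all point values are assumptions made for illustration.

```python
def route(tree, record):
    """Walk the segmentation tree and return the name of the leaf scorecard."""
    node = tree
    while "leaf" not in node:
        node = node["yes"] if node["test"](record) else node["no"]
    return node["leaf"]

# Segmentation tree: split on income, then on geography for high income.
tree = {
    "test": lambda r: r["income"] > 50_000,  # hypothetical High/Low threshold
    "yes": {
        "test": lambda r: r["region"] == "East",
        "yes": {"leaf": "ScorecardHE"},
        "no": {"leaf": "ScorecardHW"},
    },
    "no": {"leaf": "ScorecardL"},
}

# Toy leaf scorecards on a shared scale; each may use different predictors.
scorecards = {
    "ScorecardL": lambda r: 600 + (40 if r["age"] < 35 else 0),
    "ScorecardHE": lambda r: 620 - (30 if r["utilization"] > 0.75 else 0),
    "ScorecardHW": lambda r: 615 - (25 if r["utilization"] > 0.75 else 0),
}

def score(record):
    """Route the record to its segment, then apply that segment's scorecard."""
    return scorecards[route(tree, record)](record)

young_low = {"income": 30_000, "region": "West", "age": 28, "utilization": 0.40}
high_east = {"income": 90_000, "region": "East", "age": 28, "utilization": 0.90}

print(route(tree, young_low), score(young_low))
print(route(tree, high_east), score(high_east))
```

Evaluating a candidate segmentation then amounts to scoring validation records this way and measuring the separation of the resulting scores, rather than measuring the purity of the tree's leaves.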
FIGURE 7

[Flowchart, listed in order of execution:]

Input: Train data, validation data, settings, parent model.

Initialize: Start with given initial trees or one empty tree.

Initial Statistics: Compute initial statistics on the data.

More Trees to Grow? Are any leaf nodes further splittable? If no, go to Output. If yes, continue with the steps below.

Next Split Node: For every tree find the next node to split (node with greatest number of records).

Generate Splits: For every splitter variable generate candidate splits for the next split node of every tree.

Node and Split Stats: Train new pairs of predictive scorecard models for each side of every candidate split. Compute net increase in objective function to assess the value of each candidate split.

Choose Best N Splits: For each input tree, retain the top N augmented trees, as ranked by increase in objective function from their newly added splits.

Iteration Result: Take the chosen split on each tree to generate next tree population.

Output: Population of trees with their optimization measure; best tree.
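The loop in Figure 7 can be sketched as a small beam search. This is a simplified illustration under stated assumptions, not FICO's algorithm: the function names, the synthetic data and the objective are all made up. In particular, training a scorecard per leaf is replaced by a stand-in that predicts the response rate for each value of a single binary predictor (`young`) within the leaf, and the objective is the sum of squares that those predictions explain.

```python
def explained(rows, global_mean):
    """Stand-in for a leaf scorecard: sum of squares explained by predicting
    the leaf's response rate separately for young and not-young records."""
    total = 0.0
    for flag in (0, 1):
        group = [r["y"] for r in rows if r["young"] == flag]
        if group:
            total += len(group) * (sum(group) / len(group) - global_mean) ** 2
    return total

def beam_search(rows, splits, beam_width=2, max_leaves=3, min_leaf=5):
    """Grow a population of trees, keeping the best `beam_width` splits per tree."""
    global_mean = sum(r["y"] for r in rows) / len(rows)

    def objective(tree):
        return sum(explained(leaf_rows, global_mean) for _, leaf_rows in tree)

    # A tree is a list of leaves; a leaf is (path of split labels, its rows).
    population, finished = [[((), rows)]], []
    while population:
        next_population = []
        for tree in population:
            # Next split node: the leaf with the most records. Simplification:
            # if that leaf cannot be split, the tree is treated as finished.
            idx = max(range(len(tree)), key=lambda i: len(tree[i][1]))
            path, node_rows = tree[idx]
            candidates = []
            if len(tree) < max_leaves:
                for name, test in splits:
                    yes = [r for r in node_rows if test(r)]
                    no = [r for r in node_rows if not test(r)]
                    if min(len(yes), len(no)) >= min_leaf:
                        children = [(path + (name,), yes), (path + ("not " + name,), no)]
                        candidates.append(tree[:idx] + children + tree[idx + 1:])
            if candidates:
                candidates.sort(key=objective, reverse=True)
                next_population.extend(candidates[:beam_width])
            else:
                finished.append(tree)
        population = next_population
    ranked = sorted(finished, key=objective, reverse=True)
    return [(objective(t), [p for p, _ in t]) for t in ranked]

def cell(n, responders, **attrs):
    """n synthetic records sharing attrs, the first `responders` with y=1."""
    return [dict(attrs, y=1 if i < responders else 0) for i in range(n)]

# Synthetic data: age matters for low income only; region is uninformative.
rows = []
for income_high, young, responders in [(0, 1, 6), (0, 0, 1), (1, 1, 3), (1, 0, 3)]:
    for east in (0, 1):
        rows += cell(10, responders, income_high=income_high, young=young, east=east)

splits = [
    ("income_high", lambda r: r["income_high"] == 1),
    ("east", lambda r: r["east"] == 1),
]
results = beam_search(rows, splits, beam_width=2, max_leaves=2)
best_value, best_leaves = results[0]
for value, leaves in results:
    print(round(value, 3), leaves)
```

On this toy data the income split ranks first because it lets the leaf-level stand-in models use age differently, while the region split adds nothing over the unsegmented model. Retaining more than one candidate per tree is what distinguishes this from a greedy search, at the cost of a population that grows with tree depth.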
Giving the Analyst Control

The flexible nature of the Segmented Scorecard module gives the analyst unprecedented control over the segmented model. This tool assumes that the analyst's best thinking will bring about the best model and therefore gives the analyst room to experiment to best determine the path to effective decisions. The Segmented Scorecard module in FICO Model Builder provides capabilities to:

- Automatically rank the performance of each segmented model in a variety of measures.
- Visualize and quickly navigate segmentation trees.
- Enable business experts to refine discovered trees to infuse domain knowledge, or add splits required by the business process.
- Empower analysts to tune every aspect of the search algorithm, including:
  - Comparing results against existing trees.
  - Specifying initial, partial trees to be empirically grown.
  - Setting binning, constraint and variable selection criteria to use in automated model training.
  - Controlling the trade-off between depth of search and execution time.
  - Specifying splitters to consider for the first, or any, level of the tree.
  - Controlling maximum tree size.

Global lender improves profit by 26%

A global lender wanted to improve its bottom line by reducing its bad debt rates. FICO applied its segmented modeling technology to the lender's model development process. The result was a new segmented scorecard designed to achieve maximum precision in predicting delinquency and default behavior. The lender projects a 26% profit increase, or $440k per 100k new applicants to its portfolio. This approach is also projected to yield a significant 8.5% profit improvement over the lender's old approach of manually designing a segmented model system.

Business Value of Segmented Models

By automating the process of identifying optimized segmented models, the Segmented Scorecard module delivers substantial benefits to your model development process. The Segmented Scorecard module allows you to:

Boost the precision of your models.
You can realize the bottom-line benefits of an optimally tuned segmented model on nearly every modeling project. Segmented models naturally capture the most important interactions in the dataset. Unlike other powerful modeling paradigms, such as neural networks, the resulting segmented model gives the user immediate insights into the interactions among variables and the predictive patterns that are most important for modeling the populations in each segment.

Accelerate the model development process. Automating key steps dramatically reduces the time to develop effective segmented models. The Segmented Scorecard module not only finds great segmentations, but guides the user through the process of choosing, refining and deploying the final segmented model with all the required elements such as reason code assignments and score scaling parameters.
Improve efficiency and reduce costs. The segmentation discovery process can yield superior segmentation trees with fewer leaves than manually created trees, reducing the number of models required, and thus lowering the costs to develop, deploy and maintain the best segmented model.

Segmented Modeling in Practice

FICO has applied segmented modeling across a wide range of industry applications. As the inventor of this technology, FICO's own model development teams have benefited from the accelerated segmented modeling process and have been able to reap the benefits of segmented modeling on virtually any modeling project with ample data. Our clients have also benefited from much more effective segmentation solutions enabled by a smarter tree search technology.

Segmented Scorecard Module Availability

The Segmented Scorecard is available as an optional module for the FICO Model Builder platform. The Segmented Scorecard module builds upon Model Builder's core capabilities for importing data, visualizing and exploring predictive patterns, defining predictive variables, creating scorecards and other types of predictive models, evaluating their quality and swiftly deploying predictive analytics.

FICO (NYSE: FICO) is a leading analytics software company, helping businesses in 90+ countries make better decisions that drive higher levels of growth, profitability and customer satisfaction. The company's groundbreaking use of Big Data and mathematical algorithms to predict consumer behavior has transformed entire industries. FICO provides analytics software and tools used across multiple industries to manage risk, fight fraud, build more profitable customer relationships, optimize operations and meet strict government regulations. Many of our products reach industry-wide adoption, such as the FICO Score, the standard measure of consumer credit risk in the United States.
FICO solutions leverage open-source standards and cloud computing to maximize flexibility, speed deployment and reduce costs. The company also helps millions of people manage their personal credit health. Learn more at www.fico.com.

For more information: www.fico.com

North America: +1 888 342 6336, info@fico.com
Latin America & Caribbean: +55 11 5189 8222, LAC_info@fico.com
Europe, Middle East & Africa: +44 (0) 207 940 8718, emeainfo@fico.com
Asia Pacific: +65 6422 7700, infoasia@fico.com

FICO and "Make every decision count" are trademarks or registered trademarks of Fair Isaac Corporation in the United States and in other countries. Other product and company names herein may be trademarks of their respective owners. © 2014 Fair Isaac Corporation. All rights reserved. 3085WP 06/14