Credit Rating Prediction Using Ant Colony Optimization

David Martens a,b, Tony Van Gestel c,d, Manu De Backer a, Raf Haesen a, Jan Vanthienen a, Bart Baesens e,a

a Department of Decision Sciences & Information Management, K.U.Leuven, Naamsestraat 69, B-3000 Leuven, Belgium. {David.Martens;Manu.DeBacker;Raf.Haesen;Jan.Vanthienen}@econ.kuleuven.be
b Department of Business Administration and Public Management, Hogeschool Gent, Voskenslaan 270, Ghent 9000, Belgium. David.Martens@hogent.be
c Credit Risk Modelling, Group Risk Management, Dexia Group, Square Meeus 1, 1000 Brussel, Belgium. Tony.Vangestel@dexia.com
d Department of Electrical Engineering, ESAT-SCD-SISTA, K.U.Leuven, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
e University of Southampton, School of Management, Highfield, Southampton, SO17 1BJ, United Kingdom. Bart@soton.ac.uk

Abstract

The introduction of the Basel II Capital Accord has encouraged financial institutions to build internal rating systems assessing the credit risk of their various credit portfolios. One of the key outputs of an internal rating system is the probability of default (PD), which reflects the likelihood that a counterparty will default on his/her financial obligation. Since the PD modeling problem basically boils down to a discrimination problem (defaulter or not), one may rely on the myriad of classification techniques that have been suggested in the literature. However, since the credit risk models will be subject to supervisory review and evaluation, they must be transparent and easy to understand. Hence, techniques such as neural networks or support vector machines are less suitable due to their black box nature. Building upon previous research, we will use AntMiner+ to build internal rating systems for credit risk. AntMiner+ infers a propositional rule set from a given data set using the principles of Ant Colony Optimization.
Experiments will be conducted using various types of credit data sets (retail, small and medium-sized enterprises (SMEs) and banks). It will be shown that the extracted rule sets perform well in terms of both discriminatory power and comprehensibility. Furthermore, a framework will be presented describing how AntMiner+ fits into a global Basel II credit risk management system.

Key words: Ant Colony Optimization, Classification, Credit Scoring, Bankruptcy Prediction, Basel II

Preprint submitted to Elsevier, 29 October 2008

1 Introduction

Over the past decades, financial institutions have seen an ever growing need for quantitative analysis techniques to optimize and monitor decisions related to risk and investment management. The gradual adoption of data warehousing and knowledge discovery in data (KDD) technology is allowing these institutions to analyze ever larger amounts of data, using a range of powerful techniques from various disciplines such as conventional statistics, machine learning, neurocomputing, and operations research. This process is only being further accelerated by the recent implementation of several international financial and accounting standards (such as Basel II, Solvency II, Sarbanes-Oxley and IFRS). For example, by allowing banks to use their internal credit risk assessment models as input for the minimum regulatory capital calculations, the Basel II framework is providing financial institutions with additional incentives to refine existing credit scoring models, since more accurate predictions allow for less conservative capital requirements. Hence, there has been a growing interest throughout the financial world in research on novel data mining techniques and information technologies to support the implementation of such compliance frameworks.

As a result of a longstanding interest from the research community, a myriad of techniques have been proposed for many of the aforementioned problems, in particular for classification problems such as credit scoring and bankruptcy prediction. However, not all of these approaches have proven readily transferable from the academic domain to financial practice. Many of the representations applied by the suggested algorithms cannot be easily interpreted and validated by humans.
For example, neural networks are considered a black box technique, since the reasoning behind how the non-linear prediction models reach their conclusions cannot easily be obtained from their structure. This has not only hindered their acceptance by practitioners, but also fails to address the increasing need for transparency under various regulatory frameworks. Credit risk analysts are unlikely to accept black box techniques such as neural networks to make credit decisions, since under the Basel II accord they are now required to demonstrate and periodically validate their models, and to present reports to the national regulator for approval. Therefore, recent research proposed the use of rule-based classification techniques to generate decision models that are powerful as well as intuitive and transparent. One such rule-based classification technique that has recently been proposed is AntMiner+, which uses Ant Colony Optimization (ACO) to infer accurate rules from the data. This paper will describe how this technique can be used to generate comprehensible credit scoring models, which can then be fit into a Basel II-compliant decision support system.

The paper is structured as follows. The next section discusses the issues related to building credit scoring models within the Basel II regulatory framework. Section 3 provides an overview of the AntMiner+ classification technique, as well as an introduction to ACO, on which the technique is based. The experimental Section 4 provides AntMiner+ credit scoring models for retail banking, small and medium-sized enterprises (SMEs) and banks. Section 5 describes the further steps needed to obtain a Basel II-compliant decision support system, and finally, Section 6 concludes the paper.

2 Credit Scoring and Bankruptcy Prediction within Basel II

The recent introduction of the Basel II Capital Accord encourages financial institutions to calculate their minimum regulatory safety capital to ensure that they are able to return depositor funds at all times [?]. The minimum safety capital is determined at 8% of risk weighted assets, which are in turn quantified taking into account three types of risk: credit risk, operational risk and market risk. In calculating credit risk, banks must use three key risk parameters: probability of default (PD), loss given default (LGD) and exposure at default (EAD). These three parameters are then used as input to a Merton/Vasicek model, which calculates the regulatory safety capital [?]. The PD, LGD and EAD parameters can be obtained in three different ways.
The standard approach for credit risk allows banks to buy risk ratings from external rating agencies, often called External Credit Assessment Institutions (ECAIs) in the spirit of the Accord. Examples of well-known ECAIs are Moody's, Standard & Poor's and Fitch. The risk ratings are then translated to risk weights provided in the Accord, from which the risk weighted assets (RWA), and hence the regulatory capital, can be calculated. The foundation internal ratings based (IRB) approach allows banks to build their own PD models and get LGD and EAD estimates from the supervisors, whereas the advanced internal ratings based approach allows financial institutions to estimate all three risk parameters themselves. Many financial institutions in Western Europe, Asia and the US are currently taking steps to implement the advanced IRB approach. More than ever, this has triggered the interest in, and need for, credit scoring and bankruptcy prediction models for estimating the PD of a set of obligors.

For retail portfolios, application scoring models will be developed that try to quantify the credit risk of a set of recently acquired customers, given their application characteristics (e.g. age, marital status, credit history, savings amount, ...). Behavioural scoring models will be used to monitor the credit risk of the existing customer base, given their most recent behaviour (e.g. average checking account status during the previous month, number of credit cards, ...). For small and medium-sized enterprises (SMEs), financial institutions will develop bankruptcy prediction models that quantify the risk of financial failure given a set of accounting ratios and measurements. For both retail and SME obligors, one can usually assume that a sufficient number of defaults is present to make statistical discrimination and classification meaningful. However, for certain types of counterparties, such as banks, insurance companies and sovereign entities, the lack of default observations necessitates the use of alternative methods. In this context, financial institutions will often build rating models that mimic a set of externally provided ratings (e.g. by an ECAI), given a set of candidate explanatory variables collected by the institution.

Ideally, the credit scoring, bankruptcy prediction and rating models should have high discriminatory power, so as to minimize the cost of granting credit to bad customers or the profit lost when good customers are rejected. Since these models now play a pivotal role in the risk management strategy of a bank, they are also subject to supervisory review and validation by financial regulators. Furthermore, in most countries, financial institutions are obliged to explain why credit has been denied to an applicant.
Both these trends basically prohibit the use of black box, mathematically complex application scoring models, and instead stimulate the use of comprehensible, easy-to-understand models.

Numerous classification techniques have been adopted for credit risk measurement and for financial forecasting in general. These techniques include traditional statistical methods (e.g. discriminant analysis and logistic regression [??]), nonparametric statistical models (e.g. k-nearest neighbor [??], decision trees [??] and rule learners [?]) and neural networks [???]. The conclusions of these studies often conflict with one another. In [?], a large-scale benchmarking study compares the classification performance of various state-of-the-art classification techniques on eight real-life credit scoring data sets. It concludes that neural networks perform very well in terms of classification accuracy. However, their opacity and black box nature prevent them from being used in a Basel II context. That is why in this paper we will use the rule-based classification technique AntMiner+, which provides comprehensible, accurate models that are in line with existing domain knowledge.

3 AntMiner+: Classification based on Ant Colony Optimization

3.1 Ant Colony Optimization

Ant Colony Optimization (ACO) is a metaheuristic inspired by the foraging behavior of real ant colonies [?]. A biological ant by itself is a simple insect with limited capabilities, and is guided by straightforward decision rules. However, these simple rules are sufficient for the overall ant colony to find short paths from the nest to the food source. By dropping a chemical substance called pheromone that attracts other ants, an ant indirectly communicates with its fellow ants from the colony. How this indirect communication leads to shortest path finding capabilities is shown in Fig. 1. Suppose two ants start from their nest (left) and look for the shortest path to a food source (right). Initially no pheromone is present on either trail, so there is an equal chance of choosing either of the two possible paths (see Fig. 1(a)). Suppose one ant chooses the lower trail, and the other one the upper trail. The ant that has chosen the lower (shorter) trail will have returned faster to the nest, resulting in twice as much pheromone on the lower trail as on the upper one, as illustrated in Fig. 1(b). As a result, the probability that the next ant will choose the lower, shorter trail will be twice as high, resulting in more pheromone, and thus more ants will choose this trail, until eventually (almost) all ants follow the shorter path. Note that the pheromone on the longer trail will finally disappear through evaporation.

Ant Colony Optimization employs artificial ants that cooperate in a similar manner to their biological counterparts, in order to find good solutions for discrete optimization problems [?]. The first ACO algorithm is Ant System [??], where ants iteratively construct solutions and add pheromone to the paths corresponding to these solutions.
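The two-trail example of Fig. 1 can be mimicked with a few lines of Python. The sketch below is a deterministic, averaged approximation rather than a simulation of individual ants, and it is not part of the paper: the function name, the retention factor of 0.9 and the deposit rule (deposit inversely proportional to trail length) are assumptions chosen only to show the reinforcement effect.

```python
def double_bridge(short_len=1.0, long_len=2.0, retention=0.9, iterations=200):
    """Return the pheromone share of the short trail after each iteration."""
    tau_short, tau_long = 1.0, 1.0  # no trail is preferred at the start
    shares = []
    for _ in range(iterations):
        # Fraction of ants expected to pick the short trail.
        p_short = tau_short / (tau_short + tau_long)
        # Evaporation (retention < 1) followed by a deposit that grows with the
        # fraction of ants on the trail and shrinks with the trail length.
        tau_short = retention * tau_short + p_short / short_len
        tau_long = retention * tau_long + (1.0 - p_short) / long_len
        shares.append(tau_short / (tau_short + tau_long))
    return shares


shares = double_bridge()
print(f"share of pheromone on the short trail, first iteration: {shares[0]:.3f}")
print(f"share of pheromone on the short trail, last iteration:  {shares[-1]:.3f}")
```

Under these assumptions the share of pheromone on the shorter trail increases at every iteration, which is the positive feedback described above.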
Path selection is a stochastic procedure based not only on a history-dependent pheromone value, but also on a problem-dependent heuristic value. The pheromone value gives an indication of the number of ants that chose the trail recently, while the heuristic value is a problem-dependent quality measure. When an ant reaches a decision point, it is more likely to choose the trail with the higher pheromone and heuristic values. Once the ant arrives at its destination, the solution corresponding to the ant's path is evaluated and the pheromone value of the path is increased accordingly. Additionally, evaporation causes the pheromone level of all trails to diminish gradually. Hence, trails that are not reinforced gradually lose pheromone and will in turn have a lower probability of being chosen by subsequent ants.

The performance of traditional ACO algorithms, however, is rather poor on large problem instances [?]. To overcome this issue, other ACO algorithms have been proposed, such as Ant Colony System [?], rank-based Ant System [?], Elitist Ant System [?] and MAX-MIN Ant System [?]. As the latter is the one employed in the AntMiner+ classification technique, the main features of MAX-MIN Ant System are discussed next. Stützle et al. [?] advocate that a better exploitation of the best solutions can be obtained by only adding pheromone to the path of the best ant. To avoid early search stagnation, which is the situation where all ants take the same path and thus describe the same solution, possible pheromone values are limited to the interval $[\tau_{\min}, \tau_{\max}]$. Finally, initializing the pheromone values to $\tau_{\max}$ entails a higher exploration at the beginning of the algorithm.

ACO has been applied to a wide variety of problems [?], such as the vehicle routing problem [???], scheduling [??], timetabling [?], the traveling salesman problem [???] and routing in packet-switched networks [?]. Recently, ACO has also entered the data mining domain, addressing both the clustering [??] and the classification task [???], the latter being the topic of interest in this paper. The first application of ACO to the classification task was reported by Parpinelli et al. in [?] and was named AntMiner. Extensions were put forward by Liu et al. in AntMiner2 [?] and AntMiner3 [?]. Our approach, AntMiner+, differs from these previous AntMiner versions in several ways, resulting in an improved performance, as described in [?]. Next follows a brief discussion of the principles and workings of AntMiner+.

3.2 AntMiner+ Algorithm

ACO can be used to induce comprehensible and accurate rule-based classification models from data, as done in the AntMiner+ classification technique [?]. First of all, an environment needs to be defined in which the ants operate. When an ant moves through the environment from the Start to the Stop vertex, it should incrementally construct a solution to the problem at hand, in this case the classification problem.
In order to build a set of classification rules, we define the construction graph in such a way that each ant's path implicitly describes a classification rule. For each variable $V_i$, a vertex $v_{i,j}$ is created for each of its values $Value_{i,j}$. The set of vertices for one variable is defined as a vertex group. To allow for rules in which not all variables are involved, and hence for shorter rules, an extra dummy vertex is added to each variable whose value is undetermined, meaning it can take any of the values available. Although only categorical variables are allowed, we make a distinction between nominal variables (no apparent ordering in the values, e.g. sex and purpose of loan) and ordinal variables (a clear ordering of the values, e.g. amount on savings or checking account, and income). Each nominal variable has one vertex group (including the dummy vertex mentioned above), whereas for each ordinal variable we build two vertex groups to allow intervals to be chosen by the ants. The first vertex group corresponds to the lower bound of the interval and should thus be interpreted as $\langle V_{i+1} \geq Value_{i,k} \rangle$; the second vertex group determines the upper bound, giving $\langle V_{i+2} \leq Value_{i+1,l} \rangle$ (of course, the choice of the upper bound is constrained by the lower bound). This allows for fewer, shorter and actually better rules. To extract a rule set that is exhaustive, such that all future data points can be classified, the majority class is not included in the vertex group of the class variable, and will be the predicted class for the final else clause.

An example AntMiner+ construction graph for a credit scoring data set with only three variables (purpose of the loan, amount on savings account and credit history of the applicant) is shown in Fig. 2. The path denoted in bold describes the rule: if Purpose = car and Savings Account ≥ 0 € and Savings Account ≤ 500 € and Credit History = any then class = bad. A formal illustration of the construction graph is provided in Fig. 3, for a data set with d classes and n variables, of which the first and last variables are nominal and $V_2$ is ordinal (hence the two vertex groups). The weight parameters α and β determine the relative importance of the pheromone and heuristic values; their role is described by (1).

Now that the environment is defined, we can explain the workings of the technique. All ants begin in the Start vertex and walk through their environment to the Stop vertex, gradually constructing a rule. Only the ant that describes the best rule will update the pheromone of its path, as imposed by the MAX-MIN Ant System approach. Evaporation decreases the pheromone of all edges, while the pheromone levels are constrained to lie within the given interval $[\tau_{\min}, \tau_{\max}]$.
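To make the construction graph more concrete, the following Python sketch builds the vertex groups for a data set like the one in Fig. 2 and translates an ant's path into a rule. It is only an illustration under stated assumptions: the function names, the `ANY` label for the dummy vertex and the category values are hypothetical and are not taken from the AntMiner+ implementation.

```python
ANY = "any"  # hypothetical label for the dummy vertex of a vertex group


def build_vertex_groups(variables):
    """variables: list of (name, kind, values) with kind 'nominal' or 'ordinal'."""
    groups = []
    for name, kind, values in variables:
        if kind == "nominal":
            groups.append((f"{name} =", [ANY, *values]))
        else:
            # Ordinal variables get a lower-bound and an upper-bound vertex group.
            groups.append((f"{name} >=", [ANY, *values]))
            groups.append((f"{name} <=", [ANY, *values]))
    return groups


def allowed_upper_bounds(values, lower):
    """The upper bound may not lie below the lower bound chosen by the ant."""
    if lower == ANY:
        return [ANY, *values]
    return [ANY] + [v for v in values if v >= lower]


def path_to_rule(groups, path, predicted_class):
    """One chosen vertex per group; dummy vertices add no term to the rule."""
    terms = [f"{label} {value}"
             for (label, _), value in zip(groups, path) if value != ANY]
    antecedent = " and ".join(terms) if terms else "true"
    return f"if {antecedent} then class = {predicted_class}"


# Hypothetical categories for the three variables of the example graph.
variables = [
    ("Purpose", "nominal", ["car", "furniture"]),
    ("Savings Account", "ordinal", [0, 250, 500, 1000]),
    ("Credit History", "nominal", ["no credits taken", "all paid back duly"]),
]
groups = build_vertex_groups(variables)
rule = path_to_rule(groups, ["car", 0, 500, ANY], "bad")
print(rule)
```

Because the dummy vertex contributes no term, the term Credit History = any is simply dropped from the printed rule.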
Then another iteration occurs with ants walking from Start to Stop. Convergence occurs when all the edges of one path have a pheromone level $\tau_{\max}$ and all other edges have pheromone level $\tau_{\min}$. Next, the rule corresponding to the path with $\tau_{\max}$ is extracted and added to the rule set. Finally, training data covered by this rule is removed from the training set. This iterative process is repeated until the stop criterion is met, which is early stopping: the accuracy on a separate validation set is monitored, and rule induction stops when the validation accuracy starts to decrease. Next we will have a closer look at the algorithm specifics, such as the edge probabilities and the rule quality measure.

$$P_{ij}(t) = \frac{[\tau_{(v_{i-1,k},v_{i,j})}(t)]^{\alpha} \, [\eta_{v_{i,j}}(t)]^{\beta}}{\sum_{l=1}^{p_i} [\tau_{(v_{i-1,k},v_{i,l})}(t)]^{\alpha} \, [\eta_{v_{i,l}}(t)]^{\beta}} \qquad (1)$$

$$\eta_{ij} = \frac{|T_{ij} \,\&\, CLASS = class_{ant}|}{|T_{ij}|} \qquad (2)$$
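As a rough illustration of (1) and (2), the Python sketch below computes the heuristic value of a term and the resulting selection probabilities for the vertices of one vertex group. The five data rows, the function names, the uniform fallback for an all-zero group and the values chosen for alpha and beta are assumptions made for this example only; they are not taken from the paper's data or implementation.

```python
def heuristic(data, variable, value, ant_class):
    """Eq. (2): fraction of the cases covered by the term that have the ant's class."""
    covered = [row for row in data if row[variable] == value]
    if not covered:
        return 0.0
    correct = sum(1 for row in covered if row["class"] == ant_class)
    return correct / len(covered)


def edge_probabilities(tau, eta, alpha=1.0, beta=1.0):
    """Eq. (1): normalise tau^alpha * eta^beta over the vertices of the next group."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    if total == 0.0:
        # Assumption of this sketch: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical toy data: four male applicants (three bad) and one female applicant.
data = [
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "good"},
    {"sex": "female", "class": "good"},
]

values = ["male", "female"]
eta = [heuristic(data, "sex", v, "bad") for v in values]
tau = [1.0, 1.0]  # all edges start at the same pheromone level
probs = edge_probabilities(tau, eta)
for v, e, p in zip(values, eta, probs):
    print(f"sex = {v}: eta = {e:.2f}, P = {p:.2f}")
```

With equal pheromone on all edges, the choice is driven entirely by the heuristic values; as pheromone accumulates on the best paths, the first factor in (1) gradually takes over.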

$$\tau_{(v_{i-1,k},v_{i,j})}(0) = \tau_{\max} \qquad (3)$$

$$\tau_{(v_{i-1,k},v_{i,j})}(t+1) = \rho \cdot \tau_{(v_{i-1,k},v_{i,j})}(t) + \frac{Q^{+}_{best}}{10} \qquad (4)$$

The edge to choose when an ant arrives at a vertex $v_{i-1,k}$, and thus the term to add next, depends on the pheromone value of the edge between vertices $v_{i-1,k}$ and $v_{i,j}$ ($\tau_{(v_{i-1,k},v_{i,j})}$) and on the heuristic value of the vertex $v_{i,j}$ ($\eta_{i,j}$). These are normalized over all possible vertices, providing a probability $P_{ij}$ for each of the possible vertices, according to (1). As the heuristic function η is problem-dependent, we have defined the heuristic value $\eta_{ij}$ of vertex $v_{i,j}$, corresponding to the term $V_i = Value_{i,j}$, as the fraction of training cases that are correctly covered (described) by this term, as defined by (2). Let us illustrate this definition with a simplified credit scoring data set of five data instances $i_1, i_2, \ldots, i_5$ and three variables: Sex, Term of the loan, and the nominal variable Real Estate, stating what kind of real estate the applicant owns. Consider the vertex corresponding to Sex = male. As this is a binary classification problem, the only class in the construction graph is the bad class, giving a heuristic value for this vertex of:

$$\frac{|Sex = male \,\&\, CLASS = bad|}{|Sex = male|} = 3/4 \qquad (5)$$

The initial pheromone value is by definition $\tau_{\max}$, as imposed by MAX-MIN Ant System. The pheromone to add to the path of the best ant should be proportional to the quality of the path, which we define as the sum of the confidence and the coverage of the corresponding rule. Confidence is the number of remaining data points (those not yet covered by any of the extracted rules) that are correctly classified by a rule, divided by the total number of remaining data points covered by that rule. The coverage gives an indication of the overall importance of the specific rule by measuring the number of correctly classified remaining data points over the total number of remaining data points.
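The rule quality described above (confidence plus coverage) and the pheromone update (4) can be sketched in Python as follows. This is a minimal sketch, not the AntMiner+ code: the five data rows are invented for illustration, the edge labels are hypothetical, and the bounds `tau_min = 0.05` and `tau_max = 1.0` are assumed values; only ρ = 0.85 and the division by ten come from the text.

```python
def rule_quality(remaining, antecedent, rule_class):
    """Confidence plus coverage, computed on the not-yet-covered data points."""
    covered = [row for row in remaining if antecedent(row)]
    correct = sum(1 for row in covered if row["class"] == rule_class)
    confidence = correct / len(covered) if covered else 0.0
    coverage = correct / len(remaining) if remaining else 0.0
    return confidence + coverage


def update_pheromone(tau, best_edges, q_best, rho=0.85, tau_min=0.05, tau_max=1.0):
    """Eq. (4) for the best ant's edges, evaporation only for all other edges."""
    updated = {}
    for edge, level in tau.items():
        level = rho * level                  # evaporation on every edge
        if edge in best_edges:
            level += q_best / 10.0           # reinforcement, scaled as in (4)
        # MAX-MIN Ant System: keep every level within [tau_min, tau_max].
        updated[edge] = min(tau_max, max(tau_min, level))
    return updated


# Hypothetical remaining data: four male applicants (three bad), one female.
remaining = [
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "bad"},
    {"sex": "male", "class": "good"},
    {"sex": "female", "class": "good"},
]

q = rule_quality(remaining, lambda row: row["sex"] == "male", "bad")
tau = {("start", "sex=male"): 1.0, ("start", "sex=female"): 1.0}  # Eq. (3)
tau = update_pheromone(tau, {("start", "sex=male")}, q)
print(f"quality = {q:.2f}")
for edge, level in tau.items():
    print(edge, f"{level:.3f}")
```

Clamping after every update is what prevents a single path from dominating too early, at the price of two extra parameters that have to be chosen.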
More formally, the pheromone amount to add to the path of the iteration best ant is given by the benefit of that path, as indicated by (6), with $rule_{ant}$ the rule antecedent (if part) comprising a conjunction of terms corresponding to the path chosen by the ant, $rule^{c}_{ant}$ the conjunction of $rule_{ant}$ with the class chosen by the ant, and Cov a binary variable expressing whether a data point is already covered by one of the extracted rules (Cov = 1) or not (Cov = 0). The number of remaining data points can therefore be expressed as $|Cov = 0|$. This means that, taking into account the evaporation factor as well, the update rule for the best ant's path is described by (4), where the division by ten is a scaling factor that is needed such that both the pheromone and heuristic values lie within the range [0, 1].

$$Q^{+} = \underbrace{\frac{|rule^{c}_{ant}|}{|rule_{ant}|}}_{\text{confidence}} + \underbrace{\frac{|rule^{c}_{ant}|}{|Cov = 0|}}_{\text{coverage}} \qquad (6)$$

For example, returning to our simple data set (see Table 1), suppose we have the following two rules:

R1: if Sex = M and Term ≥ 1 y and Term ≤ 15 y then customer = Bad
R2: if Sex = M and Term ≥ 1 y and Term ≤ 1 y and Real Estate = A then customer = Bad

As shown in Table 1, rule R1 correctly classifies 3 of the 4 data instances described by the rule antecedent, yielding a confidence of 0.75. The coverage of R1 is 0.6, as it correctly describes 3 of the 5 instances in the data set. Similarly, for rule R2 a confidence and coverage of respectively 1 and 0.2 are obtained. This example shows that although rule R2 is completely accurate, as shown by the confidence of 1, it is not the best rule, as we also take into account the coverage of the rule. The coverage makes sure that we avoid overfitting and obtain fewer rules.

In previous research, a benchmarking study of AntMiner+ against state-of-the-art classification techniques, such as C4.5, RIPPER and support vector machines, showed that AntMiner+ ranks at the absolute top when considering both accuracy and comprehensibility [?]. However, a reluctance to accept the classification models may still exist, as unexpected signs may arise in the AntMiner+ rules. These may be due to spurious correlations in the data and do not represent the actual risk relationship (simply put, wrong inequality signs, e.g. rules such as: if Income … € and Savings Account … € then customer = bad). To counter such inconsistencies with existing domain knowledge, we have extended the AntMiner+ classification technique to incorporate domain knowledge [?]. The basic principle is as follows: considering our credit scoring example, we can make sure that increasing the amount on the applicant's savings account cannot lead to a customer changing from good to bad by removing the vertex group corresponding to the lower bound of Savings Account (see Fig.
2): since the ants look only for rules to classify bad customers (only the final else clause will classify a customer as good), the term with Savings Account can only be of the form Savings Account ≤ X. This allows the domain expert to enforce hard constraints on the inequality signs. Furthermore, a bias may also exist towards certain values, in which case the constraint is preferred but not mandatory. To deal with such soft constraints, the heuristic values can be adapted. For more details we refer to [?]. The ability to incorporate domain knowledge is of crucial importance within a credit scoring context, and reduces the effort of the Verification and Validation process of the model dramatically (see Section 5.1, further in the text).

AntMiner+ is implemented in the platform-independent, object-oriented Java programming environment, and uses the MySQL open source database server. Example screenshots of the Graphical User Interface (GUI) of AntMiner+ are included in the Appendix.

4 Building Credit Risk Models with AntMiner+

In this section, we will illustrate how AntMiner+ can be used to build credit risk systems in three different contexts: retail banking, small and medium-sized enterprises (SMEs), and bank ratings. As AntMiner+ can only deal with categorical variables, a discretization preprocessing step takes place in which the continuous variables are turned into discrete variables. This is done automatically with the Weka workbench [?] according to the criterion of Fayyad [?]. All experiments were run with 1000 ants and ρ set at 0.85, as suggested in [?].

4.1 Retail Banking

In this section, we will illustrate how AntMiner+ can be used to develop application scoring models in a retail banking context. The purpose of application scoring is to provide a score or classification of a credit applicant given the application characteristics provided. The data set that we will use is the German credit data set, which is a publicly available application scoring data set (see mlearn/mlrepository.html) with 1000 observations and 20 application characteristics. Table 2 presents the rules that were extracted using AntMiner+. The extracted rule set is concise and easy to understand. Only 5 of the original 20 application characteristics are used for making the discrimination. This clearly has a beneficial impact on interpretability, but also on operational cost and efficiency.
4.2 SME Bankruptcy Prediction

Under the IRB approach for corporate credits, the Basel II Capital Accord allows banks to distinguish exposures to SME borrowers (defined as corporate exposures where the reported sales for the consolidated group of which the firm is a part is less than 50 million €) from those to large firms. The SME data set consists of 422 observations: 74 bankrupt and 348 solvent companies. The default data were collected from […], while the other data were extracted from the period […] only. A total of 40 candidate input variables was selected from financial statement data, using, among others, liquidity, profitability and solvency measures (see [?] for an extensive description of this data set). Table 3 presents the rules that were extracted by AntMiner+. Again, only 5 of the 40 original inputs are used in making the discrimination decision. Note that the numbers were rounded and one variable was scaled randomly for confidentiality reasons.

4.3 Rating Prediction

For retail and SME portfolios, one typically has a sufficient number of default observations to make statistical discrimination meaningful. However, when modeling credit risk for entities such as banks, sovereigns, or insurance companies, the lack of default observations necessitates the use of an alternative modeling approach. That is why many financial institutions opt for a mapping to external ratings in this context. In this section, we will study how AntMiner+ can be used to model credit risk for bank entities. The data was retrieved from the Bankscope database, which contains financial statements of more than […] banks. For each of these banks, the Moody's rating will be used as the basis of the target variable (low/speculative-grade or good/investment-grade rating). These ratings were retrieved for the period […]. The rating at the end of May of the year T + 1 is predicted based on a 3-year history of inputs observed during years T, T − 1 and T − 2. A variety of inputs was selected covering, amongst others, asset quality, capital, operational result and liquidity.
The size variable Total Assets was also included, as well as a geographical indicator Region (Euro-zone, dollar-zone, EU accession countries, Japan and others). After data preprocessing, the data set consisted of a cleaned database of 2996 observations with 37 inputs (see [?] for a more extensive description).

4.4 Classification Model Performance

Table 5 shows the results of the classification models induced by AntMiner+, C4.5, support vector machine (SVM) and majority vote. The experimental setup is the same for all included data sets. The data set is split up into a training, validation and test set according to the following fractions: 4/9, 2/9 and 3/9, as is common practice in data mining [??]. To eliminate any chance of having unusually good or bad training and test sets, 10 runs are conducted where the order of observations is first randomized before the training, validation and test set are chosen. For each randomization AntMiner+ is run with hard monotonicity constraints, as imposed by the financial expert. The best average test set performance over the 10 randomizations is underlined and denoted in bold face for each data set. We then use a paired t-test to test the performance differences. Performances that are not significantly different at the 5% level from the top performance with respect to a one-tailed paired t-test are tabulated in bold face. Statistically significant underperformances at the 1% level are emphasized in italics. Performances significantly different at the 5% level but not at the 1% level are reported in normal script. Since the observations of the randomizations are not independent, we remark that this standard t-test is used as a common heuristic to test the performance differences [?].

As Table 5 shows, the non-linear SVM classifier performs best in terms of accuracy, as can be expected [?]. However, as mentioned before, the black box nature of such non-linear classifiers makes them less suited for credit scoring, where validation is required. When comparing the rule- and tree-based classifiers AntMiner+ and C4.5, we observe very competitive accuracies, but when the number of rules is considered as well, AntMiner+ comes out as the best performing technique. On top of that, the AntMiner+ rule sets comply with the stated domain constraints, which, as pointed out in [?], can result in a decrease in accuracy. Yet while a small decrease in accuracy can be acceptable, an inconsistency with domain knowledge is not.

5 Towards a Basel II Credit Risk Management System

Up till now, we have largely focused on extracting a comprehensible set of rules to do risk management in a Basel II context.
These rules now need to be further analyzed and used in various activities so as to arrive at a full-fledged, integrated Basel II risk decision and management application. In what follows, we will discuss the most important activities, which are summarized in Fig. […].

5.1 Verification and Validation

A first set of tools can be used to verify and validate (V&V) the extracted rule set. Verification looks for syntax-based anomalies in the rule set. Whether the rule set is exhaustive (all cases being covered) and exclusive (a case only covered by 1 rule) is investigated in this step. Because of the if-then-else nature of the AntMiner+ rule sets, they are by definition exhaustive and exclusive, making the verification step obsolete. In the validation step, it is investigated whether the rules adequately model the risk involved from a human interpretation viewpoint. The financial credit expert will also be consulted and asked to interpret the rule set in this step.

In order to facilitate the verification and validation step, decision tables may be adopted [?]. Decision tables provide an alternative, user-friendly way of representing the AntMiner+ rule sets. A decision table (DT) consists of four quadrants, separated by double lines, both horizontally and vertically (cf. Fig. 5). The vertical line divides the table into a condition part (left), specifying the inputs to be checked, and an action part (right), specifying the classes assigned. Each condition entry describes a relevant subset of values (called a state) for a given input, or contains a dash symbol (-) if its value is irrelevant within the context of that column. Subsequently, every action entry holds a value assigned to the outcome class. True, false and unknown action values are typically abbreviated by […], […] and […], respectively. Every row in the entry part of the DT thus comprises a classification rule, indicating what class results from a certain combination of inputs. If each row only contains simple states (no contracted or irrelevant entries), the table is called an expanded DT; otherwise it is called a contracted DT. Table contraction can be achieved by combining rows that lead to the same outcome class. The number of rows in the contracted table can then be further minimised by changing the order of the conditions.
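Table contraction as described above can be sketched in Python. The sketch assumes a very simple representation that is not prescribed by the paper: each row is a pair (condition entries, outcome class), a dash marks an irrelevant entry, and rows are merged whenever they agree on the outcome and on all entries but one, and together cover every state of that one condition. The two conditions and their states are hypothetical.

```python
def contract(rows, domains):
    """rows: iterable of (conditions tuple, action); domains: states per condition."""
    rows = set(rows)
    changed = True
    while changed:
        changed = False
        for i, domain in enumerate(domains):
            # Group rows that agree on the action and on every entry except entry i.
            groups = {}
            for cond, action in rows:
                key = (cond[:i] + cond[i + 1:], action)
                groups.setdefault(key, set()).add(cond[i])
            for (rest, action), states in groups.items():
                if states >= set(domain):
                    # All states of condition i lead to the same action:
                    # replace those rows by one row with a dash entry.
                    for state in domain:
                        rows.discard((rest[:i] + (state,) + rest[i:], action))
                    rows.add((rest[:i] + ("-",) + rest[i:], action))
                    changed = True
    return sorted(rows)


# Hypothetical expanded table with conditions (savings, credit history).
domains = [["low", "high"], ["bad", "good"]]
expanded = [
    (("low", "bad"), "bad"),
    (("low", "good"), "good"),
    (("high", "bad"), "good"),
    (("high", "good"), "good"),
]
contracted = contract(expanded, domains)
for cond, action in contracted:
    print(cond, "->", action)
```

The sketch only merges rows for a fixed condition order; searching over condition orders to minimise the number of rows, as mentioned in the text, is left out.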
A DT with a minimal number of rows is to be preferred, since it provides a more parsimonious and comprehensible representation of the extracted rule set than an expanded DT. This is illustrated in Fig. 6. In the literature, several kinds of DTs have been proposed. We will require that the condition entry part of a DT satisfies the following two criteria: completeness (all possible combinations of input values are included) and exclusivity (no combination is covered by more than one column). As such, we deliberately restrict ourselves to single-hit tables, wherein columns have to be mutually exclusive, because of their advantages with respect to verification and validation [?]. It is this type of DT that can be easily checked for potential anomalies, such as inconsistencies (a particular counterparty being assigned to more than one class) or incompleteness (no class assigned). The decision table formalism thus allows for easy verification of the extracted AntMiner+ rules.

Additionally, for ease of legibility, the rows are arranged in lexicographical order, in which entries at lower rows alternate first. As a result, a tree structure emerges in the condition entry part of the DT, which lends itself very well to a top-down evaluation procedure: starting at the first column, and then working one's way to the right of the table by choosing from the relevant condition states, one safely arrives at the outcome class for a given case. This condition-oriented inspection approach often proves to be more intuitive, faster, and less prone to human error than evaluating a set of rules one by one.

Decision tables can also be usefully adopted for validation purposes, as they can easily be checked for potential anomalies such as inconsistency with monotonicity constraints: by placing the presumably monotone variable in the last column, adjacent rows are obtained whose entries are equal in all variables except the last one. It can then be easily seen whether or not the class variable changes in the expected manner. As AntMiner+ has the supplementary benefit of incorporating such monotonicity constraints, as demonstrated in Section 3.2, the decision table will no longer reveal any counter-intuitive patterns. For example, Table 6 depicts the decision table corresponding to the rule set extracted for the German credit scoring data set (see Table 2). Based on this table, we can easily check that credit history can only have a positive effect on the applicant's assessment, if any.

We can conclude that this first step of verifying and validating the model has been relieved significantly, thanks to the nature of the induced rule sets (exhaustive and exclusive) and to the incorporation of monotonicity constraints. This does not mean, however, that this phase is no longer needed, as the domain expert still needs to check whether the model is suitable. From that perspective, decision tables are still a very useful tool.
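The single-hit and monotonicity checks discussed in this section can be sketched in a few lines of Python. This is a minimal illustration and not part of AntMiner+: the table layout, the helper names (`matching_rows`, `check_single_hit`, `monotonicity_violations`), the toy inputs and the assumed ordering of the credit history states are all assumptions made for the example only.

```python
from itertools import product

DASH = None  # stands for the dash entry: the input is irrelevant in this row


def matching_rows(table, case):
    """Return the outcome of every row whose condition entries match the case."""
    return [outcome for entries, outcome in table
            if all(entries.get(var, DASH) in (DASH, value)
                   for var, value in case.items())]


def check_single_hit(states, table):
    """Enumerate all input combinations; report uncovered and multiply covered ones."""
    uncovered, overlapping = [], []
    names = list(states)
    for combo in product(*(states[name] for name in names)):
        case = dict(zip(names, combo))
        hits = matching_rows(table, case)
        if not hits:
            uncovered.append(case)      # completeness violated
        elif len(hits) > 1:
            overlapping.append(case)    # exclusivity violated
    return uncovered, overlapping


def monotonicity_violations(states, table, variable, class_rank):
    """List the settings of the other inputs for which the class rank drops
    when `variable` moves to a later state (assumes a single-hit table)."""
    violations = []
    others = [name for name in states if name != variable]
    for combo in product(*(states[name] for name in others)):
        fixed = dict(zip(others, combo))
        ranks = [class_rank[matching_rows(table, {**fixed, variable: value})[0]]
                 for value in states[variable]]
        if any(later < earlier for earlier, later in zip(ranks, ranks[1:])):
            violations.append(fixed)
    return violations


# Hypothetical toy table; credit history states are assumed ordered worst to best.
STATES = {"credit_history": ["critical", "none taken", "all paid"],
          "purpose": ["car", "other"]}
TABLE = [
    ({"purpose": "car", "credit_history": "critical"}, "bad"),
    ({"purpose": "car", "credit_history": "none taken"}, "good"),
    ({"purpose": "car", "credit_history": "all paid"}, "good"),
    ({"purpose": "other", "credit_history": DASH}, "good"),
]

print(check_single_hit(STATES, TABLE))
print(monotonicity_violations(STATES, TABLE, "credit_history", {"bad": 0, "good": 1}))
```

For the toy table both checks come back empty, which corresponds to a complete, exclusive table in which a better credit history never worsens the assigned class. Exhaustive enumeration is only practical for a handful of discretized inputs, which is the setting assumed here.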

5.2 Traffic Light Decision Support System

Once the rule set has been verified and validated, it needs to be implemented as a decision support system (DSS) which can be used by the credit officers to make the actual credit decision: accept or reject. The DSS can be implemented using a traffic light indicator approach that gives three possible outcomes: a green light, an orange light or a red light [?]. A green light indicates that the rule set is confident enough to classify a customer as a good payer and credit should be accepted. An orange light indicates a doubtful case for which human intervention is needed. This can be due to, for example, low confidence of the rule set, external information obtained from a credit bureau (e.g. Equifax, Experian), a customer who is narrowly rejected by the rule set but is very profitable on other financial products, and/or a new marketing campaign in which the financial institution decides to grant credit to some of the more risky customers. The orange light can allow for model overrides by the credit expert. A low-side override means that a customer rejected by the rule set is accepted; a high-side override means the reverse. A red light indicates that the rule set is confident enough to classify a customer as a bad payer and credit should be rejected.

Note that this traffic light indicator approach can also be implemented using four colors (green, yellow, orange, red) or gauges in a dashboard application. An implementation using four colors could be as follows: red when the rule set predicts a bad customer and this is confirmed by the credit bureau information; orange when the rule set predicts a bad customer, but the credit bureau says the customer is a good risk; yellow when the rule set predicts a bad customer, but confidence is very low and the credit bureau says the customer is a good risk; and green when the rule set says good customer and the credit bureau says the customer is a good risk. Note that financial institutions can decide for themselves on the number of colors and their meaning.

5.3 Interface to Basel II Calculation Engine

The extracted rule set must also interface with a Basel II calculation engine, which will use the rule outputs to calculate the expected loss and the regulatory capital that a financial institution needs to set aside in order to cover unexpected credit losses. Therefore, in a calibration phase, each rule should be accompanied by a PD estimate which should be forward looking and based on five years of historical data. Once the estimates for the LGD and EAD have been obtained, the expected loss and the regulatory capital can be calculated. The expected loss (EL) can be calculated as $\mathrm{EL} = \mathrm{PD} \times \mathrm{LGD} \times \mathrm{EAD}$. It represents the long-run average credit loss and will be used for debt provisioning. The regulatory safety capital can then also be calculated based on the formulas provided in the Basel II Accord.
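As a rough illustration, the expected loss and the retail capital formula of Eq. (7) below can be computed with the Python standard library. This is a sketch under stated assumptions rather than a regulatory implementation: the function names are hypothetical, the inputs are made-up values, and ρ = 0.15 is used purely as an illustrative asset correlation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, std 1)


def expected_loss(prob_default, lgd, ead):
    """EL = PD x LGD x EAD."""
    return prob_default * lgd * ead


def capital_ratio(prob_default, lgd, rho):
    """K of Eq. (7): LGD times (stressed PD at the 99.9% level minus PD)."""
    phi, phi_inv = STD_NORMAL.cdf, STD_NORMAL.inv_cdf
    # Algebraically equal to 1/sqrt(1-rho) * phi_inv(PD) + sqrt(rho/(1-rho)) * phi_inv(0.999)
    argument = (phi_inv(prob_default) + sqrt(rho) * phi_inv(0.999)) / sqrt(1.0 - rho)
    return lgd * (phi(argument) - prob_default)


def regulatory_capital(prob_default, lgd, ead, rho):
    """Regulatory capital = K x EAD."""
    return capital_ratio(prob_default, lgd, rho) * ead


# Illustrative inputs only (assumptions, not taken from the data sets of this paper).
PD_ESTIMATE, LGD, EAD, RHO = 0.02, 0.45, 10_000.0, 0.15
print("EL      =", expected_loss(PD_ESTIMATE, LGD, EAD))
print("K       =", capital_ratio(PD_ESTIMATE, LGD, RHO))
print("capital =", regulatory_capital(PD_ESTIMATE, LGD, EAD, RHO))
```

Setting ρ to zero makes the stressed PD collapse to the PD itself, so K vanishes; this is a convenient sanity check on the implementation.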
E.g., for retail exposures the formulas are as follows:

$$
\begin{aligned}
K &= \mathrm{LGD} \cdot \left( \Phi\!\left( \frac{1}{\sqrt{1-\rho}}\,\Phi^{-1}(\mathrm{PD}) + \sqrt{\frac{\rho}{1-\rho}}\,\Phi^{-1}(0.999) \right) - \mathrm{PD} \right) \\
\text{regulatory capital} &= K \cdot \mathrm{EAD}
\end{aligned}
\tag{7}
$$

whereby $\Phi$ ($\Phi^{-1}$) represents the (inverse) cumulative standard normal distribution, and $\rho$ the asset correlation factor, which is fixed in the Accord [?] (e.g. for residential mortgage exposures).

5.4 Evaluating the Model over Time: Backtesting and Benchmarking

The Basel II Capital Accord requires credit risk systems to be validated at least annually. The Accord distinguishes between backtesting, which is comparing the outcome predicted by the rule set with the realized outcome, and benchmarking, which is comparing the predicted outcome of the rule set with the outcomes of models of other parties in the industry (such as credit bureaus, other financial institutions, or financial regulators).

From a backtesting perspective, the performance of the rule set needs to be monitored. Again, a traffic light indicator approach can be adopted with three outcomes: green light, orange light, red light [?]. Which light to switch on can be decided based on the outcome of a test statistic which monitors the classification accuracy (e.g. McNemar's test [?]). A green light indicates that the rule set performance is stable, e.g. no significant differences at the 5% level are reported. It means the rule set can continue to be used. An orange light may indicate e.g. a difference at the 5% level but not at the 1% level of significance. It indicates a performance difference which requires no immediate action but needs to be closely monitored in the future. A red light then indicates a significant performance difference at the 1% level. It indicates that the model is no longer appropriate for the current data, which could be due to a change of the population (often referred to as population drift) or a new strategy of the financial institution. In other words, the model needs to be rebuilt, which in our context would mean extracting a new rule set using AntMiner+.

From a benchmarking perspective, a similar process can be conducted, whereby the traffic lights now indicate how much the two parties agree or disagree on their credit decisions.

6 Conclusion

The introduction of the recently suggested Basel II Capital Accord has encouraged financial institutions to build efficient and high-performing credit risk models assessing the creditworthiness of their counterparties.
Ideally, these models should be both powerful, in terms of discriminating defaulters from non-defaulters, and comprehensible, in terms of explanatory power. In this paper, we discussed how Ant Colony Optimization can be used to build credit risk models for Basel II. More specifically, we used the AntMiner+ algorithm, which is a rule induction technique based on the principles of MAX-MIN Ant System. AntMiner+ distinguishes itself by the comprehensibility of the induced models, which are in line with existing domain knowledge. We have also shown how decision tables can be useful to provide even more insight into the classification model. Experiments were conducted using three real-life credit risk data sets: one in retail, one for SMEs, and one for bank ratings. It was illustrated that for each of these data sets AntMiner+ extracted a powerful and concise rule set. Furthermore, it was also discussed how the induced rule sets could fit into a global credit risk management strategy and architecture. An interesting topic for further research is to extend the algorithm to handle continuous targets and generate regression rules, which could be useful e.g. for modeling LGD and EAD.

Acknowledgment

We extend our gratitude to the (associate) editor and the anonymous reviewers, as their many constructive and detailed remarks certainly contributed much to the quality of this paper. Further, we would like to thank the Flemish Research Council (FWO, Grant G ), and the Microsoft and KBC-Vlekho-K.U.Leuven Research Chairs for financial support to the authors.

References

[] A. Abraham and V. Ramos. Web usage mining using artificial ant colony clustering. In the Congress on Evolutionary Computation, pages IEEE Press,
[] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J.A.K. Suykens, and J. Vanthienen. Benchmarking state of the art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6): ,
[] Basel Committee on Banking Supervision. International convergence of capital measurement and capital standards: a revised framework. Technical report, BIS, June
[] C. Blum. Beam-ACO hybridizing ant colony optimization with beam search: An application to open shop scheduling. Computers & Operations Research, 32(6): ,
[] B. Bullnheimer, R.F. Hartl, and C. Strauss. A new rank based version of the ant system: A computational study. Central European Journal for Operations Research and Economics, 7(1):25–38,
[] B. Bullnheimer, R.F. Hartl, and C. Strauss. Applying the ant system to the vehicle routing problem. In S. Voss, S. Martello, I.H. Osman, and C. Roucairol, editors, Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization,
[] G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9: ,
[] A. Colorni, M. Dorigo, V. Maniezzo, and M. Trubian. Ant system for job-shop scheduling.
Journal of Operations Research, Statistics and Computer Science, 34(1):39–53,
[] V.S. Desai, J.N. Crook, and G.A. Overstreet Jr. A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1):24–37,
[] T.G. Dietterich. Approximate statistical test for comparing supervised classification learning algorithms. Neural Computation, 10(7): ,
[] M. Dorigo and L.M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, April
[] M. Dorigo, V. Maniezzo, and A. Colorni. Positive feedback as a search strategy. Technical Report 91016, Dipartimento di Elettronica e Informatica, Politecnico di Milano, IT,
[] M. Dorigo, V. Maniezzo, and A. Colorni. Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, 26(1):29–41,
[] M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, Cambridge, MA,
[] U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI), pages , Chambéry, France, Morgan Kaufmann.
[] L.M. Gambardella and M. Dorigo. Ant-Q: A reinforcement learning approach to the traveling salesman problem. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages , Palo Alto, CA, Morgan Kaufmann Publishers Inc.
[] D. Hand. Pattern detection and discovery. In D. Hand, N. Adams, and R. Bolton, editors, Pattern Detection and Discovery, volume 2447 of Lecture Notes in Computer Science, pages Springer,
[] J. Handl, J. Knowles, and M. Dorigo. Ant-based clustering and topographic mapping. Artificial Life, 12(1):35–61,
[] W.E. Henley and D.J. Hand. Construction of a k-nearest neighbour credit-scoring system. IMA Journal of Mathematics Applied in Business and Industry, 8: ,
[] B. Liu, H.A. Abbass, and B. McKay.
Density-based heuristic for rule discovery with ant-miner. In 6th Australasia-Japan Joint Workshop on Intelligent and Evolutionary Systems (AJWIS2002), Canberra, Australia,
[] B. Liu, H.A. Abbass, and B. McKay. Classification rule discovery with ant colony optimization. In IAT, pages IEEE Computer Society,
[] D. Martens, M. De Backer, R. Haesen, B. Baesens, C. Mues, and J. Vanthienen. Ant-based approach to the knowledge fusion problem. In Proceedings of the Fifth International Workshop on Ant Colony Optimization and Swarm Intelligence, Lecture Notes in Computer Science, pages Springer,

[] D. Martens, M. De Backer, R. Haesen, M. Snoeck, J. Vanthienen, and B. Baesens. Classification with ant colony optimization. IEEE Transactions on Evolutionary Computation, 11(5): ,
[] R. Montemanni, L.M. Gambardella, A.E. Rizzoli, and A. Donati. Ant colony system for a dynamic vehicle routing problem. Journal of Combinatorial Optimization, 10(4): ,
[] R.S. Parpinelli, H.S. Lopes, and A.A. Freitas. An ant colony based system for data mining: Applications to medical data. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages , San Francisco, California, USA, Morgan Kaufmann.
[] D. Quintana, C. Luque, and P. Isasi. Evolutionary rule-based system for IPO underpricing prediction. In GECCO '05: Proceedings of the 2005 conference on Genetic and evolutionary computation, pages , New York, NY, ACM Press.
[] D.J. Sheskin. Handbook of parametric and nonparametric statistical procedures. Chapman and Hall/CRC,
[] K. Socha, J. Knowles, and M. Sampels. A MAX-MIN ant system for the university timetabling problem. In M. Dorigo, G. Di Caro, and M. Sampels, editors, Proceedings of ANTS 2002 Third International Workshop on Ant Algorithms, volume 2463 of Lecture Notes in Computer Science, pages 1–13, Berlin, Germany, September Springer-Verlag.
[] A. Steenackers and M.J. Goovaerts. A credit scoring model for personal loans. Insurance: Mathematics and Economics, 8:31–34,
[] T. Stützle and H.H. Hoos. Improving the ant-system: A detailed report on the MAX-MIN ant system. Technical Report AIDA 96-12, FG Intellektik, TU Darmstadt, Germany,
[] T. Stützle and H.H. Hoos. MAX-MIN ant system. Future Generation Computer Systems, 16(8): ,
[] D. Tasche. Traffic lights approach to PD validation. Technical report,
[] E. Tsang, P. Yung, and J. Li. EDDIE-automation, a decision support tool for financial forecasting. Decision Support Systems, 37(4): , September
[] T. Van Gestel, B. Baesens, P. Van Dijcke, J. Garcia, J.A.K. Suykens, and J. Vanthienen. A process model to develop an internal rating system: sovereign credit ratings. Decision Support Systems, 42(2): ,
[] T. Van Gestel, B. Baesens, P. Van Dijcke, J.A.K. Suykens, J. Garcia, and T. Alderweireld. Linear and nonlinear credit scoring by combining logistic regression and support vector machines. Journal of Credit Risk, 1(4),
[] J. Vanthienen, C. Mues, and A. Aerts. An illustration of verification and validation in the modelling phase of KBS development. Data and Knowledge Engineering, 27(3): ,
[] J. Vanthienen and G. Wets. From decision tables to expert system shells. Data and Knowledge Engineering, 13(3): ,
[] A. Wade and S. Salhi. An ant system algorithm for the mixed vehicle routing problem with backhauls. In Metaheuristics: computer decision-making, pages , Norwell, MA, Kluwer Academic Publishers.
[] D. West. Neural network credit scoring models. Computers and Operations Research, 27: ,
[] I.H. Witten and E. Frank. Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
[] M.B. Yobas, J.N. Crook, and P. Ross. Credit scoring using neural and evolutionary techniques. IMA Journal of Mathematics Applied in Business and Industry, 11: ,

Appendix: Screenshots of AntMiner+ GUI

Several screenshots of the AntMiner+ Graphical User Interface are provided in Fig. 7 and 8. Fig. 7 shows the initial menu of AntMiner+, allowing the user to choose the number of ants and the evaporation rate ρ. The "minimal fraction uncovered data" input variable can be used as an alternative to the early stopping criterion: no more rules will be extracted when all but x% of the data has been covered by the extracted rule set. Note that all experiments were conducted with the early stopping criterion. Fig. 8 shows the construction graph for the SME data set during different stages of execution: from initialization (top) to convergence (bottom), with the width of the edges being proportional to their pheromone level. In the bottom box of each screenshot, the extracted rules are displayed together with their accuracy on the training, validation and test sets.
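The "minimal fraction uncovered data" stopping criterion mentioned above can be illustrated with a generic sequential covering loop. The sketch below is an assumption-laden toy and not the AntMiner+ implementation: `extract_rules`, `toy_learner` and the symbol data are hypothetical, and the learner simply stands in for one rule-extraction run.

```python
from collections import Counter


def extract_rules(data, learn_one_rule, min_fraction_uncovered):
    """Keep extracting rules until at most the given fraction of the data is uncovered."""
    remaining = list(data)
    rules = []
    while len(remaining) > min_fraction_uncovered * len(data):
        rule = learn_one_rule(remaining)   # hypothetical stand-in for one extraction run
        if not any(rule(row) for row in remaining):
            break                          # guard: the new rule covers nothing
        rules.append(rule)
        remaining = [row for row in remaining if not rule(row)]
    return rules, remaining


def toy_learner(rows):
    """Toy learner: return a rule covering the most frequent symbol among the rows."""
    symbol = Counter(rows).most_common(1)[0][0]
    return lambda row: row == symbol


DATA = list("aaaabbbccd")
rules, uncovered = extract_rules(DATA, toy_learner, min_fraction_uncovered=0.1)
print(len(rules), uncovered)
```

With a threshold of 10%, the loop stops as soon as no more than one of the ten toy data points is left uncovered; lowering the threshold to zero forces one more rule to be extracted.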

[Figure 1: panels (a) and (b), with path labels 50%, 50%, 33% and 67%.]
Fig. 1. Path selection directed by pheromone: the more pheromone on a path, the more likely an ant will follow the path. This simple mechanism of indirect communication is sufficient for the overall ant colony to find short paths from the nest to the food source.

[Figure 2: construction graph from Start to Stop with the node groups Class (bad), Purpose (car, education, business), Savings Account (two groups, each with 0€, 100€, 250€, 500€, 1000€, 3000€, any) and Credit History (all paid, none taken, critical).]
Fig. 2. Example of a path described by an ant for a credit scoring construction graph defined by AntMiner+. The rule corresponding to the chosen path is if Purpose = car and Savings Account ∈ [0€, 500€] then class = bad.

[Figure 3: construction graph from Start to Stop with node groups for the weight parameters α and β, the class, and the variables V_0, ..., V_m with their values v_{i,j}.]
Fig. 3. Multiclass construction graph of AntMiner+, with the inclusion of weight parameters.

[Figure 4: flow diagram with the components data, AntMiner+, V&V, Decision Support System, PD, LGD, EAD, Capital Requirements, and Backtesting & Benchmarking. The rule set shown in the figure reads:
R1: if (Checking Account < 100 and Duration > 15m) then class = bad
R2: if (Purpose = new car and Credit History = critical) then class = bad
R3: else if (Checking Account < 0 and Purpose = furniture and Savings Account < 250) then class = bad
R4: else class = good]
Fig. 4. Credit risk management system with the use of AntMiner+. The induced rule set is verified and validated, after which it can be used as a decision support system to make actual credit risk decisions (accept or deny credit), and to calculate capital requirements. Finally, backtesting and benchmarking validate the credit risk management system over time.

[Figure 5: the four quadrants of a DT: condition subjects, condition entries, action subjects, action entries.]
Fig. 5. DT quadrants.

[Figure 6: two decision tables over Condition1, Condition2 and Condition3 with outcome classes Class1 and Class2 and yes/no condition entries: (a) expanded decision table, (b) contracted decision table.]
Fig. 6. Minimizing the number of columns of a lexicographically ordered DT [?].

Fig. 7. Screenshot of AntMiner+ initial menu.

Fig. 8. Screenshots of AntMiner+ run on the SME credit risk data set during different stages of execution: from initialization (top) to convergence (bottom).