Credit Rating Prediction Using Ant Colony Optimization


David Martens a,b, Tony Van Gestel c,d, Manu De Backer a, Raf Haesen a, Jan Vanthienen a, Bart Baesens e,a

a Department of Decision Sciences & Information Management, K.U.Leuven, Naamsestraat 69, B-3000 Leuven, Belgium, {David.Martens;Manu.DeBacker;Raf.Haesen;Jan.Vanthienen}@econ.kuleuven.be
b Department of Business Administration and Public Management, Hogeschool Gent, Voskenslaan 270, Ghent 9000, Belgium, David.Martens@hogent.be
c Credit Risk Modelling, Group Risk Management, Dexia Group, Square Meeus 1, 1000 Brussel, Belgium, Tony.Vangestel@dexia.com
d Department of Electrical Engineering, ESAT-SCD-SISTA, K.U.Leuven, Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
e University of Southampton, School of Management, Highfield Southampton, SO17 1BJ, United Kingdom, Bart@soton.ac.uk

Abstract

The introduction of the Basel II Capital Accord has encouraged financial institutions to build internal rating systems assessing the credit risk of their various credit portfolios. One of the key outputs of an internal rating system is the probability of default (PD), which reflects the likelihood that a counterparty will default on his/her financial obligation. Since the PD modeling problem basically boils down to a discrimination problem (defaulter or not), one may rely on the myriad of classification techniques that have been suggested in the literature. However, since the credit risk models will be subject to supervisory review and evaluation, they must be transparent and easy to understand. Hence, techniques such as neural networks or support vector machines are less suitable due to their black box nature. Building upon previous research, we will use AntMiner+ to build internal rating systems for credit risk. AntMiner+ infers a propositional rule set from a given data set using the principles of Ant Colony Optimization.
Experiments will be conducted using various types of credit data sets (retail, small- and medium-sized enterprises (SMEs) and banks). It will be shown that the extracted rule sets are strong in terms of both discriminatory power and comprehensibility. Furthermore, a framework will be presented describing how AntMiner+ fits into a global Basel II credit risk management system.

Key words: Ant Colony Optimization, Classification, Credit Scoring, Bankruptcy Prediction, Basel II

Preprint submitted to Elsevier, 29 October 2008

1 Introduction

Over the past decades, financial institutions have seen an ever growing need for quantitative analysis techniques to optimize and monitor decisions related to risk and investment management. The gradual adoption of data warehousing and knowledge discovery in data (KDD) technology is allowing these institutions to analyze ever larger amounts of data, using a range of powerful techniques from various disciplines such as conventional statistics, machine learning, neurocomputing, and operations research. This process is being further accelerated by the recent implementation of several international financial and accounting standards (such as Basel II, Solvency II, Sarbanes-Oxley and IFRS). For example, by allowing banks to use their internal credit risk assessment models as input for the minimum regulatory capital calculations, the Basel II framework gives financial institutions additional incentives to refine existing credit scoring models, since more accurate predictions translate into less conservative capital requirements. Hence, there has been a growing interest throughout the financial world in research on novel data mining techniques and information technologies to support the implementation of such compliance frameworks.

As a result of a longstanding interest from the research community, a myriad of techniques have been proposed for many of the aforementioned problems, in particular for classification problems such as credit scoring and bankruptcy prediction. However, not all of these approaches have proven readily transferable from the academic domain to financial practice. Many of the representations applied by the suggested algorithms cannot be easily interpreted and validated by humans.
For example, neural networks are considered a black box technique, since the reasoning by which these non-linear prediction models reach their conclusions cannot easily be obtained from their structure. This opacity has not only hindered their acceptance by practitioners, but also fails to address the increasing need for transparency under various regulatory frameworks. Credit risk analysts are unlikely to accept black box techniques such as neural networks to make credit decisions, since under the Basel II accord they are now required to demonstrate and periodically validate their models, and to present reports to the national regulator for approval. Therefore, recent research proposed the use of rule-based classification techniques to generate powerful, as well as intuitive and transparent, decision models. One such rule-based classification technique that has recently been proposed is AntMiner+, which uses Ant Colony Optimization (ACO) to infer accurate rules from the data. This paper describes how this technique can be used to generate comprehensible credit scoring models, which can then be fit into a Basel II-compliant decision support system.

The paper is structured as follows. The next section discusses the issues related to building credit scoring models within the Basel II regulatory framework. Section 3 provides an overview of the AntMiner+ classification technique, as well as an introduction to ACO, on which the technique is based. The experimental Section 4 provides AntMiner+ credit scoring models for retail banking, small and medium-sized enterprises (SMEs) and banks. Section 5 describes the further steps needed to obtain a Basel II compliant decision support system, and finally, Section 6 concludes the paper.

2 Credit Scoring and Bankruptcy Prediction within Basel II

The recent introduction of the Basel II Capital Accord encourages financial institutions to calculate their minimum regulatory safety capital to ensure that they are able to return depositor funds at all times [?]. The minimum safety capital is set at 8% of risk weighted assets, which are in turn quantified taking into account three types of risk: credit risk, operational risk and market risk. In calculating credit risk, banks must use three key risk parameters: probability of default (PD), loss given default (LGD) and exposure at default (EAD). These three parameters are then used as input to a Merton/Vasicek model, which calculates the regulatory safety capital [?]. The PD, LGD and EAD parameters can be obtained in three different ways.
The standard approach for credit risk allows banks to buy risk ratings from external rating agencies, often called External Credit Assessment Institutions (ECAIs) in the spirit of the Accord. Examples of well-known ECAIs are Moody's, Standard & Poor's and Fitch. The risk ratings are then translated to risk weights provided in the Accord, from which the risk weighted assets (RWA), and hence the regulatory capital, can be calculated. The foundation internal ratings based (IRB) approach allows banks to build their own PD models and get LGD and EAD estimates from the supervisors, whereas the advanced internal ratings based approach allows financial institutions to estimate all three risk parameters themselves. Many financial institutions in Western Europe, Asia and the US are currently taking steps to implement the advanced IRB approach. More than ever, this has triggered the interest in, and need for, credit scoring and bankruptcy prediction models for estimating the PD of a set of obligors.

For retail portfolios, application scoring models will be developed that try to quantify the credit risk of a set of recently acquired customers, given their application characteristics (e.g. age, marital status, credit history, savings amount, ...). Behavioural scoring models will be used to monitor the credit risk of the existing customer base, given their most recent behaviour (e.g. average checking account status during the previous month, number of credit cards, ...). For small and medium-sized enterprises (SMEs), financial institutions will develop bankruptcy prediction models that quantify the risk of financial failure given a set of accounting ratios and measurements. For both retail and SME obligors, one can usually assume that a sufficient number of defaults is present to make statistical discrimination and classification meaningful. However, for certain types of counterparties, such as banks, insurance companies and sovereign entities, the lack of default observations necessitates the use of alternative methods. In this context, financial institutions will often build rating models that mimic a set of externally provided ratings (e.g. by an ECAI), given a set of candidate explanatory variables collected by the institution.

Ideally, the credit scoring, bankruptcy prediction and rating models should have high discriminatory power, so as to minimize the cost of granting credit to bad customers and the profit lost when good customers are rejected. Since these models now play a pivotal role in the risk management strategy of a bank, they are also subject to supervisory review and validation by financial regulators. Furthermore, in most countries, financial institutions are obliged to explain why credit has been denied to an applicant.
Both these trends basically prohibit the use of black box, mathematically complex application scoring models, and instead stimulate the use of comprehensible, easy-to-understand models.

Numerous classification techniques have been adopted for credit risk measurement and for financial forecasting in general. These techniques include traditional statistical methods (e.g., discriminant analysis and logistic regression [?, ?]), nonparametric statistical models (e.g., k-nearest neighbor [?, ?], decision trees [?, ?] and rule learners [?]) and neural networks [?, ?, ?]. The conclusions of these studies often conflict when compared. In [?], a large-scale benchmarking study compares the classification performance of various state-of-the-art classification techniques on eight real-life credit scoring data sets. It concludes that neural networks perform very well in terms of classification accuracy. However, their opacity and black box nature prevent them from being used in a Basel II context. That is why, in this paper, we use the rule-based classification technique AntMiner+, which provides comprehensible, accurate models that are in line with existing domain knowledge.

3 AntMiner+: Classification based on Ant Colony Optimization

3.1 Ant Colony Optimization

Ant Colony Optimization (ACO) is a metaheuristic inspired by the foraging behavior of real ant colonies [?]. A biological ant by itself is a simple insect with limited capabilities, guided by straightforward decision rules. However, these simple rules are sufficient for the overall ant colony to find short paths from the nest to the food source. By dropping a chemical substance called pheromone that attracts other ants, an ant indirectly communicates with its fellow ants from the colony. How this indirect communication leads to shortest path finding capabilities is shown in Fig. 1. Suppose two ants start from their nest (left) and look for the shortest path to a food source (right). Initially no pheromone is present on either trail, so there is an equal chance of choosing either of the two possible paths (see Fig. 1(a)). Suppose one ant chooses the lower trail, and the other one the upper trail. The ant that has chosen the lower (shorter) trail will have returned faster to the nest, resulting in twice as much pheromone on the lower trail as on the upper one, as illustrated in Fig. 1(b). As a result, the probability that the next ant will choose the lower, shorter trail will be twice as high, resulting in more pheromone, and thus more ants will choose this trail, until eventually (almost) all ants follow the shorter path. Note that the pheromone on the longer trail will finally disappear through evaporation.

Ant Colony Optimization employs artificial ants that cooperate in a similar manner to their biological counterparts, in order to find good solutions for discrete optimization problems [?]. The first ACO algorithm is Ant System [?, ?], where ants iteratively construct solutions and add pheromone to the paths corresponding to these solutions.
Path selection is a stochastic procedure based not only on a history-dependent pheromone value, but also on a problem-dependent heuristic value. The pheromone value gives an indication of the number of ants that chose the trail recently, while the heuristic value is a problem-dependent quality measure. When an ant reaches a decision point, it is more likely to choose the trail with the higher pheromone and heuristic values. Once the ant arrives at its destination, the solution corresponding to the ant's followed path is evaluated and the pheromone value of the path is increased accordingly. Additionally, evaporation causes the pheromone level of all trails to diminish gradually. Hence, trails that are not reinforced gradually lose pheromone and will in turn have a lower probability of being chosen by subsequent ants.

The performance of traditional ACO algorithms, however, is rather poor on large problem instances [?]. To overcome this issue, other ACO algorithms have been proposed, such as Ant Colony System [?], rank-based Ant System [?], Elitist Ant System [?] and MAX-MIN Ant System [?]. As the latter is the one employed in the AntMiner+ classification technique, the main features of MAX-MIN Ant System are discussed next. Stützle et al. [?] advocate that a better exploitation of the best solutions can be obtained by only adding pheromone to the path of the best ant. To avoid early search stagnation, which is the situation where all ants take the same path and thus describe the same solution, possible pheromone values are limited to the interval $[\tau_{min}, \tau_{max}]$. Finally, initializing the pheromone values to $\tau_{max}$ entails a higher exploration at the beginning of the algorithm.

ACO has been applied to a wide variety of problems [?], such as the vehicle routing problem [?, ?, ?], scheduling [?, ?], timetabling [?], the traveling salesman problem [?, ?, ?] and routing in packet-switched networks [?]. Recently, ACO has also entered the data mining domain, addressing both the clustering [?, ?] and the classification task [?, ?, ?], the latter being the topic of interest in this paper. The first application of ACO to the classification task was reported by Parpinelli et al. in [?] and was named AntMiner. Extensions were put forward by Liu et al. in AntMiner2 [?] and AntMiner3 [?]. Our approach, AntMiner+, differs from these previous AntMiner versions in several ways, resulting in an improved performance, as described in [?]. Next follows a brief discussion of the principles and workings of AntMiner+.

3.2 AntMiner+ Algorithm

ACO can be used to induce comprehensible and accurate rule-based classification models from data, as done in the AntMiner+ classification technique [?]. First of all, an environment needs to be defined in which the ants operate. When an ant moves through the environment from the Start to the Stop vertex, it should incrementally construct a solution to the problem at hand, in this case the classification problem.
In order to build a set of classification rules, we define the construction graph in such a way that each ant's path implicitly describes a classification rule. For each variable $V_i$, a vertex $v_{i,j}$ is created for each of its values $Value_{i,j}$. The set of vertices for one variable is defined as a vertex group. To allow for rules in which not all variables are involved, hence shorter rules, an extra dummy vertex is added to each variable; its value is undetermined, meaning it can take any of the values available. Although only categorical variables are allowed, we make a distinction between nominal variables (no apparent ordering in the values, e.g. sex and purpose of loan) and ordinal variables (a clear ordering of the values, e.g. amount on savings or checking account and income). Each nominal variable has one vertex group (including the dummy vertex mentioned above), but for ordinal variables we build two vertex groups to allow the ants to choose intervals. The first vertex group corresponds to the lower bound of the interval and should thus be interpreted as $\langle V_{i+1} \geq Value_{i,k} \rangle$; the second vertex group determines the upper bound, giving $\langle V_{i+2} \leq Value_{i+1,l} \rangle$ (of course, the choice of the upper bound is constrained by the lower bound). This yields fewer, shorter and actually better rules. To extract a rule set that is exhaustive, such that all future data points can be classified, the majority class is not included in the vertex group of the class variable, and will be the predicted class for the final else clause.

An example AntMiner+ construction graph for a credit scoring data set with only three variables (purpose of the loan, amount on savings account and credit history of the applicant) is shown in Fig. 2. The path denoted in bold describes the rule if Purpose = car and Savings Account ≥ 0 € and Savings Account ≤ 500 € and Credit History = any then class = bad. A formal illustration of the construction graph is provided in Fig. 3, for a data set with d classes and n variables, of which the first and last variable are nominal and $V_2$ is ordinal (hence the two vertex groups). The weight parameters α and β determine the relative importance of the pheromone and heuristic values; their role is described by (1).

Now that the environment is defined, we can explain the workings of the technique. All ants begin in the Start vertex and walk through their environment to the Stop vertex, gradually constructing a rule. Only the ant that describes the best rule will update the pheromone of its path, as imposed by the MAX-MIN Ant System approach. Evaporation decreases the pheromone of all edges, while the pheromone levels are constrained to lie within the given interval $[\tau_{min}, \tau_{max}]$.
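The pheromone update just described (evaporation on every edge, reinforcement of the best ant's path only, and clamping to the allowed interval) can be sketched in Python as follows. This is a minimal illustration and not the authors' Java implementation: the function names, the edge representation as tuples, the `deposit` argument and the values chosen for `tau_min` and `tau_max` are all assumptions made for the example; only the evaporation factor 0.85 is a value reported in the paper's experimental setup.

```python
def update_pheromone(tau, best_path, deposit, rho=0.85, tau_min=0.1, tau_max=1.0):
    """Return new pheromone levels after one MAX-MIN style iteration.

    tau       -- dict mapping an edge (hypothetical tuple of vertex names) to its level
    best_path -- list of edges followed by the best ant of this iteration
    deposit   -- amount added to each edge of the best path (assumed given)
    """
    best_edges = set(best_path)
    updated = {}
    for edge, level in tau.items():
        level *= rho                      # evaporation applies to every edge
        if edge in best_edges:
            level += deposit              # only the best ant reinforces its path
        # keep the level inside [tau_min, tau_max]
        updated[edge] = min(tau_max, max(tau_min, level))
    return updated


def has_converged(tau, best_path, tau_min=0.1, tau_max=1.0):
    """True when one path sits at tau_max and every other edge at tau_min."""
    best_edges = set(best_path)
    return all(
        level == (tau_max if edge in best_edges else tau_min)
        for edge, level in tau.items()
    )
```

Because the levels are clamped, an edge that is never reinforced settles at exactly `tau_min` and an edge that is always reinforced settles at `tau_max`, which is what makes the convergence test a simple equality check.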
Then another iteration occurs with ants walking from Start to Stop. Convergence occurs when all the edges of one path have pheromone level $\tau_{max}$ and all other edges have pheromone level $\tau_{min}$. Next, the rule corresponding to the path with $\tau_{max}$ is extracted and added to the rule set. Finally, training data covered by this rule is removed from the training set. This iterative process is repeated until the early stopping criterion is met: the accuracy on a separate validation set is monitored, and rule induction stops when the validation accuracy starts to decrease. Next we will have a closer look at the algorithm specifics, such as the edge probabilities and the rule quality measure.

$$P_{ij}(t) = \frac{[\tau_{(v_{i-1,k},v_{i,j})}(t)]^{\alpha} \cdot [\eta_{v_{i,j}}(t)]^{\beta}}{\sum_{l=1}^{p_i} [\tau_{(v_{i-1,k},v_{i,l})}(t)]^{\alpha} \cdot [\eta_{v_{i,l}}(t)]^{\beta}} \tag{1}$$

$$\eta_{ij} = \frac{|T_{ij} \;\&\; CLASS = class_{ant}|}{|T_{ij}|} \tag{2}$$
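As a rough illustration of the edge probability in (1), the sketch below normalizes the product of pheromone and heuristic values over the candidate vertices of the next vertex group and then draws one vertex at random. The function names, the list-based representation, the default values of `alpha` and `beta`, and the uniform fallback when all weights are zero are assumptions made for the example, not details taken from the paper.

```python
import random


def edge_probabilities(tau, eta, alpha=2.0, beta=1.0):
    """Probabilities of moving to each candidate vertex of the next group.

    tau -- pheromone levels of the edges leading to the candidates
    eta -- heuristic values of the candidate vertices (same order as tau)
    """
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(tau, eta)]
    total = sum(weights)
    if total == 0:
        # Assumed fallback: no information at all, so choose uniformly.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def choose_vertex(tau, eta, alpha=2.0, beta=1.0, rng=random):
    """Draw the index of the next vertex according to the edge probabilities."""
    probs = edge_probabilities(tau, eta, alpha, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With equal pheromone on all edges, as at initialization, the choice is driven entirely by the heuristic values; as pheromone accumulates on a path, the exponent `alpha` controls how strongly it dominates.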

$$\tau_{(v_{i-1,k},v_{i,j})}(0) = \tau_{max} \tag{3}$$

$$\tau_{(v_{i-1,k},v_{i,j})}(t+1) = \rho \cdot \tau_{(v_{i-1,k},v_{i,j})}(t) + \frac{Q^{+}_{best}}{10} \tag{4}$$

The edge to choose when an ant arrives at a vertex $v_{i-1,k}$, and thus the term to add next, depends on the pheromone value of the edge between vertices $v_{i-1,k}$ and $v_{i,j}$ ($\tau_{(v_{i-1,k},v_{i,j})}$) and on the heuristic value of the vertex $v_{i,j}$ ($\eta_{i,j}$). Normalizing over all possible vertices provides a probability $P_{ij}$ for each of the possible vertices, according to (1). As the heuristic function η is problem-dependent, we have defined the heuristic value $\eta_{ij}$ of vertex $v_{i,j}$, corresponding to the term $V_i = Value_{i,j}$, as the fraction of training cases that are correctly covered (described) by this term, as defined by (2). Let us illustrate this definition with a simplified credit scoring data set of five data instances $i_1, i_2, \ldots, i_5$ and three variables: Sex, Term of the loan and the nominal variable Real Estate, stating what kind of real estate the applicant owns. Consider the vertex corresponding to Sex = Male. As this is a binary classification problem, the only class in the construction graph is the bad class, giving a heuristic value for this vertex of:

$$\frac{|Sex = male \;\&\; CLASS = bad|}{|Sex = male|} = 3/4 \tag{5}$$

The initial pheromone value is by definition $\tau_{max}$, as imposed by MAX-MIN Ant System. The pheromone to add to the path of the best ant should be proportional to the quality of the path, which we define as the sum of the confidence and the coverage of the corresponding rule. Confidence is the number of remaining data points (not yet covered by any of the extracted rules) that a rule classifies correctly, divided by the total number of remaining data points covered by that rule. Coverage gives an indication of the overall importance of the specific rule: it is the number of correctly classified remaining data points divided by the total number of remaining data points.
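The heuristic value of (2) and the confidence and coverage just described can be computed as in the following sketch. The five-row data set is hypothetical: it is constructed only so that the count for Sex = male matches the 3/4 of (5), and it is not the data of the paper's Table 1. The function names, the dictionary layout and the example rule are likewise assumptions. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Hypothetical remaining (not yet covered) training instances.
DATA = [
    {"Sex": "M", "Term": 1, "RealEstate": "A", "Class": "bad"},
    {"Sex": "M", "Term": 5, "RealEstate": "B", "Class": "bad"},
    {"Sex": "M", "Term": 10, "RealEstate": "A", "Class": "bad"},
    {"Sex": "M", "Term": 15, "RealEstate": "B", "Class": "good"},
    {"Sex": "F", "Term": 20, "RealEstate": "A", "Class": "good"},
]


def heuristic(data, variable, value, target="bad"):
    """Fraction of instances with variable == value that belong to target."""
    covered = [row for row in data if row[variable] == value]
    if not covered:
        return Fraction(0)
    hits = sum(1 for row in covered if row["Class"] == target)
    return Fraction(hits, len(covered))


def confidence_and_coverage(data, antecedent, target="bad"):
    """Confidence and coverage of the rule 'if antecedent then target'."""
    covered = [row for row in data if antecedent(row)]
    if not covered:
        return Fraction(0), Fraction(0)
    correct = sum(1 for row in covered if row["Class"] == target)
    return Fraction(correct, len(covered)), Fraction(correct, len(data))


# Example rule (an assumption): Sex = M and 1 <= Term <= 15 => bad
confidence, coverage = confidence_and_coverage(
    DATA, lambda row: row["Sex"] == "M" and 1 <= row["Term"] <= 15
)
quality = confidence + coverage
```

On this toy data the example rule covers four instances, three of them correctly, so the quality is the sum of a confidence of 3/4 and a coverage of 3/5.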
More formally, the pheromone amount to add to the path of the iteration-best ant is given by the benefit of that path, as indicated by (6), with $rule_{ant}$ the rule antecedent (if part), consisting of a conjunction of terms corresponding to the path chosen by the ant, $rule^{c}_{ant}$ the conjunction of $rule_{ant}$ with the class chosen by the ant, and Cov a binary variable expressing whether a data point is already covered by one of the extracted rules (Cov = 1) or not (Cov = 0). The number of remaining data points can therefore be expressed as $|Cov = 0|$. Taking into account the evaporation factor as well, the update rule for the best ant's path is described by (4), where the division by ten is a scaling factor needed so that both the pheromone and heuristic values lie within the range [0, 1].

$$Q^{+} = \underbrace{\frac{|rule^{c}_{ant}|}{|rule_{ant}|}}_{\text{confidence}} + \underbrace{\frac{|rule^{c}_{ant}|}{|Cov = 0|}}_{\text{coverage}} \tag{6}$$

For example, returning to our simple data set (see Table 1), suppose we have the following two rules:

R1: if Sex = M and Term ≥ 1 y and Term ≤ 15 y then customer = Bad
R2: if Sex = M and Term ≥ 1 y and Term ≤ 1 y and Real Estate = A then customer = Bad

As shown in Table 1, rule R1 correctly classifies 3 of the 4 data instances described by the rule antecedent, yielding a confidence of 0.75. The coverage of R1 is 0.6, as it correctly describes 3 of the 5 instances in the data set. Similarly, for rule R2 a confidence of 1 and a coverage of 0.2 are obtained. This example shows that although rule R2 is completely accurate, as shown by its confidence of 1, it is not the best rule: its quality is 1 + 0.2 = 1.2, against 0.75 + 0.6 = 1.35 for R1, because we also take the coverage of the rule into account. The coverage term ensures that we avoid overfitting and obtain fewer rules.

In previous research, a benchmarking study of AntMiner+ against state-of-the-art classification techniques, such as C4.5, RIPPER and support vector machines, showed that AntMiner+ ranks at the top when considering both accuracy and comprehensibility [?]. However, a reluctance to accept the classification models may still exist, as unexpected inequality signs may arise in the AntMiner+ rules. These may be due to spurious correlations in the data and do not represent the actual risk relationship (simply put, wrong inequality signs, e.g. a rule in which a higher Income or a larger Savings Account amount leads to customer = bad). To counter such inconsistencies with existing domain knowledge, we have extended the AntMiner+ classification technique to incorporate domain knowledge [?]. The basic principle is as follows. Considering our credit scoring example, we can make sure that increasing the amount on the applicant's savings account cannot lead to a customer changing from good to bad by removing the lower-bound vertex group corresponding to Savings Account ≥ (see Fig. 2): since the ants look only for rules to classify bad customers (only the final else clause will classify a customer as good), the term with Savings Account can then only be of the form Savings Account ≤ X. This allows the domain expert to enforce hard constraints on the inequality signs. Furthermore, a bias may also exist towards certain values, in which case the constraint is preferred rather than mandatory. To deal with such soft constraints, the heuristic values can be adapted. For more details we refer to [?].

The ability to incorporate domain knowledge is of crucial importance within a credit scoring context, and dramatically shortens the verification and validation process of the model (see Section 5.1, further in the text). AntMiner+ is implemented in the platform-independent, object-oriented Java programming environment, using the MySQL open source database server. Example screenshots of the Graphical User Interface (GUI) of AntMiner+ are included in the Appendix.

4 Building Credit Risk Models with AntMiner+

In this section, we will illustrate how AntMiner+ can be used to build credit risk systems in three different contexts: retail banking, small and medium sized enterprises (SMEs), and bank ratings. As AntMiner+ can only deal with categorical variables, a discretization preprocessing step takes place in which the continuous variables are turned into discrete variables. This is done automatically with the Weka workbench [?] according to the criterion of Fayyad [?]. All experiments were run with 1000 ants and ρ set at 0.85, as suggested in [?].

4.1 Retail Banking

In this section, we will illustrate how AntMiner+ can be used to develop application scoring models in a retail banking context. The purpose of application scoring is to provide a score or classification of a credit applicant given the application characteristics provided. The data set that we will use is the German credit data set, which is a publicly available application scoring data set (see mlearn/mlrepository.html) having 1000 observations and 20 application characteristics. Table 2 presents the rules that were extracted using AntMiner+. The extracted rule set is concise and easy to understand. Only 5 of the original 20 application characteristics are used for making the discrimination. This clearly has a beneficial impact on interpretability, but also on operational cost and efficiency.
4.2 SME Bankruptcy Prediction

Under the IRB approach for corporate credits, the Basel II Capital Accord allows banks to distinguish exposures to SME borrowers (defined as corporate exposures where the reported sales for the consolidated group of which the firm is a part are less than 50 million €) from those to large firms. The SME data set consists of 422 observations: 74 bankrupt and 348 solvent companies. The default data were collected from , while the other data were extracted from the period only. A total of 40 candidate input variables was selected from financial statement data, using, among others, liquidity, profitability and solvency measures (see [?] for an extensive description of this data set). Table 3 presents the rules that were extracted by AntMiner+. Again, only 5 of the 40 original inputs are used in making the discrimination decision. Note that the numbers were rounded and one variable was scaled randomly for confidentiality reasons.

4.3 Rating Prediction

For retail and SME portfolios, one typically has a sufficient number of default observations to make statistical discrimination meaningful. However, when modeling credit risk for entities such as banks, sovereigns, or insurance companies, the lack of default observations necessitates the use of an alternative modeling approach. That is why many financial institutions opt for a mapping to external ratings in this context. In this section, we will study how AntMiner+ can be used to model credit risk for bank entities. The data was retrieved from the Bankscope database, which contains financial statements of more than banks. For each of these banks, the Moody's rating will be used as the basis of the target variable (low/speculative-grade or good/investment-grade rating). These ratings were retrieved for the period . The rating at the end of May of year T + 1 is predicted based on a 3-year history of inputs observed during years T, T − 1 and T − 2. A variety of inputs was selected covering, amongst others, asset quality, capital, operational result and liquidity.
The size variable Total Assets was also included, as well as a geographical indicator Region (Euro-zone, dollar-zone, EU accession countries, Japan and others). After data preprocessing, the data set consisted of a cleaned database of 2996 observations with 37 inputs (see [?] for a more extensive description).

4.4 Classification Model Performance

Table 5 shows the results of the classification models induced by AntMiner+, C4.5, support vector machine (SVM) and majority vote. The experimental setup is the same for all included data sets. The data set is split into a training, validation and test set according to the following fractions: 4/9, 2/9 and 3/9, as is common practice in data mining [?, ?]. To eliminate any chance of having unusually good or bad training and test sets, 10 runs are conducted in which the order of observations is first randomized before the training, validation and test sets are chosen. For each randomization, AntMiner+ is run with hard monotonicity constraints, as imposed by the financial expert.

The best average test set performance over the 10 randomizations is underlined and denoted in bold face for each data set. We then use a paired t-test to test the performance differences. Performances that are not significantly different at the 5% level from the top performance with respect to a one-tailed paired t-test are tabulated in bold face. Statistically significant underperformances at the 1% level are emphasized in italics. Performances significantly different at the 5% level but not at the 1% level are reported in normal script. Since the observations of the randomizations are not independent, we remark that this standard t-test is used as a common heuristic to test the performance differences [?].

As Table 5 shows, the non-linear SVM classifier performs best in terms of accuracy, as can be expected [?]. However, as mentioned before, the black-box nature of such non-linear classifiers makes them less suited for credit scoring, where validation is required. When comparing the rule- and tree-based classifiers AntMiner+ and C4.5, we observe very competitive accuracies, but when the number of rules is considered as well, AntMiner+ comes out as the best performing technique. On top of that, the AntMiner+ rule sets comply with the stated domain constraints, which, as pointed out in [?], can result in a decrease in accuracy. While a small decrease in accuracy can be acceptable, an inconsistency with domain knowledge is not.

5 Towards a Basel II Credit Risk Management System

Up till now, we have largely focused on extracting a comprehensible set of rules for risk management in a Basel II context.
These rules now need to be further analyzed and used in various activities so as to arrive at a full-fledged, integrated Basel II risk decision and management application. In what follows, we will discuss the most important activities, which are summarized in Fig. 4.

5.1 Verification and Validation

A first set of tools can be used to verify and validate (V&V) the extracted rule set. Verification looks for syntax-based anomalies in the rule set. This step investigates whether the rule set is exhaustive (all cases are covered) and exclusive (each case is covered by only one rule). Because of the if-then-else nature of the AntMiner+ rule sets, they are by definition exhaustive and exclusive, making the verification step obsolete. The validation step investigates whether the rules adequately model the risk involved from a human interpretation viewpoint. The financial credit expert will also be consulted and asked to interpret the rule set in this step.

In order to facilitate the verification and validation step, decision tables may be adopted [?]. Decision tables provide an alternative, user-friendly way of representing the AntMiner+ rule sets. A decision table (DT) consists of four quadrants, separated by double lines, both horizontally and vertically (cf. Fig. 5). The vertical line divides the table into a condition part (left), specifying the inputs to be checked, and an action part (right), specifying the classes assigned. Each condition entry describes a relevant subset of values (called a state) for a given input, or contains a dash symbol (-) if its value is irrelevant within the context of that column. Subsequently, every action entry holds a value assigned to the outcome class. True, false and unknown action values are each typically abbreviated by a single symbol. Every row in the entry part of the DT thus comprises a classification rule, indicating what class results from a certain combination of inputs. If each row only contains simple states (no contracted or irrelevant entries), the table is called an expanded DT; otherwise the table is called a contracted DT. Table contraction can be achieved by combining rows that lead to the same outcome class. The number of rows in the contracted table can then be further minimised by changing the order of the conditions.
It is obvious that a DT with a minimal number of rows is to be preferred, since it provides a more parsimonious and comprehensible representation of the extracted rule set than an expanded DT. This is illustrated in Fig. 6. In the literature, several kinds of DTs have been proposed. We will require that the condition entry part of a DT satisfies the following two criteria: (i) completeness: all possible combinations of input values are included; and (ii) exclusivity: no combination is covered by more than one column. As such, we deliberately restrict ourselves to single-hit tables, wherein columns have to be mutually exclusive, because of their advantages with respect to verification and validation [?]. It is this type of DT that can be easily checked for potential anomalies, such as inconsistencies (a particular counterparty being assigned to more than one class) or incompleteness (no class assigned). The decision table formalism thus allows for easy verification of the extracted AntMiner+ rules. Additionally, for ease of legibility, the rows are arranged in lexicographical order, in which entries at lower rows alternate first. As a result, a tree structure emerges in the condition entry part of the DT, which lends itself very well to a top-down evaluation procedure: starting at the first column, and then working one's way to the right of the table by choosing from the relevant condition states, one safely arrives at the outcome class for a given case. This condition-oriented inspection approach often proves to be more intuitive, faster, and less prone to human error than evaluating a set of rules one by one.

Decision tables can also be usefully adopted for validation purposes, as they can easily be checked for potential anomalies, such as inconsistency with monotonicity constraints: by placing the presumably monotone variable in the last column, adjacent rows are found whose entries are equal in all variables except the last one. It can then easily be seen whether or not the class variable changes in the expected manner. As AntMiner+ has the supplementary benefit of incorporating such monotonicity constraints, as demonstrated in Section 3.2, the decision table will no longer reveal any counter-intuitive patterns. For example, Table 6 depicts the decision table corresponding to the rule set extracted for the German credit scoring data set (see Table 2). Based on this table, we can easily check that credit history can only have a positive effect on the applicant's assessment, if any. We can conclude that this first step of verifying and validating the model has been eased significantly thanks to the nature of the induced rule sets (exhaustive and exclusive) and because of the incorporation of monotonicity constraints. This does not mean, however, that this phase is no longer needed, as the domain expert still needs to check whether the model is suitable. From that perspective, decision tables are still a very useful tool.
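The two single-hit criteria and the monotonicity inspection described in this section lend themselves to a mechanical check. The sketch below is a minimal illustration, assuming a DT stored as a dictionary from condition entries to classes, with '-' marking irrelevant entries. The function names, the two-input toy table, and the ordering of the credit history states (worst to best) are hypothetical and are not taken from Table 6.

```python
from itertools import product

DASH = "-"  # condition entry whose value is irrelevant


def matches(entries, case):
    """True if a DT row's condition entries cover the given input combination."""
    return all(e == DASH or e == v for e, v in zip(entries, case))


def verify_single_hit(table, domains):
    """Return (uncovered, overlapping) input combinations of a decision table."""
    uncovered, overlapping = [], []
    for case in product(*domains):
        hits = sum(matches(entries, case) for entries in table)
        if hits == 0:
            uncovered.append(case)    # incompleteness: no class assigned
        elif hits > 1:
            overlapping.append(case)  # exclusivity violated
    return uncovered, overlapping


def monotonicity_violations(table, domains, var, class_rank):
    """List pairs of cases where improving input `var` lowers the class rank.

    Assumes the table is single-hit and that domains[var] is ordered from
    the worst to the best state.
    """
    def classify(case):
        return next(cls for entries, cls in table.items() if matches(entries, case))

    violations = []
    states = domains[var]
    for case in product(*domains):
        pos = states.index(case[var])
        if pos + 1 < len(states):
            better = case[:var] + (states[pos + 1],) + case[var + 1:]
            if class_rank[classify(better)] < class_rank[classify(case)]:
                violations.append((case, better))
    return violations


# Hypothetical toy example: purpose and credit history (ordered worst -> best).
domains = [["car", "other"], ["critical", "all paid"]]
rank = {"bad": 0, "good": 1}
table = {
    ("car", "critical"): "bad",
    ("car", "all paid"): "good",
    ("other", DASH): "good",
}
flawed = {("car", "critical"): "bad", ("car", DASH): "good"}

print(verify_single_hit(table, domains))
print(verify_single_hit(flawed, domains))
print(monotonicity_violations(table, domains, var=1, class_rank=rank))
```

The first table passes both checks. The deliberately flawed table leaves every case with purpose "other" uncovered and covers the case ("car", "critical") twice, which is the kind of anomaly a single-hit DT is meant to expose.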
5.2 Traffic Light Decision Support System

Once the rule set has been verified and validated, it needs to be implemented as a decision support system (DSS) which credit officers can use to make the actual credit decision: accept or reject. The DSS can be implemented using a traffic light indicator approach that gives three possible outcomes: a green light, an orange light or a red light [?]. A green light indicates that the rule set is confident enough to classify a customer as a good payer, and credit should be accepted. An orange light indicates a doubt case for which human intervention is needed. This can be due to, for example, low confidence of the rule set, external information obtained from a credit bureau (e.g. Equifax, Experian), a customer who is borderline rejected by the rule set but is very profitable on other financial products, and/or a new marketing campaign in which the financial institution decides to grant credit to some of the more risky customers. The orange light can allow for model overrides by the credit expert. A low side override means that a customer rejected by the rule set is accepted, and a high side override vice versa. A red light indicates that the rule set is confident enough to classify a customer as a bad payer, and credit should be rejected. Note that this traffic light indicator approach can also be implemented using four colors (green, yellow, orange, red) or gauges in a dashboard application. An implementation of a traffic light indicator approach using four colors could be as follows: red when the rule set predicts a bad customer and this is confirmed by the credit bureau information; orange when the rule set predicts a bad customer, but the credit bureau says the customer is a good risk; yellow when the rule set predicts a bad customer, but confidence is very low and the credit bureau says the customer is a good risk; and green when the rule set says good customer and the credit bureau says the customer is a good risk. Note that financial institutions can decide for themselves on the number of colors and their meaning.

5.3 Interface to Basel II Calculation Engine

The extracted rule set must also interface with a Basel II calculation engine, which will use the rule outputs to calculate the expected loss and the regulatory capital that a financial institution needs to set aside in order to cover unexpected credit losses. Therefore, in a calibration phase, each rule should be accompanied by a PD estimate, which should be forward looking and based on five years of historical data. Once the estimates for the LGD and EAD have been obtained, the expected loss and the regulatory capital can be calculated. The expected loss (EL) can be calculated as $\mathrm{EL} = \mathrm{PD} \cdot \mathrm{LGD} \cdot \mathrm{EAD}$. It represents the long-run average credit loss and will be used for debt provisioning. The regulatory safety capital can then also be calculated based on the formulas provided in the Basel II Accord.
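As a worked illustration, the sketch below computes the expected loss and the retail capital requirement of Eq. (7) with the Python standard library. The function names and the input values (a PD of 2%, an LGD of 25%, an EAD of 100,000) are hypothetical; the asset correlation of 0.15 is the value Basel II prescribes for residential mortgage exposures. It is a minimal sketch of the calculation, not a full Basel II calculation engine.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def expected_loss(pd, lgd, ead):
    """EL = PD * LGD * EAD, the long-run average credit loss."""
    return pd * lgd * ead


def retail_capital_requirement(pd, lgd, rho):
    """Capital requirement K per unit of exposure for retail exposures.

    K = LGD * (Phi(sqrt(1/(1-rho)) * Phi^-1(PD)
               + sqrt(rho/(1-rho)) * Phi^-1(0.999)) - PD)
    """
    threshold = (
        sqrt(1.0 / (1.0 - rho)) * STD_NORMAL.inv_cdf(pd)
        + sqrt(rho / (1.0 - rho)) * STD_NORMAL.inv_cdf(0.999)
    )
    return lgd * (STD_NORMAL.cdf(threshold) - pd)


# Hypothetical inputs for a single rule (rating class) of the rule set.
pd_estimate, lgd, ead = 0.02, 0.25, 100_000.0
rho = 0.15  # Basel II asset correlation for residential mortgage exposures

el = expected_loss(pd_estimate, lgd, ead)
k = retail_capital_requirement(pd_estimate, lgd, rho)
capital = k * ead  # regulatory capital = K * EAD

print(f"EL = {el:.2f}, K = {k:.4f}, regulatory capital = {capital:.2f}")
```

With these inputs the expected loss is 500, while the capital requirement is several times larger, which reflects that K covers unexpected losses at the 99.9% confidence level after the expected part (PD times LGD) has been subtracted.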
E.g., for retail exposures the formulas are as follows:

$$K = \mathrm{LGD} \cdot \left( \Phi\!\left( \sqrt{\frac{1}{1-\rho}}\, \Phi^{-1}(\mathrm{PD}) + \sqrt{\frac{\rho}{1-\rho}}\, \Phi^{-1}(0.999) \right) - \mathrm{PD} \right), \qquad \text{regulatory capital} = K \cdot \mathrm{EAD} \qquad (7)$$

whereby $\Phi$ ($\Phi^{-1}$) represents the (inverse) cumulative standard normal distribution, and $\rho$ the asset correlation factor, which is fixed in the Accord [?] (e.g. for residential mortgage exposures).

5.4 Evaluating the Model over Time: Backtesting and Benchmarking

The Basel II Capital Accord requires credit risk systems to be validated at least annually. The Accord distinguishes between backtesting, which compares the outcome predicted by the rule set with the realized outcome, and benchmarking, which compares the predicted outcome of the rule set with the outcomes of models of other parties in the industry (such as credit bureaus, other financial institutions, or financial regulators). From a backtesting perspective, the performance of the rule set needs to be monitored. Again, a traffic light indicator approach can be adopted with three outcomes: green light, orange light, red light [?]. The decision on which light to switch on can be based on the outcome of a test statistic which monitors the classification accuracy (e.g. McNemar's test [?]). A green light indicates that the rule set performance is stable, e.g. no significant differences at the 5% level are reported. It means the rule set can continue to be used. An orange light may indicate e.g. a difference at the 5% level but not at the 1% level of significance. It indicates a performance difference which requires no immediate action but needs to be closely monitored in the future. A red light then indicates a significant performance difference at the 1% level. It indicates that the model is no longer appropriate for the current data, which could be due to a change of the population (often referred to as population drift) or a new strategy of the financial institution. In other words, the model needs to be rebuilt, which in our context would mean extracting a new rule set using AntMiner+. From a benchmarking perspective, a similar process can be conducted, whereby the traffic lights now indicate how much the two parties agree or disagree on their credit decisions.

6 Conclusion

The introduction of the recently suggested Basel II Capital Accord has encouraged financial institutions to build efficient and high-performing credit risk models assessing the creditworthiness of their counterparties.
Ideally, these models should be both powerful, in terms of discriminating defaulters from non-defaulters, and comprehensible, in terms of explanatory power. In this paper, we discussed how Ant Colony Optimization can be used to build credit risk models for Basel II. More specifically, we used the AntMiner+ algorithm, which is a rule induction technique based on the principles of the MAX-MIN Ant System. AntMiner+ distinguishes itself by the comprehensibility of the induced models, which are in line with existing domain knowledge. We have also shown how decision tables can be useful to provide even more insight into the classification model. Experiments were conducted using three real-life credit risk data sets: one in retail, one for SMEs, and one for bank ratings. It was illustrated that for each of these data sets AntMiner+ extracted a powerful and concise rule set. Furthermore, it was also discussed how the induced rule sets could fit into a global credit risk management strategy and architecture. An interesting topic

for further research is to extend the algorithm to handle continuous targets and generate regression rules, which could be useful e.g. for modeling LGD and EAD.

Acknowledgment

We extend our gratitude to the (associate) editor and the anonymous reviewers, as their many constructive and detailed remarks certainly contributed much to the quality of this paper. Further, we would like to thank the Flemish Research Council (FWO, Grant G ), and the Microsoft and KBC-Vlekho-K.U.Leuven Research Chairs for financial support to the authors.

References

[] A. Abraham and V. Ramos. Web usage mining using artificial ant colony clustering. In the Congress on Evolutionary Computation, pages IEEE Press,
[] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J.A.K. Suykens, and J. Vanthienen. Benchmarking state of the art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6): ,
[] Basel Committee on Banking Supervision. International convergence of capital measurement and capital standards: a revised framework. Technical report, BIS, June
[] C. Blum. Beam-ACO - hybridizing ant colony optimization with beam search: An application to open shop scheduling. Computers & Operations Research, 32(6): ,
[] B. Bullnheimer, R.F. Hartl, and C. Strauss. A new rank based version of the ant system: A computational study. Central European Journal for Operations Research and Economics, 7(1):25-38,
[] B. Bullnheimer, R.F. Hartl, and C. Strauss. Applying the ant system to the vehicle routing problem. In S. Voss, S. Martello, I.H. Osman, and C. Roucairol, editors, Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization,
[] G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9: ,
[] A. Colorni, M. Dorigo, V. Maniezzo, and M. Trubian. Ant system for job-shop scheduling. Journal of Operations Research, Statistics and Computer Science, 34(1):39-53,
[] V.S. Desai, J.N. Crook, and G.A. Overstreet Jr. A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95(1):24-37,
[] T.G. Dietterich. Approximate statistical test for comparing supervised classification learning algorithms. Neural Computation, 10(7): ,
[] M. Dorigo and L.M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53-66, April
[] M. Dorigo, V. Maniezzo, and A. Colorni. Positive feedback as a search strategy. Technical Report 91016, Dipartimento di Elettronica e Informatica, Politecnico di Milano, IT,
[] M. Dorigo, V. Maniezzo, and A. Colorni. Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(1):29-41,
[] M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, Cambridge, MA,
[] U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI), pages , Chambéry, France, Morgan Kaufmann.
[] L.M. Gambardella and M. Dorigo. Ant-Q: A reinforcement learning approach to the traveling salesman problem. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages , Palo Alto, CA, Morgan Kaufmann Publishers Inc.
[] D. Hand. Pattern detection and discovery. In D. Hand, N. Adams, and R. Bolton, editors, Pattern Detection and Discovery, volume 2447 of Lecture Notes in Computer Science, pages Springer,
[] J. Handl, J. Knowles, and M. Dorigo. Ant-based clustering and topographic mapping. Artificial Life, 12(1):35-61,
[] W.E. Henley and D.J. Hand. Construction of a k-nearest neighbour credit-scoring system. IMA Journal of Mathematics Applied in Business and Industry, 8: ,
[] B. Liu, H.A. Abbass, and B. McKay. Density-based heuristic for rule discovery with Ant-Miner. In 6th Australasia-Japan Joint Workshop on Intelligent and Evolutionary Systems (AJWIS2002), Canberra, Australia,
[] B. Liu, H.A. Abbass, and B. McKay. Classification rule discovery with ant colony optimization. In IAT, pages IEEE Computer Society,
[] D. Martens, M. De Backer, R. Haesen, B. Baesens, C. Mues, and J. Vanthienen. Ant-based approach to the knowledge fusion problem. In Proceedings of the Fifth International Workshop on Ant Colony Optimization and Swarm Intelligence, Lecture Notes in Computer Science, pages Springer,

[] D. Martens, M. De Backer, R. Haesen, M. Snoeck, J. Vanthienen, and B. Baesens. Classification with ant colony optimization. IEEE Transactions on Evolutionary Computation, 11(5): ,
[] R. Montemanni, L.M. Gambardella, A.E. Rizzoli, and A. Donati. Ant colony system for a dynamic vehicle routing problem. Journal of Combinatorial Optimization, 10(4): ,
[] R.S. Parpinelli, H.S. Lopes, and A.A. Freitas. An ant colony based system for data mining: Applications to medical data. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages , San Francisco, California, USA, Morgan Kaufmann.
[] D. Quintana, C. Luque, and P. Isasi. Evolutionary rule-based system for IPO underpricing prediction. In GECCO '05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pages , New York, NY, ACM Press.
[] D.J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman and Hall/CRC,
[] K. Socha, J. Knowles, and M. Sampels. A MAX-MIN ant system for the university timetabling problem. In M. Dorigo, G. Di Caro, and M. Sampels, editors, Proceedings of ANTS 2002, Third International Workshop on Ant Algorithms, volume 2463 of Lecture Notes in Computer Science, pages 1-13, Berlin, Germany, September Springer-Verlag.
[] A. Steenackers and M.J. Goovaerts. A credit scoring model for personal loans. Insurance: Mathematics and Economics, 8:31-34,
[] T. Stützle and H.H. Hoos. Improving the ant-system: A detailed report on the MAX-MIN ant system. Technical Report AIDA 96-12, FG Intellektik, TU Darmstadt, Germany,
[] T. Stützle and H.H. Hoos. MAX-MIN ant system. Future Generation Computer Systems, 16(8): ,
[] D. Tasche. Traffic lights approach to PD validation. Technical report,
[] E. Tsang, P. Yung, and J. Li. EDDIE-automation, a decision support tool for financial forecasting. Decision Support Systems, 37(4): , September
[] T. Van Gestel, B. Baesens, P. Van Dijcke, J. Garcia, J.A.K. Suykens, and J. Vanthienen. A process model to develop an internal rating system: sovereign credit ratings. Decision Support Systems, 42(2): ,
[] T. Van Gestel, B. Baesens, P. Van Dijcke, J.A.K. Suykens, J. Garcia, and T. Alderweireld. Linear and nonlinear credit scoring by combining logistic regression and support vector machines. Journal of Credit Risk, 1(4),
[] J. Vanthienen, C. Mues, and A. Aerts. An illustration of verification and validation in the modelling phase of KBS development. Data and Knowledge Engineering, 27(3): ,
[] J. Vanthienen, C. Mues, and A. Aerts. An illustration of verification and validation in the modelling phase of KBS development. Data and Knowledge Engineering, 27: ,
[] J. Vanthienen and G. Wets. From decision tables to expert system shells. Data and Knowledge Engineering, 13(3): ,
[] A. Wade and S. Salhi. An ant system algorithm for the mixed vehicle routing problem with backhauls. In Metaheuristics: Computer Decision-Making, pages , Norwell, MA, Kluwer Academic Publishers.
[] D. West. Neural network credit scoring models. Computers and Operations Research, 27: ,
[] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
[] M.B. Yobas, J.N. Crook, and P. Ross. Credit scoring using neural and evolutionary techniques. IMA Journal of Mathematics Applied in Business and Industry, 11: ,

Appendix: Screenshots of AntMiner+ GUI

Several screenshots of the AntMiner+ Graphical User Interface are provided in Figs. 7 and 8. Fig. 7 shows the initial menu of AntMiner+, allowing the user to choose the number of ants and the evaporation rate ρ. The "minimal fraction uncovered data" input variable can be used as an alternative to the early stopping criterion: no more rules will be extracted when all but x% of the data has been covered by the extracted rule set. Note that all experiments were conducted with the early stopping criterion. Fig. 8 shows the construction graph for the SME data set during different stages of execution: from initialization (top) to convergence (bottom), with the width of the edges being proportional to their pheromone level. In the bottom box of each screenshot, the extracted rules are displayed with their accuracy on the training, validation, and test sets.
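The "minimal fraction uncovered data" criterion mentioned above can be sketched as a simple covering loop. The code below is a hypothetical illustration and not the AntMiner+ implementation: rules are modelled as plain Python predicates supplied in a fixed order, whereas AntMiner+ would construct each new rule with its ants, and the names `extract_until_covered` and `min_fraction_uncovered` are our own.

```python
def extract_until_covered(records, candidate_rules, min_fraction_uncovered):
    """Add rules until at most `min_fraction_uncovered` of the data is uncovered.

    records                : list of data points
    candidate_rules        : iterable of predicates, rule(record) -> bool
    min_fraction_uncovered : the x% threshold, expressed as a fraction in [0, 1]
    """
    if not records:
        return [], []
    uncovered = list(records)
    rule_set = []
    for rule in candidate_rules:
        # Stop once all but the allowed fraction of the data has been covered.
        if len(uncovered) / len(records) <= min_fraction_uncovered:
            break
        rule_set.append(rule)
        uncovered = [r for r in uncovered if not rule(r)]
    return rule_set, uncovered


# Hypothetical toy data: 100 records and three increasingly general rules.
records = list(range(100))
candidate_rules = [
    lambda r: r < 50,
    lambda r: r < 90,
    lambda r: r < 99,
]
rule_set, uncovered = extract_until_covered(records, candidate_rules, 0.10)
print(len(rule_set), "rules extracted,", len(uncovered), "records left uncovered")
```

With a threshold of 10%, the loop stops after the second rule, because only 10 of the 100 records then remain uncovered and the third rule is never added.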

Fig. 1. Path selection directed by pheromone: the more pheromone on a path, the more likely an ant will follow the path. This simple mechanism of indirect communication is sufficient for the overall ant colony to find short paths from the nest to the food source. [Figure: panel (a) shows a 50%/50% split over the two paths, panel (b) a 33%/67% split.]

Fig. 2. Example of a path described by an ant for a credit scoring construction graph defined by AntMiner+. The rule corresponding to the chosen path is: if Purpose = car and Savings Account ∈ [0€, 500€] then class = bad. [Figure: construction graph running from Start to Stop, with vertex groups Class, Purpose, Savings Account (twice) and Credit History.]

Fig. 3. Multiclass construction graph of AntMiner+, with the inclusion of weight parameters. [Figure: construction graph running from Start to Stop, with vertex groups for the weight parameters α and β, the class, and the variables.]

Fig. 4. Credit risk management system with the use of AntMiner+. The induced rule set is verified and validated, after which it can be used as a decision support system to make actual credit risk decisions (accept or deny credit), and to calculate capital requirements. Finally, backtesting and benchmarking validate the credit risk management system over time. [Figure: flow from data through AntMiner+ to a rule set, followed by V&V, the Decision Support System, PD/LGD/EAD, Capital Requirements, and Backtesting & Benchmarking. The example rule set shown in the figure reads:]
R1: if (Checking Account < 100 and Duration > 15m) then class = bad
R2: if (Purpose = new car and Credit History = critical) then class = bad
R3: else if (Checking Account < 0 and Purpose = furniture and Savings Account < 250) then class = bad
R4: else class = good

Fig. 5. DT quadrants. [Figure: four quadrants labelled condition subjects, condition entries, action subjects, and action entries.]

Fig. 6. Minimizing the number of columns of a lexicographically ordered DT [?]. [Figure: (a) expanded decision table and (b) contracted decision table, each with conditions Condition1 to Condition3 and classes Class1 and Class2.]

Fig. 7. Screenshot of AntMiner+ initial menu.

Fig. 8. Screenshots of AntMiner+ run on the SME credit risk data set during different stages of execution: from initialization (top) to convergence (bottom).


More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Perspectives on Data Mining

Perspectives on Data Mining Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London n.adams@imperial.ac.uk April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Ant colony optimization techniques for the vehicle routing problem

Ant colony optimization techniques for the vehicle routing problem Advanced Engineering Informatics 18 (2004) 41 48 www.elsevier.com/locate/aei Ant colony optimization techniques for the vehicle routing problem John E. Bell a, *, Patrick R. McMullen b a Department of

More information

npsolver A SAT Based Solver for Optimization Problems

npsolver A SAT Based Solver for Optimization Problems npsolver A SAT Based Solver for Optimization Problems Norbert Manthey and Peter Steinke Knowledge Representation and Reasoning Group Technische Universität Dresden, 01062 Dresden, Germany peter@janeway.inf.tu-dresden.de

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

An ant colony optimization for single-machine weighted tardiness scheduling with sequence-dependent setups

An ant colony optimization for single-machine weighted tardiness scheduling with sequence-dependent setups Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006 19 An ant colony optimization for single-machine weighted tardiness

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Obtaining Optimal Software Effort Estimation Data Using Feature Subset Selection

Obtaining Optimal Software Effort Estimation Data Using Feature Subset Selection Obtaining Optimal Software Effort Estimation Data Using Feature Subset Selection Abirami.R 1, Sujithra.S 2, Sathishkumar.P 3, Geethanjali.N 4 1, 2, 3 Student, Department of Computer Science and Engineering,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

Optimal Planning with ACO

Optimal Planning with ACO Optimal Planning with ACO M.Baioletti, A.Milani, V.Poggioni, F.Rossi Dipartimento Matematica e Informatica Universit di Perugia, ITALY email: {baioletti, milani, poggioni, rossi}@dipmat.unipg.it Abstract.

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

Statistics for Retail Finance. Chapter 8: Regulation and Capital Requirements

Statistics for Retail Finance. Chapter 8: Regulation and Capital Requirements Statistics for Retail Finance 1 Overview > We now consider regulatory requirements for managing risk on a portfolio of consumer loans. Regulators have two key duties: 1. Protect consumers in the financial

More information

BMOA: Binary Magnetic Optimization Algorithm

BMOA: Binary Magnetic Optimization Algorithm International Journal of Machine Learning and Computing Vol. 2 No. 3 June 22 BMOA: Binary Magnetic Optimization Algorithm SeyedAli Mirjalili and Siti Zaiton Mohd Hashim Abstract Recently the behavior of

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

COURSE RECOMMENDER SYSTEM IN E-LEARNING

COURSE RECOMMENDER SYSTEM IN E-LEARNING International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

SACOC: A spectral-based ACO clustering algorithm

SACOC: A spectral-based ACO clustering algorithm SACOC: A spectral-based ACO clustering algorithm Héctor D. Menéndez, Fernando E. B. Otero, and David Camacho Abstract The application of ACO-based algorithms in data mining is growing over the last few

More information

Operations research and dynamic project scheduling: When research meets practice

Operations research and dynamic project scheduling: When research meets practice Lecture Notes in Management Science (2012) Vol. 4: 1 8 4 th International Conference on Applied Operational Research, Proceedings Tadbir Operational Research Group Ltd. All rights reserved. www.tadbir.ca

More information

Beating the MLB Moneyline

Beating the MLB Moneyline Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

More information

Low Default Portfolio (LDP) modelling

Low Default Portfolio (LDP) modelling Low Default Portfolio (LDP) modelling Probability of Default (PD) Calibration Conundrum 3 th August 213 Introductions Thomas Clifford Alexander Marianski Krisztian Sebestyen Tom is a Senior Manager in

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

CONTACT(S) Riana Wiesner rwiesner@ifrs.org +44(0)20 7246 6926 Jana Streckenbach jstreckenbach@ifrs.org +44(0)20 7246 6473

CONTACT(S) Riana Wiesner rwiesner@ifrs.org +44(0)20 7246 6926 Jana Streckenbach jstreckenbach@ifrs.org +44(0)20 7246 6473 IASB Agenda ref 5D STAFF PAPER IASB Meeting Project Paper topic Financial Instruments: Impairment Definition of default 16-19 September 2013 CONTACT(S) Riana Wiesner rwiesner@ifrs.org +44(0)20 7246 6926

More information

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining. Fundamentals, robotics, recognition Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

The Predictive Data Mining Revolution in Scorecards:

The Predictive Data Mining Revolution in Scorecards: January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms

More information

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique

More information

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

More information

Software Project Planning and Resource Allocation Using Ant Colony Optimization with Uncertainty Handling

Software Project Planning and Resource Allocation Using Ant Colony Optimization with Uncertainty Handling Software Project Planning and Resource Allocation Using Ant Colony Optimization with Uncertainty Handling Vivek Kurien1, Rashmi S Nair2 PG Student, Dept of Computer Science, MCET, Anad, Tvm, Kerala, India

More information

On the Empirical Evaluation of Las Vegas Algorithms Position Paper

On the Empirical Evaluation of Las Vegas Algorithms Position Paper On the Empirical Evaluation of Las Vegas Algorithms Position Paper Holger Hoos ½ Computer Science Department University of British Columbia Email: hoos@cs.ubc.ca Thomas Stützle IRIDIA Université Libre

More information

PLAANN as a Classification Tool for Customer Intelligence in Banking

PLAANN as a Classification Tool for Customer Intelligence in Banking PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Despite its emphasis on credit-scoring/rating model validation,

Despite its emphasis on credit-scoring/rating model validation, RETAIL RISK MANAGEMENT Empirical Validation of Retail Always a good idea, development of a systematic, enterprise-wide method to continuously validate credit-scoring/rating models nonetheless received

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Title: Domain Knowledge Integration in Data Mining using Decision Tables: Case. Studies in Churn Prediction

Title: Domain Knowledge Integration in Data Mining using Decision Tables: Case. Studies in Churn Prediction 1 Title: Domain Knowledge Integration in Data Mining using Decision Tables: Case Studies in Churn Prediction Elen Lima (corresponding author) School of Management University of Southampton Southampton

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

More information

NEURAL NETWORKS IN DATA MINING

NEURAL NETWORKS IN DATA MINING NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Banking Analytics Training Program

Banking Analytics Training Program Training (BAT) is a set of courses and workshops developed by Cognitro Analytics team designed to assist banks in making smarter lending, marketing and credit decisions. Analyze Data, Discover Information,

More information

HYBRID ACO-IWD OPTIMIZATION ALGORITHM FOR MINIMIZING WEIGHTED FLOWTIME IN CLOUD-BASED PARAMETER SWEEP EXPERIMENTS

HYBRID ACO-IWD OPTIMIZATION ALGORITHM FOR MINIMIZING WEIGHTED FLOWTIME IN CLOUD-BASED PARAMETER SWEEP EXPERIMENTS HYBRID ACO-IWD OPTIMIZATION ALGORITHM FOR MINIMIZING WEIGHTED FLOWTIME IN CLOUD-BASED PARAMETER SWEEP EXPERIMENTS R. Angel Preethima 1, Margret Johnson 2 1 Student, Computer Science and Engineering, Karunya

More information

Ant Colony Optimization (ACO)

Ant Colony Optimization (ACO) Ant Colony Optimization (ACO) Exploits foraging behavior of ants Path optimization Problems mapping onto foraging are ACO-like TSP, ATSP QAP Travelling Salesman Problem (TSP) Why? Hard, shortest path problem

More information

Network Load Balancing Using Ant Colony Optimization

Network Load Balancing Using Ant Colony Optimization Network Load Balancing Using Ant Colony Optimization Mr. Ujwal Namdeo Abhonkar 1, Mr. Swapnil Mohan Phalak 2, Mrs. Pooja Ujwal Abhonkar 3 1,3 Lecturer in Computer Engineering Department 2 Lecturer in Information

More information

Running Time Analysis of ACO Systems for Shortest Path Problems

Running Time Analysis of ACO Systems for Shortest Path Problems Running Time Analysis of ACO Systems for Shortest Path Problems Christian Horoba 1 and Dirk Sudholt 1,2 1 Fakultät für Informatik, LS 2, Technische Universität Dortmund, Dortmund, Germany, lastname@ls2.cs.tu-dortmund.de

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

An Integer Programming Model for the School Timetabling Problem

An Integer Programming Model for the School Timetabling Problem An Integer Programming Model for the School Timetabling Problem Geraldo Ribeiro Filho UNISUZ/IPTI Av. São Luiz, 86 cj 192 01046-000 - República - São Paulo SP Brazil Luiz Antonio Nogueira Lorena LAC/INPE

More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information