Recovery Rate Modelling of Nonperforming Consumer Credit Using Data Mining Algorithms


Markus Hoechstoetter, Abdolreza Nazemi, Svetlozar T. Rachev and Caslav Bozic

RMI Working Paper No. 12/09
Submitted: September, 2012

Abstract

There have been more studies on recovery rate modeling of bonds than of personal loans and retail credit. As far as the authors are aware, there exists no research on recovery rate modeling in retail credit for third-party buyers. The goal of this paper is to fill this gap. Our study uses data on over nine million defaulted or nonperforming consumer credits provided by a German debt collection company. According to the findings, the optimum times of the collection processes are not the same for all industries. Moreover, from a variety of characteristics, those debtor characteristics that are most significant in predicting the recovery have been determined. To select the best prediction and classification model, a variety of statistical and data mining methods such as logistic regression, neural network, K-nearest neighbor, CHAID, CART, Support Vector Machine and regression are examined. A two-stage model is applied which first classifies debts into extreme and non-extreme recovery rates; then, the extreme debts are classified into full payment and nonpayment. Moreover, the non-extreme recovery rates are predicted.

Keywords: Recovery rate, Data Mining, Third-party Company

Markus Hoechstoetter, School of Business Engineering, Karlsruhe Institute of Technology, Germany
Svetlozar T. Rachev, College of Business, Stony Brook University, New York, USA
Abdolreza Nazemi, National University of Singapore, Risk Management Institute
Caslav Bozic, School of Business Engineering, Karlsruhe Institute of Technology, Germany

Abdolreza Nazemi. Views expressed herein are those of the author and do not necessarily reflect the views of NUS Risk Management Institute (RMI).
Author affiliations: Markus Hoechstoetter (a), Abdolreza Nazemi (b), Svetlozar T. Rachev (c, d), Caslav Bozic (a).
(a) School of Business Engineering, Karlsruhe Institute of Technology, Germany
(b) Risk Management Institute, National University of Singapore, Singapore
(c) College of Business at Stony Brook University, New York, USA
(d) FinAnalytica, New York, USA

1 Markus Hoechstoetter is a postdoctoral researcher at the Chair of Statistics, Econometrics and Mathematical Finance at the School of Economics and Business Engineering, Karlsruhe Institute of Technology, Germany.
2 Abdolreza Nazemi is a postdoctoral researcher at the Risk Management Institute, National University of Singapore, Singapore.
3 Svetlozar Rachev is Professor at the College of Business, Stony Brook University in New York, and Chief Scientist of FinAnalytica.
4 Caslav Bozic is a doctoral student at the AIFB Institute of the Faculty of Economics and Business Engineering, Karlsruhe Institute of Technology, Germany, and a research and development engineer at Avedas AG, Karlsruhe, Germany.
1. Introduction

The latest, or should we say current, crisis is still present in various aspects of life. The damage to the corporate world can be looked up in Standard & Poor's (2011), for example. However, individuals have also suffered from the crisis as many employees were laid off and, to make things worse, banks began to suppress lending on a large scale, which resulted in the credit crunch. This is even more worrying since borrowing or, more generally, being in debt has become the omnipresent situation worldwide. For example, in the USA, corporate and consumer debt has reached dizzying levels. According to the Board of Governors of the Federal Reserve System (2011), the accumulated debt amounts to over 10 trillion dollars. This trend, however, is not unique to the USA. In Europe, the trend is similar, as pointed out by Thomas (2009); and, if this trend is not reversed, the need for the expansion of lending will remain a pressing issue. It is obvious that there has to be a sophisticated system that will enable credit to be extended and guarantee that the impact of defaulted debt will be cushioned. Hence, there is a need for third-party buyers to relieve originators from the stressful collection business.

In the context of lending and the inherent risk, the terminology and definitions given in the Basel II Accords are widely used. Mainly, the terms 'expected loss', 'loss given default' (the loss conditional on default) and 'recovery rate' are commonly used in the context of financial debt. Before the regulation in the Basel Accords was constituted, lenders in the retail sector designed a now widely used tool, the credit scorecard. This device helps to assess the probability of a new customer defaulting. The actual score is usually a three-digit number. It is computed from the set of consumer characteristics. The scorecard helps to distinguish between potentially good and bad borrowers, i.e., those who do not default and those who do.
As references, we recommend Hand (2001) and Thomas et al. (2002). However, this tool has not proved reliable in predicting complete loss or recovery once the debtor has defaulted on his/her obligations. Instead, logistic regression has been applied more often in the analyses of recovery rates. A problem arising from the definitions in the Basel Accords, however, is that parameters are sometimes not uniquely specified, leaving room for interpretation and diverging implementations. For example, the recovery rate as the percentage of the original amount of debt outstanding that is repaid to or repossessed by the lender can be stated either as the price the distressed debt would achieve if sold immediately after default or as the sum of discounted future payments received from the debtor.

The main interest of this paper is the prediction of the recovery rate or, conversely, the loss given default (LGD). Practitioners as well as researchers agree that the recovery rate has to be modelled by some quantity that is restricted to the interval between zero and one. But there are no standardized models for the modelling and prediction of the recovery rate. This is common to basically all industries. So, some models include macroeconomic variables, which is in line with the Basel Accords, while others are restricted to information on the debt. Most of all, the
methodological side suffers from a lack of freely accessible data, thus hampering the production of reliable results of general value. Moreover, since the collection process can be carried out by the original lender, i.e., in-house, as well as by some third party acting either as agent or as purchaser of the debt, the models have to consider the difference in accessibility of relevant information for the two options. In the model, we will use only debtor-related variables without macroeconomic factors. Our findings will reveal that the age of the debtor and the amount of debt outstanding are important determinants of the recovery rate. We furthermore see as one of our main contributions the presentation of results on recovery rates from various non-banking industries on a scale that, to our knowledge, has not been presented before in the literature.

The remainder of the paper is organized as follows. The following section presents the features of the different options of the lender with respect to collection and ownership of the debt. In section 3, statistical and data mining methods will be reviewed in brief. Then, we present our data in section 4. The important results of the exploratory data analysis are set out in section 5. We discuss the recovery rate modelling using data mining methods in section 6. This is followed by an explanation of our results in section 7.

2. Recovery and collection process

In this section, we will present the different collection options that exist for a lender, and their implications for LGD, an excellent account of which is given in Thomas et al. (2011). Generally, collecting debt and recovering it after default consists of a sequence of measures such as written reminders, more specific letters, telephone calls and, ultimately, legal action such as court orders and sending marshals. The different measures commonly result in various actions by the debtor. Often, as a result of a telephone call, creditor and debtor negotiate a repayment plan.
It is believed by most collection entities that a telephone call is preferable to an appearance in person because the conversation is still personal while the debtor is not confronted with a physical intrusion. Usually, the collection process becomes more robust as default continues; in the public perception this is commonly attributed to a debt collector's alleged rude demeanour and it may even discredit the entire lending industry. Proof of this public disapproval of the loan and collection business has been appearing in abundance in the press since the culmination of the latest crisis.

Unless hindered by the legal framework, a lender might consider any of three basic collection options during the life of the lending relationship, the first of which is the internal (or in-house) collection and recovery process. Lenders usually resort to this, at least in the beginning. Secondly, a lender may decide to remain owner of the debt but assign collection to an agent. The third option is to sell the debt to a third party and, hence, end the relationship with the borrower. The measures taken may be the result of discretionary considerations relating to certain customers only, or part of a standardized provision formulating automated procedures. For as long as repayment is
on schedule, the lender will retain the collection in his business. Also, if the lender determines, based on his experience, that he is better able than an outside agent to recover the debt after default, and also because the relationship with the defaulting borrower is valuable to him, he will opt to collect it himself. When the collection process is too involved for the creditor or might harm his reputation because the language to be applied at a certain stage of collection may have to become more direct, the lender may choose to hand the collection process to an outside agent while retaining ownership of the debt. If default seems more likely or the ownership of the distressed debt causes additional aggravation for the lender, he will probably sell the debt at a discount to a third-party buyer. The price will have to be lower than the recovery value expected by the buyer.

An advantage of internal collection may be that all characteristics concerning the debt are known, whereas a third-party buyer lacks important information such as loan details, borrower repayment behaviour or change in score, which is a privilege of the original lender, according to Fama (1985). The third-party buyer does receive essential information on exactly when default occurred, the exact amount outstanding, and when the last payment was made. The third-party buyer, like the outside agent, however, may suffer from a negative selection because he will most likely only have access to poorly performing debt, as argued in Ramakrishnan and Thakor (1984). According to Thomas et al. (2011), it is not surprising that only 7% repaid the whole debt, 16.3% paid a fraction, and the vast majority, i.e. 83%, did not pay anything when the collection was carried out by a third party. This is in sharp contrast to the outcome when collection was undertaken by the in-house collection department: 30% repaid the total amount, while 60% and 10% repaid only a share or nothing, respectively.
In our results section, we will provide very detailed information on recovery rates faced by a third-party buyer of nonperforming consumer debt.

3. Statistical and Data Mining Models

3.1. Regression

3.2. Logistic Regression

Logistic regression is applied to quantitative explanatory variables when the response variable is categorical. Let us define a binary random variable as

$$Y = \begin{cases} 1 & \text{if default occurs} \\ 0 & \text{if default does not occur} \end{cases}$$

with $\pi = \Pr(Y = 1)$ and $1 - \pi = \Pr(Y = 0)$, which is the famous example of occurrence of default. The multiple logistic regression is as follows:

$$\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} = \frac{\exp(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}{1 + \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)} \qquad (1)$$
where $\beta$ is the vector of coefficients and $X$ is the matrix of observations.

3.3. Neural Network

The design of the neural network is especially appealing because of its several layers of perceptrons. Common to any design are an input layer, one or more hidden layers of neurons, and an output layer. In the simplest version with just one hidden layer, input data consisting of observations $x_j$ of $j = 1, 2, \dots, d$ variables enter neuron $i$ of the hidden layer to be transformed there into a weighted functional output

$$h_i = f^{(1)}\Big(b_i + \sum_{j=1}^{d} w_{i,j}\, x_j\Big)$$

with weights $w_{i,j}$ and neuron-specific constant $b_i$. Output from all $n_h$ hidden neurons is then turned into network output

$$y = f^{(2)}\Big(b^{(2)} + \sum_{i=1}^{n_h} v_i\, h_i\Big)$$

with neuron weights $v_i$. The neural network allows for a flexible yet sometimes unintuitive design. This technique is particularly apt in separating samples with respect to objective functions such as, for example, zero or full recovery.

3.4. K-nearest neighbor

KNN is a popular method in data mining and is used for classification based on the closest observations in the variable space. The K-nearest neighbor method was introduced in the early 1950s. KNN can also be applied for prediction (see Han and Kamber, 2006). Suppose that the learning sample has $n$ attributes, which means each learning sample is represented as a point in $n$-dimensional space. To classify a new sample, KNN looks for the $K$ learning samples that are nearest to the new sample. It then assigns the new sample to the majority class among these $K$ nearest neighbors. When $k = 1$, the class of the new sample is the same as the class of the learning sample nearest to it. The Euclidean distance or the Mahalanobis distance is applied as a distance metric. The Euclidean distance between two points or tuples $X_1 = (x_{11}, x_{12}, \dots, x_{1n})$ and $X_2 = (x_{21}, x_{22}, \dots, x_{2n})$ is defined as

$$D_E(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
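The majority vote described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`euclidean`, `knn_classify`), the two normalized features, and the class labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, labels, query, k=3):
    """Assign `query` to the majority class among its k nearest learning samples."""
    # Rank learning samples by their distance to the new sample.
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical learning sample with two normalized attributes per debt.
train = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.15, 0.25)]
labels = ["paid", "paid", "unpaid", "unpaid", "paid"]

prediction = knn_classify(train, labels, (0.2, 0.2), k=3)
print(prediction)
```

An odd `k` avoids ties in a two-class problem; with `k=1` the function reduces to the nearest neighbor rule.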
For calculation, we usually use the normalized variables. Suppose that, for $X = (x_1, \dots, x_n)^T$, the covariance matrix is equal to $\Sigma$ and $\mu = (\mu_1, \dots, \mu_n)^T$ is the mean vector; then the Mahalanobis distance is defined as

$$D_M(X) = \sqrt{(x - \mu)^T\, \Sigma^{-1}\, (x - \mu)}$$

The Mahalanobis distance is based on correlations between variables by which different patterns can be identified and analyzed. It is a useful way of determining similarity of variables. The Euclidean distance applies in many classification problems. In KNN, when the value of a variable is missing, KNN uses the maximum difference between two samples; this means that, if both values of a normalized variable in samples $X_1$ and $X_2$ are missing, the difference is assumed to be 1. On the other hand, if only one of them is missing and the other one has a value $b$, the difference is taken to be $|1 - b|$ or $|b|$, whichever is larger. For a categorical variable, KNN assumes it to be 1.

For the K-nearest neighbor algorithm, we need to determine the number of nearest neighbors and the distance measure. The best value of $k$ depends only on the data and we can find it by comparing error rates for different values of $k$. In general, the larger the value of $k$, the more stable the classification model. However, larger values of $k$ mean that the learning samples now being included are not very close to the new sample. If $k$ is equal to 1, then the class of the new sample is predicted as the class of the closest learning sample; this is called the nearest neighbor algorithm and it makes a rather unstable classifier.

3.5. Trees

Breiman and Friedman introduced recursive partitioning algorithms in …. Decision trees are usually classified into two groups: classification trees and regression trees. In classification trees, the target variable is categorical or qualitative, but we can also use classification trees when the dependent variable is continuous. Thomas et al.
(2002) mentioned that classification trees were applied in credit scoring by Makowski in …. The basic idea of tree construction is to find subsets with maximum homogeneity, that is, subsets whose cases belong to only one class of the target variable. At each step of splitting, tree algorithms split cases on the independent variable that yields maximum homogeneity.

6 See Giudici and Figini (2009), Hand et al. (2001), Han and Kamber (2006).
7 See Thomas et al. (2002), Giudici and Figini (2009).

We define the impurity of a node as a function of the class probabilities in the node under consideration:

$$i(t) = \phi(p_1, p_2, \dots, p_J)$$
where $p_j$ is the probability of cases belonging to class $j$. There are different kinds of impurity function with these characteristics. One of the principal differences between tree algorithms is related to the impurity function. Breiman et al. (1984), Deville (2006), Giudici (2003) and Thomas et al. (2002) pointed out certain impurity functions, for example, Gini

$$i(t) = \sum_j p(j \mid t)\,\big(1 - p(j \mid t)\big),$$

entropy

$$i(t) = -\sum_j p(j \mid t)\,\log p(j \mid t),$$

or the maximized half-sum of squares

$$\text{Chi} = \frac{n(l)\, n(r)}{n(l) + n(r)}\,\big(p(0 \mid l) - p(0 \mid r)\big)^2,$$

where $n(r)$ and $n(l)$ are the number of observations in the right and left nodes. A large value of the $\chi^2$ statistic Chi means that the two proportions are not the same. The quality of a split is defined as the reduction of impurity that the split obtains:

$$\Delta i = i(v) - \big[\pi(l)\, i(l) + \pi(r)\, i(r)\big]$$

where $\pi(l)$ and $\pi(r)$ are the observed proportions of observations assigned to the left and right nodes. In fact, tree algorithms select the variable that has the best quality of split. Finally, tree algorithms label leaf nodes based on the majority class of the target variable. In regression trees, the fitted value $\hat{y}_i$ is equal to the mean of the dependent variable over the observations in the leaf node under consideration. Classification and regression trees (CART) are the most common tree algorithms (Breiman et al., 1984). In CART, the target variable can be categorical or continuous. The impurity function of CART is assumed to be Gini or entropy. Chi-square Automatic Interaction Detection (CHAID) was developed by Kass in …. Its impurity is assumed to be chi-square.

3.6. Support Vector Machine

The SVMs are used to separate debtors into two categories ($y = 1$ or $y = -1$) based on some hyperplane threshold with perpendicular vector $w$ maximizing the minimal distance of each of the two groups from the threshold. With the optimal hyperplane, the training data keep a minimum distance of $b$ from the hyperplane to guarantee generality of the model.
The optimization problem using all $n$ observations $(y_i, x_i)$, $x_i \in \mathbb{R}^d$, is thus given by

$$\min_{w,b}\ \frac{\|w\|^2}{2}, \quad \text{s.t. } y_i\big(\langle w, x_i \rangle + b\big) \ge 1, \quad i = 1, 2, \dots, n \qquad (2)$$

or in the dual form

$$\max_{a}\ \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i,j=1}^{n} a_i a_j y_i y_j \langle x_i, x_j \rangle, \quad \text{s.t. } \sum_{i=1}^{n} a_i y_i = 0,\ a_i \ge 0 \qquad (3)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product. The separating rule is then given by $f(x) = \operatorname{sign}(\langle w, x \rangle + b)$ or, equivalently, $f(x) = \operatorname{sign}\big(\sum_{i=1}^{n} a_i y_i \langle x_i, x \rangle + b\big)$. A problem occurs if the data are not linearly separable as required. To this end, the original data vector $x \in \mathbb{R}^d$ is mapped into a higher dimensional ($K > d$) feature space with a nonlinear function $\varphi : \mathbb{R}^d \to \mathbb{R}^K$, $x \mapsto \varphi(x)$. To circumvent the calculations of the inner products and associated dot products in the higher dimension, the so-called kernel trick is applied, requiring only computation
of kernel functions $k(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ for the dot products. Thus, the transformation into the higher dimensional space can actually be avoided. The resulting separating function is now $f(x) = \operatorname{sign}\big(\sum_{i=1}^{n} a_i y_i\, k(x_i, x) + b\big)$. Common kernel functions are, for example, polynomial, $k(x_i, x) = \langle x_i, x \rangle$, or radial basis, $k(x_i, x) = \exp(-\|x_i - x\|^2 / c)$. The authors state that the advantages are given by the use of key observations only for the sake of speed, the translation of the discrimination problem into a quadratic problem, and the projection of the original problem onto a higher dimensional space to apply a linear discrimination function. They begin the modelling with a stepwise selection process of the most powerful variables to separate the data set into homogeneous subsets.

LSSVM is a version of SVM that conducts a linear regression of the form $y = w^T \varphi(x) + b + \varepsilon$, with the original data $x$ mapped into a higher feature space by $\varphi$ to obtain a higher degree of linearity. Using a kernel $K(x, x_i) = \varphi(x)^T \varphi(x_i)$ simplifies the optimization in the preferred dual form $y = \sum_{i=1}^{n} a_i\, \varphi(x)^T \varphi(x_i) + e$.

3.7. Dmneural

As far as we are aware, dmneural network training is not a very popular method in data mining, but we will compare the performance of this model to other models. In the learning dataset, dmneural specifies the principal components of the independent variables that best explain the variation in the response variable; consequently, it chooses the best group of independent variables for prediction or classification of the response variable. Dmneural omits the independent variables that carry less information for prediction of the response variable. By construction, the principal components are uncorrelated. An activation function is applied to the linear combination of independent variables and principal components. Matignon (2007) points out eight different activation functions, among them Gaussian, logistic, exponential and square.
The misclassification rate in the classification problem and the sum of squared errors in the prediction problem are used in specifying the best activation function in the next phases. Dmneural uses the response variable in the first step and the residuals in the subsequent steps of prediction or classification. Matignon (2007) points out the following additive nonlinear model as the dmneural model:

$$\hat{y} = \sum_{i=1}^{n_{\text{phases}}} g\big(f(x, \alpha)\big)$$

where $g$ is the link function and $f$ is the best activation function in phase $i$.
4. Data description

4.1. Data provider

Our data consist of close to ten million different unsecured debts purchased between 2001 and 2010 by arvato infoscore, one of the largest debt purchasers in Germany. The company combines a collection business (German: Inkasso), scoring services, and factoring. Factoring, as known in Germany, is a particular form of third-party financial service for originator lenders. The most common variations are full-service, selective, notification, semi-factoring, and silent factoring.

In full-service (normal) factoring, the debt buyer, i.e., the factor, receives all debt from the originator in an automatically revolving process agreed upon in advance. The factor is owner as well as collector of the debt after its cession from the originator. It is the most common form of factoring in Germany. Selective factoring describes a construct where only selected debt is sold off to the third-party factor. When the third party offers notification factoring, the debtor is informed about the sale of the debt and can only repay to the third-party factor. The default risk, however, remains with the originator which, in the case of default, has to reimburse the factor. In silent factoring, the debtor is ignorant of the sale of the debt and payment is only possible to the original creditor. A negative consequence for the factor is a lack of influence over the debtor since he is not entitled to collect. And finally, when semi-factoring is agreed between originator and factor, the debtor remains ignorant of the sale of the debt as well, but payments are to be made exclusively to accounts or addresses that belong to the factor. In the case of arvato infoscore, the company engages in full-service factoring.
Although a legally separate entity, in the case of collection and scoring businesses combined in the same company, the third-party buyer has the advantage that the collection department has often developed long-lasting relationships with debtors during the period of initial ownership by the respective originators. These relationships yield precious information the third-party buyer would not have access to under any other circumstances. However, legally, this is sometimes limited to the information that would be offered to any third-party buyer. In the following, we consider data only accessible to regular outside buyers.

4.2. Data

The data consist of roughly ten million defaulted or nonperforming unsecured receivables from nine different categories that are customers of the third-party buyer. On the one hand, these categories represent the following industries: telecommunications, online shopping and mail ordering, financial services including credit cards, and the utility and energy sector. Moreover, receivables from the non-profit entities of the public sector (community services) and public transport are also part of these categories, as are failed return debit notes as well as anything that does not fit into any of the prior categories; these are subsumed in the miscellaneous category including,
for example, unpaid parking tickets. In the following, we will use these abbreviations to indicate the respective industries: mail order (MO), business-to-business (B2B), energy and utilities (NRGY), financial services (FS), miscellaneous (MI), public sector (PS), return debit note (RDN), telecommunications (TC), and public transport (PT).

Each debtor is assigned a unique identification number. For each receivable a unique identification number is issued, and all payments on the account of a particular receivable have to be labelled with the respective identification number. The relationship between receivable and borrower is not unique since a borrower might have defaulted on more than one receivable in arrears, whereas a receivable in arrears can only belong to one borrowing entity. A payment is characterized by the identification number of the receivable and, thus, can be traced to the corresponding debtor.

Furthermore, we selected from all given payment characteristics those that could most easily be transformed into a numerical variable or a categorical variable of low dimension. The resulting variables relating to the debtor include age, gender, residential status and address, as well as current credit history. The variables related to the accounts receivable include age of debt, date of purchase by the third party, amount outstanding and last payment date, while the original receivable amount is usually unknown. This yields about 15 variables that can be used for the subsequent analysis. For example, information on the quality of the location of the residence, which can be obtained by transforming the postal code into a rating, has not yet been considered. Henceforth, we will use the terms 'category' and 'industry' interchangeably. In Table 1, we have presented the most important statistics of the data sorted by industry. It also contains some initial results that we will discuss further in the last section.
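The identifier scheme described above (one debtor, possibly several receivables, payments labelled with the receivable's identification number) can be sketched as a small data model. This is a minimal illustration, not the data provider's schema: the class and field names (`Receivable`, `debtor_id`, `amount_outstanding`), the sample values, and the clipping of the recovery rate to the interval [0, 1] are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Receivable:
    receivable_id: str          # unique per receivable
    debtor_id: str              # exactly one debtor per receivable
    amount_outstanding: float   # amount outstanding at purchase
    payments: list = field(default_factory=list)


def book_payment(receivables, receivable_id, amount):
    """A payment carries only the receivable id; the debtor follows from it."""
    receivables[receivable_id].payments.append(amount)


def recovery_rate(receivable):
    """Share of the outstanding amount that was repaid, clipped to [0, 1] (assumption)."""
    if not receivable.amount_outstanding > 0:
        raise ValueError("amount outstanding must be positive")
    ratio = sum(receivable.payments) / receivable.amount_outstanding
    return min(max(ratio, 0.0), 1.0)


def receivables_by_debtor(receivables):
    """One debtor may hold several receivables, never the other way round."""
    grouped = defaultdict(list)
    for r in receivables.values():
        grouped[r.debtor_id].append(r.receivable_id)
    return dict(grouped)


# Hypothetical sample: debtor D1 defaulted on two receivables, D2 on one.
receivables = {
    "R1": Receivable("R1", "D1", 200.0),
    "R2": Receivable("R2", "D1", 50.0),
    "R3": Receivable("R3", "D2", 80.0),
}
book_payment(receivables, "R1", 50.0)
book_payment(receivables, "R1", 50.0)
book_payment(receivables, "R2", 60.0)   # overpayment, clipped to full recovery

rates = {rid: recovery_rate(r) for rid, r in receivables.items()}
print(rates)
```

Keying payments by receivable rather than by debtor mirrors the text: the debtor is always recoverable from the receivable, while the reverse lookup is one-to-many.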
As we can see from adding the values of `# debts (original)', the total number of receivables is 9,793,590. Because our computational capacity was limited at the time we received the data, we decided to use only 100,000 randomly selected receivables from the mail order industry and only 500,000 randomly selected receivables from the public transport category. We use the complete dataset for the other categories. We assume that the means of debts in the complete datasets of the mail order and public transport categories are the same as in their samples; thus, the amount of debt outstanding is 1,248,266,… Euros. The recovery rate is not the same in all categories, and the mean recovery rate ranges between … and …. The mean recovery rate of the public sector is the lowest and the mail order category has the highest mean recovery rate. The financial services category has the highest mean debt at … and one of the lowest mean recovery rates at …. In Table 1, the difference between the mean debt age and the mean debt age at the third party gives the duration between the occurrence of default and the sell-off to the third-party company. The categories with shorter periods between default and sell-off to the third-party company have seen more recovery than the categories with longer periods between default and sell-off. For example, these periods are around 43 and 23 months in financial services and the public sector, whereas for mail order and telecommunications they are around five months. The payments that did not convey reliable information on all the
used variables were discarded from further analysis. The missing information is indicated by the respective superscripts of the industry. This was age of debtor, age of receivable, or the identity of the debtor. Moreover, we cleaned with respect to `Earliest entry' and `Last entry' per industry, since there were unreasonable values, most likely the result of laxity during data entry. Eliminating these outliers resulted in the new values as presented here. At this point, it becomes obvious how important the quality of the data is for the third-party buyer since he generally has no means of validating and, if necessary, correcting them. Nevertheless, the third-party buyer has to cope with many flaws in the data.

TABLE 1

5. Exploratory data analysis

We use the complete data sets except for the mail order and public transport industries. Because our computational capacity was limited at the time we received the data, we decided to use only 100,000 observations randomly selected from mail order and 500,000 observations from the public transport section. The mail order section has the highest recovery rate, that is …, and the public sector has the lowest recovery rate in the portfolio, that is …. Table 2 shows the quantiles of debt amount, recovery rate, time until full payment by debtors who paid fully, and time of last payment by debtors who paid at least something. The very important result from this table is that the optimum time of the collection process depends on the industry. As an illustration, in financial services 99% of fully paying debtors paid fully before 48 months and 90% of them paid fully before 28 months; meanwhile, 99% of non-fully-paying debtors did not pay more after 39 months and 90% of them did not pay more after 26 months. In contrast, 99% of fully paying debtors paid fully before 11 months and 90% of them paid fully before 4 months in the miscellaneous category.
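The kind of per-industry cut-off read from Table 2 can be sketched as a quantile computation over the time until full payment. The example below is only an illustration under stated assumptions: the payment times are invented rather than taken from the data set, the helper `percentile` is hypothetical, and the nearest-rank definition of a percentile is one of several possible conventions.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    # ceil(p * n / 100) computed in integer arithmetic to avoid rounding issues
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical months until full payment per industry (not the paper's data).
months_to_full_payment = {
    "FS": [3, 8, 12, 15, 20, 22, 25, 27, 30, 47],
    "MI": [1, 1, 2, 2, 2, 3, 3, 3, 4, 10],
}

cutoffs = {
    industry: (percentile(months, 90), percentile(months, 99))
    for industry, months in months_to_full_payment.items()
}

for industry, (p90, p99) in cutoffs.items():
    print(f"{industry}: 90% fully paid within {p90} months, 99% within {p99} months")
```

Comparing the two tuples per industry shows how a collection horizon that suits one category would be far too long, or too short, for another.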
Additionally, 90% of non-fully-paying debtors did not pay more after 6 months and 99% of them did not pay more after 12 months in the miscellaneous category. In other words, the time for calculation of the final recovery rate and the reasonable collection process time differ between categories.

We also analyze the recovery rate distributions for the horizons of 12, 24 and 36 months (for each horizon, only receivables with an age greater than or equal to the horizon are included). Our conclusion is that, for all nine industries, the variation in the respective distributions is minimal, with mass slightly shifting from RR = 0 to RR = 1 since more debtors pay off debts as time progresses. After one year, representing most of the receivables, the frequency of RR = 0 is slightly over 60% while, after three years, this frequency decreases minimally to just below 60%. So, either the first year successfully predicted the recovery rate or three years are not long enough as a horizon, since we have censored data as payments are observed on debt that is much older
than the period considered by our scope. So far, our findings appear to contrast with those of the bank loan data.

TABLE 2

6. Recovery rates modelling

The above-mentioned literature, mostly on bank loan data, reports fairly high recovery rates of 60% or more, on average. This may be due to two factors. First, the collection may have been retained by the banks; second, banks tend to have an advantage since they have insight into the borrower's financial situation which lenders from other industries fail to acquire. This is argued by Fama (1985), for example. Since our data are from a non-bank third-party buyer, we expect rather low recovery rates. From Table 1, we see that this is justifiable given that recovery rates are below 40% and even below 30% in many cases. In the next analysis, we consider the empirical distribution of the recovery rate across the nine different industries. It is apparent that nearly all probability mass is at RR = 0 and RR = 1. Across all nine industries, the majority, by far, of the recoveries are equal to 0. We hypothesize that this is the result of the relatively low average debt amounts (EAD), except for financial services (FS).

After the univariate exploratory analysis, we turn to modelling. We apply a two-stage model which first classifies debts into extreme and non-extreme recovery rates; we then classify the extreme debts into full payment and non-payment. Moreover, the non-extreme recovery rate is predicted. We therefore have classification problems in the first two steps, as the target variable is binary and the goals are to classify whether a defaulted debt will be extreme or non-extreme and whether an extreme debt will end in full payment or non-payment. We also have a prediction problem in the final step, as the response variable is continuous and the aim is to predict non-extreme recovery rates. As pointed out before, the complete datasets are used in the modelling step, except for the mail order and public transport industries.
We use only 100,000 randomly selected items from the mail order section and 500,000 observations from the public transport section. We delete outlier samples and impute the missing values in the data cleaning phase. After data cleaning, we randomly divide each dataset into two sets: a training (or learning) dataset and a validation dataset. The training datasets contain 70% of the observations and the validation datasets contain 30% of the debts in each industry. We used stratified sampling with equal sizes based on the target variable for training and validation. The number of observations decreases because the datasets are not balanced with respect to the response variable. We build each model on the training dataset and then evaluate it on the validation dataset to classify the debts. The misclassification rate is one of the usual criteria in classification model comparison.
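The sampling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `balanced_split`, the toy records and the choice to downsample every class to the size of the rarest class are all hypothetical, chosen only to show why the number of observations shrinks when the classes are unbalanced.

```python
import random
from collections import defaultdict

def balanced_split(rows, label_of, train_frac=0.7, seed=0):
    """Downsample each class to the rarest class size, then split each class 70/30."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    # Equal class sizes: every class is cut down to the smallest class (assumption).
    n_per_class = min(len(members) for members in by_class.values())
    train, valid = [], []
    for members in by_class.values():
        sample = rng.sample(members, n_per_class)  # random draw without replacement
        cut = round(train_frac * n_per_class)
        train.extend(sample[:cut])
        valid.extend(sample[cut:])
    return train, valid

# Toy data (hypothetical): 20 debts with label 1 and 80 debts with label 0.
debts = [{"id": i, "extreme": int(i < 20)} for i in range(100)]
train, valid = balanced_split(debts, lambda d: d["extreme"])
print(len(train), len(valid))  # 40 of the 100 debts survive the balancing
```

In this toy case 60 of the 80 majority-class debts are discarded, which mirrors the reduction in observations mentioned in the text.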
Data mining algorithms are used for the classification steps: neural network, CART, CHAID, K-nearest neighbor, dmneural, logistic regression and Support Vector Machine. Thereafter, neural network, CART, CHAID, regression and Support Vector Machine are applied as prediction models in the final step. In each step, we compare models using R-squared, the ROC curve, the misclassification rate, the average squared error and the sum of squared errors. However, the basic criteria are the misclassification rate and R-squared.

6.1 Classifying debts as extreme and non-extreme

Models building and comparisons

We come to the modelling of the recovery rate by means of the well-known logistic regression model, i.e. the recovery rate RR is modelled as a nonlinear transform of a linear model including real-valued and coded categorical data. The target variable is binary: 0 indicates non-extreme recovery and 1 indicates extreme recovery, where extreme recovery comprises full payment and non-payment. As an initial step in selecting the individually most significant debtor-related variables, we perform the logistic regression for each variable alone and assess its explanatory ability through an $R^2$ measure. This yielded the following set of seven variables individually most significant for predicting RR: the debt amount (debt outstanding at sale), debtor age, prop title, debt date until selloff (time between default and purchase by third party), rating (classifying the creditworthiness into an ordinal rating with seven levels), address (the validity of the debtor's address), and debtor type (either male, female or corporate entity). We choose a logistic regression model so a modelling selection procedure is not applied.
This yielded the logistic regression model

$$\log\frac{P(Y=1)}{P(Y=0)} = \mu + \alpha\cdot\text{amount} + \beta\cdot\text{debtor age} + \gamma\cdot\text{prop title} + \delta\cdot\text{debt date till selloff} + \rho_1\cdot\text{rating}_1 + \rho_2\cdot\text{rating}_2 + \rho_3\cdot\text{rating}_3 + \rho_4\cdot\text{rating}_4 + \rho_5\cdot\text{rating}_5 + \rho_6\cdot\text{rating}_6 + \rho_7\cdot\text{rating}_7 + \varphi\cdot\text{addresok} + \theta_1\cdot\text{debtor type}_1 + \theta_2\cdot\text{debtor type}_2 + \epsilon$$

Here, we use the dummy variables rating 1 through rating 6 as well as debtortype 1 and debtortype 2 for the categorical variables. Table 3 shows the maximum likelihood estimates of the logistic regression parameters in classifying debts as extreme and non-extreme corresponding to the final model, together with the statistical significance of the parameters. For the explanatory variables, when the p-value is lower than 0.05, the null hypothesis is rejected. This means these explanatory variables have a statistically significant influence on the response variable. Now, we want to interpret the logistic regression model. In our model, Y = 1 when the debt has an extreme recovery rate; thus, variables with negative coefficients decrease the probability of an extreme recovery rate and, inversely, variables with positive coefficients increase it. For example, in financial services, address ok, time between debt occurrence and selloff to the third-party company, and rating original equal to 5 have statistically significant positive effects on the probability of extreme recovery
rate. The probability of extreme recovery rate from debtors with address could be higher than the probability of extreme recovery rate from debtors without address. Debts with a longer time between default occurrence and selloff to the third-party company have a higher probability of extreme recovery rate. On the other hand, debts with 0, 2 and 4 rating original present a higher probability of non-extreme recovery rate than other debts. In summary, for financial services, the variables that increase the probability of non-extreme recovery rate are: not having an address; a longer period between default and selloff. Comparing with Table 5, we can see that the CHAID tree uses exactly the variables that have a statistically significant influence in the logistic regression. Moreover, the CART tree uses the variables that are significant in the logistic regression and debtor age.

TABLE 3

CART and CHAID algorithms are used as classification methods. The chi-squared statistic and the Gini index are the impurity measures for CHAID and CART, respectively. To obtain a parsimonious tree, we use a significance level of 0.2 in the stopping rule. Table 4 presents the results from the CART classification tree analysis for financial services. The total number of splitting variables in the CART classification tree for financial services is 6: address, rating, debt amount, debt date until selloff, prop title and debtor age. All the explanatory variables are used in the CART tree, except debtor type. The splitting variables in the CHAID classification tree for financial services are address, rating, debt amount, debt date until selloff, prop title and debtor age. The CART tree is slightly more complicated than the CHAID tree for financial services. As mentioned, we classify all the debts based on the majority of debts in each leaf. We can calculate the misclassification rate as a performance measure. In the next section, we will examine the misclassification rate of our models in training and testing datasets.
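To make the CART side of this description concrete, the sketch below computes the Gini impurity of a node, the impurity reduction of a candidate binary split, and the majority-vote label of a leaf. It is an illustration only: the function names and the toy labels are hypothetical, and it does not reproduce the software or the stopping rule used in the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(labels, goes_left):
    """Impurity reduction obtained by sending each item left or right."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def leaf_prediction(labels):
    """A leaf predicts the majority class of the debts it contains."""
    return Counter(labels).most_common(1)[0][0]

# Toy node (hypothetical): 1 = extreme recovery, 0 = non-extreme recovery.
extreme = [1, 1, 1, 0, 0, 0, 1, 0]
address_ok = [True, True, True, False, False, False, True, False]
print(gini(extreme))                    # impurity before the split
print(split_gain(extreme, address_ok))  # reduction from splitting on address
print(leaf_prediction([1, 1, 0]))       # majority label of a leaf
```

A tree-growing algorithm of this kind would evaluate `split_gain` for every candidate split and keep the one with the largest reduction.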
Before applying a neural network, we have to specify its structure. We used neural networks with one and two hidden layers consisting of between two and five neurons. For example, we examined neural networks with 2, 3, 4 and 5 neurons for the datasets. In general, neural networks are black boxes and we cannot interpret them. An important step in using K-nearest neighbor is the specification of the width K. This establishes the size of the neighborhood of the independent variables that is used to classify the target variable. We checked the misclassification rate of KNN for different values of K.

TABLE 4
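The check of KNN for different values of K can be sketched as below. This is a minimal pure-Python illustration under assumptions: Euclidean distance, majority voting, a single hypothetical feature and invented toy data; the paper does not state which distance or candidate values of K were used.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to the query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def misclassification_rate(y_true, y_pred):
    return sum(1 for y, p in zip(y_true, y_pred) if y != p) / len(y_true)

def best_k(train_x, train_y, valid_x, valid_y, candidates):
    """Validation misclassification rate for each K; smallest rate wins."""
    rates = {
        k: misclassification_rate(
            valid_y, [knn_predict(train_x, train_y, q, k) for q in valid_x]
        )
        for k in candidates
    }
    return min(rates, key=rates.get), rates

# Toy data (hypothetical); the last training point is deliberately noisy.
train_x = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,), (0.15,)]
train_y = [0, 0, 0, 1, 1, 1, 1]
valid_x = [(0.16,), (0.05,), (1.05,), (0.9,)]
valid_y = [0, 0, 1, 1]
k, rates = best_k(train_x, train_y, valid_x, valid_y, candidates=(1, 3, 5, 7))
print(k, rates)
```

In this toy case K = 1 follows the noisy point and K = 7 collapses to the overall majority class, so an intermediate K gives the lowest validation error.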
As mentioned before, we divide all the datasets into two sets: training (or learning) datasets and validation datasets. The training datasets contain 70% of the observations and the validation datasets contain 30% of the debts in each industry. We use stratified sampling with equal sizes in the target variable, which means that the numbers of extreme and non-extreme debts are the same in both the training and the validation datasets. We build each model on the training dataset and then evaluate these models on the validation dataset to classify the debts. The misclassification rate is one of the usual criteria in model comparison. We do not need to use cross-validation because our datasets are large; consequently, our results are stable. In Table 5, we show criteria such as the misclassification rate, the sum of squared errors in training and validation, and the average squared error in the validation datasets for the different industries. The misclassification rate in the validation dataset is the principal criterion for model comparison in this study. The CHAID classification tree is the best model for classifying the debts as extreme and non-extreme recovery in public sector, financial services and mail order. Based on the misclassification rate in the validation dataset, neural network is the best classifier for business-to-business, miscellaneous and return debit note. On the other hand, the CART classification tree is the best classification model for energy and utilities and public transport. The Support Vector Machine has the highest accuracy rate in the validation dataset for telecommunications. For example, Figure 1 shows the ROC curve for financial services. In a ROC plot, one model dominates another when its curve lies completely above the curve of the other model. The CHAID classification tree has the highest accuracy rate among our models across the different industries, at 0.805.
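The comparison criteria used here can be illustrated with a short sketch that builds ROC points from predicted scores, computes the area under the curve, and checks whether one curve dominates another. Everything in it is an assumption made for illustration: the scores are invented, the 0.5 cut-off is arbitrary, and dominance is checked on the step-function version of each curve at every false positive rate where either curve changes.

```python
def misclassification_rate(y_true, scores, threshold=0.5):
    """Share of debts whose thresholded score disagrees with the true label."""
    wrong = sum(1 for y, s in zip(y_true, scores) if (s >= threshold) != (y == 1))
    return wrong / len(y_true)

def roc_points(y_true, scores):
    """(false positive rate, true positive rate) for each score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def tpr_at(points, fpr):
    """Best true positive rate reachable without exceeding the given FPR."""
    return max(y for x, y in points if x <= fpr)

def dominates(a, b):
    """True when curve a is never below curve b."""
    grid = {x for x, _ in a} | {x for x, _ in b}
    return all(tpr_at(a, x) >= tpr_at(b, x) for x in grid)

# Toy validation labels and scores from two hypothetical models.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
scores_b = [0.9, 0.4, 0.7, 0.8, 0.3, 0.6, 0.55, 0.2]
roc_a, roc_b = roc_points(y_true, scores_a), roc_points(y_true, scores_b)
print(misclassification_rate(y_true, scores_a), auc(roc_a), auc(roc_b))
print(dominates(roc_a, roc_b))
```

When neither curve dominates, a single-number criterion such as the validation misclassification rate is still available, which is the principal criterion in this study.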
TABLE 5 and FIG 1

6.2 Classifying extreme debts to full payment and non-payment

Models building and comparisons

We now come to the modelling of the recovery rate by the logistic regression model. The response variable is binary: 0 indicates non-payment and 1 indicates full payment. The independent variables are the debt amount, debtor age, prop title, debt date until selloff, rating, address, and debtor type. This yielded the regression model

$$\log\frac{P(Y=1)}{P(Y=0)} = \mu + \alpha\cdot\text{amount} + \beta\cdot\text{debtor age} + \gamma\cdot\text{prop title} + \delta\cdot\text{debt date till selloff} + \rho_1\cdot\text{rating}_1 + \rho_2\cdot\text{rating}_2 + \rho_3\cdot\text{rating}_3 + \rho_4\cdot\text{rating}_4 + \rho_5\cdot\text{rating}_5 + \rho_6\cdot\text{rating}_6 + \rho_7\cdot\text{rating}_7 + \varphi\cdot\text{addresok} + \theta_1\cdot\text{debtor type}_1 + \theta_2\cdot\text{debtor type}_2 + \epsilon$$

Here, we use the dummy variables rating 1 through rating 6 as well as debtortype 1 and debtortype 2 for the categorical variables. The maximum likelihood estimates of the logistic regression parameters in classifying extreme debt are shown in Table 6.
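As a reading aid for the log-odds equation, the sketch below evaluates a linear predictor with dummy-coded rating levels and converts it into a probability of full payment. The coefficients are hypothetical placeholders chosen for illustration; they are not the estimates from Table 6, and only a subset of the explanatory variables is included to keep the example short.

```python
import math

# Hypothetical coefficients for illustration only (NOT the Table 6 estimates).
COEF = {
    "intercept": 0.5,
    "amount": -0.0004,
    "debtor_age": 0.01,
    "addresok": 0.8,
    "rating_5": -0.6,  # dummy for rating level 5
    "rating_6": -0.9,  # dummy for rating level 6
}

def linear_predictor(debt):
    """Right-hand side of the log-odds equation for one debt."""
    z = COEF["intercept"]
    z += COEF["amount"] * debt["amount"]
    z += COEF["debtor_age"] * debt["debtor_age"]
    z += COEF["addresok"] * (1.0 if debt["address_ok"] else 0.0)
    # Dummy coding: only the coefficient of the observed rating level applies;
    # levels without a coefficient act as the reference category.
    z += COEF.get(f"rating_{debt['rating']}", 0.0)
    return z

def probability(z):
    """Invert log(p / (1 - p)) = z to obtain p = P(Y = 1)."""
    return 1.0 / (1.0 + math.exp(-z))

with_address = {"amount": 1000, "debtor_age": 40, "address_ok": True, "rating": 5}
without_address = dict(with_address, address_ok=False)
p_with = probability(linear_predictor(with_address))
p_without = probability(linear_predictor(without_address))
print(round(p_with, 3), round(p_without, 3))
```

With a positive address coefficient, the debtor with a valid address receives the higher predicted probability, which is how the sign of a coefficient is read in the interpretation of the model.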
Now, we want to interpret the logistic regression model. In our model, Y = 1 when the debtor has paid in full; thus, variables with negative coefficients decrease the probability of full payment and, inversely, variables with positive coefficients increase it. For example, in financial services, address not ok, debt amount, debt date until selloff, and rating original with labels 5 and 6 have a statistically significant negative effect on the recovery rate of the debtor; also, prop title and rating original with labels 0, 2 and 4 have a significant positive influence on the recovery rate in financial services. The probability of full payment by debtors with address could be higher than the probability of full payment by debtors without address. Similarly, debts with a longer period from default to selloff have a lower probability of full payment. On the other hand, debts with 0 and 1 rating original present a higher probability of full payment than other debts. In summary, for financial services, the variables that increase the probability of non-payment are: not having an address; a higher debt amount; a longer period between default and selloff; rating original equal to 05 or 06.

As with the telecommunications and financial services categories, we can interpret logistic regression models for other categories. The address ok, debt amount, rating original and debt date until selloff are statistically significant in all industries. Also, the debtor age and debtor type variables are significant except in financial services, and the prop title variable has a significant influence in financial services, miscellaneous, return debit note, energy and utilities, telecommunications and mail order. As in the last step, we use CART, CHAID, SVM, neural network, KNN and dmneural as classification methods. The misclassification rates in the validation datasets are used for model comparison.
Table 7 shows criteria such as the misclassification rate, the sum of squared errors in training and validation, and the average squared error in the validation dataset for the different industries. The classification tree is the best model for classifying the debts with extreme recovery into full payment and non-payment in return debit note, energy and utilities and telecommunications. Based on the misclassification rate in the validation dataset, neural network is the best classifier for business-to-business, financial services and mail order. The Support Vector Machine has the highest accuracy rate in the validation datasets for the public sector, public transport and miscellaneous categories.

TABLES 6, 7
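The two classification steps, together with the prediction step for non-extreme debts, can be chained into a single scoring routine. The sketch below shows only the routing logic; the three stand-in models are hypothetical rules invented for the example, and clipping the predicted rate to the interval [0, 1] is an added assumption rather than something stated in the paper.

```python
def predict_recovery_rate(debt, is_extreme, is_full_payment, nonextreme_rate):
    """Route one debt through the staged model and return a recovery rate."""
    if is_extreme(debt):
        # Extreme debts end in either full payment (RR = 1) or non-payment (RR = 0).
        return 1.0 if is_full_payment(debt) else 0.0
    # Non-extreme debts receive a continuous prediction (clipping is an assumption).
    return min(max(nonextreme_rate(debt), 0.0), 1.0)

# Stand-in models (hypothetical rules, not fitted classifiers or regressions).
stage_extreme = lambda d: d["amount"] < 100
stage_full_payment = lambda d: d["address_ok"]
stage_rate = lambda d: 0.4

small_with_address = {"amount": 50, "address_ok": True}
small_without_address = {"amount": 50, "address_ok": False}
large_debt = {"amount": 5000, "address_ok": True}

for debt in (small_with_address, small_without_address, large_debt):
    print(predict_recovery_rate(debt, stage_extreme, stage_full_payment, stage_rate))
```

Passing the stage models in as functions keeps the routing independent of which algorithm performs best in a given industry.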
6.3 Prediction of non-extreme debts

Models building and comparisons

We now come to the prediction of the recovery rate using the regression model. The response variable is the recovery rate of non-extreme debts. The independent variables are the same as in the previous steps: the debt amount (debt outstanding at sale), debtor age, prop title, debt date until selloff (time between default and purchase by third party), rating (classifying the creditworthiness into an ordinal rating with seven levels), address (the validity of the debtor's address), and debtor type (either male, female, or corporate entity). Our regression model is:

$$y = \mu + \alpha\cdot\text{amount} + \beta\cdot\text{debtor age} + \gamma\cdot\text{prop title} + \delta\cdot\text{debt date till selloff} + \rho_1\cdot\text{rating}_1 + \rho_2\cdot\text{rating}_2 + \rho_3\cdot\text{rating}_3 + \rho_4\cdot\text{rating}_4 + \rho_5\cdot\text{rating}_5 + \rho_6\cdot\text{rating}_6 + \rho_7\cdot\text{rating}_7 + \varphi\cdot\text{addresok} + \theta_1\cdot\text{debtor type}_1 + \theta_2\cdot\text{debtor type}_2 + \epsilon$$

We choose a regression model so the modelling selection procedure is not applied. Table 8 shows the maximum likelihood estimates of the regression parameters in predicting the recovery rate of non-extreme debts. For the explanatory variables, where the p-value is lower than 0.05, the null hypothesis is rejected. This means that these explanatory variables have a statistically significant influence on the response variable. Now, we want to interpret this model. The response variable in our model is the recovery rate of debts; thus, variables with negative coefficients decrease the debt's recovery rate and, inversely, variables with positive coefficients increase it. For example, in business-to-business, address not ok, debt amount and debtor age decrease the recovery rate. On the other hand, female debtors, rating original equal to 0 and 1, prop title and time between default and selloff to the third-party company have a positive influence on the recovery rate.
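Prediction models for a continuous recovery rate are compared in this paper by R-squared, the sum of squared errors and the average squared error. The sketch below shows how these three criteria relate to each other; the toy values are invented, and the average squared error is assumed here to be the sum of squared errors divided by the number of observations.

```python
def sum_squared_error(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

def average_squared_error(y_true, y_pred):
    # Assumption: ASE = SSE / number of observations.
    return sum_squared_error(y_true, y_pred) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - SSE / SST, where SST is the squared deviation from the mean."""
    mean = sum(y_true) / len(y_true)
    sst = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - sum_squared_error(y_true, y_pred) / sst

# Toy validation set (hypothetical): observed and predicted recovery rates.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 0.8]
print(sum_squared_error(observed, predicted))
print(average_squared_error(observed, predicted))
print(r_squared(observed, predicted))
```

Because R-squared is scaled by the variance of the observed rates, it can be compared across industries more readily than the raw sum of squared errors, which grows with the number of debts.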
The interpretation of the regression models for the other categories is the same as for the business-to-business category. The debt amount, debt date until selloff and rating variables are significant in all industries, while the address variable is not significant except in the miscellaneous category. The debtor type is statistically significant in the prediction of the recovery rate except in mail order and public transport. Only in financial services does the debtor age have no significant influence. We use CART and CHAID algorithms as prediction models. The F-test statistic and variance reduction are the splitting criteria for CHAID and CART, respectively. In addition, SVM, regression and neural network are applied to all the datasets. As mentioned before, we divided the non-extreme datasets into two sets: training and validation datasets. The training datasets contain 70% of the debts and the validation datasets contain 30% of the observations in each industry. We build each model on the learning dataset and then
these models are evaluated on the validation dataset for prediction of the recovery rate of non-extreme debts. R-squared and the sum of squared errors are applied as performance measures. Table 9 shows criteria such as R-squared, the sum of squared errors and the average squared error for the different industries. The CHAID or CART tree is the best prediction model for the recovery rate of non-extreme debts in public sector, financial services, business-to-business, mail order, telecommunications and public transport. On the other hand, neural network is the best model for the return debit note and mail order categories, and SVM is the best predictor in the miscellaneous category.

TABLES 8, 9

7. Conclusions

We analyzed the recovery rates of 9,779,239 debts of a third-party company in this paper; these were classified into 9 categories: mail order, business-to-business, energy and utilities, financial services, miscellaneous, public sector, return debit note, telecommunications, and public transport. The loss given default is not the same in all categories, and the mean recovery rate varies across them. The mail order category has the highest mean recovery rate and the mean recovery rate of the public sector is the lowest. The financial services category has the highest mean debt amount and the lowest mean of recovery rate. According to Table 1, the category with a lower debt date until selloff has a higher mean recovery rate. As shown in Table 2, the optimum durations of the collection process are not the same in all industries. For instance, one year is definitely enough collection time for miscellaneous but it is not long enough in the financial services category. Neural network, CART, CHAID, Support Vector Machine, K-nearest neighbor, dmneural and logistic regression were applied in classifying debts into extreme and non-extreme recovery rate. These techniques were used on all the datasets.
The decision tree algorithms are the best models in five industries and neural network has the best performance in three industries (business-to-business, miscellaneous and return debit note). The Support Vector Machine is the best classifier in the telecommunications category. The important advantage of the classification tree is the interpretability of this model. Conversely, neural network and SVM are black-box methods. We applied neural network, CART, CHAID, Support Vector Machine, K-nearest neighbor, dmneural and logistic regression in classifying extreme debts into full payment and non-payment. The decision tree algorithms are the best models in three industries and neural network has the best performance in three industries. The Support Vector Machine is the best classifier in the public sector, public transport and miscellaneous categories. Neural network, CART, CHAID, Support Vector Machine and regression were used as prediction models of debts with non-extreme recovery rate. The neural network has the best performance
in two industries and Support Vector Machine is the best predictor for the miscellaneous category. Decision trees have the best results in the other datasets. In summary, the non-statistical methods produce better results than the statistical methods.

References

[1] Altman, E. I., A. Resti, and A. Sironi (2005). Recovery risk: The next challenge in credit risk management, Chapter Loss given default; a review of the literature in recovery risk. Risk Books, London.
[2] Asarnow, E. and D. Edwards (1995). Measuring loss on defaulted bank loans: A 24-year study. Journal of Commercial Lending Vol. 77, No. 7.
[3] Avery, R. B., P. U. Calem, and G. B. Canner (2004). Consumer credit scoring: Do situational circumstances matter? Journal of Banking and Finance 28.
[4] Bastos, J. (2010a). Forecasting bank loans loss-given-default. Journal of Banking and Finance Vol. 34(10).
[5] Bastos, J. (2010b). Predicting bank loan recovery rates with neural networks. Technical report, Working Paper.
[6] BCBS (2005). International convergence of capital measurement and capital standards: A revised framework. Bank for International Settlements.
[7] Bellotti, T. and J. Crook (2008). Modelling and predicting loss given default for credit cards. Technical report, Working Paper.
[8] Bellotti, T. (2011, forthcoming). Loss given default models incorporating macroeconomic variables for credit cards. International Journal of Forecasting.
[9] Board of Governors of the Federal Reserve System (2011). Federal Reserve statistical release.
[10] Calabrese, R. (2010). Regression for recovery rates with both continuous and discrete characteristics. Proceedings of the 45th Scientific Meeting of the Italian Statistical Society (SIS), Italy.
[11] Chen, T. H. and C. W. Chen (2010). Application of data mining to the spatial heterogeneity of foreclosed mortgages. Expert Systems with Applications Vol. 37(2).
[12] Crook, J., D. Edelman, and L. Thomas (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research Vol. 183(3).