Discussion Papers. Support Vector Machines (SVM) as a Technique for Solvency Analysis. Laura Auria Rouslan A. Moro. Berlin, August 2008


Deutsches Institut für Wirtschaftsforschung (www.diw.de). Discussion Papers 811. Laura Auria, Rouslan A. Moro: Support Vector Machines (SVM) as a Technique for Solvency Analysis. Berlin, August 2008

Opinions expressed in this paper are those of the author and do not necessarily reflect views of the institute.

IMPRESSUM

DIW Berlin, 2008. DIW Berlin, German Institute for Economic Research, Mohrenstr. 58, 10117 Berlin. Tel. +49 (30) 897 89-0, Fax +49 (30) 897 89-200, http://www.diw.de

ISSN print edition 1433-0210, ISSN electronic edition 1619-4535

Available for free downloading from the DIW Berlin website. Discussion Papers of DIW Berlin are indexed in RePEc and SSRN. Papers can be downloaded free of charge from the following websites:

http://www.diw.de/english/products/publications/discussion_papers/27539.html
http://ideas.repec.org/s/diw/diwwpp.html
http://papers.ssrn.com/sol3/jeljour_results.cfm?form_name=journalbrowse&journal_id=1079991

Support Vector Machines (SVM) as a Technique for Solvency Analysis

by Laura Auria¹ and Rouslan A. Moro²

Abstract

This paper introduces a statistical technique, Support Vector Machines (SVM), which is considered by the Deutsche Bundesbank as an alternative for company rating. Special attention is paid to the features of the SVM which provide a higher accuracy of company classification into solvent and insolvent. The advantages and disadvantages of the method are discussed. The comparison of the SVM with more traditional approaches such as logistic regression (Logit) and discriminant analysis (DA) is made on the Deutsche Bundesbank data of annual income statements and balance sheets of German companies. The out-of-sample accuracy tests confirm that the SVM outperforms both DA and Logit on bootstrapped samples.

Keywords: company rating, bankruptcy analysis, support vector machines

JEL Classification: C3, G33, C45

Acknowledgements: the work of R. Moro was supported by Deutsche Bank and its foundation Geld und Währung. Additionally, R. Moro acknowledges the support of the Deutsche Forschungsgemeinschaft through the SFB 649 Economic Risk. All analysis was done on the premises of Deutsche Bank in Hannover and Frankfurt.

¹ Deutsche Bundesbank, Georgplatz 5, 30159 Hannover.
² German Institute for Economic Research, Mohrenstr. 58, 10117 Berlin.

1. Introduction

There are plenty of statistical techniques that aim at solving binary classification tasks such as the assessment of the credit standing of enterprises. The most popular techniques include traditional statistical methods like linear Discriminant Analysis (DA) and Logit or Probit Models, and non-parametric statistical models like Neural Networks. SVMs are a new, promising non-linear, non-parametric classification technique, which has already shown good results in medical diagnostics, optical character recognition, electric load forecasting and other fields.

Applied to solvency analysis, the common objective of all these classification techniques is to develop a function which can accurately separate the space of solvent and insolvent companies by benchmarking their score value. The score reduces the information contained in the balance sheet of a company to a one-dimensional summary indicator, which is a function of some predictors, usually financial ratios.

Another aim of solvency analysis is to match the different score values with the related probability of default (PD) within a certain period. This aspect is especially important in the Eurosystem, when credit scoring is performed with the target of classifying the eligibility of company credit liabilities as collateral for central bank refinancing operations, since the concept of eligibility is related to a benchmark value in terms of the annual PD.

The selection of a classification technique for credit scoring is a challenging problem, because an appropriate choice given the available data can significantly help improve the accuracy of credit scoring practice. On the other hand, this decision should not be seen as an either/or choice, since different classification techniques can be integrated, thus enhancing the performance of a whole credit scoring system. In the following paper SVMs are presented as a possible classification technique for credit scoring. After a review of the basics of SVMs and of their advantages and disadvantages on a theoretical basis, the empirical results of an SVM model for credit scoring are presented.

2. Basics of SVMs

SVMs are a new technique suitable for binary classification tasks, which is related to and contains elements of non-parametric applied statistics, neural networks and machine learning. Like classical techniques, SVMs also classify a company as solvent or insolvent according to its score value, which is a function of selected financial ratios. But this function is neither linear nor parametric. The formal basics of SVMs are briefly explained below. The case of a linear SVM, where the score function is still linear and parametric, is introduced first, in order to clarify the concept of margin maximisation in a simplified context.
Afterwards the SVM will be made non-linear and non-parametric by introducing a kernel. As explained further, it is this characteristic that makes SVMs a useful tool for credit scoring in cases where distributional assumptions about the available input data cannot be made or their relation to the PD is non-monotone.

Margin Maximization

Assume there is a new company $i$, which has to be classified as solvent or insolvent according to the SVM score. In the case of a linear SVM the score looks like a DA or Logit score, which is a linear combination of relevant financial ratios $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$, where $x_i$ is a vector with $d$ financial ratios and $x_{ik}$ is the value of financial ratio number $k$ for company $i$, $k = 1, \ldots, d$. So $z_i$, the score of company $i$, can be expressed as:

$$z_i = w_1 x_{i1} + w_2 x_{i2} + \ldots + w_d x_{id} + b. \qquad (1)$$

In a compact form:

$$z_i = x_i^T w + b \qquad (1')$$

where $w$ is a vector which contains the weights of the $d$ financial ratios and $b$ is a constant. The comparison of the score with a benchmark value (which is equal to zero for a balanced sample) delivers the forecast of the class, solvent or insolvent, for company $i$.

In order to be able to use this decision rule for the classification of company $i$, the SVM has to learn the values of the score parameters $w$ and $b$ on a training sample. Assume this consists of a set of $n$ companies $i = 1, 2, \ldots, n$. From a geometric point of view, calculating the value of the parameters $w$ and $b$ means looking for a hyperplane that best separates solvent from insolvent companies according to some criterion.

The criterion used by SVMs is based on margin maximization between the two data classes of solvent and insolvent companies. The margin is the distance between the hyperplanes bounding each class, where in the hypothetical perfectly separable case no observation may lie. By maximising the margin, we search for the classification function that can most safely separate the classes of solvent and insolvent companies. The graph below represents a binary space with two input variables. Here crosses represent the solvent companies of the training sample and circles the insolvent ones. The threshold separating solvent and insolvent companies is the line in the middle between the two margin boundaries, which are canonically represented as $x_i^T w + b = 1$ and $x_i^T w + b = -1$. Then the margin is $2 / \|w\|$, where $\|w\|$ is the norm of the vector $w$.

In a non-perfectly separable case the margin is soft. This means that in-sample classification errors occur and also have to be minimized. Let $\xi_i$ be a non-negative slack variable for in-sample misclassifications. In most cases $\xi_i = 0$, which means that companies are being correctly classified. In the case of a positive $\xi_i$ the company $i$ of the training sample is being misclassified. A further criterion used by SVMs for calculating $w$ and $b$ is that all misclassifications of the training sample have to be minimized.
Let $y_i$ be an indicator of the state of the company, where in the case of solvency $y_i = -1$ and in the case of insolvency $y_i = 1$. By imposing the constraint that no observation may lie within the margin except some classification errors, SVMs require that either $x_i^T w + b \ge 1 - \xi_i$ or $x_i^T w + b \le -1 + \xi_i$, which can be summarized with:

$$y_i (x_i^T w + b) \ge 1 - \xi_i, \quad i = 1, \ldots, n. \qquad (3)$$
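The margin width and the slack variables can be illustrated with a small numerical sketch. It is not taken from the paper: the toy companies, the values of `w` and `b`, and all function names are hypothetical, and the slack is computed as the smallest non-negative value that satisfies the constraint above for a given `w` and `b`.

```python
import math

def linear_score(x, w, b):
    """Linear score z = x^T w + b for one company."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def slack(x, y, w, b):
    """Smallest xi >= 0 such that y * (x^T w + b) >= 1 - xi holds."""
    return max(0.0, 1.0 - y * linear_score(x, w, b))

def margin_width(w):
    """Distance 2 / ||w|| between the two margin boundaries."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))

# Hypothetical toy sample with two financial ratios per company.
# y = 1 marks an insolvent company, y = -1 a solvent one.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0   # assumed parameters, not estimated here

slacks = [slack(x, yi, w, b) for x, yi in zip(X, y)]
total_slack = sum(slacks)
width = margin_width(w)
```

In this toy sample only the fourth company has a positive slack: it is solvent but lies on the insolvent side of the threshold.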

Figure 1. Geometrical Representation of the SVM Margin. Source: W. Härdle, R.A. Moro, D. Schäfer, March 2004, Rating Companies with Support Vector Machines, Discussion Paper Nr. 416, DIW Berlin.

The optimization problem for the calculation of $w$ and $b$ can thus be expressed by:

$$\min_{w} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \qquad (2)$$

$$\text{s.t.} \quad y_i (x_i^T w + b) \ge 1 - \xi_i, \qquad (3)$$

$$\xi_i \ge 0. \qquad (4)$$

In the first part of (2) we maximise the margin $2 / \|w\|$ by minimizing $\|w\|^2 / 2$, where the square in the norm of $w$ comes from the second term, which originally is the sum of in-sample misclassification errors $\xi_i / \|w\|$ times the parameter $C$. Thus SVMs maximize the margin width while minimizing errors. This problem is quadratic, i.e. convex.

$C$, the capacity, is a tuning parameter which weights in-sample classification errors and thus controls the generalisation ability of an SVM. The higher $C$ is, the higher is the weight given to in-sample misclassifications and the lower is the generalization of the machine. Low generalisation means that the machine may work well on the training set but would perform poorly on a new sample. Bad generalisation may be a result of overfitting on the training sample, for example, in the case that this sample shows some untypical and non-repeating data structure. By choosing a low $C$, the risk of overfitting an SVM on the training sample is reduced. It can be demonstrated that $C$ is linked to the width of the margin. The smaller $C$ is, the wider is the margin, and the more and larger in-sample classification errors are permitted.

Solving the above mentioned constrained optimization problem of calibrating an SVM means searching for the minimum of the following Lagrange function:

$$L(w, b, \xi; \alpha, \nu) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \{ y_i (w^T x_i + b) - 1 + \xi_i \} - \sum_{i=1}^{n} \nu_i \xi_i, \qquad (5)$$

where $\alpha_i \ge 0$ are the Lagrange multipliers for the inequality constraint (3) and $\nu_i \ge 0$ are the Lagrange multipliers for the condition (4). This is a convex optimization problem with inequality constraints, which is solved by means of classical non-linear programming tools and the application of the Kuhn-Tucker Sufficiency Theorem. The solution of this optimisation problem is given by the saddle-point of the Lagrangian, minimised with respect to $w$, $b$, and $\xi$ and maximised with respect to $\alpha$ and $\nu$. The entire task can be reduced to a convex quadratic programming problem in $\alpha_i$. Thus, by calculating $\alpha_i$, we solve our classifier construction problem and are able to calculate the parameters of the linear SVM model according to the following formulas:

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i \qquad (6)$$

$$b = -\frac{1}{2} (x_+ + x_-)^T w \qquad (7)$$

As can be seen from (6), $\alpha_i$, which must be non-negative, weighs different companies of the training sample. The companies whose $\alpha_i$ are not equal to zero are called support vectors and are the relevant ones for the calculation of $w$. Support vectors lie on the margin boundaries or, for non-perfectly separable data, within the margin. In this way, the complexity of calculations does not depend on the dimension of the input space but on the number of support vectors. Here $x_+$ and $x_-$ are any two support vectors belonging to different classes, which lie on the margin boundaries.

By substituting (6) into the score (1'), we obtain the score $z_i$ as a function of the scalar product of the financial ratios of the company to be classified and the financial ratios of the support vectors in the training sample, of $\alpha_j$, and of $y_j$. By comparing $z_i$ with a benchmark value, we are able to estimate if a company has to be classified as solvent or insolvent.

$$z_i = \sum_{j=1}^{n} \alpha_j y_j \langle x_j, x_i \rangle + b \qquad (8)$$

Kernel-transformation

In the case of a non-linear SVM, the score of a company is computed by substituting the scalar product of the financial ratios with a kernel function.
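As a rough illustration of (6), (7) and (8), the sketch below recovers $w$ and $b$ from given multipliers and scores a new company. The two-company training sample, the multipliers `alpha` and all function names are hypothetical. The offset uses $b = -(x_+ + x_-)^T w / 2$, which follows from the two margin boundaries $x^T w + b = 1$ and $x^T w + b = -1$. Passing a different `kernel` function in place of the scalar product gives the non-linear score.

```python
def dot(u, v):
    """Scalar product of two vectors of financial ratios."""
    return sum(a * b for a, b in zip(u, v))

def weights_from_alpha(alpha, y, X):
    """w = sum_i alpha_i * y_i * x_i, as in (6)."""
    d = len(X[0])
    return [sum(a * yi * x[k] for a, yi, x in zip(alpha, y, X)) for k in range(d)]

def offset_from_support_vectors(w, x_plus, x_minus):
    """b = -(x_plus + x_minus)^T w / 2 for support vectors on opposite margin boundaries."""
    return -0.5 * (dot(x_plus, w) + dot(x_minus, w))

def score(x_new, alpha, y, X, b, kernel=dot):
    """z = sum_j alpha_j * y_j * K(x_j, x_new) + b; kernel=dot gives the linear score (8)."""
    return sum(a * yj * kernel(xj, x_new) for a, yj, xj in zip(alpha, y, X)) + b

# Hypothetical training sample: one insolvent (y = 1) and one solvent (y = -1) company.
X = [(3.0, 1.0), (1.0, 1.0)]
y = [1, -1]
alpha = [0.5, 0.5]   # assumed multipliers for this toy sample

w = weights_from_alpha(alpha, y, X)
b = offset_from_support_vectors(w, X[0], X[1])
z = score((4.0, 5.0), alpha, y, X, b)   # positive score: classified as insolvent
```

Both training companies are support vectors here, so their scores are exactly 1 and -1, and the score obtained from (8) agrees with $x^T w + b$ computed directly.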

$$z_i = \sum_{j=1}^{n} \alpha_j y_j \langle x_j, x_i \rangle + b \quad \longrightarrow \quad z_i = \sum_{j=1}^{n} \alpha_j y_j K(x_j, x_i) + b \qquad (8')$$

Kernels are symmetric, semi-positive definite functions satisfying the Mercer theorem. If this theorem is satisfied, this ensures that there exists a (possibly) non-linear map $\Phi$ from the input space into some feature space, such that its inner product equals the kernel. The non-linear transformation $\Phi$ is only implicitly defined through the use of a kernel, since it only appears as an inner product.

$$K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle. \qquad (9)$$

This explains how non-linear SVMs solve the classification problem: the input space is transformed by $\Phi$ into a feature space of a higher dimension, where it is easier to find a separating hyperplane. Thus the kernel can side-step the problem that data are non-linearly separable by implicitly mapping them into a feature space, in which the linear threshold can be used. Using a kernel is equivalent to solving a linear SVM in some new higher-dimensional feature space. The non-linear SVM score is thus a linear combination, but with new variables, which are derived through a kernel transformation of the prior financial ratios. The score function does not have a compact functional form depending on the financial ratios, but depends on some transformation of them, which we do not know, since it is only implicitly defined. It can be shown that the solution of the constrained optimisation problem for a non-linear SVM is given by:

$$w = \sum_{i=1}^{n} \alpha_i y_i \Phi(x_i) \qquad (6')$$

$$b = -\frac{1}{2} \left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x_+) + \sum_{i=1}^{n} \alpha_i y_i K(x_i, x_-) \right) \qquad (7')$$

But, according to (7') and (8'), we do not need to know the form of the function $\Phi$ in order to be able to calculate the score. Since for the calculation of the score (8) the input variables are used as a product, only the kernel function is needed in (8'). As a consequence, $\Phi$ and $w$ are not required for the solution of a non-linear SVM.

One can choose among many types of kernel functions. In practice, many SVM models work with stationary Gaussian kernels with an anisotropic radial basis. The reason is that they are very flexible and can quickly build all possible relations between the financial ratios.
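A Gaussian kernel with an anisotropic radial basis can be sketched as follows. This is a simplified illustration rather than the kernel estimated in the paper: it assumes a diagonal variance-covariance matrix, so that each financial ratio is rescaled only by its own variance, and the names `gaussian_kernel`, `variances` and `r` as well as the numbers are hypothetical.

```python
import math

def gaussian_kernel(xi, xj, variances, r):
    """exp(-(xj - xi)^T Sigma^{-1} (xj - xi) / (2 r^2)) with diagonal Sigma.

    variances holds the diagonal of Sigma, one entry per financial ratio.
    This is a simplification: a full covariance matrix would also use
    the off-diagonal terms.
    """
    q = sum((a - b) ** 2 / v for a, b, v in zip(xi, xj, variances))
    return math.exp(-q / (2.0 * r * r))

# Two hypothetical companies; the first ratio has a larger variance than the second.
variances = (4.0, 1.0)
company_a = (0.0, 0.0)
company_b = (2.0, 1.0)

k_narrow = gaussian_kernel(company_a, company_b, variances, r=1.0)
k_wide = gaussian_kernel(company_a, company_b, variances, r=3.0)
k_self = gaussian_kernel(company_a, company_a, variances, r=1.0)
```

The kernel equals 1 for identical companies and decays with the rescaled distance between them; a larger radius `r` slows this decay.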
For example, linear transformations are a special case of Gaussian kernels.

$$K(x_i, x_j) = e^{-(x_j - x_i)^T r^{-2} \Sigma^{-1} (x_j - x_i) / 2} \qquad (10)$$

Here $\Sigma$ is the variance-covariance matrix of all financial ratios of the training set. This kernel first transforms the anisotropic data to the same scale for all variables. This is the meaning of isotropic. So there is no risk that financial ratios with greater numeric ranges dominate those with smaller ranges. The only parameter which has to be chosen when using Gaussian kernels is $r$, which controls the radial basis of the kernel. This reduces the complexity of model selection. The higher $r$ is, the smoother is the threshold which separates solvent from insolvent companies.³

Gaussian kernels non-linearly map the data space into a higher dimensional space. Actually, the definition of a Gaussian process by specifying the covariance function (depending on the distance of the company to be evaluated from each company of the training sample) avoids explicit definition of the function class of the transformation. There are many possible decompositions of this covariance and thus also many possible transformation functions of the input financial ratios. Moreover, each company shows its own covariance function, depending on its relative position within the training sample. That is why the kernel operates locally. The value of the kernel function depends on the distance between the financial ratios of the company to be classified and those of one company of the training sample. This kernel is a normal density function up to a constant multiplier. $x_i$ is the center of this kernel, like the mean is the center of a normal density function.

3. What Is the Point in Using SVMs as a Classification Technique?

All classification techniques have advantages and disadvantages, which are more or less important according to the data which are being analysed, and thus have a relative relevance. SVMs can be a useful tool for insolvency analysis in the case of non-regularity in the data, for example when the data are not regularly distributed or have an unknown distribution. They can help evaluate information, i.e. financial ratios, which would have to be transformed prior to entering the score of classical classification techniques. The advantages of the SVM technique can be summarised as follows:
1. By introducing the kernel, SVMs gain flexibility in the choice of the form of the threshold separating solvent from insolvent companies, which need not be linear and need not even have the same functional form for all data, since its function is non-parametric and operates locally. As a consequence they can work with financial ratios which show a non-monotone relation to the score and to the probability of default, or which are non-linearly dependent, and this without needing any specific work on each non-monotone variable.

2. Since the kernel implicitly contains a non-linear transformation, no assumptions about the functional form of the transformation, which makes data linearly separable, are necessary. The transformation occurs implicitly on a robust theoretical basis and human expert judgement beforehand is not needed.

3. SVMs provide a good out-of-sample generalization, if the parameters $C$ and $r$ (in the case of a Gaussian kernel) are appropriately chosen. This means that, by choosing an appropriate generalization grade, SVMs can be robust, even when the training sample has some bias.

³ By choosing different $r$ values for different input values, it is possible to rescale outliers.

4. SVMs deliver a unique solution, since the optimality problem is convex. This is an advantage compared to Neural Networks, which have multiple solutions associated with local minima and for this reason may not be robust over different samples.

5. With the choice of an appropriate kernel, such as the Gaussian kernel, one can put more stress on the similarity between companies, because the more similar the financial structure of two companies is, the higher is the value of the kernel. Thus when classifying a new company, the values of its financial ratios are compared with the ones of the support vectors of the training sample which are more similar to this new company. This company is then classified according to the group with which it has the greatest similarity.

Here are some examples where the SVM can help cope with non-linearity and non-monotonicity. One case is when the coefficients of some financial ratios in equation (1), estimated with a linear parametric model, show a sign that does not correspond to the one expected according to theoretical economic reasoning. The reason for that may be that these financial ratios have a non-monotone relation to the PD and to the score. The unexpected sign of the coefficients depends on the fact that the data dominate or cover the part of the range where the relation to the PD has the opposite sign. One of these financial ratios is typically the growth rate of a company, as pointed out by [10]. Also leverage may show non-monotonicity, since if a company primarily works with its own capital, it may not exploit all its external financing opportunities properly. Another example may be the size of a company: small companies are expected to be more financially unstable; but if a company has grown too fast or if it has become too static because of its dimension, the big size may become a disadvantage.

Because of these characteristics, the above mentioned financial ratios are often sorted out when selecting the risk assessment model according to a linear classification technique.
Alternatively, an appropriate evaluation of this information in linear techniques requires a transformation of the input variables, in order to make them monotone and linearly separable.⁴

A common disadvantage of non-parametric techniques such as SVMs is the lack of transparency of results. SVMs cannot represent the score of all companies as a simple parametric function of the financial ratios, since its dimension may be very high. It is neither a linear combination of single financial ratios nor has it another simple functional form. The weights of the financial ratios are not constant. Thus the marginal contribution of each financial ratio to the score is variable. Using a Gaussian kernel, each company has its own weights according to the difference between the value of its own financial ratios and those of the support vectors of the training data sample.

Interpretation of results is however possible and can rely on graphical visualization, as well as on a local linear approximation of the score. The SVM threshold can be represented within a bi-dimensional graph for each pair of financial ratios. This visualization technique cuts and projects the multidimensional feature space as well as the multivariate threshold function separating solvent and insolvent companies on a bi-dimensional one, by fixing the values of the other financial ratios equal to the values of the company which has to be classified. In this way, different companies will have different threshold projections.

⁴ See [6] for an analysis of the univariate relation between the PD and single financial ratios as well as for possible transformations of input financial ratios in order to reach linearity.

However, an analysis of these graphs gives an important input about the direction towards which the financial ratios of non-eligible companies should change, in order to reach eligibility. The PD can represent a third dimension of the graph, by means of isoquants and colour coding. The approach chosen for the estimation of the PD can be based on empirical estimates or on a theoretical model. Since the relation between score and PD is monotone, a local linearization of the PD can be calculated for single companies by estimating the tangent curve to the isoquant of the score. For single companies this can offer interesting information about the factors influencing their financial solidity.

In the figure below the PD is estimated by means of a Gaussian kernel⁵ on data belonging to the trade sector and then smoothed and monotonized by means of a Pool Adjacent Violator algorithm.⁶ The pink curve represents the projection of the SVM threshold on a binary space with the two variables K2 (net income change) and K24 (net interest ratio), whereas all other variables are fixed at the level of company $i$. The blue curve represents the isoquant for the PD of company $i$, whose coordinates are marked by a triangle.

Figure 2. Graphical Visualization of the SVM Threshold and of a Local Linearization of the Score Function: Example of a Projection on a Bi-dimensional Graph with PD Colour Coding

⁵ This methodology is based on a non-parametric estimation of the PD and has the advantage that it delivers an individual PD for each company based on a continuous, smooth and monotonic function. This PD-function is computed on an empirical basis, so there is no need for a theoretical assumption about the form of a link function.

⁶ See [].
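The monotonization step mentioned above can be illustrated with a basic pool-adjacent-violators routine. This is a generic textbook version, not the authors' implementation: it assumes that the observations are already sorted by increasing score and that the PD should not decrease with the score. The input sequence and the function name are hypothetical, and the kernel smoothing step is not shown.

```python
def pool_adjacent_violators(values, weights=None):
    """Return the non-decreasing sequence closest to values in weighted least squares."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [mean, total weight, number of observations]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical default indicators (1 = insolvent), ordered by increasing score.
defaults_by_score = [0, 0, 1, 0, 1, 1]
pd_monotone = pool_adjacent_violators(defaults_by_score)
```

The third and fourth observations violate monotonicity and are pooled into a common value of 0.5, which gives a PD estimate that never decreases with the score.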

The grey line corresponds to the linear approximation of the score or PD function projection for company $i$. One interesting result of this graphical analysis is that successful companies with a low PD often lie in a closed space. This implies that there exists an optimal combination area for the financial ratios being considered, outside of which the PD gets higher. If we consider the net income change, we notice that its influence on the PD is non-monotone. Both too low and too high growth rates imply a higher PD. This may indicate the existence of an optimal growth rate and suggest that above a certain rate a company may get into trouble, especially if the cost structure of the company is not optimal, i.e. the net interest ratio is too high. But if a company lies in the optimal growth zone, it can also afford a higher net interest ratio.

4. An Empirical SVM Model for Solvency Analysis

In the following chapter, an empirical SVM model for solvency analysis on German data is presented.⁷ The estimation of score functions and their validation are based on balance sheets of solvent and insolvent companies, where a company is classified as insolvent if it is the subject of failure judicial proceedings. The study is conducted over a long period, in order to construct durable scores that are resistant, as far as possible, to cyclical fluctuations. So the original data set consists of about 50.000 firm-year observations, spanning the time period from 1999 to 2005. The forecast horizon is three and a half years. That is, in each period a company is considered insolvent if it has been the subject of legal proceedings within the three and a half years since the observation date. Solvent companies are those that have not gone bankrupt within three and a half years after the observation date. With shorter term forecast horizons, such as one year, data quality would be poor, since most companies do not file a balance sheet if they are on the point of failure. Moreover, companies that go insolvent already show weakness three years before failure.
In order to improve the accuracy of analysis, a different model was developed for each of the following three sectors: manufacturing, wholesale/retail trade and other companies. The three models for the different sectors were trained on data over the time period 1999-2001 and then validated out-of-time on data over the time period 2002-2005. Two important points for the selection of an accurate SVM model are the choice of the input variables, i.e. of the financial ratios which are being considered in the score, as well as of the tuning parameters $C$ and $r$ (once a Gaussian kernel has been chosen).

Table 1. Training and Validation Data Set Size Without Missing Values

sector                     1999   2000   2001   2002   2003   2004   2005   total solv.   total ins.
manufacturing              605    5436   466    5202   5066   453    698    30899         692
wholesale / retail trade   2806   230    9209   8867   806    703    996    5720          07
other                      6596   6234   5252   5807   5646   569    650    34643         7

⁷ The database belongs to the balance sheet pool of the Deutsche Bundesbank.

The choice of the input variables has a decisive influence on the performance results and is not independent from the choice of the classification technique. These variables normally have to comply with the assumptions of the applied classification technique. Since the SVM imposes no restrictions on the quality of input variables, one is free to choose them only according to the accuracy performance of the model.

The input variable selection methodology applied in this paper is based on the following empirical tools. The discriminative power of the models is measured on the basis of their accuracy ratio (AR) and percentage of correctly classified observations, which is a compact performance indicator, complementary to their error rates. Since there is no assumption on the density distribution of the financial ratios, a robust comparison of these performance indicators has to be constructed on the basis of bootstrapping. The different SVM models are estimated 100 times on 100 randomly selected training samples, which include all insolvent companies of the data pool and the same number of randomly selected solvent ones. Afterwards they are validated on 100 similarly selected validation samples. The model which delivers the best median results over all training and validation samples is the one which is chosen for the final calibration.

A similar methodology is used for choosing the optimal capacity $C$ and the kernel radius $r$ of the SVM model. The combination of $C$ and $r$ values which delivers the highest median AR on 100 randomly selected training and validation samples is chosen.
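The resampling scheme described above can be sketched as follows. This is an illustrative outline under several assumptions, not the code used in the study: the AR is computed through its standard relation to the area under the ROC curve, AR = 2 AUC - 1; solvent companies are drawn without replacement; and `fit` is a hypothetical placeholder for any routine that estimates a model on a training sample and returns a scoring function.

```python
import random
import statistics

def accuracy_ratio(scores, labels):
    """AR = 2 * AUC - 1, with labels 1 (insolvent) and -1 (solvent)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    # Share of (insolvent, solvent) pairs ranked correctly; ties count one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return 2.0 * wins / (len(pos) * len(neg)) - 1.0

def balanced_sample(X, y, rng):
    """All insolvent companies plus an equally sized random draw of solvent ones."""
    ins = [i for i, l in enumerate(y) if l == 1]
    sol = [i for i, l in enumerate(y) if l == -1]
    idx = ins + rng.sample(sol, len(ins))
    return [X[i] for i in idx], [y[i] for i in idx]

def median_ar(fit, X_train, y_train, X_val, y_val, n_rep=100, seed=0):
    """Median validation AR over n_rep balanced training and validation samples."""
    rng = random.Random(seed)
    ars = []
    for _ in range(n_rep):
        Xt, yt = balanced_sample(X_train, y_train, rng)
        Xv, yv = balanced_sample(X_val, y_val, rng)
        score = fit(Xt, yt)  # hypothetical: returns a function x -> score
        ars.append(accuracy_ratio([score(x) for x in Xv], yv))
    return statistics.median(ars)

# Hypothetical stand-in for model estimation: the score is simply the first ratio.
def fit_first_ratio(X, y):
    return lambda x: x[0]

X_toy = [(float(i),) for i in range(10)]
y_toy = [1 if i >= 8 else -1 for i in range(10)]
# The same toy data serve as training and validation pool, for demonstration only.
ar = median_ar(fit_first_ratio, X_toy, y_toy, X_toy, y_toy, n_rep=20)
```

Comparing candidate models, or candidate pairs of $C$ and $r$, then amounts to calling `median_ar` once per candidate and keeping the one with the highest value.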

Figure 3. Choice of the Financial Ratios of an SVM Model for the Manufacturing Sector: An Example for the Choice of the Fifth Input Variable

Our analysis started by estimating the three SVM models on the basis of four financial ratios, which are presently being used by the Bundesbank for DA and which are expected to comply with its assumptions of linearity and monotonicity. Integrating further non-linearly separable variables into the model produced a significant improvement in SVM performance. The new input variables were chosen out of a catalogue, which is summarized in Table 3, on the basis of a bootstrapping procedure by means of forward selection with an SVM model. Variables were added to the model sequentially until none of the remaining ones would improve the median AR of the model. Figure 3 shows the AR distributions of different SVM models with 5 variables. According to these graphical results one should choose K24 as the fifth variable. As a result of this selection procedure, the median AR peaked with ten input variables (10FR) and then fell gradually.

Table 2. Final Choice of the Input Variables: Forward Selection Procedure

Sectors: Manufacturing, Wholesale/Retail Trade, Other

K01: pre-tax profit margin; K01: pre-tax profit margin; K02: operating profit margin; K03: cash flow ratio; K04: capital recovery ratio; K05: debt cover; K06: days receivable; K06: days receivable; K06: days receivable; K07: days payable; K09: equity ratio adj.; K09: equity ratio adj.; K08: equity ratio; K11: net income ratio; K17: liquidity 3 (current assets to short debt); K12: guarantee a.o. obligation ratio (leverage 1); K15: liquidity 1; K18: short term debt ratio; K18: short term debt ratio; K18: short term debt ratio; K24: net interest ratio; K21: net income change; K19: inventories ratio; K24: net interest ratio; K21: net income change; K26: tangible asset growth; K31: days of inventories; K31: days of inventories; KWKTA: working capital to total assets; KL: leverage; KL: leverage

A univariate analysis of the relation between the single variables and the PD showed that most of these variables actually have a non-monotone relation to the PD, so that considering them in a linear score would require the aforementioned transformation.
Growth variables in particular, as well as leverage and the net interest ratio, showed typical non-monotone behaviour and were at the same time very helpful in enhancing the predictive power of the SVM. Figure 4 summarizes the predictive results of the three final models, according to the above-mentioned bootstrap procedure. Based on the procedure outlined above, the following values of the kernel tuning parameter were selected: r = 4 for the manufacturing and trade sectors and r = 2.5 for other companies. This suggests that the latter sector is less homogeneous than the other two. The capacity of the SVM model was chosen as C = 10 for all three sectors. It is interesting to note that the robustness of the results, measured by the spread of the ARs over different samples, decreased as the number of financial ratios considered grew. There is thus a trade-off between the accuracy of the model and its robustness.
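The forward selection of input variables described above, in which variables are added one at a time until the median AR stops improving, can be sketched generically. In this hedged illustration, `evaluate` stands in for the bootstrapped median AR of an SVM fitted on a given variable set; here it is replaced by a made-up scoring rule with a small penalty per variable, and the variable names and gain values are hypothetical.

```python
def forward_selection(candidates, base, evaluate):
    """Greedy forward selection: add the best variable until nothing improves."""
    selected = list(base)
    best = evaluate(selected)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        # Score every one-variable extension of the current model.
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        top_score, top_var = max(scored)
        if top_score <= best:
            break  # no remaining variable improves the criterion
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best

# Made-up contributions of five hypothetical ratios to the criterion.
gains = {"ratio_a": 0.30, "ratio_b": 0.20, "ratio_c": 0.05,
         "ratio_d": 0.004, "ratio_e": 0.001}

def toy_evaluate(variables):
    # Stand-in for a median AR: summed gains minus a penalty per variable,
    # so that adding weak variables eventually lowers the score.
    return sum(gains[v] for v in variables) - 0.01 * len(variables)

chosen, score = forward_selection(list(gains), ["ratio_a"], toy_evaluate)
print(chosen, round(score, 3))
```

The stopping rule mirrors the text: selection ends as soon as no remaining candidate raises the criterion, which is why the model size is determined by the data rather than fixed in advance.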

Table 3. The Catalogue of Financial Ratios: Univariate Summary Statistics and Relation to the PD (footnote 8)

| Variable | Name | Aspect | Q 0.01, median, Q 0.99, IQR | Relation to the PD |
|---|---|---|---|---|
| K01 | Pre-tax profit (income) margin | profitability | -57. 2.3 40. 6.5 | - n.m. |
| K02 | Operating profit margin | profitability | -53 3.6 80.3 7.2 | - |
| K03 | Cash flow ratio (net income ratio) | liquidity | -38. 5. 73.8 0 | - |
| K04 | Capital recovery ratio | liquidity | -29.4 9.6 85. 5 | - |
| K05 | Debt cover (debt repayment capability) | liquidity | -42 6 584 33 | - |
| K06 | Days receivable (accounts receivable collection period) | activity | 0 29 222 34 | + n.m. |
| K07 | Days payable (accounts payable collection period) | activity | 0 20 274 30 | + n.m. |
| K08 | Equity (capital) ratio | financing | -57 6.4 95.4 27.7 | - |
| K09 | Equity ratio adj. (own funds ratio) | financing | -55.8 20.7 96.3 3. | - |
| K11 | Net income ratio | profitability | -57. 2.3 33.3 6.4 | +/- n.m. |
| K12 | Guarantee a.o. obligation ratio (leverage 1) | leverage | 0 0 279.2 | -/+ n.m. |
| K13 | Debt ratio | liquidity | -57.5 2.4 89.6 8.8 | -/+ n.m. |
| K14 | Liquidity ratio | liquidity | 0.9 55.6 7.2 | - |
| K15 | Liquidity 1 | liquidity | 0 3.9 36.7 6.7 | - |
| K16 | Liquidity 2 | liquidity | 63.2 200 65.8 | - n.m. |
| K17 | Liquidity 3 | liquidity | 2.3 6. 400 74.9 | - n.m. |
| K18 | Short term debt ratio | financing | 0.2 44.3 98.4 40.4 | + |
| K19 | Inventories ratio | investment | 0 23.8 82.6 35.6 | + |
| K20 | Fixed assets ownership ratio | leverage | -232. 46.6 58.4 73.2 | -/+ n.m. |
| K21 | Net income change | growth | -60 33 7 | -/+/- n.m. |
| K22 | Own funds yield | profitability | -43.3 22.4 578.6 55.2 | +/- n.m. |
| K23 | Capital yield | profitability | -24.7 7. 6.8 0.2 | - |
| K24 | Net interest ratio | cost structure | - 50.9 | + n.m. |
| K25 | Own funds/pension provision r. | financing | -56.6 20.3 96. 32.4 | - |
| K26 | Tangible assets growth | growth | -0.2 3.9 00 23 | -/+ n.m. |
| K27 | Own funds/provisions ratio | financing | -53.6 27.3 98.8 36.9 | - |
| K28 | Tangible asset retirement | growth | 0. 9.3 98.7 8.7 | -/+ n.m. |
| K29 | Interest coverage ratio | cost structure | -2364 49.5 39274.3 55.3 | n.m. |
| K30 | Cash flow ratio | liquidity | -27.9 5.2 68 9.7 | - |
| K31 | Days of inventories | activity | 0 4 376 59 | + |
| K32 | Current liabilities ratio | financing | 0.2 59 96.9 47. | + |
| KL | Leverage | leverage | .4 67.2 00 39.3 | + n.m. |
| KWKTA | Working capital to total assets | liquidity | 565.9 255430 5845562. 86593 | +/- n.m. |
| KROA | Return on assets | profitability | -42. 0 5.7 4.8 | n.m. |
| KCFTA | Cash flow to total assets | liquidity | -26.4 9 67.6 3.6 | - |
| KGBVCC | Accounting practice, cut | | -2 0.6 0 | n.m. |
| KCBVCC | Accounting practice | | -2.4 0.6 0 | n.m. |
| KDEXP | Result of fuzzy expert system, cut | | -2 0.8 2 2.8 | - |
| KDELTA | Result of fuzzy expert system | | -7.9 0.8 8.8 3.5 | - n.m. |

n.m. = non-monotone
+ = positive relation
- = negative relation
+ n.m. = non-monotone relation, mostly positive
- n.m. = non-monotone relation, mostly negative
+/- n.m. = non-monotone relation, first positive then negative
-/+ n.m. = non-monotone relation, first negative then positive
-/+/- n.m. = non-monotone relation, first negative, then positive, then again negative

8 K1-K32 as well as KGBVCC and KDEXP are financial ratios belonging to the catalogue of the Deutsche Bundesbank. See [4].

Figure 4. Predictive Results: ARs of the Final SVM Model after Bootstrapping

5. Conclusions

SVMs can produce accurate and robust classification results on a sound theoretical basis, even when input data are non-monotone and non-linearly separable. They can therefore help to evaluate more relevant information in a convenient way. Since they linearize data implicitly by means of a kernel transformation, the accuracy of the results does not rely on the quality of human expert judgement in choosing the optimal linearization function for non-linear input data. SVMs operate locally, so they are able to reflect in their score the features of single companies, comparing their input variables with those of companies in the training sample that show similar constellations of financial ratios. Although SVMs do not deliver a parametric score function, the local linear approximation of their score function can offer important support for recognising the mechanisms linking different financial ratios with the final score of a company. For these reasons SVMs are regarded as a useful tool for effectively complementing the information gained from classical linear classification techniques.

References

[1] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens and J. Vanthienen, 2003, Benchmarking State-of-the-art Classification Algorithms for Credit Scoring, Journal of the Operational Research Society (2003), 0, -9.
[2] Chih-Wei Hsu, Chih-Chung Chang, Chih-Jen Lin, A Practical Guide to Support Vector Classification, http://www.csie.ntu.edu.tw.
[3] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Repr. 2006, Cambridge University Press, 2000.
[4] Deutsche Bundesbank, How the Deutsche Bundesbank Assesses the Credit Standing of Enterprises in the Context of Refinancing German Credit Institutions, Markets Department, June 2004, http://www.bundesbank.de/download/gm/gm_broschuere_bonitaetunternehmen_en.pdf
[5] B. Engelmann, E. Hayden, D. Tasche, 2003, Measuring the Discriminative Power of Rating Systems, Deutsche Bundesbank Discussion Paper, Series 2: Banking and Financial Supervision, No 01/2003.
[6] E. Falkenstein, 2000, RiskCalc for Private Companies: Moody's Default Model, Moody's Investor Service.
[7] T. Van Gestel, B. Baesens, J. Garcia, P. Van Dijcke, A Support Vector Machine Approach to Credit Scoring, http://www.defaultrisk.com/pp_score_25.htm.
[8] W. K. Härdle, R. A. Moro, D. Schäfer, Rating Companies with Support Vector Machines, DIW Discussion Paper No. 416, Berlin, 2004.
[9] W. K. Härdle, R. A. Moro, D. Schäfer, Support Vector Machines: Eine neue Methode zum Rating von Unternehmen, DIW Wochenbericht No. 49/04, Berlin, 2004.
[10] E. Hayden, Modeling an Accounting-Based Rating System for Austrian Firms, Dissertation, Fakultät für Wirtschaftswissenschaften und Informatik, Universität Wien, June 2002.
[11] E. Mammen, Estimating a Smooth Monotone Regression Function, The Annals of Statistics, Vol. 19, No. 2, June 1991, pp. 724-740.
[12] B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, Cambridge, MA, 2002, http://www.learning-with-kernels.org.
[13] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2000.