Ordinal Classification Method for the Evaluation of Thai Non-life Insurance Companies

Phaiboon Jhopita 1, Sukree Sinthupinyo 2 and Thitivadee Chaiyawat 3

1 Technopreneurship and Innovation Management Program, Graduate School, Chulalongkorn University, Bangkok, Thailand, phaibooj@gmail.com
2 Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand, sukree.s@chula.ac.th
3 Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University, Bangkok, Thailand, Thitivadee@acc.chula.ac.th

Abstract

This paper proposes the use of an ordinal classifier to evaluate the financial solidity of non-life insurance companies as strong, moderate, weak, or insolvency. The study constructs an efficient classification model that regulators can use to evaluate financial solidity and to determine the priority of further examination, as an early warning system. The proposed model helps policy-makers create guidelines for solvency regulation and for the role of the government in protecting the public against insolvency.

Keywords: Ordinal classification, Imbalanced class classification, Solvency condition classification, Non-life insurance companies.

1. Introduction

Thailand's insurance industry is subject to government regulation to protect policyholders, third-party liability claimants, and other related businesses. Solvency supervision, regulation, and solvency position classification are important topics for non-life insurers. Most studies were carried out in the United States, and many previous studies focused on binary classification, in which the class values are unordered (bankrupt/nonbankrupt, solvency/insolvency, or healthy/failed) [2-6]. They did not treat the problem as a multi-class classification. In this paper, we therefore propose an ordinal multi-class classification of the solvency condition.

The Office of Insurance Commission (OIC) of Thailand normally uses the 2009 capital adequacy ratio (CAR) system for non-life insurance to evaluate the capital adequacy, or financial solidity, of non-life insurers (as shown in Table 1). Depending on the condition indicated by the CAR level, actions by the insurance company and by the regulator are required.

TABLE 1 The solvency evaluation and regulatory actions based on the CAR system.

| | Strong | Moderate | Weak | Insolvency |
|---|---|---|---|---|
| Capital adequacy ratio (CAR) | ≥ 150% | 120-150% | 100-120% | < 100% |
| The action level | No action level | Company action level | Regulatory action level | Authorized control & Mandatory control level |

Note:
Company action level: the company must file a plan with the insurance commissioner explaining the cause of the deficiency and how it will be corrected.
Regulatory action level: the commissioner is required to examine the insurer and take corrective action, if necessary.
Authorized control level & Mandatory control level: the commissioner has legal grounds to rehabilitate or liquidate the company; the commissioner is required to seize the company.

The CAR level of an insurer is affected by most insurance activities and decision-making processes, such as premium rate making, determination of the technical reserve, risk undertaking, reinsurance activities, investment, sales, and the credibility of the company to related parties. It is also affected by the country's economy, new legislation, inflation and interest rates [1]. With the help of our system, companies can detect their own solvency condition early and decide on the most suitable policy to reduce their risk.
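As an illustration of how the CAR levels in Table 1 map to the four solvency classes, the following is a minimal Python sketch. It assumes thresholds of 100%, 120% and 150%, and it assumes that a value exactly on a boundary belongs to the higher class; the function names `capital_adequacy_ratio` and `classify_car` are hypothetical and are not part of the OIC system.

```python
def capital_adequacy_ratio(tca: float, tcr: float) -> float:
    """Return CAR in percent: total capital available / total capital required."""
    if tcr <= 0:
        raise ValueError("total capital required must be positive")
    return 100.0 * tca / tcr


def classify_car(car_percent: float) -> str:
    """Map a CAR value (in percent) to a solvency class.

    Thresholds are assumed to be 100%, 120% and 150%; boundary values are
    assigned to the higher class, which is an assumption of this sketch.
    """
    if car_percent < 100.0:
        return "insolvency"
    if car_percent < 120.0:
        return "weak"
    if car_percent < 150.0:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    # Hypothetical insurer: capital available 130, capital required 100.
    car = capital_adequacy_ratio(130.0, 100.0)
    print(car, classify_car(car))
```

The ordering insolvency < weak < moderate < strong produced here is the ordering that the ordinal classifier exploits.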

2. Literature review

Among the many empirical studies in insurance science, several apply different techniques to improve the performance of insolvency prediction and/or classification models. Most studies applied traditional statistical techniques, such as regression analysis [2], multivariate discriminant analysis (MDA) [3, 4, 5], logistic regression (LR) [6], logit and probit models [7-10], and multinomial logistic regression (MLR) [1]. On the other hand, machine learning techniques such as neural networks (NNs) [12-15] and genetic algorithms (GA) [16] were also used in insolvency prediction.

Kramer (1997) evaluated the financial solidity of Dutch non-life insurers by combining a traditional statistical technique (an ordered logit model) with artificial intelligence techniques (a neural network and an expert system). The complete model contains three programs: the logit model, the neural network, and the expert system. Data from 1992 were used as the training set and data from 1993 as the test set. The output of the multi-class classification model is the priority for further examination (High, Medium, or Low). The system, which evaluates financial solidity, can be used to classify insurers according to their degree of risk exposure. The model correctly classified 93% of the test set. It performed very well for strong, medium and weak companies: 96.3% of the strong, 75.0% of the medium and 94.4% of the weak companies were classified correctly.

Pitselis (2009) studied solvency supervision, regulation and insolvency prediction for Greek insurance companies using statistical methodologies, namely discriminant analysis (DA), logistic regression (LR), and multinomial logistic regression (MLR), to distinguish the solvency position in two settings: two-class classification (healthy and insolvency) and multi-class classification (healthy, merged, and insolvency). The paper presented the effects of the solvency position of insurance companies. Company and regulatory actions are required if a company's solvency position falls below the requirement. Due to the imbalanced data problem, especially for insolvent companies, LR and MLR failed to give reliable results. The DA model classified healthy, merged and insolvent companies with 93.5%, 33.3% and 100% accuracy respectively (on the 1998 data set).

2.1 A Simple Approach to Ordinal Classification

Frank and Hall (2001) [17] presented an approach that enables standard classification algorithms to handle ordinal class problems. Frank and Hall applied it in conjunction with a decision tree learner, so that the underlying learning algorithm takes advantage of the ordered class values.

First, the original problem with k ordered class values V = {V_1, ..., V_k} is transformed into k - 1 binary-class problems. Training starts by deriving new datasets from the original dataset, one for each of the k - 1 new class attributes. In the next step, the classification algorithm is applied to generate a model for each of the new datasets.

To predict the class value of an unseen instance, we need to estimate the probabilities of the k original ordinal classes using the k - 1 models. The probability of the first and of the last ordinal class value each depends on a single classifier. In general, the probability distribution over the k class values V_i is derived as follows:

$$\Pr(V_1) = 1 - \Pr(\text{Target} > V_1)$$
$$\Pr(V_i) = \max\{\Pr(\text{Target} > V_{i-1}) - \Pr(\text{Target} > V_i),\ 0\}, \quad 1 < i < k$$
$$\Pr(V_k) = \Pr(\text{Target} > V_{k-1})$$

To classify an instance of unknown class, each of the k - 1 classifiers evaluates the instance, and the probability of each of the k ordinal class values is calculated using the method above. The class with the maximum probability is assigned to the instance.

2.2 Decision Tree Learning Algorithm

The Decision Tree Learning (DTL) algorithm used in this research is J48, as implemented in the WEKA machine learning tool [18]. The J48 class is based on the same concept as the C4.5 decision tree [19]. DTL is a predictive machine learning model that begins with the whole set of training examples. At each step it builds the decision tree from the attribute whose values best classify the current set of examples. The attribute that best discriminates the sample set is chosen based on the concept of entropy. The examples are then divided along edges, one for each value of the attribute. A child node that contains examples from different classes is replaced with a new attribute node, while a child node containing examples from a single class is used as a decision node, at which all examples are classified as the class of the training examples collected in that node.

3. Data and Methodology

The data set used in this study was collected from 70 non-life insurance companies in Thailand. It covers companies that were in operation or became insolvent between 2000 and 2008. During this period, 616 cases (543 strong, 16 moderate, 13 weak and 44 insolvency) were selected as the training data set, as shown in Table 2. The data of year 2009 were used as a separate test set. The data come from the annual reports of the Office of Insurance Commission (OIC); health insurance companies are not included in this study.
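The ordinal decomposition of Frank and Hall described in Section 2.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the k - 1 binary models are assumed to be trained elsewhere and to return estimates of Pr(Target > V_i); the function names are hypothetical; and the last class takes the probability Pr(Target > V_{k-1}), so that the distribution sums to one when the estimates are monotone.

```python
def binary_targets(labels, ordered_classes):
    """Derive the k-1 binary target vectors: the i-th vector marks Target > V_i."""
    rank = {c: i for i, c in enumerate(ordered_classes)}
    return [[rank[y] > i for y in labels]
            for i in range(len(ordered_classes) - 1)]


def ordinal_class_probabilities(p_greater):
    """Combine k-1 estimates Pr(Target > V_i) into k class probabilities."""
    if not p_greater:
        raise ValueError("need at least one binary estimate")
    probs = [1.0 - p_greater[0]]                      # first class
    for i in range(1, len(p_greater)):                # middle classes
        probs.append(max(p_greater[i - 1] - p_greater[i], 0.0))
    probs.append(p_greater[-1])                       # last class
    return probs


def predict_class(p_greater, ordered_classes):
    """Return the class with the maximum combined probability."""
    probs = ordinal_class_probabilities(p_greater)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ordered_classes[best]


if __name__ == "__main__":
    classes = ["insolvency", "weak", "moderate", "strong"]
    # Hypothetical outputs of the three binary models for one company.
    p_greater = [0.9, 0.7, 0.2]
    print(ordinal_class_probabilities(p_greater))
    print(predict_class(p_greater, classes))
```

In the study itself the binary models are J48 trees in WEKA; any learner that outputs class probabilities could play that role in this sketch.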

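The entropy-based attribute selection mentioned for the decision tree learner can be illustrated with a short Python sketch. It computes the entropy of a set of class labels and the information gain of a nominal attribute. This is an illustrative simplification, not the J48 implementation: C4.5 refines the criterion to the gain ratio and handles numeric attributes through threshold splits, both of which are omitted here. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the examples on a nominal attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Toy data: the attribute separates the two classes perfectly.
    labels = ["strong", "strong", "weak", "weak"]
    attribute = ["high", "high", "low", "low"]
    print(entropy(labels), information_gain(attribute, labels))
```

A node whose examples all share one class has zero entropy, which corresponds to the decision nodes described in Section 2.2.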
TABLE 2 Number of non-life insurance companies in this study (data from year 2009 are the separate test set).

| | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | Total | % | 2009 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Insolvency | 5 | 3 | 6 | 5 | 7 | 4 | 6 | 5 | 4 | 45 | 7.1% | 6 |
| Weak | 1 | 1 | 1 | 1 | 2 | 0 | 3 | ? | ? | 13 | 2.1% | 1 |
| Moderate | 0 | 1 | 1 | 2 | 2 | 4 | 3 | ? | ? | 17 | 2.6% | 1 |
| Strong | 64 | 65 | 62 | 62 | 59 | 60 | 56 | 56 | 57 | 541 | 88.1% | 57 |
| Total | 70 | 70 | 70 | 70 | 70 | 68 | 68 | 65 | 65 | 616 | 100% | 65 |

Note: The solvency condition in this study is determined by the capital adequacy ratio = Total capital available (TCA) / Total capital required (TCR).

Attribute selection started from 13 attributes. We chose them from those most commonly used in empirical studies of insurance science; they were found significant in previous studies on predicting the solvency of non-life insurers [1-11, 13-16]. In this paper, we select the relevant attributes using the correlation-based attribute subset evaluator with greedy stepwise search. All 13 attributes are shown in Table 3.

TABLE 3 Attributes used in this study.

| Attribute | Description |
|---|---|
| V1 | Net premiums written / policyholders' surplus |
| V2 | Solvency margin to minimum required solvency margin |
| V3 | Policyholders' surplus & technical reserve to net written premium |
| V4 | Claims incurred to policyholders' surplus & technical reserve |
| V5 | Gross agent's balance to policyholders' surplus |
| V6 | Change in policyholders' surplus |
| V7 | Investment yield |
| V8 | Investment assets to policyholders' surplus |
| V9 | Return on total assets (ROA) |
| V10 | Loan & other investment to policyholders' surplus |
| V11 | Loss reserve & unpaid losses to policyholders' surplus |
| V12 | Capitalization ratio |
| V13 | Auto lines net written premium to total net written premium |

After analyzing the training data, we found that its class distribution was imbalanced, as shown in Table 2. Classifying data with an imbalanced class distribution significantly degrades the performance of most standard classifiers, which assume a relatively balanced class distribution and equal misclassification costs [20]. Many techniques have been proposed to address this problem, for example re-sampling methods for balancing the data set, modification of existing learning algorithms, measuring classifier performance in imbalanced domains, and studying the relationship between class imbalance and other data complexity characteristics [21].

To attack the imbalanced data set problem, we employ the standard resample technique, which produces a new random set of data by sampling with replacement. The distribution of the data set after resampling is presented in Table 4.

TABLE 4 Training data set after applying the resample technique.

| | Original data set | % | Resample data set | % |
|---|---|---|---|---|
| Insolvency | 45 | 7.3% | 157 | 25.5% |
| Weak | 13 | 2.1% | 137 | 22.2% |
| Moderate | 17 | 2.8% | 144 | 23.4% |
| Strong | 541 | 87.8% | 178 | 28.9% |
| Total | 616 | 100.0% | 616 | 100.0% |

In this study, we use the ordinal class classifier with the DTL algorithm as the base classifier. Figure 1 shows the classification process. Figs. 2 and 3 show the two testing approaches, 10-fold cross-validation and the 70:30% split of the data set.

Fig. 1 Model construction: the original data set passes through resampling and feature selection; the ordinal classifier (DTL) is evaluated by 10-fold cross-validation, by a 70:30% split of the data set, and on the test set (year 2009) to produce the results.
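The resampling step can be sketched in Python as follows. This is a sketch under assumptions, not the WEKA filter itself: it draws a new set of the same size with replacement, and the hypothetical parameter `bias_to_uniform` blends the original class proportions (0.0) with a uniform class distribution (1.0). A near-uniform setting is assumed here because the resampled class sizes reported in the study are roughly balanced.

```python
import random
from collections import Counter


def resample_with_replacement(rows, labels, bias_to_uniform=1.0, seed=1):
    """Draw len(rows) examples with replacement, biasing classes toward uniform."""
    if not 0.0 <= bias_to_uniform <= 1.0:
        raise ValueError("bias_to_uniform must lie in [0, 1]")
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter(labels)
    classes = sorted(counts)
    # Target class weights: blend of original proportions and uniform.
    weights = [(1.0 - bias_to_uniform) * counts[c] / n
               + bias_to_uniform / len(classes) for c in classes]
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in classes}
    new_rows, new_labels = [], []
    for _ in range(n):
        c = rng.choices(classes, weights=weights, k=1)[0]  # pick a class
        i = rng.choice(by_class[c])                        # pick one of its examples
        new_rows.append(rows[i])
        new_labels.append(c)
    return new_rows, new_labels


if __name__ == "__main__":
    # Class sizes of the original training set; rows are placeholder indices.
    labels = ["S"] * 541 + ["I"] * 45 + ["W"] * 13 + ["M"] * 17
    rows = list(range(len(labels)))
    _, new_labels = resample_with_replacement(rows, labels)
    print(Counter(new_labels))
```

Because minority examples are duplicated many times, evaluation on resampled data can be optimistic; a held-out set that is not resampled, such as the 2009 data, gives a more conservative estimate.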

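The two testing approaches, 10-fold cross-validation and the 70:30% split, can be sketched with index lists in Python. This is a minimal sketch with hypothetical function names; it shuffles without stratification, whereas a tool such as WEKA may stratify the folds by class, so the exact partitions will differ.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def holdout_split(n, train_fraction=0.7, seed=1):
    """Return (train, test) index lists for a single percentage split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


if __name__ == "__main__":
    train, test = holdout_split(616)
    print(len(train), len(test))
    for fold_train, fold_test in k_fold_indices(616):
        print(len(fold_train), len(fold_test))
```

With 616 instances, the 70:30% split leaves 185 instances for testing, which matches the size of the split test set used in the experiments.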
Fig. 2 10-fold cross-validation: in each of Experiments 1 to 10, the total set of examples is divided into training examples and test examples.

Fig. 3 70:30% split of the data set: the total set of examples is divided into a 70% training set and a 30% test set.

4. Experiments and Results

This paper used 10-fold cross-validation, a 30% split test set and a separate test set (the 2009 data set). The classification results are shown in Tables 5, 6, and 7.

TABLE 5 Classification results obtained from 10-fold cross-validation (616 instances in total).

| | I | W | M | S | Total | Correctly |
|---|---|---|---|---|---|---|
| I | 154 | 3 | 0 | 0 | 157 | 98.1% |
| W | 0 | 137 | 0 | 0 | 137 | 100.0% |
| M | 0 | 0 | 144 | 0 | 144 | 100.0% |
| S | 0 | 0 | 5 | 173 | 178 | 97.2% |
| Total | | | | | 616 | 98.7% |

I = insolvency, W = weak, M = moderate, S = strong

TABLE 6 Classification results from the 30% split test set (185 instances in total).

| | I | W | M | S | Total | Correctly |
|---|---|---|---|---|---|---|
| I | 49 | 2 | 0 | 0 | 51 | 96.1% |
| W | 0 | 44 | 0 | 0 | 44 | 100.0% |
| M | 0 | 0 | 40 | 3 | 43 | 93.0% |
| S | ? | ? | ? | 44 | 47 | 93.6% |
| Total | | | | | 185 | 95.7% |

I = insolvency, W = weak, M = moderate, S = strong

TABLE 7 Classification results from the test set (2009 data set, 65 instances in total).

| | I | W | M | S | Total | Correctly |
|---|---|---|---|---|---|---|
| I | 4 | 2 | 0 | 0 | 6 | 66.7% |
| W | 0 | 1 | 0 | 0 | 1 | 100.0% |
| M | 0 | 0 | 1 | 0 | 1 | 100.0% |
| S | 0 | 0 | 3 | 54 | 57 | 94.7% |
| Total | | | | | 65 | 92.3% |

I = insolvency, W = weak, M = moderate, S = strong

The results of applying the ordinal class classifier with the DTL algorithm to the data introduced above depend on our selected financial ratios (attributes). The model performs well: it correctly classifies 98.7% of instances in 10-fold cross-validation, 95.7% in the 30% split test set, and 92.3% in the separate test set. The model classifies the minority classes well, but it fails to recognize the insolvency class reliably in the separate test set (66.7% correctly classified). The relative importance of each attribute (input variable) is analyzed by calculating the weak class of the relationship between each input and output attribute.

TABLE 8 Performance evaluation measures.

| Evaluation method | MAE | RMSE |
|---|---|---|
| 10-fold cross-validation | 0.032 | 0.0838 |
| 30% split test set | 0.028 | 0.475 |
| Test set (2009 data set) | 0.0453 | 0.985 |

MAE = mean absolute error, RMSE = root mean squared error

Table 8 presents the performance evaluation measures of numeric prediction.
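The evaluation measures used in this section can be reproduced with a few lines of Python. The sketch below computes per-class and overall accuracy from a confusion matrix, and the mean absolute error and root mean squared error of a list of predictions. The matrix holds the 2009 test-set counts as read from Table 7, assuming that rows are actual classes and columns are predicted classes and that the weak and moderate rows each contain one correctly classified company. The function names are hypothetical.

```python
import math


def per_class_accuracy(matrix):
    """Fraction of each actual class (row) that lies on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all instances that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mae(predicted, actual):
    """Mean absolute error over paired predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error over paired predictions and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Rows and columns in the order I, W, M, S (assumed orientation: actual x predicted).
TABLE_7 = [
    [4, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 3, 54],
]

if __name__ == "__main__":
    print([round(100 * a, 1) for a in per_class_accuracy(TABLE_7)])
    print(round(100 * overall_accuracy(TABLE_7), 1))
```

For a nominal class, tools such as WEKA typically compute MAE and RMSE on the predicted class probabilities rather than on class indices, so these two helpers would be applied to probability vectors in that setting.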
In this study, we evaluated the performance of prediction by MAE and RMSE, which are given by

$$\text{MAE} = \frac{|p_1 - a_1| + \dots + |p_n - a_n|}{n}$$

$$\text{RMSE} = \sqrt{\frac{(p_1 - a_1)^2 + \dots + (p_n - a_n)^2}{n}}$$

where p_1, p_2, ..., p_n denote the predicted values on the test instances and a_1, a_2, ..., a_n denote the actual values.

5. Conclusions

The experimental setting and the results reported in the previous section indicate that the obtained model can handle both the multi-class classification problem and the imbalanced data set. In this study, we employ the ordinal class classifier to solve the multi-class problem, so that our model can classify the solvency condition of Thai non-life insurance companies into four cases: strong, moderate, weak, and insolvency. To attack the problem of the imbalanced data set, we use the standard resample technique, which substantially improves the accuracy on the minority classes, the classes we are most interested in. Our final model is useful for insurance regulators, auditors, investors, management, policyholders, and related parties to determine the priority for further examination, as an early warning system. In further research, we will apply ensemble methods together with the standard classifiers proposed here to better address the imbalanced data set problem.

References

[1] P. Georgios, An Overview of Solvency Supervision, Regulations and Insolvency Prediction, Belgian Actuarial Journal, Vol. 8, 2009, pp. 37-53.
[2] H. Scott, and N. Jack M., A Regression-Based Methodology for Solvency Surveillance in the Property-Liability Insurance Industry, The Journal of Risk and Insurance, Vol. 53, 1986, pp. 583-605.
[3] T. James S., and P. George E., A Multivariate Model for Predicting Financially Distressed Property-Liability Insurance, The Journal of Risk and Insurance, Vol. 40, 1973, pp. 327-338.
[4] A. Jan Mills, and S. J. Allen, Using Best's Ratings, Financial Ratio and Prior Probabilities in Solvency Prediction, The Journal of Risk and Insurance, Vol. 55, 1988, pp. 229-244.
[5] C. James M., and H. Robert E., Life Insurer Financial Distress: Models and Empirical Evidence, The Journal of Risk and Insurance, Vol. 62, 1995, pp. 764-775.
[6] C. J. David, G. Martin F., and P. Richard D., Regulatory Solvency Prediction in Property-Liability Insurance: Risk-Based Capital, Audit Ratios, and Cash Flow Simulation, The Journal of Risk and Insurance, Vol. 66, No. 3, 1998, pp. 417-458.
[7] B. Ran, and H. Robert A., Classifying Financial Distress in the Life Insurance Industry, The Journal of Risk and Insurance, Vol. 57, 1990, pp. 110-136.
[8] A. Jan M., and C. Anne M., Using Best's Ratings in Life Insurer Insolvency Prediction, The Journal of Risk and Insurance, Vol. 61, 1994, pp. 317-327.
[9] L. Suk Hun, and U. Jorge L., Analysis and Prediction of Insolvency in the Property-Liability Insurance Industry: A Comparison of Logit and Hazard Models, Journal of Risk and Insurance, Vol. 63, 1996, pp. 121-130.
[10] B. Ran, and H. John, The Merger or Insolvency Alternative in the Insurance Industry, The Journal of Risk and Insurance, Vol. 64, 1997, pp. 89-113.
[11] E.H. Dunnett, and R.A. Hershbarger, Identifying Financial Distress in the Property-Casualty Industry, Journal of the Society of Insurance Research, Vol. 2, 1990, pp. 33-45.
[12] C.S. Huang, R.E. Dorsey, and M.A. Boose, Life Insurer Financial Distress Prediction: A Neural Network Model, Journal of Insurance Regulation, Vol. 13, 1994, pp. 131-167.
[13] B. Patrick L., C. William W., G. Linda L., and P. Utai, A Neural Network Method for Obtaining an Early Warning of Insurer Insolvency, The Journal of Risk and Insurance, Vol. 61, 1994, pp. 402-424.
[14] K. Bert, N.E.W.S.: A model for the evaluation of non-life insurance companies, European Journal of Operational Research, Vol. 98, 1997, pp. 419-430.
[15] H. Shu-Hua, and W. Thou-jen, A study of financial insolvency prediction model for life insurers, Expert Systems with Applications, Vol. 36, 2009, pp. 6100-6107.
[16] S.S. Sancho, F.V. Jose-Luise, S.V. Maria Jesus, B.C. Carlos, Genetic programming for the prediction of insolvency in non-life insurance companies, Computers & Operations Research, Vol. 32, 2005, pp. 749-765.
[17] F. Eibe, and H. Mark, A simple approach to ordinal classification. In L. de Raedt and P. A. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning, 2001, pp. 145-156.
[18] H. Mark, F. Eibe, H. Geoffrey, P. Bernhard, R. Peter, and W. Ian H., The WEKA Data Mining Software: An Update, SIGKDD Explorations, Vol. 11, Issue 1, 2009.
[19] J.R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.
[20] S. Yanmin, K. Mohamed S., W. Andrew KC., W. Yang, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, Vol. 40, 2007, pp. 3358-3378.
[21] V. Garcia, J.S. Sánchez, R.A. Mollineda, R. Alejo, J.M. Sotoca, The class imbalance problem in pattern classification and learning, 2007, pp. 283-291.