Performance Analysis and Coding Strategy of ECOC SVMs

International Journal of Grid and Distributed Computing Vol.7, No. (2014), pp.67-76
http://dx.do.org/0.457/jgdc.04.7..07

Zhigang Yan and Yuanxuan Yang

School of Environmental Science and Spatial Informatics, China University of Mining and Technology, Xuzhou, Jiangsu, P.R.China
Jiangsu Key Laboratory of Resources & Environmental Information Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, P.R.China
Corresponding author: Zhigang Yan, zhg-yan@63.com

Abstract

The theoretical upper bound of the generalization error for ECOC SVMs is derived based on the Fat-Shattering dimension and covering number. The factors affecting the generalization performance of ECOC SVMs are analyzed. From the analysis, it is believed that in real classification tasks, the performance of ECOC depends on the performance of the classifiers corresponding to its coding columns rather than on the mathematical characteristics of the ECOC itself. The essence of ECOC SVMs is how to construct an optimal voting machine consisting of a number of SVMs; the most important issues are how to choose Sub-SVMs that have better generalization ability and how to determine the number of Sub-SVMs taking part in voting. Data sets including Segment are selected for testing. All the ECOC code columns are constructed using an exhaustive technique. A Sub-SVM is trained for each code column, and the generalization ability of each Sub-SVM is evaluated by classification intervals and error rates estimated by cross validation. Then, all the ECOC code columns are sorted by the generalization performance of the Sub-SVMs. Three categories of ECOC SVMs (superior, inferior and ordinary) are constructed from the sorted ECOC code columns by using forward, backward and original sequences. Experimental results show that ECOC SVMs consisting of Sub-SVMs with better generalization ability perform better, and vice versa, which validates our view and points out the direction for improving ECOC SVMs.

Keywords: ECOC, SVM, Generalization Ability, Code Matrix
1. Introduction

Numerous supervised learning algorithms are designed for two-class problems, for example, support vector machines (SVM) [1]. However, in real applications, many problems are multiclass problems. Therefore, generalizing SVM to deal with multiclass problems is still an important research activity in machine learning. Currently, the usual practice is to convert a multiclass problem into a number of two-class problems and then combine them in some way to realize classification into multiple classes. Error Correcting Output Codes (ECOC) is one of the commonly used ways of combination [2], and the resulting classifiers are called ECOC SVMs. However, there is no general coding method which can generate an appropriate ECOC for any number of classes. Furthermore, the existing coding strategies are based on research into the mathematical features of the code matrix, which ignores the fundamentals of classification, making it difficult for ECOC SVMs and their applied research to progress.

In this study, it is believed that, in real classification problems, different coding sequences of ECOC SVMs have different meanings. The performance of coding does not depend on the code matrix itself; instead, it depends on the performance of the real classification problems corresponding to the code columns. According to this viewpoint, we attempt to investigate ECOC SVMs from real classification problems in this study, which points out the direction for improving ECOC SVMs.

ISSN: 2005-4262 IJGDC
Copyright © 2014 SERSC

2. Code Matrix of ECOC and its Corresponding Classifiers

2.1. Code Matrix of ECOC

ECOC is a coding matrix consisting of {0,1}, shown in Table 1, denoted as $M_{Q \times S}$. In multiclass problems, the number of rows Q represents the number of classes, while the number of columns S represents the number of classifiers to be trained. When $M_{qs}=1$ ($M_{qs}=0$), the samples of the q-th class are positive (negative) for the s-th classifier $f_s$. The working process of ECOC is divided into two phases: training and classification. In the training phase, the classifier $f(x) = (f_1(x), \ldots, f_S(x))$ is trained according to the above-mentioned principle; in the classification phase, for a new sample X, the distances between the output vector and the class code words are calculated. The class with the minimum distance is the classification result, which is given by:

$$K = \arg\min_{q = 1, \ldots, Q} d(M_q, f(X)) \qquad (1)$$

where K is the class of X, and d is the distance function. The Hamming Distance (HD) is usually used:

$$d(M_q, f(x)) = \sum_{s=1}^{S} \left| M_{qs} - \operatorname{sgn}(f_s) \right| \qquad (2)$$

where the sign of $f_s$ is read as the bit 1 (positive) or 0 (negative).

Table 1. All Possible ECOC Columns for a 4-Class Problem

Class   Code Word
        f1  f2  f3  f4  f5  f6  f7
C0      0   0   0   0   0   0   0
C1      0   0   0   1   1   1   1
C2      0   1   1   0   0   1   1
C3      1   0   1   0   1   0   1

For ECOC, when two coded rows are the same, the classes corresponding to those rows cannot be distinguished; when two coded columns are the same, they correspond to the same classifier, so deleting one of them does not affect the output; when the codes of two columns are complementary, the outputs of their corresponding classifiers are complementary, so the two classifiers are effectively identical; columns of all 0 or all 1 make no sense, because they cannot be used to train classifiers.
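The classification phase described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the example code matrix (its column order in particular) and the classifier outputs are all assumptions, and a positive output is read as bit 1.

```python
def hamming_distance(code_word, bits):
    """Count the positions where a class code word and the output bits differ."""
    return sum(1 for m, b in zip(code_word, bits) if m != b)


def ecoc_decode(code_matrix, decision_values):
    """Return the row index (class) whose code word is closest to the outputs.

    decision_values holds one real-valued classifier output per column.
    Assumed convention: a positive value is bit 1, anything else is bit 0.
    """
    bits = [1 if v > 0 else 0 for v in decision_values]
    distances = [hamming_distance(row, bits) for row in code_matrix]
    return min(range(len(code_matrix)), key=distances.__getitem__)


# Hypothetical 4-class code matrix with 7 columns (rows are classes C0..C3).
M = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

# Made-up outputs: the bits are 0 1 1 0 1 1 1, i.e. the code word of C2
# with the fifth classifier wrong.
outputs = [-0.8, 0.9, 1.2, -0.3, 0.4, 0.7, 1.1]
predicted = ecoc_decode(M, outputs)
```

In this example the output differs from the code word of C2 in a single position and from every other code word in at least three, so the single classifier error is absorbed by the code.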
In short, an available ECOC must satisfy the following conditions:
(1) The rows of the coding matrix are not correlated, and the columns of the coding matrix are neither correlated nor complementary;
(2) None of the columns is all 0 or all 1;
(3) For a k-class problem, the coding length L must satisfy $\log_2 k \le L \le 2^{k-1} - 1$.
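These conditions can be checked mechanically. The sketch below is one possible reading, with assumptions stated plainly: "not correlated" is interpreted as "not identical", the helper names are hypothetical, and the exhaustive construction fixes the first class to bit 0 so that only one column of each complementary pair is produced.

```python
from itertools import product
from math import ceil, log2


def exhaustive_columns(k):
    """All usable columns for k classes, one per complementary pair.

    Fixing the first class to 0 removes complements; skipping the all-zero
    tail removes the constant column, leaving 2**(k-1) - 1 columns.
    """
    return [(0,) + tail for tail in product((0, 1), repeat=k - 1) if any(tail)]


def is_valid_ecoc(matrix):
    """Check conditions (1)-(3) for a 0/1 code matrix given as a list of rows."""
    rows = [tuple(r) for r in matrix]
    k, length = len(rows), len(rows[0])
    if len(set(rows)) != k:                      # identical rows: classes clash
        return False
    seen = set()
    for s in range(length):
        col = tuple(r[s] for r in rows)
        comp = tuple(1 - b for b in col)
        if len(set(col)) == 1:                   # all 0 or all 1
            return False
        if col in seen or comp in seen:          # identical or complementary
            return False
        seen.add(col)
    return ceil(log2(k)) <= length <= 2 ** (k - 1) - 1


cols4 = exhaustive_columns(4)                    # 7 columns for 4 classes
matrix4 = [list(row) for row in zip(*cols4)]     # transpose columns into rows
```

For 7 classes the same construction yields 63 columns, which matches the upper limit in condition (3).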

According to coding theory, an error correction code with minimum HD d can correct $\lfloor (d-1)/2 \rfloor$ bit errors. Therefore, for an output code to have error correction ability, the HD between code words should be at least 3.

Dietterich proposed four commonly used ECOC coding methods [2], including Exhaustive Codes, Column Selection from Exhaustive Codes, Randomized Hill Climbing Codes and BCH Codes. In addition, Crammer and Singer proposed the concept of continuous coding [3]. Utschick proposed an expectation maximization coding algorithm [4], in which the ECOC is selected by maximizing a constructed objective function. Kuncheva used hybridization and mutation from evolutionary algorithms to derive new ECOC codes from random ones [5]. A more recent general coding method is the search coding method proposed in reference [6]. The method is not only suitable for problems with any number of classes, but can also automatically generate alternative codes according to different criteria, including the number of classes and the minimum HD. However, it cannot deal with the problem caused by identical columns.

For the evaluation of coding performance, Masulli and Valentini believe that the performance of ECOC is related to many factors, including the similarity of code words, the performance of the classifiers, the complexity of the real problems, the choice of classifiers, and the correlation of the coding columns [7]; Xia believes the performance of ECOC is related to the coding length, the minimum HD between code words, and the distribution order of the code words [8].

It can be seen from the above that both the evaluation of ECOC coding performance and its application in classification start from the coding itself, while little attention is paid to classification. Next, we introduce ECOC SVMs in real classification problems.

2.2. SVM

SVM is a machine learning method based on statistical learning theory. To resolve the pattern recognition problem, a calculable recognition function $y = f(x)$, $x \in R^n$, $y \in \{-1, 1\}$, is found.
For the given samples $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, $x_i \in R^n$, $y_i \in \{-1, 1\}$, a hyperplane (decision surface) needs to be found, namely $W \cdot x + b = 0$, $W \in R^n$, $b \in R$, and the corresponding recognition function is:

$$f(x) = \operatorname{sgn}((W \cdot x) + b) \qquad (3)$$

The decision surface should meet the following constraints:

$$y_i [W \cdot x_i + b] \ge 1 - \xi_i, \quad i = 1, 2, \ldots, l \qquad (4)$$

The optimal decision surface should maximize the smallest distance from the two classes of samples to the decision surface. Hence, the classification problem becomes the following minimization problem, subject to the condition of formula (4) with $\xi_i \ge 0$:

$$\min: \Phi(W) = \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{l} \xi_i \qquad (5)$$

The first term in the formula maximizes the smallest distance from the two classes of samples to the decision surface, while the second term minimizes the error, and the constant C trades off the two. This constrained optimization problem can be resolved with the Lagrangian approach, and the corresponding classification function becomes:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i (x_i \cdot x) + b\right) \qquad (6)$$

For the non-linearly separable condition, a non-linear function $\varphi$ can be found that maps the data to a high dimensional feature space, in which an optimal hyperplane is established, and the corresponding classification function is as follows:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i (\varphi(x_i) \cdot \varphi(x)) + b\right) \qquad (7)$$

Only the dot product $K(x, y) = \varphi(x) \cdot \varphi(y)$ in the high dimensional feature space is considered in SVM theory, in which $K(x, y)$ is called the kernel function, and the function $\varphi$ is not used directly; hence, formula (7) can be transformed into formula (8):

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\right) \qquad (8)$$

The common kernel functions include the linear kernel function, $K(x, y) = (x \cdot y)$, and the RBF kernel function, $K(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$.

2.3. ECOC SVMs

Combining SVMs with ECOC to classify multiple classes, we have ECOC SVMs. The upper bound of the generalization error for ECOC SVMs is derived in reference [8] based on the concept of Fat-Shattering dimension and covering number. Assuming m samples can be correctly classified by k-class ECOC SVMs, with ECOC coding length L, minimum HD between code words d, and the SVM classification intervals sorted in descending order denoted by $\gamma_1, \gamma_2, \ldots, \gamma_L$, the generalization error of the ECOC SVMs, with probability at least $1 - \delta$, is no larger than:

$$\frac{130 R^2}{m}\left( D' \log(4em) \log(16m) + \log\frac{2(2m)^{M} N K!}{\delta} \right) \qquad (9)$$

where $D' = \sum_{i=1}^{L} 1/\gamma_i^2$, R is the minimum radius of the enclosing ball, $M = L - (d-1)/2$, and N is the number of codes with coding length L and HD d between codes. Each group has K code words.
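Returning to the kernel classifier of formula (8), a minimal sketch of how the decision is evaluated may help. Everything named here is illustrative: the two support vectors, their multipliers and the bias form a toy hard-margin solution worked out by hand, and the RBF width `sigma` and its exact placement in the exponent are assumptions.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / sigma^2); the scaling is an assumed form."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / sigma ** 2)


def decide(x, support_vectors, labels, alphas, b, kernel):
    """Evaluate sgn(sum_i alpha_i * y_i * K(x_i, x) + b) and return +1 or -1."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if total + b >= 0 else -1


# Toy problem: one support vector per class, symmetric about the origin.
# For the linear kernel, alpha = 0.25 each and b = 0 give W = (0.5, 0.5),
# which puts both support vectors exactly on the margin.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

label_linear = decide((2.0, 0.5), support_vectors, labels, alphas, b, linear_kernel)
label_rbf = decide((-2.0, -1.0), support_vectors, labels, alphas, b, rbf_kernel)
```

Swapping `linear_kernel` for `rbf_kernel` changes only the similarity measure; the decision rule itself is unchanged, which is the point of the kernel substitution.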
It is believed in [8] that:
(1) Given a fixed code length, the longer the minimum HD between codes, the better the generalization ability of ECOC SVMs;
(2) Given a fixed minimum HD between codes, the longer the code length, the worse the generalization ability of ECOC SVMs;
(3) Once the code length and the minimum HD between codes are fixed, there exists an optimal allocation order for code words which guarantees the ECOC SVMs the best generalization ability.

The relation between the generalization ability of ECOC SVMs and the minimum HD between codes and the code length is discussed in [8]. However, no discussion of the relation between the minimum HD between codes and the code length is given. Furthermore, it is believed in [8] that an optimal coding sequence exists, but no way of finding it from the coding itself is provided. Clearly, it is difficult to determine the code length and the code sequence from a mathematical point of view. Exhaustive search is infeasible and does not provide a reasonable explanation of the coding. Intuitively, the answer should be found in the classification problems themselves.

3. New Understandings about ECOC SVMs

Analyzing formula (9) again, one knows that D', M and N affect the upper bound of the generalization error of ECOC SVMs. In formula (9), $M = L - (d-1)/2$, and the minimum HD d is related to the code length L. Generally, the larger L, the larger d. However, the increment of d is less than or equal to that of L. Therefore, M is non-decreasing in L. N is also related to L and increases when L increases. Since $D' = \sum_{i=1}^{L} 1/\gamma_i^2$, D' also increases with the code length L. Therefore, the generalization ability of ECOC SVMs decreases when L increases.

In this study, we believe that ECOC SVMs should have enough Sub-SVMs for decision. That is to say, L should be big enough. However, a bigger L may harm the performance of ECOC SVMs. Thus, the value of L should be a compromise. When L is determined, the effect of D' on the generalization ability of ECOC SVMs is the major one. If one selects L Sub-SVMs with good generalization ability and constructs ECOC SVMs with these Sub-SVMs, the generalization ability should be good. If it is impossible to evaluate the generalization ability of each Sub-SVM, or the difference between them is insignificant, the effect of M and N on the performance of ECOC SVMs should be considered instead, which is the conclusion of reference [8]. A voting process is used to vividly describe the above analysis.
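The role of D' in this argument can be made concrete with a small numeric sketch. It assumes that D' is the sum of the inverse squared classification intervals of the chosen Sub-SVMs; the interval values and function names below are invented for illustration only.

```python
def d_prime(intervals):
    """Assumed form of D': the sum of 1 / gamma_i**2 over the chosen Sub-SVMs."""
    return sum(1.0 / g ** 2 for g in intervals)


def pick_largest(intervals, length):
    """Indices of the `length` Sub-SVMs with the largest intervals."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i], reverse=True)
    return order[:length]


# Hypothetical classification intervals of six candidate Sub-SVMs.
gammas = [0.9, 0.2, 0.5, 0.7, 0.1, 0.4]

best = pick_largest(gammas, 3)                              # indices 0, 3, 2
worst = sorted(range(len(gammas)), key=gammas.__getitem__)[:3]

d_best = d_prime([gammas[i] for i in best])                 # about 7.28
d_worst = d_prime([gammas[i] for i in worst])               # about 131.25

# Adding any further column can only increase D', since every term is positive.
d_best_plus_one = d_prime([gammas[i] for i in pick_largest(gammas, 4)])
```

With the same code length, the three widest-interval Sub-SVMs give a far smaller D' than the three narrowest, and every extra column raises D', which is the trade-off the paragraph above describes.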
The essence of ECOC SVMs is to train a number of two-class SVMs, then determine the class of an unknown sample according to the classification results of these two-class SVMs. Using the minimum HD to determine the class of a sample is equivalent to voting. In the voting stage, each Sub-SVM votes for the classes which it supports; in the counting stage, the sample class is the class which best agrees with the results of the Sub-SVMs. Each column of the code corresponds to a Sub-SVM; therefore, the process of constructing ECOC determines which of the Sub-SVMs have the voting right. Clearly, it is important to give the voting right to those Sub-SVMs with good generalization ability. Thus, the coding problem is actually how to construct an optimal voting machine consisting of a number of two-class SVMs, where how to select Sub-SVMs with good generalization ability and how to determine the number of Sub-SVMs taking part in voting are the two important factors. Next, experiments are used to validate the viewpoint.

It is pointed out in [1] that the VC dimension and the classification interval in the linearly separable case are the criteria for the generalization ability of an SVM. However, the VC dimension is difficult to determine and therefore difficult to apply. The classification interval is a reliable criterion for evaluating the generalization ability of an SVM, but it requires that the SVM can linearly separate the samples, which is usually difficult to satisfy. When samples are not linearly separable but become linearly separable after being mapped into a high dimensional space, the generalization ability of an SVM is described by the classification interval in the high dimensional space. In this situation, the classification interval of an SVM is $2/\|W\|$, where W is the normal vector of the classification hyperplane of the SVM. W itself cannot be calculated directly, but its norm can. Noticing the duality of (6) and (8), one can obtain:

$$\|W\|^2 + C \sum_{i} \xi_i = \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (10)$$

When samples are completely separable, $C \sum_i \xi_i = 0$, and (10) simplifies to:

$$\|W\|^2 = \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (11)$$

$\|W\|$ can be calculated according to (11), and thereby the classification interval in the high dimensional space can also be obtained.

When samples are only partially separable, we use the error rate E of cross validation to evaluate the generalization ability of SVMs. The smaller E, the better the generalization ability. To reduce misclassification, the classification interval is also considered when using E as an evaluation criterion. But the classification interval now includes the misclassified samples, and their effect should be eliminated when calculating it.

Data sets from the UCI database are selected for testing, including Segment, Landsat, Optdigits, Zoo, Page Blocks, etc. Linear and RBF kernels are used in the test. The process of the test is as follows. Given k classes of samples, all the ECOC code columns are constructed by the exhaustive method, totaling $2^{k-1} - 1$ columns. For each code column, a Sub-SVM is trained. Then the code columns are sorted by the generalization ability of the Sub-SVMs, according to the following rules:
(1) When samples are linearly separable, the linear kernels are chosen. They have lower VC dimensions and better generalization ability compared with RBF kernels;
(2) When samples are linearly inseparable but separable if RBF kernels are used, the RBF kernels are chosen;
(3) When samples are completely inseparable, the kernel functions with higher accuracies are chosen;
(4) When samples are separable, sort by the classification interval in descending order;
(5) When samples are inseparable, sort by the error rate E in ascending order, with the classification intervals taken as a secondary reference.
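The separable-case computation and the sorting rules (4) and (5) can be sketched as follows. The assumptions are labeled: the interval is taken as 2/||W|| with ||W||^2 given by the double sum over multipliers, labels and kernel values; separable columns are ranked ahead of inseparable ones (the text does not state how the two groups are merged); and the per-column statistics are invented.

```python
import math


def squared_norm_w(alphas, labels, gram):
    """||W||^2 = sum_{i,j} alpha_i alpha_j y_i y_j K(x_i, x_j), gram[i][j] = K(x_i, x_j)."""
    n = len(alphas)
    return sum(alphas[i] * alphas[j] * labels[i] * labels[j] * gram[i][j]
               for i in range(n) for j in range(n))


def classification_interval(alphas, labels, gram):
    """Interval 2 / ||W|| (assumed convention for the margin width)."""
    return 2.0 / math.sqrt(squared_norm_w(alphas, labels, gram))


def rank_columns(stats):
    """Order column names from best to worst Sub-SVM.

    Separable columns come first, widest interval first (rule 4); the rest
    follow by ascending cross-validation error, ties broken by interval (rule 5).
    Putting separable columns first is an assumption of this sketch.
    """
    def key(item):
        _, s = item
        if s["separable"]:
            return (0, 0.0, -s["interval"])
        return (1, s["cv_error"], -s["interval"])
    return [name for name, _ in sorted(stats.items(), key=key)]


# Toy linear example: support vectors (1, 1) and (-1, -1), alpha = 0.25 each.
alphas, labels = [0.25, 0.25], [1, -1]
gram = [[2.0, -2.0], [-2.0, 2.0]]
interval = classification_interval(alphas, labels, gram)   # distance between the two points

# Hypothetical statistics for four code columns.
stats = {
    "f1": {"separable": True, "interval": 1.2, "cv_error": 0.0},
    "f2": {"separable": False, "interval": 0.8, "cv_error": 0.10},
    "f3": {"separable": True, "interval": 2.0, "cv_error": 0.0},
    "f4": {"separable": False, "interval": 0.5, "cv_error": 0.04},
}
ranking = rank_columns(stats)
```

In the toy example the computed interval equals the distance between the two support vectors, which is a quick sanity check that the double sum reproduces ||W||.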
A number of code columns are chosen in forward order from the sorted ECOC code columns to train the ECOC SVMs; then the same number of code columns are chosen in reverse order as the comparison group. The original order of the exhaustive code is also kept unchanged, and the same number of code columns are successively chosen from it as the reference group. The experimental results of the selected data sets are basically the same. Taking the Segment data set as an example, the results are shown in Figure 1. There are 7 classes in the Segment data set, with each sample having 19 features. Each class provides 30 training samples and 300 test samples. The code length ranges from 3 to 63. In the experiments, ECOC SVMs consisting of 10 to 63 code columns are tested. RBF kernels have higher accuracy, so they are chosen in the experiments. To facilitate the test, the same parameters are used for all Sub-SVMs.
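The construction of the three groups can be sketched as below. Column identifiers stand in for the real code columns, and the function name and the stand-in sorted order are hypothetical; only the selection logic follows the description above.

```python
def build_groups(sorted_columns, original_columns, n):
    """Pick n columns for each of the three compared ECOC SVMs.

    sorted_columns is ordered from the best Sub-SVM to the worst;
    original_columns keeps the order produced by the exhaustive method.
    """
    return {
        "forward": sorted_columns[:n],            # the n best Sub-SVMs
        "reverse": sorted_columns[::-1][:n],      # the n worst Sub-SVMs
        "original": original_columns[:n],         # first n in exhaustive order
    }


def shared_columns(groups):
    """Number of columns that the forward and reverse groups have in common."""
    return len(set(groups["forward"]) & set(groups["reverse"]))


# 7 classes give 63 exhaustive columns, identified here by 0..62.
original = list(range(63))
# Stand-in for the order obtained after sorting by Sub-SVM quality.
ranked = sorted(original, key=lambda c: (c * 37) % 63)

short = build_groups(ranked, original, 10)
longer = build_groups(ranked, original, 40)
full = build_groups(ranked, original, 63)
```

Once the code length exceeds half of the 63 columns, the forward and reverse groups start to share columns, and at full length all three groups contain the same columns.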

[Figure 1: prediction accuracy (vertical axis, 0.65 to 0.95) against code length (horizontal axis, 10 to 61) for the forward sequence, reverse sequence and original sequence.]

Figure 1. Relationship between ECOC SVMs Encoding Method and their Prediction Accuracy for Segment Data Set

It can be seen in Figure 1 that the prediction accuracy of the ECOC forward sequences is much higher than that of the reverse sequences, while the original sequences have intermediate accuracy. The original sequences can be viewed as the prediction of a random ECOC, the reverse sequences have the worst prediction results, and the forward sequences have the best. In addition, the more code columns the forward and reverse sequences share, the closer their prediction accuracies are. When the code length increases to a certain degree, the prediction accuracy of the forward sequences decreases and becomes stable, while the accuracy of the original sequences increases with fluctuations and finally becomes stable. The accuracy of the reverse sequences, however, keeps increasing.

The results suggest that when the code length increases, if the generalization ability of the SVMs corresponding to the newly added columns is strong, the coding performance improves, as with the reverse sequences; conversely, if the generalization ability of those SVMs is weak, the coding performance degrades, as with the forward sequences. In the original sequences, code lengths are short at the beginning, which makes the coding performance bad; with the increase of code length, the minimum HD between codes increases, improving the generalization ability. But if the code length continues to increase, more code columns with weak generalization ability are included, which stops the overall performance from increasing.

It is believed in [8] that the generalization ability of ECOC SVMs depends on the first $L - (d-1)/2$ high performance SVMs, irrespective of the remaining $(d-1)/2$ SVMs.
However, we further show that even when the generalization ability of the Sub-SVMs is good, an insufficient number of code columns also harms the generalization ability of ECOC SVMs. In this situation, the generalization ability of ECOC SVMs is related to the Sub-SVMs with bad generalization ability. More importantly, a coding strategy is derived in this study showing how to construct ECOC with good generalization ability. It can be seen in Figure 2 that the performance of ECOC SVMs does not necessarily increase with the minimum HD, nor with the code length. Instead, it has a complex relationship with both of them.

Figure 2. Relationship between Prediction Accuracy, Code Length, and the Minimum Hamming Distance

The viewpoint in this study is validated through real tests. That is, the generalization ability of the SVMs corresponding to the code columns has the most significant effect on ECOC performance, while the code length and the minimum HD between codes are both mathematical features of the coding. The key factor in improving the performance of ECOC SVMs is to find Sub-SVMs with good generalization ability. When it is impossible to find such Sub-SVMs, or the generalization abilities of all Sub-SVMs are the same, the performance of ECOC SVMs can be considered from the coding point of view. Then the code length, the minimum HD between codes, the allocation order of codes, and the correlations between code columns can serve as the evaluation criteria for the generalization ability of ECOC SVMs, which is discussed in [8].

The exhaustive coding method can assure that the code columns are neither correlated nor complementary. However, when the code length is short, identical code words may exist in the ECOC, making it impossible to determine the class of a sample. In this situation, remedial measures should be taken, that is, training additional SVM classifiers corresponding to the identical code words in order to judge the decision results of the ECOC SVMs. In Figure 2, there are cases in which the minimum HD between codes is 0 when the reverse sequence and the original sequence are both short, implying that there are identical codes. Strictly speaking, the ECOC is wrong in these cases. However, for convenience, this part of the code is retained after taking measures to deal with it, and is marked in Figure 2.

4. Conclusions and Discussions

Starting from the essence of the problem, ECOC SVMs are analyzed in this study, and a new construction method is proposed. The main conclusions and open problems are as follows:

1. The performance of ECOC SVMs depends on the performance of its corresponding Sub-SVMs, while the mathematical features represented by the coding are secondary.
When it is impossible to evaluate the performances of the Sub-SVMs, or the performances are the same, the code length, the minimum HD between codes, the allocation order of code words, and the correlations between codes can be the criteria for ECOC. As for the error correction ability, it is believed in this study that for optimal classifiers, making fewer errors is primary, while error correction is secondary.

2. What needs to be solved further is: (1) currently, there is still no exact theory for evaluating SVM performance, because kernel functions, parameters and sample spaces differ. Also, there is still no exact theory on the comparability of the generalization ability of each Sub-SVM; (2) What is the appropriate code length of ECOC SVMs? What is the relation between the code length of ECOC SVMs and the generalization ability of each Sub-SVM? And how can a reasonable code matrix be constructed conveniently and rapidly? These problems are still to be investigated. From the initial results, the code can be short for Sub-SVMs with better generalization abilities, that is, quality is more important than quantity; when the performance of the Sub-SVMs is impossible to evaluate, the code length should be longer, that is, quality is compensated by increasing quantity. But the code length should be appropriate; longer is by no means always better.

Acknowledgements

This work was supported by a grant from Natural Scientific Fund of China (No. 47445) and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

References

[1] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, USA, (1995).
[2] T. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, vol. 2, (1995), pp. 263-286.
[3] K. Crammer and Y. Singer, On the learnability and design of output codes for multiclass problems, Proc. of the 13th Annual Conf. on Computational Learning Theory, (2000), pp. 35-46.
[4] W. Utschick and W. Weichselberger, Stochastic organization of output codes in multiclass learning problems, Neural Computation, vol. 13, no. 5, (2001), pp. 1065-1102.
[5] L. I. Kuncheva, Using diversity measures for generating error-correcting output codes in classifier ensembles, Pattern Recognition Letters, vol. 26, no. 1, (2005), pp. 83-90.
[6] Y. Jiang, Q. Zhao and X. Yang, A Search Coding Method and Its Application in Supervised Classification, Journal of Software, (In Chinese), vol. 16, no. 06, (2005), pp. 08-089.
[7] F. Masulli and G. Valentini, An experimental analysis of the dependence among codeword bit errors in ECOC learning machines, Neurocomputing, vol. 57, (2004), pp. 189-214.
[8] X. Jiantao and H. Mingyi, Multiclass Classification Using Support Vector Machines (SVMs) Combined with Error-Correcting Codes (ECCs), Journal of Northwestern Polytechnical University, (In Chinese), vol. 21, no. 4, (2003), pp. 443-448.

Authors

Zhigang Yan received the B.Sc. degree from China University of Mining and Technology in 1997 and the Ph.D. degree from China University of Mining and Technology in 2007. He is currently an associate professor at China University of Mining and Technology, China. His field of interest is spatio-temporal data mining and knowledge discovery.

Yuanxuan Yang, Master student, received the B.Sc. degree in Geographic Information System in 2013 from China University of Mining and Technology. He now studies at China University of Mining and Technology, supervised by Zhigang Yan.
