Support vector domain description

Pattern Recognition Letters 20 (1999) 1191–1199
www.elsevier.nl/locate/patrec

David M.J. Tax *,1, Robert P.W. Duin
Pattern Recognition Group, Faculty of Applied Science, Delft University of Technology, F263 Lorentzweg 1, 2628 CJ Delft, The Netherlands

* Corresponding author. Tel.: +31-15-2781845; fax: +31-15-278-6740. E-mail address: davidt@ph.tn.tudelft.nl (D.M.J. Tax)
1 This work was partly supported by the Foundation for Applied Sciences (STW) and the Dutch Organization for Scientific Research (NWO).
0167-8655/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-8655(99)00087-2

Abstract

This paper shows the use of a data domain description method, inspired by the support vector machine by Vapnik, called the support vector domain description (SVDD). This data description can be used for novelty or outlier detection. A spherically shaped decision boundary around a set of objects is constructed by a set of support vectors describing the sphere boundary. The method can transform the data to new feature spaces without much extra computational cost. By using the transformed data, the SVDD can obtain more flexible and more accurate data descriptions. The error of the first kind, the fraction of the training objects which will be rejected, can be estimated immediately from the description without the use of an independent test set, which makes this method data efficient. The support vector domain description is compared with other outlier detection methods on real data.

Keywords: Data domain description; Outlier detection; One-class classification; Support vector machines

1. Introduction

Most pattern recognition tasks deal with classification or regression problems. But there is a third, less well-known extension of the classification problem, the data domain description problem (also called one-class classification). In domain description the task is not to distinguish between classes of objects as in classification problems, or to produce a desired outcome for each input object as in regression problems, but to give a description of a set of objects. This description should cover the class of objects represented by the training set, and ideally should reject all other possible objects in the object space. The data domain description is used for outlier detection or novelty detection, the detection of objects which differ in some sense significantly from the rest of the dataset.

Different methods for data domain description or outlier detection have been developed. When an underlying statistical law for the outlying patterns is assumed, this underlying distribution should be estimated (Ritter and Gallegos, 1997). When nothing about the outlier distribution can be assumed (or if an insufficient number of outlier examples is available), only a description of (the boundary of) the target class can be made. Most often a probability density of the data is estimated and new test objects which are under some probability threshold will be rejected. For instance, in the paper of Tarassenko et al. (to appear), anomalies in mammographs are detected by applying Parzen density estimation and a mixture of Gaussians on the normal class. A drawback of these density methods is that they often require a large dataset, especially when high dimensional feature vectors are used. Problems may also arise when large differences in density exist: objects in low density areas will be rejected although they are legitimate objects.

In this paper, another method for data domain description is presented and analyzed (the idea was first presented in (Tax and Duin, 1999)). The method is inspired by the support vector machines by Vapnik (1995). Data domain description requires finding not the optimal separating hyperplane, but the sphere with minimal volume (or minimal radius) containing all objects.

First we give a theoretical derivation of the basic method in Section 2. In Sections 3 and 4 we focus on choices for the parameters which are still free and look at some characteristics of the methods. Experimental results will be shown in Section 5, and we give conclusions in Section 6.

2. Theory

Of a data set containing N data objects, $\{x_i,\ i = 1, \ldots, N\}$, a description is required. We try to find a sphere with minimum volume, containing all (or most of) the data objects. A sphere that has to contain every object is very sensitive to the most outlying object in the target data set. When one or a few very remote objects are in the training set, a very large sphere is obtained which will not represent the data very well. Therefore, we allow for some data points outside the sphere and introduce slack variables $\xi_i$ (analogous to (Vapnik, 1995)).

Of the sphere, described by center $a$ and radius $R$, we minimize the radius

$$F(R, a, \xi_i) = R^2 + C \sum_i \xi_i, \tag{1}$$

where the variable C gives the trade-off between simplicity (or volume of the sphere) and the number of errors (number of target objects rejected).
This has to be minimized under the constraints

$$(x_i - a)^T (x_i - a) \le R^2 + \xi_i \quad \forall i, \qquad \xi_i \ge 0. \tag{2}$$

Incorporating these constraints in (1), we construct the Lagrangian,

$$L(R, a, \alpha_i, \xi_i) = R^2 + C \sum_i \xi_i - \sum_i \alpha_i \{ R^2 + \xi_i - (x_i^2 - 2 a \cdot x_i + a^2) \} - \sum_i \gamma_i \xi_i, \tag{3}$$

with Lagrange multipliers $\alpha_i \ge 0$ and $\gamma_i \ge 0$. Setting the partial derivatives with respect to $R$, $a$ and $\xi_i$ to 0, new constraints are obtained:

$$\sum_i \alpha_i = 1, \qquad a = \frac{\sum_i \alpha_i x_i}{\sum_i \alpha_i} = \sum_i \alpha_i x_i, \qquad C - \alpha_i - \gamma_i = 0 \quad \forall i. \tag{4}$$

Since $\alpha_i \ge 0$ and $\gamma_i \ge 0$ we can remove the variables $\gamma_i$ from the third equation in (4) and use the constraints $0 \le \alpha_i \le C\ \forall i$. Rewriting Eq. (3) and resubstituting Eqs. (4) gives the function to maximize with respect to $\alpha_i$:

$$L = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j), \tag{5}$$

with constraints $0 \le \alpha_i \le C$, $\sum_i \alpha_i = 1$.

The second equation in (4) states that the center of the sphere is a linear combination of data objects, with weight factors $\alpha_i$ which are obtained by optimizing Eq. (5). Only for a small set of objects is the equality in Eq. (2) satisfied: these are the objects which are on the boundary of the sphere itself. For those objects the coefficients $\alpha_i$ will be non-zero, and they are called the support objects. Only these objects are needed in the description of the sphere. The radius R of the sphere can be obtained by calculating the distance from the center of the sphere to a support vector with a weight smaller than C. Objects for which $\alpha_i = C$ have hit the upper bound in (4) and are outside the sphere. These support vectors are considered to be outliers. We will discuss the parameter C in more detail in the next section.

To determine whether a test point z is within the sphere, the distance to the center of the sphere has to be calculated. A test object z is accepted when this distance is smaller than the radius, i.e., when $(z - a)^T (z - a) \le R^2$. Expressing the center of the sphere in terms of the support vectors, we accept objects when

$$(z \cdot z) - 2 \sum_i \alpha_i (z \cdot x_i) + \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j) \le R^2. \tag{6}$$

3. Generalizing to other kernels

The method just presented only computes a sphere around the data in the input space. Normally, data are not spherically distributed, even when the most outlying objects are ignored. So, in general, we cannot expect to obtain a very tight description. Since the problem is stated completely in terms of inner products between vectors (Eqs. (5) and (6)), the method can be made more flexible, analogous to (Vapnik, 1995). Inner products of objects $(x_i \cdot x_j)$ can be replaced by a kernel function $K(x_i, x_j)$, when this kernel $K(x_i, x_j)$ satisfies Mercer's theorem. This implicitly maps the objects $x_i$ into some feature space, and when a suitable feature space is chosen, a better, tighter description can be obtained. No explicit mapping is required; the problem is expressed completely in terms of $K(x_i, x_j)$.

Therefore, we replace all inner products $(x_i \cdot x_j)$ by a proper $K(x_i, x_j)$ and the problem of finding a data domain description is now given by (see (5))

$$L = \sum_i \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j), \tag{7}$$

with constraints $0 \le \alpha_i \le C$, $\sum_i \alpha_i = 1$. A test object z is accepted when (see (6))

$$K(z, z) - 2 \sum_i \alpha_i K(z, x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \le R^2. \tag{8}$$

Different kernel functions K result in different description boundaries in the original input space. The problem is to find a suitable kernel function $K(x_i, x_j)$. We discuss two choices: a polynomial kernel and a Gaussian kernel.

The first choice for the kernel $K(x_i, x_j)$ is the extended inner product $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$, where the free parameter d is the degree of the polynomial kernel. As argued by Vapnik (1995), this kernel maps the objects into a high dimensional feature space by adding products of the original features, up to degree d.
(For example, a 2D vector $(x_1, x_2)$ is mapped to $(x_1, x_2, x_1 x_2, x_1^2, x_2^2)$ when a polynomial kernel with $d = 2$ is used.) This kernel does not, in general, result in good tight descriptions. For higher degrees d, the influence of objects most remote from the origin of the coordinate system increases and overwhelms all other inner products. This effect is shown in Fig. 1 with a two-dimensional dataset containing 10 objects. For different values of the degree ($d = 1, 10, 25$) a sphere description is computed.

Fig. 1. Distance to the center of the hypersphere, mapped back on the input space for a polynomial kernel. The darker the color, the smaller the distance. The white dashed line indicates the surface of the hypersphere. The small circles indicate support objects.
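The growth of the polynomial kernel with the degree can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the three objects and the helper name `dominance` are hypothetical and are not taken from the dataset of Fig. 1. It compares the kernel value of a remote object with itself against the kernel value between two objects near the origin.

```python
def poly_kernel(x, y, d):
    """Polynomial kernel K(x, y) = (x . y + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d

# Hypothetical 2D objects: two close to the origin, one farther away.
near_a, near_b, far = (1.0, 0.5), (0.5, 1.0), (3.0, 3.0)

def dominance(d):
    """Ratio K(far, far) / K(near_a, near_b) for polynomial degree d."""
    return poly_kernel(far, far, d) / poly_kernel(near_a, near_b, d)

# Same degrees as in Fig. 1.
ratios = {d: dominance(d) for d in (1, 10, 25)}
for d, ratio in ratios.items():
    print(f"d={d}: K(far, far) / K(near_a, near_b) = {ratio:.3g}")
```

The ratio grows geometrically with d, which is one way to see why the remote objects end up determining the description for high degrees.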

The distance to the center of the sphere is plotted in the original input space. The dashed white line crossing the support vectors (indicated by the small circles) is the boundary of the description. The objects in the upper part are most distant from the origin, and although these objects are not the most outlying objects in the data in two dimensions, they become the support vectors when higher degrees are used. Note that a large part of the input space becomes accepted. This description results in a very large and sparse sphere in the original two-dimensional input space.

To suppress the growing distances for larger feature spaces, a Gaussian kernel $K_G(x_i, x_j) = \exp(-(x_i - x_j)^2 / s^2)$ is more appropriate. Eq. (7) then becomes

$$L = 1 - \sum_i \alpha_i^2 - \sum_{i \ne j} \alpha_i \alpha_j K_G(x_i, x_j), \tag{9}$$

and the acceptance rule, Eq. (8), becomes

$$-2 \sum_i \alpha_i K_G(z, x_i) \le R^2 - C_X - 1, \tag{10}$$

where $C_X$ only depends on the support vectors and the $\alpha_i$ and not on the test object z.

In Fig. 2, again a 2D artificial dataset containing 10 objects is shown. Now a support vector domain description with a Gaussian kernel is used for different values of s. The width parameter s ranges from very small ($s = 1.0$ in the leftmost figure) to large ($s = 25.0$ in the rightmost figure). Note that the number of support vectors decreases and that the description becomes more sphere-like.

We can derive explicit solutions for Eq. (7) for the two extreme situations, one for very small values and one for very large values of s. For very small s, $K_G(x_i, x_j) \simeq 0$ for $i \ne j$ and $L = 1 - \sum_i \alpha_i^2$. This is maximized when $\alpha_i = 1/N$, and L becomes $1 - 1/N$. This is similar to the Parzen density estimation, where each object supports a kernel (see Eq. (10)). All distances to the center of the sphere become $1 - 1/N$. For very large s, $K_G(x_i, x_j) = 1$ and $L = 1 - \sum_i \alpha_i^2 - \sum_{i \ne j} \alpha_i \alpha_j$. This is maximized when all $\alpha_i = 0$ except for one $\alpha_j = 1$, and all distances to the sphere center become 0.
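The two limiting cases can be written out step by step. The derivation below is a sketch that uses only Eqs. (7) and (8), the identity $K_G(x_i, x_i) = 1$ and the constraint $\sum_i \alpha_i = 1$; it additionally assumes $C \ge 1/N$ so that $\alpha_i = 1/N$ is feasible, and it reads "distance" as the left-hand side of Eq. (8).

```latex
% Eq. (9) from Eq. (7), using K_G(x_i, x_i) = 1 and \sum_i \alpha_i = 1:
\begin{align*}
L &= \sum_i \alpha_i K_G(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K_G(x_i, x_j) \\
  &= 1 - \sum_i \alpha_i^2 - \sum_{i \neq j} \alpha_i \alpha_j K_G(x_i, x_j).
\end{align*}

% Very small s: K_G(x_i, x_j) \to 0 for i \neq j, so L \to 1 - \sum_i \alpha_i^2.
% By the Cauchy-Schwarz inequality,
\[
\sum_i \alpha_i^2 \;\ge\; \frac{1}{N} \Big( \sum_i \alpha_i \Big)^2 = \frac{1}{N},
\]
% with equality only for \alpha_i = 1/N, hence L_{\max} = 1 - 1/N.
% For a training object x_k the left-hand side of Eq. (8) then becomes
\[
K_G(x_k, x_k) - 2 \sum_i \alpha_i K_G(x_k, x_i) + \sum_{i,j} \alpha_i \alpha_j K_G(x_i, x_j)
\;\to\; 1 - \frac{2}{N} + \frac{1}{N} = 1 - \frac{1}{N}.
\]

% Very large s: K_G(x_i, x_j) \to 1 for all pairs, so for any test object z
\[
L \;\to\; 1 - \Big( \sum_i \alpha_i \Big)^2 = 0,
\qquad
K_G(z, z) - 2 \sum_i \alpha_i K_G(z, x_i) + \sum_{i,j} \alpha_i \alpha_j K_G(x_i, x_j)
\;\to\; 1 - 2 + 1 = 0.
\]
```

In the exact large-s limit every feasible set of $\alpha_i$ reaches $L = 0$, so the single-prototype solution named in the text is one of the maximizers rather than the only one.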
This infinitely large sphere will not be obtained in practice, and s will not be large enough to give equal $K_G(x_i, x_j)$ for all pairs $i, j$. In the rightmost subplot of Fig. 2 a realistic limit situation is plotted. The data description is again the smallest sphere which covers the complete dataset, without outliers. A Taylor expansion of Eq. (9) shows that when higher orders are ignored, Eq. (5) is obtained (up to a scaling and offset factor).

In the case of moderate values of s (middle plot in Fig. 2) just a fraction of the objects become support objects. Eq. (10) shows that in this case an edited and weighted Parzen density estimation is obtained. This does not estimate the total density of the data, but tries to describe just the boundary of the dataset.

Fig. 2. Distance to the center of the hypersphere, mapped back on the input space for a Gaussian kernel. The darker the color, the smaller the distance. The white dashed line indicates the surface of the hypersphere. The small circles indicate the support objects.

The parameter C gives the upper bound for the parameters $\alpha_i$ and thus limits the influence of the individual support vectors on the description, Eq. (10). When an object $x_i$ obtains $\alpha_i = C$, the description will not be adapted any further towards this object and it will stay outside the sphere. Because of the constraints $\sum_i \alpha_i = 1$ and $\alpha_i \ge 0$, C can only influence the solution of Eq. (9) when $1/N \le C \le 1$. For $C < 1/N$ no solution can be found because then the constraint $\sum_i \alpha_i = 1$ can never be met, while for $C > 1$ one can always find a solution (the $\alpha_i$ are always less than or equal to 1). When C is restricted to small values, the cost of being outside the sphere is not very large and a larger fraction of the objects is allowed to be outside the sphere.

In practice the value of C is not very critical. In the experiments of this paper, $C = 0.25$ is chosen and in none of the cases is an outlier detected in the target class. When a smaller $C = 0.2$ or a larger $C = 0.4$ is used, the same results are obtained.

4. Generalization

To get an indication of the generalization or the overfitting characteristics of the SVDD, we have to get an indication of (1) the number of target patterns that will be rejected (errors of the first kind) by this description and (2) the number of outlying patterns that will be accepted (errors of the second kind).

We can estimate the error of the first kind by applying the leave-one-out method on the training set containing the target class (Vapnik, 1995). When we leave out an object from the training set which is not a support object, the original solution is found and all training objects will be accepted. When a support object is left out, the optimal sphere description can be made smaller, because this support object is on the boundary of the sphere. This left-out object will then be rejected, while the rest of the training objects will still be accepted (because the method is trained on these data). Thus, the error can be estimated by

$$E[P(\text{error})] = \frac{\#SV}{N}, \tag{11}$$

where #SV is the number of support vectors. When we use a Gaussian kernel, we can regulate the number of support vectors by changing the width parameter s. Therefore, we can also set the error of the first kind.
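The link between the width parameter s and the estimate #SV/N of Eq. (11) can be tried out on a toy problem. The sketch below is not the authors' implementation: the paper does not say how the quadratic program of Eq. (7) is solved, so a simple pairwise (SMO-style) update is assumed here, and the ten 2D objects, the function names and the tolerances are all hypothetical. It assumes distinct training objects and $C \ge 1/N$.

```python
import math

def gauss_kernel(x, y, s):
    """Gaussian kernel K_G(x, y) = exp(-||x - y||^2 / s^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / s ** 2)

def fit_svdd(X, s, C=1.0, tol=1e-6, max_iter=20000):
    """Maximize Eq. (7) subject to 0 <= alpha_i <= C and sum(alpha) = 1.

    Assumed solver: weight is moved from the object with the smallest
    gradient to the one with the largest, so the sum stays equal to 1.
    """
    n = len(X)
    K = [[gauss_kernel(a, b, s) for b in X] for a in X]
    alpha = [1.0 / n] * n  # feasible start, requires C >= 1/N
    for _ in range(max_iter):
        grad = [K[i][i] - 2.0 * sum(K[i][j] * alpha[j] for j in range(n))
                for i in range(n)]
        can_grow = [i for i in range(n) if alpha[i] < C - 1e-12]
        can_shrink = [j for j in range(n) if alpha[j] > 1e-12]
        if not can_grow or not can_shrink:
            break
        up = max(can_grow, key=lambda i: grad[i])
        down = min(can_shrink, key=lambda j: grad[j])
        gap = grad[up] - grad[down]
        curv = K[up][up] + K[down][down] - 2.0 * K[up][down]
        if gap < tol or curv <= 0.0:
            break
        # Exact line search along e_up - e_down, clipped to the box.
        delta = min(gap / (2.0 * curv), C - alpha[up], alpha[down])
        alpha[up] += delta
        alpha[down] -= delta
    return alpha

def sq_dist_to_center(z, X, alpha, s):
    """Left-hand side of Eq. (8) for a test object z."""
    n = len(X)
    cross = sum(alpha[i] * gauss_kernel(z, X[i], s) for i in range(n))
    const = sum(alpha[i] * alpha[j] * gauss_kernel(X[i], X[j], s)
                for i in range(n) for j in range(n))
    return gauss_kernel(z, z, s) - 2.0 * cross + const

def describe(X, s, C=1.0, eps=1e-8):
    """Return (alpha, R^2, #SV / N); the last value is the estimate of Eq. (11)."""
    alpha = fit_svdd(X, s, C)
    support = [i for i, a in enumerate(alpha) if a > eps]
    boundary = [i for i in support if alpha[i] < C - eps] or support
    r2 = sq_dist_to_center(X[boundary[0]], X, alpha, s)
    return alpha, r2, len(support) / len(X)

# Hypothetical 2D target set: four outer objects plus interior objects.
X = [(0.0, 2.0), (4.0, 2.0), (2.0, 0.0), (2.0, 4.0), (2.0, 2.0),
     (1.5, 2.0), (2.5, 2.0), (2.0, 1.5), (2.0, 2.5), (1.0, 1.0)]

for s in (0.01, 1.0, 5.0):
    alpha, r2, frac_sv = describe(X, s)
    print(f"s={s}: estimated target rejection rate #SV/N = {frac_sv:.2f}")
```

A test object z would then be accepted when `sq_dist_to_center(z, X, alpha, s) <= r2`. For realistic problem sizes a dedicated quadratic programming solver would be the more usual choice; the pairwise update is used here only to keep the sketch free of dependencies.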
When the number of support vectors is too large, we have to increase s, while when the number is too low, we have to decrease s.

To check how good the estimate of Eq. (11) is, we plotted in Fig. 3 the estimation of the errors of the first kind as a function of the width parameter s. The method was applied to a two-dimensional dataset containing 10 objects. The error estimated on an independent test set of 100 objects is also shown. We can conclude that this estimate works well.

Fig. 3. Comparison between the fraction of objects which are support objects and the fraction of test points which is rejected, with respect to the parameter s. The target class consists of the 10 objects shown in Figs. 1 and 2.

So when a description of a dataset is required, we can set beforehand a bound on the expected rejection rate of the target data. The Lagrangian from Eq. (9) is solved and the expected error for this solution is obtained via Eq. (11). When this error is too large, the width parameter s is increased, or when this error can still increase, the width parameter s is decreased. This guarantees that the width parameter in the SVDD is adapted for the problem at hand, given the error.

The chance that outlying objects will be accepted by the sphere description, the error of the second kind, cannot be estimated by this measure. In general, only a good description of the target class in the form of a training set is available. All other patterns are considered outliers. To get an estimate for the error of the second kind, data around the original data set should be created and tested. This method then requires a way of obtaining or creating data around the training data, but not in the training set. The number of test patterns should also be sufficiently high for a reasonable estimate, which can be a problem in higher dimensional feature spaces.

In the experiments in this paper we circumvent this problem by using classification problems for testing the method. From the classification tasks we take one class as being the outlier class, and all other classes will be used as the target class. In this way artificial outliers can be constructed. This means that a performance bias is introduced. These classification problems often contain overlapping classes, and by using the classification problems in this way, the performance of the outlier methods will be lower than that of normal classification methods for the classification task. Still, it gives an indication of the performances when different outlier methods are compared.

5. Experiments

The SVDD method is compared with four other outlier detection methods: normal density estimation, Parzen windows, a k-nearest-neighbor distance comparison and an instability method. These methods are described in more detail in (Tax and Duin, 1998). The first two methods rely on a density estimation of the data. The third method compares the distance from a test object x to its nearest neighbor $NN^{tr}(x)$ in the training set with the distance from this nearest neighbor $NN^{tr}(x)$ to its nearest neighbor $NN^{tr}(NN^{tr}(x))$ in the training set. The instability method is specially designed for outlier detection in classification tasks. By training several simple classifiers, such as linear classifiers, on bootstrapped versions of the training set, one obtains variations in the classifier outputs. Objects which experience large variations in these outputs are likely to reside in low density or low confidence areas and will be rejected. These methods will be compared to the SVDD method with a Gaussian kernel.
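The nearest-neighbor distance comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of (Tax and Duin, 1998): the quotient form of the comparison, the function names, the toy training set and the threshold value are all hypothetical, and the training objects are assumed to be distinct so that the denominator is non-zero.

```python
import math

def euclid(p, q):
    """Euclidean distance between two objects given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(x, train, skip=None):
    """Index of the training object closest to x, ignoring index `skip`."""
    candidates = [i for i in range(len(train)) if i != skip]
    return min(candidates, key=lambda i: euclid(x, train[i]))

def nn_distance_ratio(x, train):
    """d(x, NN_tr(x)) divided by d(NN_tr(x), NN_tr(NN_tr(x)))."""
    first = nearest(x, train)
    second = nearest(train[first], train, skip=first)
    return euclid(x, train[first]) / euclid(train[first], train[second])

# Hypothetical training set: the corners of the unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = 1.0  # assumed value, chosen only for this illustration

for x in [(0.5, 0.5), (5.0, 5.0)]:
    ratio = nn_distance_ratio(x, train)
    print(x, "rejected" if ratio > threshold else "accepted")
```

A large ratio means the test object is far from its nearest training object compared with the local spacing of the training set, which is the sense in which it is treated as an outlier.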
The width parameter s is found by using the procedure mentioned in Section 4. The variable C is set to 0.25. On the basis of the performance on the training set samples, we set a target rejection threshold value of 10% on the different measures. This relatively large value is chosen because some datasets contain a small number of objects, and using a 10% rejection rate ensures that some of the target objects will indeed be rejected. After that, the performance on a test set containing the target class and on one containing the outlier class is measured. This means that the optimal performance is reached when all outlying objects are rejected and 90% of the target class is accepted.

All methods are applied to a set of standard datasets taken from the UCI Machine Learning Dataset Repository (Blake et al., 1998). The datasets considered are listed in Table 1. As explained in the previous section, one of the classes is considered as the outlier class; the rest is the target class. To estimate the errors (of the first and the second kind), n-fold cross-validation with $n = 5$ is used.

Table 1
UCI datasets used for the evaluation of the data description methods

Name                     # Objects  # Classes  # Features
Balance-scale            625        3          4
Breast-cancer-Wisconsin  699        2          9
Ionosphere               351        2          34
Iris                     150        3          4

In Table 2, the performances of the outlier detection methods on all UCI datasets are shown. For each method, the performance on a target validation set (left) and an outlier test set (right) is shown. Each of the classes is the outlier class once (indicated in the first column).

Table 2
Outlier detection performances on the UCI datasets a

Class no.  Set size  Gauss       Parzen      knn         Instab      SVDD
Balance data
1          337, 288  0.13, 0.74  0.46, 1.00  0.00, 0.65  0.12, 0.73  0.14, 0.89
2          576, 49   0.11, 0.30  0.40, 0.51  0.00, 0.00  0.08, 0.76  0.12, 0.13
3          337, 288  0.12, 0.74  0.40, 1.00  0.00, 0.65  0.12, 0.76  0.11, 0.88
Breast cancer data
1          241, 458  0.14, 0.46  0.91, 1.00  0.10, 0.17  0.00, 0.00  0.09, 0.94
2          458, 241  0.11, 0.99  0.28, 1.00  0.07, 0.45  0.00, 0.00  0.10, 0.99
Ionosphere data
1          126, 225  0.36, 0.06  0.91, 0.98  0.11, 0.03  0.00, 0.00  0.13, 0.00
2          225, 126  0.11, 0.90  0.94, 1.00  0.09, 0.67  0.00, 0.00  0.11, 0.90
Iris data
1          100, 50   0.13, 1.00  0.33, 1.00  0.12, 1.00  0.11, 0.46  0.11, 1.00
2          100, 50   0.13, 0.93  0.30, 0.97  0.09, 0.49  0.12, 0.15  0.11, 0.40
3          100, 50   0.12, 0.91  0.43, 1.00  0.09, 0.51  0.14, 0.58  0.09, 0.90

a The first column gives the class which is considered as outlier. In the second column, the target (left) and outlier (right) set sizes are given. In the other columns, the leftmost number in each column gives the performance for a test set containing the target class and the rightmost number the performance on an outlier set containing the outlier class.

Results on the balance dataset already show that the estimation of the errors of type 1 on the training set is not very precise for the Parzen density estimation and the knn method. In both cases the error on the `target' class is far larger than the predefined 0.1. All methods perform poorly on the case in which the second class is considered outlier. This can be understood by looking at the distribution of the data, where class 2 is between classes 1 and 3. Only the instability method is able to reject objects from the second class.

In the breast-cancer data set, the second class is clearly easier to distinguish than the first class. Looking at the origin of the data, this means that by describing the benign class, the malignant class can be rejected quite well. All methods perform well in describing class 1, except for the instability method. Since the original dataset contains only two classes, the instability method could not be used.
When one class is considered as the outlier class, the instability method cannot train simple classifiers on the remaining class. It is also visible that the Parzen method overtrains heavily and performs poorly when class 1 is the outlier class. The SVDD performs best overall.

In the ionosphere dataset, the Parzen density estimation again overtrains, and the instability method cannot be used because only two classes are available. From the results we see that class 1 is almost Gaussian distributed and class 2 is scattered around it. The SVDD cannot distinguish one class 2 object from class 1.

Finally, the outlier methods are applied to the iris dataset. Here, all methods work reasonably well, which indicates that the data distributions of the classes are well clustered. Only the Parzen density estimation slightly overtrains.

From these results we can conclude that the SVDD works comparably to, and often better than, the other outlier methods, from the simple Gaussian distribution to the nearest neighbor method. Another advantage of the SVDD is that an estimate of the error on the target set can be obtained immediately by looking at the fraction of support vectors. This guarantees that the scale of the SVDD, set by the width parameter s, is adjusted to the data, and no extra leave-one-out estimation is required (as in the Parzen estimation).

6. Conclusions

Data domain description is an important tool for robust and confident classification. Data which do not resemble a target class should be rejected. In this paper we propose a sphere shaped data description which does not have to make a probability density estimation. The sphere description depends on a few target objects, the support objects, and new test objects only have to be compared with these support objects by an inner product or some more general kernel function. By adapting the kernel function, this method becomes more flexible than just a sphere in the input space.

The SVDD also allows for target objects not included in the sphere description. An extra parameter C is introduced to give the trade-off between the number of errors made on the training set and the size of the sphere description. In practice, the size of this parameter is not very crucial for finding a good solution.

In this paper two kernel types are considered: the polynomial and the Gaussian kernel. In general, the polynomial kernel does not give tight descriptions of the training data. On the other hand, the Gaussian kernel seems to work very well. In the SVDD using a Gaussian kernel, another free parameter, the width of the kernel s, can be adapted. By choosing different extremes for this width parameter, the sphere method obtains more or less flexible descriptions. For very small values of s, a Parzen density estimation is obtained. In that case, all target objects become support objects. For very large values of s, just one prototype for the complete data set is used and almost the complete training set can be disregarded. Applying a moderate value for the width parameter, an edited and weighted Parzen estimation is obtained.

An extra feature of this SVDD method is that the error on the target class can be estimated immediately by calculating the fraction of target objects which become support objects. Setting the error on the training set beforehand, the width s can be set such that the fraction of support objects is equal to this error.
Since the SVDD focuses on the boundary description and not on the complete data density, the required number of objects is smaller than for, e.g., the Parzen density estimation. We can conclude that the SVDD gives both an efficient and robust method for describing a dataset.

For further reading, see (Ypma and Pajunen, 1999).

Discussion

Gimel'farb: Can you tell why the SVDD approach works so poorly? Since it depends on your choice of the kernel function, then if s = 0, it is simply a nearest neighbor classifier. And such a nearest neighbor classifier, in this case, cannot perform so poorly.

Tax: No, but I said that the error on my target set is about 10%. So, I tune my parameters in such a way that I will reject about 10% and 10% of my training set will be support vectors. In that case, the description will be different from the normal Parzen estimator. It will be a more crude approximation of the boundary. If I had more points, it would be comparable, better than the Parzen estimator.

Gimel'farb: One more question: why do you need to restrict yourself to single sphere approximation, because by using the earlier results of Vapnik, you can approximate any distribution of your training points by a minimal number of spheres, and in that case, you have a much better description.

Tax: First of all, if you have a multi-modal distribution, for instance, three Gaussian distributions, and if you have enough training points, it will automatically find three spheres. If you do not have enough data and you still restrict yourself to an error of 10% on your target set, it will still give you just one complete blob. So it only finds that solution for which it finds enough justification in the data. If you are very strict on the number of errors you make, it will give very broad, very crude approximations.

Gimel'farb: But this means that you should not restrict yourself to a fixed error on the target set, because in that case, the result depends on your data. Sometimes, if you fix the rejection rate, then, even for beautiful data, you intentionally obtain bad results.
Tax: True, but if you know that you have three clusters, I would not recommend this method. I would then rather take three Gaussians. If that is the prior knowledge that you have, then use it.

Kanal: You might assume one, two, three clusters, and so on, to see which assumption gives the best results.

Tax: But then, the tricky part is always to find a good threshold value. And here, I find my threshold on the basis of the number of support vectors, and that gives a more direct link to how good or bad the description is. From the SVDD, you cannot find directly the number of clusters in the data.

References

Blake, C., Keogh, E., Merz, C., 1998. UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/mlrepository.html, University of California, Irvine, Department of Information and Computer Sciences.
Ritter, G., Gallegos, M.T., 1997. Outliers in statistical pattern recognition and an application to automatic chromosome classification. Pattern Recognition Letters 18, 525–539.
Tarassenko, L., Hayton, P., Brady, M., To appear. Novelty detection for the identification of masses in mammograms.
Tax, D., Duin, R., 1998. Outlier detection using classifier instability. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (Eds.), Advances in Pattern Recognition, Proc. Joint IAPR Internat. Workshops SSPR'98 and SPR'98, Sydney, Australia. Lecture Notes in Computer Science, Vol. 1451. Springer, Berlin, pp. 593–601.
Tax, D., Duin, R., 1999. Data domain description using support vectors. In: Verleysen, M. (Ed.), Proc. European Symposium Artificial Neural Networks 1999. D. Facto, Brussel, pp. 251–256.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer, New York.
Ypma, A., Pajunen, P., 1999. Rotating machine vibration analysis with second-order independent component analysis. In: Proc. 1st Internat. Workshop Independent Component Analysis and Signal Separation, ICA'99, pp. 37–42.