Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence

A Probabilistic Approach to Latent Cluster Analysis

Zhipeng Xie, Rui Dong, Zhengheng Deng, Zhenying He, Weidong Yang
School of Computer Science
Fudan University, Shanghai, China
{xiezp, 11210240011, 11210240082, zhenying, wdyang}@fudan.edu.cn

Abstract

Facing a large number of clustering solutions, the cluster ensemble method provides an effective approach to aggregating them into a better one. In this paper, we propose a novel cluster ensemble method from a probabilistic perspective. It assumes that each clustering solution is generated from a latent cluster model, under the control of two probabilistic parameters. Thus, the cluster ensemble problem is reformulated as a maximum-likelihood optimization problem. An EM-style algorithm is designed to solve this problem. It can determine the number of clusters automatically. Experimental results show that the proposed algorithm outperforms state-of-the-art methods including EAC-AL, CSPA, HGPA, and MCLA. Furthermore, our algorithm is shown to be stable in the predicted numbers of clusters.

1 Introduction

The goal of cluster analysis is to discover the underlying structure of a dataset (Jain et al., 1999; Jain, 2010). It normally partitions a set of objects so that the objects within the same group are similar while those from different groups are dissimilar. A large number of clustering algorithms have been proposed, e.g. k-means, Spectral Clustering, Hierarchical Clustering, and Self-Organizing Maps, to name but a few, yet no single one is able to achieve this goal for all datasets. On the same data, different algorithms, or even multiple runs of the same algorithm with different parameters, often lead to clustering solutions that are distinct from each other.

Confronted with a large number of clustering solutions, cluster ensemble (or clustering aggregation) methods have emerged, which try to combine different clustering solutions into a consensus one in order to improve on the quality of the component clustering solutions (Vega-Pons and Ruiz-Shulcloper, 2011). Cluster ensemble methods usually consist of two or three phases: the ensemble generation phase, which produces a variety of clustering solutions; the optional ensemble selection phase, which selects a subset of these clustering solutions; and finally the consensus phase, which induces a unified partition by combining the component ones.
In the generation phase, different clustering solutions can be generated by different clustering algorithms, by the same algorithm with different parameter settings or initializations, or by injecting random disturbance into the data set, such as data resampling (Minaei-Bidgoli et al., 2004), random projection (Fern and Brodley, 2003), and random feature selection (Strehl and Ghosh, 2002). Following the generation phase, an optional ensemble selection phase selects or prunes these clustering solutions according to their qualities and diversities (Fern and Lin, 2008; Azimi and Fern, 2009).

In this paper, we focus on the final phase: clustering combination. There are many algorithms for the combination, which can be categorized according to the kind of information exploited. The algorithm proposed here falls into the category that makes use of the pairwise similarities between objects, which form a co-association matrix in the context of cluster ensembles. Any clustering algorithm can be applied to this new similarity matrix to find a consensus partition. Evidence Accumulation Clustering (Fred and Jain, 2005), or EAC in short, performs a hierarchical clustering with average linkage (AL) or single linkage (SL) on the co-association matrix, where a maximum lifetime criterion is proposed to determine the number of clusters. The Cluster-based Similarity Partitioning Algorithm (CSPA) (Strehl and Ghosh, 2002) uses a graph-partitioning algorithm instead, but requires the number of clusters to be specified manually. Another algorithm, HGPA (Strehl and Ghosh, 2002), can be thought of as an approximation to CSPA. Outside this category, the MCLA algorithm makes a clustering of clusters based on the similarities between clusters, and then assigns each object to its closest meta-cluster. For a thorough list of related algorithms, please refer to the survey paper by Vega-Pons and Ruiz-Shulcloper (2011).

Although these methods have achieved some success, they are still deficient in several aspects: first, they lack a theoretical underpinning; second, they assume that all the clustering solutions are of the same quality, and thus assign the same weight to each clustering solution; last but not least, most of them (except EAC) require the number of clusters to be specified manually. As to the maximum lifetime criterion adopted by the EAC algorithm, it is more or less a rule of thumb that lacks justification.
As we shall see in the experiments, the maximum lifetime criterion is unstable in determining the number of clusters.
To tackle these problems, we propose a probabilistic method called LAtent Cluster Analysis, or LACA in short. It assumes that there is a latent cluster model which is unobservable. All the observed clustering solutions are generated from the latent model under the control of two probabilistic parameters. Our objective is to seek the latent cluster model with the maximum likelihood.

This paper is organized as follows. In Section 2, we introduce the latent cluster model and build its connection with the observed clustering solutions. We devote Section 3 to an EM-style algorithm for inferring the latent cluster model from the observed clustering solutions. In Section 4, we present the experimental results of the proposed method compared with several state-of-the-art cluster ensemble algorithms. Finally, we conclude in Section 5.

2 Latent Cluster Model

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of $n$ objects, where each object $x_i$ may be represented as a multidimensional vector, a string, or in any other form. Taking $X$ as input, a clustering algorithm (called a clusterer) produces a clustering solution that partitions the $n$ objects into groups. By running clustering algorithms multiple times, we can observe an ensemble of clustering solutions, $E = \{C_1, C_2, \ldots, C_{|E|}\}$, where each clustering solution $C_m$ ($1 \le m \le |E|$) partitions the $n$ objects into groups $c_1^m, c_2^m, \ldots, c_{|C_m|}^m$.

With respect to a given clustering solution $C_m$, a co-association relationship between two objects $x_i$ and $x_j$ is defined according to whether they are assigned to the same group:

$$\delta_m(x_i, x_j) = \begin{cases} 1, & \text{if } \exists\, c_k \in C_m \text{ such that } \{x_i, x_j\} \subseteq c_k \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

Two objects $x_i$ and $x_j$ are said to be co-associated in the clustering solution $C_m$ if $\delta_m(x_i, x_j) = 1$; otherwise, they are not co-associated.

Given the ensemble of observed clustering solutions, what we would like to explore is the latent cluster model. We denote the latent cluster model as $\Theta = \{\theta_1, \theta_2, \ldots, \theta_s\}$, where $\theta_l$ ($1 \le l \le s$) is a cluster represented as a subset of objects, and $s$ is the real number of clusters. In this paper, we assume that these $s$ latent clusters are non-overlapping, i.e., $\theta_i \cap \theta_j = \emptyset$ for $1 \le i < j \le s$. Based on the latent cluster model, we define the co-cluster function for a given pair of objects $\{x_i, x_j\}$:

$$\delta_\Theta(x_i, x_j) = \begin{cases} 1, & \text{if } \exists\, \theta_k \in \Theta \text{ such that } \{x_i, x_j\} \subseteq \theta_k \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

This latent cluster model serves as the major factor in determining what clustering solutions can be observed, while other factors, such as the bias of the applied clustering algorithm, influence the observed results and lead to some false positives and false negatives.
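The two indicator functions of equations (1) and (2) can be illustrated with a minimal sketch. It assumes that an observed clustering is given as a list of group labels (one per object) and that the latent model is given as a list of disjoint sets of object indices; the function names are hypothetical and chosen only for this illustration.

```python
def co_associated(labels, i, j):
    """Equation (1): 1 if the observed clustering puts objects i and j in one group."""
    return 1 if labels[i] == labels[j] else 0


def co_clustered(latent_clusters, i, j):
    """Equation (2): 1 if some latent cluster contains both objects i and j."""
    return 1 if any(i in c and j in c for c in latent_clusters) else 0


def co_association_matrix(labels):
    """Full n x n 0/1 co-association matrix of one observed clustering."""
    n = len(labels)
    return [[co_associated(labels, i, j) for j in range(n)] for i in range(n)]


# Hypothetical toy data: five objects, one observed clustering, one latent model.
observed = [0, 0, 1, 1, 1]
latent = [{0, 1, 2}, {3, 4}]

# Objects 1 and 2 are co-clustered in the latent model but split by the
# observed clustering: a false negative in the sense of Section 2.
false_negative = co_clustered(latent, 1, 2) == 1 and co_associated(observed, 1, 2) == 0

# Objects 2 and 3 are co-associated in the observed clustering but not
# co-clustered in the latent model: a false positive.
false_positive = co_clustered(latent, 2, 3) == 0 and co_associated(observed, 2, 3) == 1
```

Representing clusterings as label lists keeps the pairwise test to a single comparison, which is convenient because every later quantity in the paper is a count over object pairs.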
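To build the connection between a clustering solution $C_m$ and the latent cluster model $\Theta$, we introduce two probabilistic parameters:

The parameter $p_m$ denotes the conditional probability that two objects are co-associated in $C_m$ given that they are co-clustered in the hidden model, that is, $p_m = \Pr(\delta_m(x_i, x_j) = 1 \mid \delta_\Theta(x_i, x_j) = 1)$; and

The parameter $r_m$ denotes the conditional probability that two objects are co-associated in $C_m$ given that they are not co-clustered in the hidden model, that is, $r_m = \Pr(\delta_m(x_i, x_j) = 1 \mid \delta_\Theta(x_i, x_j) = 0)$.

Intuitively, each observed clustering solution provides some evidence about the latent (unobservable) co-cluster relationship between objects. Given a set $E$ of clustering solutions, our objective is to maximize the posterior probability of the latent cluster model, as follows:

$$\Theta^* = \arg\max_{\Theta} \Pr(\Theta \mid E) = \arg\max_{\Theta} \frac{l(\Theta \mid E)\,\Pr(\Theta)}{\Pr(E)} \tag{3}$$

where $l(\Theta \mid E) = \Pr(E \mid \Theta)$ is the likelihood function. Assuming that these clustering solutions are independent of each other and that the prior probabilities of all the possible latent cluster models are distributed uniformly, (3) can be written as the following maximum-likelihood problem:

$$\Theta^* = \arg\max_{\Theta} l(\Theta \mid E) = \arg\max_{\Theta} \prod_{m=1}^{|E|} l(\Theta \mid C_m) \tag{4}$$

where $l(\Theta \mid C_m) = \Pr(C_m \mid \Theta)$ is the likelihood of the latent model $\Theta$ given the observed clustering solution $C_m$ (or the probability of observing $C_m$ given the latent model $\Theta$).

The evidence of observing a clustering solution $C_m$ can be decomposed into the evidence set of co-association relationships of all object pairs, i.e. whether a pair of objects $\{x_i, x_j\}$ is assigned to the same group. Then, we have

$$l(\Theta \mid C_m) = \prod_{\substack{\{x_i,x_j\}:\ \delta_\Theta=1 \\ \delta_m=1}} p_m \cdot \prod_{\substack{\{x_i,x_j\}:\ \delta_\Theta=1 \\ \delta_m=0}} (1-p_m) \cdot \prod_{\substack{\{x_i,x_j\}:\ \delta_\Theta=0 \\ \delta_m=1}} r_m \cdot \prod_{\substack{\{x_i,x_j\}:\ \delta_\Theta=0 \\ \delta_m=0}} (1-r_m) \tag{5}$$

Taking the logarithm on both sides, we get the log-likelihood:

$$L(\Theta \mid C_m) = \log l(\Theta \mid C_m) = n_{11}^{\Theta,m}\log p_m + n_{10}^{\Theta,m}\log(1-p_m) + n_{01}^{\Theta,m}\log r_m + n_{00}^{\Theta,m}\log(1-r_m) \tag{6}$$

where

$n_{11}^{\Theta,m} = |\{\{x_i, x_j\} : \delta_\Theta(x_i,x_j)=1 \text{ and } \delta_m(x_i,x_j)=1\}|$ represents the number of object pairs that are co-clustered in the hidden model and also co-associated in $C_m$;

$n_{10}^{\Theta,m} = |\{\{x_i, x_j\} : \delta_\Theta(x_i,x_j)=1 \text{ and } \delta_m(x_i,x_j)=0\}|$ represents the number of object pairs that are co-clustered in the hidden model, but not co-associated in the observed clustering result $C_m$;

$n_{01}^{\Theta,m} = |\{\{x_i, x_j\} : \delta_\Theta(x_i,x_j)=0 \text{ and } \delta_m(x_i,x_j)=1\}|$ represents the number of object pairs that are not co-clustered in the hidden model, but co-associated in the observed clustering result $C_m$; and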
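$n_{00}^{\Theta,m} = |\{\{x_i, x_j\} : \delta_\Theta(x_i,x_j)=0 \text{ and } \delta_m(x_i,x_j)=0\}|$ represents the number of object pairs that are not co-clustered in the hidden model and also not co-associated in the clustering result $C_m$.

By substituting equation (5) or (6) into (4), we can reformulate the optimization problem as:

$$\Theta^* = \arg\max_{\Theta} \prod_{m=1}^{|E|} l(\Theta \mid C_m) = \arg\max_{\Theta} \sum_{m=1}^{|E|} L(\Theta \mid C_m) \tag{7}$$

3 Algorithm Design

Unfortunately, the latent cluster model and the probabilistic parameters of the observed clustering results are all unknown, which makes it impossible to seek the solution directly. Here, we propose an EM-style algorithm to deal with the problem, depicted in Figure 1.

Figure 1. The flowchart of the LACA algorithm (Clustering 1, Clustering 2, ..., Clustering |E| -> Parameter Initialization -> Latent Model Generation -> Parameter Re-estimation -> Convergence? No: return to Latent Model Generation; Yes: Output the Solution).

The algorithm consists of four major steps:

Step 1 (Parameter Initialization): initialize the probabilistic parameters for each clustering solution;
Step 2 (Latent Model Generation): fixing the probabilistic parameters, look for a near-optimal solution (a latent model) to the maximum-likelihood problem with a hill-climbing strategy;
Step 3 (Parameter Estimation): fixing the latent cluster model, estimate the probabilistic parameters for each clustering solution;
Step 4 (Convergence Test): repeat Step 2 and Step 3 until convergence.

3.1 Parameter Initialization

Given $|E|$ observed clustering solutions $C_1, C_2, \ldots, C_{|E|}$, we use $count(x_i, x_j)$ to denote the number of clustering solutions where the objects $x_i$ and $x_j$ are co-associated, and $rcount(x_i, x_j)$ to denote the number of clustering solutions where $x_i$ and $x_j$ are not co-associated, that is:

$$count(x_i, x_j) = \sum_{m=1}^{|E|} \delta_m(x_i, x_j) \tag{8}$$

and

$$rcount(x_i, x_j) = \sum_{m=1}^{|E|} \bigl(1 - \delta_m(x_i, x_j)\bigr). \tag{9}$$

It is evident that $count(x_i, x_j) + rcount(x_i, x_j) = |E|$ for $1 \le i, j \le n$.

So far as parameter initialization is concerned, it is taken for granted that the different clustering solutions are equally plausible. The higher the value of $count(x_i, x_j)$, the more probable it is that the two objects $x_i$ and $x_j$ are co-clustered. Hence, the two probabilistic parameters for each clustering solution $C_m$ are initialized as the following M-estimates:

$$p_m = \frac{\sum_{\{x_i,x_j\}:\ \delta_m(x_i,x_j)=1} count(x_i, x_j) + \pi_p \cdot ESS}{\sum_{\{x_i,x_j\}} count(x_i, x_j) + ESS} \tag{10}$$

and

$$r_m = \frac{\sum_{\{x_i,x_j\}:\ \delta_m(x_i,x_j)=1} rcount(x_i, x_j) + \pi_r \cdot ESS}{\sum_{\{x_i,x_j\}} rcount(x_i, x_j) + ESS} \tag{11}$$

where ESS, standing for equivalent sample size, is set to 30 by default, and $\pi_p$ and $\pi_r$ stand for the prior values of the two M-estimates.

3.2 Latent Model Generation

Once the probabilistic parameters are initialized or re-estimated, the latent cluster model can be determined in a hill-climbing manner with respect to the value of the log-likelihood function.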
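The initialization of Section 3.1 and the log-likelihood objective of equations (6) and (7), which the hill climbing tries to increase, can be sketched together as follows. This is only an illustrative sketch under stated assumptions: clusterings are lists of group labels, the latent model is also given as a label list, the function names are hypothetical, and the prior value 0.5 used for both M-estimates is an assumption of this sketch rather than a value taken from the paper.

```python
import math
from itertools import combinations


def init_params(ensemble, ess=30.0, prior=0.5):
    """M-estimates in the spirit of equations (10) and (11); prior=0.5 is assumed."""
    n = len(ensemble[0])
    pairs = list(combinations(range(n), 2))
    # count(x_i, x_j): number of clusterings that co-associate the pair, eq. (8).
    count = {(i, j): sum(1 for lab in ensemble if lab[i] == lab[j]) for i, j in pairs}
    total_count = sum(count.values())
    # rcount = |E| - count for every pair, eq. (9).
    total_rcount = len(ensemble) * len(pairs) - total_count
    p, r = [], []
    for lab in ensemble:
        co = [(i, j) for i, j in pairs if lab[i] == lab[j]]
        votes = sum(count[ij] for ij in co)
        rvotes = len(ensemble) * len(co) - votes
        p.append((votes + prior * ess) / (total_count + ess))
        r.append((rvotes + prior * ess) / (total_rcount + ess))
    return p, r


def pair_counts(latent_labels, observed_labels):
    """Return (n11, n10, n01, n00) for a latent model and one clustering."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(latent_labels)), 2):
        co_cluster = latent_labels[i] == latent_labels[j]
        co_assoc = observed_labels[i] == observed_labels[j]
        if co_cluster and co_assoc:
            n11 += 1
        elif co_cluster:
            n10 += 1
        elif co_assoc:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def log_likelihood(latent_labels, ensemble, p, r):
    """Sum of equation (6) over all clusterings, i.e. the objective of equation (7)."""
    total = 0.0
    for labels, p_m, r_m in zip(ensemble, p, r):
        n11, n10, n01, n00 = pair_counts(latent_labels, labels)
        total += (n11 * math.log(p_m) + n10 * math.log(1.0 - p_m)
                  + n01 * math.log(r_m) + n00 * math.log(1.0 - r_m))
    return total
```

With any prior strictly between 0 and 1, both estimates stay strictly inside (0, 1), so the logarithms in the objective are always defined.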
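Let us start with an empty latent model where each object corresponds to a singleton cluster, and thus no two objects are co-clustered. Then, we iteratively merge two clusters into a larger one. The selection criterion for merging at each step is derived below, step by step.

Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_t\}$ be the latent cluster model at the current step, and let $\Theta^{(u,v)}$ be produced by merging $\theta_u$ and $\theta_v$ in $\Theta$. Merging only turns pairs that were not co-clustered into co-clustered pairs, so $(n_{11}^{\Theta^{(u,v)},m} - n_{11}^{\Theta,m}) = -(n_{01}^{\Theta^{(u,v)},m} - n_{01}^{\Theta,m})$ and $(n_{10}^{\Theta^{(u,v)},m} - n_{10}^{\Theta,m}) = -(n_{00}^{\Theta^{(u,v)},m} - n_{00}^{\Theta,m})$. Writing $L(\Theta \mid E) = \sum_{m} L(\Theta \mid C_m)$, it can be derived that:

$$L(\Theta^{(u,v)} \mid E) - L(\Theta \mid E) = \sum_{m=1}^{|E|} \left[ \bigl(n_{11}^{\Theta^{(u,v)},m} - n_{11}^{\Theta,m}\bigr) \log\frac{p_m}{r_m} + \bigl(n_{10}^{\Theta^{(u,v)},m} - n_{10}^{\Theta,m}\bigr) \log\frac{1-p_m}{1-r_m} \right] \tag{12}$$

Further, it is evident that:

$$n_{11}^{\Theta^{(u,v)},m} - n_{11}^{\Theta,m} = \sum_{x_i \in \theta_u,\, x_j \in \theta_v} \delta_m(x_i, x_j) \tag{13}$$

and

$$n_{10}^{\Theta^{(u,v)},m} - n_{10}^{\Theta,m} = \sum_{x_i \in \theta_u,\, x_j \in \theta_v} \bigl(1 - \delta_m(x_i, x_j)\bigr). \tag{14}$$

Substituting (13) and (14) into (12), we get:

$$L(\Theta^{(u,v)} \mid E) - L(\Theta \mid E) = \sum_{m=1}^{|E|} \left[ \sum_{x_i \in \theta_u,\, x_j \in \theta_v} \delta_m(x_i, x_j) \log\frac{p_m}{r_m} + \sum_{x_i \in \theta_u,\, x_j \in \theta_v} \bigl(1 - \delta_m(x_i, x_j)\bigr) \log\frac{1-p_m}{1-r_m} \right] \tag{15}$$

By interchanging the order of summation, it can be derived that:

$$L(\Theta^{(u,v)} \mid E) - L(\Theta \mid E) = \sum_{x_i \in \theta_u,\, x_j \in \theta_v} \sum_{m=1}^{|E|} \left[ \delta_m(x_i, x_j) \log\frac{p_m}{r_m} + \bigl(1 - \delta_m(x_i, x_j)\bigr) \log\frac{1-p_m}{1-r_m} \right] \tag{16}$$

If we define the affinity score between two objects $x_i$ and $x_j$ voted by a clustering solution $C_m$ as: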
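$$score_m(x_i, x_j) = \delta_m(x_i, x_j) \log\frac{p_m}{r_m} + \bigl(1 - \delta_m(x_i, x_j)\bigr) \log\frac{1-p_m}{1-r_m}, \tag{17}$$

we can sum up all the scores voted by $C_1, C_2, \ldots, C_{|E|}$ into the corresponding entry $M[i,j]$ of a score matrix $M$, that is,

$$M[i,j] = \sum_{m=1}^{|E|} score_m(x_i, x_j). \tag{18}$$

Substituting (17) and (18) into (16), we have:

$$L(\Theta^{(u,v)} \mid E) - L(\Theta \mid E) = \sum_{x_i \in \theta_u,\, x_j \in \theta_v} M[i,j] \tag{19}$$

However, using equation (19) directly as the selection criterion may favor merging larger clusters over smaller ones. Thus, we choose to merge the two clusters $\theta_s$ and $\theta_t$ such that

$$(s, t) = \arg\max_{(u,v)} \frac{L(\Theta^{(u,v)} \mid E) - L(\Theta \mid E)}{|\theta_u|\,|\theta_v|} = \arg\max_{(u,v)} \frac{1}{|\theta_u|\,|\theta_v|} \sum_{x_i \in \theta_u,\, x_j \in \theta_v} M[i,j] \tag{20}$$

If we think of the score matrix $M$ as a similarity matrix, the selection criterion is actually the average linkage (AL). As a result, a hierarchical clustering with average linkage can be applied to the matrix $M$. What is important is that the elements in the score matrix have a clear probabilistic meaning: each element is the log-likelihood ratio of the corresponding two objects being co-clustered to being not co-clustered.

Hidden Model Inference via Hierarchical Clustering

By fixing all the parameters $p_m$ and $r_m$ ($1 \le m \le |E|$), the score matrix $M$ can be constructed according to (17) and (18). The hidden model can be generated by applying the following agglomerative hierarchical clustering on $M$:

Step 1: We start from the simplest singleton model $\Theta_0$, where each cluster consists of one and only one object.
Step 2: Let $t$ denote the current iteration, and set $t = 1$.
Step 3: At each iteration $t$, let $\Theta$ be the cluster model in the previous iteration, i.e., $\Theta = \Theta_{t-1}$. Without loss of generality, we denote $\Theta = \{\theta_1, \theta_2, \ldots, \theta_{|\Theta|}\}$. Two clusters $\theta_s$ and $\theta_t$ in $\Theta$ are selected according to equation (20).
Step 4: If the average link between $\theta_s$ and $\theta_t$ is negative, then terminate the loop and output $\Theta$ as the generated hidden cluster model; otherwise, continue to Step 5.
Step 5: The clusters selected at Step 3 are merged into a new cluster $\theta_{new} = \theta_s \cup \theta_t$. We update $\Theta$ by removing $\theta_s$ and $\theta_t$ and inserting $\theta_{new}$. The updated $\Theta$ then serves as the cluster model in the current iteration, that is, $\Theta_t = \Theta$.
Step 6: If only two clusters are left, terminate and output $\Theta$; otherwise, set $t = t+1$ and go to Step 3.

Stopping Criterion: This process is repeated until we cannot find a pair of clusters with a positive average link (Step 4), or there are only two clusters left (Step 6).

3.3 Parameter Re-estimation

Once a latent cluster model $\Theta$ is generated, it can be used to estimate the probabilistic parameters $p_m$ and $r_m$ for each clustering solution $C_m$.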
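Looking back at Section 3.2, the score matrix of equations (17) and (18) and the agglomerative Steps 1 to 6 can be sketched as follows. This is a naive illustrative sketch, not the authors' implementation: clusterings are assumed to be lists of group labels, the function names are hypothetical, and the average link is recomputed from scratch at every step rather than updated incrementally.

```python
import math


def score_matrix(ensemble, p, r):
    """Equations (17)-(18): M[i][j] sums the log-likelihood-ratio votes of all clusterings."""
    n = len(ensemble[0])
    M = [[0.0] * n for _ in range(n)]
    for labels, p_m, r_m in zip(ensemble, p, r):
        agree = math.log(p_m / r_m)                     # vote when the pair is co-associated
        disagree = math.log((1.0 - p_m) / (1.0 - r_m))  # vote when it is not
        for i in range(n):
            for j in range(n):
                if i != j:
                    M[i][j] += agree if labels[i] == labels[j] else disagree
    return M


def average_link(M, a, b):
    """Average of M[i][j] over all cross pairs of clusters a and b, as in equation (20)."""
    return sum(M[i][j] for i in a for j in b) / (len(a) * len(b))


def generate_latent_model(M):
    """Agglomerative merging with the two stopping rules of Steps 4 and 6."""
    clusters = [[i] for i in range(len(M))]  # Step 1: singleton model
    while len(clusters) > 2:                 # Step 6: stop when two clusters are left
        best, s, t = max(                    # Step 3: best pair by average link
            (average_link(M, clusters[u], clusters[v]), u, v)
            for u in range(len(clusters))
            for v in range(u + 1, len(clusters))
        )
        if best < 0:                         # Step 4: no non-negative link remains
            break
        merged = clusters[s] + clusters[t]   # Step 5: merge the selected pair
        clusters = [c for k, c in enumerate(clusters) if k not in (s, t)] + [merged]
    return clusters
```

Because a clustering with $p_m > r_m$ casts a positive vote for co-associated pairs and a negative vote otherwise, the sign of the average link gives a natural stopping point, which is how the number of clusters is determined without user input.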
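Since the parameter $p_m$ represents the probability that two objects are co-associated in $C_m$ on condition that they are co-clustered in the latent model $\Theta$, that is, $p_m = \Pr(\delta_m(x_i,x_j)=1 \mid \delta_\Theta(x_i,x_j)=1)$, it can be estimated as:

$$p_m = \frac{|\{(x_i, x_j) : \delta_\Theta(x_i,x_j)=1 \text{ and } \delta_m(x_i,x_j)=1\}| + \pi_p \cdot ESS}{|\{(x_i, x_j) : \delta_\Theta(x_i,x_j)=1\}| + ESS}, \tag{21}$$

where ESS is again set to 30 by default and $\pi_p$ stands for the prior value of the M-estimate. Because the parameter $r_m$ denotes the probability that two objects are co-associated in $C_m$ on condition that they are not co-clustered in $\Theta$, that is, $r_m = \Pr(\delta_m(x_i,x_j)=1 \mid \delta_\Theta(x_i,x_j)=0)$, it can be estimated as:

$$r_m = \frac{|\{(x_i, x_j) : \delta_\Theta(x_i,x_j)=0 \text{ and } \delta_m(x_i,x_j)=1\}| + \pi_r \cdot ESS}{|\{(x_i, x_j) : \delta_\Theta(x_i,x_j)=0\}| + ESS}, \tag{22}$$

where $\pi_r$ is the corresponding prior value.

If a clustering solution assigns each object to a distinct group, the corresponding $p_m$ and $r_m$ will both be close to 0. At the other extreme, if it assigns all objects to a single group, the corresponding $p_m$ and $r_m$ will both be near 1.

3.4 Convergence Test

Once the probabilistic parameters $p_m$ and $r_m$ for each clustering solution $C_m$ are re-estimated, we compute the difference between the re-estimated values and the previous ones. If the sum of absolute differences over all clustering solutions is less than a user-specified threshold value, we consider the algorithm converged and output the latent model.

4 Experiments

We have conducted extensive experiments to compare LACA with several state-of-the-art cluster ensemble methods. Our experiments are designed to demonstrate that: 1) LACA is more stable than EAC-AL in determining the number of clusters; 2) LACA outperforms EAC-AL, which is also able to determine the number of clusters automatically; 3) a variant version of LACA that accepts a user-specified cluster number outperforms CSPA, HGPA, and MCLA.

4.1 Experimental Settings

DataSet        #Object  #Feature  #Class
Iris           150      4         3
Glass          214      9         6
Ecoli          336      7         8
Libras         360      90        15
Segmentation   210      19        7
Seeds          210      7         3
Pima           768      8         2
Pendigits      1000     16        10

Table 1: Descriptions of the datasets.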
Data sets. We use eight data sets from the UCI machine learning repository (Frank and Asuncion, 2010) in our experiments. The characteristics of the data sets are summarized in Table 1. Note that, for Pendigits, we randomly select 100 objects from each class.

Cluster Ensemble Generation. We choose the K-means algorithm (MacQueen, 1967) as our base clusterer because of its popularity in many previous cluster ensemble studies. At each run, we generate a cluster ensemble of 200 clustering solutions for a given data set. To be more specific, for a dataset of n objects and m features, each clustering solution is produced as follows:

1) The size s of the feature subset is first determined by randomly drawing an integer value from the range [mins, maxs], where mins is set to 3 and maxs is set to m.
2) A random feature subset FS of size s is generated by drawing s different features from the original m features.
3) A random integer value K is drawn from [mink, maxk], where mink is set to 2 and maxk is set to n/15.
4) A clustering solution is obtained by applying the K-means algorithm on the dataset, with access to all the objects, but only the s features in FS.

Evaluation Criterion. As all the datasets are labeled, we use the class labels as a surrogate for the true underlying structure of the data. Two commonly used measures, Normalized Mutual Information (NMI) and F-measure, are chosen to evaluate our approach against others. NMI (Strehl and Ghosh, 2002) treats cluster labels X and class labels Y as random variables and makes a tradeoff between the mutual information and the number of clusters:

$$NMI(X, Y) = \frac{I(X, Y)}{\sqrt{H(X)\,H(Y)}},$$

where $I(\cdot,\cdot)$ is the mutual information metric and $H(\cdot)$ is the entropy metric. F-measure (Manning et al., 2008) views a clustering solution (on a dataset with n objects) as a series of n(n-1)/2 decisions, one for each pair of objects. It makes a compromise between the precision and the recall of these decisions:

$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}.$$

4.2 Stability of Predicted Cluster Numbers

To the best of our knowledge, most cluster ensemble methods rely on a user-specified number of clusters. The only exception is the maximum lifetime criterion used in the EAC-AL method. In order to study the stability of our algorithm and EAC-AL in the predicted number of clusters, we generate 30 cluster ensembles for each dataset in the way described in Section 4.1.
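The two evaluation measures of Section 4.1 can be sketched in a few lines. This is an illustrative sketch that assumes predicted clusters and true classes are both given as label lists of equal length; the function names are hypothetical, natural logarithms are used (the base cancels in the NMI ratio), and returning 0.0 in the degenerate cases is a convention chosen for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_f_measure(pred, truth):
    """F-measure over the n(n-1)/2 pair decisions: same cluster vs. same class."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0  # convention of this sketch when no pair is a true positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def entropy(counts, n):
    """Shannon entropy of a label distribution given as a Counter."""
    return -sum(v / n * math.log(v / n) for v in counts.values())


def nmi(pred, truth):
    """NMI(X, Y) = I(X, Y) / sqrt(H(X) H(Y))."""
    n = len(pred)
    cx, cy = Counter(pred), Counter(truth)
    cxy = Counter(zip(pred, truth))
    mi = sum(v / n * math.log(n * v / (cx[a] * cy[b])) for (a, b), v in cxy.items())
    denom = math.sqrt(entropy(cx, n) * entropy(cy, n))
    return mi / denom if denom > 0 else 0.0  # 0.0 when either labeling is trivial
```

Both measures depend only on how objects are grouped, not on the label values themselves, so a clustering that matches the classes up to a renaming of the groups attains the maximum score.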
Our algorithm and EAC-AL are applied to these cluster ensembles to get their predicted cluster numbers. The statistics of these numbers are presented in Table 2. It can be seen from the table that the range [Min, Max] of EAC-AL is much wider than that of LACA on each dataset, suggesting that the cluster numbers predicted by EAC-AL fluctuate a lot for each data set, especially for Pendigits and Pima. Similar observations can also be made from the standard deviation of cluster numbers on each dataset. We conjecture that this is because lifetime is not always effective in the prediction of cluster numbers, since the maximum lifetime strategy is more or less a rule of thumb that lacks theoretical justification. As to the average value of the predicted cluster numbers, LACA is larger than EAC-AL on 3 datasets and smaller on 5, showing some degree of inconsistency.

Dataset        Method   Min  Max  Average  Std Dev
Iris           LACA     3    4    3.73
               EAC-AL   2    4    2.73     0.98
Glass          LACA     5    6    5.03     0.18
               EAC-AL   2    6    4.40     1.65
Ecoli          LACA     3    5    3.73     3
               EAC-AL   3    12   4.30     2.37
Libras         LACA     7    9    7.70     0
               EAC-AL   6    21   11.50    5.54
Segmentation   LACA     5    7    5.23     7
               EAC-AL   2    13   2.53     2.08
Seeds          LACA     4    5    4.40     0
               EAC-AL   2    7    4.53     0.97
Pima           LACA     4    10   8.70     1.06
               EAC-AL   4    30   11.80    8.04
Pendigits      LACA     10   16   14.03    1.75
               EAC-AL   4    48   23       12.27

Table 2: Statistics of predicted cluster numbers

4.3 Comparison with EAC-AL

Table 3 reports the NMI and F-measure values of our algorithm and EAC-AL on the same cluster ensembles. Each value reported here is obtained by averaging across 30 runs. We can see that our algorithm performs better than EAC-AL on 7 out of 8 datasets. The only exception is the Libras dataset. We conjecture that this is because the average cluster number predicted by EAC-AL is closer to the real number of classes in Libras.

               F-measure            NMI
Dataset        LACA     EAC-AL      LACA     EAC-AL
Iris           533      995         535      528
Glass          502      395         869      614
Ecoli          693      545         790      712
Libras         267      470         014      333
Segmentation   091      091         052      617
Seeds          423      333         680      613
Pima           751      597         0.0674   0.0665
Pendigits      712      809         721      389

Table 3: F-measure and NMI values of LACA and EAC-AL

4.4 Comparison with CSPA, HGPA, and MCLA
Figure 2. NMI comparison of the fixed-k LACA variant, CSPA, HGPA, and MCLA (panels: iris, glass, ecoli, libras, segmentation, seeds, pima, pendigits; x-axis: cluster number k).

Due to the fact that most existing cluster ensemble methods require a user-specified number of clusters, to make a fair comparison with them, we make a small modification of LACA by accepting a user-specified cluster number k, resulting in a variant version. In this variant, once the probabilistic parameters have converged, we force the hierarchical clustering to stop merging only when there are exactly k clusters left, which are then used as the consensus clustering. On each dataset of l classes, we compare the variant with CSPA, HGPA, and MCLA by varying the cluster number k from l to 3l with step size 1.

Figure 2 and Figure 3 depict the NMIs and the F-measures, respectively, of the variant, CSPA, HGPA, and MCLA with different user-specified cluster numbers; these values are also averaged over 30 runs. The curve of our method is better or at least competitive on almost all the datasets. The only exception is observed on the Pima dataset, where the NMI of our method is lower than the others. Besides, we also find that the curve of our method is smoother and more stable across the different k values. This suggests that our method achieves high quality consistently on these levels of the hierarchical clustering.

Figure 3. F-measure comparison of the fixed-k LACA variant, CSPA, HGPA, and MCLA (panels: iris, glass, ecoli, libras, segmentation, seeds, pima, pendigits; x-axis: cluster number k; y-axis: F value).

5 Conclusions

In this paper, we proposed a novel cluster ensemble approach by assuming that the observed clustering solutions are generated from a latent cluster model. An EM-style algorithm, called LACA, was designed and implemented to maximize the likelihood function.
It has exhibited satisfactory performance on the experimental datasets, for two reasons: firstly, it makes a stable and reliable prediction of the cluster numbers; secondly, the vote of each base clustering solution is weighted, which reflects the quality of the base solution.

Acknowledgments

This work is supported by the National Airplane Research Program (MJ-Y-2011-39), the National Natural Science Fund of China (No. 61170007), the Shanghai High-Tech Project (11-43), and the Shanghai Leading Academic Discipline Project (No. B114). We are grateful to the anonymous reviewers for their valuable comments.
References

[Azimi and Fern, 2009] Javad Azimi and Xiaoli Z. Fern. Adaptive cluster ensemble selection. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 993-997, 2009.

[Fern and Brodley, 2003] Xiaoli Z. Fern and Carla E. Brodley. Random projection for high dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning, pages 186-193, 2003.

[Fern and Lin, 2008] Xiaoli Z. Fern and Wei Lin. Cluster ensemble selection. Statistical Analysis and Data Mining, 1(3): 379-390, 2008.

[Frank and Asuncion, 2010] A. Frank and A. Asuncion. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2010. [http://archive.ics.uci.edu/ml]

[Fred and Jain, 2005] Ana L.N. Fred and Anil K. Jain. Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6): 835-850, 2005.

[Jain et al., 1999] Anil K. Jain, M.N. Murty, and P.J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3): 264-323, 1999.

[Jain, 2010] Anil K. Jain. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8): 651-666, 2010.

[MacQueen, 1967] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pages 281-297, 1967.

[Manning et al., 2008] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[Minaei-Bidgoli et al., 2004] Behrouz Minaei-Bidgoli, Alexander Topchy, and William F. Punch. Ensembles of partitions via data resampling. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), pages 188-192, 2004.

[Strehl and Ghosh, 2002] Alexander Strehl and Joydeep Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3: 583-617, 2002.

[Vega-Pons and Ruiz-Shulcloper, 2011] Sandro Vega-Pons and José Ruiz-Shulcloper. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3): 337-372, 2011.